ARTICULATORY DYNAMICS AND STABILITY IN MULTI-GESTURE COMPLEXES
by
Miran Oh
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Linguistics)
December 2021
Copyright 2021 Miran Oh
Acknowledgments
First and foremost, I would like to express my innermost gratitude to my advisor Dani Byrd, who always
sought ways for me to grow, taught me how to think as a scientist, believed in me, and anchored my
journey as a phonetician. Dani’s enthusiasm and precision in steering research projects have been
undeniably admirable, and I knew I could not go wrong under her guidance. From her, I
have learned a great deal of professionalism and gained abundant research opportunities necessary to earn
expertise as a linguist. I also want to express my deepest appreciation to Louis Goldstein, who always had
an answer when I needed one, guided me to the right path of thinking, and gave me the wisdom to not worry
and to be myself. Their invaluable and insightful comments, suggestions, and feedback have significantly
improved this dissertation, and I would not have been successful without them. I deeply admire their
immense knowledge and passion for the scientific endeavor, which have encouraged me to pursue the next
level and to aim higher. I feel very lucky to have received their invaluable advice and treasured support,
and I am grateful for their persistent confidence in me. Thanks to them, my PhD journey has been a
glorious experience.
I would like to extend my sincere thanks to my other committee members, Shrikanth Narayanan
and Krishna Nayak, for their helpful comments and suggestions in shepherding this dissertation. I am also
thankful to my qualifying exam committee who were involved in the earlier stages of this dissertation.
The interactions with Khalil Iskarous and Asterios Toutios have benefited me greatly. They all provided
me with new perspectives and ideas that advanced my research.
In addition to my committee, I have received great help from many scholars at USC. I want to
thank the SPAN MRI acquisition team, Yongwan Lim, Weiyi Chen, Asterios Toutios, Colin Vaz, and
Tanner Sorensen, whom I met weekly on Sunday nights over the past years to collect data. Thanks to them,
I was able to collect the real-time MRI data used for my dissertation project.
My appreciation also goes to my friends and colleagues at USC Linguistics who have shared this
journey together and made it a memorable and fun experience. I am thankful to Sarah H. Lee, who has
been with me from the beginning and has encouraged and supported me tremendously all the time. I am
pleased to have met Yoonjeong Lee as my inspiring friend and caring collaborator. I want to extend my
gratitude to Hayeun Jang, Silvia Kim, Jina Song, Yoona Lee, Daniel Plesniak, Madhumanti Datta, and
Merouane Benhassine. I am also grateful to the members of USC Phonetics and Phonology Group: Yijing
Lu, Sarah Harper, Reed Blaylock, Yifan Yang, Mairym Llorens Monteserin, and everyone else who has
shared helpful comments in our group meetings.
I am truly thankful to my lovely ex-roomies Dahye (who was literally right next to me during my
last spurt) and Daye (with whom I could share all our ups and downs), and my dear friends Boba,
Sinjeong, Hyeyeon, Jihee, Susie, Emily, Yujin, Seoyeon, Jiin, Seoyoung, Minjung, Phuong, Minsok,
Junghoon, Jonghyeop, Yungho, Seungjong, Jinhyun, Hyunuk, Ukjin, Jun-Ha, Jinwoo, Yunseok,
Jeongmin, Hyunchul, Jungchan, and Dongwook, for their kind support and encouragement. I am also
grateful to my nine beautiful ladies who will elegantly grow old together with me: Hyewon, Heejin, Nuri,
Saehim, Sunhwa, Yeeun, Yeongju, Yewon, and Bomi. Our genuine friendship from high school will
continue to shine.
Finally, I would like to thank my loving family for always being there for me. I thank my mom
YoungYeah Hong, my dad Hye-Keun Oh, my sister Uran Oh, my grandmas, and my beloved grandpas
who are up there watching over me, for their endless love and support.
This dissertation was supported by funding from NSF 2116376 (PI: Byrd, Co-PI: Oh), from NIH
DC03172 (PI: Byrd), and from NIH DC007124 (PI: Narayanan).
Thank you everyone!
Table of Contents
Acknowledgments ......................................................................................................................................... ii
List of Tables ............................................................................................................................................... vii
List of Figures ............................................................................................................................................ viii
Abstract ........................................................................................................................................................ x
1. Introduction ...................................................................................................................................... 1
2. Larynx-Oral Complexes in Hausa Non-Pulmonic Consonants ....................................................... 7
2.1. Introduction ............................................................................................................................ 7
2.1.1. Instrumental Studies on Vertical Laryngeal Movement ............................................... 8
2.1.2. Predictions Assessed in the Present Study ................................................................. 10
2.2. Methods ................................................................................................................................ 14
2.2.1. Subjects ...................................................................................................................... 14
2.2.2. Data Acquisition ......................................................................................................... 15
2.2.3. Materials ..................................................................................................................... 15
2.2.4. Data Analysis ............................................................................................................. 17
2.3. Results .................................................................................................................................. 24
2.3.1. Vertical Laryngeal Activity ........................................................................................ 24
2.3.1.1. Larynx raising ................................................................................................ 24
2.3.1.2. Larynx lowering ............................................................................................ 26
2.3.2. Larynx-Oral Coordination .......................................................................................... 29
2.3.2.1. Ejectives vs. implosives ................................................................................. 29
2.3.2.2. Non-pulmonic consonants vs. pulmonic consonants ..................................... 30
2.4. Discussion ............................................................................................................................. 33
2.5. Conclusion ............................................................................................................................ 36
3. Prosodic Variability of Multi-Gesture Complexes: Ejectives and Implosives .............................. 38
3.1. Introduction .......................................................................................................................... 38
3.1.1. Superordinate Goals for Multi-Gesture Complexes ................................................... 39
3.1.2. Predictions Assessed in the Present Study ................................................................. 42
3.2. Methods ................................................................................................................................ 45
3.3. Results .................................................................................................................................. 47
3.3.1. Prosodic Lengthening and Strengthening Effects ...................................................... 47
3.3.1.1. Gestural duration ........................................................................................... 48
3.3.1.2. Gestural magnitude ........................................................................................ 50
3.3.2. Prosodic Timing Variability of Ejectives and Implosives .......................................... 53
3.3.2.1. Onset lags ...................................................................................................... 53
3.3.2.2. Vertical larynx onset to oral closure target lags ............................................ 54
3.3.3. Prosodic Timing Variability of Implosives and Voiced Plosives .............................. 56
3.3.3.1. Onset lags ...................................................................................................... 57
3.3.3.2. Larynx lowering onset to oral closure target lags .......................................... 58
3.3.3.3. Correlations between timing and duration ..................................................... 60
3.4. Discussion ............................................................................................................................. 61
3.5. Conclusion ............................................................................................................................ 64
4. Velum-Oral Complexes in Korean Singleton and Juncture Geminate Nasals ............................... 65
4.1. Introduction .......................................................................................................................... 65
4.1.1. The Phonological Representation of Underlying and Derived Geminates ................ 66
4.1.2. The Articulatory Properties of Geminates .................................................................. 69
4.1.3. Predictions Assessed in the Present Study ................................................................. 72
4.2. Methods ................................................................................................................................ 79
4.2.1. Subjects ...................................................................................................................... 79
4.2.2. Data Acquisition ......................................................................................................... 80
4.2.3. Materials ..................................................................................................................... 80
4.2.4. Data Analysis ............................................................................................................. 82
4.3. Results .................................................................................................................................. 84
4.3.1. Singletons and Geminates .......................................................................................... 85
4.3.1.1. Duration ......................................................................................................... 85
4.3.1.2. Magnitude ...................................................................................................... 88
4.3.1.3. Timing ........................................................................................................... 90
4.3.2. Geminate Types .......................................................................................................... 92
4.3.2.1. Count ............................................................................................................. 92
4.3.2.2. Duration ......................................................................................................... 93
4.3.2.3. Magnitude ...................................................................................................... 94
4.3.2.4. Timing ........................................................................................................... 96
4.4. Discussion ............................................................................................................................. 97
4.5. Conclusion .......................................................................................................................... 100
5. Prosodic Variability of Multi-Gesture Complexes: Nasals .......................................................... 102
5.1. Introduction ........................................................................................................................ 102
5.1.1. Prosodic Variability of Within-Segment Timing ..................................................... 103
5.1.2. Prosodic Variability of Singletons and Geminates ................................................... 107
5.1.3. Predictions Assessed in the Present Study ............................................................... 110
5.2. Methods .............................................................................................................................. 113
5.3. Results ................................................................................................................................ 115
5.3.1. Gestural Actions vs. Inter-Gestural Timing ............................................................. 115
5.3.1.1. Correlations between duration, magnitude, and timing ............................... 116
5.3.1.2. Gestural actions and timing across prosodic modulations .......................... 119
5.3.2. Variability in Timing: Onset vs. Coda Nasals .......................................................... 121
5.3.2.1. Onset timing ................................................................................................ 122
5.3.2.2. Consonant nasality timing ........................................................................... 124
5.3.2.3. Nasality lags across prosody ........................................................................ 125
5.3.2.4. Relative lags across prosody ........................................................................ 126
5.3.3. Variability in Timing: Assimilated vs. Concatenated Geminate Nasals .................. 128
5.3.3.1. Onset timing ................................................................................................ 128
5.3.3.2. Consonant nasality timing ........................................................................... 130
5.3.3.3. Nasality lags across prosody ........................................................................ 131
5.3.3.4. Relative lags across prosody ....................................................................... 132
5.4. Discussion ........................................................................................................................... 134
5.5. Conclusion .......................................................................................................................... 141
6. Computational Modeling of Timing for Multi-Gesture Complexes ............................................ 143
6.1. Introduction ........................................................................................................................ 143
6.2. Relative Phase Model ......................................................................................................... 148
6.2.1. Background .............................................................................................................. 148
6.2.1.1. Task Dynamic Application (TADA) ........................................................... 151
6.2.1.2. Stability and coordination in dynamical systems ........................................ 152
6.2.1.3. Relative phase stability ................................................................................ 155
6.2.2. Predictions ................................................................................................................ 158
6.2.3. Methods .................................................................................................................... 161
6.2.4. Results ...................................................................................................................... 164
6.2.4.1. Relative phase patterns ................................................................................ 164
6.2.4.2. Stabilization time ......................................................................................... 167
6.2.4.3. Variations in relative phases ........................................................................ 170
6.2.5. Summary .................................................................................................................. 171
6.3. Machine Learning Classification of Multi-Gesture Complexes ......................................... 173
6.3.1. Background .............................................................................................................. 173
6.3.2. Predictions ................................................................................................................ 175
6.3.3. Methods .................................................................................................................... 178
6.3.3.1. Support vector machine classification ......................................................... 178
6.3.3.2. Parameters and feature selection ................................................................. 179
6.3.3.3. Classification procedure .............................................................................. 181
6.3.4. Results ...................................................................................................................... 182
6.3.4.1. Ejectives vs. implosives ............................................................................... 182
6.3.4.2. Implosives vs. voiced plosives .................................................................... 186
6.3.5. Summary .................................................................................................................. 189
6.4. Conclusion .......................................................................................................................... 190
7. Conclusion .................................................................................................................................... 191
References ................................................................................................................................................ 196
Appendices ............................................................................................................................................... 211
Appendix A: Hausa Stimuli ........................................................................................................ 211
Appendix B: Korean Stimuli ...................................................................................................... 212
List of Tables
Table 2.1. Target consonants in Hausa ..................................................................................................... 16
Table 3.1. Tests on timing variability in phrase-initial onset lags between ejectives and implosives ...... 54
Table 3.2. Tests on variability in phrase-initial onset-to-target lags between ejectives and implosives .. 56
Table 3.3. Tests on timing variability in phrase-internal onset lags for implosives vs. voiced Cs .......... 58
Table 3.4. Tests on timing variability in phrase-initial onset lags for implosives vs. voiced Cs ............. 58
Table 4.1. Background information of the subjects .................................................................................. 80
Table 4.2. Target consonants in Korean ................................................................................................... 80
Table 4.3. Prosodic conditions for Korean (/n#n/) ................................................................................... 81
Table 4.4. Tukey’s post-hoc pairwise differences of segment for TT duration ....................................... 86
Table 4.5. Tukey’s post-hoc pairwise differences of segment for VEL duration .................................... 86
Table 4.6. Count of identifiable VEL gestures (count/total token) .......................................................... 92
Table 5.1. Tests on the coefficients of variations in onset lags by syllable structure ............................. 123
Table 5.2. Tests on the coefficients of variations in nasality lags by syllable structure ......................... 125
Table 5.3. Tests on the coefficients of variations in onset lags by geminate type ................................. 130
Table 5.4. Tests on the coefficients of variations in nasality lags by geminate type ............................. 131
Table 6.1. Actual (observed) and model predicted relative phases for onset and coda nasals ............... 166
Table 6.2. Beta values for each variable in linear SVM models (ejectives vs. implosives) ................... 183
Table 6.3. Beta values for each variable in linear SVM models (implosives vs. voiced plosives) ........ 186
List of Figures
Figure 2.1. Vocal tract mid-line from automatic calculation (left) & ROI-overlaid image (right) ......... 17
Figure 2.2. ROI-overlaid images of a speaker producing /b/ (LAB), /d/ (COR), and /k/ (DOR) ............ 18
Figure 2.3. Pre-processing steps for automatic centroid tracking ............................................................ 20
Figure 2.4. The centroid tracking output of the production of a VCV sequence /ɑɠɑ/ ............................ 21
Figure 2.5. Temporal landmarks of a schematic gesture .......................................................................... 22
Figure 2.6. Larynx raising displacement (left) and larynx extremum (right) ........................................... 25
Figure 2.7. Larynx raising displacement (left) and extremum (right) for individual speakers ................ 25
Figure 2.8. Larynx raising displacement (left) and extremum (right) in ejective stops and fricatives .... 26
Figure 2.9. Larynx lowering displacement (left) and larynx extremum (right) ....................................... 27
Figure 2.10. Larynx lowering displacement (left) and extremum (right) for individual speakers ............. 28
Figure 2.11. Larynx displacement (left) and larynx extremum (right) for individual speakers ................. 28
Figure 2.12. Oral-vertical larynx timing in ejectives vs. implosives ......................................................... 30
Figure 2.13. Oral target to larynx onset lag in pulmonic consonants, ejectives, and implosives ............... 31
Figure 2.14. Onset lag in pulmonic consonants, ejectives, and implosives ............................................... 32
Figure 3.1. Oral duration (left) and LX duration (right) at phrase-internal and -initial positions ............ 49
Figure 3.2. Oral magnitude (left) and LX magnitude (right) at phrase-internal and -initial positions ..... 51
Figure 3.3. LX displacement at phrase-internal and -initial positions for implosives and voiced Cs ...... 52
Figure 3.4. Onset lags for ejectives and implosives ................................................................................. 54
Figure 3.5. Onset-to-target lags for ejectives and implosives at phrase-internal & -initial positions ...... 56
Figure 3.6. Onset lags for implosives and voiced Cs at phrase-internal & -initial positions ................... 58
Figure 3.7. Onset-to-target lags for implosives and voiced Cs at phrase-internal & -initial positions .... 59
Figure 3.8. Correlation graphs for intergestural timing and LX duration ................................................ 60
Figure 4.1. Tracking of the velum centroids over time in the production of an intervocalic /n/ ............. 83
Figure 4.2. TT duration (left) & VEL duration (right) for singleton and geminate nasals ...................... 85
Figure 4.3. TT closure (left) & release (right) duration for singleton and geminate nasals ..................... 87
Figure 4.4. VEL lowering (left) & raising (right) duration for singleton and geminate nasals ............... 88
Figure 4.5. TT magnitude (left) & VEL magnitude (right) for singleton and geminate nasals ............... 89
Figure 4.6. VEL lowering (left) & fronting (right) extremum for singleton and geminate nasals ........... 90
Figure 4.7. VEL lowering onset to TT onset lag (left) & TT onset to VEL raising onset lag (right) ...... 91
Figure 4.8. TT duration (left) & VEL duration (right) for geminate nasals ............................................. 93
Figure 4.9. TT magnitude (left) & VEL magnitude (right) for geminate nasals ...................................... 94
Figure 4.10. VEL lowering (left) & fronting (right) extremum for geminate nasals ................................. 95
Figure 4.11. VEL lowering to TT lag (left) & TT to VEL raising lag (right) for geminate nasals ............ 96
Figure 5.1. Temporal lags between TT and VEL gestures ..................................................................... 114
Figure 5.2. Correlation graphs (z-scored within speaker) for duration & magnitude (TT/VEL) ........... 116
Figure 5.3. Correlation graphs (z-scored) for onset lag vs. VEL duration & magnitude ....................... 117
Figure 5.4. Correlation graphs (z-scored) for onset-to-target lag vs. VEL duration & magnitude ........ 118
Figure 5.5. Correlation graphs (z-scored) for onset-to-target lag vs. TT duration & magnitude ........... 118
Figure 5.6. TT duration and magnitude at boundaries (Wd, AP, IP) and under focus (AP+focus) ....... 119
Figure 5.7. VEL duration and magnitude at boundaries (Wd, AP, IP) and under focus (AP+focus) .... 120
Figure 5.8. Onset lags and TT onset-to-VEL target lags at boundaries (Wd, AP, IP) and under focus
(AP+focus) ........................................................................................................................... 121
Figure 5.9. Density plots for onset lags in onset & coda nasals ............................................................. 123
Figure 5.10. Density plots for nasality lags in onset & coda nasals ......................................................... 124
Figure 5.11. Nasality lags at boundaries and under focus in onset and coda nasals ................................ 126
Figure 5.12. Relative onset lags at boundaries and under focus in onset and coda nasals ....................... 127
Figure 5.13. Relative nasality lags at boundaries and under focus in onset and coda nasals ................... 128
Figure 5.14. Density plots for onset lags in geminate nasals: (a) overall and (b) individual ................... 129
Figure 5.15. Density plots for nasality lags in geminate nasals: (a) overall and (b) individual ............... 130
Figure 5.16. Nasality lags at boundaries and under focus in juncture geminate nasals ........................... 132
Figure 5.17. Relative onset lags at boundaries and under focus in juncture geminate nasals .................. 133
Figure 5.18. Relative nasality lags at boundaries and under focus in juncture geminate nasals .............. 134
Figure 6.1. Potential energy functions with in-phase timing variability (left) versus out-of-phase
timing stability (right) .......................................................................................................... 145
Figure 6.2. HKB potential function simulating anti-phase (left) to in-phase (right) transition ............. 149
Figure 6.3. Coupling graph for pa.pa#pa generated by TADA’s coupled oscillator model .................. 156
Figure 6.4. Relative phases for the production pa.pa#pa in TADA ....................................................... 157
Figure 6.5. Coupling graphs for Korean onset (left) and coda nasals (right) ......................................... 160
Figure 6.6. Coupling graphs for pa.na (left) and for pan.ta (right) ........................................................ 162
Figure 6.7. Relative phase plot for (pa.na) generated in TADA ............................................................ 163
Figure 6.8. Sample relative phase plots for the coupled oscillator simulation of (pa.na) ...................... 164
Figure 6.9. Sample relative phase plots for the coupled oscillator simulation of (pan.ta) ..................... 165
Figure 6.10. Relative phases for onset nasals (left) and coda (right) nasals (30 iterations) ..................... 168
Figure 6.11. Stabilization time of relative phases for onset and coda nasals ........................................... 169
Figure 6.12. Density plots of relative phases before stabilization for onset and coda nasals .................. 170
Figure 6.13. Sample trajectories for voiced implosives (left) and voiced plosives (right) ...................... 176
Figure 6.14. Histogram of onset lags for ejectives, implosives, and pulmonic Cs .................................. 177
Figure 6.15. Schematic gestural organization for ejectives, implosives, and voiced plosives ................. 178
Figure 6.16. 5-fold cross validation .......................................................................................................... 181
Figure 6.17. Sample classification results for ejectives (blue) and implosives (red) .............................. 184
Figure 6.18. Confusion matrix for ejectives vs. implosives SVM classification model .......................... 185
Figure 6.19. ROC curves for ejectives (left) and implosives (right) ........................................................ 185
Figure 6.20. Sample classification results for implosives (red) and voiced plosives (blue) .................... 187
Figure 6.21. Confusion matrix for implosives vs. voiced stops SVM classification model .................... 188
Figure 6.22. ROC curves for implosives (left) and voiced stops (right) .................................................. 188
Abstract
Speech production involves combined actions of multiple coordinated articulatory gestures. The atomic
linguistic units are elegantly coupled with one another to yield a structured spatiotemporal realization of
the gestural components of speech, thereby enabling humans to perceive and parse language effortlessly.
The goal of this dissertation is to develop our theoretical understanding of the linguistic representation of
articulatory coordination in speech production, drawing on an interdisciplinary approach incorporating
phonetics, phonology, biomedical imaging, computational modeling, and a dynamical systems approach
to motor control. The project undertakes four empirical real-time MRI (rtMRI) experiments to understand
how contrastive linguistic ‘molecules’—focusing on segment-sized multi-gesture complexes—interact
with positional and phrasal variation in speech, followed by modeling analyses of the self-organization
and coordination among these interacting levels of linguistic structure.
Specifically, this dissertation undertakes a kinematic examination of intergestural timing stability
within multi-gesture segments such as ejectives, implosives, and nasals that may possess specific
temporal goals critical to their realization. Using rtMRI speech production data from Hausa and Korean,
the dissertation illuminates speech timing among oral constriction and larynx/velum actions within
segments and the role this intergestural timing plays in realizing phonological contrast and processes in
varying prosodic contexts. Results demonstrate that within such segment-sized gestural molecules,
coordination is inherently stable due to their specific internal intergestural coupling relations. We
successfully model the empirical findings on timing—in particular distinct patterns of timing
variability—via a dynamical coupling architecture or ‘graph’ among the component gestures. The
experimental and computational assessment of coordination in multi-gesture structures can reveal the role
of coupling relations and timing variability in phonological representation as realized in a variety of
syllabic and prosodic environments. This dissertation furthers our linguistic knowledge of how the basic
atoms of speech are synergistically built up to produce meaningful speech sounds and to convey linguistic
information.
1. Introduction
Human motor skills, including the activity of speech production, involve combined actions of multiple
coordinated components of complex intended events. In the motor system, the components of a motor synergy are expected to compensate—in a trading-type exchange—among the component actions by virtue of principled variability, and organization emerges automatically from the coordination of
components. In the Articulatory Phonology framework for linguistic phonological representation—the
framework adopted for this dissertation—the primitive phonological components are referred to as
articulatory gestures; these are the most basic, foundational units of phonology, simultaneously the action
primitives of speech production and the information encoding primitives of combinatorial phonology
(Browman & Goldstein 1990, 1995).
These gestural primitives are necessarily woven into complexes integrated in an elegant choreography that yields the coordinated spatiotemporal realizations that compose a word. These gestural
‘constellations’ (Browman & Goldstein 1992) or ‘molecules’ (Browman & Goldstein 2000, Saltzman et
al. 1998) may be realized and examined at a multitude of scales or granularities. For our purposes, the
largest molecule is the coordinated whole word itself (though in fact gestures or possibly gestural
complexes are likely coordinated with higher structural prosodic primitives as well, a point which we will
return to). But sub-word molecules—smaller than a word but larger than a single gesture—are in fact
crucial building blocks of words. These coordinated gestural complexes might arise to encode the syllable
structure of a word, cohering its nucleus and onset for example. Or a gestural complex may quite typically
be of a segment-sized granularity, a complex unit that we see deployed repeatedly in the combinatorial
phonology of the language’s words. (Though some segments are plausibly represented with only single gestures.) This dissertation will largely focus on gestural complexes that are of a traditional segment-sized granularity. An understanding of how the gestures in complex speech configurations behave with respect to
one another necessitates an understanding of how they are phased—their intergestural timing. Not only is
a qualitative or descriptive cross-linguistic account of gestural coordination in such structures requisite,
but we further seek a principled assessment of the types of patterns observed in speech production that
can account, at least in part, for the relative stability of these patterns: How are gestural complexes
encoded and to what degree and with what limitations are they temporally malleable?
Working within the Articulatory Phonology framework, we adopt a dynamical systems
perspective on the coordination of action. We model such coordination using coupled oscillators. To
understand the coordination structure between atomic speech gestures, intergestural timing relations are
implemented using gestural planning oscillators with a pair-wise coupling network (Goldstein et al. 2006,
2007, Saltzman & Byrd 2000, Saltzman et al. 2008). Relative timing between gestures, each represented
by an individual planning oscillator, is encoded via coupling relationships among the set of individual
gestures that stand in a coordination relationship. The network of such coupled gestures of an utterance is
referred to as its coupling graph. Coupling graphs encode information at the planning and
representational stage about how pairs of gestures are coordinated in time—i.e., how they are phased
relative to one another. This coupling architecture thus defines the patterns of intergestural timing for
structures active in speech production. Such coupling graphs are hypothesized to include critical
information in the representation of a speaker’s phonological knowledge and are available in the speech
planning process.
The coupled oscillator dynamics offers a vehicle for the investigation of how gestures in the
coupling graphs are systematically phased in lexical forms. We can thus entertain questions related to the
stability and variability in intergestural timing within such a model. For example, a coupling graph
network allows specific theoretically motivated predictions about linguistic coordination patterns, by
controlling the parameters that shift the strength or tightness in coupling relations. Leveraging a
dynamical coupling architecture account of relative timing internal to multi-gesture complexes, the
dissertation examines the potential linguistic importance of how gestures are temporally coordinated—
which determines different relative phase patterns, e.g., in-phase and anti-phase relations, and to what
degree gestures are stably coordinated with each other—which determines the rigidity/flexibility in timing
in the face of linguistic and/or other systematic variations.
The issue of stability in timing can be considered with respect to a variety of linguistic structures.
Nittrouer et al. (1988) note the possibility that interarticulator phasing between articulatory gestures is
more stable within a segment than between adjacent segments. Yanagawa (2006) observes that there is a weaker link between adjacent gestures at higher prosodic boundaries due to changes in the coupling coordination. Stability in timing, as well as its systematic variation, informs us as to whether these gestures in coordination characterize an explicit phonological structure or element.
There are two major types of stability patterns: in-phase stability and anti-phase stability. In the
coupled oscillator planning model in Articulatory Phonology (Goldstein et al. 2009), syllable-initial
gestures are organized simultaneously (in-phase) and syllable-final consonantal gestures have a sequential
relation (anti-phase). Some empirical data (Byrd 1996a, Sussman et al. 1997) and some modeling (Nam et al. 2009, Saltzman et al. 2006) suggest a differential stability of onsets and codas, such that onsets are more stable in their intergestural timing. This distinction has also been observed in studies of in-phase and anti-phase timing in other motor domains (Carson et al. 1995, Haken et al. 1985, Lee et al. 1995, Wimmers et al. 1992). That said, in speech production research most intergestural timing studies
including stability analyses examine the stability of intergestural phasing between synchronous
component gestures (Löfqvist & Yoshioka 1984, Mücke et al. 2014, Nittrouer et al. 1988). Less is known
about the stability exhibited in anti-phase or off-phase relations.
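The differential stability of in-phase versus anti-phase coordination has a standard dynamical treatment in the Haken et al. (1985) (HKB) model: relative phase φ descends the potential V(φ) = −a cos φ − b cos 2φ, so that in-phase (φ = 0) is always an attractor, while anti-phase (φ = π) is an attractor only when the coupling ratio b/a exceeds 1/4. A minimal numerical sketch of these dynamics (the parameter values are illustrative, not fitted to any speech data):

```python
import math

def simulate_hkb(phi0, a=1.0, b=1.0, dt=0.001, steps=20000):
    """Integrate dphi/dt = -a*sin(phi) - 2*b*sin(2*phi), the gradient
    descent on the HKB potential V(phi) = -a*cos(phi) - b*cos(2*phi)."""
    phi = phi0
    for _ in range(steps):
        phi += dt * (-a * math.sin(phi) - 2.0 * b * math.sin(2.0 * phi))
    return phi

# With b/a > 1/4, both coordination modes are attractors:
near_in = simulate_hkb(0.3, a=1.0, b=1.0)              # settles near 0 (in-phase)
near_anti = simulate_hkb(math.pi - 0.3, a=1.0, b=1.0)  # settles near pi (anti-phase)

# With b/a < 1/4, anti-phase loses stability and only in-phase remains:
collapsed = simulate_hkb(math.pi - 0.3, a=1.0, b=0.1)  # settles near 0
```

The asymmetry between the two modes in this toy model mirrors the observation above that in-phase coordination is the more robust pattern.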
We do have some clarity with regard to precedence relations in speech production. While
neighboring gestures are typically produced in an overlapping pattern, in many instances a specific
precedence (versus synchrony) is key in the successful production (and consequent perceptual retrieval)
of a word form. In Articulatory Phonology, precedence relations among gestures fall out systematically
from the specification of intergestural phase or coupling relations (Browman & Goldstein 1990, Byrd
1996b). Consider, for example, that ejectives and implosives are described as having a goal of increasing
or decreasing oral air pressure by changing the volume of the sealed oral cavity. To have a profound
effect on oral pressure changes, larynx raising or lowering must begin after an oral closure is achieved
(Ladefoged 1968, Kingston 1985). Other types of sequential timing patterns are observed in the studies of
multi-gestural components such as liquids and nasals. That is, in English the tongue body gesture precedes the tongue tip gesture in liquids, and velum lowering occurs before the formation of oral constriction in
nasals (Krakow 1989, Browman & Goldstein 1995). It follows that if the precedence timing relation is
crucial in achieving a speech goal, this timing would be stable—or at least sufficiently stable—for
example across prosodic modulations or varying phonological contexts. Although a number of these
temporally complex patterns in multi-gestural structures have been described in the literature, only a few
studies are complemented with supporting articulatory data, and fewer still examine stability or lesser
studied languages. Understanding how component gestures are realized differently (or similarly) in
distinct multi-gesture articulatory structures (e.g., by examining in production their spatiotemporal
characteristics and relative temporal lag) can reveal a more complete and nuanced view of the realization
or articulatory manifestation of complex phonological structure built by coupling gestural primitives. The
study of multi-gestural coordination in complex gestural structures can reveal the role of coupling
relations in phonological encoding or representation and its consequences for the systematic variability
that characterizes articulatory gestures—individually or as complexes—when they are integrated into the
prosodic (informational) structuring of boundaries and prominence.
In the dissertation, the first objective is to characterize the system of intergestural behaviors in
select multi-gesture complexes involving non-oral gestural components—including upward and
downward larynx movement for glottalic consonants and velum lowering for nasals—and their
coordinated oral constriction gestures. While some segment-sized gestural molecules do involve two or
rarely three component gestures that are all oral—American [l] and [ɹ], palatalization and perhaps
diphthongs and geminates—a far larger number of gestural complexes of this granularity involve the
coordination of an oral constriction component with a non-oral component. Such complexes would
include but are not limited to: implosives, ejectives, clicks, nasals (and nasal non-pulmonics), voiceless
consonants, and tones with TBUs; yet we know little about the internal organization of such complexes.
In large measure, this lacuna has arisen due to the lack of instrumental data.
The current dissertation addresses this deficit by deploying real-time Magnetic Resonance
Imaging (rtMRI) for observing vocal tract actions during speech production and by developing novel
analysis techniques for tracking actions of the larynx and velum, in combination with state-of-the-art
rtMRI analysis of oral constriction formation. In addition to validating the spatiotemporal patterns
hypothesized for molecules incorporating non-oral-constriction gestures, this work is intended to unveil
the timing relations among gestures that are stable enough to act as a phonological “set” of gestures or a
distinct phonological unit. In our view, the more stable the ‘link’ between two gestures, the more likely
they are to comprise a single phonological (cognitive combinatorial) unit. Therefore, the findings from the
stability analysis of the multi-gestural timing patterns will aid the understanding of phonological
representation of multi-gestural complexes. Lastly, the third objective is to provide modeling analyses of
the temporal stability of the multi-gestural molecules identified in the empirical data, paying attention to
their systematic variability across prosodic perturbations. The modeling of temporal coordination will
shed light on the cognitive underpinnings of gestural coupling structures and deepen our understanding of
how speech gestures in human language may be organized and coordinated so as to instantiate internal
and relational linguistic structures.
The dissertation consists of four articulatory studies followed by modeling analyses of the
empirical findings. The real-time Magnetic Resonance Imaging (rtMRI) technique used to collect the
kinematic data is useful in obtaining high resolution spatiotemporal information from both oral and non-
oral (larynx and velum) articulators during speech production, in a way that surpasses other articulatory
data collection techniques such as ultrasound imaging, which is restricted to retrieving tongue contours,
and electromagnetic articulography (EMA), which is ill-suited to quantifying velum or larynx movement
due to difficulties in placing sensors on these interior parts of the vocal tract. The first articulatory study
(Chapter 2) examines larynx raising and lowering action and its relative timing with respect to
coordinated oral gesture in the production of glottalic consonants, specifically ejectives and implosives in
Hausa. Chapter 3 continues with Hausa to examine prosodic effects on gestural actions and intergestural
timing for both pulmonic and non-pulmonic gestural molecules. The third articulatory study (Chapter 4)
investigates velum and oral activities and their coordination in nasal geminates and singletons in Korean
at different prosodic boundaries. Chapter 5 investigates prosodic stability and variability in Korean nasal
consonant production. These results indicate that there are strong and stable off-phase timing patterns
between oral and non-oral component gestures associated with non-pulmonic consonants and nasal
segments. The last portion of the dissertation (Chapter 6), through dynamic modeling analyses (part 1)
and classification models (part 2), aims to enrich our understanding of the dynamic mechanisms that underlie the internal organization of multiple gestures coupled with each other, with special attention to the differential stability in intergestural timing of oral and non-oral gestures associated with the multi-gesture
complexes. Each of chapters two through six is intended, for the most part, to stand alone as an empirical or modeling contribution to the field, with largely its own introduction, method, results, and conclusion. That said, the thesis taken as a whole provides a window into a diversity of multi-gesture molecules in speech production, some of which have never been studied instrumentally in this
way—a view that we hope illuminates the internal structuring, stability, and potential
phonological/cognitive representation of these multi-gesture complexes. The dissertation concludes with
Chapter 7, which serves as an overview of the work’s arguments, findings, and implications.
2. Larynx-Oral Complexes in Hausa Non-Pulmonic Consonants
2.1. Introduction
Non-pulmonic consonants such as ejectives and implosives are produced with a glottalic airstream
mechanism by initiating airflow in the supralaryngeal vocal tract by means of changes in the vertical
larynx position (Catford 1971, Demolin 1995, Greenberg 1970, Ladefoged 1968). For instance, ejective
and implosive consonants involve rapid raising or lowering, respectively, of the larynx coordinated with
an oral constriction formation and release. Although vertical movement of the larynx is one of the major
characteristics of these non-pulmonic consonants, vertical larynx behavior itself is not unique to this set
(as voicing, pitch changes, etc. also involve vertical larynx movement), and the vertical aspect alone
cannot fully exhaust the articulatory characterization of the non-pulmonic stops. For example, although
phonological contrasts are associated with distinctive patterns of articulatory activity, no clear division is
made between implosives and voiced stops in their articulatory characteristics, as they both involve lowering of the larynx, mainly to decrease oral air pressure and to maintain voicing, respectively
(Clements & Osu 2002, Kingston 1985, Ladefoged 1968, 1971, Ladefoged & Maddieson 1996).
It has
been suggested that there is a gradient continuum between one form of voiced stops and true implosives,
the latter being produced with a comparatively greater amount of lowering and more rapid descent of the
larynx during the coordinated oral gesture than the former (Ladefoged 1971, Ladefoged & Maddieson
1996). However, given that voiced implosives and voiced stops form two distinct categories as manifested
in their phonological distribution (Gallagher 2010, Greenberg 1970, MacEachern 1997, Mackenzie 2009,
Newman 2000), the claim that the two classes differ along a continuum without having robust auditory
and/or articulatory differences is unsatisfactory.
Footnote 1: Voicing can be maintained not only through oral cavity expansion but also through nasal and/or oral leakage during stop closure (Ohala & Solé 2010, Solé 2007, 2014, 2018).
In addition to the speed and magnitude of larynx raising/lowering, timing of the vertical larynx
movement with respect to its coordinated oral constriction formation is another area in need of more
exploration. Kingston (1985) raises the question of whether the timing of larynx movement varies to
create phonological contrasts. He suggests that larynx movement during ejectives and implosives is
“timed so that the larynx is at its highest or lowest point near the oral release, since maneuvers which
change the volume of the oral cavity have more profound effects on [oral air pressure] if they are initiated
after the oral closure is made” (Kingston 1985:17-18). Moreover, Ladefoged and Maddieson state that the
phonemic contrasts between non-pulmonics and their pulmonic pairs can be manifested by the
“differ[ence] in the mode of action of the larynx, or in the timing of laryngeal activity in relation to the
oral articulation” (1996:47). However, dynamic movement of the vertical larynx gesture for non-pulmonic consonants and its coordination with oral gestures has not yet been widely studied, and no clear evidence with accompanying quantifiable measures of the vertical larynx movement of non-pulmonic stops has been reported.
2.1.1. Instrumental Studies on Vertical Laryngeal Movement
Previous studies measuring vertical movement of the larynx have mainly focused on the relation between
larynx height and tone/fundamental frequencies (f0). For example, the effects of tonal categories on
larynx height were investigated in Gandour and Maddieson’s (1976) cricothyrometer study on Standard
Thai and in Wang and Kong’s (2010) X-ray movie data of Mandarin speakers. Vertical positioning of the
larynx in singing was examined in Shipp (1975) and Neuschaefer-Rube et al. (1996) using lateral still
photographs and Magnetic Resonance Imaging (MRI) data, respectively. The change in the larynx height
with respect to fundamental frequencies (f0) is reported in studies using ultrasound measurements
(Hamlet 1980), MRI data (Hirai et al. 1994, Honda et al. 1999), X-ray photographs (Andersen &
Sonninen 1960, Lindqvist et al. 1973), a thyrometer (Kakita & Hiki 1976), electroglottography and
videofluorography (Laukkanen et al. 1999).
While there is much literature on the relation of vertical larynx position to f0 during speech, only
a few research studies have examined laryngeal activities in different segmental articulations. The
following studies investigated larynx mechanisms in sounds other than non-pulmonic consonants. Esling
and Moisik (2011) used videofluoroscopy with simultaneous laryngoscopy and laryngeal ultrasound to
observe larynx height during pharyngeal sounds. The data included information on temporal sequencing
of larynx raising and tongue retraction in aryepiglotto-epiglottal stops. In X-ray data comparing geminate
and singleton stops in Tarifit Berber (Bouarourou et al. 2015), no positive correlation was detected
between increased sub-glottal pressure in geminates and vertical elevation of the larynx and hyoid bone
position. Proctor et al. (2013) investigated laryngeal displacement during human beatboxing using real-time MRI (rtMRI) data. Various percussion sound effects from beatboxing resemble the production of
ejectives (e.g., kick effects), and these sound effects involve rapid upward vertical movement of the
larynx. Moreover, one of the sound effects (i.e., pre-labialized voiceless nasal uvular-dental click) in
beatboxing showed some degree of larynx lowering, although it may not be categorized as a glottalic
ingressive sound.
As for research specifically involving non-pulmonic consonants such as ejectives or implosives,
Shosted et al. (2011) proposed a method for imaging and quantifying the vertical displacement of the
larynx using exterior electromagnetic articulography (EMA). Their data included the production of
(bilabial) voiced, voiceless, ejective, and implosive stops by a phonetically trained American English
speaker. The estimation of the larynx position of these stops, however, is based on limited data that are not from a native speaker of a language with ejectives and implosives. Bückins et al. (2018) also
conducted an EMA study investigating the larynx movement in the production of Georgian ejectives, and
they found that ejectives are associated with larger and more upward movement of the skin above the
larynx compared to pulmonic sounds. However, both Shosted et al. (2011) and Bückins et al. (2018) use
the indirect method of estimation from placing EMA sensors on the exterior skin above the larynx (i.e.,
neck), which has not been validated. Another indirect method of detecting vertical larynx movement in
ejectives is introduced in Simpson and Brandt (2019), which estimates larynx traces by the relative
amplitudes of the signals from two pairs of electrodes using dual-channel electroglottography (DC-EGG).
They found that movement in the larynx traces occurs close to ejective stop releases, in contrast to the relatively stable traces of the pulmonic stops. Again, their data are obtained from trained phoneticians rather than from native speakers of a language with non-pulmonic consonants.
The study reported in the current paper on Hausa non-pulmonic consonants differs from previous
literature because it directly examines vertical larynx movement coordinated with ejectives and
implosives in native speakers’ production using imaging created with fast real-time MRI of the mid-
sagittal vocal tract. Although various methods for quantifying larynx height have been proposed, most of
them involve manual handling of the obtained data. For example, in Proctor et al. (2013), laryngeal
displacement was measured by manual selection of the end points of a larynx outline. In Honda et al.
(1999), vertical larynx position was defined as the rotation angle of the cricoid cartilage, which was traced
manually. Moreover, previous studies used either static data or non-dynamic methods that do not permit
real-time tracking of the laryngeal movement (Hirai et al. 1994, Honda et al. 1999, Proctor et al. 2013,
Shosted et al. 2011). The current study explores articulatory characteristics of the non-pulmonic
consonants using a time-varying centroid tracking technique (Tilsen et al. 2016, Oh et al. 2017, Oh & Lee
2018). The centroid tracking method is used to reveal information on vertical laryngeal movements of
ejectives and implosives, as well as the temporal relations between their vertical laryngeal gesture and
their coordinated supralaryngeal gestures.
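As a rough illustration of the general idea behind intensity-centroid tracking (the published pipeline of Tilsen et al. 2016 and Oh et al. 2017 differs in its details, e.g., image preprocessing and region definition), the sketch below computes an intensity-weighted centroid within a fixed region of interest on each frame; the toy frames stand in for rtMRI pixel data:

```python
def roi_centroid(frame, rows, cols):
    """Intensity-weighted centroid of a rectangular region of interest.
    frame: 2D list of pixel intensities; rows/cols: (start, stop) index ranges."""
    total = wy = wx = 0.0
    for y in range(*rows):
        for x in range(*cols):
            v = frame[y][x]
            total += v
            wy += v * y
            wx += v * x
    return (wy / total, wx / total)

# Toy 'larynx' blob that descends by one pixel between two frames:
frame_a = [[0] * 5 for _ in range(5)]
frame_b = [[0] * 5 for _ in range(5)]
frame_a[1][2] = frame_a[2][2] = 100   # bright tissue pixels, frame 1
frame_b[2][2] = frame_b[3][2] = 100   # same blob, one row lower, frame 2
y0, _ = roi_centroid(frame_a, (0, 5), (0, 5))
y1, _ = roi_centroid(frame_b, (0, 5), (0, 5))
# y1 - y0 gives the vertical displacement tracked by the centroid.
```

Applied frame by frame, such a centroid trace yields a time function of vertical position from which movement onsets, extrema, and velocities can be estimated.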
2.1.2. Predictions Assessed in the Present Study
The language Hausa, mainly spoken in Nigeria and the Republic of Niger, is examined because it has both ejectives and implosives, as well as their pulmonic counterparts. Hausa ejectives are
produced in three places of articulation—alveolar ejective fricatives (/s’/), velar ejective stops (/k’/), and
labio-velar ejective stops (/kw’/)—which can be compared with pulmonic consonants, /s/, /k/, and /kw/,
respectively. Hausa has bilabial and alveolar voiced implosives (/ɓ/ & /ɗ/), as well as their plain
counterparts: voiced bilabial and alveolar stops /b/ and /d/.
Two areas of investigation are addressed in the current study regarding the dynamics of the
articulation of non-pulmonic consonants in Hausa. The study examines the velocity profile of the
laryngeal gesture (Hypothesis A) and intergestural timing of the vertical larynx-oral coordination
(Hypothesis B).
The first hypothesis serves as a quantitative confirmation that the rtMRI images examined with
the data analysis protocol described below can provide an informative account of the articulatory patterns
of laryngeal movement for the speakers of Hausa and how they differentiate glottalic consonants from
pulmonic consonants in the language. Voiceless ejectives have been hypothesized to involve rapid larynx
raising, whereas voiceless pulmonics are not necessarily associated with vertical larynx movements. Previous studies on ejectives versus pulmonic stops (Bückins et al. 2018, Simpson & Brandt 2019) have reported differences in the pattern of vertical larynx actions, findings that need to be validated by
the direct articulatory imaging of the larynx. Thus, we compare vertical larynx actions in ejective and
pulmonic consonants. Moreover, it has been suggested that there is a gradient continuum between one
form of voiced stops and true implosives, the latter being produced with a comparatively greater amount
of lowering and more rapid descent of the larynx during the coordinated oral gesture than the former
(Ladefoged 1971, Ladefoged & Maddieson, 1996).
It is predicted that ejectives and implosives show raising and lowering, respectively, of the larynx
to a greater degree than a corresponding, paired pulmonic consonant (Clements & Osu 2002, Kingston
1985, Ladefoged 1968, 1971, Ladefoged & Maddieson 1996). Specifically, we assess whether ejectives
and implosives are reliably differentiated in terms of the direction of vertical larynx movement, as well as
movement magnitude (or postural extremum) for laryngeal behavior. In addition, larynx raising may vary
among different types of ejectives. For example, because ejective fricatives have leakage at the anterior oral constriction, as opposed to the complete oral closure for ejective stops, ejective fricatives may
exhibit even larger and/or faster vertical larynx displacement than ejective stops so as to build up the
same amount (or sufficient) air pressure and/or to maintain the aerodynamic flow requirements for the
generation of turbulence. Specifically, Hypothesis A is tested as below:
HYPOTHESIS A. Non-pulmonic consonants show larger vertical larynx movement than their pulmonic
counterparts.
HYPOTHESIS A1. Voiceless ejectives (/k’, kw’, s’/) show larger upward larynx movement than
voiceless pulmonic consonants (/k, kw, s/).
HYPOTHESIS A2. Within voiceless ejectives, the manner class causes differences in vertical
larynx movements: ejective fricatives (/s’/) show larger upward larynx movement than ejective
stops (/k’, kw’/).
HYPOTHESIS A3. Voiced implosives (/ɓ, ɗ/) show larger downward larynx movement than
voiced pulmonic stops (/b, d/).
Turning next to intergestural timing, in the production of ejectives, movements of the oral and the laryngeal
closure are reported to occur approximately simultaneously (Dent 1981), and the release of the oral
gesture is understood to critically precede the release of the laryngeal closure as the inverse order (i.e.,
glottal release before oral release) would not exhibit the requisite acoustical differences from voiceless
stops (Ladefoged & Johnson 2014). However, these observations have primarily concerned the timing
relation between the oral closure and the laryngeal adduction or glottal closure. Little information is
available about the coordination of the laryngeal raising, except that it must occur while the oral closure is
in place if a significant oral pressure differential is to result. The lag between oral closure and vertical
larynx gestures must be short, given the necessity for the larynx raising to create a volume compression in
the oral cavity (during the oral closure and before oral stop release).
In contrast, for implosives, the closure of the oral constriction gesture is reported to occur before
the downward movement of the larynx gesture, the two gestures being sequentially produced (Ladefoged
& Johnson 2014). Ejectives have an aerodynamic task goal of a sharp increase in air pressure that must
precede oral stop release, whereas implosives have a goal of maintaining pressure difference across the
glottis through various ways of vocal tract expansion. Based on these articulatory descriptions,
Hypothesis B1 predicts that the vertical movement of the larynx is produced synchronously with, or with
a very small lag relative to, a coordinated oral gesture in ejective stops, whereas the timing of the onsets
of larynx and oral gestures is predicted to be sequential in implosive stops. Further, it predicts that the
variation (indexed for example by the coefficient of variation) in the oral-vertical larynx lag will be less
for ejectives than for implosives, due to tight temporal constraints for ejectives compared to implosives.
HYPOTHESIS B1. The temporal lags between oral closure and vertical larynx movement onset are
simultaneous (near-zero) and more stable in ejectives and sequential (slightly longer) and more variable in
implosives.
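The coefficient of variation (CV) used as the variability index here is simply the standard deviation of the lag divided by its mean. A minimal sketch with invented lag values, purely for illustration (these are not Hausa measurements):

```python
import statistics

def coefficient_of_variation(lags_ms):
    """CV = sd / mean: a dimensionless index of relative timing variability."""
    return statistics.stdev(lags_ms) / statistics.mean(lags_ms)

# Hypothetical oral-closure-to-larynx-onset lags (ms), for illustration only:
ejective_lags = [8, 10, 9, 11, 10, 9]      # short, tightly clustered
implosive_lags = [30, 45, 25, 50, 35, 40]  # longer, more spread out

cv_ej = coefficient_of_variation(ejective_lags)
cv_im = coefficient_of_variation(implosive_lags)
# Under Hypothesis B1 we would expect cv_ej < cv_im.
```

Because the CV normalizes spread by the mean, it allows stability comparisons between categories whose mean lags differ, as the hypothesis predicts for ejectives versus implosives.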
Between non-pulmonic consonants and pulmonic consonants, both ejectives and implosives
require oral constriction formation to occur before the initiation of the vertical larynx movement. On the
other hand, for pulmonic consonants, for example in the production of plain voiceless stops, no such
temporal requirement holds between the oral gesture and any accompanying vertical larynx movement. Pulmonic voiced stops may involve a larynx lowering movement so as to
facilitate voicing during closure via volume expansion, but the vertical larynx movement need not occur
at the moment of oral closure. Therefore, the precedence timing relation between oral and vertical larynx
movements is predicted to occur in the production of ejectives and implosives, but not in the production
of pulmonic consonants. Thus, the following hypothesis is entertained:
HYPOTHESIS B2. Oral gestures precede vertical larynx gestures in ejectives and implosives, while oral
and vertical larynx movements are not temporally constrained in pulmonic counterparts.
The gestural schemas presented in (1) demonstrate potential coupling relations between the oral
and the vertical larynx gestures. Coupled oscillator models are used to model and predict different
intergestural timing patterns in speech production, by representing how gestures are phased with one
another in the coupling graphs (Nam et al. 2009, Goldstein et al. 2007). The coupling relations between
oral and vertical larynx gestures are assumed based on the above predictions of temporal coordination
patterns (The examples here include Tongue Tip [TT] and vertical larynx [LX] vocal tract gestures for
alveolar consonants). While both ejectives and implosives involve the articulation of the oral (TT)
constriction gesture before the initiation of the vertical larynx (LX) gesture (indicated by the anti-phase
relation), no such timing relations are anticipated for pulmonic voiced stops (Hypothesis B2). In addition,
the in-phase relation between the oral closure target and the larynx raising onset postulated for ejectives
(1a) facilitates the ejective’s aerodynamic goal of an air pressure increase produced by larynx raising during oral
closure (Hypothesis B1). Although the articulatory representation of each category must be validated with
experimental data, the coupling structures presented in (1) illustrate how the same set of gestures create
phonological contrasts by differentiating manner of articulation through crucial use of timing relations.
These coupling relations assumed for non-pulmonic and pulmonic consonants also generate predictions
of differential stability/variability in timing (e.g., ejectives are predicted to have the most stable timing
relations, followed by implosives and then voiced stops, based on the number of links in the coupling
graph), which will be further explored in Chapter 6.
(1) a. Ejective fricative /s’/ b. Implosive stop /ɗ/ c. Voiced stop /d/
2.2. Methods
2.2.1. Subjects
The subjects are three native Hausa speakers (S1-S3; ages ranging from 24 to 30 years old) residing in the
United States at the time of the experiment. They are all from Northern Nigeria. The subjects were
instructed to read aloud the target sentences written in Hausa orthography, which were presented on a
projection screen one at a time. The subjects spoke lying supine on a scanner bed and were able to read
the prompts from inside the scanner using a mirror, without moving their heads. The total recording time
including calibration and pauses between utterances was about one hour.
2.2.2. Data Acquisition
MRI data of the mid-sagittal vocal tract and audio data were simultaneously acquired using a real-time
MRI protocol developed for research on speech production (Narayanan et al. 2004). Data were acquired
at Los Angeles County Hospital on a 1.5 T scanner with gradient amplitude of 4.0 G/cm and 10.5
G/cm/ms slew rate. A 13-interleaf spiral gradient echo pulse sequence was used. Each spiral is acquired
over 6.004 ms (repetition time (TR)); therefore, every image comprises information spanning 13 × 6.004
= 78.052 ms. Image data were acquired at a rate of 12 frames/second, an imaging field of view (FOV) of
200 × 200 mm, and a flip angle of 15 degrees. Slice thickness was 6 mm, located mid-sagittally; image
size was 84 × 84 pixels, yielding a spatial resolution in the sagittal plane of 2.4 mm. Scan plane localization
of the mid-sagittal slice was performed using RTHawk (HeartVista, Inc., Los Altos, CA) (Santos et al.
2004). The videos were reconstructed with a 2-TR sliding window giving an effective frame rate of 83.3
frames/s (=1/(2×TR) = 1/(2×6.004 ms)), enabled by constrained reconstruction (Lingala et al. 2016,
2017).
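As a quick arithmetic check of the timing figures above, the per-image duration and the effective reconstructed frame rate follow directly from the repetition time (a minimal sketch; variable names are mine):

```python
# Frame timing for the 13-interleaf spiral protocol described above.
TR_MS = 6.004            # repetition time per spiral interleaf (ms)
N_INTERLEAVES = 13

# A fully sampled image spans all 13 interleaves.
ms_per_image = N_INTERLEAVES * TR_MS        # 13 x 6.004 = 78.052 ms

# A 2-TR sliding-window reconstruction yields the effective frame rate.
effective_fps = 1000.0 / (2 * TR_MS)        # 1/(2 x 6.004 ms), about 83.3 frames/s

print(round(ms_per_image, 3), "ms per image;", round(effective_fps, 1), "frames/s")
```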
Audio was simultaneously recorded inside the scanner at a sampling frequency of 20,000 Hz
while the subject was imaged. The audio signals were synchronized with the MR video signal for data
analysis per published protocols (Lingala et al. 2016, 2017, Narayanan et al. 2004). The recorded speech
was enhanced by the acoustic denoising method developed in Vaz et al. (2018) and was synchronized with
the reconstructed dynamic MR images. The subject wore ear plugs for protection from the scanner noise
but was still able to communicate with the experimenters outside the scanner room via an in-scanner
intercom system.
2.2.3. Materials
Target tokens obtained for data analysis were voiceless velar and labialized velar ejective stops (/k’, kʷ’/),
alveolar ejective fricatives (/s’/), and voiced bilabial and alveolar implosive stops (/ɓ, ɗ/) in Hausa. In
addition, pulmonic voiceless stops and fricatives (/k, kʷ, s/) and voiced stops (/b, d/) were collected
(Table 2.1). These ten target consonants were located in two prosodic conditions: phrase-initial and
phrase-internal positions. An example of two prosodic conditions is given in (2).
(2) a. Phrase-initial
Fàɗa: sàu ɗaya, kà:za: shine kalmà: à Hausa.
‘Say once, chicken is the word in Hausa.’
b. Phrase-internal
À yànzu, biyà: kà:za: kàmar̃ kalmà: à Hausa.
‘Right now, read aloud chicken as a word in Hausa.’
Note. Low tone is marked by a grave accent (`), and long vowels are marked by a colon (:). In the
presentation to subjects, however, tone and duration are not marked, as is standard in written Hausa.
Each carrier sentence is 14 syllables long, and the target word is located six syllables from the beginning
of the sentence. Each target consonant is at the onset of a bi-syllabic word having a Low-High tone
sequence. All target consonants were preceded and followed by the vowel /ɑ/. Each target item was
repeated 7 times in a randomly ordered list. A total of 140 tokens (10 consonants × 2 prosodic conditions
× 7 repetitions) were collected for each speaker.
Table 2.1. Target consonants in Hausa
            Bilabial   Alveolar   Velar   Labio-velar
Plosive     b          d          k       kʷ
Implosive   ɓ          ɗ
Ejective               s’         k’      kʷ’
Fricative              s
Nasal       m          n
2.2.4. Data Analysis
Two techniques were used to obtain information on articulatory timing of the supralaryngeal and
laryngeal (vertical) gestures from the real-time MRI video recordings. First, a Region-of-Interest (ROI)
technique (Bresch et al. 2010, Lammert et al. 2013b, Proctor et al. 2011) was used to track supralaryngeal
constriction formation over time—specifically lips, tongue tip, and tongue body movements involved in
the production of bilabial, alveolar, and velar consonants, respectively. A mid-line of the vocal tract was
calculated by selecting pixels with the highest standard deviation over time (Figure 2.1: left). The regions
placed along this mid-line most effectively capture the fluctuation of pixel intensities (Blaylock et al.
2016, Lammert et al. 2013a). Three pseudo-circular regions with a radius of three pixels were manually
placed along the automatically derived mid-line over the locations of oral constriction formations (Figure
2.1: right). The first region (‘LAB’) was selected around the front of the lower lip so that the region
covers the movement of the lower lip when it is most protruded and so that the boundary of the region
touches the upper lip, which does not show (much) active movement for this speaker. The second region
(‘COR’) was placed over the location of tongue tip constriction, that is, with its top edge at the alveolar
ridge and immediately posterior to but non-overlapping with the labial region. The final region (‘DOR’)
was placed at the front-end of the soft palate (velum) to capture movement for tongue rear (velar)
constriction gestures.
Figure 2.1. Vocal tract mid-line from automatic calculation (left) & ROI overlayed image (right)
(right: image of a speaker producing a vowel /ɑ/, with regions of interest
[LAB in yellow, COR in pink, and DOR in green])
Examples of the constrictions produced by labial, tongue tip, and tongue body gestures are
presented in Figure 2.2. The average pixel intensity in each region (a circular region of 261 sq mm) was
calculated frame-by-frame. The pixel intensity values over time provide time series reflecting articulator
motions, with higher mean pixel intensity indicating a greater amount of tissue in the region (Lammert et
al. 2013a), as the active articulator of interest forms a constriction at the passive articulator along the
upper surface of the vocal tract. Each of these circular ROIs is full when the corresponding oral
constriction is mostly formed (Figure 2.2). These time series were smoothed using a locally weighted
linear regression technique with a kernel width of h = .9 (Lammert et al. 2013a, Proctor et al. 2011).
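The ROI intensity extraction can be sketched as follows (a minimal illustration assuming image frames stored as a NumPy array; the function names and toy data are mine, not the cited toolchain's):

```python
import numpy as np

def circular_mask(h, w, center, radius):
    """Boolean mask of a pseudo-circular region (e.g., LAB/COR/DOR)."""
    yy, xx = np.ogrid[:h, :w]
    return (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2

def roi_intensity(frames, center, radius=3):
    """Mean pixel intensity inside the ROI for each video frame.

    frames: array of shape (n_frames, height, width). Higher mean
    intensity indicates more soft tissue in the region, i.e., an
    oral constriction being formed.
    """
    mask = circular_mask(frames.shape[1], frames.shape[2], center, radius)
    return frames[:, mask].mean(axis=1)

# Toy example: 84x84 images, with "tissue" entering a COR-like region
# halfway through the sequence.
frames = np.zeros((10, 84, 84))
frames[5:, 30:40, 35:45] = 1.0
ts = roi_intensity(frames, center=(35, 40))   # low, then high, intensity
```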
Figure 2.2. ROI-overlayed images of a speaker producing /b/ (LAB), /d/ (COR), and /k/ (DOR)
The second technique used for the articulatory analysis—specifically for laryngeal movement in
this study—is a centroid tracking technique. This tool, developed for the present project (though its
possible extensions are many), tracks the centroid of a moving object within a region of interest in the
real-time MR video (Oh et al. 2017, Oh & Lee 2018; see also Tilsen et al. 2016). In contrast to the oral
constriction formation gestures with specific goals for constriction location, movement of the velum and
larynx (involved in the production of oral/nasal articulation and ejectives/implosives, respectively) may
not have a specific region or constriction location parameter linguistically defined in the vocal tract. (It is
beyond the scope of this study to define the speech production goals of these gestures but certainly
open/closed and raised/lowered have been conceived of for the velum, and aerodynamic goals are
possible for the vertical larynx movement in glottalic consonants.) Given that these velum and laryngeal
actions are of interest in the current study, a centroid tracking technique is implemented to capture the
movement of these articulators.
Vertical movements of the larynx were measured by tracking the time-varying pixel intensity
centroid (i.e., intensity-weighted average spatial position) of manually selected rectangular Vocal Tract
Regions (VTRs). For the larynx, a fixed VTR (‘LX’) was selected for each subject based on the location
of the cervical vertebrae—defined from the bottom line of the 2nd cervical vertebra to the bottom line of
the 4th cervical vertebra—with the posterior side of the larynx region placed at the rear pharyngeal wall.
The dimension of the larynx VTR was 4 to 6 pixels in width and 14 pixels in height, which was ‘tall’
enough to include the highest and the lowest positions of the larynx inside the region for each speaker
(see footnote 2).
Once the VTR was defined, an initial seed was selected anywhere on the object of interest (see
Figure 2.3 and Oh & Lee 2018). The centroid of the object was then automatically calculated for each
frame over time using the following protocol. Based on each pixel’s intensity values, a binary matrix was
acquired by designating 1 for pixels brighter than a threshold derived from the average and standard
deviation across the region (with a 95% confidence interval), and 0 otherwise. This binary matrix was
used to identify connected components (CC) in the VTR using the flood-fill algorithm (Yapa & Koichi
2007). The intensity-weighted centroid of each connected component was calculated, and the centroid
closest to the seed was set as the tracked centroid of the first frame. In each following frame, the centroid
closest to that of the previous frame was set as the current centroid. This process was undertaken so as to
capture only the movement of the object of interest and to prevent other objects
that come into the VTR from impinging on the calculation of the larynx centroid.
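The tracking protocol above can be sketched as follows. This is a simplified stand-in, not the actual implementation: the mean-plus-`k`-standard-deviations threshold approximates the 95% criterion, and `scipy.ndimage.label` plays the role of the flood-fill connected-components step.

```python
import numpy as np
from scipy import ndimage

def track_centroid(vtr_frames, seed, k=2.0):
    """Track one object's intensity-weighted centroid through a VTR.

    vtr_frames: (n_frames, h, w) pixel intensities inside the VTR.
    seed: (row, col) initial guess on the object of interest.
    k: threshold = mean + k*std (a stand-in for the 95% criterion).
    """
    prev = np.asarray(seed, dtype=float)
    track = []
    for frame in vtr_frames:
        # Binarize: 1 for pixels brighter than the region-based threshold.
        binary = frame > frame.mean() + k * frame.std()
        # Connected components (flood-fill style labeling).
        labels, n = ndimage.label(binary)
        if n == 0:
            track.append(prev.copy())
            continue
        # Intensity-weighted centroid of each connected component.
        centroids = np.array(
            ndimage.center_of_mass(frame, labels, range(1, n + 1)))
        # Keep the component closest to the previous centroid (or seed),
        # so other objects entering the VTR do not hijack the track.
        prev = centroids[np.argmin(np.linalg.norm(centroids - prev, axis=1))]
        track.append(prev.copy())
    return np.array(track)

# Example: a bright blob (the "larynx") drifting downward frame by frame.
# Image row indices grow downward, so lowering shows as increasing values.
frames = np.zeros((3, 14, 6))
for t in range(3):
    frames[t, 4 + 2 * t:6 + 2 * t, 2:4] = 10.0
track = track_centroid(frames, seed=(5, 3))   # track[:, 0] = vertical position
```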
Footnote 2: Larynx (‘LX’) VTR sizes for each speaker: Speaker A (5 × 14), Speaker B (6 × 14), and Speaker C (4 × 14).
Figure 2.3. Pre-processing steps for automatic centroid tracking
(a) manual selection of the rectangular VTR for larynx, (b) seed selection to capture only the larynx
object, (c) calculation of intensity-weighted centroids from raw images and connected components (CC)
By way of exemplifying the algorithm, Figure 2.4 shows the output for a trained phonetician
producing the VCV sequence /ɑɠɑ/ with an intervocalic voiced velar implosive (from the USC-IPA
dataset; Toutios et al. 2016). The VTR selected for the larynx is shown as a blue box on the top-left plot.
The vertical centroid plot (bottom-center) of the larynx region exhibits a clear lowering of the larynx
(starting at about frame 20 and ending around frame 60), as we expect to see in the production of
implosives. The novel centroid tracking technique can also capture velum raising/lowering movement
(e.g., bottom-right plot), as well as the horizontal/longitudinal larynx movement that may be associated
with vertical larynx actions (though this latter potential use has not been thoroughly assessed at this
point). This tool enables obtaining kinematic profiles of various articulatory movements that could not
previously be satisfactorily quantified. In the current study, we will focus on the vertical movement of
the larynx.
Figure 2.4. The centroid tracking output of the production of a VCV sequence /ɑɠɑ/
(1st VTR: larynx, 2nd VTR: velum; y-axis: pixels, x-axis: frames)
The vertical position of the centroid for the larynx gesture is retrieved as the resulting signal from
the MRI data. To reduce noise and intensity fluctuations, all signals were smoothed with loess
(i.e., locally weighted scatterplot smoothing), using a quadratic polynomial regression model with a
local span of 30 data points.
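A minimal sketch of such a loess-style smoother (local quadratic regression with tricube weights over a 30-point span; a simplified stand-in, not the exact implementation used):

```python
import numpy as np

def loess_quadratic(y, span=30):
    """Local quadratic (degree-2) regression smoother, loess-style.

    For each sample, fit a weighted quadratic over the `span` nearest
    points using tricube weights, then evaluate the fit at that sample.
    """
    n = len(y)
    out = np.empty(n)
    half = span // 2
    for i in range(n):
        lo = max(0, min(i - half, n - span))     # window clipped at edges
        idx = np.arange(lo, lo + span)
        d = np.abs(idx - i)
        tricube = (1 - (d / (d.max() + 1e-9)) ** 3) ** 3
        # np.polyfit weights multiply the residuals, so pass sqrt(tricube)
        # to weight the *squared* residuals by tricube.
        coef = np.polyfit(idx, y[idx], deg=2, w=np.sqrt(tricube))
        out[i] = np.polyval(coef, i)
    return out
```

Because the local model is quadratic, a noiseless quadratic signal is reproduced to numerical precision, which makes the smoother easy to sanity-check.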
Based on the trajectories obtained from the ROI technique and the centroid tracking technique,
the temporal landmarks of the articulatory actions used in producing the target Hausa consonants were
calculated using the find_gest algorithm (Tiede 2010) (Figure 2.5). Movement onset (ONS) was defined
to be the point at which velocity reached 20% of its first maximum velocity for the movement towards the
target. Peak velocity (PVEL) was defined at the maximum velocity point during the movement. Target
attainment (TONS) was identified as the timepoint after maximum velocity and before maximum
extremum position at which velocity crossed a 20% threshold of the first maximum velocity. The
maximum displacement of the gesture (MAX) was defined at the velocity minimum closest to the
movement’s peak intensity/centroid weighting. The target offset (TOFF) was calculated at the point
following maximum displacement at which velocity increased above the 20% threshold of the gesture’s
maximum velocity for the movement away from the target.
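These landmark definitions can be rendered algorithmically. The sketch below is a simplified stand-in for find_gest, operating on a one-dimensional trajectory with the 20% velocity threshold; it assumes movement toward larger values (for a lowering trajectory, pass the negated signal).

```python
import numpy as np

def gesture_landmarks(pos, threshold=0.2):
    """Find ONS, PVEL, TONS, MAX, TOFF (frame indices) on a trajectory."""
    vel = np.abs(np.gradient(pos))
    i_max = int(np.argmax(pos))                   # MAX: peak displacement
    i_pvel = int(np.argmax(vel[:i_max + 1]))      # PVEL: peak velocity toward target
    thr = threshold * vel[i_pvel]
    # ONS: last point before PVEL where velocity rises through 20%.
    before = np.where(vel[:i_pvel] < thr)[0]
    i_ons = int(before[-1]) + 1 if len(before) else 0
    # TONS: first point after PVEL where velocity falls below 20%.
    after = np.where(vel[i_pvel:i_max + 1] < thr)[0]
    i_tons = int(i_pvel + after[0]) if len(after) else i_max
    # TOFF: first point after MAX where velocity exceeds 20% of the
    # release movement's own peak velocity.
    vel_rel = vel[i_max:]
    off = np.where(vel_rel > threshold * vel_rel.max())[0]
    i_toff = int(i_max + off[0]) if len(off) else len(pos) - 1
    return dict(ONS=i_ons, PVEL=i_pvel, TONS=i_tons, MAX=i_max, TOFF=i_toff)
```

On a smooth rise-and-fall trajectory the five landmarks come out temporally ordered, mirroring the schematic in Figure 2.5.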
Figure 2.5. Temporal landmarks of a schematic gesture
Derived variables of intragestural durations and intergestural lag were quantified using the
following temporal landmarks (see also Figure 2.5) for the activity of the vertical larynx centroid value
and the oral constrictions’ pixel intensity value:
(3) Temporal Landmarks
• Movement Onset (ONS): the beginning of the articulator motion
• Peak Velocity (PVEL): the maximum velocity before target achievement
• Target Onset (TONS): the point at which the movement has mostly reached MAX (target attainment)
• Maximum Target (MAX): maximum displacement of the movement
• Target Offset (TOFF): the beginning of the movement away from the target
Displacement of the larynx was calculated as the change in position between the location of the
larynx maximum (MAX) and the movement onset position (ONS). Larynx extremum was determined as
the absolute vertical position of the larynx in the LX VTR at movement maximum (MAX). Various
measures of temporal lag between larynx and oral gestures were computed by subtracting landmarks of
the oral gesture (Oral_TONS, Oral_MAX, Oral_TOFF, and Oral_ONS) from the onset of laryngeal
movement (LX_ONS).
Therefore, the dependent variables in this study are:
(4) Dependent Variables
• Displacement (change in position between MAX & ONS)
• Extremum (absolute position of the larynx at its spatial maximum in a gesture)
• Temporal lags (LX_ONS minus Oral_TONS, Oral_MAX, Oral_TOFF, or Oral_ONS)
(Positive lag values indicate that the larynx movement onset follows the oral landmark.)
o Interval from oral closure target to vertical larynx movement onset
o Interval from oral maximal closure to vertical larynx movement onset
o Interval from oral release onset to vertical larynx movement onset
o Interval from oral movement onset to vertical larynx movement onset
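Given landmark times for the oral and larynx gestures, the lag measures in (4) reduce to simple subtractions. The values below are hypothetical, purely for illustration:

```python
# Hypothetical landmark times in ms (not measured values from the study).
oral = dict(ONS=100.0, TONS=180.0, MAX=210.0, TOFF=240.0)
lx_ons = 175.0    # onset of vertical larynx movement

lags = {
    "target_to_lx":   lx_ons - oral["TONS"],  # oral closure target -> LX onset
    "max_to_lx":      lx_ons - oral["MAX"],   # oral maximal closure -> LX onset
    "release_to_lx":  lx_ons - oral["TOFF"],  # oral release onset -> LX onset
    "onset_to_onset": lx_ons - oral["ONS"],   # oral onset -> LX onset
}
# Positive values mean the larynx onset follows the oral landmark;
# here the larynx starts 5 ms before the oral target but 75 ms after
# the oral movement begins.
```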
Tokens were necessarily omitted for individual dependent variables when the gestures in the
target word were not captured with the find_gest algorithm (see footnote 3). For example, utterance-initial
LAB gestures
for bilabial stops were not likely to be identified when a speaker had closed lips during the pause between
utterances.
For statistical testing, linear mixed effects regression models are used with subjects, items, and
prosodic boundaries as random effects, with Tukey’s post-hoc pairwise comparison tests (see footnote 4);
Levene’s tests for homogeneity of variance (HOV) with means were used. The level of statistical
significance was set at p < .05.
Footnote 3: The number of items (out of 14 tokens for each consonant) without quantifiable vertical
larynx movement, and thus omitted from analysis, is given in parentheses. S1: /m/ (8), /n/ (7), /ɗ/ (1),
/b/ (3), /d/ (2), /k/ (3), /kʷ/ (1), /s/ (1), /kʷ’/ (2); S2: /m/ (2), /n/ (2), /b/ (3), /k/ (1), /kʷ/ (2), /kʷ’/ (1);
S3: none.
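The mixed models and post-hoc tests were run in R (lmerTest); as an illustration of the variance-homogeneity step, a means-centered Levene's test can be sketched in Python with scipy. The lag samples below are invented numbers, not the study's data:

```python
import numpy as np
from scipy import stats

# Invented stand-ins for intergestural lag samples (ms); NOT study data.
lags_ejective = np.array([-40.0, -38, -36, -35, -34, -33, -30])   # tight timing
lags_implosive = np.array([-80.0, -50, -20, -5, 10, 30, 60])      # variable timing

# Levene's test for homogeneity of variance, centered on the mean
# (the means-based variant described in the text).
stat, p = stats.levene(lags_ejective, lags_implosive, center="mean")
print(f"W = {stat:.2f}, p = {p:.4f}")   # small p: unequal variances
```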
2.3. Results
2.3.1. Vertical Laryngeal Activity
Ejectives and implosives are predicted to show larger and faster vertical movement of the larynx than
their pulmonic counterparts (Hypothesis A). To test this hypothesis, the activities of larynx raising in
ejectives and larynx lowering in implosives are examined individually. The two prosodic conditions
(phrase-initial and phrase-internal) are pooled for statistical analyses.
2.3.1.1. Larynx raising
The magnitude of larynx raising is reflected in two dependent variables: larynx displacement (in mm) and
extremum (in px). The displacement measure indicates the change in vertical position from the movement
onset to extremum, whereas the absolute vertical extremum position indicates where within the vocal tract
the larynx is located at its highest/lowest for a gesture. In Figure 2.6, vertical larynx actions involved in
voiceless consonants and voiceless ejectives are compared in terms of vertical displacement and absolute
extremum position. The linear mixed effects model shows that ejectives have larger displacements and
higher extremum positions of the larynx than pulmonic consonants (displacement: F(1,209.61) = 8.145, p
< .01*; extremum: F(1,191.5) = 8.341, p < .01*), supporting Hypothesis A1.
Footnote 4: For all linear mixed effects regression models and Tukey’s post-hoc comparisons, the
Kenward-Roger method, a more conservative degrees-of-freedom method than the default Satterthwaite
method, was used with the lmerTest package in R (Kuznetsova et al. 2017).
Figure 2.6. Larynx raising displacement (left) and larynx extremum (right)
Individual speaker results for larynx magnitude (Figure 2.7) show that the significant differences between
voiceless consonants and ejectives are found in larynx displacement for S1 and S3 and in larynx
extremum for S2 and S3 (see footnote 5). This suggests that each speaker may exploit different low-level
patterns for
producing contrast between plain and ejective consonants, while exhibiting the common general pattern
of ‘bigger’ vertical larynx movement for ejectives.
Figure 2.7. Larynx raising displacement (left) and extremum (right) for individual speakers
Footnote 5: Larynx displacement for S1: F(1,69.002) = 7.401, p = .008*; for S2: F(1,65.959) = 0.096,
p = .758; for S3: F(1,70.972) = 7.850, p = .007*. Larynx extremum for S1: F(1,61.173) = 0.307, p = .582;
for S2: F(1,71.827) = 9.485, p = .003*; for S3: F(1,57.944) = 6.487, p = .014*.
We turn next to larynx raising among ejectives with different manners of articulation; that is,
ejective stops (/k’, kʷ’/) versus ejective fricatives (/s’/). There is a main effect of segment on larynx
raising displacement (F(2,96.141) = 7.474, p < .001*; Figure 2.8: left), with Tukey’s post-hoc tests
suggesting that the larynx raises more in the production of ejective fricatives compared to that of ejective
stops (/s’/ vs. /k’/: t(96.4) = 3.765, p < .001*; /s’/ vs. /kʷ’/: t(108.1) = 2.624, p = .027*; /kʷ’/ vs. /k’/:
t(86.6) = 1.06, p = .541). The effect of segment on larynx extrema is also significant (F(2,87.152) =
10.325, p < .001*; Figure 2.8: right), but the distinction reflects place rather than the manner class
contrast: velar ejective stops have lower extremum positions than labio-velar ejective stops (/kʷ’/ vs. /k’/:
t(76.9) = 4.316, p < .001*) and alveolar ejective fricatives (/s’/ vs. /k’/: t(88.1) = 3.252, p < .01*), while
no distinction in larynx extrema is found between the latter two segments, which differ in manner of
articulation (/s’/ vs. /kʷ’/: t(100.6) = -1.043, p = .551). Thus Hypothesis A2, predicting larger larynx
magnitude for ejective
fricatives compared to ejective stops receives mixed support, with differences specifically in larynx
raising displacement being in the predicted direction.
Figure 2.8. Larynx raising displacement (left) and extremum (right) in ejective stops and fricatives
2.3.1.2. Larynx lowering
Larynx lowering magnitude is measured by lowering displacement and by extremum (i.e., the absolute
position in the VTR when the larynx is maximally lowered). Voiced pulmonic consonants (/b, d/) and
voiced implosives (/ɓ, ɗ/) are compared; it was predicted that larynx lowering magnitude would be greater
in implosives than in voiced pulmonic consonants (Hypothesis A3). Findings, however, show that neither
larynx lowering displacement (F(1,134.13) = 2.211, p = .139) nor extrema (F(1,119.9) = 0.265, p = .608)
differentiates these two classes (Figure 2.9).
Figure 2.9. Larynx lowering displacement (left) and larynx extremum (right)
This lack of distinction in the degree of lowering between plain voiced stops and voiced implosives is
consistent in the results of individual speakers (Figure 2.10). No speaker (S1-S3) exhibits differing
degrees of larynx lowering (either in displacement or in extremum) between their production of voiced
pulmonic consonants and implosives. This implies that the relatively well-accepted previous description
of this phonological contrast stating that implosives have larger downward larynx movement than
pulmonic consonants (Hypothesis A3; Clements & Osu 2002, Ladefoged 1971, Ladefoged & Maddieson
1996) is not supported in our spatial data on larynx lowering.
Figure 2.10. Larynx lowering displacement (left) and extremum (right) for individual speakers
Figure 2.11. Larynx displacement (left) and larynx extremum (right) for individual speakers
Figure 2.11 presents overall larynx displacement and extremum data for the individual speakers,
including nasal consonants (in green) as well as pulmonic and non-pulmonic stops and fricatives. In this
figure, we observe a gradual pattern of larynx displacement increasing in the order of nasals < voiced
implosives < voiced stops (three of which involve larynx lowering) < voiceless consonants < voiceless
ejectives (two of which involve larynx raising).
2.3.2. Larynx-Oral Coordination
In this section, the coordination of the vertical larynx gesture and the oral gesture is compared between
ejectives and implosives (Hypothesis B1) and between non-pulmonic consonants (ejectives and
implosives) and pulmonic consonants (Hypothesis B2). We review these hypotheses below.
2.3.2.1. Ejectives vs. implosives
Ejectives are predicted to exhibit smaller lags between vertical larynx and oral gestures than implosives
(Hypothesis B1), as larynx raising is (described as) synchronous to the oral closure period in ejectives
whereas larynx lowering has been described as sequential to oral closure in implosives. Figure 2.12
presents temporal lags from oral closure—three landmarks of oral target achievement [Oral_TONS], oral
maximum constriction [Oral_MAX], and oral release onset [Oral_TOFF])—to the movement onset of the
coordinated vertical larynx gesture (LX_ONS). The negative lags between larynx and oral gestures
(dashed lines in Figure 2.12 indicate medians) show that vertical larynx
movement starts before the temporal landmarks of the oral gesture. Temporal lags are closest to zero in
the leftmost plot in Figure 2.12, which represents the timing between oral constriction target achievement
and vertical larynx onset (median for ejectives: -36.1 ms; median for implosives: -12 ms).
Figure 2.12. Oral-vertical larynx timing in ejectives vs. implosives
Linear mixed effects models indicate that these temporal lags do not differentiate ejectives from
implosives (LX_ONS – Oral_TONS: F(1,175.09) = 1.993, p = 0.160; LX_ONS – Oral_MAX:
F(1,175.54) = 2.922, p = .089; LX_ONS – Oral_TOFF: F(1,174.71) = 3.249, p = .073). That said, the lags
are more variable for implosives compared to ejectives for all three intervals of gestural overlap. Levene’s
tests for coefficients of variation indicate significantly less variability for ejectives than for implosives in
all three lag measures (LX_ONS – Oral_TONS: F(1,204) = 8.631, p < .01*; LX_ONS – Oral_MAX:
F(1,204) = 10.603, p < .01*; LX_ONS – Oral_TOFF: F(1,204) = 13.191, p < .001*). Taken together, the
temporal lag patterns between oral and vertical larynx gestures are similar in ejectives and implosives, but
ejectives exhibit a tighter coordination than implosives in that intergestural timing is less variable in
ejectives compared to implosives.
2.3.2.2. Non-pulmonic consonants vs. pulmonic consonants
For Hypothesis B2 we test whether oral gestures precede vertical larynx gestures in ejectives and
implosives, while oral and vertical larynx movements exhibit a freer coordination in pulmonic
counterparts. The temporal lag between oral closure achievement and the coordinated vertical larynx
gesture (i.e., LX_ONS – Oral_TONS) in ejectives and implosives is found to be synchronous (see Figure
2.12: left), but for pulmonic airstream mechanism consonants with no specific aerodynamic constraints in
the timing between oral closure and vertical larynx movement, intergestural timing is expected to be more
variable (Hypothesis B2). This oral target achievement to larynx onset lag (LX_ONS – Oral_TONS) for
non-pulmonic and pulmonic consonants is plotted in Figure 2.13 (dashed line indicates median).
Figure 2.13, which is an extension of the leftmost plot in Figure 2.12, presents intergestural
timing for pulmonic consonants as well as ejectives and implosives. While non-pulmonic consonants
show a near-zero lag, pulmonic consonants exhibit a negative lag, consistent with a much earlier start of
vertical larynx movement relative to the oral closure target achievement. This is statistically confirmed in
linear mixed effects models showing a main effect of consonant type on oral target to larynx onset lag
(F(2,364.48) = 25.622, p < .001*). Tukey’s post-hoc pairwise comparisons indicate that non-pulmonic
consonants have smaller negative lags than pulmonic consonants, while no lag difference is found
between the pairs of non-pulmonic consonants (ejectives vs. pulmonic Cs: t(363) = 5.52, p < .001*;
implosives vs. pulmonic Cs: t(356) = 6.142, p < .001*; ejectives vs. implosives: t(381) = -1.182, p
= .465). Notably, Levene’s tests show that pulmonic consonants show a more variable timing than
glottalic consonants for this temporal lag (F(1,380) = 10.816, p < .01*), and this is clearly seen in the
qualitative data in Figure 2.13.
Figure 2.13. Oral target to larynx onset lag in pulmonic consonants, ejectives, and implosives
Onset-to-onset lags (from oral movement onset to vertical larynx movement onset) are a useful
illustration of how the two gestures are phased to each other. Figure 2.14 shows that for pulmonic
consonants, onset lags are near-zero (median is 0 ms), whereas ejectives and implosives have positive
onset lags (median for ejectives: 72.1 ms; median for implosives: 84 ms). There is a main effect of consonant
type on onset lags (F(2,361.25) = 20.119, p < .001*), and pairwise comparison tests reveal that non-
pulmonic consonants have longer onset lags than pulmonic consonants (Tukey’s: ejectives vs. pulmonic
Cs: t(360) = 5.436, p < .001*; implosives vs. pulmonic Cs: t(353) = 4.887, p < .001*; ejectives vs.
implosives: t(377) = -0.093, p = .995). With regard to timing variability, pulmonic consonants show more
variable onset lags than the lags in ejectives and implosives (F(1,380) = 9.355, p < .01*).
Figure 2.14. Onset lag in pulmonic consonants, ejectives, and implosives
Both observed temporal lags (i.e., oral target to larynx onset lags and onset lags) indicate that
pulmonic and non-pulmonic consonants have fundamentally different timing patterns: non-pulmonic
consonants are produced with larynx raising/lowering starting at the oral closure target (i.e., the larynx
moves later than the oral closure onset), while oral constriction and vertical larynx movements are synchronous in
the production of pulmonic consonants. Furthermore, the temporal coordination between the oral and the
vertical larynx gesture is more stable in ejectives than in implosives; likewise the timing relations are less
variable in non-pulmonic glottalic consonants than in pulmonic consonants.
2.4. Discussion
This chapter illuminates the airstream mechanism activities of the vertical larynx gesture and the temporal
coordination with its associated oral gesture in glottalic consonants, which have not been directly studied
in the previous literature. The results on vertical larynx magnitude measured by displacement and larynx
extrema indicate that voiceless ejectives (which involve larynx raising) show larger upward movement
than voiceless pulmonic stops and fricatives (which have no apparent basis for larynx raising). Individual
speaker results indicate that speakers may use different mechanisms to increase raising magnitude, either
by starting from a lower position (so that the larynx can move further up), by ending at a higher position,
or both. Both strategies serve to decrease the volume of the oral cavity and presumably create the raised air pressure that is
critical for the class of ejective consonants.
In contrast, between voiced implosives and voiced pulmonic stops, no significant difference is
found in larynx lowering. Unlike previous literature describing contrasts in lowering magnitude between
implosives and plain voiced consonants (Ladefoged 1971, Ladefoged & Maddieson 1996), this spatial
difference is not exhibited in the current data. This is in fact consistent however with previous
observations that implosives in Hausa and other languages such as Owere Igbo do not necessarily or even
often drop below atmospheric pressure (Clements & Osu 2008, Ladefoged 1971, Ladefoged et al. 1976).
In fact, Ladefoged et al. (1976) describe voiced bilabial implosives in Owere Igbo as showing, compared
to voiced bilabial stops, neither an increase nor a decrease in oral pressure during the closure, the air
pressure being approximately the same inside and outside the oral seal (i.e., the mouth). The very fact that
larynx lowering occurs not only in implosives but also in plain voiced consonants precludes “larynx
lowering” from being a distinct property of implosive consonants. Instead, implosives may be more adequately
characterized by the lack of increase in oral air pressure, which must be validated with aerodynamic data.
On the other hand, when larynx raising actions are compared among different ejectives, ejective
fricatives exhibit larger larynx raising displacement (but not higher extremum) than ejective stops. This
mechanism of a larger upward movement appears to serve the need for creating a sufficiently large
increase in oral air pressure for the ejective fricatives, which, with their lack of a complete seal in the oral
cavity and their consequent venting to create turbulent flow anterior to the constriction, require extensive
larynx raising to build up enough oral pressure for both the glottalic airstream mechanism and
the turbulent airflow.
Turning next to temporal lags between vertical larynx and oral gestures, the results show that
larynx raising and lowering movement begins slightly before the achievement of the oral closure in the
non-pulmonic consonants. Variability in temporal lags suggests that the segment-internal intergestural
lags for ejectives are more stable than for implosives. In line with the prediction that vertical larynx and
oral gestures in non-pulmonic consonants are more tightly coupled to each other than those in pulmonic
consonants due to the gestures’ joint achievement of the aerodynamic goal for non-pulmonic consonant
segments, the results in the current experiment indicate that the oral-vertical larynx lags in ejectives and
implosives are more stable than those in pulmonic consonants. While the current chapter investigated
token-to-token variability as an indicator of temporal stability in coordination, the issue of variability will
be investigated further in the next chapter by probing different prosodic contexts for these multi-gesture
complexes and observing the effect of prosody on their timing relations.
In differentiating implosives and voiced stops, the degree of larynx lowering is similar, suggesting
that the phonological contrast between the two stop classes may be manifested in other articulatory
activities, such as the timing between oral constriction and vertical larynx movement or the details of the
glottal adduction. Among many possibilities, the current findings suggest that a distinction between the
two phonological categories is apparent in their segment-internal temporal lags. Both the vertical larynx
onset to oral target lags and the larynx-oral onset lags show that for implosives the larynx lowering starts
after the onset of oral constriction formation, beginning around the time of oral closure achievement. On
the other hand, vertical larynx movement and the oral closure gesture begin simultaneously for voiced
pulmonic stops. The later lowering of the larynx in implosives is expected in that the acoustics of
implosives are typically characterized by increasing voicing amplitude during closure (Demolin 1995,
Jessen 2002, Kulikov 2010, Ladefoged & Johnson 2014, Ladefoged & Maddieson 1996, Raza et al.
2004), as compared to the typical amplitude die-out of voicing during the closure of pulmonic voiced
stops. In the production of implosives, as the larynx lowers after the closure, the volume of the vocal tract
increases, keeping supralaryngeal pressure from increasing, which allows voicing of the implosives to be
maintained with the same or increasing amplitude throughout the closure (Lindau 1984, Russell 1997).
For voiced stops, on the other hand, the larynx lowers before the closure is formed, and the vocal tract size
gradually decreases during the closure. In turn, the supralaryngeal pressure increases and voicing
eventually dies out.
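The pressure logic in this paragraph can be sketched with a toy isothermal ideal-gas calculation. This is an idealized illustration only: the function name, the leak ratio, and the volume changes are hypothetical round numbers, not values measured in this study.

```python
def cavity_pressure(p0, n_ratio, v_ratio):
    """Ideal-gas pressure after air inflow and a volume change at constant
    temperature: P1 = P0 * (n1/n0) / (V1/V0)."""
    return p0 * n_ratio / v_ratio

P0 = 100.0   # oral pressure at closure onset (arbitrary units)
LEAK = 1.08  # transglottal airflow adds ~8% more air to the sealed cavity

# Voiced pulmonic stop: cavity volume is fixed, so the glottal leak raises
# oral pressure, shrinking the transglottal pressure drop until voicing dies.
p_stop = cavity_pressure(P0, LEAK, 1.00)

# Implosive: larynx lowering expands the cavity (here by 15%), offsetting the
# leak so oral pressure stays below ambient and voicing is sustained.
p_implosive = cavity_pressure(P0, LEAK, 1.15)

assert p_stop > P0
assert p_implosive < P0
```

The two assertions encode the contrast in the text: with a fixed cavity, oral pressure rises above ambient and voicing decays; with cavity expansion, it stays low and voicing continues.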
While it seems clear that vertical larynx movement is indeed one of the controlled speech motor
actions necessary in the representation and production of ejectives and implosives, the vertical laryngeal
gesture as a variable has not usually been implemented in articulatory models or has been only
represented statically in terms of larynx height (Browman & Goldstein 1989, Goldstein 1980, Maeda
1990, Ménard et al. 2007, Mermelstein 1973). Additionally, glottal adduction and potential pitch changes
must also necessarily be represented; the former is requisite to successfully achieve the aerodynamic
requirements of these consonants. Still, as gestural sequencing is a crucial factor in differentiating one
category from another (implosives and voiced stops in this case), a better understanding of the dynamics
of the vertical laryngeal activities and associated gestures’ timing relations in non-pulmonic consonants is
necessary in developing articulatory models and representations. The current findings suggest that
ejectives and implosives may have different phonological representations that cannot be explained by
simple bi-directional distinctions of egressive and ingressive air flow or larynx raising and lowering, etc.
For example, ejectives are characterized by larger upward movement of the larynx as well as tighter
coordination of oral-larynx timing compared to pulmonic stops and implosives. Furthermore, implosives
are more distinguishable from pulmonic counterparts in terms of the timing relations between oral and
vertical larynx gestures, but not in the actions of individual gestures.
This specific contrast in intergestural timing between voiced stops and voiced implosives brings
attention to multi-gesture complexes’ inherent coupling structures. Segments, at least those with multiple
coupled gestures, are equipped with not only the types of gestures and the degree of gestural activations,
but also the arrangement or phasing among the gestures. The current study proposes the utility of
intergestural timing in predicting both relative timing and its stability within segment-sized multi-gestural
molecules. Throughout the dissertation, the importance of timing relations in the representation of
multi-gesture complexes will be discussed by investigating stability and variability in timing
(Chapters 3 and 5) and by further incorporating timing into the modeling of complex segments (Chapter 6).
2.5. Conclusion
This chapter presents an investigation of the spatiotemporal properties of vertical larynx movement in
Hausa consonants and of how those actions are coordinated with supralaryngeal constriction gestures. The
results of the rtMRI articulatory study suggest that phonological contrasts between non-pulmonic
consonants and pulmonic consonants are manifested with different gestural sequencing patterns as well as
with different timing variability. In addition to ejectives and implosives exhibiting raising and lowering of
the larynx, respectively, the larynx gestures’ vertical position differs for different consonant types
(ejectives > voiced obstruents, ejective fricatives > ejective stops). Moreover, the beginning of the vertical
larynx movement is tightly locked to the closure achievement of the oral gesture in non-pulmonic
consonants, with ejectives exhibiting a more stable larynx-oral coordination than implosives. Taken
together, the findings from the current chapter deepen our understanding of the vertical larynx actions
and temporal variations of this vertical larynx movement for consonants with specific aerodynamic
demands. The results also highlight that the phonological contrasts among these multi-gesture complexes
include not only the vertical laryngeal gesture and the glottal adduction gesture but also a specification of
relative timing and stability. How much of this additional specification needs to be stipulated in a
phonological representation, and how much can follow automatically from segment-internal dynamics
associated with aerodynamic tasks is a very open question for the future.
3. Prosodic Variability of Multi-Gesture Complexes: Ejectives and Implosives
3.1. Introduction
In this chapter, temporal stability between gestures within multi-gesture complexes—i.e., phonological
units (segments, syllables, etc.) that have multiple gestures phased with one another—in Hausa glottalic
and pulmonic consonants is investigated. Unlike pulmonic stops, which control oral constriction to build
up and release air pressure from the lung action, ejectives and implosives use a glottalic airstream
mechanism involving both oral constriction and vertical larynx movement to create local air pressure
changes in the oral cavity. Due to this aerodynamic goal for non-pulmonic consonants, crucial timing
relations are required between the oral and vertical larynx gesture. For instance, in the production of
ejectives, the larynx is expected to rise during the time when the oral constriction is sealed so as to yield
a large increase in intraoral air pressure.
rtMRI data allow capturing kinematic data not only on oral gestures such as lip and tongue
movement, but also on the vertical larynx gesture for non-pulmonic consonants and the velum movement
for nasal consonants. With this ability to quantify multiple articulatory
movements in a single production, we can understand how co-articulated gestures may be tightly or loosely
coupled to each other. What’s more, investigating coordination structures and variability in multi-gestural
complex speech segments illuminates whether their stability/variability in relative timing is relevant
linguistically, since this coupling may have different phonological consequences, such as effects on sound
change patterns, learnability, or perceptual recoverability.
This chapter investigates whether and how the phonological contrasts between non-pulmonic and
pulmonic consonants are realized in their intergestural timing in the face of prosodic forces on their
coordination. In the previous chapter, the findings show that voiced pulmonic consonants and non-
pulmonic consonants (i.e., implosives) differ not in their articulatory magnitude but crucially in their
temporal organization of tongue constriction actions with vertical larynx actions. For voiced stops, oral
constriction formation and larynx lowering begin simultaneously, whereas for voiced implosives, oral
constriction gesture precedes larynx lowering—lowering of the larynx starts around the target
achievement of the oral closure. This temporal pattern can be characterized dynamically as an in-phase
coordination between vertical larynx and oral gestures for pulmonic stops and an off-phase relation of the
multi-gestures for the gesturally complex non-pulmonic consonants (Oh et al. 2018, 2019). Additionally,
the temporal coordination seems to be more stable in the non-pulmonic consonant production compared
to that in pulmonic consonants in terms of the token-to-token variability (See Ch. 2.3.2.2.).
Building on these previous observations, the current chapter examines whether the tightness in
coordination observed for glottalics is reflected under prosodic forces that can modulate intergestural
timing; that is, whether the intergestural timing in the non-pulmonic consonants is more stable across
prosodic modulations. The prediction is that the coordination between oral and vertical larynx gestures is
less malleable and more resistant to prosodic variations in the glottalic consonants than in the voiced
pulmonic consonants. In the next section, this stable coordination of gestures for complex multi-gestural
molecules is described in terms of cooperative goals (‘superordinate goal’) for multi-gesture segments.
3.1.1. Superordinate Goals for Multi-Gesture Complexes
Previous studies suggest that the timing within segments is highly cohesive and stable compared to the
timing across segments (Byrd 1996a, Fowler 2015, Hoole & Pouplier 2015, Kelso et al. 1986, Löfqvist
1991, Munhall et al. 1994). For example, Löfqvist (1991: 346) states that “gestures forming a segment may
show a greater degree of internal stability in the form of coherence of patterns of muscular activity and/or
movement than those associated with different segments,” a view also supported by Saltzman et al. (2000)
in their perturbation study on bilabial-laryngeal timing. This strong temporal cohesion among gestures
within segments compared to those across segments can be represented by specifying couplings between
gestures for segmental gestural molecules. Regardless of the theoretical mechanism for intergestural timing,
however, the temporal control internal to the units must be encoded in the representation of such units to
produce systemically cohesive temporal structures in the service of phonological contrast and
morphological structuring of words.
The current study particularly investigates variations in within-segment timing. The prediction of
lesser timing variability in glottalic segments arises from the goal-based aerodynamic constraint for these
consonants; that is, the two vocal tract sub-systems cooperatively achieve an aerodynamic goal, causing
their temporal coordination to be more constrained. A cohesive unit emerges by “groups of muscles and
articulators act[ing] synergistically to achieve phonetic goals” (Munhall et al. 1994: 3615). For glottalic
consonants, one of their phonetic goals is the creation of an aerodynamic pressure differential. Such an
objective—aerodynamic in this case—requiring multiple gestures to synergistically achieve a mutual
higher-level goal for a segment is referred to in the present study as a superordinate goal. The term
‘superordinate goal’ is adopted from psychology literature (Brown & Wade 1987, Hunger & Stern 1976,
Fishbach et al. 2006)⁶ and is introduced here to describe speech tasks that require cooperative actions of
two or more articulatory gestures in order to achieve a single goal. We postulate that a ‘superordinate goal’
is restricted to multi-gestural complexes that comprise segments and that involve gestures having one or
more off-phase relations, the precise achievement of which plays a critical role in the language’s system of
contrast. Such multi-gesture segments with superordinate goals are understood to include, for example,
ejectives, implosives, clicks, (syllable-final) nasals, and doubly articulated stops. Multi-gestural segments
with a superordinate goal—an aerodynamic one for ejectives and implosives—are expected to show a tight
and stable coordination between the cooperative gestures involved in executing this task due to their co-
dependency in achieving this superordinate goal.
Not all multi-gesture complexes have timing-specific superordinate goals; they could have an
acoustic goal, e.g., lowering of F3 for liquid rhotics (Lindau 1985) or a laryngeal (pitch accent) goal, e.g.,
changes in F0 for segmental (Silva 2006) and tonal (Xu & Wang 2001) contrasts (McGowan & Saltzman 1995).

⁶ “A superordinate goal” is a goal that can be achieved only through the mutual efforts of the members of two or
more groups (Brown & Wade 1987, Deschamps & Brown 1983, Hunger & Stern 1976).

Nor do all multi-gesture complexes necessarily have superordinate goals, i.e., goals beyond or more
abstract than those controlled by the individual articulatory tasks of the gestures. For example, coordination
variations in laryngeal and oral timing (as well as spatial variation) produce different phonological patterns
in aspirated pulmonic consonants, such as voiceless aspirated stops and breathy murmured stops, with each
language exhibiting varying temporal arrangements between laryngeal and oral gestures for these
consonants (Cho & Ladefoged 1999, Hussain 2018). The gestures composing multi-gesture complexes for
pulmonic consonant segments do show evidence of tight coordination, as reviewed below (e.g., Löfqvist
& Yoshioka 1981, 1984), but it is possible they might not be quite as rigidly fixed in their coordination as
multi-gesture complexes that have superordinate goals that demand temporal constraints—e.g., non-pulmonic
consonants’ aerodynamic goal, doubly-articulated stops’ acoustic goal requiring distinct initial
and final formant transitions. In our view, such differences in relative stability will fall out from the coupling
structures in the architecture of the multi-gestural complex or molecule.
Ejectives and implosives, investigated in this chapter, require specific timing constraints, with
narrow temporal intervals, to achieve the task. Kingston (1985) noted that the timing between oral and
vertical laryngeal gestures depends on the degree of oral constriction, as the time-varying vocal tract shape
created by oral and laryngeal gestures influences voicing and pressure levels. Considering that the
aerodynamic state in terms of intraoral air pressure (e.g., compression) is the goal for these non-pulmonic
consonants (perceptually, to produce bursts with particular acoustic characteristics), oral and laryngeal
gestures would be specifically phased so that the pressure change is maximized for a specific gestural
configuration. For instance, for ejectives, larynx raising must specifically occur during the short interval
when the oral cavity is constricted. Due to this coordinative action, required to achieve the aerodynamic
superordinate goal, glottalic ejectives and implosives are predicted to have tighter timing relation than
pulmonic consonants.
By investigating temporal relations between gestures of non-pulmonic and of pulmonic consonants,
the current study aims to reveal coordination and coupling structures among gestures that comprise a multi-
gestural segmental unit, which may involve strong bonds between participating gestures due to the temporal
control imposed by complex gestural molecule’s superordinate goals. The investigation of timing stability
will provide useful information for the representation of coupling structures for these multi-gesture
complexes.
3.1.2. Predictions Assessed in the Present Study
Under the dynamic account of segments, the following predictions can be postulated. First, for segments
with a tight cohesion between gestures, the timing would be more rigid and thus would be less likely to be
affected by prosodic effects such as speech rate or the boundary lengthening effect. Prosody can be viewed
as a probe or perturbation of the relative timing in a target consonant. Byrd et al. (2000) tested how
articulatory actions and intergestural timing are modulated with respect to varying prosodic structures.
Moreover, Byrd et al. (2009) used rtMRI data to investigate oral-velum coordination across prosodic
contexts. In the current study, variations in prosodic contexts are introduced in order to compare the relative
stability of vertical larynx-oral coordination between pulmonic and non-pulmonic consonants in the face of
prosodic perturbation at phrase edges. Specifically, given the strict aerodynamic requirements of the
glottalic consonants, it is predicted that gestures in non-pulmonic glottalic stops will be less affected by the
prosodic lengthening/strengthening effect compared to like gestures in pulmonic stops. Additionally, we
compare kinematics of individual gestural actions as well as vertical larynx-oral coordination in ejectives,
implosives, and their pulmonic counterparts. Beginning with these kinematics of the individual component
gestures, the following hypothesis is proposed:
Hypothesis A. Pulmonic consonants are more susceptible to prosodic lengthening (in duration) and
strengthening (in magnitude) on individual gestural actions compared to non-pulmonic consonants.
Previous findings on prosodic effects have shown that gestural duration is lengthened and in some
cases the magnitude increases at higher prosodic boundaries (Byrd 2000, Byrd & Saltzman 1998, Cho 2001,
Cho & Keating 2001, Yanagawa 2006). Furthermore, prosodic phrase boundaries are found to reduce
gestural overlap (Bombien et al. 2006, Byrd et al. 2000, Byrd & Choi 2010, Holst & Nolan 1995). It is
possible that for multi-gesture complexes (e.g., complex segments such as ejectives and implosives) with
stably coordinated gestures, this prosodic effect may be reduced due to the rigid and less flexible timing
relations expected for such segments. Additionally, prior findings of phrasal effects in articulation have been
limited to oral gestural actions (and the velum in Byrd et al. 2009); the current experiment additionally
seeks to examine the vertical larynx behavior under the influence of prosodic modulations. Whether
prosodic effects are reflected similarly or differently in the oral and the non-oral articulators is an important
aspect of our knowledge regarding prosody and speech production, as this is directly related to how each
articulator may or may not be distinctively controlled, particularly when working collaboratively to achieve
a superordinate task, as in the production of a complex multi-gesture segment.
Turning next from individual gestural actions, a related prediction can be made with regard to
prosodic effects on intergestural timing. As stated in the previous chapter, besides observed distinctive
timing patterns between pulmonic and non-pulmonic consonants, ejectives and implosives also show
differences in their intergestural timing variability. The aerodynamic necessity of oral cavity compression
and air pressure elevation in ejectives suggests that this timing relation must be highly stable given the
necessity for the larynx raising to create a volume compression in the oral cavity (before oral stop release).
In contrast, for implosives, larynx lowering is critical not only to the airstream mechanisms (i.e., glottalic
rarefaction) but also to the volume expansion that facilitates sustaining voicing. Additionally in implosives,
the oral constriction gesture occurs before the downward larynx movement (see Chapter 2), meaning that
oral and vertical larynx gestures are sequentially produced. Ejectives have an aerodynamic task goal of a
sharp increase in oral air pressure that must occur precisely during a brief oral closure interval (preceding
the oral stop’s moment of release), whereas voiced implosives have a goal of maintaining a transglottal
pressure drop through various mechanisms of vocal tract volume expansion.⁷ Larynx lowering is thus not
tied to a brief moment in time relative to the oral closure gesture.

⁷ Voiceless implosives, although their occurrence is very uncommon across the world’s languages, may serve as a
direct comparison with voiceless ejectives (Ladefoged 1990). However, no apparent difference in oral air pressure
decrease is exhibited between voiced and voiceless implosives (Mc Laughlin 2005).

Based on these descriptions, coupling
schemas for ejectives (1a), implosives (1b), and pulmonic stops are illustrated in (1), taken from Chapter 2
(no specific timing relations are anticipated for voiced stops [1c]).
(1) a. Ejective fricative /s’/ b. Implosive stop /ɗ/ c. Voiced stop /d/
These coupling graphs make predictions about the stability in timing relations among the gestures.
For example, intergestural timing between oral and larynx raising gestures in ejectives is expected to be
tighter, less susceptible to change, than the timing between oral and larynx lowering gestures in implosives
due to the larger number of links among the gestures in the ejective multi-gesture complex. Thus, it is
predicted that the observed intergestural timing variation, indexed for example by the coefficient of
variation, for the timing between oral and vertical larynx gestures will be more stable, i.e., less variable, for
ejectives than for implosives across prosodic variations. This leads to Hypothesis B.
Hypothesis B. The temporal lag between oral closure and vertical larynx onset gestures in ejectives is less
variable than the lags in implosives.
Regarding articulatory distinctions generally between glottalic and pulmonic consonants,
Maddieson and Ladefoged state that their phonemic contrasts can be manifested by the “differ[ence] in the
mode of action of the larynx, or in the timing of laryngeal activity in relation to the oral articulation” (1996:
47). However, it has proven difficult to draw a ‘bright line’ distinguishing the articulation of voiced
implosives and voiced stops (though acoustic differences, e.g., in the amplitude envelope of voicing, often
exist). This is largely because larynx lowering, one of the major articulatory characteristics of implosives,
is not unique to implosives since the production of voiced pulmonic stops also permits lowering of the
larynx so as to facilitate the maintenance of voicing (Clements & Osu 2002, Kingston 1985, Ladefoged
1968, 1971, Ladefoged & Maddieson 1996). The current study considers whether their articulatory
contrasts can be found in the temporal coordination of these pulmonic oral and vertical laryngeal maneuvers.
We hypothesize that the coordination of gestures in glottalic consonants is critical, whereas the coordination
between supralaryngeal and laryngeal gestures for pulmonic consonants (e.g., an oral constriction gesture
and a downward larynx gesture for voiced stops) is likely more loosely coupled in time, as these gestures
do not (obviously) jointly act to achieve a single critical and contrastive task goal. (Though Löfqvist and
Yoshioka [1981, 1984] report very stable timing of vocal fold adduction and oral closure for pulmonic
stops.) Therefore, it is predicted that the temporal coordination between laryngeal and supralaryngeal
gestures in implosives will be more stable than the coordination found in pulmonic consonants (Hypothesis
C).
Hypothesis C. The temporal lag between oral and larynx gestures in non-pulmonic voiced implosive stops
that exhibit larynx lowering is more stable across prosodic conditions than that seen in pulmonic voiced stops.
These three hypotheses are tested by analyzing systematic variability in gestural actions as well as
intergestural timing across prosodic variations. We especially focus on whether the measures of
variability in articulatory actions and timing can differentiate the three phonological categories, separating
ejectives from implosives and non-pulmonic consonants from pulmonic ones. Empirical findings from
these vertical larynx-oral actions have further import for the phonological representations of these multi-
gesture segments, as well as for modeling their internal coupling graphs, issues we will return to.
3.2. Methods
The same dataset used in Chapter 2 is analyzed to examine prosodic variability in this chapter. See Section 2.2.
for details on Methods, including subjects, materials, data acquisition, and data analysis. To reprise
briefly, the collected speech production data were from three native Hausa speakers (S1, S2, & S3),
eliciting production of ejectives (/k’, kʷ’, s’/), voiceless obstruents (/k, kʷ, s/), implosives (/ɓ, ɗ/), and
voiced obstruents (/b, d/). There were two prosodic conditions: phrase-initial and phrase-internal
conditions. For phrase-initial conditions, target consonants (for example, /k/ in [2a]) are located at the
beginning of the intonational phrase (IP). IP boundaries are created by a pause induced by commas. For
phrase-internal conditions (2b), words with target consonants (an object noun) immediately follow a
phrase-initial word.
(2) a. Phrase-initial
Fàɗa: sàu ɗaya, kà:za: shine kalmà: à Hausa.
‘Say once, chicken is the word in Hausa.’
b. Phrase-internal
À yànzu, biyà: kà:za: kàmar̃ kalmà: à Hausa.
‘Right now, read aloud chicken as a word in Hausa.’
Phrase boundary effects on vertical laryngeal actions and intergestural timing are investigated in
this chapter. The measurements used for the current data analysis are as follows.
(3) Measurements [LX is used as an acronym for vertical larynx gesture.]
• Oral duration (time from oral closure movement onset to release offset)
• LX duration (time from vertical larynx movement onset to movement [release] offset)
• Oral magnitude (change in oral positions between movement onset to maximum)
• LX displacement (change in LX positions between movement onset to maximum)
• Onset lag (interval from oral movement onset to LX movement onset)
• Onset-to-target lag (interval from oral closure target to LX movement onset)
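As a sketch of how the interval measures in (3) could be computed, the snippet below derives them from gesture landmark times and positions. The dictionary field names and the sample token values are hypothetical placeholders; in practice, the landmarks would come from the rtMRI region-of-interest time series described in Chapter 2.

```python
def gesture_measures(oral, lx):
    """Compute the interval measures in (3) from landmark times (s) and
    positions (mm) for an oral gesture and a vertical larynx (LX) gesture."""
    return {
        # durations: movement onset to release offset
        "oral_duration": oral["release_offset_t"] - oral["onset_t"],
        "lx_duration": lx["release_offset_t"] - lx["onset_t"],
        # magnitudes: positional change from movement onset to maximum
        "oral_magnitude": abs(oral["max_pos"] - oral["onset_pos"]),
        "lx_displacement": abs(lx["max_pos"] - lx["onset_pos"]),
        # lags: positive values mean the LX movement starts later
        "onset_lag": lx["onset_t"] - oral["onset_t"],
        "onset_to_target_lag": lx["onset_t"] - oral["target_t"],
    }

# Hypothetical implosive token: larynx lowering (negative position change)
# begins just after the oral closure target is achieved.
oral = {"onset_t": 0.00, "target_t": 0.06, "release_offset_t": 0.16,
        "onset_pos": 0.0, "max_pos": 8.0}
lx = {"onset_t": 0.065, "release_offset_t": 0.18,
      "onset_pos": 0.0, "max_pos": -4.5}

m = gesture_measures(oral, lx)
assert m["onset_lag"] > 0            # LX starts after the oral onset
assert m["onset_to_target_lag"] > 0  # and just after the oral closure target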
Statistical analyses used are linear mixed effects models with subject and item as random effects
and consonant type (glottalic vs. pulmonic) and prosodic condition (boundary vs. no-boundary) as fixed
effects.⁸ To test for significant differences in variance between groups of different consonant types, a
modified version of Levene’s test for homogeneity of variance is used, which tests the equality of the
population variances by carrying out an analysis of variance of absolute deviations of observations from
the group median (rather than using the group mean, as proposed in the original Levene’s test [Levene
1960]).⁹ The LeveneTest function in the package car (Fox 2016, Fox & Weisberg 2019) implemented in R
is used to determine whether there is equality of variance on intergestural timing between ejectives and
implosives, and between voiced implosives and voiced stops. Testing differences in variance is
supplemented with tests of coefficients of variation, which are particularly useful when comparing groups
with different means, measures, or values. Specifically, the modified signed likelihood ratio test (M-SLRT)
for the equality of coefficients of variation is selected, which is considered superior to the SLRT in terms of controlling
the Type I error rates (Diciccio et al. 2001) and performs better than asymptotic tests when the sample
size is relatively small (Krishnamoorthy & Lee 2014, Marwick & Krishnamoorthy 2019).
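The variance-stability comparison just described can be sketched in Python with SciPy, where `center='median'` makes `scipy.stats.levene` equivalent to the Brown-Forsythe variant (ANOVA on absolute deviations from the group median). The lag samples below are simulated, not the Hausa data, and the M-SLRT has no SciPy counterpart (only the R implementation accompanying Marwick & Krishnamoorthy 2019 is assumed here), so only the raw coefficients of variation are computed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated onset-to-target lags (s): ejectives with a tighter (smaller)
# spread than implosives, mimicking the predicted stability difference.
ejective_lags = rng.normal(loc=0.020, scale=0.005, size=40)
implosive_lags = rng.normal(loc=0.030, scale=0.012, size=40)

# Modified Levene's (Brown-Forsythe) test for equality of variances:
# ANOVA on absolute deviations from each group's median.
w_stat, p_value = stats.levene(ejective_lags, implosive_lags, center="median")
print(f"Brown-Forsythe: W = {w_stat:.2f}, p = {p_value:.4f}")

# Coefficient of variation: a scale-free variability index, useful when the
# groups being compared have different mean lags.
def cv(x):
    return np.std(x, ddof=1) / np.mean(x)

print(f"CV (ejectives)  = {cv(ejective_lags):.3f}")
print(f"CV (implosives) = {cv(implosive_lags):.3f}")
```

The CV normalizes the standard deviation by the mean, so groups whose mean lags differ (as ejectives and implosives do) can still be compared on relative variability.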
3.3. Results
3.3.1. Prosodic Lengthening and Strengthening Effects
In this section, gestural patterning for Hausa pulmonic and non-pulmonic consonants is investigated
across phrase-initial and -internal prosodic conditions. Both oral and vertical larynx (LX) gestural
duration and magnitude are examined for each prosodic condition to test whether individual gestural
duration lengthens and/or gestural magnitude increases at phrase boundaries compared to within-phrase
tokens. Two comparison groups are examined separately: Voiceless ejectives versus their voiceless
8
The Kenward-Roger method, with a more conservative degrees of freedom was used for linear mixed effects
regression models, instead of the default method (i.e., Sattethwaite’s method) implemented in the R package
lmerTest (Kuznetsova et al. 2017).
9
Performing analysis of variance across deviations from the median instead of the mean is reportedly more robust to
departures from normality (Anderson 2006). For example, the Brown-Forsythe test is an extended version of the
Levene’s test in that the analysis of variance is carried out on the absolute deviations about the median (Brown &
Forsythe 1974). The modified Levene’s tests used in the current analysis is equivalent to the Brown-Forsythe tests
for equality of variances.
48
pulmonic counterparts and voiced implosives versus their voiced pulmonic counterparts. Thus, a two-by-
two (consonant type [non-pulmonic vs. pulmonic] & boundary condition [boundary vs. no-boundary])
factorial design is used for statistical analyses, and in considering the hypotheses regarding the relative
magnitude of prosodic perturbation, particular attention will be paid to the interaction effect.
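A minimal sketch of this two-by-two analysis in Python's statsmodels is given below, with simulated data. All column names and effect sizes are hypothetical; note also that the dissertation's models were fit in R (lmerTest, with Kenward-Roger degrees of freedom) and used crossed random effects for subject and item, whereas statsmodels' MixedLM supports a single grouping factor, so only a by-subject random intercept is shown.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 160
df = pd.DataFrame({
    "ctype": rng.choice(["nonpulmonic", "pulmonic"], n),
    "boundary": rng.choice(["boundary", "no_boundary"], n),
    "subject": rng.choice(["S1", "S2", "S3"], n),
})
# Simulated oral duration (s): a 30 ms boundary-lengthening effect plus noise.
df["oral_duration"] = (0.15
                       + 0.03 * (df["boundary"] == "boundary")
                       + rng.normal(0.0, 0.02, n))

# Fixed effects: consonant type, boundary, and (crucially) their interaction;
# random intercept by subject.
model = smf.mixedlm("oral_duration ~ ctype * boundary", df, groups=df["subject"])
fit = model.fit()
print(fit.summary())
```

With treatment coding, the interaction term (`ctype[T.pulmonic]:boundary[T.no_boundary]`) directly tests whether the boundary effect differs across consonant types, which is the comparison Hypothesis A turns on.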
3.3.1.1. Gestural duration
It is expected that individual gestural duration is lengthened at boundary conditions compared to no-
boundary conditions due to prosodic lengthening effects. For the oral gestural duration for voiceless
consonants, there is no interaction effect between consonant type and boundary (F(1,228.69) = 0.04, p
= .842), and no main effect of consonant type (F(1,228.69) = 2.712, p = .101). However, as seen in
Figure 3.1 (left), a significant main effect of boundary is exhibited on oral duration (F(1,75.28) = 63.653,
p < .001*). Tukey’s post-hoc comparisons show that prosodic lengthening effects on oral duration are
exhibited both in ejectives (t(178) = 5.489, p < .001*) and in voiceless pulmonic consonants (t(178) = 5.773, p
< .001*), with longer oral duration at phrase boundaries versus no-boundary. For voiced implosives and
voiced stops, no interaction is found between consonant type and boundary (F(1,152.631) = 1.11, p
= .294), but there are main effects of consonant type (F(1,152.631) = 5.917, p = .016*) and of boundary
(F(1,73.007) = 69.668, p < .001*) (Figure 3.1: right). Voiced pulmonic Cs have significantly longer oral
duration than voiced implosives, and they both have longer oral duration at phrase boundaries compared
to phrase-internal conditions (all at p < .001).
Figure 3.1. Oral duration (left) and LX duration (right) at phrase-internal and -initial positions
On the other hand, a prosodic lengthening effect is not found on vertical larynx (LX) duration for
voiceless consonants (Figure 3.1: right, blue and black boxplots), as there is no main effect of boundary
(F(1,74.325) = .033, p = .856). Nor is there an interaction effect between consonant type and boundary
(F(1,219.408) = 1.665, p = .198) or a main effect of consonant type (F(1,219.464) = .335, p = .564). For
voiced consonants, there is an interaction effect between consonant type and boundary (F(1,135.393) =
6.399, p = .013*), indicating that the effect of consonant type depends on the boundary condition or vice
versa. Pairwise comparisons show that LX duration is lengthened only for voiced pulmonic stops at
boundary versus no-boundary conditions (t(133) = 3.208, p < .01*).
Contradicting our hypothesis that the individual gestural components of non-pulmonic consonants
are less susceptible to prosodic changes than those in pulmonic consonants (Hypothesis A), prosodic
lengthening effects are shown in both pulmonic and non-pulmonic consonants’ oral constriction duration.
On the other hand, vertical larynx duration remains the same across the boundary and no-boundary
prosodic conditions for non-pulmonic consonants. That said, among the pulmonic consonants (voiceless and
voiced Cs), only the voiced consonants exhibit prosodic lengthening of LX duration. Therefore, greater
prosodic stability of non-pulmonic consonants compared to pulmonic consonants is partially shown in the
measure of LX duration. Additionally, the fact that oral and LX gestures behave differently under
prosodic modulations is a novel finding that may be attributed to their distinct role in speech motor
planning; this will be described further in the discussion.
3.3.1.2. Gestural magnitude
At higher prosodic conditions, gestures may reach a higher maximum point (e.g., greater degree of
constriction or linguapalatal contact) due to longer activation intervals, which provide more time for
the target to be attained before the articulators are called on for another upcoming control
structure (Saltzman et al. 2000). In our stimuli, gestural magnitude is thus predicted to increase when the
target consonant is at a phrase boundary as compared to when it is phrase-medial. However, Hypothesis A
predicts that non-pulmonic consonants will show lesser boundary effects compared to pulmonic
consonants due to the overarching superordinate goal that, we postulate, constrains prosodic modulation.
Figure 3.2 presents results for oral constriction magnitude and LX displacement. For the oral
magnitude of voiceless consonants, no effects are exhibited for consonant type, boundary
condition, or their interaction (interaction: F(1,228.567) = 0.01, p = .922; consonant type:
F(1,228.567) = 3.239, p = .073; phrase boundary: F(1,75.248) = 3.338, p = .072). For voiced consonants,
there is no interaction between consonant type and phrase boundary (F(1,139.64) = .026, p = .873) and no
main effect of boundary (F(1,75.91) = 0.015, p = .903), but there is a main effect of consonant type
(F(1,139.64) = 4.223, p = .042*), with voiced stops having slightly greater mean oral constriction
magnitude than voiced implosives.
Figure 3.2. Oral magnitude (left) and LX magnitude (right) at phrase-internal and -initial positions
Turning to the magnitude of the vertical larynx movement, LX vertical displacement (Figure 3.2:
right) for voiceless consonants is positive (involves larynx raising) and LX displacement for voiced
consonants is negative (involves larynx lowering); thus voiceless and voiced consonants are tested
separately. For voiceless consonants, there is no interaction effect between consonant type and phrase
boundary on LX displacement (F(1,208.497) = 0.028, p = .868). The main effects of consonant
(F(1,208.466) = 8.106, p < .01*) and boundary (F(1,76.069) = 9.828, p < .01*) are both significant, with
LX displacement greater for ejectives than for voiceless pulmonic consonants, and the larynx raising more at phrase
boundaries than without boundaries. For voiced stops, there is again no interaction effect between
consonant type and boundary on LX displacement (F(1,131.861) = 2.293, p = .132), nor a main effect of
consonant type (F(1,131.92) = 2.543, p = .113). However, the main effect of phrase boundary is
significant (F(1,74.816) = 13.819, p < .001*), with the larynx lowering more at phrase boundaries compared
to phrase-internal conditions. Note that for voiceless ejectives and pulmonics, LX displacement increases,
that is, LX raises more in the phrase boundary condition, whereas for voiced implosives and pulmonics,
LX lowers more in this condition.
In Figure 3.3, LX displacement magnitudes under prosodic modulations are compared for each
segment that involves larynx lowering. Voiced implosives and voiced pulmonic consonants are not
differentiated by LX lowering magnitude (see previous chapter Ch 2.3.1.2); however, when the prosodic
conditions are examined, while no phrase effect is seen in implosives, greater LX lowering displacement
is observed in the boundary condition for voiced pulmonic consonants (/ɓ/: F(1,38) = 2.94, p = .095; /ɗ/:
F(1,37.045) = 1.762, p = .193; /b/: F(1,30.178) = 5.613, p = .024*; /d/: F(1,34.304) = 15.555, p < .001*).
This supports our prediction that pulmonic consonants are more sensitive to prosodic variations compared
to non-pulmonic consonants for LX gestural magnitude.
Figure 3.3. LX displacement at phrase-internal and -initial positions for implosives and voiced Cs
Overall, prosodic effects on the individual gestural actions of these multi-gesture complexes are
differentially realized in oral and vertical larynx gestures. At a prosodic boundary, oral gesture’s duration
is lengthened while LX (vertical larynx) duration remains relatively the same, and LX magnitude is
increased while oral constriction magnitude does not change. The current finding indicates that oral
gesture’s temporal characteristics and LX gesture’s spatial characteristics are subject to prosodic planning
and contextual variations, contrary to Byrd and Saltzman’s expectations (2003) that all active tract
variables will be similarly impacted by a co-active π-gesture. Additionally, the hypothesis on non-
pulmonic and pulmonic consonants’ degree of stability under prosodic modulations is partially supported
by the spatiotemporal patterns of the LX gestures: voiced pulmonic stops show greater prosodic effects,
and are thus more variable across different prosodic conditions, compared to voiced implosives. We turn
next to the intergestural patterning, the subject of Hypotheses B and C.
3.3.2. Prosodic Timing Variability of Ejectives and Implosives
To this point, prosodic effects on individual gestural actions have been investigated. In this section and
the next one, prosodic variability in non-pulmonic consonants is examined for two intergestural lags:
first, the onset-to-onset lag (i.e., onset lags), which is the interval from oral constriction movement onset
to LX raising/lowering onset, and second, the onset-to-target lag measuring the interval from LX
raising/lowering onset to the target achievement of the oral closure. It is predicted that temporal lags are
more variable in implosives than in ejectives (Hypothesis B), and that the lags are more variable in voiced
plosives than in voiced implosives (Hypothesis C). These predicted patterns of timing variability are
observed in the token-to-token variability (see Ch 2.3.2 on Larynx-Oral Coordination), and here, we
investigate how flexible or rigid the intergestural timing is in the face of prosodic perturbation at a phrase
boundary versus phrase-internally (no-boundary). With subjects and items as random effects, consonant
type (ejectives vs. implosives) and boundary condition (boundary vs. no-boundary) are included as fixed
effects in the model. When the interaction effect is not present, the main effects are reported. When there
is an interaction effect, Tukey’s post-hoc pairwise comparison tests are conducted to test differences
between each level of a factor within each level of the other factor in the interaction.
3.3.2.1. Onset lags
The results on intergestural timing indicate that onset lags have an interaction effect between consonant
type (ejectives vs. implosives) and boundary (boundary vs. no-boundary) (F(1,162.671) = 10.267, p
< .01*), indicating that the effect of consonant type depends on the effect of boundary conditions. Post-
hoc pairwise comparison analyses show that onset lags get longer in the presence of boundaries only for
ejectives; this lag difference between boundary versus no-boundary conditions is not found for
implosives (ejectives: t(126) = 3.366, p < .01*; implosives: t(166) = 0.712, p = .892) (Figure 3.4). This is
in contrast with our prediction that ejectives are more stable across prosodic variations than implosives.
However, when the temporal variability (token-to-token variability) for ejectives and implosives is
compared within each prosodic condition, Levene’s tests for homogeneity of variance show that there is
no difference in the onset lag variability between ejectives and implosives phrase-internally (ejectives vs.
implosives: F(1,99) = 1.327, p = .252), whereas phrase-initial ejectives’ onset lags show less variability
compared to onset lags for implosives phrase-initially (ejectives vs. implosives: F(1,103) = 7.968, p
< .01*). Table 3.1 summarizes tests for measuring variability at phrase-initial conditions, with both
statistics indicating that ejectives have significantly less variability in onset lags than implosives.
Table 3.1. Tests on timing variability in phrase-initial onset lags between ejectives and implosives
Test name Test measure Test statistic p-value
Levene’s test Homogeneity of variance F(1,103) = 7.968 .006*
M-SLRT Coefficient of variation 10.424 .001*
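The Levene's tests used for these variability comparisons can be sketched computationally. Below is a minimal pure-Python illustration of the test statistic (median centering, i.e., the Brown–Forsythe variant, is assumed here; the centering used in the actual analysis is not specified in the text), intended only to show what the test measures:

```python
# Minimal sketch of a Levene-type test statistic for homogeneity of variance.
# Median centering (the Brown-Forsythe variant) is an assumption; the
# original analysis may have used mean centering instead.

def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2.0

def levene_W(groups):
    """W statistic: a one-way ANOVA on absolute deviations from group centers."""
    # Absolute deviations of each observation from its group's median
    z = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    zbar_i = [sum(zi) / len(zi) for zi in z]          # per-group mean deviation
    zbar = sum(sum(zi) for zi in z) / n_total         # grand mean deviation
    between = sum(len(zi) * (m - zbar) ** 2 for zi, m in zip(z, zbar_i))
    within = sum((zij - m) ** 2 for zi, m in zip(z, zbar_i) for zij in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical onset lags (ms), not the dissertation's data:
ejective_lags = [20, 25, 30, 35, 40]
implosive_lags = [0, 20, 40, 60, 80]
print(round(levene_W([ejective_lags, implosive_lags]), 3))  # prints 5.445
```

Larger W values indicate more unequal spread between groups; the p-value would be obtained from an F(k−1, N−k) distribution.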
Figure 3.4. Onset lags for ejectives and implosives
at phrase-internal (no boundary) & -initial (boundary) positions
3.3.2.2. Vertical larynx onset to oral closure target lags
The second intergestural lag measured for analysis is the lag from LX onset to oral closure target
achievement. This LX onset-to-oral target lag is meaningful in that the temporal constraints from the
aerodynamic goal for ejectives and implosives—requiring vertical larynx movement once the oral closure
is formed so as to raise oral air pressure—specifically relate to the initiation of the vertical larynx
movement relative to the closure of the oral gesture.
Figure 3.5 illustrates the LX onset to oral target lags for ejectives and implosives. Linear mixed
effects models reveal that there is a significant interaction effect between consonant type and phrase
boundary (F(1,169.887) = 11.615, p < .001*). Post-hoc pairwise comparisons do not show a prosodic
boundary effect on the onset-to-target lag in either ejectives or implosives, with a significant difference found
only between the ejective and the implosive consonants positioned phrase-internally (phrase-internal
ejectives vs. phrase-internal implosives: t(169) = 3.425, p = .004*). The temporal lags for boundary vs.
no-boundary conditions are not different, indicating that this onset-to-target timing is rather stable across
prosodic variations. Note also that the onset-to-target lags tend to become shorter at higher prosodic
conditions for ejectives but longer for implosives. The onset-to-target lags move closer to zero for
ejectives at phrase-initial positions (phrase-internal ejectives: 63.24 ms ± 41.36 ms; phrase-initial
ejectives: 31.64 ms ± 31.9 ms), while the lags are longer at the boundary for implosives (phrase-internal
implosives: 5.28 ms ± 41.73 ms; phrase-initial implosives: 52.61 ms ± 54.75 ms). That said, the lack of a
significant boundary lengthening effect on the onset-to-target lag measures (i.e., no decrease in gestural
overlap) is interesting in that oral closure gestures do undergo boundary lengthening. This finding implies
that the LX onset to oral target coordination is controlled so as to remain stable across different boundary
conditions (no boundary vs. boundary).
Moreover, the measures of temporal variability for each prosodic condition as assessed in the
Levene's tests reveal that no difference in variability is observed between ejectives and implosives phrase-
internally (F(1,99) = 0.367, p = .546), but at phrase-initial positions, ejectives have less timing variability
than implosives (F(1,103) = 13.407, p < .001*). Overall timing variability across prosodic conditions also
indicates that ejectives show less variability in onset-to-target lags than implosives (Levene’s test for
ejectives vs. implosives: F(1,204) = 8.631, p < .01*). The statistical test results for onset-to-target lag
variability are shown in Table 3.2.
Table 3.2. Tests on variability in phrase-initial onset-to-target lags between ejectives and implosives
Test name Test measure Test statistic p-value
Levene’s test Homogeneity of variance F(1,204) = 8.631 .004*
M-SLRT Coefficient of variation 4.802 .028*
Figure 3.5. Onset-to-target lags for ejectives and implosives at phrase-internal & -initial positions
The observations from the two intergestural lags—onset lags and LX onset-to-oral target lags—
across prosodic conditions provide limited support for Hypothesis B, that ejectives have more stable
temporal lags than implosives. At phrase-initial positions, ejectives do show less timing variability than
implosives, but the difference in lag variability is not exhibited phrase-internally. However, this onset-to-
target temporal lag remains stable across prosodic contexts for both ejectives and implosives.
3.3.3. Prosodic Timing Variability of Implosives and Voiced Plosives
Lastly, timing variability under prosodic perturbation is investigated comparing voiced implosives and
voiced pulmonic stops. In the previous chapter, voiced stops are found to have distinct temporal
organizations and greater intergestural timing variability compared to voiced implosives. Based on these
findings and the postulated temporal constraints for implosives (i.e., oral and vertical larynx gestures
coordinatively achieve an aerodynamic superordinate goal), it is predicted that voiced stops will show
greater timing variability compared to voiced implosives (Hypothesis C). Variability in temporal lags is
tested by i) comparing changes in lag patterns across different prosodic conditions and ii) investigating
how intergestural timing changes as a function of individual gestural duration. The linear mixed effects
models conducted on temporal lags include subjects and items as random effects, and consonant type
(implosives vs. voiced plosives) and boundary condition (boundary vs. no-boundary) as fixed effects.
3.3.3.1. Onset lags
Figure 3.6 shows that implosives have positive mean onset lags; i.e., larynx lowering starts after oral
closure movement onset (phrase-internal: 78.5 ms ± 40.68 ms; phrase-initial: 58.04 ms ± 54.62 ms). On
the other hand, voiced consonants have near-zero to slightly negative mean onset lags, which indicates
that oral constriction and larynx lowering begin simultaneously for the voiced pulmonic consonants
(phrase-internal: -7.88 ms ± 68.49 ms; phrase-initial: -36.02 ms ± 68.98 ms). There is no interaction effect
between consonant type and boundary (F(1,124.534) = 0.035, p = .852), and the subsequent tests show
that there is a main effect of consonant type on onset lags (implosives vs. voiced stops: F(1,124.588) =
25.13, p < .001*), with implosives having significantly longer onset lags than voiced stops. However, onset
lags are not affected by differences in the boundary condition (boundary vs. no boundary: F(1,75.887) =
1.543, p = .218). That is, onset lags for both implosives and voiced stops are stable, or at least equally
unstable, across boundary conditions.
Figure 3.6. Onset lags for implosives and voiced Cs at phrase-internal & -initial positions
Tables 3.3 and 3.4 present different measures of timing variability between implosives and voiced
stops for onset lags, phrase-internal lag variability and -initial lag variability, respectively. Differences in
lag variability are not observed in the Levene's tests, but under M-SLRT testing (which tests the equality of
coefficients of variation, taking into account differences in means), onset lags show significantly greater
variability in the production of voiced pulmonic stops than in voiced implosives.
Table 3.3. Tests on timing variability in phrase-internal onset lags for implosives vs. voiced Cs
Test name Test measure Test statistic p-value
Levene’s test Homogeneity of variance F(1,74) = 1.957 .166
M-SLRT Coefficient of variation 15.414 < .001*
Table 3.4. Tests on timing variability in phrase-initial onset lags for implosives vs. voiced Cs
Test name Test measure Test statistic p-value
Levene’s test Homogeneity of variance F(1,81) = 3.511 .065
M-SLRT Coefficient of variation 9.292 .002*
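The M-SLRT results above compare coefficients of variation rather than raw variances. The full modified signed-likelihood ratio test is more involved; the sketch below shows only the descriptive quantity being compared, the per-group CV, using hypothetical lag values (not the dissertation's data):

```python
# Sketch: per-group coefficient of variation (CV = sample SD / mean),
# the quantity whose equality the M-SLRT evaluates. This is only the
# descriptive statistic, not the likelihood-ratio test itself.

def coefficient_of_variation(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)   # sample variance
    return (var ** 0.5) / mean

# Hypothetical onset lags (ms): identical spread but different means, so
# the group with the smaller mean has the larger relative variability.
implosive_lags = [60, 70, 80, 90, 100]   # mean 80
plosive_lags = [20, 30, 40, 50, 60]      # mean 40
print(round(coefficient_of_variation(implosive_lags), 3))  # prints 0.198
print(round(coefficient_of_variation(plosive_lags), 3))    # prints 0.395
```

This is why a CV-based test can detect a variability difference (as in Tables 3.3 and 3.4) even when a variance-based test like Levene's does not: groups with equal variances but different means differ in relative variability.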
3.3.3.2. Larynx lowering onset to oral closure target lags
Turning next to the temporal lag from larynx lowering onset to oral closure target, implosives have
mean onset-to-target lags that are near-zero to slightly positive, indicating that larynx lowering begins
right before or at the oral closure target (phrase-internal: 5.28 ms ± 41.73 ms; phrase-initial: 52.61 ms ±
54.75 ms). For voiced plosives, mean onset-to-target lags are longer and clearly positive; that is, larynx
lowering starts a fair bit before oral constriction is formed (phrase-internal: 98.81 ms ± 64.82 ms; phrase-
initial: 164.02 ms ± 61.75 ms) (Figure 3.7). Linear mixed effects tests show that there is no interaction
effect between consonant type and boundary (F(1,117.09) = 0.299, p = .586), and the subsequent tests
show that there are main effects of both consonant type (implosives vs. voiced stops: F(1,117.124) =
39.009, p < .001*) and boundary condition (boundary vs. no boundary: F(1,76.704) = 7.815, p < .01*).
Figure 3.7. Onset-to-target lags for implosives and voiced Cs at phrase-internal & -initial positions
In contrast to the prediction that voiced implosives are less variable than voiced plosives across
prosodic modulations, changes in temporal lag patterns are similar between the two categories (i.e., no
change in onset lags & increase in onset-to-target lags in the phrase boundary condition). Taken together,
intergestural timing for implosives and voiced stops behaves more or less similarly under prosodic
modulations. In the next section, a different measure of timing variability is used to investigate relative
stability/variability in timing; covariance analyses are conducted between intergestural timing and
gestural duration.
3.3.3.3. Correlations between timing and duration
In this section, correlations between individual gestural duration and intergestural timing are examined to
test whether intergestural timing is relatively stable in the presence of variations in individual gestural
duration. Again, it is expected that implosives show stable intergestural timing across variations in
gestural duration, whereas voiced consonants’ timing is affected by changes in gestural duration.
Figure 3.8. Correlation graphs for intergestural timing and LX duration
(left: onset lags × LX duration; right: LX onset-to-oral target lags × LX duration)
The correlation results for onset lags show that there is a significant negative correlation between
the onset lag and LX duration for voiced stops (/b/: R = -0.46; /d/: R = -0.62), while onset lags remain
stable across variations in vertical larynx duration for voiced implosives (Figure 3.8: left). Similarly, LX
onset-to-oral target lags and LX duration show significant positive correlations for voiced stops (/b/: R =
0.54; /d/: R = 0.53), whereas no correlation (positive or negative) is found for onset-to-target timing
versus LX duration for implosives (Figure 3.8: right). In other words, for voiced stops, as LX duration
lengthens larynx lowering begins earlier relative to the oral closure onset (negative onset lags over
duration), and oral closure target achievement is delayed relative to the larynx lowering onset (positive
onset-to-target lags over duration). However, neither intergestural lag changes much in voiced implosive
stops regardless of the duration of the vertical larynx movement. These different degrees of covariance
relations provide evidence that implosives and voiced pulmonic stops have different coordination
constraints such that oral closure target is closely synchronized with the onset of the larynx lowering for
implosives, whereas oral target may be coordinated to a specific timepoint within larynx movement
interval (e.g., larynx lowering target) for voiced stops; thus these temporal lags are more affected by
changes in LX duration. Findings on these two intergestural timing measures (onset lags and LX onset-to-oral target
lags) suggest that implosives’ vertical larynx onset is timed with the coordinated oral gesture, and this
timing relation is independent of the duration of vertical larynx actions.
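The covariance analyses above rest on Pearson correlations between each intergestural lag and LX duration. A minimal sketch with made-up values (not the dissertation's data) illustrates the computation and the sign pattern described for voiced stops:

```python
# Sketch: Pearson correlation between an intergestural lag and LX duration.
# Data values are hypothetical; only the computation mirrors the analysis.

def pearson_r(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# For a voiced stop, longer LX duration went with earlier (more negative)
# onset lags, i.e., a negative correlation like the R = -0.46 to -0.62 range:
lx_duration = [100, 120, 140, 160, 180]
onset_lag = [10, -5, -15, -30, -45]
print(round(pearson_r(lx_duration, onset_lag), 2))
```

A flat, near-zero correlation under the same computation would correspond to the implosive pattern, where the lag does not covary with LX duration.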
3.4. Discussion
This chapter investigates the stability and variability in multi-gesture complexes, mainly comparing
between non-pulmonic and pulmonic consonants. First, prosodic effects on movement duration and
magnitude for individual oral and vertical larynx gestures are tested. The results are largely consistent
with implosives’ larynx lowering magnitude being less affected by prosodic context compared to that of
voiced stops. Variability in intergestural timing between ejectives and implosives suggests that the
coordination of vertical larynx and oral gestures is flexible as a function of prosodic conditions. Although
we expected tight coordination between vertical larynx and oral gestures for non-pulmonic consonants,
especially for ejectives, prosodic variation still acts as a force modulating timing relations. That
said, at phrase boundaries ejectives do seem to have more stable intergestural timing than
implosives. Such a distinction can be modeled by phase windows or some similar approach allowing
variability in intergestural phasing relations, representing ejectives with a tighter relationship between
larynx raising and oral gestures than implosives’ larynx lowering and oral gestures (Byrd 1996a,
Saltzman & Byrd 1999, 2000). Referring back to the coupling schema in (1), the presence of the coupling
relations between oral release and larynx raising gestures and the lack thereof between oral release and
larynx lowering gestures do predict that intergestural timing is more stable in ejectives compared to
implosives.
Byrd (1996) discusses another non-pulmonic consonant—click—as a quintessential example of
narrow phase windows allowing very little intergestural timing variability between co-occurring anterior
and posterior constrictions: “Certain aspects of gestural structure might not be recoverable unless
coordinated in a specific way. For example, in certain cases a precise temporal coordination may be
necessary to yield aerodynamic properties that typify a sound, such as ingressive airflow in a click” (p.
160). She predicts that the coordination of larynx raising/lowering gestures with oral gestures in
ejectives/implosives should be more stable than the coordination of those gestures with adjacent vowels.
The current study instead compares a similar type of coordination (i.e., larynx lowering and associated
oral gestures) in implosives and voiced plosives.
Although timing variability patterns across prosodic variations are similar between implosives
and voiced plosives, the findings for relative timing over durational changes crucially show that
the intergestural timing in question is not affected by gestural duration in implosives, in contrast to the
systematic variation in intergestural timing over changes in gestural duration found in voiced stops. Similar
results are found in Shaw et al. (2019) showing that the timing in complex segments is not affected by
gestural duration but that the timing in consonant sequences varies with changes in gestural duration.
These results illuminate how the covariance relations between timing and duration can be used to
understand the rigidity or flexibility in timing for multi-gestural phonological units. When the timing of
the two (or more) articulatory gestures is critical, other gestural actions (e.g., gestural duration) may
compensate to preserve the relative timing. Such stable relative timing may indicate that this specific
temporal coordination must necessarily be represented phonologically, for example, in the coupling
structure. Implosives’ stability in timing across durational variations suggests that voiced implosives and
voiced plosives are differentiated not only by patterns of temporal organizations (i.e., differences in
temporal lag patterns), but also by the cohesion of intergestural timing. In other words, in addition to how
the gestures are differently patterned over time, the current chapter addresses how the differences between
voiced implosives and voiced stops are generated via different crucial coordination/synchronization
relations.
Let us turn next to the distinct patterns found in oral and vertical larynx gestures under prosodic
modulations. For oral gestures, prosodic effects are generally shown in the temporal domain but not in the
spatial domain. For vertical larynx gestures, on the other hand, prosodic effects are mainly exhibited in
the spatial, but not in the temporal, domain for the multi-gesture complexes. These differential behaviors
under prosodic manipulations reflect which articulatory maneuvers are controlled in prosodic planning. In
addition to well-known prosodic lengthening effects on oral gestures and intergestural timing, vertical
larynx magnitudes may be impacted by prosodic organizations and boundary structures. For example, a
recent study on vertical larynx movement in Korean (Oh & Lee 2020), where this action plays a role in
pitch management rather than aerodynamic management, suggests a potential role of vertical larynx
spatial actions in realizing accentual phrase boundaries and prominence.
Previous articulatory models that implement a vertical laryngeal gesture mostly represent the
vertical larynx gesture statically in terms of larynx height (Browman & Goldstein 1989, Goldstein 1980,
Maeda 1990, Ménard et al. 2007, Mermelstein 1973). The current investigation of Hausa glottalic
consonants and pulmonic consonants highlights the important role of the vertical larynx gesture in
creating phonological contrasts, both in larynx raising magnitude (ejectives vs. voiceless pulmonic
consonants) and in intergestural timing (implosives vs. voiced pulmonic counterparts). Based on the
spatiotemporal characteristics of multi-gesture complexes, timing (phasing relations) of the gestures, in
addition to duration (activation interval) and magnitude (target), must be encoded in the representation of
these complex gestural molecules. Additionally, we propose that the stability of the timing relations
within the multi-gesture complex must also be incorporated into phonological representation. A coupling
graph provides a mechanism for doing so. The coupling architecture and how this relates to and allows
the prediction of timing stability will be explored later in Chapter 6.
One of the limitations in the current experiment is that the comparisons between ejectives and
implosives are not based on minimal pairs, because Hausa ejectives and implosives are not produced
with identical places of articulation. To ensure direct comparisons between ejectives and implosives,
future articulatory studies on glottalic consonants should explore languages such as Zulu (includes
bilabial and velar ejectives and implosives) and Yucatec Maya (includes bilabial ejectives and
implosives). Furthermore, adding another layer of prosodic conditions (i.e., utterance-initial [post-pausal]
or AP to supplement the IP condition) could reveal potentially gradient directional behaviors (vertical
larynx magnitude and/or more or less gestural overlap) for different types of glottalic consonants as well
as pulmonic ones. Finally, complex gestural molecules can be examined syntagmatically to assess
whether, for example, the stability of vertical larynx-oral coordination patterns in line with velum-oral
coordination, both within and across languages; manipulation of syllable structure or perturbation studies
of coordination within multi-gesture complexes could be deployed to this end.
3.5. Conclusion
This chapter addresses how prosodic stability/variability can be differently encoded in non-pulmonic
ejectives and implosives compared to pulmonic consonants in Hausa. The results from rtMRI speech
production data suggest that oral and vertical larynx gestures are differently affected by prosodic
modulations. Furthermore, voiced implosives and voiced plosives contrast in intergestural timing and its
variability, rather than in terms of the spatiotemporal properties of individual gestural actions alone.
Therefore, information encoding timing relations may be crucial to differentiate non-pulmonic consonants
from pulmonic consonants. Overall, the current study deepens our understanding of the role of vertical
larynx actions and their coordination with oral gestures in consonants and helps advance modeling of the
articulatory representation of complex multi-gestural molecules.
4. Velum-Oral Complexes in Korean Singleton and Juncture Geminate Nasals
4.1. Introduction
In this chapter, we further investigate the internal coordination of multi-gesture complexes, focusing on
nasal consonants in underlying and derived contexts. Nasals are not traditionally considered to be
complex segments, but they are in fact composed of multiple gestures (i.e., the oral and the velum
gestures) that may be coordinated in-phase or anti-phase for different languages and/or contexts. Thus
they can be considered gestural super-structures called “gestural molecules” (Browman & Goldstein
1989, Byrd & Saltzman 2003, Goldstein & Fowler 2003, Goldstein et al. 2006). “The conception of
gestural molecules serve[s] to recognize the phonological vitality of groups of gestures that recur in many
words, are systematically patterned, and are temporally cohesive.” (Byrd & Krivokapić 2021:34).
Segments have been postulated to be a canonical gestural molecule (Saltzman & Munhall 1989, Byrd
1996a), allowing them to be readily combinable with other structures thanks to their cohesiveness (see
also Goldstein & Fowler 2003).
As such, if this coordination of an oral and a velum gesture is synergistically deployed to form a
nasal consonant in a language, it is reasonable to assume that this internal coordination is more stably
structured compared to gestural phasings across segments. In this chapter, the internal gestural
organization and the coordination structure of various nasal sequences are examined. The goal of this
chapter is to understand articulatory organizations of nasals in varying structures, including onsets, codas,
and derived geminates of different types. In the following chapter, the issue of stability in the velum-
oral coordination in nasal structures will be explored.
In addition to singleton onset and coda nasals, the articulatory characteristics of nasal geminates
are investigated in order to observe the gestural behavior for long and short consonants. From prior
acoustic and articulatory studies, and common in general to theories of phonological representation of
geminates, we expect the articulatory gestures associated with a geminate to be longer than those of a like
singleton. That said, most experimental studies of the durational characteristics of geminates have been on
their acoustic duration, and the few articulatory studies that exist have examined oral constriction
duration. Little is known about the durational behavior of other articulatory subsystems of these long
segments such as velum behavior, as well as the intergestural timing between oral and velum gestures, in
nasal geminates. In this articulatory study, we investigate derived nasal geminates in various prosodic
conditions. By examining how the velum acts for nasal geminates in coordination with oral gestures, this
chapter aims to understand the representation of geminates as complex gestural configurations. We will
turn first to a consideration of the phonological representation of underlying and derived geminates.
4.1.1. The Phonological Representation of Underlying and Derived Geminates
In autosegmental phonology, geminates are distinguished from singletons by the former having an
additional unit on the timing tier (Hayes 1986, McCarthy 1986, Lahiri & Hankamer 1988, Ridouane
2010).^10
A singleton is linked to a single timing slot (1a), whereas a geminate is linked to two timing units
(1b-d). Depending on how these timing units are associated to the melodic tier or a feature bundle,
geminates are classified as lexical (‘true’/‘underlying’) geminates and concatenated or assimilated
(‘fake’/‘derived’) geminates. In the representation of underlying lexical geminates, the two elements in
the timing tier are linked to a single (segmental) feature matrix on the melodic tier, as shown
schematically in (1b). Concatenated geminates are geminates created by the concatenation of two
phonologically identical elements across a morpheme (and possibly prosodic) boundary, and each of the
two timing slots is linked to its own melodic unit, as in (1c). Lastly, assimilated geminates are also
represented with two timing units each associated to a separate melodic unit, but the non-identical
elements undergo total assimilation such that they surface as a geminate, as in (1d).
10 Aside from prosodic length analysis of geminates, weight representation of geminates posits that geminate
consonants are underlyingly moraic, whereas contrasting singletons are not (Hayes 1989, Topintzi 2008, Davis
2011).
(1)
Note. Examples are from Italian (a-b) and from Korean (c-d).
While singletons and geminates are clearly distinguished representationally by the number of units on the
timing tier, any phonological or consequent phonetic distinction among different types of geminates has
been debated. It has been argued that all geminates are alike in the surface representation by ‘tier
conflation’—i.e., when there are adjacent identical elements in the timing tier, they are fused in the
melodic tier (Hayes 1986, McCarthy 1986). That is, geminates that underlyingly have different
phonological representations in their association between the timing units and melodic tiers become
identical in the surface representation via tier conflation, all resulting in two timing units associated with a
single melodic unit.
Previous studies have bolstered this supposition, showing that geminates of different origins
(lexical, concatenated, and assimilated) may have different phonological representations but are not
different from each other acoustically, supporting the view that they have an identical surface representation. Lahiri
and Hankamer (1988) made a distinction between concatenated and assimilated geminates, the latter
involving phonological neutralization (which may or may not lead to phonetic neutralization), but
assimilated geminates were acoustically indistinguishable from other types of geminates in Bengali. They
concluded that an identical timing representation must be assigned to all kinds of geminates since their
acoustic properties (VOT, closure duration, preceding vowel duration) did not differ from each other.
Ridouane (2010) also found that these three types of geminates in Tashlhiyt Berber showed the same
temporal values in their closure duration.
Others have argued that lexical and assimilated geminates pattern together and behave differently
from concatenated ones (Kenstowicz 1982, 1994, Kirchner 2000, Ladd & Scobbie 2003, Ridouane 2010).
The former group of geminates are unaltered by spirantization cross-linguistically, whereas the latter can
be altered (e.g., Tamil, Tigrinya, Tiberian Hebrew; Kenstowicz 1982, Kirchner 2000). In addition,
although closure duration of consonants was the same in different types of geminates in Tashlhiyt Berber,
duration of the preceding vowel was shorter in assimilated geminates, similar to that of lexical ones, than
in concatenated geminates (Ridouane 2010). Ridouane (2010) groups lexical and assimilated geminates as
‘true’ geminates and concatenated ones as ‘fake’ geminates. This classification of geminates denies that
the melodic units in the representation of concatenated geminates are conflated. Instead, lexical and
assimilated geminates have two timing slots associated with one melodic unit, whereas concatenated
geminates have two timing units each linked to its own melodic unit, without conflation.
Although tier-based phonological representations make different predictions for geminates of
different origins, a single type of geminate can also pattern differently between languages (Dunn 1993,
Kraehenmann 2011, Smith 1995) and in varying speech rate within a language (Arvaniti 1999, Hirata &
Whiton 2005, Mitterer 2018). Kraehenmann and Lahiri (2008) find that geminate duration, and its ratio
to singleton duration, varies largely as a function of prosodic boundaries. For example, geminates are
longer word-medially than phrase-initially, and longest utterance-initially. Together, these studies show that
the timing patterns of overlapping gestures for geminates are governed by various factors such as the type of
geminate, speech rate, and prosodic boundary.
In sum, geminates with an identical phonological structure can vary in their durational patterns
due to language-specific phasing relations and/or to prosodic modulation of speech materials. Such
variability in surface representations is available in the Articulatory Phonology’s conception of timing
relations among the gestures coupled in larger multi-gesture structures. In Articulatory Phonology,
gestures are primitive and abstract units of phonological representation that control coordinated
articulatory movements (Browman & Goldstein 1989, 1990, 1992). Gestures are associated with the
measurable movements of articulator tract variables, so that temporal as well as spatial characteristics of a
gesture can be estimated (Smith 1995). As durational parameters (i.e., activation duration and stiffness),
in addition to spatial parameters defined in the vocal tract (i.e., constriction degree and location), are an
intrinsic component of the phonological representation of gestures, this framework is potentially well
suited to characterizing the relationship between the phonological specification of geminates and their
observed durational behavior. Specific models of temporal (‘phasing’ or coupling) relations among
gestures can predict how the representation of certain phonological structures are realized and undergo
principled linguistic variations.
Geminates can be articulatorily represented either with a single gesture with longer activation
interval or with two gestures overlapping one another. For example, lexical or true geminates might be
represented with no increase in stiffness and just with an increase in the activation interval (Gafos &
Goldstein 2012), whereas derived geminates might behave like consonant clusters such as /mp/ in having
two overlapping homorganic constrictions (Dunn 1987, Payne 2005, Smith 1995). Compared to
autosegmental phonology, these differences in gestural structure and organization make additional
predictions regarding different types of geminates. For example, two co-active gestures may be produced
with varying amounts of overlap, creating greater variability in the overall duration. Thus, derived
geminates, being composed of two overlapped gestures, would be predicted to be more susceptible to
prosodic or contextual variability than single-gesture lexical geminates that have a fixed gestural duration.
Crucially, the investigation of coordination among coupled gestural structures must take variability into
account, therefore providing useful information about phonological representation (Gafos et al. 2014). We
will turn to this in the next chapter.
4.1.2. The Articulatory Properties of Geminates
In addition to the overt distinction in acoustic duration, studies have shown that there are many complex
articulatory characteristics that distinguish geminates from singletons. Geminates usually have a tighter
constriction or more contact in their constriction locations than single consonants, presumably due to their
longer activation interval (Gafos & Goldstein 2012, Honorof 2003, Payne 2006, Ridouane 2003,
Vaxelaire 1995, Zeroual et al. 2008). In addition, Löfqvist (1995) predicted that if the virtual targets for
geminates and singletons were different, geminates would be produced with a higher peak velocity and a
larger displacement than singletons. His results, however, show that geminates and singletons may have
identical virtual target positions given that their peak velocities were not different. The actual difference
in target positions between geminates and singletons may rather be the product of singletons exhibiting
undershoot due to insufficient time to reach the target value. Other studies on juncture/concatenated
geminates also report that these geminates are not accompanied by a higher peak velocity than singletons
(Byrd 1995, Gafos & Goldstein 2012). Hagedorn et al. (2011a, 2011b) found that geminates have greater
maximal constrictions than singletons. They examined constriction duration, location, and degree for
lexical geminates and singletons in Italian with mid-sagittal MRI data using a pixel intensity analysis
technique. In their analysis, they defined the site of maximal constriction change (i.e., pixel with most
systematic intensity changes) as the constriction location. Their findings indicated that constriction
locations of geminates and singletons do not differ within places of articulation, but geminates were
accompanied by higher maximum pixel intensity than singletons, suggesting that geminates have higher
compression at the same constriction location.
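The logic of such a pixel intensity analysis can be illustrated with a minimal sketch. The array shapes, search region, and synthetic image stack below are hypothetical illustrations, not the actual analysis code of Hagedorn et al.; the sketch simply picks out the pixel whose intensity changes most systematically over time and returns its intensity time series.

```python
import numpy as np

def constriction_pixel(frames, region):
    """Locate the pixel with the most systematic intensity change.

    frames: (time, height, width) array of MRI pixel intensities.
    region: (row_slice, col_slice) delimiting a hypothetical search region.
    Returns the (row, col) of the most variable pixel and its time series.
    """
    sub = frames[:, region[0], region[1]]
    # Standard deviation over time indexes how systematically a pixel varies.
    variability = sub.std(axis=0)
    r, c = np.unravel_index(variability.argmax(), variability.shape)
    return (r, c), sub[:, r, c]

# Toy demo: 50 frames of an 8x8 region with one "articulator" pixel oscillating.
rng = np.random.default_rng(0)
frames = rng.normal(100.0, 1.0, size=(50, 8, 8))
frames[:, 3, 4] += 40 * np.sin(np.linspace(0, 2 * np.pi, 50))
loc, series = constriction_pixel(frames, (slice(0, 8), slice(0, 8)))
# loc → (3, 4): the most systematically varying pixel
```

In this scheme, a geminate's higher compression would surface as a higher maximum value in the returned time series at the same constriction location.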
However, geminates do not consistently have a larger displacement than singletons. In fact,
singleton onset consonants are sometimes produced with a larger displacement, or proportionally higher
maximum linguapalatal contact, than juncture geminates (Byrd 1995, Byrd et al. 2009). Moreover, it is
difficult to observe possible differences in displacement when the manner of articulation of a consonant is
a stop, i.e., once compressed contact is made between the articulators.11 Moreover, in the case of fricative
geminates, aerodynamic constraints may preclude producing them with larger displacement than fricative
singletons, so as to prevent any change in their manner of articulation to stop-like consonants. Thus,
previous findings provide mixed results about the spatial characteristics of geminates compared to those of
singletons. Few studies offer data on the coordination patterns in geminates that can be understood as
decomposable.
11 Alternatively, the relative amplitude of the release burst of a stop is used to distinguish geminates from singletons,
geminates showing higher burst amplitude compared to their singleton counterparts (Ridouane 2007).
To better understand how the gestural structures of geminates and singletons are differently
controlled, both in space and time, it is worth investigating the behavior of non-constriction gestures (e.g.,
the velum gesture), in addition to oral gestures. Even if one postulates a gestural target defined in terms of
an aperture goal (e.g., an ‘anti-constriction degree goal’ in the velum vocal tract subsystem), the
observation of the realization of such a goal in geminates would be less likely to be impaired by
tissue-to-tissue compression effects or conflation with a homorganic but phonologically distinct aperture degree. A
brief observation of real-time MRI data on Italian lexical geminates (Hagedorn et al. 2011b; data from
http://sail.usc.edu/span) showed that only oral constriction duration lengthened in geminates compared to
singletons, but no lengthening of the velum gesture in geminates was exhibited.
Articulatory studies on geminates with oral stop consonants can only measure temporal properties
of single constrictions or relative timing of identical oral gestures in ‘fake’ geminates when they are
pulled apart, or vowel-to-consonant phasing relations. On the other hand, nasal geminates, which have an
internal complex coordinated articulatory structure of oral and velum gestures, allow empirical access to
intergestural timing information among gestures of distinct (and distinguishable) vocal tract subsystems
reflecting the internal structure of geminates. The study of geminate nasals is thus important due to its
potential to reveal the crucial intergestural coupling structure.
Smith’s work (1992, 1995) on intergestural timing in lexical geminates showed that different
languages may have different coupling structures. For example, in Japanese, consonants are coupled to
each other, showing a C-C timing with consonants in turn coordinated with the preceding and the
following vowels respectively, yielding a direct chain-like structure; whereas in Italian, vowels are
coupled directly to each other, i.e., a V-V timing. In line with each language’s coupling structure, the
findings show that for Japanese geminates, as the consonant gets longer, the vowel following the
geminate consonant is delayed, exhibiting an increased timing lag between the vowels in a VCCV
sequence (where CC in the middle indicates a geminate). For geminates in Italian, on the other hand, the
timing between the vowels remains the same when a VCV sequence and a VCCV sequence are compared
(thus supporting a V-V timing), and only the duration of the oral closure lengthens in geminates.
Dunn’s (1993) studies of Italian and Finnish geminates confirmed that Italian exhibited the
predicted vowel-to-vowel timing patterns, whereas Finnish showed a more variable timing pattern
between the vowels surrounding a geminate or a singleton. The regularity in timing from vowel-to-vowel
in Italian and the lack thereof in Finnish is also attributed to the timing structure of each language, where
Italian is a syllable-timed language while Finnish is a mora-timed language. Alternatively, the difference may
stem from phonotactic restrictions: Italian has length contrasts in consonants only, whereas Finnish
allows length contrasts in both consonants and vowels. Although important language-specific
coordination structures can be understood from these accounts of timing in geminates, timing within a
geminate consonant, rather than the consonant-to-vowel or the vowel-to-vowel timing in the vicinity of a
geminate, is poorly explored. Thus, the current study exploits nasal singletons and juncture geminates to
examine the internal timing patterns as well as individual gestural behaviors.12

12 Setting aside the geminate/singleton distinction, most articulatory studies on nasal consonants with respect to
varying prosodic conditions have not had available dynamic information about velum gesture realization. Fougeron
and Keating (1996) reported that there was more (oral) linguapalatal contact for /n/ in higher prosodic positions.
This result is accompanied by less nasal airflow in higher prosodic conditions (Fougeron & Keating 1996),
interpreted as a smaller velum aperture. In contrast, Cho’s (2017) acoustic study finds more nasal energy exhibited
in higher boundary conditions. The dynamics of nasal geminates and their context sensitivity can be better
understood with time-varying kinematic information for the velum gesture itself.

4.1.3. Predictions Assessed in the Present Study
Korean can provide a useful testbed for understanding potential articulatory differences between juncture
geminates derived from different phonological sources. Korean does not have lexically distinct long and
short consonants; however, it does include various types of derived nasal juncture geminates, such as
concatenated and assimilated geminates, as well as geminates with nasal insertion. Concatenated nasal
geminates are a sequence of identical nasal consonants (2a), and assimilated geminates underlyingly have
a sequence of a stop and a nasal consonant, with the pre-nasal stop realized as assimilated to the nasal
(2b). Lastly, a nasal consonant can be inserted between a coda consonant and a following glide onset.
Therefore, a /t#j/ sequence undergoes nasal insertion (/t#nj/) as well as nasal assimilation, resulting in a
nasal geminate ([-n.nj-]), as in the first example of (2c). Examples of these different geminate formations
in Korean are given in (2).
(2) a. Concatenated geminates (/n#n/)
/kʰɨn/ + /namu/ → [kʰɨn.na.mu] ‘big tree’
b. Assimilated geminates (/t#n/)
/ɕat/ + /namu/ → [ɕan.na.mu] ‘Korean pine tree’
c. /n/-inserted geminates (/t#j/, /n#j/)
/nɨt/ + /jʌɾɨm/ → [nɨn.njʌ.ɾɨm] ‘late summer’
/han/ + /jʌɾɨm/ → [han.njʌ.ɾɨm] ‘mid summer’
/pom/ + /jʌɾɨm/ → [pom.njʌ.ɾɨm] ‘spring, summer, …’
whereas no gemination in:
/kʰɨn/ + /ʌɾɨm/ → [kʰɨ.nʌ.ɾɨm]/*[kʰɨn.nʌ.ɾɨm] ‘big ice’
The Korean phonological process of nasal gemination in these derived contexts is most prevalent word-internally
and across morpheme boundaries, but it is non-obligatory across word boundaries and at higher
prosodic boundaries.13 Outstanding questions prompted by previous accounts of nasal structures and nasal
geminates in Korean include (i) whether the velum action and/or the oral-to-velum timing differ between
nasal singletons and geminates, in addition to oral constriction differences, and (ii) whether the different
types of geminates—concatenated, assimilated, and /n/-inserted—exhibit comparable or differing
articulatory foundations.

13 The factors giving rise to /n/-insertion in a stop-glide sequence are not clearly understood; one possible
motivation is that /n/ is inserted to strengthen the left edge of the prosodic domain by increasing the articulatory
constriction of the initial segment (Ahn 2008, Lee 2004, Oh 2006, Kim et al. 2007). Lee (2004) further argues that
the domain-initial /j/ is modified to a less sonorant /n/ in order to decrease the boundary-initial sonority. He also
claims that /n/, among other coronal consonants, is inserted because /n/-insertion can achieve boundary-initial
strengthening without disrupting the hearer’s natural perception.

Across the prior acoustic and articulatory studies, and common in general to theories of
phonological representation of geminates, we expect the articulatory gestures of any sort associated with a
geminate to be longer than those of a like singleton. Thus in nasals, in addition to a longer oral
component, we would expect longer velum lowering. Byrd et al. (2009) also found that the velum plateau
duration (how long the velum stays in its lowest position) is generally longer for juncture geminates than
for singleton onsets and codas in English. With regard to spatial magnitude, Honorof (2003) showed that
singleton nasals are less tightly constricted than the juncture geminate nasals in their oral component.
Byrd et al. (2009) found little or no systematic difference in the velum aperture magnitude between
juncture geminates and singletons.
Finally, with regard to the internal intergestural timing of the singletons versus the geminates, we
consider the temporal lag between the velum and the oral gestures that form the nasal consonant. The
coupled oscillator model of syllable structure (Goldstein et al. 2009) hypothesizes that onset clusters and
coda clusters have different coupling structures, the former with in-phase relations among gestures and
the latter with anti-phase relations. Although this seems to be the general pattern for consonant clusters,
there are language-specific variations for segment-internal timing relations. For example, gestures
associated with nasals can in fact be simultaneously articulated in coda (Tabain 2004), and in languages
such as Western Canadian English, liquids have greater temporal lag in the onset than coda (Gick et al.
2006). Labio-velar glides (/w/) in American English also show greater asynchrony in the onset than in the
coda position (Gick 2003). Whether greater simultaneity is found in the onset or in the coda position,
languages nevertheless often do show distinct coordinative patterns as a function of syllable positions
(even if language-specific patterns need to be specified for multi-gestural molecules). It is thus
hypothesized that Korean nasals may also show different velum lowering-to-oral constriction timing
relations in codas and onsets. Juncture geminates are predicted to not be distinct from a coda type
organization as the first element (i.e., initial edge) of the geminate is indeed a coda (Byrd et al. 2009).
One articulatory factor that might contribute to the juncture geminate having longer perceived ‘nasality’
than a coda nasal is a longer interval from the formation of oral constriction to the point where the velum
raises to seal the nasal port. Therefore, we also examine the timing of velum raising gestures with respect
to the oral gestures, in addition to that of velum lowering gestures.
In addition to geminate versus singleton distinctions, the comparison between Korean onset and
coda nasals are worth exploring, especially due to the on-going phonological sound change process of
onset denasalization or onset nasal weakening (Ahn 2013, Chen & Clumeck 1975, Cho & Keating 2001,
Lee & Kim 2007, Kim 2011, Yoshida 2008, Yoo & Nolan 2020).
In Korean, specifically, nasals in the syllable onset position are weakened and are often
perceived as oral stops, while this is never the case for coda nasals. Moreover, onset nasals are more
likely to weaken in a higher prosodic domain (Cho & Keating 2001, Yoshida 2008), for alveolar
than for bilabial nasals (Chen & Clumeck 1975, Kim 2011), before a higher vowel (Chen &
Clumeck 1975), and for younger speakers (Yoo & Nolan 2020). Yoshida (2008) argues on the basis of his nasometer
study that although the nasality in the onset weakens at higher boundaries, it does not completely cease, as
there is quantifiable and measurable nasal sound even at the highest phrase boundary. He goes on to
suggest that Korean onset nasals are presumably accompanied “with less lowering of velum and weaker
nasal sound” (Yoshida 2008:12). In addition to the aerodynamic data showing lesser nasality in the onset
position (Lee & Kim 2007, Kim 2011, Yoshida 2008), other studies find that the acoustic nasal duration
in the onset is shorter than that in the coda (Ahn 2013, Lee & Kim 2007). Considering this phenomenon
of onset nasal weakening in Korean, it is predicted that the articulation of onset nasals will have shorter
velum duration and less velum lowering, as well as a shorter interval between the oral and the velum
raising gestures than coda nasals. These variables will be assessed across the syllable positions in this
chapter. Further, the effect of prosodic boundaries on this denasalization process will be fleshed out in the
following chapter on prosodic variability.
Taken together and based on the limited number of production studies of multi-gesture
complexes, the following predictions are made for the Korean nasal singletons and geminates.
HYPOTHESIS A1. Concatenated geminates have longer duration of oral & velum gestures than
singletons (geminates /n#n/ vs. singletons /n#/, /#n/).
HYPOTHESIS A2. Concatenated geminates are not distinct from singletons (geminates /n#n/ vs.
singletons /n#/, /#n/) in magnitude of oral & velum gestures.
HYPOTHESIS A3. Nasal geminates’ internal oral-velar temporal relations (overlap) are distinct from
that of onset singletons and are similar to that of coda singletons (geminates /n#n/ vs. singletons /n#/,
/#n/).
HYPOTHESIS A4. Onset nasals have shorter velum duration, less velum lowering, and less overlap
between the oral and the velum raising gesture compared to coda nasals. (onset /n#/ vs. coda /#n/).
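The kinematic quantities these hypotheses refer to (gesture duration, onset timing, and oral-velum lag) can be sketched from one-dimensional articulator traces by thresholding the velocity signal at a fraction of peak speed, a criterion commonly used in articulatory kinematics. The traces, frame interval, and 20% threshold below are illustrative assumptions, not the dissertation's actual analysis pipeline.

```python
import numpy as np

def gesture_landmarks(signal, dt, threshold=0.2):
    """Onset and target-achievement times of one gesture from a 1-D trace.

    Landmarks are placed where speed first exceeds (onset) and last falls
    below (target achievement) a fraction of the peak speed.
    """
    vel = np.gradient(signal, dt)          # numerical velocity
    speed = np.abs(vel)
    above = np.where(speed >= threshold * speed.max())[0]
    return above[0] * dt, above[-1] * dt   # (onset_t, target_t)

# Toy velum-lowering and tongue-tip traces: sigmoids offset by 40 ms.
dt = 0.005                                  # 5 ms frame interval
t = np.arange(0, 0.4, dt)
velum = 1 / (1 + np.exp(-(t - 0.15) / 0.02))  # velum aperture opening
tt = 1 / (1 + np.exp(-(t - 0.19) / 0.02))     # tongue-tip constriction forming
v_on, _ = gesture_landmarks(velum, dt)
o_on, _ = gesture_landmarks(tt, dt)
lag = o_on - v_on   # oral onset relative to velum onset (~0.04 s here)
```

With landmarks in hand, Hypothesis A1 compares onset-to-target spans across conditions, and Hypotheses A3 and A4 compare lags such as `lag` above between geminates, onsets, and codas.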
The second portion of this study investigates whether there are articulatory differences between
juncture geminates that are derived from different phonological sources. It is an open question whether
the concatenated and assimilated geminates have differing gestural representations. In tier-based
autosegmental phonology, it is hypothesized that concatenated geminates are longer than assimilated
geminates, because the former (concatenated) is generally understood as resulting from gestural overlap
of two independent elements on both the timing tier and melodic tier, while the latter (assimilated) usually
patterns in various phonological processes with the true lexical geminates, which are understood to be a
single long phonological (melodic or gestural) element, as discussed in Section 4.1. Consequently, the
co-produced concatenated geminate may yield larger magnitude movements if the aperture or nasal targets
are amplified in some way during the period of overlapping co-activation. The schematic tier-based
representations in (3) show how concatenated and assimilated geminates differ, potentially giving rise to
different phonetic patterns.
(3)
The hypothesized gestural coupling graphs for concatenated geminates and assimilated geminates
are shown in (4a) and (4b-i), respectively. An alveolar stop /t/ is represented with a single TT gesture, and
an alveolar nasal /n/ as having one TT gesture and a VEL gesture. In-phase relations are predicted for
onset consonants and anti-phase relations for coda consonants (Goldstein et al. 2009). From these basic
structures, the following coupling graphs are postulated. When the VEL gesture sufficiently overlaps with
the initial TT gesture in (4b-i) and (4b-ii), the /t#n/ sequence would be perceived as assimilated nasal
geminates.
(4)
Much research on timing stability has demonstrated that in-phase coordination is more stable than
antiphase coordination (see on general motor control: Haken et al. 1985, Sternad et al. 1996; and on
articulatory timing: Goldstein et al. 2009, Nam et al. 2009). Further, with regard to articulatory coupling,
coupled structures are predicted to be more stable when (i) there is a greater number of paths between
the two nodes, (ii) the paths between the two nodes are more direct and shorter, and/or (iii) the
strength of coupling is greater for a given connected path (Goldstein et al. 2009, Nam et al. 2009).
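The differential stability of in-phase versus anti-phase coordination can be illustrated with the Haken-Kelso-Bunz relative-phase dynamics cited above. The parameter values and integration settings below are arbitrary illustrative choices: with both coupling coefficients positive, the relative phase relaxes back to the in-phase attractor (0) faster than to the anti-phase attractor (π) after an identical perturbation.

```python
import math

def relax(phi0, a=1.0, b=0.5, dt=0.001, steps=2000):
    """Euler-integrate the HKB relative-phase equation
    dphi/dt = -a*sin(phi) - 2b*sin(2*phi) from initial phase phi0."""
    phi = phi0
    for _ in range(steps):
        phi += dt * (-a * math.sin(phi) - 2 * b * math.sin(2 * phi))
    return phi

# Perturb the in-phase (0) and anti-phase (pi) modes by the same 0.3 rad.
in_residual = abs(relax(0.3))                        # distance back to 0
anti_residual = abs(relax(math.pi - 0.3) - math.pi)  # distance back to pi
# In-phase relaxes faster, so in_residual < anti_residual
```

The linearized relaxation rates (a + 4b near 0, but 4b − a near π) make the in-phase mode the stronger attractor for any positive a, which is the sense in which in-phase coordination is the more stable mode.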
Building on these foundations, the postulated coupling structures in (4) give rise to predictions of
differential variability. For example, consider the two TT gestures in (4a) and (4b-i), respectively. In (4a),
there is one direct path that links the two gestures and another indirect path via the two VEL gestures.
Similarly in (4b-i), there is one direct path and one indirect path that connect the two TT gestures, but the
indirect path in this case is shorter, passing through just one VEL gesture. Therefore, by virtue of
exploiting a shorter path, we can predict that the two TT gestures in assimilated geminates (if structured
as in 4b-i) will exhibit tighter (i.e., less variable) coordination than that of concatenated geminates (4a).
Alternatively, the oral gesture in assimilated geminates may be coupled only to the first gesture
(i.e., VEL) of the following consonant as illustrated in (4b-ii). When (4a) and (4b-ii) are compared, the
coupled structure in (4a), having a greater number of paths between any two nodes (and also a more direct
path between the two TT gestures), is predicted to have more stable timing than the structure in (4b-ii).
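Under the assumption that stability scales with the number and length of connecting paths, the path structure of the hypothesized graphs in (4) can be enumerated directly. The adjacency lists below are one possible encoding of (4a), (4b-i), and (4b-ii), with node names of our own choosing rather than the figure's labels.

```python
def simple_paths(adj, src, dst, path=None):
    """Enumerate all simple (cycle-free) paths from src to dst by DFS."""
    path = (path or []) + [src]
    if src == dst:
        return [path]
    found = []
    for nxt in adj[src]:
        if nxt not in path:
            found.extend(simple_paths(adj, nxt, dst, path))
    return found

# Hypothesized coupling graphs from (4), encoded as undirected adjacency lists.
g4a = {"TT1": ["TT2", "VEL1"], "TT2": ["TT1", "VEL2"],
       "VEL1": ["TT1", "VEL2"], "VEL2": ["TT2", "VEL1"]}      # concatenated
g4bi = {"TT1": ["TT2", "VEL"], "TT2": ["TT1", "VEL"],
        "VEL": ["TT1", "TT2"]}                                # assimilated (4b-i)
g4bii = {"TT1": ["VEL"], "VEL": ["TT1", "TT2"], "TT2": ["VEL"]}  # assimilated (4b-ii)

for name, g in [("4a", g4a), ("4b-i", g4bi), ("4b-ii", g4bii)]:
    paths = simple_paths(g, "TT1", "TT2")
    lengths = sorted(len(p) - 1 for p in paths)  # path length in edges
    print(name, len(paths), lengths)
# 4a:    2 paths, lengths [1, 3] (direct, and via VEL1-VEL2)
# 4b-i:  2 paths, lengths [1, 2] (direct, and via the single VEL)
# 4b-ii: 1 path,  length  [2]
```

The enumeration recovers the prose argument: (4b-i) shares (4a)'s direct TT1-TT2 path but has a shorter indirect path, predicting tighter coordination, while (4b-ii) has fewer connecting paths than (4a), predicting less stable timing.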
The investigation of intergestural timing variability between the oral and the velum gestures will be
further developed in the next chapter (Ch. 5) on prosodic variability.
Regarding the temporal coordination between gestures within a nasal geminate structure,
intergestural timing between oral and velum gestures in different types of derived geminates is not readily
known. A prediction based on the suggested coupling graphs in (4) is that a competitive coupling among
a single VEL gesture and two TT gestures in the graph for assimilated geminate nasals (4b-i) may create
greater gestural overlap, thus reduced temporal lags between oral and velum gestures, compared to the
multiple oral and non-oral gestures represented for concatenated geminate nasals (4a). With the
experimental data examining coordination within multi-gestural nasal complexes, we attempt to validate
the coupling graphs and associated predictions for timing relations of derived geminate nasals.
Combining previous phonological accounts and proposed coupling structures of concatenated
geminates and assimilated geminates, it is predicted that concatenated geminates (categorized as ‘fake’
geminates with two timing slots and two velum gestures) will have longer duration and temporal lags
compared to assimilated geminates (categorized as lexical or ‘true’ geminates with one timing slot and
one velum gesture).
Another type of derived geminate briefly discussed above is the /n/-inserted geminate. The /n/-insertion
for a consonant-glide sequence occurs in compounds, derivational words, and phrase-medially,
but this phonological process is highly variable and optional, and /n/-insertion may well be absent across
morpheme or word boundaries and at higher phrase boundaries (Ahn 2008, Jun 2015). Moreover, this
/n/-insertion before a glide [j] is a rather new productive process in modern Korean (Ahn 2008). The
phenomenon of /n/-insertion has to date been discussed only in a phonological context (Ahn 2008, Hong
2002, 2006, Jun 2015, Lee 2016, Lee & Lee 2006) with minimal phonetic or instrumental description,
and no articulatory data are available to understand the representation of this structure. Unlike the other
two derived juncture geminates with underlying /n/ (/t#n/ & /n#n/), the /t#j/ sequence, if it is not
geminated, will not even have an identifiable velum movement. Therefore, for this special type of
geminate, the following questions are asked: will it be realized as a geminate across phrasal
boundaries? If so, will it have velum movement? What (articulatory property) makes it a geminate,
and are its articulatory structures similar to or different from those of other derived nasal geminates? Also, as
/n/-inserted geminates vary greatly from token to token, it is expected that there will occasionally
be cases in which degemination or a lack of gemination is observed. This rtMRI study will be the first
attempt to document the articulatory characteristics of /n/-inserted geminates in Korean.
In sum, based on these structural and distributional distinctions of the foundational studies noted
above, the following hypotheses are entertained for the present articulatory study:
HYPOTHESIS B1. /n/-inserted geminates have the least frequent velum movement among the three
types of geminates (concatenated /n#n/ vs. assimilated /t#n/ vs. /n/-inserted /t#j/). But when velum action
occurs, its articulatory patterning will resemble the other juncture geminate cases.
HYPOTHESIS B2. Concatenated geminates have longer duration and larger displacement than
assimilated geminates (concatenated /n#n/ vs. assimilated /t#n/).
HYPOTHESIS B3. Concatenated geminates have longer velum-oral temporal lags than assimilated
geminates (concatenated /n#n/ vs. assimilated /t#n/).
4.2. Methods
4.2.1. Subjects
Five native Korean speakers (2 female and 3 male) participated in the MRI experiment. Their ages ranged
from 25 to 32 at the time of the experiment. The subjects’ background information is presented in Table
4.1, with their native language, age at time of recording, city they grew up in, age of arrival (AOA)
in the United States, and length of residence (LOR) in the U.S. They all grew up in the Seoul metropolitan
area, speaking Seoul Korean, with no reading or hearing difficulty.
Table 4.1. Background information of the subjects
Native language (Gender) Age City AOA (LOR)
Speaker A Korean (M) 31 Incheon 30 (1 year)
Speaker B Korean (M) 32 Seoul 31 (1 year)
Speaker C Korean (F) 27 Seoul 25 (2 years)
Speaker D Korean (M) 32 Seoul 29 (3 years)
Speaker E Korean (F) 25 Seoul 24 (1.5 year)
4.2.2. Data Acquisition
This study examines various nasal juncture geminates and singletons in Korean using real-time MRI
(rtMRI) data. The same data acquisition protocol as described in the previous chapter’s non-pulmonic
study is used to collect data for this nasal study (Narayanan et al. 2004).
4.2.3. Materials
Target consonants are alveolar oral/nasal stops occurring as singletons and as juncture geminates across a
phrase boundary. Target nasals include singleton onsets (/#n/), singleton codas (/n#p/, /n#t/), concatenated
geminates (/n#n/), and two derived geminates: assimilated geminates (/t#n/) and /n/-inserted geminates
(/t#j/). As a control, oral stops (/#t/, /t#p/, /t#t/) are also included in the stimuli (Table 4.2).
Table 4.2. Target consonants in Korean
Geminates: concatenated /n#n/, assimilated /t#n/, /n/-inserted /t#j/, control /t#t/
Singletons: onset /#n/, coda /n#p/ and /n#t/, control /#t/ and /t#p/
To elicit these target sequences, a combination of a noun and a number classifier is used. Three
nouns are used to create pre-boundary conditions (i.e., ‘a blackboard’ /ɕʰilpʰan/ for coda /n/, ‘a garden
field’ /tʰʌtp*at/ for coda /t/, and ‘a fish cake bar’ /hatp*a/ for no coda). Post-boundary segments are
elicited by using one of the following number classifiers: /nɛt/ ‘four,’ /tasʌt/ ‘five,’ /jʌsʌt/ ‘six,’ and /pɛk/
‘a hundred.’ These combinations of words are placed at varying prosodic boundaries (5-6).
(5) At a word boundary (/t#n/)
Mincu-nun sesehi thespath ney pyeng-ul cangsikhayssta.
/tʰʌtp*at/ /nɛ pʰjʌŋɨl/
Minju-TOP slowly AP[garden field four-OBJ] decorate-PAST-DECL
'Minju slowly decorated AP[four garden fields].'
(6) At an Accentual Phrase (AP) boundary (/t#n/)
Mincu-nun pululun thespath ney pyeng-ul cangsikhayssta.
/tʰʌtp*at/ /nɛ pʰjʌŋɨl/
Minju-TOP AP[lush garden field] four-OBJ decorate-PAST-DECL
'Minju decorated four AP[lush garden fields].'
The target sequences are placed in different prosodic boundaries: at a word boundary, at an AP
boundary, and at an Intonational Phrase (IP) boundary. For the AP boundary context, there is an
additional prosodic condition, adding a phrase-initial focal prominence, for a total of four prosodic
conditions. Table 4.3 shows example sentences for each prosodic condition.
Table 4.3. Prosodic conditions for Korean (/n#n/)

Wd boundary:
toŋhuninɨn # p*aɾɨkɛ # AP[ɕʰilpʰan # nɛ kɛɾɨl] # unbanɛt*a
('blackboard # four')
'Donghoon moved AP[four blackboards] quickly.'

AP boundary (-focus):
tojʌŋinɨn # AP[tʰɨntʰɨnan ɕʰilpʰan] # nɛ kɛɾɨl # unbanɛt*a
('blackboard' # 'four')
'Doyoung moved four AP[sturdy blackboards].'

AP boundary (+focus):
tojʌŋinɨn # AP[tʰɨntʰɨnan ɕʰilpʰan] # nɛ kɛɾɨl # unbanɛt*a
('blackboard' # 'four')
'Doyoung moved four AP[sturdy blackboards].'

IP boundary:
IP[i ɕakpʰumɨn # AP[tʰɨntʰɨnan ɕʰilpʰan]] # nɛka # sako sipʰɨn # ɕakpʰumida
('blackboard' # 'me')
'This work is "sturdy blackboard"; I want to buy this work.'
Each prosodic condition formed a block, and the order of presentation ran from the highest
prosodic condition (IP boundary), followed by the AP boundary with focus and without focus, to the
lowest prosodic condition (Wd boundary), with items randomized within each block. In sum, three kinds
of nasal geminates (concatenated /n#n/, assimilated /t#n/, and /n/-inserted /t#j/), onset and coda nasal
singletons (/#n/, /n#p/, /n#t/), and geminate and singleton stops (/t#t/, /#t/, /t#p/) were collected in four
prosodic conditions: an Intonational Phrase (IP) boundary, an Accentual Phrase (AP) boundary with focus
implementation, an AP boundary without focus, and a Word boundary. Each item was repeated 8 times
per speaker, yielding a total of 288 tokens collected for each speaker (9 items × 4 conditions × 8
repetitions).¹⁴
4.2.4. Data Analysis
The same techniques used in the previous chapter’s data analysis were used to obtain kinematic
trajectories of articulators: A centroid tracking analysis (Oh & Lee 2018) and a region-of-interest (ROI)
image sequence analysis (Lammert et al. 2013) were performed to provide kinematic trajectories of
Velum (VEL) lowering and raising gestures and Tongue Tip (TT) constriction formation, respectively
(See Chapter 2.2.4. for technical details of the two analyses). For each gesture, we examine duration (oral
constriction and velum lowering), magnitude (TT constriction degree and Velum lowering [and fronting]
degree), and intergestural timing lag (between TT and VEL). In tracking velum movement, triangular
Vocal Tract Regions (VTR) are used (rather than a rectangular region as in Chapter 1) to
minimize tongue body intrusion into the VTR (see Figure 4.1). A fixed triangular VTR was selected for
each subject based on fixed anatomical landmarks. The anterior point of the velum VTR was placed at the
edge of the posterior nasal spine, and the rear side of the velum region met the rear pharyngeal wall. The
bottom point of the ROI was chosen based on where the velum is located in its lowered state during nasal
breathing. The triangular VTR's width ranged from 10 to 15 pixels and its height from 10 to 16 pixels.
Velum ROI sizes (width × height) for each speaker were as follows: Speaker A (15 × 11), Speaker B
(12 × 15), Speaker C (10 × 11), Speaker D (13 × 16), and Speaker E (10 × 10).

¹⁴ Only 7 repetitions were collected from Speaker C due to a major software crash. To avoid crashes resulting from
recording a single long scan (over 30 seconds), each block of 9 sentences was separated into two scans, with a short
break after the fourth sentence, followed by the remaining five sentences (for Speakers A, B, and E).
Figure 4.1. Tracking of the velum centroids over time in the production of an intervocalic /n/
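The region-based centroid tracking described above can be illustrated with a minimal Python sketch. This is not the dissertation's actual analysis code (the study uses the pipelines of Oh & Lee 2018 and Lammert et al. 2013); the function names, the barycentric point-in-triangle test, and the toy frame are assumptions for illustration only.

```python
# Illustrative sketch: locate the intensity-weighted centroid of the pixels
# that fall inside a triangular region of interest (ROI) in one image frame.
# `frame` is a 2-D list of pixel intensities (row-major); v1, v2, v3 are
# (x, y) vertex coordinates of the triangular vocal-tract region.

def _sign(p, a, b):
    """Signed-area test: which side of segment a-b the point p falls on."""
    return (p[0] - b[0]) * (a[1] - b[1]) - (a[0] - b[0]) * (p[1] - b[1])

def in_triangle(p, v1, v2, v3):
    """True if point p lies inside (or on the edge of) triangle v1-v2-v3."""
    d1, d2, d3 = _sign(p, v1, v2), _sign(p, v2, v3), _sign(p, v3, v1)
    has_neg = (d1 < 0) or (d2 < 0) or (d3 < 0)
    has_pos = (d1 > 0) or (d2 > 0) or (d3 > 0)
    return not (has_neg and has_pos)

def roi_centroid(frame, v1, v2, v3):
    """Intensity-weighted centroid (x, y) of the pixels inside the ROI.
    Brighter tissue pixels pull the centroid toward themselves, so the
    centroid tracks the velum as it moves within the fixed triangle."""
    sx = sy = sw = 0.0
    for yy, row in enumerate(frame):
        for xx, w in enumerate(row):
            if in_triangle((xx, yy), v1, v2, v3):
                sx += w * xx
                sy += w * yy
                sw += w
    return (sx / sw, sy / sw) if sw else (0.0, 0.0)
```

Applying `roi_centroid` to every frame of the MRI video yields the horizontal and vertical velum-centroid trajectories analyzed below.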
Signals obtained from the centroid analysis were smoothed with loess smoothing using a local
span of 30 data points. As the lowering trajectory of the velum is not strictly vertical in the standard
orientation of our images, but rather diagonal, tangential velocity, calculated from both the vertical and
horizontal component trajectories of the velum centroid, is used for analysis.
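The tangential-velocity computation can be sketched as follows. This is an illustrative reconstruction, not the study's code; the central-difference scheme, the function name, and the frame-rate parameter `fps` are assumptions.

```python
# Illustrative sketch: tangential (path) velocity of the velum centroid from
# its horizontal (x) and vertical (y) trajectories. Differentiating each
# component and combining them with the Euclidean norm captures diagonal
# movement that a purely vertical velocity signal would underestimate.
import math

def tangential_velocity(x, y, fps=1.0):
    """Speed along the movement path, in pixels per second at frame rate
    `fps`. Central differences in the interior, one-sided at the edges."""
    n = len(x)

    def deriv(s, i):
        if i == 0:
            return (s[1] - s[0]) * fps
        if i == n - 1:
            return (s[-1] - s[-2]) * fps
        return (s[i + 1] - s[i - 1]) * fps / 2.0

    return [math.hypot(deriv(x, i), deriv(y, i)) for i in range(n)]
```

For a purely diagonal displacement of 3 pixels horizontally and 4 pixels vertically per frame, the tangential speed is 5 pixels per frame, whereas either component alone would report less.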
Once TT and VEL trajectories were obtained, the find_gest algorithm (Tiede 2010) was used to
delimit gestural actions. A velocity threshold of 20% of the maximum velocity was used to capture
movement onset, target achievement, release onset, and release offset. The duration of each gesture was
measured from gestural onset to gestural offset. The spatial magnitude of the oral gesture was indexed by
pixel intensity at its maximal constriction, and the magnitude of the velum gesture was calculated as the
Euclidean distance traveled by the velum centroid (combining its vertical and horizontal position
components) from the initiation of lowering to the point of maximal lowering. Another measure of velum
magnitude was its extremum position, that is, the absolute position of the velum in the vocal tract when it
is maximally lowered.
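The 20% velocity-threshold criterion can be sketched as below. This is an illustrative reimplementation of the thresholding idea, not the actual find_gest algorithm (Tiede 2010), which is a MATLAB tool; the function name and the single-peak assumption are mine.

```python
# Illustrative sketch of threshold-based gestural landmark detection: given
# the speed signal of one movement phase (e.g., TT closure formation), the
# phase onset is the first sample whose speed reaches 20% of the peak speed,
# and the phase offset is the first post-peak sample that falls back below
# that threshold.

def phase_landmarks(speed, threshold=0.2):
    """Return (onset, offset) sample indices for one movement phase."""
    peak = max(speed)
    i_peak = speed.index(peak)
    cutoff = threshold * peak
    onset = next(i for i in range(i_peak + 1) if speed[i] >= cutoff)
    offset = next((i for i in range(i_peak, len(speed)) if speed[i] < cutoff),
                  len(speed) - 1)
    return onset, offset
```

Run once on the formation phase and once on the release phase of a gesture, this yields the four landmarks used here: movement onset, target achievement, release onset, and release offset.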
With regard to intergestural timing, Velum-to-Tongue-Tip intergestural lags were indexed as
intervals between the two gestures' temporal landmarks. The interval from the velum lowering onset to
the oral constriction onset was retrieved to indicate the phasing relation of the two gestures, and the
interval from the oral constriction onset to the velum raising onset was calculated to estimate the
articulatory duration of nasality, that is, the time during which the oral constriction is formed while the
velum is lowered. The following summarizes the measurements collected for data analysis.
(7) Measurements
• Oral & velum gestural duration (from onset to offset)
• Oral magnitude (pixel intensity at maximal constriction)
• Velum magnitude (diagonal absolute displacement of centroids)
• Velum extrema (absolute position at maximum movement)
• Velum (VEL) lowering onset to Tongue Tip (TT) constriction onset lag
• Tongue Tip (TT) constriction onset to Velum (VEL) raising onset lag
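The magnitude and lag measures in (7) can be derived from the landmark indices and centroid trajectories as in the following sketch. The helper names and example values are hypothetical; sign conventions follow the text (a positive lag means the first landmark precedes the second).

```python
# Illustrative sketch: computing the velum displacement magnitude and the two
# intergestural lags in (7) from landmark times and centroid positions.
import math

def velum_magnitude(x, y, onset, extremum):
    """Euclidean displacement of the velum centroid from lowering onset to
    maximal lowering, combining the lowering and fronting components."""
    return math.hypot(x[extremum] - x[onset], y[extremum] - y[onset])

def intergestural_lags(vel_lowering_onset, tt_onset, vel_raising_onset):
    """Two lags used in the study: VEL lowering onset to TT constriction
    onset (phasing), and TT onset to VEL raising onset (nasality interval).
    Positive values mean the first landmark precedes the second."""
    return tt_onset - vel_lowering_onset, vel_raising_onset - tt_onset
```

A positive first lag thus indicates that velum lowering is initiated before the TT constriction, and a positive second lag indicates that velum raising begins only after the oral closure is underway.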
In total, statistical analyses were conducted on 760 items. Results were analyzed using linear
mixed effects regression models with subjects and items as random effects, accompanied by Tukey's
post-hoc pairwise comparison tests. The level of statistical significance was set at p < .05.
4.3. Results
Only the two lower boundary conditions (Wd boundary and AP boundary) are included in this chapter,
which focuses on the articulatory characteristics of nasal consonants in varying syllable positions. The
results for the other prosodic boundary conditions (AP boundary with focus and IP boundary) will be
introduced in the following chapter on prosodic variability. The fixed effects included in section 4.3.1. are
three types of nasals based on their consonant length (short or long) and their syllable position (onset or
coda), namely singleton onsets (/#n/), singleton codas (/n#p/ & /n#t/ pooled), and geminate nasals
(/n#n/).¹⁵ In section 4.3.2., the coded fixed effects are three different types of derived geminates (i.e.,
concatenated /n#n/, assimilated /t#n/, and /n/-inserted /t#j/).
4.3.1. Singletons and Geminates
4.3.1.1. Duration
In nasal geminates and singletons, geminate nasals are expected to have longer Tongue Tip (TT) movement
duration than singleton nasals, and the same is expected for Velum (VEL) duration (Hypothesis A1).
A linear mixed effects model¹⁶ shows that the effect of nasal type (singleton onset, singleton coda,
or concatenated geminate) is significant both for TT duration (F(2,247.9) = 40.642, p < .001*) and
for VEL duration (F(2,202.24) = 58.294, p < .001*). The results of Tukey's post-hoc pairwise
comparisons of segment are in accordance with our expectation; that is, both the oral and the velum
gestures have longer duration in concatenated geminate nasals (/n#n/) than in singleton onset and coda
nasals (/#n/ & /n#/; Figure 4.2), confirming Hypothesis A1 (see Tables 4.4-4.5).
Figure 4.2. TT duration (left) & VEL duration (right) for singleton and geminate nasals
¹⁵ Assimilated /t#n/ and /n/-inserted /t#j/ geminate nasals are not included.
¹⁶ For all linear mixed effects models and Tukey's post-hoc comparisons, a more conservative degrees-of-freedom
method (the Kenward-Roger method) was used instead of the default Satterthwaite's method implemented in the
lmerTest package in R (Kuznetsova et al. 2017).
Table 4.4. Tukey’s post-hoc pairwise differences of segment for TT duration
Estimate Std. Error df t-ratio p-value
geminate – coda 38.4 4.30 250 8.937 <.001*
geminate – onset 20.8 4.97 241 4.191 <.001*
coda – onset -17.6 4.30 250 -4.098 <.001*
Table 4.5. Tukey’s post-hoc pairwise differences of segment for VEL duration
Estimate Std. Error df t-ratio p-value
geminate – coda 18.9 4.88 205 3.872 <.001*
geminate – onset 73.3 6.85 199 10.710 <.001*
coda – onset 54.4 6.25 201 -8.712 <.001*
Additionally, between singleton onset and coda nasals, TT duration is longer while VEL duration
is shorter in the onset nasal, compared to the oral and velum durations for nasals in the coda position.
That is, in addition to the singleton and geminate distinction, Korean nasals exhibit significantly different
gestural behaviors as a function of syllable structures (onset vs. coda).
To further investigate whether the durational increase in these Korean derived geminates
compared to singletons comes from the closure or the release of the oral gesture or both (for velum
gesture, lowering and/or raising actions), the overall gestural duration is split into the formation duration
(from gestural onset to target achievement) and the release duration (from release onset to gestural offset).
Both the TT closure and release durations have main effects of segment (closure: F(2,224.43) = 32.79, p
< .001*; release: F(2,252.47) = 30.282, p < .001*). Further pairwise comparisons indicate that for oral
closure duration, coda nasals are shorter than both onset nasals (t(226) = -6.156, p < .001*) and geminate
nasals (t(226) = -7.059, p < .001*), but onset and geminates are not significantly different (t(219) = 0.788,
p = .711; Figure 4.3: left). On the other hand, geminate nasals have longer oral release duration than both
singleton onsets (t(246) = 4.757, p < .001) and codas (t(255) = 7.776, p < .001*), while the TT release
duration of onset and coda nasals do not reach significance (t(255) = 2.273, p = .061; Figure 4.3: right).
Figure 4.3. TT closure (left) & release (right) duration for singleton and geminate nasals
As for velum actions, there are main effects of segments on velum lowering duration (F(2,216.48)
= 51.946, p < .001*) and velum raising duration (F(2,223.82) = 12.183, p < .001*). In the post-hoc
comparisons among the onset, coda, and geminate nasals, onset nasals have significantly shorter velum
lowering movement than coda nasals (t(218) = -9.703, p < .001) and geminate nasals (t(215) = -9.196, p
< .001*), while coda and geminates are not distinguishable (t(216) = -0.503, p = .869; Figure 4.4: left).
Velum raising movement is longer for geminate nasals than for onset nasals (t(223) = 3.364, p < .01*) and
coda nasals (t(221) = 4.779, p < .001*), but between onset and coda nasals, VEL release duration is not
significantly different (t(227) = 0.035, p = .999; Figure 4.4: right).
Figure 4.4. VEL lowering (left) & raising (right) duration for singleton and geminate nasals
In general, for both the oral and the velum gestures, it is the TT release and VEL raising durations
(more than the oral closure and velum lowering durations) that contribute to the length distinction
between geminates and singletons, resulting in longer gestural duration for geminates than for singletons.
4.3.1.2. Magnitude
The magnitudes for the oral and the velum gestures for nasal consonants are predicted to be
indistinguishable between concatenated geminates and singletons (Hypothesis A2). There are main effects
of nasal segment types on TT magnitude (F(2,219.76) = 30.2, p < .001*; Figure 4.5: left) and on VEL
magnitude (F(2,205.44) = 132.93, p < .001*; Figure 4.5: right). The post-hoc analysis of TT magnitude
shows that concatenated geminate nasals in fact have greater maximum oral constriction than singleton
nasals (geminate vs. singleton onset: t(215) = 5.99, p < .001*; geminate vs. singleton coda: t(222) =
7.488, p < .001*), whereas onset and coda nasals are not significantly different in their TT magnitude
(t(222) = 0.638, p = .799).¹⁷ The VEL magnitude (measured by the lowering and fronting velum
movement), however, does not distinguish geminates from singletons. Geminate nasals and coda nasals
do not have significantly different VEL magnitude (t(208) = 1.088, p = .523), while both have
greater VEL magnitude than onset nasals (geminate vs. singleton onset: t(202) = 13.821, p < .001*;
singleton coda vs. singleton onset: t(205) = 15.981, p < .001*).
Figure 4.5. TT magnitude (left) & VEL magnitude (right) for singleton and geminate nasals
In addition to VEL displacement magnitude, the lowering and fronting extrema are analyzed to
examine the absolute position in the vocal tract region when the velum is at its most lowered posture.
There is no main effect of nasal segment on VEL lowering extremum (F(2,170.92) = 2.567, p = .079),
indicating that singletons and geminates are all alike in their lowest absolute VEL position (Figure
4.6: left). A main effect is found for VEL fronting extremum (F(2,214.13) = 200.1, p < .001*; Figure 4.6:
right), with post-hoc comparisons revealing that the velum is less fronted in the onset nasal compared
to the coda and geminate nasals (singleton coda vs. singleton onset: t(215) = 19.619, p < .001*; geminate
vs. singleton onset: t(211) = 16.938, p < .001*). Singleton coda and geminate nasals do not differ in their
VEL fronting extremum (t(215) = 1.348, p = .370).
¹⁷ Within singleton coda nasals, /n#p/ and /n#t/ differed in their TT maximum constriction, with /n#p/ having
significantly smaller TT magnitude than /n#t/. This is potentially an effect of the following bilabial stop, which has
a different place of articulation; the TT constriction formation in the /n#p/ sequence may thus result in undershoot
due to gestural overlap between the Tongue Tip and the lip gestures.
Figure 4.6. VEL lowering (left) & fronting (right) extremum for singleton and geminate nasals
4.3.1.3. Timing
Nasals, being multi-gesture complexes, allow us to investigate intergestural timing between the oral and
non-oral components. The prediction for internal oral-velum timing for nasals is that the timing for
geminates is distinct from that of singleton onsets and is similar to that of singleton codas (Hypothesis
A3). The linear mixed effects models for temporal lags within nasals reveal that no main effect is found
for VEL lowering onset to TT raising onset lag (F(2,218.66) = 0.795, p = .453; Figure 4.7: left) and that
there is a main effect of TT raising onset to VEL raising onset lag (F(2,224.46) = 146.46, p < .001*;
Figure 4.7: right). Note that the positive VEL lowering onset to TT onset lag shown for all three nasal
types indicates that velum lowering begins before the initiation of the TT gesture. For the second
temporal lag, TT onset to VEL raising onset lags are positive for coda and geminate nasals, indicating that
VEL raising is after the TT closure onset, whereas the two temporal landmarks occur roughly
simultaneously in the production of onset nasals.
Figure 4.7. VEL lowering onset to TT onset lag (left) & TT onset to VEL raising onset lag (right)
For the lag measures between the TT onset to VEL raising onset, post-hoc pairwise comparisons
reveal that singleton onsets have shorter lags than singleton codas and geminate nasals (singleton coda vs.
singleton onset: t(228) = 16.425, p < .001*; geminate vs. singleton onset: t(224) = 15.243, p < .001*), and
that the same lags are not significantly different between geminate nasals and singleton coda nasals
(t(222) = 0.423, p = .906).
The current findings on timing show that the initial phasing relations between the velum and the
oral gestures are alike among singleton onsets, codas, and geminate nasals. The difference in temporal
lags is instead found in the lag between the oral gestural onset and the VEL raising onset, consistent with
the hypothesis that geminates' timing is distinct from that of onsets and close to that of codas for these
multi-gestural nasal structures.
Summing up the results on onset and coda nasals' articulatory characteristics: onset nasals have
shorter velum lowering duration, less velum lowering magnitude, and a shorter lag between the TT onset
and the VEL raising onset, all indicative of "weaker nasality"; thus, Hypothesis A4 on the onset-coda
asymmetry due to onset denasalization is supported.
4.3.2. Geminate Types
4.3.2.1. Count
Before moving on to statistical analyses, Table 4.6 presents raw counts of identifiable VEL gestures in
the rtMRI articulatory data.¹⁸ Each speaker produced each item 7-8 times per boundary condition (Wd &
AP). Each cell in Table 4.6 records the number of quantifiable VEL gestures for a given segment out of
the total utterances.
Table 4.6. Count of identifiable VEL gestures (count/total token)
/#n/ /t#j/ /t#n/ /n#n/
Speaker A 12/16 10/16 12/16 15/16
Speaker B 4/16 6/16 9/16 16/16
Speaker C 13/14 14/14 14/14 14/14
Speaker D 12/16 4/16 8/16 16/16
Speaker E 1/16 0/16 2/16 16/16
Total 42/78 34/78 45/78 77/78
Notice that for concatenated geminate nasals (/n#n/), VEL gestures were almost always
exhibited.¹⁹ (The one missing count of /n#n/ for Speaker A is due to de-aggregated, or non-geminated, TT
gestures.) For onset nasals (/#n/) and the other two derived nasals (/t#j/, /t#n/), VEL gestures were
identifiable only in some utterances, with individual variation (cf. for oral stops, the numbers of
quantifiable VEL gestures were as follows: /#t/: 8/78; /t#/: 6/78; /t#t/: 4/78). For example, Speaker E does
not lower the velum to produce the singleton onset nasals or the /n/-inserted and assimilated nasals,
whereas Speaker C has VEL gestures throughout all nasal sequences most of the time. This simple raw
count implies the weakening of nasality in the onset position as well as the optionality of nasal gemination
in the derived contexts, with /n/-inserted geminates associated with the fewest velum gestures, followed
by assimilated geminates and then concatenated geminates. This is in line with Hypothesis B1, which
holds that /n/-inserted geminates have the fewest instances of velum gestures compared to assimilated and
concatenated geminates.
¹⁸ Identifiable gestures are determined by the find_gest algorithm (Tiede 2010), which calculates velocity profiles in
the kinematic trajectories and retrieves only movements whose velocity exceeds the set threshold (20%).
¹⁹ For singleton coda nasals, VEL gestures were always present for all speakers.
4.3.2.2. Duration
When presenting the findings for the derived geminates, data for the onset nasals are also displayed for
comparison. However, the statistical analyses in this section include only the different types of juncture
geminates. For these derived geminates, it is predicted that gestures will be longer for concatenated
geminates than for assimilated geminates (Hypothesis B2).
Figure 4.8. TT duration (left) & VEL duration (right) for geminate nasals
Main effects of geminate types are found for both TT duration (F(2,167.17) = 16.053, p < .001*)
and for VEL duration (F(2,113.58) = 35.062, p < .001*) (Figure 4.8). Post-hoc comparisons indicate that
concatenated geminates (/n#n/) have longer TT and VEL durations than the other two geminates (TT
duration—concatenated vs. assimilated: t(166) = 4.800, p < .001*; concatenated vs. /n/-inserted: t(180) =
5.015, p < .001*; VEL duration—concatenated vs. assimilated: t(113) = 5.576, p < .001*; concatenated
vs. /n/-inserted: t(121) = 7.915, p < .001*). Assimilated geminates and /n/-inserted geminates do not differ
in their TT duration (t(158) = 0.235, p = .970), but the VEL duration of /n/-inserted geminates is the
shortest among the three geminate types (assimilated vs. /n/-inserted: t(108) = 2.808, p = .016*).
4.3.2.3. Magnitude
Similar to the prediction for gestural duration, gestural magnitude for concatenated geminates is expected
to be greater than for assimilated geminates (Hypothesis B2). The linear mixed effects model shows that
there is a main effect of geminate type on TT magnitude (F(2,141.81) = 53.497, p < .001*) as well as
on VEL magnitude (F(2,112.42) = 50.4, p < .001*). See Figure 4.9.
Figure 4.9. TT magnitude (left) & VEL magnitude (right) for geminate nasals
Post-hoc comparisons show that TT magnitude is the smallest in /n/-inserted geminates (assimilated vs.
/n/-inserted: t(135) = 7.804, p < .001*; concatenated vs. /n/-inserted: t(150) = 9.814, p < .001*), followed
by assimilated and concatenated geminates, the last having the largest TT magnitude (concatenated vs.
assimilated: t(141) = 2.490, p = .037*). For VEL magnitude, concatenated geminates have greater VEL
magnitude than assimilated and /n/-inserted geminates (concatenated vs. assimilated: t(111) = 8.083, p
< .001*; concatenated vs. /n/-inserted: t(119) = 8.619, p < .001*), but no difference is found between
assimilated and /n/-inserted geminates (t(107) = 1.357, p = .367).
In addition to the displacement magnitude from movement onset to movement maximum, VEL
lowering (y) and fronting (x) extrema components are examined individually to assess the absolute
location of the velum when the velum is at its maximally lowered position. There are main effects for
both VEL lowering and fronting extrema (lowering: F(2,70.406) = 20.478, p < .001*; fronting F(2,78.78)
= 14.823, p < .001*) (Figure 4.10). Pairwise comparisons show that for concatenated geminate nasals,
the velum at its lowest posture has a lower and more fronted absolute position in the vocal tract compared
to either assimilated or /n/-inserted geminates (all comparisons differ with p < .001*). There is no
difference in VEL extrema between assimilated and /n/-inserted geminates.
Figure 4.10. VEL lowering (left) & fronting (right) extremum for geminate nasals
In sum, magnitudes, as indicated by TT and VEL displacements as well as VEL lowering and
fronting extrema, all indicate that concatenated geminates have greater spatial magnitude than assimilated
geminates (Hypothesis B2). The /n/-inserted geminates, which also involve assimilation, show patterns
similar to assimilated geminates in general, with no statistical difference found in their VEL displacement
and extremum.
4.3.2.4. Timing
Next, the temporal lags between the velum and the oral gestures are investigated, with the prediction that
concatenated geminate nasals will have longer temporal lags compared to assimilated geminate nasals
(Hypothesis B3). A main effect is found for the onset lag (F(2,114.07) = 16.576, p < .001*)—i.e., the lag
from VEL lowering onset to TT raising onset—but not for TT onset to VEL raising onset lag (F(2,113.33)
= 2.112, p = .126) (Figure 4.11). The onset lag indicates how oral and velum gestures are phased with
respect to each other, while the latter lag is indicative of the consonant nasality interval. Post-hoc
comparisons for onset lag show that concatenated geminate nasals have significantly longer lag compared
to the two other geminate nasals (concatenated vs. assimilated: t(114) = 4.038, p < .001*; concatenated
vs. /n/-inserted: t(121) = 5.339, p < .001*). No difference is found between the assimilated and the
/n/-inserted geminate nasals (t(108) = 1.655, p = .227).
The results for timing show that the initial phasing relations between concatenated and
assimilated geminate nasals are different, with the former exhibiting longer velum to tongue tip onset
lags. The timing between TT onset to VEL raising onset, on the other hand, is not distinguishable among
different types of geminates.
Figure 4.11. VEL lowering to TT lag (left) & TT to VEL raising lag (right) for geminate nasals
4.4. Discussion
The current findings on articulatory characteristics of singleton and geminate nasals provide rich
information on internal gestural behaviors and coordination of gestures that comprise a nasal consonant.
Nasals involve oral constriction formation and velum lowering/opening movement; therefore, individual
gestural actions for both the oral and the velum gestures are investigated, as well as the temporal
coordination between the two gestures. Using real-time MRI speech articulation data, this study
investigates the spatiotemporal articulatory patterns that distinguish geminates from singletons, provides
articulatory indices relevant to the degree of consonant nasality, and compares kinematics of different
types of derived geminate nasals.
First, singleton and juncture geminate nasals are crucially differentiated not only by the oral TT
gestural duration (which is a commonly observed distinction), but also by the velum duration, about
which little instrumental data has existed previously. Specifically, the oral release duration and the velum
raising duration, rather than the oral closure and the velum lowering durations, are the critical measures
found to be longer in Korean juncture geminate nasals, compared to singleton onset and coda nasals. In
other words, oral and velum gestures in Korean geminate nasals take a longer time to de-activate their
closure or lowering actions, respectively, and return to a neutral position. This shows that at least in
distinguishing consonant length, the overall length of the gestural activation interval, including both the
constriction formation as well as the release gesture for the oral articulation (for velum, both lowering and
raising), should be taken into account. In terms of their spatial patterns, while juncture geminate nasals
have greater oral TT magnitude compared to singleton nasals, these are not differentiated for VEL
magnitude, both juncture geminates and coda nasals being somewhat larger than onset nasals. The
intergestural timing between the oral and the velum gestures is also not found to be distinctive between
singleton and geminate nasals. For example, the temporal lags between oral and velum onsets are not
differentiated between singleton and geminate nasals, indicating that gestures for these nasals have similar
phase relations. Thus, the results for velum magnitude and the intergestural timing relations indicate that
nasal geminates and singletons (especially singleton codas) are not distinguishable, while oral and velum
‘release’ durations as well as oral magnitudes are distinguishing factors that separate singleton nasals
from geminate nasals in Korean. Further investigation of other languages' nasals may reveal articulatory
patterns different from those of Korean.
One special phonological phenomenon highlighted for Korean was the onset denasalization
process. The reported onset nasal weakening process in Korean is articulatorily examined for the first
time in the current study. When comparing singleton onset and singleton coda nasals, onset nasals are
accompanied by shorter velum lowering duration (and longer TT closure duration). In addition, while the
oral TT magnitude is similar in onsets and codas, the VEL magnitude is significantly smaller in onset
nasals compared to coda nasals, both in the VEL displacement magnitude and the degree of fronting of
the lowered velum. As for the timing relations, the lags between TT onset to VEL raising onset exhibit a
large difference between onsets and codas, onsets having a near-zero lag and codas having a positive lag.
That is, for onset nasals, the velum starts to rise as soon as the TT closure begins to form, yielding very
little time to generate an acoustically nasal (consonant) sound.
These asymmetric articulatory patterns found between Korean onset and coda nasals are well
aligned with the reported onset denasalization process in Korean. Nasal weakening in the onset position is
not exhibited for the TT gestural duration and magnitude but instead is generated by velum duration and
magnitude, with shorter duration and smaller magnitude for onset velum gestures. Moreover, the
intergestural lag (indicated by the TT to VEL raising onset lag) also shows that the interval for consonant
nasality is shorter for onset nasals than for coda nasals. Again, this lag is close to zero for onset nasals,
which, along with the small VEL magnitude, explains why onset nasals in Korean are reported to be
denasalized. As velum gestures can still be identifiable for onset nasal production for most speakers in the
current data, onset nasals in Korean are better described as only partially denasalized or weakened. That
said, one of five subjects (Speaker E) showed a complete denasalization in that this subject did not have
any quantifiable velum movement in her onset nasals. This individual difference may perhaps be due to
an age factor, given that the denasalization process is reported to be stronger for younger speakers (Yoo &
Nolan 2020): Speaker E is the youngest speaker (25 years old), while the average age of the other
speakers is 30.5 years.
Recall that coda and geminate nasals are not different in their VEL magnitude nor in their oral-
velum timing measures. This lack of contrast between singleton coda and geminate nasals is also evidence
for weak onset nasality, in that the concatenated geminate nasal examined at this juncture is a sequence of
a coda plus an onset nasal. If this onset nasal is weakened and denasalized, juncture geminate nasals
would exhibit patterns similar to singleton coda nasals. For languages other than Korean, without
this onset denasalization process, the distinction between singletons and juncture geminates may be
greater.
Turning from singleton nasals, articulatory patterns among different types of geminate nasals in
Korean are also examined: these include concatenated geminate nasals, assimilated geminate nasals, and
the /n/-inserted geminate nasals. If geminates of different types have identical timing representations and
coupling structures, their articulatory realizations would also be similar. Based on previous accounts,
phonological processes that apply to singletons also apply to assimilated geminates (e.g., spirantization of
velars in Tigrinya): assimilated geminates undergo phonological neutralization to singletons and are thus
categorized as lexical or 'true' geminates, while concatenated geminates, where such phonological
processes are blocked, are categorized as 'fake' geminates (Kenstowicz 1982, 1994, Kirchner 2000,
Ladd & Scobbie 2003, Lahiri & Hankamer 1988, Ridouane 2010). Accordingly, different gestural
representations are hypothesized for these two groups of derived geminates. The general prediction is that
concatenated geminates exhibit a longer activation interval and greater magnitude than assimilated
geminates, which involve neutralization. A third type of geminate, /n/-inserted geminates (underlyingly a
/t#j/ sequence without a nasal), is compared with the other two geminates to examine whether the
optionality of gemination affects the degree of gemination in any articulatory parameter.
gemination affects the degree of gemination in any articulatory parameters.
Overall, as indicated by the raw frequencies of identifiable velum gestures in the three geminates,
concatenated geminate nasals almost always have velum movement, while velum actions are partly absent
for the other two geminates, with /n/-inserted geminates being the least likely to have a velum gesture. In
general, the articulatory results for duration and magnitude suggest that for both oral and velum gestures in
these geminates, concatenated geminates have the longest duration and the greatest magnitude, followed
by assimilated geminates and then /n/-inserted geminates. The velum-to-oral onset lags are also longer for
concatenated geminates compared to assimilated and /n/-inserted geminates. The results show that
concatenated and assimilated geminates are clearly distinguishable in various articulatory domains,
including duration, magnitude, and timing patterns.
Apart from the differences in articulatory patterns, there is one articulatory parameter that is
similar for all three types of geminates: the temporal lag from oral constriction onset to velum raising
onset, which indexes the consonant nasality interval. The three types of geminates are not
differentiated by this consonant nasality lag, and this implies that timing relations may play a crucial role in
defining what counts as a nasal geminate, or what the representation of nasals in Korean is.
By exploiting the multi-gestural structure of Korean nasal consonants, the current study explored
how oral and non-oral gestural actions as well as their coordination contribute to realizing phonological
distinctions. There are nevertheless unanswered questions that need further investigation. Temporal
coordination patterns might provide a window onto the gestural coupling structures present in the
representations of these multi-gestural complexes. For example, variability in timing relations can signal
the degree of cohesiveness in coupling structures. To test the tightness in coupling structures, it is worth
exploring how stably or flexibly the gestures within a multi-gestural unit are coordinated. This issue of
timing stability/variability will be considered in the next chapter by probing temporal coordination under
prosodic variations and across individuals.
4.5. Conclusion
The current study provides understanding of the individual oral and non-oral actions as well as the oral-
velum coordination associated with Korean nasal consonant production. An articulatory investigation of
such multi-gestural complex units of nasal consonants was possible due to the use of real-time MRI
speech imaging technology, which captures both oral and non-oral dynamic articulation in midsagittal
vocal tract imaging. The Korean singleton and juncture geminate nasal distinctions were critically
distinguished by oral release and velum raising durations. Moreover, the findings clarify how the sound
change process of Korean onset denasalization is realized in the articulatory domain. Weakening of nasality
in the onset position is realized in various articulatory parameters for the velum actions, but not in the
oral behavior. Lastly, different types of juncture geminate nasals are compared to test whether geminates
have identical representations and were indeed found to differ. To conclude, this study illustrates how
multi-gesture structures like nasals are articulatorily characterized to construct phonologically contrastive
linguistic units.
5. Prosodic Variability of Multi-Gesture Complexes: Nasals
5.1. Introduction
This chapter explores the question of durational and timing stability of and among the gestures composing
the multi-gesture nasal molecules (segments). Dynamics of nasal singletons and juncture geminates in
Korean are examined to understand how prosody modulates their internal gestural coordination, since
variations in prosody—such as phrase boundaries and prominence—induce intra- and inter-gestural timing
variability (Byrd & Choi 2010, Katsika 2018, Mücke & Grice 2014). For example, prosodic effects on
timing relations predict longer and less overlapped gestures in the vicinity (or scope) of large phrase
boundaries. However, such articulatory timing variation has mostly been studied for individual
constriction gestures across segments, such as oral CC sequences or CV coordination (Beňuš & Šimko
2014, Byrd 1996b, Byrd & Choi 2010, Cho 2001, Cho & Keating 2001, Katsika 2018, Marin & Pouplier
2010, Mücke 2014, Saltzman & Byrd 2000, Yanagawa 2006).
By contrast, less is understood about segment-internal intergestural timing and about the effect of
prosody on the within-segment timing relations. Previous studies suggest that gestures that comprise a
segmental molecule have a particularly high degree of cohesiveness, compared to those across segments
(Byrd 1996b, Fowler 2015, Hoole & Pouplier 2015, Kelso et al. 1986, Munhall et al. 1994).
Consequently, the tightness in coupling structures may directly influence the stability in gestural
coordination. For example, Shaw et al. (2019) measured the stability of timing by investigating how
resistant intergestural timing is to variation in individual gestures. They found that within-
segment intergestural overlap (e.g., onset-to-onset lag) exhibits a more stable pattern than intergestural
timing across segments—i.e., segment-internal timing is less affected by individual gestural duration
than is intergestural timing in consonant clusters.
Investigation of the stability in timing for multi-gesture linguistic structures provides information
on the strength in coupling relations, which in turn informs our understanding of the gestural
representation of those complex linguistic units. The current study uses prosodic phrase boundaries of
varying strengths as a probe into how malleable the coupled gestures are with regard to their intra- and
intergestural timing. In general, we explore how flexible the intergestural timing is under prosodic
variations for a given multi-gesture segmental molecule of nasal consonants with the prediction that
variation in coupling structures entails systematic differences in timing malleability. We will explore how
different molecular structures with the same atomic units (i.e., oral and velum gestures for nasals) can
have different temporal realizations.
5.1.1. Prosodic Variability of Within-Segment Timing
The basic assumption of Articulatory Phonology (AP) is that gestures—discrete units that
represent the fixed dynamic control parameters for particular vocal tract tasks—are the primitive units
forming phonological structures (Browman & Goldstein 1989, 1992). Gestures are basic pre-linguistic
units whose systematic organization into overlapping gestural arrangements—i.e., gestural constellations
or molecules (Browman & Goldstein, 1992)—is the basis for phonological contrast. Within the
framework of AP, dynamical organizations of the syllable structure and higher prosodic structures have
been proposed (C-center effect: Browman & Goldstein 1988; Coupled Oscillator Model of Syllable
Structure: Browman & Goldstein 1995, Goldstein et al. 2009; π-gestural approach for phrase boundary
effects: Byrd & Saltzman 2003; μ-gestural approach for stress and prominence: Saltzman et al. 2008). In
contrast to these larger phonological units, however, the dynamic structure of segment-sized units has
been less explicitly discussed. The basic tenets of AP do not reject the notion of a segment, but at the
same time, under those tenets the segmental unit has generally been assumed rather than tested and
directly demonstrated. There is also no overt distinction between segments as a distinct dynamical unit
active in speech production and segments as being a convenient term to refer more fundamentally to a
gestural molecule, but one that is relatively small and local in some way.
In articulatory terms, we view segments to be multi-gesture structures with a particularly high
degree of cohesiveness (Byrd 1996a, Hoole & Pouplier 2015), with strong intergestural glue between
gestures (Saltzman et al. 1998, 2000). The gestures comprising a segment are claimed to be tighter in their
coordination than those spanning segments (Byrd 1996a, Fowler 2015, Kelso et al. 1986, Munhall et al.
1994).
If some combinations of gestures are more stable in relative timing than other overlapping
gestures, this tight coupling implies that the stably coupled gestures are more likely to show up as a unit
in speech.
Foundationally, in Articulatory Phonology, phonological contrast is characterized as arising from
varying parameters and organizations of gestures. Contrasts in AP can involve the same articulatory
system moving in different characteristic ways (Goldstein 1989). For example, a contrast between an
alveolar stop and an alveolar fricative is made by changing the value of the constriction degree of the
Tongue Tip gesture, and the contrast between alveolar and palato-alveolar fricatives is encoded in
different values for the constriction location of the Tongue Tip gesture. Constriction location and degree
are major vocal tract variable parameters, as changing these parameters with the same active tract variable
is often a source of phonological contrast. Additionally, however, contrast can be made by modifying the
timing patterns; for instance, pre- and post-nasalized consonants differ in their relative timing between
oral and velum gestures. In sum, utterances with the same gestures can contrast with one another in how
the gestures are temporally organized (Browman & Goldstein 1992). Thus, contrastive coordination
patterns can exist in a way specific to certain gestural molecules.
The question then follows: how is this timing or phasing variable incorporated in the cognitive
representation of a segment, and is such within-segment timing specification or representation
qualitatively or quantitatively the same or different from across-segment timing? In a phrase repetition
task with perturbation (Cummins & Port 1998, Tilsen 2009), there is a general tendency to phase the
onsets of content words to three points in the cycle aligned with the metronome tones: one third, half, and
two thirds of the cycle. This shows some preferred natural phasing of the rhythmic oscillatory system
corresponding to the phrase, so that timing is determined by, or at least sensitive to, the structure of the
prosodic unit itself. The coupled oscillator model of syllable structure, for example, predicts in-phase
timing relations in the onset position and anti-phase timing relations in the coda position (Nam et al.
2009, Goldstein et al. 2009).

Footnote 20: To understand whether a set of gestures forms a linguistically significant unit, Browman and Goldstein (1990) suggest two approaches: i) first assume some particular unit and then search for gestural correlates, and ii) parse articulatory movements and establish gestural units based on cohesiveness and variability.
On the other hand, relative phase relations may be directly determined by
gestural planning systems in a way specific to certain consonantal and vocalic segments, rather than by
naturally preferred phasing based on the coupled oscillator system. For example, a goal for the successful
production of a nasal consonant requires a systematic temporal coordination of the oral and the velum
gestures that will ensure some interval of a lowered velum posture while the oral constriction is formed.
If such specific higher-level or superordinate goals
exist for a segment (a segmental gestural molecule
as a linguistically significant unit is suggested in Browman & Goldstein [1990] and Mattingly [1990]), we
would expect, due to this co-dependency between the gestures, to find a tighter and more stable coordination
among the cooperative gestures that comprise a segment than for the coordination in larger linguistic
units, including larger gestural molecules.
Goldstein (1989) considers gestural contrastive organizations on the basis of the Quantal Theory.
He suggests that a set of stable regions within the articulatory-acoustic space arise from different
organizations of gestures, and that these regions form the basis of possible phonological contrasts in
languages. When two gestures are phased by synchronizing some point in the two gestures’ cycles, there
are only a few types of phasing relations that yield contrast in the articulatory-acoustic states. For two oral
gestures A and B, there can be minimal or partial overlap with either A or B preceding the other, but the
total overlap of the two gestures would be unstable (because the production of one articulator is impeded
by another, or the acoustic consequences of one would completely mask the other) and is thus banned from
possible contrastive organizations. On the other hand, for one oral gesture and one velum or laryngeal
gesture, total overlap as well as minimal and partial overlap generate stable articulatory states. For
example, contrasts can be made by phasing the velum gesture slightly before (pre-nasal stops), slightly after (post-
nasal stops), or simultaneous with (nasal stops) the oral gesture.

Footnote 21: This timing structure can be modeled with double-well potentials by employing nonlinear dynamical functions with potential minima at 0 and 180 degrees for in-phase and anti-phase relations, respectively (Iskarous 2017, Nam et al. 2009).

Footnote 22: In English, coda nasals have a different velum-oral coordination before [t] or [s], in that "the maximum velic lowering is synchronized with the onset of tongue-tip raising" (Goldstein 1994: 261). Still, there is an interval during which the velum is lowered during the oral constriction.

Footnote 23: The term 'superordinate goal', borrowed from psychology, is used to indicate tasks that require the cooperation of two or more gestures in achieving a single goal (see also Chapter 3 for the first mention of this term).
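The double-well phasing potential mentioned in footnote 21 can be illustrated numerically. A minimal sketch, assuming unit coefficients on the two cosine terms (illustrative values, not taken from the cited work):

```python
import numpy as np

# Toy double-well phasing potential: V(phi) = -a*cos(phi) - b*cos(2*phi).
# With a = b = 1 the local minima fall at 0 and 180 degrees, i.e., the
# in-phase and anti-phase attractors. Coefficients are illustrative assumptions.
phi = np.deg2rad(np.arange(0, 360))
V = -np.cos(phi) - np.cos(2 * phi)

# Locate local minima on the circular (wrap-around) grid of whole degrees.
minima = [d for d in range(360)
          if V[d] < V[(d - 1) % 360] and V[d] < V[(d + 1) % 360]]
print(minima)  # -> [0, 180]
```

The in-phase well at 0 degrees is deeper than the anti-phase well at 180 degrees with these coefficients, consistent with in-phase coordination being the more stable mode.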
Let’s turn from the role of contrastive phasing relations to stability in phasing relations as critical
in constituting contrastive segmental structures. In Maddieson’s (1998) report, doubly-articulated plosives
and fricatives are considered a phonological unit corresponding to a segment since the duration of two
oral gestures is not significantly different from each oral gesture’s duration. Moreover, Maddieson (1998,
p. 373) argues that labial-velar stops ([k͡p]) are not a sequence of distinct segmental articulations, but "a
coordinated single entity, with very little timing variation." Velar+labial clusters (/k+p/), on the other
hand, show more variation in their timing. Oral gestures in doubly-articulated segments are not entirely
in-phase with each other, but nearly simultaneous with a slight offset. This small offset in time is necessary
for the two distinct sounds to be heard (which would be part of the gestural planning goal of this
molecule, related to perceptual recoverability), whereas a complete overlap of the two gestures would mask
one sound with the other. Furthermore, this fixed relative timing between gestures reduces the risk of
excess variability (Maddieson 1998).
If the tightness in coordination of gestures is what determines the nature of multi-gestural
segmental complexes, this can be understood as coupled gestures being attracted to a phase-relation target
for a given segment. However, the segmental phase target may not be limited to one single point in the
cycle, as multi-gestural structures of a segmental gestural molecule have been observed to show distinct
phasing depending on the syllable position. For a given segment, there may be an asymmetry in target
phasing as a function of syllable position/structure.
In the coupled oscillator model of syllable structure, it is postulated that gestures with narrower
constrictions (more consonant-like and sometimes more anterior gestures) occur more
peripherally in the syllable, while gestures with wider constrictions (vocalic gestures) are closer to the
vocalic syllable center or peak (Browman & Goldstein 1995, Sproat & Fujimura 1993). That is,
gestures in the syllable onset are phased simultaneously (and realized with small offsets when competing
simultaneous phasing targets are resolved), whereas gestures are phased sequentially with one another in the
coda (Browman & Goldstein 1988, 1990, Byrd 1996b, Gick 2003, Krakow 1989, Maddieson 1998).
Moreover, the simultaneous onset timing generally exhibits more stability than the sequential coda timing
(Byrd 1996b, Goldstein et al. 2009, Nam et al. 2009). While this stability has been studied for onset and
coda consonant clusters, gestural molecules of a segment-sized granularity have received more limited
attention. The relevant examples are Tongue Tip and Tongue Body gestures associated with liquids, and
oral and velum gestures that comprise nasals, the production of which have been examined for American
English but rarely for other languages.
There are, however, language-specific variations that fail to adhere to these general
onset and coda phasing relations. For example, gestures associated with coda nasals can show an in-phase
relation in Australian Aboriginal languages such as Arrernte (Tabain et al. 2004), and Western Canadian
English speakers' production of liquids and glides may show a more sequential
timing in the onset compared to the coda (Gick 2003, Gick et al. 2006). Thus, taken together, the
general coupled oscillator model of syllable structure and the language-specific superordinate goal of the
segmental gestural planning system can be thought to interact with each other, resulting in variation in
target phase relations. Segment-internal timing patterns with respect to syllable positions will be
examined in the current chapter.
5.1.2. Prosodic Variability of Singletons and Geminates
By comparing whether and how different types of geminates pattern in their articulation under prosodic
modulation, we can get a better picture of the phonological representation of temporal stability of co-
produced, overlapping gestures. It is well established that phrase boundaries of increasing strength locally
perturb gestures such that they become slower, larger, and less overlapped (see review in Byrd &
Krivokapić 2021). In the study reported below, different strengths of phrase boundaries are examined
with the working assumption that the relative timing of pairs of gestures specified with more stable
phonological coordination will be less affected by such variations in prosodic conditions. For instance, a
parallel to this in the domain of global speech tempo is that in a sequence of voiceless consonants such as
in kiss Ted, the distance between two glottal peak openings (for coda /s/ and onset /t/) decreases as speech
rate increases (Munhall & Löfqvist 1992). At slow rates, two opening movements of the glottis occur and
there is a trough between the two openings, and at fast rates, a single movement is found with similar
durations for the abduction and adduction phases. Based on familiar results of prosodic effects on timing,
we expect longer and less overlapped gestures at stronger phrase boundaries. Given the many findings of
this behavior for oral [C#C] sequences (Byrd & Choi 2010, Cho 2001, Yanagawa 2006), it is expected
that the velum and the oral gestures of a geminate spanning a boundary exhibit this common prosodic
behavior of the boundary lengthening effect.
That said, as the degree of overlap in geminates is susceptible to change depending on the
prosodic structure, singletons and derived (concatenated & assimilated) geminates are predicted to show
noticeable differences in the stability of timing across prosodic conditions. Geminates, regardless of being
lexical or post-lexical, are generally articulatorily represented with sequences of two identical overlapped
gestures rather than one elongated gesture.
For instance, in their EMA study, Zeroual et al. (2015) find
that geminates and singletons do not differ in their opening and closing movement, but only in their
articulatory constriction (plateau) duration, suggesting that the relative acceleration/deceleration phases
are comparable between geminates and singletons and that the gestural activation duration and the
intergestural timing patterns distinguish the two categories (see also Byrd 1995). Between different
derived geminates (concatenated & assimilated), it is predicted that when two identical gestures in
geminates are pulled apart due to high level prosodic boundaries, geminates with a more flexible
coordination will tend to be affected more by prosodic variation. As a consequence, they are more likely to
be de-geminated (i.e., to have their two gestures split apart) than those with a more stable coordination structure.

Footnote 24: See Gafos (2002) and Hagedorn et al. (2011b) for other possible accounts of the articulatory representation of geminates; e.g., geminates may be distinguished from singletons by having a longer activation duration and/or a greater constriction degree.
The coordination structures of distinctive phonological units can be represented by distinct
combinations of intergestural coupling relations. For example, structures with multiple competitive
couplings are predicted to result in shorter duration and less variability than those with a single non-
competitive coupling (Nam 2011). Nam (2011), for instance, found greater timing stability among gestures
linked or coupled in a tripartite structure than in a dual structure. Accordingly, he proposes that (lexical)
geminates and singletons crucially differ in their coupling structures: geminates have no competitive
coupling, while singletons have competing coupling relations. The predicted coupling structure for
singletons is that both the closure and the release of the oral gesture are coupled in-phase with the
vowel—creating a competitive structure—whereas for geminates, only the oral release gesture is coupled
in-phase with the vowel. Thus, geminates show a more variable coordination and a longer duration than
singletons, which have a tighter and shorter (closer in time) coordination. As such, the investigation of
the stability in timing allows us to assess coupling structures among gestures, thereby providing
understanding of the representation and organization of multi-gestural structures.
Another example of how different multi-gestural structures give rise to specific predictions for
intergestural timing is illustrated in Gafos et al. (2014) in their study on simplex and complex onsets. For
languages like English, complex onsets are permitted (e.g., [kɾaɪ] ‘cry’ or [glu] ‘glue’), whereas in
languages like Moroccan Arabic, only simplex onsets are allowed (e.g., [k.ra] ‘rent’ or [g.lih] ‘he grilled,’
where ‘.’ marks a syllabic boundary; Dell & Elmedlaoui 2002, Gafos et al. 2014). They describe these
language-specific phonotactics using differences in the coordination structures: for complex onsets, two
onset consonants are both coupled to the vowel, creating a stable timing relation between the consonants
and the vowel, whereas for simplex onsets, only the last consonant is coupled to the vowel, therefore
previous consonants are less stably coordinated with the vowel. Temporal coordination and stability
patterns can be derived from these coupling structures, and the postulated coupling relations of a gestural
molecule can be evaluated using experimental articulatory data.
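The idea that more (or stronger) coupling links yield more stable relative timing can be illustrated with a toy relaxation sketch. This is my own illustration under simplifying assumptions, not a model from Nam (2011) or Gafos et al. (2014): a relative phase is attracted to a target, and a larger restoring gain k (standing in for coupling strength or number of links) leaves a smaller residual deviation after a fixed settling time.

```python
import math

def settle(phi0_deg, k, target_deg=0.0, dt=0.01, steps=500):
    """Relax a relative phase toward its target under a -cos coupling potential.

    phi0_deg: initial deviation from target (degrees); k: restoring gain.
    Returns the residual deviation (degrees) after steps*dt time units.
    """
    phi = math.radians(phi0_deg)
    target = math.radians(target_deg)
    for _ in range(steps):
        phi += -k * math.sin(phi - target) * dt  # Euler step on d(phi)/dt
    return math.degrees(phi - target)

# Apply the same set of initial perturbations under weak vs. strong coupling
# and take the worst-case residual deviation as an index of timing variability.
perturbations = [-30, -15, 15, 30]  # degrees
spread_weak = max(abs(settle(p, k=1.0)) for p in perturbations)
spread_strong = max(abs(settle(p, k=3.0)) for p in perturbations)
print(spread_weak, spread_strong)  # stronger coupling -> smaller residual spread
```

The point of the sketch is only the ordering: under identical perturbations, the more strongly coupled system settles closer to its target phase, which is the qualitative signature that the timing-stability comparisons in this chapter exploit.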
5.1.3. Predictions Assessed in the Present Study
To assess within-segment intergestural timing stability, this chapter examines how rigid velum-oral
timing in nasals is, compared to individual gestural properties such as gestural duration and magnitude,
across prosodic variations. Nasal consonant production data collected using real-time MRI is investigated
to analyze the oral and the velum timing in multi-gesture molecules. Previous accounts of within-segment
timing report cohesiveness in intergestural coordination. We suggest that the within-segment gestures
cooperatively achieve a superordinate goal and that, due to this, tight (i.e., highly cohesive) coupling
structures may exist and cause within-segment timing to resist being malleable to local prosodic
modulations. A superordinate goal for nasal consonants, whether perceptual or cognitive in nature, would
necessarily require a certain interval of nasal airflow during the formation of the oral constriction. Thus,
the following hypothesis is postulated:
HYPOTHESIS A. For within-segment gestures, intergestural timing is more stable (less variable)
across prosodic contexts than gestural duration and magnitude.
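As one concrete way this hypothesis could be operationalized (my own sketch; the dissertation's exact stability metric may differ), the relative variability of each measure can be compared across prosodic conditions with a coefficient of variation. The numbers below are invented for illustration and are not results from this study:

```python
import statistics

def coefficient_of_variation(values):
    # CV = sample SD / mean; only meaningful for positive-valued measures.
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical per-condition means (ms) for one speaker,
# across the four prosodic conditions: Wd, AP, AP+focus, IP.
nasality_lag = [82, 85, 84, 88]        # intergestural timing measure
velum_duration = [190, 220, 245, 310]  # single-gesture duration measure

cv_lag = coefficient_of_variation(nasality_lag)
cv_dur = coefficient_of_variation(velum_duration)
print(cv_lag < cv_dur)  # Hypothesis A predicts True: the timing lag varies less
```

Because duration and lag live on different scales, a unitless measure such as the CV (rather than raw standard deviation) is needed to make the cross-measure comparison fair.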
Next, we turn to the question of how changes in phonological processes are reflected in
articulatory variation. For example, the phonological sound change process of denasalization or nasal
weakening is commonly observed in Korean nasal consonants in the onset position (Ahn 2013, Chen &
Clumeck 1975, Cho & Keating 2001, Lee & Kim 2007, Kim 2011, Yoshida 2008, Yoo & Nolan 2020;
see also Ch. 4). The weakening of onset nasals tends to have greater effects at a higher prosodic boundary
and for younger generations (Cho & Keating 2001, Yoshida 2008, Yoo & Nolan 2020). For example, in
Yoshida (2008), oral and nasal sounds were recorded separately for Korean nasals using a nasometer, and
it was found that the nasality of the onset weakens, although it does not completely cease, at higher prosodic
boundary conditions. He reports that nasal sounds are realized weakly at the oral closure release, which
may also be accompanied by less velum lowering.
111
In parallel with the acoustic result, it is expected that an articulatorily-indexed nasality for onset
nasals will also become weaker at higher prosodic boundaries. The degree of nasality can be articulatorily
measured by the duration and magnitude of the velum lowering as well as the timing between oral and
velum gestures. Specifically, the oral closure onset to the velum raising onset lag is analyzed to quantify
the consonant nasality interval, as this lag indicates the duration of oral constriction during which the
velum is actively lowered. For coda nasals, however, such nasal weakening is not observed. The reported
nasal weakening at higher boundaries for onset nasals indicates their susceptibility to prosodic variation.
For coda nasals, unlike onset nasals, this specific oral-velum temporal lag indexing nasality would be expected to be
relatively stable across prosodic conditions. Using prosodic phrase boundaries of varying strengths as a
probe for how malleable temporal lags can be, the intergestural timing stability in onsets and codas is
compared in the present study across different levels of prosodic boundaries. We predict that different
degrees of nasality characterizing Korean onset and coda nasals are correlated with differences in timing
variability. Due to the on-going phonological sound change process of onset nasal weakening in Korean,
the following hypothesis is entertained:
HYPOTHESIS B. Onset nasals will show a more variable timing than coda nasals at boundaries of
varying prosodic strengths.
As for the juncture geminates in Korean, we compare two derived geminates: concatenated
(/n#n/) and assimilated (/t#n/) nasal geminates. Gemination frequently occurs at a morpheme boundary,
but when the boundary is strong enough that the gestures are de-aggregated, concatenated geminates will
become a sequence of a coda and an onset nasal, and assimilated geminates a sequence of an oral stop coda and a
singleton nasal onset. That is, when focusing on the initial portion of a juncture geminate nasal molecule,
concatenated geminates may have properties of a geminate type organization or a coda type organization
depending on prosodic (and other) conditions. On the other hand, assimilated nasal geminates could
exhibit characteristics that range from a geminate type pattern to an onset type organization. As illustrated
112
in the previous chapter, geminates generally show a timing pattern that resembles that of coda
organization. Therefore, concatenated geminates would consistently have a coda type timing pattern,
whereas assimilated geminates would vary in their temporal coordination (between an onset and a coda
timing) as a function of prosodic conditions. In sum, as the degree of overlap in geminates is susceptible
to change depending on the prosodic structure, concatenated and assimilated geminates might show
noticeable differences in stability in timing across varying prosodic conditions.
HYPOTHESIS C. Concatenated geminate nasals show a more stable oral-velum timing relation
compared to assimilated geminate nasals (concatenated /n#n/ vs. assimilated /t#n/).
The gestural schemas for different types of derived geminates shown in Chapter 4 are repeated in (1).
(1) [Gestural coupling schemas for the derived geminates, repeated from Chapter 4: (a) concatenated geminates; (b-i) and (b-ii) alternative coupling structures for assimilated geminates, linking the TT and VEL gestures]
Given these postulated coupling structures, variability in articulatory timing can be predicted (see Ch. 3).
Comparing between the concatenated geminate structure in (1a) and the assimilated geminate schema in
(1b-i), the coupling structure that has more direct paths (1b-i) is predicted to have tighter coordination (in
[1a], no direct path is in place between the first TT and the second VEL gestures and also between the
first VEL and the second TT gestures). On the other hand, if the assimilated geminate nasals are more
loosely coupled as in (1b-ii) with fewer number of coupled links than in (1b-i), the relative timing would
be more variable. By observing how concatenated and assimilated juncture geminates differ in their
propensity for malleability (e.g., de-gemination) as a function of varying prosodic conditions, we can
make inferences about the coupling structures and the representation of multi-gestural segmental molecules.
We explore these hypotheses by analyzing timing pattern stability across structural and temporal
variation. This work on velum-oral coordination examines the temporal characteristics of syllable
structure and of consonant length, using prosodic perturbation as a tool for analyzing timing stability and
variability for a multi-gestural phonological structure.
5.2. Methods
The same dataset used in Chapter 4 is analyzed with regard to timing variability here in Chapter 5. See
4.2. for details on methods including subjects, materials, data acquisition, and data analysis. Recall that
data were collected from five Seoul Korean native speakers (Speaker A-E). The subjects produced
various nasal sequences: onsets (/#n/), codas (/n#p/, /n#t/), concatenated geminates (/n#n/), and assimilated
geminates (/t#n/). There were four different prosodic conditions: Word (Wd), Accentual Phrase (AP),
Accentual Phrase with focus implementation (AP+focus), and Intonational Phrase (IP). (See Table 4.3.)
The target nasal sequences were placed spanning across these different prosodic boundaries. These three
levels of boundary conditions and additionally a prosodic condition with a focal prominence at an AP
boundary are either pooled or separately investigated to quantify stability/variability under prosodic
modulations (cf. only Wd and AP conditions were used in Chapter 4). The following summarizes
measurements used for the current data analysis.
(2) Measurements
• Oral & velum gestural duration (from onset to offset of movement)
• Oral magnitude (pixel intensity at maximal constriction)
• Velum magnitude (diagonal absolute displacement of velum centroids)
• Onset lag (the interval from velum lowering onset to tongue tip [TT] raising onset)
• Onset-to-target lag (the interval from TT raising onset to velum lowering target)
• Consonant nasality lag (the interval from TT raising onset to VEL raising onset)
(positive lags mean that the first gesture's landmark is achieved earlier than the second's)
• Relative onset lag (TT raising onset relative to VEL duration)
• Relative nasality lag (VEL raising onset relative to TT duration)
In addition to absolute temporal lags, relative temporal latencies—e.g., the percentage of the way
through one gesture at which movement onset for the other gesture is initiated—are calculated to examine
how early or late one gesture’s movement onset occurs within the activation interval of the other gesture.
(See similar measures in Byrd 1996b, Byrd & Choi 2010, Chitoran et al. 2002, Pouplier et al. 2017). In
the current study, two relative lags are computed: a) relative onset lag indicating the percentage of the
way through the velum lowering movement at which TT constriction initiates and b) relative nasality lag
corresponding to when the VEL starts to raise relative to the TT constriction interval. Figure 5.1 displays
both absolute and relative timing measures used for data analysis.
Figure 5.1. Temporal lags between TT and VEL gestures
(a) onset lags from VEL lowering onset to TT onset, (b) onset-to-target lags from TT onset to VEL
lowering target, (c) nasality lags from TT onset to VEL raising onset, (d) relative onset lags measuring the
timing of TT onset relative to VEL movement cycle (from onset to raising), (e) relative nasality lags
measuring the timing of VEL raising onset relative to TT movement cycle (from onset to release).
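The absolute and relative lag measures defined above and illustrated in Figure 5.1 reduce to simple arithmetic over gesture landmark times. The sketch below is illustrative only; the function name and the millisecond landmark values are hypothetical stand-ins, not drawn from the dissertation's analysis pipeline.

```python
# Sketch of the TT/VEL timing measures, given gesture landmark times in ms.
# All names and values here are illustrative, not from the actual analysis.

def lag_measures(vel_onset, vel_target, vel_raise_onset,
                 tt_onset, tt_release_onset):
    """Return the five timing measures for one nasal token."""
    onset_lag = tt_onset - vel_onset              # (a) VEL lowering onset -> TT onset
    onset_to_target_lag = vel_target - tt_onset   # (b) TT onset -> VEL lowering target
    nasality_lag = vel_raise_onset - tt_onset     # (c) TT onset -> VEL raising onset
    # (d) TT onset as a percentage of the VEL movement cycle
    #     (from lowering onset to raising onset)
    rel_onset_lag = 100 * (tt_onset - vel_onset) / (vel_raise_onset - vel_onset)
    # (e) VEL raising onset as a percentage of the TT constriction interval
    #     (from movement onset to release onset)
    rel_nasality_lag = 100 * (vel_raise_onset - tt_onset) / (tt_release_onset - tt_onset)
    return onset_lag, onset_to_target_lag, nasality_lag, rel_onset_lag, rel_nasality_lag

# Example: a coda-like token in which TT constriction starts
# halfway through the velum movement cycle
print(lag_measures(vel_onset=0, vel_target=120, vel_raise_onset=200,
                   tt_onset=100, tt_release_onset=250))
```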
As this chapter includes data from higher prosodic conditions (i.e., IP boundary and focus conditions), 31 tokens were necessarily omitted due to a lack of gemination, e.g., having de-aggregated Tongue Tip (TT) gestures with a noticeable trough between the two gestures, in the derived geminate nasals (/t#n/ & /n#n/). In addition, prolonged low velum postures of 500 ms or more were considered to be indicative of a pause or breathing, and 18 such tokens were also excluded from data analysis.[25] Thus, including only the nasal productions with identifiable VEL and TT gestures, a total of 736 tokens were used in the data analysis.
Statistical analyses include linear mixed effects regression models with subjects and items as random effects and prosodic conditions as fixed effects. Additionally, Pearson correlation coefficients (r) are used to measure the strength and direction of the linear relationship between two variables. Three statistical tests are performed to assess differences in temporal variability: modified signed likelihood ratio tests (M-SLRT) and asymptotic tests are used to test the equality of coefficients of variation (CoV),[26] and Levene's tests for homogeneity of variance are additionally used (Frank & Althoen 1995, Krishnamoorthy & Lee 2014, Marwick & Krishnamoorthy 2019). To supplement the variability tests and provide a thorough comparative view of variability, interquartile ranges (IQR) are additionally reported.
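As a rough illustration of the variability measures just described, the sketch below computes the coefficient of variation and runs Levene's test with SciPy on fabricated lag samples. The M-SLRT and asymptotic CoV-equality tests reported in this chapter come from the R package cvequality (Marwick & Krishnamoorthy 2019) and are not reimplemented here; all data values below are placeholders, not the dissertation's data.

```python
# Illustrative sketch: CoV (RSD), Levene's test, and IQR on fabricated
# lag samples. Only the Levene's test portion mirrors the actual tests;
# the M-SLRT/asymptotic CoV tests were run in R (cvequality) instead.
import numpy as np
from scipy import stats

def cov(x):
    """Coefficient of variation (a.k.a. RSD): sample std / mean."""
    x = np.asarray(x, dtype=float)
    return np.std(x, ddof=1) / np.mean(x)

rng = np.random.default_rng(1)
# Fabricated onset-lag samples loosely shaped like the reported means/SDs
onset_lags_onset_nasals = rng.normal(137.9, 71.1, 60)   # more variable group
onset_lags_coda_nasals = rng.normal(105.7, 28.8, 60)    # less variable group

print("CoV (onset nasals):", cov(onset_lags_onset_nasals))
print("CoV (coda nasals): ", cov(onset_lags_coda_nasals))

# Levene's test: H0 = the two groups have equal variances
W, p = stats.levene(onset_lags_onset_nasals, onset_lags_coda_nasals)
print(f"Levene W = {W:.2f}, p = {p:.4f}")

# IQR as an outlier-robust supplementary variability measure
for name, x in [("onset", onset_lags_onset_nasals),
                ("coda", onset_lags_coda_nasals)]:
    q75, q25 = np.percentile(x, [75, 25])
    print(name, "IQR:", q75 - q25)
```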
5.3. Results
5.3.1. Gestural Actions vs. Inter-Gestural Timing
In this section, gestural patterning for Korean coda nasals (/n#p/, /n#t/, and /n#n/) is investigated with
respect to different prosodic boundary conditions (i.e., Wd, AP, AP+focus, & IP boundary). First,
correlations between gestural duration, magnitude, and intergestural timing are examined to test whether
individual gestural actions affect intergestural timing or the timing is more or less stable under variations
in an individual gesture's duration and magnitude. Second, we test whether duration, magnitude, and timing in multi-gestural nasal structures undergo prosodic boundary and/or prominence effects.

[25] At higher prosodic boundaries, some geminate sequences were pulled apart and had two separate TT gestures (IP boundary: 20 tokens; AP boundary with focus: 3 tokens; cf. AP and Wd boundaries: none). An additional 8 tokens from the assimilated geminate production (/t#n/) were removed from Speakers A and C due to a lack of nasal gemination, indicated by the late start of VEL lowering relative to TT constriction formation. There were 18 tokens with nasals produced with velum duration longer than 500 ms; all were from the highest IP boundary condition (except one, from the AP+focus condition) and from Speakers A and C.
[26] Also called relative standard deviation (RSD = standard deviation divided by the sample mean).
5.3.1.1. Correlations between duration, magnitude, and timing
In this section, z-scored data are used to report the pooled results across subjects and prosodic conditions.
For both TT and VEL gestures, there is a significant positive correlation between gesture duration and
gesture magnitude (TT: R = 0.21, VEL: R = 0.55). That is, longer duration is correlated with larger
magnitude (Figure 5.2).
Figure 5.2. Correlation graphs (z-scored within speaker) for duration & magnitude (TT/VEL)
For individual speakers, the correlation between the oral TT gesture's duration and magnitude is significant for all but Speaker A (Speaker A: p = .11; Speakers B & C: p < .001*; Speakers D & E: p < .01*). The positive correlations found between VEL duration and magnitude are statistically significant for all individual speakers (all at p < .001).
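The within-speaker z-scoring and pooled Pearson correlation used in this section can be sketched as follows; the data frame, column names, and simulated values are hypothetical stand-ins for the articulatory measurements.

```python
# Sketch of within-speaker z-scoring followed by a pooled Pearson
# correlation, on fabricated duration/magnitude data. Column names
# and values are illustrative only.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
rows = []
for speaker in "ABCDE":
    dur = rng.normal(150, 30, 40)             # fabricated VEL durations (ms)
    mag = 0.5 * dur + rng.normal(0, 10, 40)   # magnitude correlated with duration
    rows.append(pd.DataFrame({"speaker": speaker, "vel_dur": dur, "vel_mag": mag}))
df = pd.concat(rows, ignore_index=True)

# z-score each measure within speaker before pooling across speakers
z = lambda s: (s - s.mean()) / s.std(ddof=1)
df["vel_dur_z"] = df.groupby("speaker")["vel_dur"].transform(z)
df["vel_mag_z"] = df.groupby("speaker")["vel_mag"].transform(z)

# Pooled correlation over the normalized data
r, p = stats.pearsonr(df["vel_dur_z"], df["vel_mag_z"])
print(f"pooled r = {r:.2f}, p = {p:.3g}")
```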
Next, the pattern of temporal lags between velum gesture onset and tongue tip gesture onset (i.e.,
onset lags) with respect to the earlier nasal component gesture (i.e., VEL gesture) is investigated. It is
predicted that the intergestural lag within segments, with a stable coordination, is generally unaffected by
individual gestural duration and magnitude. The correlation results, however, show that there is a
significant positive correlation between the onset lag and VEL duration (R = 0.47) as well as between the
onset lag and VEL magnitude (R = 0.35; Figure 5.3). The onset of the oral TT component gesture is
delayed as the nasal VEL component lengthens or increases in magnitude. Individual results show that the
positive correlation is exhibited for all speakers for onset lag and VEL duration (all at p < .001) and for
onset lag and VEL magnitude (Speakers A & E: p < .01*; Speakers B, C, & D: p < .001*).
Figure 5.3. Correlation graphs (z-scored) for onset lag vs. VEL duration & magnitude
Another measure of intergestural lag—the interval from the tongue tip onset to the velum
lowering target (i.e., TT onset to VEL target)—is examined. In contrast to onset lag, this temporal lag
includes partial intervals of both VEL and TT gestures, so we test how sensitive this timing relation is
with respect to changes in VEL gestural actions and TT actions, respectively. Unlike the results for the
onset lags, there is no correlation between the oral onset to velum target lag and the velum actions, both
for duration (Figure 5.4: left) and for magnitude (Figure 5.4: right). There is some individual speaker variation: Speakers A and C exhibit a negative correlation between the onset-to-target timing and VEL duration, while the other speakers' correlations do not reach significance. The relation between the onset-to-target timing and VEL magnitude is not significant for any speaker.
Figure 5.4. Correlation graphs (z-scored) for onset-to-target lag vs. VEL duration & magnitude
For the onset-to-target lag and the TT gestural actions, again, no correlation (positive or negative) is found between timing and TT duration (Figure 5.5: left) or TT magnitude (Figure 5.5: right). Individual speakers' correlations between the oral-onset-to-velum-target lag and TT duration vary (Speakers A & C: negative correlation; Speakers B & D: no correlation; Speaker E: positive correlation), but no speaker shows a significant correlation between the onset-to-target lag and TT magnitude.
Figure 5.5. Correlation graphs (z-scored) for onset-to-target lag vs. TT duration & magnitude
The correlation results indicate that although VEL to TT onset-to-onset timing increases with the
action of the first velum component of the nasal consonant, TT onset to VEL target lag remains basically
stable across variations in oral and velum duration and magnitude. In the next section, we will observe
how the gestural actions and intergestural timing pattern differently under the influence of prosodic
modulation.
5.3.1.2. Gestural actions and timing across prosodic modulations
The z-scored data (normalized for each speaker) are analyzed using linear mixed effects models with
subjects and items as random effects and prosody (three phrase boundary conditions & a focus condition)
as a fixed effect.
There is a main effect of prosody on TT duration (F(3,206.12) = 34.9, p < .001*), and Tukey's post-hoc pairwise comparisons show that TT durations at the IP boundary and under focus are significantly lengthened compared to TT durations at the smaller Wd and AP boundaries (Figure 5.6: left). Prosodic boundary and prominence effects are also found for TT magnitude (F(3,196.96) = 34.287, p < .001*; Figure 5.6: right). Post-hoc tests indicate that TT magnitude increases at higher prosodic boundaries.
Figure 5.6. TT duration and magnitude at boundaries (Wd, AP, IP) and under focus (AP+focus)
Turning next to the prosodic effects on the velum gesture, there is a main effect of prosody for
both duration and magnitude (VEL duration: F(3,203.88) = 44.368, p < .001*; VEL magnitude:
F(3,216.09) = 11.65, p < .001*; Figure 5.7). Specifically, duration increases with focus implementation
and increases significantly more at the largest phrase boundary (IP). Similarly, VEL magnitude becomes
greater under focus, and the VEL gesture moves the farthest away from its raised posture at IP boundary.
These results show that both the oral and the non-oral velum component gestures are sensitive to prosodic
modulations.
Figure 5.7. VEL duration and magnitude at boundaries (Wd, AP, IP) and under focus (AP+focus)
Finally, we examine whether intergestural timing is generally stable across prosodic variations. There is a main effect of prosody on onset lags (F(3,197.83) = 6.089, p < .001*), with pairwise comparisons revealing that onset lags at AP boundaries are shorter than the lags at higher IP boundaries and under focus (Figure 5.8: left). However, the smallest Wd boundary condition and the largest IP boundary condition are not distinguished by onset lags, nor are the Wd boundary and the focus condition.
Additionally, no effect of prosody is found on TT onset to VEL target intergestural lag (F(3,189.55) =
1.162, p = .326; Figure 5.8: right). This indicates that this specific TT onset-to-VEL target timing does
not vary significantly across different levels of prosodic phrase boundaries and under the influence of
prominence.
Figure 5.8. Onset lags and TT onset-to-VEL target lags at boundaries (Wd, AP, IP)
and under focus (AP+focus)
The current findings indicate that while individual gestural actions for multi-gesture complexes—
whether oral or velum—are subject to prosodic modulations, there is a crucial intergestural timing
(namely TT onset to VEL target lags) that has stable coordination and is resistant to variations in prosodic
conditions (Hypothesis A).
5.3.2. Variability in Timing: Onset vs. Coda Nasals
In this section, two intergestural lags are examined as an index of intergestural timing stability: first, the
onset-to-onset lag, which is the interval from VEL lowering onset to TT movement onset, and second, the
consonant nasality lag, indicated by the time from TT movement onset to VEL raising onset. Intergestural
timing in onset nasals (/#n/) and coda nasals (/n#p/, /n#t/, /n#n/)[27] is investigated, based on the prediction that the lag measures are more variable in onset nasals compared to intergestural lags in coda nasals (Hypothesis B), because Korean onset nasals are subject to a phonological sound change process of denasalization. The results from four out of five speakers (Speakers A-D) are presented. Speaker E is excluded because only a few tokens with quantifiable VEL gestures are observed in Speaker E's production of onset nasal consonants (/#n/). We examine whether the phonological structure is reflected in articulatory timing variability, in addition to having considered patterns of intergestural timing in the preceding chapter.

[27] Here, concatenated juncture geminate nasals (/n#n/) are also coded as coda nasals for statistical analyses because they are composed of a coda and a subsequent onset nasal, and these temporal lags consider specifically the initial portions ('left edge') of the sequence (cf. Chapter 4.3.1.3).
5.3.2.1. Onset timing
Onset and coda nasals have similar mean VEL onset-to-TT onset lags (onset /#n/: mean lag 137.9 ms [st.dev: 71.1 ms]; coda /n#/: 105.7 ms [28.8 ms]). (In terms of variability in timing, Figure 5.9 (a) shows coda and geminate nasals have similar onset lag patterns,[28] and in the individual speakers' density plots in Figure 5.9 (b), coda and geminate nasals are pooled together as noted above.) Figure 5.9 indicates that while onset lags have overlapping distributions between onset and coda nasals, onset nasals have a more dispersed distribution than coda nasals for all speakers.

[28] Coda nasal: mean onset lag 100.6 ms (st.dev 33.9 ms); geminate nasal: mean onset lag 101.4 ms (st.dev 32.3 ms).
Figure 5.9. Density plots for onset lags in onset & coda nasals
(a) overall and (b) individual with geminates coded as codas
The temporal variability in onset nasals compared to coda nasals is statistically tested as indicated in Table 5.1. All three statistical tests find that the variation in onset-to-onset lags differs between syllable onset nasals and coda nasals, with onset nasals exhibiting greater timing variability. The interquartile range (IQR) also suggests that onset nasals have more variable onset lags than coda nasals (onset nasals: IQR = 66 ms; coda nasals: IQR = 36 ms). Moreover, for all individuals, the RSD values of the intergestural onset lags are greater in syllable onset nasals than in coda nasals.
Table 5.1. Tests on the coefficients of variations in onset lags by syllable structure
Test name Test measure Test statistic p-value
M-SLRT Coefficient of variation 51.027 < .001*
Asymptotic Coefficient of variation 66.302 < .001*
Levene’s test Homogeneity of variance F(1,429) = 55.045 < .001*
5.3.2.2. Consonant nasality timing
The consonant nasality interval, indicated by the TT onset to VEL raising onset lag, is tested next. Recall that the mean nasality lags are near-zero for onset nasals (20.3 ms [st.dev: 37.8 ms]), and coda nasals have positive nasality lags (92.7 ms [st.dev: 39.9 ms]). In addition to the difference in mean lag measures, syllable onset and coda nasals show clearly distinct distributions (Figure 5.10). (Again, with highly overlapping timing patterns found between coda and geminate nasals as shown in Figure 5.10(a),[29] geminate nasals (beginning with a coda nasal) and coda nasals are both coded as coda nasals in the individual results in Figure 5.10(b) as well as for statistical analyses.)

Figure 5.10. Density plots for nasality lags in onset & coda nasals
(a) overall and (b) individual with geminates coded as codas

[29] Coda nasal: mean nasality lag 90.2 ms (st.dev 52 ms); geminate nasal: mean nasality lag 96 ms (st.dev 40.2 ms).
Tests of nasality timing variability in Table 5.2 show mixed results: the modified signed
likelihood ratio test (M-SLRT) and the asymptotic test show that onset and coda nasals have different
timing variation, with onset nasals having more variable timing than coda nasals. However, the Levene’s
test results indicate that there is no difference in the variation of the consonant nasality lag in onset and
coda nasals. The significant differences in coefficients of variations (CoV) may be due to onset nasals
having near-zero nasality lags, as CoV values increase sharply when the mean value is close to zero. A
supplementary variability measure, IQR, which is less affected by outliers, indicates that onset nasals
have more variable nasality lags (IQR = 48 ms) than the same lag in coda nasals (IQR = 24.1 ms).
Table 5.2. Tests on the coefficients of variations in nasality lags by syllable structure
Test name Test measure Test statistic p-value
M-SLRT Coefficient of variation 87.707 < .001*
Asymptotic Coefficient of variation 168.534 < .001*
Levene’s test Homogeneity of variance F(1,422) = 1.156 .283
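The point above about CoV inflation near a zero mean can be illustrated with a small numeric example; the two fabricated samples below have identical standard deviations and IQRs and differ only in mean, yet their CoVs differ sharply.

```python
# Why the CoV inflates when the mean approaches zero: two fabricated
# lag samples with identical spread but different means (values are
# illustrative, not measured data).
import numpy as np

near_zero = np.array([-30., 0., 20., 50., 60.])  # mean = 20 (cf. onset nasality lags)
shifted = near_zero + 70.                         # mean = 90 (cf. coda nasality lags)

for name, x in [("near-zero mean", near_zero), ("shifted mean", shifted)]:
    sd = np.std(x, ddof=1)
    cov = sd / np.mean(x)                         # CoV blows up as mean -> 0
    iqr = np.subtract(*np.percentile(x, [75, 25]))  # IQR is unaffected by the shift
    print(f"{name}: sd = {sd:.1f}, CoV = {cov:.2f}, IQR = {iqr:.1f}")
```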
5.3.2.3. Nasality lags across prosody
Next we turn to a consideration of using prosodic structure to probe the relative stability of the several
coupled nasal structures under investigation. In essence the prosodic context is used to (potentially)
perturb the intergestural coordination internal to the nasal segment. Recall that Hypothesis B stated that
onset nasals will show more timing variability than coda nasals at boundaries of varying prosodic
strengths, that is, the internal intergestural timing of onset nasals will be more influenced by prosodic
context/structure than for codas. Given the stronger onset denasalization in Korean at high phrase
boundaries, the articulatory nasal interval is expected to be shortened in specific prosodic conditions for
onset nasals. Thus, the consonant nasality lag, indicated by the TT onset to VEL raising onset lag, is
examined across different prosodic conditions, including three boundary conditions (Wd, AP, & IP) and
one focus condition (AP+focus).
There is no main effect of prosody on nasality timing in onset nasals (F(3,80.47) = 0.409, p
= .747; Figure 5.11: left). There is however a significant effect of prosody found on coda nasals’ nasality
lag (F(3,161.54) = 31.781, p < .001*; Figure 5.11: right). Post-hoc comparisons for coda nasals indicate
that nasality timing is significantly lengthened at the highest phrase boundary (IP) compared to the
nasality interval at smaller boundaries (all at p < .001). Nasality lags also increase under focus for coda nasals (AP+fc vs. AP: t(157) = 2.715, p = .037*).
Figure 5.11. Nasality lags at boundaries and under focus in onset and coda nasals
The results on nasality timing across different prosodic conditions show that there is a prosodic
effect decreasing internal intergestural overlap for timing in coda nasals, and a lack thereof in onset
nasals. This lack of prosodic effect on onset nasals will be further considered in the discussion.
5.3.2.4. Relative lags across prosody
In addition to absolute nasality lags, relative latencies can provide an index of proportional overlap
between the gestures. This effectively ‘normalizes’ for individual gesture duration of the ‘earlier’ gesture,
in this case the velum lowering, which indicates the phase between the pair of gestures. Relative onset lag
indicates when the TT gesture begins to form its constriction relative to VEL duration, i.e., within the
interval from velum lowering onset to the start of its raising. For this relative latency measure, Figure 5.12
demonstrates that TT constriction starts later for syllable onset nasals, i.e., closer to the initiation of velum
raising (with an average of 87.7 % ± 21.7 %), while for syllable coda nasals the TT constriction gesture
begins when the VEL lowering gesture is halfway through its lowering (53.3 % ± 11.7 %). Moreover,
there is no significant effect of prosody on relative onset lags in syllable onset nasals (F(3,84.227) = 0.98,
p = .406), whereas in coda nasals, relative onset lags are significantly different across prosodic conditions
(F(3,156.87) = 10.295, p < .001*). Post-hoc comparisons for coda nasals indicate that relative onset lags
are shorter at the IP boundary condition compared to other prosodic conditions (all at p < .001). The
earlier proportional placement of TT onset relative to VEL duration in coda nasals at higher prosodic
conditions may be the result of prosodic lengthening of velum duration. In addition to absolute nasality
lags, relative onset lags show that the proportional interval from TT onset to VEL raising becomes longer
at IP prosodic boundaries only in the coda nasal production.
Figure 5.12. Relative onset lags at boundaries and under focus in onset and coda nasals
Next, relative nasality lags are computed as the proportional timing of VEL raising within the TT
constriction duration (from movement onset to initiation of release). Figure 5.13 illustrates different
temporal latency patterns between syllable onset and coda nasals: relative nasality lags are near zero
percent in onset nasals (12.5 % ± 22.5 %) and around the middle in coda nasals (63.3 % ± 18 %). This
implies that as soon as the TT constriction is formed, the velum begins to raise for onset nasals, and
velum raising is relatively later within TT gesture for coda nasals. Prosodic effects on relative nasality
lags are found to be absent for both onset and coda nasals (onset /#n/: F(3,80.293) = 0.659, p = .58; coda
/n#/: F(3,158.13) = 2.035, p = .111).
Figure 5.13. Relative nasality lags at boundaries and under focus in onset and coda nasals
5.3.3. Variability in Timing: Assimilated vs. Concatenated Geminate Nasals
This section examines the variability in oral-to-velum timing in two different types of juncture geminate
nasals: assimilated and concatenated geminates. It is predicted that the intergestural timing in assimilated
geminate nasals is more variable than the timing in concatenated geminate nasals (Hypothesis C). Similar
to the onset vs. coda nasal comparisons, the onset lag (i.e., VEL lowering onset to TT onset lag) and the
consonant nasality lag (i.e., TT onset to VEL raising onset lag) are investigated separately. Target
consonants are assimilated geminate nasals (/t#n/) and concatenated geminate nasals (/n#n/). Data from
Speakers A-D are reported, excluding Speaker E’s data because no velum movement is observed at focus
and at IP boundary conditions for this speaker’s production of assimilated geminates (/t#n/).
5.3.3.1. Onset timing
With mean onset lags for assimilated geminates (83.8 ms [st.dev 27.7 ms]) being slightly shorter than onset lags for concatenated geminates (105.8 ms [st.dev 28.3 ms]), the density plots in Figure 5.14 show that onset lags in assimilated geminate nasals and in concatenated geminate nasals are largely overlapping in distribution.
Figure 5.14. Density plots for onset lags in geminate nasals: (a) overall and (b) individual
Subsequent variability tests on coefficients of variation and homogeneity of variance all indicate
that the onset timing variability in assimilated and concatenated geminate nasals is not significantly
different from each other (Table 5.3), though some tests approach significance. That said, the calculated
interquartile range (IQR) for onset lags is longer in assimilated geminates than in concatenated geminates
(assimilated /t#n/: 48 ms; concatenated /n#n/: 36 ms). Overall, there is no strong evidence that onset
timing variability differentiates the two different types of derived geminate nasals.
Table 5.3. Tests on the coefficients of variations in onset lags by geminate type
Test name Test measure Test statistic p-value
M-SLRT Coefficient of variation 3.484 .062
Asymptotic Coefficient of variation 3.587 .058
Levene’s test Homogeneity of variance F(1,187) = 0.019 .89
5.3.3.2. Consonant nasality timing
The articulatory nasality interval indicated by the TT onset to the VEL raising lag is widely dispersed for
assimilated geminate nasals (mean lag 56.1 ms [st.dev 45 ms]) compared to concatenated geminate nasals
(mean lag 93.3 ms [st.dev 39 ms]). In Figure 5.15, the distribution patterns of consonant nasality lags are
visually represented with density plots. For assimilated geminates, a huge interspeaker variability is
exhibited ranging from a near-zero lag (Speaker D) to a positive lag (Speakers A-C). On the other hand,
for concatenated geminates, all speakers have their peak density in nasality lags at around 80-90 ms.
Figure 5.15. Density plots for nasality lags in geminate nasals: (a) overall and (b) individual
The statistical tests on the variability in nasality timing also confirm the above observation that assimilated geminate nasals have greater timing variability compared to concatenated geminate nasals.
This is shown in Table 5.4, where the tests of variability in nasality lags indicate a significant difference
between assimilated and concatenated geminate nasals. Moreover, the IQR is longer in assimilated geminates (60.1 ms) than in concatenated geminates (24 ms), and likewise, the overall RSD value is greater in assimilated geminates (78.2%) than in concatenated geminates (41.7%). The RSDs for individual speakers also indicate that, for every speaker, RSD is greater in assimilated geminate nasals than in concatenated geminate nasals.
Table 5.4. Tests on the coefficients of variations in nasality lags by geminate type
Test name Test measure Test statistic p-value
M-SLRT Coefficient of variation 22.456 < .001*
Asymptotic Coefficient of variation 24.183 < .001*
Levene’s test Homogeneity of variance F(1,187) = 12.744 < .001*
5.3.3.3. Nasality lags across prosody
Next, the nasality timing is examined by each prosodic condition for assimilated and concatenated
derived geminates. The effect of prosodic conditions on nasality lags shows that for assimilated geminate
nasals, the nasality lags are similar across different phrase boundaries and focus conditions (F(3,72.391) =
1.666, p = .182; Figure 5.16: left), while for concatenated geminate nasals, the nasality lags are
significantly different across varying prosodic conditions (F(3,103.56) = 12.07, p < .001*; Figure 5.16:
right). Post-hoc pairwise comparisons indicate that the nasality lag for concatenated geminates increases
at the highest boundary condition (IP boundary) compared to other prosodic conditions (Wd: t(104) =
5.365, p < .001*, AP: t(104) = 5.37, p < .001*, & AP+fc: t(104) = 3.595, p < .01*).
Figure 5.16. Nasality lags at boundaries and under focus in juncture geminate nasals
5.3.3.4. Relative lags across prosody
Next we turn to an index of relative or proportional overlap internal to the nasals, looking first at relative
onset latencies. In assimilated geminate nasals, compared to concatenated geminate nasals, relative onset
latencies increase at higher prosodic conditions (Figure 5.17). This is statistically confirmed with the main
effect of prosody on relative onset lags in assimilated geminate nasals (F(3,72.342) = 3.66, p = .016*) but
a lack thereof in concatenated geminate nasals (F(3,103.51) = 2.13, p = .101). Pairwise comparisons
reveal that in assimilated nasals, relative onset lags are longer at IP boundaries than the same lags at Wd
boundaries (IP vs. Wd: t(73) = 3.133, p = .013*). At lower boundaries, TT starts about midway through
the VEL gestural movement for both assimilated and concatenated geminates. At higher boundaries and
with focus, however, the initiation of the TT constriction is delayed relative to velum lowering, showing a
relative timing pattern that is similar to that of singleton onset nasals, which have weak nasality.
Figure 5.17. Relative onset lags at boundaries and under focus in juncture geminate nasals
Next, relative nasality lags change across prosodic variations in both assimilated geminates (F(3,72.353) = 7.306, p < .001*; Figure 5.18: left) and concatenated geminates (F(3,103.12) = 5.227, p < .01*; Figure 5.18: right). Post-hoc tests indicate that relative nasality lags are shorter at higher prosodic
conditions, given significant differences between the lags at Wd boundaries compared to relative timing
at other prosodic conditions (assimilated geminates—AP: t(72.1) = 2.647, p = .048*; AP+fc: t(72.3) =
4.05, p < .001*; IP: t(73) = 4.099, p < .001*; concatenated geminates—AP: t(103) = 3.14, p = .012*;
AP+fc: t(103) = 3.293, p < .01*; IP: t(103) = 3.009, p = .017*). Again, assimilated geminate nasals show
relative nasality timing patterns that are comparable to relative timing exhibited for concatenated nasals at
lower prosodic boundaries. However, at higher prosodic conditions, assimilated geminates' relative nasality lags approach zero percent, indicating that the VEL begins to close/raise soon after the start of TT constriction. Overall, the findings from relative temporal lags reveal that relative timing in assimilated geminate nasals is much more flexible than the corresponding timing in concatenated geminate nasals in Korean.
Figure 5.18. Relative nasality lags at boundaries and under focus in juncture geminate nasals
5.4. Discussion
In this chapter, the stability and variability of intergestural timing in multi-gesture complexes are examined, with prosodic boundary and focus conditions serving to induce variability in gestural temporal organization. These prosodic variations provide a probe for assessing how rigid or malleable the internal intergestural timing is in the production of Korean nasal consonants in various syllable positions and of varying derived origins. First, the stability in segment-internal intergestural timing is
explored, comparing the intergestural timing patterns with the spatiotemporal patterning of the individual
gestural actions. Following this examination of intergestural coordination patterns, the variability in
timing is considered with regard to whether differential stability exists for nasals at different syllable
positions (onset and coda nasals) or nasals in different types of juncture geminates (assimilated and
concatenated geminates).
The results for stability in intergestural timing in Korean coda nasals (setting aside onset nasals,
which undergo a sound change process in Korean) suggest that individual gestural actions' duration and
magnitude are positively correlated with each other for both the oral and the velum gestures. However,
one instance of oral-velum intergestural timing—specifically the temporal lag from the TT raising onset
to the velum lowering target—is found to be unaffected by individual gestural durations and magnitudes.
In other words, this oral-to-velum target timing remains stable across variations in individual gestural
actions.
These findings for gestural organization illuminate potential speech production goals. We assume
that the least variable properties across varying conditions are likely to be the (representationally)
controlled cognitive goals of speech production, while more variable properties are interpretable as the
degrees of freedom of the speech task (Perrier & Fuchs 2015). Thus, investigating stability in
intergestural timing illuminates how multiple speech articulators are collaboratively controlled to achieve the goal of producing speech materials, in this instance the nasal segmental unit or molecule.
of coda nasal complexes in the current experimental data, the coordination between the formation of oral
constriction and the achievement of velum lowering is stable across prosodic variations. This specific
stability in the lag between the oral onset to the velum target may thus be representationally encoded in
the intergestural coupling graph and considered to be a goal for the nasal consonant production in Korean.
That said, it is certainly possible if not likely that slightly different specific temporal lags may be found to
be the stable arrangement in other languages (e.g., in English, oral and velum gestures begin
simultaneously in onset nasals; Byrd et al. 2009; Krakow 1989).
The EMA study on Russian complex segments (i.e., palatalized labials) and segment sequences (i.e., labial + liquid sequences) by Shaw et al. (2019) suggests that onset lags between gestures within a (complex) segment remain stable while gesture duration varies, whereas onset lags and gesture durations are positively correlated for gestures that span across segments. Coda nasals, explored in the current study,
would thus be predicted to show a within-segment timing pattern with stable onset lags across variations
in gesture duration. However, the onset-to-onset lags (from the velum gesture to the TT gesture) in coda
nasals are found to increase with the increase in duration as well as in magnitude of the velum gesture;
that is, onset lags in the present study of Korean are positively correlated with velum duration and with
velum magnitude. In Korean nasals, the onset-to-velum-target lag, rather than the onset lag, is found to be stable. That said, different languages and, perhaps, segments or segmental natural classes may have
different temporal lags that are found to be stable for a certain segmental structure, as encoded in their
specific coupling graphs.
Next, the stability in within-segment intergestural timing is examined across different prosodic
conditions, focusing on how individual gestural actions and intergestural timing change (or not) as a
function of prosodic boundaries and prominence. Articulatory gestures in the domain of prosodic phrase
boundaries are affected both temporally and spatially, with increases in gesture duration and temporal
lags (decreased overlap) and/or with concomitant increase in gesture magnitude (e.g., Byrd et al. 2000,
Byrd & Choi 2010, Cho & Keating 2001; see e.g., references in Byrd & Krivokapić 2021). Such prosodic
effects are exhibited in the current data, specifically for coda nasals, in that the oral and the velum
gestures increase in duration at higher boundary conditions (at IP boundary) and under focus (AP with
focus implementation) compared to the durations at lower boundaries (Wd and AP boundary). Moreover,
magnitudes of oral and velum gestures are greater at IP boundaries and with focus. On the other hand, the
intergestural lags (i.e., the oral onset to velum target lag and relative lags) do not show prosodic effects in
the production of coda nasals; these temporal lags are not lengthened at stronger prosodic conditions.[30]
Therefore, although individual gestural actions undergo prosodic effects, the intergestural timing in
nasal consonant production remains stable across prosodic modulations. This finding suggests a criticality
in the segment-internal timing relations within nasal multi-gesture complexes, a stability that may be
crucial in the representation of the nasal segment.
Furthermore, the EMA results on Korean consonant or consonant-vowel sequences in Cho’s
(2001) study show that there is a word-internal morpheme boundary effect on intergestural timing, that is,
gestural coordination is less overlapped across a morpheme boundary than within a single morpheme
across segments. In contrast, the current results on oral onset-to-velum target timing within a word-
final singleton nasal segment do not exhibit a boundary effect, even at contexts with higher boundaries
(Wd, AP, and IP) than the morpheme boundary. This presents compelling evidence that some segment-internal
timing is more stable across prosodic variations than across-segment timing. ([30] Note that the absolute
‘nasality lag’ increases at higher prosodic conditions in the coda nasals.) Taken together,
gestural coordination within Korean nasal segments has a special status of being resistant to prosodic
variations, and this stable coordination supports multiple gestures’ synergistic achievement of the
superordinate goal for nasal segments, requiring a fixed velum-oral coordination.
Turning next to different degrees of variation in intergestural timing, lags (onset and consonant
nasality) are investigated comparing i) nasals in the syllable onset and in the coda position and ii) nasals
in assimilated and concatenated juncture geminates. Overall findings show that the Korean onset nasals
have more variable internal temporal lags compared to coda nasals, and that assimilated geminate nasals
exhibit more variable nasality timing than concatenated geminate nasals. An account of these different
degrees of variability in timing is further discussed (and will be modelled in Chapter 6).
The coupled oscillator model of syllable structure has proposed that gestures in the syllable onset have
in-phase relations, while those in the coda have anti-phase relations (Goldstein et al. 2009, Nam et al.
2009). The foundational differences in phase relations predict systematic differences in phasing stability
across the syllable margins, with in-phase timing, as suggested for syllable onsets, generally being more
stable than anti-phase timing, as suggested in codas. Further, potential phase transitions from anti-phase
to the more stable in-phase are predicted and have been shown with, for example, increases in speech rate
(Haken et al. 1985, Lee et al. 1995, Parrell 2012, Saltzman et al. 2006, de Jong 2001). The greater
stability of onsets having in-phase relations than codas with anti-phase relations is also confirmed in
previous articulatory studies (Byrd 1996b, Goldstein et al. 2007). However, the stability shown for
syllable onset timing has been discussed in the field specifically for timing across segments.
The within-segment intergestural timing in a single onset (or coda) consonant is far less studied and not
fully understood; it is not well-attested whether the prediction of less variable in-phase coupling
compared to anti-phase coupling is borne out for within-segment coupling relations as well. The present
study allows us to undertake this examination.
Onset and coda nasals in Korean explored in this study are distinguished in phonological
processes because only onset nasals, but not coda nasals, are subject to the denasalization sound change
process. Because they exhibit different degrees of denasalization, depending on speaker age,
prosodic conditions, and other contextual variations, nasals in the syllable onset are predicted to be
produced with varying degrees of nasality. This phonologically unstable nasal status in onsets is realized
(in part) via the variability in intergestural timing, with onset lags showing more variable timing than
nasals in codas. Moreover, the articulatory nasality lags, both absolute and relative, are near-zero in onset
nasals and longer in coda nasals, indicating a reduced nasality interval in onset nasals. (That said, nasality
lag variability cannot be reliably tested due to biases in variability measures for near-zero values.)
Given that the coda nasal condition includes nasals in three different segmental contexts (coda /n/
before a following /p/, /t/, and /n/) whereas there is only one context for the onset nasal condition, greater
timing variability found in onset nasals than in coda nasals is even more surprising. Overall, in contrast to
the onset stability reported in across-segment timing, the within-segment timing in Korean nasals
suggests that syllable onset timing can be more variable than coda timing, with on-going sound change
processes contributing to systematic temporal variability and/or vice versa.
Turning to the sources that determine the stability in intergestural timing, previous accounts in
Articulatory Phonology suggest that gestures coordinated with a narrow phase window show a high
degree of cohesion potentially due to lexically specified timing in their coupling structure (Byrd 1996a) or
that stronger bonding strength between gestures results in lesser variability in timing (Browman &
Goldstein 2000). In this line, segments with multiple gestures are assumed to exhibit stable intergestural
timing due to their tight coupling structures. Alternatively, without adding extra strength in coupling
relations, stable temporal organization can emerge from certain coupling structures for segments. Under
the simplest account, no specific degrees of overlap other than in-phase onset and anti-phase
coda relations are presumed for sequences of gestures. For segmental complexes with multiple gestures,
however, a certain temporal arrangement must necessarily be realized in order to achieve their goal of
adequately producing the intended sound (‘a superordinate goal’). For example, for nasal consonants, the
interval for velic opening must overlap with the oral closure.[31]
If the coupling relations are specified in the lexicon, meaning that gestures are specifically timed
with one another, intergestural timing will be more rigid in response to interactions with syllable
structure or prosodic variations than for gestures that do not stand in such coupling relations.
To account for the surface stability and variability in intergestural timing, previous studies
assumed that variability must be explicitly represented in the lexicon. For example, in Byrd’s (1996a)
phase window model, the sources of variability in timing are the width of the phase window (the narrower the
phase window, the less variable the timing) and the weighting function determined by linguistic and
extralinguistic factors such as syllable structure, prosody, and speaking rate. In a coupled oscillator model
(Goldstein et al. 2009, Nam et al. 2009), the coupling strength specified for each pairwise coupling in the
coupling graph determines variability (the greater the coupling strength, the less variable). On the other
hand, stability and variability in timing can emerge from the temporal architecture of coupling graphs. For
example, Goldstein et al. (2009) attributes greater stability found in the syllable onset clusters compared
to the coda clusters to the greater number of links in the coupling graph for onset structures than for codas
(see also Nam & Saltzman 2003 for simulation results).
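The link-counting intuition can be sketched with a toy stochastic simulation (a minimal stand-in for the cited simulations, with invented parameter values, not their actual code): each in-phase coupling link contributes a restoring force on a noisy relative phase, so graphs with more links settle with less spread.

```python
import numpy as np

def relative_phase_spread(n_links, k=1.0, noise=0.5, dt=0.01,
                          n_steps=1000, n_runs=200, seed=0):
    """Noisy relaxation of a relative phase held by n_links identical in-phase
    couplings: dpsi/dt = -n_links*k*sin(psi) + noise. Each added link stiffens
    the restoring force, so the settled phases spread less."""
    rng = np.random.default_rng(seed)
    psi = np.zeros(n_runs)                       # n_runs independent tokens
    for _ in range(n_steps):
        drift = -n_links * k * np.sin(psi)       # restoring force from links
        psi += dt * drift + np.sqrt(dt) * noise * rng.normal(size=n_runs)
    return psi.std()                             # spread of settled phases

one_link = relative_phase_spread(n_links=1)
three_links = relative_phase_spread(n_links=3)
print(one_link, three_links)   # fewer links -> larger spread
```

Under these assumptions the settled spread scales roughly as noise/sqrt(2 · n_links · k), so timing stability increases with the number of coupling links even though no link is individually strengthened.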
The current study argues that no explicit mechanism for variability is necessary in the lexicon or
in the representation, because differential variability can be freely obtained from the temporal structure of
coupling relations. This implication, that different coupling structures give rise to differential timing
stability, will be further pursued in the modeling in Chapter 6 using the relative phase model via the
coupled oscillator dynamics.
Moving on from the temporal “segmenthood” of gestures, variability in timing is also compared
between two different types of geminates: assimilated and concatenated geminate nasals at the juncture.
[31] This is not to say there is only one temporal alignment strategy to produce a nasal consonant;
different languages can adopt different temporal patterns, provided they adhere to the superordinate goal
for nasals. Additionally, specific timing may be organized differently in the onset and in the coda, as
exhibited in the different timing patterns and stability of syllable onset and coda nasals in this study.
Structurally, concatenated geminate nasals (/n#n/) behave like singleton coda nasals if the boundary-
initial onset portion is denasalized. On the other hand, assimilated geminate nasals (/t#n/) may be realized
as [n.n] or behave like singleton onset nasals if the phrase boundary blocks assimilation process. Thus,
assimilated nasals are predicted to vary in timing, with patterns ranging from those characteristic of onset
nasals to those of coda/concatenated geminates. The current findings show that for nasality lags only, assimilated
geminate nasals have greater timing variability than concatenated geminate nasals. Assimilated geminates
are also produced with considerable interspeaker variability, with Speaker D having near-zero mean
nasality lags while others have longer nasality lags, comparable to the nasality lags for
concatenated geminates. Moreover, the results for relative timing indicate that assimilated geminates have
more flexible relative intergestural lags across prosodic modulation compared to temporal lags in
concatenated geminates. These findings for timing variability suggest that gestures associated with
assimilated geminates may be more loosely coupled to each other compared to those associated with
concatenated geminates.
Lastly, let us consider whether nasality lags (i.e., the interval from TT onset to VEL raising onset)
are lengthened due to the prosodic effect of lesser overlap or shortened due to the Korean denasalization
process (in the case of onset nasals). Unlike token-to-token variability, which shows less variable nasality
lags for coda nasals than for onset nasals, the absolute nasality lags, when examined for each prosodic
condition, increase at higher prosodic conditions. This is different from the onset-to-target lags
(and relative oral-velum lags), which are found to be stable across prosodic conditions in coda nasals.
These results show that the token-to-token variability in timing has implications different from
those of the intergestural timing across prosodic modulations. While the former measurement may serve as an
overall index of variability in timing relations, the latter also reflects how the timing interacts with the
prosodic system. In syllable onset nasals, it is predicted that the nasality lags are decreased at strong
boundaries, in line with stronger denasalization reported at higher phrase boundaries (Cho & Keating
2001, Yoshida 2008, Yoo & Nolan 2020). Contrary to this prediction, nasality lags are not shortened at IP
boundaries. This may be due to a floor effect: the nasality lags in the onset nasals are already close to
zero, leaving little room for shortening.
The nasality timing lag in onset nasals is neither decreased nor lengthened at higher prosodic
boundaries. One possible explanation is that there are interacting effects in onset nasals of both the
prosodic slowing (which may lengthen intergestural timing) and denasalization weakening (which may
shorten the nasality interval), resulting in the lack of changes in the nasality timing across prosodic
variations. Although we cannot determine whether there are bidirectional effects or no effect on timing
with the current data, the temporal lag patterns across prosodic boundaries suggest different sources of
timing variations are at play, with potential interactions on intergestural coordination between prosodic
variability and phonologically driven variability.
5.5. Conclusion
This chapter illustrates the stability and the variability in within-segment intergestural timing (oral-velum
timing) in the production of nasal multi-gesture complexes. The articulatory speech production data
obtained using real-time MRI provides useful information on the temporal coordination between gestures
as well as the variability in coordination, which can illuminate the coupling structures of phonologically
informative multi-gestural units. First, the timing from oral onset to velum target landmarks within a nasal
segment is found to be stable across prosodic modulations, while individual gesture duration and
magnitude are malleable under prosodic variations. Next, different degrees of timing variability are
observed for onset nasals and coda nasals, with onset nasals exhibiting greater timing variability. Taken
with earlier findings on across-segment gestural timing, this greater timing variability found within
onset nasals compared to coda nasals implies that stability patterns are encoded differently for
within-segment timing and for across-segment timing. Furthermore, the
experimental findings on timing variability suggest that intergestural timing is not only modulated by
prosodic variability but is also associated with various sources of variations such as syllable structure,
coupling structures created via concatenation and assimilation, and phonological sound change processes.
The articulatory patterning in multi-gesture complexes reveals that even with the same atomic units and
similar temporal organizations, multi-gesture complexes having different structural properties in their
phonological representation can be differentiated by temporal variability. The study of stability and
variability in timing relations crucially informs us about different strengths in coordination structures and
coupling relations, which can further our knowledge on the representation of linguistic units.
6. Computational Modeling of Timing for Multi-Gesture Complexes
6.1. Introduction
In addition to the rich empirical data provided by real-time articulatory imaging, dynamical systems
modeling of speech production provides a framework for understanding the principled or systematic
variability evidenced for linguistic systems. This dissertation’s empirical studies of intergestural timing in
the languages of Hausa and Korean provide data that can be deployed to model how language users
maintain a balance between flexibility and stability in the system. Flexibility in coordination, we argue,
encodes contextual, informational, and structural properties in speech, and stability in coordination
ensures that certain gestural molecules achieve their specific higher-level (superordinate) task-goals
necessary to construct informative linguistic units and generate phonological contrasts. This systematic
balance between flexibility and stability is critical in language processing and production, as an
unbalanced system would either disrupt comprehenders’ recovery of phonological units or fail
to capture the lack of invariance typifying speech production and perception. The present chapter aims to
model coordination structures and their stability in coordination to explore potential bases of variability
and stability within a coupling architecture among speech gestures and to advance our understanding of
how this information may be embodied in the representation of phonologically contrastive units within
larger lexical and prosodic structures.
In addition to stochastic variability, differences in intergestural temporal stability can be
understood to emerge in the presence of the interaction among lexical-level sub-systems (e.g., segmental
gestural molecules, syllable structures) and various sources of linguistic variation (prominence, prosodic
boundaries). An account of the differential stability of multi-gestural structures is important not only for
well-considered theories of phonological representation but also because it can be directly related to the
learnability of the linguistic units. In this chapter we will elucidate two modeling approaches relevant to
these issues.
Here we first pursue a computational model that is intended to account for the varying stability
patterns between onset and coda nasals in Korean. Specifically, the issue of stability in timing is
addressed with respect to intergestural coupling structures. Korean nasals exhibit patterns such that onset
nasals have greater timing variability than coda nasals. This temporal stability pattern for complex multi-
gestural segments is in contrast with that captured in some earlier coupling modeling of syllable structure,
which generally reported more stability in the in-phase (simultaneous) relations than in the anti-phase
(sequential) relations (Goldstein et al. 2006, 2009, Nam 2007, Nam et al. 2009). By calculating the
stability of the relative phases given the coupling graphs using Task Dynamics Application (TADA; Nam
et al. 2004),[32] a phase stability analysis can illustrate how coupling structures with different architectures
can be used to predict differences in timing stability and variability without having to introduce additional
parameters specifically to account for temporal variation.
One instantiation of varying stability based on relative phases can be obtained via the dynamical
systems approach using an attractor-based model (Gafos 2009, Tuller et al. 1994). Roessig et al. (2019)
uses the attractor landscapes to represent acoustic patterns found in speech data, specifically the f0 values
with respect to pitch accent types. The attractor-based model is based on the potential energy function of a
dynamical system, V(x); to determine changes in the system over the attractor landscape of V(x), the
force function F(x), the negative derivative of the potential function with added random noise, can be
used. The following cubic function in (1), with changes in the coefficient values (a-d), generates
different shapes of attractor landscapes, which may have one or many steady states.
(1) F(x) = −ax³ + bx² + cx − d + N
(a, b, c, d are coefficients and N is Gaussian random noise)
[32] Nam and Saltzman (2003) conducted stability analysis on relative phases directly, whereas the current
modeling conducts analyses on the stabilization time of relative phases to compare stability among coupling
relations with different relative phase values.
Here, a ‘mini’ simulation of attractor-based models that represent relative timing and the
corresponding statistical distributions is examined. As seen in the articulatory patterning of nasals
in Korean, onset nasals have near-zero (in-phase) consonant nasality lags, whereas coda nasals
have positive nasality lags. What’s more, the onset nasals show greater variability in timing
compared to coda nasals. Based on the following potential energy functions and force functions
postulated for onset nasals (2a) and coda nasals (2b), their corresponding attractor landscapes are
shown in Figure 6.1.
(2) a. In-phase onset nasals: V(x) = 4x⁴; F(x) = −16x³ + N
b. Out-of-phase coda nasals: V(x) = 4x⁴ − x; F(x) = −16x³ + 1 + N
Figure 6.1. Potential energy functions with in-phase timing variability (left)
versus out-of-phase timing stability (right)
Figure 6.1 illustrates simulations of the systems in (2) with 1,000 iterations. Gaussian random noise of
amplitude 3 is added to the cubic force functions with the coefficients given in (2). For both attractors, the
first three coefficients are the same (a = 16, b = 0, c = 0), and the difference lies only in the last
coefficient (d = 0 for Figure 6.1, left; d = −1 for Figure 6.1, right). Assuming that the x axis represents
relative phase angle, the attractor landscape on the left has a mean phase angle of zero (a steady state at
phase 0), and the one on the right has a mean phase angle of about 0.5, which corresponds to a positive
relative phase of 90° (on a scale where x = 1 corresponds to π). Furthermore, the histograms associated with
each landscape show that the in-phase landscape on the left results in greater variability in relative
phase than the off-phase landscape on the right.
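The ‘mini’ simulation just described can be sketched as follows. This is an illustrative reimplementation under stated assumptions (simple Euler integration; the function name, step size, and iteration counts are my own choices), not the original simulation code: it integrates the force function of equation (1) with Gaussian noise and compares the settled distributions for the two coefficient settings.

```python
import numpy as np

def simulate_attractor(a, b, c, d, noise_amp=3.0, n_runs=1000, dt=0.01,
                       n_steps=500, seed=0):
    """Integrate x' = F(x) = -a*x**3 + b*x**2 + c*x - d + noise (equation (1))
    and return the settled state of each of n_runs independent runs."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 0.1, size=n_runs)          # start each run near zero
    for _ in range(n_steps):
        noise = noise_amp * rng.normal(size=n_runs)
        x += dt * (-a * x**3 + b * x**2 + c * x - d + noise)
    return x

# Coefficient settings from the text: d = 0 (in-phase onset landscape)
# versus d = -1 (off-phase coda landscape)
onset = simulate_attractor(a=16, b=0, c=0, d=0)
coda = simulate_attractor(a=16, b=0, c=0, d=-1)
print(onset.mean(), onset.std())   # mean near 0, wider spread
print(coda.mean(), coda.std())     # mean shifted positive, narrower spread
```

The flat-bottomed quartic well (d = 0) exerts only a weak restoring force near zero, so noise spreads the settled phases widely; the tilted landscape (d = −1) has a stiffer minimum at a positive phase, yielding a narrower distribution, matching the histograms in Figure 6.1.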
This simulation shows that attractor-based models can demonstrate observed patterns of
intergestural timing behavior as well as predict the associated variability according to the attractor
landscapes. Crucially, these models show how synchronous timing with a target phase of 0 may yield
greater variability in the statistical distribution than sequential timing. Based on the implications from
the attractor-based models and the findings from the empirical data, the first computational analysis
below uses coupled oscillator models to test how intergestural coupling relations within coupling graphs
exhibiting different structural properties are realized as different relative phase values on the surface as
well as differences in the variability of those relative phases.
In the second computational thrust of this chapter, a machine learning algorithm using a Support
Vector Machine (SVM) classification is implemented to calculate the relative contributions of the
articulatory spatio-temporal variables to classifying different classes of phonological segments. Timing
relations and their systematic variations can inform phonologically relevant articulatory dynamics. In our
previously examined articulatory data on Hausa glottalic and pulmonic consonants, implosives and voiced
stops were differentiated crucially by the timing between the oral and the vertical larynx gestures, rather
than by individual gestural actions such as gestural duration and magnitude. Recall that the empirical data
for these sets of multi-gesture complexes in Hausa show that pulmonic consonants exhibit a simultaneous
oral-vertical larynx coordination, in contrast to non-pulmonic consonants with a sequential intergestural
timing. On the other hand, between two glottalic consonant classes in Hausa—i.e., ejectives and
implosives—the intergestural timing patterns are similar while they have distinct vertical larynx
magnitudes (ejectives have upward larynx movement and implosives, downward movement). This
illustrates that at least for multi-gesture complex segments, intergestural timing may play a crucial role in
phonological representation and thus in segment classification (whereas for linguistic units larger than a
segmental gestural molecule, timing relations are expected to be widely variable and thus not as
informative in terms of the recoverability of certain units).
In order to quantify temporal coordination structure among speech gestures, the Speech
Articulatory Coordination (SAC) metric (Williamson et al. 2018) has been applied to coupled oscillator
planning models (Lammert et al. 2020). The SAC metric is computed by pair-wise comparisons between
the factors in time series speech production data, for example acoustic data such as formant frequencies or
mel-frequency cepstral coefficients or articulatory data such as vocal tract variables (Williamson et al.
2019). The second part of the computational modeling shares the same intuition: based on the factors
obtained from articulatory dynamic data, a machine learning classification method is applied to develop
our understanding of how differences in articulatory variables (such as intergestural timing of articulatory
gestures) may lead to an increased performance in category classification.
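To make the classification idea concrete, the sketch below trains a minimal linear SVM (sub-gradient descent on the hinge loss, one-vs-rest over three classes) on simulated tokens. All feature values, class distributions, and parameter settings here are invented for illustration; this is not the dissertation’s data or implementation, and a library SVM (e.g., scikit-learn) would normally be used instead.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Binary linear SVM trained by sub-gradient descent on the regularized
    hinge loss; y must be in {-1, +1}. Returns weights w and bias b."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            if y[i] * (X[i] @ w + b) < 1:         # margin violated: hinge active
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                                  # only the regularizer acts
                w -= lr * lam * w
    return w, b

def predict_one_vs_rest(models, X):
    """Assign each token to the class whose binary SVM gives the largest margin."""
    scores = np.column_stack([X @ w + b for w, b in models])
    return scores.argmax(axis=1)

# Invented, z-scored features per token: [oral-larynx onset lag, larynx displacement]
rng = np.random.default_rng(42)
n = 100
implosives = np.column_stack([rng.normal(2, 0.5, n), rng.normal(-2, 0.5, n)])
voiced = np.column_stack([rng.normal(0, 0.5, n), rng.normal(-0.5, 0.5, n)])
ejectives = np.column_stack([rng.normal(2, 0.5, n), rng.normal(2, 0.5, n)])
X = np.vstack([implosives, voiced, ejectives])
y = np.repeat([0, 1, 2], n)   # 0 = implosive, 1 = voiced stop, 2 = ejective

models = [train_linear_svm(X, np.where(y == c, 1, -1)) for c in range(3)]
accuracy = (predict_one_vs_rest(models, X) == y).mean()
print(accuracy)
```

The toy setup mirrors the empirical point: the lag dimension separates voiced stops from the glottalic classes, while the larynx-displacement dimension separates ejectives from implosives, so both a temporal and a spatial variable are needed for accurate classification.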
The goal of this chapter is to provide modeling analyses of the temporal stability of the multi-
gestural molecules found in the empirical data, paying attention to their systematic variability across
prosodic perturbations. The empirical data provided by the state-of-the-art real-time imaging and data
analysis of the preceding four chapters of this dissertation makes it possible to inform a dynamical
systems approach for understanding coordination within multi-gesture molecules. In particular, the
stability of timing among articulatory gestures will be a focal area of endeavor for the computational
modeling in the present chapter, as this is crucial in understanding abstract and specialized goals within
the speech production system. The statistical and computational modeling analyses will begin to provide a
better ‘angle of view’ for understanding the parameters of representations in the coupling structures that
lead to differences in relative timing as well as in the stability in timing across prosodic contexts. These
parameters that encode stability and flexibility should, we argue, be incorporated in the phonological
representation of speech sounds, at least for the segment-sized multi-gesture complexes. The modeling of
temporal coordination will shed light on the cognitive underpinnings of gestural coupling structures and
deepen our understanding of how speech gestures in human language are organized and coordinated so as
to instantiate internal and relational linguistic structures.
6.2. Relative Phase Model
6.2.1. Background
This dissertation’s empirical studies on timing and phasing relations in multi-gesture complexes show that
timing and its stability differentiate onset nasals from coda nasals in Korean, as well as implosives from
plosives in Hausa. How, then, is this phasing information incorporated in the representation of a segment?
To account for relative timing between coordinated speech gestures, a coupled oscillator model has been
proposed, postulating that each gesture in an utterance is associated with a non-linear limit-cycle planning
oscillator, or clock, that triggers the activation of that gesture (Goldstein et al. 2007). The gestures’
oscillators are coupled to one another in coupling graphs, which store information on linguistically
significant relative timing between gestures. The coupling graph architecture for speech gestures
specifically encodes coordination structures by assigning a target relative phase for each coupled link.
This graph can be used to predict differential temporal coordination patterns found in phonological units
and to understand the control system of intergestural timing.
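The role of a coupling-graph edge can be illustrated with a toy relaxation of the relative phase between two planning oscillators (a minimal stand-in of my own, not TADA; the parameter values are arbitrary): the relative phase is driven toward whatever target the edge specifies.

```python
import numpy as np

def relax_relative_phase(target, k=1.0, psi0=2.0, dt=0.01, n_steps=2000):
    """Relative phase psi = phi2 - phi1 of two coupled planning oscillators,
    relaxing under dpsi/dt = -k * sin(psi - target) toward the target
    relative phase specified on the coupling-graph edge."""
    psi = psi0
    for _ in range(n_steps):
        psi += dt * (-k * np.sin(psi - target))   # Euler step toward target
    return psi

in_phase = relax_relative_phase(target=0.0)       # in-phase coupling target
anti_phase = relax_relative_phase(target=np.pi)   # anti-phase coupling target
print(in_phase, anti_phase)   # settle near 0 and near pi, respectively
```

Whatever target phase is written on the edge becomes a stable fixed point of the relaxation, which is how the coupling graph can encode linguistically significant relative timing.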
We explore the possibility that different degrees of timing stability can naturally emerge from the
structures of the coupling graphs themselves, without postulating any external device or mechanism to
account for varying timing stability for multi-gesture segmental molecules (cf. Sorensen & Gafos 2016).
This is in line with approaches by Kröger & Cao (2015), Nam (2007) and Yanagawa (2006). This
emergence of timing stability directly from the coupling graphs is distinct from the dynamical systems
models of rhythmic and prosodic gestures that use independent μ and π gestures (Byrd & Saltzman 2003,
Saltzman et al. 2008) to modulate the behavior of ongoing constriction gestures of the lexical items, either
through a coupled oscillator or via a modulation of activation functions of individual gestures. For
example, the π-gesture approach is used for phrase boundary effects (Byrd & Saltzman 2003), and the μ-gesture
approach for prominence and stress patterns (Saltzman et al. 2008). The present approach is also distinct
from the earlier Phase Windows model of timing variability (Byrd 1996a), which posited a separate
parameter of ‘window width’ as a range of phasing variation to explicitly account for differential
variability in intergestural timing.
In phase perturbation studies (Port 2009, Tilsen 2009), naturally preferred phasing of the
rhythmic oscillatory system has been noted, namely the phases of one third, half, and two thirds of the
cycle. The coupled oscillator model of syllable structure (Goldstein et al. 2007), for example, calls on
double-well potentials by employing the nonlinear dynamical HKB potential function (Haken et al. 1985)
with potential minima at 0 and 180 degrees for in-phase and anti-phase relations, respectively (Goldstein
et al. 2007, Iskarous 2017, Nam et al. 2009). The potential function equation is given in (3), and Figure
6.2 illustrates the model of phase transition from a less stable anti-phase relation (180°) to a more stable
in-phase relation (0°) as frequency/rate increases, by changing coefficients of the potential function. This
phase shift has been modeled by making the coefficient ratio b/a dependent on the oscillation frequency.
(3) V(ψ) = −a cos(ψ) − b cos(2ψ); where ψ = ϕ2 − ϕ1 (HKB potential function)
Figure 6.2. HKB potential function simulating anti-phase (left) to in-phase (right) transition
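The stability condition behind this transition can be checked directly from equation (3). The sketch below (with illustrative coefficient values of my own choosing) evaluates the potential and tests whether the anti-phase well at ψ = π survives, using V″(π) = −a + 4b:

```python
import numpy as np

def hkb_potential(psi, a, b):
    """HKB potential V(psi) = -a*cos(psi) - b*cos(2*psi), as in equation (3)."""
    return -a * np.cos(psi) - b * np.cos(2 * psi)

def antiphase_is_stable(a, b):
    """psi = pi is a local minimum of V iff V''(pi) = -a + 4*b > 0,
    i.e., iff the ratio b/a exceeds 1/4."""
    return -a + 4 * b > 0

psi = np.linspace(-np.pi, np.pi, 2001)
slow = hkb_potential(psi, a=1.0, b=1.0)   # large b/a: wells at 0 and at ±pi
fast = hkb_potential(psi, a=1.0, b=0.1)   # small b/a: single well at 0

print(antiphase_is_stable(1.0, 1.0))   # True: anti-phase coexists with in-phase
print(antiphase_is_stable(1.0, 0.1))   # False: only the in-phase well remains
```

As b/a drops below 1/4, the anti-phase minimum disappears and only the in-phase well at 0 remains, which is the transition depicted in Figure 6.2.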
In addition to these naturally preferred (dynamically optimal) phase relations between two
gestures, relative phases may, we argue, be determined for a multi-gestural complex by a gestural
planning system that incorporates an abstract and cognitive superordinate goal. Examples of
superordinate goals for such multi-gestural structures include aerodynamic goals for glottalic consonants,
articulatory/perceptual goals for nasal consonants, and acoustic goals for liquid rhotics. This dissertation
is limited to the exploration of the superordinate goals that necessitate specific temporal constraints. The
articulatory kinematic results on Korean nasals can be understood in terms of differential coupling graphs
that yield differential variability in oral-velum timing as a function of syllable structure and that
potentially give rise to the phonological sound change process of Korean onset denasalization.
Previous studies on intergestural timing generally report gestures in the syllable onset to be in-phase
(simultaneous) with one another, whereas gestures in the syllable coda are in an anti-phase
(sequential) relation (Browman & Goldstein 1990, Browman & Goldstein 1995, Byrd 1996a, Gick 2003,
Krakow 1989, Maddieson 1998). Since in-phase relations have been shown to be more stable than
anti-phase relations, with a potential transition of phase relations at faster speeds in limb studies (Kelso et
al. 1986, Haken et al. 1985), previous accounts of syllable onset synchrony and coda asynchrony predict
that the onset timing is more stable relative to the coda timing (Goldstein et al. 2009, Nam et al. 2009).
However, the current articulatory findings within Korean nasals in contrast show that the intergestural
timing within the nasals is more variable in syllable onsets compared to the timing within coda nasals
even though oral and velum gestures are not synchronous in either position, with lags that differ only by
30 ms. Therefore, assigning a single internal coupling structure to all nasal consonants fails to capture
the different timing patterns and stability found between syllable onset and coda nasals. At least for Korean—
assuming different languages choose different sets of temporal arrangements—the internal timing of an
onset nasal and a coda nasal consonant seems likely to have different coupling structures, as their timing
is differently affected by prosody.
The modeling of this internal timing of the nasal gestural complex via coupling graphs and
resulting relative phase relations aims not only to account for distinctive temporal coordination patterns
found in this multi-gestural complex but also crucially to address the issue of differential stability in
timing via distinct coupling structures. In order to capture the asymmetric timing stability found between
the Korean onset and coda nasals, target relative phases were chosen for coupling graphs for onset nasal
and coda nasal structures, based on the empirical findings for the intergestural lags. Korean is
hypothesized to include the oscillators for oral closure and release gestures as well as for velum lowering
and raising gestures, separately. Graphs serve as input to the Task Dynamics Application (TADA; Nam et
al. 2004).
6.2.1.1. Task Dynamics Application (TADA)
Syllable structure-based coupling models, coupled oscillator models of intergestural coordination, and
task dynamic models of interarticulator coordination of speech production have been implemented in the
Task Dynamics Application (TADA) in MATLAB (Nam et al. 2004, 2006). Steriade (1992) proposed that
plosives with release are inherently bipositional and that non-place articulators such as nasal and
laryngeal gestures can be associated synchronously with either closure or release of the plosive
constriction. For example, nasality can be associated either to the release (postnasalized stops), to the
closure (prenasalized stops), or to both (fully nasal stops). In Articulatory Phonology, the possible
combinations between oral and velum gestures would be anti-phase relations with velum preceding oral
gestures or vice versa, and in-phase relations. Additionally, velum gestures may also be implemented
as having separate controls for lowering and raising gestures.[33]
In the syllable structure-based coupling models implemented in TADA, phasing relations
between gestures within a segment (mainly liquids and nasals) for English have been already
implemented based on a set of English coupling principles, expressed in the TADA code (coupling.ph).
These phase relations between primary (oral gestures with narrower constriction) and secondary
(velum or glottal gestures, oral gestures with wider constriction) gestures are specified in the coupling
graph separately for the onset and the coda position.

[33] In addition to phasing, the activation interval may be another factor contributing to contrasts in
articulatory states (e.g., prenasalized stops and fully nasal stops can be contrasted by the duration of the
velum gesture; likewise for aspirated vs. unaspirated stops).

Some examples of segmental coupling specified in
coupling.ph are summarized in (4).
(4) Within-C coupling in TADA coupling graph[34]
• Oral closure gesture is anti-phase with oral release gesture
• Glottal gesture (for aspiration) is delayed relative to oral stop gesture
• Glottal gesture is synchronous with fricatives’ oral constriction gesture
• Velum gesture is synchronous with oral constriction gesture in onset
• Velum gesture is anti-phase to oral constriction gesture in coda
• Secondary gesture of /r/ and /l/ is synchronous with primary constriction gesture in onset
• Secondary gesture of /r/ and /l/ is anti-phase to primary constriction gesture in coda
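Purely as an illustration, the couplings in (4) can be thought of as a lookup table from gesture pairs (plus syllable position where relevant) to target relative phases. The sketch below is a hypothetical encoding, not the actual coupling.ph syntax; the gesture names and the numeric value for "delayed" are stand-ins, since (4) does not specify a phase angle for the delayed glottal gesture.

```python
# Hypothetical sketch of the within-C couplings in (4); TADA's actual
# coupling.ph file uses its own MATLAB-based specification, and the
# DELAYED value here is an illustrative assumption.
IN_PHASE, ANTI_PHASE = 0, 180   # canonical target relative phases (deg)
DELAYED = 90                    # placeholder: (4) gives no angle for "delayed"

WITHIN_C_COUPLING = {
    ("oral_closure", "oral_release", None): ANTI_PHASE,
    ("oral_stop", "glottal_aspiration", None): DELAYED,
    ("fricative_constriction", "glottal", None): IN_PHASE,
    ("oral_constriction", "velum", "onset"): IN_PHASE,
    ("oral_constriction", "velum", "coda"): ANTI_PHASE,
    ("liquid_primary", "liquid_secondary", "onset"): IN_PHASE,
    ("liquid_primary", "liquid_secondary", "coda"): ANTI_PHASE,
}

def target_phase(g1, g2, position=None):
    """Look up the target relative phase (degrees) for a gesture pair."""
    return WITHIN_C_COUPLING[(g1, g2, position)]
```

The table makes the onset/coda asymmetry for velum and liquid secondary gestures explicit: the same gesture pair maps to different target phases depending on syllable position.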
The current modeling makes use of the coupled oscillator model implemented in TADA, which
takes coupling graphs as input and generates gestural scores with specific activation intervals and relative
timing among gestures. In the present modeling, the input coupling graphs are selected based on the
empirical findings for Korean onset and coda nasals’ intergestural timing. Then, the output relative phases
calculated from the coupling graphs are examined to test whether the coupling graphs themselves
sufficiently predict differences in observed relative timing and stability or whether other mechanisms
controlling stability/variability in timing are necessary.
6.2.1.2. Stability and coordination in dynamical systems
Although variability in timing may be considered to oppose stability in timing, movement variability
needs to be distinguished from dynamical stability. For example, coordinative variability or systematic
variability, which refers to movement variability in performing the task over repetitions, is actually
expected to achieve higher-level goals in varying circumstances, i.e., dynamic stability (Harbourne &
Stergiou 2009). These higher-order goals are unchanging, maintaining stability at the end-point, while
there can be numerous variable solutions to approach the goal (van Emmerik et al. 2000, 2016).

[34] In addition to these segment-internal couplings, other types of coupling (C-C coupling, C-V coupling,
cross-syllable coupling, etc.) are implemented. What is not in the current TADA model is the difference in
coupling strength with respect to the size of the coupled units. As discussed above, gestures within a
contrastive segmental unit may be associated with stronger coupling relations than gestures across
segments. Although this within- and across-segment timing is one area in need of modeling developments,
the scope of this chapter is to model the difference in timing stability within segments with different
syllable structures.
Stability in motor coordination is largely studied in the domain of interlimb coordination such as
human locomotion, estimating gait stability, and bimanual rhythmic coordination (Bruijn et al. 2013, Post
et al. 2000, Schmidt et al. 1998). Specifically, these studies investigate how humans maintain stability in
motor skills while resisting or recovering from external perturbations. For instance, gait stability refers to
the ability to avoid falling despite perturbations that disturb the oscillatory gait. Currently available
dynamic measures of stability include the maximum Lyapunov exponent, the maximum Floquet
multiplier from the Poincaré section, extrapolated centre of mass, variability measures, stabilizing and
destabilizing forces, among others (Bruijn et al. 2013, van Emmerik et al. 2016).[35] However, only a few
of these measures are applicable to assessing stability in relative timing, and even fewer are suitable for
calculating stability in phasing between coordinated speech gestures which can be coupled to each other
in addition to being coupled with higher coordination structures.
Schöner (1995: 298-299) discussed a dynamical account of coordination in motor control:
“When [...] relationships [between coordinated components] are measurable, the
description of patterns of coordination can proceed independently of a description of the
relevant components. [...] The patterns are governed by the laws of dynamical systems.
Stable patterns of coordination are stable stationary solutions of such dynamical
systems.”
[35] The calculation of the maximum Lyapunov exponent and the maximum Floquet multiplier requires the
construction of a state space (e.g., a space where positions and velocities of all elements of the system are
represented) from kinematic data. Some restrictions are as follows: the same number of cycles should be
contained in the state space, as the estimated maximum Lyapunov exponent is dependent on time-series
length, and Floquet theory can only be applied to strictly periodic systems with non-varying cycle lengths
(Bruijn et al. 2013).
He then introduces the timing level (i.e., the set of dynamic variables that are stabilized under
perturbations of timing such as phase and relative phase) and the movement goal level (i.e., parameters
maintained over the entire trajectory such as movement time, direction, and amplitude), suggesting that
the relative phase dynamics of motor coordination attributed to the timing level are largely independent of
the dynamic principles operating at the movement goal level, although these two levels are coupled to
each other. This dynamical approach to coordination implies that not only the goal of individual
components of the motor system but also the patterns of coordination that are stable and reproducible can
be included in the representation at the movement planning level.
In Schmidt et al. (1998), the stability of bimanual coordination indexed as relative phase angle is
examined by changing coordination mode and oscillation frequency. The coupling strength was greater
for the in-phase than the anti-phase mode, and strength decreased as frequency increased, as predicted by
the dynamical model with the HKB potential function given in (5). Note that the equation below includes
the difference between the oscillators’ eigenfrequencies Δω (Haken et al. 1985, Schöner et al. 1986). The
ratio between two coefficients in the equation, b/a, governs the coupling strengths of the oscillators. Post
et al. (2000) computed relative phase stability using standard deviations and the rate of recovery (i.e.,
relaxation time) after a perturbation of frequency and amplitude. Studies on phasing stability manipulate
oscillation frequency and/or coupling strength. While oscillation frequency is inversely associated with
stability, coupling strength is directly related to stability (Cohen et al. 1982).
(5) V(ϕ) = -Δωϕ - a cos(ϕ) - b cos(2ϕ) (HKB potential function with frequency detuning)
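As a sketch of how (5) governs stability, the snippet below locates the stable states (local minima of V) numerically for the no-detuning case; parameter values are illustrative. The familiar HKB result emerges: with b/a large, both in-phase (0) and anti-phase (π) are stable, while the anti-phase minimum vanishes once b/a falls below 0.25, the regime associated with fast oscillation rates.

```python
import numpy as np

def hkb_potential(phi, a=1.0, b=1.0, d_omega=0.0):
    """Eq. (5): V(phi) = -d_omega*phi - a*cos(phi) - b*cos(2*phi)."""
    return -d_omega * phi - a * np.cos(phi) - b * np.cos(2.0 * phi)

def stable_phases(a=1.0, b=1.0, n=100000):
    """Local minima of V over one cycle (periodic grid search; intended
    for the no-detuning case, where V is 2*pi-periodic)."""
    phi = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    v = hkb_potential(phi, a, b)
    is_min = (v < np.roll(v, 1)) & (v < np.roll(v, -1))
    return phi[is_min]

# b/a = 1: minima near 0 (in-phase) and pi (anti-phase).
# b/a = 0.1: only the in-phase minimum survives.
```

The ratio b/a thus plays the role described in the text: it controls whether anti-phase coordination remains a stable solution at all.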
For our purposes, coupled oscillators at the ‘planning’ level are entertained to investigate
potential sources of variability at the behavioral level, which may arise from the coupling architecture.
There is a planning oscillator associated with each gesture. The oscillators can be set to random initial
phases and then will start to settle towards the target relative phases given random fluctuations due to
noise. When the network of coupled oscillators is sufficiently settled, production of the syllable begins,
and each gesture is triggered at a fixed phase of its planning oscillator (e.g., gesture triggered at phase 0
degrees). The settled relative phases will determine the relative timing of gesture triggering. Since at any
point in the settling process, different coupling relations may have settled to different degrees, at the
moment of production initiation, the relative phase specifications of the different couplings will be more
or less stable. The ones that require a longer settling time will show more variability. Consider two stable
coordinated structures that differ in coupling strength. The coordination with a stronger coupling is faster
in stabilizing to the target relative phase. This settling time is related to relative stability, so the
coordination of components with a shorter settling/stabilization time toward stable phasing will have
greater stability in timing. In a phase oscillator with a stable equilibrium, phasing stability can be indexed
by the stiffness of the phase oscillator system, i.e., how fast a system reaches the stable equilibrium value.
Following this line, relative stability between gestural molecules with different coupling structures will be
examined in the modeling below using coupling graphs implemented in TADA.
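The idea that stronger coupling yields faster settling can be illustrated with a deliberately minimal one-dimensional relative-phase equation (a sketch only, not TADA's planning-oscillator implementation): the relative phase relaxes toward its target at a rate set by the coupling strength k, so larger k means a shorter settling time.

```python
import numpy as np

def settling_time(k, target=0.0, phi0=2.0, dt=0.001, tol=0.01, t_max=100.0):
    """Euler-integrate d(phi)/dt = -k * sin(phi - target) from phi0 (rad)
    and return the first time phi comes within tol of the target.
    k plays the role of coupling strength."""
    phi, t = phi0, 0.0
    while t < t_max:
        err = (phi - target + np.pi) % (2.0 * np.pi) - np.pi  # wrapped error
        if abs(err) < tol:
            return t
        phi -= k * np.sin(phi - target) * dt
        t += dt
    return float("inf")

# Stronger coupling stabilizes faster, indexing greater timing stability:
# settling_time(2.0) < settling_time(0.5)
```

This settling-time index is the same quantity used below to compare coupling structures: a faster approach to the equilibrium relative phase is read as greater phasing stability.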
6.2.1.3. Relative phase stability
Phasing stability in the current context is the ability to arrive at or return to the fixed state rapidly with
random initial values or under perturbations. TADA has the capability of returning relative phases according
to the computed coupled-oscillator model in the coupling graph mode (Nam et al. 2006). Although there
are various parameters that can be manipulated to test different degrees of stability (e.g., coupling
strength, new coupling links, target relative phase, initial phase value, etc.), in the modeling of the Korean
data pursued below, input graphs only vary in number of coupling links and in target relative phases.
More in-depth manipulations and comparisons in relative stability will be tested and incorporated at a
later date. For the moment, we demonstrate how settling time, as an index of phasing stability, is
correlated with the number of coupling links of the relevant gestures and their underlying relative phases.
Figure 6.3 shows the coupling graph of the nonce-phrase [pa.pa#pa], input as ARPABET
(PA)(PA)#(PA), generated from TADA. The parentheses indicate syllable boundaries and the pound sign
indicates a word boundary. Between syllables within a word (i.e., the first two ‘PA’s), the vowel gesture
is coupled to the oral closure gesture in the following syllable. At a word boundary, no such V-to-C
relation is posited; rather, the two vocalic gestures across a word boundary are coupled to each other.
Figure 6.3. Coupling graph for pa.pa#pa generated by TADA’s coupled oscillator model
The relative phase plot based on the above coupling graph is presented in Figure 6.4 showing the
computed relative phases for the pa.pa#pa sequence with random initial phase values, with default values
based on English. “.” represents a syllable boundary in the input, and “#” represents phonological word
boundary or small phrase boundary. In the sample relative phase plot in Figure 6.4, the coordination in the
second syllable, between labial (LAB) closure and LAB release gestures (‘LABclo2 – LABrel2’) as well
as between LAB closure and glottal (GLO) gestures (‘LABclo2 – GLO2’) in [pʰ] take the shortest time to
reach the target relative phase, compared to the relative phases between gestures in the first and the third
[pʰ] in pa.pa#pa. This difference in settling time can be explained by the additional coupling link that
exists between the second oral closure gesture and the preceding vowel gesture (‘V1 – LABclo2’; see
Figure 6.3). To summarize how the coupling structure gives rise to differences in the stabilization time,
the oral closure gesture of the second syllable is coupled to four other gestures (glottal, oral release, and
two vowel gestures [anti-phase with the preceding vowel and in-phase with the tautosyllabic vowel]),
while the closure gestures in the first and the third syllable only link to three gestures (namely glottal, oral
release, and tautosyllabic vocalic gestures). This example demonstrates that the gestures associated with a
greater number of coupling links show more stability in relative phases (all else equal), as evidenced in
their shorter settling time.
Figure 6.4. Relative phases for the production pa.pa#pa in TADA
(ticks & numbers on the trajectory plots indicate stabilization timepoints)
Simulation data of this sort from TADA is useful to calculate relative stability from the output
relative phases computed over time. For the relative phase stability analysis, two measures of stability are
entertained: first, the stabilization time (details on the computation in the methods below) of the relative
phase plots generated by TADA, and second, the variation in stabilization time over iterations of runs.
These choices are based on two assumptions—first, that a coupling structure with a more variable relative
timing will take a longer time to stabilize, and second, that a more variable stabilization time will be
associated with a more variable resultant relative phase. The empirical data leveraged in this modeling is
from the experimental dataset (Chapter 5) for oral and velum gestures in Korean onset and coda nasals.
6.2.2. Predictions
The general predictions that arise from the coupling graph dynamics are that coordination is more
stable a) if there is a greater number of paths between the two nodes, b) if there are more direct and
shorter paths between the two nodes, and c) if the strength of coupling is greater for a given connected
path (Goldstein et al. 2009, Nam et al. 2009). What is not predicted in the coupling graph dynamics is
whether the coupling structures in a loop (e.g., a tripartite coupling structure) that have inharmonious
relative phase values—phase angles of two coupled links that do not sum up to the phase angle of the
other link—have diminished stability in coordination. For example, if gestures A and B are coordinated
in-phase with each other but each stands in an off-phase relation with a gesture C at a different phase
values, then the entire tripartite structure becomes inharmonious, and thus the resultant (i.e., output
observed) phase values change from the target (i.e., represented) relative phase values (6a). On the other
hand, if gestures A → B and B → C are both 90 degrees out of phase and A → C is anti-phased (180
degrees), this tripartite structure is harmonious (i.e., the sum of two phase angles [90°+90°] equals that of
the other phase angle [180°]), and the resultant phase values remain the same, despite the fact that the
structure only has off-phase relations without any in-phase relations (6b). Such phase departures from the
target values (or the lack thereof) are demonstrated in the dynamical systems implemented here in TADA.
(6) a. inharmonious & b. harmonious coupling loop structures
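The harmony condition in (6) reduces to a simple modular-arithmetic check: a tripartite loop is harmonious when the chained target phases A→B and B→C sum, mod 360°, to the direct A→C target. A minimal sketch:

```python
def loop_is_harmonious(ab, bc, ac, tol=1e-9):
    """True when the A->B and B->C target phases (degrees) sum, mod 360,
    to the direct A->C target phase, as in (6b)."""
    residue = (ab + bc - ac) % 360.0
    return min(residue, 360.0 - residue) < tol

# (6b): two 90-degree links chaining to a 180-degree direct link is
# harmonious; a 90 + 0 chain against a 180-degree direct link is not.
```

This is the diagnostic applied to the nasal coupling graphs below: only when the check fails does competitive coupling force the output phases away from their targets.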
With the coupled oscillator models and Task Dynamics models, we examine whether such
incompatibilities in the coupling architecture produce unstable coordination. To recapitulate empirical
findings on Korean nasals’ oral-velum timing in Chapters 4 and 5, the intergestural timing between
Velum (VEL) lowering onset and Tongue Tip (TT) closure gesture onset—i.e., onset lags—is highly
overlapping in distribution between onset and coda nasals; the mean onset lags are positive (i.e., velum
lowering precedes oral closure gesture onset) and slightly longer for onset nasals than for coda nasals
(/#n/: 136 ms; /n#/: 106 ms). On the other hand, the consonant nasality lag, indexed by the interval from
the oral closure gesture onset to the velum raising onset, is distributed bimodally between onset and coda
nasals, with considerably shorter consonant nasality lags for onset nasals than for coda nasals (/#n/: 20.2
ms; /n#/: 100.6 ms). For both lag measures, greater variability in intergestural timing is exhibited for
onset nasals compared to the timing of coda nasals.
In the coupling graphs shown in Figure 6.5, phase relations for onset and coda nasals for Korean
are presented based on the current empirical findings. The coupling relations between the onset of the
VEL lowering gesture (VELl) and onset of the TT closure gesture (TTclo) are initially posited as 90° out-
of-phase for both onset and coda nasals, based on positive onset lags exhibited in the articulatory timing
data. Timing between the onset of TT closure gesture (TTclo) and the onset of VEL raising gesture
(VELr), the interval indicating the consonant nasality lag, is near-zero for onset nasals, thus modeled as
an in-phase relation (represented with a light blue solid line in Figure 6.5), whereas the same temporal lag
is positive in the coda nasals, thus posited at 90° out of phase (indicated with red arrows in Figure 6.5).
This in-phase link posited for onset (but not for coda) nasal structure, based on the previous chapters’
empirical findings, provides articulatory grounding for onset de-nasalization: by synchronizing the
initiation of the velum raising with the oral closure, this temporal constraint significantly shortens the
interval for oral constriction while the velum is still lowered, resulting in a very reduced temporal window for
perceived nasality in articulatory terms. Finally, the velum lowering (VELl) and velum raising gesture
(VELr) gestures are coupled anti-phase (180°) to each other in the model.
Figure 6.5. Coupling graphs for Korean onset (left) and coda nasals (right)
The postulated coupling graph for onset nasals in Figure 6.5 exhibits inharmonious tripartite
structures. TADA simulations show that the input target relative phase values for this coupling graph are
not in fact achieved in the output phases (changes from target are indicated with blue texts in Figure 6.5).
On the other hand, no changes from target phase relations occur in the output in TADA simulations for
the postulated coda nasal coupling structure.
With these empirically-driven coupling structures, the relative phase stability analysis tests
whether the different coupling architectures result in greater phasing variability for the inharmonious
onsets compared to the harmonious coupling structures of the codas that emerge without any phase
transitions. Crucially, the modeling of phase stability investigates whether the empirical data on
differences in timing stability between onset and coda nasals in Korean is also predicted by the coupled
oscillator systems model of relative phases generated solely by their differences in the coupling
architecture.
6.2.3. Methods
Based on the pre-imposed coupling relations implemented in the coupling structure, TADA simulates and
generates relative phase time functions from the computed coupled-oscillator model for a given coupling
graph (Nam et al. 2006).
During the coupled oscillator simulations, every oscillator begins at a random initial phase. In the
absence of competition between two or more coupling relations, the final relative phase of any two
coupled oscillators after the coupling simulation will settle at the same value as the input target relative
phase, and the oscillators will trigger the corresponding gestures accordingly. For example, for a relative
phase of 0 degrees, two corresponding gestures will begin at the same time, and for a relative phase of
180 degrees, one oscillator/gesture will significantly precede the other in time. When there is competition
between multiple coupling links, the final relative phase at the end of planning is determined by the
competitive coupling of all the coupling relations and potential compromises among competing target
phases (Nam & Saltzman 2003, Goldstein et al. 2006, 2007, 2009, Saltzman et al. 2006). Because the
initial phases of the oscillators are set to random values, each iteration of runs for the same coupling
graph architecture gives non-identical resulting relative phases.
Korean onset nasals (/#n/) are simulated as a sequence pa.na, and coda nasals (/n#/) as pan.ta.
For the coupling relations in both pa.na and pan.ta, input target relative phases are set as 90° for VEL
lowering (VELl) to TT closure (TTclo) gestures, and 180° for VEL lowering to VEL raising gestures
(VELr). The difference in coupling graph between onset and coda nasals is in the link between TT closure
and Velum raising gestures; the target relative phase for TT closure to VEL raising is set as 0° for the
onset nasals, whereas such coupling is set as 90° for coda nasals. Figure 6.6 shows the complete coupling
architecture used for the word with a syllable onset nasal and for the word with a syllabic coda nasal.
These coupling architectures are used within TADA to calculate relative phases over time.
Figure 6.6. Coupling graphs for pa.na (left) and for pan.ta (right)
(The crucial difference between onset & coda nasals lies in one coupling link;
i.e., TTclo – VELr in yellow oval)
Figure 6.7 presents a sample relative phase plot for pa.na with coupling relations shown in Figure
6.6 to test the hypothesis that the different graphs for onset and coda replicate the intergestural timing
results for Korean nasals, as well as the variability results. The parameters chosen for the computation of
stabilization/settling time of the relative phase are the following: the threshold is set as ± 0.5° from the final
relative phase (defined as the value at the last time point of the planning simulation), and the window size
is set as 50 frames (≈ 625 milliseconds). The stabilization time is defined as the timepoint when all the
phase values within a window fall under the threshold. For example, if the final relative phase is 0, the
timepoint when the following 50 frames’ relative phases all fall in the range between -0.5° and 0.5° (with
a 0.5° threshold) is defined as the stabilization point of the final relative phases. The vertical ticks on the
relative phase trajectories in Figure 6.7 indicate the computed stabilization time. The pairs of phase
trajectories with the final relative phase of 0° and the final relative phase of 65° in Figure 6.7 demonstrate
how different coupling relations with the same final state—both 0° or both 65°—may differ in their
stability in coordination, as indexed by the stabilization time noted in the offset tick marks within each pair.
For example, the settling time is ≈16 timeframes for the coupling between Tongue Tip closure in /n/
(TTclo) and the vowel in the second syllable (V2), whereas it is longer (≈24 timeframes) for the Labial closure
gesture (LABclo) and the first vocalic gesture (V1). Both represent an in-phase C-V timing with 0° as the
settled relative phase, but the phase stability seems to be different between these two coupled gestures.
Figure 6.7. Relative phase plot for (pa.na) generated in TADA
(ticks on the trajectory plots indicate stabilization timepoints)
(final relative phase values are stated on each trajectory)
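The stabilization-time computation described above can be sketched as follows (a minimal re-implementation of the stated criterion, not TADA code): the stabilization point is the first frame from which an entire window of samples stays within the threshold of the final relative phase.

```python
import numpy as np

def stabilization_time(rel_phase, threshold=0.5, window=50):
    """First frame index from which the next `window` samples of the
    relative-phase trajectory (degrees) all lie within `threshold`
    degrees of the final value; None if never reached."""
    rel_phase = np.asarray(rel_phase, dtype=float)
    within = np.abs(rel_phase - rel_phase[-1]) <= threshold
    for t in range(len(rel_phase) - window + 1):
        if within[t:t + window].all():
            return t
    return None

# Toy trajectory settling exponentially toward 120 degrees:
trajectory = 120.0 + 80.0 * np.exp(-np.arange(400) / 40.0)
t_stab = stabilization_time(trajectory)  # the tick-mark timepoint
```

With the defaults above (±0.5°, 50-frame window), the returned index corresponds to the tick marks drawn on the trajectory plots.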
For the current simulation analysis, the timeseries relative phase plots generated by TADA’s
coupled oscillator simulation are used to calculate stabilization times for onset and coda nasal coupling
structures, separately (100 iterations for each, with a total of 200 iterations). Relative phase stability is
determined by the following two measurements: i) the computed stabilization time, with the assumption
that coupling relations with a more stable timing will take a shorter time to stabilize, and ii) the variation in
relative phases. Stabilization and variation are assessed by stopping the simulation before relative phases
stabilize, over iterations of runs (again 100 iterations each for onsets as well as for codas), with the
assumption that these variable relative phases at this timepoint in the planning represent the variability
that exists at the planning level of speech production. Tests of homogeneity of variance are conducted via
Levene’s test calculating absolute deviations of observations from the median (Fox 2016, Fox &
Weisberg 2019).
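The median-centered Levene statistic (the Brown-Forsythe variant, matching the absolute-deviations-from-the-median computation cited from Fox & Weisberg) can be computed directly. The sketch below uses toy data with illustrative group names, not the dissertation's actual stabilization times.

```python
import numpy as np

def brown_forsythe(*groups):
    """Levene's test statistic on absolute deviations from each group's
    median (Brown-Forsythe variant); compare against F(k-1, N-k)."""
    z = [np.abs(np.asarray(g, dtype=float) - np.median(g)) for g in groups]
    n = np.array([len(zi) for zi in z])
    k, N = len(z), n.sum()
    means = np.array([zi.mean() for zi in z])
    grand = np.concatenate(z).mean()
    num = (N - k) * np.sum(n * (means - grand) ** 2)
    den = (k - 1) * np.sum([((zi - mi) ** 2).sum() for zi, mi in zip(z, means)])
    return num / den

rng = np.random.default_rng(0)
onset = rng.normal(30.0, 6.0, 100)  # toy: more variable group
coda = rng.normal(30.0, 2.0, 100)   # toy: less variable group
W = brown_forsythe(onset, coda)     # large W -> unequal variances
```

A large statistic relative to the F distribution rejects homogeneity of variance, which is the comparison of interest for onset versus coda stabilization times.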
6.2.4. Results
6.2.4.1. Relative phase patterns
For the purpose of the relative phase analysis comparing onset and coda nasals using the coupling graphs
for pa.na and pan.ta, respectively, only the relative phases between oral and velum gestures composing a
nasal multi-gesture molecule are examined. Recall that for onset nasals in pa.na, the target relative phase
for Velum lowering (VELl) and Velum raising (VELr) gestures is set as 180° (see Figure 6.6), set at 90°
for VEL lowering (VELl) and TT closure (TTclo) gestures, and set at 0° for TTclo-to-VELr coupling.
After running the simulations as detailed above, final relative phases for these three coupling relations are
150°, 120°, and 30°, respectively (Figure 6.8).
Figure 6.8. Sample relative phase plots for the coupled oscillator simulation of (pa.na)
(Two trial iterations are shown.)
For coda nasals in pan.ta, target relative phases for VELl-to-VELr coupling and for VELl-to-
TTclo coupling are the same for the onset nasal graph in pa.na, which are 180° and 90°, respectively. The
only difference between the onset and the coda nasal graph within the nasal multi-gesture complex
(although other coupled oscillators in the coupling graph may also be relevant in computing the final
relative phase) is the target relative phase for TT closure (TTclo) to VEL raising (VELr) gestures, which
is set at 90° in the coda nasal structure (in contrast to the in-phase [0°] relation in the onset nasal's graph for
the same coupling link). Figure 6.9 plots sample iterations of coupled oscillator simulation of pan.ta and
shows that final relative phases for the oral and velum gestures composing a coda nasal do not change
from the initial target relative phases.
Figure 6.9. Sample relative phase plots for the coupled oscillator simulation of (pan.ta)
(Two trial iterations are shown.)
We see that the inharmonious coupling structure for onset nasals (/#n/) results in departures from the
target relative phases due to competition among coupling links, while the coda nasal’s (/n#/) harmonious
coupling structure preserves the input phase relations to the final output relative phase values.
The investigation of the stabilization time allows comparison of timing stability among coupling
relations that potentially have different relative phases. For both onset and coda nasals, the stabilization
time—representing the time it takes for the relative phase to reach its final stable state (i.e., equilibrium
state), indicated by the ticks—is shorter for VELl-to-VELr timing (blue lines at 150° for onset nasals & at
180° for coda nasals) compared to the settling time for both VELl-to-TTclo timing (yellow lines at 120°
for onset nasals & at 90° for coda nasals) and TTclo-to-VELr timing (red lines at 30° for onset nasals & at
90° for coda nasals).
Table 6.1. Actual (observed) and model-predicted lags for onset and coda nasals

                      onset nasal /#n/                      coda nasal /n#/
                      observed lag      predicted lag       observed lag      predicted lag
                      mean (st.dev)     (phase angle)       mean (st.dev)     (phase angle)
VELl-to-TTclo (A)     136 ms (7.15)     166.67 ms (120°)    106 ms (6.26)     125 ms (90°)
TTclo-to-VELr (B)     20.2 ms (6.07)    41.67 ms (30°)      100.6 ms (5.28)   125 ms (90°)
VELl-to-VELr (C)      155 ms (11.8)     208.33 ms (150°)    209 ms (11.3)     250 ms (180°)
The model output of relative phases shown in Table 6.1 is compared with the kinematic data on
Korean velum-oral intergestural timing. The final phase relations of the tripartite coupling graph results in
a harmonious output (i.e., the sum of relative phases A and B equals to the relative phase C; thus, A+B-C
= 0°).
36
Table 6.1 indicates the observed and predicted temporal lags (predicted lag measures are
estimated with the following equation: (θ/360°) × 500 ms/cycle, where θ is the output relative phase angle
and the clock frequency is 2 Hz; e.g., for a phase angle of 30°, the predicted lag is 30/360 × 500 ms/cycle =
41.67 ms). Note that the predicted lag measures are in reasonably good agreement with the observed
temporal lags in the articulatory data, being overall 20-50 ms longer than the observed lags (a discrepancy
that can be tuned by adjusting the clock frequency); crucially, the relative magnitudes of the model-predicted
relative timing are well in line with the articulatory data.
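The phase-to-lag conversion just stated can be expressed directly as code (a one-line sketch of the equation above):

```python
def predicted_lag_ms(phase_deg, clock_hz=2.0):
    """Convert an output relative phase angle (degrees) to a predicted
    temporal lag, given the planning clock frequency (2 Hz = 500 ms/cycle)."""
    return (phase_deg / 360.0) * (1000.0 / clock_hz)

# Onset nasal final phases 120, 30, 150 deg -> 166.67, 41.67, 208.33 ms;
# coda nasal phases 90 and 180 deg -> 125 and 250 ms, as in Table 6.1.
```

Raising the assumed clock frequency shrinks all predicted lags proportionally, which is the tuning knob mentioned above for closing the 20-50 ms gap with the observed data.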
The actual observed data and the predicted lags are compared for each coupling relation. First, the
onset lags, indicated in the model by the coupling relations between Velum lowering (VELl) and TT
closure (TTclo) gestures, show that the final relative phase for onset lags lengthens to 120 degrees
(predicted lags: 166.67 ms) in onset nasals, whereas the relative phase for onset lags remains the same
(90°) for coda nasals (predicted: 125 ms). Thus, the model correctly predicts that onset lags are longer for
36
The input of the onset nasals’ coupling graph started as an inharmonious structure, but the competitive coupling
via dynamical systems approach implemented in TADA generated compatible output relative phrase structures via
compromises in the target relative phase values.
syllable onset nasals than for syllable coda nasals (mean onset lags for onset nasals: 136 ms; mean onset
lags for coda nasals: 106 ms). Second, the final relative phase for the consonant nasality lag, represented
by the TT closure gesture onset (TTclo) to the Velum raising gesture onset (VELr), also exhibits a
departure from target phase (0° → 30°) while no such change is seen for coda nasals (90°). Therefore, the
model correctly predicts shorter nasality lags in the production of onset nasals than in coda nasals (final
relative phase for onset nasals: 30° [predicted lags: 41.67 ms]; final relative phase for coda nasals: 90°
[predicted lags: 125 ms]), which is the case in the intergestural timing results for Korean nasals (mean
consonant nasality lags for onset nasals: 20.2 ms; mean consonant nasality lags for coda nasals: 100.6
ms). Additionally, the model predicts a shortened velum duration (180° → 150°) for onset nasals only,
specifically the interval from the velum lowering onset (VELl) to the velum raising onset (VELr). The
articulatory empirical results on velum duration (indexed as the interval from velum lowering onset to its
raising onset) do show that velum lowering movement duration is shorter for onset nasals than for coda
nasals.
Taken together, the final relative phases from the coupled oscillator models correctly predict the
actual temporal coordination observed in the dissertation’s empirical real-time MRI data for Korean
nasals: onset nasals are associated with longer onset lags (A: VELl-to-TTclo timing), shorter consonant
nasality lags (B: TTclo-to-VELr timing), and shorter velum duration (C: VELl-to-VELr timing). The
distinct relative timing patterns for onset and coda nasals are generated by the current coupling model,
and the model achieves this only by a difference in the initial relative phase values for a single coupling
link in a loop. The next section focuses on whether the differences in the (in)compatibility in the coupling
structure can also predict differences in the stability in temporal coordination among articulatory gestures
for multi-gesture nasal complexes.
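The competitive settling of an inharmonious coupling graph can be illustrated with a toy simulation (a sketch only: it assumes equal unit coupling strengths and simple gradient dynamics on the coupling potential, not TADA's actual implementation):

```python
import math

def settle(target_A, target_B, target_C, steps=20000, dt=0.005):
    """Relax relative phases A (VELl-to-TTclo) and B (TTclo-to-VELr) via
    gradient descent on the coupling potential
        V = -cos(A - tA) - cos(B - tB) - cos(A + B - tC),
    where C = A + B is the VELl-to-VELr relative phase."""
    tA, tB, tC = (math.radians(t) for t in (target_A, target_B, target_C))
    A, B = tA, tB                 # start at the (possibly inharmonious) targets
    for _ in range(steps):
        dA = -(math.sin(A - tA) + math.sin(A + B - tC))
        dB = -(math.sin(B - tB) + math.sin(A + B - tC))
        A += dt * dA
        B += dt * dB
    return math.degrees(A), math.degrees(B), math.degrees(A + B)

# Inharmonious onset-nasal targets (90°, 0°, 180°) settle at a compromise
# of about 120°, 30°, 150°; harmonious coda targets (90°, 90°, 180°) stay put.
print(settle(90, 0, 180))
print(settle(90, 90, 180))
```

Because A + B = C holds identically for relative phases, the settled output is always harmonious; under these assumptions the 90° of target conflict in the onset graph is distributed evenly across the three links.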
6.2.4.2. Stabilization time
Figure 6.10 plots the relative phases among gestures for a nasal consonant (30 iterations), which is the
focus of this phasing stability analysis, setting aside other phase relations in the coupling graphs.
Figure 6.10. Relative phases for onset nasals (left) and coda (right) nasals (30 iterations)
For both onset and coda nasals, relative phasing of the velum duration interval (indicated by the timing
between VEL lowering [VELl] and VEL raising [VELr] gestures in black in Figure 6.10) stabilizes more
rapidly than the relative phase for intergestural timing between velum and oral gestures (in red and blue,
respectively in Fig 6.10). Next, the comparisons between onset and coda nasals for each coupling relation
are examined via stabilization timepoints determined for each relative phase state over time. Again, the stabilization time is calculated as the time at which the relative phase reaches its stable (equilibrium) final state, based on a selected threshold and window size (details on parameters in the methods section above).
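Such a threshold-and-window criterion can be sketched as follows (the 1° threshold and 20-sample window here are illustrative stand-ins, not the parameter values reported in the methods section):

```python
def stabilization_index(phase_traj, final_value, threshold=1.0, window=20):
    """Return the first index from which the relative-phase trajectory
    stays within `threshold` of its final value for `window` consecutive
    samples; return None if the trajectory never stabilizes."""
    run = 0
    for i, phi in enumerate(phase_traj):
        if abs(phi - final_value) <= threshold:
            run += 1
            if run == window:
                return i - window + 1
        else:
            run = 0
    return None

# Example: a trajectory decaying toward a 150-degree equilibrium
traj = [150 + 30 * (0.9 ** n) for n in range(200)]
idx = stabilization_index(traj, 150.0)
```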
Figure 6.11 presents the results of stabilization time computed by TADA’s coupled oscillator
simulation for onset and coda nasals (100 iterations each, a total of 200 iterations). Using the datapoints
from the simulation, linear mixed effects regression tests using Kenward-Roger’s method are conducted
with repetitions as a random effect and syllable position (onset vs. coda) and coupling relation (VELl-
TTclo, TTclo-VELr, & VELl-VELr) as fixed effects. There is no interaction effect between syllable
position and coupling relation (F(2,495) = 0.916, p = .401), but there are significant main effects of
coupling relation (F(2,495) = 329.829, p < .001*) and of syllable position (F(1,495) = 96.604, p < .001*).
Tukey’s post-hoc pairwise comparisons show that stabilization time for relative phases between the VEL
lowering and raising gestures is significantly shorter than that between the VEL and TT constriction
gestures (VELl-TTclo vs. VELl-VELr: t(498) = 19.761, p < .001*; TTclo-VELr vs. VELl-VELr: t(498) =
20.939, p < .001*), but there is no difference in stabilization time between relative phases for oral-velum
timing (TTclo-VELr vs. VELl-TTclo: t(498) = 1.178, p = .467).
Figure 6.11. Stabilization time of relative phases for onset and coda nasals
Moreover, for all three coupling relations, the stabilization time exhibited for coda nasal timing is
shorter than the stabilization time for onset nasal timing.[37] This simulation result on phasing stability is
consistent with the empirical data presented in Chapter 5 on temporal variations between onset and coda
nasals in Korean, which shows that onset nasals have greater internal timing variability than coda nasals.
As onset nasals are accompanied by longer stabilization times than coda nasals, the onset-coda
asymmetry in timing stability is captured in the coupled oscillator simulation. This is consistent with the
hypothesis that inharmonious coupling structures produce more variability in relative phases.
[37] Pairwise comparisons show that the stabilization time for coda nasals is shorter than for onset nasals for the three
relative phases presented in Figure 6.11 (onset lag: t(495) = 4.928, consonant nasality lag: t(495) = 6.753, velum
duration: t(495) = 5.343, all at p < .001*).
6.2.4.3. Variations in relative phases
In addition to how rapidly relative phases stabilize over time, variations of relative phases over iterations
of runs (100 iterations for each onset and coda nasal; a total of 200 iterations) are computed. The relative
phase trajectories for each coupling link are stopped at a timestep of 10 ms, as might be the case if planning time is relatively short, e.g., when producing the word in context, and the relative phase value at the stopped timepoint (before it has settled to the final state) is measured.[38]
Figure 6.12. Density plots of relative phases before stabilization for onset and coda nasals
[38] For the coupling between the oscillators of VELl and VELr for coda nasals, the final relative phase is already
settled to 180° for 84 tokens out of 100 iterations; this is due to its short stabilization time (< 10 ms). For
visualization purposes and for the variability analysis between onsets and codas, relative phase values are retrieved
at the 5 ms timestep (instead of the 10 ms) for these specific coupled oscillators.
Figure 6.12 presents the distribution of each relative phase value at a 10 ms time unit, comparing
onset nasals with coda nasals. The relative phases for the nasal consonant simulation correspond to the
following intergestural lags: velum-oral onset-to-onset timing (VELl-TTclo), consonant nasality timing
(TTclo-VELr), and velum duration (VELl-VELr), respectively. Levene’s tests reveal that for relative
phases between velum and oral gestures, onset nasals exhibit greater variability than coda nasals (onset-
to-onset timing: F(1,198) = 7.474, p = .007*, consonant nasality timing: F(1,198) = 13.775, p < .001*).
However, no difference in variability in VEL duration between onset and coda nasals is found (F(1,198) = 0.366, p = .546).[39] The findings imply that the current coupling graphs for onset and coda nasals are
able to predict differential variabilities for intergestural timing between velum and oral gestures and the
lack of such differences in the velum duration interval.
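Levene's test assesses equality of variances by running a one-way ANOVA on absolute deviations from each group's center. A minimal sketch of the statistic (the mean-centered variant, with invented rather than actual relative-phase samples):

```python
def levene_W(*groups):
    """Levene's W statistic for equality of variances: a one-way ANOVA F
    computed on absolute deviations from each group's mean."""
    k = len(groups)
    n = [len(g) for g in groups]
    N = sum(n)
    # Absolute deviations from each group mean
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    zbar_i = [sum(zi) / len(zi) for zi in z]
    zbar = sum(sum(zi) for zi in z) / N
    between = sum(ni * (zbi - zbar) ** 2 for ni, zbi in zip(n, zbar_i))
    within = sum((zij - zbi) ** 2 for zi, zbi in zip(z, zbar_i) for zij in zi)
    return ((N - k) / (k - 1)) * between / within

# Toy samples: the 'onset' group is far more dispersed than 'coda'
onset = [100, 140, 90, 160, 120, 150, 80, 130]
coda = [118, 122, 119, 121, 120, 123, 117, 120]
W = levene_W(onset, coda)   # large W -> evidence of unequal variances
```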
6.2.5. Summary and Discussion
The current investigation of timing stability and its relation to inharmonious coupling relations in a
coupling graph architecture suggests that simple tripartite structures with different relative phase values
can yield structured variability in their output phasing and can capture linguistically driven patterns in this
intergestural timing variability. The modeling results for the coupled oscillator simulation illustrate how
coupling structures can be used to predict differences in timing variability. Crucially, the current relative
phase model suggests that the “inharmonious” phase target structure in the input coupling graph alone can
yield variable and unstable relative phase patterns, without introducing additional parameters that explicitly account for systematic variations (e.g., prosodic gestures or coupling strengths). This is a novel finding which has not been demonstrated before in the coupled oscillator model in Articulatory Phonology.
[39] Recall that our empirical analysis made no particular predictions regarding the stability of velum duration. Nevertheless, panel 3 is included in Figure 6.12 since it is interesting to note the potential application of relative phase models to predicting variation in a gestural duration interval via the phasing stability between closure and release gestures.
Based on articulatory data on intergestural timing for Korean onset and coda nasals, different
underlying relative phase values are postulated in the phonological representations of onset and coda nasals via coupling structures. The initial inharmonious phasing relations for onset nasals change via
competition among coupling relations, while initial (harmonious) phasing relations remain the same in the
final relative phases for coda nasals. This inherent instability in the coupling structure of onsets may
contribute to the Korean onset nasals being more variable and malleable under prosodic modulations. The
structural variability in timing exhibited for onset nasals may in turn have contributed to onset nasals
being more susceptible to the phonological sound change process of de-nasalization, compared to the more stable coupling structure of the coda nasal multi-gesture complex. Conversely, onset and coda nasals may have started with the same coupling graph, but as the velum gesture shortens (e.g., due to lax onset consonant weakening, a general aspect of Korean phonetics [Chen & Clumeck 1975, Kim
2011, Yoshida 2008, Yoo & Nolan 2020]), speakers learn an in-phase relation between the oral and the
VEL raising gesture, creating the inharmonious pattern for onset nasals but not for coda nasals. This
inharmonious coupling structure, then, causes the onset nasals to become even more variable and flexible
in timing.
The current model examines coupling graphs for syllable onset and coda nasals in Korean. Future
work is directed to use these relative phase models to account for different types of geminate nasals
(assimilated and concatenated) in Korean in order to investigate whether their phase and variability
properties can also be modeled by simple (in)harmonious phase target properties of the coupling graphs.
In sum, our empirical and computational results in part one of this chapter provide articulatory
grounding for a phonological sound change process with sources of variability in timing directly linked to
the articulatory representation of a multi-gestural nasal segment. Next we turn to how intergestural timing
can be advantageously used for the classification among different phonological categories.
6.3. Machine Learning Classification of Multi-Gesture Complexes
6.3.1. Background
This section presents a machine learning approach to phoneme classification, a problem that has previously been addressed using data from various acoustic dimensions (Dekel et al. 2004, Graves & Schmidhuber 2005, Yousafzai et al. 2008). Among many acoustic parameters used for phoneme
classification, e.g., formant frequencies, spectral peak locations, zero-crossing rates and band energy ratio
(Frid & Lavner 2010), mel-frequency cepstrum coefficients (MFCCs) retrieved from the TIMIT acoustic-
phonetic corpus (Garofolo et al. 1993) are by far the most studied speech-relevant variables (Graves &
Schmidhuber 2005, Graves et al. 2005, Salomon 2001, Yousafzai et al. 2008). In contrast, articulatory
dimensions have rarely been used for phoneme classification,[40] and the current modeling aims to fill this
gap by introducing various dimensions of articulatory variables in both the temporal and spatial domains.
Various techniques used for phoneme classification (and speech recognition) include
Bidirectional Long Short Term Memory (BLSTM) networks (Graves & Schmidhuber 2005), Mixtures of
Probabilistic PCA (MPPCA) methods (Tipping & Bishop 1999, Yousafzai et al. 2008), and Recurrent
Neural Networks (RNN; Atlas et al. 1987, Chandra et al. 2007, Robinson 1994), among others. In the
current study, the Support Vector Machines (SVM) classification method is chosen (Frid & Lavner 2010,
Salomon 2001, Yousafzai et al. 2008, Vapnik 1995). The SVM technique has been used for various
speech task applications, from phoneme classification (Clarkson & Moreno 1999, Ganapathiraju et al.
2000) to speaker identification and verification (Ma et al. 2001, Schmidt & Gish 1996). The advantage of
using SVM in phoneme recognition, especially based on articulatory variables, is that SVM is known to
produce good performance results on a relatively small dataset (Garofolo et al. 1988, Salomon 2001), a
constraint of nearly all articulatory datasets.
[40] Lammert et al. (2020) introduce vocal tract constriction variables along with a coupling graph architecture, in
addition to MFCCs and the frequency domain, in their computation of the speech articulatory coordination (SAC)
metric.
Although the prior studies noted above have demonstrated the effectiveness of SVM in the classification of acoustic speech production data, the SVM classification process has not been extended to include
articulatory variables. The current chapter uses SVM specifically on a diverse set of spatiotemporal
articulatory variables obtained from real-time MRI speech production data, with the aim to incorporate
not only acoustic variables in speech processing problems but also spatiotemporal articulatory variables,
including particularly temporal coordination among speech gestures. This novel attempt to use SVM on
articulatory speech data will, it is hoped, ultimately foster research on the acoustic-articulatory interface
in the ongoing development of human speech technology.
The aim of this modeling is to analyze and characterize various discriminative articulatory
features for the classification of multi-gesture complexes. This work focuses on two pairs of phonetic
categories that have been explored in this dissertation’s rtMRI experimental work: Hausa ejective vs.
implosive consonants and Hausa implosives vs. voiced plosives. Crucially, we explore in further detail
how the variables relevant to intergestural timing relations may be differentially effective in each
classification problem, to test whether certain parameters (e.g., relative timing between onsets) are more
valuable than others in solving the classification problems.
The results of this study can deepen our understanding of key features of these phonological
classes by utilizing different types of articulatory dynamic variables, and crucially by considering the
spatiotemporal variables that we have postulated to play critical roles in the internal architecture of these
multi-gesture complexes. The relative foundational importance or ‘weight’ of these properties may help
explain why certain phonetic categories are more difficult to learn than others. For example, minimally contrasting speech sounds with multiple highly weighted features are discernible in various ways, which may result in high accuracy of the classification model, and their distinct categories are thus easier to process and learn, in contrast to sounds whose features are largely indistinguishable and low-weighted. In turn, information on how different sets of articulatory variables may be important in
combination, as evidenced by how they embed in phoneme classification problems, may be fruitful in
refining abstract phonological representation of speech segments.
Further, this identification of relevant articulatory features for machine learning classification problems provides additional information beyond what is readily available in the statistical testing of the
empirical data. For example, the models obtained from the phoneme classification offer relative weights
of the articulatory variables, allowing comparisons among different measures of variables that extend
beyond claims of significant differences in the spatiotemporal characteristics of individual gestural
actions and their relative timing. In the statistical analysis of the articulatory studies in Chapters 2 and 3,
comparing relative importance between variables is not plausible, for example due to differences in the
units of measurement as well as the basic structure of the statistical testing. The relative weights
calculated from the classifier model, however, represent each feature’s relative contribution to the
solution, albeit within the confines of the classification problem. Therefore, not only the articulatory
variables with different unit measurements but ultimately other speech variables (acoustic and/or
aerodynamic) can be included all together in the classification problem, and this allows the investigation of an array of variables and variable types as potentially useful distinguishing features for a given set of
speech sounds and of how different subsets of variables may have combinatorial/synergistic effects on
improving the performance of the model. Finally, this information on the set of relevant features for each
phoneme classification, and further for each language, can provide knowledge that is useful for
advancing speech recognition models.
6.3.2. Predictions
Given that segmental multi-gestural structures are known to have a high degree of cohesion (Browman &
Goldstein 1990, 2000, Byrd 1996a, Kelso et al. 1986) among their gestural components, relative timing
between gestures is predicted to exhibit a greater effect in phoneme classification than the durational (or
spatial) properties of individual gestural actions in distinguishing one complex from another with highly
similar components. The temporal choreography and cohesion, we have argued in the preceding experiments and stabilization modeling, is critical to defining the properties of these phonological representations, due to the specific and tight coordination expected for these multi-gesture complexes in service to their superordinate goals.
Our empirical data in previous chapters investigating the articulatory dynamics of various multi-gesture complexes have examined Hausa complex and simplex segments: non-pulmonic consonants vs. their pulmonic counterparts, respectively. The articulatory characteristics observed for each phonological category
in previous chapters will be the basis of the current model predictions. In our real-time MRI speech data
on oral and vertical larynx actions in Hausa consonant production, relative timing differentiates non-
pulmonic consonants versus pulmonic consonants. Figure 6.13 presents sample kinematic trajectories for
vertical larynx (LX; in green) and labial (LAB; in yellow) gestures for voiced implosives (left) and voiced
plosives (right).
Figure 6.13. Sample trajectories for voiced implosives (left) and voiced plosives (right)
The vertical line on each plot in Figure 6.13 represents the onset of vertical larynx movement and
indicates this temporal landmark’s relative coordination with the associated oral constriction gesture. As
illustrated in the schematic gestural scores at the bottom of the plots, the oral (LAB) constriction gesture
precedes the vertical larynx (LX) gesture for voiced implosives, with larynx lowering starting just before the
release of the oral constriction. For voiced stops, on the other hand, the LAB and LX gestures begin
roughly simultaneously. Thus, the oral onset-to-vertical larynx onset timing is positive for implosives
(i.e., vertical larynx movement initiation is delayed with respect to oral constriction onset) and is near-
zero for voiced pulmonic consonants.
In considering the generalization of the exhibited contrast in relative timing between labial
implosives and voiced stops to glottalic versus pulmonic distinctions more generally, Figure 6.14 presents
histograms for gestural onset lags in three phonological categories: non-pulmonic consonants (ejectives [/k', kʷ', s'/] and implosives [/ɓ, ɗ/] plotted separately), and their pulmonic counterparts (/k, kʷ, s, b, d/).
Note that onset lags for implosives as well as ejectives are positive (mean onset lags between 60-70 ms), while mean onset lags for pulmonic consonants are close to zero. Thus we have proposed that the kinematic data and the distribution of onset lags are evidence that the phonological contrast between glottalics and their otherwise similar pulmonics lies in their intergestural timing patterns. In contrast, ejectives and implosives are not differentiated by timing patterns; rather, vertical larynx action magnitude crucially separates ejectives (with upward larynx movement) from implosives (with downward larynx movement).
Figure 6.14. Histogram of onset lags for ejectives, implosives, and pulmonic Cs
These spatiotemporal characteristics of contrastive phonological categories are illustrated in
Figure 6.15. Based on these articulatory characteristics from the empirical data, it is predicted that the
intergestural timing variables may serve as highly significant features in solving the classification
problem between non-pulmonic versus pulmonic consonants, whereas individual vertical larynx actions
may be weighed more heavily than timing relations when classifying ejectives versus implosives.
Figure 6.15. Schematic gestural organization for ejectives, implosives, and voiced plosives
6.3.3. Methods
6.3.3.1. Support vector machine classification
Support Vector Machine (SVM) classification is based on statistical learning frameworks and developed
from a linear maximum margin classifier to a non-linear classifier. SVMs are commonly used to solve the
binary classification problem. An SVM constructs a hyperplane or a set of hyperplanes in a high-
dimensional space, which ideally separates the two classes with the largest distance (i.e., margin) to the
nearest training data point of any class. If the margin is large, the classification errors are reduced;
additionally, the current standard SVM allows soft margins, whereby a certain number of mistakes (i.e.,
misclassifications) are permissible while keeping the margins as wide as possible (Cortes & Vapnik
1995). Therefore, SVM is useful for classification problems involving speech data, as the categories of interest in speech datasets generally have a large amount of overlap (Frid & Lavner 2010, Salomon 2001).
The general formulation of linear SVMs is the following:
1. Given a training dataset D of N points of the form (x1, y1), (x2, y2), …, (xN, yN), where yi are
either 1 or -1, indicating the class to which the point xi belongs. Each sample is composed of a
training example xi of length M, with elements xi = (x1, x2, …, xM).
2. The goal is to find the maximum-margin hyperplane that divides the group of points xi with yi =
1 from the group of points with yi = -1. Such classifier has a decision function f(x), such that
f(xi) = yi, for all (xi, yi) in D.
When the optimal separating hyperplane(s) is found, the SVM can predict the class of unseen datapoints. For example, for a problem with K classes, a one-versus-one multiclass classifier (Burges 1998, Campbell 2001)[41] creates one binary SVM for each pair of classes, resulting in K(K - 1)/2
binary SVMs. The unseen example is classified for each SVM, and the vote is given to the winning class.
Finally, the unseen example is predicted as the class with the most winning votes (Salomon 2001). Because processing multiple combinations of binary SVMs is computationally costly, SVM is typically used for problems with a relatively small amount of training data. This suits the current examination of feeding articulatory variables into the classification problem, given that acquiring articulatory variables requires, compared to acoustic variables, additional computation associated with image processing and analysis to obtain the kinematic information.
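As a concrete illustration of the margin-based decision rule, a linear SVM can be fit to a toy two-class dataset (a sketch using scikit-learn rather than the MATLAB Classification Learner used here; the data are invented):

```python
from sklearn.svm import SVC

# Toy 2-D data: class -1 clusters near the origin, class 1 clusters high
X = [[0.0, 0.5], [0.5, 0.0], [0.2, 0.3],
     [3.0, 3.5], [3.5, 3.0], [3.2, 3.3]]
y = [-1, -1, -1, 1, 1, 1]

clf = SVC(kernel="linear")   # maximum-margin linear classifier
clf.fit(X, y)

# The decision function is beta0 + beta . x; its sign gives the class
print(clf.decision_function([[0.1, 0.1]]))   # negative -> class -1
print(clf.predict([[3.1, 3.4]]))             # -> class 1
```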
6.3.3.2. Parameters and feature selection
The current SVM analysis is based on the articulatory characteristics of non-pulmonic consonants (ejectives and implosives) and voiced plosives in Hausa. For the classification, a total of 9 articulatory parameters (or
features) drawn from both the spatial and temporal domains are incorporated. The selected features are
summarized in (7).
[41] The most commonly used one-versus-rest classifier is not used here, as evidence of its performance on speech data is scarce and it is reported to perform poorly in terms of properly separating speech data (Chin 1999, Clarkson & Moreno 1999).
(7) Articulatory kinematic parameters for classification
Gestural duration
• Oral duration (oral dur; from movement onset to offset)
• LX duration (lx dur; from movement onset to offset)
Gestural magnitude
• Oral constriction at movement onset (oral onset)
• Oral constriction maximum (oral max)
• LX vertical onset position (lx onset)
• LX vertical displacement (lx disp)
• LX vertical extremum (lx ext)
Intergestural timing
• Oral onset-to-LX onset lag (o.o – lx.o)
• LX onset-to-oral target lag (lx.o – o.t)
For the classification of ejectives versus implosives, there were 210 samples with 9 variables, yielding 1,890 datapoints (210 × 9); for the classification of implosives versus voiced plosives, there were a total of 1,512 (168 × 9) datapoints.
The aim of the classification modeling is to find the model with the best performance and to
obtain a subset of features that are useful to build a good predictor model (Blum & Langley 1997, Guyon
& Elisseeff 2003, Kohavi & John 1997). This feature ranking using SVM can be used to understand the
representation of speech data and to identify relevant features for the classification problems. For
example, the feature ranking via weights obtained from linear SVM models is related to the relative
importance of each feature in constructing the model, and the weights on feature ranking can be
effectively used to produce good classification performance (Chang & Lin 2008). The SVM models will
evaluate the relationship between each input variable and the response variable and calculate the weights
for each input variable, which shows the strength of its relationship with the target response. In the
current analysis, the relative importance of features for each classification problem (ejectives vs.
implosives & implosives vs. voiced plosives) is compared with one another. Additionally, we will
identify for a given pair of phonological categories a set of distinct articulatory features that contribute to
maximizing classification predictions.
6.3.3.3. Classification procedure
The Classification Learner in MATLAB is used to compute the SVM models, with the cross-validation of
5 folds being used to estimate accuracy on each fold. The dataset is split into 5 folds, and in the first
iteration, the first fold is used as the testing set and the rest are used as the training data. In the next
iteration, the second fold is used to test the model and the rest serve as the training set. This process is
repeated until each of the 5 folds has been used as the testing data (Figure 6.16). Cross-validation protects against overfitting by partitioning the dataset into folds.
Figure 6.16. 5-fold cross validation
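The fold rotation described above can be sketched with scikit-learn's KFold (illustrative placeholder data, not the MATLAB learner itself):

```python
from sklearn.model_selection import KFold

samples = list(range(20))        # stand-in for the articulatory samples
kf = KFold(n_splits=5)

# Each fold serves exactly once as the test set; the rest train the model
for fold, (train_idx, test_idx) in enumerate(kf.split(samples), start=1):
    print(f"fold {fold}: train on {len(train_idx)}, test on {len(test_idx)}")
```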
In the classification learner in MATLAB, the dataset includes all nine articulatory variables as predictors
as well as the response variable, which stores the target classification values. Then, the learner trains all of the SVM model types on the dataset, including the linear SVM, polynomial (e.g., quadratic and cubic) SVMs, and Gaussian SVMs using the radial basis function (RBF) kernel.
The classification algorithms are run on the two separate datasets independently, the first
including tokens for ejectives and implosives only and the second subset including the voiced implosives
and voiced plosives only. The learner trains the data using all nine articulatory variables and estimates the accuracy of each SVM model. The linear SVM model also computes the weights of each of the nine
variables in determining the multi-dimensional hyperplanes. The hyperplane function has the following
form:
(8) β0 + β1x1 + β2x2 + … + βNxN = 0 (where N = number of variables)
If the left-hand formula in (8) returns a negative value, the testing data will be classified as y = -1, and if it
has a positive value, the predicted classification would be y = 1. Moreover, β0 indicates bias of the
hyperplane, and the vector (β1, β2, …, βN) indicates the β values for each articulatory variable, which
signals the relative importance of the variables. For example, if one beta is smaller in magnitude than another (e.g., |β1| < |β2|), the variable x1 is contributing less to the calculation of the hyperplane (i.e., is less important in terms of classification) compared to the variable x2.
Based on this β vector, the least important feature, or variable, is removed, and in the next stage,
N-1 variables are used to train the SVM models. This process is repeated until we are left with only one
variable, which would correspond to the most relevant variable in solving this specific binary
classification problem. Therefore, the feature selection method is based on the β values retrieved from the
linear SVMs. After training the models with this feature selection method, the SVM model with the
highest accuracy is reported along with the variables used for that model.
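This backward elimination over beta magnitudes can be sketched as follows (a scikit-learn illustration on synthetic data; the feature names are hypothetical stand-ins for the nine articulatory variables, and only the first feature actually carries class information):

```python
import random
from sklearn.svm import LinearSVC

random.seed(0)

# Synthetic data: "onset_lag" separates the classes; the rest are noise
names = ["onset_lag", "noise_a", "noise_b"]
X, y = [], []
for label in (-1, 1):
    for _ in range(50):
        X.append([label * 2 + random.gauss(0, 0.3),
                  random.gauss(0, 1),
                  random.gauss(0, 1)])
        y.append(label)

# Repeatedly drop the feature with the smallest |beta| until one remains
active = list(range(len(names)))
while len(active) > 1:
    Xa = [[row[i] for i in active] for row in X]
    clf = LinearSVC(max_iter=10000).fit(Xa, y)
    betas = clf.coef_[0]
    weakest = min(range(len(active)), key=lambda j: abs(betas[j]))
    active.pop(weakest)

print(names[active[0]])   # the most relevant feature survives to the end
```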
The data for the SVM training process consisted of 126 samples of ejectives (/k', kʷ', s'/), 84 samples of implosives (/ɓ, ɗ/), and 84 samples of voiced stops (/b, d/). All the feature vectors (see [7]) were scaled to zero mean and unit variance (z-scored), and these standardized data are used for the classification.
6.3.4. Results
6.3.4.1. Ejectives vs. implosives
The first stage of classification training uses all 9 features. This linear SVM model has 93.8% accuracy.
Table 6.2 summarizes the beta values for each linear SVM model with 9 variables, 8 variables, etc.,
dropping the variable that has the lowest absolute beta value. The model with the best performance is the
quadratic SVM, with an accuracy of 95.2 percent. In this optimal SVM model, the following six features
are selected: LX vertical displacement (lx disp), LX vertical extremum (lx ext), LX vertical onset position
(lx onset), onset lags (o.o – lx.o), LX onset-to-oral target lags (lx.o – o.t), and oral duration (oral dur),
presented in order from highest to lowest absolute beta values.[42]
Table 6.2. Beta values for each variable in linear SVM models (ejectives vs. implosives)

n.   lx disp   lx ext   lx onset   o.o – lx.o   lx.o – o.t   lx dur   oral dur   oral onset   oral max
1    -3.405    -1.904   1.279      -.846        -.738        .391     -.331      -.262        .207
2    -3.179    -1.759   1.213      -.838        -.732        .368     -.217      -.117
3    -3.165    -1.764   1.195      -.681        -.573        .359     -.305
4    -2.804    -1.587   1.036      -1.105       -.964        .355
5    -2.477    -1.326   .987       -.975        -.725
6    -2.427    -1.304   .963       -.203
7    -1.754    -.948    .691
8    -2.294    -.285
The variables with greater relative importance in the ejective vs. implosive classification, as indicated by greater absolute beta values, are LX displacement, LX extremum, and LX onset position, among others. This can
be interpreted as demonstrating that the vertical LX magnitude values distinguish ejectives from
implosives with the largest margins, with fewer overlapping datapoints. The optimal SVM model also
includes the intergestural timing variables (gestural onset lags and onset-to-target lags), and marginally
oral duration. The features not included in the model are LX duration and oral magnitude. This implies
that these features show an overlapping distribution and thus are not useful in differentiating the ejective
category from the implosive category.
[42] This model has a bias of -0.8636, and the kernel scale is 1.5187.
Figure 6.17. Sample classification results for ejectives (blue) and implosives (red);
(a-c) features with good separation between classes and (d) features with poor separation (overlapping)
(Misclassified datapoints are indicated with “×”)
The classifier model can be understood as identifying the set of features that are sufficiently distinctive for classifying the two phonological categories of ejectives versus implosives: features with good separation between the two classes, as for example in Figure 6.17(a-c) above, versus features that are not useful, as for example in Figure 6.17(d), where the distributions of the two classes largely overlap.
                              Predicted class
                          Ejectives    Implosives
True class   Ejectives       98%           2%
             Implosives       8%          92%
Figure 6.18. Confusion matrix for ejectives vs. implosives SVM classification model
Figure 6.19. ROC curves for ejectives (left) and implosives (right)
The confusion matrix in Figure 6.18 shows true positive and false negative rates for each class, and the ROC (receiver operating characteristic) curves in Figure 6.19 show true and false positive rates for ejectives and implosives. The overall quality of the classifier, as indexed by the Area Under the Curve (AUC), is 0.98, representing high performance for this classifier.
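For reference, per-class rates and AUC of this kind are straightforward to compute from true labels, hard predictions, and decision scores. The sketch below uses made-up placeholder values, not the dissertation's data:

```python
# Row-normalized confusion matrix (per-class true/false rates) and ROC AUC.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])    # 0 = ejective, 1 = implosive
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 0])    # hard class labels from the SVM
scores = np.array([-2.1, -1.3, -0.7, 0.2,      # signed decision values
                    1.8, 0.9, 1.1, -0.4])

cm = confusion_matrix(y_true, y_pred, normalize="true")  # each row sums to 1
auc = roc_auc_score(y_true, scores)
print(np.round(cm, 2))   # diagonal entries = per-class true positive rates
print(round(auc, 2))
```

Note that the AUC must be computed from the continuous decision scores rather than the hard labels; thresholding the scores at different operating points is what traces out the ROC curve.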
6.3.4.2. Implosives vs. voiced plosives
The second binary classification problem explored here is to find the best classifier from the dataset
including the articulatory characteristics of implosives and voiced plosives. The initial linear SVM model
with all 9 articulatory variables has an accuracy of 66.1%, which is substantially lower than that of the classification models for ejectives vs. implosives. The corresponding beta values for the linear SVM
model are presented in Table 6.3. This shows that the most important variable for the classification
between implosives and voiced plosives is the LX onset-to-oral target lag (lx.o – o.t), as indicated by the
largest absolute weight in the computation of the hyperplane for the SVM model.
Table 6.3. Beta values for each variable in linear SVM models (implosives vs. voiced plosives)
n.   lx.o – o.t   oral dur   oral max   oral onset   o.o – lx.o   lx dur   lx disp   lx ext   lx onset
1      -1.182      -.546      -.514       -.442         .296       .219     -.143     -.066     .030
2      -1.172      -.522      -.493       -.430         .262       .211     -.157     -.034
3      -1.126      -.395      -.360       -.374         .113       .214     -.113
4      -1.066      -.243      -.288       -.293         .036       .185
5      -1.044      -.212      -.268       -.281          –         .173
6       -.695      -.098      -.214       -.198
7       -.445        –        -.150       -.121
8       -.201        –        -.089
(– : variable eliminated at an earlier step)
The model with the highest accuracy for this classification problem reaches 73.2%, using a medium-Gaussian SVM with two variables: the onset-to-target lag (lx.o – o.t) and the oral maximum constriction (oral max).43
43 This model has a bias of -0.5229 and a kernel scale of 1.4.

With respect to which variables have relatively high importance in this SVM classification model, the LX onset-to-oral-target timing, followed by the action of the oral gesture (both in duration and in magnitude), contributes more positively to the performance of the model than the individual laryngeal
gestural actions. Articulatory variables such as LX duration and LX magnitude are not included in the
best performing model, and thus they are considered not useful in solving the classification problem
between implosives and plosives. Figure 6.20 shows the classification results for the highly weighted variables (i.e., timing and oral gestural action); consistent with the relatively low accuracy compared to the model in Figure 6.17, the distributions of the two classes overlap and are not linearly separable (correctly classified points are indicated with dots and misclassified points with ×).
Figure 6.20. Sample classification results for implosives (red) and voiced plosives (blue)
(Misclassified datapoints are indicated with “×”)
The 73.2% accuracy of this model is significantly lower than the accuracy of classifying ejectives
and implosives (95.2%), which may indicate that voiced implosives and voiced plosives have many overlapping articulatory features, making the classification harder to learn with high accuracy.
Alternatively, it may be the case that certain acoustic or aerodynamic parameters also make critical
contributions to a successful classification of implosives and plosives.
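As a point of reference for the medium-Gaussian model described above, a Gaussian-kernel SVM over two features can be sketched as follows. This is an illustrative Python reconstruction with synthetic placeholder data, not the original analysis; the mapping of a reported kernel scale s to the parameter gamma assumes the exp(-||x-z||²/s²) parameterization of the Gaussian kernel, i.e., gamma = 1/s².

```python
# Two-feature Gaussian (RBF) SVM sketch. Assuming the Gaussian kernel is
# parameterized as exp(-||x - z||^2 / s^2), a kernel scale s corresponds to
# gamma = 1 / s^2 in scikit-learn's exp(-gamma * ||x - z||^2) form.
import numpy as np
from sklearn.svm import SVC

kernel_scale = 1.4
gamma = 1.0 / kernel_scale ** 2

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 2))   # placeholder stand-ins for (lx.o - o.t lag, oral max)
# Placeholder labels driven mostly by the first feature, with some noise:
y = ((X[:, 0] + 0.3 * rng.normal(size=150)) > 0).astype(int)

clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
acc = clf.score(X, y)
print(f"training accuracy: {acc:.3f}")
```

In practice the accuracy reported for such a model should come from held-out or cross-validated data rather than the training accuracy computed here.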
                              Predicted class
                          Implosives   Voiced stops
True class   Implosives      71%           29%
             Voiced stops    25%           75%
Figure 6.21. Confusion matrix for implosives vs. voiced stops SVM classification model
Figure 6.22. ROC curves for implosives (left) and voiced stops (right)
Figures 6.21 and 6.22 present the overall performance of the classification model. The true positive rates in the confusion matrix and the AUC of 0.78 both indicate that the classification between implosives and voiced plosives has much worse performance than the classification between ejectives and implosives.
6.3.5. Summary
The current machine learning classifications for i) ejectives and implosives and for ii) implosives and voiced plosives show that each binary classification task uses a different subset of articulatory features in its best-performing model(s). For the classification between ejectives and implosives, the articulatory
variables related to LX vertical magnitude have the strongest relative importance in the computation of
the classifier. For the classification between implosives and voiced plosives (which shows lower overall
accuracy than the former classification problem), the vertical larynx onset-to-oral target timing has the
strongest relative importance in constructing the classification model, followed by oral gestural actions.
The classification results presented above are consistent with the articulatory experimental findings for
Hausa consonants in which ejectives and implosives are differentiated by vertical larynx actions and
implosives and plosives are best distinguished by the LX-oral timing relations.
The results from SVM classification models demonstrate that key variable types or subsets of
variables among the individual dependent measures of the rtMRI experiment—vertical larynx actions for
ejectives as compared to implosives, and timing patterning for implosives as compared to plosives—are
of potential use in understanding how phonological contrast may be encoded and recovered for these
different classes of speech sounds. This is of course not surprising with respect to the importance of larynx raising versus lowering in contrasting ejectives and implosives, but the classification method provides novel
confirmation of the importance of relative timing for distinguishing implosives from voiced stops, at least
in Hausa. Furthermore, such a machine learning technique can ultimately be extended to larger datasets to
investigate distinctive articulatory, aerodynamic, and acoustic characteristics for a given set of phonetic
categories that may be difficult to identify or distinguish with classic statistical methods.
In addition, the SVM feature selection process and the relative importance ranking returned
among the input variables provide insights as to which sets of features are relevant in phoneme
classification problems and may have implications for the articulatory representation of phonological
units in that such representations can sensibly be thought to best capture properties important to
distinguishing minimal contrasts in a language. The machine learning classification models add to the
experimental findings that identify significant effects of kinematic variables on creating contrastive
phonological systems by offering knowledge on the relative importance or ‘weights’ of various
articulatory (and other speech-related) variables. Moreover, classification methods such as SVM exploit how multiple variables can work cooperatively in solving the task (e.g., the one-to-one pairwise groupings of features used in this study). Along these lines, highly weighted variables in the model may be synergistically
contributing to a better performance of the model, which in turn has implications for the representation of
a group of speech gestures that involves coordinative actions of multiple gestures to achieve a
superordinate speech goal. Future research using the SVM classification methods on speech data can
provide deeper understanding of hidden relationships among speech variables, and this provides another
analytic resource for probing multi-gesture complexes.
6.4. Conclusion
In this chapter, computational modeling approaches to the internal relative timing of multi-gesture
complexes are proposed. First, the relative phase stability analysis reveals how differences in the coupling
structures can predict systematic stability/variability in relative phase values. The results from these phase
stability simulations show that certain speech segments or syllable structures with inharmonious coupling
relations may generate variable relative phase, which may make the structure susceptible to sound change
processes and in general to contextual variation in speech. Second, the SVM classification study shows
that among various spatiotemporal articulatory variables characterizing non-pulmonic and pulmonic
consonants, intergestural timing is found to be one of the important articulatory variables in classifying
non-pulmonic consonants from pulmonic ones.
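The relative phase stability analysis referred to above rests on coupled-oscillator planning dynamics. A minimal two-oscillator sketch (illustrative only; not the TADA implementation, and the in-phase target, coupling strength, and thresholds are placeholder choices) shows how a relative phase settles toward its coupling target, with stabilization time usable as one index of stability:

```python
# Minimal coupled phase-oscillator sketch: the relative phase phi between two
# planning oscillators is attracted toward a target value (here 0, i.e.,
# in-phase coupling) via the standard potential dynamics dphi/dt = -k*sin(phi).
import numpy as np

target = 0.0           # target relative phase (radians); 0 = in-phase
k = 1.0                # coupling strength
dt = 0.01              # Euler integration step
phi = np.pi / 2        # initial relative phase error

history = []
for step in range(5000):
    phi += dt * (-k * np.sin(phi - target))   # relax toward the target phase
    history.append(phi)

# Stabilization time: first step at which phi is within 0.01 rad of the target.
settle = next(i for i, p in enumerate(history) if abs(p - target) < 0.01)
print(f"final relative phase: {history[-1]:.4f} rad")
print(f"stabilization step:   {settle}")
```

Under this dynamics, weaker or competing (inharmonious) couplings translate into flatter potentials and hence longer stabilization times and greater residual phase variability, which is the intuition behind using settling behavior as a stability index.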
Consequently, the outcome of the current modeling analyses on timing further illuminates our
theoretical understanding of within-segment coupling structures for multi-gesture complexes and
potentially impacts the development of speech production models.
7. Conclusion
The overarching goal of this dissertation is to understand how contrastive linguistic structures are built at
different levels of granularity and how they interact with high-level contextual variation in speech. In
previous chapters, articulatory approaches are taken to investigate speech and prosodic variation by
analyzing dynamic speech production imaging data, thereby deepening our understanding of how
different levels of linguistic sub-systems self-organize. Specifically, empirical phonetic evidence of
phonological units at various scales are examined, paying attention to how segmental and supra-
segmental structures such as prosody interact with each other. Understanding how the same atomic
gestural units situated in different (segmental/syllabic/prosodic) structures vary in their coordinative
patterns and in their temporal stability can inform us as to why certain linguistic units and not others
undergo specific phonological processes, both synchronically and diachronically. By illuminating the
basis for these patterns within a cognitive framework for phonological representation, we can obtain a
better understanding of how speakers systematically organize the smallest linguistic units to generate and
convey multiple levels and types of linguistic information.
Investigation of articulatory timing in multi-gestural complexes has been difficult due to a lack of
instrumental data, especially with regard to quantifying non-oral component actions such as velum,
laryngeal, glottal, and pharyngeal gestures. Knowledge of the behavior of non-oral component gestures
and their coordination with oral gestures is crucial in fully revealing intergestural coordination, especially
coordination within segment-sized gestural molecules, which often include such non-constriction actions.
The articulatory studies of this thesis work to fill this gap by incorporating empirically understudied
complex speech configurations such as glottalic consonants and nasal consonants using real-time MRI
data (Narayanan et al. 2004). The new horizons offered by this state-of-the-art speech imaging are
complemented by innovation in data analysis through the application of our automatic centroid tracking
tool (Oh & Lee 2018) for quantifying laryngeal and velum action, which provides a quantifiable method
for studying articulation of non-constriction gestures and non-oral gestures using rtMRI (or similar
imaging) data.
This dissertation consists of four articulatory studies on multi-gesture complexes and concomitant
computational modeling to examine how the smallest atoms of speech segments are stably and
synergistically coordinated to produce meaningful speech sounds. The stability and flexibility of the
temporal dynamics of speech segments are examined by probing various levels of phonological
granularity from segmental to syllabic to prosodic. The empirical studies undertake real-time MRI
experiments on the production of Hausa pulmonic and non-pulmonic consonants (Chapters 2-3) and on
Korean singleton onset and coda nasals as well as juncture geminate nasals (Chapters 4-5). In Chapter 6,
the computational approaches to modeling of the multi-gesture complexes examine underlying coupling
graphs that give rise to relative phase patterns with differential stabilities and apply machine learning
SVM classification to identify useful features that improve the performance of binary classification
models of certain linguistic properties of these complexes.
The dissertation integrates theoretical stances on the dynamics of gestural structures that have
been developed by Browman and Goldstein (1990, 1995) within Articulatory Phonology and Task
Dynamics with novel articulatory data. This work is accompanied by new modeling of intergestural
timing stability within a coupling graph architecture (Goldstein et al. 2006, Nam et al. 2009), with the goal of revisiting and enriching our theoretical linguistic understanding of the representation of action
coordination in speech production. The dissertation addresses the following questions: a) do non-oral and
non-constriction articulators, such as vertical larynx actions and velum behaviors, have different speech
motor planning systems compared to oral gestures; specifically, do non-oral and oral gestures behave
differently when interacting with linguistic variations such as prosodic modulations, b) are intergestural
timing and stability patterns different for within-segment coordination as compared to what has been
suggested for the across-segment timing patterns (e.g., Byrd & Choi 2010), c) how does the variability in
timing relations play a role in phonological sound change processes, d) is intergestural timing
systematically controlled in the speech planning system as part of the phonological representation for (at least certain) multi-gesture complexes?
The first component of the articulatory studies examines non-pulmonic and pulmonic consonants
in Hausa, and the findings provide an explanation of how glottalic consonants (ejectives and implosives)
exhibit different intergestural timing patterns as well as differential timing stability compared to their
pulmonic counterparts. Ejectives and implosives, we suggest, have active in their production a critical "superordinate" goal of aerodynamic pressure change, which requires synergistic actions of oral and vertical larynx gestures. This temporal constraint necessitates that gestures in glottalic consonants have a
more stable coordination than pulmonic consonants, which lack such a superordinate goal. Furthermore,
while oral gestures are known to lengthen in the presence of phrase boundaries, the vertical larynx
gestures examined here increase in magnitude when produced at a prosodic phrase boundary compared to
phrase-internal conditions. The finding suggests that speech motor control systems at the prosodic level
must allow for the possibility to differently encode their modulation for specific vocal tract subsystems
(‘tract variables’), determining which articulatory maneuvers are ‘visible’ to the prosodic planning system
for each gesture or multi-gesture complex. This deepens our knowledge of the diversity of mechanisms
that can be manipulated to convey linguistic prosodic information.
The second corpus of speech production data obtained and used for articulatory analysis includes
singleton and geminate nasal consonants in Korean, multi-gesture complexes that have as their
superordinate goal nasal airflow during oral closure. One of the key findings in this portion of the
dissertation shows that the timing between the velum and the oral gestures is distinguished for syllabic
onset and coda nasals. Crucially, the intergestural timing between the oral closure onset and the velum
raising onset is synchronous for onset nasals, and sequential for coda nasals. Further, this difference in the
patterning of relative timing is accompanied by greater timing variability for syllable onsets compared to
nasals in the coda position. This onset variability in timing within a nasal segment contrasts with the in-
phase onset stability that is often recognized for the across-segment timing and syllable position (Byrd
1996b, Haken et al. 1985, Kelso et al. 1986, Goldstein et al. 2009, Nam et al. 2009), suggesting that
timing relations are differently encoded within a segment compared to the timing across segment
sequences. This temporal variability found in onset nasals is further discussed in connection with the
sound change process of onset de-nasalization in Korean. The inherent variability, or instability, in
intergestural timing may contribute to onset nasals becoming susceptible to the phonological process of
nasal weakening. Alternatively, the positional asymmetric sound change process may have shaped onset
nasals to have systematic flexibility in timing relations, at least for this specific multi-gesture complex.
The modeling component in the dissertation serves to connect the articulatory findings with
theories of coupling structure for within-segment intergestural timing. Additionally, coupling oscillator
dynamics can be shown to account for differential variability in timing illuminated in the empirical data.
In particular, stability in timing between articulatory gestures is the particular focus of the modeling, as
this is crucial, we argue, in understanding the achievement of abstract and specialized goals within the
speech motor system beyond those of individual gestural constriction actions. The coupled oscillator
model, which encodes relative phase relations for each pair of speech gestures, implemented in the TADA
Task Dynamics model (Nam et al. 2004) allows predictions regarding how interacting multipartite input
representations of relative timing yield differences in the output stability and variability for intergestural
timing. The novel results from the relative phase model reveal that the inharmonious coupling structures,
which require inevitable competition among coupling relations, generate greater relative phase
variability than the harmonious coupling graphs. The model thus predicts onset nasals in Korean to have a
more variable timing than coda nasals due to their inharmonious coupling graph, defined based on the
articulatory findings. This modeling work explores new approaches to indexing variability in intergestural
timing, namely the stabilization time of final relative phases and the variations in relative phase values at
the speech planning level. Lastly, a machine learning classification using SVM is performed on
contrastive multi-gesture complexes in Hausa. The findings indicate that the vertical larynx magnitudes
are weighted heavily in classifying ejectives and implosives, whereas the timing from oral onset to larynx
lowering target is found to be useful in the classification problem between implosives and voiced stops.
The classification model provides information specifically on the relative importance among articulatory
variables with different unit measurements, potentially informing the construction of speech production
models that recognize a role for abstract superordinate goals.
In sum, the dissertation serves to add novel scientific knowledge to the field of linguistic speech
production regarding the internal articulatory dynamics of multi-gesture complexes, with special attention
directed to their intergestural timing and the stability and variability in this temporal coordination. The
coordination of oral speech constriction gestures has been examined and described for syllable structure
(e.g., Browman & Goldstein 1995, Goldstein et al. 2009) and other higher prosodic structures (e.g., Byrd
& Saltzman 2003, Saltzman et al. 2008), and this work contributes to this scientific endeavor by
incorporating an experimental corpus and suite of modeling of coordinative timing for multi-gesture
segmental structures that critically include non-oral, non-constriction gestures. Specifically, we introduce the plausibility of ‘superordinate goals’ for the planning and production of segment-sized multi-gesture complexes, which involve specific temporal constraints and stability among the coordinated gestures in
order to achieve a viable production of the intended speech segment. The findings from this work have
potential to further our knowledge of how the basic atoms of speech are built up into organized structures
of larger granularity to produce and convey linguistic information in speech production. These and future
investigations of the complex patterning and structured stability in coordination among linguistic vocal
tract actions can provide insight into the process of self-organization of gestures at interacting levels of
linguistic sub-systems and under contextual variation, thus serving the scholarly goal of refining the
cognitive representation understood to be at work in encoding the structured phonological units used in
languages.
References
Ahn, M. J. (2008). The Korean/n/-insertion before [j] as [nasal] insertion. Studies in Phonetics,
Phonology, and Morphology, 14(3), 371-388.
Ahn, M. J. (2013). Acoustic duration of Korean nasals. Studies in Phonetics, Phonology, and
Morphology, 19(3), 411-431.
Andersen, K. F., & Sonninen, A. (1960). The function of the extrinsic laryngeal muscles at different
pitch. Acta Oto-Laryngologica, 51(1-2), 89-93.
Anderson, M. J. (2006). Distance‐based tests for homogeneity of multivariate dispersions. Biometrics,
62(1), 245-253.
Arvaniti, A. (1999). Effects of speaking rate on the timing of single and geminate sonorants. In
Proceedings of the 14th International Congress of Phonetic Sciences (pp. 599-602).
Atlas, L., Homma, T., & Marks, R. (1987). An artificial neural network for spatio-temporal bipolar
patterns: Application to phoneme classification. In Neural Information Processing Systems (pp.
31-40).
Blaylock, R., Goldstein, L., & Narayanan, S. S. (2016). Velum control for oral sounds.
In INTERSPEECH (pp. 1084-1088).
Blum, A. L., & Langley, P. (1997). Selection of relevant features and examples in machine learning.
Artificial Intelligence, 97(1-2), 245-271.
Bombien, L., Mooshammer, C., Hoole, P., Kühnert, B., & Schneeberg, J. (2006). An EPG study of initial
/kl/ clusters in varying prosodic conditions in German. In H. C. Yehia, D. Demolin, & R.
Laboissière (Eds.), Proceedings of the 7th International Seminar on Speech Production (pp. 35-
42). Pampulha, Brazil: CEFALA
Bouarourou, F., Vaxelaire, B., Laprie, Y., Ridouane, R., & Sock, R. (2015, October). The timing of
geminate consonants in Tarifit Berber. In 1st International Conference on Natural Language and
Speech Processing.
Bresch, E., Katsamanis, A., Goldstein, L., & Narayanan, S. S. (2010). Statistical multi-stream modeling of
real-time MRI articulatory speech data. In INTERSPEECH (pp. 1584-1587).
Browman, C. P., & Goldstein, L. (1988). Some notes on syllable structure in articulatory phonology.
Phonetica, 45(2-4), 140-155.
Browman, C. P., & Goldstein, L. (1989). Articulatory gestures as phonological units. Phonology, 6(2), 201-
251.
Browman, C. P., & Goldstein, L. (1990). Tiers in articulatory phonology, with some implications for
casual speech. In J. Kingston, & M. E. Beckman (Eds.), Papers in Laboratory Phonology I:
Between the Grammar and Physics of Speech (pp. 341-376). Cambridge: Cambridge University
Press.
Browman, C. P., & Goldstein, L. (1992). Articulatory phonology: An overview. Phonetica, 49(3-4), 155-
180.
Browman, C. P. & Goldstein, L. (1995). Gestural syllable position effects in American English. In F.
Bell-Berti, & L. J. Raphael (Eds.), Producing Speech: Contemporary Issues, For Katherine
Safford Harris (pp. 19-33). AIP Press: Woodbury, NY.
Browman, C. P., & Goldstein, L. (2000). Competing constraints on intergestural coordination and self-
organization of phonological structures. Les Cahiers de l'ICP. Bulletin de la Communication
Parlée, (5), 25-34.
Brown, M. B., & Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the
American Statistical Association, 69(346), 364-367.
Brown, R., & Wade, G. (1987). Superordinate goals and intergroup behaviour: The effect of role
ambiguity and status on intergroup attitudes and task performance. European Journal of Social
Psychology, 17(2), 131-142.
Bruijn, S. M., Meijer, O. G., Beek, P. J., & Van Dieën, J. H. (2013). Assessing the stability of human
locomotion: a review of current measures. Journal of the Royal Society Interface, 10(83),
20120999.
Bückins, A., Greisbach, R., & Hermes, A. (2018). Larynx movement in the production of Georgian
ejective sounds. In Challenges in Analysis and Processing of Spontaneous Speech, 127-138.
Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition. Data mining and
Knowledge Discovery, 2(2), 121-167.
Byrd, D. (1995). Articulatory characteristics of single and blended lingual gestures. In K. Elenius, & P.
Branderud (Eds.), Proceedings of the XIIIth International Congress of Phonetic Sciences (pp.
438-441).
Byrd, D. (1996a). A phase window framework for articulatory timing. Phonology, 13(2), 139-169.
Byrd, D. (1996b). Influences on articulatory timing in consonant sequences. Journal of Phonetics, 24(2),
209-244.
Byrd, D. (2000). Articulatory vowel lengthening and coordination at phrasal junctures. Phonetica 57, 3-16.
Byrd, D., & Choi, S. (2010). At the juncture of prosody, phonology, and phonetics–the interaction of phrasal
and syllable structure in shaping the timing of consonant gestures. In Laboratory Phonology, 10,
31-60. De Gruyter Mouton.
Byrd, D., Kaun, A., Narayanan, S., & Saltzman, E. (2000). Phrasal signatures in articulation. In M. B. Broe,
& J. B. Pierrehumbert, (Eds.), Papers in Laboratory Phonology V (pp. 70-87). Cambridge:
Cambridge University Press.
Byrd, D., & Krivokapić, J. (2021). Cracking Prosody in Articulatory Phonology. Annual Review of
Linguistics, 7, 31-53.
Byrd, D., & Saltzman, E. (1998). Intragestural dynamics of multiple phrasal boundaries. Journal of
Phonetics, 26, 173-99.
Byrd, D., & Saltzman, E. (2003). The elastic phrase: Modeling the dynamics of boundary-adjacent
lengthening. Journal of Phonetics, 31(2), 149-180.
Byrd, D., Tobin, S., Bresch, E., & Narayanan, S. (2009). Timing effects of syllable structure and stress on
nasals: a real-time MRI examination. Journal of Phonetics, 37(1), 97-110.
Campbell, C. (2001). An introduction to kernel methods. Studies in Fuzziness and Soft Computing, 66,
155-192.
Catford, J. C. (1971). Fundamental Problems in Phonetics. Edinburgh: Edinburgh University Press.
Chandra, R., & Omlin, C. W. (2007). The Comparison and Combination of Genetic and Gradient Descent
Learning in Recurrent Neural Networks: An Application to Speech Phoneme Classification. In
Artificial Intelligence and Pattern Recognition (pp. 286-293).
Chang, Y. W., & Lin, C. J. (2008, December). Feature ranking using linear SVM. In Causation and
Prediction Challenge (pp. 53-64). PMLR.
Chen, M., & Clumeck, H. (1975). Denasalization in Korean: A search for universals. In C. A. Ferguson,
L. M. Hyman, & J. J. Ohala (Eds.), Nasálfest: Papers from a Symposium on Nasals and
Nasalization (pp. 125-131). Stanford, CA: Stanford University Linguistics Department.
Chin, K. K. (1999). Support Vector Machines Applied to Speech Pattern Classification. Master’s thesis.
Cambridge University.
Chitoran, I., Goldstein, L., & Byrd, D. (2002). Gestural overlap and recoverability: Articulatory evidence
from Georgian. In C. Gussenhoven, & N. Warner (Eds.), Laboratory Phonology 7 (pp. 419-448).
Berlin: Mouton de Gruyter.
Cho, T. (2001). Effects of morpheme boundaries on intergestural timing: Evidence from Korean.
Phonetica, 58(3), 129-162.
Cho, T., & Keating, P. A. (2001). Articulatory and acoustic studies on domain-initial strengthening in
Korean. Journal of Phonetics, 29, 155-90.
Cho, T., & Ladefoged, P. (1999). Variation and universals in VOT: evidence from 18 languages. Journal
of Phonetics, 27(2), 207-229.
Clarkson, P., & Moreno, P. J. (1999, March). On the use of support vector machines for phonetic
classification. In IEEE International Conference on Acoustics, Speech, and Signal Processing.
Proceedings (cat. No. 99CH36258), 2, 585-588.
Clements, G. N., & Osu, S. (2002). Explosives, implosives and nonexplosives: the linguistic function of air
pressure differences in stops. In C. Gussenhoven, & N. Warner (Eds.), Laboratory Phonology 7
(pp. 299-350). Berlin/New York: Mouton de Gruyter.
Cortes, C., & Vapnik, V. N. (1995). Support-vector networks, Machine Learning, 20(3), 273-297.
Davis, S. (2011). Geminates. The Blackwell Companion to Phonology, 1-25.
de Jong, K. J. (2001). Rate-induced resyllabification revisited. Language and Speech, 44(2), 197-216.
Dekel, O., Keshet, J., & Singer, Y. (2004, June). An online algorithm for hierarchical phoneme
classification. In International Workshop on Machine Learning for Multimodal Interaction (pp.
146-158). Springer, Berlin, Heidelberg.
Dell, F., & Elmedlaoui, M. (2002). Syllables in Tashlhiyt Berber and in Moroccan Arabic. Dordrecht,
Boston: Kluwer Academic Publishers.
Dent, L. J. (1981). Laryngeal Control in the Production of Three Classes of Voiceless Stops, with
Occasional Reference to Bolivian Quechua. Doctoral dissertation. University of Pennsylvania.
Demolin, D. (1995). The phonetics and phonology of glottalized consonants in Lendu. In B. Connell, & A.
Arvanti (Eds.) Papers in Laboratory Phonology IV: Phonology and Phonetics Evidence (pp. 368-
385). Cambridge: Cambridge University Press.
Deschamps, J. C., & Brown, R. (1983). Superordinate goals and intergroup conflict. British Journal of
Social Psychology, 22(3), 189-195.
Diciccio, T. J., Martin, M. A., & Stern, S. E. (2001). Simple and accurate one‐sided inference from signed
roots of likelihood ratios. Canadian Journal of Statistics, 29(1), 67-76.
Dunn, M. H. (1987). Temporal effects of geminate consonants and consonant clusters. The Journal of the
Acoustical Society of America, 82(S1), S114-S115.
Dunn, M. H. (1993). The Phonetics and Phonology of Geminate Consonants: A Production Study.
Doctoral dissertation. Yale University.
Esling, J. H., & Moisik, S. R. (2011, August). Multimodal observation and measurement of larynx height
and state during pharyngeal sounds. In Proceedings of the 16th International Congress of Phonetic
Sciences (ICPhS) (pp. 643-646).
Fishbach, A., Dhar, R., & Zhang, Y. (2006). Subgoals as substitutes or complements: the role of goal
accessibility. Journal of Personality and Social Psychology, 91(2), 232.
Fougeron, C., & Keating, P. A. (1996). The influence of prosodic position on velic and lingual
articulation in French: evidence from EPG and airflow data. In 1st ETRW on Speech Production
Modeling: From Control Strategies to Acoustics; 4th Speech Production Seminar: Models and
Data.
Fox, J. (2016). Applied Regression Analysis and Generalized Linear Models. Third Edition. Sage.
Fox, J. & Weisberg, S. (2019). An R Companion to Applied Regression. Third Edition, Sage.
Frank, H., & Althoen, S. C. (1995). Statistics: Concepts and Applications. Cambridge: Cambridge
University Press.
Frid, A., & Lavner, Y. (2010, November). Acoustic-phonetic analysis of fricatives for classification using
SVM based algorithm. In IEEE 26th Convention of Electrical and Electronics Engineers in
Israel, 751-755.
Gafos, A. I. (2002). A grammar of gestural coordination. Natural Language & Linguistic Theory, 20(2),
269-337.
Gafos, A. I. (2009). Dynamics in grammar: comment on Ladd and Ernestus & Baayen. In L. Goldstein,
D. H. Whalen, & C. Best (Eds.), Laboratory Phonology 8 (pp. 51-80). De Gruyter Mouton.
Gafos, A., & Goldstein, L. (2012). Articulatory representation and organization. The Handbook of
Laboratory Phonology, 220-231.
Gafos, A. I., Charlow, S., Shaw, J. A., & Hoole, P. (2014). Stochastic time analysis of syllable-referential
intervals and simplex onsets. Journal of Phonetics, 44, 152-166.
Gallagher, G. E. S. (2010). The Perceptual Basis of Long-Distance Laryngeal Restrictions. Doctoral
dissertation. Massachusetts Institute of Technology.
Ganapathiraju, A., Hamaker, J., & Picone, J. (2000). Hybrid SVM/HMM architectures for speech
recognition. In Sixth international conference on spoken language processing.
Gandour, J., & Maddieson, I. (1976). Measuring larynx movement in Standard Thai using the
cricothyrometer. Phonetica, 33(4), 241-267.
Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., & Pallett, D. S. (1988). Getting started with the
DARPA TIMIT CD-ROM: An acoustic phonetic continuous speech database. National Institute
of Standards and Technology (NIST), Gaithersburg, MD, 107, 16.
Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., Pallett, D. S., & Dahlgren, N. L. (1993). DARPA
TIMIT acoustic-phonetic continuous speech corpus CD-ROM (TIMIT).
Goldstein, L. (1989). On the domain of the Quantal Theory. Journal of Phonetics, 17, 91-97.
Goldstein, L. (1994). Do acoustic landmarks constrain the coordination of articulatory events? In P. A.
Keating (Ed.), Phonological Structure and Phonetic Form (Papers in Laboratory Phonology III)
(pp. 259-263). Cambridge, U. K., Cambridge University Press.
Goldstein, L., Byrd, D., & Saltzman, E. (2006). The role of vocal tract gestural action units in
understanding the evolution of phonology. In M. Arbib (Ed.), Action to Language: The Mirror
Neuron System (pp. 215-249). Cambridge: Cambridge University Press.
Goldstein, L., Chitoran, I., & Selkirk, E. (2007, August). Syllable structure as coupled oscillator modes:
evidence from Georgian vs. Tashlhiyt Berber. In Proceedings of the 16th International Congress
of Phonetic Sciences (pp. 241-244).
Goldstein, L., & Fowler, C. (2003). Articulatory phonology: a phonology for public language use. In A.
Meyer, & N. Schiller (Eds.), Phonetics and Phonology in Language Comprehension and
Production: Differences and Similarities (pp. 159-207). Berlin: Mouton de Gruyter.
Goldstein, L., Nam, H., Saltzman, E., & Chitoran, I. (2009). Coupled oscillator planning model of speech
timing and syllable structure. In Proceedings of the 8th Phonetic Conference of China and the
International Symposium on Phonetic Frontiers in Phonetics and Speech Science, 239-249.
Goldstein, U. G. (1980). An Articulatory Model for the Vocal Tracts of Growing Children. Doctoral
dissertation. Massachusetts Institute of Technology.
Graves, A., Fernández, S., & Schmidhuber, J. (2005, September). Bidirectional LSTM networks for
improved phoneme classification and recognition. In International Conference on Artificial
Neural Networks (pp. 799-804). Springer, Berlin, Heidelberg.
Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM and
other neural network architectures. Neural Networks, 18(5-6), 602-610.
Greenberg, J. H. (1970). Some generalizations concerning glottalic consonants, especially
implosives. International Journal of American Linguistics, 36(2), 123-145.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine
Learning Research, 3, 1157-1182.
Hagedorn, C., Proctor, M., Goldstein, L., & Narayanan, S. (2011a). Automatic analysis of geminate
consonant articulation using real-time MRI. Proc. ISSP, Montreal, Canada.
Hagedorn, C., Proctor, M., & Goldstein, L. (2011b). Automatic analysis of singleton and geminate
consonant articulation using real-time magnetic resonance imaging. In Twelfth Annual
Conference of the International Speech Communication Association, 409-412.
Haken, H., Kelso, J. A., & Bunz, H. (1985). A theoretical model of phase transitions in human hand
movements. Biological Cybernetics, 51, 347-356.
Hamlet, S. L. (1980). Ultrasonic measurement of larynx height and vocal fold vibratory pattern. Journal of
the Acoustical Society of America, 68(1), 121-126.
Harbourne, R. T., & Stergiou, N. (2009). Movement variability and the use of nonlinear tools: principles
to guide physical therapist practice. Physical Therapy, 89(3), 267-282.
Hayes, B. (1986). Inalterability in CV phonology. Language, 62, 321-351.
Hayes, B. (1989). Compensatory lengthening in moraic phonology. Linguistic Inquiry, 20, 253-306.
Hirai, H., Honda, K., Fujimoto, I., & Shimada, Y. (1994). Analysis of magnetic resonance images on the
physiological mechanisms of fundamental frequency control. Journal of Acoustical Society of
Japan, 50, 296-304 (in Japanese).
Hirata, Y., & Whiton, J. (2005). Effects of speaking rate on the single/geminate stop distinction in
Japanese. The Journal of the Acoustical Society of America, 118(3), 1647-1660.
Holst, T., & Nolan, F. (1995). The influence of syntactic structure on [s] and [ʃ] assimilation. In B. Connell,
& A. Arvaniti (Eds.), Phonology and Phonetic Evidence: Papers in Laboratory Phonology IV (pp.
313-333). Cambridge, UK: Cambridge University Press.
Honda, K., Hirai, H., Masaki, S., & Shimada, Y. (1999). Role of vertical larynx movement and cervical
lordosis in F0 control. Language and Speech, 42(4), 401-411.
Hong, S. (2002). A Conspiracy in Native Korean /n/-insertion. Studies in Modern Grammar, 28, 1-16.
Hong, S. (2006). /n/-insertion in native Korean and Sino-Korean revisited. Studies in Phonetics,
Phonology, and Morphology, 12(2), 391-413.
Honorof, D. N. (2003). Articulatory evidence for nasal de-occlusivization in Castilian. In Proceedings of
the 15th International Congress of Phonetic Sciences (pp. 1759-1763).
Hunger, J. D., & Stern, L. W. (1976). An assessment of the functionality of the superordinate goal in
reducing conflict. Academy of Management Journal, 19(4), 591-605.
Hussain, Q. (2018). A typological study of Voice Onset Time (VOT) in Indo-Iranian languages. Journal of
Phonetics, 71, 284-305.
Jessen, M. (2002). An acoustic study of contrasting plosives and click accompaniments in
Xhosa. Phonetica, 59(2-3), 150-179.
Jun, J. (2015). Korean n-insertion: a mismatch between data and learning. Phonology, 32(3), 417-458.
Kakita, Y., & Hiki, S. (1976). Investigation of laryngeal control in speech by use of thyrometer. Journal of
the Acoustical Society of America, 59(3), 669-674.
Kelso, J. A., Saltzman, E. L., & Tuller, B. (1986). The dynamical perspective on speech production: Data
and theory. Journal of Phonetics, 14(1), 29-59.
Kenstowicz, M. (1982). Gemination and spirantization in Tigrinya. Studies in Linguistic Sciences, 12,
103-122.
Kenstowicz, M. (1994). Phonology in Generative Grammar. Blackwell Publishers.
Kim, H.-S, Kim, B., & Oh, M. (2007). An optimality theoretic analysis of phonetically motivated /n/-
insertion. The Linguistic Association of Korea Journal, 15(2), 187-205.
Kim, Y. S. (2011). An Acoustic, Aerodynamic and Perceptual Investigation of Word-initial
Denasalization in Korean. Doctoral dissertation. University College London.
Kingston, J. (1985). The Phonetics and Phonology of the Timing of Oral and Glottal Events. Doctoral
dissertation. University of California, Berkeley.
Kirchner, R. (2000). Geminate inalterability and lenition. Language, 509-545.
Kohavi, R., & John, G. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1-2), 273-
324.
Kraehenmann, A. (2011). Initial geminates. In M. van Oostendorp, C. J. Ewen, E. Hume, & K. Rice
(Eds.), The Blackwell Companion to Phonology 2 (pp. 1124-1146). Malden, MA and Oxford:
Wiley-Blackwell.
Kraehenmann, A., & Lahiri, A. (2008). Duration differences in the articulation and acoustics of Swiss
German word-initial geminate and singleton stops. Journal of the Acoustical Society of America,
123, 4446-4455.
Krishnamoorthy, K., & Lee, M. (2014). Improved tests for the equality of normal coefficients of
variation. Computational Statistics, 29(1- 2), 215-232.
Kröger, B. J., & Cao, M. (2015). The emergence of phonetic–phonological features in a biologically
inspired model of speech processing. Journal of Phonetics, 53, 88-100.
Kulikov, V. (2010, July). Voicing and vowel raising in Sundanese. In Proceedings of the 17th Annual
Meeting of the Austronesian Formal Linguistics Association. Stony Brook, NY.
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. (2017). lmerTest package: tests in linear mixed
effects models. Journal of Statistical Software, 82(13), 1-26.
Ladd, R. D., & Scobbie, J. M. (2003). External sandhi as gestural overlap? Counter-evidence from
Sardinian. In J. Local, R. Ogden, & R. Temple (Eds.), Papers in Laboratory Phonology VI:
Phonetic Interpretation (pp. 164-182). Cambridge: Cambridge University Press.
Ladefoged, P. (1968). A Phonetic Study of West African Languages: An Auditory-Instrumental Survey (No.
1). Cambridge University Press.
Ladefoged, P. (1971). Preliminaries to Linguistic Phonetics. Chicago: The University of Chicago Press.
Ladefoged, P. (1990). Some reflections on the IPA. Journal of Phonetics, 18(3), 335-346.
Ladefoged, P., & Maddieson, I. (1996). The Sounds of the World’s Languages, Oxford: Basil Blackwell.
Ladefoged, P., & Johnson, K. (2014). A Course in Phonetics. Nelson Education. Blackwell.
Ladefoged, P., Williamson, K., Elugbe, B., & Uwulaka, A. (1976). The stops of Owerri Igbo. Studies in
African Linguistics, Supplement 6, 147-63.
Lahiri, A., & Hankamer, J. (1988). The timing of geminate consonants. Journal of Phonetics, 16, 327-
338.
Lammert, A., Goldstein, L., Narayanan, S., & Iskarous, K. (2013a). Statistical methods for estimation of
direct and differential kinematics of the vocal tract. Speech Communication, 55(1), 147-161.
Lammert, A., Ramanarayanan, V., Proctor, M., & Narayanan, S. (2013b). Vocal tract cross-distance
estimation from real-time MRI using region-of-interest analysis. In INTERSPEECH (pp. 959-962).
Lyon, France.
Lammert, A., Williamson, J., Seneviratne, N., Espy-Wilson, C., & Quatieri, T. F. (2020). Coupled oscillator
planning account of the speech articulatory coordination metric with applications to disordered
speech. In the 12th International Seminar on Speech Production.
Laukkanen, A. M., Takalo, R., Vilkman, E., Nummenranta, J., & Lipponen, T. (1999). Simultaneous
videofluorographic and dual-channel electroglottographic registration of the vertical laryngeal
position in various phonatory tasks. Journal of Voice, 13(1), 60-71.
Lee, P. (2004). Korean sonorant modifications as domain initial strengthening. Korean Journal of
Linguistics, 29(4), 607-633.
Lee, S. (2016). Revisiting the /n/-insertion in Korean, Korean Journal of Linguistics, 41(4), 637-659.
Lee, S.-D., & Kim, S.-J. (2007). Aerodynamic approach to nasals in Korean and English. The New
Korean Journal of English Language and Literature, 49(3), 85-100.
Lee, Y., & Lee, M. (2006). n-insertion as y-devocalization in Korean. Korean Journal of Linguistics,
31(3), 413-440.
Lee, T. D., Swinnen, S. P., & Verschueren, S. (1995). Relative phase alterations during bimanual skill
acquisition. Journal of Motor Behavior, 27(3), 263-274.
Levene, H. (1960). Robust tests for equality of variances. In I. Olkin, S. G. Ghurye, W. Hoeffding, W. G.
Madow, & H. B. Mann (Eds.), Contributions to Probability and Statistics, (pp. 278-292).
Stanford, California: Stanford University Press.
Lindau, M. (1984). Phonetic differences in glottalic consonants. Journal of Phonetics, 12(2), 147-155.
Lindau, M. (1985). The story of /r/. In V. A. Fromkin (Ed.) Phonetic Linguistics: Essays in Honor of
Peter Ladefoged (pp. 157-168). Orlando, FL: Academic Press.
Lindqvist, J., Sawashima, M., & Hirose, H. (1973). An investigation of the vertical movement of the larynx
in a Swedish speaker. Annual Bulletin of the Research Institute of Logopedics and Phoniatrics, 7,
27-34.
Lingala, S. G., Toutios, A., Töger, J., Lim, Y., Zhu, Y., Kim, Y. C., Vaz, C., Narayanan, S., & Nayak, K.
S. (2016). State-of-the-Art MRI Protocol for Comprehensive Assessment of Vocal Tract Structure
and Function. In INTERSPEECH (pp. 475-479).
Lingala, S. G., Zhu, Y., Kim, Y. C., Toutios, A., Narayanan, S., & Nayak, K. S. (2017). A fast and flexible
MRI system for the study of dynamic vocal tract shaping. Magnetic Resonance in Medicine, 77(1),
112-125.
Löfqvist, A., & Yoshioka, H. (1981). Interarticulator programming in obstruent
production. Phonetica, 38(1-3), 21-34.
Löfqvist, A., & Yoshioka, H. (1984). Intrasegmental timing: Laryngeal-oral coordination in voiceless
consonant production. Speech Communication, 3(4), 279-289.
Löfqvist, A. (1991). Proportional timing in speech motor control. Journal of Phonetics, 19(3-4), 343-350.
Löfqvist, A. (1995). Laryngeal mechanisms and interarticulator timing in voiceless consonant production.
Producing speech: Contemporary Issues for Katherine Safford Harris, 99-116.
Ma, C., Randolph, M. A., & Drish, J. (2001, May). A support vector machines-based rejection technique
for speech recognition. In IEEE International Conference on Acoustics, Speech, and Signal
Processing. Proceedings (Cat. No. 01CH37221), 1, 381-384.
MacEachern, M. (1997). Laryngeal Cooccurrence Restrictions. Doctoral dissertation. University of
California, Los Angeles.
Mackenzie, S. (2009). Contrast and Similarity in Consonant Harmony Processes. Doctoral dissertation.
University of Toronto.
Maddieson, I. (1998, August). Why make life hard? Resolutions to problems of rare and difficult sound
types. In Annual Meeting of the Berkeley Linguistics Society, 24(1), 367-380.
Maddieson, I., & Ladefoged, P. (1996). The Sounds of the World’s Languages. Malden, MA: Blackwell
Publishing.
Maeda, S. (1990). Compensatory articulation during speech: Evidence from the analysis and synthesis of
vocal-tract shapes using an articulatory model. In W. J. Hardcastle, & A. Marchal (Eds.), Speech
Production and Speech Modelling (pp. 131-149). Dordrecht: Kluwer Academic Publishers.
Marwick, B., & Krishnamoorthy, K. (2019) cvequality: Tests for the Equality of Coefficients of Variation
from Multiple Groups. R software package version 0.1.3. Retrieved from
https://github.com/benmarwick/cvequality, on 09/18/2020
Mattingly, I. G. (1990). The global character of phonetic gestures. Journal of Phonetics, 18(3), 445-452.
McCarthy, J. (1986). OCP effects: gemination and antigemination. Linguistic Inquiry, 17, 207-263.
McGowan, R. S., & Saltzman, E. L. (1995). Incorporating aerodynamic and laryngeal components into task
dynamics. Journal of Phonetics, 23(1-2), 255-269.
Mc Laughlin, F. (2005). Voiceless implosives in Seereer-Siin. Journal of the International Phonetic
Association, 35(2), 201-214.
Ménard, L., Schwartz, J. L., Boë, L. J., & Aubin, J. (2007). Articulatory–acoustic relationships during vocal
tract growth for French vowels: Analysis of real data and simulations with an articulatory
model. Journal of Phonetics, 35(1), 1-19.
Mermelstein, P. (1973). Articulatory model for the study of speech production. Journal of the Acoustical
Society of America, 53(4), 1070-1082.
Mitterer, H. (2018). The singleton-geminate distinction can be rate dependent: Evidence from Maltese.
Laboratory Phonology: Journal of the Association for Laboratory Phonology, 9(1).
Munhall, K., & Löfqvist, A. (1992). Gestural aggregation in speech-laryngeal gestures. Journal of
Phonetics, 20(1), 111-126.
Munhall, K. G., Löfqvist, A., & Kelso, J. S. (1994). Lip–larynx coordination in speech: Effects of
mechanical perturbations to the lower lip. The Journal of the Acoustical Society of America,
95(6), 3605-3616.
Nam, H. (2007). Syllable-level intergestural timing model: Split-gesture dynamics focusing on positional
asymmetry and moraic structure. In J. Cole, & I. J. Hualde (Eds.), Laboratory Phonology 9 (pp.
483-506). Mouton de Gruyter, New York.
Nam, H. (2011). Phonology of positional asymmetry and geminates: Constraints from gestural
coordination dynamics. Korean Journal of Linguistics, 36(2), 337-365.
Nam, H., Goldstein, L., Browman, C., Rubin, P., Proctor, M., & Saltzman, E. (2006). TADA (TAsk
Dynamics Application) Manual.
Nam, H., Goldstein, L., & Saltzman, E. (2009). Self-organization of syllable structure: A coupled
oscillator model. Approaches to Phonological Complexity, 16, 299-328.
Nam, H., Goldstein, L., Saltzman, E., & Byrd, D. (2004). TADA: An enhanced, portable Task Dynamics
model in MATLAB. The Journal of the Acoustical Society of America, 115(5), 2430.
Nam, H., & Saltzman, E. (2003, August). A competitive, coupled oscillator model of syllable structure. In
Proceedings of the 15th International Congress of Phonetic Sciences, 1, (pp. 2253-2256). Spain:
Barcelona.
Narayanan, S. S., Nayak, K. S., Lee S., Sethy, A, & Byrd, D. (2004). An approach to real-time magnetic
resonance imaging for speech production. Journal of the Acoustical Society of America, 115(4),
1771-1776.
Neuschaefer-Rube, C., Wein, B., Angerstein, W., & Klajman, S. (1996). MRI examination of laryngeal
height during vowel singing. Folia Phoniatrica et Logopaedica, 48(4), 201-209.
Newman, P. (2000). The Hausa Language: An Encyclopedic Reference Grammar. New Haven: Yale
University Press.
Oh, M. (2006). Nieun-sapip hwangyeongui jaegeomto (‘Study on /n/-insertion environment’). The
Linguistic Association of Korea Journal, 14(3), 117-135.
Oh, M., Toutios, A., Byrd, D., Goldstein, L., & Narayanan, S. S. (2017, December) Tracking larynx
movement in real-time MRI data. Journal of the Acoustical Society of America, 142(4), 2579. New
Orleans, LA.
Oh, M., & Lee, Y. (2018). ACT: An Automatic Centroid Tracking tool for analyzing vocal tract actions in
real-time magnetic resonance imaging speech production data. The Journal of the Acoustical
Society of America, 144(4), EL290-EL296.
Oh, M., Byrd, D., Goldstein, L., & Narayanan, S. S. (2018). Enriching the understanding of glottalic
consonant production: Vertical larynx movement in Hausa ejectives and implosives. Journal of the
Acoustical Society of America, 144(3), 1940-1941. Victoria, Canada.
Oh, M., Byrd, D., Goldstein, L., & Narayanan, S. S. (2019, June). Vertical larynx actions and larynx-oral
timing in ejectives and implosives. In 3rd Phonetics and Phonology in Europe (PaPE), Lecce,
Italy.
Oh, M., & Lee, Y. (2020). Focusing on vertical larynx action dynamics. Journal of the Acoustical Society
of America, 148(4), 2655. Acoustics Virtually Everywhere.
Ohala, J. J., & Solé, M. J. (2010). Turbulence and phonology. Turbulent Sounds: An Interdisciplinary Guide,
37-97.
Parrell, B. (2012). The role of gestural phasing in Western Andalusian Spanish aspiration. Journal of
Phonetics, 40(1), 37-45.
Payne, E. (2005). Phonetic variation in Italian consonant gemination. Journal of the International
Phonetic Association, 35(2), 153-189.
Payne, E. (2006). Non-durational indices in Italian geminate consonants. Journal of the International
Phonetic Association, 36(1), 83-95.
Post, A. A., Peper, C. E., Daffertshofer, A., & Beek, P. J. (2000). Relative phase dynamics in perturbed
interlimb coordination: stability and stochasticity. Biological Cybernetics, 83(5), 443-459.
Pouplier, M., Marin, S., Hoole, P., & Kochetov, A. (2017). Speech rate effects in Russian onset clusters
are modulated by frequency, but not auditory cue robustness. Journal of Phonetics, 64, 108-126.
Proctor, M., Bresch, E., Byrd, D., Nayak, K., & Narayanan, S. (2013). Paralinguistic mechanisms of
production in human “beatboxing”: A real-time magnetic resonance imaging study. Journal of the
Acoustical Society of America, 133(2), 1043-1054.
Proctor, M., Lammert, A., Katsamanis, A., Goldstein, L., Hagedorn, C., & Narayanan, S. (2011). Direct
estimation of articulatory kinematics from real-time Magnetic Resonance Image sequences. In
INTERSPEECH (pp. 281-284). Florence, Italy.
Raza, S., Agha, F. Z., & Usman, R. (2004). Phonemic inventory of Sindhi and acoustic analysis of voiced
implosives. Center for Research in Urdu Language Processing (CRULP).
Ridouane, R. (2003). Geminates vs. singleton stops in Berber: An acoustic, fiberscopic and
photoglottographic study. In Proceedings of the 15th International Congress of Phonetic Sciences
(pp. 1743-1746).
Ridouane, R. (2007). Gemination in Tashlhiyt Berber: an acoustic and articulatory study. Journal of the
International Phonetic Association, 37(2), 119-142.
Ridouane, R. (2010). Geminates at the junction of phonetics and phonology. Papers in Laboratory
Phonology, 10, 61-90.
Robinson, T. (1994). An application of recurrent nets to phone probability estimation. In IEEE
Transactions on Neural Networks, 5(2), 298-305.
Roessig, S., Mücke, D., & Grice, M. (2019). The dynamics of intonation: Categorical and continuous
variation in an attractor-based model. PloS One, 14(5), e0216859.
Russell, S. M. (1997). Some Acoustic Characteristics of Word Initial Pulmonic and Glottalic Stops in Mam.
Doctoral dissertation. Simon Fraser University.
Salomon, J. (2001). Support Vector Machines for Phoneme Classification. Master of Science, School of
Artificial Intelligence, Division of Informatics, University of Edinburgh.
Saltzman, E., & Byrd, D. (1999, August). Dynamical simulations of a phase window model of relative
timing. In Proceedings of the 14th International Congress of Phonetic Sciences (pp. 1-7). San
Francisco, CA.
Saltzman, E., & Byrd, D. (2000). Task-dynamics of gestural timing: Phase windows and multifrequency
rhythms. Human Movement Science, 19(4), 499-526.
Saltzman, E., Löfqvist, A., Kay, B., Kinsella-Shaw, J., & Rubin, P. (1998). Dynamics of intergestural
timing: A perturbation study of lip-larynx coordination. Experimental Brain Research, 123, 412-
424.
Saltzman, E., Löfqvist, A., & Mitra, S. (2000). “Glue” and “clocks”: Intergestural cohesion and global
timing. In M. B. Broe, & J. B. Pierrehumbert (Eds.), Papers in Laboratory Phonology V (pp. 88-
101). Cambridge: Cambridge University Press.
Saltzman, E., & Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech
production, Ecological Psychology, 1, 333-382.
Saltzman, E., Nam, H., Goldstein, L., & Byrd, D. (2006). The distinction between state, parameter and
graph dynamics in sensorimotor control and coordination. In M. L. Latash, & F. Lestienne (Eds.),
Motor Control and Learning (pp. 63-73). New York: Springer.
Saltzman, E., Nam, H., Krivokapic, J., & Goldstein, L. (2008, May). A task-dynamic toolkit for modeling
the effects of prosodic structure on articulation. In Proceedings of the 4th International
Conference on Speech Prosody (Speech Prosody 2008), (pp. 175-184). Campinas, Brazil.
Santos, J. M., Wright, G. A., & Pauly, J. M. (2004, September). Flexible real-time magnetic resonance
imaging framework. In Proceedings of the 26th Annual International Conference of the IEEE
Engineering in Medicine and Biology Society (pp. 1048-1051). San Francisco, CA.
Schöner, G. (1995). Recent developments and problems in human movement science and their conceptual
implications. Ecological Psychology, 7(4), 291-314.
Schöner, G., Haken, H., & Kelso, J. A. S. (1986). A stochastic theory of phase transitions in human hand
movement. Biological Cybernetics, 53, 247-257.
Schmidt, R. C., Bienvenu, M., Fitzpatrick, P. A., & Amazeen, P. G. (1998). A comparison of intra-and
interpersonal interlimb coordination: coordination breakdowns and coupling strength. Journal of
Experimental Psychology: Human Perception and Performance, 24(3), 884.
Schmidt, M., & Gish, H. (1996, May). Speaker identification via support vector classifiers. In IEEE
International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings,
1, 105-108.
Shaw, J. A., Durvasula, K., & Kochetov, A. (2019). The temporal basis of complex segments. In
Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia (pp.
676-680).
Shipp, T. (1975). Vertical laryngeal position during continuous and discrete vocal frequency
change. Journal of Speech, Language, and Hearing Research, 18(4), 707-718.
Shosted, R. K., Carignan, C., & Rong, P. (2011, June). Estimating vertical larynx position using EMA. 9th
International Seminar on Speech Production (ISSP) (pp. 139-146). Montreal, Canada.
Silva, D. J. (2006). Acoustic evidence for the emergence of tonal contrast in contemporary Korean.
Phonology, 23(2), 287-308.
Simpson, A. P., & Brandt, E. (2019). Detecting larynx movement in non-pulmonic consonants using dual-
channel electroglottography. In Proceedings of the 19th International Congress of Phonetic
Sciences (ICPhS). (pp. 2401-2405). Melbourne, Australia.
Smith, C. L. (1992). The Timing of Vowel and Consonant Gestures. Doctoral Dissertation. Yale
University.
Smith, C. L. (1995). Prosodic patterns in the coordination of vowel and consonant gestures. Papers in
Laboratory Phonology IV, Phonology and Phonetic Evidence. CUP, 205-222.
Solé, M. J. (2007, August). Compatibility of features and phonetic content. The case of nasalization.
In Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS) (pp. 261-266).
Solé, M. J. (2014). The perception of voice-initiating gestures. Laboratory Phonology, 5(1), 37-68.
Solé, M. J. (2018). Articulatory adjustments in initial voiced stops in Spanish, French and English. Journal
of Phonetics, 66, 217-241.
Sorensen, T., & Gafos, A. (2016). The gesture as an autonomous nonlinear dynamical system. Ecological
Psychology, 28(4), 188-215.
Sproat, R., & Fujimura, O. (1993). Allophonic variation in English /l/ and its implications for phonetic
implementation. Journal of Phonetics, 21(3), 291-311.
Sternad, D., Amazeen, E. L., & Turvey, M. T. (1996). Diffusive, synaptic, and synergetic coupling: An
evaluation through inphase and antiphase rhythmic movement, Journal of Motor Behavior, 28,
255-269.
Tabain, M., Breen, G., & Butcher, A. (2004). VC vs. CV syllables: a comparison of Aboriginal languages
with English. Journal of the International Phonetic Association, 175-200.
Tiede, M. (2010). MVIEW: Multi-channel visualization application for displaying dynamic sensor
movements. Haskins Laboratories.
Tilsen, S. (2009). Multitimescale dynamical interactions between speech rhythm and gesture. Cognitive
Science, 33(5), 839-879.
Tilsen, S., Spincemaille, P., Xu, B., Doerschuk, P., Luh, W. M., Feldman, E., & Wang, Y. (2016).
Anticipatory posturing of the vocal tract reveals dissociation of speech movement plans from
linguistic units. PloS one, 11(1), e0146813.
Tipping, M. E., & Bishop, C. M. (1999). Mixtures of probabilistic principal component analyzers. Neural
Computation, 11(2), 443-482.
Topintzi, N. (2008). On the existence of moraic onsets. Natural Language and Linguistic Theory, 26,
147-184.
Toutios, A., Lingala, S. G., Vaz, C., Kim, J., Esling, J., Keating, P., Gordon, M., Byrd, D., Goldstein, L.,
Nayak, K., & Narayanan, S. (2016). Illustrating the production of the International Phonetic
Alphabet sounds using fast real-time magnetic resonance imaging. In INTERSPEECH (pp. 2428-
2432).
Tuller, B., Case, P., Ding, M., & Kelso, J. A. (1994). The nonlinear dynamics of speech categorization.
Journal of Experimental Psychology: Human Perception and Performance, 20(1), 3-16.
van Emmerik, R. E., & van Wegen, E. E. (2000). On variability and stability in human movement.
Journal of Applied Biomechanics, 16(4), 394-406.
van Emmerik, R. E., Ducharme, S. W., Amado, A. C., & Hamill, J. (2016). Comparing dynamical
systems concepts and techniques for biomechanical analysis. Journal of Sport and Health
Science, 5(1), 3-13.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer, N.Y.
Vaxelaire, B. (1995). Single vs. double (abutted) consonants across speech rate: X-ray and acoustic data
from French. In K. Elenius, & P. Branderud (Eds.), Proceedings of the XIIIth International
Congress of Phonetic Sciences, (pp. 384-387).
Vaz, C., Ramanarayanan, V., & Narayanan, S. (2018). Acoustic denoising using dictionary learning with
spectral and temporal regularization. IEEE/ACM Transactions on Audio, Speech, and Language
Processing, 26(5), 967-980.
Wang, G., & Kong, J. (2010, November). The relation between larynx height and F0 during the four tones
of Mandarin in X-ray movie. In 7th International Symposium on Chinese Spoken Language
Processing (ISCSLP) (pp. 335-338). IEEE.
Yanagawa, M. (2006). Articulatory Timing in First and Second Language: A Cross-Linguistic Study.
Doctoral dissertation. Yale University.
Yapa, R. D., & Harada, K. (2007, March). A connected component labeling algorithm for grayscale
images and application of the algorithm on mammograms. In Proceedings of the 2007 ACM
Symposium on Applied Computing (pp. 146-152).
Yoshida, K. (2008). Phonetic implementation of Korean denasalization and its variation related to
prosody. IULC Working Papers, 8(1).
Yoo, K., & Nolan, F. (2020). Sampling the progression of domain-initial denasalization in Seoul Korean.
Laboratory Phonology: Journal of the Association for Laboratory Phonology, 11(1): 22, 1-32.
Yousafzai, J., Ager, M., Cvetkovic, Z., & Sollich, P. (2008). Discriminative and generative machine
learning approaches towards robust phoneme classification. In IEEE Information Theory and
Applications Workshop, 471-475.
Xu, Y., & Wang, Q. E. (2001). Pitch targets and their realization: Evidence from Mandarin Chinese. Speech
Communication, 33(4), 319-337.
Zeroual, C., Hoole, P., & Gafos, A. I. (2008). Spatio-temporal and kinematic study of Moroccan Arabic
coronal geminate plosives. Proceedings of the 8th ISSP, 133-136.
Appendices
Appendix A: Hausa Stimuli
C Word Phrase-initial Phrase-internal
b
bà:ki
‘mouth’
Faɗa sau ɗaya, baki shine kalma a Hausa.
‘Say once, mouth is the word in Hausa.’
A yanzu, biya baki kamar kalma a Hausa.
‘Right now, read aloud mouth as a word in
Hausa.’
ɓ
ɓàrna:
‘loss’
Faɗa sau ɗaya, ɓarna shine kalma a Hausa.
‘Say once, loss is the word in Hausa.’
A yanzu, biya ɓarna kamar kalma a Hausa.
‘Right now, read aloud loss as a word in Hausa.’
d
dà:ce:
‘coincidence’
Faɗa sau ɗaya, dace shine kalma a Hausa.
‘Say once, coincidence is the word in
Hausa.’
A yanzu, biya dace kamar kalma a Hausa.
‘Right now, read aloud coincidence as a word in
Hausa.’
ɗ
ɗà:zu:
‘just now’
Faɗa sau ɗaya, ɗazu shine kalma a Hausa.
‘Say once, just now is the word in Hausa.’
A yanzu, biya ɗazu kamar kalma a Hausa.
‘Right now, read aloud just now as a word in
Hausa.’
k
kà:za:
‘chicken’
Faɗa sau ɗaya, kaza shine kalma a Hausa.
‘Say once, chicken is the word in Hausa.’
A yanzu, biya kaza kamar kalma a Hausa.
‘Right now, read aloud chicken as a word in
Hausa.’
k’
k’àho:
‘trumpet’
Faɗa sau ɗaya, ƙaho shine kalma a Hausa.
‘Say once, trumpet is the word in Hausa.’
A yanzu, biya ƙaho kamar kalma a Hausa.
‘Right now, read aloud trumpet as a word in
Hausa.’
k
w
k
w
à:ɗo:
‘frog’
Faɗa sau ɗaya, kwaɗo shine kalma a Hausa.
‘Say once, frog is the word in Hausa.’
A yanzu, biya kwaɗo kamar kalma a Hausa.
‘Right now, read aloud frog as a word in
Hausa.’
k
w
’
k
w
’à:ro:
‘insect’
Faɗa sau ɗaya, ƙwaro shine kalma a Hausa.
‘Say once, insect is the word in Hausa.’
A yanzu, biya ƙwaro kamar kalma a Hausa.
‘Right now, read aloud insect as a word in
Hausa.’
k
w
’
k
w
’à:ya:
‘drug’
Faɗa sau ɗaya, ƙwaya shine kalma a Hausa.
‘Say once, drug is the word in Hausa.’
A yanzu, biya ƙwaya kamar kalma a Hausa.
‘Right now, read aloud drug as a word in
Hausa.’
s
sà:ƙo
‘message’
Faɗa sau ɗaya, saƙo shine kalma a Hausa.
‘Say once, message is the word in Hausa.’
A yanzu, biya saƙo kamar kalma a Hausa.
‘Right now, read aloud message as a word in
Hausa.’
s’
s’à:da:
‘expensiveness’
Faɗa sau ɗaya, tsada shine kalma a Hausa.
‘Say once, expensiveness is the word in Hausa.’
A yanzu, biya tsada kamar kalma a Hausa.
‘Right now, read aloud expensiveness as a word in Hausa.’
m
mà:ge
‘cat’
Faɗa sau ɗaya, mage shine kalma a Hausa.
‘Say once, cat is the word in Hausa.’
A yanzu, biya mage kamar kalma a Hausa.
‘Right now, read aloud cat as a word in Hausa.’
n
nà:wa
‘mine’
Faɗa sau ɗaya, nawa shine kalma a Hausa.
‘Say once, mine is the word in Hausa.’
A yanzu, biya nawa kamar kalma a Hausa.
‘Right now, read aloud mine as a word in Hausa.’
Appendix B: Korean Stimuli
Boundary | Target phrase | Following word | C context | Target sentence
Wd
hatp*a
‘fish cake bar’
nɛ
‘four’
#n
kyengsekinun, hwangkuphi AP[haspa # ney kaylul] samekessta.
‘Kyungseok hurriedly bought and ate AP[four fish cake bars].’
tasʌt
‘five’
#t
unyenginun, chenchenhi AP[haspa # tases kaylul] teywessta.
‘Eunyoung slowly heated AP[five fish cake bars].’
ɕʰilpʰan
‘blackboard’
nɛ
‘four’
n#n
tonghwuninun, ppalukey AP[chilphan # ney kaylul] wunpanhayssta.
‘Donghoon quickly moved AP[four blackboards].’
tasʌt
‘five’
n#t
yuncenginun, halwuey AP[chilphan # tases kaylul] takkassta.
‘Yoonjeong cleaned AP[five blackboards] a day.’
pɛk
‘a hundred’
n#p
cenghwuninun, setwulle AP[chilphan # payk kaylul] cwumwunhayssta.
‘Junghoon rapidly ordered AP[a hundred blackboards].’
tʰʌtp*at
‘garden field’
nɛ
‘four’
t#n
taynginun, sesehi AP[thespath # ney phyengul] kyengcakhayssta.
‘Dayoung gradually decorated AP[four yards of garden field].’
tasʌt
‘five’
t#t
mincenginun, tto tasi AP[thespath # tases phyengul] kakkwessta.
‘Minjeong cultivated AP[five yards of garden field] again.’
jʌsʌt
‘six’
t#j
cinwukinun, han peney AP[thespath # yeses phyengul] phalassta.
‘Jinuk sold AP[six yards of garden field] at once.’
pɛk
‘a hundred’
t#p
ciuninun, sengkuphi AP[thespath # payk phyengul] chepwunhayssta.
‘Jieun hastily disposed of AP[a hundred yards of garden field].’
AP
hatp*a
‘fish cake bar’
nɛ
‘four’
#n
kyengcwuninun, AP[ttukewun haspa] # ney kaylul samekessta.
‘Kyungjun bought and ate four AP[hot fish cake bars].’
tasʌt
‘five’
#t
polaminun, AP[chakawun haspa] # tases kaylul teywessta.
‘Boram heated five AP[cold fish cake bars].’
ɕʰilpʰan
‘blackboard’
nɛ
‘four’
n#n
toyenginun, AP[thunthunhan chilphan] # ney kaylul wunpanhayssta.
‘Doyoung moved four AP[sturdy blackboards].’
tasʌt
‘five’
n#t
cenuninun, AP[khetalan chilphan] # tases kaylul takkassta.
‘Jungeun cleaned five AP[huge blackboards].’
pɛk
‘a hundred’
n#p
minsenginun, AP[hayansayk chilphan] # payk kaylul cwumwunhayssta.
‘Minsung ordered a hundred AP[white blackboards].’
tʰʌtp*at
‘garden field’
nɛ
‘four’
t#n
yeycininun, AP[phwululun thespath] # ney phyengul kyengcakhayssta.
‘Yejin decorated four yards of AP[lush garden field].’
tasʌt
‘five’
t#t
tongwukinun, AP[hakkyo aph thespath] # tases phyengul kakkwessta.
‘Dongwook cultivated five yards of AP[garden field by the school].’
jʌsʌt
‘six’
t#j
cwuhyeninun, AP[cholahan thespath] # yeses phyengul phalassta.
‘Juhyun sold six yards of AP[shabby garden field].’
pɛk
‘a hundred’
t#p
cwunsekinun, AP[hwanglyanghan thespath] # payk phyengul chepwunhayssta.
‘Junseok sold off a hundred yards of AP[desolate garden field].’
AP+focus
hatp*a
‘fish cake bar’
nɛ
‘four’
#n
kyengcwuninun, AP[ttukewun haspa] # ney kaylul samekessta.
‘Kyungjun bought and ate four AP[hot fish cake bars].’
tasʌt
‘five’
#t
polaminun, AP[chakawun haspa] # tases kaylul teywessta.
‘Boram heated five AP[cold fish cake bars].’
ɕʰilpʰan
‘blackboard’
nɛ
‘four’
n#n
toyenginun, AP[thunthunhan chilphan] # ney kaylul wunpanhayssta.
‘Doyoung moved four AP[sturdy blackboards].’
tasʌt
‘five’
n#t
cenuninun, AP[khetalan chilphan] # tases kaylul takkassta.
‘Jungeun cleaned five AP[huge blackboards].’
pɛk
‘a hundred’
n#p
minsenginun, AP[hayansayk chilphan] # payk kaylul cwumwunhayssta.
‘Minsung ordered a hundred AP[white blackboards].’
tʰʌtp*at
‘garden field’
nɛ
‘four’
t#n
yeycininun, AP[phwululun thespath] # ney phyengul kyengcakhayssta.
‘Yejin decorated four yards of AP[lush garden field].’
tasʌt
‘five’
t#t
tongwukinun, AP[hakkyo aph thespath] # tases phyengul kakkwessta.
‘Dongwook cultivated five yards of AP[garden field by the school].’
jʌsʌt
‘six’
t#j
cwuhyeninun, AP[cholahan thespath] # yeses phyengul phalassta.
‘Juhyun sold six yards of AP[shabby garden field].’
pɛk
‘a hundred’
t#p
cwunsekinun, AP[hwanglyanghan thespath] # payk phyengul chepwunhayssta.
‘Junseok sold off a hundred yards of AP[desolate garden field].’
IP
hatp*a
‘fish cake bar’
nɛga
‘I’
#n
i meynyunun, “AP[ttukewun haspa],” # IP[nayka cacwu meknun kansikita].
‘This menu is, “the hot fish cake bar,” the snack I eat often.’
taɨm
‘next’
#t
i yenghwanun, “AP[chakawun haspa],” # IP[taum phyeni tewuk kitaytoynta].
‘This movie is, “the cold fish cake bar,” a movie whose next installment I eagerly await.’
ɕʰilpʰan
‘blackboard’
nɛga
‘I’
n#n
i cakphwumun, “AP[thunthunhan chilphan],” # IP[nayka sakosiphun cakphwumita].
‘This masterpiece is, “the sturdy blackboard,” the piece I wish to buy.’
taɾɨn
‘different’
n#t
i cokakun, “AP[khetalan chilphan],” # IP[talun keskwa chapyelsengi issta].
‘This piece is, “the huge blackboard,” the piece different from others.’
paɾɨn
‘good’
n#p
i yengsangun, “AP[hayansayk chilphan],” # IP[palun kongpwu pepul allyecwunta].
‘This video clip is, “the white blackboard,” which depicts good study habits.’
tʰʌtp*at
‘garden field’
nɛga
‘I’
t#n
i kulimun, “AP[phwululun thespath],” # IP[nayka cohahanun kuliminta].
‘This painting is, “the lush garden field,” the painting I like.’
tasi
‘again’
t#t
i manhwanun, “AP[hakkyo aph thespath],” # IP[tasi pwato nemwu caymiissta].
‘This cartoon is, “the garden field by the school,” still great fun even when read again.’
jʌɾɨm
‘summer’
t#j
i soselun, “AP[cholahan thespath],” # IP[yelum chelul paykeyngulo hayssta].
‘This novel is, “the shabby garden field,” which is set in summertime.’
paɾam
‘breeze’
t#p
i nolaynun, “AP[hwanglyanghan thespath],” # IP[palam pwunun naley tutki cohta].
‘This song is, “the desolate garden field,” good to listen to on a breezy day.’
*Korean orthography (Hangul) is romanized according to the Yale Romanization system.
**In the AP+focus boundary condition, the word after the boundary (underlined in the original stimuli) is the target of focus.
Abstract
Speech production involves combined actions of multiple coordinated articulatory gestures. The atomic linguistic units are elegantly coupled with one another to yield a structured spatiotemporal realization of the gestural components of speech, thereby enabling humans to perceive and parse language effortlessly. The goal of this dissertation is to develop our theoretical understanding of the linguistic representation of articulatory coordination in speech production, drawing on an interdisciplinary approach incorporating phonetics, phonology, biomedical imaging, computational modeling, and a dynamical systems approach to motor control. The project undertakes four empirical real-time MRI (rtMRI) experiments to understand how contrastive linguistic ‘molecules’, focusing on segment-sized multi-gesture complexes, interact with positional and phrasal variation in speech, followed by modeling analyses of the self-organization and coordination among these interacting levels of linguistic structure.

Specifically, this dissertation undertakes a kinematic examination of intergestural timing stability within multi-gesture segments such as ejectives, implosives, and nasals that may possess specific temporal goals critical to their realization. Using rtMRI speech production data from Hausa and Korean, the dissertation illuminates speech timing among oral constriction and larynx/velum actions within segments and the role this intergestural timing plays in realizing phonological contrast and processes in varying prosodic contexts. Results demonstrate that within such segment-sized gestural molecules, coordination is inherently stable due to their specific internal intergestural coupling relations. We successfully model the empirical findings on timing, in particular distinct patterns of timing variability, via a dynamical coupling architecture or ‘graph’ among the component gestures.
The experimental and computational assessment of coordination in multi-gesture structures can reveal the role of coupling relations and timing variability in phonological representation as realized in a variety of syllabic and prosodic environments. This dissertation furthers our linguistic knowledge of how the basic atoms of speech are synergistically built up to produce meaningful speech sounds and to convey linguistic information.
Asset Metadata
Creator: Oh, Miran (author)
Core Title: Articulatory dynamics and stability in multi-gesture complexes
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Linguistics
Degree Conferral Date: 2021-12
Publication Date: 09/23/2021
Defense Date: 09/01/2021
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tags: articulatory phonology, articulatory timing, ejectives, glottalic consonants, Hausa, implosives, intergestural timing, Korean, larynx actions, multi-gesture complexes, nasals, OAI-PMH Harvest, prosodic stability, prosody, real-time MRI, speech coordination, speech dynamics, speech imaging, speech motor control, speech production, timing stability, velum actions
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Byrd, Dani (committee chair), Goldstein, Louis (committee member), Narayanan, Shrikanth (committee member), Nayak, Krishna (committee member)
Creator Email: miranoh@usc.edu, miranoh1102@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC15925683
Unique Identifier: UC15925683
Legacy Identifier: etd-OhMiran-10095
Document Type: Dissertation
Rights: Oh, Miran
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu