THE PLANNING, PRODUCTION, AND PERCEPTION
OF PROSODIC STRUCTURE
by
Jelena Krivokapić
________________________________________________________________________
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(LINGUISTICS)
August 2007
Copyright 2007 Jelena Krivokapić
Dedication
To my parents
Acknowledgements
This dissertation was completed with the intellectual and emotional support of many people,
and it is with great pleasure that I express my gratitude to them.
First, I would like to thank my advisor Dani Byrd. She has introduced me to the
study of phonetics and guided me on my way ever since. The time and effort she has
devoted to me are immense, and through many inspiring meetings, she has helped shape
my linguistic perspective and sharpened my views as a scientist. My intellectual debt to
her is written on every page of this dissertation. Her kindness, support, encouragement
and confidence in me have made me do my best.
I am deeply indebted to Sun-Ah Jun for her continuous help and support. Long
discussions with her and her detailed comments on the dissertation and various other
projects have clarified many questions for me and greatly improved my work. Her good
wishes for me and kindness always reassured me.
I would also like to thank my other committee members, Shri Narayanan and
Michael Arbib, for sharing their knowledge with me, their helpful feedback and interest
in my work.
I am very grateful to Hagit Borer who was a great advisor through my M.A.
thesis. For administrative reasons she has not been on my final committee, but she has
generously supported me, in many ways, throughout my whole studies.
I owe great thanks to Roumi Pancheva who has given me many academic
opportunities, and helped, encouraged and believed in me from my first days at USC till
now.
A number of people have given me insightful suggestions and help with my work.
I would particularly like to thank Louis Goldstein and Elliot Saltzman for valuable
discussions and suggestions, Sungbok Lee for extensive help with collecting and
analyzing EMA data, recordings and statistical advice, Sankarnarayanan
Ananthakrishnan for collaborating with me on the statistical analysis in Chapter 3, Mark
Tiede for help with analyzing EMA data, and Dr. James Mah for help with collecting EMA
data. Thanks also go to all my speakers and listeners for patiently getting through the
many sentences in my experiments.
I am also very grateful to Pat Keating, Peter Ladefoged†, Stefanie Shattuck-Hufnagel
and Jean-Roger Vergnaud for their interest and help with my work. I also
greatly appreciate the help I received from Abby Kaun, Elsi Kaiser, Elena Guerzoni, and
Ed Finegan, who were all helpful with both comments on my work and with career
advice. I would like to thank Aaron Jacobs who has helped me with drawing figures.
Thanks also to the members of the Phonetics Lab: Rebeka Campos-Astorkiza,
Susie Choi, Stephen Tobin, Aaron Jacobs and Daylen Riggs for useful discussions and
for making lab-life joyful. Thanks also to my other friends and colleagues in the
department: Janet Anderson, Justin Aronoff, Shadi Ganjavi, Carolina Gonzalez, Cristian
Iscrulescu, Fetiye Karabay, Nihan Ketrez, Agnieszka Łazorczyk, Emily Nava, Michal
Martinez, Tommi Leung, Eunjeong Oh, Isabelle Roy, Ana Sánchez-Muñoz, Michael
Shepherd and Florence van der Houwen.
I would also like to thank all the students, faculty and staff at the linguistics
department for making it such a stimulating environment and a wonderful place to be.
I am also grateful to the USC College of Letters, Arts, and Sciences, the USC
Phonetics Laboratory, the Department of Linguistics, the USC Morkovin Fellowship, the
Acoustical Society of America’s Raymond H. Stetson fellowship and NIH grant
DC03172 to Dani Byrd for financially supporting my education.
Before coming to USC, I spent two years at UCLA and I would like to thank the
Linguistics Department for their warm welcome. I am especially grateful to Susie Curtiss
for her huge help at the beginning of my stay in the US and for showing me the
fascinating world of neurolinguistics. While at UCLA I also met Elma Blom, Stella de
Bode, Alexander Kaiser and Stefano Vegnaduzzo. They made my first year in the US a
great time.
Further away, I would like to thank my professors at the English Department at
the Georg-Augusta University in Göttingen, Germany, Hero Janßen, Anja Wanner, and
Thomas Gardner for introducing me to the study of language. I am especially grateful to
Hildegard Farke for sharing her enthusiasm for linguistics and for believing in me from
the start.
I am lucky to have had my friends here and abroad who have shared the good and
the less good times with me and kept me in good spirits. Thank you Ron Bassilian, Stella
de Bode, Natascha Bremer, Rebeka Campos-Astorkiza, Nicole Dehé, Kristina Droege,
Carolina Gonzalez, Florence van der Houwen, Katerina Kroucheva, Dejan Kostić,
Agnieszka Łazorczyk, Silke Lambert, Tommi Leung, Chris Lungley, Elvira Mandić,
Nataša Maslo, Alexandra Moffarts, Tamara Popović, Ana Sánchez-Muñoz, Katja
Schmidt, PC Sharma, Rajka Smiljanić, Stacy Benjamin Wood.
I would like to thank the German branch of my family, Tante Gi, Onkel Gunthi,
Annette and Bettina for making me feel at home throughout my stay in Germany. Thanks
to the Paris branch of my family, čika Đoka, Anijela, Marija and Miloš for cheering me
on all these years. Thanks to my brother-in-law Nils for all his support. And thanks to
Stefano Vegnaduzzo who has been next to me over the past seven years and shared with
me every bump on the way, and many good times.
My final thanks go to my sister Nana, and my parents Gudrun and Mirko, for their
love and support, for always being there for me and for going through every step with
me. Without them, this dissertation would not have been possible and it would not mean
nearly as much.
Table of Contents
Dedication ii
Acknowledgments iii
List of Tables x
List of Figures xvi
Abstract xviii
Chapter 1: Introduction 1
1. Prosody 1
2. Prosodic Structure 2
3. Temporal Properties of Prosodic Boundaries 4
4. Articulatory Phonology 5
5. Outline of the Dissertation 6
Chapter 2: The Scope of Effect of Prosodic Boundaries 10
1. Introduction 10
2. Background: The Scope of Effect 10
3. The π-gesture Model 15
4. Experiment: The Scope of Effect of Prosodic Boundaries 18
5. Methods 19
5.1. Stimuli and Subjects 19
5.2. Data Collection 22
5.3. Measurements 23
5.4. Statistical Analysis 28
6. Results 29
6.1. Pre-boundary Temporal Results ([C₃C₂C₁#) 29
6.2. Pre-boundary Spatial Results ([C₃C₂C₁#) 41
6.3. Post-boundary Temporal Results (# C₁C₂C₃]) 48
6.4. Post-boundary Spatial Results (# C₁C₂C₃]) 58
6.5. Summary of Results 66
7. Discussion 69
8. Conclusions: The Scope of Effect of Prosodic Boundaries 78
Chapter 3: Gradiency and Categoricity in Prosodic Boundary Production and
Perception 80
1. Introduction 80
2. Background: Gradiency and Categoricity in the Production and
Perception of Prosodic Boundaries 80
3. Experiment: Categoricity or Gradiency of Prosodic Boundaries 85
3.1. Methods: Production 86
3.1.1. Stimuli and Subjects 86
3.1.2. Data Collection 88
3.1.3. Measurements 88
3.2. Methods: Perception 92
3.2.1. Stimuli and Subjects 92
3.2.2. Data Collection 93
3.2.3. Measurements 95
3.3. Statistical Analysis 95
4. Results 96
4.1. Results: Production 96
4.2. Results: Perception 100
5. Discussion 106
6. Conclusions: Categoricity or Gradiency of Prosodic Boundaries 109
Chapter 4: Prosodic Boundary Perception and Articulation 110
1. Introduction 110
2. Background: Production-perception Link 110
3. Experiment: Prosodic Boundary Perception and Articulation 113
3.1. Methods: Production 114
3.1.1. Stimuli and Subjects 114
3.1.2. Measurements 115
3.2. Methods: Perception 117
3.2.1. Stimuli, Subjects, Data Collection and Measurements 117
3.3. Statistical Analysis 119
4. Results 120
5. Discussion 131
6. Conclusions: Prosodic Boundary Perception and Articulation 134
Chapter 5: Prosodic Structure and Pause Duration 136
1. Introduction 136
2. Background: Pause Duration 137
3. Experiment: Prosodic Structure and Pause Duration 140
3.1. Methods 141
3.1.1. Stimuli and Subjects 141
3.1.2. Data Collection 144
3.1.3. Measurements 144
3.1.4. Statistical Analysis 146
4. Results 146
4.1. Length Effects 148
4.2. Prosodic Complexity Effects 150
4.3. Interactions 152
5. Discussion 160
6. Conclusions: Prosodic Structure and Pause Duration 167
Chapter 6: Conclusions 169
Summary 181
References 182
List of Tables
Table 2.1: Stimuli for Experiment 1. The boundary strength is expected to increase
from sentence 1 (control sentence, no boundary) to sentence 6. 20
Table 2.2: Realization of prosodic boundaries. 28
Table 2.3: Summary of temporal boundary effects on pre-boundary C₁, showing the
relationship between sentences and the number of boundary strengths
(levels) distinguished for duration and time-to-peak velocity.
The consonant string is [C₃C₂C₁#. 31
Table 2.4: Summary of temporal boundary effects on pre-boundary C₂, showing the
relationship between sentences and the number of boundary strengths
(levels) distinguished for duration and time-to-peak velocity.
The consonant string is [C₃C₂C₁#. 32
Table 2.5: Summary of temporal boundary effects on pre-boundary C₃, showing the
relationship between sentences and the number of boundary strengths
(levels) distinguished for duration and time-to-peak velocity.
The consonant string is [C₃C₂C₁#. 33
Table 2.6: Results for pre-boundary opening and closing movement duration
(in ms) for subject B. (ANOVA, means and standard deviations,
summary of the levels of boundary strength distinguished, and Fisher’s
PLSD determining the behavior of individual sentences).
The consonant string is: [C₃C₂C₁#. 35
Table 2.7: Results for pre-boundary time-to-peak-velocity (in ms) for subject B.
(ANOVA, means and standard deviations, summary of the levels of
boundary strength distinguished, and Fisher’s PLSD determining the
behavior of individual sentences). The consonant string is: [C₃C₂C₁#. 36
Table 2.8: Results for pre-boundary opening and closing movement duration
(in ms) for subject E. (ANOVA, means and standard deviations,
summary of the levels of boundary strength distinguished, and Fisher’s
PLSD determining the behavior of individual sentences).
The consonant string is: [C₃C₂C₁#. 37
Table 2.9: Results for pre-boundary time-to-peak-velocity (in ms) for subject E.
(ANOVA, means and standard deviations, summary of the levels of
boundary strength distinguished, and Fisher’s PLSD determining the
behavior of individual sentences). The consonant string is: [C₃C₂C₁#. 38
Table 2.10: Results for pre-boundary opening and closing movement duration
(in ms) for subject R. (ANOVA, means and standard deviations,
summary of the levels of boundary strength distinguished, and
Fisher’s PLSD determining the behavior of individual sentences).
The consonant string is: [C₃C₂C₁#. 39
Table 2.11: Results for pre-boundary time-to-peak-velocity (in ms) for subject R.
(ANOVA, means and standard deviations, summary of the levels of
boundary strength distinguished, and Fisher’s PLSD determining the
behavior of individual sentences). The consonant string is: [C₃C₂C₁#. 40
Table 2.12: Summary of results for pre-boundary C₁ displacement, showing the
relationship between sentences and the number of boundary strengths
(levels) distinguished for displacement.
The consonant string is [C₃C₂C₁#. 42
Table 2.13: Summary of results for pre-boundary C₂ displacement, showing the
relationship between sentences and the number of boundary strengths
(levels) distinguished for displacement.
The consonant string is [C₃C₂C₁#. 43
Table 2.14: Summary of results for pre-boundary C₃ displacement, showing the
relationship between sentences and the number of boundary strengths
(levels) distinguished for displacement.
The consonant string is: [C₃C₂C₁#. 43
Table 2.15: Results for pre-boundary displacement (in mm) for subject B. (ANOVA,
means and standard deviations, summary of the levels of boundary
strength distinguished, and Fisher’s PLSD determining the behavior of
individual sentences). The consonant string is: [C₃C₂C₁#. 45
Table 2.16: Results for pre-boundary displacement (in mm) for subject E. (ANOVA,
means and standard deviations, summary of the levels of boundary
strength distinguished, and Fisher’s PLSD determining the behavior of
individual sentences). The consonant string is: [C₃C₂C₁#. 46
Table 2.17: Results for pre-boundary displacement (in mm) for subject R. (ANOVA,
means and standard deviations, summary of the levels of boundary
strength distinguished, and Fisher’s PLSD determining the behavior of
individual sentences). The consonant string is: [C₃C₂C₁#. 47
Table 2.18: Summary of temporal boundary effects on post-boundary C₁, showing
the relationship between sentences and the number of boundary
strengths (levels) distinguished for duration and time-to-peak velocity.
The consonant string is: # C₁C₂C₃]. 49
Table 2.19: Summary of temporal boundary effects on post-boundary C₂, showing
the relationship between sentences and the number of boundary
strengths (levels) distinguished for duration and time-to-peak velocity.
The consonant string is # C₁C₂C₃]. 49
Table 2.20: Summary of temporal boundary effects on post-boundary C₃, showing
the relationship between sentences and the number of boundary
strengths (levels) distinguished for duration and time-to-peak velocity.
The consonant string is # C₁C₂C₃]. 50
Table 2.21: Results for post-boundary opening and closing movement duration
(in ms) for subject B. (ANOVA, means and standard deviations,
summary of the levels of boundary strength distinguished, and Fisher’s
PLSD determining the behavior of individual sentences). The consonant
string is: # C₁C₂C₃]. 52
Table 2.22: Results for post-boundary time-to-peak-velocity (in ms) for subject B.
(ANOVA, means and standard deviations, summary of the levels of
boundary strength distinguished, and Fisher’s PLSD determining the
behavior of individual sentences). The consonant string is: # C₁C₂C₃]. 53
Table 2.23: Results for post-boundary opening and closing movement duration
(in ms) for subject E. (ANOVA, means and standard deviations,
summary of the levels of boundary strength distinguished, and Fisher’s
PLSD determining the behavior of individual sentences). The consonant
string is: # C₁C₂C₃]. 54
Table 2.24: Results for post-boundary time-to-peak-velocity (in ms) for subject E.
(ANOVA, means and standard deviations, summary of the levels of
boundary strength distinguished, and Fisher’s PLSD determining the
behavior of individual sentences). The consonant string is: # C₁C₂C₃]. 55
Table 2.25: Results for post-boundary opening and closing movement duration
(in ms) for subject R. (ANOVA, means and standard deviations,
summary of the levels of boundary strength distinguished, and Fisher’s
PLSD determining the behavior of individual sentences). The consonant
string is: # C₁C₂C₃]. 56
Table 2.26: Results for post-boundary time-to-peak-velocity (in ms) for subject R.
(ANOVA, means and standard deviations, summary of the levels of
boundary strength distinguished, and Fisher’s PLSD determining the
behavior of individual sentences). The consonant string is: # C₁C₂C₃]. 57
Table 2.27: Summary of results for post-boundary C₁, showing the relationship
between sentences and the number of boundary strengths (levels)
distinguished for displacement. The consonant string is # C₁C₂C₃]. 58
Table 2.28: Summary of results for post-boundary C₂, showing the relationship
between sentences and the number of boundary strengths (levels)
distinguished for displacement. The consonant string is # C₁C₂C₃]. 59
Table 2.29: Summary of results for post-boundary C₃, showing the relationship
between sentences and the number of boundary strengths (levels)
distinguished for displacement. The consonant string is # C₁C₂C₃]. 60
Table 2.30: Summary of the effects of prosodic boundaries, showing the relationship
between sentences and the number of boundary strengths (levels)
distinguished. The numbers represent the sentences in the experiment. 61
Table 2.31: Results for post-boundary displacement (in mm) for subject B. (ANOVA,
means and standard deviations, summary of the levels of boundary
strength distinguished, and Fisher’s PLSD determining the behavior of
individual sentences). The consonant string is: # C₁C₂C₃]. 63
Table 2.32: Results for post-boundary displacement (in mm) for subject E. (ANOVA,
means and standard deviations, summary of the levels of boundary
strength distinguished, and Fisher’s PLSD determining the behavior of
individual sentences). The consonant string is: # C₁C₂C₃]. 64
Table 2.33: Results for post-boundary displacement (in mm) for subject R. (ANOVA,
means and standard deviations, summary of the levels of boundary
strength distinguished, and Fisher’s PLSD determining the behavior of
individual sentences). The consonant string is: # C₁C₂C₃]. 65
Table 2.34: Comparison of magnitude of effect (differences given in ms). Each
column shows the differences between sentence means for a given
variable. Only those differences are given that occur on more than one
articulatory movement. 72
Table 3.1: Experiment stimuli. The boundary to be examined is between the words
‘donut and’, and ranges from ‘no boundary’ to ‘very strong boundary’.
Sentences in italics have not been collected for subject R. 87
Table 3.2: Data points not included in the analysis. 91
Table 3.3: Means and variances for each cluster (in standard deviations). 99
Table 3.4: Means and variances for each cluster. The top row refers to the
numbers of clusters in the analysis. 103
Table 3.5: Cluster distance. The rows titled “clusters examined” refer to the two
clusters in one clustering analysis that are being compared. 105
Table 4.1: Stimuli for the experiment. The boundary to be examined is between
the words ‘donut and’, and ranges from ‘no boundary’ to ‘very strong
boundary’. 115
Table 4.2: Results of linear regression. C₁ refers to the pre-boundary, and C₂ to
the post-boundary constriction. 121
Table 4.3: Results of stepwise multiple regression. 125
Table 4.4: Results of binomial regression. 125
Table 4.5: Results of stepwise multiple regression for weaker boundaries (in the
sequence VC₁#VC₂). 126
Table 4.6: Results of stepwise multiple regression for strong boundaries. 127
Table 4.7: Results of linear regressions performed on tokens of individual
sentences for constriction C₁. Blank cell indicates no significant effect. 130
Table 4.8: Results of linear regressions performed on tokens of individual
sentences for constriction C₂. Blank cell indicates no significant effect. 131
Table 5.1: Experiment conditions. 142
Table 5.2: Experiment stimuli. 143
Table 5.3: Pause durations for individual dyads, means and standard deviations
(in milliseconds) and for the dyads pooled (z-scores). 147
Table 5.4: Main effects of length and Fisher’s PLSD. The results for individual
dyads give the pause length in milliseconds, the pooled results in
z-scores. 149
Table 5.5: Pre-boundary and post-boundary effects on pause duration. Individual
dyads and pooled results. 151
Table 5.6: Comparisons of means. The effect of prosodic matching on pause
duration. “Match shorter pause” means that the condition where there
was prosodic matching had a shorter pause duration than the condition
where there was no prosodic matching. All dyads pooled. 159
List of Figures
Figure 1.1: Prosodic constituents. 3
Figure 2.1: π-gesture representation. 18
Figure 2.2: Data measurements for pre-boundary C₂ ([N]). 24
Figure 2.3: Schematized representation of tongue-tip tracking and derived
measurements for one constriction. 26
Figure 2.4: Schematic representation of the pre-boundary constrictions. 29
Figure 2.5: Schematic representation of the post-boundary constrictions. 48
Figure 2.6: Duration of the closing and opening movements. 67
Figure 2.7: Schematic representation of the scope of the effect of the boundary. 75
Figure 3.1: Schematized representation of tongue-tip tracking and derived
measurements for one constriction. 90
Figure 3.2: Visual Analogue Scale. 94
Figure 3.3: Histogram of the duration of the pre-boundary opening movement (C₁). 96
Figure 3.4: Opening movement (C₁). The results of one learning trial of the mixture
model. 98
Figure 3.5: Histogram of Perceived Boundary Strength (PBS) values. 101
Figure 3.6: PBS values. The results of one learning trial of the mixture model. 102
Figure 4.1: Schematized representation of tongue-tip tracking and derived
measurements for one constriction. 116
Figure 4.2: Visual Analogue Scale. 119
Figure 4.3: Correlation of the pre-boundary closing movement C₁ (in the sequence
VC₁#VC₂) to the perceived boundary strength. 122
Figure 4.4: Correlation of the pre-boundary opening movement C₁ (in the sequence
VC₁#VC₂) to the perceived boundary strength. 122
Figure 4.5: Correlation of the post-boundary closing movement C₂ (in the
sequence VC₁#VC₂) to the perceived boundary strength. 123
Figure 4.6: Correlation of the post-boundary opening movement C₂ (in the
sequence VC₁#VC₂) to the perceived boundary strength. 123
Figure 4.7: Box plots for PBS values for individual sentences. 128
Figure 5.1: Prosodic complexity. 136
Figure 5.2: Main effect of length. All dyads pooled. 150
Figure 5.3: Pre-boundary and post-boundary prosodic complexity effects.
All dyads pooled. 152
Figure 5.4: Interaction between pre-boundary complexity and length. Dyad C. 153
Figure 5.5: Interaction between pre-boundary complexity and length. Dyad E. 153
Figure 5.6: Interaction between post-boundary complexity and length. Dyad A. 154
Figure 5.7: Interaction between pre- and post-boundary complexity. Dyad I. 155
Figure 5.8: Interaction between pre- and post-boundary complexity. Dyad K. 155
Figure 5.9: Interaction between pre- and post-boundary complexity. Pooled dyads. 156
Figure 5.10: Interaction between phrase length, pre-, post-boundary complexity.
Dyad M. 157
Figure 5.11: Interaction between phrase length, pre-, post-boundary complexity.
Pooled dyads. 158
Figure 5.12: Rhythmic (matching) and non-rhythmic (non-matching) prosodic
structure. 163
Figure 6.1: Prosodic recursion. 171
Figure 6.2: A theoretical schematization of the production of prosodic structure. 180
Abstract
This dissertation examines aspects of phrase boundary production, perception and the
structural properties of boundaries from a multifaceted experimental perspective. The
term prosody refers to the accentual prominence and phrasal organization of speech, and
the dissertation focuses on the latter. An example of phrasal organization is
given below, where the two sentences differ in prosodic phrasing.
a. She knew, Ann thought, about the present.
b. She knew Ann thought about the present.
In addition to intonational events at their edges, prosodic phrase boundaries introduce
systematic phonetic variation in the temporal properties of segments. Acoustic studies
have shown that at boundaries segments increase in duration. Articulatory studies have
shown that speech movements—gestures—become temporally longer in the vicinity of
boundaries and that this articulatory lengthening increases with boundary strength. In this
dissertation a series of experimental studies is presented examining a) the articulation of
gestures near phrase junctures, b) the categoricity and gradiency in the production and in
the perception of prosodic boundaries, c) the link between articulatory properties of
boundaries and listeners’ perception of boundaries, and d) the effect of prosodic structure
on pause duration. Results from this research further our understanding of the linguistic
representation of prosodic structure and its relation to processes involved in producing
spoken language.
Chapter 1: Introduction
The dissertation focuses on prosodic structure, its phonetic realization and its perception,
and I investigate these relations from a variety of perspectives.
Prosodic boundaries introduce systematic variation in the acoustic and
articulatory temporal properties of segments. Acoustic studies have shown that at
boundaries segments increase in duration (e.g., Gaitenby 1965, Oller 1973, Klatt 1975,
Shattuck-Hufnagel & Turk 1998, Wightman, Shattuck-Hufnagel, Ostendorf & Price
1992). Articulatory studies have shown that speech movements—gestures—become
temporally longer in the vicinity of boundaries and that articulatory lengthening increases
cumulatively for larger prosodic boundaries (e.g., Byrd & Saltzman 1998, Byrd 2000,
Edwards, Beckman & Fletcher 1991, Tabain 2003b). The dissertation is an investigation
of the temporal properties of prosodic boundaries from multiple perspectives. We explore
the relation between prosodic representation and articulation, acoustic reflections of
prosodic planning, and juncture perception. By using this multi-faceted approach, the
dissertation aims to make a contribution to our understanding of the relationship between
abstract prosodic structure and its phonetic realization.
1. Prosody
The term prosody refers to the phrasal organization and accentual prominence in speech. It
is the level of linguistic structure above the word level. Examples of accentual prominence
are given in (1), and of phrasal organization in (2). In the examples, prominence is indicated
by the capital letters and phrasal organization by the punctuation. The sentences (a) and (b)
differ in meaning due to their different prosodic structures.
1. a. She doesn’t LIKE apples… (but she eats them because they are healthy.)
b. She doesn’t like APPLES… (but she likes bananas.)
2. a. She knew, Ann thought, about the present.
b. She knew Ann thought about the present.
Accentual prominence draws the attention of the listener to new and relevant information.
Phrasal organization groups words together for cognitive processing for the speaker. It also
facilitates speech processing for the listener, by signaling which words group into a
processing unit. The dissertation focuses on prosodic phrasing in speech, specifically on the
temporal properties of boundaries marking the phrases.
2. Prosodic Structure
Prosodic structure is generally understood to be hierarchically organized (e.g., Selkirk
1986, Nespor & Vogel 1986). Utterances are grouped in prosodic units, with higher units
dominating lower units. Depending on the theoretical approach, prosodic units can be
largely inferred based on syntactic structure (e.g. Selkirk 1984, Nespor & Vogel 1986) or
on intonational properties (e.g. Beckman & Pierrehumbert 1986, Pierrehumbert 1980; see
Jun 1993, 1998 for a comparison of these two approaches). Although theories of
prosodic structure disagree on the number and definition of prosodic constituents,
they generally assume a major and a minor prosodic category
above the level of the word. In this dissertation we will use the terminology of Beckman &
Pierrehumbert’s 1986 model and refer to these categories as the Intonational Phrase (IP)
and the Intermediate Phrase (ip), respectively (for an overview of the various models see
Shattuck-Hufnagel & Turk 1996). The IP is the largest unit and is defined as the domain
of a coherent intonational contour that has at least a nuclear pitch accent, a phrase accent,
and a boundary tone. The ip includes at least a nuclear pitch accent and a phrase accent.
Both IP and ip exhibit final lengthening, but IPs are lengthened more than intermediate
phrases. The Tone and Break Indices (ToBI) intonation transcription system (Beckman &
Elam 1997), which will also be used, is based on this model for intonation and on the
work of Price et al. (1991) and Wightman et al. (1992) for the break indices. The distinct
prosodic break indices (signaling the perceived boundary strength) in ToBI correspond to
the three distinct prosodic categories: word (break index 1), intermediate phrase (break
index 3) and intonation phrase (break index 4), with one further break index signaling a
word-internal boundary (e.g., a clitic boundary) (break index 0).¹ The prosodic
constituents are schematized in Figure 1.1.
IP Intonation Phrase (IP)
ip ip Intermediate Phrase (ip)
ω ω ω ω Word (w)
σ σ σ σ σ σ σ σ Syllable
T* T* T- T* T- T%
Figure 1.1. Prosodic constituents (adapted from Beckman & Pierrehumbert 1986). ‘T*’ stands for different types of
pitch accents, ‘T-’ for phrase accents and ‘T%’ for boundary tones.
¹ In addition to these break indices, there is also break index 2, signaling a mismatch between tonal
properties and the perception of the prosodic break.
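As an illustration only, the break-index inventory described above can be written down as a small lookup structure. The Python sketch below is not part of ToBI or of the analyses in this dissertation: the enum and function names are hypothetical, and treating the indices as an ordered scale glosses over break index 2, which marks a mismatch rather than a strength level.

```python
from enum import IntEnum


class BreakIndex(IntEnum):
    """Break indices as summarized in the text (member names are illustrative)."""
    WORD_INTERNAL = 0        # e.g., a clitic boundary
    WORD = 1                 # ordinary word boundary
    MISMATCH = 2             # tonal properties and perceived break disagree
    INTERMEDIATE_PHRASE = 3  # ip boundary
    INTONATION_PHRASE = 4    # IP boundary


def is_phrase_boundary(index: BreakIndex) -> bool:
    """True for the two phrasal categories above the word (ip and IP)."""
    return index in (BreakIndex.INTERMEDIATE_PHRASE, BreakIndex.INTONATION_PHRASE)


def strongest(indices):
    """Return the largest break index among a sequence of labelled junctures."""
    return max(indices)


if __name__ == "__main__":
    # Hypothetical labels for the junctures of one utterance.
    labels = [BreakIndex.WORD, BreakIndex.INTERMEDIATE_PHRASE, BreakIndex.INTONATION_PHRASE]
    print([label.name for label in labels if is_phrase_boundary(label)])
    print(strongest(labels).name)
```

An integer-valued enum is used here so that the labels can be compared directly, which suits the assumption that a larger index signals a stronger perceived boundary.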
3. Temporal Properties of Prosodic Boundaries
A number of studies have investigated the temporal acoustic and articulatory properties
of segments in the vicinity of prosodic boundaries. Acoustic studies have shown that at
the boundary, segments become longer, and this effect is generally referred to as phrase-final
lengthening (e.g., Gaitenby 1965, Oller 1973, Klatt 1975, Cooper & Paccia-Cooper
1980, Wightman, Shattuck-Hufnagel, Ostendorf & Price 1992, Shattuck-Hufnagel &
Turk 1998, Turk 1999, Lee & Cole 2006). Although it has been examined much less, the
effect has been observed phrase initially as well (e.g., Pierrehumbert & Talkin 1991,
Shattuck-Hufnagel & Turk 1998, Tabain 2003a, Cho, McQueen & Cox 2007; but see
Wightman et al. 1992, where no post-boundary effects were found). It has also been
found that the magnitude of final lengthening can distinguish several levels of
boundaries, with stronger boundaries showing more lengthening (Wightman et al. 1992,
Tabain 2003a, Tabain & Perrier 2005).
Articulatory studies have shown that at boundaries, gestures become longer. This
has been shown phrase finally (e.g., Edwards, Beckman & Fletcher 1991, Tabain 2003b)
and phrase initially (e.g., Byrd & Saltzman 1998, Byrd 2000, Tabain 2003b). Relatedly,
studies examining linguopalatal contact have found that prosodic boundaries increase the
length of the contact for phrase-initial stops (e.g., Fougeron & Keating 1997, Fougeron
2001 for French, Cho & Keating 2001 for Korean). Further, final and initial articulatory
temporal lengthening have been observed to increase cumulatively for larger prosodic
boundaries, distinguishing several levels of phrasal lengthening (phrase finally: e.g.,
Byrd & Saltzman 1998, Byrd 2000, Cho 2005, Tabain 2003b, Tabain & Perrier 2005, and
phrase initially: e.g., Byrd & Saltzman 1998, Cho & Keating 2001, Fougeron 2001, Cho
2005, Keating et al. 2004, Tabain 2003b).
The experiments in this dissertation will examine prosodic boundaries further,
investigating those temporal properties that have received less attention in the literature.
4. Articulatory Phonology
The work presented is framed within the Articulatory Phonology theory (e.g. Browman &
Goldstein 1986, 1989, 1990a, 1992, 1995). Fundamental to Articulatory Phonology is the
notion of gestures as linguistically relevant constrictions of the vocal tract (e.g. tongue tip
narrowing, lip aperture closure etc.). In other words, a gesture specifies a linguistic task.
There are three central assumptions in Articulatory Phonology: gestures are both units of
information (serving phonology) and articulatory units of action (serving phonetics). This
is possible because of the second central assumption, which is that gestures are dynamical
systems,² and as such they are inherently specified for temporal and spatial properties.
Characterizing gestures as dynamical systems means that each gesture is specified by a
set of spatial and temporal point attractor parameters, including its target, stiffness (its
inherent temporal quality) and damping (critical damping is assumed). Resulting
articulator movement trajectories depend on initial conditions and the coactive gestures.
It should be noted that a gesture is defined in terms of a specific task (tongue tip
constriction for example), not in terms of the individual articulator behavior. Articulatory
Phonology is implemented in the Task Dynamics Model (Saltzman & Munhall 1989).
Once the gestural activation score is specified, the Task Dynamics Model calculates the
articulatory trajectories of the individual articulators, which then can be executed in a
vocal tract model that generates the acoustic signal.
² A dynamical system model is a system in which the changes of the system are predicted by its current
state and a set of laws that specify the forces changing the system (Saltzman & Kelso 1987, Saltzman
1995).
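The point attractor characterization of a gesture can be illustrated with a minimal numerical sketch. It is not the Task Dynamics implementation: it assumes a single unit-mass, critically damped system moving toward a fixed target, and the function names, the integration step, and the stiffness values are hypothetical choices made only for illustration. For such a system the velocity peaks at 1/√k after movement onset (k being the stiffness), so lower stiffness yields a longer time-to-peak-velocity.

```python
import math

def simulate_gesture(target, stiffness, x0=0.0, dt=1e-4, duration=1.0):
    """Integrate a unit-mass, critically damped point attractor toward `target`."""
    omega = math.sqrt(stiffness)      # natural frequency for unit mass
    damping = 2.0 * omega             # critical damping coefficient
    x, v = x0, 0.0
    times, positions, velocities = [], [], []
    steps = int(round(duration / dt))
    for i in range(steps + 1):
        times.append(i * dt)
        positions.append(x)
        velocities.append(v)
        a = -damping * v - stiffness * (x - target)
        v += a * dt                   # semi-implicit Euler: velocity first,
        x += v * dt                   # then position with the new velocity
    return times, positions, velocities

def time_to_peak_velocity(times, velocities):
    """Return the time at which the absolute velocity is largest."""
    speeds = [abs(v) for v in velocities]
    return times[speeds.index(max(speeds))]

# Hypothetical stiffness values: a "soft" and a "stiff" gesture, same target.
t_soft, x_soft, v_soft = simulate_gesture(target=1.0, stiffness=100.0)
t_stiff, x_stiff, v_stiff = simulate_gesture(target=1.0, stiffness=400.0)
ttp_soft = time_to_peak_velocity(t_soft, v_soft)      # close to 1/sqrt(100)
ttp_stiff = time_to_peak_velocity(t_stiff, v_stiff)   # close to 1/sqrt(400)
```

Both simulated gestures reach the same target; only their timing differs, which is the sense in which stiffness is the inherent temporal quality of a gesture.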
Gestures are both abstract phonological units, invariantly specified across the
different speech contexts (for example, a tongue tip constriction for an alveolar
consonant) and units of articulatory action. In other words, the phonological and phonetic
properties are isomorphic, one representing the macroscopic and the other the
microscopic dimension of the same process (Browman & Goldstein 1990b), and there is
no need for translation between the phonological and phonetic level of representation. An
approach to prosodic structure within this model (the π-gesture model of Byrd &
Saltzman 2003) will be outlined in Chapter 2.
5. Outline of the Dissertation
The first experiment, presented in Chapter 2, examines the articulatory manifestation of
prosodic structure by testing how distant from a prosodic boundary the effect of that
boundary extends, that is, by evaluating the temporal scope of prosodic boundary effects.
In two earlier articulatory studies (Byrd, Krivokapić & Lee 2006 and Lee, Byrd &
Krivokapić 2006), we have shown that the effect of an Intonational Phrase boundary is
largely limited to the gestures at the final and initial prosodic phrase edges. Surprisingly,
we have found compensatory shortening further away from the boundary. In this
dissertation, we extend this research to examine the temporal scope and magnitude of
boundaries of varying strength in an articulator movement tracking study (EMA).
Chapter 3 examines whether prosodic boundaries are produced in a categorical
manner, i.e., whether boundary strength values cluster around a small, fixed number of
values (one characteristic for each category) or pattern in a gradient manner, i.e., the
strengths of junctures do not cluster around a limited number of target values. The
question is addressed by using articulatory movement-tracking data. This same question
is also approached experimentally in part 2 of this experiment from the point of view of
listeners’ perception. The investigation of gradiency versus categorical
production/perception addresses the issue of how abstract structural representations are
conveyed in speech. Few studies have tackled these questions of categoricity, despite the
fact that our assumptions about core properties of linguistic structure depend on the
answers.
By combining production and perception experimental techniques, in Chapter 4
we address the nature of the link between articulation and perception. Acoustic studies on
the perception of prosodic boundaries have found a correlation between acoustic
properties of prosodic boundaries and their perception in speech. Since articulation
shapes acoustics, we expect that there will be a link between articulatory properties and
the perception of boundaries, but the precise nature of this relation is not clear.
Chapter 5 examines a further aspect of prosody, namely prosodic structural
effects on pause duration. Earlier studies have explored the effects of syntactic structure on
pauses, but there are indications that prosodic structure, rather than syntactic, might be the
dominant factor in determining pause duration (e.g., Ferreira 1991, Gee & Grosjean 1983).
In this part of the dissertation, an acoustic study is conducted using the synchronous speech
paradigm in which speakers read sentences synchronously in dyads thereby minimizing
variability due to speech rate and individual differences. In a previous study (Krivokapić
2007) using this paradigm, we have shown that post-boundary prosodic branching of an
upcoming phrase induces shorter pause duration than an upcoming non-branching phrase of
equal length and, in a second experiment, that the lengths of the preceding and upcoming phrases
both increase pause duration. We interpreted these results as indicating that phonetic
encoding proceeds one prosodic chunk at a time. Phonological length (e.g., syllables) was
argued to determine how long the encoding time for an upcoming phrase will be. These
results have implications for models of speech production. While few speech production
models consider the planning of prosodic structure, two approaches can be distinguished:
the partially incremental, prosody-first approach (Keating & Shattuck-Hufnagel 2002) and
the fully incremental, prosody-last approach (Levelt 1989, Levelt, Roelofs & Meyer 1999).
The effects of prosodic and phonological structure on pause duration argue for a prosody-
first, partially incremental model.
Chapter 6 discusses the results of the dissertation and their implications for prosodic
theory and models of speech production, and outlines our own view of the production of
prosody.
In summary, while many studies have addressed the phonology-phonetics
interface, the relation between prosody and phonetics is less well understood. The crucial
questions addressed in this dissertation are to what degree abstract prosodic representation
is or is not divorced from speech production processes, what linguistic
information is present in the prosodic structural representation, and how this is realized in
the speech signal. By combining acoustic, articulatory, and perception behavioral studies
with attention to issues of linguistic structural representation, I hope to contribute to our
understanding of the prosody-phonetic interface.
Chapter 2: The Scope of Effect of Prosodic Boundaries
1. Introduction
In this chapter the temporal scope of effect of prosodic boundaries will be investigated.
While many studies have examined the effect of prosodic boundaries on segments
immediately adjacent to the boundary, only a few have investigated how far leftward
(earlier in time preceding the boundary) and how far rightward (after the boundary) the
effects extend. We also investigate the scope of effect of prosodic boundaries of different
strengths. The effect of six boundaries of different strengths, ranging from no boundary
to very strong boundary, on three consonants leftward and three consonants rightward
from the juncture will be considered.
2. Background: The Scope of Effect
A few acoustic and articulatory studies examine the scope of effect of Intonation Phrase
boundaries, and one acoustic study of Dutch (Cambier-Langeveld 1997) examines the
scope of effect for boundaries of different strength.
Oller (1973) examines acoustic lengthening of phrase-final syllables and finds
that onset, nucleus and coda lengthen.¹ Kohler’s (1983) study for German shows final
lengthening in two- and three-syllable words and shows that the magnitude of
lengthening is largest on the final segment of the words but extends, to a lesser degree,
further away from the boundary (the scope of lengthening is given by morphological
categories, not by individual segments, so it is not clear how far into the wordstem the
lengthening spreads). Also for German, Silverman’s (1990) experiment on three-syllable
words with stress on the second syllable indicates that phrase-final lengthening extends
up to three syllables leftwards, with the first syllable lengthening less than the second (the
data for the third syllable are not reported), indicating a decrease of boundary effect at
segments further away from the boundary. Campbell and Isard (1991) show in a corpus
study that in pre-pausal sentence-final positions, final syllables lengthen progressively, in
that coda and nucleus lengthen significantly more than the onset of a syllable. A corpus
study of Wightman, Shattuck-Hufnagel, Ostendorf and Price (1992) indicates that the
effect of the boundary extends to the final rhyme. (Note that their measures might not
have been sensitive enough to find further effects, since they examined the segments
preceding the final vowel, up to the foot-initial vowel, as one measure, so the onset of the
final syllable was not examined separately.)
¹ Unless otherwise indicated, the studies discussed examine processes in English.
Progressive phrase-final lengthening (i.e., increasing as segments become closer
to the boundary) is also found in Turk (1999). She examines two-syllable words with
stress on the first or second syllable. Acoustic lengthening occurs mainly on the syllable
rime, and the scope of lengthening extends to the stressed syllable (including the stressed
syllable’s rime), but only one of the two subjects in this condition shows onset
lengthening on the final syllable. One of the two subjects shows shortening on the first
syllable in words with the lexical stress on the rime of the second syllable. Lengthening
in the final syllable is progressive, strongest at the coda and less at the nucleus. From the
results presented in her Table II (Turk 1999:240), it is also clear that when the rime in
both the penultimate syllable and the final syllable lengthened, the penultimate syllable’s
rime lengthened less than the final syllable, again showing progressive lengthening.
Shattuck-Hufnagel and Turk (1998) investigate the scope of boundary-related
acoustic lengthening and find that pre-boundary, it extends to the stressed syllable of a
word and is strongest in the phrase final syllable. Post-boundary, lengthening occurs on
the onset of the syllable in cases where the pre-boundary word is iambic but not when it
is trochaic.
The most extensive acoustic studies on the scope of phrase-final lengthening have
been conducted by Berkovits (Berkovits 1993a,b, 1994), for Hebrew. Berkovits (1993a,b)
examines bisyllabic words with stress on the final syllable and finds that lengthening
extends to the first syllable. The effect is progressive, with the effect strongest closest to
the boundary, on the final syllable. Within the final syllable, lengthening is also
progressive, strongest at the final segment and decreasing in magnitude on segments
further away. Berkovits (1993a) finds that 5% of the average word lengthening was on
the initial syllable and 95% on the final syllable. In the final syllable, 8% of the total
syllable lengthening was onset lengthening, 23% nucleus lengthening, and 69% final
consonant lengthening. Berkovits (1994) examines bisyllabic words with stress on the
first or second syllable. She finds that final lengthening extends to the initial syllable in
words with initial stress and is confined to the final syllable in words with final stress.
Lengthening is progressive: in words with initial stress, 25% of the average word
lengthening was on the initial syllable, and 75% on the final. In the final syllable, 18% of
the total syllable lengthening was onset lengthening, 25% nucleus lengthening, and 57%
final consonant lengthening. In words with final stress, 5% of the average word
lengthening was on the initial syllable (which was non-significant), and 95% on the final
syllable. In the final syllable, 17% of the total syllable lengthening was onset lengthening,
30% nucleus lengthening, and 53% final consonant lengthening. Berkovits’s studies
clearly show that prosodic boundaries cause progressive lengthening.
A study on pre-pausal final lengthening for vowels in Estonian (Krull 1997) finds
that in bisyllabic words with stress on the initial syllable, acoustic final lengthening can
extend up to the first vowel, with results varying depending on syllable duration and
speaker.
Finally, one acoustic study of Dutch by Cambier-Langeveld (1997) (see also
Cambier-Langeveld, Nespor & van Heuven 1997) examines the scope of effect of
different prosodic boundaries on pre-boundary lengthening. Cambier-Langeveld (1997)
examines the effect of Utterance, IP, Phonological Phrase, and Prosodic Word boundaries
and finds that the IP and Utterance show more lengthening than the Phonological Phrase
and Prosodic Word. The boundary has an effect on the syllable rhyme in all cases, and
the effect extends sometimes to the onset. When the final syllable’s nucleus is a reduced
vowel, there is a significant lengthening effect on the penultimate syllable, and when the
final syllable’s rhyme is a reduced vowel and a coda, there is a tendency in the same
direction. The effect of the boundary is largest in the final segment and decreases with
distance from the boundary. Cambier-Langeveld’s interpretation of the different scope of
effect for light versus heavy syllables is that it might be due to the shorter duration of the
light syllables, since syllables containing reduced vowels are shorter than the ones
containing full vowels. (See however Turk 1999, who does not find such an effect of
reduced vs. full syllable on the domain of lengthening.)
There are also articulatory studies that indicate that final lengthening is
progressive. Edwards, Beckman and Fletcher (1991) examined the opening and closing
jaw movement in a VC# sequence. They found that, for two out of the three subjects that
show an effect on both the jaw closing movement (closest to the boundary) and the jaw
opening movement (further away from the boundary), the closing movement is longer.
This could indicate that the effect of the boundary is strongest at the movement closest to
the boundary.
Fougeron and Keating (1997) examined linguopalatal contact in [C₁V₁ # C₂V₂]
sequences. They found boundary effects immediately at the boundary, i.e., on V₁ and C₂
but also, in some cases, further away, on V₂ and C₁. The effect in these cases was smaller
in magnitude, again indicating progressive boundary effects.
The kinematic studies by Byrd and Saltzman (1998) and Byrd (2000) of the
sequences [C₁V₁ # C₂V₂] showed that for the consonants, most lengthening is found for
the closing movement of C₂ and less lengthening for the opening movement for C₁,
further away from the boundary (Byrd & Saltzman 1998). Similarly for the vowels, most
lengthening is found closer to the boundary, for V₁, and barely any lengthening is seen for
V₂ (Byrd 2000).
Byrd, Lee, Riggs and Adams (2005) examine the effects of prosodic boundary on
onsets and codas and find that the articulatory movement immediately preceding the
boundary (opening movement) of phrase-final codas lengthens more than the closing
movements, which are further away from the boundary, also showing the pattern that
boundary effects are strongest at the boundary.
A kinematic study by Byrd, Krivokapić and Lee (2006; see also Lee, Byrd &
Krivokapić 2006) examines pre-boundary and post-boundary scope of effect. In a pre-boundary
[C₁V C₂V C₃V #] sequence, they examine the consonant closing and opening
movements (compared to a no-boundary control condition) and find an effect of the
boundary on C₃ opening movement (closest to the boundary), and for one subject (of
four) on C₃ closing movement. Post-boundary, in the sequence [# C₁V C₂V C₃]
(comparing it to a no-boundary control condition), they find an increase in duration on C₁
closing movement and for two subjects on its opening movement. Further away from the
boundary, on C₂ and C₃, a temporal shortening is observed. Byrd et al. (2006) interpret
the shortening as compensatory shortening, in the sense that it might compensate (in
direction, though not in magnitude) for the boundary induced lengthening, and represent
a return of the gestures to a prosodically unperturbed timing. (Note that the shortening
effect is understood as not caused directly by the boundary; it is a reaction to the effects
of the boundary; see Byrd et al. 2006.) In this study, as in the acoustic studies, the effect
of the boundary is strongest at the boundary.
To summarize, the studies examining the scope of effect of prosodic boundaries
have found that the effect of the boundary is strongest at segments immediately adjacent
to the boundary. Though effects exist further away from the boundary, they are less
strong. However, most of these studies only examined an Intonation Phrase boundary,
i.e., a juncture marked with a boundary tone. The study presented here examines the
scope of effect of boundaries of various strengths.
3. The π-gesture Model
A unified account of the temporal and spatial effects occurring at prosodic boundaries has
been offered by the π-gesture model (Byrd & Saltzman 2003; see also Byrd, Kaun,
Narayanan & Saltzman 2000 for an earlier development of the model). The π-gesture
model views prosodic boundaries as prosodic gestures ( π-gestures) that extend over a
period of time and whose effect is to slow down the central clock that paces the activation
of gestures. As a consequence, computational simulations have shown that gestures
become temporally longer, less overlapped and may show more displacement (due to
these temporal changes).
The concept of the π-gesture extends the Articulatory Phonology approach
(Browman & Goldstein 1992, 1995) to phonological representation to the level of
prosodic structure. In Articulatory Phonology the basic phonological unit is the gesture.
A gesture is a linguistic unit, specifying a target for a vocal tract constriction, for example
tongue tip constriction for alveolar consonants. A central point of Articulatory Phonology
is that gestures are units of information and units of action. As units of information they
specify lexical contrast, for example by means of the presence of a gesture (vs. absence)
or by difference in constriction degree or target, or by the temporal relations between
gestures. As units of action, these phonological representations contain spatial and
temporal information. All the information necessary for speech production is contained
in the lexical representation, thus obviating the need for a translation from phonological
structure to phonetic implementation.
The π-gesture approach extends these properties to prosodic structure. π-gestures
extend over a period of time, that is, their temporal properties are part of their
representation. Like constriction gestures, π-gestures are also coordinated with other
gestures. Unlike constriction gestures, the π-gesture does not have a constriction (i.e.
vocal tract) task. The linguistic goal (or task) of the π-gesture is the slowing, at prosodic
boundaries (and possibly other prosodic events), of the central clock determining the
global timing of speech of an individual speaker. In other words, the π-gesture slows the
timecourse of co-active constriction gestures. As a consequence, constriction gestures
become temporally longer and less overlapped.² So the π-gesture modulates the
activation properties of the constriction gestures it is co-active with, by effectively
stretching or warping these gestures. These effects correspond to the empirically
observed effects of prosodic boundaries that have been reported in the literature, as
discussed in Chapter 1 (see Byrd & Saltzman 2003 for further discussion of these effects
and for the modeling of the π-gesture effects on articulation).
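The clock-slowing idea can be illustrated with a minimal sketch. It is not the Byrd & Saltzman (2003) implementation: the raised-cosine activation shape, the rule that the clock rate equals one minus the π-gesture activation, and all names and parameter values are assumptions made purely for illustration. A constriction gesture is treated as needing a fixed amount of clock time, so it takes longer in real time when it is co-active with the π-gesture.

```python
import math

def pi_activation(t, center, half_width, strength):
    """Smooth rise-and-fall activation (raised cosine, an illustrative choice)."""
    if abs(t - center) >= half_width:
        return 0.0
    return strength * 0.5 * (1.0 + math.cos(math.pi * (t - center) / half_width))

def stretched_duration(start, intrinsic, center, half_width, strength, dt=1e-4):
    """Real time (s) a gesture needs to accumulate `intrinsic` s of clock time."""
    if not 0.0 <= strength < 1.0:
        raise ValueError("strength must be in [0, 1) so the clock never stops")
    t, clock = start, 0.0
    while clock < intrinsic:
        # The pi-gesture slows the clock in proportion to its activation.
        rate = 1.0 - pi_activation(t, center, half_width, strength)
        clock += rate * dt
        t += dt
    return t - start

# Hypothetical setup: pi-gesture centered at 1.0 s, active from 0.7 s to 1.3 s.
far = stretched_duration(0.00, 0.1, center=1.0, half_width=0.3, strength=0.5)
mid = stretched_duration(0.75, 0.1, center=1.0, half_width=0.3, strength=0.5)
near = stretched_duration(0.95, 0.1, center=1.0, half_width=0.3, strength=0.5)
near_strong = stretched_duration(0.95, 0.1, center=1.0, half_width=0.3, strength=0.8)
```

Under these assumptions a gesture outside the π-gesture interval keeps its intrinsic duration, a gesture at its edge lengthens slightly, and a gesture at its center lengthens most, with a higher activation strength producing more lengthening.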
Figure 2.1 schematizes the π-gesture. At the time when the π-gesture has the
strongest activation its effects on the constriction are strongest. The strength of activation
of the π-gesture and, as a consequence, its strength of effect, is related to boundary
strength: stronger boundaries induce stronger effects.
The activation of the π-gesture increases and decreases smoothly, therefore
boundary effects are expected to be continuous. So the boundary effects are predicted to
be strongest at the boundary, progressively decreasing as the distance from the boundary
increases. How remote from the phrase edge these articulatory effects extend is an
empirical question this study investigates. A further empirical question is how the π-gesture
is coordinated with other gestures, that is, whether boundary effects extend equally pre-boundary
and post-boundary or whether the strongest effect occurs pre-boundary or post-boundary.³
This is a further question addressed in the present study.
² As a consequence of this slowing and lessened overlap, gestures can become spatially larger as well, since
there is more time for a gesture to reach its target.
[Figure 2.1: schematic of the π-gesture, showing its activation strength over time and its scope over three constriction gestures.]
Figure 2.1. π-gesture representation (Byrd & Saltzman 2003, Byrd et al. 2000). The top of the figure shows
the π-gesture, its activation increasing and decreasing (strongest, in this representation, at the center). The
lower shaded box shows the scope of the gesture, and the strength of the effect is represented with the
shading: the effect is strongest in the center, and decreases from there. (Note however that the actual
coordination of the π-gesture with constriction gestures is an empirical question).
4. Experiment: The Scope of Effect of Prosodic Boundaries
The goal of this articulatory experiment is to study how far from the juncture in the
articulatory movements the effect of the boundary will extend for prosodic boundaries of
different strength. As mentioned in the literature review, only a few articulatory studies
of boundary scope exist (Byrd et al. 2006, Lee et al. 2006), and only one acoustic study
on Dutch examined more than one type of prosodic boundary (Cambier-Langeveld 1997).
Based on these studies and on the predictions of the π-gesture framework, it is expected
that the strongest effect of the boundary will be local to the boundary but extend further
away from it, decreasing in magnitude with distance from the boundary. The strength of
the effect is expected to increase with boundary strength. The null hypothesis for the
scope of effect for boundaries of varying strength is that it is the same regardless of
boundary strength. Boundaries of different strength would then vary only in magnitude of
effect. A study on Dutch by Cambier-Langeveld (1997) seems to lend support to this
idea, in that she finds that lengthening for boundaries of different strength differs in
magnitude of effect but not in scope of effect. We also report effects of the boundary on
displacement, with the goal of contributing empirical data.
³ Note that this could be a matter of shape of the π-gesture as well, in the sense that a stronger
pre-boundary effect for example could indicate that the π-gesture has its strongest activation earlier in its
interval of activation.
5. Methods
5.1. Stimuli and Subjects
Five sentences were constructed, all containing the sequence a donut # and a sweet,
where # marks the expected prosodic boundary, and a sixth, control sentence, with a
donut and a sweet expected to be produced with only a word boundary. The temporal and
spatial properties of the three constrictions leftward ([D N T] in donut) and three
constrictions rightward ([ND S T] in and a sweet) were measured to evaluate the scope of
effect of the boundary. The constrictions in the sequence [D N T # ND S T] will be
referred to as [C₃ C₂ C₁ # C₁ C₂ C₃] respectively, the numbers reflecting the closeness to
the prosodic boundary. Syntactic structure, phrase length and position in the sentence
were varied to create boundaries of different strength. The stimuli are given in Table 2.1,
with the relevant sequence underlined:
Table 2.1. Stimuli for Experiment 1. The boundary strength is expected to increase from sentence 1 (control sentence)
to sentence 6.
1. Mary’d like a donut and a sweet apple for breakfast, as always on Sundays.
2. Pete brought me a humongous mocha donut and a sweet apple from Spain.
3. While Ann was eating a donut and a sweet angora cat was waiting, he came.
4. The trainer who was eating a donut, and a sweet angora cat, have arrived.
5. On Sundays they like a donut, and a sweet apple biscuit on Saturdays.
6. The kids would like a donut?! And a sweet hostess wouldn’t even let them?
Note that the segmental content was controlled as much as possible in the vicinity of the
boundary. However, due to lexical limitations, certain segmental differences were
unavoidable before or after the target phrase, and this will of course have subtle
reflections in the articulation.
The first sentence is the no-boundary sentence, while in the remaining sentences
a boundary after donut, gradually increasing in strength, was anticipated. We anticipated
an increase in boundary strength based on syntactic structure, boundary position and
length of the phrases surrounding the boundary between donut and and (see e.g., Cooper
& Paccia-Cooper 1980, Grosjean, Grosjean & Lane 1979, Sanderman & Collier 1995 on
how these factors affect boundary strength, or Chapter 5 of this dissertation for an
overview). In sentence 1, of the known factors influencing boundary placement, none led
us to expect a boundary (the noun phrases before and after the boundary (a donut and a
sweet apple) are short and both are part of a larger phrase, namely the verb’s object). In
sentence 2 the noun phrase before the boundary is longer than that in sentence 1, and
since there is evidence that longer phrases lead to stronger boundaries (e.g., Sanderman &
Collier 1995), we expected a bigger boundary compared to sentence 1. In sentence 3 the
two phrases before and after the boundary, Ann was eating a donut and a sweet angora
cat was waiting, are both long constituents with a complex syntactic structure (compared
to sentences 1 and 2) of approximately equal length which is likely to induce a boundary
(Grosjean, Grosjean & Lane 1979). Due to the combination of these factors the strength
of the boundary in 3 was expected to be stronger than in sentences 1 and 2 (e.g., Grosjean
et al. 1979, Sanderman & Collier 1995). Sentence 4 is similar to sentence 3, except that
the boundary strength was expected to be stronger because of the reinforcement by
punctuation and the syntactically more complex pre-boundary phrase (noun phrase with a
relative clause). In sentence 5 the pre-boundary constituent is syntactically complex (a
clause containing a preposed adverbial phrase) and the expected boundary is dividing the
sentence into two approximately equally long parts, with a semantic contrast between the
two parts which we expected might further lead to a stronger boundary. We expected
sentence 6 to contain the strongest boundary since the boundary is between two complete
sentences, and in addition, although there is no evidence in the literature for this, we
expected that the surprise indicated by the punctuation would lead to an even stronger
boundary.
Note that due to the necessary constraints on syntactic structure and lexical items
a consonant constriction ([C₁]) is immediately preceding the boundary, and a vowel
constriction is immediately following the boundary. This means that post-boundary [C₁]
is further away from the boundary than pre-boundary [C₁].
These six sentences were collected as part of a larger experiment (described in
Chapter 3). For both experiments together, 28 sentences were constructed. Each sentence
was repeated six times (yielding a total of 168 sentences). The set of sentences was
pseudo-randomized, in blocks of 28 stimuli. To ensure that the control sentence did not
vary according to context, this sentence was also read 6 times at the beginning of the
experiment, and 6 times at the end of the experiment, in addition to being included within
the 168 sentences (so the whole experiment was 180 sentences in total). However, data
collected from the first subject (B) showed that the average duration of the control
sentence in each of the three positions (pre-, post- and within the experiment) did not
vary from the average of all the control sentence tokens. Therefore, in order to shorten
the otherwise long experiment duration for the two other subjects, the pre- and post-
sentences were removed from the following experiment runs. The twelve tokens of the
control sentences (pre- and post-experiment) recorded for the first subject (B) were not
included in the data analysis. In addition, four more of the 28 stimuli were excluded for
the 3rd subject (R) in order to shorten the experiment further (of course none of these
were sentences intended for use in this experiment). The eliminated sentences were
estimated to add the least boundary gradation.
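The token counts for the full design (as run for the first subject) can be checked with a short sketch. The exact randomization procedure is not specified here, so the sketch assumes that each block of 28 contains every sentence exactly once in shuffled order; the sentence IDs, the seed, and the function name are hypothetical.

```python
import random

def build_session(n_sentences=28, n_repetitions=6, seed=0):
    """Return a presentation order made of shuffled blocks, one per repetition."""
    rng = random.Random(seed)          # fixed seed so the order is reproducible
    order = []
    for _ in range(n_repetitions):
        block = list(range(1, n_sentences + 1))   # hypothetical sentence IDs 1..28
        rng.shuffle(block)             # assumption: every sentence once per block
        order.extend(block)
    return order

session = build_session()
extra_control_tokens = 6 + 6           # control sentence read before and after
total_tokens = len(session) + extra_control_tokens
```

With 28 sentences and six repetitions the session holds 168 tokens, and the twelve additional control readings bring the total to 180.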
Three subjects participated (B, E, and R). All are native speakers of American
English, with no known language deficits. They were paid for their participation and were
naïve as to the purpose of the experiment. They were instructed to read the sentences as if
they were reading a story to someone.
5.2. Data Collection
Data were collected using the Carstens Articulograph (AG200). Three sensors were
placed on the subject: one on the tongue tip, tracking articulatory movement, and two
reference sensors on the bridge of the nose and maxilla. The articulatory data were
sampled at 200 Hz and acoustic data at 16 kHz. The data were corrected for head
movement (using the nose and maxillary reference sensor tracking) and rotated to the
occlusal plane. The tongue tip y (vertical) signal was differentiated, and signals were
smoothed before and after differentiation with a 9th order Butterworth filter of cutoff
frequency 15 Hz.
5.3. Measurements
Data were analyzed using in-house software within the MATLAB computing
environment. For each constriction, five time points were identified from the velocity
signal (see Figure 2.2): the onset of the closing movement, peak-velocity time point for
the closing movement and for the opening movement, the extremum of the closing
movement, and the end of the opening movement. These time points were identified from
the tongue-tip y-velocity zero-crossings and peak velocity time points. The extremum for
the closing movement is also defined as the beginning of the opening movement. The end
of the opening movement is also defined as the onset for the closing movement for the
following consonant constriction. This leads to a total of 25 data points for the series of
six consonants. The onset of the first consonant could not be reliably identified for two
subjects (E and R) and was therefore not included in the analysis. The total number of
data points for each utterance was 24. For these points, the spatial position of the tongue
tip vertical movement was recorded as well.
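The landmark identification and the resulting counts can be sketched as follows. This is not the in-house MATLAB software: it is a minimal Python illustration that assumes landmarks are taken at sign changes and at absolute-velocity maxima of a sampled velocity signal, and the synthetic sinusoid stands in for real tongue-tip data. Function names are hypothetical.

```python
import math

def find_landmarks(velocity):
    """Return (zero_crossings, peaks) as sample indices for a velocity signal.

    A zero crossing is the first sample after a sign change; a peak is the
    sample with the largest absolute velocity between two successive crossings.
    """
    zeros = [i for i in range(1, len(velocity))
             if velocity[i - 1] * velocity[i] < 0]
    peaks = [max(range(a, b), key=lambda i: abs(velocity[i]))
             for a, b in zip(zeros, zeros[1:])]
    return zeros, peaks

def landmark_count(n_constrictions):
    """Count landmarks when each opening end doubles as the next closing onset."""
    # 2n + 1 zero crossings (onsets, extrema, final end) plus 2n velocity peaks
    return (2 * n_constrictions + 1) + 2 * n_constrictions

# Synthetic velocity (not real data): a sinusoid with a 20-sample period.
velocity = [math.sin(2 * math.pi * (i + 0.25) / 20) for i in range(60)]
zeros, peaks = find_landmarks(velocity)
```

For six constrictions, `landmark_count(6)` gives the 25 points mentioned above; dropping the onset of the first consonant leaves 24.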
[Figure 2.2: panels showing the audio signal, the tongue tip vertical position (TTy), and the TTy velocity against time in ms, with the landmarks Zon, Pon, and Von marked.]
Figure 2.2. Data measurements for pre-boundary C₂ ([N]), timescale is in ms. The top window shows the complete
audio signal for the stimulus. The second window shows the audio signal for the selected sequence. The third window
shows the vertical movement of the tongue tip. On it are marked the data points of interest here (derived from the
vertical velocity signal tracking, shown in the bottom window). The zero-crossings (marked as Zon) identify the
following data points, in this order: closing movement onset, closing movement extremum/onset of opening
movement, end of opening movement/onset of closing movement for the next constriction. The markings Pon and Von
identify the peak-velocity for the closing and opening movements respectively. The bottom window shows the vertical
velocity signal tracking, from which the markings are derived.
From these data, we calculate the following dependent spatiotemporal measures
(see also Figure 2.3 for a schematized representation):
• closing movement time-to-peak-velocity: the time from closing movement onset to
peak velocity
• closing movement duration: the time from closing movement onset to extremum
• opening movement time-to-peak-velocity: the time from extremum (opening
movement onset) to the peak velocity of the opening movement
• opening movement duration: the time from extremum (opening movement onset) to
the end of the opening movement
• displacement of closing movement: the distance between the closing movement
onset position and the extremum position
• displacement of the opening movement: the distance between the extremum position
(opening movement onset) and the end of the opening movement.
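The six measures listed above follow directly from the five landmarks of a constriction. The following Python sketch shows one way to compute them; the function name, the dictionary keys, and the toy values are hypothetical and are not taken from the analysis software.

```python
def movement_measures(times, positions, lm):
    """Six spatiotemporal measures for one constriction.

    times, positions: equally long sample sequences (e.g. ms and mm).
    lm: landmark sample indices under the hypothetical keys
        closing_onset, closing_peak, extremum, opening_peak, opening_end.
    """
    def t(key):
        return times[lm[key]]

    def p(key):
        return positions[lm[key]]

    return {
        "closing_ttpv": t("closing_peak") - t("closing_onset"),
        "closing_duration": t("extremum") - t("closing_onset"),
        "opening_ttpv": t("opening_peak") - t("extremum"),
        "opening_duration": t("opening_end") - t("extremum"),
        "closing_displacement": abs(p("extremum") - p("closing_onset")),
        "opening_displacement": abs(p("opening_end") - p("extremum")),
    }

# Invented toy values, one sample every 10 ms.
toy_times = list(range(0, 100, 10))
toy_positions = [0, 1, 3, 6, 8, 9, 7, 4, 2, 1]
toy_landmarks = {
    "closing_onset": 0,
    "closing_peak": 2,
    "extremum": 5,
    "opening_peak": 7,
    "opening_end": 9,
}
toy_measures = movement_measures(toy_times, toy_positions, toy_landmarks)
```

Displacement is taken here as an absolute difference so that closing and opening movements both yield positive magnitudes; that choice is an assumption of the sketch.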
These variables inform us about the closing and opening movement duration and
magnitude, and time-to-peak-velocity is a good indicator of the gestural stiffness
parameter within the Task Dynamics’ mass-spring gestural model (Saltzman & Munhall
1989, Byrd & Saltzman 1998). The gestural stiffness parameter (one of the gestural
parameters in the Task Dynamics model) indicates the inherent duration of a gesture; a
gesture with lower stiffness will have longer duration and longer time-to-peak velocity
than a gesture with higher stiffness (Byrd, Kaun, Narayanan & Saltzman 2000). Stiffness
is a control parameter that has been used to describe effects of prosodic boundaries
(Edwards, Beckman & Fletcher 1991, Byrd et al. 2000). Within the π-gesture model
(Byrd & Saltzman 2003), gestural stiffness is affected as a consequence of the clock
slowing caused by the π-gesture as the gestural activation trajectory gates in this
parameter value over time.
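The relation between stiffness and time-to-peak-velocity can be made concrete with a small Python sketch. It assumes a single critically damped mass-spring system with constant stiffness, released at rest away from its target; this is a simplification for illustration, not the Task Dynamics implementation, and it ignores the time-varying stiffness modulation of the π-gesture model. Under these assumptions the speed is proportional to t·exp(-ωt) with ω = sqrt(k/m), so the velocity peak falls at t = 1/ω = sqrt(m/k). All names and parameter values below are hypothetical.

```python
import math

def analytic_ttpv(stiffness, mass=1.0):
    """Time of peak speed for a critically damped mass-spring released at rest."""
    return math.sqrt(mass / stiffness)

def simulated_ttpv(stiffness, mass=1.0, x0=1.0, target=0.0, dt=1e-4):
    """Integrate m*x'' + b*x' + k*(x - target) = 0 with b = 2*sqrt(m*k).

    Uses semi-implicit Euler steps and returns the time of the largest speed.
    """
    damping = 2.0 * math.sqrt(mass * stiffness)
    x, v = x0, 0.0
    best_speed, best_time = 0.0, 0.0
    steps = int(10.0 * math.sqrt(mass / stiffness) / dt)
    for step in range(1, steps + 1):
        a = (-damping * v - stiffness * (x - target)) / mass
        v += a * dt
        x += v * dt
        if abs(v) > best_speed:
            best_speed, best_time = abs(v), step * dt
    return best_time

stiff = simulated_ttpv(4.0)   # higher stiffness
slack = simulated_ttpv(1.0)   # lower stiffness
```

In this sketch, lowering the stiffness from 4 to 1 doubles the analytic time-to-peak-velocity (from 0.5 to 1.0 in the arbitrary time units of the simulation), in line with the statement that a gesture with lower stiffness has a longer time-to-peak-velocity.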
[Figure 2.3 schematic: vertical tongue tip position plotted against time, with the velocity zero-crossings (closing movement onset; closing movement extremum/opening movement onset; opening movement end/closing movement onset) and the peak velocities marked, and with closing movement duration, opening movement duration, time-to-peak-velocity and displacement indicated.]
Figure 2.3. Schematized representation of tongue-tip tracking and derived measurements for one constriction.
For the pre-boundary C3 and C1 and the post-boundary C2 ([D], [T] and [S]), there was occasionally a plateau for the consonant constriction (i.e., more than one zero-crossing for the tongue-tip extremum). In such cases, the first zero-crossing was measured as the extremum of the constriction-forming movement. In occasional cases of more than one velocity peak, the highest peak velocity was taken as the relevant data point. At the boundary, between the pre-boundary C1 and post-boundary C1, there were cases of a dip, with more than one zero-crossing possibly being identified as the pre-boundary C1 opening movement end/post-boundary C1 closing movement onset. In these cases, the zero-crossing most closely preceding the maximum peak velocity for post-boundary C1 was logged.
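The three tie-breaking rules just described can be written down compactly. The Python sketch below is only an illustration under assumed data structures (lists of candidate sample indices and a velocity list); the helper names are hypothetical.

```python
def plateau_extremum(candidate_crossings):
    """Plateau at the constriction: take the first zero-crossing."""
    return min(candidate_crossings)

def highest_velocity_peak(vel, candidate_peaks):
    """Several velocity peaks: take the one with the largest speed."""
    return max(candidate_peaks, key=lambda i: abs(vel[i]))

def crossing_before_peak(candidate_crossings, peak_index):
    """Dip at the boundary: take the zero-crossing most closely
    preceding the maximum velocity peak of the following consonant."""
    return max(i for i in candidate_crossings if i < peak_index)

# Invented toy values (sample indices and velocities).
toy_vel = [0.0, 1.0, -3.0, 2.0]
first = plateau_extremum([14, 10, 18])
tallest = highest_velocity_peak(toy_vel, [1, 2, 3])
nearest = crossing_before_peak([10, 14, 25], peak_index=20)
```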
The production of all the sentences in the experiment was evaluated using ToBI guidelines (Beckman & Elam 1997). It was also checked that subjects produced the stimuli consistently. A few sentence tokens were taken out because they were produced markedly differently (in terms of boundary strength) from the rest of the tokens of the same sentence by the same subject. For subject E, three tokens were taken out (one token each of sentences 1, 2 and 3); for subject B, five tokens were excluded from further analysis (two tokens of sentence 2, two tokens of sentence 3 and one token of sentence 4); and for subject R, two tokens were excluded (one token of sentence 1 and one token of sentence 3). Additionally, one token each from subject R (sentence 1) and subject E (sentence 5) was excluded because of data collection error. Finally, subject E produced [s] with the constriction far away from the tongue tip and, as a result, tracking of the [s] constriction for that subject was not possible.
The sentences were evaluated as to boundary strength using ToBI guidelines (Beckman & Elam 1997). The categories Intonation Phrase, Intermediate Phrase, and no boundary (i.e., word boundary) were marked. Intermediate Phrase (ip) boundaries were identified by a phrase accent and final lengthening, and Intonation Phrase (IP) boundaries were identified by a phrase accent, final lengthening, and a boundary tone. The predicted increase in boundary strength was realized, i.e., there was an increase in boundary strength going from no boundary in sentence 1 to a strong boundary (IP) in sentence 6. Note that while ToBI does not distinguish more than the three transcribed categories (except for clitic boundaries, which did not occur in this experiment), based on syntactic structure, phrase length and boundary position in the sentence we a priori expected an increase in boundary strength from sentence 1 to sentence 6 (as discussed in section 5.1). One subject (B) produced an ip boundary in sentence 1 as well, so for this subject there was no sentence without any boundary. (Subject B also produced her other sentences, not included in this experiment, with stronger boundaries than the other two subjects.) Table 2.2 gives the boundary production of the six sentences for each of the subjects.[4] Inter-subject variability is expected, as previous studies have shown that while there is a default prosodic realization of sentences (Jun 2003a), subjects differ in their production of boundaries, both in number and strength of prosodic domains (e.g. Fougeron & Keating 1997, Byrd & Saltzman 1998, Byrd, Lee, Riggs & Adams 2005). This is also the reason that the statistical analysis is performed separately for the individual subjects.
Table 2.2. Realization of prosodic boundaries.

             Subject B    Subject E      Subject R
Sentence 1   ip           no boundary    no boundary
Sentence 2   ip           ip             ip
Sentence 3   IP           ip             ip
Sentence 4   IP           ip             IP
Sentence 5   IP           IP             IP
Sentence 6   IP           IP             IP
5.4. Statistical Analysis
A within-subject ANOVA was conducted testing for effects of the boundary strength of each sentence on duration, time-to-peak-velocity, and displacement for each consonant, where sentence number (1-6) indicated increasing boundary strength both in terms of a priori syntactic structure and ToBI analysis. When there was a significant main effect, Fisher’s PLSD post-hoc tests were conducted to determine the behavior of the individual sentences. The criterion for statistical significance was set at p < .05. All and only statistically significant results are reported.
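As an illustration of the procedure, the Python sketch below computes a one-way ANOVA F ratio over sentence groups and the t statistic that underlies a Fisher-style least significant difference comparison, using the pooled within-group mean square. It assumes that each subject's tokens are treated as independent observations grouped by sentence; it is not the statistics package used for the analysis, it does not compute p-values (these would come from F and t tables or a statistics library), and the data are invented.

```python
import math

def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_within) for lists of observations."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_within = ss_within / df_within
    f_value = (ss_between / df_between) / ms_within
    return f_value, df_between, df_within, ms_within

def lsd_t(group_a, group_b, ms_within):
    """t statistic for one pairwise comparison, using the pooled error term."""
    diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    se = math.sqrt(ms_within * (1.0 / len(group_a) + 1.0 / len(group_b)))
    return diff / se

# Invented toy durations for three "sentences" (not data from this study).
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_value, df_b, df_w, ms_w = one_way_anova(toy_groups)
t_3_vs_1 = lsd_t(toy_groups[2], toy_groups[0], ms_w)
```

Pairwise comparisons are only examined when the omnibus F test is significant, which is the protection that the post-hoc procedure relies on.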
[4] We have also conducted the analysis of the scope of effect for these sentences grouping them into three groups of boundary strength, and obtained very similar results (see Krivokapić 2006).
6. Results
6.1. Pre-boundary Temporal Results ([C3 C2 C1 #)
The pre-boundary constrictions are schematically represented in Figure 2.4.
Figure 2.4. Schematic representation of the pre-boundary constrictions.
The means and significant results for pre-boundary temporal effects are given in Tables
2.6 to 2.11 at the end of this section. A summary of the effects of the boundary on each
constriction is given after the results for that constriction are reported.
Closest to the boundary, for the C1 opening movement, all subjects show an effect of boundary strength, and all subjects distinguish three levels of boundary strength. Subject B distinguishes three levels of duration, with sentences 1-3 showing shorter opening movements than sentences 4 and 5, which in turn have shorter opening movements than sentence 6 (1-3 < 4, 5 < 6). Subject E also distinguishes three levels of lengthening, with sentences 1-5 having a shorter opening movement than sentence 6, and sentence 3 having less lengthening than sentence 5 (3 < 5 < 6).[5] Subject R also distinguishes three levels of lengthening, with sentences 1-5 having a shorter opening movement than sentence 6, and sentences 1-3 also showing less lengthening than sentence 5 (1-3 < 5 < 6). Times-to-peak-velocity for the C1 opening movement show results in the same direction, although fewer levels are distinguished: all subjects distinguish two levels of duration. For subject B, sentences 1-4 and 6 have shorter time-to-peak-velocity than sentence 5, an effect not quite predicted, as sentence 6 is the sentence with the strongest boundary. For subjects E and R, sentences 1-5 have shorter time-to-peak-velocity than sentence 6.

[5] When establishing the number of levels of boundary strength, we count the maximum distinctions made. For example, subject E distinguishes for the opening movement sentences 1-5 from sentence 6, which gives two levels of boundary strength, but in addition, sentence 3 is distinguished from sentence 5. So there are three levels of boundary strength distinguished, namely sentence 3 is distinguished from sentences 5 and 6, and in turn sentence 5 is distinguished from sentence 6, so the three levels are 3<5<6.

Further away from the boundary, for the C1 closing movement, subject B distinguishes two levels of duration: sentence 1 has shorter closing movement duration than sentences 3-6. Subject E also distinguishes two levels of duration: sentences 1-5 have shorter duration for the closing movement than sentence 6. Subject R distinguishes three levels of duration, with sentences 1-5 having a shorter closing movement than sentence 6, and sentence 1 also having a shorter closing movement than sentence 5 (1 < 5 < 6). Time-to-peak-velocity shows results in a similar direction for two subjects (R and E), although the effect of the boundary is weaker than for the closing movement duration. For subject R, sentences 1-5 have shorter closing movement time-to-peak-velocity than sentence 6, and sentence 1 in addition has a shorter time-to-peak-velocity than sentences 4 and 5, so there are three levels of duration of time-to-peak-velocity (1 < 4, 5 < 6), similar to that subject’s closing movement duration. Subject E distinguishes two levels of boundary strength in terms of time-to-peak-velocity: sentence 1 has a shorter time-to-peak-velocity than sentences 4 and 6, sentence 2 has a shorter time-to-peak-velocity than sentences 4-6, and sentence 3 has a shorter time-to-peak-velocity than sentence 6. There was no effect for subject B. The summary of the results for C1, showing the different degrees of boundary strength distinguished, is given in Table 2.3:
Table 2.3. Summary of temporal boundary effects on pre-boundary C1, showing the relationship between sentences and the number of boundary strengths (levels) distinguished for duration and time-to-peak velocity. The consonant string is [C3 C2 C1 #.

Measure                 Subject  C1 movement  Sentence comparisons         Levels
Duration                B        closing      1 < 3,4,5,6                  2
Duration                B        opening      1,2,3 < 4,5 < 6              3
Duration                E        closing      1,2,3,4,5 < 6                2
Duration                E        opening      1,2,3,4,5 < 6; 3 < 5         3 (3 < 5 < 6)
Duration                R        closing      1,2,3,4,5 < 6; 1 < 5         3 (1 < 5 < 6)
Duration                R        opening      1,2,3,4,5 < 6; 1,2,3 < 5     3 (1,2,3 < 5 < 6)
Time-to-peak velocity   B        closing      n.s.                         -
Time-to-peak velocity   B        opening      1,2,3,4,6 < 5                2
Time-to-peak velocity   E        closing      1 < 4,6; 2 < 4,5,6; 3 < 6    2
Time-to-peak velocity   E        opening      1,2,3,4,5 < 6                2
Time-to-peak velocity   R        closing      1,2,3,4,5 < 6; 1 < 4,5       3 (1 < 4,5 < 6)
Time-to-peak velocity   R        opening      1,2,3,4,5 < 6                2
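The counting rule for levels (the maximum number of distinctions made) can be read as finding the longest chain of sentences in which each member is significantly shorter than the next. The Python sketch below implements that reading; it is an interpretation offered for illustration, the function name is hypothetical, and it assumes the significant comparisons contain no cycles.

```python
def count_levels(pairs):
    """Number of levels = length of the longest chain a < b < c ...

    pairs: (shorter, longer) sentence numbers for each significant
    pairwise difference. Returns 1 when no pair is given, i.e. when
    no distinction is made (an assumption of this sketch).
    """
    pairs = list(pairs)
    longer_than = {}
    for lo, hi in pairs:
        longer_than.setdefault(lo, set()).add(hi)

    memo = {}

    def chain(sentence):
        # Longest chain starting at this sentence, counted in sentences.
        if sentence not in memo:
            nxt = longer_than.get(sentence, ())
            memo[sentence] = 1 + max((chain(n) for n in nxt), default=0)
        return memo[sentence]

    sentences = {s for pair in pairs for s in pair}
    return max((chain(s) for s in sentences), default=1)

# Subject E, C1 opening duration: 1,2,3,4,5 < 6 and 3 < 5.
e_opening = [(1, 6), (2, 6), (3, 6), (4, 6), (5, 6), (3, 5)]
# Subject E, C1 closing time-to-peak velocity: 1 < 4,6; 2 < 4,5,6; 3 < 6.
e_closing_ttpv = [(1, 4), (1, 6), (2, 4), (2, 5), (2, 6), (3, 6)]
levels_opening = count_levels(e_opening)
levels_closing_ttpv = count_levels(e_closing_ttpv)
```

For the first pattern the longest chain is 3 < 5 < 6, giving three levels; for the second no sentence is both longer than one sentence and shorter than another, so only two levels result, matching the counts in Table 2.3.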
For the C2 opening movement, further before the boundary, there is no effect for either duration or time-to-peak-velocity.
For C2 closing movement duration, one subject (R) shows an effect, distinguishing two levels, such that sentence 3 has a shorter closing movement than sentence 6, and sentences 1, 2 and 5 have shorter closing movement duration than sentences 4 and 6. For time-to-peak-velocity of the C2 closing movement, two subjects show an effect (B and R). Subject B, who did not show an effect for C2 closing movement duration, distinguishes two levels of time-to-peak-velocity, namely, sentences 1, 2, 4 and 5 have shorter time-to-peak-velocity than sentence 3, an effect not in the predicted direction, as sentence 3 has a weaker boundary than sentences 4 and 5. It is unclear whether this effect, which is discontinuous and not in the predicted direction, is a boundary effect. Subject R distinguishes three levels of duration for this variable, such that sentences 1-5 have shorter time-to-peak-velocity than sentence 6, and sentence 2 has shorter time-to-peak-velocity than sentences 4 and 6 (2 < 4 < 6). The summary of the results for C2 is given in Table 2.4:
Table 2.4. Summary of temporal boundary effects on pre-boundary C2, showing the relationship between sentences and the number of boundary strengths (levels) distinguished for duration and time-to-peak velocity. The consonant string is [C3 C2 C1 #.

Measure                 Subject  C2 movement  Sentence comparisons         Levels
Duration                B        closing      n.s.                         -
Duration                B        opening      n.s.                         -
Duration                E        closing      n.s.                         -
Duration                E        opening      n.s.                         -
Duration                R        closing      3 < 6; 1,2,5 < 4,6           2
Duration                R        opening      n.s.                         -
Time-to-peak velocity   B        closing      1,2,4,5 < 3                  2
Time-to-peak velocity   B        opening      n.s.                         -
Time-to-peak velocity   E        closing      n.s.                         -
Time-to-peak velocity   E        opening      n.s.                         -
Time-to-peak velocity   R        closing      1,2,3,4,5 < 6; 2 < 4,6       3 (2 < 4 < 6)
Time-to-peak velocity   R        opening      n.s.                         -
For the pre-boundary consonant furthest away from the boundary, C3, there was no effect on the opening movement. The closing movement could not reliably be identified for two subjects (E and R). Subject B distinguishes two levels of duration: sentence 3 has shorter duration of the closing movement than sentences 1, 2, 5 and 6, and sentence 4 has shorter closing movement duration than sentences 1 and 6, again an unpredicted effect. The summary of the results for C3 is given in Table 2.5:
Table 2.5. Summary of temporal boundary effects on pre-boundary C3, showing the relationship between sentences and the number of boundary strengths (levels) distinguished for duration and time-to-peak velocity. The consonant string is [C3 C2 C1 #.

Measure                 Subject  C3 movement  Sentence comparisons         Levels
Duration                B        closing      3 < 1,2,5,6; 4 < 1,6         2
Duration                B        opening      n.s.                         -
Duration                E        closing      NA                           -
Duration                E        opening      n.s.                         -
Duration                R        closing      NA                           -
Duration                R        opening      n.s.                         -
Time-to-peak velocity   B        closing      n.s.                         -
Time-to-peak velocity   B        opening      n.s.                         -
Time-to-peak velocity   E        closing      NA                           -
Time-to-peak velocity   E        opening      n.s.                         -
Time-to-peak velocity   R        closing      NA                           -
Time-to-peak velocity   R        opening      n.s.                         -
Note that the effect for C3 closing movement duration is not likely to be a boundary effect, in part because the effect would be discontinuous (there is no temporal or spatial effect on the C3 opening movement), contrary to the predictions based on prior findings. Furthermore, the effect is on sentences 3 and 4, and while all six stimuli have the string "donut and a sweet", sentences 3 and 4 are preceded by "eating a", and the other four sentences are preceded by "like a" (sentences 1, 5 and 6) and "mocha" (sentence 2). While it is not clear how this particular context would affect the results in the observed direction, it seems likely that these are not boundary effects but contextual effects having to do with how this particular speaker articulates the velar [ŋ] versus [k]. Another possible explanation is that the distance between C3 ([D] in "donut") and the preceding prominent syllable causes in some way the difference in duration: in sentences 3 and 4 the prominent syllable (underlined) is two syllables away from [D] ("eating a donut") while in sentences 1, 2, 5 and 6 the prominent syllable is one syllable away from [D] ("mocha donut"; "like a donut").
Tables 2.6-2.11 summarize the results of the analysis for the duration and time-to-
peak-velocity of the closing and opening movement for the pre-boundary constrictions
for each subject.
Table 2.6. Results for pre-boundary opening and closing movement duration (in ms) for subject B (ANOVA, means and standard deviations, summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of individual sentences). The consonant string is: [C3 C2 C1 #.

C3 closing
  ANOVA: F(5,14)=7.067, p=.0017
  Sentence means (SD): Sent. 1, n=3, 88 (3); Sent. 2, n=2, 85 (7); Sent. 3, n=4, 77 (3); Sent. 4, n=5, 82 (5); Sent. 5, n=1, 89; Sent. 6, n=5, 91 (2)
  Summary: 2 levels: 3<1,2,5,6; 4<1,6
  Differences between sentence means and results of Fisher’s PLSD post-hoc tests: 1-3=11, p=.0021; 1-4=6, p=.0384; 2-3=7, p=.0426; 5-3=12, p=.0104; 6-3=13, p=.0001; 6-4=9, p=.0022

C3 opening
  ANOVA: n.s.
  Means (SD): Sent. 1, n=6, 127 (10); Sent. 2, n=4, 140 (28); Sent. 3, n=4, 128 (15); Sent. 4, n=5, 122 (12); Sent. 5, n=4, 144 (14); Sent. 6, n=6, 127 (9)

C2 closing
  ANOVA: n.s.
  Means (SD): Sent. 1, n=6, 77 (5); Sent. 2, n=4, 81 (3); Sent. 3, n=4, 87 (5); Sent. 4, n=5, 80 (3); Sent. 5, n=4, 81 (2); Sent. 6, n=6, 82 (9)

C2 opening
  ANOVA: n.s.
  Means (SD): Sent. 1, n=6, 71 (8); Sent. 2, n=4, 70 (6); Sent. 3, n=4, 76 (8); Sent. 4, n=5, 79 (2); Sent. 5, n=4, 74 (6); Sent. 6, n=6, 79 (6)

C1 closing
  ANOVA: F(5,23)=3.570, p=.0155
  Means (SD): Sent. 1, n=6, 67 (7); Sent. 2, n=4, 75 (20); Sent. 3, n=4, 86 (9); Sent. 4, n=5, 87 (9); Sent. 5, n=4, 87 (9); Sent. 6, n=6, 81 (4)
  Summary: 2 levels: 1<3,4,5,6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 3-1=19.61, p=.0061; 4-1=20.41, p=.0028; 5-1=20.83, p=.0039; 6-1=14.17, p=.0228

C1 opening
  ANOVA: F(5,23)=50.788, p<.0001
  Means (SD): Sent. 1, n=6, 63 (21); Sent. 2, n=4, 79 (19); Sent. 3, n=4, 108 (26); Sent. 4, n=5, 223 (89); Sent. 5, n=4, 297 (69); Sent. 6, n=6, 620 (111)
  Summary: 3 levels: 1,2,3<4,5<6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 4-1=160, p=.0010; 4-2=144, p<.0053; 4-3=115, p=.0218; 5-1=234, p<.0001; 5-2=219, p=.0002; 5-3=190, p=.0008; 6-1=557, p<.0001; 6-2=541, p<.0001; 6-3=512, p<.0001; 6-4=397, p<.0001; 6-5=322, p<.0001
Table 2.7. Results for pre-boundary time-to-peak-velocity (in ms) for subject B (ANOVA, means and standard deviations, summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of individual sentences). The consonant string is: [C3 C2 C1 #.

C3 closing
  No values reported.

C3 opening
  ANOVA: n.s.
  Means (SD): Sent. 1, n=6, 87 (9); Sent. 2, n=4, 104 (30); Sent. 3, n=4, 86 (14); Sent. 4, n=5, 86 (6); Sent. 5, n=4, 106 (16); Sent. 6, n=6, 90 (7)

C2 closing
  ANOVA: F(5,23)=2.808, p<.0402
  Means (SD): Sent. 1, n=6, 42 (4); Sent. 2, n=4, 42 (3); Sent. 3, n=4, 52 (6); Sent. 4, n=5, 43 (3); Sent. 5, n=4, 44 (5); Sent. 6, n=6, 47 (7)
  Summary: 2 levels: 1,2,4,5<3
  Differences between means and results of Fisher’s PLSD post-hoc tests: 3-1=10, p=.0050; 3-2=10, p=.0094; 3-4=9, p=.0095; 3-5=9, p=.0209

C2 opening
  ANOVA: n.s.
  Means (SD): Sent. 1, n=6, 40 (9); Sent. 2, n=4, 37 (6); Sent. 3, n=4, 39 (6); Sent. 4, n=5, 43 (7); Sent. 5, n=4, 42 (6); Sent. 6, n=6, 42 (4)

C1 closing
  ANOVA: n.s.
  Means (SD): Sent. 1, n=6, 32 (3); Sent. 2, n=4, 42 (18); Sent. 3, n=4, 40 (4); Sent. 4, n=5, 40 (5); Sent. 5, n=4, 39 (5); Sent. 6, n=6, 37 (3)

C1 opening
  ANOVA: F(5,23)=39.198, p<.0001
  Means (SD): Sent. 1, n=6, 37 (15); Sent. 2, n=4, 41 (13); Sent. 3, n=4, 55 (17); Sent. 4, n=5, 66 (19); Sent. 5, n=4, 237 (51); Sent. 6, n=6, 56 (23)
  Summary: 2 levels: 1,2,3,4,6<5
  Differences between means and results of Fisher’s PLSD post-hoc tests: 5-1=200, p<.0001; 5-2=196, p<.0001; 5-3=182, p<.0001; 5-4=171, p<.0001; 5-6=182, p<.0001
Table 2.8. Results for pre-boundary opening and closing movement duration (in ms) for subject E (ANOVA, means and standard deviations, summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of individual sentences). The consonant string is: [C3 C2 C1 #.

C3 closing
  No values reported.

C3 opening
  ANOVA: n.s.
  Means (SD): Sent. 1, n=5, 155 (9); Sent. 2, n=5, 155 (10); Sent. 3, n=6, 161 (5); Sent. 4, n=6, 155 (10); Sent. 5, n=5, 153 (10); Sent. 6, n=5, 154 (15)

C2 closing
  ANOVA: n.s.
  Means (SD): Sent. 1, n=5, 99 (7); Sent. 2, n=5, 102 (6); Sent. 3, n=6, 101 (5); Sent. 4, n=6, 104 (10); Sent. 5, n=5, 115 (19); Sent. 6, n=5, 106 (2)

C2 opening
  ANOVA: n.s.
  Means (SD): Sent. 1, n=5, 87 (3); Sent. 2, n=5, 83 (11); Sent. 3, n=6, 86 (5); Sent. 4, n=6, 83 (12); Sent. 5, n=5, 92 (8); Sent. 6, n=5, 81 (7)

C1 closing
  ANOVA: F(5,22)=8.044, p=.0002
  Means (SD): Sent. 1, n=3, 97 (17); Sent. 2, n=3, 93 (20); Sent. 3, n=6, 107 (24); Sent. 4, n=6, 123 (20); Sent. 5, n=5, 122 (12); Sent. 6, n=5, 185 (42)
  Summary: 2 levels: 1,2,3,4,5<6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 6-1=87.91, p<.0001; 6-2=91.33, p<.0001; 6-3=78.08, p<.0001; 6-4=61.34, p=.0005; 6-5=62.77, p=.0007

C1 opening
  ANOVA: F(5,21)=6.342, p=.0010
  Means (SD): Sent. 1, n=3, 72 (16); Sent. 2, n=3, 47 (25); Sent. 3, n=6, 45 (24); Sent. 4, n=6, 134 (109); Sent. 5, n=4, 177 (117); Sent. 6, n=5, 303 (111)
  Summary: 3 levels (3<5<6): 1,2,3,4,5<6; 3<5
  Differences between means and results of Fisher’s PLSD post-hoc tests: 5-3=132, p=.0263; 6-1=232, p=.0013; 6-2=257, p=.0005; 6-3=258, p<.0001; 6-4=169, p=.0037; 6-5=126, p=.0395
Table 2.9. Results for pre-boundary time-to-peak-velocity (in ms) for subject E (ANOVA, means and standard deviations, summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of individual sentences). The consonant string is: [C3 C2 C1 #.

C3 closing
  No values reported.

C3 opening
  ANOVA: n.s.
  Means (SD): Sent. 1, n=5, 106 (7); Sent. 2, n=5, 110 (10); Sent. 3, n=6, 110 (6); Sent. 4, n=6, 107 (9); Sent. 5, n=5, 106 (2); Sent. 6, n=5, 106 (10)

C2 closing
  ANOVA: n.s.
  Means (SD): Sent. 1, n=5, 54 (8); Sent. 2, n=5, 59 (10); Sent. 3, n=6, 60 (5); Sent. 4, n=6, 60 (12); Sent. 5, 72 (19) [n not given]; Sent. 6, n=5, 69 (5)

C2 opening
  ANOVA: n.s.
  Means (SD): Sent. 1, n=5, 46 (4); Sent. 2, n=5, 42 (8); Sent. 3, n=6, 42 (6); Sent. 4, n=6, 42 (9); Sent. 5, n=5, 53 (10); Sent. 6, n=5, 42 (6)

C1 closing
  ANOVA: F(5,25)=3.661, p=.0127
  Means (SD): Sent. 1, n=5, 44 (11); Sent. 2, n=5, 40 (12); Sent. 3, n=6, 57 (23); Sent. 4, n=6, 75 (16); Sent. 5, n=5, 70 (6); Sent. 6, n=4, 86 (43)
  Summary: 2 levels: 1<4,6; 2<4,5,6; 3<6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 4-1=31, p=.0209; 4-2=35, p=.0100; 5-2=30, p=.0313; 6-1=42, p=.0056; 6-2=46, p=.0027; 6-3=30, p=.0364

C1 opening
  ANOVA: F(5,22)=7.710, p=.0003
  Means (SD): Sent. 1, n=3, 43 (19); Sent. 2, n=3, 28 (19); Sent. 3, n=6, 27 (16); Sent. 4, n=6, 62 (35); Sent. 5, n=5, 68 (44); Sent. 6, n=5, 214 (115)
  Summary: 2 levels: 1,2,3,4,5<6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 6-1=171, p=.0004; 6-2=186, p=.0002; 6-3=188, p<.0001; 6-4=152, p=.0002; 6-5=147, p=.0004
Table 2.10. Results for pre-boundary opening and closing movement duration (in ms) for subject R (ANOVA, means and standard deviations, summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of individual sentences). The consonant string is: [C3 C2 C1 #.

C3 closing
  No values reported.

C3 opening
  ANOVA: n.s.
  Means (SD): Sent. 1, n=4, 117 (13); Sent. 2, n=6, 117 (8); Sent. 3, n=6, 113 (24); Sent. 4, n=5, 137 (5); Sent. 5, n=6, 115 (7); Sent. 6, n=6, 118 (22)

C2 closing
  ANOVA: F(5,27)=5.210, p=.0018
  Means (SD): Sent. 1, n=4, 76 (6); Sent. 2, n=6, 79 (7); Sent. 3, n=6, 83 (3); Sent. 4, n=5, 88 (7); Sent. 5, n=6, 80 (5); Sent. 6, n=6, 90 (3)
  Summary: 3<6; 1,2,5<4,6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 4-2=8.64, p=.0122; 6-2=10.65, p=.0018; 6-3=7.36, p=.0236; 4-5=7.97, p=.0198; 4-1=11.75, p=.0027; 6-5=9.97, p=.0031; 6-1=13.76, p=.0004

C2 opening
  ANOVA: n.s.
  Means (SD): Sent. 1, n=4, 87 (6); Sent. 2, n=6, 93 (12); Sent. 3, n=6, 91 (7); Sent. 4, n=5, 92 (6); Sent. 5, n=6, 89 (6); Sent. 6, n=6, 102 (6)

C1 closing
  ANOVA: F(5,27)=7.871, p=.0001
  Means (SD): Sent. 1, n=4, 56 (16); Sent. 2, n=6, 65 (26); Sent. 3, n=6, 71 (7); Sent. 4, n=5, 81 (12); Sent. 5, n=6, 84 (14); Sent. 6, n=6, 122 (29)
  Summary: 3 levels (1<5<6): 1,2,3,4,5<6; 1<5
  Differences between means and results of Fisher’s PLSD post-hoc tests: 5-1=27.95, p=.0347; 6-1=66.17, p<.0001; 6-2=57.29, p<.0001; 6-3=51.47, p<.0001; 6-4=41.44, p=.0016; 6-5=38.22, p=.0021

C1 opening
  ANOVA: F(5,27)=9.112, p<.0001
  Means (SD): Sent. 1, n=4, 31 (25); Sent. 2, n=6, 37 (21); Sent. 3, n=6, 54 (8); Sent. 4, n=5, 76 (15); Sent. 5, n=6, 113 (50); Sent. 6, n=6, 173 (79)
  Summary: 3 levels (1,2,3<5<6): 1,2,3,4,5<6; 1,2,3<5
  Differences between means and results of Fisher’s PLSD post-hoc tests: 5-1=82, p=.0059; 5-2=76, p=.0047; 5-3=59, p=.0233; 6-1=142, p<.0001; 6-2=136, p<.0001; 6-3=119, p<.0001; 6-4=97, p=.0008; 6-5=60, p=.0218
Table 2.11. Results for pre-boundary time-to-peak-velocity (in ms) for subject R (ANOVA, means and standard deviations, summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of individual sentences). The consonant string is: [C3 C2 C1 #.

C3 closing
  No values reported.

C3 opening
  ANOVA: n.s.
  Means (SD): Sent. 1, n=4, 86 (14); Sent. 2, n=6, 88 (9); Sent. 3, n=6, 78 (26); Sent. 4, n=5, 102 (8); Sent. 5, n=6, 81 (4); Sent. 6, n=6, 81 (21)

C2 closing
  ANOVA: F(5,27)=5.602, p=.0011
  Means (SD): Sent. 1, n=4, 39 (3); Sent. 2, n=6, 38 (3); Sent. 3, n=6, 40 (4); Sent. 4, n=5, 46 (6); Sent. 5, n=6, 45 (9); Sent. 6, n=6, 53 (5)
  Summary: 3 levels (2<4<6): 1,2,3,4,5<6; 2<4
  Differences between means and results of Fisher’s PLSD post-hoc tests: 4-2=8, p=.0396; 6-1=15, p=.0006; 6-2=15, p=.0001; 6-3=13, p=.0005; 6-4=7, p=.0458; 6-5=8, p=.0192

C2 opening
  ANOVA: n.s.
  Means (SD): Sent. 1, n=4, 49 (9); Sent. 2, n=6, 45 (5); Sent. 3, n=6, 49 (2); Sent. 4, n=5, 43 (3); Sent. 5, n=6, 42 (5); Sent. 6, n=6, 48 (7)

C1 closing
  ANOVA: F(5,27)=8.077, p<.0001
  Means (SD): Sent. 1, n=4, 26 (10); Sent. 2, n=6, 36 (16); Sent. 3, n=6, 37 (9); Sent. 4, n=5, 49 (12); Sent. 5, n=6, 47 (7); Sent. 6, n=6, 70 (16)
  Summary: 3 levels (1<4,5<6): 1,2,3,4,5<6; 1<4,5
  Differences between means and results of Fisher’s PLSD post-hoc tests: 4-1=23, p=.0107; 5-1=21, p=.0129; 6-1=44, p<.0001; 6-2=34, p<.0001; 6-3=33, p<.0001; 6-4=21, p=.0092; 6-5=22, p=.0040

C1 opening
  ANOVA: F(5,27)=5.054, p<.0021
  Means (SD): Sent. 1, n=4, 21 (13); Sent. 2, n=6, 23 (13); Sent. 3, n=6, 30 (5); Sent. 4, n=5, 35 (6); Sent. 5, n=6, 64 (35); Sent. 6, n=6, 120 (86)
  Summary: 2 levels: 1,2,3,4,5<6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 6-1=99, p=.0008; 6-2=96, p=.0003; 6-3=90, p=.0007; 6-4=85, p=.0018; 6-5=56, p=.0247
6.2. Pre-boundary Spatial Results ([C3 C2 C1 #)
The means and significant sentence comparisons for the pre-boundary spatial results are
given in Tables 2.15 to 2.17 at the end of this section. A step-by-step summary of the
effects of the boundary on each constriction is also given after the results for that
constriction are reported.
The spatial results show overall similar patterns to the temporal effects. Closest to the boundary, at the C1 opening movement, all subjects show an effect such that sentences with stronger prosodic boundaries show greater spatial displacement than those with weaker boundaries. Subject B distinguishes three levels of displacement, such that sentence 1 shows less displacement than sentences 2-6, and sentence 2 in turn shows less displacement than sentences 5 and 6 (1 < 2 < 5, 6). Subject E distinguishes two levels of displacement, such that sentences 2 and 3 show less displacement than sentences 4-6. Subject R also distinguishes two levels of displacement, such that sentences 1 and 2 show less displacement for the C1 opening movement than sentences 4-6, and sentence 3 shows less displacement than sentences 5 and 6. The effects are similar to the temporal properties, where for the duration of the opening movement all subjects distinguished three levels, and for time-to-peak-velocity all subjects distinguished two levels of duration.
Further before the boundary, at the C1 closing movement, two subjects (B and R) show an effect of the boundary. Subject B distinguishes two levels of displacement, such that sentence 1 shows less displacement than sentences 3, 4 and 6, and sentence 2 shows less displacement than sentences 3 and 6. Subject R distinguishes two levels of displacement, namely sentences 1 and 2 show less displacement than sentences 4-6, and sentence 3 shows less displacement than sentence 6. This is a weaker effect than the temporal effects, where all subjects showed effects, distinguishing between two and three levels of constriction duration, and two subjects had time-to-peak-velocity effects, distinguishing two (subject E) and three (subject R) levels of duration. The summary of the results for C1 is given in Table 2.12.
Table 2.12. Summary of results for pre-boundary C1 displacement, showing the relationship between sentences and the number of boundary strengths (levels) distinguished for displacement. The consonant string is [C3 C2 C1 #.

Measure        Subject  C1 movement  Sentence comparisons        Levels
Displacement   B        closing      1 < 3,4,6; 2 < 3,6          2
Displacement   B        opening      1 < 2,3,4,5,6; 2 < 5,6      3 (1 < 2 < 5,6)
Displacement   E        closing      n.s.                        -
Displacement   E        opening      2,3 < 4,5,6                 2
Displacement   R        closing      1,2 < 4,5,6; 3 < 6          2
Displacement   R        opening      1,2 < 4,5,6; 3 < 5,6        2
At pre-boundary C2 there is an effect for subject B only. For the C2 opening movement, subject B has three levels of displacement: sentences 1, 2, 4 and 5 show less displacement than sentence 6, sentence 1 additionally shows less displacement than sentences 3 and 4, and sentence 2 shows less displacement than sentence 3. The three levels of displacement are 1 < 4 < 6. At the C2 closing movement, this subject distinguishes two levels of displacement: sentences 1 and 5 show less displacement than sentences 3 and 6, and sentence 4 shows less displacement than sentence 6. For this subject, then, the most salient pre-boundary results are the spatial rather than the temporal effects: although the temporal effects extend two constrictions overall, the only effect on C2 was discontinuous from C1 and not in the predicted direction. This is different for the spatial effects, where we see a continuous effect in the predicted direction on all movements of the pre-boundary C1 and C2 constrictions (with one exception, namely subject B’s C2 closing movement shows, among effects in the predicted direction, also 5 < 3). The summary of the results for C2 is given in Table 2.13.
Table 2.13. Summary of results for pre-boundary C2 displacement, showing the relationship between sentences and the number of boundary strengths (levels) distinguished for displacement. The consonant string is [C3 C2 C1 #.

Measure        Subject  C2 movement  Sentence comparisons             Levels
Displacement   B        closing      1,5 < 3,6; 4 < 6                 2
Displacement   B        opening      1,2,4,5 < 6; 1 < 3,4; 2 < 3      3 (1 < 4 < 6)
Displacement   E        closing      n.s.                             -
Displacement   E        opening      n.s.                             -
Displacement   R        closing      n.s.                             -
Displacement   R        opening      n.s.                             -
At the C3 opening movement, only subject R shows an effect, distinguishing two levels of boundary strength: sentences 1, 2 and 6 show less displacement than sentence 4, and sentence 2 shows less displacement than sentence 3. The C3 closing movement could not reliably be identified for two subjects (E and R), and subject B does not show any displacement effects of the boundary. The summary of the results for C3 is given in Table 2.14.
Table 2.14. Summary of results for pre-boundary C3 displacement, showing the relationship between sentences and the number of boundary strengths (levels) distinguished for displacement. The consonant string is [C3 C2 C1 #.

Measure        Subject  C3 movement  Sentence comparisons        Levels
Displacement   B        closing      n.s.                        -
Displacement   B        opening      n.s.                        -
Displacement   E        closing      NA                          -
Displacement   E        opening      n.s.                        -
Displacement   R        closing      NA                          -
Displacement   R        opening      1,2,6 < 4; 2 < 3            2
Note that it is not clear whether the effects on the C3 opening movement for subject R are boundary effects. There are no temporal effects on this articulatory movement, and the spatial effects are not quite in the expected direction, in that it is sentences 3 and 4, in the medium range of boundary strength, that exhibit an effect. Note that we argued in section 6.1 that sentences 3 and 4 exhibit contextual effects on the C3 closing movement for subject B. Given this, and the discontinuity and direction of the effect, in what follows we will not consider these spatial effects to be driven by the boundary.
Tables 2.15-2.17 show the results of the analysis for the displacement of the
closing and opening movement for the pre-boundary constrictions for the three subjects.
Table 2.15. Results for pre-boundary displacement (in mm) for subject B. (ANOVA, means and standard deviations,
summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of individual
sentences). The consonant string is: [C3 C2 C1 #.
C3 closing: n.s.
  Means: Sent.1, n=3, 9.7 (.2); Sent.2, n=2, 10.2 (2.2); Sent.3, n=4, 8.5 (1.1); Sent.4, n=5, 8.6 (1.3); Sent.5, n=1, 9.5 (-); Sent.6, n=5, 9.1 (1)
C3 opening: n.s.
  Means: Sent.1, n=6, 9.6 (1); Sent.2, n=4, 10.1 (2.5); Sent.3, n=4, 10.2 (1.5); Sent.4, n=5, 9 (1); Sent.5, n=4, 9 (.8); Sent.6, n=6, 10.6 (1)
C2 closing: F(5,23)=3.514, p=.0166
  Means: Sent.1, n=6, 7.7 (1); Sent.2, n=4, 9.2 (1.8); Sent.3, n=4, 10.4 (1.5); Sent.4, n=5, 8.6 (1.3); Sent.5, n=4, 7.9 (1.1); Sent.6, n=6, 10.6 (2)
  Summary: 2 levels: 1,5<3,6; 4<6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 3-1=2.8, p=.0085; 3-5=2.5, p=.0280; 6-1=2.9, p=.0025; 6-4=2, p=.0393; 6-5=2.6, p=.0121
C2 opening: F(5,23)=5.990, p=.0011
  Means: Sent.1, n=6, 6.5 (1.7); Sent.2, n=4, 7.5 (1.7); Sent.3, n=4, 10.2 (2); Sent.4, n=5, 8.9 (.7); Sent.5, n=4, 7.7 (1.4); Sent.6, n=6, 11.7 (2.6)
  Summary: 3 levels: 1,2,4,5<6; 1<3,4; 2<3; 3 levels: 1<4<6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 3-1=3.75 (p=.0042); 3-2=2.77 (p=.0432); 4-1=2.36 (p=.0447); 6-1=5.15 (p<.0001); 6-2=4.17 (p=.0018); 6-4=2.79 (p=.0193); 6-5=3.93 (p=.0029)
C1 closing: F(5,23)=4.191, p=.0075
  Means: Sent.1, n=6, 5.8 (1.8); Sent.2, n=4, 6.7 (1.9); Sent.3, n=4, 9.7 (2.3); Sent.4, n=5, 8.4 (.9); Sent.5, n=4, 7.5 (1.3); Sent.6, n=6, 9.4 (1.7)
  Summary: 2 levels: 1<3,4,6; 2<3,6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 3-1=3.8, p=.0019; 3-2=3, p=.0204; 4-1=2.6, p=.0180; 6-1=3.6, p=.0012; 6-2=2.7, p=.0199
C1 opening: F(5,23)=9.009, p<.0001
  Means: Sent.1, n=6, 4.3 (2.1); Sent.2, n=4, 6.7 (1.5); Sent.3, n=4, 7.9 (1.3); Sent.4, n=5, 8.9 (1.7); Sent.5, n=4, 10.2 (1.8); Sent.6, n=6, 9.1 (1.1)
  Summary: 3 levels: 1<2<5,6; 1<2,3,4,5,6; 2<5,6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 2-1=2.42 (p=.0301); 3-1=3.62 (p=.0021); 4-1=4.62 (p<.0001); 5-1=5.9 (p<.0001); 5-2=3.49 (p=.0057); 6-1=4.82 (p<.0001); 6-2=2.4 (p=.0308)
Table 2.16. Results for pre-boundary displacement (in mm) for subject E. (ANOVA, means and standard deviations,
summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of individual
sentences). The consonant string is: [C3 C2 C1 #.
C3 closing: could not be reliably identified
C3 opening: n.s.
  Means: Sent.1, n=5, 12.6 (1.2); Sent.2, n=5, 12.8 (.9); Sent.3, n=6, 13 (1); Sent.4, n=6, 11.6 (1.1); Sent.5, n=5, 12.6 (1.1); Sent.6, n=5, 13.1 (2)
C2 closing: n.s.
  Means: Sent.1, n=5, 10.3 (1); Sent.2, n=5, 10 (.2); Sent.3, n=6, 11.3 (.9); Sent.4, n=6, 10.3 (.9); Sent.5, n=5, 10.6 (.7); Sent.6, n=5, 10.8 (2)
C2 opening: n.s.
  Means: Sent.1, n=5, 8.3 (1.5); Sent.2, n=5, 8.1 (2.3); Sent.3, n=6, 8.2 (1.7); Sent.4, n=6, 8.6 (2.9); Sent.5, n=5, 8.7 (.7); Sent.6, n=5, 9.8 (1.9)
C1 closing: n.s.
  Means: Sent.1, n=3, 7.4 (2.3); Sent.2, n=3, 6.3 (2.4); Sent.3, n=6, 6.8 (2.5); Sent.4, n=6, 8.2 (2.1); Sent.5, n=5, 8.8 (1); Sent.6, n=5, 9.5 (2.1)
C1 opening: F(5,21)=6.342 (p=.0010)
  Means: Sent.1, n=3, 3.1 (.7); Sent.2, n=3, 1 (1); Sent.3, n=6, 1.2 (1.2); Sent.4, n=6, 5.1 (2.2); Sent.5, n=4, 5.6 (2.9); Sent.6, n=5, 5.8 (1.8)
  Summary: 2 levels: 2,3<4,5,6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 4-2=4.17 (p=.0046); 4-3=3.9 (p=.0016); 5-2=4.67 (p=.0035); 5-3=4.4 (p=.0014); 6-2=4.87 (p=.0017); 6-3=4.61 (p=.0005)
Table 2.17. Results for pre-boundary displacement (in mm) for subject R. (ANOVA, means and standard deviations,
summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of individual
sentences). The consonant string is: [C3 C2 C1 #.
C3 closing: could not be reliably identified
C3 opening: F(5,27)=3.124, p=.0237
  Means: Sent.1, n=4, 8.7 (.7); Sent.2, n=6, 8.2 (.7); Sent.3, n=6, 9.5 (1); Sent.4, n=5, 10 (.6); Sent.5, n=6, 9.1 (1); Sent.6, n=6, 8.5 (1)
  Summary: 2 levels: 1,2,6<4; 2<3
  Differences between means and results of Fisher’s PLSD post-hoc tests: 3-2=1.3, p=.0156; 4-1=1.3, p=.0386; 4-2=1.8, p=.0023; 4-6=1.5, p=.0101
C2 closing: n.s.
  Means: Sent.1, n=4, 8.3 (1.1); Sent.2, n=6, 9 (1.1); Sent.3, n=6, 9.6 (1.4); Sent.4, n=5, 10.3 (.7); Sent.5, n=6, 9.4 (1.2); Sent.6, n=6, 9.9 (.6)
C2 opening: n.s.
  Means: Sent.1, n=4, 11.3 (1); Sent.2, n=6, 11.2 (1.5); Sent.3, n=6, 11.6 (1.2); Sent.4, n=5, 12.1 (.7); Sent.5, n=6, 12.4 (1.3); Sent.6, n=6, 12.9 (1)
C1 closing: F(5,27)=5.529, p=.0012
  Means: Sent.1, n=4, 4.5 (3.7); Sent.2, n=6, 5.1 (4); Sent.3, n=6, 7.3 (3.1); Sent.4, n=5, 9.2 (1.6); Sent.5, n=6, 10.5 (2.1); Sent.6, n=6, 11.2 (.7)
  Summary: 2 levels: 1,2<4,5,6; 3<6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 4-1=4.8, p=.0155; 4-2=4.1, p=.0203; 5-1=6.1, p=.0020; 5-2=5.4, p=.0021; 6-1=6.7, p=.0008; 6-2=6.1, p=.0007; 6-3=3.9, p=.0203
C1 opening: F(5,27)=6.863, p=.0003
  Means: Sent.1, n=4, 1.1 (1.6); Sent.2, n=6, 2.1 (2.3); Sent.3, n=6, 4.5 (2.8); Sent.4, n=5, 6.7 (1.8); Sent.5, n=6, 8.6 (1.4); Sent.6, n=6, 8.1 (4.6)
  Summary: 2 levels: 1,2<4,5,6; 3<5,6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 4-1=5.58, p=.0053; 5-1=7.49, p=.0002; 6-1=7, p=.0005; 4-2=4.61, p=.0098; 5-2=6.52, p=.0003; 6-2=6.03, p=.0007; 5-3=4.07 (p=.0159); 6-3=3.59 (p=.0317)
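Because the tables report group sizes, means and standard deviations, the F ratio of a one-way ANOVA can be approximately re-derived from a single cell. The sketch below does this for the C2 closing movement of subject B in Table 2.15. It is only an illustration under stated assumptions: the function name anova_from_summary is hypothetical, the design is assumed to be a plain one-way ANOVA with sentence as the factor, and the reported values are rounded, so the result should land near the reported F(5,23)=3.514 without matching it exactly. No p-value is computed.

```python
def anova_from_summary(groups):
    """One-way ANOVA F ratio from per-group (n, mean, sd) summaries.

    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group sum of squares, recovered from the standard deviations.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Subject B, pre-boundary C2 closing displacement (Table 2.15): (n, mean, sd).
c2_closing_b = [
    (6, 7.7, 1.0), (4, 9.2, 1.8), (4, 10.4, 1.5),
    (5, 8.6, 1.3), (4, 7.9, 1.1), (6, 10.6, 2.0),
]
f_ratio, df1, df2 = anova_from_summary(c2_closing_b)
print(f"F({df1},{df2}) = {f_ratio:.2f}")
```

The degrees of freedom follow directly from the group sizes, which gives a quick consistency check between the n values and the reported F(5,23).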
6.3. Post-boundary Temporal Results (# C1 C2 C3])
The post-boundary constrictions are schematically represented in Figure 2.5.
Figure 2.5. Schematic representation of the post-boundary constrictions.
The means and significant results for the post-boundary temporal effects are given in
Tables 2.21 to 2.26 at the end of this section. A summary of the effects of the boundary
on each constriction is given after the results for that constriction are reported.
Immediately following the boundary is a vowel constriction. For the following
consonant constriction, C1, only one subject (B) shows an effect of the boundary on temporal
properties. At the C1 closing movement, subject B shows an effect for time-to-peak-velocity.
Two levels are distinguished, namely sentences 1-3 show shorter time-to-peak-velocity
than sentences 4-6. This subject also shows an effect on time-to-peak-velocity for the C1
opening movement. However, the effect is in the opposite direction, namely shortening.
Two levels are distinguished: sentences 4 and 6 show shorter time-to-peak-velocity than
sentence 3, and sentences 4-6 show shorter time-to-peak-velocity than sentence 1. There
are no effects for the other two subjects.[6] The results are shown in Table 2.18.
Table 2.18. Summary of temporal boundary effects on post-boundary C1, showing the relationship between sentences
and the number of boundary strengths (levels) distinguished for duration and time-to-peak velocity. The consonant
string is # C1 C2 C3].
Subject and C1 movement | Duration | Time-to-peak velocity
Subject B, closing | n.s. | Sentence 1,2,3<4,5,6; 2 levels
Subject B, opening | n.s. | Sentence 4,6<3; 4,5,6<1; 2 levels
Subject E, closing | n.s. | n.s.
Subject E, opening | NA | n.s.
Subject R, closing | n.s. | n.s.
Subject R, opening | n.s. | n.s.
At the consonant further away from the boundary, C2, only subject B has an effect
on time-to-peak-velocity for the closing movement, such that sentences 1, 3 and 4 have
shorter time-to-peak-velocity than sentence 6. The results for C2 are shown in Table 2.19.
Table 2.19. Summary of temporal boundary effects on post-boundary C2, showing the relationship between sentences
and the number of boundary strengths (levels) distinguished for duration and time-to-peak velocity. The consonant
string is # C1 C2 C3].
Subject and C2 movement | Duration | Time-to-peak velocity
Subject B, closing | n.s. | Sentence 1,3,4<6; 2 levels
Subject B, opening | n.s. | n.s.
Subject E, closing | NA | NA
Subject E, opening | NA | NA
Subject R, closing | n.s. | n.s.
Subject R, opening | n.s. | n.s.
For consonant C3, there is no effect on the closing movement, and subjects show a
somewhat unsystematic effect on the opening movement. Subject B shows an effect on
duration for the opening movement, distinguishing two levels of duration, namely sentence 4
has shorter opening movement duration than the other sentences. Subjects E and R show
an effect on C3 opening movement duration and time-to-peak-velocity. Subject E has
shorter opening movement duration for sentence 3 than for sentences 1, 2, 5 and 6, and
sentence 4 has shorter opening movement duration than sentences 2, 5 and 6. For opening
movement time-to-peak-velocity for subject E, sentences 3 and 4 show shorter
time-to-peak-velocity than sentences 1, 5 and 6. Subject R has shorter opening movement
duration and shorter time-to-peak-velocity for sentences 3 and 4 than for sentences 1, 2, 5
and 6. Table 2.20 summarizes the results for the temporal effects of the boundary on
post-boundary C3.
[6] Note that for subject E the [s] constriction (C2) could not be tracked since the constriction was formed far
away from the tongue tip where the sensor was placed, so C1 opening movement duration, C2 closing
movement and time-to-peak-velocity, C2 opening movement and time-to-peak-velocity and C3 closing
movement and time-to-peak-velocity could not be obtained.
Table 2.20. Summary of temporal boundary effects on post-boundary C3, showing the relationship between sentences
and the number of boundary strengths (levels) distinguished for duration and time-to-peak velocity. The consonant
string is # C1 C2 C3].
Subject and C3 movement | Duration | Time-to-peak velocity
Subject B, closing | n.s. | n.s.
Subject B, opening | Sentence 4<1,2,3,5,6; 2 levels | n.s.
Subject E, closing | NA | NA
Subject E, opening | Sentence 3<1,2,5,6; 4<2,5,6; 2 levels | Sentence 3,4<1,5,6; 2 levels
Subject R, closing | n.s. | n.s.
Subject R, opening | Sentence 3,4<1,2,5,6; 2 levels | Sentence 3,4<1,2,5,6; 2 levels
Note that the effects on C3 are not in the predicted direction: sentences 3 and 4, which are
the ones with the shorter duration, should not have shorter duration than the sentences
with weaker boundaries (sentences 1 and 2). Moreover, for subjects E and R, these effects
are the only post-boundary temporal effects, whereas the prediction based on prior
findings is that the effects of the boundary are continuous. Similarly, for subject B, the
effects extended only as far as the C2 closing movement, so for this subject an effect on the
C3 opening movement would be discontinuous as well. This indicates that the effects shown
on C3 may not be related to the boundary. Furthermore, for all subjects these effects relate to
shorter durations for sentences 3 and 4 (subjects E and R) or sentence 4 (subject B). These
sentences differ from the other sentences in their phonetic content. Looking at the stimuli, it
can be seen that while in all cases there is a non-high vowel following the string donut and a
sweet, the following segments differ. In sentences 1, 2 and 5 the controlled string is followed
by the word apple, in sentences 3 and 4 by angora, and in sentence 6 by the word hostess,
so the effects on C3 seem to be co-articulatory effects. In the following, we will therefore
not consider these effects to be boundary-related effects.[7] Tables 2.21-2.26 show the
results of the analysis for the duration of the closing and opening movement and
time-to-peak velocity for the post-boundary constrictions.
[7] The only problem with that explanation is that for subject B sentences 3 and 4 differ, even though they
both have the sequence donut and a sweet angora.
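The continuity argument used above can be stated procedurally: effects are expected to form an unbroken run of movements starting at the boundary, so any effect that appears after a gap is treated as suspect. The following minimal sketch makes that check explicit. The function names and the encoding of effects as a list of booleans ordered by distance from the boundary are assumptions made for illustration; this is not a procedure described in the study itself.

```python
def contiguous_extent(effects):
    """Number of movements, counted from the boundary, that show an effect
    before the first movement without one."""
    extent = 0
    for has_effect in effects:
        if not has_effect:
            break
        extent += 1
    return extent


def discontinuous_effects(effects):
    """Indices of movements that show an effect only after a gap."""
    extent = contiguous_extent(effects)
    return [i for i, has_effect in enumerate(effects) if has_effect and i >= extent]


# Post-boundary temporal effects, ordered from the boundary outward:
# C1 closing, C1 opening, C2 closing, C2 opening, C3 closing, C3 opening.
subject_b = [True, True, True, False, False, True]
subject_r = [False, False, False, False, False, True]

print(contiguous_extent(subject_b), discontinuous_effects(subject_b))
print(contiguous_extent(subject_r), discontinuous_effects(subject_r))
```

For both subjects the only flagged movement is the C3 opening movement (index 5), which is the effect set aside in the text as co-articulatory.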
Table 2.21. Results for post-boundary opening and closing movement duration (in ms) for subject B. (ANOVA, means
and standard deviations, summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the
behavior of individual sentences). The consonant string is: # C1 C2 C3].
C1 closing: n.s.
  Means: Sent.1, n=6, 77 (4); Sent.2, n=4, 75 (4); Sent.3, n=4, 76 (5); Sent.4, n=5, 81 (6); Sent.5, n=4, 84 (11); Sent.6, n=6, 82 (3)
C1 opening: n.s.
  Means: Sent.1, n=6, 94 (6); Sent.2, n=4, 80 (16); Sent.3, n=4, 89 (7); Sent.4, n=4, 89 (14); Sent.5, n=3, 78 (8); Sent.6, n=6, 79 (12)
C2 closing: n.s.
  Means: Sent.1, n=6, 47 (7); Sent.2, n=4, 74 (23); Sent.3, n=4, 60 (16); Sent.4, n=4, 54 (26); Sent.5, n=3, 58 (14); Sent.6, n=6, 55 (10)
C2 opening: n.s.
  Means: Sent.1, n=6, 103 (17); Sent.2, n=4, 122 (17); Sent.3, n=4, 128 (15); Sent.4, n=5, 116 (4); Sent.5, n=4, 112 (17); Sent.6, n=6, 99 (17)
C3 closing: n.s.
  Means: Sent.1, n=6, 78 (16); Sent.2, n=4, 86 (5); Sent.3, n=4, 77 (13); Sent.4, n=5, 82 (7); Sent.5, n=4, 85 (6); Sent.6, n=6, 88 (11)
C3 opening: F(5,22)=3.829, p=.0120
  Means: Sent.1, n=6, 100 (3); Sent.2, n=4, 101 (8); Sent.3, n=4, 96 (5); Sent.4, n=4, 85 (9); Sent.5, n=4, 97 (9); Sent.6, n=6, 101 (4)
  Summary: 2 levels: 4<1,2,3,5,6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 1-4=15, p=.0015; 2-4=16, p=.0017; 3-4=11, p=.0212; 5-4=12, p=.0118; 6-4=16, p=.0009
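Fisher's protected least significant difference compares each pair of means against a threshold built from the pooled within-group variance of the ANOVA. The sketch below illustrates the idea on the C3 opening movement of subject B in Table 2.21, working from the reported n, mean and standard deviation. It rests on assumptions: the function names are hypothetical, the critical value 2.074 is the standard two-tailed 5% value of the t distribution for 22 degrees of freedom taken from a t-table rather than computed, and rounding in the reported values means borderline pairs could in principle differ from the original analysis.

```python
import math


def pooled_mse(groups):
    """Pooled within-group mean square from {label: (n, mean, sd)}."""
    df_within = sum(n for n, _, _ in groups.values()) - len(groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())
    return ss_within / df_within


def lsd_significant_pairs(groups, t_crit):
    """Pairs of labels whose mean difference exceeds Fisher's LSD."""
    mse = pooled_mse(groups)
    labels = sorted(groups)
    significant = []
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            n_a, mean_a, _ = groups[a]
            n_b, mean_b, _ = groups[b]
            # LSD = critical t times the standard error of the difference.
            lsd = t_crit * math.sqrt(mse * (1 / n_a + 1 / n_b))
            if abs(mean_a - mean_b) > lsd:
                significant.append((a, b))
    return significant


# Subject B, post-boundary C3 opening duration (Table 2.21): n, mean, sd.
c3_opening_b = {
    1: (6, 100, 3), 2: (4, 101, 8), 3: (4, 96, 5),
    4: (4, 85, 9), 5: (4, 97, 9), 6: (6, 101, 4),
}
# Assumed critical value: t(.975, df=22) from a standard t-table.
print(lsd_significant_pairs(c3_opening_b, t_crit=2.074))
```

With these inputs every significant pair involves sentence 4, in line with the summary 4<1,2,3,5,6 reported for this cell.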
Table 2.22. Results for post-boundary time-to-peak-velocity (in ms) for subject B. (ANOVA, means and standard
deviations, summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of
individual sentences). The consonant string is: # C1 C2 C3].
C1 closing: F(5,23)=6.143, p=.0009
  Means: Sent.1, n=6, 35 (0); Sent.2, n=4, 37 (3); Sent.3, n=4, 39 (3); Sent.4, n=5, 45 (6); Sent.5, n=4, 45 (6); Sent.6, n=6, 44 (4)
  Summary: 2 levels: 1,2,3<4,5,6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 4-1=10, p=.0004; 4-2=7, p=.0099; 4-3=6, p=.0287; 5-1=10, p=.0008; 5-2=7, p=.0140; 5-3=6, p=.0378; 6-1=9, p=.0006; 6-2=7, p=.0165; 6-3=5, p=.0479
C1 opening: F(5,23)=3.579, p=.0154
  Means: Sent.1, n=6, 52 (11); Sent.2, n=4, 40 (9); Sent.3, n=4, 50 (12); Sent.4, n=5, 29 (7); Sent.5, n=4, 34 (15); Sent.6, n=6, 32 (14)
  Summary: 2 levels: 4,6<3; 4,5,6<1
  Differences between means and results of Fisher’s PLSD post-hoc tests: 1-4=23, p=.0033; 1-5=19, p=.0222; 1-6=21, p=.0057; 3-4=21, p=.0145; 3-6=18, p=.0247
C2 closing: F(5,21)=2.907, p=.0379
  Means: Sent.1, n=6, 23 (3); Sent.2, n=4, 26 (2); Sent.3, n=4, 24 (2); Sent.4, n=4, 24 (5); Sent.5, n=3, 27 (3); Sent.6, n=6, 30 (4)
  Summary: 2 levels: 1,3,4<6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 6-1=7, p=.0034; 6-3=6, p=.0118; 6-4=6, p=.0120
C2 opening: n.s.
  Means: Sent.1, n=6, 59 (17); Sent.2, n=4, 75 (33); Sent.3, n=4, 75 (41); Sent.4, n=5, 75 (22); Sent.5, n=4, 56 (20); Sent.6, n=6, 52 (15)
C3 closing: n.s.
  Means: Sent.1, n=6, 54 (15); Sent.2, n=4, 56 (8); Sent.3, n=4, 54 (11); Sent.4, n=5, 48 (7); Sent.5, n=4, 56 (6); Sent.6, n=6, 55 (13)
C3 opening: n.s.
  Means: Sent.1, n=6, 47 (5); Sent.2, n=4, 46 (6); Sent.3, n=4, 41 (2); Sent.4, n=5, 48 (12); Sent.5, n=4, 46 (7); Sent.6, n=6, 50 (6)
Table 2.23. Results for post-boundary opening and closing movement duration (in ms) for subject E. (ANOVA, means
and standard deviations, summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the
behavior of individual sentences). The consonant string is: # C1 C2 C3].
C1 closing: n.s.
  Means: Sent.1, n=3, 62 (13); Sent.2, n=3, 82 (15); Sent.3, n=6, 77 (14); Sent.4, n=6, 83 (25); Sent.5, n=4, 81 (11); Sent.6, n=5, 82 (12)
C1 opening: could not be measured
C2 closing: could not be measured
C2 opening: could not be measured
C3 closing: could not be measured
C3 opening: F(5,26)=7.198, p=.0002
  Means: Sent.1, n=5, 167 (22); Sent.2, n=5, 198 (18); Sent.3, n=6, 111 (14); Sent.4, n=6, 135 (43); Sent.5, n=5, 171 (23); Sent.6, n=5, 187 (38)
  Summary: 2 levels: 3<1,2,5,6; 4<2,5,6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 1-3=55, p<.0036; 2-3=86, p<.0001; 2-4=63, p<.0012; 5-3=59, p<.0021; 5-4=36, p<.0490; 6-3=75, p<.0002; 6-4=52, p<.0059
Table 2.24. Results for post-boundary time-to-peak-velocity (in ms) for subject E. (ANOVA, means and standard
deviations, summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of
individual sentences). The consonant string is: # C1 C2 C3].
C1 closing: n.s.
  Means: Sent.1, n=3, 32 (8); Sent.2, n=3, 43 (6); Sent.3, n=6, 43 (14); Sent.4, n=6, 47 (19); Sent.5, n=4, 44 (7); Sent.6, n=5, 49 (9)
C1 opening: n.s.
  Means: Sent.1, n=5, 42 (10); Sent.2, n=5, 46 (7); Sent.3, n=6, 52 (11); Sent.4, n=6, 40 (6); Sent.5, n=5, 43 (4); Sent.6, n=5, 38 (4)
C2 closing: could not be measured
C2 opening: could not be measured
C3 closing: could not be measured
C3 opening: F(5,26)=3.844, p=.0097
  Means: Sent.1, n=5, 61 (5); Sent.2, n=5, 55 (3); Sent.3, n=6, 41 (7); Sent.4, n=6, 44 (6); Sent.5, n=5, 62 (4); Sent.6, n=5, 67 (29)
  Summary: 2 levels: 3,4<1,5,6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 1-3=20, p=.0133; 1-4=17, p=.0358; 5-3=21, p=.0100; 5-4=18, p=.0276; 6-3=26, p=.0020; 6-4=23, p=.0059
Table 2.25. Results for post-boundary opening and closing movement duration (in ms) for subject R. (ANOVA, means
and standard deviations, summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the
behavior of individual sentences). The consonant string is: # C1 C2 C3].
C1 closing: n.s.
  Means: Sent.1, n=4, 76 (10); Sent.2, n=6, 72 (10); Sent.3, n=6, 74 (2); Sent.4, n=5, 72 (8); Sent.5, n=6, 69 (8); Sent.6, n=6, 65 (12)
C1 opening: n.s.
  Means: Sent.1, n=4, 55 (9); Sent.2, n=6, 63 (26); Sent.3, n=6, 54 (11); Sent.4, n=5, 50 (6); Sent.5, n=6, 47 (5); Sent.6, n=6, 50 (6)
C2 closing: n.s.
  Means: Sent.1, n=4, 100 (34); Sent.2, n=6, 73 (29); Sent.3, n=6, 96 (41); Sent.4, n=5, 96 (32); Sent.5, n=6, 107 (31); Sent.6, n=6, 78 (65)
C2 opening: n.s.
  Means: Sent.1, n=4, 103 (41); Sent.2, n=6, 147 (44); Sent.3, n=6, 122 (41); Sent.4, n=5, 133 (25); Sent.5, n=6, 127 (39); Sent.6, n=6, 134 (70)
C3 closing: n.s.
  Means: Sent.1, n=4, 70 (0); Sent.2, n=6, 73 (5); Sent.3, n=6, 70 (3); Sent.4, n=5, 73 (6); Sent.5, n=6, 69 (7); Sent.6, n=6, 72 (4)
C3 opening: F(5,25)=6.988, p=.0003
  Means: Sent.1, n=3, 110 (5); Sent.2, n=6, 112 (5); Sent.3, n=5, 85 (15); Sent.4, n=5, 78 (21); Sent.5, n=6, 102 (10); Sent.6, n=6, 109 (7)
  Summary: 2 levels: 3,4<1,2,5,6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 1-3=25, p=.0101; 1-4=32, p=.0014; 2-3=27, p=.0013; 2-4=34, p=.0001; 5-3=17, p<.0272; 5-4=24, p=.0028; 6-3=24, p=.0033; 6-4=31, p=.0003
Table 2.26. Results for post-boundary time-to-peak-velocity (in ms) for subject R. (ANOVA, means and standard
deviations, summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of
individual sentences). The consonant string is: # C1 C2 C3].
C1 closing: n.s.
  Means: Sent.1, n=4, 44 (7); Sent.2, n=6, 39 (9); Sent.3, n=6, 37 (3); Sent.4, n=5, 40 (3); Sent.5, n=6, 38 (6); Sent.6, n=6, 37 (8)
C1 opening: n.s.
  Means: Sent.1, n=4, 25 (4); Sent.2, n=6, 25 (0); Sent.3, n=6, 27 (7); Sent.4, n=5, 24 (2); Sent.5, n=6, 24 (4); Sent.6, n=6, 24 (2)
C2 closing: n.s.
  Means: Sent.1, n=4, 41 (36); Sent.2, n=6, 18 (24); Sent.3, n=6, 27 (4); Sent.4, n=5, 23 (4); Sent.5, n=6, 25 (3); Sent.6, n=6, 24 (2)
C2 opening: n.s.
  Means: Sent.1, n=4, 65 (32); Sent.2, n=6, 104 (49); Sent.3, n=6, 79 (27); Sent.4, n=5, 90 (27); Sent.5, n=6, 91 (49); Sent.6, n=6, 91 (67)
C3 closing: n.s.
  Means: Sent.1, n=4, 44 (2); Sent.2, n=6, 45 (4); Sent.3, n=6, 42 (2); Sent.4, n=5, 46 (6); Sent.5, n=6, 43 (6); Sent.6, n=6, 42 (4)
C3 opening: F(5,27)=14.421, p<.0001
  Means: Sent.1, n=6, 47 (5); Sent.2, n=4, 47 (6); Sent.3, n=4, 41 (2); Sent.4, n=5, 48 (12); Sent.5, n=4, 46 (7); Sent.6, n=6, 50 (6)
  Summary: 2 levels: 3,4<1,2,5,6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 1-3=12, p=.0006; 1-4=15, p<.0001; 2-3=14, p<.0001; 2-4=18, p<.0001; 5-3=11, p=.0004; 5-4=14, p<.0001; 6-3=13, p<.0001; 6-4=17, p<.0001
6.4. Post-boundary Spatial Results (# C1 C2 C3])
The means and significant results for the post-boundary spatial effects are given in Tables
2.31-2.33 at the end of this section. A summary of the effects of the boundary on each
individual constriction is given after the results for that constriction are reported.
The post-boundary spatial effects are not very extensive. At the first constriction
closing movement, two subjects (B and E) show an effect. Subject B, who also has an
effect for time-to-peak-velocity, distinguishes two levels of displacement; sentence 1
shows less displacement than sentences 4-6, and sentences 2-3 show less displacement
than sentence 5. Subject E, who did not show temporal effects on C1 closing movement,
has two levels of displacement, namely sentences 1 and 3 show less displacement than
sentences 4 and 6. No effects were observed for C1 opening movement. The results for
post-boundary displacement for C1 are summarized in Table 2.27.
Table 2.27. Summary of results for post-boundary C1, showing the relationship between sentences and the number of
boundary strengths (levels) distinguished for displacement. The consonant string is # C1 C2 C3].
Subject | C1 closing displacement | C1 opening displacement
B | Sentence 1<4,5,6; 2,3<5; 2 levels | n.s.
E | Sentence 1,3<4,6; 2 levels | NA
R | n.s. | n.s.
Further after the boundary, no effects were observed for C2 closing movement.
For C2 opening movement, one subject, R, has an effect, such that sentence 1 has less
displacement than sentences 3, 4 and 6, and sentence 5 has less displacement than
sentence 6. The results for post-boundary displacement for C2 are shown in Table 2.28.
Table 2.28. Summary of results for post-boundary C2, showing the relationship between sentences and the number of
boundary strengths (levels) distinguished for displacement. The consonant string is # C1 C2 C3].
Subject | C2 closing displacement | C2 opening displacement
B | n.s. | n.s.
E | NA | NA
R | n.s. | Sentence 1<3,4,6; 5<6; 2 levels
For C3 closing movement, subject R shows an effect, again distinguishing two
levels, namely sentences 1, 4 and 5 show less displacement than sentence 6, and sentence
5 also shows less displacement than sentence 2. The effects on C2 opening and C3 closing
movement for subject R are unpredicted, as this subject does not show any temporal or
spatial effects close to the boundary. While it is not clear what the cause of these effects
is, it seems unlikely that boundary effects would occur this far away from the
boundary without any effect on previous post-boundary constrictions.
Finally, for the C3 opening movement all subjects show an effect, similar to the
observed temporal effects. Subject B distinguishes two levels, such that sentences 3, 4
and 6 have less displacement than sentences 1, 2 and 5. Subject E shows less
displacement for sentences 3 and 4 than for sentences 1, 2, 5 and 6, similar to her
temporal effects. Subject R distinguishes three levels of displacement, namely sentences
3 and 4 show less displacement than sentence 6, which shows less displacement than
sentences 1, 2 and 5. These effects pattern with the temporal effect for C3 opening
movement, except that this time, in addition to sentences 3 and 4 (with the sequence donut
and a sweet angora), sentence 6 also shows less displacement for two subjects than
sentences 1, 2 and 5 (donut and a sweet apple). Again, sentence 6 does differ in phonetic
content (donut and a sweet hostess), with the tongue tip more forward for the [s] in hostess
than it is for [l] in apple. As for the temporal effects on C3 opening movement, the spatial
effects are therefore unlikely to be related to the boundary. The results for post-boundary
displacement for C3 are summarized in Table 2.29.
Table 2.29. Summary of results for post-boundary C3, showing the relationship between sentences and the number of
boundary strengths (levels) distinguished for displacement. The consonant string is # C1 C2 C3].
Subject | C3 closing displacement | C3 opening displacement
B | n.s. | Sentence 3,4,6<1,2,5; 2 levels
E | NA | Sentence 3,4<1,2,5,6; 2 levels
R | Sentence 1,4,5<6; 5<2; 2 levels | Sentence 3,4<6<1,2,5; 3 levels
Table 2.30 summarizes all results in terms of levels of boundary strength distinguished,
and Tables 2.31 to 2.33 show the results for post-boundary spatial effects.
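One way to read the "levels" counts in these summaries is as the length of the longest chain of significant pairwise differences: for example, 1<4 and 4<6 together give three levels (1<4<6), whereas 1<3 with no significant difference between 3 and 6 adds nothing further. The sketch below implements that reading. It is an interpretation offered for illustration, not a procedure stated in the text, and the function name count_levels is hypothetical. It assumes the pairs are acyclic, which holds for strict differences between means.

```python
from functools import lru_cache


def count_levels(pairs):
    """Length of the longest chain a<b<c... in a set of (smaller, larger) pairs.

    Returns 1 when there are no pairs (no significant differences).
    """
    larger = {}
    for a, b in pairs:
        larger.setdefault(a, set()).add(b)

    @lru_cache(maxsize=None)
    def chain_from(node):
        # One level for this node plus the longest chain among larger sentences.
        return 1 + max((chain_from(n) for n in larger.get(node, ())), default=0)

    nodes = {x for pair in pairs for x in pair}
    return max((chain_from(n) for n in nodes), default=1)


# Subject B, pre-boundary C2 opening displacement: 1,2,4,5<6; 1<3,4; 2<3.
c2_opening_b = {(1, 6), (2, 6), (4, 6), (5, 6), (1, 3), (1, 4), (2, 3)}
# Subject B, pre-boundary C2 closing displacement: 1,5<3,6; 4<6.
c2_closing_b = {(1, 3), (1, 6), (5, 3), (5, 6), (4, 6)}

print(count_levels(c2_opening_b), count_levels(c2_closing_b))
```

Under this reading the two example cells yield three and two levels respectively, matching the counts given for them in Table 2.30.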
Table 2.30. Summary of the effects of prosodic boundaries, showing the relationship between sentences and the number of boundary strengths (levels) distinguished. The
numbers represent the sentences in the experiment.
Subject B
Constriction and movement | Duration | Time-to-peak velocity | Displacement
Pre-boundary C3 closing | 3<1,2,5,6; 4<1,6; 2 levels | n.s. | n.s.
Pre-boundary C3 opening | n.s. | n.s. | n.s.
Pre-boundary C2 closing | n.s. | 1,2,4,5<3; 2 levels | 1,5<3,6; 4<6; 2 levels
Pre-boundary C2 opening | n.s. | n.s. | 1,2,4,5<6; 1<3,4; 2<3; 3 levels: 1<4<6
Pre-boundary C1 closing | 1<3,4,5,6; 2 levels | n.s. | 1<3,4,6; 2<3,6; 2 levels
Pre-boundary C1 opening | 1,2,3<4,5<6; 3 levels | 1,2,3,4,6<5; 2 levels | 1<2,3,4,5,6; 2<5,6; 3 levels: 1<2<5,6
Post-boundary C1 closing | n.s. | 1,2,3<4,5,6; 2 levels | 1<4,5,6; 2,3<5; 2 levels
Post-boundary C1 opening | n.s. | 4,6<3; 4,5,6<1; 2 levels | n.s.
Post-boundary C2 closing | n.s. | 1,3,4<6; 2 levels | n.s.
Post-boundary C2 opening | n.s. | n.s. | n.s.
Post-boundary C3 closing | n.s. | n.s. | n.s.
Post-boundary C3 opening | 4<1,2,3,5,6; 2 levels | n.s. | 3,4,6<1,2,5; 2 levels
Subject E
Constriction and movement | Duration | Time-to-peak velocity | Displacement
Pre-boundary C3 closing | NA | NA | NA
Pre-boundary C3 opening | n.s. | n.s. | n.s.
Pre-boundary C2 closing | n.s. | n.s. | n.s.
Pre-boundary C2 opening | n.s. | n.s. | n.s.
Pre-boundary C1 closing | 1,2,3,4,5<6; 2 levels | 1<4,6; 2<4,5,6; 3<6; 2 levels | n.s.
Pre-boundary C1 opening | 1,2,3,4,5<6; 3<5; 3 levels: 3<5<6 | 1,2,3,4,5<6; 2 levels | 2,3<4,5,6; 2 levels
Post-boundary C1 closing | n.s. | n.s. | 1,3<4,6; 2 levels
Post-boundary C1 opening | NA | n.s. | NA
Post-boundary C2 closing | NA | NA | NA
Post-boundary C2 opening | NA | NA | NA
Post-boundary C3 closing | NA | NA | NA
Post-boundary C3 opening | 3<1,2,5,6; 4<2,5,6; 2 levels | 3,4<1,5,6; 2 levels | 3,4<1,2,5,6; 2 levels
Subject R
Constriction and movement | Duration | Time-to-peak velocity | Displacement
Pre-boundary C3 closing | NA | NA | NA
Pre-boundary C3 opening | n.s. | n.s. | 1,2,6<4; 2<3; 2 levels
Pre-boundary C2 closing | 3<6; 1,2,5<4,6; 2 levels | 1,2,3,4,5<6; 2<4,6; 3 levels: 2<4<6 | n.s.
Pre-boundary C2 opening | n.s. | n.s. | n.s.
Pre-boundary C1 closing | 1,2,3,4,5<6; 1<5; 3 levels: 1<5<6 | 1,2,3,4,5<6; 1<4,5; 3 levels: 1<4,5<6 | 1,2<4,5,6; 3<6; 2 levels
Pre-boundary C1 opening | 1,2,3,4,5<6; 1,2,3<5; 3 levels: 1,2,3<5<6 | 1,2,3,4,5<6; 2 levels | 1,2<4,5,6; 3<5,6; 2 levels
Post-boundary C1 closing | n.s. | n.s. | n.s.
Post-boundary C1 opening | n.s. | n.s. | n.s.
Post-boundary C2 closing | n.s. | n.s. | n.s.
Post-boundary C2 opening | n.s. | n.s. | 1<3,4,6; 5<6; 2 levels
Post-boundary C3 closing | n.s. | n.s. | 1,4,5<6; 5<2; 2 levels
Post-boundary C3 opening | 3,4<1,2,5,6; 2 levels | 3,4<1,2,5,6; 2 levels | 3,4<6<1,2,5; 3 levels
Table 2.31. Results for post-boundary displacement (in mm) for subject B. (ANOVA, means and standard deviations,
summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of individual
sentences). The consonant string is: # C1 C2 C3].
C1 closing: F(5,23)=4.340, p=.0063
  Means: Sent.1, n=6, 5.5 (1.3); Sent.2, n=4, 6.5 (1.4); Sent.3, n=4, 7.1 (1.4); Sent.4, n=5, 8.1 (1.7); Sent.5, n=4, 9.9 (1.7); Sent.6, n=6, 7.9 (1.8)
  Summary: 2 levels: 1<4,5,6; 2,3<5
  Differences between means and results of Fisher’s PLSD post-hoc tests: 4-1=2.6, p=.0123; 5-1=4.4, p=.0002; 5-2=3.3, p=.0064; 5-3=2.8, p=.0200; 6-1=2.4, p=.0141
C1 opening: n.s.
  Means: Sent.1, n=6, 4.2 (.7); Sent.2, n=4, 3.7 (1); Sent.3, n=4, 4.6 (.8); Sent.4, n=4, 3.5 (.7); Sent.5, n=3, 3.6 (1); Sent.6, n=6, 2.7 (1.4)
C2 closing: n.s.
  Means: Sent.1, n=6, .7 (.3); Sent.2, n=4, 1 (.3); Sent.3, n=4, 1 (.5); Sent.4, n=4, .6 (.5); Sent.5, n=3, 1.2 (.8); Sent.6, n=6, .7 (.2)
C2 opening: n.s.
  Means: Sent.1, n=6, 2.6 (.7); Sent.2, n=4, 3.3 (1.2); Sent.3, n=4, 2.2 (.7); Sent.4, n=5, 2.9 (1.4); Sent.5, n=4, 3.4 (1.1); Sent.6, n=6, 2.5 (.6)
C3 closing: n.s.
  Means: Sent.1, n=6, 5.1 (.7); Sent.2, n=4, 5.6 (1); Sent.3, n=4, 4.4 (.9); Sent.4, n=5, 5.4 (1.2); Sent.5, n=4, 5.7 (.9); Sent.6, n=6, 5 (.7)
C3 opening: F(5,22)=16.177, p<.0001
  Means: Sent.1, n=6, 16 (1.5); Sent.2, n=4, 15.1 (1.5); Sent.3, n=4, 9.3 (2.5); Sent.4, n=4, 9.8 (1.4); Sent.5, n=4, 14.8 (1.1); Sent.6, n=6, 11.2 (1.2)
  Summary: 2 levels: 3,4,6<1,2,5
  Differences between means and results of Fisher’s PLSD post-hoc tests: 1-3=6.64 (p<.0001); 1-4=6.17 (p<.0001); 1-6=4.78 (p<.0001); 2-3=5.81 (p<.0001); 2-4=5.34 (p<.0001); 2-6=3.95 (p=.0008); 5-3=5.47 (p<.0001); 5-4=5 (p=.0002); 5-6=3.61 (p=.0017)
Table 2.32. Results for post-boundary displacement (in mm) for subject E. (ANOVA, means and standard deviations,
summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of individual
sentences). The consonant string is: # C1 C2 C3].
C1 closing: F(5,21)=2.751, p=.0460
  Means: Sent.1, n=3, 2.8 (1.4); Sent.2, n=3, 3 (.4); Sent.3, n=6, 2.9 (.8); Sent.4, n=6, 5.5 (2.1); Sent.5, n=4, 5.3 (2.2); Sent.6, n=5, 5.6 (2.5)
  Summary: 2 levels: 1,3<4,6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 4-1=2.8, p=.0425; 4-3=2.6, p=.0228; 6-1=2.9, p=.0420; 6-3=2.7, p=.0240
C1 opening: could not be measured
C2 closing: could not be measured
C2 opening: could not be measured
C3 closing: could not be measured
C3 opening: F(5,26)=94.544 (p<.0001)
  Means: Sent.1, n=5, 15.1 (1.2); Sent.2, n=5, 16.6 (.8); Sent.3, n=6, 5.5 (1.2); Sent.4, n=6, 6 (1); Sent.5, n=5, 16.2 (1.2); Sent.6, n=5, 15.1 (2)
  Summary: 2 levels: 3,4<1,2,5,6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 1-3=9.63 (p<.0001); 1-4=9.04 (p<.0001); 2-3=11.14 (p<.0001); 2-4=10.55 (p<.0001); 5-3=10.76 (p<.0001); 6-3=9.64 (p<.0001); 5-4=10.18 (p<.0001); 6-4=9.05 (p<.0001)
Table 2.33. Results for post-boundary displacement (in mm) for subject R. (ANOVA, means and standard deviations,
summary of the levels of boundary strength distinguished, and Fisher’s PLSD determining the behavior of individual
sentences). The consonant string is: # C1 C2 C3].
C1 closing: n.s.
  Means: Sent.1, n=4, 7.1 (3.1); Sent.2, n=6, 7.3 (2.7); Sent.3, n=6, 7.7 (1.6); Sent.4, n=5, 8.8 (.9); Sent.5, n=6, 9.4 (2.2); Sent.6, n=6, 7.9 (5.1)
C1 opening: n.s.
  Means: Sent.1, n=4, 2.3 (.5); Sent.2, n=6, 2.5 (1); Sent.3, n=6, 2.4 (.9); Sent.4, n=5, 2.5 (.3); Sent.5, n=6, 2.7 (.7); Sent.6, n=6, 2.7 (1)
C2 closing: n.s.
  Means: Sent.1, n=4, 1.1 (.5); Sent.2, n=6, 1.3 (.9); Sent.3, n=6, 1.5 (.5); Sent.4, n=5, 1.7 (.3); Sent.5, n=6, 2.2 (.7); Sent.6, n=6, 2.1 (.3)
C2 opening: F(5,27)=3.171, p=.0222
  Means: Sent.1, n=4, 2.3 (.5); Sent.2, n=6, 3 (.5); Sent.3, n=6, 3.2 (.3); Sent.4, n=5, 3.3 (.7); Sent.5, n=6, 2.7 (.6); Sent.6, n=6, 3.5 (.6)
  Summary: 2 levels: 1<3,4,6; 5<6
  Differences between means and results of Fisher’s PLSD post-hoc tests: 3-1=.946, p=.0134; 4-1=1, p=.0117; 6-1=1.194, p=.0024; 6-5=.820, p=.0161
C3 closing: F(5,27)=2.616, p<.0470
  Means: Sent.1, n=4, 3.8 (.6); Sent.2, n=6, 4.4 (.7); Sent.3, n=6, 4.2 (.3); Sent.4, n=5, 3.9 (.7); Sent.5, n=6, 3.5 (.8); Sent.6, n=6, 4.9 (.8)
  Summary: 2 levels: 1,4,5<6; 5<2
  Differences between means and results of Fisher’s PLSD post-hoc tests: 2-5=.8, p=.0430; 6-1=1.1, p=.0284; 6-4=.9, p=.0389; 6-5=1.3, p=.0027
C3 opening: F(5,25)=149.030, p<.0001
  Means: Sent.1, n=3, 15.6 (.3); Sent.2, n=6, 15.3 (.6); Sent.3, n=5, 6.3 (.8); Sent.4, n=5, 5.7 (.5); Sent.5, n=6, 14.5 (1.3); Sent.6, n=6, 11.6 (.7)
  Summary: 3 levels: 3,4<6<1,2,5
  Differences between means and results of Fisher’s PLSD post-hoc tests: 1-3=9.28 (p<.0001); 1-4=9.92 (p<.0001); 1-6=3.97 (p<.0001); 2-3=9.01 (p<.0001); 2-4=9.65 (p<.0001); 2-6=3.7 (p<.0001); 5-3=8.19 (p<.0001); 5-4=8.83 (p<.0001); 5-6=2.88 (p<.0001); 6-3=5.31 (p<.0001); 6-4=5.95 (p<.0001)
6.5. Summary of Results
A graph showing the closing movement duration and opening movement duration results
is given in Figure 2.6. (Note that these results show consonant constriction closing and
opening movements, and post-boundary there is a vowel constriction immediately
following the boundary, so the observed lengthening is not on the first, but rather on the
second post-boundary gesture, i.e. the first consonantal constriction.)
Figure 2.6. Duration of the closing and opening movements. ‘Subject E: C1 opening to C3 closing, post-boundary movement’ refers to Subject E’s
post-boundary duration from C1 extremum to C3 extremum, as the opening and closing movements between these two extrema could not be measured.
[Figure: durations for sentences 1-6 of subjects B, R and E on a horizontal axis, with pre-boundary movements to the left of the boundary and post-boundary movements to the right. Legend: C1, C2 and C3 closing and opening movements, pre-boundary and post-boundary, plus ‘Subject E: C1 opening to C3 closing, post-boundary movement’.]
All subjects in the study show boundary adjacent lengthening and distinguish up
to three levels of boundary strength. The effect is strongest immediately at the boundary
for the pre-boundary opening movement of C1. In more detail, closest to the boundary,
for the pre-boundary C1 opening movement duration, all subjects distinguish three levels
of lengthening, such that stronger boundaries lead to more lengthening. For time-to-peak
velocity two levels of boundary strength are distinguished for this movement. Further
pre-boundary, for the C1 closing movement duration, two subjects (B and E) show two
levels and one subject (R) three levels of lengthening. For C1 closing movement time-to-peak
velocity, two subjects show an effect, such that subject E distinguishes two and
subject R three levels of lengthening. Still further away from the boundary, at the C2
opening movement, no temporal effects were observed. Finally, one subject (R) shows an
effect on the closing movement of pre-boundary C2; subject R distinguishes three levels
for time-to-peak velocity and two levels for the movement duration. There were further
effects on C2 and C3 closing movement for subject B, but these were argued not to be
boundary effects.
To summarize, the pre-boundary temporal effects (duration and time-to-peak-
velocity) extend up to two consonant constrictions for subject R and one constriction for
subjects B and E. Up to three levels of lengthening for all subjects are distinguished.
Pre-boundary spatial effects pattern similarly to the temporal effects, with effects
observed for two constrictions for subject B and one constriction for subjects E and R.
Subject R shows an effect as far as the opening movement C3, but we argue that it is not
an effect of the boundary but is coarticulatory in nature. Subject B distinguishes up to
three degrees of displacement, the two other subjects show two degrees of displacement.
Both temporal and spatial effects are such that sentences with a stronger boundary show
more lengthening and greater spatial effects.
The post-boundary effects extend to one consonant constriction C1 (which is the
second post-boundary constriction, the first being the vowel) for subject E and to two
consonant constrictions for subject B. For the closing movement, subject B has both
temporal (time-to-peak velocity) and spatial effects and distinguishes two levels of
duration and displacement. For subject E only spatial effects are observed, with two
levels of displacement. On C1 opening movement, subject B shows shortening effects for
time-to-peak velocity, and for the C2 closing movement lengthening is observed again.
Further after the boundary, for subject R from C2 opening movement to C3 opening
movement, and for subjects B and E on C3 opening movement, we observe further effects
but argue that these are segmental coarticulatory effects, not boundary effects.
7. Discussion
The study was conducted to examine the scope of effect of prosodic boundaries of
different strength. Based on previous studies and the predictions of the π-gesture model,
it was hypothesized that the strongest effects of the boundary will be on the movements
immediately adjacent to the boundary and that the magnitude of effect will decrease with
distance from the boundary.
Both pre- and post-boundary temporal and spatial effects have been observed
clearly. Mainly, these effects pattern as predicted: larger prosodic boundaries induce
more lengthening and displacement, and the effect is local and not discontinuous. In what
follows, we will discuss the results in more detail.
We will first examine the temporal results, starting with the effects of the
boundary on the pre-boundary constrictions. For subject B, these effects extend one
constriction for time-to-peak-velocity and for duration. The effect of duration on pre-boundary
C1 opening and closing movement is as predicted: stronger boundaries have
stronger effects, in terms of the levels of boundary effects that are distinguished. The
effect on time-to-peak-velocity for the C1 opening movement is mixed in that although
sentence 5 shows longer time-to-peak-velocity than sentences 1-4, it is also longer than
sentence 6; however sentence 5 does have a strong IP boundary, and as such it is
expected to have a large effect.
The results for subject E are as predicted. For this subject the scope of temporal
effect is one constriction pre-boundary for both duration and time-to-peak-velocity, with
stronger boundaries inducing more lengthening.
For subject R, who has an effect on time-to-peak-velocity and on duration for C2
and C1 closing movement and on C1 opening movement, there is a discontinuity of
effects since there are no effects on C2 opening movement. Otherwise, the pre-boundary
effect is as predicted, with stronger boundaries exhibiting more lengthening.
Post-boundary, only subject B has temporal effects, and they extend for time-to-peak-velocity
to post-boundary C2 closing movement. No temporal effects were observed
for subjects E and R. The effect on post-boundary C1 closing movement is as expected:
larger boundaries show longer time-to-peak-velocity. The effect of the boundary on C1
opening movement however is a shortening of the duration. And although it does not
reach significance, the tendency in the shortening direction is also observed on C2
opening movement for duration for this subject (B). Also there is a similar trend for C2
time-to-peak-velocity and duration for both closing and opening movement for subject R.
In Byrd et al. (2006) similar significant shortening effects have been found remotely after
a prosodic boundary and are interpreted as compensatory shortening, compensating for
the boundary-induced lengthening. They argue that this is not a direct effect of the π-
gesture, in the sense that it is not a planned effect; it is interpreted, rather, as a reaction to
the boundary-induced lengthening and an attempt to return to the gestural timing not
prosodically perturbed. Such an interpretation is also plausible for the effect observed
here. Furthermore, these effects replicate Byrd et al.’s (2006) findings in a study using a
wider variety of boundaries and different consonants.
The observed effects for subject B on C2 closing movement time-to-peak-velocity,
however, are again lengthening effects. It is not clear, however, whether these are best
understood as effects of the boundary, given that there is a shortening effect on C1
opening movement for this subject and a tendency in the shortening direction for the
duration of C2 opening movement.
A prediction of the π-gesture model is that magnitude of effect decreases with
distance from the boundary. The magnitude of effect can be seen when we compare the
strength of boundary effect on a movement closer and a movement further away from the
boundary (when there is an effect on both). The results for all such cases are shown in
Table 2.34, which shows the difference in means between two sentences for a given
variable (when two sentences differ significantly in effect across more than one
articulatory movement).
Table 2.34. Comparison of magnitude of effect (differences given in ms). Each column shows the differences between
sentence means for a given variable. Only those differences are given that occur on more than one articulatory
movement.
Subject B
Duration, pre-boundary C1 closing. Mean differences: Sentence 4-1=20.41; Sentence 5-1=20.83; Sentence 6-1=14.17
Duration, pre-boundary C1 opening. Mean differences: Sentence 4-1=160; Sentence 5-1=234; Sentence 6-1=557
Subject E
Duration, pre-boundary C1 closing. Mean differences: Sentence 6-1=87.91; Sentence 6-2=91.33; Sentence 6-3=78.08; Sentence 6-4=61.34; Sentence 6-5=62.77
Duration, pre-boundary C1 opening. Mean differences: Sentence 6-1=232; Sentence 6-2=257; Sentence 6-3=258; Sentence 6-4=169; Sentence 6-5=126
Time-to-peak velocity, pre-boundary C1 closing. Mean differences: Sentence 6-1=42; Sentence 6-2=46; Sentence 6-3=30
Time-to-peak velocity, pre-boundary C1 opening. Mean differences: Sentence 6-1=171; Sentence 6-2=186; Sentence 6-3=188
Subject R
Duration, pre-boundary C2 closing. Mean differences: Sentence 6-1=13.76; Sentence 6-2=10.65; Sentence 6-3=7.36; Sentence 6-5=9.97
Duration, pre-boundary C1 closing. Mean differences: Sentence 5-1=27.95; Sentence 6-1=66.17; Sentence 6-2=57.29; Sentence 6-3=51.47; Sentence 6-4=41.44; Sentence 6-5=38.22
Duration, pre-boundary C1 opening. Mean differences: Sentence 5-1=82; Sentence 6-1=142; Sentence 6-2=136; Sentence 6-3=119; Sentence 6-4=97; Sentence 6-5=60
Time-to-peak velocity, pre-boundary C2 closing. Mean differences: Sentence 6-1=15; Sentence 6-2=15; Sentence 6-3=13; Sentence 6-4=7; Sentence 6-5=8
Time-to-peak velocity, pre-boundary C1 closing. Mean differences: Sentence 6-1=44; Sentence 6-2=34; Sentence 6-3=33; Sentence 6-4=21; Sentence 6-5=22
Time-to-peak velocity, pre-boundary C1 opening. Mean differences: Sentence 6-1=99; Sentence 6-2=96; Sentence 6-3=90; Sentence 6-4=85; Sentence 6-5=56
We see from these results that in all cases where a comparison can be made, the
difference is qualitatively larger closer to the boundary. In terms of magnitude of effect
then, there is a decrease in the strength of effect with an increase in distance from the
boundary, as predicted by the π-gesture model.
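The comparison behind this conclusion can be sketched in a few lines of Python. The values are the duration differences for subject E from Table 2.34. Assigning the first block of values to the pre-boundary C1 closing movement and the second to the C1 opening movement is an assumption based on the scope of effects reported for this subject, and the function name is hypothetical.

```python
# Mean differences (ms) for subject E, duration, copied from Table 2.34.
# Assumption: the first block belongs to the pre-boundary C1 closing movement
# (further from the boundary) and the second to the C1 opening movement
# (adjacent to the boundary).
c1_closing = {"6-1": 87.91, "6-2": 91.33, "6-3": 78.08, "6-4": 61.34, "6-5": 62.77}
c1_opening = {"6-1": 232, "6-2": 257, "6-3": 258, "6-4": 169, "6-5": 126}


def larger_near_boundary(far, near):
    """For each sentence pair present in both movements, report whether the
    difference on the movement nearer the boundary exceeds the one further away."""
    return {pair: near[pair] > far[pair] for pair in far if pair in near}


result = larger_near_boundary(c1_closing, c1_opening)
print(result)
print(all(result.values()))
```

The same check could be run for the other subjects and for time-to-peak velocity by substituting the corresponding values from the table.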
We turn now to the spatial effects, starting with the pre-boundary effects. We see
that all subjects show an effect on C1 opening movement, and two subjects (B and R)
show an effect further than C1 opening movement, up to C1 closing movement for R and
up to C2 closing movement for B. The effects are suggested by the π-gesture model, in
which lesser overlap between consonants and vowels can yield larger spatial
displacements. Subject R further shows an effect on opening movement C3, on sentences
3 and 4. This is similar to the temporal effects for B on C3 closing movement, and as
discussed above, these do not seem to be boundary effects. Post-boundary spatial effects
are observed for two subjects (B and E) on C1 closing movement and are as predicted
with stronger boundaries causing more displacement. The scope of the spatial effects
does not pattern consistently with the scope of the temporal effects. For subject B the
scope of the spatial effects is larger than the scope of the temporal effects pre-boundary,
for subject R the scope of spatial effects is smaller than the scope of temporal effects pre-
boundary, and the spatial scope compared to the temporal scope is smaller pre-boundary
and larger post-boundary for subject E. Note however that a prior study on spatial
prosodic effects has shown greater within- and between-subject variability in the spatial
domain (Byrd, Lee, Riggs & Adams 2005).
Summarizing, the scope of effect seen in this experiment (temporal and spatial) is
up to two consonant constrictions pre-boundary and one, or possibly even two consonant
constrictions post-boundary (the effect on the second post-boundary constriction would
be the time-to-peak velocity effect on the closing movement C2 for subject B, and due to
the shortening effects observed on the C1 opening movement for this subject we are not
sure that the C2 effect is in fact a boundary effect). In terms of CV syllables, the scope of
effect is up to one whole syllable pre-boundary, and one, or arguably two syllables (up to
the onset of the second syllable) post-boundary (remember that the string examined was
donut and a sweet). These findings are schematically represented in Figure 2.7. The
effects are strongest at the boundary and decrease with distance from the boundary. Up to
three levels of boundary strength are distinguished. These results thus confirm the
original predictions. Further, temporal and spatial effects do not show the same pattern
regarding the scope of effect (see Byrd, Lee, Riggs & Adams 2005 for a similar
observation).
[Figure: schematic grid. For each subject (B, E, R), one row for temporal effects and one row for spatial effects; the columns are the closing and opening movements of the pre-boundary constrictions C3, C2, C1 (DONUT) and of the post-boundary constrictions C1, C2, C3 (AND A SWEET), separated by the boundary (#).]
Figure 2.7. Schematic representation of the scope of the effect of the boundary. ‘X’ represents the articulatory
movements for which an effect was observed. Temporal effects include both time-to-peak velocity and movement
duration. Effects for which we argued that they are not boundary effects (including compensatory shortening on C1
opening movement for subject B) are not included.
The results regarding the number of boundary strength levels distinguished by
duration, time-to-peak-velocity, or displacement do not show clear-cut sentence
groupings, in the sense that sentences group in different groups for different measures,
different subjects, and at different points in the sequence (see for example pre-boundary
C2 and C1 closing vs. opening movement for displacement for subject B, where the
sentences group differently for each of these variables, indicating that the sentences do
not form categories regarding boundary strength. Similar mismatch in the grouping of
sentences can be seen across the variables examined). Furthermore, the temporal and
spatial effects show gradiency in the sense that in the sentence comparisons, sentences
often fall between two groups. For example for subject R, pre-boundary closing
movement C1, sentences 2, 3 and 4 are between being grouped with sentence 1 and 5, in
the sense that sentences 1 through 5 show smaller duration than sentence 6, and sentence
1 smaller duration than sentence 5, but sentences 2, 3 and 4 show neither longer duration
than sentence 1 nor shorter duration than sentence 5, so they are not clearly grouped with
either sentence 1 or sentence 5. For the same subject (R), for the opening movement C1,
sentence 4 is not grouped clearly with sentences 1-3 nor with sentence 5, and for the
time-to-peak velocity for C1 closing movement, sentences 2 and 3 do not form a group
with either sentence 1 or with sentences 4 and 5. Other cases are for subject E, pre-boundary
opening movement C1, sentence 3, and sentences 1-5 for the time-to-peak
velocity. These findings suggest that while we see differences in boundary strength,
sentences do not group categorically into groups of different strength, but rather form a
continuum of boundary strength.
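The notion of a sentence falling between two groups can be made concrete with a small sketch. It encodes the significant pairwise differences described above for subject R (pre-boundary C1 closing movement duration) and lists the sentences that differ from neither member of a pair that itself differs. The function names are hypothetical and the sketch only restates the pattern reported in the text; it does not rerun the post-hoc tests.

```python
# Significant pairwise differences (shorter sentence, longer sentence) for
# subject R, pre-boundary C1 closing movement duration, as described in the text:
# sentences 1-5 are shorter than sentence 6, and sentence 1 is shorter than 5.
significant = {(1, 6), (2, 6), (3, 6), (4, 6), (5, 6), (1, 5)}


def differs(a, b):
    """True if the two sentences were reported as significantly different."""
    return (a, b) in significant or (b, a) in significant


def in_between(low, high, sentences):
    """Sentences that differ from neither `low` nor `high`,
    although `low` and `high` differ from each other."""
    if not differs(low, high):
        raise ValueError("low and high must themselves differ")
    return [s for s in sentences
            if s not in (low, high)
            and not differs(s, low)
            and not differs(s, high)]


print(in_between(1, 5, range(1, 7)))  # -> [2, 3, 4]
```

Because "not significantly different" is not a transitive relation, such in-between sentences cannot be assigned to a single group, which is the sense in which the groupings are described as gradient.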
Regarding the question of whether boundaries of different strength have different
scopes of effect, the answer here appears to be no. For subject B, the temporal effects of
the boundary for sentences 4-6 extend from the pre-boundary closing movement C1 to
post-boundary closing movement C1, so the scope is the same across different sentences
(although for post-boundary effects we cannot answer this question with certainty, as we
see lengthening at C2, which we could not determine to be boundary related or not).
Similarly for the two other subjects, the scope of temporal effect is the same across
different boundaries, as for subject E sentences 4-6 show a temporal effect of equal
scope, namely on both pre-boundary closing and opening movement C1. (Sentence 4 does
not show a boundary effect at the pre-boundary opening movement C1 but the trend
towards lengthening is there as shown in the means for sentences 1-6 for the temporal
effects.) For subject R the scope is the same for different boundaries, as both sentences 4
and 6 show an effect up to closing movement C2 (though sentence 5 does not have an
effect beyond the closing movement C1).
Regarding the symmetry of the effects, two of the three subjects (E and R) show
effects pre-boundary (subject E also shows displacement effect on post-boundary C1
closing movement), and the third subject shows effects both pre- and post-boundary.
Such individual variation might indicate that the coordination of the π-gesture with
constriction gestures can vary in a limited way across subjects—with two subjects in this
experiment coordinating the prosodic gesture earlier with respect to the lexical
sequence and the third subject more symmetrically around the juncture.
Finally, it should be noted that the results showing that the effects of prosodic
boundaries extend over a number of articulatory movements can be accounted for within
the π-gesture model, because in this model, as in Articulatory Phonology, linguistic units
have inherent temporal properties, and it is to be expected that prosodic boundary effects
will extend over a period of time due to the interval of activation of the π-gesture. The
decrease in magnitude of effect with distance from the boundary also follows from the
model due to the activation shape of the π-gesture, decreasing smoothly with distance
from the boundary.
8. Conclusions: The Scope of Effect of Prosodic Boundaries
The study examined the scope of effect of prosodic boundaries of different strength on
three pre-boundary and three post-boundary consonant constrictions. The temporal
effects for all subjects are strongest at C1 release; this is the constriction closest to the
boundary. Further away from the boundary, effects diminish, both in the number of levels
that are distinguished, and in the magnitude of effect, the latter as predicted by the π-
gesture model (Byrd & Saltzman 2003). The scope of temporal effect is one constriction
pre-boundary and one, possibly two, constrictions post-boundary for subject B. For
subject E the effect extends one constriction pre-boundary, and for subject R two
constrictions pre-boundary. For the spatial effects, the scope for subject B is two
constrictions pre-boundary and one post-boundary, subject E has an effect on one
constriction pre- and on one constriction post-boundary, and subject R has an effect on one
constriction pre-boundary. So overall, the scope of effect seen in this experiment can be
up to two consonant constrictions pre-boundary and one, possibly up to two consonant
constrictions post-boundary, or, in terms of syllables, one whole syllable pre-boundary,
and one, possibly two (up to the onset of the second syllable) post-boundary (if we
assume that the effect on C2 closing for subject B is a boundary effect). Overall,
disregarding whether the effect of the boundary is pre- or post-boundary, the effect of the
prosodic boundary on articulatory movement extends about 2 syllables for two subjects
(B and E), and one syllable for subject R. In addition to the effects of the prosodic
boundary, indications of compensatory shortening effects have been observed,
reinforcing the preliminary observations offered in Byrd et al. (2006).
Chapter 3: Gradiency and Categoricity in Prosodic Boundary Production and Perception
1. Introduction
This study considers both the production and perception of prosodic boundaries. In
particular, this chapter investigates whether the production and perception of prosodic
boundaries is categorical or gradient. This question is of interest because of the
implications it has for the theory of prosodic structure. Most theories assume a small set
of prosodic categories which are marked by categorically different prosodic boundaries
(see overview in Shattuck-Hufnagel & Turk 1996) while an alternative view suggests the
possibility of gradiently varying prosodic boundaries (Byrd & Saltzman 2003). We
evaluate these views in this chapter. The first part of this study is an articulatory study
investigating the production of twenty-four junctures elicited through a wide variety of
syntactic structures. By presenting speakers with a variety of syntactic structures, we
expect to elicit a range of prosodic boundaries, ranging from ‘no boundary’ to ‘very
strong boundary’. The second part of this study examines the perception of these same
boundaries.
2. Background: Gradiency and Categoricity in the Production and Perception of Prosodic Boundaries
While many studies have examined the phonetic properties of prosodic boundaries (see
Chapter 1) and the function of prosodic boundaries in speech perception (see e.g., Cutler,
Dahan & Donselaar 1997 for an overview), few studies have examined the production
and perception of boundaries from the point of view of gradiency/categoricity. Do the
boundaries that speakers produce and listeners perceive cluster in strength (or degree of
disjuncture) in a small number of groups, behaving categorically, or do the
production/perception boundary strength values pattern gradiently, behaving as a
continuum of values? The answer to this question should inform our conception of the
representation of prosodic structure in the speech planning and perception processes.
In the literature on prosodic structure there are two views on the question of
gradiency versus categoricity of prosodic boundaries. According to the standard view of
prosodic hierarchy (e.g., Beckman & Pierrehumbert 1986; see also e.g., Nespor & Vogel
1986, Selkirk 1986, Hayes 1989 for a categorical view of prosodic boundaries), speakers
will produce (and by extension listeners perceive) prosodic boundaries in a categorical
way and will distinguish at least three different categories, namely Intonational Phrases,
intermediate phrases and prosodic words. It is also possible that a distinction is made for
a fourth category, namely the within-a-word boundary category, for example for clitic
boundaries. Although there is no phonetic evidence for acoustic or articulatory
differences between prosodic word boundaries and clitic boundaries, it is a category that
has been postulated, for example in the ToBI transcription system.¹
A large study pertaining to this question has been conducted by Wightman et al.
(1992). They examine phrase-final lengthening in American English, addressing the
question how many levels of final lengthening can be distinguished. Three trained
labelers marked boundaries in a large pool of data with the following seven break indices
as outlined in the perceptual labeling system of Price et al. 1991: 0 = no prosodic break,
1 = prosodic word boundary, 2 = boundary marking minor grouping of words, or
accentual phrase (AP), 3 = intermediate phrase (ip) boundary, 4 = intonational phrase (IP)
boundary, 5 = breath/pause within sentence, and 6 = sentence boundary.
¹ The category Utterance has also been argued for (Hayes 1989, Nespor & Vogel 1986) but following
Beckman & Pierrehumbert’s (1986) argument that there is no phonetic evidence for this category as a
phonological phrase, in what follows we will not assume Utterance as a prosodic category.
The normalized² duration of vowels in phrase-final syllables showed that four levels
of lengthening were distinguished, namely levels 0-1, 2, 3, 4-6. This study has been taken
as the groundwork for the break indices part of the ToBI labeling system. Note, however,
that Wightman et al. point out that a larger number of prosodic boundaries might be
distinguished if cues other than final lengthening are taken into account, for example pauses
or pitch.
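The relation between the seven break indices and the four levels of lengthening reported by Wightman et al. can be written out as a small lookup table. The numbering of the lengthening levels from 1 to 4 and the names used below are assumptions made for illustration; only the grouping 0-1, 2, 3, 4-6 comes from the text.

```python
# Break indices of the perceptual labeling system, as listed in the text.
BREAK_INDEX_LABELS = {
    0: "no prosodic break",
    1: "prosodic word boundary",
    2: "minor grouping of words / accentual phrase (AP)",
    3: "intermediate phrase (ip) boundary",
    4: "intonational phrase (IP) boundary",
    5: "breath/pause within sentence",
    6: "sentence boundary",
}

# Grouping of break indices by final lengthening: 0-1, 2, 3, 4-6.
# The level numbers 1-4 are hypothetical labels, ordered by boundary strength.
LENGTHENING_LEVEL = {0: 1, 1: 1, 2: 2, 3: 3, 4: 4, 5: 4, 6: 4}


def lengthening_level(break_index):
    """Return the lengthening level that a break index falls into."""
    return LENGTHENING_LEVEL[break_index]


for index, label in BREAK_INDEX_LABELS.items():
    print(index, label, "-> level", lengthening_level(index))
```

The table makes explicit that final lengthening alone collapses seven labeled break indices into four distinguishable levels.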
In a similar study Lee & Cole (2006) examine final vowel lengthening in a part of
the Boston Radio News corpus, specifically testing for evidence for three categories (word,
intermediate phrase, Intonational Phrase). They examine four vowels, and though
individual speakers show different effects, for the speakers pooled together they find
evidence of lengthening for three vowels, two of which distinguish two categories, and one
which distinguishes all three categories. The effect is such that higher prosodic categories
induce more lengthening (i.e., word < intermediate phrase < Intonational Phrase). Although
this study to some extent shows categorical boundary production, note that only for one
vowel there is an effect showing more than two prosodic categories. Lee and Cole (2006)
also examine the effects of prosodic structure on F1 and F2, and find that overall two
prosodic boundaries can be distinguished, so these cues did not distinguish further prosodic
categories.
² The normalization measure employed takes the duration of a phone to be dependent on the mean duration,
standard deviation and articulation rate for each speaker, calculated across all occurrences for that phone.
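To give a rough idea of what normalizing phone durations by speaker involves, the following is a simplified sketch that expresses each duration as a z-score relative to the mean and standard deviation of the same phone for the same speaker. This is an assumption-laden simplification: it ignores articulation rate and is not the procedure of Wightman et al. (1992). The function name and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_durations(tokens):
    """tokens: list of (speaker, phone, duration_ms) tuples.
    Returns one z-score per token, computed within each (speaker, phone) group.
    Tokens from groups with fewer than two items or zero variance get None."""
    groups = defaultdict(list)
    for speaker, phone, duration in tokens:
        groups[(speaker, phone)].append(duration)

    stats = {}
    for key, durations in groups.items():
        if len(durations) > 1:
            stats[key] = (mean(durations), stdev(durations))

    scores = []
    for speaker, phone, duration in tokens:
        group_stats = stats.get((speaker, phone))
        if group_stats is None or group_stats[1] == 0:
            scores.append(None)
        else:
            group_mean, group_sd = group_stats
            scores.append((duration - group_mean) / group_sd)
    return scores


# Hypothetical durations (ms) of one vowel produced by one speaker.
example = [("s1", "a", 100), ("s1", "a", 120), ("s1", "a", 140)]
print(zscore_durations(example))
```

Normalizing within speaker and phone in this way removes differences in inherent segment duration, so that the remaining variation can be related to prosodic position.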
The categorical view of boundary production is also a necessary assumption if the
Strict Layering Hypothesis (Selkirk 1984, Nespor & Vogel 1986) is assumed. The Strict
Layering Hypothesis (SLH) states that each prosodic constituent immediately dominates a
prosodic constituent of a category level lower than itself, that is, prosodic recursion is not
allowed. In this view, the only differences in boundary strength that are determined by
prosodic structure arise from differences in boundary type. As Ladd (1996) points out, this
view is contradicted by evidence that listeners are able to agree on the relative strength
ordering of many prosodic boundaries.
To summarize, under the categorical view described so far, prosodic structure
consists of a small number of distinct categories, each with well-defined articulatory and
acoustic properties. The prediction for the production of prosodic boundaries is that these
distinct categories should be observed in some way in the production. The prediction for the
perception of prosodic boundaries is that listeners should perceive (and process) prosodic
boundaries in a categorical manner.
A different view hypothesizes prosodic boundary production and perception to be
gradient. Such a view corresponds to the results discussed in Ladd (1988), de Pijper and
Sanderman (1994) and Swerts (1997). Ladd (1988) examines prosodic recursion and reports
phonetic evidence (pause duration and pitch reset) showing that IP boundaries can differ in
strength, depending on their depth of embedding, contrary to what would be predicted
under the SLH. While a gradient view does not entail prosodic recursion, prosodic
recursion is a way gradience could be structurally accounted for. Another way, as pointed
out by Ladd (1996), in which boundaries of the same category but of different strength
could be accounted for is as the result of the probability with which a given prosodic
boundary (ip or IP) is likely to occur, as suggested in Pierrehumbert and Liberman (1981).
Note that in Pierrehumbert and Liberman’s (1981) view the gradiency thus (potentially)
arising is not a structural effect.
Evidence for a gradient view also comes from two perception studies. De Pijper and
Sanderman (1994) examine the perception of a large database of word boundaries and
observe that the perceived strength of word boundaries does not appear to cluster around a
limited number of target values. They view these results as suggesting a theory allowing
prosodic boundaries of the same category to be realized with different strengths. Similarly
Swerts (1997) in a study of discourse boundaries finds that listeners distinguish six degrees
of boundary strength for pause duration and two degrees for pitch reset, showing prosody to
mark several levels of boundary strength (though at the discourse level) indicating a large
number of prosodic boundaries, and in that way gradiency in the perception of boundaries.
These studies, though not designed to test categoricity or gradiency, indicate that
boundary production and boundary perception might be gradient, a view under which the
production values and the perceived boundary strength (PBS) values are expected to be
dispersed across a continuum of values.
Both a categorical and a gradient production/perception of prosodic boundaries are
compatible with the π-gesture framework, and more generally, with the Articulatory
Phonology framework (see Byrd 2006). The level of activation of π-gestures (Byrd &
Saltzman 2003) could be understood as gradient by allowing for a continuum of activation
strength values of the π-gesture, in turn yielding gradiency in the production and
presumably perception of prosodic boundaries (this would also be a way in which the
gradient production could arise without recursion – but with an explicit structural
representation). Under a categorical account of prosodic boundaries, a small number of
attractors could divide the possible activation space of prosodic boundary production into
distinct categories or regions, i.e., there would be a small number of attractors for the π-
gesture’s possible activation (see Byrd 2006), which would lead to a categorical production
and, presumably, perception of prosodic boundaries. A different option would be that the
temporal properties of boundaries vary in a gradient manner but that the co-occurring
production (or lack thereof) of specific tones (e.g., phrase accent or boundary tone) is
triggered, or licensed, when the π-gesture reaches a certain activation strength, and that it is
specifically the tonal properties of boundaries that lead to the categorical perception of
boundaries (Byrd, p.c.).
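The contrast between the gradient and the attractor-based accounts can be illustrated with a toy sketch. It is not an implementation of the π-gesture model or of any task-dynamic simulation: the activation scale from 0 to 1, the attractor values and the function names are all assumptions chosen only to show the difference between a continuum of activation strengths and a small set of regions.

```python
def gradient_activation(strength):
    """Gradient account: any activation strength in [0, 1] is realized as such."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie between 0 and 1")
    return strength


def categorical_activation(strength, attractors=(0.0, 0.5, 1.0)):
    """Categorical account: the activation settles on the nearest attractor,
    so the activation space is divided into a small number of regions."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie between 0 and 1")
    return min(attractors, key=lambda attractor: abs(attractor - strength))


for planned in (0.1, 0.3, 0.6, 0.9):
    print(planned, gradient_activation(planned), categorical_activation(planned))
```

Under the first function the produced values spread over a continuum; under the second they cluster at a few values, which is the pattern a categorical account would lead one to expect in production.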
To date, the experiments above are the only studies informing these opposing views,
and more data are needed in order to evaluate this issue. The work in this chapter evaluates
the categoricity or gradiency in the production and perception of prosodic boundaries,
thereby contributing to an informed account of their representation.
3. Experiment: Categoricity or Gradiency of Prosodic Boundaries
The goal of the experiment is to investigate the question of gradiency and categoricity in
prosodic boundary production and perception, to evaluate the theories presented in
section 2. The experiment consists of a production and a perception part. Audio recordings
collected simultaneously with articulatory kinematic data in the production part were
used as stimuli in the perception part of the study. There are 3 outcomes that, based on
the prosodic theories outlined above, we can expect: 1) Categorical
production/perception, as predicted by standard prosodic models, with three or four
different prosodic categories distinguished (e.g., Beckman & Pierrehumbert 1986) and
also compatible with an Articulatory Phonology approach to prosodic boundaries (Byrd
& Saltzman 2003), 2) gradient production/perception of prosodic boundaries (compatible
with suggestions in Ladd 1996 and the π-gesture framework of Byrd & Saltzman 2003),
and 3) gradient production but categorical perception in which the production of the
temporal properties of prosodic boundaries comes from a single distribution, but the tonal
properties of prosodic boundaries yield a categorical perception of prosodic boundaries
(as suggested in Byrd 2006).
3.1. Methods: Production
3.1.1. Stimuli and Subjects
Twenty-eight sentences were constructed, each containing the sequence “donut and.” The
constrictions of interest are [t] and [nd], with a potential boundary between them. In order
to engender variability in boundary strength, syntactic structure, phrase length before and
after the boundary and boundary position within the sentence were manipulated. Each
sentence was repeated six times (yielding a total of 168 sentences). The set of sentences
was pseudo-randomized, in blocks of 28 stimuli. Six of the stimuli were also used for a
different experiment, examining the scope of prosodic boundary effects (see Chapter 2).³
The stimuli for the experiment are shown in Table 3.1. The total number of sentences
used in this study was 480 (168 each for subjects B and E and 144 for subject R).
³ One of these six sentences was the control sentence for the experiment presented in Chapter 2 of this
dissertation. To ensure that the control sentence did not vary according to context, this sentence was also
read 6 times at the beginning of each experiment, and 6 times at the end of each experiment, in addition to
being included within the 168 sentences. Therefore, the whole experiment was 180 sentences in total. The
first subject we recorded for the experiment (subject B) yielded average durations for the control sentence
All three subjects participating (B, E, R) were native speakers of American
English with no known language deficits. They were paid for their participation and were
naïve as to the purpose of the experiment.
Table 3.1. Experiment stimuli. The boundary to be examined is between the words ‘donut and’, and ranges from
‘within word boundary’ to ‘very strong boundary’. Sentences in italics have not been collected for subject R.
1. After lunch Joyce bought a big box of donut-and-apple cereal.
2. Mary-Ann always has the donut-and-orange flavor in her fridge.
3. Would you mind getting me a donut and apple on your way back from school?
4. Johnny’d like a donut and apple for a change, instead of his usual lunch.
5. Mary’d like a donut and a sweet apple for breakfast, as always on Sundays.
6. If you get Steve a donut and a sweet apple he will be very happy.
7. All Tanya needs is a banana donut and a sweet apple, he said.
8. Tommi likes a big donut and a sweet apple, or a muffin and a tea.
9. A hot banana donut and a sweet apple biscuit is all I need.
10. Bribe Paul with a large banana donut and a sweet apple biscuit.
11. Pete brought me a humongous mocha donut and a sweet apple from Spain.
12. Scott gave her a big banana donut and a sweet open jar of cookies.
13. If you ask for a donut and a sweet boy brings you a pie, just say thank you.
14. While Ann was eating a donut and a sweet angora cat was waiting, he came.
15. It’s not that Phil didn’t like the donut and a sweet boy ate it – he stole it!
16. Eric was eating a donut and a sweet cat was playing, and then you came.
17. It’s possible that Nina didn’t like the donut and a sweet boy ate it.
18. It’s rare that the head coach offers a donut and a sweet athlete refuses.
19. The trainer who was eating a donut, and a sweet angora cat, have arrived.
20. I saw the man who never ate a donut, and a sweet cat in the garden.
21. On Sundays they like a donut, and a sweet apple biscuit on Saturdays.
22. She wondered why Alex had a donut and a sweet answer escaped her.
23. She talked to a tall boy eating a donut and a sweet lady gave them tea.
24. Susie was eating a donut, and a sweet camera was filming it all.
25. When she came, Mike took a donut. And a sweet excuse allowed him to leave.
26. The kids would like a donut?! And a sweet hostess wouldn’t even let them?
27. She was surprised Gary ate a donut. And a sweet surprise it was.
28. The princess gave the fairy a donut. And a sweet tale came to an end.
(pre-, post- and within the experiment) that did not vary from the overall average of all the control
sentences, so the pre- and post- sentences were removed from the following experiment runs, in order to
shorten the otherwise long experiment duration for the two other subjects. The twelve sentences recorded
for subject B in the pre- and post-experiment conditions were not used. In addition, four more sentences
were excluded for the 3
rd
subject (R) in order to shorten the experiment further. The sentences eliminated
were judged to be most similar in boundary strength to other sentences in the experiment.
3.1.2. Data Collection
Data were collected using a Carstens Articulograph (AG200). Three sensors were placed
on the subject, one on the tongue tip, tracking articulatory movement, and two reference
sensors on the nose and maxilla. The articulatory data were sampled at 200 Hz and
acoustic data at 16 kHz. The data were corrected for head movement (using the nose and
maxillary reference sensor tracking) and rotated to the occlusal plane. The tongue tip y
(vertical) signal was differentiated, and signals were smoothed before and after
differentiation with a 9th order Butterworth filter of cutoff frequency 15 Hz.
3.1.3. Measurements
Data were analyzed using in-house MATLAB software. For each consonant constriction,
five time points were identified from the velocity signal: the onset of the closing
movement, peak-velocity time point for the closing movement and for the opening
movement, the extremum of the closing movement (i.e., the point where the tongue tip
reaches its most extreme vertical position), and the end of the opening movement. These
time points were identified from the tongue-tip y-velocity zero-crossings and peak
velocity time points on the velocity trajectory. The extremum for the closing movement is
also by definition the beginning of the opening movement. The end of the opening
movement is also defined as the onset for the closing movement for the following
consonant constriction. This leads to a total of 9 data points for the two consonants. For
these data points the spatial position of the tongue tip vertical movement was recorded as
well.
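As a rough illustration of how landmarks of this kind can be located, the following sketch differentiates a position signal by central differences and then picks out the velocity zero-crossing and the velocity peaks. It is not the in-house MATLAB software used in this study. The function names, the assumption that the signal window has already been trimmed to one closing-opening cycle, and the convention that the closing movement has positive vertical velocity are all illustrative assumptions, and smoothing is omitted.

```python
import math
from typing import Dict, List


def velocity_from_position(position: List[float], rate_hz: float) -> List[float]:
    """Approximate velocity from equally spaced position samples.

    Interior samples use a central difference; the two end points use
    one-sided differences.
    """
    dt = 1.0 / rate_hz
    n = len(position)
    vel = [0.0] * n
    for i in range(1, n - 1):
        vel[i] = (position[i + 1] - position[i - 1]) / (2.0 * dt)
    vel[0] = (position[1] - position[0]) / dt
    vel[-1] = (position[-1] - position[-2]) / dt
    return vel


def find_landmarks(vel: List[float]) -> Dict[str, int]:
    """Return sample indices of three landmarks within one closing-opening cycle.

    Assumption (hypothetical convention): the window starts at the closing
    movement onset and ends at the opening movement end, and closing means
    positive velocity.
    """
    # First change from positive to non-positive velocity: the extremum.
    # Taking the first such crossing mirrors the rule of logging the first
    # extremum point when the constriction shows a plateau.
    extremum = next(
        i for i in range(1, len(vel)) if vel[i - 1] > 0.0 and vel[i] <= 0.0
    )
    # Highest velocity before the extremum: peak velocity of the closing movement.
    peak_closing = max(range(0, extremum), key=lambda i: vel[i])
    # Most negative velocity from the extremum on: peak velocity of the opening movement.
    peak_opening = min(range(extremum, len(vel)), key=lambda i: vel[i])
    return {
        "peak_closing": peak_closing,
        "extremum": extremum,
        "peak_opening": peak_opening,
    }
```

Multiplying an index by the sampling period (1/200 s at the articulatory sampling rate reported in section 3.1.2) converts it to a time point.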
From these data, we calculate the following derived dependent variables (see also
Figure 3.1 for a schematized representation):
• closing movement time-to-peak-velocity: the time from closing movement onset to
peak velocity
• closing movement duration: the time from closing movement onset to extremum
• opening movement time-to-peak-velocity: the time from extremum to the peak
velocity of the opening movement
• opening movement duration: the time from extremum to the end of the opening
movement
• displacement of closing movement: the distance between the closing movement
onset position and the extremum position
• displacement of the opening movement: the distance between the extremum position
and the end of the opening movement.
These variables inform us about the closing and opening movement duration and
magnitude, and time-to-peak-velocity is a good indicator of the gestural stiffness
parameter, which is affected by the π-gesture (Byrd & Saltzman 2003). For this study,
only the pre-boundary opening movement was examined, but the other variables were
collected for a perception study to be discussed in Chapter 4.
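The derived variables listed above follow directly from the five landmarks of a constriction. The sketch below shows the arithmetic. The `Landmark` type, the field names, and the millimetre unit are hypothetical choices made for the example, not taken from the analysis software.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Landmark:
    time_s: float       # time of the landmark, in seconds
    position_mm: float  # vertical tongue-tip position (unit assumed for illustration)


def derived_measures(
    onset: Landmark,
    peak_closing: Landmark,
    extremum: Landmark,
    peak_opening: Landmark,
    end: Landmark,
) -> Dict[str, float]:
    """Compute the six derived dependent variables for one constriction."""
    return {
        # time from closing movement onset to peak velocity
        "closing_time_to_peak_velocity": peak_closing.time_s - onset.time_s,
        # time from closing movement onset to extremum
        "closing_duration": extremum.time_s - onset.time_s,
        # time from extremum to peak velocity of the opening movement
        "opening_time_to_peak_velocity": peak_opening.time_s - extremum.time_s,
        # time from extremum to the end of the opening movement
        "opening_duration": end.time_s - extremum.time_s,
        # distance between onset position and extremum position
        "closing_displacement": abs(extremum.position_mm - onset.position_mm),
        # distance between extremum position and end-of-opening position
        "opening_displacement": abs(end.position_mm - extremum.position_mm),
    }
```

Because the extremum is both the end of the closing movement and the start of the opening movement, it enters four of the six measures.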
[Figure 3.1: schematic of vertical tongue tip position (y-axis) over time (x-axis) for one constriction. Labeled landmarks: closing movement onset, peak velocity, closing movement extremum/opening movement onset (velocity zero-crossing), and opening movement end/closing movement onset. Labeled intervals: closing movement duration, opening movement duration, time-to-peak-velocity, and displacement.]
Figure 3.1. Schematized representation of tongue-tip tracking and derived measurements for one constriction.
The constrictions in the sequence [T # ND] will be referred to as [C1 # C2]. For the
pre-boundary consonant C1 occasionally there was a plateau for the consonant constriction
(i.e., more than one zero-crossing for the tongue-tip extremum). In such cases, the first
extremum point was logged as the closing movement extremum. In occasional cases of
more than one velocity peak the highest peak velocity was taken as the relevant data
point. At the boundary, between C1 and C2, there were cases of a low dip with more than
one zero-crossing possibly being identified as the C1 opening movement end/C2 closing
movement onset. In these cases, the zero-crossing immediately preceding the maximum
peak velocity for C2 was logged. There were also a few instances of blended gestures. In
these cases the onset of the gesture was taken as onset of C1, while the extremum,
peak-velocity release and end of the opening movement of the blended gesture were taken as
measures for C2. The peak velocity closing movement was not used as a measure, as its
position compared to the not blended gestures did not clearly correspond to either C1 or
C2.
One token each from subject R (sentence 5) and subject E (sentence 21) were
excluded because of data collection error. In addition, constriction C2 was often followed
by the words a sweet and subject E produced [s] mostly with the constriction far away
from the position of the sensor, and as a result, tracking of C2 opening movement was in
most cases not possible. Additionally, a number of data points could not be identified by
the software. The summary of data points not available for the study is given in Table
3.2.
Table 3.2. Data points not included in the analysis (number of missing data points per production variable).
Production variable                            Total   Subject B   Subject E   Subject R
time-to-peak velocity, closing movement C1         1           0           1           0
closing movement C1                               41           4          28           9
time-to-peak velocity, closing movement C2        60           7          32          21
closing movement C2                               61           7          33          21
time-to-peak velocity, opening movement C1        41           4          28           9
opening movement C1                               41           4          28           9
time-to-peak velocity, opening movement C2        23           3           8          12
opening movement C2                              129          15          99          15
For this analysis, the opening movement of the pre-boundary (C1) constriction was used
(the other articulatory variables were used for a study in Chapter 4). This is the
articulatory movement that based on predictions of the π-gesture model is expected to
show the strongest effect of the boundary, as it is closest to the boundary, and also the
movement that in previous studies has shown the strongest boundary effect (e.g., Byrd,
Lee, Riggs & Adams 2005, Byrd, Krivokapić & Lee 2006, Chapter 2 of this dissertation).
To pool the data across subjects, the values for the production variables were
converted to z-scores, for each variable and for each speaker separately. A z-score of a
data point reflects the number of standard deviations this data point is above or below the
mean of all data points; so here a z-score above 0 represents a higher than average
duration of a variable, and a z-score below 0 represents a lower than average duration of a
variable. Pre-boundary opening movement duration data points more than 2.5 standard
deviations above or below the average were removed from the analysis (9 out of 480 data
points total).
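The per-speaker normalization and the 2.5 standard deviation cut-off can be sketched as follows. This is a minimal illustration rather than the original analysis script. The use of the sample standard deviation and the application of the cut-off to the per-speaker z-scores are assumptions, since the text does not specify either, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # (speaker label, raw value of one production variable)


def zscore_by_speaker(rows: List[Row]) -> List[Row]:
    """Convert raw values to z-scores, separately for each speaker.

    Each speaker needs at least two values that are not all identical,
    otherwise the standard deviation is undefined or zero.
    """
    by_speaker: Dict[str, List[float]] = defaultdict(list)
    for speaker, value in rows:
        by_speaker[speaker].append(value)
    stats = {s: (mean(v), stdev(v)) for s, v in by_speaker.items()}
    return [(s, (v - stats[s][0]) / stats[s][1]) for s, v in rows]


def drop_outliers(z_rows: List[Row], limit: float = 2.5) -> List[Row]:
    """Keep only data points within `limit` standard deviations of the mean."""
    return [(s, z) for s, z in z_rows if abs(z) <= limit]
```

After this step a z-score of 0 corresponds to each speaker's own average, so speakers with different overall speech rates can be pooled on a common scale.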
3.2. Methods: Perception
The acoustic data collected in the production part of the study were used as stimuli for the
perception study. In the perception part of the study, subjects were asked to rate the
strength of boundaries. The results were used to explore the categoricity/gradiency of
prosodic boundary perception.
3.2.1. Stimuli and Subjects
In order to keep the experiment within a time manageable for subjects, the sentences
produced by the three speakers were divided into two groups. From the sentences from
each of the three speakers, two lists of stimuli were prepared, each containing half of that
speaker’s produced tokens of each stimulus. In the production part of the study, each
speaker produced six tokens of each of the 24 sentences.⁴ Tokens 1, 3 and 5 (where the
number refers to the order of production) of each sentence were put into one list, and
tokens 2, 4 and 6 into the second list. Each listener was presented with one list of the
stimuli from each speaker, three lists total. Thus each subject listened to 72 sentences
produced by each speaker (3 tokens x 24 sentences), for a total of 216 sentences (72
tokens x 3 speakers). There were 30 listeners, so a total of 6480 perception ratings were
collected. All subjects were naïve to the purpose of the experiment and were paid for
their participation.
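The list construction and the resulting counts can be checked with a few lines of arithmetic. The function names below are hypothetical; the numbers are those given in the text.

```python
from typing import Dict, List


def split_token_lists(n_tokens: int = 6) -> Dict[str, List[int]]:
    """Split production tokens 1..n_tokens into odd- and even-numbered lists."""
    tokens = list(range(1, n_tokens + 1))
    return {"list_1": tokens[0::2], "list_2": tokens[1::2]}


def rating_counts(
    n_sentences: int = 24,
    tokens_per_list: int = 3,
    n_speakers: int = 3,
    n_listeners: int = 30,
) -> Dict[str, int]:
    """Count stimuli per speaker, per listener, and ratings over all listeners."""
    per_speaker = tokens_per_list * n_sentences  # stimuli heard from one speaker
    per_listener = per_speaker * n_speakers      # stimuli heard by one listener
    total = per_listener * n_listeners           # ratings collected in total
    return {"per_speaker": per_speaker, "per_listener": per_listener, "total": total}
```

With the default arguments the counts are 72 per speaker, 216 per listener, and 6480 ratings in total, matching the figures above.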
3.2.2. Data Collection
Listeners’ evaluations were collected using a computer version of the Visual Analogue
Scale (VAS). The VAS has a longstanding use in measuring clinical phenomena, e.g., the
level of pain (Wewers & Lowe 1990). In the paper version of the VAS, subjects are
presented with a line (usually horizontal, 100mm long), with the ends of the line marked
for the phenomena measured (e.g., ‘no pain’ and ‘highest pain’). The subjects respond by
marking the line in a position which corresponds to their estimate of the stimuli (for
example strength of pain). The results are evaluated by measuring the distance of the
mark from one end of the scale. The advantage of using a scale, with only the final ends
of the scale marked rather than a specific experimenter-provided number of potential
boundary strengths, is that it allows listeners a full range of answers, not limited by an
experimenter’s a priori postulated number of categories/boundary strengths.
⁴ Actually, as detailed in section 3.1.1, in the production part of the study two of the three subjects read 28
sentences, and one subject read 24 sentences. Only the 24 sentences produced by all speakers were used in
the perception part of the study.
For this study, a computer version of the VAS, as implemented by Granqvist
(1996) was used. The instructions to the subjects were as follows:
“You will hear a number of sentences. In each sentence, the word donut will
appear. Please judge how strongly connected the word donut is with the word
following it. You can listen to the sentences two times. When you have decided,
give your answer by clicking on the bar on the screen.”
The screen the subjects saw is shown in Figure 3.2. The ends of the scale were
marked as ‘weakest connection’ (signaling the strongest boundary) and ‘strongest
connection’ (signaling two very connected words, e.g., no boundary in prosodic terms).
Figure 3.2. Visual Analogue Scale.
Subjects were given a practice trial, with the purpose of exemplifying the use of ‘strong’
and ‘weak’ connection between words and making them familiar with using the scale.
They heard two sentences, She gave me a donut. And I hate donuts and Just a plain donut
n’coffee please as examples for the two ends of the scale. The two sample sentences were
spoken by a speaker different from the speakers in the production part of the study. There
were three parts of the experiment (each part corresponding to one list of sentences), with
a two minute break between each part. On average, the experiment lasted 45 minutes.
3.2.3. Measurements
For each response of the listener, the software returns a numerical value on a scale of
0-1000. In this case, this is 0 for ‘weakest connection’ (i.e., strongest boundary), and 1000
for ‘strongest connection’ (i.e., weakest boundary). In order to pool the responses across
subjects, the values were converted to z-scores. A z-score above 0 represents a higher
than average rating of the perceived connection between words, and a z-score below 0
represents a lower than average rating of the perceived connection between words, so the
lower the z-score of the perceived boundary strength (PBS) the stronger the boundary.
Results more than 2.5 standard deviations above or below the average were removed
from the analysis (3 data points total) as outliers.
3.3. Statistical Analysis
To investigate whether prosodic boundaries are produced and/or perceived in a
gradient or in a categorical manner, two evaluations were conducted on both the
production and on the perception data. For the production data, first a histogram of the
values of the C1 opening movement was plotted. While this is a good way to do an
exploratory analysis of the data, histograms can be misleading in that the number of
peaks in a histogram does not necessarily reflect the number of distributions underlying the
histogram, as one peak can arise from more than one distribution. The second analysis
conducted addresses this concern by fitting Gaussian mixture distributions to the data.
The same two evaluations were conducted for the perceived boundary strength (PBS)
values.
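The caveat about histograms can be made concrete with a small numerical sketch: a mixture of two Gaussian distributions shows a single peak when the component means lie close together. The code below counts local maxima of a mixture density on a grid. The function names and the grid settings are illustrative assumptions.

```python
import math
from typing import List, Tuple

Component = Tuple[float, float, float]  # (weight, mean, standard deviation)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x: float, components: List[Component]) -> float:
    """Density of a weighted mixture of normal distributions."""
    return sum(w * normal_pdf(x, mu, sigma) for w, mu, sigma in components)


def count_peaks(
    components: List[Component], lo: float = -6.0, hi: float = 6.0, steps: int = 2000
) -> int:
    """Count strict local maxima of the mixture density on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [mixture_pdf(x, components) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i] > ys[i - 1] and ys[i] > ys[i + 1])
```

For two equally weighted components with a common standard deviation, the mixture density has a single peak whenever the means are no more than two standard deviations apart. For example, `count_peaks([(0.5, -0.75, 1.0), (0.5, 0.75, 1.0)])` finds one peak, while moving the means to -2 and 2 yields two.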
4. Results
4.1. Results: Production
A histogram of the opening movement for C1, for all speakers pooled, is shown in Figure
3.3.
[Figure 3.3: histogram; x-axis: opening movement C1 duration in z-scores, from short (left) to long (right); y-axis: count.]
Figure 3.3. Histogram of the duration of the pre-boundary opening movement (C1).
The results of the histogram show one peak in the distribution of boundary production.
There is one clear peak in the histogram, and one standard deviation at the peak contains
253 data points (so more than half of the data points) and contains sentences in the range
from within word boundary to IP boundary.
The number of categories produced by the speakers was further examined by
fitting mixture distributions to the production data.⁵ This is a statistical method for
determining the number of distributions, in this case Gaussian distributions, in a data set.
A number of distributions are fitted to the data (up to ten in this study) to determine
which number gives the best fit to the data. To find the best fit, the Bayesian Information
Criterion (BIC) was used. The smaller the BIC number is, the better the model fits the
data. BIC is defined as given in (1):
(1) $\mathrm{BIC} = \log(p(x)) - 0.5 \times \mathit{nparams} \times \mathit{nsamples}$
where log(p(x)) is the log-likelihood of the data points 'x' according to the mixture model;
in other words, it represents the distribution of the data. The penalty term subtracted from
the log-likelihood is a function of the number of parameters (nparams) and number of
data points (nsamples). The parameters for one distribution are the mean, standard
deviation and weight assigned to the distribution. The total number of parameters is N
times the number of parameters for 1 distribution, where N is the number of mixture
components. Thus, the penalty term depends on the number of distributions, which means
that, even though in principle the larger the number of distributions the better the fit of
the data, the increase in the number of parameters is penalized, thus offsetting the bias of
the model towards a larger number of underlying distributions. To summarize, the BIC
score represents the difference between the true distribution of the data and the model, so
the smaller the BIC score, the better the fit of the model to the data.
⁵ The mixture model analysis for both the production and the perception data was conducted in
collaboration with Sankarnarayanan Ananthakrishnan. I am grateful to Elliot Saltzman for suggesting this
method of analysis.
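The model-selection idea can be illustrated with a small, self-contained sketch that fits one-dimensional Gaussian mixtures by expectation-maximization (EM) and scores them. It is not the software used for this analysis. The sketch uses the common textbook form of the criterion, -2 log L + k ln(n), for which a smaller score also indicates a better model; that choice, the EM routine, the deterministic initialization from sorted data, the variance floor, and all function names are assumptions made for illustration. Following the text, three parameters are counted per component.

```python
import math
import random
from typing import List, Tuple

Params = List[Tuple[float, float, float]]  # (weight, mean, variance) per component


def _point_logs(x: float, params: Params) -> List[float]:
    """Log of weight * normal density of x, for each component."""
    return [
        math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in params
    ]


def _logsumexp(values: List[float]) -> float:
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_likelihood(data: List[float], params: Params) -> float:
    return sum(_logsumexp(_point_logs(x, params)) for x in data)


def fit_gmm(data: List[float], k: int, n_iter: int = 100, min_var: float = 1e-3) -> Params:
    """Fit a k-component 1-D Gaussian mixture by EM (requires len(data) >= k)."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: split the sorted data into k contiguous chunks.
    params: Params = []
    for i in range(k):
        chunk = xs[i * n // k:(i + 1) * n // k]
        m = sum(chunk) / len(chunk)
        v = max(sum((x - m) ** 2 for x in chunk) / len(chunk), min_var)
        params.append((len(chunk) / n, m, v))
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            logs = _point_logs(x, params)
            total = _logsumexp(logs)
            resp.append([math.exp(value - total) for value in logs])
        # M-step: re-estimate weight, mean and variance of each component.
        new_params: Params = []
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            m = sum(r[j] * x for r, x in zip(resp, data)) / nj
            v = max(sum(r[j] * (x - m) ** 2 for r, x in zip(resp, data)) / nj, min_var)
            new_params.append((max(nj / n, 1e-12), m, v))
        params = new_params
    return params


def bic(data: List[float], k: int) -> float:
    """Textbook BIC (smaller is better), counting 3 parameters per component."""
    params = fit_gmm(data, k)
    return -2.0 * log_likelihood(data, params) + 3 * k * math.log(len(data))


def synthetic_sample(seed: int = 0) -> List[float]:
    """Made-up data: two well-separated groups of 150 points each."""
    rng = random.Random(seed)
    return [rng.gauss(-2.0, 0.5) for _ in range(150)] + [
        rng.gauss(2.0, 0.5) for _ in range(150)
    ]
```

Calling `bic(data, k)` for k = 1, 2, ... and comparing the scores mirrors the procedure described above; on the synthetic sample, two components score better than one.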
The result of the mixture model of the pre-boundary opening movement (C1)
duration is shown in Figure 3.4.
[Figure 3.4: BIC score (y-axis) plotted against number of distributions (x-axis).]
Figure 3.4. Opening movement (C1). The results of one learning trial of the mixture model.
The results of the mixture model indicate that there are two clusters in the production of
the pre-boundary opening movement, as seen in the lowest BIC score at two Gaussian
distributions. There is no further lowering of the BIC score with more Gaussian
distributions. The mixture model is a learning model and results vary across learning
trials, but the big benefit from one to two distributions and no further increase in benefit
with more Gaussian distributions was constant across several trials.
The results of the mixture model indicate a categorical production of prosodic
boundaries, with two prosodic categories dominating. To examine where the clusters fall
within the histogram data, a clustering analysis was conducted using the K-means
algorithm, in which the number of clusters is determined by the experimenter. The data
was clustered into 2 clusters, as determined by the BIC score. The means and variances
for the two clusters are given in Table 3.3.
Table 3.3. Means and variances for each cluster (in standard deviations).
mean variance
cluster 1 -0.446 0.202
cluster 2 1.327 0.350
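For readers unfamiliar with K-means, the following is a minimal one-dimensional sketch of the algorithm, returning the mean and variance of each cluster in the manner of Table 3.3. It is an illustration only. The initialization from evenly spaced quantiles, the use of the population variance, and the function name are assumptions, not details of the analysis reported here.

```python
from statistics import mean, pvariance
from typing import List, Tuple


def kmeans_1d(data: List[float], k: int, n_iter: int = 100) -> List[Tuple[float, float]]:
    """Cluster 1-D data into k groups; return (mean, variance) per non-empty cluster."""
    xs = sorted(data)
    n = len(xs)
    # Start the centers at evenly spaced quantiles of the data.
    centers = [xs[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters: List[List[float]] = [[] for _ in range(k)]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            clusters[nearest].append(x)
        # Update step: move each center to the mean of its cluster.
        new_centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return [(mean(c), pvariance(c)) for c in clusters if c]
```

As in the analysis above, the number of clusters `k` is supplied by the experimenter rather than estimated by the algorithm.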
The mean for cluster 1 (-0.446 standard deviation) corresponds to the big peak in the
histogram (Figure 3.3), with short pre-boundary opening movement (C1). The mean for
cluster 2 (1.327 standard deviation) corresponds to boundaries with a long pre-boundary
opening movement. We examined the sentence types that are 1 standard deviation around
the cluster means. For the second cluster, there are 37 sentences within one standard
deviation around the cluster mean, and all except four are IP sentences. For the first
cluster, one standard deviation around the mean contains 245 sentences, of which the
majority is at the ip level (with 42 sentences classified as IP boundaries, 53 as within-
word and word boundary, and 159 as ip boundaries by the ToBI transcription of the
experimenter). From this we can conclude that the two clusters in the analysis correspond
to a big boundary (IP) and a second category with predominantly ip boundaries in the
production of the sentences. These results are to some extent as predicted by standard
prosodic models, in that there are two dominant categories driving the production of
prosodic boundaries, rather than a gradient production. Note however that contrary to the
predictions of standard prosodic models, there are only two categories, and that while one
of these categories corresponds to the IP category, the status of the second category is not
clear, as it contains a large number of IP and within-word and word-boundary values, in
addition to the ip boundaries.
4.2. Results: Perception
As a first evaluation of the question whether prosodic categories are perceived in a
gradient or in a categorical manner, a histogram of the listeners’ perceived boundary
strength (PBS) values was plotted. The results of the histogram are shown in Figure 3.5.
[Figure 3.5: histogram of PBS values; the x-axis runs from strong boundaries (left) to weak boundaries (right).]
Figure 3.5. Histogram of Perceived Boundary Strength (PBS) values.
On the histogram, we see two clearly separate distributions. Each peak region (1
standard deviation around the peak of the histogram) was examined to determine which
ToBI prosodic categories it corresponds to. The sentences correspond mostly to
IP and ip boundaries respectively. It should be noted that the two peaks in the histogram
were clearly visible for various binnings of the data, not just for 30 bins as represented in
the histogram in Figure 3.5.
The number of categories perceived by listeners was further examined by fitting
mixture distributions to the perception data. The result of the mixture model of the
perception responses is shown in Figure 3.6.
[Figure 3.6: BIC score (y-axis) plotted against number of distributions (x-axis).]
Figure 3.6. PBS values. The results of one learning trial of the mixture model.
Since the mixture model is a learning procedure, with each run of the model the results
will be slightly different. Constant in each run, however, was the big benefit in the
increase from 1 to 2 distributions and the subsequent gain in the range of 3 to 8
distributions.⁶ To examine where these clusters fall within the earlier PBS histogram, a
clustering analysis was conducted using the K-means algorithm in which the number of
clusters is determined by the experimenter. The data was clustered into 2, 3, 4, 5, 6, 7 and
8 clusters, as determined by the BIC score. The means and variances for each cluster are
given in Table 3.4.
⁶ For each learning trial, the best fit was within the 3 to 8 range, but which of these distributions had the
lowest BIC score, i.e., represented the best fit, varied with learning trial.
Table 3.4. Means and variances (var.) for each cluster. The top row refers to the numbers of clusters in the analysis.
           2 clusters     3 clusters     4 clusters     5 clusters     6 clusters     7 clusters     8 clusters
             mean  var.     mean  var.     mean  var.     mean  var.     mean  var.     mean  var.     mean  var.
cluster 1  -0.789  0.202  -0.956  0.122   -1.25  0.066   -1.39  0.055    -1.5  0.042   -1.52  0.04    -1.55  0.036
cluster 2    0.99  0.222   0.095  0.105   -0.59  0.050   -0.07  0.046    0.15  0.037   -0.57  0.019   -0.28  0.018
cluster 3                   1.21  0.101    0.38  0.078    0.68  0.044    0.81  0.033    0.59  0.027    0.76  0.022
cluster 4                                  1.29  0.078   -0.75  0.031   -0.96  0.017   -0.98  0.016   -1.05  0.015
cluster 5                                                 1.37  0.059    1.41  0.053    1.14  0.02     1.25  0.016
cluster 6                                                               -0.53  0.025    1.58  0.045    1.69  0.044
cluster 7                                                                               0.01  0.029   -0.69  0.010
cluster 8                                                                                              0.23  0.023
Note that in the case of an even number of clusters (i.e., 2, 4, 6 or 8 clusters), the means
for half of the clusters are evenly dispersed across the left peak in the histogram
(corresponding to stronger boundaries). For example, when there are four clusters, the
means for cluster 1 and cluster 2 (-1.25 and -0.59, as shown in Table 3.4) are distributed
evenly across the left peak of the histogram in Figure 3.5. We will return to this in the
discussion section.
For each pair of clusters in each clustering, the cluster distance and overlap were
measured by using the formula in (2) (Duda, Hart, & Stork 2001), where the difference
between the cluster means indicates the distance between clusters and σ² refers to
variance within a cluster, and is a measure of overlap between the clusters (the larger the
variance within each cluster, i.e., the more dispersed each cluster is, the more likely the
overlap between two clusters). Therefore the larger the resulting number, the bigger the
distance and smaller the overlap between the clusters.
(2) $\dfrac{(\mathrm{mean}_{\text{cluster 1}} - \mathrm{mean}_{\text{cluster 2}})^2}{\sigma^2_{\text{cluster 1}} + \sigma^2_{\text{cluster 2}}}$
The results for the cluster distance/overlap measure for the clusterings with two to eight
clusters are shown in Table 3.5.
Table 3.5. Cluster distance. Each column block corresponds to one clustering analysis (2 to 8 clusters) and lists the two
clusters being compared (“Clusters”) and their cluster distance (“Dist.”).
2 clusters             3 clusters             4 clusters             5 clusters
Clusters       Dist.   Clusters       Dist.   Clusters       Dist.   Clusters       Dist.
cluster 1 - 2   7.46   cluster 1 - 2    4.9   cluster 1 - 2    3.6   cluster 1 - 2  17.25
                       cluster 1 - 3  20.85   cluster 1 - 3  17.83   cluster 1 - 3  43.28
                       cluster 2 - 3   6.03   cluster 1 - 4  42.73   cluster 1 - 4   4.76
                                              cluster 2 - 3   7.71   cluster 1 - 5  66.82
                                              cluster 2 - 4   28.5   cluster 2 - 3   6.25
                                              cluster 3 - 4   5.45   cluster 2 - 4      6
                                                                     cluster 2 - 5  19.75
                                                                     cluster 3 - 4  27.26
                                                                     cluster 3 - 5   4.62
                                                                     cluster 4 - 5  49.94
6 clusters             7 clusters             8 clusters
Clusters       Dist.   Clusters       Dist.   Clusters       Dist.
cluster 1 - 2  34.46   cluster 1 - 2   15.3   cluster 1 - 2  29.87
cluster 1 - 3  71.15   cluster 1 - 3  66.45   cluster 1 - 3     92
cluster 1 - 4   4.94   cluster 1 - 4   5.21   cluster 1 - 4    4.9
cluster 1 - 5  89.14   cluster 1 - 5 117.93   cluster 1 - 5 150.77
cluster 1 - 6  14.04   cluster 1 - 6 113.06   cluster 1 - 6 131.22
cluster 2 - 3   6.22   cluster 1 - 7  33.93   cluster 1 - 7  16.08
cluster 2 - 4  22.82   cluster 2 - 3  29.25   cluster 1 - 8   53.7
cluster 2 - 5  17.64   cluster 2 - 4    4.8   cluster 2 - 3  27.04
cluster 2 - 6   7.46   cluster 2 - 5  74.98   cluster 2 - 4  17.97
cluster 3 - 4  62.66   cluster 2 - 6  72.23   cluster 2 - 5  68.85
cluster 3 - 5   4.19   cluster 2 - 7   7.01   cluster 2 - 6  62.59
cluster 3 - 6  30.96   cluster 3 - 4  57.32   cluster 2 - 7      6
cluster 4 - 5  80.24   cluster 3 - 5   6.44   cluster 2 - 8   6.34
cluster 4 - 6    4.4   cluster 3 - 6  13.61   cluster 3 - 4  88.54
cluster 5 - 6  48.25   cluster 3 - 7   6.01   cluster 3 - 5   6.32
                       cluster 4 - 5 124.85   cluster 3 - 6   13.1
                       cluster 4 - 6 107.44   cluster 3 - 7   65.7
                       cluster 4 - 7  21.78   cluster 3 - 8   6.24
                       cluster 5 - 6   2.98   cluster 4 - 5 170.64
                       cluster 5 - 7  26.06   cluster 4 - 6 127.25
                       cluster 6 - 7  33.31   cluster 4 - 7   5.18
                                              cluster 4 - 8  43.12
                                              cluster 5 - 6   3.23
                                              cluster 5 - 7 144.75
                                              cluster 5 - 8  26.68
                                              cluster 6 - 7  104.9
                                              cluster 6 - 8  31.81
                                              cluster 7 - 8 25.645
As can be seen in the tables, the distances and overlap between the clusters do not
greatly change with an increase in the number of clusters, i.e., it is not the case that,
for example, the distances between clusters decrease significantly when there are 6 clusters
rather than 3. This further suggests that the 3 to 8 distributions seen in the
mixture model represent an accurate account of the data.
The results of the histogram and the mixture model suggest that while listeners
perceive two distinct categories, they are also sensitive to a more gradient structure,
namely to a larger number of prosodic boundaries (shown in the mixture model).
5. Discussion
In the first part of this chapter the categoricity in the production of prosodic boundaries
was investigated. The results of the histogram show one category in the production of
boundaries, but the results of the mixture model show that the pre-boundary opening
movement data cluster into two distinct categories. This confirms the predictions of the
standard prosodic theories (for example Beckman & Pierrehumbert 1986) to the extent
that the production is dominantly within two categories. However, the explicit ToBI
prediction is that there would be at least three and possibly four distinct prosodic
categories (one corresponding to the prosodic word and possibly one separate category
for the within-word-boundary condition), and this prediction is not borne out. Further, the
type of prosodic category for the cluster containing largely ip boundaries is not clear.
The question arises as to why there were only these two categories in the
production of prosodic boundaries, as a number of previous studies have found that several
prosodic categories can be distinguished by final lengthening (see review in Chapter 1 of
this dissertation). Our own experiment presented in Chapter 2, which is based on a small
subset of the data from the experiment presented in this chapter, has also found three
levels of lengthening for the opening movement. A reason for this discrepancy is likely to
be found in the fact that the design of the experiment focused on above-word level
boundaries, and that there might not have been enough tokens below the ip level to
distinguish a third category of boundary strength.
For the perception part of this study the histogram shows two distinct groups of
boundaries, corresponding to the IP and ip boundaries within the ToBI system. This
result suggests that listeners do perceive boundaries in distinct clusters, as predicted by
standard prosodic models (for example Beckman & Pierrehumbert 1986). However, the
prediction from this type of categorical, strict-layered model is that listeners would
perceive at least three and possibly four distinct prosodic categories as discussed above.
This is clearly not the case, neither in the histogram nor in the result of the mixture model
fitting.
The results of the fitting of multiple distributions show that although the biggest
increase in data fitting is from 1 to 2 distributions, there is also an increase from 3 to 8
distributions. This shows that there are further distributions, beyond the 2 seen in the
histogram, indicating that Ladd (1988) and Wightman et al. (1992) might be correct in
considering the possibility of a larger number of prosodic boundaries. The question arises
how these distributions could be accounted for within models assuming at most 4 distinct
prosodic categories. One possibility is that these distributions arise through recursion, as
suggested by Ladd (1996). This would mean that while the number of different prosodic
categories is two, there are more structurally, though not qualitatively, different
boundaries that occur via recursion. One indication that prosodic recursion might be the
cause of the distribution of PBS values that we see in this experiment is the distribution
of clusters mentioned above. As pointed out in section 4 the clusters of the PBS values
are distributed evenly for the left peak (stronger boundaries) of the histogram. For
example, if the clustering algorithm was specified for 6 clusters, three of these clusters
were distributed evenly within the left peak of the histogram. The center of this peak was
found to correspond to IP boundaries, so the separate clusters within this peak indicate
that listeners perceive IPs of different strengths. For the right peak, corresponding to
weaker boundaries, there was not such a clear correspondence, as for any number of
specified clusters there was a cluster mean in the range of 0 to .5 strength of PBS. It is
unclear what the theoretical status of a prosodic boundary in this area of PBS values
would be.
In sum, given that both the findings of the perception part of this study and those
of Wightman et al. (1992) indicate a large number of prosodic categories in the
perception of prosodic boundaries, theories of prosodic structure will need to consider the
cause of these prosodic categories and the possibility of recursion.
The discrepancy in the categoricity of prosodic boundary production and
perception is likely to be due to further information available to listeners in the perception
task. In addition to the temporal properties, listeners also had, for example, tonal information
available to them. We also investigated only one articulatory movement in the
production part of the study, and examining further articulatory movements at the
boundary might yield a more fine-grained view.
6. Conclusions: Categoricity or Gradiency of Prosodic Boundaries
We set out to examine whether prosodic boundaries are produced and perceived in a
categorical or in a gradient manner. While categoricity has been the assumption in many
studies (see for example Schafer 1997 for the perception of prosodic categories, Beckman
& Pierrehumbert 1986 for the production of prosodic categories), very few studies have
examined these assumptions. The production of prosodic boundaries as shown in this
experiment is categorical, showing a large (IP) and a predominantly small prosodic
boundary. The perception of prosodic boundaries showed that listeners perceive two
distinct categories, but also that the data is better explained if more underlying
distributions, up to 8, are assumed. It was suggested that these additional categories in
perceived boundary strength might arise via recursive prosodic structures.
The results of this study further demonstrate the need to examine the possibility of
structurally richer prosodic representations that allow, for example, for prosodic recursion
or some other way to derive the more gradient perception of prosodic boundaries. They also
indicate that the three or four types of prosodic boundaries assumed in standard prosodic
theories might not be phonetically justified, from the point of view of both perception and
production.
Chapter 4: Prosodic Boundary Perception and Articulation
1. Introduction
The present study investigates the perception of prosodic boundaries and its relation to
articulation. While research has been done on the function of prosody in speech
perception (as for example the function of prosodic boundaries in syntactic
disambiguation, e.g., Klatt 1975, Lehiste et al. 1976, Streeter 1978, Scott 1982, Schafer
1997 among many others), few studies have examined the nature of prosodic boundary
perception as an independent phenomenon (but see Swerts 1997, de Pijper & Sanderman
1994, Hansson 2003 for acoustic studies). This study focuses on how listeners’
perception is related to articulatory properties of prosodic boundaries. To this end, the
perception of twenty-four junctures ranging from ‘no boundary’ to ‘very strong
boundary’ is investigated.
2. Background: Production-Perception Link
Research on the production of prosodic structure has provided us with insight into the
temporal properties of phrase boundaries. It is well known that at prosodic boundaries,
segments exhibit initial and final acoustic lengthening (e.g., Oller 1973, Shattuck-
Hufnagel & Turk 1998). Articulatory studies have found that gestures become longer in
the vicinity of boundaries and that this effect increases cumulatively for larger prosodic
boundaries (e.g., Byrd & Saltzman 1998, Keating et al. 2004, Cho 2005). Few studies
however have investigated how these properties relate to the perception of prosodic
boundary strength. Those that have, have focused on the role of acoustic properties (e.g.,
de Pijper & Sanderman 1994, Hansson 2003, Sanderman & Collier 1995).
The study by de Pijper & Sanderman (1994) examines how listeners’ perception
of prosodic boundaries relates to acoustic properties of boundaries, in Dutch sentences
read by three speakers. They find that perceived boundary strength (PBS) is strongly
correlated with the presence of final lengthening, melodic discontinuity and reset, and
pausing, with the presence, but not the duration, of a pause being the strongest cue for
perceived boundary strength. Sanderman and Collier (1995) in a further study found that
listeners’ PBS values were correlated with presence and duration of a pause, melodic
discontinuity and declination reset, while phrase final lengthening did not show a
systematic relation to PBS values. Similarly, Hansson (2003) examines perceived
boundary strength in Swedish and finds that PBS is strongly correlated with both pause
occurrence and pause length, and to a lesser extent with F0 reset. She finds no correlation
between phrase final lengthening and PBS. Note, however, that Hansson’s phrase final
lengthening is measured in terms of articulation rate (syllables per second), a measure that
is potentially too crude to find an effect, as Hansson points out. Swerts (1997) also
examines the perception of prosodic boundaries and finds a significant correlation
between PBS and pause duration and pitch reset, and between boundary tone and PBS,
such that the proportion of low boundary tones increases with PBS. Overall, these studies
show that acoustic qualities at junctures and PBS values are strongly correlated.
A question that arises when listeners listen to complete sentences and give
boundary strength judgments is whether their judgments are based on syntactic and
semantic information or on acoustic/articulatory information. Since listeners are hearing
whole sentences, they might be judging PBS based on the articulatory information
present in the acoustic signal, or they may be assigning boundary strength based on
syntactic information, or their evaluation might be a combination of both. A few studies
indicate that listeners indeed evaluate the acoustic information in the signal. The study
reported above, de Pijper & Sanderman (1994), also investigates a delexicalized version of
the stimuli (i.e., speech rendered unintelligible), so as to be able to estimate the influence
of prosody without lexical, syntactic and semantic information. They found a high
correlation between the lexical and delexicalized version of the stimuli, showing that
listeners’ perception of boundary strength does not change depending on the availability
of semantic and lexical information (de Pijper & Sanderman 1994). Hansson (2003), on
the other hand, reports that when pause length is equal, sentence boundaries receive a
stronger PBS value than clause boundaries. This would indicate the influence of both the
acoustic and syntactic information. However, as she does not report statistical analyses,
and the differences in PBS are very small, this cannot be ascertained. Finally, Swerts
(1997) examined both a text-with-speech and a text-alone condition. A comparison of
these two conditions revealed a strong correlation between them, but it was also found
that in the speech condition there is more inter-subject agreement and that subjects are
clearer as to the exact boundary location. Overall, listeners seem to be guided by both
syntactic and semantic information, if available, and by acoustic information, although
each alone can suffice for boundary strength assignment.
The present study moves from a consideration of the role of acoustics in prosody
perception to an examination of how speakers’ perception of phrase boundaries is related
to the temporal articulatory properties of prosodic boundaries. Since articulatory
properties shape the acoustic signal, it is clear that there will be a correlation between the
articulatory properties and PBS, but the question is what the nature of this relation is. We
will consider which production variables in our dataset are the most predictive for
determining PBS values. The production variables examined are the pre- and post-
boundary closing movement durations, opening movement durations and their
corresponding time-to-peak velocities. The two consonants closest to the boundary were
chosen for this study as it was these consonants that prior production studies showed to
be most affected by prosodic boundaries (e.g., Edwards et al. 1991, Berkovits 1993a,
Cambier-Langeveld 1997, Fougeron & Keating 1997, Turk 1999, Tabain 2003b, Byrd et
al. 2006).
3. Experiment: Prosodic Boundary Perception and Articulation
The goal of the experiment is to investigate the perception of prosodic boundaries of
different strengths. We consider how listeners’ perception relates to the articulatory
properties of boundaries and whether listeners perceive boundaries in a categorical or in a
gradient manner. The experiment consists of a production and a perception part.
Articulatory data collected in the production part were used as stimuli in the perception
part of the study. The data presented in this chapter are a subset of the data presented in
Chapter 3.
3.1. Methods: Production
3.1.1. Stimuli and Subjects
Twenty-four sentences from the experiment described in Chapter 3 were used as stimuli
for this experiment (see Chapter 3 for details of data collection). As a reminder, this was
an articulatory experiment, using movement-tracking (EMA) of tongue-tip movement.
Each sentence contained the words ‘donut and,’ with a potential boundary of varying
strength between them, ranging from ‘within word boundary’ to ‘very strong boundary’.
The constrictions of interest are [T] and [ND] (referred to in the remaining part of this
chapter as C₁ and C₂ respectively). Three subjects participated. Two of the subjects read
28 sentences, and one subject read 24 sentences. The 24 sentences read by all three
subjects were used in the perception study. Each sentence was repeated six times,
yielding a total of 144 (24 x 6) sentences per subject, and 432 altogether (144 x 3). The
list of stimuli is given in Table 4.1, ordered in predicted ascending boundary strength.
Table 4.1. Stimuli for the experiment. The boundary to be examined is between the words ‘donut and’, and ranges from
‘no boundary’ to ‘very strong boundary’.
1. After lunch Joyce bought a big box of donut-and-apple cereal.
2. Mary-Ann always has the donut-and-orange flavor in her fridge.
3. Would you mind getting me a donut and apple on your way back from school?
4. Johnny’d like a donut and apple for a change, instead of his usual lunch.
5. Mary’d like a donut and a sweet apple for breakfast, as always on Sundays.
6. All Tanya needs is a banana donut and a sweet apple, he said.
7. A hot banana donut and a sweet apple biscuit is all I need.
8. Pete brought me a humongous mocha donut and a sweet apple from Spain.
9. If you ask for a donut and a sweet boy brings you a pie, just say thank you.
10. While Ann was eating a donut and a sweet angora cat was waiting, he came.
11. It’s not that Phil didn’t like the donut and a sweet boy ate it – he stole it!
12. Eric was eating a donut and a sweet cat was playing, and then you came.
13. It’s possible that Nina didn’t like the donut and a sweet boy ate it.
14. It’s rare that the head coach offers a donut and a sweet athlete refuses.
15. The trainer who was eating a donut, and a sweet angora cat, have arrived.
16. I saw the man who never ate a donut, and a sweet cat in the garden.
17. On Sundays they like a donut, and a sweet apple biscuit on Saturdays.
18. She wondered why Alex had a donut and a sweet answer escaped her.
19. She talked to a tall boy eating a donut and a sweet lady gave them tea.
20. Susie was eating a donut, and a sweet camera was filming it all.
21. When she came, Mike took a donut. And a sweet excuse allowed him to leave.
22. The kids would like a donut?! And a sweet hostess wouldn’t even let them?
23. She was surprised Gary ate a donut. And a sweet surprise it was.
24. The princess gave the fairy a donut. And a sweet tale came to an end.
3.1.2. Measurements
The measurements conducted for the production part of the study are described in
Chapter 3. From these measurements, the following dependent spatiotemporal variables
are derived for each of the two constrictions (see Figure 4.1 for a schematized
representation):
• closing movement time-to-peak-velocity: the time from closing movement onset to
peak velocity
• closing movement duration: the time from closing movement onset to extremum
• opening movement time-to-peak-velocity: the time from extremum to the peak
velocity of the opening movement
• opening movement duration: the time from extremum to the end of the opening
movement
[Figure 4.1: vertical tongue tip position plotted against time, marking the closing movement onset, peak velocity, the closing movement extremum/opening movement onset (velocity zero-crossing), and the opening movement end/closing movement onset, together with the closing movement duration, opening movement duration, and time-to-peak velocity intervals.]
Figure 4.1. Schematized representation of tongue-tip tracking and derived measurements for one constriction.
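As a rough illustration of how the four measures listed above relate to a movement trajectory, the following Python sketch derives them from a sampled tongue-tip height signal. The function name, the synthetic trajectory, and the assumption that the first and last samples coincide with the closing movement onset and the opening movement end are illustrative only; the landmark criteria actually used in the study are those described in Chapter 3.

```python
import math

def movement_measures(position, dt):
    """Temporal measures for one closing-opening cycle of tongue-tip height.

    Assumes (for this sketch only) that the first sample is the closing
    movement onset and the last sample is the opening movement end.
    dt is the sampling interval in seconds.
    """
    n = len(position)
    # Central-difference velocity, one-sided at the two edges.
    velocity = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        velocity.append((position[hi] - position[lo]) / ((hi - lo) * dt))
    # Constriction extremum: highest tongue-tip position.
    extremum = max(range(n), key=lambda i: position[i])
    # Fastest upward sample up to the extremum, fastest downward sample after it.
    closing_peak = max(range(extremum + 1), key=lambda i: velocity[i])
    opening_peak = min(range(extremum, n), key=lambda i: velocity[i])
    return {
        "closing_time_to_peak_velocity": closing_peak * dt,
        "closing_duration": extremum * dt,
        "opening_time_to_peak_velocity": (opening_peak - extremum) * dt,
        "opening_duration": (n - 1 - extremum) * dt,
    }

# Hypothetical trajectory: a 200 ms rise followed by a 300 ms fall, sampled every 5 ms.
dt = 0.005
rise = [0.5 * (1 - math.cos(math.pi * i / 40)) for i in range(41)]
fall = [0.5 * (1 + math.cos(math.pi * j / 60)) for j in range(1, 61)]
measures = movement_measures(rise + fall, dt)
```

For this synthetic cycle the closing and opening movement durations come out at about 200 ms and 300 ms, with peak velocity reached halfway through each movement.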
Not all the data points could be measured (see Chapter 3). From 432 tokens (24 sentences
x 6 repetitions x 3 speakers) for each of the eight production variables (four for each
constriction), the following data points were missing: 41 tokens of closing movement C₁,
1 token of time-to-peak velocity closing movement C₁, 61 tokens of closing movement
C₂, 60 tokens of time-to-peak velocity closing movement C₂, 41 tokens of opening
movement C₁, 41 tokens of time-to-peak velocity opening movement C₁, 129 tokens of
opening movement C₂, and 23 tokens of time-to-peak velocity opening movement C₂.
These eight variables were used to explore the link between the perception of
prosodic boundaries and the articulatory properties of prosodic boundaries. To pool the
data, the results were converted to z-scores, for each of the variables, for each of the
speakers separately. A z-score of a data point reflects the number of standard deviations
this data point is above or below the mean; so here a z-score above 0 represents a higher
than average duration of a variable, and a z-score below 0 represents a lower than average
duration of a variable.
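The per-speaker standardization described above can be sketched as follows. This is a minimal illustration, not the procedure as implemented in the study: the function name and the duration values are hypothetical, and the use of the sample standard deviation (rather than the population standard deviation) is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev

def zscore_within_groups(records):
    """records: list of (group, value) pairs; returns z-scores in the same order.

    Each value is standardized against the mean and sample standard deviation
    of its own group (here, its own speaker). The sample SD is an assumption.
    """
    by_group = defaultdict(list)
    for group, value in records:
        by_group[group].append(value)
    stats = {g: (mean(v), stdev(v)) for g, v in by_group.items()}
    return [(value - stats[g][0]) / stats[g][1] for g, value in records]

# Hypothetical opening movement durations (ms) for two speakers.
durations = [("S1", 100.0), ("S1", 120.0), ("S1", 140.0),
             ("S2", 200.0), ("S2", 260.0), ("S2", 320.0)]
z = zscore_within_groups(durations)
```

Although the second hypothetical speaker has much longer raw durations, both speakers end up on the same scale (-1, 0, 1), which is what allows the data to be pooled.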
3.2. Methods: Perception
The acoustic data collected in the production part of the study were used as stimuli for the
perception study. In the perception part of the study, subjects were asked to rate the
strength of boundaries of these sentences. The perception ratings collected in the study
reported in Chapter 3 were used for this experiment as well (see Chapter 3 for details on
the data collection).
3.2.1. Stimuli, Subjects, Data Collection and Measurements
A detailed report on the stimuli, subjects, data collection and measurements is given in
Chapter 3, section 3.2, and we only give a brief reminder of the procedures here. As
reported in Chapter 3, in order to keep the experiment within a time manageable for
subjects, the sentences produced by the three speakers were divided into two groups.
From the sentences from each of the three speakers, two lists of stimuli were prepared,
each containing half of that speaker’s produced tokens of each stimulus. In the production
part of the study, each speaker produced six tokens of each of the 24 sentences. Tokens 1,
3 and 5 (where the number refers to the order of production) of each sentence were put
into one list, and tokens 2, 4 and 6 into the second list. Each listener was presented with
one list of the stimuli from each speaker, three lists total. Thus each subject listened to 72
sentences produced by each speaker (3 tokens x 24 sentences), for a total of 216
sentences (72 tokens x 3 speakers). There were 30 listeners, so a total of 6480 perception
ratings were collected. All subjects were naïve to the purpose of the experiment and were
paid for their participation.
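The division of tokens into two lists and the resulting counts can be retraced with a short Python sketch. The speaker labels and variable names are hypothetical; the numbers are those given above.

```python
SPEAKERS = ["A", "B", "C"]  # hypothetical labels for the three speakers
N_SENTENCES, N_TOKENS, N_LISTENERS = 24, 6, 30

def build_lists(speaker):
    """Split one speaker's tokens into two lists by order of production."""
    sentences = range(1, N_SENTENCES + 1)
    odd = [(speaker, s, t) for s in sentences for t in range(1, N_TOKENS + 1, 2)]   # tokens 1, 3, 5
    even = [(speaker, s, t) for s in sentences for t in range(2, N_TOKENS + 1, 2)]  # tokens 2, 4, 6
    return odd, even

lists = {speaker: build_lists(speaker) for speaker in SPEAKERS}
per_speaker = len(lists["A"][0])               # sentences heard per speaker
per_listener = per_speaker * len(SPEAKERS)     # sentences heard per listener
total_ratings = per_listener * N_LISTENERS     # ratings collected overall
total_tokens = N_SENTENCES * N_TOKENS * len(SPEAKERS)
mean_ratings_per_token = total_ratings / total_tokens
```

With 6480 ratings spread over 432 tokens, each token receives 15 ratings on average.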
Listeners evaluated boundary strength using a computer version of the Visual
Analogue Scale (VAS), as implemented by Granqvist (1996) (see Chapter 3 for more
details).
The instructions to the subjects were as follows:
“You will hear a number of sentences. In each sentence, the word donut will
appear. Please judge how strongly connected the word donut is with the word
following it. You can listen to the sentences two times. When you have decided,
give your answer by clicking on the bar on the screen.”
Subjects saw the screen shown in Figure 4.2. The ends of the scale were marked as
‘weakest connection’ (signaling the strongest boundary) and ‘strongest connection’
(signaling two very connected words, e.g., no boundary in prosodic terms).
Figure 4.2. Visual Analogue Scale.
For each response of the listener, the software returns a numerical value on a scale
of 0-1000 (in this case, 0 for ‘weakest connection’, i.e., strongest boundary, and 1000 for
‘strongest connection’, i.e., weakest boundary). These values were converted to z-scores
in order to pool the responses across subjects. A z-score above 0 represents a higher than
average rating of the perceived connection between words, and a z-score below 0
represents a lower than average rating of the perceived connection between words, so the
lower the z-score of the perceived boundary strength (PBS) the stronger the boundary.
3.3. Statistical Analysis
In order to determine which of the production variables best predict the perceived
boundary strength, a stepwise multiple regression analysis was conducted, fitting
production variables to perceived boundary strength values. There are four production
variables considered for each of the pre- and post-boundary consonant constrictions: closing
and opening movement duration, and time-to-peak velocity for the closing and opening
movements. Of these, the closing movement time-to-peak velocity and the duration of the
closing movement are not independent of each other, and likewise the opening movement
time-to-peak velocity and the duration of the opening movement are not independent, and
therefore only one of them can be used in the regression analysis. To determine whether
the time-to-peak velocity or the corresponding closing/opening movement duration is the
better predictor of PBS, a linear regression was first performed separately for each of
these variables. Based on this, the variables determined to give the better fit to the PBS
values were then used in the stepwise multiple regression analysis.
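The first stage of this procedure, choosing between two non-independent predictors by the fit of a simple linear regression, can be sketched in Python as below. The helper names and all data values are hypothetical, and the subsequent stepwise multiple regression is not shown.

```python
from statistics import mean

def r_squared(x, y):
    """R² of a simple linear regression of y on x (the squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def pick_better(candidates, pbs):
    """candidates: dict mapping a variable name to its values.

    Returns the name of the variable whose simple regression on PBS has the highest R².
    """
    return max(candidates, key=lambda name: r_squared(candidates[name], pbs))

# Hypothetical z-scored values, for illustration only.
pbs = [1.2, 0.8, 0.1, -0.4, -1.1, -1.3]
pair = {
    "opening_duration": [-1.0, -0.9, 0.0, 0.3, 1.0, 1.4],
    "opening_time_to_peak_velocity": [-0.2, -1.0, 0.9, -0.5, 0.4, 1.1],
}
chosen = pick_better(pair, pbs)
```

In this made-up example the duration variable fits the PBS values far better than its time-to-peak velocity counterpart, so only the duration would be carried forward.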
Note that regression models assume a normal distribution, although they are fairly
robust against deviations from normality. Since the perception data in this study do not
show a normal distribution (as can be seen in Chapter 3, Figure 3.5), but a bimodal one, a
binomial fit was also done to verify the validity of the multiple regression fit.
4. Results
The results of the linear regression, for each of the four variables, are shown in Table 4.2.
The closing and opening movement durations are a better fit to the PBS values than the
closing and opening movement time-to-peak velocity in three out of four cases, so the
closing and opening movement duration will be used for further analyses (fitting stepwise
multiple regressions).
Table 4.2. Results of linear regression. C₁ refers to the pre-boundary, and C₂ to the post-boundary constriction.
closing movement time-to-peak velocity
    C₁: R²=.037, standardized coefficient=-.191, F(1,6463)=244.993, p<.0001
    C₂: R²=.024, standardized coefficient=-.155, F(1,5578)=137.126, p<.0001
closing movement duration
    C₁: R²=.048, standardized coefficient=-.220, F(1,5863)=298.275, p<.0001
    C₂: R²=.018, standardized coefficient=-.134, F(1,5563)=101.186, p<.0001
opening movement time-to-peak velocity
    C₁: R²=.064, standardized coefficient=-.253, F(1,5863)=402.041, p<.0001
    C₂: R²=.089, standardized coefficient=.299, F(1,6133)=602.423, p<.0001
opening movement duration
    C₁: R²=.110, standardized coefficient=-.331, F(1,5863)=721.132, p<.0001
    C₂: R²=.101, standardized coefficient=.318, F(1,4543)=511.829, p<.0001
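The values in Table 4.2 can be related to the token counts given earlier. The sketch below assumes that every token was rated by exactly 15 listeners (6480 ratings over 432 tokens) and that no ratings are missing for other reasons; under that assumption the residual degrees of freedom follow from the number of unmeasurable tokens, and, as in any simple linear regression, R² equals the squared standardized coefficient up to rounding. The dictionary and function names are hypothetical.

```python
TOTAL_RATINGS = 6480     # 30 listeners x 216 sentences
RATINGS_PER_TOKEN = 15   # assumption: 6480 ratings spread evenly over 432 tokens

def residual_df(missing_tokens):
    """Residual degrees of freedom (n - 2) of a simple regression on the remaining ratings."""
    n = TOTAL_RATINGS - RATINGS_PER_TOKEN * missing_tokens
    return n - 2

# name: (missing tokens, reported residual df, reported standardized coefficient, reported R²)
table_4_2 = {
    "closing time-to-peak velocity C1": (1, 6463, -0.191, 0.037),
    "closing duration C1": (41, 5863, -0.220, 0.048),
    "opening time-to-peak velocity C1": (41, 5863, -0.253, 0.064),
    "opening duration C1": (41, 5863, -0.331, 0.110),
    "closing time-to-peak velocity C2": (60, 5578, -0.155, 0.024),
    "closing duration C2": (61, 5563, -0.134, 0.018),
    "opening time-to-peak velocity C2": (23, 6133, 0.299, 0.089),
    "opening duration C2": (129, 4543, 0.318, 0.101),
}

# For each variable: does the df match, and does R² match the squared coefficient?
checks = {
    name: (residual_df(missing) == df, abs(beta ** 2 - r2) < 0.001)
    for name, (missing, df, beta, r2) in table_4_2.items()
}
```

Both checks hold for all eight variables, which is consistent with the differing degrees of freedom in the table reflecting the differing numbers of missing tokens.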
Figures 4.3 to 4.6 show the regression plots for the four variables used in further
analyses. Remember that subjects were asked to judge how strongly connected two words
are. Very strongly connected words – with a very weak or no boundary – thus received a
high score in the perception experiment, and weakly connected words – strong
boundaries – received a low score. Therefore an inverse correlation in the graph means
that boundaries perceived as strong (low z-scores) correspond to longer duration of the
production variable. A positive correlation means that boundaries perceived as strong
(low z-scores) correspond to shorter duration of the production variable. A priori we
predict a negative correlation for PBS with the pre-boundary consonant durations, based
on the findings in Chapter 2.
Figure 4.3. Correlation of the pre-boundary closing movement C₁ (in the sequence VC₁#VC₂) to the perceived boundary strength.
Figure 4.4. Correlation of the pre-boundary opening movement C₁ (in the sequence VC₁#VC₂) to the perceived boundary strength.
Figure 4.5. Correlation of the post-boundary closing movement C₂ (in the sequence VC₁#VC₂) to the perceived boundary strength.
Figure 4.6. Correlation of the post-boundary opening movement C₂ (in the sequence VC₁#VC₂) to the perceived boundary strength.
The results of fitting stepwise multiple regressions (see Table 4.3) of the four production
variables to the PBS values show that the production data yield a significant fit to the
perception data (F(4,4135)=319.868, adjusted R²=.236). The strongest predictor of
perceived boundary strength is the opening movement C₁.
The contribution of each variable is shown in Table 4.3. As a reminder, the
constriction sequence was VC₁#VC₂, so C₁ is the pre-boundary constriction immediately
at the boundary, and C₂ is the post-boundary constriction, but due to the vowel preceding
it, it is not immediately at the boundary. The order of the variables according to how well
they predict PBS is the following: opening movement C₁, opening movement C₂, closing
movement C₁, and closing movement C₂. For all production variables, except for the
opening movement C₂, the correlation is such that an increase in duration of the
production variable corresponds to an increase in perceived boundary strength. In the
case of the opening movement C₂, boundaries perceived as stronger have shorter
durations of the opening movement C₂.
Note that the strong correlation between the pre-boundary opening movement C₁
and PBS is not trivial – generally, the opening movement of C₁ would include in the
acoustic signal (if present) the release of the plosive, any pause that might occur, and
occasionally the acoustic signal for the initial portion of the post-boundary vowel. This is
not the information traditionally measured in final lengthening, where lengthening in a
VC rime (when the constriction is a voiceless plosive, as here) would be measured as the
pre-boundary vowel lengthening. In other words, this articulatory variable does not
directly correspond to any particular acoustic cue nor to a traditionally examined final
lengthening measure of rime/vowel duration.
Table 4.3. Results of stepwise multiple regression
Predictor variable      Standardized coefficient    F-to-Remove
opening movement C₁     -.307                       378.217
opening movement C₂      .190                       181.703
closing movement C₁     -.135                        72.807
closing movement C₂     -.113                        62.948
As was shown in the analysis of the categoricity of prosodic boundary perception
(Chapter 3), the values of perceived boundary strength did not show a normal distribution
but a bimodal one, and therefore a binomial regression was fitted as well. The results,
shown in Table 4.4, indicate the same direction of effect as the stepwise model and the
linear regression.
Table 4.4. Results of binomial regression
closing movement duration
    C₁: R²=.064, standardized coefficient=-.297, F(2,5862)=200.271, p<.0001
    C₂: R²=.020, standardized coefficient=-.147, F(2,5562)=58.434, p<.0001
opening movement duration
    C₁: R²=.127, standardized coefficient=-.472, F(2,5862)=426.298, p<.0001
    C₂: R²=.101, standardized coefficient=.297, F(2,4542)=257.456, p<.0001
Finally, a jackknifing procedure was also conducted, where the z-scores of the
PBS values were separated into two groups, namely scores above and scores below zero.
This division of the data corresponds to the two peaks in the bimodal distribution of the
data which was observed in Chapter 3 (see histogram in Figure 3.5) – the peaks of the
histogram were approximately at 1.25 and -1.25 on the z-score scale, so dividing the data
at zero separates the two distributions. The two groups of data were then separately
statistically analyzed to evaluate whether each of the two distributions behaves like the
data set in its entirety.
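The division of the data at zero can be sketched as follows. The record layout and values are hypothetical; note that, because the ratings measure perceived connection, positive z-scores correspond to weaker boundaries. In this sketch a score of exactly zero falls into neither group, which is an assumption.

```python
def split_at_zero(rows):
    """rows: list of dicts with a 'pbs_z' key. Returns (weaker, stronger).

    Higher z-scores mean a stronger perceived connection between the words,
    so z > 0 is the weaker-boundary group and z < 0 the stronger-boundary group.
    """
    weaker = [row for row in rows if row["pbs_z"] > 0]
    stronger = [row for row in rows if row["pbs_z"] < 0]
    return weaker, stronger

# Hypothetical ratings paired with one production variable.
rows = [
    {"pbs_z": 1.3, "opening_c1_z": -0.8},
    {"pbs_z": 0.9, "opening_c1_z": -0.2},
    {"pbs_z": -1.1, "opening_c1_z": 0.7},
    {"pbs_z": -1.4, "opening_c1_z": 1.5},
    {"pbs_z": -0.6, "opening_c1_z": 0.1},
]
weaker, stronger = split_at_zero(rows)
```

Each of the two groups would then be submitted to the same regression analysis as the complete data set.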
The results of the stepwise multiple regression for weaker boundaries (i.e., z-scores
above zero) are shown in Table 4.5, and those for stronger boundaries in Table 4.6.
Table 4.5. Results of stepwise multiple regression for weaker boundaries (in the sequence VC₁#VC₂)
                       Coefficient    Std. Error    Std. Coeff.    F-to-Remove
closing movement C₁    -0.029         0.013         -0.054          4.725
closing movement C₂    -0.063         0.011         -0.137         32.52
opening movement C₁    -0.095         0.018         -0.13          28.219
opening movement C₂     0.088         0.011          0.194         68.585
The results of the jackknifed stepwise multiple regression for weaker boundaries show
that the production data yield a significant fit to the perception data [F(4,1839)=54.253,
p<.0001, adjusted R²=.104]. All variables contribute to the model, the most predictive
being the opening movement C₂, followed by the closing movement C₂, opening
movement C₁, and closing movement C₁, as can be seen by the F-to-Remove values. The
results show the same direction of the correlation for the four variables as for the
complete data set. Namely, for all variables except the opening movement C₂ a stronger
PBS corresponds to longer duration of the variable, whereas the opposite is true for the
opening movement C₂. The jackknifed stepwise multiple regression for stronger
boundaries (Table 4.6) yields a model that includes the variables closing movement
C₂ and opening movement C₁, and shows that the production data yield a significant fit to
the perception data [F(2,2293)=141.308, p<.0001, adjusted R²=.109]. The most
predictive variable is the opening movement C₁, as can be seen by the F-to-Remove
values.
The two other variables (closing movement C₁ and opening movement C₂) were
not contributing significantly to the model. Note that these production variables are
further away from the boundary than the two that contribute to the model.
Table 4.6. Results of stepwise multiple regression for strong boundaries.
                       Coefficient    Std. Error    Std. Coeff.    F-to-Remove
closing movement C₂    -0.02          0.009         -0.046           5.421
opening movement C₁    -0.136         0.008         -0.326         273.888
In the discussion of the prior literature, the question was raised whether listeners
use articulatory information in their evaluation of boundary strength, rather than syntactic
or semantic information only. An indication that listeners use phonetic information in
evaluating boundary strength comes from the variability shown in the perception data for
each sentence. Figure 4.7 shows box plots for the perception values for each sentence.¹
The left end of the box indicates the 25th percentile and the right end the 75th percentile of
the data, and the ends of the horizontal lines represent the 10th and 90th percentile of the
PBS values. The 50th percentile of the PBS values is represented by the vertical line in the
box. The circles represent the PBS values below the 10th and above the 90th percentile.
Since within each individual sentence the syntactic and semantic information are the
same, the large variability seen in the box plot analysis shows listeners’ response to
speakers’ variation in the production of the sentences, indicating that listeners responded
to phonetic, and not only to syntactic and semantic information.
¹ Note that while Figure 4.7 shows only the pooled perception data, the results of individual listeners also
show variability.
Figure 4.7. Box plots for PBS values for individual sentences. The y-axis gives the number of the individual
sentences, and the x-axis the PBS scores.
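The quantities displayed in such a box plot can be computed along the following lines. This is a sketch only: the function names are hypothetical, the data are artificial, and the "inclusive" interpolation method is an assumption, since statistical packages differ in how they interpolate percentiles.

```python
from statistics import quantiles

def boxplot_summary(values):
    """Return the 10th, 25th, 50th, 75th and 90th percentiles of the values."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points: 1st..99th percentile
    return {p: cuts[p - 1] for p in (10, 25, 50, 75, 90)}

def outside_whiskers(values):
    """Values plotted as circles: below the 10th or above the 90th percentile."""
    summary = boxplot_summary(values)
    return [v for v in values if v < summary[10] or v > summary[90]]

# Artificial data: the integers 0 to 100.
scores = list(range(101))
summary = boxplot_summary(scores)
circles = outside_whiskers(scores)
```

For this artificial data set the five percentiles are 10, 25, 50, 75 and 90, and the twenty values outside the 10th to 90th percentile range would be drawn as circles.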
To further evaluate the question whether listeners respond to phonetic information
(possibly in addition to other information, for example syntactic/semantic information), a
linear regression was performed on each of the 24 sentences separately, for each of the
four variables, although this is only a small number of data points.² The results are shown
in Table 4.7 for C₁ and 4.8 for C₂. Since in each sentence the syntactic and semantic
information are constant, any variability in boundary strength perception can only be due
to listeners’ evaluation of the details of the individual token productions. The results
show a significant regression in 24 cases (out of 96 possible, i.e., in 25% of the cases).
All the effects are in the same direction as the effects reported above, namely, longer
articulatory durations are correlated with boundaries perceived as stronger, and for the
opening movement C₂ the correlation is in the opposite direction. An exception is the
closing movement C₂ for sentence 5, where there is an inverse correlation as well. Since
the variability in evaluation cannot be due to syntactic and semantic properties, the
significant correlations we observe must be due to articulatory/acoustic information. We
can conclude that listeners did use articulatory information as heard in the acoustic signal
in evaluating boundary strength.
² I am grateful to Louis Goldstein for suggesting this analysis to me.
Table 4.7. Results of linear regressions performed on tokens of individual sentences for constriction C₁. Blank cell
indicates no significant effect.
closing movement C₁    opening movement C₁
sentence 5
    adjusted R²=.063, standardized coefficient=-.259, F(1,223)=16.024, p<.0001
    adjusted R²=.037, standardized coefficient=-.204, F(1,223)=9.694, p=.0021
sentence 8
    adjusted R²=.012, standardized coefficient=-.127, F(1,238)=3.883, p=.0499
    adjusted R²=.013, standardized coefficient=-.131, F(1,238)=4.134, p=.0431
sentence 10
    adjusted R²=.030, standardized coefficient=-.184, F(1,268)=9.421, p=.0024
sentence 11
    adjusted R²=.015, standardized coefficient=-.140, F(1,193)=3.885, p=.0501
sentence 12
    adjusted R²=.018, standardized coefficient=-.149, F(1,238)=5.405, p=.0209
sentence 13
    adjusted R²=.026, standardized coefficient=-.176, F(1,223)=7.092, p=.0083
sentence 14
    adjusted R²=.015, standardized coefficient=-.139, F(1,253)=4.996, p=.0263
sentence 15
    adjusted R²=.029, standardized coefficient=-.180, F(1,268)=8.938, p=.0031
    adjusted R²=.079, standardized coefficient=-.287, F(1,268)=24.109, p<.0001
sentence 16
    adjusted R²=.026, standardized coefficient=-.173, F(1,253)=7.828, p=.0055
sentence 17
    adjusted R²=.018, standardized coefficient=-.149, F(1,268)=6.047, p=.0146
sentence 18
    adjusted R²=.111, standardized coefficient=-.339, F(1,208)=27.073, p<.0001
sentence 19
    adjusted R²=.030, standardized coefficient=-.183, F(1,253)=8.794, p=.0033
sentence 20
    adjusted R²=.047, standardized coefficient=-.224, F(1,268)=14.219, p=.0002
    adjusted R²=.087, standardized coefficient=-.301, F(1,268)=26.720, p<.0001
sentence 24
    adjusted R²=.033, standardized coefficient=-.191, F(1,268)=10.168, p=.0016
Table 4.8. Results of linear regressions performed on tokens of individual sentences for constriction C₂. Blank cell
indicates no significant effect.
closing movement C₂    opening movement C₂
sentence 3
    adjusted R²=.021, standardized coefficient=-.161, F(1,193)=5.14, p=.0245
sentence 4
    adjusted R²=.040, standardized coefficient=.212, F(1,193)=9.107, p=.0029
sentence 5
    adjusted R²=.026, standardized coefficient=.174, F(1,223)=6.99, p=.0088
sentence 18
    adjusted R²=.024, standardized coefficient=-.170, F(1,208)=6.188, p=.0136
sentence 19
    adjusted R²=.019, standardized coefficient=.153, F(1,208)=4.974, p=.0268
sentence 21
    adjusted R²=.013, standardized coefficient=-.129, F(1,268)=4.508, p=.0347
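Because each entry in Tables 4.7 and 4.8 is a simple linear regression, the adjusted R² and the F value are related to the standardized coefficient by standard formulas. The Python sketch below (function name hypothetical) recomputes them for the first sentence 5 entry in Table 4.7; small discrepancies from the reported values are expected, since the coefficient is only given to three decimals.

```python
def simple_regression_stats(beta, residual_df):
    """Adjusted R² and F for a simple linear regression.

    beta: standardized coefficient (equal to the Pearson correlation here).
    residual_df: the second degrees-of-freedom value of the F test (n - 2).
    """
    r2 = beta ** 2
    n = residual_df + 2
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    f_value = r2 / (1 - r2) * residual_df
    return adjusted_r2, f_value

# First sentence 5 entry in Table 4.7: coefficient -.259, F(1,223)=16.024, adjusted R²=.063.
adjusted_r2, f_value = simple_regression_stats(-0.259, 223)
```

The recomputed values agree with the table to within rounding, which also supports reading the adjusted R² entries as values below 1 (e.g., .063).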
5. Discussion
The previous sections examined the link between the perception of prosodic boundaries
and the production of articulatory movements in the vicinity of boundaries.
The results of the analyses show that listeners are most sensitive to the
articulatory movements in the following order: opening movement C₁, opening
movement C₂, closing movement C₁, and they are least sensitive to the closing movement
C₂. Acoustically, most information on C₁ (a voiceless stop) is to be found in the formant
transitions during the closing movement of a constriction. During the opening C₁
movement, there is acoustic information on the release of the plosive, any pause that
might occur, and occasionally acoustic information on the post-boundary vowel.
However, although the closing movement C₁ is predictive of the perceived boundary
strength, the opening movements were found to be more informative. One interpretation
of this is that the opening C₁ movement is where most articulatory lengthening is
expected, as it is closest to the boundary and the effects of the π-gesture are strongest
there (see e.g., the production data in this study, Chapter 2, and Byrd, Krivokapić & Lee
2006). That listeners are most sensitive to variation in this articulatory movement is an
indication that the movement with the most articulatory information about the boundary
strength is of importance in perception. The second most predictive articulatory variable
is the opening C₂ movement, where the boundary strength is inversely related to
articulatory duration. This inverse relation could indicate that listeners are sensitive to the
pattern of “compensatory” shortening observed in Byrd et al. (2006) – where stronger
boundaries showed compensatory shortening further away from the boundary. This
means that the most predictive variables for PBS are those that have most salient
articulatory information – most final articulatory lengthening and “compensatory”
shortening.
The results of the analysis on the two distributions of the data, below and above
zero z-score (as seen in the analysis of the jackknifed data) confirm that the general
direction of the analysis of the complete set of data is correct. The most predictive
variables were the opening movement C₂ for small boundaries and opening movement C₁
for strong boundaries, while for the complete set of data it was the opening movement C₁
followed by the opening movement C₂. Importantly, as this was the most predictive
variable for the complete data set, the opening movement C₁ shows a significant
correlation for both small and large boundaries. However, the question arises as to why
the inverse effect, found on the opening movement C₂ when the complete set of data was
analyzed, was not observed for strong boundaries but only for weak boundaries. On the
interpretation that the inverse effect arises due to listeners’ response to “compensatory”
shortening in the production, the findings on the jackknifed data mean that listeners rely
on “compensatory” shortening effects for weak but not for strong boundaries. While
more studies need to be done to warrant a firm conclusion, one possibility is that this
could be due to weaker boundaries not exhibiting other cues as strongly as the stronger
boundaries (intonational cues for example) so listeners “use” the “compensatory”
shortening more than in the case of stronger boundaries.
The box plot and linear regression analysis of the individual sentences, conducted
to investigate whether listeners respond to phonetic information or just to
syntactic/semantic information in the sentences, showed that there was a large variation
in the PBS values for individual sentences (as seen in the box plot) and in 25% of the
cases there was a significant correlation between articulatory movement and PBS (linear
regression analyses). This indicates that listeners indeed responded to the
acoustic/articulatory information of the stimuli.
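The per-sentence regression statistics can be cross-checked against one another. The following is a minimal sketch, assuming that each regression has a single predictor (which the F(1, n - 2) degrees of freedom suggest, though this is a reading of the tables rather than something stated in the text). Under that assumption R² equals the squared standardized coefficient, and adjusted R² and F follow from it. The function name and its arguments are illustrative only.

```python
def simple_regression_stats(beta: float, df_error: int) -> tuple[float, float]:
    """Return (adjusted R^2, F) implied by a standardized coefficient
    in a one-predictor regression with df_error = n - 2."""
    r2 = beta ** 2                      # one predictor: R^2 = beta^2
    n = df_error + 2                    # number of tokens in the regression
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_error
    f_stat = r2 / (1 - r2) * df_error   # F(1, df_error)
    return adj_r2, f_stat


# Sentence 18 entry reported above: coefficient = -.339, F(1,208) = 27.073,
# adjusted R^2 = .111.
adj_r2, f_stat = simple_regression_stats(beta=-0.339, df_error=208)
print(round(adj_r2, 3), round(f_stat, 2))
```

Because the published coefficient is rounded to three decimals, the recomputed F agrees with the reported value only approximately.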
The results of the correlation between the opening movement C₂ and PBS also
bear on this question. Listeners’ responses to “compensatory” shortening are a further
indication of listeners responding to articulatory information, as it would be difficult to
account for this effect based on syntactic or semantic assignment of boundary strength.
Our findings, in addition to the results of previous studies (de Pijper & Sanderman 1994,
Swerts 1997), present good evidence that listeners use phonetic information in assessing
prosodic boundary strength.
6. Conclusions: Prosodic Boundary Perception and Articulation
This chapter addressed the link between production and perception. A significant
correlation was found between articulatory properties at the boundaries and PBS
(perceived boundary strength) values. The most predictive variables are the opening
movement C₁ and opening movement C₂. We suggest that these are articulatorily the
most informative movements, as they are the movement with the strongest boundary
effect and the movement with “compensatory” shortening.
Given that the typical acoustic measure for final lengthening involves measuring
the rime/vowel duration, this study shows the need to investigate whether the articulatory
measure we used here or a more traditional acoustic measure of final lengthening is a better
predictor of PBS values. The results of this study further demonstrate the need to
investigate the perception of prosodic boundaries as influenced not just by properties
immediately at the boundary, but also further away from the boundary, as there were
significant correlations between PBS values and articulatory movements further away
from the boundary.
Note also that the strong correlation between the pre-boundary opening movement
C₁ and PBS is not trivial: acoustically, the opening movement of C₁ is manifested as the
release of the plosive (when the release occurs), any pause that might occur, and
occasionally acoustic information on the post-boundary vowel. This is not the
information traditionally measured in final lengthening, where lengthening in a VC
rhyme when the constriction is a voiceless plosive would be measured as the vowel
lengthening. In other words, the articulatory variable does not correspond to any
particular acoustic cue, nor to a traditionally examined final lengthening measure of
rime/vowel duration.
Chapter 5: Prosodic Structure and Pause Duration
1. Introduction
The goal of the study in this chapter is to examine the effect of prosodic structure on
boundary strength as instantiated in pause duration. While many studies have examined
the effects of syntactic structure on pause duration (e.g. Cooper & Paccia-Cooper 1980,
Ferreira 1991, Strangert 1991, 1997, Terken & Collier 1992), the effect of prosodic
structure has not been investigated thoroughly (but see, for example, studies by Ferreira
1993, Horne, Strangert & Heldner 1995). In this study, the effect of prosodically complex
phrases as compared to simpler phrases on pause duration will be investigated, where
complex and simple refer to whether an utterance branches into two intonation phrases
(IP) or not, as shown in Figure 5.1.¹
a) Simple phrase (non-branching): IP/Utt dominating a single IP
b) Complex phrase (branching): IP/Utt dominating two IPs
Figure 5.1. Prosodic complexity
¹ The question of the type of prosodic phrase referred to here as IP/Utt in Figure 5.1 is open. According to
Selkirk’s Strict Layer Hypothesis (SLH), the phrase dominating the two IPs in Figure 5.1b cannot be an IP.
For the purposes of this experiment it is not important whether the category is Utterance or IP, but I follow
the argument developed in Ladd (1996) and assume that recursion is allowed.
In addition to prosodic structure effects, we will examine the effect of phrase length on
pause duration, an area that has also not been investigated thoroughly (but see Ferreira
1991, Zvonik & Cummins 2003). The larger motivation of the study is to advance our
understanding of how speakers plan and produce speech. A large body of literature,
starting in the sixties (see the overview in Goldman Eisler 1968) has investigated pauses
as a means to understand how speakers plan speech. For example, studies on the effects
of syntactic structure on pause duration have found that more complex structure leads to
longer pauses, leading investigators to the conclusion that syntactically complex phrases
are more demanding on the production system, and that longer pauses indicate the time
speakers need to plan the more complex structure (e.g., Ferreira 1991, Strangert 1997). In
our study we are specifically interested in how far ahead speakers plan an utterance and
in how phrase length, pre-boundary phrasal complexity and post-boundary phrasal
complexity interact in their effects on pause duration.
2. Background: Pause Duration
Pause duration depends on a number of factors, and consequently it has been investigated
from a number of perspectives. The most important factors for our study are the structural
factors affecting pause duration (syntactic structure and prosodic structure) and phrase
length. These have been found to affect both pause occurrence and pause duration.
Syntactic structure determines prosodic structure to a large extent – pauses often
occur at major syntactic boundaries (e.g., Cooper & Paccia-Cooper 1980, Grosjean et al.
1979, Strangert 1991 for Swedish) and certain syntactic structures always force an IP
boundary, and possibly a pause at the IP boundary (for example vocatives, appositives,
and parenthetical expressions; see Selkirk 1995).
Most studies examining the effect of syntactic structure on pause occurrence and
duration have investigated the effect of syntactic complexity (syntactic branching) and
have found that syntactically complex phrases lead to longer pauses compared to
syntactically simpler phrases (e.g., Cooper & Paccia-Cooper 1980, Grosjean et al. 1979,
Ferreira 1991, Strangert 1991, 1997 for Swedish, and Sanderman & Collier 1995 and
Terken & Collier 1992 for Dutch). Generally, it is assumed that more complex phrases
are preceded by longer pauses because the speaker needs more time to process a
syntactically complex phrase than a syntactically simpler one, so the time to initiation
of the post-boundary phrase increases, allowing planning to occur during this interval
(e.g. Ferreira 1991, Strangert 1997).
While these studies have shown that syntactic structure influences pause duration,
a large body of research indicates that prosodic structure might be a better predictor of
pause duration. A study by Gee and Grosjean (1983) examines the data collected by
Grosjean et al. (1979) and finds that pause duration can be better predicted if both
syntactic and prosodic structure are used, rather than just syntactic structure. Grosjean et
al. (1979) examine pause occurrence and show a tendency for speakers to divide phrases
into smaller chunks of equal length, even if syntactic structure would lead to a different
phrasing, indicating that rhythmical aspects of prosodic structure might be the
determining factor of pause occurrence in such cases. Ferreira (1993) examined final
lengthening and pause duration and found that these are determined by prosodic rather
than by syntactic structure. These studies lead to the conclusion that prosodic structure
might be a better predictor in investigating boundary strength than syntactic structure.
The effects of prosodic structure on pause duration are also found in a study by
Horne, Strangert and Heldner (1995) where it was found that pause duration increases
with prosodic boundary strength (see also Choi 2003). Our previous study (Krivokapić
2007) also found an effect of prosodic structure on pause length. In that study we
employed very long phrases (28 syllables before and 28 syllables after the boundary) and
found that more complex prosodic structure of an upcoming phrase leads to shorter
pauses. This surprising finding was taken to show that speakers use prosody to determine
the size of the upcoming constituent to be encoded (see also Ferreira 1991 for a similar
relation between syntactic complexity and pause duration).
Phrase length has also been found to have an impact on pause duration. In
particular Zvonik and Cummins (2003) show that the length of prosodic phrases before
and after the boundary has an effect on pause duration. Pauses shorter than 300 ms almost
exclusively occur with phrases of ten or fewer syllables before or after the boundary, and
the likelihood of a pause being shorter than 300 ms increases when both the prosodic phrase
before and the one after the boundary are shorter than ten syllables. For pauses longer than
300 ms there is no predictability (Zvonik & Cummins 2003). A study by Ferreira (1991) also
finds a length effect, in that she shows that sentence initiation time increases with the
number of phonological words in a sentence.²
In what follows we present a study which further examines the effects of prosodic
structure and phrase length on pause duration.
² Similarly, Watson and Gibson (2004) show that pause occurrence is influenced by phrase length, in that
the number of phonological phrases within a preceding and a following syntactic phrase (and thus, the
length of the syntactic phrase) is a good indicator of the likelihood of an IP boundary (and therefore of the
likelihood of pause occurrence, since IP boundaries are often marked by pauses).
3. Experiment: Prosodic Structure and Pause Duration
The goal of the experiment is to examine the effects of phrase length and prosodic
complexity (branching or non-branching phrases) on pause duration. Based on previous
studies (e.g. Terken & Collier 1992, Sanderman & Collier 1995, Strangert 1997, Zvonik
& Cummins 2002, 2003, Krivokapić 2007) we predict an effect of phrase length on pause
duration, such that longer phrases lead to longer preceding/following pauses.
The effect of prosodic complexity could yield at least two possible outcomes: 1)
phrasal complexity could lead to shorter pauses, as has been found in Krivokapić (2007),
or 2) in line with findings of studies examining the effect of syntactic structure on pause
duration, phrasal complexity could lead to longer pauses. The reason to expect this latter
effect is that the phrases examined in this study are considerably shorter than the ones in
Krivokapić (2007), in which pre-boundary and post-boundary phrases were each 28
syllables. The phrases in the study to be presented below are shorter (six, ten and 14
syllables). In this case prosodic structure might not be chunking the upcoming phrase into
smaller units for encoding, as the upcoming phrase might “as is” be short enough for the
processing system to manage in one chunk. One way in which predictions 1 and 2 could
combine is that for shorter phrases, prosodic complexity may increase pause
duration, while for longer phrases complexity may decrease pause duration. In other
words, there is a possibility that as the length of the phrase increases and the load of the
upcoming phrase increases as well, prosodic structure may chunk the upcoming phrase,
causing pause duration to become shorter for branching/complex than for
non-branching/simple phrases. In this scenario, in the long branching condition pauses would
be shorter than in the long non-branching condition, while in the short branching
condition pauses would be longer than in the short non-branching condition.
The present study was conducted using the synchronous speech paradigm
introduced by Cummins (2002, 2003, 2004). In this paradigm, two speakers, seated
facing each other, read sentences simultaneously, prompted by the experimenter. The
advantage of this method is that it reduces variability in pause placement and pause
duration without introducing artificial temporal properties into speech (Cummins 2002,
2003, 2004, Zvonik & Cummins 2002, 2003).³ Given that speakers exhibit large
variability in pause duration in solo speech, the synchronous speech method is a good
way to investigate pause duration, as the effects of linguistic structure will be less
obscured by individual variation.
³ For example, Cummins (2004) compares the ratio of boundary duration to phrase length in the synchronous
and solo conditions and finds that the ratios are similar across conditions, but the variability is reduced in
synchronous speech. He finds similar results for the ratio of phrase length of two phrases. Zvonik and
Cummins (2003) further find that both in solo and in synchronous speech speakers had the longest pause
following the longest phrase, again indicating that while reducing variability, synchronous speech does not
change fundamental properties of speech timing.
3.1. Methods
3.1.1. Stimuli and Subjects
There are three independent factors in the design of this experiment: 1) pre-boundary
branching, with the levels branching and non-branching Intonational Phrase (IP), 2)
post-boundary branching, with the levels branching and non-branching IP, and 3) surrounding
phrase length, with the levels short (six syllables before and six syllables after the
boundary), medium (ten syllables before and ten syllables after the boundary) and long
(14 syllables before and 14 syllables after the boundary). The factors are crossed, for a
total of 12 conditions (2 pre-boundary x 2 post-boundary x 3 length), as summarized in
Table 5.1. For each condition one sentence was constructed. In the branching phrases,
branching was targeted at the middle of the phrases, i.e., at the third, fifth and seventh
syllable for the short, medium and long sentences respectively. To ensure that pause
duration will not vary due to acoustic properties of phonemes, each pre-boundary phrase
ended in ‘Chap’ and each post-boundary phrase started with ‘Abe’. The stimuli are given
in Table 5.2. To avoid memorization of the sentences, there were also twelve filler
sentences, matching the experimental sentences in prosodic structure and length but with
different lexical content. Seven dyads were recorded. There were 13 repetitions of each
sentence for six dyads, and ten repetitions for one dyad. For the six dyads with 13
repetitions, only twelve repetitions were meant to be used for the analysis, but the 13th
repetition was recorded as a backup, in case one of the twelve recordings did not show
the prosodic phrasing intended by the experimenter.⁴ The sentences were randomized in
blocks of 24 sentences (12 test plus 12 filler sentences).
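The crossed design described above can be enumerated mechanically. The following is a minimal sketch; the variable names and the `branch_at` field are illustrative and not part of the original materials. It assumes only what the text states: two levels each of pre- and post-boundary branching, three phrase lengths (six, ten and 14 syllables), and branching targeted at the middle syllable of a phrase.

```python
from itertools import product

# Factor levels as described in the text; the names are illustrative.
PRE = ("non-branching", "branching")
POST = ("non-branching", "branching")
LENGTH = {"short": 6, "medium": 10, "long": 14}  # syllables per phrase

conditions = [
    {
        "pre": pre,
        "post": post,
        "length": name,
        "syllables": syllables,
        # In branching phrases, the break is targeted at the middle syllable.
        "branch_at": syllables // 2,
    }
    for pre, post, (name, syllables) in product(PRE, POST, LENGTH.items())
]

for condition in conditions:
    print(condition)
```

The list contains one entry per experimental sentence, so a block of 24 sentences corresponds to these entries plus an equal number of fillers.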
Table 5.1. Experiment conditions
pre-boundary | post-boundary | length
non-branching | non-branching | short/medium/long
non-branching | branching | short/medium/long
branching | non-branching | short/medium/long
branching | branching | short/medium/long
⁴ The 13th repetition was not used, as it turned out that for the sentences that were not produced as intended
by the experimenter, the 13th repetition had the same unintended phrasing as the erroneous sentence it was
meant to replace.
Table 5.2. Experiment stimuli.
Condition | Sentence
non-branching # non-branching (tree: IP dominating IP # IP)
Short | They were questioning Chap. Abe doubted his long tale.
Medium | The Smiths were angrily questioning Chap. Abe completely doubted his alibi.
Long | They were furiously interrogating Lucky Chap. Abe doubted his elaborate alibi very much.
non-branching # branching (tree: IP dominating IP # IP; the post-boundary IP dominates IP IP)
Short | Alex was painting Chap. Abe joined them, but left soon.
Medium | Alexander Wilson was painting Chap. Abe joined them at ten, but then left again.
Long | Jonathan Whittiker was slowly painting Lucky Chap. Abe joined them at eleven, but left again at noon.
branching # non-branching (tree: IP dominating IP # IP; the pre-boundary IP dominates IP IP)
Short | Sitting still, she watched Chap. Abe ignored both of them.
Medium | Sitting quietly, Jonathan watched Chap. Abe ignored both of them studiously.
Long | Quietly admiring, Jonathan watched Lucky Chap. Abe stealthily observed both Jonathan and Lucky Chap.
branching # branching (tree: IP dominating IP # IP; each IP dominates ip ip)
Short | Although mad, she rang Chap. Abe picked up, but called him.
Medium | Although furious, Mariette rang Chap. Abe almost answered, but called Chap instead.
Long | Although really furious, Mariette rang Lucky Chap. Abe efficiently answered, but called Lucky Chap at once.
3.1.2. Data Collection
Data from 14 speakers (7 dyads) were collected. The subjects were native speakers of
American English, all of them students at the University of Southern California. They
were paid for their participation and were naive as to the purpose of the experiment.
The two subjects were seated facing each other. Before the recording, they familiarized
themselves with the sentences. Once familiar with the sentences, the subjects were asked
to read the sentences aloud, at the prompt of the experimenter, together with their co-
speaker, as if reading a story to someone. In cases of errors, they were asked to read the
sentence again. Errors were rare.
The first four dyads were recorded on a DAT recorder, using two Shure head-mounted
unidirectional microphones. The recordings were made at a 44,100 Hz sampling
rate. These recordings were transferred to a PC onto the right and left channels of a stereo
file. The remaining three dyads were recorded direct to disk using an M-Audio FireWire
410, directly digitizing the input.
3.1.3. Measurements
In all sentences, the pre-boundary phrase ended in a vowel (/æ/) followed by a voiceless
stop, and the post-boundary phrase started with a vowel (/eɪ/). The duration of the pause
was measured from the end of the periodic voicing of the pre-boundary vowel to the
beginning of periodic voicing for the vowel of the post-boundary phrase. At the end of a
pre-boundary phrase, there was occasionally glottalization in the form of irregular voicing
(see Dilley et al. 1996). In such cases, the pause was taken to start at the end of the
irregularly voiced signal. At the end of the pause, occasionally there was a glottal pulse
preceding the regular periodic voicing for the vowel, which was also taken to be evidence
for glottalization of the phrase-initial vowel (see Redi & Shattuck-Hufnagel 2001). In
these cases, the pause was taken to end at the onset of the glottal pulse. In all other cases,
the pause was taken to start with the end of periodic voicing and to end with the onset of
periodic voicing.
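The measurement rules above amount to choosing between two possible start landmarks and two possible end landmarks. The following is a minimal sketch of that decision logic, assuming the landmarks have already been labeled by hand and are given as time stamps in milliseconds; the function and parameter names are hypothetical and do not come from the original procedure.

```python
from typing import Optional


def pause_duration_ms(
    voicing_end: float,
    voicing_onset: float,
    irregular_voicing_end: Optional[float] = None,
    glottal_pulse_onset: Optional[float] = None,
) -> float:
    """Pause duration from hand-labeled landmarks (all in milliseconds).

    voicing_end:           end of periodic voicing of the pre-boundary vowel
    voicing_onset:         onset of periodic voicing of the post-boundary vowel
    irregular_voicing_end: end of irregular voicing after the pre-boundary
                           vowel, if glottalization was present
    glottal_pulse_onset:   onset of a glottal pulse preceding the
                           post-boundary vowel, if one was present
    """
    # With phrase-final glottalization, the pause starts where the
    # irregularly voiced signal ends.
    start = irregular_voicing_end if irregular_voicing_end is not None else voicing_end
    # With a glottal pulse before the vowel, the pause ends at that pulse.
    end = glottal_pulse_onset if glottal_pulse_onset is not None else voicing_onset
    if end < start:
        raise ValueError("pause end precedes pause start; check the labels")
    return end - start


# Hypothetical landmark values, for illustration only.
print(pause_duration_ms(1000.0, 1650.0))
print(pause_duration_ms(1000.0, 1650.0, irregular_voicing_end=1040.0,
                        glottal_pulse_onset=1620.0))
```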
In order to verify that the intended prosody was used by the subjects—that is, to
verify whether they produced the branching and non-branching structures intended to be
elicited by the stimuli—the author examined the recordings using the ToBI conventions
for prosodic transcription (Beckman & Elam 1997). Pre-boundary and post-boundary
phrases were examined for whether they contained Intonational Phrases (signaling
prosodic branching) or not (signaling non-branching phrases). Intonational Phrase
boundaries were identified by phrase accent, final lengthening, and a boundary tone. In
case the prosodic structure did not correspond to the expected structure, a token was
excluded. The number of excluded tokens was nine for dyad A, nine for dyad C, seven
for dyad E, 29 for dyad G and one for dyad M. Dyads I and K had all tokens included.⁵
⁵ The large number of excluded tokens for Dyad G is largely due to subjects producing branching rather
than non-branching structures in the long branching # non-branching condition and in the medium and long
non-branching # branching condition and to subjects producing post-boundary branching structures in the
long non-branching # non-branching condition. A possible reason for these branching effects is that it is
easier to produce a rhythmic structure, where both the pre- and post-boundary phrases are branching, than a
structure where pre- and post-boundary phrases do not match (are not symmetric) in prosodic complexity.
We will return to prosodic symmetry effects in the discussion section. The unintended branching in the
post-boundary phrase of the long non-branching # non-branching condition cannot be explained in this way,
since this is an instance of prosodic symmetry.
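The exclusion counts determine how many tokens per dyad entered the analysis. The sketch below tallies them under two assumptions that the text does not state: that dyad A is the dyad with ten repetitions, and that the error degrees of freedom of a full 2 x 2 x 3 factorial ANOVA equal the number of tokens minus the 12 cells. With these assumptions the tallies reproduce the per-dyad error degrees of freedom that appear in the F statistics of the results section (for example, F(2,99) for dyad A and F(1,103) for dyad G). All names are illustrative.

```python
CELLS = 12  # 2 pre-boundary x 2 post-boundary x 3 length conditions

# Repetitions analyzed per sentence. ASSUMPTION: dyad A is the dyad with
# ten repetitions; the text does not say which dyad it was.
REPETITIONS = {"A": 10, "C": 12, "E": 12, "G": 12, "I": 12, "K": 12, "M": 12}

# Excluded tokens per dyad, as reported in the text.
EXCLUDED = {"A": 9, "C": 9, "E": 7, "G": 29, "I": 0, "K": 0, "M": 1}

included = {d: CELLS * REPETITIONS[d] - EXCLUDED[d] for d in REPETITIONS}
# ASSUMPTION: error df of a full factorial ANOVA = tokens - number of cells.
error_df = {d: n - CELLS for d, n in included.items()}

for dyad in sorted(included):
    print(dyad, included[dyad], error_df[dyad])
```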
3.1.4. Statistical Analysis
For each sentence token, the pause duration dependent variable was the average pause
duration of the two speakers of each dyad. To pool these data across dyads, the averaged
pause durations were converted to z-scores (calculated for each dyad separately). A
three-factor ANOVA was performed on these data for each dyad separately and for the pooled
data, testing the effect of the three independent factors: 1) pre-boundary prosodic
complexity (with the two levels: branching and non-branching), 2) post-boundary
prosodic complexity (with the two levels: branching and non-branching), and 3) length
(with three levels: short, medium and long). Fisher’s PLSD (protected least significant
difference) test was used to examine the differences of the three levels of length, and
planned comparisons of means were conducted to examine statistically significant
interactions of prosodic complexity and length, specifically comparing the effect on
pause duration of: 1) pre-boundary branching phrases to pre-boundary non-branching
phrases, separately in the long, in the medium and in the short condition and 2)
post-boundary branching phrases to post-boundary non-branching phrases, separately in the
long, in the medium and in the short condition. Significance for all tests was set at
p<0.05.
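The pooling step can be illustrated as follows. This is a minimal sketch under stated assumptions: the record layout and key names are hypothetical, the pause values in the usage example are invented for illustration, and the sample standard deviation is used because the text does not say which estimator was applied.

```python
from collections import defaultdict
from statistics import mean, stdev


def pooled_z_scores(tokens):
    """Convert per-token pause durations to within-dyad z-scores.

    tokens: list of dicts with the (hypothetical) keys 'dyad',
    'speaker_1_ms' and 'speaker_2_ms'. Returns one z-score per token,
    in input order.
    """
    # Dependent variable: mean pause duration of the two speakers of a dyad.
    averaged = [
        (t["dyad"], (t["speaker_1_ms"] + t["speaker_2_ms"]) / 2) for t in tokens
    ]
    by_dyad = defaultdict(list)
    for dyad, value in averaged:
        by_dyad[dyad].append(value)
    # z-scores are calculated for each dyad separately.
    stats = {d: (mean(v), stdev(v)) for d, v in by_dyad.items()}
    return [(value - stats[dyad][0]) / stats[dyad][1] for dyad, value in averaged]


# Invented values: two dyads with different overall pause lengths.
example = [
    {"dyad": "X", "speaker_1_ms": 590, "speaker_2_ms": 610},
    {"dyad": "X", "speaker_1_ms": 640, "speaker_2_ms": 660},
    {"dyad": "X", "speaker_1_ms": 690, "speaker_2_ms": 710},
    {"dyad": "Y", "speaker_1_ms": 890, "speaker_2_ms": 910},
    {"dyad": "Y", "speaker_1_ms": 940, "speaker_2_ms": 960},
    {"dyad": "Y", "speaker_1_ms": 990, "speaker_2_ms": 1010},
]
print([round(z, 2) for z in pooled_z_scores(example)])
```

Normalizing within each dyad removes differences in overall pause length between dyads, so that the pooled analysis reflects the condition effects rather than dyad-specific baselines.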
4. Results
The average pause duration for each condition for each dyad (in milliseconds) and for the
dyads pooled (in z-scores) is given in Table 5.3.
Table 5.3. Pause durations for individual dyads, means and standard deviations (in milliseconds) and for the dyads pooled (z-scores).
Condition (pre-boundary, post-boundary, length) | Dyad A | Dyad C | Dyad E | Dyad G | Dyad I | Dyad K | Dyad M | Dyads pooled
branching, branching, long | 674 (39) | 734 (57) | 692 (49) | 965 (105) | 624 (51) | 518 (36) | 716 (23) | .569 (.7)
branching, branching, medium | 620 (47) | 660 (63) | 692 (37) | 916 (94) | 621 (70) | 511 (35) | 642 (64) | .108 (.8)
branching, branching, short | 582 (55) | 633 (107) | 693 (76) | 1025 (117) | 561 (75) | 483 (60) | 645 (64) | -.105 (1)
branching, non-branching, long | 659 (81) | 708 (72) | 690 (53) | 1003 (87) | 626 (67) | 532 (30) | 695 (47) | .527 (.9)
branching, non-branching, medium | 659 (32) | 711 (90) | 696 (52) | 969 (102) | 642 (82) | 526 (35) | 724 (163) | .580 (1)
branching, non-branching, short | 638 (57) | 604 (58) | 679 (31) | 952 (105) | 596 (87) | 498 (40) | 652 (72) | -.031 (.9)
non-branching, branching, long | 638 (22) | 642 (63) | 695 (37) | 1008 (161) | 625 (57) | 538 (28) | 691 (61) | .408 (.8)
non-branching, branching, medium | 593 (46) | 685 (49) | 660 (40) | 932 (128) | 599 (61) | 545 (32) | 678 (49) | .167 (.8)
non-branching, branching, short | 576 (54) | 604 (85) | 607 (28) | 907 (118) | 552 (57) | 470 (40) | 599 (54) | -.655 (.8)
non-branching, non-branching, long | 594 (24) | 624 (59) | 641 (42) | 883 (112) | 576 (66) | 520 (24) | 666 (64) | -.181 (.7)
non-branching, non-branching, medium | 579 (51) | 672 (88) | 635 (44) | 878 (98) | 567 (69) | 472 (25) | 625 (81) | -.417 (.9)
non-branching, non-branching, short | 610 (38) | 567 (56) | 599 (49) | 893 (150) | 532 (60) | 450 (52) | 619 (85) | -.762 (1)
4.1. Length Effects
The results show a main effect of length for six of seven dyads (there was no effect for
Dyad G). Post hoc tests show significant differences between at least two levels of length
in all cases, such that longer surrounding phrases lead to longer pauses. The results are shown
in Table 5.4.
Table 5.4. Main effects of length and Fisher’s PLSD. The results for individual dyads give the pause length in milliseconds, the pooled results in z-scores.
Test | Dyad A | Dyad C | Dyad E | Dyad I | Dyad K | Dyad M
Main effect | F(2,99)=6.456, p=.0023 | F(2,123)=17.919, p<.0001 | F(2,125)=6.965, p=.0014 | F(2,132)=8.693, p=.0003 | F(2,132)=23.707, p<.0001 | F(2,131)=8.281, p=.0004
Fisher’s PLSD long, medium | p=.0037; means: long 644, medium 609 | | | | |
Fisher’s PLSD long, short | p=.0002; means: long 644, short 600 | p<.0001; means: long 673, short 602 | p=.0004; means: long 680, short 644 | p=.0002; means: long 612, short 560 | p<.0001; means: long 527, short 475 | p=.0001; means: long 691, short 628
Fisher’s PLSD medium, short | | p<.0001; means: medium 678, short 602 | p=.0112; means: medium 670, short 644 | p=.0009; means: medium 607, short 560 | p<.0001; means: medium 513, short 475 | p=.0142; means: medium 667, short 628
Pooled results show a main effect of length (F(2,918)=52.671, p<.0001). Results
are shown in Figure 5.2. Fisher’s PLSD shows a significant difference between all three
levels of length, such that longer surrounding phrases lead to longer pauses (long
compared to medium p=.0011, long compared to short p<.0001, medium compared to
short p<.0001).
Figure 5.2. Main effect of length. All dyads pooled.
4.2. Prosodic Complexity Effects
There was a significant effect of pre-boundary prosodic complexity, such that branching
phrases are followed by longer pauses than non-branching phrases for six out of seven
dyads (Dyad K did not show any effect). For post-boundary branching there was a
significant effect for one dyad (E), such that branching phrases are preceded by longer
pauses than non-branching phrases. Results for individual dyads and all dyads pooled are
shown in Table 5.5.
[Figure 5.2 plot: pause duration (z-scores) by length: long, medium, short.]
Table 5.5. Pre-boundary and post-boundary effects on pause duration. Individual dyads and pooled results.
Effect | Dyad A | Dyad C | Dyad E | Dyad G | Dyad I | Dyad M | Pooled results
Main effect pre-boundary | F(1,99)=18.813, p<.0001; means: branching 636, non-branching 597 | F(1,123)=11.364, p=.0010; means: branching 668, non-branching 632 | F(1,125)=40.896, p<.0001; means: branching 690, non-branching 640 | F(1,103)=5.629, p=.0195; means: branching 969, non-branching 912 | F(1,132)=10.489, p=.0015; means: branching 611, non-branching 575 | F(1,131)=6.597, p=.0113; means: branching 679, non-branching 646 | F(1,918)=75.862, p<.0001; means: branching .257, non-branching -.253
Main effect post-boundary | | | F(1,125)=4.259, p=.0411; means: branching 675, non-branching 655 | | | | F(1,918)=4.805, p=.0286; means: branching .078, non-branching -.082
Pooled results show an effect of both pre-boundary and post-boundary branching
[pre-boundary: F(1,918)=75.862, p<.0001, post-boundary: F(1,918)=4.805, p=.0286],
such that branching phrases are followed and preceded by longer pauses than non-
branching phrases, as shown in Figure 5.3. The pre-boundary effect is qualitatively larger
than the post-boundary effect.
Figure 5.3. Pre-boundary and post-boundary prosodic complexity effects. All dyads pooled.
[Two panels: pause duration (z-scores) for post-boundary branching vs. non-branching phrases, and pause duration (z-scores) for pre-boundary branching vs. non-branching phrases.]
4.3. Interactions
Two dyads (C and E) show an interaction between pre-boundary complexity and length
(shown in Figures 5.4 and 5.5) (F(2,123)=3.382, p=.0372 for dyad C, and F(2,125)=4.481,
p=.0132 for dyad E). A planned comparison of means shows that for dyad C branching
phrases are followed by longer pauses than non-branching phrases (F=15.418, p=.0001)
only for the long phrases. For dyad E the planned comparison of means shows that
branching phrases are followed by longer pauses than non-branching phrases for medium
and short length phrases (F=11.094, p=.0011 for medium phrases and F=33.793, p=.0001
for short phrases) but not for long phrases.
Figure 5.4. Interaction between pre-boundary complexity and length. Dyad C.
[Plot: pause duration (in milliseconds) for pre-boundary branching vs. non-branching phrases at short, medium and long length.]
Figure 5.5. Interaction between pre-boundary complexity and length. Dyad E.
[Plot: pause duration (in milliseconds) for pre-boundary branching vs. non-branching phrases at short, medium and long length.]
One dyad (A) shows an interaction between post-boundary complexity and length
(F(2,99)=5.378, p=.0061). A planned comparison of means shows that branching phrases
are preceded by shorter pauses than non-branching phrases for short phrases (F=8.582,
p=.0042) but not for other length conditions. The interaction is shown in Figure 5.6.
Figure 5.6. Interaction between post-boundary complexity and length. Dyad A.
[Plot: pause duration (in milliseconds) for post-boundary branching vs. non-branching phrases at short, medium and long length.]
For two individual dyads (dyad I and dyad K) and for all dyads pooled, there was
an interaction between pre- and post-boundary prosodic complexity (F(1,132)=5.56,
p=.0198 for dyad I, F(1,132)=16.893, p<.0001 for dyad K, and F(1,918)=25.372,
p<.0001 for all dyads pooled). A comparison of means showed that for dyads I and K the
effect is such that pre-boundary non-branching phrases are followed by shorter pauses
when the post-boundary phrase is non-branching, than when it is branching (for dyad I,
F(1,132)=4.480, p=.0362, for dyad K F(1,132)=17.263, p=.0001). For the dyads pooled,
the effect is such that pre-boundary branching phrases are followed by shorter pauses
when the post-boundary phrase is branching than when it is non-branching
(F(1,918)=3.999, p=.0458). Pre-boundary non-branching phrases are followed by shorter
pauses when the post-boundary phrase is non-branching, than when it is branching
(F(1,918)=26.445, p=.0001). That is, phrases where the pre- and post-boundary phrase
match in prosodic complexity (i.e., are both branching or both non-branching) have
shorter pauses than when the phrases do not match in complexity. The results are shown
in Figures 5.7, 5.8, and 5.9.
[Figure: pause duration (in milliseconds, axis from 0 to 900) for pre-boundary branching and non-branching phrases, split by post-boundary non-branching and branching phrases.]
Figure 5.7. Interaction between pre- and post-boundary complexity. Dyad I.
[Figure: pause duration (in milliseconds, axis from 0 to 900) for pre-boundary branching and non-branching phrases, split by post-boundary non-branching and branching phrases.]
Figure 5.8. Interaction between pre- and post-boundary complexity. Dyad K.
[Figure: z-scores for pause duration for pre-boundary branching and non-branching phrases, split by post-boundary non-branching and branching phrases; the two matching conditions are labeled.]
Figure 5.9. Interaction between pre- and post-boundary complexity. Pooled dyads.
There was an interaction effect of length, pre-, and post-boundary prosodic
complexity for one dyad (M) and for all dyads pooled (F(2,131)=3.341, p=.0384 for dyad
M, and F(2,918)=4.701, p=.0093 for the dyads pooled), shown in Figures 5.10 and 5.11.⁶
⁶ The structures shown in Figure 5.9 and 5.10 assume prosodic recursion, contra the Strict Layer
Hypothesis (Selkirk 1984).
[Figure: pause duration (in milliseconds, axis from 0 to 900) at short, medium, and long phrase lengths for four conditions: non-matching (branch#non-br), matching (branch), matching (non-branch), and non-matching (non-br#branch). Each condition is illustrated with a prosodic tree of IPs under an Utt node.]
Figure 5.10. Interaction between phrase length, pre-, post-boundary complexity. Dyad M.
[Figure: pause duration (z-scores, axis from -1 to 1) at short, medium, and long phrase lengths for four conditions: non-matching (non-br#branch), matching (non-branch), matching (branch), and non-matching (branch#non-br). Each condition is illustrated with a prosodic tree of IPs under an Utt node.]
Figure 5.11. Interaction between phrase length, pre-, post-boundary complexity. Pooled dyads.
To investigate the nature of the effect of prosodic complexity matching, further
comparison of means tests were conducted examining the effects of prosodic matching on
pause duration for the dyads pooled together.
The goal was to test whether, when the length of the phrases is held constant,
conditions in which pre- and post-boundary phrases match in complexity (i.e., both pre-
and post-boundary phrases are branching or both are non-branching) have shorter pauses
than conditions in which pre- and post-boundary phrases do not match in complexity (i.e.,
are either branching pre-boundary and non-branching post-boundary or non-branching
pre-boundary and branching post-boundary). If prosodic matching is the cause of the
shorter pauses found in the interactions between pre- and post-boundary prosodic
complexity reported above, then sentences of the same length in the conditions with
matching complexity are expected to have shorter pauses than sentences in the
non-matching conditions. The results are shown in Table 5.6. “Match shorter pause” means
that the condition where there was prosodic matching between the pre- and post-boundary
phrase had a shorter pause duration than the condition where there was no prosodic
matching, as expected if prosodic matching is relevant for pause duration.
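The logic of these comparisons can be sketched in a few lines of Python. This is only an illustration of the comparison being made, not the analysis procedure used in the study: the record layout, the function names, and the pause values in `TOY_RECORDS` are all hypothetical, and the sketch computes a plain difference of means rather than the F-tests reported in Table 5.6.

```python
from statistics import mean

# Each record: (phrase length, pre-boundary complexity, post-boundary complexity, pause in ms).
# The values below are invented for illustration only; they are NOT data from the experiment.
TOY_RECORDS = [
    ("medium", "branching", "branching", 410.0),
    ("medium", "branching", "branching", 390.0),
    ("medium", "branching", "non-branching", 520.0),
    ("medium", "branching", "non-branching", 480.0),
]

def mean_pause(records, length, pre, post):
    """Mean pause duration for one length x pre x post cell."""
    return mean(r[3] for r in records if r[:3] == (length, pre, post))

def match_advantage(records, length, matching, non_matching):
    """Mean pause of the non-matching condition minus that of the matching
    condition, with phrase length held constant. A positive value corresponds
    to 'match shorter pause'."""
    return mean_pause(records, length, *non_matching) - mean_pause(records, length, *matching)

advantage = match_advantage(
    TOY_RECORDS,
    "medium",
    matching=("branching", "branching"),
    non_matching=("branching", "non-branching"),
)
print(advantage)
```

Holding `length` fixed inside `mean_pause` is the point of the design: any difference between the two cells can then not be attributed to phrase length.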
Table 5.6. Comparisons of means. The effect of prosodic matching on pause duration. “Match shorter pause” means
that the condition where there was prosodic matching had a shorter pause duration than the condition where there was
no prosodic matching. All dyads pooled.
match v. no match | LONG | MEDIUM | SHORT
branching, branching vs. branching, non-branching |  | match shorter pause* (F=10.254, p=.0014) |
branching, branching vs. non-branching, branching |  |  | match longer pause* (F=15.137, p=.0001)
non-branching, non-branching vs. non-branching, branching | match shorter pause* (F=16.045, p=.0001) | match shorter pause* (F=16.701, p=.0001) |
non-branching, non-branching vs. branching, non-branching | match shorter pause* (F=22.028, p=.0001) | match shorter pause* (F=45.221, p=.0001) | match shorter pause* (F=26.931, p=.0001)
The results in Table 5.6 show that long and medium phrases in the matching
non-branching utterances have shorter pause duration than the non-matching utterances. In
addition, for the medium length utterances, matching branching structures lead to shorter
pauses than branching#non-branching structures. For short utterances, matching
non-branching structures had a shorter pause than the branching#non-branching
structures. In one case prosodic matching led to longer pauses: for short phrases,
the branching#branching structure had longer pauses than the non-branching#branching
structure.
Overall then, the results in Table 5.6 show that prosodic matching has an effect on
pause duration, such that in general pauses are shorter if the surrounding prosodic phrases
match in prosodic complexity. The effect is most prominent for phrases of medium
length. The case where it is clearest that prosodic matching has an effect on pause duration
is the one where the branching#branching utterances were compared to non-matching
utterances and the matching in complexity led to shorter pauses. This is the most telling
case because pre- and post-boundary non-branching phrases have been found to lead to
shorter pauses than branching phrases independently of prosodic matching (see section 4.2,
this chapter), so it is to be expected that when both pre- and post-boundary phrases are
non-branching this will lead to short pauses. But the branching#branching phrases should,
without the effect of prosodic matching, lead to significantly longer pauses, since
pre-boundary branching and post-boundary branching separately lead to longer pauses. The
fact that they do not (except in the case of short utterances in the comparison between
branching#branching and non-branching#branching) and that they can in fact lead to
shorter pauses indicates an effect of prosodic matching.
5. Discussion
The results of the experiment show that prosodic structure and phrase length both have an
effect on pause duration. Phrase length shows a graded effect, such that long
surrounding phrases lead to longer pauses than medium phrases, which in turn lead
to longer pauses than short phrases. More complex prosodic structure leads to longer
pauses, both pre- and post-boundary, and matching/symmetry in pre- and post-boundary
prosodic complexity was found to lead to shorter pauses than non-matching in pre- and
post-boundary prosodic complexity. The effects of prosodic structure correspond to
effects found in studies examining the relation between pause length and syntactic
structure, where more complex syntactic structure leads to longer pauses. The effects of
phrase length are as predicted, in that pause duration increases with phrase length.
The general view of pause duration (e.g., Cooper & Paccia-Cooper 1980, Ferreira
1991, Griffin 2003, Smith & Wheeldon 1999) is that it reflects effects of speech planning
or production processes, in that longer pauses occur in conjunction with longer planning
or execution time for the upcoming phrase. Our results concerning post-boundary effects
for both phrase length and prosodic complexity fit with this view, as more phonological
length (syllables) and more complex prosodic structure of an upcoming phrase require
longer planning times leading to longer pauses.⁷
⁷ It should be mentioned that while the experiment investigates read speech, and therefore less planning by
the speaker is required, there is still planning involved – for example articulatory, but also syntactic and
prosodic planning, although more constrained in its structural variables than in spontaneous speech.
The findings on the effects of prosodic complexity on pause duration differ from
the findings of Krivokapić (2007), where more complex structure led to shorter pauses.
That effect was accounted for by assuming that prosodic structure participates in
determining the size of the chunk to be processed. Depending on the size of a potential
chunk speakers might plan only up to the branching node. In other words, depending on
the size of the chunk (and probably other factors) a hierarchically higher or lower IP (or
ip) might be the chunk to be processed. Note that this interpretation assumes recursion, but
the crucial point is that a hierarchically higher or lower category can determine the chunk
to be processed, not the exact type of the category, as any prosodic category above the
word level could potentially be the chunk to be processed.⁸
The results of the current experiment suggest that when phrases are smaller (six,
ten and fourteen syllables in our study), speakers process the whole post-boundary phrase
and do not chunk it into smaller phrases. In this case then, the prosodically more complex
phrases lead to longer planning time as there is more structure to be planned. Combining
the findings from Krivokapić (2007) and the present study, we see that prosodic structure
interacts with phrase length in heterogeneous ways. For very long phrases, speakers use
hierarchically lower phrases to determine the processing chunk, but when the phrases are
shorter, hierarchically higher phrases determine the chunk to be processed. In addition,
this study also shows that speakers have a fairly large lookahead in speech processing –
at the time of the post-boundary phrase initiation, speakers were aware of the prosodic
branching occurring at the third, fifth and seventh syllable after the boundary.
A particularly intriguing finding of the present study is the effect of prosodic
matching or symmetry. To the best of our knowledge, such an effect has not been
previously observed. Prosodic matching can be seen as a rhythmic aspect of prosodic
organization. By rhythm we mean a regular occurrence of prosodic boundaries over the
course of an utterance, as shown in Figure 5.12. In the matching cases, prosodic
boundaries occur at the “same” intervals, as opposed to the non-matching cases.
⁸ Other factors apart from the size of the potential chunk are likely to influence whether a hierarchically
higher or lower prosodic phrase will be the chunk to be processed by the speaker. Individual differences
might play a role. Swets et al. (2007), for example, showed that individual differences influence
prosodic phrasing in silent reading: readers with a low reading span had a greater tendency to break up
larger chunks of text into smaller chunks than readers with a high reading span. Other
processing demands on the speaker could also have an impact such that the chunks to be processed at a
time become smaller.
Figure 5.12. Rhythmic (matching) and non-rhythmic (non-matching) prosodic structure.
Prosodic matching can be seen as facilitating planning or production, in the sense that the
more rhythmical structure requires less planning or production time and therefore leads to
shorter pauses.⁹
This effect is reminiscent of similar rhythmic effects observed in speech that can
be understood as entrainment. Entrainment refers to the notion of two interacting
oscillators synchronizing (in frequency and/or phase).¹⁰ Various processes in speech can
be understood to arise due to entrainment. For example, speech errors have been
understood to arise due to entrainment of two gestures (Goldstein et al. 2007), and studies
by Port and his colleagues have found that speakers entrain easily to a metronome,
leading to the conclusion that speech must be oscillatory at some presumably abstract
planning level as only in that case speakers could easily entrain to a metronome (Port,
Tajima & Cummins 1996, Cummins & Port 1998, Port 2002). The shorter pauses in the
prosodic symmetry cases show the rhythmic nature of prosodic structure and that
speakers’ planning/production is sensitive to it. This rhythmic effect lends itself to an
entrainment modeling approach (for example Barbosa 2002, Nam & Saltzman 2003)
where planning oscillators of prosodic gestures could be entrained to a phrase level
oscillator (a phrase level oscillator has been suggested by Nam, Goldstein & Saltzman
2006).¹¹ While the models of Nam & Saltzman (2003) and Saltzman & Byrd (2000)
provide an account of coupling mechanisms for syllable structure, and the model of
Barbosa (2002) provides an account of phrase level rhythmical structure, so far there is
no model accounting for rhythmic effects of prosodic boundaries as seen in this study.
The symmetrical structure could be understood as easier for planning or production
because it exhibits a stable frequency and/or because this facilitates entrainment.
⁹ That rhythmic speech makes the planning or production processes easier has also been argued by
Arbisi-Kelm (2006) for stuttering speech. Arbisi-Kelm (2006) examines stuttering speech and finds
instances of segmentally fluent, but prosodically unnatural (with pitch accents anomalously placed on
function words, i.e., words like ‘to’, ‘of’) speech with very regular (rhythmic) disyllabic foot structure. He
argues that such rhythmicity simplifies the production of prosody and thereby helps avoid segmental
disfluencies.
¹⁰ Entrainment was first reported by the Dutch physicist Christian Huygens in the 17th century. He observed
that two pendulum clocks when placed on a wall near each other synchronized after a while so that their
pendulums swung at the same rate (see Pikovsky, Rosenblum & Kurths 2001 for a detailed discussion of
entrainment).
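The general notion of entrainment invoked here (two interacting oscillators synchronizing in frequency and/or phase) can be illustrated with a generic textbook system of two mutually coupled phase oscillators. The sketch below is an assumption-laden illustration only: it is not the model of Nam & Saltzman (2003), Barbosa (2002), or any other cited work, and the frequencies, coupling strength, and function name are hypothetical choices made for the example.

```python
import math

def simulate_phase_difference(omega1, omega2, coupling, dt=0.001, steps=20000):
    """Euler-integrate two mutually coupled phase oscillators and return the
    final phase difference theta2 - theta1 (in radians)."""
    theta1 = theta2 = 0.0
    for _ in range(steps):
        # Each oscillator runs at its own natural frequency and is pulled
        # toward the phase of the other one.
        rate1 = omega1 + coupling * math.sin(theta2 - theta1)
        rate2 = omega2 + coupling * math.sin(theta1 - theta2)
        theta1 += rate1 * dt
        theta2 += rate2 * dt
    return theta2 - theta1

# Hypothetical natural frequencies of 1.0 Hz and 1.2 Hz (in radians per second).
OMEGA1 = 2 * math.pi * 1.0
OMEGA2 = 2 * math.pi * 1.2

locked = simulate_phase_difference(OMEGA1, OMEGA2, coupling=2.0)    # coupled
drifting = simulate_phase_difference(OMEGA1, OMEGA2, coupling=0.0)  # uncoupled
print(locked, drifting)
```

With sufficiently strong coupling the phase difference settles at a constant value, so the two oscillators run at a common frequency; without coupling the phase difference keeps growing. In this particular system, locking is possible when the frequency difference does not exceed twice the coupling strength.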
Note also that the matchingness effect that we have observed is strongest for the
medium phrase length. This shows an interesting interaction of phrase length and
prosodic symmetry, in that it seems that for long phrases and for short phrases the effects
of prosodic complexity are strong enough for rhythmic effects not to be as noticeable
(long phrases being too demanding and short phrases too simple for prosodic matching
effects to become noticeable), but the medium long phrases are the critical test situation
in which effects of prosodic rhythm can be seen.
¹¹ The basic idea behind planning oscillators is that each speech gesture is associated with a planning
oscillator and that the activation time of a gesture is determined by that oscillator. See Nam, Goldstein &
Saltzman (to appear) for a detailed account of oscillator coupling in syllable structure.
As can be seen in Figure 5.11, pause duration across the three phrase lengths
increases in the following order: non-branching # non-branching (matching prosodic
complexity) has the shortest pause, followed by non-branching # branching, followed by
branching # branching, and the longest pause is for the branching # non-branching
condition. Thus pre-boundary prosodic complexity is the strongest factor affecting pause
duration (pre-boundary non-branching leads to shortest pauses), followed by prosodic
matching. This can be seen from the following: the two conditions with the shortest
pauses are the ones with pre-boundary non-branching, regardless of prosodic matching.
Within these two conditions, the shorter pause is for the prosodically matching condition
(non-branching # non-branching) and within the third and fourth condition again the
prosodically matching condition has the shorter pause (branching # branching). If
prosodic matching were the dominant, and complexity (non-branching) the second factor,
the two matching conditions would have the shortest pauses, followed by the non-
matching conditions, and the exact order would then depend on prosodic complexity,
yielding the following order: non-branching # non-branching, branching # branching,
and then the non-matching conditions non-branching # branching, branching # non-
branching. That it is the pre-boundary complexity that has a stronger effect than post-
boundary complexity follows from the fact that non-branching # branching has shorter
pauses than branching # non-branching and can also be seen in the difference in the
strength of effect of pre- and post-boundary prosodic branching, in Figure 5.3.
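The two orderings contrasted in this paragraph can be derived mechanically by ranking the four conditions on two criteria in different priority orders. The following is a minimal sketch of that reasoning; the function names are hypothetical, and the ranking criteria are a simplified reading of the argument above rather than a statistical model.

```python
CONDITIONS = [
    ("branching", "non-branching"),
    ("branching", "branching"),
    ("non-branching", "branching"),
    ("non-branching", "non-branching"),
]

def pre_is_branching(cond):
    return cond[0] == "branching"

def is_mismatch(cond):
    return cond[0] != cond[1]

def amount_of_branching(cond):
    return sum(part == "branching" for part in cond)

def order_if_pre_complexity_dominates(conditions):
    """Shortest to longest pause if pre-boundary complexity is the primary
    factor and prosodic matching only decides within each pre-boundary type."""
    return sorted(conditions, key=lambda c: (pre_is_branching(c), is_mismatch(c)))

def order_if_matching_dominates(conditions):
    """Shortest to longest pause if matching is the primary factor and
    complexity decides within the matching and non-matching pairs."""
    return sorted(
        conditions,
        key=lambda c: (is_mismatch(c), amount_of_branching(c), pre_is_branching(c)),
    )

for pre, post in order_if_pre_complexity_dominates(CONDITIONS):
    print(f"{pre} # {post}")
```

The first function reproduces the order described for Figure 5.11, and the second reproduces the counterfactual order that would be expected if matching were the dominant factor.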
The just mentioned difference in magnitude of effect between the pre- and post-boundary
prosodic complexity effect (i.e., that pre-boundary effects of prosodic
complexity are stronger than the post-boundary effects) was a further result of our
analysis (see Figure 5.3). Relatedly, Ferreira (1991) has found an interaction effect of
pre- and post-boundary syntactic complexity. She has found that pause duration between
a subject noun phrase and a verb phrase increases with the complexity of the verb
phrase, but only if the noun phrase is also complex. She interprets this finding as
indicating that when the subject noun phrase is complex, speakers are not able to plan the
verb phrase during the noun phrase production so a pause occurs between the subject and
verb phrase, which gives the speaker the time needed for processing the upcoming verb
phrase. This explanation, which relies on speakers’ incremental production of speech, can
be applied to the prosodic complexity effects observed in our study.¹² The pre-boundary
prosodic complexity effects in our study were stronger than the post-boundary effects.
We might account for this difference in magnitude of effect as an interaction between
pre- and post-boundary complexity. If pre- and post-boundary effects were the same, the
order of the pause duration would be the following, from shortest to longest pause
duration: a) non-branching # non-branching (least structure to be planned); b)
non-branching # branching and branching # non-branching phrases would be the same; c)
branching # branching (most structure to be planned). So the difference in magnitude of
effect is driven by the difference between the two non-matching conditions (the
matchingness effect contributes to branching # branching being third in the hierarchy of
pause shortness), and the important question is why non-branching # branching and
branching # non-branching do not have comparable pause length, in other words, why
prosodic complexity has a stronger pre-boundary effect. Following Ferreira (1991), this
can be understood as follows: since a non-branching structure is less demanding on the
production system than a branching phrase, while articulating the pre-boundary
non-branching phrase the speaker can plan the upcoming post-boundary phrase to a greater
extent than she can when the pre-boundary phrase is branching. When the pre-boundary
phrase is non-branching, the pause for the upcoming post-boundary phrase will
be shorter than in the case of the pre-boundary branching phrase, as a larger part of the
upcoming post-boundary phrase will have been planned during the articulation of the
pre-boundary phrase.
¹² Incrementality refers to the notion that speakers process an utterance in stages (e.g., the stage of planning
the conceptual content of a sentence, the stage of its syntactic encoding), and that processing of different
fragments of an utterance can proceed in parallel, at different stages. For example, as soon as one fragment
is ready, speakers can start articulating, without having entirely planned the rest of the utterance, and the
planning of the rest of the utterance proceeds while the speaker is articulating (see e.g., Levelt 1989, Levelt,
Roelofs & Meyer 1999, Ferreira & Swets 2002, Keating & Shattuck-Hufnagel 2002, Meyer, Belke, Häcker
& Mortensen 2007).
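The reasoning about equal versus unequal pre- and post-boundary effects can be made concrete with a toy additive score. This is a hedged sketch only: the scoring function, the weights, and the matching bonus are hypothetical illustration values, not quantities estimated from the experiment, and the score is an abstract ranking device rather than a pause duration.

```python
CONDITIONS = [
    ("non-branching", "non-branching"),
    ("non-branching", "branching"),
    ("branching", "non-branching"),
    ("branching", "branching"),
]

def predicted_cost(pre, post, w_pre, w_post, match_bonus=0.0):
    """Toy planning cost: one weight per branching phrase, minus a bonus when
    the two phrases match in complexity. Higher cost stands for a longer pause."""
    cost = w_pre * (pre == "branching") + w_post * (post == "branching")
    if pre == post:
        cost -= match_bonus
    return cost

def ranking(w_pre, w_post, match_bonus=0.0):
    """Conditions ordered from lowest to highest toy cost."""
    return sorted(CONDITIONS, key=lambda c: predicted_cost(*c, w_pre, w_post, match_bonus))

# Equal pre- and post-boundary weights: the two non-matching conditions tie.
tie = (
    predicted_cost("non-branching", "branching", 1.0, 1.0)
    == predicted_cost("branching", "non-branching", 1.0, 1.0)
)

# Hypothetical unequal weights plus a matching bonus.
asymmetric = ranking(w_pre=2.0, w_post=1.0, match_bonus=1.5)
print(tie, asymmetric)
```

With equal weights and no bonus, the score gives the a), b), c) ordering described above, including the tie in b). With a larger pre-boundary weight and a matching bonus (the hypothetical values 2.0, 1.0, and 1.5), the ranking becomes non-branching # non-branching, non-branching # branching, branching # branching, branching # non-branching, which is the order reported for Figure 5.11.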
6. Conclusions: Prosodic Structure and Pause Duration
This study has examined the effect of phrase length and prosodic phrasal complexity on
pause duration. Phrase length was found to exhibit the predicted effect, such that longer
phrases induce longer pauses, in a three-way graded manner. Prosodically complex
phrases were found to induce longer pauses. We interpreted these findings in terms of
speech planning and production effects in that longer and more complex phrases induce
longer pauses as more phonological length and more structure need to be processed by
the speaker. A further finding is that prosodic symmetry leads to shorter pauses, and this
has been understood as an effect of rhythmic facilitation of planning or execution. In
addition to observing effects of prosodic complexity and phrase length, and showing a
large lookahead in the planning of prosodic structure, this study also motivates further
research into global relations between prosodic boundaries, that is, into the interaction of
prosodic boundaries at a distance and into the effects of entrainment or rhythmic relations
among prosodic gestures/events.
Chapter 6: Conclusions
The research program we presented in this dissertation seeks to examine the relationship
between abstract prosodic structure and its overt realization in spoken language.
Specifically, we addressed the relationship between prosodic structure, its phonetic
realization, and its perception, with a particular focus on the local and global temporal
properties of prosodic boundaries. We put forward the following questions:
1) What linguistic information is present in the prosodic structural representation?
2) How is this information used by speakers?
3) What are the articulatory manifestations of prosodic structure in speech?
4) How is prosodic structure perceived by listeners in terms of both abstract structure
and articulatory detail?
Five experiments examining these questions were conducted. In Experiment 1 (Chapter
2) we examined the temporal scope and magnitude of the effect of prosodic boundaries of
various strengths, ranging from no boundary to strong boundary. We have found that
prosodic boundary strength affects both edge-local and more remote articulations,
consistent with the π-gesture approach (Byrd & Saltzman 2003) in which prosodic events
interact with constriction gestures over time. That prosodic boundaries have a scope of
effect (that the effects are not limited to formal edge adjacency), and that they decrease
with distance from the boundary follow naturally from the π-gesture approach as
implemented in the Articulatory Phonology framework. While an ad-hoc mechanism
implementing these effects can likely be incorporated into prosodic theories, no theory of
prosodic structure other than the π-gestural model currently accounts for such a scope of
effect in a principled way.¹
¹ For example, Levelt’s (1989) theory of speech production accounts for final lengthening by successively
increasing the duration of stressed syllables until an IP break is reached. This model crucially assumes that
speakers in their production have very little lookahead, i.e., that planning occurs for very short structures.
As Keating & Shattuck-Hufnagel (2002) point out, this model predicts that final lengthening starts with the
second word in a phrase and that it is larger for longer phrases than for shorter phrases. However, final
lengthening is not likely to necessarily start at the second word in a phrase (Byrd et al. 2006, our study of
the scope of effect in Chapter 2). Levelt’s approach does however in some sense incorporate the idea of
progressive final lengthening that has been observed in acoustic (Berkovits 1993a, Cambier-Langeveld
1997) and articulatory studies (Byrd et al. 2006, this study Chapter 2). In fact, the π-gesture approach
would be one way in which Levelt’s (1989) and Levelt et al.’s (1999) model could approach phrase final
lengthening while keeping the small lookahead and without having to rely on the lengthening of each word
in a phrase. Levelt’s model has a separate phonological and phonetic level, so the concept of the π-gesture is
not directly applicable to it. However, the basic idea is that, rather than incrementally increasing segment
duration from the second word on in a phrase in order to achieve final lengthening, a different procedure
could apply: at a point when a prosodic break is placed (in the process of building the prosodic structure) it
is marked as a π-gesture. When gestures are retrieved for phonetic encoding, at the last lexical stress in the
phrase the π-gesture is retrieved (see below on the coordination between lexical stress and π-gestures) and
final lengthening is achieved in this way, while preserving a small lookahead.
Other prosodic models that incorporate final lengthening, for example by increasing the duration
of a phrase final segment (e.g., Selkirk 1984, Ferreira 1993) or by allowing variation in segmental
properties (as in Keating’s 1990 window model, as modified by Cho 2002 to include prosodic
modifications) do not account for the scope of effect of boundary adjacent lengthening nor can they
account for the decrease in magnitude of effect.
Further, compensatory shortening results suggest timing influences at a global
level (Saltzman 1999, Saltzman, Löfqvist & Mitra 2000), such that prosodic perturbation
of timing induced by the π-gesture causes gestures further away from the boundary (and
not co-active with the π-gesture) to show an attraction to return to the unperturbed timing.
In Chapter 3 we presented a study investigating the question of gradiency and
categoricity in prosodic boundary production and perception. The production portion of
the study found that speakers’ production of prosodic boundaries, as seen in the
pre-boundary temporal properties, is largely driven by two prosodic categories. In the
perception part of the study, we have found evidence for both large granularity IP/ip
differentiation (as predicted by prosodic models e.g., Beckman and Pierrehumbert 1986)
and, at a finer level of granularity, suggestions of more subtle gradations in juncture
strength. A representational understanding of these more subtle gradations in juncture
strength will require further investigation. One possibility is that they stem from prosodic
recursion (see Ladd 1996), which would allow for gradations in the perception of
boundaries, such as seen in our study, where the IP cluster contained smaller clusters. An
example of a prosodic structure for such a recursion is shown in Figure 6.1.
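[Figure: tree diagram of a recursive prosodic structure in which an IP dominates further IPs, with boundaries marked by #.]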
Figure 6.1. Prosodic recursion
The lack of more subtle gradations in boundary strength in production was argued
to be due to the fact that we only investigated temporal properties of one articulatory
movement at the boundary, and that examining further characteristics of boundaries
(temporal properties of other articulatory movements at the boundary, or tonal properties
for example) might show evidence of a more fine-grained prosodic representation.
Chapter 4 examined the relationship between articulatory properties of prosodic
boundaries and their perception by the listener and found a correlation, both for the
lengthening and for the compensatory shortening effects. The strongest correlations, found
for the opening movements, indicate that listeners are particularly responsive to the
articulatory movements with salient information in judging juncture strength. Particularly
interesting was the finding that the pre-boundary opening movement, which spans
varying acoustic events (and is not the variable usually measured in final lengthening), is
the most predictive variable for perceived boundary strength, rather than the closing
movement which contains more acoustic properties usually measured in final
lengthening. Further studies are needed to specifically test whether this dominant
articulatory variable or the traditional acoustic final lengthening measure of rime/vowel
duration relates more closely to perceived boundary strength.
The study in Chapter 5 examined the effect of prosodic structure and phrase
length on boundary production as manifested in pause duration. Phrase length, prosodic
structural complexity and symmetry have been shown to influence pause duration
patterns. The effects of prosodic complexity and symmetry on pause duration show a
global effect of prosodic structure on pause duration, in that prosodic boundaries at
a distance of several syllables influence pause duration.
This study also informs models of speech production, in particular as to the
process of incrementality in speech production, as we have observed effects indicating
that speakers have a large lookahead (several syllables). Two theories of speech
production discuss prosodic planning in detail, namely Keating and Shattuck-Hufnagel
(2002) and the theory developed by Levelt and his colleagues (Levelt 1989, Levelt,
Roelofs & Meyer 1999). The approach taken in Levelt (1989) (and also in Levelt et al.
1999), is that speech production is strictly incremental with a minimal amount of
lookahead. Prosodic constituents are created as first prosodic words are produced from
segments from the lexicon. Prosodic structure is built based on syntactic structure and
prosodic boundaries are placed after the head of a syntactic constituent (although an
adjunct can optionally be included into the prosodic constituent for an intermediate
phrase). IPs result from the decision of the speaker to break at a certain point in an
utterance, depending on the length of the phrase, the syntactic structure, the availability
of new syntactic material (in the sense that if no further material is available for
processing, the speaker is forced to pause), speech rate, and the speaker’s desire to be
intelligible. Crucially, in this model boundaries are determined by looking at just one
word at a time, with very little lookahead. The only factors constraining the
construction of intermediate phrases are syntactic boundaries, and for IPs the factors
mentioned are seen as options that the speaker can but need not realize. While Levelt
(1989) points out that speakers can occasionally have a larger lookahead (for example in
very carefully produced speech) generally the assumption is that there is little lookahead.
The effects of distant prosodic boundaries seen in our experiment (Chapter 5) do not
follow from the premises of the model, as they extend over a number of phonological
words, thus a model with a small lookahead will not be able to account for such effects.
In Keating and Shattuck-Hufnagel’s (2002) model, prosodic structure is built
based on syntactic information and is restructured in the course of the production process
based on word form and prosodic information; in other words, prosodic structure is built
before phonetic encoding. This model necessarily has a large lookahead, as it requires a
complete prosodic structure before individual segments can be encoded.² While Keating
and Shattuck-Hufnagel (2002) do not make a claim about the exact size of syntactic
chunk available for processing, they do assume that it is the whole utterance. However,
their main point is that “the increments must be large enough to account for the facts of
phonological and phonetic segmental sensitivity to prosodic structure” (Keating &
Shattuck-Hufnagel 2002:139). The effects of phrase length, prosodic structure and
prosodic symmetry that we have observed indicate a large lookahead, and Keating and
Shattuck-Hufnagel’s model, while not having an explicit mechanism to capture such
effects, by virtue of having a prosody-first approach and allowing for a large lookahead,
at least could in principle allow for distant effects of prosodic boundaries.
² By complete prosodic structure Keating and Shattuck-Hufnagel (2002) refer to a rough prosodic structure
representation, not a prosodic structure worked out in full detail.
In addition to our own findings, studies of listeners’ perception of prosodic
boundaries provide a further indication of these coordinated effects of prosodic boundaries. A
study by Frazier, Clifton and Carlson (2004) found that the naturalness of prosodic
boundary production (as judged by listeners) depends not just on the strength of a specific
boundary but also on the strength of surrounding boundaries. Studies by Schafer (1997),
Carlson, Clifton & Frazier (2001), Clifton, Carlson and Frazier (2002) and Jun (2003a)
have shown that listeners’ interpretation of boundary strength depends on the boundary
strength of surrounding boundaries, indicating that global prosodic structure guides
listeners’ interpretation.
Such at-a-distance effects of prosodic boundaries across studies lead us to
consider the possibility that prosodic boundaries, as instantiated by π-gestures, are
coordinated events, in this way allowing for their mutual influence. Constriction gestures
have been found to be coordinated through gestural coupling to account, for example, for
speech errors (Goldstein et al. 2007). Coordination of π-gestures would extend the
parallel between constriction and prosodic gestures that has been postulated to exist with
the π-gesture approach to prosodic boundaries (Byrd et al. 2000, Byrd & Saltzman 2003).
In what follows we will present an outline of prosodic production and speech
planning that incorporates the results of this dissertation. We will discuss where prosodic
boundaries occur, how boundary strength is determined and how the scope of effect of
prosodic boundaries is accounted for.
The basic idea is that prosodic structure is the result of different factors, each
influencing the prosodic structure in a certain direction (for similarly conceptualized
approaches to prosodic structure, see Abney 1992, Truckenbrodt 1999, Jun 2002). We
propose that prosodic structure is an interplay between syntactic, pragmatic,
phonological, and prosodic forces that can interact via coupling mechanisms.
Following the arguments outlined in Keating & Shattuck-Hufnagel (2002), and
our previous discussion of global effects of prosodic boundaries, we assume a prosody-
first approach, in which prosodic structure is built before all phonological/phonetic word
information becomes available. We start with a syntactic structure, already marked for
prominence. Without specifying the exact amount of syntactic structure available at the
point that prosodic structure is built, we will assume, as a working hypothesis, that it is a
sentence. (This part of our proposal is similar to Keating & Shattuck-Hufnagel 2002. As
they point out, this is a simplified assumption, because it assumes that syntactic structure
is already complete, while in fact it might be built incrementally.) Note that nothing
hinges on the exact size of the syntactic chunk available; it is only important that there is
a syntactic chunk available that is large enough for a prosodic phrase above the word
level to be built.³
Syntactic structure determines the occurrence of prosodic boundaries to a large
extent. Certain syntactic structures generally induce a boundary, for example vocatives or
nonrestrictive relative clauses (e.g., see Selkirk 1981, Nespor & Vogel 1986). In addition
to the structures regularly inducing a prosodic boundary, since clauses are often marked
by IP boundaries, we will assume (perhaps as an oversimplification) that the clausal
syntactic unit forces a prosodic boundary as well. We will also assume that any syntactic
boundary encourages a prosodic boundary to some extent, but less complex structures are
less likely to induce a boundary than more complex structures (i.e., boundaries that are
more embedded). So syntactic structure induces prosodic boundaries, with different
structures forcing prosodic boundaries to a different degree. Note that while syntactic
structures demand a prosodic boundary, interaction with other factors may also influence
the prosodic structure in a different direction.
³ In fact, since we view various factors as interacting with each other, it is conceivable that prosodic
structure could impact syntactic structure as well, in the sense that prosodic factors could, in principle,
force a syntactic boundary to occur.
Based on our results showing the influence of prosodic structure on pause
duration (Chapter 5), and on the findings of Grosjean et al. (1979), we also suggest that
prosodic boundaries are most preferred at regular intervals. We suggested that this
prosodic or phrasal rhythmicity might be achieved by coupling the planning level
oscillator of prosodic gestures to a phrase level oscillator (such as the one proposed by Nam et
al. 2006). While the phrase level oscillator has not yet been modeled, the suggestion is
that it regulates the occurrence of prosodic boundaries at periodic intervals. Phrase length
effects on boundary occurrence might be accounted for in the same vein. Jun (1993)
shows that Accentual Phrases (AP) in Korean most prefer to have five or fewer syllables,
suggesting a rhythmic effect, and we suggest that effects like these are also implemented
through a phrase level oscillator, whose cycles determine the intervals at which prosodic
boundaries occur. Note that such a view of phrase-length effects allows us to incorporate
them without relying on any syllable or word counting mechanisms.
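The coupling idea in this paragraph can be illustrated with a generic pair of phase oscillators. This is only a minimal sketch under assumptions that are not part of the proposal: the phrase level oscillator has not been modeled, so the sinusoidal coupling function, the frequencies, the coupling strength `k`, and all function names are hypothetical. The sketch shows how a planning oscillator coupled to a phrase level oscillator can settle into a fixed relative phase, so that boundaries would recur at regular intervals without any syllable or word counting.

```python
import math


def entrain(omega_phrase, omega_pi, k, dt=0.001, duration=20.0):
    """Integrate two phase oscillators (Euler method) and return their
    final relative phase in radians.

    omega_phrase: frequency (rad/s) of the hypothetical phrase level oscillator
    omega_pi:     natural frequency (rad/s) of the pi-gesture planning oscillator
    k:            assumed coupling strength pulling the planning oscillator
                  toward the phrase level oscillator
    """
    phi_phrase, phi_pi = 0.0, 0.0
    for _ in range(int(duration / dt)):
        coupling = k * math.sin(phi_phrase - phi_pi)
        phi_phrase += omega_phrase * dt        # phrase oscillator runs freely
        phi_pi += (omega_pi + coupling) * dt   # planning oscillator is entrained
    return phi_phrase - phi_pi


# Illustrative values only: one phrase cycle every 2 s, a slightly slower
# planning oscillator, and a coupling strong enough for phase locking.
omega_phrase = 2 * math.pi / 2.0
omega_pi = omega_phrase - 0.5
relative_phase = entrain(omega_phrase, omega_pi, k=2.0)
locked_phase = math.asin((omega_phrase - omega_pi) / 2.0)
print(round(relative_phase, 3), round(locked_phase, 3))
```

With coupling, the relative phase converges to a constant; with `k=0.0` the two oscillators simply drift apart. Whether speech planning uses this particular coupling function is an open question that the sketch does not address.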
The strength of the prosodic boundary, once in place, depends on a number of
factors: depth of syntactic embedding, phrase length (e.g., Strangert 1997, Zvonik &
Cummins 2002, 2003) and speech rate. We assume that depth of syntactic embedding
will be directly reflected in prosodic boundary strength, in the sense that more embedded
syntactic boundaries lead to a stronger activation of the π-gesture. It is unclear at this
point in what way the effects of phrase length could be incorporated. Speech rate effects
on boundary strength are exhibited in pausing patterns, such that faster speech leads to
shorter pauses (see e.g., Goldman Eisler 1968, Fletcher 1987 for French; and Trouvain &
Grice 1999 for German; but see Butcher 1981 for different results). These effects of
speech rate need to be modeled, but they can be conceived of as the result of global rate
effects affecting the temporal elasticity of the entire gestural score (see Saltzman 1999).
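One simple way to picture a global rate effect is a uniform time scaling of every interval in a gestural score, pauses included. The sketch below assumes such a linear scaling, a toy score format, and invented durations; none of these come from this dissertation or from Saltzman (1999), and real rate effects need not be uniform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    label: str      # gesture or pause label (hypothetical)
    onset: float    # seconds
    offset: float   # seconds

    @property
    def duration(self) -> float:
        return self.offset - self.onset


def scale_score(score, rate):
    """Return a copy of the score with all times divided by `rate`.

    rate > 1 means faster speech, so every interval, pauses included,
    shrinks by the same factor (an assumed, purely linear elasticity).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return [Interval(i.label, i.onset / rate, i.offset / rate) for i in score]


# Invented toy score: two constriction gestures separated by a pause.
score = [
    Interval("lip closure", 0.00, 0.12),
    Interval("pause", 0.12, 0.52),
    Interval("tongue tip closure", 0.52, 0.60),
]
fast = scale_score(score, rate=1.25)
print([round(i.duration, 3) for i in fast])
```

Under this assumption, faster speech shortens the pause in proportion to everything else, which is one possible reading of the pausing patterns cited above.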
The point in the phrase at which the π-gesture becomes active, i.e., its
coordination, must also be considered. We will go forward with the starting assumption
that in English the π-gesture is coupled to the last lexical stress in a phrase such that it
starts to become active synchronously with the onset of the accented syllable or vowel
(Byrd, p.c.). Under this hypothesis, the location of the prosodic boundary would be
determined by the factors mentioned above and the exact onset of the π-gesture would be
determined by lexical stress location. Support for this idea comes from studies by
Shattuck-Hufnagel and Turk (1998) and Turk (1999), who find that phrase-final
lengthening extends to the stressed syllable of a pre-boundary word. Our own study on
the scope of effect of prosodic boundaries (Chapter 2) only partially supports this view, in
that for only one of the three subjects (subject R) did the effect of the boundary reach as far as
the lexically stressed syllable. However, the results of Byrd et al. (2006) do not support
the hypothesis, as in that study, the scope of pre-boundary lengthening did not reach the
stressed syllable. Further studies are needed to test this working hypothesis and possible
interactions with pitch accent.
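The working hypothesis about π-gesture onset can be stated as a small procedure. The following sketch assumes a hypothetical phrase representation (syllable text, onset time, stress flag) and invented timings; it only restates the hypothesis that the π-gesture starts at the onset of the last lexically stressed syllable, and makes no claim about how well that hypothesis fits the data.

```python
def pi_gesture_onset(syllables):
    """Return the onset time of the last lexically stressed syllable.

    syllables: list of (text, onset_seconds, is_stressed) tuples for one
    phrase, in temporal order. Returns None if no syllable is stressed.
    """
    for text, onset, is_stressed in reversed(syllables):
        if is_stressed:
            return onset
    return None


# Invented timings for a phrase ending in "vacation" (stress on "ca").
phrase = [
    ("Anne's", 0.00, True),
    ("va", 0.30, False),
    ("ca", 0.42, True),
    ("tion", 0.65, False),
]
print(pi_gesture_onset(phrase))
```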
Finally, the intonational contour of an utterance is determined by pragmatic
factors that yield particular pitch accents, phrasal accents, and boundary tones (see e.g.,
Levelt 1989, Pierrehumbert & Hirschberg 1990). Whether a phrasal accent and boundary
tone will occur can be conceived of as depending on a criterial strength of the π-gesture,
in the sense that phrase level tones are triggered by a prosodic boundary of a specific
strength (see Byrd 2006). The nature of the tones themselves (e.g., H*, L*) is a question
orthogonal to our current considerations.
In summary, we see prosodic structure emerging directly from syntactic structure
through an interaction of different forces that determine the occurrence of prosodic
boundaries and that determine their strength. The effects of syntactic structure combine
with the rhythmic effect factor, which forces regularity in boundary assignment and
might also be the factor behind phrase length effects. Finally, lexical stress interacts with
prosodic boundaries, attracting the π-gesture.
We have been outlining a view in which prosodic structure is directly phonetically
realized from syntactic structure, without an intermediate prosodic structure. Prosodic
hierarchy is realized through boundary strength. Note, however, that nothing hinges on
this approach, as the same principles as the ones outlined above could apply if a separate
prosodic structure is assumed.
The outlined view of prosodic structure production is schematically represented in
Figure 6.2.
[Figure 6.2 schematic. Example sentence: [[[Her] [[[older] [brother]] [[from] [Bern]]]] [[spoiled] [[Anne’s] [vacation]]]]. Processing tiers: syntactic forces (syntactic, semantic and discourse level information, e.g., Keating & Shattuck-Hufnagel 2002), rhythmic forces (rhythmic oscillator, cycle n and cycle n+1), prominence (lexical stress), and the cumulative activation threshold trigger field. Resulting production: π-gestures and constriction gestures. Legend: arrows mark increased likelihood of π-gesture activation; a separate symbol marks lexical stress.]
Figure 6.2. A theoretical schematization of the production of prosodic structure. The different forces influencing
prosodic boundaries are given on separate tiers. The arrows indicate a likelihood of activation of the π-gesture, as
determined by one of the factors (syntactic, rhythmic or lexical stress prominence). Each syntactic boundary forces a
prosodic boundary, and the more complex the structure, the more likely a boundary will occur (two degrees of
likelihood are represented in the figure by arrow thickness, although likelihood is assumed to be a continuum, not
binary). On the rhythmic tier, in this example, a boundary is rhythmically forced every six syllables (the actual
frequency and the oscillatory unit by which the rhythm is determined are empirical questions). The shading of a tier
represents the weighting of that tier, such that syntactic structure exerts the strongest influence; prominence determines
only the point at which the π-gesture becomes active and exerts no other influence on boundary occurrence. The
cumulative activation threshold trigger field shows the cumulative result of all the forces in their combined influence
on the occurrence of prosodic boundaries. Here three degrees of activation are shown (none, light, and dark), but the
strength of activation is assumed to be a continuum. A certain sufficient strength will activate or trigger the π-gesture
(dark shading in the figure). The interaction between the different forces can be seen: in this example rhythmicity
forces a boundary after ‘from’ but the syntactic boundary is much stronger after ‘Bern’, so this syntactically forced
boundary combines with the nearby rhythmically forced boundary, resulting in the trigger field being very strong
in this area. Note further that while the arrows point down, indicating the directional flow of the activation for the π-
gesture, it is in fact assumed that all the levels could interact – for example rhythmicity could affect syntactic choices.
Additionally, nothing in this model precludes the resulting production from affecting processing. An example of such
an interaction could be the compensatory shortening observed in Byrd et al. 2006 and in Chapter 2 of this dissertation,
which was interpreted as triggered by the lengthening caused by the prosodic boundary production (see also Saltzman,
Löfqvist, & Mitra 2000).
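The interaction described in the caption can be made concrete with a toy computation. This is a hedged sketch, not the model itself: the number of constituents closing after a word is used as a stand-in for syntactic boundary strength, the rhythmic force is a bump around every sixth syllable, and the weights, the bump width, the threshold, the hand-supplied syllable counts, and all function names are assumptions introduced for illustration.

```python
import math
import re

# Example sentence from Figure 6.2, with its syntactic bracketing.
BRACKETED = ("[[[Her] [[[older] [brother]] [[from] [Bern]]]] "
             "[[spoiled] [[Anne's] [vacation]]]]")
# Syllable counts are supplied by hand (an assumption of this sketch).
SYLLABLES = {"Her": 1, "older": 2, "brother": 2, "from": 1, "Bern": 1,
             "spoiled": 1, "Anne's": 1, "vacation": 3}


def syntactic_force(bracketed):
    """(word, number of constituents closing after it), as a depth proxy."""
    pairs = re.findall(r"([^\[\]\s]+)(\]+)", bracketed)
    return [(word, len(closings)) for word, closings in pairs]


def rhythmic_force(syllables_so_far, period=6, width=1.0):
    """Bump that peaks when the syllable count hits a multiple of `period`."""
    cycle = max(1, round(syllables_so_far / period))
    distance = abs(syllables_so_far - cycle * period)
    return math.exp(-distance ** 2 / (2 * width ** 2))


def trigger_field(bracketed, w_syntax=0.7, w_rhythm=0.3):
    """Weighted sum of both forces after each word, scaled to [0, 1]."""
    forces = syntactic_force(bracketed)
    max_closings = max(n for _, n in forces)
    field, count = [], 0
    for word, n in forces:
        count += SYLLABLES[word]
        value = w_syntax * n / max_closings + w_rhythm * rhythmic_force(count)
        field.append((word, value))
    return field


THRESHOLD = 0.8  # assumed criterial activation for triggering a pi-gesture
field = trigger_field(BRACKETED)
boundaries = [word for word, value in field if value >= THRESHOLD]
print(boundaries)
```

With these illustrative weights, the rhythmic peak falls after 'from', but the much stronger syntactic force after 'Bern' combines with the nearby rhythmic bump, so the field crosses the threshold after 'Bern' and again after 'vacation'. Prominence is left out here because, in the model, it only determines when the π-gesture becomes active, not whether a boundary occurs.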
Summary
This dissertation has demonstrated further reasons for the inclusion of temporal
information into the prosodic representation, as suggested by Byrd et al. (2000) and Byrd
& Saltzman (2003). We have observed that global prosodic structure affects speech
planning, that prosodic boundaries are produced in an overall categorical manner,
although indications of gradient boundary production were observed in Chapter 2, and that
listeners perceive boundaries in a gradient manner and respond particularly to the articulatory
movements with the most salient phrasal influences. We have discussed the implications of
our findings for models of speech production and prosodic theory and given an outline of
our view of production of prosodic structure.
As a whole, these experimental studies allow us to refine our understanding of the
integration of prosodic linguistic structure with speech production and perception
processes, allow us to evaluate theories of prosodic structure and speech production, and
point to directions for future research.
References:
Abney, S. (1992). Prosodic structure, performance structure and phrase structure. In:
Proceedings of Speech and Natural Language Workshop. San Mateo, CA:
Morgan Kaufmann Publishers, pp. 425-428.
Arbisi-Kelm, T. (2006). An Intonational Analysis of Disfluency Patterns in Stuttering.
Unpublished Ph.D. dissertation. University of California, Los Angeles, Los
Angeles, California.
Barbosa, P. A. (2002). Explaining cross-linguistic rhythmic variability via a coupled-
oscillator model of rhythm production. In: Proceedings of the Speech Prosody
2002 Conference. Aix-en-Provence, pp. 163-166.
Beckman, M. E. & J. Edwards (1992). Intonational categories and the articulatory control
of duration. In: Speech Perception, Production and Linguistic Structure. Edited
by Y. Tohkura, E. Vatikiotis-Bateson & Y. Sagisaka. Tokyo, Japan: Ohmsha, pp.
359-375.
Beckman, M. E. & G. A. Elam (1997). Guidelines for ToBI labelling. Version 3.0,
unpublished ms. (available online at: http://www.ling.ohio-
state.edu/~tobi/ame_tobi/labelling_guide_v3.pdf).
Beckman, M. E. & J. B. Pierrehumbert (1986). Intonational structure in Japanese and
English. Phonology Yearbook, 3, 255-309.
Berkovits, R. (1993a). Progressive utterance-final lengthening in syllables with final
fricatives. Language and Speech, 36, 89-98.
Berkovits, R. (1993b). Utterance-final lengthening and the duration of final-stop closures.
Journal of Phonetics, 21, 479-489.
Berkovits, R. (1994). Durational effects in final lengthening, gapping, and contrastive
stress. Language and Speech, 37, 237-250.
Browman, C. P. & L. M. Goldstein (1986). Towards an Articulatory Phonology.
Phonology Yearbook, 3, 219-252.
Browman, C. P. & L. M. Goldstein (1989). Articulatory gestures as phonological units.
Phonology, 6, 201-251.
Browman, C. P. & L. Goldstein (1990a). Gestural specification using dynamically-
defined articulatory structures. Journal of Phonetics, 18, 299-320.
Browman, C. P. & L. Goldstein (1990b). Representation and reality: physical systems and
phonological structure. Journal of Phonetics, 18, 411-424.
Browman, C. P. & L. M. Goldstein (1992). Articulatory Phonology: An overview.
Phonetica, 49, 155-180.
Browman, C. P. & L. Goldstein (1995). Dynamics and Articulatory Phonology. In: Mind
as Motion: Explorations in the Dynamics of Cognition. Edited by R. F. Port & T.
Van Gelder. Cambridge, MA: The MIT Press, pp. 175-193.
Butcher, A. (1981). Aspects of the Speech Pause: Phonetic Correlates and
Communicative Functions. Arbeitsberichte. Institut für Phonetik, Kiel.
Byrd, D. (2000). Articulatory vowel lengthening and coordination at phrasal junctures.
Phonetica, 57, 3-16.
Byrd, D. (2006). Relating prosody and dynamic events: Commentary on the papers by
Cho, Navas, and Smiljanić. In: Laboratory Phonology 8: Varieties of
Phonological Competence. Edited by L. Goldstein. Berlin/New York: Mouton de
Gruyter, pp. 549-561.
Byrd, D., Kaun, A., Narayanan, S., & E. Saltzman (2000). Phrasal signatures in
articulation. In: Papers in Laboratory Phonology 5. Acquisition and the Lexicon.
Edited by M. B. Broe & J. B. Pierrehumbert. Cambridge, UK: Cambridge
University Press, pp. 70-87.
Byrd, D., Krivokapić, J. & S. Lee. (2006). How far, how long: On the temporal scope of
phrase boundary effects. Journal of the Acoustical Society of America, 120, 1589-
1599.
Byrd, D., Lee, S., Riggs, D., & J. Adams. (2005). Interacting effects of syllable and
phrase position on consonant articulation. Journal of the Acoustical Society of
America, 118, 3860-3873.
Byrd, D. & E. Saltzman (1998). Intragestural dynamics of multiple phrasal boundaries.
Journal of Phonetics, 26, 173-199.
Byrd, D. & E. Saltzman (2003). The elastic phrase: Modeling the dynamics of boundary-
adjacent lengthening. Journal of Phonetics, 31, 149-180.
Cambier-Langeveld, T. (1997). The domain of final lengthening in the production of
Dutch. In: Linguistics in the Netherlands 1997. Edited by J. Coerts & H. de Hoop.
Amsterdam: John Benjamins, pp. 13-24.
Cambier-Langeveld, T., Nespor, M., & V. J. Van Heuven. (1997). The domain of final
lengthening in production and perception in Dutch. In: Proceedings of
EUROSPEECH 1997, 5th European Conference on Speech Communication and
Technology. Rhodes, Greece, pp. 931-934.
Campbell, W. N. & S. D. Isard (1991). Segment durations in a syllable frame. Journal of
Phonetics, 19, 37-47.
Carlson, K., Clifton Jr., C. & L. Frazier (2001). Prosodic boundaries in adjunct
attachment. Journal of Memory and Language, 45, 58-81.
Cho, T. (2002). The Effects of Prosody on Articulation in English. New York and
London: Routledge.
Cho, T. (2005) Prosodic strengthening and featural enhancement: Evidence from acoustic
and articulatory realizations of /a,i/ in English. Journal of the Acoustical Society
of America, 117, 3867-3878.
Cho, T. & P. Keating (2001). Articulatory and acoustic studies on domain-initial
strengthening in Korean. Journal of Phonetics, 29, 155-190.
Cho, T., McQueen J., & E. Cox (2007). Prosodically driven phonetic detail in speech
processing: The case of domain-initial strengthening in English. Journal of
Phonetics, 35, 210-243.
Choi, E. (2003). Pause length and speech rate as durational cues for prosody. Poster,
Journal of the Acoustical Society of America, 114, 2395.
Clifton Jr., C., Carlson, K., & L. Frazier (2002). Informative prosodic boundaries.
Language and Speech, 45, 87-114.
Cooper, W. E. & J. Paccia-Cooper (1980). Syntax and Speech. Cambridge, MA: Harvard
University Press.
Cummins, F. (2004). Synchronization among speakers reduces macroscopic temporal
variability. 26th Annual Meeting of the Cognitive Science Society, 304-309.
Cummins, F. & R. F. Port (1998). Rhythmic constraints on stress timing in English.
Journal of Phonetics, 26, 145-171.
Cutler, A., Dahan, D., & W. van Donselaar (1997). Prosody in the comprehension of spoken
language: a literature review. Language and Speech, 40, 141-201.
Dilley, L., Shattuck-Hufnagel, S., & Ostendorf, M. (1996). Glottalization of vowel-initial
syllables as a function of prosodic structure. Journal of Phonetics, 24, 423-444.
Duda, R. O., Hart, P. E., & D. G. Stork (2001). Pattern Classification. New York, NY:
Wiley-Interscience.
Edwards, J., Beckman, M. E., & J. Fletcher (1991). The articulatory kinematics of final
lengthening. Journal of the Acoustical Society of America, 89, 369-382.
Ferreira, F. (1991). Effects of length and syntactic complexity on initiation times for
prepared utterances. Journal of Memory and Language, 30, 210-233.
Ferreira, F. (1993). Creation of prosody during sentence production. Psychological
Review, 100, 233-253.
Ferreira, F., & B. Swets. (2002). How Incremental Is Language Production? Evidence
from the Production of Utterances Requiring the Computation of Arithmetic
Sums. Journal of Memory and Language, 46, 57-84.
Fletcher, J. (1987). Some micro and macro effects of tempo change on timing in French.
Linguistics, 25, 951-967.
Fougeron, C. (2001). Articulatory properties of initial segments in several prosodic
constituents in French. Journal of Phonetics, 29, 109-135.
Fougeron, C. & P. Keating (1997). Articulatory strengthening at edges of prosodic
domains. Journal of the Acoustical Society of America, 101, 3728-3740.
Fowler, C. A. (1986). An event approach to the study of speech perception from a direct-
realist perspective. Journal of Phonetics, 14, 3-28.
Fowler, C. A. (1996). Listeners do hear sounds, not tongues. Journal of the Acoustical
Society of America, 99, 1730-1741.
Frazier, L., Clifton Jr., C., & K. Carlson (2004). Don’t break, or do: prosodic boundary
preferences. Lingua, 114, 3-27.
Gaitenby, J. H. (1965). The elastic word. Haskins Report SR-2, 3.1-3.12.
Gee, J. P. & F. Grosjean (1983). Performance structures: A psycholinguistic and
linguistic appraisal. Cognitive Psychology, 15, 411-458.
Goldman Eisler, F. (1968). Psycholinguistics. Experiments in Spontaneous Speech.
London and New York: Academic Press.
Goldstein, L., Pouplier, M., Chen, L., Saltzman, E., & D. Byrd (2007). Gestural action
units slip in speech production errors. Cognition, 103, 386-412.
Granqvist, S. (1996). Enhancements to the Visual Analogue Scale, VAS, for listening
tests. TMH-QPSR, 4/1996, 61-62. Stockholm: Department of Speech, Music and
Hearing, Royal Institute of Technology.
Griffin, Z. (2003). A Reversed Word Length Effect in Coordinating the Preparation and
Articulation of Words in Speaking. Psychonomic Bulletin and Review, 10, 603-
609.
Grosjean, F., Grosjean, L., & H. Lane (1979). The patterns of silence: Performance
structures in sentence production. Cognitive Psychology, 11, 58-81.
Hansson, P. (2003). Prosodic Phrasing in Spontaneous Swedish. Ph.D. Dissertation.
Travaux de l’institut de linguistique de Lund 43. Lund: Department of Linguistics
and Phonetics, Lund University.
Hayes, B. (1989). The prosodic hierarchy in meter. In: Rhythm and meter. Edited by P.
Kiparsky & G. Youmans. Orlando, FL: Academic Press, pp. 201-260.
Horne, M., Strangert, E. & M. Heldner. (1995). Prosodic boundary strength in Swedish:
final lengthening and silent interval duration. In: Proceedings of the XIIIth
International Congress of Phonetic Sciences. Stockholm: KTH and Stockholm
University, Vol. 1, pp. 170-173.
Jun, S.-A. (1993). The Phonetics and Phonology of Korean Prosody. Unpublished Ph.D.
dissertation. The Ohio State University, Columbus, Ohio.
Jun, S.-A. (1998). The Accentual Phrase in the Korean prosodic hierarchy. Phonology,
15, 189-226.
Jun, S.-A. (2002) Syntax over Focus. In: Proceedings of the International Conference on
Spoken Language Processing. Denver, Colorado, pp. 2281-2284.
Jun, S.-A. (2003a). Prosodic phrasing and attachment preferences. Journal of
Psycholinguistic Research, 32, 219-249.
Jun, S.-A. (2003b). The effect of phrase length and speech rate on prosodic phrasing. In:
Proceedings of the XVth International Congress of Phonetic Sciences, Barcelona,
Spain, pp. 483-486.
Keating, P.A. (1990). The window model of coarticulation: articulatory evidence. In:
Papers in Laboratory Phonology I. Edited by J. Kingston & M. Beckman.
Cambridge, UK: Cambridge University Press, pp. 451-470.
Keating, P. & S. Shattuck-Hufnagel (2002). A prosodic view of word form encoding for
speech production. UCLA Working Papers in Phonetics, 101, 112-156.
Keating, P., Cho, T., Fougeron, C., & C. Hsu (2004). Domain-initial articulatory
strengthening in four languages. In: Phonetic Interpretation (Papers in
Laboratory Phonology VI). Edited by J. Local, R. Ogden & R. Temple.
Cambridge, UK: Cambridge University Press, pp. 143-161.
Klatt, D. (1975). Vowel lengthening is syntactically determined in connected discourse,
Journal of Phonetics, 3, 129-140.
Kohler, K. (1983). Prosodic boundary signals in German. Phonetica, 40, 89-134.
Krivokapić, J. (2006). The scope of effect of prosodic boundaries in articulation. Poster
presented at the 151st Meeting of the Acoustical Society of America, Providence,
Rhode Island, June 2006. Journal of the Acoustical Society of America, 119,
3304.
Krivokapić, J. (2007). Prosodic planning: Effects of phrasal length and complexity on
pause duration. Journal of Phonetics, 35, 162-179.
Krull, D. (1997). Prepausal lengthening in Estonian: Evidence from conversational
speech. In: Estonian Prosody: Papers from a Symposium. Edited by I. Lehiste &
J. Ross. Institute of Estonian Language and Authors.
Ladd, R. (1988). Declination “reset” and the hierarchical organization of utterances.
Journal of the Acoustical Society of America, 84, 530-544.
Ladd, R. (1996). Intonational Phonology. Cambridge Studies in Linguistics. Cambridge,
UK: Cambridge University Press.
Lane, H. & F. Grosjean (1973). Perception of reading rate by listeners and speakers.
Journal of Experimental Psychology, 97, 141-147.
Lee, E-K. & J. Cole (2006). Acoustic effects of prosodic boundary on vowels in
American English. Proceedings of the Chicago Linguistic Society, Chicago, IL.
Lee, S., Byrd, D., & J. Krivokapić (2006). Functional data analysis of prosodic effects on
articulatory timing. Journal of the Acoustical Society of America, 119, 1666-1671.
Lehiste, I., Olive, J. P., & L. A. Streeter (1976). Role of duration in disambiguating
syntactically ambiguous sentences. Journal of the Acoustical Society of America,
60, 1199-1202.
Levelt, W. J. M. (1989). Speaking. From Intention to Articulation. Cambridge, MA: MIT
Press.
Levelt, W. J. M., Roelofs, A., & A. S. Meyer (1999). A theory of lexical access in speech
production. Behavioral and Brain Sciences, 22, 1-38.
Meyer, A. S., Belke, E., Häcker, C., & L. Mortensen (2007). Use of word length
information in utterance planning. Journal of Memory and Language, 57, 210-
231.
Nam, H., Goldstein, L., & E. Saltzman (to appear). Self-organization of syllable structure:
a coupled oscillator model. In: Approaches to phonological complexity. Edited by
F. Pellegrino, E. Marsico, & I. Chitoran. Berlin/New York: Mouton de Gruyter.
Nam, H. & E. Saltzman (2003). A competitive, coupled oscillator model of syllable structure.
Proceedings of the XVth International Congress of Phonetic Sciences, Barcelona,
Spain, pp. 2253-2256.
Nam, H., Saltzman, E., & L. Goldstein (2006). Dynamical modeling of supragestural
timing. Poster presented at the 10th Conference on Laboratory Phonology, June
2006.
Nespor, M. & I. Vogel (1986). Prosodic Phonology. Dordrecht, Holland/Riverton, USA:
Foris Publications.
Oller, K. D. (1973). The effect of position in utterance on speech segment duration in
English. Journal of the Acoustical Society of America, 54, 1235-1247.
Pierrehumbert, J. B. (1980). The Phonology and Phonetics of English Intonation. Ph.D.
Dissertation, MIT, Cambridge, Massachusetts.
Pierrehumbert, J. & M.Y. Liberman (1982). Modelling the fundamental frequency of the
voice. Contemporary Psychology, 27, 690-692.
Pierrehumbert, J. & J. Hirschberg (1990). The Meaning of Intonation in the Interpretation
of Discourse. In: Intentions in Communication. Edited by P. Cohen, J. Morgan &
M. Pollack. Cambridge, MA: MIT Press, pp. 271-311.
Pierrehumbert, J. & D. Talkin (1991). Lenition of /h/ and glottal stop. In: Papers in
Laboratory Phonology II. Cambridge, UK: Cambridge University Press, pp. 90-
117.
de Pijper, J. R. & A. A. Sanderman (1994). On the perceptual strength of prosodic
boundaries and its relation to suprasegmental cues. Journal of the Acoustical
Society of America, 96, 2037-2047.
Pikovsky, A., Rosenblum, M., & J. Kurths (2001). Synchronization - A Universal Concept
in Non-linear Sciences. Cambridge UK: Cambridge University Press.
Port, R. (2002). Implications of rhythmic discreteness in speech. Paper prepared for the
conference Temporal Integration in the Perception of Speech in Aix-en-Provence,
April 8-11, 2002, available at http://www.cs.indiana.edu/~port/pubs.html
Port, R. F., Tajima, K., & F. Cummins (1996). Self-entrainment in animal behavior and
human speech. Online proceedings of the Midwest Artificial Intelligence and
Cognitive Science Conference. Indiana University, Bloomington, available at
http://www.cs.indiana.edu/event/maics96/proceedings.html
Price, P. J., Ostendorf, M., Shattuck-Hufnagel, S., & C. Fong (1991). The use of prosody
in syntactic disambiguation. Journal of the Acoustical Society of America, 90,
2956-2970.
Redi, L. & S. Shattuck-Hufnagel. (2001) Variation in realization of glottalization in
normal speakers. Journal of Phonetics, 29, 407-429.
Saltzman, E. L. (1995). Dynamics and coordinate systems in skilled sensorimotor
activity. In: Mind as Motion: Explorations in the Dynamics of Cognition. Edited by R. F. Port
& T. Van Gelder. Cambridge, MA: MIT Press, pp. 149-173.
Saltzman, E. (1999). Nonlinear dynamics of temporal patterning in speech. In:
Proceedings of symposium on the dynamics of the production and perception of
speech, a satellite symposium of the XIVth international congress of phonetic
sciences. Edited by P. L. Divenyi, & R. J. Porter. Martinez, CA: East Bay Institute
for Research and Education, Inc.
Saltzman, E. & D. Byrd (2000) Task-dynamics of gestural timing: Phase windows and
multifrequency rhythms. Human Movement Science, 19, 499-526.
Saltzman, E. & J. A. S. Kelso (1987). Skilled actions: A task dynamic approach.
Psychological Review, 94, 84-106.
Saltzman, E., Löfqvist, A., & S. Mitra. (2000). ‘Glue’ and ‘clocks’: intergestural
cohesion and global timing. In: Papers in Laboratory Phonology V. Edited by M.
B. Broe & J. B. Pierrehumbert. Cambridge, UK: Cambridge University Press, pp.
88-101.
Saltzman, E. L. & K. G. Munhall (1989). A dynamical approach to gestural patterning in
speech production. Ecological Psychology, 1, 333-382.
Sanderman, A. A. & R. Collier (1995). Prosodic phrasing at the sentence level. In:
Producing speech: Contemporary Issues. For Katherine Safford Harris. Edited
by F. Bell-Berti & L. J. Raphael. New York: American Institute of Physics, pp.
321-332.
Schafer, A. J. (1997). Prosodic parsing: The role of prosody in sentence comprehension.
Unpublished Ph.D. dissertation. University of Massachusetts, Amherst,
Massachusetts.
Scott, D.R. (1982). Duration as a cue to the perception of a phrase boundary. Journal of
the Acoustical Society of America, 71, 996-1007.
Selkirk, E. (1981). On prosodic structure and its relation to syntactic structure. In: Nordic
Prosody II. Edited by T. Fretheim. Trondheim: Tapir, pp. 111-140.
Selkirk, E. (1984). Phonology and Syntax: The Relation Between Sound and Structure.
Cambridge, MA: MIT Press.
Selkirk, E. (1986). On derived domains in sentence phonology. Phonology Yearbook, 3,
371-405.
Selkirk, E. (1995). Sentence prosody: intonation, stress, and phrasing. In: The
Handbook of Phonological Theory. Edited by J. A. Goldsmith. Cambridge, MA,
and Oxford, UK: Blackwell, pp. 550-69.
Shattuck-Hufnagel, S. & Turk, A. (1996). A Prosody Tutorial for Investigators of
Auditory Sentence Processing. Journal of Psycholinguistic Research, 25, 193-
247.
Shattuck-Hufnagel, S. & A. Turk (1998). The domain of phrase-final lengthening in
English. In: The Sound of the Future: A Global View of Acoustics in the 21st
Century, Proceedings of the 16th International Congress on Acoustics and 135th
Meeting Acoustical Society of America, 1235-1236.
Silverman, K. E. A. (1990). The separation of prosodies: comments on Kohler’s paper.
In: Papers in laboratory phonology I: Between the grammar and physics of
speech. Edited by J. Kingston & M. E. Beckman. Cambridge, UK: Cambridge
University Press, pp. 139-151.
Smith, M. & L. Wheeldon (1999). High level processing scope in spoken sentence
production. Cognition, 73, 205-246.
Strangert, E. (1991). Pausing in texts read aloud. In: Proceedings of the XIIth
International Congress of Phonetic Sciences. Aix-en-Provence: Université de
Provence, Service des Publications, Vol. 4, pp. 238-241.
Strangert, E. (1997). Relating prosody to syntax: boundary signaling in Swedish. In:
Proceedings of the 5th European Conference on Speech Communication and
Technology. Vol. 1, pp. 239-242.
Streeter, L. A. (1978). Acoustic determinants of phrase boundary perception. Journal of the
Acoustical Society of America, 64, 1582-1592.
Swerts, M. (1997). Prosodic features at discourse boundaries of different strength.
Journal of the Acoustical Society of America, 101, 514-521.
Swets, B., Desmet, T., Hambrick, D. Z., & F. Ferreira (2007). The role of working
memory in syntactic ambiguity resolution: A psychometric approach. Journal of
Experimental Psychology: General, 136, 64-81.
Tabain, M. (2003a). Effects of prosodic boundary on /aC/ sequences: acoustic results.
Journal of the Acoustical Society of America, 113, 516-531.
Tabain, M. (2003b). Effects of prosodic boundary on /aC/ sequences: articulatory results.
Journal of the Acoustical Society of America, 113, 2834-2849.
Tabain, M. & P. Perrier (2005). Articulation and acoustics of /i/ in pre-boundary position
in French. Journal of Phonetics, 33, 77-100.
Terken, J. & R. Collier (1992). Syntactic influences on prosody. In: Speech Perception,
Production and Linguistic Structure. Edited by Y. Tohkura, E. Vatikiotis-Bateson,
& Y. Sagisaka. Amsterdam, Washington, Oxford: IOS Press and Tokyo, Osaka,
Kyoto: Ohmsha, pp. 427-438.
Trouvain, J. & M. Grice (1999). The effect of tempo on prosodic structure. In:
Proceedings of the XIVth International Congress of Phonetic Sciences. San
Francisco, CA, pp. 1067-1070.
Truckenbrodt, H. (1999). On the relation between syntactic phrases and phonological
phrases. Linguistic Inquiry, 30, 219-255.
Turk, A. E. (1999). Structural influences on boundary-related lengthening in English. In:
Proceedings of the XIVth International Congress of Phonetic Sciences. San
Francisco, CA, pp. 237-240.
Watson, D. & E. Gibson (2004). The relationship between intonational phrasing and
syntactic structure in language production. Language and Cognitive Processes,
19, 713-755.
Wewers, M. E. & N. K. Lowe (1990). A critical review of Visual Analogue Scales in the
measurement of clinical phenomena. Research in Nursing & Health, 13, 227-236.
Wheeldon, L. & A. Lahiri (1997). Prosodic units in speech production. Journal of
Memory and Language, 37, 356-381.
Wightman, C. W., Shattuck-Hufnagel, S., Ostendorf, M., & P. J. Price (1992). Segmental
durations in the vicinity of prosodic phrase boundaries. Journal of the Acoustical
Society of America, 91, 1707-1717.
Zvonik, E., & F. Cummins (2002). Pause duration and variability in read texts. In:
Proceedings of the 2002 International Conference on Spoken Language
Processing. Denver, Colorado, pp. 1109-1112.
Zvonik, E., & F. Cummins (2003). The effect of surrounding phrase lengths on pause
duration. In: Proceedings of EUROSPEECH 2003. Geneva, Switzerland, pp. 777-
780.
Abstract
This dissertation examines the production and perception of phrase boundaries, and the structural properties of those boundaries, from a multifaceted experimental perspective. The term prosody refers to the accentual prominence and phrasal organization of speech, and the dissertation focuses on the latter. An example of phrasal organization is given below, where the two sentences differ in prosodic phrasing.
a. She knew, Ann thought, about the present.
b. She knew Ann thought about the present.
In addition to intonational events at their edges, prosodic phrase boundaries introduce systematic phonetic variation in the temporal properties of segments. Acoustic studies have shown that segments increase in duration at boundaries. Articulatory studies have shown that speech movements, or gestures, become temporally longer in the vicinity of boundaries and that this articulatory lengthening increases with boundary strength. This dissertation presents a series of experimental studies examining a) the articulation of gestures near phrase junctures, b) the categoricity and gradiency in the production and in the perception of prosodic boundaries, c) the link between articulatory properties of boundaries and listeners' perception of boundaries, and d) the effect of prosodic structure on pause duration. Results from this research further our understanding of the linguistic representation of prosodic structure and its relation to the processes involved in producing spoken language.
Asset Metadata
Creator
Krivokapic, Jelena (author)
Core Title
The planning, production, and perception of prosodic structure
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Linguistics
Degree Conferral Date
2007-08
Publication Date
06/21/2007
Defense Date
05/08/2007
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
articulatory phonology, gradiency in prosody, OAI-PMH Harvest, prosodic boundaries, prosodic boundary perception, prosodic boundary production, prosodic planning, prosody, speech production, synchronous speech
Language
English
Advisor
Byrd, Dani (committee chair), Arbib, Michael A. (committee member), Jun, Sun-Ah (committee member), Narayanan, Shrikanth S. (committee member)
Creator Email
krivokap@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m547
Unique identifier
UC1476569
Identifier
etd-Krivokapic-20070621 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-510610 (legacy record id), usctheses-m547 (legacy record id)
Legacy Identifier
etd-Krivokapic-20070621.pdf
Dmrecord
510610
Document Type
Dissertation
Rights
Krivokapic, Jelena
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu