SPEECH PRODUCTION IN POST-GLOSSECTOMY SPEAKERS:
ARTICULATORY PRESERVATION AND COMPENSATION
by
Christina M. Hagedorn
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
Linguistics
August 2015
Acknowledgements
As I reflect on my time spent at USC, I feel overwhelmingly grateful for the many
individuals who have helped to make my graduate school experience so enriching and
expansive.
My advisor, Louis Goldstein, has been a constant source of support and
inspiration. From the very beginning, Louis, so intuitively, provided the structure in
which I learned to think more creatively and broaden my intellectual horizons. Over time,
and in his gentle yet powerful way, Louis taught me to cultivate that structure, on my
own, by helping me understand the importance of considering a broader, more inclusive
view of seemingly low-level phenomena. Being aware of such interconnectivity and
observing patterns mirrored at various levels in our universe invoke in me a sense of awe,
time and time again, and continually serve as a source of inspiration for my work. Louis’s
mentorship and guidance have been instrumental in my personal growth during what has
been a particularly transformative period in my life. I could not have asked for a more
supportive or compassionate advisor.
In addition, I am grateful for the USC Phonetics and Phonology group, and am
especially thankful for the useful feedback and direction provided by Khalil Iskarous,
Dani Byrd, and Rachel Walker. I also feel very fortunate to have been part of the SPAN
group at USC. Shri Narayanan has encouraged my work with such enthusiasm and my
research has benefited greatly thanks to his input and feedback. I’m especially grateful
for the support that Shri provided me during my last semester at USC.
During my first four years at USC, Mike Proctor spent hours acquainting me with
rtMRI data and patiently teaching me how to use analytical tools for rtMRI data,
painstakingly covering every detail. On numerous occasions, Mike spent weeks writing
and customizing analytical algorithms specifically for my projects, and for this I will be
forever indebted to him. Adam Lammert and Fang-Ying Hsieh have also graciously
helped me with coding on numerous occasions. I am so grateful to Asterios Toutios,
Jangwon Kim, Adam Lammert, Yinghua Zhu, Vikram Ramanarayanan, Yoon Kim, Sajan
Lingala and Colin Vaz for carrying out the many tasks involved in scanning rtMRI
subjects and for giving up numerous Sunday afternoons to do so. Jangwon Kim assisted
me tremendously in providing the image segmentation tools needed to carry out the
principal component analyses presented in my dissertation and Colin Vaz provided
invaluable assistance in creating the online form required to gather the perception study
data presented in this paper, as well. None of the data collection necessary for my
dissertation would have been possible without the help of Dr. Uttam Sinha and his
patients, who, under Dr. Sinha’s attentive care and guidance, participated in this research
with such willingness and enthusiasm.
I am fortunate to have been part of and supported by the Hearing and
Communication Neuroscience group at USC. Being part of such a diverse group of
scientists not only exposed me to fascinating research in related disciplines, but also was
instrumental in my developing the ability to effectively share my work with a broader
audience.
I am also indebted to the many teachers and mentors whom I was fortunate
enough to study under before arriving at USC. My high school Latin teacher, “Magister”
John Friia, first taught me to decompose and study language in an analytical way and
inspired me to embrace life-long studentship. Lori Repetti first planted in me the seed of
interest in linguistic research and mentored me in my very first research project as an
undergraduate student at Stony Brook University. I can only hope to be as dedicated,
kind, and brilliant an instructor and researcher as Lori. I am thankful also to Marie
Huffman, for first sparking my interest in speech science through her flawless and
captivating presentation of introductory phonetics and for providing me my first
Teaching Assistant opportunity.
I am grateful for the companionship and moral support of those I’ve studied
alongside at USC - Priyanka Biswas, Canan Ipek, Xiao He, Barbara Tomaszewicz, Iris
Chuoying Ouyang, Ellen O’Connor, Adam Lammert and Ben Parrell, and for Joyce Perez
who provided so much mental and emotional support during my first few years in the
program. I’m especially thankful for my pre-graduate school friends, who despite my
moving across the continent and becoming engulfed in work, never wavered in their
support. I am thankful for Irene Milonas, who has been by my side since 6th grade and
who is always there when I’m in need of valuable insight or just someone to listen. Ilario
Piagnerelli provided a great deal of emotional support during my first few years of
graduate school and spent seemingly endless hours helping me create Italian sentences
that perfectly satisfied specific phonological conditions. I’m grateful to Aleksey
Shtivelman and Bret Kugelmass for always remaining connected despite the distance in
space and time, and for our many stimulating and inspiring discussions.
Tony Giuliano, Amanda Hamm, Elizabeth Myers and the rest of the Raven family
have created a space that has been so sacred and nourishing to me during my time in L.A.
and one that I will miss dearly. I’m also grateful to Josh Brill for the many thought-
provoking and meaningful discussions we’ve shared in just this short time and to Paige
Bethmann for her openness and compassion and the sweet, sensitive energy she exudes.
My parents, Bakhtavar and Ronald Hagedorn, have spent their lives thus far
selflessly dedicating themselves to fostering my academic, intellectual, and extra-
curricular growth by whatever means necessary. They continue to tenderly cultivate the
strong roots that have allowed me to reach far and wide. My sister, Diana Hagedorn, has
always lovingly provided a fresh and bright perspective when needed. Diana went as far
as to learn the IPA, just so we could transcribe and poke fun at our family members’
dialectal idiosyncrasies (then again, she was only eleven years old and didn’t have much
choice in the matter). I am also deeply appreciative of my grandparents, Maneck and
Navaz Daroowalla, for instilling in me an understanding of the importance of both
academic tenacity and moral integrity from a very young age.
Lastly, I thank Jean-Luc Chaubard for being the most loving, patient and
supportive partner I could have asked for. During my dissertation-writing period,
especially, Jean-Luc went above and beyond to take over for me in many areas, even in
the face of his own academic pressures and challenges, allowing me to focus on my work
and maintain sanity. In moments of discouragement and even panic, Jean-Luc has been
my rock, providing me with much-needed groundedness and a sense of calm, and I am
eternally grateful to have him by my side through this journey.
Table of Contents
Acknowledgements
Abstract
I Introduction
I.1 Outline of the dissertation
II General methods
II.1 Participants and materials
II.2 MRI acquisition method
III Characterizing vocal tract shaping, acoustics and the percept of accentedness in post-glossectomy speech
III.1 Introduction
III.1.1 Vocalic constriction location and resulting formant frequency values
III.1.2 Hypotheses
III.2 Real-time MRI study
III.2.1 Methods
III.2.1.1 Articulatory cross-distance analysis
III.2.1.2 Formant frequency analysis
III.2.1.3 Acoustic perception study
III.2.2 Results: Oral tongue cancer patient M1
III.2.2.1 Vocal tract aperture at constriction location gridlines
III.2.2.2 Acoustic vowel space and formant frequency analysis
III.2.2.3 Perception study
III.2.2.4 Discussion
III.2.3 Results: Oral and base of tongue cancer patient F1
III.2.3.1 Vocal tract aperture at constriction location gridlines
III.2.3.2 Acoustic vowel space and formant frequency analysis
III.2.3.3 Perception study
III.2.3.4 Discussion
III.2.4 Results: Base of tongue cancer patient M2
III.2.4.1 Vocal tract aperture at constriction location gridlines
III.2.4.2 Acoustic vowel space and formant frequency analysis
III.2.4.3 Perception study
III.2.4.4 Discussion
III.2.5 Results: Base of tongue cancer patient M3
III.2.5.1 Vocal tract aperture at constriction location gridlines
III.2.5.2 Acoustic vowel space and formant frequency analysis
III.2.5.3 Perception study
III.2.5.4 Discussion
III.2.6 Results: Base of tongue cancer patient M4
III.2.6.1 Vocal tract aperture at constriction location gridlines
III.2.6.2 Acoustic vowel space and formant frequency analysis
III.2.6.3 Perception study
III.2.6.4 Discussion
III.2.7 Results: Base of tongue cancer patient M5
III.2.7.1 Vocal tract aperture at constriction location gridlines
III.2.7.2 Acoustic vowel space and formant frequency analysis
III.2.7.3 Perception study
III.2.7.4 Discussion
III.3 General Discussion
IV Preservation of constriction degree by cross-gesture compensation
IV.1 Introduction and hypotheses
IV.2 Method
IV.3 Results
IV.3.1 Results: Patient M1
IV.3.1.1 Compensatory production of alveolar stops by formation of labiodental constriction
IV.3.1.2 Compensatory production of alveolar fricatives by formation of velar constriction
IV.3.2 Results: Patient F1
IV.3.2.1 Compensatory production of alveolar and velar gestures by formation of bilabial constriction
IV.4 Discussion
V Evidence for within-gesture compensation of constriction degree by jaw height modulation
V.1 Introduction
V.1.1 Synergistic jaw movement
V.1.2 Hypotheses
V.2 Methods
V.2.1 X-ray microbeam data collection
V.2.2 Real-time MRI data collection
V.2.3 Correlation and regression analyses
V.3 Results
V.3.1 Correlation between jaw position and constriction degree
V.3.2 Regression slopes of jaw position and constriction degree
V.4 Discussion
VI Evidence against durational reinforcement of contrast
VI.1 Introduction and hypotheses
VI.2 Method
VI.3 Results
VI.4 Discussion
VII Simulating post-glossectomy speech using the task dynamics (TaDA) model
VII.1 Introduction
VII.1.1 Proposal
VII.1.2 Speech motor control, the task dynamics model and the configurable articulatory synthesizer
VII.2 Method
VII.3 Results
VII.3.1 Acoustic consequences of tongue size modification under two task dynamic conditions
VII.3.2 Consequences of tongue size modification on vocal tract area function under two task dynamic conditions
VII.3.3 Evidence for within-gesture compensation by the task dynamic model
VII.4 Discussion
VII.4.1 Acoustic consequences of tongue size modification under two task dynamic conditions
VII.4.2 Consequences of tongue size modification on vocal tract area function under two task dynamic conditions
VII.4.3 Evidence for within-gesture compensation by the task dynamic model
VIII Quantifying lingual flexibility using principal component analysis
VIII.1 Introduction
VIII.1.1 Lingual flexibility and Principal Component Analysis (PCA)
VIII.1.2 Hypotheses
VIII.2 Methods
VIII.2.1 Real-time MRI data acquisition
VIII.2.2 Articulator segmentation of real-time MRI data
VIII.2.3 Vocal tract cross-distance measurement
VIII.2.4 Principal Component Analysis
VIII.3 Results
VIII.3.1 Results: Typical speakers MT1 and FT1
VIII.3.2 Results: Oral tongue and oral and base of tongue cancer patients M1 and F1
VIII.3.3 Results: Base of tongue cancer patients M2-M5
VIII.3.4 Results: Comparison of variance accounted for by principal components in typical speakers and in patients
VIII.4 Discussion
IX General discussion and conclusions
IX.1 Summary of findings
IX.2 The status of post-operative speech motor commands
IX.3 Insight into the nature of speech motor goals
References
Appendix A: Constriction location and constriction degrees produced by the non-blind, blind, and unaltered task dynamic models
Appendix B: Stimuli for which PCA data were analyzed
Abstract
This dissertation aims to investigate the articulatory behavior of post-operative
glossectomy patients. Patterns in patients’ vocal tract area functions, acoustic vowel
spaces and speech perceptibility scores are shown to closely reflect the particular loci of
their resections. Oral and base of tongue cancer patients exhibit articulatory
compensation in consonant and vowel production, though they do not enhance durational
cues in tense and lax vowels to aid in perceptibility. Principal component analyses on
vocal tract cross-distance measures indicate that patients exhibit less lingual flexibility
than do typical speakers. Results of a two-condition task dynamic simulation in which the
system takes into account the altered tongue size or assumes the pre-operative tongue size
are considered in light of the experimental findings based on patient data.
Chapter I
Introduction
Glossectomy is a surgical procedure typically undergone by tongue cancer patients. It
involves surgical resection of the lingual tissue and may affect all of the tongue (total
glossectomy), half of the tongue (hemiglossectomy) or part of the tongue (partial
glossectomy). In cases involving resection of a substantial portion of the tongue, lingual
reconstruction using a radial forearm flap may be carried out. Additionally, most
glossectomy patients undergo radiation therapy post-operatively. This series of treatments
oftentimes leads to the production of distorted speech, difficulty swallowing (dysphagia)
and limited jaw movement (trismus). This dissertation focuses on the area of distorted
speech, and investigates (i) how the glossectomy procedure impairs speech production
and (ii) what mechanisms glossectomy patients use to produce consonants and vowels
post-operatively.
By studying atypical speech and observing both the breakdown of a system and
the ways in which repair is attempted, much can be learned about the nature of the system
and the fundamental units of which it is comprised. The theoretical significance of this
work lies in shedding light on the goals of speech production and on the nature of the
fundamental units in a task of speech production. Specifically, by considering the
anatomical consequences of the glossectomy procedure to be a perturbation to the speech
system and identifying which aspects of speech articulation can and cannot be maintained
post-operatively, we gain insight as to what control parameters might be specified
(spatially, temporally, etc.) in a potentially hierarchical structure of speech motor goals.
In addition to the potential theoretical implications of investigating the structure
of speech motor goals, there are also clinical implications. The findings of these studies
can be used to inform clinicians of patients’ speech behavior post-operatively through
quantitative, fine-grained analysis of rich real-time MRI data. Ultimately, it is hoped that
this work will play a critical role in the formulation and fine-tuning of speech therapy
programs for tongue cancer patients by characterizing key aspects of post-glossectomy
speech. Specifically, I aim to identify the aspects of articulation that are or are not
maintained following tissue resection and identify systematically preferred manners of
articulatory compensation.
Many studies have investigated speech articulation following the partial-
glossectomy procedure using acoustics (Savariaux et al., 2001; Zhou et al., 2011;
McMicken et al., 2012), cineradiography (McMicken et al., 2012), videofluoroscopy
(Georgian et al., 1982; Morrish, 1984), electropalatography (EPG) (Fletcher, 1988; Imai
and Michi, 1992; Michi et al., 1989) and cine-MRI (Stone et al., 2013). The findings of
these and other studies suggest that (i) articulation is least affected in patients who have
undergone resection of the base of tongue and (ii) stop consonant articulation is most
distorted for postoperative glossectomy patients. Past work suggests, based on acoustic
measurements, that horizontal tongue movement is generally impacted more than vertical
tongue movement (Kaipa et al., 2012; Savariaux et al., 2001; McMicken et al., 2012).
Evidence has also been presented for gradual improvement of patients’ speech due to
accommodation or compensation. Results of the single-subject study by Kaipa et al.
(2012) suggest that in the first three months following the oral tongue glossectomy, F2
values for vowel /i/ shift closer to typical values, but never fully reach typical-like values.
It remains unclear, however, what articulatory mechanisms the patient employed to
gradually shift these values into a more typical range. While some studies offer
explanations as to the source of articulatory compensation, there is conflicting evidence
as to whether the jaw plays a crucial role in compensating for the inability to raise the
tongue body during the production of vowels. A videofluoroscopy study of total
glossectomy patients by Morrish (1984) suggests that jaw movement is exaggerated in
post-operative speech involving high and low vowels, with the degree of jaw height being
proportional to vowel height, while Hamlet et al. (1990) report that jaw movement
amplitude did not increase during speech post-operatively. Morrish (1984) also found that
the patients compensatorily manipulated the pharynx in order to increase the volume of
the posterior cavity during high vowels /i/ and /e/ and to decrease the volume of the
posterior cavity for low vowels /a/ and /o/. In the same vein, McMicken et al. (2012), in
their analysis of congenital aglossic speech, remind the reader that it is plausible that in
the production of vowels, patients may use this technique of altering the volume of
the pharyngeal resonator but also may employ lip spreading to shift their F1 and F2
values to a more typical range (also see Stevens, 1998). Lastly, it has been suggested,
using acoustic data (Georgian et al., 1982) and videofluoroscopy (Savariaux et al., 1999),
that glossectomy patients sometimes form consonant constrictions compensatorily, with
articulators other than those conventionally used by healthy subjects.
It is possible that the resulting acoustic differences in the glossectomy patients’
speech have been determined purely mechanically by the glossectomy procedure’s effect
on the vocal tract (i.e. the removal of lingual muscle mass and, in turn, impeded lingual
mobility). In this case, the articulatory movements that have been unaffected by the
procedure would remain intact, while those that have been impacted would not be
compensated for in any way. Alternatively, it is possible that the glossectomy patients
produce speech using compensatory strategies at various hierarchical levels post-
operatively. As part of Browman and Goldstein’s (1989) description of the vocal tract
hierarchy, they propose a constriction degree hierarchy that can be used to characterize
natural classes that emerge from the combination of articulatory gestures. Articulatory
gestures, hypothesized as the basic units of speech, are defined by specifications for
constriction location (CL) and constriction degree (CD) of local constrictions produced
by the lips, tongue tip, or tongue body. Multiple simultaneous gestures combine to
produce effective CD values at different hierarchical levels, at the highest level
characterizing the acoustic source characteristics of the combination. Multiple gestures
can also combine to modulate the constriction location within the tube. For example,
adding a lip protrusion gesture to a tongue gesture changes the tube CL. The tube CL
values together with the CD values define the acoustic resonances of the vocal tract tube at
the highest level.
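The combination of gestures at the tube level can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the analyses reported here: the `Gesture` class, the millimeter apertures, and the rule that the narrowest of several oral constrictions in series sets the effective tube-level constriction degree are simplifying assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gesture:
    """One articulatory gesture (hypothetical representation)."""
    articulator: str   # e.g. "lips", "tongue tip", "tongue body"
    location: str      # constriction location (CL) label
    degree_mm: float   # constriction degree (CD) as an aperture; 0.0 = closure


def tube_constriction_degree(gestures):
    """Effective tube-level CD for oral constrictions in series.

    Assumption for this sketch: the narrowest constriction along the
    tube determines the effective value.
    """
    if not gestures:
        raise ValueError("at least one gesture is required")
    return min(g.degree_mm for g in gestures)


# Hypothetical apertures, chosen only to illustrate the idea.
typical_stop = [Gesture("tongue tip", "alveolar", 0.0)]
compensated_stop = [
    Gesture("tongue tip", "alveolar", 4.0),  # tongue tip closure not achieved
    Gesture("lips", "labial", 0.0),          # added gesture supplies the closure
]

print(tube_constriction_degree(typical_stop))
print(tube_constriction_degree(compensated_stop))
```

Under these assumptions both productions reach complete closure at the tube level even though only one of them achieves closure with the tongue tip.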
It is hypothesized that the patients may use compensatory strategies at several levels,
outlined below. These hypotheses will be tested using the analyses that follow.
(i) Local compensation
In gestures responsible for producing tongue body constrictions for vowels, the
jaw and tongue form a functional synergy to achieve the task goals. For example,
when the jaw is held open using a bite-block, the tongue can raise with respect to
the jaw an “extra” amount, so that the constriction degree goal can be met (Gay et
al., 1981). When a patient can no longer elevate the tongue from the floor of the
mouth to form the constrictions necessary to produce consonants and vowels, we
might expect the jaw to compensate for lack of lingual mobility, and to exhibit an
increased range of motion post-operatively, in the attempt to achieve a target
constriction degree¹. If this is the case, we would expect the range of jaw
positions employed to be greater in patients than in typical speakers, and the range
of jaw positions to be greater in patients’ post-operative speech than in pre-
operative speech. Further, we would expect constriction degree measures to
exhibit more dependence on jaw position measures for patients than for typical
speakers.
(ii) Global compensation of constriction degree at the tube level
Results of studies based on acoustics and videofluoroscopy (Georgian et al., 1982;
Savariaux et al., 1999) suggest that patients may use articulators other than those
used by typical speakers to form constrictions in the oral cavity that satisfy tube-
level constriction degree (and source) goals (e.g., complete closure, turbulence,
neither). It is possible that, for example, a glossectomy patient who can no longer
produce constrictions of the tongue tip narrow enough to result in closure or
turbulent source generation will recruit an additional gesture (of the lips or tongue
body) to achieve the tube-level (and acoustic source) goal.
(iii) Durational reinforcement
Tense and lax vowels in English are typically distinguished by both constriction
degree and duration, with tense vowels being longer than lax. It is possible that
glossectomy patients who lose the ability to distinguish the constriction degree of
tense vs. lax vowels may compensate by enhancing the difference in duration, so
as to preserve contrast. At the acoustic level, patients may create larger durational
differences between tense and lax vowels than those found in typical speech.
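Two of the quantities named in these hypotheses can be made concrete with a short sketch: the dependence of constriction degree on jaw position (hypothesis i) and the durational difference between tense and lax vowels (hypothesis iii). The function names and all numbers below are hypothetical and serve only to illustrate the computations; they are not data from this dissertation.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def tense_lax_duration_ratio(tense_ms, lax_ms):
    """Mean tense-vowel duration divided by mean lax-vowel duration."""
    return mean(tense_ms) / mean(lax_ms)


# Hypothetical measurements: as the jaw rises, the aperture narrows.
jaw_height_mm = [1.0, 2.0, 3.0, 4.0]
constriction_degree_mm = [8.0, 6.0, 4.0, 2.0]
r = pearson_r(jaw_height_mm, constriction_degree_mm)

# Hypothetical vowel durations in milliseconds.
ratio = tense_lax_duration_ratio([200, 220], [140, 160])

print(f"r = {r:.2f}, tense/lax duration ratio = {ratio:.2f}")
```

A stronger correlation for patients than for typical speakers would be in line with hypothesis (i), and a larger tense/lax ratio for patients would be in line with hypothesis (iii).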
Real-time MRI offers ways of testing hypotheses regarding the post-operative
behaviors of glossectomy patients, the nature of which has only been speculated on based
on acoustics and data from less robust imaging modalities. The original contribution of
this work lies in the fact that real-time MRI is used to provide full, mid-sagittal views of
the vocal tract. These information-rich, dynamic data allow for a finer-grained analysis
of post-glossectomy speech, and particularly of the possible compensatory production of
consonants and vowels. Real-time MRI is an ideal tool with which to identify and further
characterize these and other aspects of post-glossectomy speech, as it is minimally
invasive to the patient and provides a global, unobstructed view of articulator behavior in
all parts of the vocal tract.
¹ Possibly constriction location, as well.
I.1 Outline of the dissertation
First, a general methods section is presented, including Participants and Materials and
MRI Acquisition Method. Following that, Chapters III-VIII are presented as separate
experiments designed to test the hypotheses presented above. Each includes an
introduction, specific methods, results, and discussion. Chapter III investigates the
acoustics and articulation of post-glossectomy vowel production and evaluates how the
atypical values observed reflect the locus of resection undergone by each patient.
Perception of accentedness in post-glossectomy speech is also quantified and discussed.
Chapter IV investigates whether patients recruit additional gestures to preserve constriction
degree at the tube level. Chapter V tests the hypothesis of within-gesture compensation of
constriction degree using the jaw. Chapter VI looks for evidence of reinforcement of
durational contrast between tense and lax vowels. In Chapter VII, post-glossectomy
speech is modeled using the Task Dynamic model (TaDA) and the CASY synthesizer. In
Chapter VIII, a measure of lingual flexibility using Principal Component Analysis is
proposed. Lastly, a general overview and discussion of the results are given in Chapter
IX.
Chapter II
General Methods
II.1 Participants and materials
Six advanced tongue cancer patients (one oral tongue (M1), one base of tongue and
partial oral tongue (F1) and four base of tongue (M2-M5)) were imaged after having
undergone partial glossectomy, neck dissection, free flap reconstruction (M1) and
radiation therapy. MRI data for all subjects were collected more than 6 months post
cancer treatment (after which point post-glossectomy speech intelligibility scores have
been reported to reach a plateau (Imai et al., 1988)). None of the subjects received speech
therapy between the time of finishing cancer treatment and the MRI scan. MRI data from
five typical speakers (two male (MT1-MT2) and three female (FT1-FT3)) were used as
controls.
The patients were imaged and had their speech recorded and subsequently
denoised using a custom MRI protocol (Narayanan et al. 2004; Bresch et al. 2008), while
producing read speech as they lay supine in the scanner. The subjects were prompted to
read a series of short phrases and single words, presented visually by the experimenter
via a projector, 2-4 times in random order. The stimuli included a subset of the phrases
contained in “The Rainbow Passage” and the MOCHA-TIMIT corpus (Wrench and
William, 2000), as well as monosyllabic, labial stop-initial words containing the vowels
of American English, /i, ɪ, ɛ, e, æ, ɔ, o, ɑ, ʌ, u, ʊ, ɝ/, as syllable nuclei.
The five typical speakers were scanned using the same MRI protocol. Four of the
five typical speakers produced a subset of the phrases contained in “The Rainbow
Passage” as well as /vVv/ syllables containing the vowels /i, ɑ, u/ as syllable nuclei.
Typical speaker FT3 produced five repetitions of the same short phrases and single words
produced by the glossectomy patients.
II.2 MRI acquisition method
Image data for all speakers were acquired on a 1.5T GE Signa scanner, using a 13-interleaf
spiral gradient echo pulse sequence (TR = 6.376 msec, FOV = 200 × 200 mm,
flip angle = 15° (20° for M3)) and a custom 4-channel head and neck receiver coil. The
scan plane (3 mm slice thickness) was located midsagittally; pixel density in the sagittal
plane was 84 × 84 yielding a resolution of 2.38 × 2.38 mm. Image data were acquired at a
rate of 18.52 frames per second, and reconstructed at 23.18 frames per second using a
sliding window technique. Audio was recorded inside the scanner at 20 kHz
simultaneously with the MRI acquisition, and subsequently denoised.
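The in-plane resolution follows directly from the field of view and the pixel matrix, and each reported frame rate corresponds to a frame period. The short sketch below reproduces this arithmetic from the values stated above; the function names are introduced here only for illustration.

```python
def in_plane_resolution_mm(fov_mm, matrix_size):
    """Pixel edge length: field of view divided by pixels per side."""
    if matrix_size <= 0:
        raise ValueError("matrix_size must be positive")
    return fov_mm / matrix_size


def frame_period_ms(frames_per_second):
    """Time between successive frames, in milliseconds."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return 1000.0 / frames_per_second


pixel_mm = in_plane_resolution_mm(200.0, 84)   # FOV 200 mm, 84 x 84 pixels
acquired_ms = frame_period_ms(18.52)           # acquisition rate
reconstructed_ms = frame_period_ms(23.18)      # sliding-window reconstruction rate

print(f"pixel size: {pixel_mm:.2f} mm")
print(f"frame period: {acquired_ms:.2f} ms acquired, "
      f"{reconstructed_ms:.2f} ms reconstructed")
```

A 200 mm field of view sampled at 84 pixels per side gives the 2.38 mm pixel size reported above.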
Chapter III
Characterizing vocal tract shaping, acoustics and the
percept of accentedness in post-glossectomy speech
III.1 Introduction
III.1.1 Vocalic constriction location and resulting formant frequency values
Formant frequency values, used to distinguish the acoustic qualities of the vowels of a
language, result from manipulation of articulators that form constrictions in certain
regions within the vocal tract. It has long been observed that for typical speakers, the vowel
/i/ is produced with constriction in the palatal region, that the vowel /ɑ/ is produced with
constriction in the pharyngeal region, and that /u/ is produced with constriction in the
velar region (Wood, 1979). When constrictions are formed in the front of the vocal tract,
high in the palatal region, such as for vowel /i/, the first formant frequency (F1) value
(indexing tongue height, or constriction degree) tends to be low, while the second
formant frequency (F2) value (indexing tongue backness, or constriction location) tends
to be high. When the tongue is low in the mouth and a constriction is made in the
pharyngeal region, such as for vowel /ɑ/, F1 values will be high and F2 values will be in
the low-mid range. When a constriction is made in the back of the vocal tract, high in the
velar or uvular regions, both F1 and F2 values tend to be low, especially when combined
with lip rounding.
III.1.2 Hypotheses
In this section, the acoustics of ‘point’ vowels /i, ɑ, u/ and cross-distance measures at
various points along the vocal tract during the production of these vowels are compared
in oral tongue cancer patients (F1, M1), base of tongue cancer patients (M2-M5) and in
typical speakers (FT1-FT3, MT1-MT3). Work by Iskarous (2010) suggests that the
constriction location specification is distinct from other articulatory and acoustic
parameters (such as constriction degree, motion of the tongue dorsum and F1 and F2
values) in that it is discrete (as opposed to continuous) and that listeners rely upon this
discreteness to help segment the speech signal into distinct units; listeners are not able to
discriminate pairs of synthesized vowel sequences that do not differ in constriction
location discreteness. Accordingly, it is hypothesized that tongue cancer patients will
preserve the typical target constriction location as best they can, despite their lingual
mass and mobility having been compromised. Though we predict that constriction
location will generally be preserved, it is possible that constriction degree may not be,
since the patients have lost a substantial amount of lingual tissue. Tokens in which the
vowel nucleus is not produced with target constriction location or degree, or is produced
with atypical acoustics are expected to be perceived as atypical by naïve listeners. The
observed constriction locations and resulting acoustics for the patients are hypothesized
to differ drastically (quantification method below) from those typically observed only for
cases in which the glossectomy procedure has rendered constriction at the typical location
physiologically impossible for the patient. Particularly, it is expected that the oral tongue
glossectomy patient (M1) will exhibit most difficulty in the production of vowels
produced in the palatal and velar regions, given the locus of his resection and
reconstruction. The oral and base of tongue glossectomy patient (F1) is expected to
exhibit difficulty producing vowels at all constriction locations, given that all parts of the
tongue have been impacted by the surgery. Base of tongue glossectomy patients (M2-
M5) are predicted to exhibit the most difficulty producing back vowels, since the
styloglossus (responsible for up-and-back pulling action) and the hyoglossus (responsible
for down-and-back pulling action) muscles have most likely been impacted.
For oral tongue patients, if the constriction location for front vowels (in the alveo-
palatal region) is generally preserved in post-operative speech, we will test whether
constriction degree in the alveo-palatal region is also generally maintained for front
vowels, where typical constriction location does not vary, as it does for back vowels.
III.2 Real-time MRI study
III.2.1 Methods
III.2.1.1 Articulatory cross-distance analysis
In order to characterize vocal tract shapes at a single point in time during articulation,
aperture (constriction degree) measures (from the tongue or floor of the mouth to the
palate or pharyngeal wall) at four distinct points along the vocal tract (alveolar, palatal,
velar and pharyngeal) (Figure III.1) were recorded during tokens /viv/, /vɑv/, and /vuv/
for typical speakers MT1-MT3, FT1 and FT2 and /bit/, /pɑt/ and /but/ for patients M1-M5¹
and typical speaker FT3. The four gridlines were manually selected for each subject.
The alveolar gridline was placed at the center of the hard palate (Figure III.1). The palatal
gridline was placed just posterior to the alveolar gridline and at the constriction location
for patient M1’s high front vowels and coronal constrictions. The velar gridline was
placed just anterior to where the velum hinges to lower and raise, and the pharyngeal
gridline intersected the epiglottis. Both alveolar and palatal gridline data were collected
because it was clear by observing the rtMRI images that patient M1 was not able to
produce constrictions of any type in the alveolar region and that the production of his
target alveolar constrictions was slightly retracted. Typical speakers MT1-MT3, FT1 and
FT2 produced each utterance once. FT3 produced each utterance four times and the
patients produced each utterance two to three times. The measures were taken on the
frame at which the vowel gesture was most highly constricted and in which no labial
(onset) or coronal (coda) gesture was evident. Values for each token were
averaged and then compared.
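The averaging and comparison step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical and the aperture values are invented, not data from the study.

```python
from statistics import mean

GRIDLINES = ("alveolar", "palatal", "velar", "pharyngeal")

def mean_apertures(repetitions):
    """Average the aperture (mm) at each gridline across repetitions of one utterance."""
    return {g: mean(rep[g] for rep in repetitions) for g in GRIDLINES}

def most_constricted(apertures):
    """Return the gridline with the smallest aperture, i.e. the constriction location."""
    return min(apertures, key=apertures.get)

# Hypothetical aperture values (mm) for two repetitions of one vowel token;
# these numbers are made up for illustration only.
repetitions = [
    {"alveolar": 9.0, "palatal": 3.0, "velar": 8.0, "pharyngeal": 20.0},
    {"alveolar": 11.0, "palatal": 5.0, "velar": 10.0, "pharyngeal": 22.0},
]

averaged = mean_apertures(repetitions)
print(averaged)
print(most_constricted(averaged))
```

With these invented values the palatal gridline has the smallest mean aperture, which is the pattern the text describes as typical for /i/.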
Figure III.1: Four points along the vocal tract (alveolar, palatal, velar and pharyngeal
gridlines) at which constriction degree is measured
¹ Patient M4 did not produce /but/ as part of the stimuli set.
III.2.1.2 Formant frequency analysis
For monosyllabic tokens /bVt/, formant frequency values at the acoustic midpoint of the
vowel were extracted using Praat (Boersma and Weenink, 2014). F1 and F2 values were
averaged across repetitions of the same token for patients M1-M5 and F1. As a method of
quantitatively categorizing formant values as ‘typical’ or ‘atypical’, the original
Hillenbrand et al. (1995) dataset, consisting of acoustic data from 50 male speakers and
50 female speakers, was used to determine ‘typical ranges’ of first and second formant
values for males and females separately. The smallest value in each range is the lowest
formant value found across speakers of a given gender for a particular vowel and formant
(first or second). Similarly, the highest value of each range was determined. Patient
formant values that fall outside of the typical range are considered ‘atypical’.
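The range-based categorization described above can be sketched as below. The function names are hypothetical, the per-speaker values are invented for illustration, and treating the range boundaries as inclusive is an assumption; the text does not say how a value exactly on a boundary is handled.

```python
def typical_range(values_hz):
    """Lowest and highest value observed across typical speakers for one vowel and formant."""
    return min(values_hz), max(values_hz)

def is_atypical(value_hz, value_range):
    """A value is 'atypical' if it falls outside the typical range (boundaries assumed inclusive)."""
    low, high = value_range
    return value_hz < low or value_hz > high

# Hypothetical F1 values (Hz) for one vowel across five typical speakers;
# invented for illustration, not taken from the Hillenbrand et al. (1995) data.
hypothetical_f1 = [342, 305, 398, 420, 361]

f1_range = typical_range(hypothetical_f1)
print(f1_range)                    # (305, 420)
print(is_atypical(468, f1_range))  # True: above the range
print(is_atypical(335, f1_range))  # False: within the range
```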
III.2.1.3 Acoustic perception study
Thirty-three native speakers of English participated in an accentedness scoring task.
Participation of these subjects was elicited by crowdsourcing through a social media
website. The task involved listening to tokens of ‘beat’, ‘pot’ and ‘boot’ produced by
typical speaker FT1 and patients F1 and M1-M5. The target words corresponding to the
audio clips presented were also presented on the screen. Participants were presented with
three tokens (‘beat’, ‘pot’, ‘boot’) produced by each of the six patients² and typical
speaker FT1, producing a total of 20 tokens, each repeated twice (40 stimuli tokens total).
Participants were asked to assign “accentedness” scores to each token instead of being
asked to judge “typicality” of the speech in order to avoid the presence of rtMRI scanner
background noise impacting the perception scores. Further, what a given participant
deems “atypical” was thought to vary more subjectively than what is considered
“accented”. Participants assigned accentedness scores between 1 (not accented at all) and
5 (heavily accented) to each token. Participants listened to and scored each token twice.
Within speakers, the scores of each token were averaged.
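The stimulus count and the per-token averaging can be sketched as follows. The stimulus inventory comes from the description above; the listener IDs, the scores and the function name are hypothetical and serve only to illustrate the averaging.

```python
from collections import defaultdict
from statistics import mean

# Stimulus inventory described in the text: six patients plus typical speaker
# FT1, three words each, except that patient M4 did not produce 'boot'.
speakers = ["F1", "M1", "M2", "M3", "M4", "M5", "FT1"]
words = ["beat", "pot", "boot"]
stimuli = [(s, w) for s in speakers for w in words if (s, w) != ("M4", "boot")]
print(len(stimuli), len(stimuli) * 2)  # 20 tokens, 40 presentations

# Hypothetical ratings (listener, speaker, word, score on the 1-5 scale);
# the listener IDs and scores are invented for illustration.
ratings = [
    ("L01", "M1", "beat", 4), ("L01", "M1", "beat", 5),
    ("L02", "M1", "beat", 3), ("L02", "M1", "beat", 4),
    ("L01", "M1", "pot", 2), ("L01", "M1", "pot", 1),
    ("L02", "M1", "pot", 2), ("L02", "M1", "pot", 3),
]

def average_scores(ratings):
    """Average all scores given to each (speaker, word) token."""
    grouped = defaultdict(list)
    for _listener, speaker, word, score in ratings:
        grouped[(speaker, word)].append(score)
    return {token: mean(scores) for token, scores in grouped.items()}

averages = average_scores(ratings)
print(averages)
```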
² Except for patient M4, for whom only tokens ‘beat’ and ‘pot’ were produced.
III.2.2 Results: Oral tongue cancer patient M1
III.2.2.1 Vocal tract aperture at constriction location gridlines
Consistent with traditional descriptions of vowel constriction locations, the vocal tracts
of the three typical male speakers and patient M1 are highly constricted in the palatal
region during the production of /i/ (Figure III.2).³
Figure III.2: Vocal tract aperture at constriction location gridlines during /i/ for patient
M1 and typical male speakers
During vowel /ɑ/ (Figure III.3), the gridline of minimum aperture for patient M1
and the typical speakers is in the pharyngeal region, consistent with traditional
articulatory descriptions of /ɑ/. Further, we observe that the aperture value in the alveolar
region is substantially higher for patient M1 than for any of the typical male speakers.
³ Though it should be noted that for speaker MT3, the velar region is most highly
constricted.
[Figure III.2 chart. Title: Patient M1's vocal tract aperture during /i/. Y-axis: Aperture (mm). X-axis: Alveolar, Palatal, Velar, Pharyngeal. Series: M1, MT1, MT2, MT3.]
Figure III.3: Vocal tract aperture at constriction location gridlines during /ɑ/ for patient
M1 and typical male speakers
During the articulation of /u/ (Figure III.4), the most constricted gridline for
patient M1 and typical speakers MT2 and MT3 is in the velar region. The aperture values
at the palatal and velar gridlines of typical speaker MT1 differ only by 0.5 mm and are the
most constricted. The velar region is expected to be the primary constriction location in
the production of /u/, and we find that, generally, this is the case for all four speakers.
Additionally, we observe that aperture values in the alveolar and pharyngeal regions are
quite high for patient M1.
[Figure III.3 chart. Title: Patient M1's vocal tract aperture during /ɑ/. Y-axis: Aperture (mm). X-axis: Alveolar, Palatal, Velar, Pharyngeal. Series: M1, MT1, MT2, MT3.]
Figure III.4: Vocal tract aperture at constriction location gridlines during /u/ for patient
M1 and typical male speakers
As expected given the nature of patient M1’s glossectomy, aperture at the palatal
gridline is generally smaller than aperture at the alveolar gridline. At both the alveolar and
palatal gridlines, however, aperture increases as expected, with vowel /i/ being most
constricted and vowel /æ/ being least constricted (Figure III.5). Though we must exercise
caution in comparing aperture values across speakers, with respect to typical speaker
FT3’s values (Figure III.12, to follow), patient M1’s aperture values in the alveolar
region are much larger and aperture values in the palatal region are far smaller.
[Figure III.4 chart. Title: Patient M1's vocal tract aperture during /u/. Y-axis: Aperture (mm). X-axis: Alveolar, Palatal, Velar, Pharyngeal. Series: M1, MT1, MT2, MT3.]
Figure III.5: Patient M1’s aperture at alveolar and palatal constriction location gridlines
(front vowels)
III.2.2.2 Acoustic vowel space and formant frequency analysis
The vowel space of patient M1 is visibly reduced when compared to that of mean values
for typical male speakers (as reported in Hillenbrand et al., 1995) (Figure III.6). Table
III.1 indicates that F1 values are more often ‘atypical’ than F2 values and that almost all
atypical F2 values occur for front vowels, for which the F2 values are lower than typical;
subject M1’s vowel space is compressed rightward, rendering all vowels ‘back’ vowels. F1
values for nearly all vowels are higher than typical, suggesting that they are produced
with reduced constriction degree. Considering specifically the ‘point’ vowels, /i/ is
substantially distorted, being produced with F1 and F2 values outside of the typical
range, /ɑ/ is produced with typical formant values and /u/ is produced with a typical F2
value but with an F1 value just greater than typically produced.
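The tallies summarized above can be reproduced from the values in Table III.1. The sketch below is illustrative only: the function name is hypothetical and the range boundaries are assumed to be inclusive.

```python
from collections import Counter

# (vowel, observed F1, observed F2, typical F1 range, typical F2 range), in Hz,
# copied from Table III.1 (patient M1, typical male ranges).
rows = [
    ("i",  468, 1514, (305, 420), (2049, 2600)),
    ("ɪ",  592, 1539, (363, 501), (1810, 2494)),
    ("ɛ",  739, 1496, (517, 723), (1580, 2208)),
    ("æ",  776, 1491, (511, 685), (1592, 2436)),
    ("eɪ", 567, 1592, (374, 555), (1804, 2493)),
    ("ɝ",  585, 1267, (429, 557), (1173, 1517)),
    ("oʊ", 594,  981, (376, 601), (659, 1126)),
    ("ɔ",  804, 1175, (587, 740), (856, 1167)),
    ("ʊ",  615, 1181, (409, 522), (965, 1375)),
    ("u",  472, 1064, (313, 455), (793, 1299)),
    ("ə",  756, 1236, (560, 682), (998, 1410)),
    ("ɑ",  804, 1187, (662, 963), (1060, 1524)),
]

def deviation(value, value_range):
    """Label a value as 'low', 'high' or 'typical' relative to an inclusive range."""
    low, high = value_range
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "typical"

f1_tally = Counter(deviation(f1, r1) for _v, f1, _f2, r1, _r2 in rows)
f2_tally = Counter(deviation(f2, r2) for _v, _f1, f2, _r1, r2 in rows)
print(f1_tally)  # 10 high, 2 typical
print(f2_tally)  # 6 typical, 5 low, 1 high
```

Under this reading, ten of the twelve F1 values are above the typical range, while six F2 values are atypical, five of them (all front vowels) below the range.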
[Figure III.5 chart. Title: M1 Aperture at Constriction Location Gridlines. Y-axis: Aperture (mm). X-axis: beat, bit, bet, bat. Series: Alveolar (GL 24), Palatal (GL 21).]
Figure III.6: Acoustic vowel space of typical male speakers (gold); (Reduced) acoustic
vowel space of patient M1 (red).
Patient M1 Formant Values
Vowel   Observed F1 (Hz)   Observed F2 (Hz)   Typical F1 range (Hz)   Typical F2 range (Hz)
/i/     468                1514               305-420                 2049-2600
/ɪ/     592                1539               363-501                 1810-2494
/ɛ/     739                1496               517-723                 1580-2208
/æ/     776                1491               511-685                 1592-2436
/eɪ/    567                1592               374-555                 1804-2493
/ɝ/     585                1267               429-557                 1173-1517
/oʊ/    594                981                376-601                 659-1126
/ɔ/     804                1175               587-740                 856-1167
/ʊ/     615                1181               409-522                 965-1375
/u/     472                1064               313-455                 793-1299
/ə/     756                1236               560-682                 998-1410
/ɑ/     804                1187               662-963                 1060-1524
Table III.1: Patient M1’s formant frequency values as compared to those of typical male
speakers (atypical values highlighted in red)
III.2.2.3 Perception study
For patient M1, accentedness scores were highest for ‘beat’, followed by ‘boot’, followed
by ‘pot’, which received an average accentedness score of 2, the average score associated
with typical speech (Figures III.7-8).
Figure III.7: Accentedness scores assigned to tokens produced by patient M1
[Figure III.7 chart. Title: M1 Accentedness Scores. X-axis: boot, pot, beat.]
Figure III.8: Accentedness scores assigned to tokens produced by typical speaker FT1
[Figure III.8 chart. Title: FT1 Accentedness Scores. X-axis: boot, pot, beat.]
III.2.2.4 Discussion
For patient M1, the gridline measures indicate that constriction location goals are
met for vowels /ɑ/ and /u/, but not for /i/ (even though the palatal gridline is the most
constricted location), and that aperture values in the palatal region vary as expected for
front vowels. Interestingly, the gridline patterns for vowel /i/ do not differ dramatically
between patient M1 and the typical male speakers. This is likely due to the fact that M1
possesses a reconstructed forearm flap that occupies the most anterior part of the oral
cavity, though he is not able to finely control the movements of this reconstructed flap as
he would the original lingual tissue. Despite the constriction degree at the alveolar
gridline being large, across vowels, the formants are substantially distorted only for
vowel /i/. M1’s acoustic data suggest that his vowels are generally compressed
rightward and produced low in the mouth, and, consistent with M1’s atypical gridline
patterns in the alveolar region, formant values for /i/ fall outside of the typical ranges.
M1’s accentedness scores, indicating percept of heavy accentedness, for tokens ‘beat’
and ‘boot’ (though not for ‘pot’) are consistent with the atypical formant values observed
for these tokens. While M1’s gridline patterns for ‘beat’ are similar to those of typical
speakers, the fact that gridline aperture values are high in regions distal to the constriction
location (velar) for ‘boot’ explains the atypical formants observed.
III.2.3 Results: Oral and base of tongue cancer patient F1
III.2.3.1 Vocal tract aperture at constriction location gridlines
During the production of /i/, the palatal region is highly constricted for all speakers, as
expected. The aperture values for patient F1, however, are substantially higher than for
the typical speakers in this region (Figure III.9).
Figure III.9: Vocal tract aperture during /i/ at vocal tract gridlines for patient F1 and
typical female speakers
During the production of /ɑ/, the pharyngeal region is most constricted for typical
speakers FT1-FT3, as expected (Figure III.10). However, the pharyngeal region is not the
most constricted for patient F1. Instead, the velar region is. Aperture values in both the
alveolar and pharyngeal regions are substantially higher than for the typical speakers.
[Figure III.9 chart. Title: Patient F1's vocal tract aperture during /i/. Y-axis: Aperture (mm). X-axis: Alveolar, Palatal, Velar, Pharyngeal. Series: F1, FT1, FT2, FT3.]
Figure III.10: Vocal tract aperture during /ɑ/ at vocal tract gridlines for patient F1 and
typical female speakers
During the production of /u/, the velar region is most constricted, as expected, for
typical speakers FT1 and FT3 and for patient F1 (Figure III.11). Speaker FT2’s aperture
values in the velar, alveolar, and palatal regions differ very little (range: 1.1 mm).
Generally, the constriction location for the vowel /u/, across typical speakers and the
patient is in the velar region.
[Figure III.10 chart. Title: Patient F1's vocal tract aperture during /ɑ/. Y-axis: Aperture (mm). X-axis: Alveolar, Palatal, Velar, Pharyngeal. Series: F1, FT1, FT2, FT3.]
Figure III.11: Vocal tract aperture during /u/ at vocal tract gridlines for patient F1 and
typical female speakers
Typical speaker FT1’s aperture values for front vowels increase as expected,
from /i/ to /æ/ (Figure III.12).
Figure III.12: Typical speaker FT1’s aperture at alveolar and palatal constriction location
gridlines (front vowels)
[Figure III.11 chart. Title: Patient F1's vocal tract aperture during /u/. Y-axis: Aperture (mm). X-axis: Alveolar, Palatal, Velar, Pharyngeal. Series: F1, FT1, FT2, FT3.]
[Figure III.12 chart. Title: FT1 Aperture at Constriction Location Gridlines. Y-axis: Aperture (mm). X-axis: beat, bit, bet, bat. Series: Alveolar (GL 17), Palatal (GL 14).]
Patient F1’s aperture values for palatal vowels increase as expected from /ɪ/ to /æ/;
however, aperture values for vowels /i/ and /ɪ/ are nearly identical at the palatal gridline
and, strikingly, are smaller at the alveolar gridline for /ɪ/ than for /i/ (Figure III.13). When
compared to typical speaker FT1’s data, we find that all aperture values at the alveolar
gridline are larger for patient F1 than for typical speaker FT1.
Figure III.13: Patient F1’s aperture at alveolar and palatal constriction location gridlines
(front vowels)
III.2.3.2 Acoustic vowel space and formant frequency analysis
Like the vowel space of patient M1, the vowel space of patient F1 is visibly reduced
when compared to that of typical female speakers (Figure III.14). Table III.2 indicates
that for patient F1, F1 values are better preserved than F2 values. F1 values, where not
preserved, are higher than for typical speakers, while F2 values are lower than values in
the typical range for front vowels and higher than values in the typical range for back
vowels; F2 values for this subject are compressed centrally. As for the ‘point’ vowels,
/i/ is substantially distorted, being produced with F1 and F2 values outside the typical
range, /ɑ/ is produced with a typical F1 value but atypical F2 value, and /u/ is produced
typically with respect to formant frequencies.
[Figure III.13 chart. Title: F1 Aperture at Constriction Location Gridlines. Y-axis: Aperture (mm). X-axis: beat, bit, bet, bat. Series: Alveolar (GL 17), Palatal (GL 14).]
Figure III.14: Acoustic vowel space of typical female speakers (gold); (Reduced) acoustic
vowel space of patient F1 (red).
Patient F1 Formant Values
Vowel   Observed F1 (Hz)   Observed F2 (Hz)   Typical F1 range (Hz)   Typical F2 range (Hz)
/i/     594                2128               331-531                 2359-3049
/ɪ/     683                2114               431-556                 2129-2654
/ɛ/     772                2013               584-981                 1762-2426
/æ/     877                1918               552-893                 1944-2701
/eɪ/    698                1938               433-672                 2218-2889
/ɝ/     625                1651               455-656                 1365-1983
/oʊ/    703                1747               430-720                 803-1314
/ɔ/     800                1806               656-995                 963-1527
/ʊ/     681                1809               444-617                 987-1619
/u/     473                1709               360-525                 778-1711
/ə/     768                1852               642-885                 1100-1634
/ɑ/     824                1824               708-1163                1233-1751
Table III.2: Patient F1’s formant frequency values as compared to those of typical female
speakers (atypical values highlighted in red)
III.2.3.3 Perception study
For patient F1, accentedness scores were highest for ‘beat’, followed by ‘pot’, followed
by ‘boot’, which received an average accentedness score of less than 2, the score
associated with typical speech⁴ (Figure III.15).
Figure III.15: Accentedness scores assigned to tokens produced by patient F1
III.2.3.4 Discussion
In sum, for patient F1, the constriction measures suggest that the goals of achieving
constriction location in vowel gestures are met for vowels /i/ and /u/, but not for /ɑ/.
Constriction degree varies as expected in the palatal region for front vowels, except for /i/
and /ɪ/. Not surprisingly, the formant values for /ɑ/ fall outside of the typical range. The
formant values for /i/ are distorted, despite the target constriction location being achieved,
while the formant values for /u/ are within the typical range. Accordingly, accentedness
scores are highest for /i/ and /ɑ/, but are low for /u/.
⁴ The accentedness score associated with FT1’s production of ‘pot’ is relatively high,
most likely due to dialectal fronting of /ɑ/.
[Figure III.15 chart. Title: F1 Accentedness Scores. X-axis: boot, pot, beat.]
III.2.4 Results: Base of tongue cancer patient M2
III.2.4.1 Vocal tract aperture at constriction location gridlines
During the production of /i/, base of tongue patient M2’s vocal tract aperture values
resemble those of typical speakers (Figure III.16). The palatal region, corresponding to
the place of articulation for /i/, is highly constricted for all speakers. Constriction degrees
at all points along the vocal tract fall within the typical range.
Figure III.16: Vocal tract aperture during /i/ at vocal tract gridlines for patient M2 and
typical male speakers
As expected, during the production of /ɑ/, patient M2’s vocal tract is most highly
constricted in the pharyngeal region, patterning consistently with the typical speakers
(Figure III.17). Constriction degrees at the constriction location (pharyngeal region) and
in the alveolar region, however, are substantially wider for patient M2 than for typical
speakers.
[Figure III.16 chart. Title: Patient M2's vocal tract aperture during /i/. Y-axis: Aperture (mm). X-axis: Alveolar, Palatal, Velar, Pharyngeal. Series: M2, MT1, MT2, MT3.]
Figure III.17: Vocal tract aperture during /ɑ/ at vocal tract gridlines for patient M2 and
typical male speakers
During the production of /u/, typical speakers’ vocal tracts are most constricted in
the velar region (except for MT1, for whom the palatal and velar regions are most
constricted) (Figure III.18). Patient M2’s vocal tract, however, is most constricted in the
alveolar region. Additionally, constriction degree in the pharyngeal region (distal to the
constriction location) is particularly wide for patient M2.
[Figure III.17 chart. Title: Patient M2's vocal tract aperture during /ɑ/. Y-axis: Aperture (mm). X-axis: Alveolar, Palatal, Velar, Pharyngeal. Series: M2, MT1, MT2, MT3.]
Figure III.18: Vocal tract aperture during /u/ at vocal tract gridlines for patient M2 and
typical male speakers
III.2.4.2 Acoustic vowel space and formant frequency analysis
Figure III.19 and Table III.3 indicate that patient M2 continues to produce front vowels
nearly typically, post-operatively. M2’s production of back vowels, however, seems to
have been impacted by the procedure. For back vowels, F1 values are generally preserved
while F2 values have been severely affected. F2 values are higher than those of typical
speakers; patient M2’s back vowels are pushed forward in the acoustic space.
[Figure III.18 chart. Title: Patient M2's vocal tract aperture during /u/. Y-axis: Aperture (mm). X-axis: Alveolar, Palatal, Velar, Pharyngeal. Series: M2, MT1, MT2, MT3.]
Figure III.19: Acoustic vowel space of typical male speakers (gold); (Reduced) acoustic
vowel space of patient M2 (red).
Patient M2 Formant Values
Vowel   Observed F1 (Hz)   Observed F2 (Hz)   Typical F1 range (Hz)   Typical F2 range (Hz)
/i/     335                2430               305-420                 2049-2600
/ɪ/     517                2118               363-501                 1810-2494
/ɛ/     590                1979               517-723                 1580-2208
/æ/     683                1991               511-685                 1592-2436
/eɪ/    454                2302               374-555                 1804-2493
/ɝ/     514                1409               429-557                 1173-1517
/oʊ/    554                1169               376-601                 659-1126
/ɔ/     647                1583               587-740                 856-1167
/ʊ/     595                1663               409-522                 965-1375
/u/     388                1372               313-455                 793-1299
/ə/     609                1770               560-682                 998-1410
/ɑ/     750                1608               662-963                 1060-1524
Table III.3: Patient M2’s formant frequency values as compared to those of typical male
speakers (atypical values highlighted in red)
III.2.4.3 Perception study
For patient M2, accentedness scores were highest for ‘pot’, followed by ‘boot’, followed
by ‘beat’, which received an average accentedness score of less than 2, the score
associated with typical speech (Figure III.20).
Figure III.20: Accentedness scores assigned to tokens produced by patient M2
[Figure III.20 chart. Title: M2 Accentedness Scores. X-axis: boot, pot, beat.]
III.2.4.4 Discussion
Patient M2’s near-typical front vowel formant values are consistent with the typical
gridline aperture results for vowel /i/. Accordingly, accentedness scores for /i/ are low.
The centralization of back vowels along the F2 dimension is also consistent with M2’s
gridline analysis results; for /ɑ/, the constriction degrees at the target constriction location
and in the alveolar region are wider than for typical speakers. These findings are
consistent with the high accentedness scores assigned to /ɑ/. During /u/, M2 does not
achieve maximum constriction in the target velar region, in addition to exhibiting a wider
constriction in the pharyngeal region than typically observed. These findings are
consistent with M2’s acoustic patterns (producing an atypically high F2) for vowel /u/
and relatively high accentedness scores. It is likely that the styloglossus was impacted during
the glossectomy procedure in such a way that its up-and-back pulling action has been
compromised, causing centralization of the formants for /u/, /ʊ/ and /ɑ/. It is also possible
that the down-and-back pulling action of the hyoglossus has been compromised, causing
a higher F2 value for /ɑ/ than typically produced.
III.2.5 Results: Base of tongue cancer patient M3
III.2.5.1 Vocal tract aperture at constriction location gridlines
During the production of /i/, speaker M3 patterns consistently with the typical speakers
(except for MT1, noted above) in that the palatal region is most constricted (Figure
III.21). Constriction degrees at all points along the vocal tract fall within the typical
range.
Figure III.21: Vocal tract aperture during /i/ at vocal tract gridlines for patient M3 and
typical male speakers
In the production of /ɑ/, patient M3’s pharyngeal region is most constricted, as
expected (Figure III.22). In regions distal to the constriction location (i.e. pharyngeal),
however, aperture values are substantially higher for patient M3 than for typical speakers.
[Figure III.21 chart. Title: Patient M3's vocal tract aperture during /i/. Y-axis: Aperture (mm). X-axis: Alveolar, Palatal, Velar, Pharyngeal. Series: M3, MT1, MT2, MT3.]
Figure III.22: Vocal tract aperture during /ɑ/ at vocal tract gridlines for patient M3 and
typical male speakers
During the production of /u/, patient M3 achieves typical constriction location and
constriction degree in the velar region (Figure III.23). Constriction degree in the alveolar
region is wider than for typical speakers, though does not differ by more than the range of
aperture values in this region among typical speakers themselves.
[Figure III.22 chart. Title: Patient M3's vocal tract aperture during /ɑ/. Y-axis: Aperture (mm). X-axis: Alveolar, Palatal, Velar, Pharyngeal. Series: M3, MT1, MT2, MT3.]
Figure III.23: Vocal tract aperture during /u/ at vocal tract gridlines for patient M3 and
typical male speakers
III.2.5.2 Acoustic vowel space and formant frequency analysis
Patient M3 continues to maintain near-typical production of high front vowel /i/; however,
the other vowels (back and front) of M3 are centralized with respect to backness, and
lowered (produced with high F1 values) (Figure III.24, Table III.4). Both F1 values and
F2 values are severely affected, though F2 values of front vowels have been only
somewhat preserved. Generally, F2 values for front vowels are lower than typical and F2
values for back vowels are higher than typical (substantially so for /oʊ/, /ʊ/, and /u/).
Formant values for /ɑ/ fall within the typical ranges, consistent with the typical
constriction location and constriction degree achieved.
[Figure III.23 chart. Title: Patient M3's vocal tract aperture during /u/. Y-axis: Aperture (mm). X-axis: Alveolar, Palatal, Velar, Pharyngeal. Series: M3, MT1, MT2, MT3.]
Figure III.24: Acoustic vowel space of typical male speakers (gold); (Reduced) acoustic
vowel space of patient M3 (red).
Patient M3 Formant Values
Vowel   Observed F1 (Hz)   Observed F2 (Hz)   Typical F1 range (Hz)   Typical F2 range (Hz)
/i/     416                2219               305-420                 2049-2600
/ɪ/     743                1630               363-501                 1810-2494
/ɛ/     800                1581               517-723                 1580-2208
/æ/     874                1632               511-685                 1592-2436
/eɪ/    575                1759               374-555                 1804-2493
/ɝ/     843                1460               429-557                 1173-1517
/oʊ/    857                1766               376-601                 659-1126
/ɔ/     897                1270               587-740                 856-1167
/ʊ/     774                1609               409-522                 965-1375
/u/     551                1718               313-455                 793-1299
/ə/     856                1507               560-682                 998-1410
/ɑ/     890                1310               662-963                 1060-1524
Table III.4: Patient M3’s formant frequency values as compared to those of typical male
speakers (atypical values highlighted in red)
III.2.5.3 Perception study
For patient M3, accentedness scores were highest for ‘boot’, followed by ‘beat’ and ‘pot’,
which received average accentedness scores near 2, the score associated with typical
speech (Figure III.25).
Figure III.25: Accentedness scores assigned to tokens produced by patient M3
III.2.5.4 Discussion
Patient M3 produces high, front vowel /i/ with typical aperture measures along the entire
vocal tract and also has typical formant values, consistent with low accentedness scores.
Vowel /ɑ/ is produced with the target constriction location (pharyngeal) and with a
constriction degree in the typical range, though with wide constriction degrees in regions
distal to the constriction location. Despite the atypical constriction degrees in distal
regions, typical acoustics are still maintained post-operatively and lead to a percept of
low-accentedness. Interestingly, though vowel /u/ is produced at the target constriction
location and with a constriction degree in the typical range, the atypically wide
constriction in the alveolar region likely contributes to the atypical formant values
observed that are consistent with somewhat high accentedness scores associated with /u/.
It is somewhat peculiar that atypically wide constrictions formed in regions distal to the
constriction location during vowel /u/, unlike for vowels /i/ and /ɑ/, result in formants
produced outside of typical ranges.
[Figure III.25 chart. Title: M3 Accentedness Scores. X-axis: boot, pot, beat.]
The centralization and lowering of front vowels suggests that the genioglossus
was affected in this patient’s procedure, impeding the tongue’s ability to pull up and
forward. The lowering and centralization of back vowels suggests that the styloglossus
may have been compromised as well, inhibiting the up-and-back lifting of the tongue
body. It is possible that the palatoglossus has also been compromised, causing difficulty
in elevating the tongue body.
III.2.6 Results: Base of tongue cancer patient M4
III.2.6.1 Vocal tract aperture at constriction location gridlines
During the production of /i/, patient M4’s vocal tract aperture values closely resemble
those of typical speakers (Figure III.26).
Figure III.26: Vocal tract aperture during /i/ at vocal tract gridlines for patient M4 and
typical male speakers
During the production of /ɑ/, patient M4 achieves the typical constriction location
and constriction degree in the pharyngeal region (Figure III.27). M4’s aperture values in
the alveolar region, however, are substantially larger than for typical speakers.
[Figure III.26 chart. Title: Patient M4's vocal tract aperture during /i/. Y-axis: Aperture (mm). X-axis: Alveolar, Palatal, Velar, Pharyngeal. Series: M4, MT1, MT2, MT3.]
Figure III.27: Vocal tract aperture during /ɑ/ at vocal tract gridlines for patient M4 and
typical male speakers
III.2.6.2 Acoustic vowel space and formant frequency analysis
The acoustic data for patient M4 reveal that most back vowels are produced with F1 and
F2 values outside of the typical range; F1 and F2 values are higher than typical,
suggesting that the vowels are being produced in a lower and more anterior position than
they typically are (Figure III.28, Table III.5). Interestingly, the formant frequencies
associated with vowel /ɑ/ are not atypical. The formant frequencies of high vowels have
been minimally affected, with some F1 values being higher than typically observed,
suggesting that these vowels are produced lower in the mouth than they typically might
be.
[Figure III.27 chart. Title: Patient M4's vocal tract aperture during /ɑ/. Y-axis: Aperture (mm). X-axis: Alveolar, Palatal, Velar, Pharyngeal. Series: M4, MT1, MT2, MT3.]
Figure III.28: Acoustic vowel space of typical male speakers (gold); (Reduced) acoustic
vowel space of patient M4 (red).
Patient M4 Formant Values
Vowel   Observed F1 (Hz)   Observed F2 (Hz)   Typical F1 range (Hz)   Typical F2 range (Hz)
/i/     341                2367               305-420                 2049-2600
/ɪ/     521                1962               363-501                 1810-2494
/ɛ/     665                1793               517-723                 1580-2208
/æ/     725                1722               511-685                 1592-2436
/eɪ/    515                2103               374-555                 1804-2493
/ɝ/     622                1362               429-557                 1173-1517
/oʊ/    820                1143               376-601                 659-1126
/ɔ/     837                1295               587-740                 856-1167
/ʊ/     652                1580               409-522                 965-1375
/u/     n/a                n/a                313-455                 793-1299
/ə/     722                1469               560-682                 998-1410
/ɑ/     855                1392               662-963                 1060-1524
Table III.5: Patient M4’s formant frequency values as compared to those of typical male
speakers (atypical values highlighted in red)
III.2.6.3 Perception study
For patient M4, accentedness scores for both ‘beat’ and ‘pot’ were low; token ‘boot’ was
not produced by patient M4 (Figure III.29).
Figure III.29: Accentedness scores assigned to tokens produced by patient M4
III.2.6.4 Discussion
Patient M4’s formant frequency patterns are generally consistent with the gridline
analysis results. High, front vowel /i/ is produced with both typical gridline aperture
values and with typical acoustics, consistent with low accentedness scores. Low back
vowel /ɑ/ is produced with typical constriction degree and constriction location values;
however, constriction degree values in regions distal to the constriction location are
atypical. Nonetheless, the resulting acoustics fall within the typical ranges and
accentedness scores were low.
III.2.7 Results: Base of tongue cancer patient M5
III.2.7.1 Vocal tract aperture at constriction location gridlines
During the production of /i/, patient M5’s vocal tract closely resembles those of the
typical speakers, with the most constricted regions generally being in the alveo-palatal
region (Figure III.30).
[Figure III.29 plot: accentedness scores (axis 0-6) for tokens 'pot' and 'beat' produced by M4]
Figure III.30: Vocal tract aperture during /i/ at vocal tract gridlines for patient M5 and
typical male speakers
During the production of /ɑ/, patient M5 achieves the typical constriction location
and constriction degree in the pharyngeal region (Figure III.31). M5’s aperture values in
the alveolar region, however, are substantially larger than for typical speakers.
Figure III.31: Vocal tract aperture during /ɑ/ at vocal tract gridlines for patient M5 and
typical male speakers
[Figure III.30 plot: "Patient M5's vocal tract aperture during /i/"; aperture (mm, axis 0-25) at the alveolar, palatal, velar, and pharyngeal gridlines for M5, MT1, MT2, and MT3]
[Figure III.31 plot: "Patient M5's vocal tract aperture during /a/"; aperture (mm, axis 0-35) at the alveolar, palatal, velar, and pharyngeal gridlines for M5, MT1, MT2, and MT3]
During the production of /u/, patient M5’s vocal tract aperture values resemble
those of typical speakers (Figure III.32). The velar region is most constricted and
constriction degree in this region falls within the typical range.
Figure III.32: Vocal tract aperture during /u/ at vocal tract gridlines for patient M5 and
typical male speakers
III.2.7.2 Acoustic vowel space and formant frequency analysis
The acoustic data for patient M5 reveal that all vowels but /i/ and /ɑ/ are produced with
F1 values higher than those typically produced, suggesting that these vowels might be
produced lower in the mouth than they typically would be (Figure III.33, Table III.6).
Vowel /i/ is produced with an F2 value lower than typically produced, suggesting that it
is produced in a posterior location. Select back vowels are produced with F2 values
higher than those typically produced, suggesting that they are fronted.
[Figure III.32 plot: "Patient M5's vocal tract aperture during /u/"; aperture (mm, axis 0-25) at the alveolar, palatal, velar, and pharyngeal gridlines for M5, MT1, MT2, and MT3]
Figure III.33: Acoustic vowel space of typical male speakers (gold); (Reduced) acoustic
vowel space of patient M5 (red).
Patient M5 Formant Values
Vowel   Observed F1 (Hz)   Observed F2 (Hz)   Typical F1 range (Hz)   Typical F2 range (Hz)
/i/     409                2028               305-420                 2049-2600
/ɪ/     570                1919               363-501                 1810-2494
/ɛ/     763                1718               517-723                 1580-2208
/æ/     892                1678               511-685                 1592-2436
/eI/    585                1990               374-555                 1804-2493
/ɝ/     780                1366               429-557                 1173-1517
/oU/    724                1173               376-601                 659-1126
/ɔ/     910                1420               587-740                 856-1167
/ʊ/     690                1191               409-522                 965-1375
/u/     507                1377               313-455                 793-1299
/ə/     853                1407               560-682                 998-1410
/ɑ/     843                1276               662-963                 1060-1524
Table III.6: Patient M5’s formant frequency values as compared to those of typical male
speakers (atypical values highlighted in red)
III.2.7.3 Perception study
For patient M5, accentedness scores were highest for ‘beat’, followed by ‘pot’ and ‘boot’,
which received average accentedness scores near 2, the score associated with typical
speech (Figure III.34).
Figure III.34: Accentedness scores assigned to tokens produced by patient M5
III.2.7.4 Discussion
Patient M5 proves to be an interesting case in terms of his gridline aperture patterns and
the resulting acoustics. For vowel /i/, patient M5 has typical gridline aperture values, yet
the acoustic data reveal that the F2 value produced is just lower than typical, suggesting
that it is slightly backed. This is consistent with /i/’s slightly high accentedness scores.
For vowel /ɑ/, the gridline analysis reveals that constriction location and degree are
preserved, but that constriction degrees in regions distal to the constriction locations are
wider than typically observed. Despite these differences, typical acoustics are achieved.
Corresponding accentedness scores are just over 2, indicating a very low percept of
accentedness. Strikingly, for vowel /u/, typical gridline patterns are observed (as was the
case for /i/), but the corresponding F1 and F2 values are higher than expected, suggesting
that /u/ is centralized. Interestingly, low accentedness scores reveal that listeners perceive
the vowel not to be heavily accented.
[Figure III.34 plot: accentedness scores (axis 0-6) for tokens 'boot', 'pot', and 'beat' produced by M5]
III.3 General Discussion
In analyzing the data at hand, we find that the patterns observed in the cross-distance
vocal tract measures and acoustic formant frequency values are consistent with the loci of
lingual tissue resection of most patients. Further, the accentedness scores assigned to
various tokens vary systematically with corresponding typical and atypical gridline
measures and resulting acoustics.
For both oral tongue patients, the prediction that vocal tract cross-distance values
would be generally larger than for typical speakers holds true. Patient M1’s glossectomy
and reconstruction procedure caused damage to the anterior portion of the oral tongue
(likely affecting the anterior genioglossus and longitudinal muscles). Further, the
reconstructed tissue had shrunk somewhat due to radiation therapy. Since for patient M1 (i)
the tongue does not occupy as much space in the front region due to the tongue resection
and shrinkage of his reconstructed flap and (ii) the mobility of the front portion of the
tongue has been compromised for front vowels since he cannot finely control his tongue
tip, it is not surprising that constriction location goals (and somewhat typical acoustics)
can still be achieved for vowels /ɑ/ and /u/ but not for high front vowel /i/. Given that the
gridline patterns during /i/ approximate those of typical speakers, it is unclear precisely
what is responsible for the atypical acoustics associated with /i/.
Patient M1’s relatively high cross-distance measures in the pharyngeal region
(except during vowel /ɑ/) may be due to inter-speaker variability. Alternatively, it is
possible that patient M1 pushes the back of tongue forward for front vowels as a
compensatory mechanism, rendering pharyngeal aperture high. That is, M1 uses the
unaffected posterior genioglossus to a greater extent than do typical speakers to push the
reconstructed anterior portion of the tongue forward and up to achieve the palatal
constriction necessary for /i/. M1’s wide pharyngeal constriction during /u/ (Figure III.4)
could be caused by tissue of the base of tongue and tongue dorsum pulling up to create a
velar constriction for /u/, where (more anterior) tongue body tissue would typically create
the constriction. Alternatively, patient M1 could be compressing the tongue to create a
‘tube system’ that best achieves a possible goal of the target vowel /u/ involving
balancing anterior and posterior areas. Since the area of the anterior cavity is increased
(compared to that of typical speakers), this mechanism would involve increasing back
area value by whatever means possible. Additionally, it is possible that M1’s
palatoglossus has been affected, causing vowels to be produced lower in the mouth and,
in turn, causing F1 values to be higher than expected.
Patient F1 underwent glossectomy of the entire superior portion of the oral tongue
(likely affecting the superior longitudinals and styloglossus) and base of tongue (likely
affecting the posterior genioglossus and hyoglossus), without reconstruction. As a result,
patient F1 is missing a substantial amount of lingual tissue that gives rise to a centrally
compressed vowel space (horizontal movement may be impeded in both directions) and
generally high cross-distance values as compared to typical speakers. Patterns in F1’s
vowel space seem to indicate that some differences in tongue height, or constriction
degree are maintained, despite differences in tongue backness, or constriction location
not being achieved. Though this patient’s high front vowel constriction location does not
differ from that of typical speakers, her lack of lingual tissue and the limited mobility of her
residual tissue prevent her from producing a lingual constriction that is sufficiently high and front for
the vowel /i/. During /ɑ/, patient F1 has larger cross-distances at the pharyngeal and
alveolar gridlines than do typical speakers. The aperture values at the palatal and velar
gridlines, however, are in the typical range. This suggests that patient F1 attempts to use
the residual lingual tissue to create a constriction in the most low and posterior region
anatomically possible for her, which is anterior of the pharyngeal gridline, causing a high
aperture value in this location. This is also consistent with F1’s acoustic vowel space that
reveals that vowel /ɑ/ is produced slightly higher and fronter than in typical speakers.
All base of tongue patients underwent resection and reconstruction of the base of
tongue. As expected, for all base of tongue patients, the gridline aperture analysis reveals
that high front vowel /i/ patterns consistently with typical speakers. Accordingly, formant
values for /i/ fall in typical ranges for all base of tongue patients except for M5 (for whom
the difference between the observed F2 value and the range minimum is only 21 Hz). All
base of tongue patients produce vowel /ɑ/ at the target pharyngeal constriction location
while three of the four patients produce it with a constriction degree in the typical range.
Interestingly, regions distal to the pharyngeal region are wider in aperture than typically
observed for all base of tongue patients during /ɑ/, reflecting the fact that a substantial
portion of lingual tissue has been resected.
Despite aperture values in regions distal to the pharyngeal region being higher
than typical, F1 and F2 values in the typical ranges are achieved for the three speakers
that produce /ɑ/ in the pharyngeal region with typical constriction degrees. For patient
M2, who does not achieve the typical constriction degree in the pharyngeal region during
/ɑ/, F2 is affected. The base of tongue patients vary in whether their production of /u/ has
been substantially impacted by the procedure. Patient M2 does not achieve the target
velar constriction location, while M3 and M5 are able to achieve the target constriction
location and degree. Strikingly, though constriction location and degree are achieved by
patients M3 and M5, the F1 and F2 values produced do not fall within typical ranges.
This suggests that atypical aperture values in regions distal to the constriction location are
somewhat reflected in the acoustics.
Two important generalizations are revealed by these data. Firstly, constriction
locations for vowel gestures produced by glossectomy patients are consistent with those
used by typical speakers except in cases where forming a constriction at the typical
constriction location is physiologically impossible due to the lack of tissue or residual
tissue immobility. Additionally, these data show that atypical vocal tract shapes created
in regions distal to a vowel’s primary constriction location (e.g. M1’s wide alveolar
constriction in /ɑ/ and /u/, F1’s wide alveolar and pharyngeal constrictions in /u/)
generally do not affect the acoustic output as heavily as do atypical shapes at a vowel’s
primary constriction location.[5] Atypically wide constrictions in regions distal to the
constriction location would be expected to have the same acoustic effect as forming a
narrower constriction at the constriction location. Specifically, F1 and F2 values
would become slightly more extreme in the acoustic vowel space rather than centralized. Thus,
the centralized formant frequency values observed in the patient data likely arise from
atypically wide constrictions at the constriction location itself. The fact that
post-glossectomy speakers preserve constriction location specifications but do not preserve
constriction degree specifications is consistent with work by Iskarous (2010), which
suggests that the constriction location (CL) specification is distinct from other articulatory and
acoustic parameters (such as constriction degree, motion of the tongue dorsum, and F1
and F2 values) in that it is discrete (as opposed to continuous), and that listeners rely upon
this discreteness to help segment the speech signal into discrete units; listeners are not able
to discriminate pairs of synthesized vowel sequences that do not differ in CL
discreteness. As Iskarous (2010) suggests, perhaps it is these discrete parameters that
“compose the foundation for how the speech production system serves to communicate
discrete contrasts”.
[Footnote 5: Patient M3’s production of /u/ is the only token for which the data do not support this
notion. Patient M5’s cross-distance gridlines pattern typically for /u/ in all regions, yet
typical acoustics are not achieved.]
Chapter IV
Preservation of constriction degree by cross-gesture
compensation
IV.1 Introduction and hypotheses
Using videofluoroscopy, it has been observed that post-glossectomy patients sometimes
use compensatory mechanisms to form intelligible speech whereby articulators other than
the ones typically used to make certain constrictions in the vocal tract are used (Georgian
et al., 1982). Further, results of an acoustic analysis (Savariaux et al., 1999) suggest that
post-glossectomy speakers may shift the constriction location of velar stops to the
pharyngeal region, and studies of congenital aglossia (Simpson and Meinhold, 2007)
revealed that /t/, /d/, and /n/ were commonly produced by the lower lip making contact
with the alveolar ridge. It is hypothesized, given that patient M1 is unable to finely
control the reconstructive flap that makes up his anterior oral tongue, that he will use
compensatory mechanisms to produce coronal consonant constrictions. In the case of F1,
a substantial amount of lingual tissue has been removed from the anterior tongue body
and the base of tongue. Thus, it is hypothesized that compensation will be exhibited for
both coronal and dorsal constrictions.
IV.2 Method
For all stimuli in the experimental corpus, audio, MRI video recordings, and MR image
frame sequences of the subjects’ speech were examined. For every token in which
atypical articulatory behavior was visually observed, time series illustrating articulatory
activity in regions of interest (labial, alveolar, velar; Figure IV.1) were automatically
generated by calculating the mean intensity of pixels in each region. This method
provides a robust estimate of constriction degree in noisy data, without relying on error-
prone articulator segmentation along air-tissue boundaries (Lammert et al., 2010).
Additionally, by using an algorithm that automatically detects the pixel of maximum
dynamic intensity during the utterance of interest, we are able to robustly determine
constriction location (Proctor et al., 2011).
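The two region-based measures described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Lammert et al. (2010) or Proctor et al. (2011): the regions are taken to be rectangles, "maximum dynamic intensity" is interpreted as the largest intensity range (maximum minus minimum) over time, and all function names and the synthetic frames are hypothetical. On the usual reading that tissue appears brighter than air in the MR images, a rise in mean intensity within a region indicates that tissue has moved into it, that is, that a constriction is forming.

```python
def region_mean_series(frames, rows, cols):
    """Mean pixel intensity inside a rectangular region, one value per frame.

    frames: sequence of 2-D images (lists of rows of numbers)
    rows, cols: (start, stop) index pairs bounding the region (stop exclusive)
    """
    series = []
    for image in frames:
        pixels = [image[r][c] for r in range(*rows) for c in range(*cols)]
        series.append(sum(pixels) / len(pixels))
    return series


def max_dynamic_pixel(frames):
    """(row, col) of the pixel whose intensity range over time is largest."""
    height, width = len(frames[0]), len(frames[0][0])
    best, best_range = None, -1.0
    for r in range(height):
        for c in range(width):
            values = [image[r][c] for image in frames]
            span = max(values) - min(values)
            if span > best_range:
                best, best_range = (r, c), span
    return best


# Synthetic 4x4 "video" of five dark (air-filled) frames; values are hypothetical.
frames = [[[0.0] * 4 for _ in range(4)] for _ in range(5)]
for r in (1, 2):
    for c in (1, 2):
        frames[2][r][c] = 1.0   # tissue fills the region of interest in frame 2
frames[3][0][3] = 5.0           # one strongly varying pixel outside the region

print(region_mean_series(frames, rows=(1, 3), cols=(1, 3)))
print(max_dynamic_pixel(frames))
```

Because the time series is an average over many pixels, it does not depend on tracing the air-tissue boundary in any single frame, which is the property the text appeals to when describing the method as robust to noise.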
Figure IV.1: Vocal tract regions (labial, alveolar, velar) within which articulatory
activity is estimated from mean pixel intensity for patients F1 (left) and M1 (right)
IV.3 Results
rtMRI reveals that patient M1 employs two types of compensatory mechanisms in an
attempt to produce (i) target alveolar stop constrictions and (ii) target alveolar fricative
constrictions. Patient F1 produces labial stops in place of alveolar and velar oral stops,
but does not always exhibit compensatory behavior for tongue tip gestures associated
with lateral /l/, nasal stop /n/, and interdental fricative /θ/.
IV.3.1 Results: Patient M1
IV.3.1.1 Compensatory production of alveolar stops by formation of labiodental
constriction
Patient M1, who is unable to execute finely controlled movements of the tongue tip,
exhibits compensatory behavior by replacing the tongue tip gesture required for stops and
laterals with labiodental stop constrictions. In cases of target oral stop constrictions, the
compensatory labiodental gesture is accompanied by a dorsal constriction gesture. Patient
M1 typically produces the target coda /t/ in isolated words by forming a dorsal
constriction combined with a labiodental constriction (as evidenced by triangular lower
lip deformation caused by compression of the upper teeth into the lower lip, outlined in
Figure IV.2). The labiodental constriction produced during target coda /t/ (also
observable in the labial intensity time function panel) can be compared to the bilabial
constriction in onset /b/, during which extensive outward deformation of the lower lip is
apparent (Figure IV.2). It is important to note that a labial gesture is not simply substituting
for the alveolar gesture; rather, a labial gesture and a dorso-velar gesture (not present
during the production of onset /b/) are being coproduced in place of coda /t/.
Figure IV.2, Top: Acoustic waveform and time-aligned estimated constriction functions
(labial, alveolar, velar) in M1’s production of isolated token ‘bat’. Bottom: MRI frames
displaying articulatory postures for onset /b/ (l), and labiodental and dorsal constrictions
in place of coda /t/ (r).
Compensatory behavior of this kind is not limited to isolated tokens, but is
exhibited in running speech as well. Patient M1 forms a labiodental nasal stop (evidenced
by slight inward lower lip deformation, circled in Figure IV.3) in place of word-final
target /n/ of “division” in the phrase “The rainbow is a division of white light into many
beautiful colors.” Interestingly, during this word-final /n/, no peak indicating a dorsal stop
constriction is evident in the velar pixel intensity time function.
Figure IV.3: Acoustic waveform and time-aligned estimated constriction functions
(labial, alveolar, velar) in production of ‘division’ in running speech. MRI frame shows
labiodental gesture in place of word-final /n/
In running speech, patient M1 produces a labiodental stop in place of the /nl/
portion of “sunlight” (Figure IV.4) in the phrase “When the sunlight strikes raindrops in
the air...” The intensity patterns observable in the MRI image frames corresponding with
the /nl/ constriction duration suggest that the outer edge of the lower lip is compressed
against the upper teeth (evidenced by complete inward deformation of the lower lip).
Figure IV.4: Acoustic waveform and time-aligned estimated constriction functions
(labial, alveolar, velar) in M1’s production of ‘sunlight’ in running speech. MRI frame
shows articulatory posture of labiodental stop (inward deformation of lower lip) in place
of /nl/
IV.3.1.2 Compensatory production of alveolar fricatives by formation of velar
constriction
Patient M1’s compensatory behavior is not limited to stops, but also occurs in place of
alveolar frication. M1 produces frication between the palate and tongue dorsum rather
than between the apical tongue and teeth to achieve target /s/ in ‘sun’. Using the
constriction location detection method, we confirm striking differences in constriction
location for the target alveolar fricative /s/ between patient M1 and typical speaker MT2
(Figure IV.5). These differences are reflected in the spectral properties of the fricative: the
aperiodic high-frequency noise that is typically produced at approximately 5000-8000 Hz is
produced by patient M1 at lower frequencies (2500-4000 Hz).
Figure IV.5: MRI frames from subjects M1 (left) and MT2 (right) during /s/ in ‘sun’
IV.3.2 Results: Patient F1
IV.3.2.1 Compensatory production of alveolar and velar gestures by formation of
bilabial constriction
Patient F1 produces bilabial stops in place of both alveolar and velar oral stops for all
tokens of running speech. Alveolar constrictions for /l/, /n/ and /θ/ are systematically
produced in the alveolar region with the residual tongue tip (Figure IV.6).
Figure IV.6: Acoustic waveform and time-aligned estimated constriction functions
(labial, alveolar, velar) in F1’s production of ‘Don’t ask me to carry an oily rag like that’
in running speech. MRI frames show articulatory postures of bilabial stops in place of /d/
and /k/ and postures of alveolar constrictions in /n/, /l/, and /θ/.
IV.4 Discussion
The data in this section illustrate that oral tongue cancer patient M1 exhibits
compensatory behavior by (i) replacing the tongue tip gesture required for nasal stops and
laterals with labiodental stop constrictions, (ii) replacing the tongue tip gesture required
for oral stops with co-produced labiodental and dorsal stop constriction gestures and (iii)
replacing the tongue tip gesture at the teeth required for the production of /s/ with a
tongue dorsum gesture at the palate. This is not surprising, given that M1’s superior
longitudinals, responsible for lifting the tongue tip, have likely been impaired.
Interestingly, despite the (alveolar) target constriction location being the same for
all gestures for which compensatory behavior is observed (in segments /t/, /n/, /l/, /s/), the
compensatory mechanisms selected for each gesture are not the same. Instead, the
compensatory mechanism used for a given tongue-tip gesture varies depending on the
target constriction degree of that gesture. For tongue-tip gestures with a target stop
constriction degree (associated with a negative aperture target), patient M1 elects to use
the lower lip and the teeth to produce the gesture, maintaining the constriction degree of a
stop. In a similar vein, compensatory differences in oral stops and nasal stops and laterals
can be considered in light of their differing source types; oral stops involve increasing
intraoral pressure to produce a burst upon release, whereas nasal stops and laterals are not
produced with intraoral pressure buildup or a release burst.
For tongue-tip gestures with a target fricative constriction degree (so as to create
turbulent airflow), patient M1 opts not to use the lower lip and teeth to form a labiodental
fricative, as in the case of stop production. Rather, M1 maintains the target fricative
constriction degree by producing a constriction using the tongue dorsum and the palate.
The results in this section suggest that it is plausible that for consonant production,
constriction degree tends to be better preserved than constriction location.
The fact that patient M1 uses labiodental and velar stops in place of alveolar stops
and velar fricatives in place of alveolar fricatives, neither of which is present in the
phonemic inventory of English, and does not choose to compensate by forming velar
stops or labiodental fricatives, both of which are present in the phonemic inventory of
English, is striking. This finding suggests that certain speakers may show a systematic
preference for utilizing compensation mechanisms that form segments not naturally
found in the inventory of their language in order to avoid perceptual confusability
between segments.
Patient F1 exhibits compensatory behavior in producing bilabial stops in place of
alveolar and velar stop constrictions. This compensatory pattern is consistent with patient
F1’s loci of resection: the tongue body and base of tongue. Interestingly, patient F1 relies
on a compensatory mechanism to produce only segments relying on complete occlusion
of the oral cavity (such as /d/ and /k/). Segments relying on partial occlusion (such as /l/
and /θ/) are produced with the typical tongue-tip gesture. This pattern suggests that while
F1 is able to make some contact between the tongue tip and the hard palate, she is unable
to create complete occlusion at the alveolar region. It is possible that F1’s inability to
create complete occlusion is due to sheer lack of lingual tissue. Alternatively, F1 may
lack the lingual musculature required to appropriately position the tongue against the hard
palate so as to prevent airflow. Additionally, there is evidence that patient F1 produces /n/
with a tongue tip gesture (possibly co-produced with a labial gesture). It is possible that
the tongue-tip gesture associated with /n/ is preserved despite not making full closure
with the hard palate as a result of the acoustics, as perceived by the speaker via the
feedback loop, not being impacted to such a degree that would warrant changing the
constriction location specification from alveolar to labial. That is, the presence of nasal
airflow and the continuous nature of the acoustics of /n/ may ‘afford’ F1 the ability to
preserve the tongue tip gesture despite full occlusion of the oral cavity not being
achieved. Alternatively, it is possible that the articulatory targets associated with the
tongue tip gestures for oral stops differ from those for the nasal stop and lateral in that the
oral stop gestures are more extreme. A more extreme constriction target would ensure
complete blockage of the airstream to allow for increased air pressure within the oral
cavity, resulting in a perceptible burst upon release. If this is the case, it is possible that
F1 is able to achieve the gestural target for the nasal and lateral just as she did pre-
operatively, but that she can no longer achieve the more extreme target associated with
oral stops, resulting in compensatory behavior for only these segments. Compensating in
this way allows the constriction degree at the vocal tract tube level to be achieved for all
consonants, despite the inability to achieve constriction degree at the local (or ‘gestural’)
level.
F1 produces a substantial amount of lip protrusion during the production of /s/
(observable in Figure IV.6, second rtMRI image from left). It is possible that the
constriction produced by the tongue-tip during /s/ is not sufficiently narrow and that a
bilabial constriction is used compensatorily, as a second filter, in order to achieve the
desired vocal tract tube constriction degree. In doing so, the acoustic percept of frication is
enhanced and constriction degree at the vocal tract tube level is preserved, though
constriction degree at the local, gestural level cannot be fully maintained.
Chapter V
Evidence for within-gesture compensation of constriction
degree by jaw height modulation
V.1 Introduction
V.1.1 Synergistic jaw movement
The components of the vocal tract, the articulators, act synergistically to achieve vocal tract
constrictions that form the meaningful and contrastive units of a spoken language. There
is a large amount of variability in how different speakers of a given language utilize vocal
tract components in an attempt to achieve the same articulatory goal. For example, in
producing a bilabial stop consonant, some speakers tend to raise the jaw substantially
while moving the lips minimally while others tend to move the lips substantially and raise
the jaw minimally (Alfonso, 1996). Under various conditions (externally imposed (e.g.
using a bite-block) or context-dependent (e.g. by a coarticulated vowel)), a single speaker
will vary contributions of the articulators accordingly (Abbs and Gracco, 1984; Kelso,
Tuller, Vatikiotis-Bateson and Fowler, 1984; Shaiman, 1989). Investigating jaw position
with respect to palatal cross-distance in the production of front vowels can inform us as
to whether the relationship between jaw movement and constriction degree differs
between post-glossectomy patients and typical speakers, and, if so, what the
post-glossectomy patients are trying to achieve by utilizing a given pattern. Investigating the
relationship of jaw position to cross-distance measures by comparing correlation and
slope values will inform us as to what role the jaw may play in achieving the
constriction degree of a gesture.
V.1.2 Hypotheses
It is hypothesized that when the range of lingual mobility and lingual mass are reduced,
the post-glossectomy patients will maximally utilize the (unaffected) mechanism of jaw
lowering and raising in an attempt to achieve the constriction targets associated with vowels.
Specifically, we expect the degree of correlation between jaw position and cross-distance
measures at the palatal gridline to be larger in post-glossectomy patients than in typical
speakers. Further, we expect the slopes of the regression lines for jaw position and
constriction degree to be larger for patients than for typical speakers, indicating greater
changes in constriction degree per unit change in jaw position.[1] This would suggest that
jaw movement contributes more to the formation of palatal constrictions in patients than
in typical speakers.
V.2 Methods
V.2.1 X-ray microbeam database
Twenty-two typical male speakers and twenty-five typical female speakers of English
produced sentences (running speech; not single words) while their articulators were
tracked using X-ray microbeam (XRMB) (details in Westbury et al., 1994). The
utterances produced by the XRMB subjects were phone-delimited using the Penn
Phonetics Lab’s Forced Aligner (Yuan & Liberman, 2008). Constriction degree, defined
as the minimal distance between the most anterior lingual sensor and the palate outline
(traced for every speaker), and the Euclidean distance between jaw (the buccal surface of
the central incisors) and the gums of the upper teeth were measured and recorded at the
acoustic vowel midpoint for all stressed, front vowels.
V.2.2 Real-time MRI data collection
Patients F1, M1-M5 and typical speaker FT1 produced the /bVt/ stimuli described in
Chapter II in addition to a limited amount of running speech. All monosyllabic words
along with stressed vowels in the running speech data were analyzed at the acoustic
vowel midpoint. Coordinates of the most anterior point of the upper gums and the flattest
region posterior to the chin were manually selected and recorded (Figure V.1). The
Euclidean distance between these points was used as an index of head-movement-
corrected jaw position. The vocal tract cross-distance at the palatal gridline (as described
in Chapter III) was recorded for each subject.
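The jaw position index described above is the straight-line distance between two landmarks. The sketch below assumes the landmarks are given as (x, y) image coordinates; the function name and the coordinate values are hypothetical. It also illustrates why the measure is insensitive to head translation: both landmarks shift together, so the distance between them is unchanged.

```python
import math


def jaw_position_index(upper_gum_xy, chin_xy):
    """Euclidean distance between two manually selected landmarks."""
    dx = chin_xy[0] - upper_gum_xy[0]
    dy = chin_xy[1] - upper_gum_xy[1]
    return math.hypot(dx, dy)


# Hypothetical landmark coordinates (image units) in one frame.
original = jaw_position_index((0, 0), (3, 4))

# The same posture after the whole head shifts by (10, 20) in the image.
shifted = jaw_position_index((10, 20), (13, 24))

print(original, shifted)
```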
[Footnote 1: For the limiting case in which constriction degree relies exclusively on jaw position, the
slope would be 1.]
Figure V.1: Jaw position calculated between the upper gums and the flattest region
posterior to the chin
V.2.3 Correlation and regression analyses
In order to fairly compare correlation and regression slope values across the rtMRI and
XRMB modalities, which provide values on different scales, jaw position and
constriction degree values were range-normalized to the interval 0-1 for each subject. Correlation
and linear regression analyses were then performed on the normalized data, and the
resulting values compared.
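The normalization and the two summary statistics can be sketched as below. This is an illustration only: the measurement values are hypothetical, the function names are assumptions, and constriction degree is regressed on jaw position, following the description of the slope as the change in constriction degree per unit change in jaw position.

```python
def range_normalize(values):
    """Rescale one subject's values to the interval 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot range-normalize constant data")
    return [(v - lo) / (hi - lo) for v in values]


def pearson_r_and_slope(x, y):
    """Pearson correlation and least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5, sxy / sxx


# Hypothetical measurements for one subject (arbitrary units).
jaw = [10.0, 12.0, 14.0, 16.0]
cd = [2.0, 3.0, 4.0, 5.0]   # constriction degree exactly linear in jaw position

r, slope = pearson_r_and_slope(range_normalize(jaw), range_normalize(cd))
print(r, slope)
```

In this constructed case constriction degree depends on jaw position alone, so after range normalization both the correlation and the slope come out at 1 (up to floating-point rounding). Values below 1 indicate that something other than the jaw, such as the tongue, also contributes to the variation in constriction degree.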
V.3 Results
V.3.1 Correlation between jaw position and constriction degree
As expected, for nearly all participants, as jaw position (distance from lower teeth or chin
to the upper teeth) increases, constriction degree systematically increases, as shown in the
example data of Figures V.2-3, for both XRMB and rtMRI data.
Figure V.2: XRMB data: Constriction degree varies systematically with jaw position
Figure V.3: rtMRI data: Constriction degree varies systematically with jaw position
Correlation coefficients for jaw position and constriction degree are highest for
patients. The four highest correlation coefficients correspond to patient data, and all
patient data (red) fall within the highest 17% of correlation coefficient values (Figure
V.4). Interestingly, the correlation coefficient for typical speaker FT1 (green), for whom
data were collected using rtMRI, falls at the high end of the correlation coefficient range,
though it is still lower than the coefficient values of most of the patients.
[Figure V.2 plot: normalized constriction degree (CD) and normalized jaw position for speaker FT14; both axes 0-1.2]
[Figure V.3 plot: normalized constriction degree (CD) and normalized jaw position for speaker FT1; both axes 0-1.2]
Figure V.4: Correlation coefficients for all participants in ascending order (typical
speakers-blue; patients-red; FT1-green)
V.3.2 Regression slopes of jaw position and constriction degree
The regression slopes of jaw position and constriction degree are generally highest for
patients (Figure V.5). As was the case for correlation coefficients, all patient data (red)
fall within the highest 17% of regression slope values. Typical speaker FT1’s (green)
value also falls within this range, though it falls below the values of all patients
except one (M4).
Figure V.5: Regression slopes of jaw position and constriction degree for all participants
in ascending order (typical speakers-blue; patients-red; FT1-green)
V.4 Discussion
Investigating the correlation between jaw position and constriction degree helps us to
understand the degree to which the two variables are linearly related; how well can
constriction degree be predicted given jaw position? Higher correlation coefficients
observed for the patients than for typical speakers indicate that jaw position and
constriction degree vary more systematically for patients. However, the high correlation
coefficients observed for patients do not imply that jaw position is modulated
compensatorily to produce constrictions; it is possible that due to the limited mass and
mobility of the tongue, any movement of the jaw contributes heavily to the formation of a
constriction, while the tongue, with its extremely limited mobility, contributes very little
to the constriction formation. To determine whether patients modulate jaw position
compensatorily, we look to regression slope values. Slope values indicate the amount of
change in constriction degree associated with unit change in jaw position. Since larger
differences in constriction degree are achieved with the same amount of jaw movement
for patients than for typical speakers, it is likely that they are utilizing compensatory jaw
movement in the production of palatal constrictions.
Interestingly, typical speaker FT1, whose data was collected using real-time MRI,
produces higher correlation coefficient and slope values than almost all of the typical
speakers for whom data was collected using XRMB. Though it is possible that these
patterns arise due to differences in how the jaw was measured using each methodology or
due to the normalization procedure used, it is probable that the difference in stimuli used
in the XRMB and rt-MRI experiments accounts for these patterns. In the XRMB data
collection, participants produced only running speech, while in the rt-MRI data
collection, participants produced single, monosyllabic words in isolation with just a few
sentences of running speech. Producing words in isolation likely resulted in the
production of generally larger jaw movement amplitudes due to a lesser degree of
undershoot being exhibited than would be in the running speech produced by the typical
speakers in the XRMB data collection. Despite these patterns that likely arise due to the
difference in experimental stimuli, it is notable that typical speaker FT1 still produces a
correlation coefficient smaller than do two-thirds of the patients and a regression slope
value lower than those of five of the six patients. In doing so, FT1 patterns similarly to
the typical speakers, producing figures in the upper ranges of these measures.
Chapter VI
Evidence against durational reinforcement of contrast
VI.1 Introduction and hypotheses
Patient F1’s acoustic and articulatory data reveal that, post-operatively, the
constriction degree distinction between /i/ and /ɪ/ is barely preserved (Chapter III). This
raises the question of whether the durational differences between such tense-lax pairs
could be enhanced in order to reinforce the contrast. Since acoustic vowel length in American
English is systematically longer for tense vowels than for lax vowels (Port, 1981), we
hypothesize that across all speakers, /i/ and /u/ will be produced with longer durations
than /ɪ/ and /ʊ/ and that for patient F1, who is unable to maintain differences in
constriction degree between /i/ and /ɪ/, durational differences between tense and lax
vowels will be exaggerated. Since patient M1’s oral tongue has been affected, it is
possible that some durational compensation occurs in tense and lax vowels despite
vowels /i/ and /ɪ/ being differentiated in their formant values. Indices of durational
difference between tense and lax vowels produced by the oral tongue cancer patients will
be compared to those produced by a large sample of typical speakers, as reported in
Hillenbrand et al. (1995). By determining where the patients’ tense-lax duration difference
index values fall with respect to those produced by Hillenbrand et al.’s (1995) typical
speakers, evidence for or against the durational reinforcement of contrast will be
provided. If the durational difference indices produced by a patient are larger than
those of all typical speakers, it is likely that the patient is utilizing the
durational cue as a compensatory mechanism. If the durational difference indices
produced by a patient fall within the range of typical speakers, it is probable that no
durational compensation is occurring.
VI.2 Method
Acoustic vowel durations as measured from the onset of voicing for the vowel to the
onset of the stop closure for coda /t/ were recorded for vowels /i/, /ɪ/, /u/ and /ʊ/ for each
monosyllabic token produced by oral tongue cancer patients F1 and M1. Values across
repetitions of the same token were averaged. Normalized indices of durational differences
between tense and lax vowels were calculated as the mean durational difference between
tense and lax vowels divided by the mean acoustic vowel length for tense and lax vowels
for a given speaker, in order to control for inter-speaker variability in overall vowel
duration. Durational difference indices produced by patients and typical speakers
(Hillenbrand et al., 1995) were calculated and compared.
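The normalized index can be written out as a short sketch. It assumes that the normalizing term is the mean of the tense and lax mean durations within a single vowel pair, which is one reading of the description above. The function names are hypothetical, the patient durations are the means reported for M1 in section VI.3, and the list of typical-speaker indices is invented for illustration and is not taken from Hillenbrand et al. (1995).

```python
def duration_difference_index(tense_ms, lax_ms):
    """Tense-lax duration difference divided by the mean duration of the pair."""
    mean_duration = (tense_ms + lax_ms) / 2
    return (tense_ms - lax_ms) / mean_duration

def count_larger(patient_index, typical_indices):
    """Number of typical speakers whose index exceeds the patient's index."""
    return sum(1 for value in typical_indices if value > patient_index)

# Mean durations (ms) reported for patient M1 in section VI.3.
m1_front = duration_difference_index(194.8, 154.2)  # /i/ vs. /ɪ/
m1_back = duration_difference_index(189.8, 124.8)   # /u/ vs. /ʊ/
print(round(m1_front, 3), round(m1_back, 3))

# Invented typical-speaker indices, for illustration only.
hypothetical_typical = [0.10, 0.25, 0.30, 0.18]
print(count_larger(m1_front, hypothetical_typical))
```

Under this reading, a patient whose index exceeds that of every typical speaker (a count of zero) would be the pattern taken as evidence of durational compensation.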
VI.3 Results
The acoustic durational data reveal that both patients exhibit the expected pattern of
longer acoustic vowel durations for tense vowels (/i, u/) than for lax vowels (/ɪ, ʊ/), as
shown in Figures VI.1-2.
Figure VI.1: Patient M1: Duration of tense and lax vowels
[Plot for Figure VI.1: "M1 Tense v. Lax Vowel Duration"; x-axis: Vowel (/i/, /ɪ/, /u/, /ʊ/); y-axis: Duration (ms), 0 to 250]
Figure VI.2: Patient F1: Duration of tense and lax vowels
Patient M1’s high front vowels /i/ (194.8 ms) and /ɪ/ (154.2 ms) differed by 40.6 ms,
while his high back vowels /u/ (189.8 ms) and /ʊ/ (124.8 ms) differed by 65 ms.
Roughly half of the male speakers included in Hillenbrand et al.’s (1995) study produced
the high front vowels with indices of durational difference greater than that of Patient M1
(Figure VI.3). One of the forty-five male speakers produced a difference greater than
that of Patient M1 for the high back vowels (Figure VI.4).
Figure VI.3: Indices of durational difference between tense and lax high front vowels in
Hillenbrand’s typical males (blue) and patient M1 (red)
[Plot for Figure VI.2: "F1 Tense v. Lax Vowel Duration"; x-axis: Vowel (/i/, /ɪ/, /u/, /ʊ/); y-axis: Duration (ms), 0 to 400]
Figure VI.4: Indices of durational difference between tense and lax high back vowels in
Hillenbrand’s typical males (blue) and patient M1 (red)
Patient F1’s high front vowels /i/ (299 ms) and /ɪ/ (225 ms) differed by 74 ms while
her high back vowels /u/ (273 ms) and /ʊ/ (211.5 ms) differed by 62.5 ms. Nineteen of
the forty-eight typical female speakers produced the high front vowels with indices of
durational difference greater than Patient F1 (Figure VI.5). Twenty-four of the forty-eight
typical female speakers produced the high back vowels with indices of durational
difference greater than Patient F1 (Figure VI.6).
Figure VI.5: Indices of durational difference between tense and lax high front vowels in
Hillenbrand’s typical females (blue) and patient F1 (red)
Figure VI.6: Indices of durational difference between tense and lax high back vowels in
Hillenbrand’s typical females (blue) and patient F1 (red)
VI.4 Discussion
The oral tongue cancer patients follow the expected pattern of producing tense vowels
with longer durations than lax vowels. The evidence suggests that patients M1 and F1 do
not enhance the cue of durational difference between tense and lax vowels; the indices of
durational difference between the tense-lax pairs fall in the typical ranges as observed by
Hillenbrand et al. (1995). More generally, these findings serve as preliminary evidence
that hyper-production of one dimension of contrast cannot be used to replace a dimension
that is no longer available.
Once rt-MRI data have been collected from patients both pre-operatively and
post-operatively, this comparison can be made within a single speaker, rather than across
multiple speakers. If oral tongue patients’ tense and lax vowel durations differ more
post-operatively than pre-operatively, a plausible explanation will be that the
post-glossectomy speakers are using the gestural specification of duration as a compensatory
mechanism; that is, they enhance the tense vs. lax durational cue that already
exists in typical speech when the manipulation of other gestural specifications becomes
unavailable to them.
Chapter VII
Simulating post-glossectomy speech using the task
dynamics (TaDA) model
VII.1 Introduction
As illustrated in Chapter VI, we have observed that glossectomy patients compensate in
the production of consonants by forming constrictions using articulators not traditionally
used. These findings suggest that the forward maps in the patients’ speech systems,
responsible for linking articulatory motor commands to anticipated acoustic output, have
been updated for patients’ consonant production, and their articulations accordingly
altered. These data raise the question of whether patients’ forward maps (Guenther et al.,
1998) are also revised for vowel production, leading to articulatory modifications
post-operatively, for which preliminary evidence has been presented in Chapter V. [1]
Footnote 1: It is also possible that forward maps are updated for vowel production, but that
articulation is not modified to produce original acoustics.
VII.1.1 Proposal
A gestural simulation study in which vowel production is modeled under two conditions
will be carried out. By comparing the resulting articulatory and acoustic data to those of
actual patients, we may provide evidence for or against the notion that the
forward maps involved in vowel production are updated, and the articulations adjusted
accordingly, in a way that implies the adjustment of speech motor commands to
reflect the amount of lingual tissue resected. The first (‘non-blind’) simulation aims to
model the acoustic and articulatory effects of the glossectomy procedure in a speech
motor system that takes into account the loss of lingual tissue. In contrast, the second
(‘blind’) simulation aims to simulate the effects of the glossectomy in a system that does
not take into account the loss of lingual tissue, but instead continues to assume the
pre-operative tongue size. The simulation that produces acoustic and articulatory data most
similar to those produced by the oral tongue cancer patients is taken to be the more
realistic model of the internal forward model employed by the patients. Specifically, a
realistic simulation will produce a reduced acoustic vowel space and vocal tract area
functions in which target constriction location is largely maintained, with areas in
regions distal to the constriction location being relatively high.
VII.1.2 Speech motor control, the task dynamics model and the configurable articulatory
synthesizer
As proposed by Guenther (2006), speech production involves the integration of auditory,
somatosensory and motor information. The speech motor control system comprises
a feed-forward system and somatosensory and auditory feedback systems. At the onset of
speech development, auditory targets are learned. Production of these targets initially
relies heavily on the auditory feedback control subsystem. As the infant repeatedly
attempts to achieve auditory targets, feed-forward commands, mapping articulations to
resulting acoustics, are refined using the auditory feedback received on each attempt.
Over time, somatosensory targets are established and refined, as well. This general tuning
process facilitates learning of the inverse model that computes the speech motor
commands necessary to achieve a target articulation, given the current state of the
articulator.
The Task Dynamic Application (TaDA) (Nam et al., 2004), built on the theories
of Articulatory Phonology (Browman and Goldstein, 1992; 1995; Goldstein and Fowler,
2003) and Task Dynamics (Saltzman and Munhall, 1989), takes as input a gestural score
to generate movements of constriction variables (articulators) and the articulatory degrees
of freedom. Using these degrees of freedom, the Configurable Articulatory Synthesizer
(CASY) (Iskarous et al., 2003) then computes the vocal tract area function values over
time, along with the corresponding resonance frequencies and bandwidths. This set of
variables is input to the HLSyn synthesizer to produce the acoustic output.
VII.2 Method
In both simulations, the radius of the tongue ball in CASY was reduced by one-third. In the
blind simulation, the task dynamics was blind to this modification (the forward map did not
take into account the change in lingual mass), whereas in the non-blind simulation, the task
dynamics and forward map did take this modification into account. All vowels of
American English were simulated in the /bVt/ context used in the glossectomy study.
Formant frequency values and vocal tract area functions at the acoustic vowel midpoints
were recorded and compared. Additionally, the most extreme (maximum or minimum) jaw
position during the vowel nucleus was measured and compared across conditions to
determine whether the articulation in the non-blind simulation showed any evidence of
within-gesture compensation by using the jaw. Typical vowel production was simulated
by leaving the tongue size unchanged, and the corresponding values were used as
controls.
VII.3 Results
VII.3.1 Acoustic consequences of tongue size modification under two task
dynamic conditions
The non-blind task dynamic condition (Figure VII.1) produces a vowel space larger in F2
range than the one produced in the simulation of typical speech.
Figure VII.1: Vowel space in non-blind task dynamic condition (red) overlaid on
simulated typical vowel space (gold)
The blind task dynamic condition (Figure VII.2) produces a vowel space
substantially smaller than that produced in the simulation of typical speech, along both F1
and F2 dimensions. The vowel space produced by this model resembles the vowel space
of patient F1, who underwent both oral and base of tongue resection.
Figure VII.2: Vowel space in blind task dynamic condition (red) overlaid on simulated
typical vowel space (gold)
VII.3.2 Consequences of tongue size modification on vocal tract area function
under two task dynamic conditions
The figures below illustrate the vocal tract area functions simulated at the vowel midpoint
under original (typical) speech (red), post-operative ‘blind’ (black), and post-operative
‘non-blind’ (blue) conditions. By observing the vocal tract area function minima of
particular vowels produced by the unaltered, original task dynamic model, we can
identify the vocal tract constriction regions in terms of distance (in centimeters) from the
larynx, as outlined in Table VII.1.
Pharyngeal     | Uvular-Pharyngeal | Uvular [2] | Palatal
/æ/  n/o [3]   | /oʊ/  9           | /u/  11    | /i/   14
/ɔ/  8         |                   | /ʊ/  11    | /ɪ/   14
/ɑ/  8         |                   |            | /eɪ/  15
               |                   |            | /ɛ/   16
8              | 9                 | 11         | 14-16
Table VII.1: Constriction locations of vowels as produced by TaDA (distance from the larynx, in cm)
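Identifying constriction locations from area-function minima, as described above, can be sketched as follows. The function name is hypothetical and the area values are invented solely so that a single minimum falls at 14 cm (the palatal location listed for /i/); they are not output from TaDA or CASY.

```python
def constriction_locations(distances_cm, areas_mm2):
    """Return (distance, area) pairs at interior local minima of an area function."""
    minima = []
    for i in range(1, len(areas_mm2) - 1):
        if areas_mm2[i] < areas_mm2[i - 1] and areas_mm2[i] < areas_mm2[i + 1]:
            minima.append((distances_cm[i], areas_mm2[i]))
    return minima

# Invented area function sampled every 2 cm from the larynx (0 cm) to 22 cm.
distances = list(range(0, 24, 2))
areas = [300, 450, 600, 700, 650, 500, 250, 60, 200, 400, 500, 450]

print(constriction_locations(distances, areas))  # [(14, 60)]
```

The endpoints of the area function are deliberately excluded, so a small area at the glottal or labial end is not reported as a lingual constriction.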
As illustrated in Figure VII.3, during /i/, the non-blind simulation produces an
area function with a constriction degree similar to that produced in the unaltered
simulation. In the non-blind simulation, area values in regions distal to the constriction
location tend to be higher than areas produced by the blind and unaltered models.
Figure VII.3: TaDA vocal tract area functions under typical, non-blind and blind
simulations during /i/
Footnote 2: As specified in TaDA.
Footnote 3: Not observable.
[Plot for Figure VII.3: "Area functions: BEAT"; x-axis: Distance from larynx (cm), 0 to 22; y-axis: Area (mm²), 0 to 1500; legend: original, nonblind, blind]
During /u/ (Figure VII.4), the areas of the region posterior to the constriction are
similar in the blind and non-blind conditions, yet the non-blind constriction degree is far
more extreme than the blind constriction degree. The anterior vocal tract area (region
distal to the constriction location) is largest in the non-blind condition.
Figure VII.4: TaDA vocal tract area functions under typical, non-blind and blind
simulations during /u/
During /ɑ/ (Figure VII.5), no minimum is observed for the blind simulation, though one is
observed for the non-blind and unaltered simulations. As previously observed, in the
non-blind simulation, area amplitude is highest: area values at the constriction location
are low, while area values in the anterior vocal tract are highest. In the blind simulation,
area values are moderately high in all vocal tract regions.
[Plot for Figure VII.4: "Area functions: BOOT"; x-axis: Distance from larynx (cm), 0 to 22; y-axis: Area (mm²), 0 to 1500; legend: original, nonblind, blind]
Figure VII.5: TaDA vocal tract area functions under typical, non-blind and blind
simulations during /ɑ/
VII.3.3 Evidence for within-gesture compensation by the task dynamic model
As illustrated in Table VII.2, jaw position is lower during vowel /ɑ/ in the non-
blind simulation than in the blind simulation. Jaw position is higher during vowel /i/ in
the non-blind simulation than in the blind simulation.
      | Blind    | Non-Blind
/ɑ/   | 13.12 mm | 13.19 mm
/i/   | 9.94 mm  | 9.53 mm
Table VII.2: Jaw position during vowels /ɑ/ and /i/ [4]
Footnote 4: Higher jaw position values indicate lower jaw position.
[Plot for Figure VII.5: "Area functions: POT"; x-axis: Distance from larynx (cm), 0 to 22; y-axis: Area (mm²), 0 to 1500; legend: original, nonblind, blind]
VII.4 Discussion
VII.4.1 Acoustic consequences of tongue size modification under two task
dynamic conditions
Upon observing the acoustic vowel space produced by the non-blind model, we find that
the F2 range is larger than for the unaltered model and that the F1 range is not
particularly reduced. While it may be somewhat surprising to observe a larger F2 range
(suggesting a larger range of horizontal tongue movement across vowels) in a model for
which the tongue size has been reduced, the area functions illustrated offer some
explanation. For all vowels, the area differential in the non-blind model (between area at
the constriction location and area in the distal region) is larger than that in the
unaltered model.
The vowel space produced by the blind model shows decreased F1 and F2 ranges, and
closely resembles the vowel space produced by patient F1, who underwent both base of
tongue and oral tongue resection without reconstruction. The F1 and F2 ranges for front
vowels also resemble those produced by patient M1, who underwent resection of the oral
tongue with reconstruction (though the reconstructive flap had shrunk by the
time of the rtMRI data collection). A simulation of patient M1’s vowels can be created by
assuming unaltered (typical) back vowels (since the base of tongue and posterior oral
tongue were not affected by M1’s glossectomy procedure) with front vowels produced
with the altered tongue size under the blind task dynamic condition, as illustrated in
Figure VII.6.
It is important to consider that while the vowel space produced by the model blind
to tongue size modification generally resembles those produced by patients F1 and M1,
some differences are expected to be observed due to possible differences in the amount of
lingual tissue resected and factors that are not accounted for in the task dynamic model
such as possible limitations on lingual mobility due to pain and muscle impairment.
Additionally, the reader should be reminded that in Figure VII.6, patient M1’s vowel
space is being simulated and compared to a simulation of his own pre-operative vowel
space while M1’s actual vowel space is being compared to the average male vowel space
in Figure VII.7.
Figure VII.6: Synthesized M1 vowel space (unaltered back vowels and blind model front
vowels) (red); Unaltered vowel space (gold)
Figure VII.7: Acoustic vowel space of healthy male speakers (gold); (Reduced) acoustic
vowel space of patient M1 (red).
The decreased F1 and F2 ranges exhibited by the blind model closely resemble
the back vowel formant values produced by the patients who underwent base of tongue
resection. A simulation of a base of tongue cancer patient’s vowels can be created by
assuming unaltered (typical) front vowels (since the oral tongue was not affected by the
glossectomy procedure) with back vowels produced with the altered tongue size under
the blind task dynamic condition, as illustrated in Figure VII.8. As in the case of
synthesizing oral tongue cancer patient’s speech, some differences are expected to be
observed between the patterns exhibited by the task dynamic model (Figure VII.8) and
those exhibited by base of tongue cancer patients (Figure VII.9) due to possible
differences in the amount of lingual tissue resected and factors that are not accounted for
in the model such as possible limitations on lingual mobility due to pain or muscle
impairment.
Figure VII.8: Synthesized base of tongue cancer patient’s vowel space (unaltered front
vowels and blind model back vowels) (red); Unaltered vowel space (gold)
Figure VII.9: Acoustic vowel space of healthy male speakers (gold); (Reduced) acoustic
vowel space of base of tongue patient M4 (red).
In sum, based on the acoustic data produced by the three models and a
comparison of these patterns to those observed in the actual patient data, it seems that the
task dynamic model that does not take into account the change in tongue size most
closely resembles the acoustic patterns produced by the glossectomy patients.
VII.4.2 Consequences of tongue size modification on vocal tract area function
under two task dynamic conditions
By observing the vocal tract area functions under the unaltered, blind, and non-blind
conditions, we find that the constriction location is oftentimes coincident across the three
conditions. Constriction degree at constriction locations in the non-blind area functions is
generally very well preserved, with area values only slightly higher than in the unaltered
condition. Constriction degree values at constriction locations in the blind simulation,
however, are largest. In regions distal to the constriction location, the blind simulation
produces moderate area values, whereas the non-blind simulation produces very high area
values, which are likely the cause of the wide F2 range observed in the non-blind vowel
space.
Comparing these patterns to those produced by patient F1 (whose glossectomy is
most closely simulated by the task dynamic tongue size modification) and M1, we find
that as in the blind and non-blind simulations, patient F1’s constriction locations are
largely preserved, whereas constriction degree values are not always preserved. Like the
blind model’s area values in regions distal to the constriction location, however, patient
F1’s aperture values are relatively high.
VII.4.3 Evidence for within-gesture compensation by the task dynamic model
In order to determine whether the non-blind task dynamic model shows evidence of
compensatory jaw movement in light of the fact that tongue size has decreased, the most
extreme jaw position was compared across conditions in high vowel /i/ and low vowel
/ɑ/. Not surprisingly, jaw position during /i/ is higher in the non-blind simulation than in
the blind simulation and jaw position during /ɑ/ is lower in the non-blind simulation than
in the blind simulation, suggesting compensatory movement of the jaw when lingual
mass is reduced. The compensatory jaw movement exhibited by the non-blind model is
very consistent with the jaw behavior of post-glossectomy patients observed in Chapter
V. The important implications of this observation and its relation to the seemingly
contradictory implications of the vowel space comparison discussed in section VII.3.1 are
considered in Chapter IX.
Chapter VIII
Quantifying lingual flexibility using principal component
analysis
VIII.1 Introduction
VIII.1.1 Lingual flexibility and Principal Component Analysis (PCA)
During speech production, the tongue can be shaped in both gross ways (e.g., tongue body
lowering and backing for the production of /ɑ/) and in relatively complex ways (e.g.,
simultaneously producing tongue tip and tongue body gestures for /s/). It is lingual
flexibility that allows the tongue to form complex shapes, with regions of the tongue
being differentially controlled, as is required for the production of /l/ or /s/. The loss of
lingual tissue and possible discomfort associated with the glossectomy procedure have
been claimed to result in limited lingual mobility and flexibility (Imai and Michi, 1992;
Michi et al., 1989); when the muscular integrity of the tongue is no longer intact, tongue
shaping is compromised. Thus, it would be expected that post-operatively, glossectomy
patients are able to carry out relatively gross movements of the tongue but struggle to
produce more complex tongue shapes. One method of determining how freely the tongue
moves within the vocal tract involves carrying out a Principal Component Analysis
(PCA) on vocal tract cross-distance measures. PCA attempts to reduce the number of
dimensions needed to represent the data. By carrying out a PCA on gridline
cross-distance measures, relatively few components explaining patterns of lingual displacement
are identified. The linear sum of these components, scaled by their respective
eigenvalues, characterizes changes in the overall tongue shape during speech; unit change
along one of these components returns the amount of change observed in the
tongue-shaping plots below. In the current analysis, the tongue-shaping patterns associated with
each component are identified and these patterns are compared across speakers.
VIII.1.2 Hypotheses
It is predicted that for the typical speakers and post-glossectomy patients alike, the
primary tongue displacement pattern, as indicated by the largest principal component,
will primarily reflect the gross forward-backward movement of the tongue body to form
palatal and pharyngeal constrictions. This is consistent with observations of this
forward-backward movement pattern being prevalent in typical speech and with aperture
measures in the anterior and posterior regions of the vocal tract being highly correlated
(Harshman et al., 1977; Maeda, 1990). The next largest components may reflect subtle
changes in tongue shape, such as those involved in tongue tip gestures, and are expected
to contribute far less to the overall tongue shape in post-glossectomy patients than in
typical speakers, indicating limited lingual flexibility.
VIII.2 Methods
VIII.2.1 Real-time MRI data acquisition
Typical speakers MT1 and FT1 and patients F1, M1-M5 produced select sentences from
the TIMIT corpus and the Rainbow Passage (Appendix B) while lying in the scanner
(details of acquisition in Chapter II).
VIII.2.2 Articulator segmentation of real-time MRI data
All original rtMRI images were first smoothed. Following this, intensity-corrected frames
were created by multiplying the pixel intensities of the original frames by the inverse of the
pixel intensities of the smoothed frames, while setting the non-tissue pixel intensity to 0.
The final images were formed using sigmoid-kernel-based intensity warping on the
intensity-corrected frames. All frames except those associated with acoustic silence were
fit with gridlines along the vocal tract from the lips to the larynx. Air-tissue boundaries
were identified by locating and connecting intensity thresholds along the vocal tract
gridlines (details in Kim et al., 2014).
VIII.2.3 Vocal tract cross-distance measurement
For all frames on which articulator segmentation was applied, vocal tract cross-distance
measures (in pixels) were recorded for each gridline.
VIII.2.4 Principal Component Analysis
Since this analysis intends to address lingual mobility and flexibility, gridlines in the
labial region were excluded from the PCA. To allow for PCA result comparison across
subjects, interpolation of gridline values was carried out on each subject’s data such that
100 gridlines were associated with the vocal tracts of all speakers, from the alveolar ridge
to the larynx. MATLAB’s Principal Component Analysis function
[coeff,score,latent,tsquared,explained,mu] = pca(___)
was used to analyze the gridline cross-distance matrices for all speakers, individually.
The data matrices analyzed were of size n x 100, where n equals the number of speech
frames produced by a given speaker. Columns correspond to the 100 vocal tract gridlines.
Gridline cross-distance measures can be reconstructed by multiplying the principal
component score matrix score by the transposed principal component coefficient matrix
coeff and adding the estimated means (mu) of the 100 variables (gridlines) in the
original data matrix:
orig_data_matrix = score*coeff' + repmat(mu, n, 1)
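The reconstruction identity above can be checked with a short Python and NumPy sketch. This is not the MATLAB code used in the study: the `pca` helper below is a hypothetical stand-in that computes the same quantities from a singular value decomposition of the mean-centered data, and the random matrix merely stands in for an n x 100 matrix of cross-distance measures. The last line follows the "Loading Coefficients * Eigenvalues" scaling named on the loading plots.

```python
import numpy as np

def pca(data):
    """PCA of an (n_frames x n_gridlines) matrix via SVD of the centered data."""
    mu = data.mean(axis=0)                    # per-gridline means
    centered = data - mu
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    coeff = vt.T                              # columns: principal component coefficients
    score = u * s                             # each frame projected onto the components
    latent = s ** 2 / (data.shape[0] - 1)     # variance (eigenvalue) of each component
    explained = 100 * latent / latent.sum()   # percent of total variance per component
    return coeff, score, latent, explained, mu

# Stand-in data: 50 frames x 100 gridlines of random values (not real measurements).
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 100))

coeff, score, latent, explained, mu = pca(data)

# Full reconstruction, as in the expression above (mu broadcasts over the frames).
reconstructed = score @ coeff.T + mu
print(np.allclose(reconstructed, data))

# Approximation from the first three components only.
approx3 = score[:, :3] @ coeff[:, :3].T + mu

# First component's loadings scaled by its eigenvalue.
scaled_pc1 = coeff[:, 0] * latent[0]
```

With fewer frames than gridlines, as in this stand-in, the SVD returns one trailing component with essentially zero variance, whereas MATLAB's `pca` would by default return only n - 1 components; the reconstruction is unaffected.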
VIII.3 Results
VIII.3.1 Results: Typical speakers MT1 and FT1
The plots displayed in Figures VIII.1-VIII.8 represent unit positive and negative
weightings of each factor on the 100 vocal tract gridlines. For both the typical male
(MT1) and female (FT1) speakers, the loading coefficients of the first principal
component (PC1), scaled by their respective eigenvalues, are highest in the palatal and
pharyngeal regions of the vocal tract (Figures VIII.1-2; left). The scaled loadings of the
second principal component (PC2) have their highest
amplitude in the velar region (Figures VIII.1-2; center). The amplitude of the third
principal component’s (PC3) loading coefficients is highest in the region of the alveolar
ridge (Figures VIII.1-2; right). The maximum amplitudes of the loading coefficient plots
across the first three principal components are comparable.
Figure VIII.1: MT1: Scaled loadings of principal components 1-3
Figure VIII.2: FT1: Scaled loadings of principal components 1-3
VIII.3.2 Results: Oral tongue and oral and base of tongue cancer patients M1 and F1
For oral and base of tongue cancer patient F1, the amplitude of scaled loading
coefficients for the first principal component is largest in the palatal and pharyngeal
regions (Figure VIII.3; left). The amplitude of the scaled loading coefficients on the
second principal component is largest in the upper pharyngeal region. The maximum
amplitude of the scaled loadings on the second component is notably less than that
observed for the first component (Figure VIII.3; center). The scaled loading coefficients
for the third principal component are distributed rather evenly along the vocal tract, and
maximum amplitude is lowest among the three component loading plots (Figure VIII.3;
right).
Figure VIII.3: F1: Scaled loadings of principal components 1-3
[Plots for Figures VIII.1-VIII.3: for each of speakers MT1, FT1 and F1, three panels showing the loadings of PC 1, PC 2 and PC 3 on the vocal tract gridlines; x-axis: Gridlines (Alveolar Ridge (0) to Larynx (100)); y-axis: Loading Coefficients * Eigenvalues (-25 to 25 for MT1 and FT1, -50 to 50 for F1)]
For oral tongue cancer patient M1, the amplitude of scaled loading coefficients
for the first principal component is largest in the palatal and pharyngeal regions, though
the amplitude in the palatal region is substantially less than in the pharyngeal region,
consistent with the loss of tissue in the palatal region (Figure VIII.4; left). The overall
amplitude of the scaled loading coefficients on the second principal component is
substantially lower than for the first component, though it has maxima in the palatal and
pharyngeal regions (Figure VIII.4; center). The overall amplitude of the scaled loading
coefficients on the third component is lowest. These scaled coefficients are highest in the
laryngeal region (Figure VIII.4; right).
Figure VIII.4: M1: Scaled loadings of principal components 1-3
Generally, the scaled loading patterns of the oral tongue cancer patients differ from those
of typical speakers in that for patients, the scaled loadings of the second and third
principal components are substantially smaller than those of the first component, and do
not reflect lingual movement primarily in the velar and alveolar regions.
VIII.3.3 Results: Base of tongue cancer patients M2-M5
The base of tongue cancer patients’ scaled loading patterns are similar to those of typical
speakers and oral tongue cancer patients in that the first component captures forward-
backward lingual movement. Consistent with patterns of patients M1 and F1, the second
and third components’ loading scores of patients M2-M5 are lower in amplitude than
those for the first component and those produced by typical speakers.
For base of tongue cancer patient M2, the amplitude of scaled loading coefficients
for the first principal component is largest in the palatal and pharyngeal regions (Figure
VIII.5; left). The overall amplitude of the scaled loading coefficients for the second
component is substantially lower than that of the first, and reaches a maximum in the
velar region (Figure VIII.5; center). The amplitude of the coefficients on the third
component is lowest, and the plot exhibits local maxima in multiple regions along the
vocal tract (Figure VIII.5; right).
Figure VIII.5: M2: Scaled loadings of principal components 1-3
For base of tongue cancer patient M3, the amplitude of scaled loading coefficients
for the first principal component is largest in the palatal and pharyngeal regions, though
slightly larger in the palatal region (Figure VIII.6; left). The overall amplitude of the
scaled loadings on the second component is less than that of the first, with maxima
exhibited in the alveolar and velar regions (Figure VIII.6; center). The amplitude of the
scaled loadings on the third component is generally low, with a maximum being reached in
the upper pharyngeal region (Figure VIII.6; right).
Figure VIII.6: M3: Scaled loadings of principal components 1-3
For base of tongue cancer patient M4, the amplitude of scaled loading coefficients
for the first principal component is largest in the palatal and pharyngeal regions, though
slightly larger in the palatal region (Figure VIII.7; left). The overall amplitude of the
scaled loadings on the second component is lower than that of the loadings on the first, and maxima
are observed in the alveolar and velar regions (Figure VIII.7; center). The overall
amplitude of the loadings on the third component is similarly low, and maxima are
observed in the alveolar and upper pharyngeal regions (Figure VIII.7; right).
[Plots for Figures VIII.5-6 (M2, M3): each panel shows the loadings of PC 1, PC 2, or PC 3 on vocal tract gridlines. Horizontal axis: gridlines, from the alveolar ridge (0) to the larynx (100). Vertical axis: loading coefficients × eigenvalues (range −40 to 40 for M2; −60 to 60 for M3).]
Figure VIII.7: M4: Scaled loadings of principal components 1-3
For base of tongue cancer patient M5, the amplitude of the scaled loading
coefficients for the first principal component is largest in the palatal and pharyngeal
regions, though slightly larger in the palatal region (Figure VIII.8; left). The overall
amplitude of the scaled loadings on the second component is lower than that of the loadings on the
first, and maxima are observed in the alveolar and velar regions (Figure VIII.8; center).
The scaled loadings on the third component are lowest among the three components, and
have maxima in the alveolar and upper pharyngeal regions (Figure VIII.8; right).
Figure VIII.8: M5: Scaled loadings of principal components 1-3
VIII.3.4 Results: Comparison of variance accounted for by principal components
in typical speakers and in patients
For typical speakers, the first principal component accounts for ~40% of the overall
change in tongue shape during speech, while the second and third components account
for ~27% and ~14%, respectively (Figure VIII.9).
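The percentages reported here can be read against the usual principal component analysis convention, sketched below under the assumption (not stated explicitly in this section) that the proportion of variance accounted for by component k is its eigenvalue divided by the sum of all eigenvalues. The symbols \lambda_k (eigenvalue of component k) and p (number of components, e.g. the number of gridlines) are introduced here for illustration only.

```latex
\[
  \text{proportion of variance}_k \;=\; \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j},
  \qquad
  \sum_{k=1}^{p} \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j} \;=\; 1
\]
```

Under this reading, the first three components of the typical speakers jointly account for roughly 40% + 27% + 14% = 81% of the overall change in tongue shape.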
[Plots for Figures VIII.7-8 (M4, M5): each panel shows the loadings of PC 1, PC 2, or PC 3 on vocal tract gridlines. Horizontal axis: gridlines, from the alveolar ridge (0) to the larynx (100). Vertical axis: loading coefficients × eigenvalues (range −40 to 40).]
Figure VIII.9: Variance accounted for by principal components in typical speakers
For the oral tongue and oral and base of tongue cancer patients, the first principal
component accounts for ~50% of the change in tongue shape, while the second and third
components account for ~15% and ~9%, respectively (Figure VIII.10). The first
component explains a larger amount of variance than in the case of the typical speakers,
while the second and third components explain less variance than those of the typical
speakers.
[Bar chart for Figure VIII.9: variance accounted for (vertical axis, 0% to 60%) by PC1, PC2, and PC3 for typical speakers FT1 and MT1.]
Figure VIII.10: Variance accounted for by principal components in oral and oral and base
of tongue cancer patients
For the base of tongue cancer patients, the first component explains ~47% of the
variance in overall tongue shape, slightly more than is accounted for by the first principal
component in typical speakers (Figure VIII.11). For patient M5, the variance explained by the
second component resembles typical values, while for the other base of tongue cancer patients, the
second component explains less variance than in the case of typical speakers. The third
component accounts for ~10% of the variance, also less than in the case of typical
speakers.
[Bar chart for Figure VIII.10: variance accounted for (vertical axis, 0% to 60%) by PC1, PC2, and PC3 for patients F1 and M1.]
Figure VIII.11: Variance accounted for by the principal components in base of tongue
cancer patients
VIII.4 Discussion
The typical speakers pattern remarkably similarly in the behavior of the three components
accounting for the largest amounts of change in tongue shape. That the first component
accounts for change in the forward-backward dimension of the vocal tract and that the
second component accounts for change in the velar region is consistent with previous
component analyses of lingual movement (Harshman et al., 1977; Maeda, 1990). The
third largest component accounts for tongue shape change in the alveolar region, within
which tongue tip constrictions are frequently made during running speech in English.
Importantly, the scaled loading coefficient plots exhibit similar maximum amplitudes,
suggesting that movements of the tongue body into smaller, more specific regions, as
indexed by the plots of the second and third component coefficients, contribute just as
much to the overall tongue shaping as does the gross, forward-backward movement
indexed by the first component. That the finely-controlled lingual movements contribute
to tongue shaping as heavily as does the gross, forward-backward movement of the
tongue body is consistent with the hypothesis that typical speakers exhibit greater lingual
flexibility than do glossectomy patients.
[Bar chart for Figure VIII.11: variance accounted for (vertical axis, 0% to 60%) by PC1, PC2, and PC3 for patients M2, M3, M4, and M5.]
Similarly to the typical speakers, oral and base of tongue cancer patient F1’s
largest component is associated with forward-backward movement of the tongue. That
none of the largest three components are associated with change in the alveolar region is
not surprising, given that patient F1 produces some target alveolar constrictions
compensatorily (Chapter IV). Interestingly, the second and third component score plots
exhibit much smaller maximum amplitudes as compared to the first component plot and
as compared to typical speakers. This suggests that the second and third components,
associated with subtle movements of the tongue, do not contribute as much to the overall
shaping of the tongue during speech as does the first component, associated with gross,
forward-backward movement. This is consistent with the hypothesis that post-
glossectomy patients will exhibit less lingual flexibility and may be able to maintain
forward-backward movement of the tongue body, but will struggle to produce more
subtle lingual movements.
Oral tongue cancer patient M1’s first component accounts for forward-backward
lingual movement, as was the case for the typical speakers and patient F1. Interestingly,
the amplitude of the scaled loading coefficients is markedly lower in the palatal region
than in the pharyngeal region, reflecting the presence of the reconstructed flap in the
coronal region and its inability to be finely controlled. The low-amplitude coefficient
functions associated with the second and third principal components are consistent with the
hypothesis that post-glossectomy patients will be able to produce forward-backward
movement of the tongue body, but will tend not to produce subtle lingual movements.
Base of tongue patients M2-M5 exhibit similar patterns in the first component’s scaled
coefficient functions; forward-backward movement of the tongue is captured, with
highest amplitudes falling in the palatal and pharyngeal regions. For patients M3, M4 and
M5, the maximum amplitude in the palatal region is larger than that in the pharyngeal
region, reflecting the specific locus of resection. Also for patients M3, M4 and M5, the
second component’s scaled coefficients reach maxima in the alveolar and velar regions,
indicating that the second component accounts for lingual movement in these regions.
Consistent with patterns exhibited by M1 and F1, the second and third components’
loading scores of patients M2-M5 are lower in amplitude than those for the first
component and those produced by typical speakers, suggesting that for the base of tongue
patients, as well, the components associated with subtle lingual movement contribute less
to the overall tongue shape than for typical speakers.
The bar graphs in Figures VIII.9-11 indicate that the typical speakers and post-
glossectomy patients pattern remarkably alike within-groups, and quite differently from
each other. For typical speakers, the gross, forward-backward movement pattern
associated with the first component contributes to the overall tongue shaping pattern
proportionally less than it does for the post-glossectomy patients. For typical speakers,
the subtler movement patterns associated with the second and third components
contribute proportionally more to the overall shaping of the tongue than for the
glossectomy patients.
In sum, the principal component analyses in this section reveal that post-
glossectomy patients exhibit less lingual flexibility than do typical speakers; patients are
able to produce gross, forward-backward movements of the tongue body, but do not
produce as complex lingual shaping as do typical speakers.
Chapter IX
General discussion and conclusions
IX.1 Summary of findings
The first experimental section of the dissertation compares vocal tract cross-distance
measures, acoustics, and perception scores in post-glossectomy patients and typical
speakers. The patterns of atypicality observed for the patients are consistent with loci of
their particular resections.
Oral tongue cancer patients were found to exhibit cross-gesture compensation in
consonant production, effectively preserving constriction degree in the vocal tract tube.
Patient M1 produces labiodental stops in place of coronal constrictions for /n/ and /l/ and
co-produces labiodental and dorso-velar constrictions in place of coronal oral stop
constrictions. M1 also produces velar fricatives in place of alveolar fricatives. Patient F1
produces labial stop constrictions in place of alveolar and velar constrictions for oral
stops, but does not always exhibit compensatory behavior for coronal gestures requiring
less extreme constriction degrees, such as lateral /l/, nasal stop /n/, and interdental
fricative /θ/.
In a similar vein, oral and base of tongue cancer patients exhibit higher correlation
between jaw position and constriction degree and more dependency of constriction
degree on jaw position than do typical speakers. These findings provide preliminary
evidence that patients may, indeed, recruit the jaw to compensate for limited lingual
mobility and flexibility. While cross-gesture and within-gesture compensation are
observed as described above, oral tongue cancer patients do not enhance durational cues
in tense and lax vowels to aid in perceptibility.
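The two jaw-related measures summarized above could be quantified along the following lines. This is a minimal sketch, assuming that "correlation" refers to a Pearson coefficient and that "dependency" can be approximated by the slope of a least-squares line predicting constriction degree from jaw position; the function name `jaw_dependency` and the synthetic values are hypothetical and do not reproduce the measurements reported in this dissertation.

```python
import numpy as np

def jaw_dependency(jaw_position, constriction_degree):
    """Return (Pearson r, regression slope) of constriction degree on jaw position."""
    jaw = np.asarray(jaw_position, dtype=float)
    degree = np.asarray(constriction_degree, dtype=float)
    r = np.corrcoef(jaw, degree)[0, 1]              # Pearson correlation coefficient
    slope, _intercept = np.polyfit(jaw, degree, 1)  # least-squares line: degree ~ slope * jaw + intercept
    return r, slope

# Synthetic illustration only: one series in which constriction degree depends
# weakly on the jaw, and one in which it depends strongly on the jaw.
rng = np.random.default_rng(0)
jaw = rng.normal(size=300)
weak_degree = 0.2 * jaw + rng.normal(size=300)
strong_degree = 0.9 * jaw + 0.3 * rng.normal(size=300)

r_weak, slope_weak = jaw_dependency(jaw, weak_degree)
r_strong, slope_strong = jaw_dependency(jaw, strong_degree)
```

On this reading, a speaker who recruits the jaw to achieve constrictions would pattern with the second synthetic series, showing both a higher correlation and a steeper slope.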
Results of the task dynamic modeling experiment show that the task dynamic model
that does not take into account the decreased tongue ball radius most closely reproduces
the reduced acoustic vowel spaces of the post-glossectomy patients. The vocal tract area
functions produced by the models that do and do not take into account the change in
tongue size each could be argued to resemble those produced by the patients in that
constriction degree is sometimes preserved, and areas in regions distal to the constriction
location are large. Considering the modeled acoustics and vocal tract area functions
together, the model that does not take into account the change in tongue size best
approximates the patient data. Interestingly, the jaw compensation exhibited by the model
that does take into account the change in tongue size strongly resembles the observed
patient behavior. Possible interpretations of these data are considered in Section IX.2.
IX.2 The status of post-operative speech motor commands
In the case of consonant production in post-glossectomy patients, it is clear that the
patients’ internal models have been updated post-operatively, such that articulators other
than those traditionally used are incorporated and consequently recruited in speech
production. In the case of vowel production, the results of the modeling simulations – that
the acoustic patient data pattern most closely with the acoustic output of the task dynamic
model that does not take into account the altered tongue size and update the task
dynamics accordingly, but that the jaw compensation observed in the patient data is
consistent with that produced by the model that does take into account the change in
tongue size – appear to be contradictory. It is possible, however, that the patients’ task
dynamics have not been updated to account for the reduced tongue size, but that
compensation of the jaw occurs in real-time. That is, in vowel production, patients
assume pre-operative tongue size specifications and corresponding motor commands, but
due to the altered feedback received during post-operative vowel production,
compensatory jaw movement is executed in a reflex-like manner, in real-time. Typical
speakers exhibit sensorimotor adaptation when receiving altered acoustic feedback
(Houde and Jordan, 1998). Houde and Jordan (1998) observed that when participants
received formant-altered feedback of their own vowel production in real time, they
learned to adjust the formants they produced to compensate for the alteration. The
distorted acoustics initially produced by post-glossectomy patients can be compared to
the altered feedback received by Houde and Jordan’s (1998) participants; in both cases,
the anticipated acoustic output does not match the actual acoustic output, and in both
cases, compensatory behavior emerges.
IX.3 Insight into the nature of speech motor goals
Over the past several decades and to this day, debate persists as to whether the goals
involved in speech production and perception are articulatory or acoustic in nature. The
theory of articulatory phonology (Browman and Goldstein, 1989) posits articulatory
gestures with constriction specifications as the basic units of speech production, which,
according to the motor theory of speech perception (Liberman et al., 1967; Liberman and
Mattingly, 1985) also serve as the basic units of speech perception. The Task Dynamic
(TaDA) model of speech production has been implemented to model speech within this
gestural framework (Saltzman and Munhall, 1989). On the other hand, acoustic (or
‘auditory’) goals of production and perception have been posited by Stevens and
Blumstein (1978) who introduced the Acoustic Invariance Theory, Perkell et al. (1993),
and Guenther (1994), who introduced the Directions into Velocities of Articulators
(DIVA) model that involves mapping between reference frames at various stages
(including the articulator, tactile, constriction, acoustic and audio-perceptual reference
frames), to model speech using acoustic goals defined as regions in acoustic space. While
theories of articulatory and acoustic speech targets are clearly distinct, they are alike in
that they are both parity-preserving; the minimal units produced by the speaker, whether they
be articulatory or acoustic in nature, are the very units perceived by the listener.
The results of analyses described in this dissertation suggest that for vowels,
constriction location is generally preserved post-operatively while constriction degree is
not always. For consonants, however, constriction degree (or manner of articulation) is
always preserved, whereas constriction location and the articulator used are not always
preserved. By considering these patterns within frameworks positing articulatory and
acoustic goals of production, we can evaluate to what degree the results seem to provide
support for one framework or the other.
If it is the case that speech production goals are articulatory in nature, the fact that
patients exhibit within-gesture compensation involving the jaw to achieve constriction
degree in vowel gestures that have been affected by their lingual resection can be
straightforwardly explained. That both glossectomy patients produce consonant
constrictions compensatorily with articulators other than those typically used may
initially seem problematic for a theory positing articulatory, rather than acoustic, goals of
production. However, rather than being conceptualized as an asymmetry in preservation
patterns between vowels and consonants, an alternative view may be considered. Since
vowels are characterized by their filter properties while consonants are characterized as
source generators (e.g. full stop, turbulence, etc.), perhaps it is that the source or
aerodynamics, relying on the constriction degree of the entire vocal tract tube, tends to be
highly preserved, even when constriction degree at the gestural level is not. Though
current versions of Articulatory Phonology and TaDA do not allow for constriction
location or constriction degree specification at levels higher than the gesture, the version
of Articulatory Phonology proposed in Browman and Goldstein’s (1989) paper allows for
specification of constriction location and constriction degree at multiple hierarchical
levels (including the gesture and tube levels). Thus, if the constriction degree goals at the
level of the gesture cannot be met, the aerodynamic, source goals (constriction degree at
the tube level) still may be met by the use of other articulators. In this way, preservation
of these aerodynamic, source goals may drive the cross-gestural compensation in
consonants observed in the patient data.
Perturbation studies have shown that the acoustics of a segment are sometimes
preserved by means of altering the articulatory configuration (compensation) when
external constraints are imposed. The DIVA model predicts that in bite-block
perturbation experiments, speakers will adapt quickly to the perturbation and that typical
acoustics will be preserved. Data from typical speakers are consistent with the model’s
prediction (Guenther, 1998). Another experiment carried out by Savariaux (1995)
perturbed speakers’ production of /u/ by imposing a tube that limited the degree to which
speakers could form a tight labial constriction. Seven of the eleven speakers compensated
for the perturbation by pulling the tongue body back to form a more posterior
constriction. In so doing, the acoustics of the production were improved; however,
complete compensation (i.e. production of completely typical acoustics) was not achieved,
though the DIVA model predicted full compensation by means of tongue body retraction.
Though the authors explain that the speakers who did not exhibit compensation may have
been physically incapable of doing so given their vocal tract anatomy, or that their
forward models may have been inaccurate in these regions, making full compensation
impossible, they admit that “further investigation…is needed before ruling out the
possibility that some speakers use invariant targets that are more ‘constriction-like’.” It
would be interesting, indeed, to simulate post-glossectomy speech using the DIVA model
to determine whether, as in other cases involving perturbation, motor equivalence and
pre-perturbation acoustics prevail. If this is the case, the actual patient data could
potentially serve as preliminary evidence against a theory positing invariant acoustic
goals for vowels, and against the validity of the DIVA model of vowel production, in
particular. If patients do, in fact, use acoustic speech targets and have intact awareness of
the lingual tissue and an intact forward mapping mechanism, linking articulation with
acoustics, motor equivalence (compensation) in vowels to achieve typical acoustics
would be expected. Two explanations for patients not preserving acoustics post-
operatively, even if they do utilize acoustic targets, would be (i) inability to move the
articulators to produce the target acoustic output and (ii) impaired tongue awareness or an
inaccurate (not ‘updated’) forward map, linking tongue movements with acoustics
produced.¹ In the latter case, no motor equivalence would be expected, as was observed
for the four speakers in the Savariaux (1995) study that did not demonstrate articulatory
compensation to preserve acoustics of /u/ in the lip tube perturbation experiment. Finally,
it must be considered that adaptation periods for perturbations caused by structural
changes to the vocal tract, such as a glossectomy, tend to be far longer than those for perturbations that
inhibit specific articulator movements (e.g. bite block or lip tube perturbations)
(McFarland and Baum, 1997) and that this may reflect yet unidentified differences in
ways that perturbations of different kinds affect the speech motor system. These very
differences may prevent the effects of perturbations caused by structural changes to the
vocal tract from being reflected accurately by current speech models.
A theory of acoustic goals of speech production predicts the patterns in post-
operative consonant production quite well. In post-operative speech we observe source,
aerodynamics, or tube constriction degree being preserved while compensation by
altering constriction location and articulators is evident. These findings are consistent
with a theory in which “the only invariant target for a speech sound is the auditory
perceptual target, and this target can be reached with an infinite number of different
articulator configurations or vocal tract constriction configurations depending on things
like…constraints on the articulators” (Guenther et al., 1998). Consonant compensation can
be explained in a theory of acoustic goals without having to posit goals at multiple levels,
as seems to be required in a theory of articulatory speech motor goals. As in the case of
vowels, it would be useful to determine whether the DIVA model would be capable of
modeling compensatory speech to preserve the acoustics of consonants.
¹ The same argument could be made for articulatory targets with impaired lingual
awareness and is modeled in the blind task dynamic simulation.
In sum, a theory of articulatory speech targets can easily account for the patterns
observed in post-glossectomy vowel articulation, but would require positing constriction
degree, or aerodynamic, goals at a level broader than the gestural level to account for
compensatory consonant articulation. A theory of acoustic speech targets, on the other
hand, can easily account for compensatory consonant articulation in post-glossectomy
speech, but cannot easily account for the fact that the acoustic properties of vowels are
not preserved post-operatively.
Designing experiments that will provide data in favor of one theory of speech
motor goals versus another is notoriously challenging due to the fact that regardless of the
specific conditions, at some level, articulation and acoustics are inseparable. Any aspect
of articulation preserved will inevitably improve the acoustics, just as any aspect of
acoustics preserved will be somehow evident in the underlying articulation. It is highly
unlikely that we will observe, after perturbation, that speakers perfectly maintain
acoustics without preserving any aspects of articulation, or perfectly maintain articulation
without preserving any of the acoustics.
References
Abbs, J., & Gracco, V. L. (1984). Control of complex motor gestures: Orofacial muscle
response to load perturbations of the lips during speech. Journal of Neurophysiology, 51,
705–723.
Alfonso, P. J. (1996). Long-term Spatiotemporal Stability of Lip-Jaw Synergies for Bilabial
Closure. Proceedings of the Australian International Conference on Speech Science and
Technology, (6), 567–576.
Boersma, P. & Weenink, D. (2014). Praat: doing phonetics by computer [Computer program].
Version 5.4, retrieved from http://www.praat.org/
Bresch, E., Kim, Y.-C., Nayak, K. S., & Narayanan, S. S. (2008). Seeing speech: Capturing
vocal tract shaping using real-time magnetic resonance imaging. IEEE Signal Processing
Magazine, 25(3), 123–132.
Browman, C. P., & Goldstein, L. (1989). Articulatory gestures as phonological units.
Phonology, 6, 201–251.
Browman, C. P., & Goldstein, L. (1992). Articulatory phonology: an overview. Phonetica,
49(3-4), 155–180.
Browman, C. P., & Goldstein, L. (1995). Dynamics and articulatory phonology. In R. Port &
T. van Gelder (Eds.), Mind as motion: dynamics, behavior, and cognition (pp. 175–194).
Boston: MIT Press.
Fletcher, S. G. (1988). Speech Production Following Partial Glossectomy. Journal of Speech
and Hearing Disorders, 53, 232–238.
Gay, T., Lindblom, B., & Lubker, J. (1981). Production of bite-block vowels: Acoustic
equivalence by selective compensation. Journal of the Acoustical Society of America, 69,
802–810.
Georgian, D., Logemann, J. A., & Fischer, H. B. (1982). Compensatory Articulation Patterns
of a Surgically Treated Oral Cancer Patient. Journal of Speech and Hearing Disorders,
47, 154–159.
Goldstein, L., & Fowler, C. (2003). Articulatory phonology: A phonology for public language
use. In N. O. Schiller & A. Meyer (Eds.), Phonetics and Phonology in Language
Comprehension and Production: Differences and Similarities (pp. 159–207). Berlin &
New York: Mouton de Gruyter.
Guenther, F. H. (1994). A neural model of speech acquisition and motor equivalent speech
production. Biological Cybernetics, 72, 43–53.
Guenther, F. H., Hampson, M., & Johnson, D. (1998). A theoretical investigation of reference
frames for the planning of speech movements. Psychological Review, 105, 611–633.
Hamlet, S. L., Mathog, R. H., Patterson, R. L., & Fleming, S. M. (1990). Tongue mobility in
speech after partial glossectomy. Head & Neck, 12(3), 210–217.
Hillenbrand, J., Getty, L. A., Clark, M. J., & Wheeler, K. (1995). Acoustic characteristics of
American English vowels. Journal of the Acoustical Society of America, 97(5), 3099–
3111.
Houde, J., & Jordan, M. I. (1998). Sensorimotor Adaptation in Speech Production. Science,
279(5354), 1213–1216.
Imai, S., & Michi, K. (1992). Articulatory Function After Resection of the Tongue and Floor of
the Mouth: Palatometric and Perceptual Evaluation. Journal of Speech Language and
Hearing Research, 35, 68–78.
Imai, S., Michi, K., Yamashita, Y., Yoshida, Y., Michiwaki, K., Ohno, N., & Suzuki, N.
(1988). Speech intelligibility after resection of the tongue and floor of the mouth - The
relation between the surgical excisions or operation methods and speech intelligibility.
Japanese Journal of Oral Maxillofacial Surgery, 34, 1567.
Iskarous, K., Goldstein, L., Whalen, D. H., Tiede, M. K., & Rubin, P. E. (2003). CASY: The
Haskins Configurable Articulatory speech synthesizer. Proceedings of the 15th
International Congress of Phonetic Sciences, D. Recasens, M.-J. Solé & J. Romero
(Eds.), 185–188.
Iskarous, K. (2010). Perception of articulatory dynamics from acoustic signatures. Journal of
the Acoustical Society of America, 127(6), 3717–3728.
Kaipa, R., Robb, M. P., O’Beirne, G. A., & Allison, R. S. (2012). Recovery of Speech
Following Total Glossectomy: An Acoustic and Perceptual Appraisal. International
Journal of Speech-Language Pathology, 14(1), 24–34.
Kelso, J. A. S., Tuller, B., Vatikiotis-Bateson, E., & Fowler, C. (1984). Functionally specific
articulatory cooperation following jaw perturbations during speech: Evidence for
coordinative structures. Journal of Experimental Psychology: Human Perception and
Performance, 10, 812–832.
Kim, J., Kumar, N., Lee, S., & Narayanan, S. S. (2014). Enhanced airway-tissue boundary
segmentation for real-time magnetic resonance imaging data. Proceedings of the
International Seminar on Speech Production.
Ladefoged, P., Harshman, R., Goldstein, L., & Rice, L. (1978). Generating vocal tract shapes
from formant frequencies. Journal of the Acoustical Society of America, 64, 1027–1035.
Lammert, A., Proctor, M., & Narayanan, S. S. (2010). Data-Driven Analysis of Realtime
Vocal Tract MRI using Correlated Image Regions. Proceedings of Interspeech,
Makuhari, Japan.
Liberman, A., & Mattingly, I. G. (1985). The motor theory of speech revisited. Cognition, 21,
1–36.
Liberman, A., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception
of the speech code. Psychological Review, 74, 431–461.
Maeda, S. (1990). Compensatory articulation during speech: evidence from the analysis and
synthesis of vocal-tract shapes using an articulatory model. In W. J. Hardcastle & A.
Marchal (Eds.), Speech Production and Speech Modeling (pp. 131–149). Dordrecht:
Kluwer Academic.
McFarland, D., & Baum, S. (1995). Incomplete compensation to articulatory perturbation.
Journal of the Acoustical Society of America, 97, 1865–1873.
McMicken, B., Von Berg, S., & Iskarous, K. (2012). Acoustic and Perceptual Description of
Vowels in a Speaker With Congenital Aglossia. Communication Disorders Quarterly,
34(1), 38–46.
Michi, K., Imai, S., Yamashita, Y., & Suzuki, S. (1989). Improvement of speech intelligibility
by a secondary operation to mobilize the tongue after glossectomy. Journal of Cranio
and Maxillofacial Surgery, 17, 162–166.
Morrish, L. (1984). Compensatory Vowel Articulation of the Glossectomee: Acoustic and
Videofluoroscopic Evidence. British Journal of Disorders of Communication, 19(2),
125–134.
Nam, H., Goldstein, L., Saltzman, E. L., & Byrd, D. (2004). TADA: An enhanced, portable
task dynamics model in MATLAB. Journal of the Acoustical Society of America, 115(5), 2430.
Narayanan, S. S., Nayak, K. S., Lee, S., Sethy, A., & Byrd, D. (2004). An approach to real-
time magnetic resonance imaging for speech production. Journal of the Acoustical
Society of America, 115(4), 1771–1776.
Perkell, J., Matthies, M. L., Svirsky, M. A., & Jordan, M. I. (1993). Trading relations between
tongue-body raising and lip rounding in production of the vowel [u]: A pilot “motor
equivalence” study. Journal of the Acoustical Society of America, 93, 2948–2961.
Perrier, P. (2005). Control and representations in speech production. ZAS Papers in
Linguistics, 40, 109–132.
Port, R. (1981). Linguistic timing factors in combination. Journal of the Acoustical Society of
America, 69, 262–274.
Proctor, M., Lammert, A., Katsamanis, A., Goldstein, L., Hagedorn, C., & Narayanan, S. S.
(2011). Direct Estimation of Articulatory Kinematics from Real-time Magnetic
Resonance Image Sequences. Interspeech, Florence, Italy.
Proctor, M., Bone, D., Katsamanis, A., & Narayanan, S. S. (2010). Rapid Semi-automatic
Segmentation of Real-time Magnetic Resonance Images for Parametric Vocal Tract
Analysis. Interspeech, Makuhari, Japan.
Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to gestural patterning in
speech production. Ecological Psychology, 1(3), 333–382.
Saltzman, E. L., Nam, H., Krivokapic, J., & Goldstein, L. (2008). A task-dynamic toolkit for
modeling the effects of prosodic structure on articulation. Proceedings of the Speech
Prosody 2008 Conference, 175–184.
Savariaux, C. (1995). Étude de l’espace de contrôle distal en production de la parole: Les
enseignements d’une perturbation à l’aide d’un tube labial. Doctoral Dissertation,
l’Institut National Polytechnique de Grenoble, Grenoble, France.
Savariaux, C., Perrier, P., Orliaguet, J. P., & Schwartz, J. L. (1999). Compensation strategies
for the perturbation of the rounded vowel [u] using a lip-tube II. Journal of the Acoustical
Society of America, 106, 381–393.
Savariaux, C., Perrier, P., Pape, D., & Jacques, L. (2001). Speech production after
glossectomy and reconstructive lingual surgery: a longitudinal study. The 2nd
International Workshop on Models and Analysis of Vocal Emissions for Biomedical
Applications, Italy.
Shaiman, S. (1989). Kinematic and Electromyographic responses to perturbation of the jaw.
Journal of the Acoustical Society of America, 86, 78–88.
Simpson, A., & Meinhold, G. (2007). Compensatory Articulations in a Case of Congenital
Aglossia. Clinical Linguistics and Phonetics, 21(7), 543–556.
Stevens, K., & Blumstein, S. E. (1978). Invariant cues for place of articulation in stop
consonants. Journal of the Acoustical Society of America, 64, 1358–1368.
Stevens, K. (1998). Acoustic Phonetics. Massachusetts Institute of Technology Press.
Stone, M., Rizk, S., Woo, J., Murano, E. Z., Chen, H., & Prince, J. L. (2013). Frequency of
Apical and Laminal /s/ in Normal and Post-glossectomy Patients. Journal of Medical
Speech Language Pathology, 20(4), 106–111.
Tiede, M. K., Nam, H., Katsika, A., & Goldstein, L. (2011). Kinematic parsing of the U.
Wisconsin X-ray microbeam corpus applied to a prosodic boundary location task. New
Tools and Methods for Very-Large-Scale Phonetics Research Workshop, University of
Pennsylvania, Philadelphia, PA.
Westbury, J. R., Turner, G., & Dembowski, J. (1994, June). X-ray Microbeam Speech
Production Database User’s Handbook. Waisman Center, University of Wisconsin, Tech.
Rep.
Wood, S. (1979). A radiographic analysis of constriction locations for vowels. Journal of
Phonetics, 7, 25–43.
Wrench, A. A., & Hardcastle, W. J. (2000). A multichannel articulatory database and its
application for automatic speech recognition. 5th Seminar on Speech Production: Models
and Data, Bavaria, 305–308.
Yuan, J., & Liberman, M. (2008). Speaker Identification on the SCOTUS corpus. Proc.
Acoustics (Paris), 5687–5690.
Zhou, X., Stone, M., & Espy-Wilson, C. (2011). A Comparative Acoustic Study on Speech of
Glossectomy Patients and Normal Subjects. Proceedings of Interspeech, Florence, Italy.
Appendix A: Constriction locations (CL) and constriction degrees (CD) produced by the
non-blind, blind, and unaltered task dynamic models
Non-Blind Simulation
Word      CL (cm)   CD (mm²)
bait      14        149.85
bat       n/o       n/o
beat      14        79.82
bert      8         366.08
bet       14        219
bit       14        149.85
boat      10        54.8
boot      11        67.5
bought    8         171.7
but       9         107.8
pot       8         171.7
put       11        131.5
Blind Simulation
Word      CL (cm)   CD (mm²)
bait      15        296.05
bat       n/o       n/o
beat      14        217.83
bert      n/o       n/o
bet       16        347.2
bit       14        296.1
boat      9         243.85
boot      11        245.61
bought    n/o       n/o
but       9         280.4
pot       n/o       n/o
put       11        307.6
Unaltered Simulation
Word      CL (cm)   CD (mm²)
bait      14.8      113.3
bat       n/o       n/o
beat      14.19     57.01
bert      7.3       225.8
bet       15.6      179
bit       14.14     113.3
boat      8.95      26.3
boot      11.03     39.7
bought    7.9       113.3
but       8.9       55.2
pot       7.92      120.6
put       11        87.7
Appendix B: Stimuli for which PCA data were analyzed
1. “She had your dark suit in greasy wash water all year.”
2. “Don’t ask me to carry an oily rag like that.”
3. “When the sunlight strikes raindrops in the air, they act like a prism and form a
rainbow. The rainbow is a division of white light into many beautiful colors.
These take the shape of a long round arc with its path high above and its two ends
apparently beyond the horizon.”
Abstract
This dissertation aims to investigate the articulatory behavior of post-operative glossectomy patients. Patterns in patients’ vocal tract area functions, acoustic vowel spaces and speech perceptibility scores are shown to closely reflect the particular loci of their resections. Oral and base of tongue cancer patients exhibit articulatory compensation in consonant and vowel production, though they do not enhance durational cues in tense and lax vowels to aid in perceptibility. Principal component analyses on vocal tract cross-distance measures indicate that patients exhibit less lingual flexibility than do typical speakers. Results of a two-condition task dynamic simulation in which the system takes into account the altered tongue size or assumes the pre-operative tongue size are considered in light of the experimental findings based on patient data.
Asset Metadata
Creator
Hagedorn, Christina M. (author)
Core Title
Speech production in post-glossectomy speakers: articulatory preservation and compensation
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Linguistics
Publication Date
07/29/2017
Defense Date
08/01/2015
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
Compensation,glossectomy,OAI-PMH Harvest,preservation,speech
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Goldstein, Louis (committee chair), Iskarous, Khalil (committee member), Narayanan, Shrikanth S. (committee member), Sinha, Uttam (committee member)
Creator Email
chagedor@usc.edu,christina.m.hagedorn@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-614485
Unique identifier
UC11304124
Identifier
etd-HagedornCh-3747.pdf (filename),usctheses-c3-614485 (legacy record id)
Legacy Identifier
etd-HagedornCh-3747.pdf
Dmrecord
614485
Document Type
Dissertation
Rights
Hagedorn, Christina M.
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA