Signs of Skilled Adaptation in the Co-Speech Ticking of Adults with Tourette syndrome
by
Mairym Lloréns Monteserín
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(LINGUISTICS)
December 2022
Copyright 2022 Mairym Lloréns Monteserín
Epigraph
First forget inspiration. Habit is more dependable.
Octavia Butler
Acknowledgements
The love and support of many people brought me here. I should first thank Drs. Louis
Goldstein, Dani Byrd, Khalil Iskarous, and Shri Narayanan for all the opportunities they afforded
me, for their mentorship and patience, and for their scientific work. Your research inspires in me
the most stimulating ruminations, and I live for those.
Knowing that I had my family behind me kept me sane on many an occasion, particularly
toward the end. Thank you for understanding that the misery was worth it to me. Thank you for
being proud of me.
I am very grateful to my chosen family of the CCHD because without them, I have no heart,
no life. Thank you for waiting for me, it took longer than expected. To my Barcelona family,
now each in your own place, thank you for not forgetting me. I’m coming to see you soon. To
everyone who hosted me during academic travels, my house is your house forevermore.
The USC campus would have been hell were it not for Dr. Jason Zevin and his lab. I am
grateful to him, Melissa Reyes, and Giovanni Nunez-Dueñas for making me feel like myself and
laughing with me. I laughed less after I stopped seeing you. To my comrades in LATU, thank
you for always trying to be there for me, even when I resisted. To the people of Venice, thank
you for your love.
The two people with whom I formed a COVID pod during Spring 2020 are in many ways
responsible for my having reached the finish line: my friend Theodore Flood and my brother Ian
J. Lloréns. Whether it was walking Pepe on my behalf when I was too overwhelmed or just
sitting with me when I wasn’t very good company, you were always there. You were with me
when literally no one else could be and I will be forever grateful to both of you.
Doing a PhD is extremely difficult. Having executive dysfunction doesn’t help. But finally
being diagnosed as neurodiverse at 37 years of age really changes how you view yourself and
how you manage your life—for the better. So I find myself extremely grateful for my medical
providers at the Venice Family Clinic and especially for my therapist, Dr. Wendy Ashley. You
encouraged me to advocate for myself and my disability; you helped me reclaim bits and pieces
of myself that were instrumental to my success. I will never forget that.
Two colleagues have, at different stages, provided the safe and fertile environment necessary
for hashing out my ideas. First, Dr. Aldo Filomeno, thank you for answering question after
question after question. I am glad that you are working to understand how there can be stable
regularities without physical laws. And to Dr. Reed Blaylock, well, I’ve decided it’s futile to try
to write something sensical here today. So to you I just say, we did, in fact, do it. I love you like I
love half of my brain.
The people that most directly enabled my research were my collaborators. To the Tourettes
Hero, Jess Thom, I thank you for literally every single word that has come from your mouth,
whether ticked or spoken. All the participants were willing to sit and share their stories with me.
No amount of compensation can truly capture what they brought to the table. I thank you from
the bottom of my heart.
Table of Contents
Epigraph .................................................................................................................................... ii
Acknowledgements .................................................................................................................. iii
List of Figures ......................................................................................................................... viii
List of Tables .......................................................................................................................... xiii
Abstract ................................................................................................................................... xvi
Chapter 1. General Introduction ................................................................................................ 1
1.1 Action control in circumstance ..................................................................................................... 4
1.2 Skills and their background corrections ....................................................................................... 7
1.3 What is a vocal tic? ....................................................................................................................... 9
1.4 Preview of dissertation ............................................................................................................... 12
Chapter 2. The actions of ticking and speaking are coordinated in time ................................. 15
2.1. Introduction ................................................................................................................................ 15
2.1.1. Hypotheses ........................................................................................................................................ 20
2.2. Methods ..................................................................................................................................... 22
2.2.1. Case studies ....................................................................................................................................... 22
2.2.2. Recordings ......................................................................................................................................... 23
2.2.3. Segmentation and labeling ................................................................................................................ 24
2.2.4. Statistical analysis ............................................................................................................................. 30
2.3. Results ........................................................................................................................................ 35
2.3.1. Participant A ...................................................................................................................................... 35
2.3.2. Participant B ...................................................................................................................................... 41
2.3.3. Participant C ...................................................................................................................................... 46
2.3.4. Participant D ...................................................................................................................................... 51
2.4. Discussion .................................................................................................................................. 55
2.4.1. Experience and skill in co-speech ticking ......................................................................................... 56
2.4.2. Flexibility on the part of the speech system ...................................................................................... 58
2.4.3. Better ticking ..................................................................................................................... 62
2.5. Conclusion ................................................................................................................................. 63
Chapter 3. Ticking and Talking on Distinct Acoustic Channels ............................................. 66
3.1. Hypotheses ................................................................................................................................. 73
3.2. Methods ..................................................................................................................................... 78
3.2.1. Case studies ....................................................................................................................................... 78
3.2.2. Recordings ......................................................................................................................................... 78
3.2.3. Measurements .................................................................................................................................... 79
3.2.4. Statistical analyses ............................................................................................................................. 84
3.3. Results ........................................................................................................................................ 88
3.3.1. Participant A ...................................................................................................................................... 88
3.3.2. Participant B ...................................................................................................................................... 93
3.3.3. Participant C ...................................................................................................................................... 98
3.3.4. Participant D .................................................................................................................................... 104
3.4. Discussion ................................................................................................................................ 109
3.5. Conclusion ............................................................................................................................... 119
Chapter 4. Verbal Tics Don’t Undergo Boundary-related Lengthening ............................... 120
4.1 Introduction ............................................................................................................................... 120
4.2 Method ...................................................................................................................................... 121
4.2.1 Recordings ........................................................................................................................................ 122
4.2.2. Segmentation and labeling .............................................................................................................. 123
4.2.3. Analysis ........................................................................................................................................... 127
4.3. Results ...................................................................................................................................... 128
4.4. Discussion ................................................................................................................................ 129
4.5. Conclusion ............................................................................................................................... 133
Chapter 5. General Discussion .............................................................................................. 134
5.1 Concluding remarks .................................................................................................................. 139
References ............................................................................................................................. 141
Appendices ............................................................................................................................ 149
List of Figures
Figure 1. Spectrogram and transcript of verbal tics and read speech. Pitch track overlaid in cyan.
Green text and shading indicate the word that hosted a phrase-final boundary tone. Red text and shading
indicate vocal noise tics. Tics followed the first attempt at the word day, which resulted in a break that was
subsequently corrected. ............................................................................................................................... 15
Figure 2. Schematic representation of ways one tic can interrupt (red shading) or cooperate with (blue
shading) the production of a five-word intonational phrase (IP). Each row represents a single
IP. W = word; T = tic; B = word hosting phrase-final boundary tone.
Underscores indicate potential tic hosting sites. The left-most W in each chunk is phrase-initial.
Red T tic events have occurred in interruptive positions; blue T tic events have occurred in
cooperative positions. .................................................................................................................................. 21
Figure 3. Spectrogram and transcript of utterance produced by Participant B during a personal
narrative. Pitch track overlaid in cyan. Three chunks (outlined in blue) result from segmentation
of two pauses lasting 499 (left) and 289 (right) milliseconds. The interval corresponding to the
distal tic funking is shaded in blue. Phrase-final words and their corresponding acoustic intervals
are in green. ................................................................................................................................................. 26
Figure 4. Spectrogram and transcript of utterances demonstrating phrase-final boundary tone
categories for varieties of British English in the IViE transcription system that were used to delimit
intonational phrases inside chunks containing speech. Pitch track overlaid in cyan. Blue shading
over verbal tic. Green shading over words hosting a boundary tone. Top panel: continuation rise
on word light and final fall on word colors. Bottom panel: two instances of tonal plateaus on
children and happy. ..................................................................................................................................... 28
Figure 5. Spectrogram of portion of The Rainbow Passage read by Participants C (top) and A (bottom).
Transcripts above spectrogram in rough time alignment. Pitch tracks overlaid in cyan. Tic intervals
are shaded in blue and they are transcribed in blue. Phrase-final word intervals are shaded in green
and transcribed in green. .............................................................................................................................. 30
Figure 6. Schema of procedure for counting positions. W represents words that do not host boundary
tones, B represents words that do. Underscores indicate possible positions for tic occurrence. Positions
that are inherently cooperative are highlighted in blue. .............................................................................. 32
Figure 7. Spectrogram of portion of The Rainbow Passage. Transcript of utterance above
spectrogram in rough time alignment. Pitch track overlaid in cyan. Acoustic intervals corresponding
to the phrase-final words gold and rainbow shaded in green; these host a continuation rise and a final
falling boundary tone, respectively. Three verbal tics occurred (from left): “biscuit”, in red, is speech-proximal
and constitutes an interruption; sausage, in blue, is speech-proximal and cooperative.
The verbal tic hey is distal from speech. ..................................................................................................... 34
Figure 8. Correlation matrices of Tic Presence x Position Status. Expected counts if variables are
independent on left; observed counts on right. Size of the circle in each cell indicates the relative
contribution of that cell to the total count. Color of circle indicates raw counts. ........................................ 38
Figure 9. Residuals of expected and observed Tic Presence x Position Status. Circle size indicates
relative contribution of the cell. Circle color represents raw value of residual. .......................................... 38
Figure 10. Number of tics (A), intonational phrases (B), and words (C) per chunk in three speech
task types. In this and all following boxplots, the group median is represented as a horizontal line
inside the box, the interquartile range [IQR] is represented by the box, and intervals between
minimum and maximum values within 1.5 * the IQR are represented by vertical bars. Significance
in all pair-wise comparisons based on Wilcoxon test. ................................................................................. 39
Figure 11. Proportions of Distal and Speech-proximal Cooperative tic events in two speech tasks. ......... 40
Figure 12. Number of tics (A), intonational phrases (B), and words (C) per chunk in three speech
task types. Significance in all pair-wise comparisons based on Wilcoxon test. .......................................... 41
Figure 13. Correlation matrices of Tic Presence x Position Status, tasks pooled. Expected counts if
variables are independent on left; observed counts on right. Size of the circle in each cell indicates
the relative contribution of that cell to the total count. Color of circle indicates raw counts. ......................... 44
Figure 14. Chi-square residuals by Task Type. The size and color of the circle in each cell represents
the relative contribution of that cell and the value of the residual, respectively. Zero is represented by
white. These results were each significant, p < .001. .................................................................................. 44
Figure 15. Relative proportions of three tic event types in reading (top), picture description (middle),
and personal narrative tasks (bottom) performed by Participant B. Distal tics and Outside IP tics
are inherently cooperative (gray and blue); Inside IP tics are inherently interruptive (orange). .................. 46
Figure 16. Number of tics (A), intonational phrases (B), and words (C) per chunk in three speech
task types. Significance in all pair-wise comparisons based on Wilcoxon test. .......................................... 48
Figure 17. Correlation matrices of Tic Presence x Position Status, tasks pooled. Expected counts if
variables are independent on left; observed counts on right. Size of the circle in each cell indicates
the relative contribution of that cell to the total count. Color of circle indicates raw counts. ..................... 48
Figure 18. Chi-square residuals by Task Type. The size and color of the circle in each cell represents
the relative contribution of that cell and the value of the residual, respectively. Zero is represented by
white. These results were each significant, p < .001. .................................................................................. 49
Figure 19. Relative proportions of three tic event types in reading (top), picture description (middle),
and personal narrative tasks (bottom) performed by Participant C. ............................................................ 50
Figure 20. Number of intonational phrases (A), tics (B), and words (C) per chunk in three speech task
types. Significance in all pair-wise comparisons based on Wilcoxon test. ................................................. 53
Figure 21. Correlation matrices of Tic Presence x Position Status. Expected counts if variables are
independent on left; observed counts on right. Size of the circle in each cell indicates the relative
contribution of that cell to the total count. Color of circle indicates raw counts. ........................................ 54
Figure 22. Relative proportions of three tic event types in reading (top), picture description (middle),
and personal narrative tasks (bottom) performed by Participant D. ............................................................ 55
Figure 23. Two sequential tic pairs in cooperative interaction with an intonational phrase. Transcript
of utterance in rough time alignment above spectrogram. Pitch track overlaid in cyan. Word hosting
IP-final falling tone (work) and its acoustic interval in green. Verbal tics and their acoustic intervals
in blue. Pauses flanking the utterance show that it represents a single mixed chunk. ................................ 62
Figure 24. Spectrogram of portion of utterance by Participant C produced while describing a picture.
Pitch track overlaid in cyan. Utterance transcription above the spectrogram in rough time-alignment.
The verbal tic “biscuit” in the utterance is capitalized and in red font; red shading surrounds the
interval of time corresponding to it. ............................................................................................................ 67
Figure 25. Participant A boxplots showing results of acoustic measurements on tic (n=45) and word
(n=707) stressed vowels. Clockwise from top left: Mean f0, Mean SHR, Mean HNR, Mean Intensity,
Mean H4* and Mean H1*-H2*. In this and all following boxplots, the group median is represented
as a horizontal line inside the box, the interquartile range [IQR] is represented by the box, and
intervals between minimum and maximum values within 1.5 * the IQR are
represented by vertical bars. Significance is based on p-values adjusted using Bonferroni correction. ..... 91
Figure 26. Participant A boxplots showing acoustic measurements in Tic+word paired events
(left panels) and Word+Tic paired events (right panels). Tics are in salmon (preceding element in left
panels, following element in right panels), and words are in cyan (following element in left panels,
preceding element in right panels). Paired events are connected by a gray line. P-values are from
paired, one-sided Wilcoxon tests. ................................................................................................................ 93
Figure 27. Participant B boxplots showing results of acoustic measurements on tic (n=79) and word
(n=1556) stressed vowels. Clockwise from top left: Mean f0, Mean SHR, Mean HNR,
Mean Intensity, Mean H4* and Mean H1*-H2*. ........................................................................................ 95
Figure 28. Participant B boxplots showing acoustic measurements in Tic+Word paired events
(left panels; n=63) and Word+Tic paired events (right panels; n=65). Tics are in salmon (preceding
element in left panels, following element in right panels), and words are in cyan (following element
in left panels, preceding element in right panels). Paired events are connected by a gray line. P-values
are from paired, one-sided Wilcoxon tests. ................................................................................................. 98
Figure 29. Participant C boxplots showing results of acoustic measurements on tic (n=476) and
word (n=2677) stressed vowels. Clockwise from top left: Mean f0, Mean SHR, Mean HNR, Mean
Intensity, Mean H4* and Mean H1*-H2*. ................................................................................................ 101
Figure 30. Participant C boxplots showing acoustic measurements in Tic+Word paired events
(left panels; n=63) and Word+Tic paired events (right panels; n=65). Tics are in salmon (preceding
element in left panels, following element in right panels), and words are in cyan (following element
in left panels, preceding element in right panels). Paired events are connected by a gray line.
P-values are from paired, one-sided Wilcoxon tests. ................................................................................ 104
Figure 31. Participant D boxplots showing results of acoustic measurements on tic (n=32) and
word (n=1867) stressed vowels. Clockwise from top left: Mean f0, Mean SHR, Mean HNR,
Mean Intensity, Mean H4* and Mean H1*-H2*. ...................................................................................... 106
Figure 32. Participant D boxplots showing acoustic measurements in Tic+Word paired events
(left panels; n=7) and Word+Tic paired events (right panels; n=7). Tics are in salmon (preceding
element in left panels, following element in right panels), and words are in cyan (following element
in left panels, preceding element in right panels). Paired events are connected by a gray line.
P-values are from paired, one-sided Wilcoxon tests. ................................................................................ 109
Figure 33. Spectrogram of utterance produced by Participant C while reading “The Rainbow
Passage” aloud. A transcript of the utterance is above the spectrogram in rough time alignment.
Pitch tracking for the utterance is in cyan. Green text indicates that the word hosted a phrase-final
boundary tone; corresponding time intervals in the spectrogram are shaded in green. Blue text
indicates tics that did not interfere with production of a phrase; corresponding time intervals are
shaded in blue. Red text indicates tics that did interfere with production of a phrase; corresponding
time intervals are shaded in red. ................................................................................................................ 111
Figure 34. Boxplots of stressed vowel Mean f0 data for the three most frequently produced tics by
each participant (clockwise from top left: Participant A, Participant B, Participant C, and
Participant D). Significance results are from two-sided independent samples Wilcoxon
rank sums tests. .......................................................................................................................................... 118
Figure 35. Spectrogram of utterance containing two post-boundary “biscuit” tics. Transcript in
rough time alignment. Phrase-final words in green text and shading. Tics in blue text and shading. ...... 120
Figure 36. Spectrogram and transcript of utterances demonstrating phrase-final boundary tone
categories for varieties of British English in the IViE transcription system that were used to delimit
intonational phrases inside chunks containing speech. Pitch track overlaid in cyan. Blue shading
over verbal tic. Green shading over words hosting a boundary tone. Top panel: continuation rise
on word light and final fall on word colors. Bottom panel: two instances of tonal plateaus on
children and happy. ................................................................................................................................... 126
Figure 37. Faux-Final (blue text and shading) and Faux-Medial (red text and shading) instances of
“biscuit”. Spectrogram overlaid with pitch track in cyan. Green shading and text indicates
phrase-final boundaries. ............................................................................................................................. 128
Figure 38. Acoustic duration of “biscuit” tics across three Faux-prosodic positions. One-way
ANOVA found no significant differences between groups. ...................................................................... 129
Figure 39. Conjectured triggering of tic vocalization by decreasing activation of π-gesture
that embodies the phrase boundary. .......................................................................................................... 130
Figure 40. “Biscuit” tic duration (left) and stressed vowel f0 (right) by group including distal tics. ....... 132
Figure 41. Three consecutive breakfast moments under familiar circumstances. At each moment,
the prepotent task recruits tasks from lower levels. Automation most developed for lower-level tasks.
Thickness of arrows connecting moments represents how much attention is paid to that transition. ....... 135
Figure 42. Three consecutive breakfast moments under unfamiliar circumstances. At each moment,
the prepotent task recruits tasks from lower levels. Automation most developed for lower-level tasks.
Thickness of arrows connecting moments represents how much attention is paid to that transition. ....... 137
Figure 43. Data collection protocol. Orange squares – passage readings; blue squares – personal
narratives; yellow squares – picture descriptions. Participants decided during which between-block
break to take lunch. Recording was stopped only for breaks that required the participant to leave
the room (e.g., bathroom, lunch). .............................................................................................................. 150
Figure 44. Task duration (top), word count (middle), and tic count (bottom) by task type for each
participant. ................................................................................................................................................. 153
List of Tables
Table 1. Monadic speech tasks and the verbal prompts that elicited them. ................................................ 24
Table 2. Two by two contingency table of counts analyzed using chi-square test of independence. ......... 33
Table 3. Chunk count per task and average number of words, IPs, tics and cooperative/interruptive
positions per chunk. ..................................................................................................................................... 36
Table 4. Contingency table of Tic Presence and Position Status for chunks containing speech. ................ 37
Table 5. Mean number of words, tics, intonational phrases, and Cooperative/Interruptive possible tic
positions per chunk in each task. Gray shading indicates observed count of Interruptive tics in task is
higher than observed count of Cooperative tics. ......................................................................................... 42
Table 6. Counts submitted to chi-square analysis for reading, narrative, and picture description tasks. .... 43
Table 7. Average number of words, tics, intonational phrases, and cooperative/interruptive positions
per chunk for each task. ............................................................................................................................... 47
Table 8. Counts submitted to chi-square analysis for reading, narrative, and picture description tasks. .... 49
Table 9. Mean number of words, tics, intonational phrases, and Cooperative/Interruptive possible tic
positions per chunk in each task. ................................................................................................................. 52
Table 10. Contingency table of observed counts for Tic Presence and Position Status variables. ............. 53
Table 11. Counts of intonational phrases, true-words, and tics per chunk in three task types for
Participant C. Interruptivity is the percentage of all tic events that occurred interior to an intonational
phrase; darker shading indicates more interruptivity. Blue square = significantly more IPs per chunk
in reading tasks. ........................................................................................................................................... 59
Table 12. Counts of intonational phrases, true-words, and tics per chunk in three task types for
Participant D. Interruptivity is the percentage of all tic events that occurred interior to an
intonational phrase; darker shading indicates more interruptivity. Red rectangle = words per
chunk different across tasks. ........................................................................................................................ 60
Table 13. Acoustic parameters and relationships between falsetto and modal voice with respect
to those parameters. ..................................................................................................................................... 76
Table 14. Predicted change in each acoustic parameter at cross-system transitions ................................... 77
Table 15. Verbal tasks elicited and the prompts used to elicit them. .......................................................... 79
Table 16. Stressed vowel counts for each participant by vocalization type. ............................................... 85
Table 17. Counts of tic/word tokens and two cross-system juncture types for each participant. ................ 87
Table 18. Difference between Tic and Word group averages in each acoustic parameter by
Task Type. Gray cells indicate parameters that did not pattern as predicted. ............................................. 89
Table 19. Acoustic parameter extraction results for ticked (n=45) and word (n=707) stressed
vowel intervals for Participant A. Hypotheses shown in third column tested by unpaired, one-sided
Wilcoxon tests using group medians. Reported p-values are adjusted using Bonferroni correction. ......... 90
Table 20. Acoustic parameter results from preceding and following word-size elements at
Participant A cross-system transitions. Tic+Word transitions (left; n=7) are made up of consecutive
Tic+Word pairs. Word+Tic transitions (right; n=5) are made up of consecutive Word+Tic pairs. ... 92
Table 21. Average Tic and Word group differences for acoustic parameters by Task Type. All
differences patterned as predicted. .............................................................................................................. 94
Table 22. Acoustic parameter extraction results for ticked (n=79) and spoken (n=1556) Participant B
stressed vowel intervals. Hypotheses shown in third column tested by unpaired, one-sided Wilcoxon
tests using group medians. Reported p-values are adjusted using Bonferroni correction. .......................... 95
Table 23. Acoustic parameter results from preceding and following word-size elements at
Participant B cross-system transitions. Tic+Word transitions (left; n=63) are made up of
consecutive Tic+Word pairs. Word+Tic transitions (right; n=65) are made up of consecutive
Word+Tic pairs. ........................................................................................................................................... 97
Table 24. Difference between Tic and Word averages by Task Type. Gray rows indicate acoustic
parameters that did not pattern as predicted. ............................................................................................. 100
Table 25. Acoustic parameter extraction results for tic (n=476) and word (n=2677) Participant C
stressed vowel intervals. Hypotheses shown in third column tested by unpaired, one-sided Wilcoxon
tests using group medians. Reported p-values are adjusted using Bonferroni correction. ........................ 101
Table 26. Acoustic parameter results from preceding and following word-size elements at
Participant C cross-system transitions. Tic+Word transitions (left; n=628) are made up of
consecutive Tic+Word pairs. Word+Tic transitions (right; n=650) are made up of consecutive
Word+Tic pairs. ......................................................................................................................................... 103
Table 27. Differences between Tic and Word group averages for each acoustic parameter by
Task Type. Gray cells indicate parameters that did not pattern as predicted. ........................................... 105
Table 28. Acoustic parameter extraction results for ticked (n=32) and spoken (n=1867) Participant D
stressed vowel intervals. Hypotheses shown in third column tested by unpaired, one-sided Wilcoxon
tests using group medians. Reported p-values are adjusted using Bonferroni correction. ........................ 106
Table 29. Acoustic parameter results from preceding and following word-size elements at
Participant D cross-system transitions. Tic+Word transitions (left; n=7) are made up of
consecutive Tic+Word pairs. Word+Tic transitions (right; n=7) are made up of consecutive
Word+Tic pairs. ......................................................................................................................................... 108
Table 30. Labels and counts for each participant’s three most frequently produced tic words.
Participant D’s only repeated verbal tic, the phrase fuck off, is presented in different ways for the
purposes of this exercise. ........................................................................................................................... 117
Table 31. Monadic speech tasks and the verbal prompts that elicited them. ............................................ 123
Table 32. Counts, average duration in milliseconds, and standard deviation of average
for “biscuit” tics across Faux-Prosodic positions. ..................................................................................... 128
Table 33. Duration of tasks performed by each participant and count of words and tics that occurred. .. 150
Table 34. Participant unique/total tic count, word count, and count of speech tasks performed. ............. 154
Table 35. Unique vocal and verbal tics and their frequency by speech task type. The average
acoustic duration of each unique tic as well as the average f0 of its stressed vowel is presented
for reference. .............................................................................................................................................. 154
Table 36. Unique vocal and verbal tics and their frequency by speech task type. The average
acoustic duration of each unique tic as well as the average f0 of its stressed vowel is presented for
reference. ................................................................................................................................................... 155
Table 37. Unique vocal and verbal tics and their frequency by speech task type. The average
acoustic duration of each unique tic as well as the average f0 of its stressed vowel is presented for
reference. ................................................................................................................................................... 157
Table 38. Unique vocal and verbal tics and their frequency by speech task type. The average
acoustic duration of each unique tic as well as the average f0 of its stressed vowel is presented for
reference. ................................................................................................................................................... 159
Abstract
Adults with Tourette syndrome produce unwanted movements and vocalizations called tics
that do not correspond to their own behavioral goals and, as a result, appear inappropriate in
context. Tics are often preceded by an uncomfortable sensation that grows in intensity until the
tic response is released. Tics occur on a background of typical goal-directed behavior, including
speech, but ticking and speaking appear to be at cross-purposes. Strictly speaking, production of
vocal tics (i.e., tic vocal tract movements that have an audible result) cannot overlap in time with
speech production because the two kinds of behavior have opposed aims whilst requiring action
by the same set of effectors. Preceding urges to tic, however, frequently co-occur with intentions
to speak because like visceral urges, urges to tic are orthogonal to intentions to act. Speech
planning and production processes in tickers therefore co-mingle with what could be competing
urges to vocalize. The broad objective of this dissertation is to determine whether, and how,
ticking and speaking can manifest “cooperatively”.
So-called “cooperative” interactions between ticking and speaking allow the tasks of each
system to be achieved. Using a corpus of acoustic recordings of adult tickers performing a
variety of monadic speech tasks while ticking freely, linguistically informed analyses were
carried out, each of which probed a different aspect of co-speech ticking for specific signs of
optimization. Taken together, the results suggest that the observed interactions promote optimization of both systems' tasks.
Three signs of systematic and optimized interaction between ticking and speaking were
identified in co-speech ticking data. First, for most tickers, the vast majority of tic-words
occurring during running speech are located immediately before or immediately after prosodic
phrase boundaries, which is to say that tics systematically occur around prosodic phrases, not
interior to them. Ticking around prosodic phrases ensures that a talker’s intended linguistic
message is produced correctly while still allowing for frequent tic urge satisfaction—an optimal
outcome given the circumstance. The distributional pattern is suggestive of temporary deferment
of tics to the end of a phrase, a sort of accommodation on the part of the tic system. But there is
also evidence that the speech system itself re-organizes to accommodate potential tics through
the use of adaptive prosodic phrasing. When ticking around phrases, shorter phrases mean more
frequent tic production. At least one participant may be using this principle to her advantage: the
shorter the phrases in this dataset, the fewer the interruptions by tics. In contrast, one participant
who shows no adaptive changes to the size of phrases in response to ticking has very frequent tic
interruptions. A second predicted sign of optimization was found by comparing the phonatory
characteristics of stressed vowels in verbal tics and true words. It was found that the former are
distinguishable from the latter in that verbal tic stressed vowels display acoustic signatures of
falsetto voice. By producing verbal tics along a non-speech acoustic channel, talkers who tic can
segregate intended, dialogic meaning from unintended referential meaning. Segregated tic and
speech acoustic channels were observed in all four case studies—even in the case of the one
participant who consistently failed to keep tics out of prosodic phrases. A third sign of
cooperative interaction concerns the token-to-token durational variability of verbal tics,
which does not appear word-like. Proximity to prosodic phrase boundaries does not induce
lengthening in verbal tics, suggesting that these tics may be "unprosodified" with respect to
phrasal prosody, even if their temporal occurrence is in part determined by the presence of prosodic boundaries.
Participants differ with respect to the amount of experience they have in free-ticking (on its
own) and co-speech free-ticking, and differences across participants in co-speech ticking patterns
align with expectations given apparent skill level. For instance, one participant produced
hundreds more tics than the others but also experienced few interruptions; this participant
free-tics by default and works in public speaking. A participant who reported rarely free-ticking, in
contrast, produced mostly interruptive tics. These differences suggest that the observed
interactions between ticking and speaking reflect compensatory strategies that have been
developed to ensure the quality of speech (e.g., fluency, clarity) despite frequent ticking. Taken
together, the results presented in this dissertation support the notion that in order to optimally
achieve the tasks of ticking and speaking, tic and speech actions are co-orchestrated.
Chapter 1. General Introduction
Cognitive scientists may have the luxury of modularity, but the human performer does not. In
matters of milliseconds, the brain-body-environment system must work with a wide array of
signals.
(Dale & Kello, 2018:62)
For the most part, human behavior is coherent rather than self-defeating even though needs,
desires, and goals are frequently at odds. A person sitting in the living room watching TV who
gets hungry cannot continue to watch their show while simultaneously obtaining nourishment—
feeding and watching are at odds. In order to feed, they must go to the kitchen and serve themself
a bowl of cereal. If they want to avoid missing the show, then they must refrain from going to the
kitchen, though human ingenuity predicts that a work-around for these contingencies will
emerge. Humans are crafty animals and there are many ways to resolve the apparent conflict.
The TV can be permanently placed within eyeshot of the kitchen, for instance; maybe it’s just a
matter of waiting for the next commercial break. A group of people did, in fact, invent a way to
halt TV programs OnDemand™, allowing viewers to effortlessly interleave the actions of
show-watching with those required for feeding. Both systems of behavior are adaptive—able to
flexibly accommodate the current circumstance in service of the situated goal, namely “making
dinner while watching TV”.
Depending on how frequently this circumstance occurs, very little to no attention will be
required to navigate the requirements of each behavioral modality. A cross-modal or composite
goal limits the possibilities of action, reducing the cognitive resources required for planning. To
illustrate, consider an adult TV-viewer who loves to cook and has accumulated orthogonal
experience managing urges to micturate (like most adults). This person understands that a
commercial break provides enough time to use the bathroom but not enough time to make a
risotto. A TV-viewer that knows how to make risotto will know that they should not attempt to
do so whilst watching a big game that requires their undivided attention. Risotto requires almost
constant stirring, making it an impractical dish for simultaneous TV-watching/cooking. Unless,
that is, you were one of the people who did not wish to compromise and decided to place your
TV in such a way that facilitates co-performance of cooking and viewing actions.
It is of course understood that production of speech entails composite goals because
production of a phrase involves production of grouped words, each of which involves production
of certain syllables and phonemes in a particular phrasal context, etc.; it is also the case that
speech is a behavioral tool through which humans perform illocutionary acts. It is perhaps less
common to consider speech planning and production as a process taking place in bodies that are
host to a wide variety of visceral and/or cognitive urges.
Vocal tic phenomena provide a unique opportunity to investigate cross-modal interaction
because the bodies of people with Tourette’s always “want” to produce certain movements and
vocalizations called tics (e.g., Niccolai et al., 2019). Tourette syndrome is a neurological
condition that causes individuals to produce unwanted movements and vocalizations called tics.
Tics don't correspond to any part of a ticker's behavioral goals. As a result, tic actions don't
cohere with an individual's intended behavior. They do, however, mimic purposive action in
form. Some vocal tics, for example, mimic words and phrases. Tickers produce their vocal and
verbal tics on a background of typical goal-directed behavior, including speech. Tourette's most
often develops in early childhood but tics sometimes appear in adults (e.g., Robertson, 2008a;
Scahill, Simmons, & Volkmar, 2013).

[Footnote 1] Very little detail regarding the clinical presentation and neuropathology of Tourette syndrome is presented in this text. For further study see excellent reviews by Hartmann & Worbe (2018), Hashemiyoon et al. (2017), and Müller-Vahl, Sambrani, & Jakubovski (2019).
Vocal tics and speech require the same set of effectors, and they cannot be produced
simultaneously; they must occur in some relative order. Thus, one way that co-production of tics
and speech could be optimized is for their temporal occurrence to be co-orchestrated. Optimal
ordering is dependent on context. If action planning can use information available from the
body’s current urge landscape, then each system’s constraints and sub-skills can be put to use.
This relates to adaptive action in circumstance. Coordinating the actions for ticking and speaking
in time according to principles of optimization promotes the tasks of both systems being
achieved. But speakers are also concerned with the quality of their speech (fluency, clarity, etc.),
and the presence of tics generally impacts said quality. It stands to reason that the drive for
skill compensation will adopt further strategies to ameliorate these effects, and differentiating tics
and speech acoustically is one available option.
To summarize the preceding discussion, the nature of ticking is such that it could interfere
with production of fluent and clear speech. The question is—does it? Results from the three
experiments reported in this dissertation show that production of fluent and clear speech is
possible in the face of frequent ticking thanks to the flexibility of both the tic and speech action
systems, suggesting that each system orients itself with respect to the current circumstance.
Wielding conceptual and analytical tools from linguistic inquiry, this dissertation utilizes the
phenomena of Tourette's vocal and verbal tics to investigate control over two disparate
behaviors in circumstance.

[Footnote 2] With regards specifically to stutter, the relationship between ticking and this kind of speech disfluency is hard to assess because (a) certain aspects of stuttering and ticking are hard to disentangle behaviorally and (b) the developmental courses of the conditions mirror each other. Stuttering and ticking both start in early childhood, though persistent developmental stuttering may appear earlier than typical ticking. Prevalence figures are equivalent: 10% of all children show observable signs of stutter and tics, with only 0.1–1% of children retaining the symptoms (Robertson, 2008a, 2008b). In terms of observable behavior, certain kinds of disfluency are indistinguishable from stutter (e.g., syllable repetitions). Findings from two studies bear upon these issues. Individuals with TS and age-matched controls have been found to produce comparable numbers of atypical disfluencies (i.e., stuttering) during passage reading, but tickers did produce significantly more typical (i.e., non-stuttering) disfluencies relative to their healthy peers (De Nil, Sasisekaran, Van Lieshout, & Sandor, 2005). In a separate study, adults who stutter produced significantly more involuntary movements overall (i.e., motor tics) relative to non-stuttering adults (Mulligan, Anderson, Jones, Williams, & Donaldson, 2003). Basal ganglia architecture is implicated in both conditions, which further confuses the issue. Putting these issues to the side, however, it is the case that Tourette's is not classified as a speech or communication disorder, which indicates that the speech of Tourette's tickers is perceived as typical by clinicians.
1.1 Action control in circumstance
Whether explicitly or implicitly, every model or theory of spoken language assumes that the
first step in production of an utterance is intention leading to selection. In this view, behaviors
performed in the absence of a future-facing goal are mere habit. Speech is clearly too flexible to
be considered a habit. But it is also true that speech planning and production processes are
automatic enough to allow talkers to quite literally do it in their sleep. How can something so
complicated be done in the absence of consciousness? How can there be a cognitive selecting
force if the person is not even awake? The problem lies in ascribing inherent properties to action
entities like walking, singing, talking, making breakfast, scratching one’s nose, or going to the
bathroom. In reality, whether action entities are voluntary/involuntary, controlled/uncontrolled,
or goal-directed/habitual depends entirely on their function as conceived by the actor in their
appraisal of the current circumstance.
Ecological perspectives hold that behaving is always behaving in some circumstance
(Gibson, 1979). The term circumstance refers to the relationship between what have traditionally
been called affordances of the external environment and effectivities of the internal context (e.g.,
Shaw, 2001). Affordances are the behavioral goals that are available for selection in a given
moment in time. Effectivities are the skills available for adaptive action in concert with whatever
dispositions, motivations, etc., are operant in that moment. Shaw provides a succinct yet
thorough descriptive statement worth repeating:
Affordances control by informing (influencing) the choices the agent makes regarding the
control to apply; conversely, effectivities inform by controlling (selecting and modulating) the
forces the agent applies to produce the behavior that, in the course of unfolding, changes the
availability of affordances. This cycling of perceiving and acting repeats itself until the encounter
intended is either successful or is aborted in favor of other means or other goals.
(Shaw, 2001:280)
Circumstance defined as the combination of affordances and effectivities of the current
moment thus generates candidate goals (affordances) that are “selected” by the agent’s current
effectivities, a complex that includes abilities as well as motivations, dispositions, urges, and
long-term goals or aspirations, etc. (e.g., Shaw, 2001).

[Footnote 3] The definition of biological function adopted here is taken from Justin Garson's selected effects theory (Garson, 2016, 2019), which states that an extant trait's function is whatever the trait does that contributed to its differential retention in a population. This definition accounts for biological function in two senses. The first is phylogenetic: a species trait x can be said to have function y in so far as y contributed to increased fitness in the species in question. The fly-repellent function of a zebra's stripes is an example. The communicative function of infant cries in humans is another example. The second sense in which the definition accounts for biological function is ontogenetic. While infants have an innate ability to cry, a trait presumably acquired through selective pressures, humans also develop behavioral traits that acquire particular communicative functions. These points are discussed in greater detail on page 29.

Going back to the TV-viewer alluded to above, if the TV is far from the kitchen and they are watching an episode they don't want to
miss, then making a risotto won’t be afforded upon the appearance of hunger. Instead, preparing
a bowl of cereal or a microwaveable hot meal might be afforded because these food items can be
prepared in the amount of time that a commercial break lasts. The affordances “match” the
circumstance. If the TV-viewer had cereal for breakfast and lunch, they may feel a temporally
local aversion to the food. When combined with the current affordances, this effectivity leads to
selection of the frozen meal. Which action (or action dimension) is attended to is also determined
by the circumstance. If the actor is focused on the game, they may not pay proper attention to
what they do in the kitchen. On the other hand, the actor may opt to pay close attention to
preparing their meal because they know they have precious little time during a commercial
break. Thus, the same task can be performed under more or less control in the sense that more
attention is paid to execution in some instances than others.
Vocal and verbal tics open a door to the study of control in context. Vocal tics are
vocalizations that could be interpreted by an outside observer as purposive actions (e.g., whistles,
throat-clearing, words). But from the perspective of the ticker, tic actions do not correspond to
any current intended action plans. Thus, a ticker that has a throat-clearing tic in their tic
inventory will sometimes perform the relevant vocal-respiratory actions as a tic and other times
as a truly goal-directed instance of clearing the throat. Similarly, a ticker who frequently tics the
word “biscuit” will sometimes use that word in a phrase as part of their daily life.
To summarize, control in the present work is thought of as instances of control. With this
conceptual shift, it is argued that the distinction between tic-words and true-words is that they are
outcomes of different planning and production processes. These processes can (and must)
interact when tickers are speaking while ticking freely.
1.2 Skills and their background corrections
Neurophysiologist Nikolai Bernstein proposed a neural hierarchical model to account for the
processes by which skilled movements and actions are “built” or developed. In Bernstein’s view,
control of any task, whether abstract or not, requires involvement by an upper leading level and
the lower-level background corrections that have been assigned by the upper level based on what
the leading level task requires. Once performance of an action (leading level task) becomes
skilled, background correction tasks are not overtly planned but rather function “autonomously”
in service of the upper level task that recruited them (Latash et al., 1996; Profeta & Turvey,
2018).
Bernstein bases his model on the evolutionary history of nervous systems and movement in
the clade Craniata, specifically on the observation that different neural systems evolved to solve
different kinds of movement “problems.” Each level in the hierarchy is identified by the kind of
information available to it (raw or synthesized, depending on the level). Access to particular
kinds of information is the basis of each level’s capacity to solve its particular movement or
behavioral problem. The level of tonus solves the problem of “pre-stress” (Profeta & Turvey,
2018), the level of synergies solves the degrees of freedom problem, the level of space solves the
problem of targeted movement through space, and the level of action solves the problem of
“semantic chain action” or planning.
“Movements follow sensations” (Latash, Turvey, & Bernstein, 1996:75). The statement is a
description of the emergence (in phylogeny and ontogeny) of complex movements. What enables
the level of synergies to lend "internal consistency" to overt movements is its receipt of
proprioceptive signals. The level of space has access to information picked up by the external
sensory systems and memory, hence its capacity to create the space field and support targeted
movement. The level of action receives no sensory input, instead having access to what
Bernstein referred to as “general ideas and notions” (Latash, Turvey, & Bernstein, 1996:153).
These general ideas and notions are the accumulated experience of background corrections
across many past experiences, which makes them "smart" (Latash et al., 1996:81).
Functional building of movements in Bernstein is envisioned as a top-to-bottom affair
(Profeta & Turvey, 2018) with relatively higher-level actions enlisting background support from
lower-levels. This arrangement is the basis of control of both overt movement and meaningful
action in the human (Latash et al., 1996:150). Over time, background tasks become “smarter”—
they become skilled in providing background support. The unfolding of this process leads to a
reduction in the cognitive resources required to be allotted to performance of the task, which, in
humans, is a desirable outcome. Consider:
There is a constant press for higher level understanding and control of action, but this press
is countermanded by movement to lower levels of identification when the higher level identities
cannot be enacted automatically. (Vallacher & Wegner, 1987:6)
Speech scientists have encountered the principle of background corrections as it is built into
Task Dynamics (Saltzman & Kelso, 1987) and its implementation in Articulatory Phonology
(Browman & Goldstein, 1989, 1990, 1992). The latter theory endorses
vocal tract constriction tasks called gestures as the primitive phonological informational units in
speech. From this perspective, lexical contrast is a matter of which gestures are present, their
relative order (and coupling), and the particular gestural parameters at play, among other things
(Byrd & Krivokapić, 2021).

[Footnote 4] Bernstein provides a detailed account of how and why the cortex is uniquely positioned to amass background skill, noting its "[…] unlimited capacity for accumulating the life experiences of an animal" (Latash et al., 1996:81).
Lexical contrast is a cognitive process and so gestures are cognitive action entities,
spatiotemporal tasks led by Bernstein’s level of space, whose background corrections are led by
functional synergies that enact overt movements of the vocal tract articulators. Gestures bridge
the gap between the cognitive (i.e., constriction tasks/goals) and non-cognitive (i.e., coordinated
muscle activity resulting in overt movement) because the leading level task is cognitive, but the
background tasks are non-cognitive. Relatedly, they are the lowest level or atomic phonological
task/goal in operation, which entails that they will almost always function to service relatively
higher-level tasks/goals. If this is the case and the leading level task is skilled, then constriction
tasks are rarely if ever being attended to—they are not under direct cognitive control. Instead,
they are smart automatisms, the foundation for higher-level thinking. Every action link or
component that forms part of an action is itself an independent motor act at one of the lower
background levels, or it arose from an independent motor act that was used in the past. Thus, the
study of speech dynamics benefits from a serious commitment to ecological principles because
these principles connect the usual lower-level objects of study (e.g., gestures, syllables) to the
higher-level actions that are piloting those lower-level cognitive entities. In sum, every task is
under some form of control; the question is whether the task in question is a leading-level or
background-level task in the context in which it is being studied. This statement clears the path
to discuss the phenomena of vocal tics.
1.3 What is a vocal tic?
In terms of their manifestation, vocal tics are actions of the vocal tract articulators that result
in audible sound. All vocal tics are mimics of purposive action, even non-verbal vocal noise tics
such as clicks, chirps, yips, and grunts. Adults with Tourette's frequently produce tics that mimic
words and phrases; these are referred to as verbal tics here and elsewhere. Isomorphism with true
words imbues verbal tics with unintended semantic content. It also creates a methodological
opportunity for study as isomorphism of verbal tics and speech renders the primitive units of
ticking and speaking commensurate. A loose assumption in this dissertation is that a verbal tic
shares a gestural score with its true-word analogue, in that similar articulatory actions are
performed. In essence, an instance of a verbal tic amounts to enacting of lexical contrast in the
absence of communicative intent.
Tic production has its own system-specific temporal dynamics. This topic is broached by
initially limiting discussion to instances where tic actions effectively lead all overt behavior—
ticking that occurs during sleep. Sleep ticking has been observed in children and teenagers
(Eapen & Črnčec, 2009) as well as adults (Jankovic, 1997) during all stages of sleep (including
slow-wave sleep), though the frequency and vigor of tics appear visibly attenuated relative to
daytime ticking (review in Cavanna et al., 2017). That there is no tendency for tickers to be
awoken by their own sleep-tics is interpreted as an indication that some instances of ticking are
“truly” subconscious and involuntary (e.g., Cavanna et al., 2017:100). Here it is suggested that
tics are the prepotent task under those circumstances because the body is otherwise non-
behaving. Tic tasks will also occupy a relatively high position in the overall action hierarchy
during free (unsuppressed) ticking in a waking state while otherwise “doing nothing.” One of the
few studies aimed at determining whether there exists any systematicity to tic timing recorded
tickers sitting alone in a room and coded the temporal occurrence of tic events, finding that the
distribution of inter-tic interval durations followed an inverse power law scaling; analyses of
inter-tic interval duration timeseries showed that the timing of tic occurrence was non-random
(Peterson & Leckman, 1998). An animal model that uses focal striatal disinhibition to induce tic-
like stereotypies in freely behaving mice has addressed the topic of tic timing (Bronfeld, Yael,
Belelovsky, & Bar-Gad, 2013; McCairn, Bronfeld, Belelovsky, & Bar-Gad, 2009a, 2009b).
Using micro-stimulation of motor cortex before and after striatal disinhibition, Israelashvili and
Bar-Gad (2015) examined the factors underlying the timing of individual instances of ticking,
finding that the precise time of the tic expression depends on the interaction between the
summation of incoming excitatory inputs to the striatum and the timing of the previous tic.
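The kind of inter-tic interval distribution analysis described above can be sketched computationally. The following is an illustrative reconstruction, not the published procedure of Peterson and Leckman (1998): the synthetic tic times, the survival-function fit, and the function name are all my assumptions.

```python
import numpy as np

def interval_loglog_slope(tic_times):
    """Estimate a scaling exponent for inter-tic interval durations.

    Fits a line to the log-log empirical survival function of the
    intervals; an approximately linear log-log survival curve is
    consistent with power-law (rather than Poisson-like) tic timing.
    """
    intervals = np.diff(np.sort(np.asarray(tic_times, dtype=float)))
    intervals = intervals[intervals > 0]
    x = np.sort(intervals)
    # Empirical survival function P(interval > x)
    surv = 1.0 - np.arange(1, len(x) + 1) / len(x)
    keep = surv > 0                       # drop the final point (log of 0)
    slope, _ = np.polyfit(np.log(x[keep]), np.log(surv[keep]), 1)
    return slope

# Illustrative data: heavy-tailed (Pareto) intervals vs. exponential ones
rng = np.random.default_rng(0)
heavy_tailed_times = np.cumsum(rng.pareto(1.5, 500) + 0.1)
poisson_like_times = np.cumsum(rng.exponential(1.0, 500))
print(interval_loglog_slope(heavy_tailed_times))  # both slopes are negative
print(interval_loglog_slope(poisson_like_times))
```

A fuller analysis would of course inspect the whole timeseries (as the cited study did), but the sketch shows the basic operationalization of "non-random inter-tic timing."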
The picture of control over ticking is complicated by willed self-suppression of tics, thereby
introducing urge dynamics to consideration. Experiencing premonitory urges can instigate in
tickers a sense of agency with respect to their subsequent tic response, but the temporal
occurrence of urges cannot be anticipated, and tics will eventually “burst out.” Whatever patterns
of ticking there may be, the timing of tics is also subject to affordances and effectivities of the
current circumstance. Specifically, situational over/under-stimulation, frustration, lack of
sleep/fatigue, and anxiety have all been linked to acute increases in tic frequency (for an
excellent review on this topic, see Godar & Bortolato, 2017).
All of these facts have earned ticking a unique position in the
voluntary/involuntary dichotomy (or scale), meriting the labeling of tics as “semi-voluntary”
(Jankovic, 1997; Jankovic & Kurlan, 2011). The ecological-dynamical perspective reframes
categorical labels in terms of instances of control; a task can in one instance occupy the leading
level guiding action and in another, act to serve its master. To provide a concrete example of the
preceding, consider a specific set of circumstances. Simplifying, a ticker has put herself in an
external environment that limits her candidate goals moving forward to either continued
participation in an experiment, on the one hand, or exiting the room and abandoning the study,
on the other. The latter goal is afforded by the presence of the door and the participant has
effectivities that can get her there (i.e., ability to walk to the door); however, the intention to
leave mid-experiment does not arise. This can be explained if a larger set of effectivities are
considered: active interest in participating for whatever reason, an open schedule (i.e., boredom),
wanting to do a favor for a friend. Effectivities to leave could end up taking over if, say, the fire
alarm in the building goes off. In the scenario just described, a talker who tics and is willing to
follow the experimenter’s instructions to talk while ticking freely will participate. In theory,
the instructions and the experimental situation are enough to instigate construal of a co-speech
ticking task that enlists ticking and speaking to achieve its aims.
1.4 Preview of dissertation
Tickers in these studies performed speaking tasks while ticking freely, a set of circumstances
that can drive the tasks of ticking and speaking to interact. In theory, adults who tic will have
developed coping strategies aimed at addressing issues posed by ticking freely. The studies in
this dissertation investigate hypothesized avenues through which optimization of co-speech
ticking could manifest. First, the distribution of vocal tics is examined in order to determine
whether tic urge satisfaction can “cooperate” with ongoing speech planning and production
processes with respect to the timing of their occurrence. Ticking shows sensitivity to prosodic
structure by preferentially occurring at the margins of prosodic phrases, rather than intruding
within them. Second, stressed vowel phonatory acoustics in verbal tics are compared with those of
speech to determine whether the two vocalization types are reliably distinguished along acoustic
parameters that index the distinction between modal and falsetto voice. Tics and speech are
produced on distinct acoustic channels, both in global terms and in the sense that there are
“jumps” from modal to falsetto register (and back) across speech+tic (and tic+speech) junctures.
Finally, low token-to-token variability in duration of a specific individual’s most-frequent verbal
tic (“biscuit”) suggests that verbal tics are excluded from linguistic phrasal groupings. In speech,
words are grouped into phrases in service of an intended message; tics should be excluded by
default because they contribute unintended semantics.
The first experiment (Chapter 2) demonstrates that the temporal occurrence of vocal tics is
sensitive to the timing of linguistic events (i.e., beginnings and ends of prosodic phrases),
providing support for the hypothesis that verbal tics are orchestrated with respect to prosodic
phrases by an over-arching task that coordinates the two in time. Central to this interpretation is
the view that phrases are combinatorial action units encoding message (i.e., composed meaning)
goals. Assuming that which action entity (what granularity) serves as prepotent speech task at
any given moment is determined by the relationship between the affordances and effectivities at
play in the current circumstance, it is theorized that coordinated co-speech ticking involves
orchestrating tics with respect to running speech (i.e., sequences of phrases) rather than
individual words (or other linguistic units). This is one way that the tic and speech systems
interact cooperatively to efficient ends.
Experiment 2 (Chapter 3) presents evidence that phonation in verbal tics is underpinned by
a falsetto mode of laryngeal vibration, in contrast to typical phonation in (English) speech. It was
hypothesized that anticipation of potential problems (e.g., correctness) prompts the over-arching
communication task to relegate verbal tics to a separate falsetto acoustic
channel as a means of segregating intended dialogic meaning from unintended intrusive
meaning. Even with optimized coordination in time between verbal tics and prosodic phrases, the
former can imbue utterances with unintended semantic and/or referential meaning.
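The acoustic-channel claim can be illustrated with a toy classifier. This is a sketch under loud assumptions: the 250 Hz cutoff, the function names, and the f0 values are placeholders of mine, not thresholds or measurements from the dissertation.

```python
# Illustrative sketch: classify each vocalization's register from its mean f0
# and count register "jumps" at tic/speech junctures. The 250 Hz cutoff is an
# arbitrary placeholder, not a value from the dissertation.

FALSETTO_CUTOFF_HZ = 250.0

def register(mean_f0):
    return "falsetto" if mean_f0 >= FALSETTO_CUTOFF_HZ else "modal"

def count_register_jumps(vocalizations):
    """vocalizations: list of (vocalization_type, mean_f0) in temporal order."""
    jumps = 0
    for (t1, f1), (t2, f2) in zip(vocalizations, vocalizations[1:]):
        if t1 != t2 and register(f1) != register(f2):
            jumps += 1  # a speech+tic (or tic+speech) juncture with a register change
    return jumps

utterance = [("Word", 190), ("Word", 205), ("Tic", 340), ("Word", 198)]
print(count_register_jumps(utterance))  # 2: into and out of the falsetto tic
```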
Finally, Experiment 3 (Chapter 4) shows that the acoustic duration of verbal tic “biscuit”
does not vary as a function of proximity to phrase boundaries. In theory, constrictions for a
verbal tic aren’t subject to modulation by prosodic gestures that instantiate phrase boundaries in
speech because the coupling between prosodic gestures and the constriction gestures they
modulate is established by the selecting force of communicative intent—by the phrase-level
message goal. Lexical items can contribute to conveying intended messages if they are selected
to do so, but instances of verbal ticking are cases where a lexical item does not function to
communicate. While “biscuit” tics weren’t impacted by phrase boundaries in a speech-like way,
“biscuit” tics that were separated from speech (on either side) by pauses were of significantly
shorter duration relative to “biscuit” tics that were close to speech, suggesting that tic and speech
tasks are blending as part of the systems’ interaction.
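A group comparison like the “biscuit” duration result can be sketched with a hand-rolled Welch t statistic; the durations below are hypothetical and the function is illustrative, not the dissertation’s analysis.

```python
import math

def welch_t(a, b):
    """Welch's t statistic for two independent samples (no external packages)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Hypothetical "biscuit" durations in seconds (not the dissertation's data):
pause_adjacent  = [0.31, 0.29, 0.33, 0.30, 0.28]   # separated from speech by pauses
speech_adjacent = [0.38, 0.41, 0.37, 0.40, 0.39]   # close to speech
t = welch_t(pause_adjacent, speech_adjacent)
print(round(t, 2))  # large negative t: pause-adjacent tics are shorter here
```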
Chapter 2. The actions of ticking and speaking are coordinated in time
2.1. Introduction
Ticking and speaking can come into conflict if the two behaviors are not strategically co-
orchestrated. The utterance in Figure 1 includes four instances of a “clicking” vocal noise tic.
The utterance gives the appearance of ticking that took precedence over speaking, wresting
control over vocal effectors while the talker was mid-word. Perhaps the urge to produce these
tics had been accumulating for some time prior to the eventual outburst illustrated in the figure;
perhaps the urge appeared suddenly but was very strong. It is also possible that a word-break
occurred and vocal noise tics took advantage of the opening. Though causality cannot be
attributed with confidence, the observed outcome amounts to apparent tic-related disfluency. As
is the case with disfluencies in general, this outcome is dis-preferred.
Figure 1. Spectrogram and transcript of verbal tics and read speech. Pitch track overlaid in cyan.
Green text and shading represent the word that hosted the phrase-final boundary tone. Red text and shading
indicate vocal noise tics. The tics followed a first attempt at the word day, which resulted in a break that
was subsequently corrected.
Adaptive action in the circumstances under study—speech production in a free-tic state—
requires the actions of ticking and speaking to be systematically coordinated in time. The
example depicted in Figure 1 is taken from a passage reading; the word day is a sentence-final
word. The outcome observed in this instance is sub-optimal because disfluencies were present
(i.e., word break and repetition). If tic production is “blind” to ongoing speech planning and
production processes, there is nothing to prevent such outcomes from occurring; this point is
revisited in the following section. A preferred outcome could be for production of the four
consecutive vocal noise tics to occur after the end of the word “day”. More generally, if it is
possible to avoid self-interruptions by coordinating the temporal occurrence of tics with speech
planning and production in some way, then it is preferable to do so. Coordinated ticking and
speaking serves an integrated and relatively higher-level “co-speech ticking” goal. As such, it is
a local optimization strategy that may be available if cognitive resources are diverted from self-
suppression of tics to adaptive coordination of ticking and speaking. This experiment
investigates whether ticking and speaking do (main hypothesis) or do not (null hypothesis)
interact in a way that supports both tic and speech tasks being optimally achieved during
naturalistic co-speech free ticking. Specifically, the prediction is tested that optimized tic
occurrence will be associated with moments in time that find the vocal tic outside the bounds of
neighboring prosodic phrases.
Ticking around prosodic phrases is envisioned as a coping mechanism: ticking around one’s
speech is preferable to “blind” ticking that potentially intrudes on and interferes with ongoing
speech plans. Researchers who study the neurobiological underpinnings of Tourette’s have
speculated that compensatory strategies aimed at mitigating the possible negative impacts of
ticking must emerge at various levels of description (e.g., Brandt et al., 2014; Leckman,
Vaccarino, Kalanithi, & Rothenberger, 2006; Pogorelov, Xu, Smith, Buchanan, & Pittenger,
2015), but concrete proposals on this topic are lacking.
One illuminating study speaks to the larger question of compensatory strategies in Tourette’s.
Schüller et al. (2018) found that adolescents and adults who tic performed better than
neurotypical age-matched controls in task-set switching. In task-set switching studies,
participants are asked to perform some operation using one algorithm, then told to switch to a
different algorithm in order to see the costs of switching cognitive modes. In the Wisconsin
Card Sorting Task (e.g., Lange, Seer, Müller-Vahl, & Kopp, 2017), for example, participants switch
between sorting cards by suit and sorting cards by color. Switches typically cause an increase in
likelihood of errors in the first post-switch trial, as well as a slowdown in reaction time in that
first trial. The Schüller et al. (2018) study showed that these costs are somehow mitigated in
adults with Tourette’s. Furthermore, structural brain scans showed anatomical differences
between Tourette’s and control groups within regions implicated in cognitive control (Schüller et
al., 2018). The authors interpreted these findings as indicating that neuroplastic compensatory
changes have occurred over the course of disease development. Similarly, it is proposed here that
adaptive prosodic grouping that sequences tics and prosodic phrases relative to each other should
be observed in adult co-speech ticking.
Adaptive co-orchestration of ticking and speaking is theorized to be a manifestation of an
over-arching task/goal that acts to coordinate tics and speech actions in time, but why are
prosodic phrases the relevant speech task level? Why not units corresponding to other levels of
the prosodic hierarchy? Or other kinds of linguistic structure altogether, like words? In short,
prosodic phrase tasks/goals are the focus here because they are envisioned as the prepotent
(leading level) communicative action entity at play in the circumstances under study. Relatively
lower tasks in the current action hierarchy (e.g., gestures, syllables) only serve the prepotent
action and cannot be considered independent actions with independent goals; words making up a
composed message are not in and of themselves the intended message.5
Byrd and Krivokapić (2021:45) noted that “[…] prosody must be modeled with an eye to its
social context.” It is in the context of communicative function in a particular circumstance that
prosodic phrases are distinguished from other levels of the prosodic hierarchy. The definition of
biological function adopted here is taken from Garson’s selected effects theory (Garson, 2016,
2019), which states that an extant trait’s function is whatever the trait does that contributed to its
differential retention in a population. Traits are retained in populations of animals across
evolutionary time-scales if they contribute to the evolutionary fitness of individuals with those
traits (over individuals lacking them). Zebra stripes can be said to function to repel tse-tse flies
because extant zebras’ being striped is a result of stripes contributing to their ancestors’ chances
of survival. Ontologically, an individual’s traits at different levels of description are differentially
retained to the extent that they are associated with reward. For instance, Garson (2019) notes that
synaptic connections in the brain are retained or pruned based on how much or how often these
connections are selected (i.e., used) to obtain (rewarding) effects. In the domain of behavioral
traits, the ability to convey messages (composed of more than one lexical concept) is a trait of
prosodic phrases, not of gestures, syllables, feet, or individual lexical items. That adults plan and
produce speech in phrases (as opposed to word-by-word) is presumably a consequence of
phrasemaking having better served communication in most circumstances throughout their
lifespan, relative to sequential transmission of individual lexical concepts. In other words,
prosodic phrases have acquired the function of communicating composed messages. In contrast,
structures that occupy lower levels of the prosodic hierarchy, like the foot, cannot on their own
convey composed meaning—they can only do so if it so happens that the intended message can
be expressed in a single-foot utterance. Prosodic feet, or perhaps more appropriately, the act of
syllable footing, have acquired a function that serves a higher-level phrasal task/goal. In the view
advanced here, because syllable footing is a background correction to phrase-making, tics
aren’t expected to preferentially occur at foot boundaries.
5. Which is not to say that lower-level tasks cannot themselves occupy the prepotent level in another
instance under different circumstances.
To summarize, it is argued that prosodic phrases, specifically, function to communicate
linguistic messages under most circumstances and that for this reason, optimal tic/speech
coordination should occur at the level of the prosodic phrase. It is taken for granted that the
presence of an intonational contour characteristic of an intonational phrase in acoustic recordings
of speech indicates that the talker intended to communicate whatever (composed) linguistic
message was associated with that contour (for languages that use intonational prosody). This is
the same as saying that there was a discrete message goal which a phrasal task is construed to
achieve, which is to say that the entire phrase is one communicative (prepotent) action. In the
circumstances under study, skilled phrasemaking is the effectivity that rises to meet the demands
of communicative goals.
From the perspective of tic urge satisfaction and given that tics must occur in relative order
with some speech unit, if tic occurrence were “blind” to ongoing speech planning and production
it might “prefer” to occur as often as between every gesture or syllable or any other relatively
lower (shorter timescale) action entity because that would enable more frequent tic urge
satisfaction. But because the message conveyed by a prosodic phrase unfolds as the articulatory
action unfolds and is not contained within any single component of the larger action sequence,
individual gestures or words are not intended to be interpreted on their own (Gibson, 1979;
Latash et al., 1996; Profeta & Turvey, 2018; Shaw, 2001). Thus, tics occurring inside phrases are
instances of failed (or non-optimal) phrase production.
2.1.1. Hypotheses
According to ecological-dynamical assumptions, tic actions should be ordered relative to the
prepotent communicative speech task level. The prepotent communicative action entities at play
in this experiment were reading, describing pictures, and narrating a personal story. These
communicative circumstances do not afford production of individual words but rather production
of phrases that convey “whole” messages (i.e., composed meaning). It is of course possible for
word production to be the prepotent speech task at play when a ticker is naming pictures or
reading single words presented visually, for instance; word production is the only available
prepotent speech task during the word-learning stage of linguistic development. Arguably,
however, even single-word responses by adults are prosodified into single-word phrases. Which
is to say that adult talkers speak in phrases, not single words. The communicative goals of the
speakers in this study are thus best served by coordinating ticking with prosodic phrase units,
specifically intonational phrases because they are speakers of English. This prediction can be
verified empirically by determining whether tics occur in positions outside intonational phrases
more often than predicted by chance. But how is chance defined? The approach taken here is to
take stock of the “positions” available for tic occurrence and their status with respect to prosodic
phrase structure.
Figure 2 presents a schematic of five-word phrases, each with a single tic occurring in one of the
word-adjacent positions.6
Positions can be counted from left (earliest) to right (latest) starting
with the left-adjacent position before the first word and ending with the right-adjacent position
after the last word (indicated by B because this word hosts a phrase-final boundary tone). Four
out of the six available positions are inherently interruptive because they are within the bounds
of the phrase; only the two positions on the margins are inherently cooperative. An n-word
phrase offers n − 1 interruptive positions but always exactly two cooperative ones. As a
consequence, outside of phrases containing one to three words, interruptive positions will
always outnumber cooperative ones, meaning that chance predicts relatively frequent phrase-level
interruptions by tics, at least for phrases of four or more words.
__ W T W __ W __ W __ B __ Tic interrupts
__ W __ W T W __ W __ B __
__ W __ W __ W T W __ B __
__ W __ W __ W __ W T B __
__ W __ W __ W __ W __ B T Tic cooperates
T W __ W __ W __ W __ B __
Figure 2. Schematic representation of ways one tic can interrupt (red shading) and cooperate (blue
shading) with the production of a five-word intonational phrase (IP). Each row represents a single
Intonational Phrase (IP). W = word; T = tic; B = word hosting phrase-final boundary tone. Underscores
indicate potential tic hosting sites. The left-most W in each chunk is phrase-initial. Red T tic events have
occurred in interruptive positions; blue T tic events have occurred in cooperative positions.
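The chance model implied by Figure 2 can be made explicit in a few lines of code. This is a sketch: the function names are mine, and the equal-probability assumption over word-adjacent slots follows the text’s definition of chance.

```python
def tic_positions(n_words):
    """Word-adjacent positions available to a single tic in an n-word IP.

    Interruptive positions lie strictly inside the phrase (between words);
    the two cooperative positions flank the phrase's edges.
    """
    interruptive = n_words - 1
    cooperative = 2
    return interruptive, cooperative

def chance_p_cooperative(n_words):
    """Probability a 'blind' tic lands in a cooperative slot, all slots equal."""
    i, c = tic_positions(n_words)
    return c / (i + c)

for n in (1, 3, 5, 10):
    print(n, tic_positions(n), round(chance_p_cooperative(n), 2))
# e.g., the five-word phrase of Figure 2 gives (4, 2) and p = 0.33
```

Under this model the baseline probability of a cooperative tic shrinks as phrases get longer, which is exactly why frequent cooperative placements would be evidence against "blind" ticking.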
To summarize the foregoing discussion, the prediction that cooperative tic/speech interaction
can manifest in the relative ordering of vocal tics and prosodic phrases is a claim about the
primitive “message” units of spoken communication in the circumstances under study—to tic
around speech, it is enough to tic around phrases and avoid interrupting phrases.
6. Note that quantification of available interruptive positions inside phrases could have been done on the
basis of syllables, feet, etc.; here positions available for ticking are defined as word-adjacent positions.
The number of interruptive positions is thus dependent on this choice. The number of cooperative
positions per phrase, in contrast, remains the same regardless of how the interior of the phrase is broken
up.
The null hypothesis that tic occurrence ensues independently from ongoing speech planning
and production processes predicts that co-speech ticking will result in frequent phrase-level
interruptions or intrusions simply because there is more available “space” inside phrases than
outside them. Finding that tics tend to occur outside of phrases would indicate that ticking is
sensitive to prosody, suggesting that ticking and speaking are co-planned.
2.2. Methods
The hypothesis that tics occurring close to speech (in time) are sensitive to the (intonational)
phrasing of utterances around them is tested by subjecting a corpus of acoustic recordings of
adult tickers performing a variety of speech tasks while ticking freely to a linguistically
informed distributional analysis. Procedures and protocols are as follows.
2.2.1. Case studies
Data was collected from three female adults and one male. Their ages fell within the 22-65
range. All are speakers of British English. Participants received a Tourette’s diagnosis from a
neurologist and reported no voice, speech or hearing disorders. A patient advocacy group
familiar with the Tourette’s community in the United Kingdom, called Tourette’s Hero,
facilitated recruitment of appropriate participant volunteers. Individuals whose daily ticking
patterns could be characterized by those who knew them as including “very frequent vocal and
verbal tics” were contacted by the organization and directed to the researcher’s IRB-approved
recruitment materials. A condition of participation was a willingness to tic freely throughout the
study, that is, to refrain from suppressing one’s own tics. Participants were asked to start ticking
freely immediately upon arrival.
2.2.2. Recordings
Audio recording took place in a small, quiet conference room with mild sound attenuation.
The room was equipped with a built-in system for audio-visual presentations that was used to
present visual materials to the participants, who sat at the room’s table facing the screen. The
study was modelled as a sociolinguistics-style interview with the investigator present in the room
(e.g., Eckert & Labov, 2017). Verbal prompts by the investigator elicited performance of four
different speech tasks within each of three task types (see Table 1). Speech tasks
were blocked so that each block included a task from each type (Readings, Pictures, Narratives).
The order of tasks within each block was randomized but block order was fixed across
participants. Participants were outfitted with a Shure head-worn microphone adjusted to be about
two inches from the corner of the mouth immediately after providing consent. The acoustic
signal was routed through an M Audio USB hub to a personal laptop running Audacity software
(Audacity Team, 2017) which performed the recording.
An automatic speech-to-text service, Otter.ai, generated transcripts for each task recording.
These were subsequently manually corrected. Vocal tic noises were transcribed as “NS” during
manual correction following conventions used by the Penn Phonetics Lab Forced Aligner (P2FA;
Yuan and Liberman 2008). P2FA was fed corrected transcripts and trimmed audio recordings for
each task to automatically generate phone and word level segmentation in Praat (Boersma & Van
Heuven, 2001). This segmentation was corrected manually according to standard phonetic
conventions. Tics and true words are both represented on the phone and word tiers of Praat text
grids. Thus, tic and word intervals were labelled according to their Vocalization Type (Tic vs.
Word) on a separate tier.
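The Vocalization Type coding just described reduces to a simple mapping over word-tier intervals. A sketch, assuming intervals have already been extracted from the TextGrid as (label, start, end) tuples; the function names are mine.

```python
# Sketch of the Tic/Word coding step. Intervals are assumed to arrive as
# (label, start_s, end_s) tuples already extracted from the Praat word tier;
# "NS" is the P2FA noise label used for vocal tics in the transcripts.

def vocalization_type(label):
    return "Tic" if label == "NS" else "Word"

def code_tier(word_tier):
    """Append a Vocalization Type field to each word-tier interval."""
    return [(label, start, end, vocalization_type(label))
            for (label, start, end) in word_tier]

tier = [("the", 0.00, 0.12), ("NS", 0.12, 0.31), ("rainbow", 0.31, 0.80)]
for row in code_tier(tier):
    print(row)
```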
Table 1. Monadic speech tasks and the verbal prompts that elicited them.

Task Type: Personal Narrative
Prompt: “Tell me about a time when you felt extreme ______.”
Task Names: Joy narrative; Embarrassed narrative; Proud narrative; Sad narrative

Task Type: Picture Description
Prompt: “Describe the picture you will see on the screen in as much detail as possible.”
Task Names: Pool party scene; Park scene; Beach scene; Animal house scene

Task Type: Passage Reading
Prompt: “Read the passage that will appear on the screen.”
Task Names: Rainbow passage; Grandfather passage; Northwind passage; Comma gets a cure passage
2.2.3. Segmentation and labeling
Long recordings comprising more than one speech task were trimmed. The start of a speech
task was defined as the moment at which the investigator finished uttering the verbal prompt
corresponding to its Task Type. The end of a task was defined as the end of the participant’s
statement indicating completion of the task. Participants reliably produced these statements
unprompted. Any vocal tics occurring within the bounds of a task so defined constitute the set of
all co-speech tics for that task, but not all co-speech tics occurred at temporal distances close
enough to speech to examine prosodic sensitivity.
Ultimately, protocols for segmentation and labeling are designed to extract counts of
interruptive/cooperative positions that did and did not contain tics for chi-square analysis. In
addition to phone and word level segmentation, the method of obtaining these counts requires
two further levels of segmentation: utterance “chunks” lacking internal pauses and the
intonational phrases within their bounds.
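The chi-square computation over such counts needs no statistics package. A sketch with hypothetical counts (the numbers are illustrative, not the study’s data):

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table
    (rows: position type; columns: tic present / tic absent)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical counts (not the dissertation's data):
#                 tic occurred   no tic
# cooperative          40          160
# interruptive         15          385
print(round(chi_square_2x2([(40, 160), (15, 385)]), 2))
```

A statistic this large against one degree of freedom would indicate that tics land in cooperative positions far more often than the position counts alone predict.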
Segmentation of pauses and utterance chunks
To examine the extent to which tic responses show sensitivity to grouping of nearby words
into intonational phrases, vocal tics that occurred during pauses must be excluded because they
don’t have words in their temporal vicinity. Long intervals of acoustic silence are identified in
order to break up each speech task recording into utterance “chunks”—intervals of speech that
contain no pauses internally. Pauses are defined here as silent intervals lasting at least 250
milliseconds.7
They were automatically labelled via Praat scripts developed for the purpose of
locating speech pauses in English and Dutch acoustic recordings (De Jong, Pacilly, & Heeren,
2021). Automatic pause segmentation subsequently underwent manual correction. Different
possible pause categories recognized in the literature are not distinguished.
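The pause-based chunking just described can be sketched as a single pass over time-ordered intervals. The De Jong et al. Praat scripts are not reproduced here; the interval format and function name are my assumptions.

```python
PAUSE_MIN_S = 0.250  # silences at least this long count as pauses

def chunk_intervals(intervals):
    """Group (label, start, end) vocalization intervals into pause-free chunks.

    A new chunk begins whenever the silent gap since the previous interval
    is at least PAUSE_MIN_S; shorter silences are treated as articulatory
    "gaps" and do not break the chunk.
    """
    chunks, current = [], []
    prev_end = None
    for label, start, end in intervals:
        if prev_end is not None and start - prev_end >= PAUSE_MIN_S:
            chunks.append(current)
            current = []
        current.append((label, start, end))
        prev_end = end
    if current:
        chunks.append(current)
    return chunks

ivs = [("so", 0.0, 0.2), ("NS", 0.25, 0.4),      # 50 ms gap: same chunk
       ("funking", 0.9, 1.3),                    # 500 ms pause: new chunk
       ("anyway", 1.6, 2.0)]                     # 300 ms pause: new chunk
print([len(c) for c in chunk_intervals(ivs)])    # [2, 1, 1]
```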
7. Any intervals of acoustic silence occurring within the bounds of a chunk are conceptually discounted
following early work suggesting silences shorter than 250 milliseconds represent so-called articulatory
“gaps” (Goldman-Eisler, 1958). Though different durations to minimally characterize a pause have been
proposed, the definition employed here is used in research and clinical settings; for example, counts and
locations of pauses are used as a metric for fluency in an L2 (De Jong, Groenhout, Schoonen, & Hulstijn,
2015). Note that both filled pauses (um, uh) and unfilled pauses are included in the broader pause
category by the researchers that developed the Praat scripts used in the present study. However, for the
purposes of this study, filled pauses are like true words in that their production precludes simultaneous
production of vocal tics. Therefore, filled pauses are considered words in the present analysis, forming
part of speech-only or mixed chunks. The question of whether filled pauses um/uh are word-like in terms
of their planning and/or communicative function (e.g., Clark & Fox Tree, 2002; Rose, 1998) is a separate
issue and no position is taken with respect to that here.
Chunks are classified into one of three types defined in terms of the vocalizations that they
contain. Tic-only chunks are comprised entirely of vocal tics, speech-only chunks contain only
true words, and mixed chunks are composed of some combination of vocal tics and true words.
Tics occurring in mixed chunks are coded as speech-proximal in contrast to tics forming part of
tic-only chunks which are coded as distal. Only counts extracted from chunks containing speech
(i.e., speech-only and mixed chunks) are subjected to analysis; tic-only chunks and the distal tics
they contain are excluded.
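The chunk typology and the proximal/distal coding can be sketched directly from the definitions above; the data and function names are illustrative.

```python
def classify_chunk(chunk):
    """chunk: list of (label, vocalization_type) pairs; types are 'Tic' or 'Word'."""
    types = {vtype for _, vtype in chunk}
    if types == {"Tic"}:
        return "tic-only"
    if types == {"Word"}:
        return "speech-only"
    return "mixed"

def code_tic_proximity(chunks):
    """Tics in mixed chunks are speech-proximal; tics in tic-only chunks are distal."""
    coded = []
    for chunk in chunks:
        kind = classify_chunk(chunk)
        for label, vtype in chunk:
            if vtype == "Tic":
                coded.append((label, "distal" if kind == "tic-only" else "proximal"))
    return coded

chunks = [[("so", "Word"), ("biscuit", "Tic")],   # mixed
          [("funking", "Tic")],                   # tic-only
          [("anyway", "Word")]]                   # speech-only
print(code_tic_proximity(chunks))  # [('biscuit', 'proximal'), ('funking', 'distal')]
```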
For reference, Figure 3 depicts an example of pause segmentation that generated two speech-
only chunks (leftmost and rightmost blue rectangles) and one tic-only chunk (center blue
rectangle). The tic-only chunk is made up of a single tic, the distal tic funking.8
Figure 3. Spectrogram and transcript of utterance produced by Participant B during a personal
narrative. Pitch track overlaid in cyan. Three chunks (outlined in blue) result from segmentation of two
pauses lasting 499 (left) and 289 (right) milliseconds. The interval corresponding to the distal tic funking
is shaded in blue. Phrase-final words and their corresponding acoustic intervals are in green.
Intonational phrase segmentation and interruptive/cooperative positions
Counts submitted to chi-square analysis are of positions available for tic occurrence in
chunks containing speech. Whether a position is inherently interruptive or cooperative depends
on its location relative to intonational phrase (IP) boundaries in its chunk. Chunks in the
collected samples have an unpredictable number (and size) of intonational phrases, so IP edges
must be located empirically. This task was operationalized in terms of identifying chunk-internal
words that hosted phrase-final boundary tones. These words serve as proxies for the phrase-final
boundary itself.
8. Distal tics are inherently cooperative as pausing provides a buffer which precludes interference with
surrounding IPs. Distal ticking is therefore indicative of some sort of prosodic sensitivity.
IP-final words in varieties of British and American English often host tones that mark the
phrase’s final boundary, commonly known as IP-final boundary tones. It is assumed here that
British English speakers may produce three kinds of phrase-final boundary tones: final falls,
continuation rises, and tonal plateaus. The three-tone boundary system arises in the context of the
Intonational Variation in English (IViE) corpus designed to document and characterize
intonation across varieties of British English; researchers modified existing ToBI (mono-
dialectal) conventions for English to enable analysis of prosodic features across a variety of
Englishes (Grabe, 2000). While IP-final falls and continuation rises are well-established
constructs in the language and speech sciences, it is worth explaining the tonal plateau boundary
tone category further.9
Phrase-final tonal plateaus in British English varieties occur variably in
the place of more “canonical” continuation rises and are described as truncated continuation rises
(Grabe, Post, Nolan, & Farrar, 2000). Visually, these tonal events appear as a continuation of the
phrase’s pitch accent till the phrase-final edge. Phrase-final boundary tones were reliably
produced by all participants in this study. Figure 4 shows examples. The Praat analysis windows
displaying acoustic data were always sized to depict 6.5-7 seconds of material before
identification and labelling of boundary tones was performed to maintain consistency in visual
inspection.
[9] The debate about whether or not English tonal plateaus finalize an intermediate phrase as opposed to a
“full-fledged” IP is immaterial to the discussion here because, whatever the case may be, the prosodic unit
in question meets the necessary criteria of simultaneously transmitting a complete message and being of
relatively short duration.
Figure 4. Spectrogram and transcript of utterances demonstrating phrase-final boundary tone
categories for varieties of British English in the IViE transcription system that were used to delimit
intonational phrases inside chunks containing speech. Pitch track overlaid in cyan. Blue shading over
verbal tic. Green shading over words hosting a boundary tone. Top panel: continuation rise on word light
and final fall on word colors. Bottom panel: two instances of tonal plateaus on children and happy.
Phrase-initial edges were not coded explicitly because their precise location is not needed to
obtain necessary counts (details below). However, the presence and location of phrase-initial
boundaries in a chunk can be inferred on the basis of phrase-final boundaries. Specifically, any
true word that immediately follows an IP-final word inside a chunk is necessarily an IP-initial
word; the first word in a chunk is also necessarily IP-initial.
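This inference rule can be sketched as a short function (an illustrative Python re-implementation; the study's segmentation was performed manually in Praat and with custom R/rPraat scripts, so the one-boolean-per-word representation here is an assumption made for exposition):

```python
def ip_initial_indices(hosts_boundary):
    """Infer which words in a chunk are IP-initial from boundary-tone flags.

    hosts_boundary: one bool per true word in the chunk, True where the
    word hosts an IP-final boundary tone. The chunk's first word is
    IP-initial by default; any word immediately following a
    boundary-hosting word is also IP-initial.
    """
    initial = [0]  # the first word in a chunk is necessarily IP-initial
    for i, is_final in enumerate(hosts_boundary[:-1]):
        if is_final:
            initial.append(i + 1)  # the word right after an IP-final word
    return initial
```

For example, `ip_initial_indices([False, False, True, False, True])` returns `[0, 3]`: the chunk-initial word and the word following the first boundary host; the chunk-final boundary host contributes no further IP-initial word inside the chunk.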
Once the IP structure is determined for each chunk, positions can be evaluated for their status
as interruptive/cooperative and for whether a tic did or did not occur there. Speech-proximal tics
are coded as interruptive or cooperative on the basis of their occurrence inside or outside IPs in
their chunk, respectively. This step is performed manually in Praat. Tics forming part of a
sequence of consecutive tics inherit the coding of the first tic in the sequence. Thus, if a tic event
interrupted an IP, any intervening tics between it and the resumption of true speech are also
counted as interruptive; likewise for cooperative sequences.
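A minimal sketch of this inheritance rule (illustrative Python; the actual coding was done manually in Praat, and the event-dictionary representation is hypothetical):

```python
def inherit_tic_coding(events):
    """Propagate interruptive/cooperative coding through runs of tics.

    events: list of dicts, {"type": "word"} for true words or
    {"type": "tic", "coding": ...} for tics, where only the first tic of a
    consecutive run is pre-coded ("interruptive"/"cooperative"; None else).
    Every tic in a run inherits the coding of the run-initial tic.
    """
    current = None
    for ev in events:
        if ev["type"] == "tic":
            if ev.get("coding") is not None:
                current = ev["coding"]   # run-initial tic sets the coding
            else:
                ev["coding"] = current   # intervening tic inherits it
        else:
            current = None               # true speech resumes; the run ends
    return events
```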
To illustrate how segmentation of chunks into IPs enables evaluation of the
interruptive/cooperative status of instances of vocal ticking, examples of cooperative tic-speech
interaction are depicted in Figure 5. A sentence found in The Rainbow Passage is pictured, read
by two different participants: “Some have accepted it as a miracle without physical explanation.”
Participants uttered this sentence using matching IP structure: a single mixed utterance chunk
(Figure 5) made up of two IPs, the first ending in the word miracle and the second ending in the
word explanation. Participants marked phrase-final boundaries in these chunks differently:
Participant C’s phrase-final boundaries were identified as the continuation rise hosted by miracle
and the final fall hosted by explanation, while Participant A produced final falls in both cases.
Two tics occurred for both participants. Participant A read the sentence in its entirety before
producing the speech-proximal tic hello immediately following the phrase-final word
explanation. A pause follows, separating the mixed chunk from the subsequent tic phrase fuck a
duck. Participant C produced tics immediately following each of the two IP-final boundaries.
Both participants demonstrate ticking that is sensitive to prosodic structure, appearing to
interact cooperatively with speech. However, both of Participant C’s tics enter into the analysis
while only one of Participant A’s tics does.
Figure 5. Spectrogram portion of The Rainbow Passage read by Participants C (top) and A (bottom).
Transcripts above spectrogram in rough time alignment. Pitch tracks overlaid in cyan. Tic intervals are
shaded in blue and they are transcribed in blue. Phrase-final word intervals are shaded in green and
transcribed in green.
2.2.4. Statistical analysis
Planned statistical analyses aim to determine whether speech-proximal tics occurred outside
intonational phrases more often than expected by chance given the relative availability of
between-phrase positions that are inherently cooperative and within-phrase positions that are
inherently interruptive. To achieve this, variables Tic Presence (tic present vs. tic absent) and
Position Status (interruptive vs. cooperative) are cross tabulated.
Interruptive positions are positions within phrases, while cooperative positions are positions
outside of phrases. A vocal tic either did or did not occur in each position. Thus, every position
flanking true words is one of [+tic present, +interruptive], [+tic present, -interruptive], [-tic
present, +interruptive], or [-tic present, -interruptive]. An overall chi-square statistic calculated
for the resulting 2x2 cross-tabulation of position-counts tested the null hypothesis that the
occurrence of a tic in some position is not dependent on the cooperative status of that position.
Statistical analyses were carried out in the R coding environment (R Core Team, 2021) using the
statistics packages rstatix (Kassambara, 2020) and corrplot (Friendly, 2002; Murdoch & Chow,
1996). The package rPraat (Bořil & Skarnitzl, 2016) was used for text grid management in R.
The protocol for obtaining counts is described next. At this stage, speech-proximal tics have
been manually labeled as interruptive or cooperative instances of ticking. Labels and intervals
from Praat text grids were extracted using custom R scripts. Counts are obtained by formula for
each chunk using the following quantities: the number of true words in the chunk and the number
of IPs that those words are grouped into. Those quantities are also extracted directly from Praat
text grids in the R environment.
The number of words w in each chunk is obtained automatically using custom rPraat scripts.
For each chunk, the count of positions p in which tics could occur is equal to w + 1. For each
participant dataset, the total number of available positions is equal to the sum of all ps across
tasks.
Chunks containing speech are always associated with at least two cooperative positions—the
position immediately before the chunk-initial word and the position immediately following the
chunk-final word. Chunk-initial and chunk-final words represent transitions out of or into pauses
and so represent phrase edges by default. In addition, every chunk-internal word that hosts a
boundary tone b contributes one cooperative position; that is, the position that is right-adjacent to
it. In cases where the chunk-final true word does host a boundary tone, b is equal to the total
number of IPs. The number of cooperative positions in these chunks is therefore b + 1, where the
+1 is capturing the cooperative position that is left-adjacent to a chunk’s initial word. In the case
of mixed chunks whose final word does not host a boundary tone, b is not equal to the total
number of IPs because the quantity does not factor in the implicit boundary associated with
chunk final words. The number of cooperative positions in these chunks is instead given by b + 2
which accounts for both the pre-chunk and post-chunk cooperative positions.
In Figure 6, a schema is provided for the protocol for obtaining counts for chi-square
analysis. The chunk represented on the first row, for instance, contains five words and thus 5+1
total positions as indicated by six underscores. The chunk contains two boundary words, one of
which is the chunk-final word, making the number of cooperative positions 2+1 (positions
shaded in blue). In contrast, the chunk represented in the third row contains five words and six
total positions but only one instance of a phrase-final boundary tone; the chunk-final word does
not host a boundary tone. Accordingly, the number of cooperative positions out of the six total
positions is 1+2.
Total      Cooperative
Positions  Positions
    6          3       ___ W ___ W ___ B ___ W ___ B ___
    5          2       ___ W ___ W ___ W ___ B ___
    6          3       ___ W ___ B ___ W ___ W ___ W ___
    5          2       ___ W ___ W ___ W ___ W ___
    7          3       ___ W ___ W ___ B ___ W ___ W ___ B ___
Figure 6. Schema of procedure for counting positions. W represents words that do not host boundary
tones, B represents words that do. Underscores indicate possible positions for tic occurrence. Positions
that are inherently cooperative are highlighted in blue.
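The counting rules behind Figure 6 reduce to a small formula, sketched here in Python for illustration (the study computed these counts with custom rPraat scripts in R):

```python
def position_counts(n_words, b, final_word_hosts_boundary):
    """Count tic positions in a chunk containing speech.

    n_words: number of true words in the chunk (w)
    b: number of words hosting a phrase-final boundary tone
    final_word_hosts_boundary: whether the chunk-final word is one of them
    """
    total = n_words + 1  # positions flank every true word: p = w + 1
    if final_word_hosts_boundary:
        cooperative = b + 1  # +1 for the position before the chunk-initial word
    else:
        cooperative = b + 2  # +2 adds the implicit post-chunk position as well
    return total, cooperative, total - cooperative
```

For Figure 6's first and third rows, `position_counts(5, 2, True)` and `position_counts(5, 1, False)` both return `(6, 3, 3)`: six total positions, of which three are cooperative and three interruptive.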
The variable that is crossed with Position Status for chi-square analysis is Tic Presence.
This variable is equal to the count of interruptive/cooperative tics across chunks and across tasks
that were extracted in a preceding step of the method. These counts appear in the left and right
cells in the Tic Present row of Table 2, respectively. The number of Interruptive positions in
which tics did not occur is equal to the difference between the number of interruptive positions
and the number of interruptive tics. Likewise, the number of Cooperative positions in which tics
did not occur is equal to the difference between the number of cooperative positions and the
number of cooperative tics. These counts are submitted to a chi-square test of independence.
Separate chi-square tests were performed for each task type in the case of Participants B and C.
Table 2. Two-by-two contingency table of counts analyzed using chi-square test of independence.

              Interruptive Position                        Cooperative Position
Tic Present   Interruptive tic count                       Cooperative tic count
Tic Absent    Interr. position count – Interr. tic count   Coop. position count – Coop. tic count
The image in Figure 7 depicts a mixed chunk followed by a tic-only chunk separated by a
pause; the tic-only chunk is excluded from analysis. The mixed chunk contains fourteen true
words and fifteen total positions where tics could possibly occur. Two words host a phrase-final
boundary tone, one of which is the chunk-final word, making three inherently cooperative
positions available. Two verbal tics are present in the chunk, each of a different kind. One
speech-proximal tic “biscuit” (pictured in red) is located in an inherently interruptive position
inside the bounds of an IP. Speech-proximal tic sausage (pictured in blue), on the other hand,
does not occur in an interruptive position but rather immediately following an IP-final boundary; it
has occurred in a cooperative position. (The tic would have also interacted cooperatively with the
IP if it had occurred immediately following gold.)
Figure 7. Spectrogram of portion of The Rainbow Passage. Transcript of utterance above spectrogram
in rough time alignment. Pitch track overlaid in cyan. Acoustic intervals corresponding to the phrase-final
words gold and rainbow shaded in green; these host a continuation rise and a final falling boundary tone,
respectively. Three verbal tics occurred (from left): “biscuit”, in red, is speech-proximal and constitutes an
interruption; sausage, in blue, is speech-proximal and cooperative. The verbal tic hey is distal from
speech.
To summarize, an intonational phrase has only two inherently cooperative positions available
for tic occurrence, but the number of interruptive positions increases as the number of words
increases. Setting aside one-, two-, and three-word IPs, chunks containing speech are generally
expected to contain more interruptive positions than cooperative ones. The mixed
chunk in the example above, for instance, contains twelve interruptive positions and three
cooperative ones. These facts imply that a tic produced in the temporal vicinity of ongoing
speech has relatively few opportunities to occur without interrupting IPs. The null hypothesis that
Tic Presence and Position Status are not associated predicts that speech-proximal tics will be
more likely to occur in the relatively more frequent interruptive positions. The hypothesis that
ticking and speaking can interact cooperatively by coordinating at the level of the intonational
phrase, in contrast, predicts that tics will show sensitivity to prosodic structuring occurring in
their (temporal) environment. Finding that Tic Presence and Position Status are not
significantly associated would falsify the main hypothesis.
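The chance baseline implied by the null hypothesis is simply the share of positions that are cooperative; a one-function Python illustration:

```python
def chance_cooperative_rate(n_words, n_cooperative):
    """Probability that a uniformly placed tic lands in a cooperative
    position, given w true words (hence w + 1 positions) in the chunk."""
    return n_cooperative / (n_words + 1)
```

For the Figure 7 chunk (fourteen words, three cooperative positions), the chance rate is `chance_cooperative_rate(14, 3)`, i.e., 0.2: only one in five uniformly placed tics would avoid interrupting an IP.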
2.3. Results
Results are reported by participant below because the study is designed as a series of case
studies. To preview here: Tic Presence and Position Status are significantly associated in three
out of four participant datasets (A, B, & C), providing support for the hypothesis that ticking can
be coordinated with speaking at the level of the intonational phrase. The variables were not
significantly associated in Participant D’s data. Closer inspection of counts (e.g., words per
chunk, tics per chunk) revealed patterns of variability that are presented in each participant’s
results sub-section.
2.3.1. Participant A
Participant A completed one block of tasks.[10] On average, chunks were 9.63 (SD=1.99)
words long and these words were grouped into 0.97 (SD=0.30) intonational phrases (see Table
3).[11]
[10] Participants were not obligated to complete the study. This participant chose to terminate her
participation after one block due to exhaustion.
[11] The IP structure of a chunk is analyzed as lacking IPs altogether when no words hosting phrase-final
boundary tones are located. Such a chunk still has two cooperative positions and a number of interruptive
positions that is proportional to the number of words in the chunk.
Table 3. Chunk count per task and average number of words, IPs, tics and cooperative/interruptive
positions per chunk.

Task                     Chunks   Words (mean)   IPs (mean)   Tics (mean)   Coop. Pos. (mean)   Interr. Pos. (mean)
Rainbow Passage          23       7.57           0.74         0.09          1.74                6.83
Proud Narrative          63       11.51          0.90         0             1.90                10.60
Pool Scene Description   35       9.80           1.34         0.29          2.34                8.46
Chunks containing speech had a total of 1,363 positions available for tic occurrence across
the three speech tasks performed. As Table 3 shows, Participant A chunks contained an average
of 1.99 (SD=0.31) inherently cooperative possible tic positions and 8.63 (SD=1.89) interruptive
positions. These averages meet the general expectation that tics have more opportunities to
interrupt IPs in their temporal vicinity than to cooperate with them. The cells in Table 4 show the
counts submitted to chi-square analysis. Cells correspond to (clockwise from top left): the count of
interruptive tics across tasks, the count of cooperative tics across tasks, the count of cooperative
positions within which tics did not occur (i.e., the number of empty cooperative positions across
tasks), and the count of empty interruptive positions.
Table 4. Contingency table of Tic Presence and Position Status for chunks containing speech.

              Interruptive Position   Cooperative Position
Tic Present   0                       10
Tic Absent    1121                    232
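Because the table contains an empty cell, expected-count assumptions for chi-square are not met and a Fisher's exact test was used instead (see footnote 12). A pure-Python sketch of the two-sided test, for illustration only (the study's test was run in R):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table."""
    n = a + b + c + d
    r1, c1 = a + b, a + c  # fixed row and column margins

    def pmf(x):  # P(top-left cell = x) under fixed margins
        return comb(c1, x) * comb(n - c1, r1 - x) / comb(n, r1)

    p_obs = pmf(a)
    lo, hi = max(0, r1 - (n - c1)), min(r1, c1)  # feasible top-left values
    # small tolerance guards against float ties when comparing probabilities
    return sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-9))
```

With the counts above, `fisher_exact_2x2(0, 10, 1121, 232)` yields a p-value far below .001, consistent with the reported result.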
Figure 8 shows correlation matrices representing expected (left panel) and observed (right
panel) counts in Participant A data (tasks pooled).[12] The color of the circle indicates the cell’s
raw value and the size of the circle reflects the relative contribution of that cell to the total count.
Expected and observed counts of empty interruptive/cooperative positions were almost identical
(bottom row, left & right panels). The pattern of expected interruptive/cooperative tic presence,
however, is observed in reverse (top row, left & right panels): more cooperative ticking was
observed than interruptive ticking. Sensitivity to phrasal structure supports the hypothesis that
ticking is not “blind” to speech.
[12] Expected counts for the purposes of plotting were obtained using the chi-square function in base R, but
a Fisher’s exact test was used to determine whether the variables were significantly associated because
assumptions for chi-square were not met.
Figure 8. Correlation matrices of Tic Presence x Position Status. Expected counts if variables are
independent on left; observed counts on right. Size of the circle in each cell indicates the relative
contribution of that cell to the total count. Color of circle indicates raw counts.
Tic Presence and Position Status were found to be significantly associated (p < .001, Fisher’s
exact test). In Figure 9, a correlation matrix of residuals shows that the increased frequency of
tics occurring in cooperative positions was the strongest contributing factor to the result. The
second strongest contributing factor was the decreased frequency of tics occurring in interruptive
positions.
Figure 9. Residuals of expected and observed Tic Presence x Position Status. Circle size indicates
relative contribution of the cell. Circle color represents raw value of residual.
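The residuals plotted here are Pearson residuals, (observed - expected) / sqrt(expected), computed per cell from the table's margins. An illustrative Python sketch (the study's plots were produced with corrplot in R):

```python
import math

def pearson_residuals(table):
    """Pearson residuals (O - E) / sqrt(E) for a table of counts, with E
    derived from the row/column margins under independence."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[(table[i][j] - rows[i] * cols[j] / n)
             / math.sqrt(rows[i] * cols[j] / n)
             for j in range(len(cols))] for i in range(len(rows))]
```

For Participant A's Table 4 counts, `pearson_residuals([[0, 10], [1121, 232]])` gives the cooperative-tic cell by far the largest positive residual (about 6.2), with the interruptive-tic cell next largest in magnitude (about -2.9), matching the pattern described above.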
These findings do not contradict the broad hypothesis that ticking and speaking can be
coordinated in time in service of communication; the absence of IP-level interruptions by tics is
indicative of coordination between ticking and speaking. However, it is also the case that speech-
proximal tics are much less frequent than distal tics in Participant A data, which suggests that a
different optimization strategy is being deployed (ticking during pauses as opposed to ticking
around phrases). Analysis of Participant A’s event counts by task type provides some context.
Figure 10. Number of tics (A), intonational phrases (B), and words (C) per chunk in three speech task
types. In this and all following boxplots, the group median is represented as a horizontal line inside the
box, the interquartile range [IQR] is represented by the box, and intervals between minimum and
maximum values within 1.5 * the IQR are represented by vertical bars. Significance in all pair-wise
comparisons based on Wilcoxon test.
Non-parametric Kruskal-Wallis tests found significant variability by task type in the number
of words per chunk [H(2) = 21.55, p < .001] and in the number of intonational phrases per chunk
[H(2) = 15.66, p < .001]. Pair-wise comparisons were performed using post-hoc Wilcoxon tests
finding significant differences in all pairs (Figure 10); the number of words per chunk was
greatest in the personal narrative, followed by passage reading, and finally the picture
description. With respect to the grouping of words into intonational phrases (IPs), Wilcoxon tests
found chunks in the picture description task were broken up into significantly fewer IPs relative
to both the personal narrative and the passage reading. No other pairwise comparisons reached
significance (all ps > .156).
The stacked bars in Figure 11 show the proportions of distal and cooperative tics produced
by Participant A in two of the three speech tasks she performed. Participant A produced no tics
during her personal narrative task. This task also tended to have longer chunks, that is, a greater
number of words per chunk. During passage reading, Participant A produced about equal
proportions of distal and cooperative speech-proximal tics; corresponding chunks were towards
the longer side of her range. The picture description task had mostly distal ticking around
significantly shorter chunks.
Figure 11. Proportions of Distal and Speech-proximal Cooperative tic events in two speech tasks.
To summarize Participant A results, tic and speech systems interact cooperatively in this
adult’s co-speech ticking, but it is possible that a “ticking during pauses” strategy is being
employed. A strategy construed in that manner avoids having to switch abruptly
between ticking and speaking—pauses serve as a buffer between tics and speech. Ticking
(enough) during pauses could also have the secondary effect of reducing the quantity or strength
of tic urges during utterances, which could be contributing to the participant’s non-existent
interruptivity.
2.3.2. Participant B
Participant B performed eleven speech tasks and produced a total of 132 speech-proximal
tics. Table 5 presents task averages for count of words, IPs, positions, and tics per chunk in the
participant’s dataset. On average, the number of words in chunks containing speech is 11.75
(SD=2.52). These words are grouped into an average of 1.17 intonational phrases (SD=0.33) per
chunk. Kruskal-Wallis tests compared the length of chunks (in words) and the number of IPs per
chunk by speech task type, finding significant differences (Chunk length: [H(2) = 8.96, p =
.011]; IPs per chunk: [H(2) = 15.04, p < .001]). Pairwise comparisons showed that chunks
contained more words in passage readings than in picture descriptions and that the number of IPs
per chunk was greater in passage readings relative to personal narratives. Other pair-wise
comparisons were not significant (all ps > .679).
Figure 12. Number of tics (A), intonational phrases (B), and words (C) per chunk in three speech task
types. Significance in all pair-wise comparisons based on Wilcoxon test.
With regards to speech-proximal ticking, chunks containing speech in Participant B’s dataset
had a total of 2,762 positions available for tic occurrence across all tasks; 2,298 were interruptive
(i.e., interior to an IP) and 464 were cooperative (i.e., outside an IP). Three rows in
Table 5 highlight tasks where interruptive tics outnumbered cooperative tics. Patterns of prosodic
structuring within chunks gave rise to an average of 2.17 cooperative positions (SD=0.33) and
10.59 interruptive positions (SD=2.40) per chunk, across which an average of 0.82 (SD=0.31)
tics were distributed. A Kruskal-Wallis test found significant differences in the number of tics per
chunk as a function of the type of speech task [H(2) = 12.33; p = .002]. For this reason, the
association between variables Tic Presence and Position Status was tested for the three task types
separately.
Table 5. Mean number of words, tics, intonational phrases, and Cooperative/Interruptive possible tic
positions per chunk in each task. An asterisk marks tasks in which the observed count of Interruptive
tics was higher than the observed count of Cooperative tics.

Task Type             Task           Chunks   Words (mean)   IPs (mean)   Tics (mean)   Coop. Pos. (mean)   Interr. Pos. (mean)
Passage Reading       Comma          30       13.30          1.63         0.63          2.63                11.67
                      Grandfather    12       11.08          1.17         0.58          2.17                9.92
                      Northwind      9        12.89          1.22         1.33          2.22                11.67
                      Rainbow        25       13.52          1.44         0.52          2.44                12.08
Personal Narrative    Embarrassed*   25       12.40          0.44         0.88          1.44                13.40
                      Joyful         21       15.86          1.38         0.67          2.38                14.48
                      Proud*         19       14.16          1.37         0.79          2.37                12.79
                      Sad            18       8.78           0.72         0.50          1.72                8.06
Picture Description   Park Scene     18       10.00          1.06         1.39          2.06                8.94
                      Pool Scene*    19       9.58           1.21         1.05          2.21                8.37
                      Animal House   17       7.71           1.18         0.71          2.18                6.53
Chi-square tests found that tics occurred in cooperative positions more often than predicted
by chance in all three task types. Counts submitted to chi-square analysis are shown in Table 6.
The association between Tic Presence and Position Status is significant in personal narratives
[X²(1, N = 1153) = 81.971, p < .001], passage readings [X²(1, N = 1183) = 126.71, p < .001],
and picture descriptions [X²(1, N = 2527) = 58.379, p < .001].
Table 6. Counts submitted to chi-square analysis for reading, narrative, and picture description tasks.

                                     Interruptive Position   Cooperative Position
Passage Reading       Tic Present    3                       31
                      No Tic         873                     155
Personal Narrative    Tic Present    26                      32
                      No Tic         965                     130
Picture Description   Tic Present    12                      28
                      No Tic         419                     88
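As an arithmetic check, the passage-reading cells above reproduce the reported statistic when run through a 2x2 chi-square with Yates' continuity correction; a pure-Python sketch for illustration (the analyses themselves were run with rstatix in R):

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]], with
    Yates' continuity correction applied by default."""
    obs = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    n = sum(rows)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n  # expected count under independence
            dev = abs(obs[i][j] - e)
            if yates:
                dev = max(dev - 0.5, 0.0)
            stat += dev * dev / e
    return stat
```

Here `chi_square_2x2(3, 31, 873, 155)` evaluates to approximately 126.7, in line with the X² of 126.71 reported for passage readings.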
Expected and observed raw counts are depicted as correlation matrices in Figure 13 (tasks
pooled). Inspection of the first row in the Expected (left) panel shows that at chance, tics are
expected to occur more frequently in interruptive positions relative to cooperative positions.
Observed counts (right panel) showed the reversed pattern. The expected and observed
proportions of interruptive/cooperative positions where tics did not occur were roughly equal.
Figure 13. Correlation matrices of Tic Presence x Position Status, tasks pooled. Expected counts if
variables are independent on left; observed counts on right. Size of the circle in each cell indicates the
relative contribution of that cell to the total count. Color of circle indicates raw counts, tasks pooled.
Residuals plotted in Figure 14 indicate that cooperative ticking was the strongest contributing
factor to the association between Tic Presence and Position Status across all three task types (top
right cell). Regardless of the kind of speech planning involved, tics that are anywhere in the
temporal proximity of speech will preferentially occur outside of or in between intonational
phrases.
Figure 14. Chi-square residuals by Task Type. The size and color of the circle in each cell represents
the relative contribution of that cell and the value of the residual, respectively. Zero is represented by
white. These results were each significant, p < .001.
Participant B’s co-speech ticking strategy can be further contextualized by taking distal tics
into account. Figure 15 shows proportions of all three tic event types by task (distal, speech-
proximal cooperative, & speech-proximal interruptive). The relatively small proportion of distal
tics in the majority of tasks (gray bars) suggests that Participant B does not target pauses for tic
production, unlike Participant A who mostly ticked during pauses. Three tasks that did contain a
relatively large proportion of distal tics also contained a greater number of interruptive speech-
proximal tics than cooperative ones. These were two personal narratives (Embarrassed, Proud)
and a picture description task (Pool scene).
Overall, ticking was less interruptive during passage readings relative to personal narratives.
It is possible that the low rate of interruptions in passage readings is related to the significantly
shorter IPs that occurred in passage readings relative to personal narratives. It is also possible
that increased emotional load in personal narratives contributes to those tasks’ “difficulty”,
leading to increased rates of interruption. Narratives are the only speech task type where talkers
must recall events and they are the only task type that elicits positive/negative emotion; both of
these could play a role in increased rates of interruption due to increased mental load. It is
also possible that baseline rates of ticking are elevated during personal narratives, again because
of emotional load. These points would not explain the increase in interruptions during the one
picture description task, however; this local increase, if significant, may be related to a local
circumstance (e.g., boredom during that particular task).
To summarize, Tic Presence and Position Status were significantly associated in Participant
B data, with tics largely occurring around prosodic phrases and not interior to them. Co-speech
ticking was most cooperative (i.e., least interruptive) during passage reading tasks, followed by
picture descriptions. IPs were shortest in passage readings, followed by picture descriptions.
Taken together, these results point to a role of adaptive prosodic phrasing in the relative
“success” of tic/phrase coordination: these tasks also had significantly shorter IPs and therefore
increased opportunities to tic cooperatively. Ticking is not “blind” to ongoing speech and neither
is speech “blind” to the current tic-urge landscape.
Figure 15. Relative proportions of three tic event types in reading (top), picture description (middle),
and personal narrative tasks (bottom) performed by Participant B. Distal tics and Outside IP tics are
inherently cooperative (gray and blue); Inside IP tics are inherently interruptive (orange).
2.3.3. Participant C
Participant C’s co-speech ticking sample contained a total of 4,957 positions available for tic
occurrence across the three task types, 4,048 interruptive and 909 cooperative. 535 speech-
proximal tics occurred. The average number of words, IPs, tics, and positions per chunk are
listed by task in Table 7. Chunks contain an average of 12.13 words (SD=2.60) that are
grouped into an average of 1.5 (SD=0.51) intonational phrases. Interruptive positions (M=10.64,
SD=2.46) tended to outnumber cooperative ones (M=2.50, SD=0.51), as expected.
Table 7. Average number of words, tics, intonational phrases, and cooperative/interruptive positions per
chunk for each task.

Task Type             Task           Chunks   Words (mean)   IPs (mean)   Tics (mean)   Coop. Pos. (mean)   Interr. Pos. (mean)
Passage Reading       Comma          31       12.81          2.48         2.74          3.48                10.32
                      Grandfather    17       9.59           1.71         1.65          2.71                7.88
                      Northwind      14       9.86           1.57         1.57          2.57                8.29
                      Rainbow        30       13.10          2.07         2.43          3.07                11.03
Personal Narrative    Embarrassed    15       14.13          1.13         2.13          2.13                13.00
                      Joyful         26       14.19          1.31         1.81          2.31                12.88
                      Proud          27       14.78          1.59         2.56          2.59                13.19
                      Sad            22       8.05           0.86         2.05          1.86                7.18
Picture Description   Park Scene     83       8.61           0.84         1.02          1.84                7.77
                      Pool Scene     82       13.18          1.12         0.95          2.12                12.06
                      Animal House   35       15.17          1.77         1.71          2.77                13.40
There is a small, marginally significant difference in the length of chunks as a function of task type
according to a Kruskal-Wallis test [H(2) = 5.41, p = .067] that is driven by longer chunks in
readings relative to picture descriptions (Figure 16, right panel). A Kruskal-Wallis test also found the
number of IPs per chunk to vary significantly across the three task types [H(2) = 9.08, p =
.010], driven by significantly more IPs per chunk in readings relative to picture descriptions
(Figure 16, center panel). There were significant differences across the task types in the number
of tics per chunk as well [H(2) = 44.89, p < .001]. Pair-wise comparisons showed a greater
number of tics per chunk in readings relative to picture descriptions, and in personal narratives
relative to picture descriptions. These findings are discussed further in the general discussion in
the context of skilled adaptation considering that this participant endured the fewest tic
interruptions while also producing the greatest number of tics overall.
Figure 16. Number of tics (A), intonational phrases (B), and words (C) per chunk in three speech task
types. Significance in all pair-wise comparisons based on Wilcoxon test.
Figure 17. Correlation matrices of Tic Presence x Position Status, tasks pooled. Expected counts if
variables are independent on left; observed counts on right. Size of the circle in each cell indicates the
relative contribution of that cell to the total count. Color of circle indicates raw counts.
The variables Tic Presence and Position Status are significantly associated in personal
narratives [X²(1, N = 1247) = 455.43, p < .001], picture descriptions [X²(1, N = 2527)
= 455.43, p < .001], and passage readings [X²(1, N = 1183) = 586.78, p < .001]. Counts
submitted to chi-square analysis are shown in Table 8. Speech-proximal tics are more likely to
occur in cooperative positions around IPs than in interruptive positions within them. Raw count
correlation matrices show that the expected pattern of interruptive and cooperative tic
proportions is reversed in the observed proportions (Figure 17). This finding mirrors the findings
in Participant A and Participant B’s data, though the pattern is more pronounced for Participant
C.
Table 8. Counts submitted to chi-square analysis for reading, narrative, and picture description tasks.

                                     Interruptive Position   Cooperative Position
Passage Reading       Tic Present    22                      154
                      No Tic         879                     128
Personal Narrative    Tic Present    41                      120
                      No Tic         1003                    83
Picture Description   Tic Present    42                      156
                      No Tic         2061                    268
Figure 18. Chi-square residuals by Task Type. The size and color of the circle in each cell represents
the relative contribution of that cell and the value of the residual, respectively. Zero is represented by
white. These results were each significant, p < .001.
Across all three task types, the two strongest contributing factors to the association between
Tic Presence and Position Status in Participant C data were the relatively higher frequency of tics
occurring in cooperative positions and the relatively lower frequency of tics occurring in
interruptive positions, in that order. This is illustrated by large residuals in the cells of the top
rows in the correlation matrices in Figure 18.
Participant C produced 535 speech-proximal tics and 110 distal tics, the largest sample of co-
speech tics in the corpus. Cooperative ticking dominated across tasks (light blue bars in Figure 19).
Tasks vary with respect to the exact proportions of distal and speech-proximal tics, but every
task contained a larger number of speech-proximal tics than distal tics.
Figure 19. Relative proportions of three tic event types in reading (top), picture description (middle),
and personal narrative tasks (bottom) performed by Participant C.
2.3.4. Participant D
The Participant D co-speech ticking sample is unique in that speech-proximal tics occurred
more frequently in interruptive positions than cooperative positions—the pattern expected by
chance given the relative availability of interruptive/cooperative positions. Participant D's
co-speech ticking does not show signs of systematic coordination between speech-proximal tics and
prosodic phrases in their temporal vicinity.
The average number of words/IPs/tics/positions per chunk for each task is presented in
Table 9. Chunks contained an average of 9.80 words (SD=3.17) grouped into an average of 0.70
intonational phrases (SD=0.25). Only one task in this participant’s data had at least one IP per
chunk on average, a marked difference from the other three participants; this point is elaborated
on in the discussion section. An average of 0.31 (SD=0.25) tics occurred per chunk. Finding such
a low average number of tics per chunk implies that there were a great many more speech-only
chunks (2,283) than mixed chunks containing both tics and speech (346); it also indicates that
there were long intervals of speech without tics. With regard to positions available for tic
occurrence, chunks contained relatively more interruptive positions (M=9.10, SD=2.94) than
cooperative positions (M=1.70, SD=0.25), as expected.
Table 9. Mean number of words, tics, intonational phrases, and Cooperative/Interruptive possible tic
positions per chunk in each task.

Task Type             Task           Chunks   Words    IPs      Tics     Cooperative   Interruptive
                                              (mean)   (mean)   (mean)   (mean)        (mean)
Passage Reading       Comma          52       8.38     0.63     0.87     1.63          7.75
                      Grandfather    26       7.69     0.38     0.54     1.38          7.31
                      Northwind      7        16.57    1.29     0.29     2.29          15.29
                      Rainbow        36       10.97    0.83     0.47     1.83          10.14
Personal Narrative    Embarrassed    25       13.24    0.84     0.16     1.84          12.40
                      Joyful         18       11.00    0.72     0.11     1.72          10.28
                      Proud          37       8.84     0.57     0.08     1.57          8.27
Picture Description   Animal House   38       7.26     0.61     0.24     1.61          6.66
                      Park Scene     105      6.45     0.46     0.31     1.46          5.99
                      Pool Scene     78       7.56     0.67     0.06     1.67          6.90
A Kruskal-Wallis test showed that the length of chunks differed as a function of task type
[H(2) = 10.50, p = .005]. The right panel in Figure 20 shows that chunks in narratives
contained more words than chunks in picture descriptions; otherwise chunks were of comparable
length. The center panel of the figure shows a significantly increased number of tics in passage
readings relative to picture descriptions; however, it should be noted that this difference is driven
by the participant's reading of Comma Gets a Cure, during which a number of long tic bouts
occurred. There were no significant differences between task types in the number of IPs per
chunk.
Figure 20. Number of intonational phrases (A), tics (B), and words (C) per chunk in three speech task
types. Significance in all pair-wise comparisons based on Wilcoxon test.
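The nonparametric workflow just described, an omnibus Kruskal-Wallis test across task types followed by pairwise two-sided Wilcoxon rank-sum comparisons, can be sketched as follows. The per-chunk word counts below are hypothetical placeholders (the raw per-chunk data are not reproduced in the text), and scipy's `mannwhitneyu` is used as the rank-sum implementation; only the procedure, not the participant's data, is illustrated.

```python
# Sketch of the Kruskal-Wallis + pairwise Wilcoxon rank-sum workflow.
# The per-chunk word counts below are HYPOTHETICAL placeholders,
# not Participant D's data; only the procedure is illustrated.
from scipy.stats import kruskal, mannwhitneyu

narrative = [13, 11, 9, 12, 10, 14, 8]   # hypothetical words per chunk
pictures  = [6, 7, 5, 8, 6, 7, 6]
readings  = [8, 10, 9, 11, 7, 9, 16]

# Omnibus test: do the three task types differ in chunk length?
H, p = kruskal(narrative, pictures, readings)
print(f"Kruskal-Wallis: H(2) = {H:.2f}, p = {p:.3f}")

# Pairwise follow-up, two-sided rank-sum (Mann-Whitney U form):
U, p_pair = mannwhitneyu(narrative, pictures, alternative="two-sided")
print(f"Narrative vs. Picture: U = {U:.0f}, p = {p_pair:.3f}")
```

A significant omnibus H would license the pairwise follow-ups, mirroring the comparisons plotted in Figure 20.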
Chunks containing speech had a total of 3,968 positions available for tic occurrence; 71
speech-proximal tics occurred. A chi-square test examining the relation between Tic Occurrence
and Position Status found that the variables are not significantly associated [X²(1, N = 3968) = 1.1,
p = .3]. Speech-proximal tic distribution in this sample shows no sensitivity to the IP structure of
surrounding speech. The almost identical proportions of expected and observed frequencies
depicted in the correlation matrices in Figure 21 indicate that tics occurred in
interruptive/cooperative positions at rates expected by chance.
Table 10. Contingency table of observed counts for Tic Presence and Position Status variables.

              Interruptive Position   Cooperative Position
Tic Present   55                      17
Tic Absent    3231                    666
Figure 21. Correlation matrices of Tic Presence x Position Status. Expected counts if variables are
independent on left; observed counts on right. Size of the circle in each cell indicates the relative
contribution of that cell to the total count. Color of circle indicates raw counts.
Closer inspection of the kinds of tics that occurred in Participant D data reveals that patterns
of co-speech ticking were not stable across tasks—a marked difference between this participant
and the three previous ones. The proportions of distal/cooperative/interruptive tics occurring in
each task are presented in Figure 22. Tics were almost exclusively speech-proximal interruptive
in readings “Comma Gets a Cure” and “The Northwind Passage” as well as the “Embarrassed”
narrative. At the other extreme, tics were exclusively speech-proximal and cooperative during
performance of the two positive-valence personal narratives (Joyful and Proud). Distal ticking
was prevalent in four tasks.
Figure 22. Relative proportions of three tic event types in reading (top), picture description (middle),
and personal narrative tasks (bottom) performed by Participant D.
In summary, Tic Presence and Position Status are not linked in Participant D data. Tics
were less likely to occur in cooperative (between-IP) positions than in interruptive (within-IP)
positions, a pattern expected by chance given the reduced availability of cooperative
positions relative to interruptive ones. Thus, the null hypothesis is not falsified in this particular
case study.
2.4. Discussion
Overall findings support the hypothesis that ticking and speaking can optimize their
interaction via strategic relative ordering: tic presence is associated with the cooperative status of
available positions in the co-speech ticking of three out of the four adult tickers. These findings
are interpreted as indicating that actions for ticking and speaking have been coordinated in time
at the level of the intonational phrase. By ticking marginal to prosodic phrases—the primitive
action unit for conveying composed messages—the tasks of both the tic and speech systems are
achieved satisfactorily.
2.4.1. Experience and skill in co-speech ticking
It is argued that cooperative co-speech ticking is a skill adult talkers who tic can develop—a
behavioral compensatory strategy. As such, individuals are expected to show different levels of
skill. One direct and objective way to estimate skill level is to equate it with amount of practice.
This is an aspect of participants’ natural history that was not quantified; however, email
communication prior to recording as well as casual conversation between the participant and the
researcher during recording brought to light important details regarding participants’ skill level
in co-speech (free) ticking.
Tic Presence and Position Status were not associated in Participant D’s co-speech ticking
sample. This participant produced a greater number of interruptive tics than cooperative ones. In
theory, this outcome can be accounted for in terms of skill level given that Participant D is likely
the least experienced (free) ticker. For one, he is the youngest ticker. Furthermore, the onset of
his tics was in his late teens; he has been ticking for the smallest number of years of all the
participants. Finally, this participant reported in casual conversation that he rarely, if ever, tics
freely around anyone aside from a very select few close friends and family. It is reasonable to
assume that Participant C, on the other hand, is the most experienced free-ticker as well as the
most experienced co-speech free ticker. With regards to free-ticking generally, Participant C
reports choosing to tic freely at (practically) all times due to a strong philosophical and political
conviction. With regards to ticking while talking, Participant C can be assumed to be the most
experienced co-speech ticker due to her roles as a theater performer and advocate for the
Tourette’s community, which require her to speak and perform publicly on an almost daily basis.
If Participant C speaks often and in a variety of (relatively challenging) circumstances while
ticking freely most of the time, it can be concluded that she is a relatively experienced co-speech
free-ticker, which can explain how successful her co-speech ticking was.
Fewer details are known regarding Participant B and Participant A. The former spoke of her
experience with respect to ticking freely or self-suppressing in more contextual terms, explaining
she opts to self-suppress in certain kinds of situations. Participant A started ticking and was
diagnosed late in life and so has fewer years of experience relative to child-onset tickers.
However, this participant reports ticking freely most of the time and, like Participant C, frames
her choice to free-tic as a political conviction. She produced mostly distal tics; the few speech-
proximal tics present were all cooperative. It is interesting that while distal ticking is inherently
cooperative because tics are kept entirely separate from speech, Participant A is the only
participant that appears to regularly employ this strategy, and with a perfect success rate. This
suggests that Participant A’s over-arching co-speech ticking task could be construed with
different specifications (e.g., organized with respect to pauses and not phrase boundaries). If
these assumptions were to hold, then relative levels of interruptivity observed in participant
datasets align with expectation: more experience is associated with more frequent ticking but
ticking that is less intrusive.
The finding that speech-proximal vocal tics occur outside prosodic phrases more often than
predicted by chance is evidence of cooperative tic/speech interaction, but what processes are
driving the observed patterns? Beyond strategic deferment of tic actions to optimal times, speech
planning and production processes should themselves reorganize in order to accommodate more
frequent ticking. For instance, grouping fewer words into phrases increases the number of
phrases, and for every phrase, two cooperative positions appear. To put it another way, shorter
phrases create more “surface area” or opportunity around which to tic. In contrast, the number of
interruptive positions increases linearly with the number of words in a phrase. For this reason,
speaking in longer phrases is relatively inefficient in the context of cooperative co-speech
ticking. If optimal coordination involves flexible adaptation on the part of both systems, then
patterns of speech production are expected to display variability that is indicative of cross-modal
accommodation. Some data that speak to this issue are discussed immediately below.
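The phrase-length arithmetic above can be made concrete with a toy calculation. The counting convention assumed here (two cooperative edge positions per phrase; one interruptive position between each pair of adjacent words within a phrase) is a simplification for illustration and may not match the annotation scheme exactly.

```python
# Toy illustration of how phrasing changes the balance of cooperative
# vs. interruptive tic positions. Counting convention is an assumption:
# each phrase contributes 2 cooperative (edge) positions and
# (n_words - 1) interruptive (between-word) positions.
def position_counts(phrase_lengths):
    """Return (cooperative, interruptive) position counts for an
    utterance phrased into phrases of the given word lengths."""
    cooperative = 2 * len(phrase_lengths)
    interruptive = sum(n - 1 for n in phrase_lengths)
    return cooperative, interruptive

# The same 12 words phrased two ways:
print(position_counts([12]))       # one long phrase -> (2, 11)
print(position_counts([4, 4, 4]))  # three short phrases -> (6, 9)
```

Under this convention, re-phrasing the same words into shorter phrases triples the cooperative opportunities while reducing the interruptive ones, which is the sense in which shorter phrases create more "surface area" for cooperative ticking.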
2.4.2. Flexibility on the part of the speech system
The three different kinds of speech task elicited in this study each require particular planning
processes. For example, reading involves visual word recognition and personal narratives do not.
It was shown that the type of speech task did not impact Participant C’s ability to optimally
coordinate the actions of ticking and speaking, but it is possible that the different cognitive
requirements of each task push planning processes toward one or another mode. As Table 11
shows, Participant C produced, on average, more intonational phrases per chunk during passage
readings (mean=1.397, sd=1.560) relative to both personal narratives (mean=1.018, sd=1.183)
and picture descriptions (mean=0.996, sd=1.496). Two-sided Wilcoxon tests found that these
differences were significant (Readings|Narratives: Z = 31428, p < .001; Readings|Pictures: Z =
29493, p < .001). The number of IPs per chunk in picture descriptions and personal narratives
were not significantly different. As the presence of visually presented text is the difference
between reading tasks and the other task types, it is possible that tickers are using punctuation
marks in the text to guide phrasing in chunks containing speech. The fact that passage reading
tasks also have the lowest rates of ticking that interrupts phrases suggests that such adaptive
phrasing is to the benefit of communication (Table 11, last column). Interestingly, the length of
chunks (number of true-words) did not differ significantly as a function of task type, suggesting
that the planning of utterance chunks in this participant is more general, perhaps more directly
linked to breathing and breath groups.
Table 11. Counts of intonational phrases, true-words, and tics per chunk in three task types for Participant
C. Interruptivity is the percentage of all tic events that occurred interior to an intonational phrase; darker
shading indicates more interruptivity. Blue square = significantly more IPs per chunk in reading tasks.

Task Type             Chunks   IPs / Chunk       Words / Chunk      Tics / Chunk   Interruptivity
                               M       SD        M        SD        M
Personal Narrative    111      1.018   1.183     10.423   12.233    1.991          22%
Picture Description   225      0.996   1.496     10.342   11.194    1.169          18%
Passage Reading       136      1.397   1.560     8.022    8.574     1.963          9%
Turning to the other end of the skill spectrum, Participant D produced very few mixed
chunks. His data consisted of long chains of speech-only chunks with the occasional tic-only
chunk. Given that a great many tics did occur, and that the majority of these instances of ticking
were interruptive, the long swaths of speech can be considered intervals during which the
participant’s communicative system did not “allow him” to tic.[13] What mixed chunks could be
found were mostly mixed by virtue of their containing tic interruptions, which is to say that they
were likely failed speech-only chunks. This is further evidence that it may be preferable to tic
relatively frequently (at optimal times) rather than refrain from ticking altogether. Interestingly,
the participant’s average IPs per chunk was stable across task types; his passage readings had the
highest rates of interruptivity (Table 12). All this to say that in Participant D, neither the tic nor
the speech system appears to be flexible in a way that supports cooperative interaction between
the two systems.

[13] Recall that this participant has relatively little experience allowing himself to tic in front of a
stranger; initially, he reported feeling “unable” to tic freely. This is interpreted here in terms of
automatisms: Participant D has a stable automatism to self-suppress. Effort and time are needed to
de-automate and begin to tic freely.
Table 12. Counts of intonational phrases, true-words, and tics per chunk in three task types for Participant
D. Interruptivity is the percentage of all tic events that occurred interior to an intonational phrase; darker
shading indicates more interruptivity. Red rectangle = words per chunk different across tasks.

Task Type             Chunks   IPs / Chunk       Words / Chunk      Tics / Chunk   Interruptivity
                               M       SD        M        SD        M
Personal Narrative    85       0.647   0.767     10.071   9.144     0.212          25%
Picture Description   235      0.523   0.688     6.566    6.121     0.319          49%
Passage Reading       128      0.641   0.839     8.961    8.604     0.688          69%
Finally, it is worth noting that Participant D had fewer IPs per chunk containing speech
relative to the other three, more skilled participants; he generally had a single IP per chunk.[14]
Given the other lines of evidence just presented, the fact that skilled co-speech tickers produce a
greater number of shorter IPs while the unskilled one does not suggests that part of co-speech
ticking is to shift the speech system toward a different mode of phrasing. As co-speech ticking is
a (relatively) novel task for Participant D, this aspect of the skill has not developed.

[14] Recall that chunks can be analyzed as lacking IPs if there were no phrase-final boundary tones
present.
One alternative explanation for qualitative differences in Participant D data is that the
monotony of his regular speaking voice led to an underestimate of intonational phrase
boundaries. Because the method employed in this study identifies prosodic phrases on the
basis of visual detection of boundary tones, unusually monotone speech can make boundary
tone events less noticeable. This is a possibility, but it does not impact the analysis because
Participant D’s speech-proximal tics were clustered in long bouts in a few mixed chunks
that tended to follow actual word-breaks. That is, interruptions weren’t just suboptimal;
they were disfluent. Notably, Participants C and B produced interruptive tics that mostly
avoided interrupting words. Future work will uncover whether phrase-level interruptions occur at
syntactic boundaries, which would suggest another mode of optimization.
The adaptive nature of observed tic-speech interactions is most clear in utterances that
contained more tics than words. Participant C’s speech remained successful despite frequent ticking
because tic events were clustered into little bouts/pairs. An example is provided in Figure 23
below, which shows the spectrogram and transcript of Participant C reading a portion of the text
“Comma Gets a Cure.” The complete sentence corresponding to this portion of the text reads:
“Then she put on a plain yellow dress and a fleece jacket, picked up her kit, and headed for
work.” Participant C produced this sentence in two mixed chunks separated by a pause: “Then
she put on a plain yellow dress and fleece jacket, picked up her kit” followed by “and headed
for work.” The first mixed chunk in question consisted of three IPs as indexed by IP-final
boundary tones on the words dress, jacket, and kit. Two cooperative speech-proximal verbal tic
events occurred during the first mixed chunk, each immediately following an IP-final boundary
tone (on dress and jacket). Figure 23 illustrates the participant’s reading of the last (sequentially)
coordinative phrase of the sentence, which was produced as a one-IP mixed chunk. Alongside
the four-word IP in the mixed chunk were four verbal tic events, two of which occurred before
the onset of the IP (i.e., chunk initially), and two of which immediately followed the IP-final
boundary event. By “stacking” tics just outside phrase boundaries, tic urges can be expediently
satisfied without impacting the interpretability of surrounding speech.
Figure 23. Two sequential tic pairs in cooperative interaction with an intonational phrase. Transcript
of utterance in rough time alignment above spectrogram. Pitch track overlaid in cyan. Word hosting IP-
final falling tone (work) and its acoustic interval in green. Verbal tics and their acoustic intervals in blue.
Pauses flanking the utterance show that it represents a single mixed chunk.
2.4.3. Better ticking
Tickers face a conundrum when it comes to tic control. To the observer, the vocal tics appear
like typical but situationally inappropriate vocalizations. To the ticker, tic actions are unwanted
in that they do not form part of intended action plans. Ticking freely feels good but draws
unwanted attention. Social stigma and/or social conventions regarding proper behavior can
motivate tickers to inhibit their own ticking in certain situations but suppressed tics will
eventually “burst out.” Tickers can inhibit their tics temporarily but not without consequence—
hours-long periods of global inhibition are thought to systematically lead to prolonged bouts of
vigorous, uncontrollable ticking once tic production is allowed to resume (e.g., Hashemiyoon,
Kuhn, & Visser-Vandewalle, 2017; Leckman et al., 1993; Reese et al., 2014; Specht et al., 2013).
These “tic attacks” are the only instances during which vocal ticking is experienced as directly
preventing intended action (Robinson & Hedderly, 2016).
Results of this experiment suggest that free tic states promote cooperative interactions
between ticking and goal-directed behavior, an effect that is likely due in part to having diverted
cognitive control resources from self-suppression to coordination. The frequency of urges and
tics has been reported to increase during periods of free ticking relative to periods of tic
suppression, while the strength of urges (based on self-report) remains the same (Brabson et al.,
2016). Taken together, these facts paint an interesting picture: allowing oneself to tic freely may
lead to more controlled and less disruptive (albeit more frequent) ticking. Given that successful
self-suppression (often) leads to eventual uncontrolled ticking, some tickers may choose to tic
freely in order to tic better. The sense of agency experienced by tickers in these circumstances
suggests that free ticking may facilitate goal-directed control over ticking.[15]

[15] Of course, there can be no control over the temporal occurrence of urges; for many researchers
that is precisely the nature of the condition, that inappropriate/atypical associations between tic
actions and reward lead to what amounts to an addiction (e.g., Graybiel, 2008).

2.5. Conclusion
This study found evidence for the claim that vocal tics occurring in close temporal proximity
to speech are ordered with respect to the prosodic phrases in the speech. The construal of a
cognitive entity like reading a story out loud while ticking freely amounts to the creation of an
over-arching behavioral task (e.g., Farooqui & Manly, 2018; Vallacher & Wegner, 1987) that is
defined in terms of solving the potential problems posed by intrusive tics co-occurring with
speech and the introduction of unintended meanings. Such over-arching action entities are in a
position to coordinate tic and speech tasks within their temporal scope (Farooqui & Manly, 2018;
Vallacher & Wegner, 1987, 1989). In the present example, the higher-level task goal subsumes,
on the one hand, purposive expression of linguistic signals via reading out loud, and on the other,
management of co-active urges. Supervision and monitoring of performance of this higher-level
task should, in theory, trigger qualitative adjustments that are typical for speech (and for ticking,
for that matter). To put it another way, execution of the higher-level ticking-while-reading task
should ensue in a way that keeps the qualitative properties of typical reading intact; namely,
fluency and clarity of speech should not be affected by urge satisfaction.
That vocal tics in close temporal proximity to true words tended to occur before prosodic
phrase-initial words and after prosodic phrase-final words is also interpreted in light of the claim
that prosodic phrasemaking is the primitive “message” task in spoken communication—the
prepotent task level at which speech goals are defined and outcomes evaluated. The key to this
proposal is that intended action plans specify at what point in (relative) time a goal will have
been met. In the case of phrases, phrases are tasked with the goal of communicating particular
messages (beyond single lexical concepts). For this goal to be accomplished, the entire message
must be communicated. This is the sense in which the phrase-making task is the prepotent
communicative action and why prosodic phrase actions are the speech actions around which
ticking should organize.
Ticking around phrases enacts cooperative interaction; to the extent that instances of ticking
are successfully co-orchestrated with speech in this manner, then co-speech ticking can be said to
exhibit skill. Flexibility on the part of the speech system that benefits a joint ticking-while-
talking goal further demonstrates adaptive skill. Cooperative interaction between ticking and
speaking opens the door to frequent but less intrusive ticking; e.g., in our dataset, the most
skilled co-speech ticker produces the most tics but experiences the fewest interruptions. In
contrast, the ticker with the least amount of experience in co-speech free ticking experienced
mostly interruptive ticking.
Chapter 3. Ticking and Talking on Distinct Acoustic Channels
Adults living with Tourette syndrome produce frequent vocal tics. Tic vocalizations are
responses to urges to tic. Suppressing one’s tics causes increasing discomfort as time goes by, as
when resisting the urge to scratch an itch (Cavanna et al., 2017). Eventually, suppressed tics
“burst out” (e.g., Hashemiyoon, Kuhn, & Visser-Vandewalle, 2017; Reese et al., 2014; Specht et
al., 2013). The present study undertakes an acoustic investigation of the phonatory characteristics
of the subset of vocal tics that are isomorphic with linguistic units like words and phrases—
referred to as verbal tics here and elsewhere—and that are produced while the ticker is speaking.
Talkers-who-tic produce speech that is peppered with verbal (and potentially non-verbal) tics
but is otherwise typical.[16] Figure 24 depicts a verbal tic “biscuit” occurring within the bounds
of a spontaneous speech utterance.

[16] With regard specifically to stutter, the relationship between ticking and this kind of speech
disfluency is hard to assess because (a) certain aspects of stuttering and ticking are hard to
disentangle behaviorally and (b) the developmental courses of the conditions mirror each other.
Stuttering and ticking both start in early childhood, though persistent developmental stuttering may
appear earlier than typical ticking. Prevalence figures are equivalent: 10% of all children show
observable signs of stutter and tics, with only 0.1–1% of children retaining the symptoms
(Robertson, 2008a, 2008b). In terms of observable behavior, certain kinds of disfluency are
indistinguishable from stutter (e.g., syllable repetitions). Findings from two studies bear upon these
issues. Individuals with TS and age-matched controls have been found to produce comparable
numbers of atypical disfluencies (i.e., stuttering) during passage reading, but tickers did produce
significantly more typical (i.e., non-stuttering) disfluencies relative to their healthy peers (De Nil,
Sasisekaran, Van Lieshout, & Sandor, 2005). In a separate study, adults who stutter produced
significantly more involuntary movements overall (i.e., motor tics) relative to non-stuttering adults
(Mulligan, Anderson, Jones, Williams, & Donaldson, 2003). Basal ganglia architecture is
implicated in both conditions, which further confuses the issue. Putting these issues to the side,
however, it is the case that Tourette’s is not classified as a speech or communication disorder,
which indicates that the speech of Tourette’s tickers is perceived as typical by clinicians.
Figure 24. Spectrogram of portion of utterance by Participant C produced while describing a picture.
Pitch track overlaid in cyan. Utterance transcription above the spectrogram in rough time-alignment. The
verbal tic “biscuit” in the utterance is capitalized and in red font; red shading surrounds the interval of
time corresponding to it.
In so far as speech is directed towards the transmission of linguistically encoded messages
that express intended meanings, co-speech verbal ticking presents a problem for communicative
goals because verbal tics introduce unintended meanings into the speech signal by virtue of their
isomorphism with true words and phrases. The utterance pictured in Figure 24 demonstrates this
point: the intrusive tic word “biscuit” appears to be embedded in a phrase[17], which alters the
phrase’s semantic content. In this particular instance, the unintended semantic contribution of
“biscuit” is parsimonious with the surrounding linguistic context—there is, in fact, such a thing
as a “dog biscuit” even though there is no such object in the picture being described. Considering
how ubiquitous both ticking and talking are in the lives of adults with Tourette’s—and how
important communication is to talkers—it stands to reason that behavioral systems generating
spoken communication develop compensatory strategies to counter the potential negative
impacts of intrusive meanings during co-speech ticking. Researchers who study the
neurobiological underpinnings of Tourette’s have speculated that such compensatory strategies
must emerge at various levels of description (e.g., Brandt et al., 2014; Leckman, Vaccarino,
Kalanithi, & Rothenberger, 2006; Pogorelov, Xu, Smith, Buchanan, & Pittenger, 2015).
However, no quantitative accounts of the nature of behavioral compensatory strategies for tics
exist. The present study focuses on verbal ticking and its manifestation during ongoing speech in
order to investigate this topic, asking whether talkers-who-tic generate separate (or separable)
acoustic and informational streams for tics and speech as a means of segregating unintended
semantic or referential meaning (i.e., of verbal tics) from intended speech (dialogic meaning).

[17] The term embedded is not used in a technical sense. In Figure 24, the temporal distance between
the tic word “biscuit” and the true words “dog” and “brown” (i.e., the two true words immediately
adjacent to “biscuit”) is comparable to the temporal distance between any two true words. Such close
proximity between tic and true words is suggestive of some sort of grouping. What is claimed here,
therefore, is that because grouped words are a hallmark of linguistic phrases, tic words give the
appearance of being embedded within a phrase (e.g., from the perspective of a listener). It is a
separate question whether or not the vocalizations are actually grouped in terms of motoric/cognitive
planning.
Ecological approaches to the study of animal behavior assume that behaving is always
behaving in some particular circumstance (Gibson, 1979). Relatedly, development of
compensatory or “coping” strategies is understood as adapting previously acquired skills to
address a novel problem or somehow challenging circumstance. Skilled systems of action
display what has been referred to as dexterity by neurophysiologist Nicolai Bernstein (Latash et
al., 1996). It is useful to repeat his definition here:
Dexterity is the ability to find a motor solution for any external situation, that is,
to adequately solve any emerging motor problem correctly (i.e., adequately and accurately),
quickly (with respect to both decision making and achieving a correct result),
rationally (i.e., expediently and economically),
and resourcefully (i.e., quick-wittedly and with initiative).
(Latash et al., 1996:228)
In Bernstein’s model, an action’s leading level (i.e., the task level) achieves dexterity by
developing a suite of task/goal-specific sensory-based action adjustments led by lower levels,
that serve the leading level so that it may accomplish its intended task goal in a wide variety of
contexts without the need for (conscious or unconscious) attention (Latash et al., 1996:208).
69
Through practice and experience in task performance across a wide variety of circumstances,
these sensory-based corrections become automatized, by which Bernstein means that their
control has been “pushed down” far enough as to be out of reach of cognitive planning (Latash et
al., 1996:192). Automatisms function to free up supervisory systems to evaluate task
performance and the continued likelihood of accomplishing goals in the current (ongoing)
circumstance. Importantly, in theory, any task at one level can be recruited by a relatively higher-
level task to serve the function of a sensory-based correction, meaning that any task is
susceptible to automatization in the right circumstance.
18
Skilled communicative speech is equipped with a toolkit of automatized sensory-based
corrections that allow talkers to adapt to changing conditions without planning. Noise-induced
vocal adjustments like the Lombard effect are a particularly illustrative example.
19
If a talker
aims to transmit a linguistically encoded message with intended meanings to an interlocutor via
speech, then it goes without saying that the utterance’s audiblity is a crucial prerequisite of their
overall communicative goal. In the terms of the definition presented above, correct execution of
a communicative utterance requires that an utterance be audible. It is well known that if an
[Footnote 18] Bernstein asserts that the reverse is also true; automatisms can undergo de-automation in the face of
“destructive” forces. For example, major changes to the physical plant (e.g., post-glossectomy) create the
need for substantial retuning of functional synergies.
[Footnote 19] It is typical for linguists to consider Lombard and similar effects as “automatic” without specific
attention being paid to what this term implies. In this dissertation, the terms automatism(s) and automatization
refer specifically to Bernstein automatisms. Whether Lombard effects represent an automatism in this
sense is an empirical question that has not been addressed, though Bernstein’s definitions open the door to
quantitative examination. That Lombard effects are skilled as opposed to innate, however, can be
concluded from evidence that is currently available: these effects are absent in the speech of small
children (i.e., Lombard adjustments are learned; Aubanel, Cooke, Villegas, & Lecumberri, 2011); they are
difficult or impossible to inhibit (i.e., Lombard adjustments are automatized; Pick, Siegel, Fox, Garber, &
Kearney, 1989); and they are lost in neurological disease and old age (Stathopoulos et al., 2014). See
Hotchkin & Parks (2013) for an excellent review of noise-induced vocal adjustments across mammal
communication systems. For further discussion in the context of co-speech ticking see the last chapter of
this dissertation.
instance of dialogue is occurring in the presence of some environmental noise, a concordant
action response is automatically triggered. Talkers automatically increase their vocal amplitude
or loudness via coordinated vocal-respiratory action in what is known as the Lombard effect. It is
theorized that automated background tasks like those instantiating the Lombard effect do not
engage cognitive resources (i.e., planning). Certain circumstances do alert high-level supervisory
systems, such as the appearance of very loud environmental noise or sudden changes in
environmental noise that cannot be solved by regular background corrections. In those instances,
corrective actions would be planned and integrated into the flow of behavior. To illustrate the
distinction between Lombard effects and adjustments that are planned, consider an individual
talking on the phone while walking down a street. Upon visual detection of an approaching (and
slow moving) trash truck, the talker may suggest to their interlocutor a temporary cessation of
speech rather than attempt to override the truck’s loudness by purposefully increasing their own
loudness and sustaining that volume for the duration of the truck’s passage. The talker may
instead choose to alter their course in response to the oncoming truck, turning down an alternate
road and continuing their conversation. Thus, adjustments to ongoing speaking behavior are in
some instances planned and in others un-planned.
Positing and enacting plans to account for the projected appearance of loud noise exemplifies
cases where a higher-level (i.e., cognitive) task entity has been construed whose goal is to solve
the novel problem (i.e., the projected appearance of loud noise). Lombard effects, in contrast, are
representative of automated smart corrections. Crucially, the tasks/goals of relatively lower-level
task entities are subsumed by the higher-level task in those cases where overarching task entities
are posited. Therefore, the suite of automatisms proper to any lower-level skills becomes
available to overarching task entities that subsume them in the process of task construal.
The starting assumption of the present experiment is that one of the problems posed by
intrusive tics during active speech is the insertion of unintended meanings (referential or
semantic). It is suggested that one possible task solution lies in systematic signal segregation
between intended and unintended meanings. In theory, this kind of solution is available because
it is a part of natural conversation to segregate messages for intended purposes. For example,
when reading a story aloud, changes to one’s voice or speech patterns can be used performatively
to distinguish the speech of different characters. Or a talker having a conversation with a friend
in one room of a large house will use drastically increased volume to “speak” to their sister in
another room upstairs, without this being construed as shouting at her.
One documented way in which talkers enact separable speech signals for communicative
purposes is via abrupt changes to voice quality. Sometimes this reflects communicative uses of
sound symbolism regarding animal size. A talker imitating or performing as someone or
something larger than themselves lowers their voice pitch. In contrast,
persons and things that are smaller than oneself should have voices that are of a higher pitch than
one’s own. In addition to animal size, sound symbolism is also extended to social categories of
imitation. Consider a talker who, speaking about their young niece to a friend, intends to utter a
message formulated as “She’s always whining like I am so bored”. The talker intends to poke
fun at the whiner by producing the italicized portion of the message in an exaggerated, “whiny”
voice. This speaker can be expected to deploy falsetto registration to achieve this communicative
goal because it is well known that speakers do this when imitating certain demographics (e.g.,
elderly women, young children; Stross, 2013). Speakers also use falsetto registration to take on
entirely different personas (Podesva, 2007). It can be concluded that modulation of voice quality
into and out of falsetto can signal a change in narrative perspective or stance—talkers can use
falsetto to indicate that their current utterance should not be interpreted as coming from
themselves but rather coming from another individual.
It is unlikely that when talkers shift between modal and falsetto registers in such
circumstances, the details of these shifts are planned. Attention is not placed on exactly how to
change the voice quality to accomplish the communicative goal of mocking the niece’s voice
(nor when, for that matter). [20]
Instead, the relationship between particular voice qualities and
particular communicative circumstances has been learned previously, and the articulatory
adjustments needed to shift from one quality to another have been largely automatized.
Automatized shifting into and out of falsetto register in spoken communication accomplishes
a goal that is analogous to what is required for co-speech ticking, that is to say, the segregation of
ticked and spoken signals so as to indicate which one an interlocutor should actually attend to.
Here, it is argued that this same register-shift action can be re-purposed to serve a ticking-while-
talking task that is situated at a relatively higher level than merely talking.
To summarize, it is expected that free-ticking and speaking can interact systematically in
service of communication. The question addressed in the present study is whether one sign of
such systematic interaction is observable in the phonatory characteristics of verbal tics and true
words. It is proposed that an overarching task repurposes adaptive control over phonation to
serve the goals of a higher-level “ticking-while-talking” task. Talkers-who-tic are hypothesized
to tic and talk along distinct acoustic channels; the present study specifically asks whether voice
[Footnote 20] It is very likely that certain relatively lower-level skills within the broader communicative toolkit,
like voice quality modulations, are automatized in Bernstein’s sense, but this question has not been addressed
empirically. Testing it would require experiments that gauge the ability of typical talkers to refrain from
performing these functions, and the amount of cognitive effort required to achieve this refraining if it is
possible.
quality modulation into and out of falsetto may index transitions across tic and speech behavioral
systems.
3.1. Hypotheses
The design of this study hinges on the known qualitative distinctions between falsetto and
modal register. Voice in falsetto is underpinned by high frequency and low amplitude vibration
of vocal fold edges relative to modal voice (which involves vibration of the entire vocal fold
body) (Berry, Herzel, Titze, & Story, 1996; Hollien, 2014; Spencer & Titze, 2001; Titze, 2014).
Some physiological characteristics of the falsetto mode of laryngeal vibration are a relatively
stiff vocal ligament and concomitantly increased longitudinal tension (Švec, Schutte, & Miller,
1999), as well as relatively relaxed thyroarytenoid muscles (Deguchi, 2011). These conditions
lead to higher pitch and steeper spectral down-slope than that observed in modal voice (Lee,
Oya, Kaburagi, Hidaka, & Nakagawa, 2021; Neiman et al., 1997).
Apart from a higher f0 range, studies have found at least five acoustic parameters that
reliably distinguish between falsetto and modal voice: subharmonic-to-harmonic ratio,
harmonicity, intensity, the fourth harmonic amplitude, and the difference in amplitudes of the
first and second harmonic. For each parameter, falsetto values and modal values have a predicted
relationship. First, in unpublished work Keating (2014) measured the amplitude ratio between
subharmonics and harmonics (SHR) in the voices of college-age speakers who read passages in
neutral voice and in falsetto; the latter was elicited by instructing the speakers to read the story in
the voice of an old woman. Results of neutral (i.e., modal) voice data confirmed previous
findings of rich subharmonic structure (Sun, 2002); in contrast, falsetto voice data had
close to zero subharmonics. Keating (2014) thus concluded that the number of sub-harmonics
serves as an acoustic indicator of falsetto/modal voice. A second index is harmonicity, or the
ratio between periodic and non-periodic portions present in the voiced signal (HNR). Pure modal
voice has high harmonicity; in contrast, Keating’s (2014) analysis found significantly lower
HNR in falsetto. Other data suggesting that relatively low HNR distinguishes falsetto from
modal voice comes from the vocal fold model in Zhang (2016). To model falsetto voice, they
decreased vocal fold medial surface thickness by increasing vocal fold tension. Those
adjustments were associated with increased glottal flow rate and open quotient of the glottis,
both of which are expected to result in low HNR. However, HNR on its own cannot uniquely
identify falsetto and modal registers because HNR in breathy and creaky voice is also low
relative to modal voice. Alipour, Finnegan, & Scherer (2009) identified a third possible index,
relative intensity or loudness, in their work using excised canine larynges. They used pressure-
flow sweeps to induce transitions into falsetto and determined that modal register involves louder
voicing. The relatively reduced intensity of falsetto register is a consequence of reduced
subglottal pressure (Alipour et al., 2009; Lee et al., 2021; Zhang, 2016). [21]
Lastly, two measures
that characterize harmonic structure are expected to distinguish falsetto and modal voice.
Quantitative work investigating falsetto acoustic signatures uncovered the impoverished
harmonic structure of falsetto relative to modal register (e.g., Colton, 1972; Hollien, 2014).
These observations eventually led to two specific acoustic metrics: the difference in amplitudes
of the first and second harmonic (H1-H2) and the amplitude of the fourth harmonic (H4).
Neiman and colleagues compared harmonics in falsetto and modal voice in different spoken
[Footnote 21] An older study with fewer participants (Colton, 1973) found no significant differences in amplitude
between modal and falsetto register. However, participants in that study produced sung (not spoken)
vowels. It is likely that under those circumstances the relationship between pitch and loudness
manifests differently in singing and speech.
vowels and found that the relationship between f0 (H1) and the second harmonic (H2) reliably
distinguishes falsetto and modal registers because in falsetto voice the amplitude of the H1 is
higher than the amplitude of H2, while in modal voice, this pattern is reversed (Neiman et al.,
1997: 137). A recent study observed similar results, albeit in vowels that were sung rather than
spoken (Lee et al., 2021). That study collected simultaneous aerodynamic, high-speed imaging,
electroglottographic, and acoustic recordings of trained singing of five vowels in three registers
(modal, falsetto, and mixed). Analyses of all singers’ data showed that the first and second
harmonic difference indexes the distinction between registers, finding that H1-H2 is greater in
falsetto relative to modal voice for all five vowels studied. Finally, the amplitude of the fourth
harmonic (H4), examined in Keating (2014), was found to be significantly lower in falsetto
relative to modal voice. In that study, a correction for vowel quality was performed that enables
pooling all vowel types. Corrected versions of both the H1-H2 (H1*-H2*) and H4 (H4*)
parameters were obtained, with spectral analysis showing that H4* is significantly lower in
falsetto voice relative to modal voice. Keating’s (2014) analyses also support the H1*-H2* results
from Lee et al. (2021) reported above: the difference is larger (and positive) for falsetto relative
to modal register. Taken together, the battery of acoustic
parameters described above can be used as a litmus test for the presence of falsetto in tics and
speech. Table 13 lists the assumed relationships between falsetto and modal phonation for each
acoustic parameter. It is proposed that tic phonation will be segregated via the use of falsetto or
falsetto-like register.
Table 13. Acoustic parameters and relationships between falsetto and modal voice with respect to those
parameters.
Parameter Prediction
f0 Falsetto > Modal
SHR Falsetto < Modal
HNR Falsetto < Modal
Intensity Falsetto < Modal
H1*-H2* Falsetto > Modal
H4* Falsetto < Modal
The hypothesis that verbal tics and speech are transmitted along distinct acoustic channels is
tested in this study both globally, in terms of the static properties of each behavioral system, and
locally, in terms of the transitions from one behavioral system to the other unfolding in time.
Global properties of tic/speech phonation are assayed in order to pinpoint which acoustic
parameters, if any, distinguish on average tics from speech and whether or not they do so in a
way that suggests specifically that tics but not speech are underpinned by falsetto phonation. In
the global analysis, individual tics and words are treated as tokens of a type as opposed to events
in time. Measurement is focused on stressed vowels, as these are the intervals of spoken
utterances during which modal phonation is most likely to be observed.
The verbal tics and true words produced by ticker-talkers can also be conceptualized as
events in time rather than tokens of a type. When tic and word events are consecutive, as they are
in Tic+Word and Word+Tic sequences, they represent what will be referred to here as cross-
system transitions. Control over vocalization changes from one behavioral paradigm to the other
at these switch points. Average, global differences as a function of vocalization type cannot
reveal whether phonation is shifting into and out of falsetto in a targeted fashion at cross-
system transitions. Evidence of targeted modulations in voice quality must be uncovered by
examining the change in each acoustic measure across these transitions. If the acoustics are
indicative of switches between falsetto and modal, then the hypothesis that distinct phonatory
patterns are the result of cross-system coordination is supported. The local analysis thus involves
quantification of the changes in voice quality at junctures where the tic and speech behavioral
systems abut. Table 14 shows for each parameter whether it is expected to rise or to fall
significantly at each of the two possible Cross-system Transition types (Tic+Word and
Word+Tic).
Table 14. Predicted change in each acoustic parameter at cross-system transitions
Parameter Tic+Word Transitions Word+Tic Transitions
f0 Falls Rises
SHR Rises Falls
HNR Rises Falls
Intensity Rises Falls
H1*-H2* Falls Rises
H4* Rises Falls
To summarize, this study tests predictions of the hypothesis that verbal tics and speech are
produced along distinct acoustic channels, falsetto and modal respectively. The first prediction is
conceptualized in global terms: stressed vowels in verbal tics, but not true words, should show
evidence of being underpinned by voicing in falsetto register. Finding little to no
evidence of falsetto in tics and/or finding that verbal tic phonation is comparable to phonation in
true words would falsify the hypothesis. The second prediction tested is conceived of in local
terms: transitions across behavioral systems should involve shifts from modal to falsetto registers
(or vice versa). The experimental hypothesis is supported if parameter values change
significantly and in the predicted direction when going from a tic to a word and from a word to a
tic.
3.2. Methods
To determine if phonation for verbal tics exhibits indicators of falsetto mode of laryngeal
vibration, the acoustics of voicing in co-speech ticking by adult ticker-talkers must be examined.
The fact that verbal tics and spoken utterances both include isomorphic stressed vowels enables a
direct comparison of phonation in tics and speech. The procedures of this study are described
below.
3.2.1. Case studies
Three female adults and one male participated in this study. Their ages fell within the 22-65
range. All are speakers of British English. Participants had all received a Tourette’s diagnosis
from a certified neurologist and reported no voice, speech or hearing disorders. A patient
advocacy group familiar with the Tourette’s community in the United Kingdom called Tourette’s
Hero facilitated recruitment of appropriate participant volunteers. Individuals whose daily ticking
patterns could be characterized by those who knew them as including “very frequent vocal and
verbal tics” were contacted by the organization and directed to the researcher’s IRB-approved
recruitment materials. A condition of participation was a willingness to tic freely throughout the
study, that is, to refrain from suppressing one’s own tics.
3.2.2. Recordings
Audio recording took place in a small, quiet conference room with mild sound attenuation.
The room was equipped with a built-in system for audio visual presentations that was used to
present visual materials to the participants, who sat at a table facing the screen. The study was
modelled as a sociolinguistics-style interview (e.g., Eckert & Labov, 2017) with the investigator
present in the room. Verbal prompts by the investigator elicited performance of four different
speech tasks within each of three task types (Table 15). Speech tasks were blocked so that each
block included a task from each type (Readings, Pictures, Narratives). The order of tasks within
each block was randomized but block order was fixed across participants. Participants were
outfitted with a Shure head-worn microphone adjusted to be about 2 inches from the corner of
the mouth. The acoustic signal was routed through an M Audio USB hub to a personal laptop
running Audacity software (Audacity Team, 2017) which performed the digital recording at a
sampling rate of 44.1 kHz.
Table 15. Verbal tasks elicited and the prompts used to elicit them.

Personal Narrative
  Prompt: “Tell me about a time when you felt extreme ______.”
  Tasks: Joy narrative; Embarrassed narrative; Proud narrative; Sad narrative

Picture Description
  Prompt: “Describe the picture you will see on the screen in as much detail as possible.”
  Tasks: Pool party scene; Park scene; Beach scene; Animal house scene

Passage Reading
  Prompt: “Read the passage that will appear on the screen.”
  Tasks: Rainbow passage; Grandfather passage; Northwind passage; Comma Gets a Cure passage
3.2.3. Measurements
Long recordings comprising more than one task were trimmed. The start of a speech task was
defined as the moment at which the investigator finished uttering the verbal prompt
corresponding to its Task Type. The end of a task was defined as the end of the participant’s
statement indicating completion of the task. Participants reliably produced these statements
unprompted. Any verbal tics occurring within the bounds of a task so defined constitute the set
of co-speech verbal tics for that task.
An automatic speech-to-text service, Otter.ai, generated transcripts for each task recording.
These were subsequently manually corrected. The Penn Phonetics Lab Forced Aligner (P2FA;
Yuan and Liberman 2008) was fed corrected transcripts and trimmed audio recordings for each
task to automatically generate phone and word level segmentation in Praat (Boersma & Van
Heuven, 2001). This segmentation was corrected manually according to standard phonetic
conventions. Detailed criteria for manual correction of vowel segmentation were as follows:
- The start of a vowel was defined as the start of periodicity observable in the
  waveform, that is, the start of phonation.
- The end of a vowel was determined by inspection of a combination of factors:
  - the start of aperiodicity,
  - a sudden drop in amplitude of the waveform, and
  - a loss of energy in the second and third formants (F2 and F3).
Both verbal tics and true words are represented on the phone and word tiers of Praat text
grids. Phone and word intervals were labelled according to their Vocalization Type (Tic vs.
Word) on a separate tier. The first planned analysis requires extraction of vowels hosting primary
or secondary stress. Scripts running in the R coding environment (R Core Team, 2021) and using the
package rPraat (Bořil & Skarnitzl, 2016) extracted these intervals on the basis of their automatic
labeling by the Penn Forced Aligner, which tags stressed vowels with a digit 1 or 2 depending on
their stress level. The aligner determines a syllable’s stress level by referring to the CMU
pronouncing dictionary (Weide, 1998). Scripts also extracted word-level intervals (as opposed to
just stressed vowels) for the second planned analysis, which takes Tic+Word and Word+Tic
pairs and measures the changes in a subset of the acoustic parameters from the preceding to the
following item in the pair.
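The stress-based selection step can be sketched as follows (a Python illustration rather than the R/rPraat scripts actually used; the interval data and function names are hypothetical):

```python
# Minimal sketch of the stressed-vowel selection step: P2FA tags vowels
# with CMU-dictionary stress digits, e.g. "AE1" (primary stress),
# "IH2" (secondary), "AH0" (unstressed). Tier data below are fabricated.

def is_stressed_vowel(phone_label):
    """True when the ARPAbet label carries stress digit 1 or 2."""
    return phone_label[-1:] in ("1", "2")

def extract_stressed_vowels(phone_tier):
    """phone_tier: list of (label, start_s, end_s) interval tuples."""
    return [iv for iv in phone_tier if is_stressed_vowel(iv[0])]

tier = [("HH", 0.10, 0.15), ("AE1", 0.15, 0.27), ("P", 0.27, 0.31),
        ("IY0", 0.31, 0.40), ("B", 0.40, 0.44), ("IH2", 0.44, 0.52)]
print([iv[0] for iv in extract_stressed_vowels(tier)])  # ['AE1', 'IH2']
```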
Stressed vowels or their containing words were excluded from statistical analyses if they met
one of the following criteria. First, intervals containing laugh particles were excluded because
laughter often has raised pitch relative to speech. Stressed vowels forming part of a word
containing laugh particles were excluded even if the stressed syllable was not the site where the
laughter took place. Second, the vowels forming part of filled pauses “um” and “uh” were
excluded from acoustic analyses because they are reliably of lower pitch than their surrounding
speech (Shriberg, 1994).
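The two exclusion criteria can be rendered schematically as follows (an illustrative Python sketch; the token representation is hypothetical, not the actual pipeline):

```python
# Sketch of the exclusion step: drop any token from a word containing
# laugh particles, and drop the filled pauses "um" and "uh".
FILLED_PAUSES = {"um", "uh"}

def keep_token(word, word_has_laughter):
    """False for tokens excluded from the acoustic analyses."""
    if word_has_laughter:               # laughter raises pitch
        return False
    if word.lower() in FILLED_PAUSES:   # filled pauses are lower-pitched
        return False
    return True

print([keep_token(*t) for t in
       [("happy", False), ("happy", True), ("um", False)]])
```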
Voice analysis software called VoiceSauce (Shue, Keating, & Vicenik, 2009) was used to
perform acoustic measurements from audio recordings that were paired with their Praat text grid
for automatic labeling of tokens. This program runs in the Matlab environment and provides
automated voice measurements from acoustic recordings. For each acoustic measure,
VoiceSauce produces a sequence of three values resulting from the averaging of one-
millisecond-apart data points in each of three same-size portions of the stressed vowel or word
interval. In the case of whole words, any voiced frame can contribute to the measurement. The
mean of the three values for each interval determines the Mean x for each acoustic parameter x
for every stressed vowel and word token. Datasets consist of all measures for all vowel phones
hosting primary or secondary stress as well as the tic or true words containing them. Participant
datasets are treated as independent case studies. The six acoustic parameters of interest are
detailed below.
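The three-portion averaging scheme described above can be sketched as follows (a simplified Python rendering, not VoiceSauce’s implementation):

```python
import numpy as np

def mean_of_thirds(track):
    """Split a per-millisecond measurement track into three equal-size
    portions, average each portion, then average the three portion
    means to obtain the per-token "Mean x" score (NaN frames ignored)."""
    thirds = np.array_split(np.asarray(track, dtype=float), 3)
    return float(np.mean([np.nanmean(p) for p in thirds]))

print(mean_of_thirds([1, 1, 1, 2, 2, 2, 3, 3, 3]))  # 2.0
```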
Mean f0
Praat’s autocorrelation method (Boersma, 2013) for extracting f0 candidate points was
deployed via VoiceSauce. The pitch floor and ceiling were set to 70 Hz and 600 Hz respectively
for all participant data. The time step was set to 0.01 s. The voicing threshold parameter was raised
from the default 0.45 to 0.7; this parameter defines the strength of the unvoiced candidate
relative to the maximum possible autocorrelation. If the voicing threshold is low, then periods of
creaky voice can be incorrectly interpreted as having impossibly high f0 because the actual pitch
during those intervals is possibly lower than the established floor. The effect of raising the value
of this parameter is to increase the number of unvoiced decisions. This in turn increases the
chances that periodicities representing creaky voice are measured to have no pitch, a
consequence of which is their exclusion from analyses (see below). An unintended consequence
of this approach, however, is that a certain number of stressed vowels with measurable modal
phonation under typical voicing threshold parameter values are excluded. f0 points occurring
within the bounds of the stressed vowel or word interval are averaged to obtain Mean f0.
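The effect of raising the voicing threshold can be illustrated schematically (this is not Praat’s actual implementation; the per-frame peak strengths are fabricated):

```python
# A frame is treated as voiced only when its normalized autocorrelation
# peak exceeds the voicing threshold, so raising the threshold from
# 0.45 to 0.7 turns weakly periodic (e.g., creaky) frames unvoiced.
def voiced_frames(peak_strengths, threshold):
    return [s > threshold for s in peak_strengths]

peaks = [0.9, 0.6, 0.5, 0.8]            # fabricated peak strengths
print(sum(voiced_frames(peaks, 0.45)))  # 4 frames voiced at the default
print(sum(voiced_frames(peaks, 0.70)))  # 2 frames voiced when raised
```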
Mean SHR
The SHR refers to the amplitude ratio between sub-harmonics and harmonics (Sun 2002).
SHR is computed using a spectrum compression technique (Sun 2002: 333). SHR values smaller
than a threshold value indicate that the sub-harmonics are weak and that the harmonics should be
favored (Sun 2002: 334). SHR values within each stressed vowel or word interval are averaged
to obtain Mean SHR for each interval.
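The intuition behind SHR can be sketched on a synthetic spectrum (a great simplification of Sun’s spectrum-compression method; all values below are fabricated):

```python
import numpy as np

def shr_sketch(freqs, spectrum, f0, n=4):
    """Ratio of spectral amplitude summed at subharmonic locations
    (odd multiples of f0/2) to amplitude summed at harmonic locations
    (multiples of f0). Not Sun's (2002) algorithm; illustration only."""
    def amp_at(f):
        return spectrum[np.argmin(np.abs(freqs - f))]
    sub = sum(amp_at((k + 0.5) * f0) for k in range(n))
    har = sum(amp_at((k + 1.0) * f0) for k in range(n))
    return sub / har

freqs = np.arange(0.0, 2000.0, 10.0)
spectrum = np.full(freqs.shape, 0.001)
for h in (200, 400, 600, 800):          # harmonics of a 200 Hz voice
    spectrum[freqs == h] = 1.0
for s in (100, 300, 500, 700):          # weak subharmonics
    spectrum[freqs == s] = 0.1
print(shr_sketch(freqs, spectrum, 200.0))  # ≈ 0.1
```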
Mean HNR
VoiceSauce measures harmonicity using the algorithm in De Krom (1993). HNR
measurements are found by liftering the pitch component of the cepstrum and comparing the
energy of the harmonics with the noise floor. Measurements use a variable window length equal
to five pitch periods. For each stressed vowel, VoiceSauce measures HNR within three
bandwidth ranges: 0-500Hz, 0-1500Hz, and 0-2500Hz. Values for each bandwidth for each of
the three time points are then averaged to obtain Mean HNR for the stressed vowel or word.
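For intuition, a simpler autocorrelation-style HNR formula (Praat-style; VoiceSauce’s actual measure lifters the cepstrum per De Krom, 1993) can be sketched as:

```python
import math

def hnr_db(r):
    """Harmonics-to-noise ratio in dB from the normalized
    autocorrelation r at the pitch lag: 10*log10(r / (1 - r))."""
    return 10.0 * math.log10(r / (1.0 - r))

print(round(hnr_db(0.99), 1))  # strongly periodic (modal-like) voicing
print(round(hnr_db(0.50), 1))  # harmonic and noise energy equal: 0 dB
```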
Mean Intensity
The intensity measure in VoiceSauce is the root mean square of energy calculated at every
frame over a variable window equal to five pitch pulses. Normalization of the energy measure
with f0 is accomplished by the variable window (Vicenik, Lin, Keating, & Shue, 2021). Energy
measures from each stressed vowel are averaged to obtain Mean Intensity measure for that
vowel. Due to frequent tic head movements, it was necessary to devise a procedure for
participants to report position shifts of the microphone that allowed the researcher to adjust its
position.
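The pitch-synchronous RMS energy measure can be sketched as follows (simplified, and not VoiceSauce’s exact implementation; the signal is fabricated):

```python
import numpy as np

def rms_energy(samples, fs, f0, center_s):
    """RMS energy over a window of five pitch periods centred on
    center_s (seconds); fs is the sampling rate in Hz."""
    half = 2.5 / f0                      # 2.5 pitch periods each side
    lo = max(0, int(round((center_s - half) * fs)))
    hi = min(len(samples), int(round((center_s + half) * fs)))
    frame = np.asarray(samples[lo:hi], dtype=float)
    return float(np.sqrt(np.mean(frame ** 2)))

samples = np.full(1000, 2.0)   # constant-amplitude toy signal
print(rms_energy(samples, fs=1000, f0=100, center_s=0.5))  # 2.0
```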
Mean H1*-H2* & Mean H4*
VoiceSauce determines harmonic amplitudes pitch-synchronously over a three-pitch-period
window (Vicenik et al., 2021). First, second and fourth harmonics (H1, H2, H4) were located
and measured by determining the maximum of the spectrum around peak locations as estimated
by f0. When measuring the amplitudes of these harmonics, VoiceSauce implements an algorithm
developed by Iseli & Alwan (2004) to correct for the effect of formant frequency structure. This
facilitates comparison across vowels of different qualities present in each dataset. At each frame,
amplitude measures are corrected by using formant frequencies obtained from Praat. Formant
frequency estimation was carried out with Praat’s default settings: the number of formants set to
four and the maximum formant frequency set to 6000 Hz. VoiceSauce then deploys the formula
from Hawks & Miller (1998) to calculate formant bandwidths. Corrections for harmonic
amplitudes (H1, H2, H4) use F1 and F2, as well as their bandwidths (B1 and B2). H1*, H2* and
H4* measures—the harmonic amplitudes that have been corrected for vowel identity—are
smoothed with a moving average filter of 20 milliseconds. Mean H1*-H2* for each stressed
vowel refers to the difference between H1 and H2 amplitudes after correction for vowel identity,
averaged across stressed vowel or word-sized intervals. Mean H4* refers to the amplitude of the
fourth harmonic after corrections, averaged across stressed vowel or word intervals.
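Locating harmonic amplitudes near multiples of f0 and forming H1-H2 can be sketched as follows (a Python illustration with a fabricated spectrum; the Iseli & Alwan formant correction that yields H1*-H2* and H4* is omitted here):

```python
import numpy as np

def harmonic_amplitude(freqs, spec_db, f0, k, tol=0.1):
    """Maximum spectral amplitude (dB) within ±tol*f0 of harmonic k."""
    mask = np.abs(freqs - k * f0) <= tol * f0
    return float(np.max(spec_db[mask]))

def h1_minus_h2(freqs, spec_db, f0):
    return (harmonic_amplitude(freqs, spec_db, f0, 1)
            - harmonic_amplitude(freqs, spec_db, f0, 2))

freqs = np.arange(0.0, 1000.0, 5.0)
spec_db = np.full(freqs.shape, -60.0)   # fabricated spectrum floor
spec_db[freqs == 200.0] = -20.0         # H1 of a 200 Hz voice
spec_db[freqs == 400.0] = -30.0         # H2
print(h1_minus_h2(freqs, spec_db, 200.0))  # 10.0 dB (falsetto-like)
```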
3.2.4. Statistical analyses
All statistical comparisons were conducted in the R coding environment (R Core Team,
2021) using the rstatix package (Kassambara, 2020). Quantile-quantile plots of acoustic
measurements (Mean f0, Mean Intensity, etc.) suggested that the stressed vowel data for tic/word
groups were not normally distributed. Furthermore, word group sample sizes were always
considerably larger than tic group sample sizes. For this reason, non-parametric statistical tests
were used in both the global and local level analyses.
Global analyses
In the planned global analyses, every stressed vowel is treated as a token of a Tic/Word type.
Tic and Word group scores for each of the six acoustic parameters that were extracted from
every stressed vowel interval (Mean f0, Mean SHR, etc.) were the independent samples in one-
sided Wilcoxon rank-sums tests (also known as Mann-Whitney). That is, six different rank-sum
tests were performed per participant dataset, with a different acoustic parameter being compared
on the basis of Vocalization Type in each test (for each participant). Each analysis is based on
the ranks of the Mean f0/SHR/HNR/etc. scores for Tic and Word samples, with Tasks pooled.
Table 16 shows the number of the Tic/Word stressed vowel samples for each participant.
Table 16. Stressed vowel counts for each participant by vocalization type.

Participant      A      B      C      D
Tic Vowels      45     79    476     32
True Vowels    707   1556   2677   1867
For each parameter, statistical analyses tested a different alternative hypothesis (greater or
lesser than); the “side” of the test depended on the predicted relationship between Tic/Word
vocalizations with regards to that parameter. For example, Mean f0 scores in the tic domain are
expected to be higher than true word Mean f0 scores if tics but not speech are produced in
falsetto. Correspondingly, the R function used to perform the rank-sums test comparing Mean f0
values on the basis of Vocalization Type was called with the alternative “greater” specified. In
contrast, Mean SHR for tics is expected to be lower than Mean SHR for true words, and so the
alternative hypothesis tested by the rank-sums test is “less.” A significant result in these tests
indicates that one group tends to have larger values than the other and that the relationship
between the groups is as predicted, greater or lesser as the case may be. Put another way, a
significant result suggests that were a random observation to be taken from each of tic and word
groups (x0 and y0), then the probability that x0 > y0 is not the same as the probability that x0 < y0.
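The per-parameter one-sided comparisons can be sketched with SciPy in place of R’s rstatix (the scores below are fabricated; the test “sides” follow the Table 13 predictions):

```python
from scipy.stats import mannwhitneyu

# Alternative hypothesis for Tic vs. Word samples, per parameter.
ALTERNATIVE = {"f0": "greater", "SHR": "less", "HNR": "less",
               "Intensity": "less", "H1*-H2*": "greater", "H4*": "less"}

def rank_sum_test(param, tic_scores, word_scores):
    """One-sided Wilcoxon rank-sum (Mann-Whitney) test, sided per the
    predicted Tic/Word relationship for the given parameter."""
    return mannwhitneyu(tic_scores, word_scores,
                        alternative=ALTERNATIVE[param])

tic_f0 = [310.0, 295.0, 330.0, 320.0, 305.0]     # fabricated Mean f0 (Hz)
word_f0 = [190.0, 205.0, 180.0, 210.0, 200.0, 195.0]
res = rank_sum_test("f0", tic_f0, word_f0)
print(res.pvalue < 0.05)  # tic f0 ranks above word f0, as predicted
```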
Local analyses
Local changes in parameters at cross-system transitions are examined by treating whole-word
tic/speech intervals as sequences or events in time (as opposed to tokens of a type). The
hypothesized qualitative voice quality switching at cross-system transitions should lead to
significant “jumps” up or down in acoustic parameter values. For example, if a transition from
ticking to speaking involves shifting from falsetto to modal register, as predicted, then Tic+Word
transitions should be associated with a significant drop in Mean f0. Differences in parameter
values between tic/word paired events in both juncture types are analyzed to address this
question.
Tic and Word scores for each of the six acoustic parameters were extracted from every
whole-word interval (Mean f0, Mean SHR, etc.). These scores were then paired according to
their occurrence in real-time. For instance, the utterance I’m not happy biscuit contains two
word-word sequences I’m not and not happy. Because “biscuit” is a tic, the utterance also
contains one Word+Tic sequence in happy biscuit. The latter sequence exemplifies one of the
two cross-system juncture types (Tic+Word and Word+Tic). For each participant dataset,
Tic+Word and Word+Tic sequences were the paired samples for one-sided Wilcoxon signed-
rank tests. Specifically, Tic+Word cross-system transitions are examined by comparing
preceding (tic) and following (word) scores for Mean f0/SHR/HNR/etc.; Word+Tic cross-system transitions are examined by comparing preceding (word) and following (tic) scores on the same
parameters. Analyses are based on the ranks of n paired differences, where n is the sample size
of Tic+Word and Word+Tic transition groups. The bottom two rows of Table 17 show the
number of Tic+Word and Word+Tic transitions for each participant. Note that a one-sided Wilcoxon signed-rank test cannot give a significant result, irrespective of the size of the differences between preceding and following scores, if the number of observations is less than five. Participant A's and Participant D's datasets are therefore at the lower limit for the local analyses.
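The five-observation floor follows from the exact null distribution of the signed-rank statistic: with n non-zero paired differences there are 2**n equally likely sign assignments, so the smallest attainable one-sided p is 1/2**n. A quick check:

```python
# Smallest attainable one-sided p-value for an exact Wilcoxon signed-rank
# test: all n differences falling on the predicted side is 1 of the 2**n
# equally likely sign assignments under the null, so p >= 1 / 2**n always.
for n in (4, 5, 7):
    p_min = 1 / 2 ** n
    print(n, p_min, p_min < 0.05)
# 4 0.0625 False    <- n = 4 can never reach p < .05
# 5 0.03125 True    <- n = 5 is the minimum for a significant one-sided result
# 7 0.0078125 True
```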
Table 17. Counts of tic/word tokens and two cross-system juncture types for each participant.

| Participant | A | B | C | D |
| Verbal Tic Count | 48 | 112 | 648 | 33 |
| True-Word Count | 1035 | 2126 | 3540 | 2609 |
| Tic+Word Transitions | 7 | 63 | 314 | 7 |
| Word+Tic Transitions | 5 | 65 | 325 | 7 |
Twelve different paired-differences tests per participant dataset are possible—one for each of
six acoustic parameters for each of two cross-system juncture types (Tic+Word and Word+Tic).
However, for each participant, only the subset of acoustic parameters that patterned as predicted by the experimental hypothesis at the global level underwent local analysis.[22] Hence, results of twelve paired-difference tests are reported for Participant B, while results of only six of these tests are reported for Participant A.

[22] Each participant's global analysis identifies the acoustic parameters along which true-word and tic-word stressed vowels differ in a manner that suggests falsetto phonation for tics. If a parameter failed to index this distinction in a participant's global analysis, then it can be assumed that the parameter does not behave systematically at cross-system transitions.
3.3. Results
Results for all four participants generally support the hypothesis that tics are relegated to a
falsetto acoustic channel as a means to generate separate or separable acoustic streams for
unintended (versus intended) meaning. The first prediction tested in this study is that the
acoustics of phonation in verbal tic and true word stressed vowels can be distinguished on the
basis of six potential acoustic parameters. Specifically, verbal tic stressed vowels were expected
to exhibit falsetto voice in contrast to the stressed vowels of true words, which are expected to
exhibit indicators of modal voice. The results indicate that the tics and speech of all participants
were distinguishable on the basis of at least three out of the six acoustic parameters investigated
(three, six, four and five parameters for Participants A, B, C, and D respectively). A second
prediction tested is that systematic “jumps” in acoustic parameter values occur at cross-system
transitions, the acoustic consequence of the ticker-talker’s switching between falsetto and modal
voice. Changes in the parameters at Tic+Word and Word+Tic junctures generally did jump
up/down as predicted given the experimental hypothesis. Results for each participant are
reported individually in what follows. Because Participants A and D produced so few usable verbal tics in their Narrative tasks (zero for the former and two for the latter), data from the Narrative Task Type were excluded from their analyses.
3.3.1. Participant A
Three out of the six acoustic parameters in Participant A’s dataset patterned as predicted if
tics but not words are produced on a distinct, falsetto acoustic channel. Before moving to the
details of the stressed vowel and whole-word data, Table 18 lists the differences between Tic and
Word group averages by Task Type for each acoustic parameter (Mean f0, Mean HNR, etc.).
Tic-word differences for each parameter are expected to be positive or negative depending on the
predicted tic/word relationship for that measure. Tic-word differences in average Mean f0 and
Mean H1*-H2* are predicted to be positive if phonation for tics but not words involves falsetto
registration. In contrast, the differences between tics and words in average Mean SHR, HNR,
Intensity, and H4* should be negative to support the experimental hypothesis. In Participant A's co-speech ticking, Tic and Word groups are successfully distinguished on the basis of Mean f0, Mean SHR, and Mean HNR (white cells in Table 18).
Table 18. Difference between Tic and Word group averages in each acoustic parameter by Task Type.
Gray cells indicate parameters that did not pattern as predicted.
Task Type Acoustic Parameter Tic – Word Difference
Picture Av. f0 180.9297168
Av. H1H2 -5.1699777
Av. SHR -0.1206042
Av. HNR -17.1342171
Av. Intensity 19.0484348
Av. H4 11.0195127
Reading Av. f0 214.8276019
Av. H1H2 -7.2120956
Av. SHR -0.1094300
Av. HNR -23.2006174
Av. Intensity 24.3506760
Av. H4 11.5087208
Participant A global analyses
Acoustic parameters extracted from stressed vowels were compared on the basis of
Vocalization Type (Tics vs. Words) using one-sided versions of the independent samples
Wilcoxon test. The predicted relationship between tics and words with respect to the acoustic
parameter in question determined the “side” of the test. Tasks and Task Types are pooled in the
Tic and Word groups. Three out of the six acoustic parameters differed significantly as predicted
by the hypothesis being tested. Results for these data are shown in Table 19. To summarize, tic
word Mean f0 was higher than true word Mean f0 while Mean SHR and Mean HNR were both
lower for tics relative to true words. These three findings are as expected if tic phonation
involves falsetto voice. Three parameters did not differ significantly as predicted: Mean
Intensity, Mean H4*, and Mean H1*-H2*. It is worth noting that in the case of parameters H4* and Intensity, which did not pattern as predicted, a reversed pattern can be observed (Figure 25, two bottom left-most panels). This finding is discussed further in the Discussion section.
Table 19. Acoustic parameter extraction results for ticked (n=45) and word (n=707) stressed vowel intervals for Participant A. Hypotheses shown in the fourth column tested by unpaired, one-sided Wilcoxon tests using group medians. Reported p-values are adjusted using Bonferroni correction. Cell values are M / SD / Med.

| Acoustic Parameter | Tic Vowels | Word Vowels | Hypothesis | p |
| Mean f0 | 377.0 / 80.8 / 377.0 | 210.0 / 39.0 / 210.0 | Tic > Word | p < .001 |
| Mean SHR | 0.024 / 0.172 / 0.024 | 0.329 / 0.212 / 0.329 | Tic < Word | p < .001 |
| Mean HNR | 20.3 / 13.9 / 20.3 | 41.3 / 10.7 / 41.3 | Tic < Word | p < .001 |
| Mean Intensity | 28.4 / 25.6 / 28.4 | 0.4 / 1.7 / 0.4 | Tic < Word | ns |
| Mean H1*-H2* | 1.32 / 6.93 / 1.32 | 2.44 / 5.36 / 2.44 | Tic > Word | ns |
| Mean H4* | 7.95 / 5.65 / 7.95 | -5.77 / 6.78 / -5.77 | Tic < Word | ns |
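The Bonferroni adjustment behind the reported p-values can be sketched as follows; the raw p-values below are invented for illustration and are not the study's.

```python
# Bonferroni correction across the six per-parameter comparisons: multiply
# each raw p-value by the number of tests (capped at 1.0) before judging
# significance at the .05 level.
raw_p = {"f0": 1e-6, "SHR": 2e-5, "HNR": 1e-4,
         "Intensity": 0.30, "H1*-H2*": 0.08, "H4*": 0.60}
m = len(raw_p)
adjusted = {name: min(1.0, p * m) for name, p in raw_p.items()}
significant = sorted(name for name, p in adjusted.items() if p < 0.05)
print(significant)  # ['HNR', 'SHR', 'f0']
```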
Figure 25. Participant A boxplots showing results of acoustic measurements on tic (n=45) and word
(n=707) stressed vowels. Clockwise from top left: Mean f0, Mean SHR, Mean HNR, Mean Intensity,
Mean H4* and Mean H1*-H2*. In this and all following boxplots, the group median is represented as a
horizontal line inside the box, the interquartile range [IQR] is represented by the box, and intervals
between minimum and maximum values within 1.5 * the interquartile range [IQR] are represented by
vertical bars. Significance is based on p-values adjusted using Bonferroni correction.
Participant A local analysis
The three parameters that reliably distinguished tics from speech in the global results above
underwent a paired differences analysis. Differences in Mean f0, Mean SHR, and Mean HNR
across Tic+Word and Word+Tic pairs were examined to determine if transitions between tic and
speech systems are concomitant with significant “jumps” in the predicted direction (up or down).
Participant A data had only seven Tic+Word transitions and five Word+Tic transitions, which are at the lower limit of sample sizes that can be analyzed by the Wilcoxon signed-rank test deployed in this analysis.[23]
Every instance of cross-system transition patterned as expected given the experimental hypothesis (Table 20). Tic+Word transitions are characterized by falling Mean f0 and rising Mean SHR and HNR. Word+Tic transitions are the inverse: Mean f0 rises whereas Mean SHR and Mean HNR fall. With regard to Tic+Word pairs, Wilcoxon signed-rank tests indicated that Word Mean f0 scores are significantly lower relative to the preceding Tic Mean f0 scores (Z = 0, p = 0.007), and Word Mean SHR and Mean HNR are significantly higher relative to preceding Tic SHR/HNR (Mean SHR: Z = 25, p = 0.039; Mean HNR: Z = 28, p = 0.008). Turning to Word+Tic pairs, signed-rank tests showed that Tic Mean f0 scores are significantly higher than scores for preceding words (Z = 15, p = 0.032), and Mean SHR and Mean HNR are lower in the following element relative to the preceding element, significantly so for Mean HNR (Z = 0, p = 0.031) but only marginally for Mean SHR (Z = 1, p = 0.063). Figure 26 illustrates these results.
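A minimal exact version of the paired, one-sided Wilcoxon signed-rank test used at these junctures can be sketched in Python (the analyses themselves were run in R). The seven Tic+Word Mean f0 pairs are invented for illustration, and the helper assumes no ties among the absolute differences.

```python
from itertools import product

def signed_rank_p(pre, post, alternative):
    """Exact one-sided Wilcoxon signed-rank test on paired scores.

    alternative='less' tests whether following (post) scores tend to be
    lower than preceding (pre) scores. Assumes no ties in |differences|.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_obs = sum(r for d, r in zip(diffs, ranks) if d > 0)
    # Enumerate all 2**n equally likely sign assignments under the null.
    null_ws = [sum(r for r, s in zip(ranks, signs) if s)
               for signs in product((0, 1), repeat=len(diffs))]
    if alternative == "less":
        return sum(w <= w_obs for w in null_ws) / len(null_ws)
    return sum(w >= w_obs for w in null_ws) / len(null_ws)

# Invented Mean f0 scores for seven Tic+Word transitions.
pre_tic_f0   = [491, 455, 520, 430, 500, 470, 483]
post_word_f0 = [281, 300, 250, 310, 265, 290, 275]
print(signed_rank_p(pre_tic_f0, post_word_f0, "less"))  # 0.0078125 (= 1/128)
```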
Table 20. Acoustic parameter results from preceding and following word-size elements at Participant A cross-system transitions. Tic→Word transitions (left; n=7) are made up of consecutive Tic+Word pairs; Word→Tic transitions (right; n=5) are made up of consecutive Word+Tic pairs. Cell values are M / SD / Med.

| Parameter | Tic→Word: Preceding Tic | Tic→Word: Following Word | Word→Tic: Preceding Word | Word→Tic: Following Tic |
| Mean f0 | 483 / 34.2 / 491 | 284 / 75.4 / 281 | 198 / 10.4 / 196 | 477 / 25.1 / 464 |
| Mean SHR | 0.06 / 0.02 / 0.05 | 0.10 / 0.05 / 0.11 | 0.154 / 0.10 / 0.15 | 0.05 / 0.02 / 0.05 |
| Mean HNR | 10.1 / 5.41 / 9.86 | 35.80 / 9.35 / 36.1 | 39.3 / 5.65 / 39.2 | 8.10 / 3.41 / 6.24 |
[23] The low count of cross-system transitions is related to Participant A's co-speech ticking strategy, which appears to revolve around strict separation in time between ticked and spoken utterances (see previous chapter).
Figure 26. Participant A boxplots showing acoustic measurements in Tic+Word paired events (left
panels) and Word+Tic paired events (right panels). Tics are in salmon (preceding element in left panels,
following element in right panels), and words are in cyan (following element in left panels, preceding
element in right panels). Paired events are connected by a gray line. P-values are from paired, one-sided
Wilcoxon tests.
3.3.2. Participant B
All six acoustic parameters extracted from Participant B data differ significantly as a function
of Vocalization Type in the direction predicted by the experimental hypothesis. Table 21 lists
tic/word differences for each parameter by Task Type. These data are the result of subtracting
Tic and Word group averages for each acoustic parameter. Tic/Word differences in f0 and H1*-
H2* are positive, as expected. The other four parameter differences are negative, as expected.
Table 21. Average Tic and Word group differences for acoustic parameters by Task Type. All differences
patterned as predicted.
Task Type Acoustic Parameter Tic – Word Difference
Narrative Av. f0 63.98295722
Av. H1H2 1.04182473
Av. SHR -0.08022373
Av. HNR -8.48260995
Av. Intensity -1.03432062
Av. H4 -2.38143926
Picture Av. f0 48.35804607
Av. H1H2 0.47514664
Av. SHR -0.07378061
Av. HNR -10.68477464
Av. Intensity -1.01945159
Av. H4 -4.72239723
Reading Av. f0 45.85622388
Av. H1H2 1.03817930
Av. SHR -0.08543880
Av. HNR -11.01944706
Av. Intensity -0.57713879
Av. H4 -3.01303699
Participant B global analysis
Participant B stressed vowels were compared as a function of Vocalization Type with Tasks
and Task Types pooled. One-sided, unpaired Wilcoxon tests found significant differences in the
expected direction for all six acoustic parameters measured. Global Mean f0 and Mean H1*-H2* were significantly higher in tic stressed vowels relative to speech vowels according to the rank-sum tests. Mean SHR, HNR, Intensity, and H4* were all significantly lower in tics than in speech. Table 22 shows these results and their significance levels.
Table 22. Acoustic parameter extraction results for ticked (n=79) and spoken (n=1556) Participant B stressed vowel intervals. Hypotheses shown in the fourth column tested by unpaired, one-sided Wilcoxon tests using group medians. Reported p-values are adjusted using Bonferroni correction. Cell values are M / SD / Med.

| Acoustic Parameter | Tic Vowels | Word Vowels | Hypothesis | p |
| Mean f0 | 333.0 / 104.0 / 340.0 | 198.0 / 41.7 / 188.0 | Tic > Word | p < .001 |
| Mean SHR | 0.115 / 0.192 / 0.024 | 0.331 / 0.199 / 0.327 | Tic < Word | p < .001 |
| Mean HNR | 31.9 / 8.3 / 30.7 | 36.3 / 10.6 / 38.6 | Tic < Word | p < .001 |
| Mean Intensity | 2.38 / 2.31 / 1.62 | 3.27 / 3.10 / 2.36 | Tic < Word | p = .002 |
| Mean H1*-H2* | 5.59 / 6.79 / 6.56 | 0.653 / 4.85 / -0.221 | Tic > Word | p < .001 |
| Mean H4* | -1.22 / 5.53 / 0.01 | 1.06 / 5.10 / 1.29 | Tic < Word | p < .001 |
Figure 27. Participant B boxplots showing results of acoustic measurements on tic (n=79) and word
(n=1556) stressed vowels. Clockwise from top left: Mean f0, Mean SHR, Mean HNR, Mean Intensity,
Mean H4* and Mean H1*-H2*.
Participant B local analysis
All six acoustic parameters from Participant B data were subjected to the analysis of paired
differences across Tic+Word and Word+Tic transitions. Table 23 describes results. Paired, one-
sided Wilcoxon tests compared preceding tics and following words at Tic+Word transitions,
finding that Mean f0 and Mean H1*-H2* scores were significantly lower for words relative to
tics (f0: Z = 56, p < .001; H1*-H2*: Z = 539, p < .001), while, correspondingly, Mean SHR,
HNR, Intensity and H4* scores were significantly lower in preceding tics relative to following
words (SHR: Z = 1293, p=.013; HNR: Z = 1294, p=.025; Intensity: Z = 1279, p=.032; H4*: Z =
1379, p=.006). With respect to Word+Tic transitions, signed-rank tests showed that Tic Mean f0 and H1*-H2* scores are significantly higher relative to preceding words (f0: Z = 2015, p < .001; H1*-H2*: Z = 1697, p < .001) and that Tic Mean SHR and HNR are significantly lower relative to preceding words (SHR: Z = 644.5, p = 0.003; HNR: Z = 609, p = 0.001); Mean Intensity and H4* are also lower at these transitions, though not significantly so (Intensity: Z = 1083, p = 0.529; H4*: Z = 902, p = 0.133). Thus, jumps in the predicted directions occur at cross-system transitions for all acoustic measures in Participant B data, reaching significance in every case except Intensity and H4* at Word+Tic transitions. Mean f0 and Mean H1*-H2* drop suddenly when crossing from tics to words and rise abruptly when crossing from words to tics, while Mean SHR, HNR, Intensity, and H4* rise abruptly when crossing from tics to speech and drop suddenly when crossing from speech to tics.
Table 23. Acoustic parameter results from preceding and following word-size elements at Participant B cross-system transitions. Tic→Word transitions (left; n=63) are made up of consecutive Tic+Word pairs; Word→Tic transitions (right; n=65) are made up of consecutive Word+Tic pairs. Cell values are M / SD / Med.

| Parameter | Tic→Word: Preceding Tic | Tic→Word: Following Word | Word→Tic: Preceding Word | Word→Tic: Following Tic |
| Mean f0 | 290 / 82.4 / 194 | 208 / 41.7 / 194 | 198 / 59.0 / 184 | 296 / 76.5 / 273 |
| Mean SHR | 0.15 / 0.09 / 0.15 | 0.11 / 0.13 / 0.18 | 0.11 / 0.12 / 0.18 | 0.13 / 0.09 / 0.11 |
| Mean HNR | 34.5 / 8.69 / 35.0 | 36.4 / 11.2 / 38.8 | 37.5 / 6.77 / 37.3 | 34.1 / 8.48 / 34.6 |
| Mean Intensity | 2.24 / 2.17 / 1.25 | 2.82 / 2.36 / 2.16 | 2.05 / 1.99 / 1.33 | 2.03 / 1.60 / 1.82 |
| Mean H1*-H2* | 5.25 / 4.11 / 4.89 | 2.44 / 4.86 / 2.44 | 5.24 / 4.41 / 5.4 | 1.38 / 5.16 / 0.82 |
| Mean H4* | -1.40 / 4.55 / -0.58 | 0.563 / 4.66 / 1.25 | -0.73 / 4.77 / 0.139 | -1.52 / 4.85 / -1.12 |
Figure 28. Participant B boxplots showing acoustic measurements in Tic+Word paired events (left
panels; n=63) and Word+Tic paired events (right panels; n=65). Tics are in salmon (preceding element in
left panels, following element in right panels), and words are in cyan (following element in left panels,
preceding element in right panels). Paired events are connected by a gray line. P-values are from paired,
one-sided Wilcoxon tests.
3.3.3. Participant C
For Participant C, statistical comparisons of four out of the six acoustic parameters on the
basis of Vocalization Type were consistent with the hypothesis that verbal tic phonation involves
falsetto voice. The differences in Mean f0 and Mean H1*-H2* between tics and words should be
positive if the prediction that tics but not words are produced in falsetto voice is supported; the
differences in Mean SHR, HNR, Intensity and H4* should be negative if the prediction is
supported. Differences between tics and words in four parameters (f0, SHR, HNR, and H1*-
H2*) patterned as predicted in all three Task Types (Table 24). The difference between the
corrected fourth harmonic amplitude averages for Tic and Word groups patterned as predicted
only in the picture description Task Type.
Table 24. Difference between Tic and Word averages by Task Type. Gray rows indicate acoustic
parameters that did not pattern as predicted.
Task Type Acoustic Parameter Tic – Word Difference
Narrative Av. f0 140.91920698
Av. H1H2 4.73825897
Av. SHR -0.13931033
Av. HNR -13.97144861
Av. Intensity 1.46704350
Av. H4 1.72050344
Picture Av. f0 98.41516850
Av. H1H2 4.83699700
Av. SHR -0.10975016
Av. HNR -15.85279736
Av. Intensity 0.08059912
Av. H4 -1.00701306
Reading Av. f0 106.49002298
Av. H1H2 0.65887864
Av. SHR -0.11075744
Av. HNR -10.08905279
Av. Intensity 1.53140278
Av. H4 0.27508912
Participant C global analyses
One-sided unpaired Wilcoxon tests compared stressed vowel data on the basis of
Vocalization Type (tasks pooled). Stressed vowel Mean f0 and Mean H1*-H2* were both
significantly higher in verbal tics relative to words, as predicted, and Mean SHR and Mean HNR
were both significantly lower for tics than for words, as predicted. Results of Mean Intensity and
Mean H4* comparisons were not as predicted by the experimental hypothesis. Table 25 shows
results for all statistical comparisons for Participant C stressed vowel measurements.
Table 25. Acoustic parameter extraction results for tic (n=476) and word (n=2677) Participant C stressed vowel intervals. Hypotheses shown in the fourth column tested by unpaired, one-sided Wilcoxon tests using group medians. Reported p-values are adjusted using Bonferroni correction. Cell values are M / SD / Med.

| Acoustic Parameter | Tic Vowels | Word Vowels | Hypothesis | p |
| Mean f0 | 297.0 / 83.6 / 271.0 | 186.0 / 40.2 / 180.0 | Tic > Word | p < .001 |
| Mean SHR | 0.212 / 0.205 / 0.123 | 0.370 / 0.187 / 0.375 | Tic < Word | p < .001 |
| Mean HNR | 29.5 / 11.0 / 31.0 | 44.7 / 11.6 / 46.7 | Tic < Word | p < .001 |
| Mean Intensity | 1.81 / 4.99 / 0.20 | 0.48 / 1.41 / 0.11 | Tic < Word | ns |
| Mean H1*-H2* | 5.22 / 6.21 / 5.22 | -0.13 / 5.14 / -1.03 | Tic > Word | p < .001 |
| Mean H4* | -7.01 / 9.07 / -9.84 | -10.1 / 7.4 / -11.0 | Tic < Word | ns |
Figure 29. Participant C boxplots showing results of acoustic measurements on tic (n=476) and word
(n=2677) stressed vowels. Clockwise from top left: Mean f0, Mean SHR, Mean HNR, Mean Intensity,
Mean H4* and Mean H1*-H2*.
Participant C local analyses
Acoustic parameters that patterned as predicted in the global analysis underwent the local
analysis that measured jumps in parameter values across Tic+Word and Word+Tic pairs
representing cross-system transitions. Paired, one-sided Wilcoxon tests compared scores of tics
and words at Tic+Word transitions; the side of the test depended on the predicted jump direction
(up/down) for each acoustic parameter for the two kinds of cross-system transition. Tasks are
pooled in Tic and Word groups. For all four parameters tested and both juncture types, paired
differences of significant magnitude and in the predicted direction were found. In Tic+Word pairs, following Word Mean f0 and H1*-H2* scores were significantly lower relative to preceding Tic means (f0: Z = 1777, p < .001; H1*-H2*: Z = 14920.5, p < .001). Word Mean SHR and HNR
were significantly higher relative to preceding tics (SHR: Z = 42322.5, p < .001; HNR: Z =
45821, p < .001). In Word+Tic pairs, Tic Mean f0 and H1*-H2* scores were significantly higher
relative to preceding word scores (f0: Z = 52093, p < .001; H1*-H2*: Z = 47472, p < .001), and Tic Mean SHR and HNR were significantly lower (SHR: Z = 4602.5, p < .001; HNR: Z = 1684, p < .001).
Table 26. Acoustic parameter results from preceding and following word-size elements at Participant C cross-system transitions. Tic→Word transitions (left; n=628) are made up of consecutive Tic+Word pairs; Word→Tic transitions (right; n=650) are made up of consecutive Word+Tic pairs. Cell values are M / SD / Med.

| Parameter | Tic→Word: Preceding Tic | Tic→Word: Following Word | Word→Tic: Preceding Word | Word→Tic: Following Tic |
| Mean f0 | 277.0 / 71.7 / 257.0 | 199.0 / 42.0 / 191.0 | 180.0 / 41.1 / 173.0 | 281.0 / 73.7 / 261.0 |
| Mean SHR | 0.14 / 0.05 / 0.14 | 0.24 / 0.12 / 0.23 | 0.234 / 0.102 / 0.221 | 0.134 / 0.053 / 0.127 |
| Mean HNR | 32.3 / 6.7 / 32.6 | 43.9 / 11.3 / 45.5 | 45.4 / 9.0 / 47.4 | 32.4 / 7.4 / 32.8 |
| Mean H1*-H2* | 5.88 / 5.14 / 6.08 | 3.09 / 5.98 / 2.63 | -0.12 / 4.83 / -0.89 | 6.20 / 5.06 / 6.73 |
Figure 30. Participant C boxplots showing acoustic measurements in Tic+Word paired events (left panels) and Word+Tic paired events (right panels). Tics are in salmon (preceding element in
left panels, following element in right panels), and words are in cyan (following element in left panels,
preceding element in right panels). Paired events are connected by a gray line. P-values are from paired,
one-sided Wilcoxon tests.
3.3.4. Participant D
As was the case with the three previous participants, Participant D data largely supported the
experimental hypothesis that stressed vowel phonation for verbal tics is underpinned by falsetto
mode of laryngeal vibration in contrast to stressed vowel phonation for speech. Table 27 lists the
results of taking Tic and Word group averages for each acoustic parameter (Mean f0, Mean
HNR, etc.) and subtracting them to get the tic-word differences in each parameter by Task Type.
Positive f0 and H1*-H2* differences would support the hypothesis being tested, as would
negative SHR, HNR, Intensity, and H4* differences. For Participant D, the following acoustic
parameters pattern as expected given the hypothesis being tested: Mean f0, Mean SHR, Mean
HNR, Mean H4*, and Mean H1*-H2*.
Table 27. Differences between Tic and Word group averages for each acoustic parameter by Task Type.
Gray cells indicate parameters that did not pattern as predicted.
Task Type Acoustic Parameter Tic – Word Difference
Picture Av. f0 30.73581488
Av. H1H2 2.35170580
Av. SHR -0.09766691
Av. HNR -2.00252267
Av. Intensity 1.69283626
Av. H4 -0.60537181
Reading Av. f0 75.12114239
Av. H1H2 3.08732081
Av. SHR -0.06050526
Av. HNR -6.89948523
Av. Intensity 1.80917707
Av. H4 -0.82220615
Participant D global analyses
Statistical comparisons of stressed vowel data for the six parameters on the basis of
Vocalization Type found significant differences in predicted directions for Mean f0, SHR, HNR,
H4*, and H1*-H2* in Participant D data. Figure 31 shows boxplots of acoustic data for Tic and
Word groups (Tasks pooled); results are summarized in Table 28.
Table 28. Acoustic parameter extraction results for ticked (n=32) and spoken (n=1867) Participant D stressed vowel intervals. Hypotheses shown in the fourth column tested by unpaired, one-sided Wilcoxon tests using group medians. Reported p-values are adjusted using Bonferroni correction. Cell values are M / SD / Med.

| Acoustic Parameter | Tic Vowels | Word Vowels | Hypothesis | p |
| Mean f0 | 189.0 / 47.7 / 198.0 | 120.0 / 24.4 / 115.0 | Tic > Word | p < .001 |
| Mean SHR | 0.29 / 0.16 / 0.29 | 0.39 / 0.11 / 0.38 | Tic < Word | p = .003 |
| Mean HNR | 31.0 / 11.3 / 32.2 | 35.6 / 11.1 / 37.0 | Tic < Word | p = .002 |
| Mean Intensity | 3.99 / 3.95 / 2.84 | 1.13 / 1.14 / 0.76 | Tic < Word | ns |
| Mean H1*-H2* | -2.09 / 3.91 / -2.62 | -5.21 / 3.99 / -5.60 | Tic > Word | p < .001 |
| Mean H4* | -1.22 / 6.17 / -1.16 | 1.84 / 5.43 / 1.94 | Tic < Word | p = .001 |
Figure 31. Participant D boxplots showing results of acoustic measurements on tic (n=32) and word
(n=1867) stressed vowels. Clockwise from top left: Mean f0, Mean SHR, Mean HNR, Mean Intensity,
Mean H4* and Mean H1*-H2*.
Participant D local analyses
Differences that were significant in the global analysis patterned in the same directions in the local analyses in both Tic+Word and Word+Tic pairs, although with only seven transitions per juncture type few of the paired tests reached significance. Paired, one-sided Wilcoxon tests found that Word Mean f0 and H1*-H2* scores in Tic+Word pairs are lower relative to preceding tic scores, though not significantly so (f0: Z = 4, p = 0.0547; H1*-H2*: Z = 7, p = 0.148); the reverse holds for Word+Tic pairs, where tic Mean f0 and H1*-H2* scores are higher relative to preceding word scores, significantly so for H1*-H2* (f0: Z = 18, p = 0.289; H1*-H2*: Z = 28, p = 0.008). Relatedly, Word Mean SHR, HNR, and H4* scores are higher relative to preceding tic scores in Tic+Word pairs, though not significantly (SHR: Z = 22, p = 0.109; HNR: Z = 19, p = 0.234; H4*: Z = 11, p = 0.711), while following tic scores are lower relative to preceding word scores in Word+Tic pairs, significantly so for SHR and HNR (SHR: Z = 3, p = 0.039; HNR: Z = 0, p = 0.008; H4*: Z = 10, p = 0.289). These findings are presented in Table 29 and illustrated in Figure 32.
Table 29. Acoustic parameter results from preceding and following word-size elements at Participant D cross-system transitions. Tic→Word transitions (left; n=7) are made up of consecutive Tic+Word pairs; Word→Tic transitions (right; n=7) are made up of consecutive Word+Tic pairs. Cell values are M / SD / Med.

| Parameter | Tic→Word: Preceding Tic | Tic→Word: Following Word | Word→Tic: Preceding Word | Word→Tic: Following Tic |
| Mean f0 | 165.0 / 53.4 / 156.0 | 123.0 / 26.7 / 109.0 | 119.0 / 35.8 / 106.0 | 162.0 / 57.1 / 163.0 |
| Mean SHR | 0.20 / 0.09 / 0.19 | 0.29 / 0.14 / 0.28 | 0.37 / 0.18 / 0.32 | 0.16 / 0.05 / 0.16 |
| Mean HNR | 33.0 / 4.6 / 34.5 | 36.2 / 5.5 / 33.8 | 39.9 / 8.9 / 42.6 | 30.0 / 6.9 / 27.2 |
| Mean H1*-H2* | -2.77 / 5.33 / -1.95 | -4.22 / 3.81 / -6.15 | -7.60 / 5.19 / -9.41 | -2.00 / 5.43 / 0.39 |
| Mean H4* | 4.78 / 13.30 / 2.37 | 1.46 / 6.37 / 4.54 | 0.86 / 4.32 / 2.00 | 0.37 / 14.90 / -0.29 |
Figure 32. Participant D boxplots showing acoustic measurements in Tic+Word paired events (left
panels; n=7) and Word+Tic paired events (right panels; n=7). Tics are in salmon (preceding element in
left panels, following element in right panels), and words are in cyan (following element in left panels,
preceding element in right panels). Paired events are connected by a gray line. P-values are from paired,
one-sided Wilcoxon tests.
3.4. Discussion
In the present study, six acoustic parameters previously shown to index the distinction
between falsetto and modal phonation register were extracted from stressed vowel and whole-
word acoustic intervals corresponding to verbal tics and true words. Findings from all participant
datasets support the hypothesis that verbal tics and true words are produced on distinct acoustic
channels. Specifically, the phonatory characteristics of tics are indicative of falsetto mode of
laryngeal vibration, though participants differed with respect to the subset of acoustic parameters
that marked their tic/speech phonatory distinction. For Participant B, all six parameters succeed
in distinguishing tics from speech in a way that is indicative of a falsetto channel for tic
phonation. Five out of the six parameters behaved as predicted in Participant D's co-speech ticking (f0, SHR, HNR, H4*, and H1*-H2*). Participant C's tics and speech were distinguished on the basis of four of the parameters: f0, SHR, HNR, and H1*-H2*. Participant A had the smallest subset of parameters that patterned as predicted (three): f0 in verbal tics is higher relative to true words, and both SHR and HNR are lower in tics than in true words. These three parameters patterned as predicted in all participants' datasets. Global results do not seem to be artefacts of
averaging. For every participant dataset, the subset of parameters that differed significantly in the
expected direction in global analyses also “jumped,” i.e. underwent an abrupt transition, in
expected ways at switch points between tic and speech behavioral system events in the local
analyses. Specifically, pitch and H1*-H2* jump up when going from a word to a tic and jump
down when going from a tic to a word; the opposite patterns are true for SHR, HNR, H4*, and
Intensity, as predicted.
Results reported here are consistent with the view that adult ticker-talkers have developed a
strategy to compensate for intrusive tic meanings that involves generating separate or separable
acoustic streams for tics and speech, a strategy embodied by producing vocalizations from each
behavioral system on distinct phonatory channels.
To illustrate how separating one’s tics from one’s speech in this manner might be considered
optimal given the circumstance, Figure 33 depicts an utterance that had three verbal tic events
that occurred at very close temporal distance to speech. Two of them, “sausage” and “hey”, did
not interrupt the Intonation Phrase nearest them; they immediately followed the phrase. One,
“biscuit”, did occur internal to the phrase. Unintended meanings are inserted in all three cases,
though placement of the former two appears more cooperative than placement of the latter.
Regardless, three unintended meanings are inserted into less than five seconds worth of speech.
But the tics sound so qualitatively different from the surrounding speech—here illustrated by the
pitch track in cyan showing large jumps up and down when crossing from words to tics and
back—that they give the auditory impression of not forming part of the utterance. Future studies
should investigate whether (a) interlocutors perceive verbal tics to be “separate” from
surrounding speech and whether (b) this perception provides a benefit for communication in
terms of ease, efficiency, or social comfort.
Figure 33. Spectrogram of utterance produced by Participant C while reading “The Rainbow Passage”
aloud. A transcript of the utterance is above the spectrogram in rough time alignment. Pitch tracking for
the utterance is in cyan. Green text indicates that the word hosted a phrase-final boundary tone;
corresponding time intervals in the spectrogram are shaded in green. Blue text indicates tics that did not
interfere with production of a phrase; corresponding time intervals are shaded in blue. Red text indicates
tics that did interfere with production of a phrase; corresponding time intervals are shaded in red.
Given the empirical evidence presented in the previous chapter that co-speech tic production is sensitive to the prosodic structure of spoken utterances, this study can be interpreted as further evidence that ticking and speaking can be coordinated in service of communication. An
important caveat concerning the research design of the present study is that it relies on past research identifying a set of acoustic indices of the falsetto/modal distinction to determine whether tics and speech are produced along separate falsetto and modal channels, respectively. Direct observation of falsetto mode of laryngeal vibration would provide considerably stronger support for the claim that falsetto phonation for tics is a skilled aspect of co-speech ticking. Future studies using articulatory data should therefore investigate tic phonation to confirm that the acoustic findings can be attributed to falsetto mode of vocal fold vibration. The sections that follow elaborate on these points further.
The neurophysiological underpinnings and phenomenological profile of Tourette's ticking suggest at least two factors that may be contributing to the findings that tics and speech are produced along separate acoustic channels and that phonation for tics occurs in the falsetto register. While
switching to falsetto voice is here being interpreted as a dimension of skilled communication
(i.e., a coping mechanism for speaking while ticking freely), there is also reason to think that the
neurophysiological underpinnings of ticking are conducive to the achievement of falsetto
registration. Put another way, falsetto registration could manifest for “purely” physiological
reasons. A second possibility to consider is that the verbal tics in a speaker’s inventory could be
specified to be realized via certain (specific) phonatory goals. Each of these is discussed below.
Constraints of Ticking
While the effector sets required for verbal ticking and speaking overlap—meaning their
neural underpinnings necessarily overlap to some extent—the neural circuits managing urge-
related behavior and goal-directed behavior are largely distinct (e.g., Brooks, 2011; Graybiel,
2008; Gupta & Aron, 2011; Reese et al., 2014). Under the assumption that the same is true for
urge-based ticking and goal-directed speech, the distinction at the level of neural circuitry
could translate to distinct patterns of neuromuscular signaling, that is, to tasks that are specified
with different dynamic properties. Definitions of the term tic in clinical and neuroscientific
research usually include descriptors referring to their “speed” and/or “force”, as well as their
unpredictability. For example, Cohen and colleagues describe TS tics as movements and
vocalizations that appear “[…] sudden, rapid, stereotyped and purposeless” (Cohen, Leckman, &
Bloch, 2013:998). The almost identical description “[…] rapid, brief, and purposeless” had
been offered by Peterson & Leckman (1998:1337). If there are neurophysiological bases for the
observed qualities of tic movements, then it is possible that the achievement of falsetto register
during verbal tic voiced segments is an unintended consequence of tic-system physiological
properties. For example, experiments on excised and living human larynges have shown that
phonation shifts qualitatively from modal to falsetto register at a certain point as longitudinal
tension of the vocal ligament is gradually increased. If such rapid tension shifts were entailed
during ticking, it is possible then that voiced segments in verbal tics that would otherwise be
underpinned by typical modal phonation instead end up in falsetto register due to sudden,
rapid increases in longitudinal tension.
One set of findings is of interest with regard to this “purely” physiological account.
Tic intensity or loudness in Participant A and Participant D datasets was found to be significantly
greater relative to speech, the opposite pattern from what would be expected if tics but not speech are
voiced in falsetto register. When tic and word stressed vowel samples for Mean Intensity are
compared using an unpaired Wilcoxon test of the opposite relationship to that predicted by
the hypothesis, it is confirmed that ticked stressed vowels are significantly louder than spoken
stressed vowels (p < .001 for both participants). Mean f0 was clearly higher for tics relative to
speech for these participants as well, as it was for every participant. Given that “[…] just a
small change in lung pressure can trigger a sudden change from one register to the other”
(Titze, 2014:2091) and that vocal pitch jumps up spontaneously by about 9 semitones when vocal
intensity increases by 30 dB (Debruyne & Buekers, 1998), it is conceivable that more forceful
muscle contractions concomitant with tic-like neuromuscular signaling result in falsetto
registration incidentally rather than “purposefully.” Participant A and D findings are consistent
with this account. On the other hand, Participant A’s harmonic structure data are suggestive of
increased intensity causing more of the vocal fold body to vibrate than would vibrate in falsetto voice
(which does not involve vibration of the entire vocal fold body), given that ticked and spoken
words had comparably rich harmonics. This alternative (but not mutually exclusive) explanation
provided by a purely physiological account cannot be excluded given the current data and
analysis methods. It is worth noting that if a ticker has sufficient experience with their own
verbal ticking, supervisory systems will have learned whether verbal tic phonation is likely to
surface as falsetto (in the absence of intent). Findings from Participants A and D could be
explained by skillful relegation of tics to a falsetto acoustic channel that, when combined with
physiological factors, led to an achieved falsetto or falsetto approximation that was louder than
expected.
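The unpaired Wilcoxon (rank-sum) comparison reported above can be sketched in a few lines. This is only an illustration of the test statistic: the function below computes the Mann-Whitney U from average ranks, and the intensity values are invented placeholders, not the participants' measurements.

```python
from itertools import chain

def rank_sum_u(x, y):
    """Mann-Whitney U statistic for sample x against sample y,
    using average ranks so that tied values are handled."""
    pooled = sorted(chain(x, y))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    r_x = sum(ranks[v] for v in x)          # rank sum for x
    return r_x - len(x) * (len(x) + 1) / 2  # U statistic for x

# Hypothetical Mean Intensity samples (dB): ticked vs. spoken stressed vowels
tic_db = [78, 81, 80, 83, 79, 82]
word_db = [70, 72, 74, 71, 73, 69]
u = rank_sum_u(tic_db, word_db)
# U reaches its maximum, len(x) * len(y), when every tic value
# exceeds every word value, as in this toy sample
```

The point of the sketch is only that the comparison is rank-based rather than mean-based; in practice an off-the-shelf implementation with the appropriate one-sided alternative would be used.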
Non-modal Channels that are not Falsetto
It was noted in the introduction that, in theory, other non-modal acoustic channels could
function to segregate ticked from spoken utterances, much like falsetto is hypothesized to do. In
principle, any voice quality modulation that the speaker doesn’t use to mark linguistic contrasts
in their specific language should be available. For example, intensity or relative loudness isn’t
thought to be contrastive in any language and could be an appropriate alternative to falsetto in
this regard. Participant A’s intensity results could be reflective of targeted (i.e., skilled)
intensity modulation for co-speech ticcing. Notably, Participant A’s acoustic data resulted in the
smallest subset of acoustic parameters that patterned as expected if tics were produced in
falsetto. This raises the possibility that for Participant A, falsetto was never the target
modulation. Another point worth mentioning is Participant A’s f0 range in speech, which
appears very high. It is possible that for this ticker-talker, falsetto register use to distinguish tics
from speech is a less efficient choice than increased amplitude. To this point, statistical
comparison of tic/word intensity testing the reverse relationship to what was tested in the
global analysis does confirm that Participant A’s tic intensity is significantly higher than word
intensity (Z = 12355.5, p < .001). If Participant A’s compensatory strategy involves relegation of
tic vocalizations to a relatively louder channel and not relegation to a falsetto channel, then
significantly raised pitch could be an indirect consequence of this increased intensity. Participant
A’s HNR findings are harder to account for this way, however, because increased intensity
should accompany increased HNR but HNR is relatively decreased in tics. Difficulties in
interpreting Participant A data highlight the general limitation of this study, which is that there is
no single acoustic measure that can indicate the presence or absence of falsetto, making it
difficult to confirm that the separate tic channel is indeed a falsetto one. However, it remains
clear that there are separable acoustic channels that likely create separate perceptual streams
because at least three out of the six acoustic parameters succeed in distinguishing tics from
speech across all participant datasets.
Specific Targets for Individual Tics & the “Just Right” Phenomenon
Setting aside the multi-faceted “purely” physiological account, finding that tics are produced
in falsetto register could be related to the existence of specific tic register goals. Recall that the
inventory of tics produced by an individual can remain fairly stable over months and sometimes
years. It is not currently known what the nature of a tic task is. Is simply uttering a verbal tic
enough to satisfy the urge to produce that tic? Or does the tic have to be produced in a “just
right” fashion? The representation for a specific tic could have unique phonatory goals that are
incorporated into the urge/task itself. Premonitory urges to tic have been linked to “not just right”
phenomena in which preceding urges are not satisfied despite production of appropriate motor
tics (Martino, Ganos, & Pringsheim, 2017; Sambrani, Jakubovski, & Müller-Vahl, 2016). If a
motor tic can be “not just right,” it is possible that verbal tics have a specific register goal just as
they have particular segmental phonetic goals.
If each tic in a ticker’s inventory has a separate phonatory goal, then it is the relative
frequency of occurrence (or repertoire) of different tics in a ticker’s inventory that would explain
the patterns observed overall. This is particularly true in the case of Participant C, whose most
frequent verbal tic “biscuit” (n=466) is almost an order of magnitude more frequent than the next
most frequent verbal tic, fuck (n=69). Even if fuck had a modal voice target, a falsetto target for
“biscuit” would make it seem like the whole tic system was dominated by the falsetto trend.
Table 30 shows the counts for each participant’s three most frequently produced tics (for further
details about each ticker’s tic sample, see Appendix), counted across all tasks. Figure 34
compares Mean f0 across each participant’s three most-frequent tics. In Participant C tics, f0 is
significantly different between each pair. Interestingly, biscuit tends to be of lower pitch relative
to the other two tics. Thus, it is not the case that the preponderance of biscuit is artificially
raising the pitch of the whole group. Two of Participant A’s tics do not differ significantly with
respect to f0, while the third differs from them both. F0 in Participant B’s happy and kind isn’t
significantly different from f0 in Ione, but kind is significantly higher than happy.
[Footnote: Only one of Participant D’s verbal tics had any token repetitions, the tic phrase fuck off (n=8). These occurrences were split into three groups for the purpose of this exercise.]
Results from Participants A, B, and C don’t provide a definitive
answer regarding whether or not individual tics have unique phonatory goals, but they do show
that tics are not monolithic. Participant C’s most frequent tics, for example, each occupy a
different region of the tic system f0 range, but they also contain vowels of different qualities
which have intrinsically different pitch and amplitude. It will be necessary to investigate the
extent to which variants of a tic succeed in producing urge satisfaction in order to understand
just-right phenomena.
Table 30. Labels and counts for each participant’s three most frequently produced tic words. Participant
D’s only repeated verbal tic, the phrase fuck off, is presented in different ways for the purposes of this
exercise.
Rank | A: Tic (Count) | B: Tic (Count) | C: Tic (Count) | D: Tic (Count)
1    | HELLO (7)      | HAPPY (17)     | BISCUIT (466)  | FUCK_OFF (5)
2    | NO (4)         | KIND (8)       | FUCK (69)      | FUCK (3)
3    | POPE (4)       | IONE (5)       | HEY (42)       | OFF (3)
[Footnote: The verbal tic “ione” (pronounced eye-OH-nee) is a female proper name.]
Figure 34. Boxplots of stressed vowel Mean f0 data for the three most frequently produced tics by
each participant (clockwise from top left: Participant A, Participant B, Participant C, and Participant D).
Significance results are from two-sided independent samples Wilcoxon rank sums tests.
A main limitation of this study is the absence of articulatory evidence. A battery of six
acoustic parameters was used as a proxy for the presence or absence of falsetto/modal register
in voiced tics and speech. Falsetto vocal fold vibration was not directly observed. Ultimately,
laryngeal movement data would confirm these findings and also enable investigation of the role
played by tic-system-specific neurophysiological constraints that likely underpin verbal tic
voicing.
3.5. Conclusion
This study presented evidence that adult ticker-talkers produce tics and speech along distinct
acoustic channels, with tic phonation exhibiting acoustic indicators of falsetto that were mostly
absent in each ticker’s communicative speech. Importantly, large “jumps” in parameter values
between the ranges expected for falsetto and modal voice occur at those points in time where
vocal behavior transitions between tic and speech behavioral systems, suggesting that phonatory
register shifts may be targeted (i.e., coordinated). These findings support an account that treats
separation of tic and speech streams as an acquired skill that can mitigate unwanted effects of
intrusive verbal tics.
Chapter 4. Verbal Tics Don’t Undergo Boundary-related Lengthening
4.1 Introduction
The case studies represented in this dissertation show adult tickers producing speech that is
peppered with vocal tics, some of which are isomorphic with words (and phrases). These tics
often occur at very close temporal distances to true words; one such instance is illustrated in
Figure 35. The figure depicts two instances of verbal tic “biscuit” (blue text and shading) that
followed closely after words open and out (green text and shading). The acoustic intervals of
silence between true-word offsets and tic onsets in this utterance represent the closure portion of
the labial consonant [b]. Thus, the two instances of “biscuit” abut true words. To put it another
way, the “space” between these tics and surrounding true-words is comparable to the space
between any two true-words in a phrase.
Figure 35. Spectrogram of utterance containing two post-boundary “biscuit” tics. Transcript in rough
time alignment. Phrase-final words in green text and shading. Tics in blue text and shading.
[Footnote: The segmentation protocol employed for this study required instances of “biscuit” onset to be identified as the onset of the formants for the first vowel.]
Distributional analyses presented in the second chapter of this dissertation show that for
Participant C, verbal tics are most likely to occur near a phrase boundary. Vocal tract actions
instantiating typical connected speech are known to undergo articulatory lengthening, reduced
temporal overlap, and increased spatial magnitude in the vicinity of a prosodic boundary (e.g.,
Byrd & Saltzman, 1998; Fougeron & Keating, 1997). It is being claimed here that tics are
isomorphic with the true-words they mimic (i.e., having the same gestural makeup) and this
raises the question—are articulatory actions for ticking impacted by their proximity to prosodic
phrase boundaries? This experiment investigates whether the acoustic duration of a frequent
verbal tic (463 token repetitions) varies as a function of its proximity to a prosodic phrase
boundary. In theory, verbal tic actions aren’t incorporated into phrasal task plans because they do
not form part of an intended message. If boundary-related clock-slowing processes (Byrd &
Saltzman, 1998, 2003) operate on any and all vocal tract constrictions with which they are co-
active, then proximity to a boundary could cause verbal tics to lengthen like their communicative
counterparts do. On the other hand, if prosodic marking events have no purview over the
dynamics of ticking, as theorized, then proximity to boundaries is predicted to either have no
impact (i.e., no significant difference in duration) or have an impact that is not speech-like (i.e.,
shortening at a boundary instead of lengthening).
4.2 Method
The hypothesis that verbal tics do not undergo word-like lengthening at prosodic phrase
boundaries is tested by analyzing durational variability in a sample of acoustic token repetitions
of a verbal tic. Tic tokens were sourced from the corpus of co-speech ticcing acoustic data
described previously. Data was collected from three female adults and one male. One of the
female participants (Participant C) produced 463 token repetitions of the verbal tic “biscuit,”
facilitating analysis of token-to-token variability. No other participants had more than 11
repetitions of a single verbal tic. Participant C was in her thirties at the time of the study and was
diagnosed by a neurologist as having Tourette’s at the age of 6. She was born and raised in
London, has never lived outside of the city for more than a few months, is a monolingual speaker
of British English, and reported no hearing or speech difficulties. A condition of participation
was a willingness to tic freely throughout the study, that is, to refrain from suppressing one’s
own tics; Participant C reported free ticking by default in everyday life.
4.2.1 Recordings
Audio recording took place in a small, quiet conference room with mild sound attenuation.
The room was equipped with a built-in system for audio-visual presentations that was used to
present visual materials to the participants, who sat at the room’s table facing the screen. The
study was modelled as a sociolinguistics-style interview with the investigator present in the room
(e.g., Eckert & Labov, 2017). Verbal prompts by the investigator elicited performance of four
different speech tasks within each of three task types (see Table 31). Speech tasks
were blocked so that each block included one task from each type (Readings, Pictures, Narratives).
The order of tasks within each block was randomized, but block order was fixed across
participants. Immediately after providing consent, the participant was outfitted with a Shure
head-worn microphone adjusted to sit about two inches from the corner of the mouth. The acoustic
signal was routed through an M Audio USB hub to a personal laptop running Audacity software
(Audacity Team, 2017) that performed the recording at a 44.1 kHz sampling rate.
An automatic speech-to-text service (Otter.ai) generated transcripts for each task recording.
These were subsequently manually corrected. Vocal tic noises were transcribed as “NS” during
manual correction following conventions used by the Penn Phonetics Lab Forced Aligner (P2FA;
Yuan and Liberman 2008). P2FA was fed corrected transcripts and trimmed audio recordings for
each task to automatically generate phone and word level segmentation in Praat (Boersma & Van
Heuven, 2001). This segmentation was corrected manually according to standard phonetic
conventions. Tics and true words are both represented on the phone and word tiers of Praat text
grids; tic and word intervals were therefore labelled according to their Vocalization Type (Tic vs.
Word) on a separate tier.
Table 31. Monadic speech tasks and the verbal prompts that elicited them.

Personal Narrative
  Prompt: “Tell me about a time when you felt extreme ______.”
  Tasks: Joy narrative, Embarrassed narrative, Proud narrative, Sad narrative

Picture Description
  Prompt: “Describe the picture you will see on the screen in as much detail as possible.”
  Tasks: Pool party scene, Park scene, Beach scene, Animal house scene

Passage Reading
  Prompt: “Read the passage that will appear on the screen.”
  Tasks: Rainbow passage, Grandfather passage, Northwind passage, Comma gets a cure passage
4.2.2. Segmentation and labeling
Long recordings comprising more than one speech task were trimmed. The start of a speech
task was defined as the moment at which the investigator finished uttering the verbal prompt
corresponding to its Task Type. The end of a task was defined as the end of the participant’s
statement indicating completion of the task. The ticker reliably produced these statements
unprompted. Any “biscuit” tics occurring within the bounds of a task so defined constitute the
set of “biscuit” tics for that task.
This analysis relies on coding instances of “biscuit” tics according to their Faux-Prosodic
position (Faux-Initial, Faux-Medial, Faux-Final). Two further segmentation protocols are
required before coding can take place: utterance “chunks” with no internal pauses, and
intonational phrases inside chunks of continuous speech. Segmentation into chunks isolates
speech-proximal tics from tics that occurred during inter-speech pause intervals, the latter of
which are excluded from analysis. Segmentation into intonational phrases enables coding
instances of “biscuit” according to their Faux-Prosodic position.
Long intervals of acoustic silence are identified in order to break up each speech task
recording into utterance “chunks”—intervals of speech that contain no pauses internally. Pauses
are defined here as silent intervals lasting at least 250 milliseconds.
They were automatically labelled via Praat scripts developed for the purpose of locating speech
pauses in English and Dutch acoustic recordings (De Jong et al., 2021). Automatic pause
segmentation subsequently underwent manual correction. Different possible pause categories
that are recognized in the literature are not discussed here.
[Footnote: Any intervals of acoustic silence occurring within the bounds of a chunk are conceptually discounted, following early work suggesting that silences shorter than 250 milliseconds represent so-called articulatory “gaps” (Goldman-Eisler, 1958). Though different durations to minimally characterize a pause have been proposed, the definition employed here is used in research and clinical settings; for example, counts and locations of pauses are used as a metric for fluency in an L2 (De Jong et al., 2015). Note that both filled pauses (um, uh) and unfilled pauses are included in the broader pause category by the researchers who developed the Praat scripts used in the present study. However, for the purposes of this study, filled pauses are like true words in that their production precludes simultaneous production of vocal tics. Therefore, filled pauses are considered words in the present analysis, forming part of speech-only or mixed chunks. The question of whether filled pauses um/uh are word-like in terms of their planning and/or communicative function (e.g., Clark & Fox Tree, 2002; Rose, 1998) is a separate issue, and no position is taken with respect to that here.]
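As a minimal stand-in for the chunking step (not the De Jong et al. Praat script), the grouping logic can be sketched as follows, assuming vocalization onsets and offsets have already been extracted as (start, end) times in seconds:

```python
MIN_PAUSE = 0.250  # silences of at least 250 ms count as pauses

def chunk_intervals(vocalizations, min_pause=MIN_PAUSE):
    """Group time-sorted (start, end) vocalization intervals into
    utterance 'chunks': runs of intervals separated only by silences
    shorter than min_pause."""
    chunks, current, prev_end = [], [], None
    for start, end in vocalizations:
        if prev_end is not None and start - prev_end >= min_pause:
            chunks.append(current)  # a true pause: close the chunk
            current = []
        current.append((start, end))
        prev_end = end
    if current:
        chunks.append(current)
    return chunks

# Hypothetical intervals: a 100 ms gap stays chunk-internal,
# while a 400 ms gap starts a new chunk
ivs = [(0.0, 0.8), (0.9, 1.5), (1.9, 2.6)]
print(len(chunk_intervals(ivs)))  # → 2
```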
Chunks are classified into one of three types defined in terms of the vocalizations that they
contain. Tic-only chunks are comprised entirely of vocal tics, speech-only chunks contain only
true words, and mixed chunks are composed of some combination of vocal tics and true words.
Tics in mixed chunks are speech-proximal, that is, they are less than 250ms from a true-word on
either side.
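Given interval labels from the Vocalization Type tier, the three-way chunk classification can be expressed as a small function; the 'TIC'/'WORD' label strings here are assumptions made for illustration, not the actual tier labels:

```python
def classify_chunk(labels):
    """Classify a pause-free chunk from its interval labels:
    tic-only (all tics), speech-only (all words), or mixed."""
    kinds = set(labels)
    if kinds == {"TIC"}:
        return "tic-only"
    if kinds == {"WORD"}:
        return "speech-only"
    return "mixed"

print(classify_chunk(["WORD", "TIC", "WORD"]))  # → mixed
```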
Chunks have an unpredictable number (and size) of intonational phrases, so IP edges must be
located empirically. This task was operationalized in terms of identifying chunk-internal words
that hosted phrase-final boundary tones. These words serve as proxies for the phrase-final
boundary itself.
IP-final words in British and American English often host tones (pitch events) that mark the
phrase’s final boundary, commonly known as IP-final boundary tones. It is assumed here that
British English speakers may produce three kinds of phrase-final boundary tones: final falls,
continuation rises, and tonal plateaus. The three-tone boundary system arises in the context of the
Intonational Variation in English (IViE) corpus designed to document and characterize
intonation across varieties of British English; researchers modified existing ToBI (mono-
dialectal) conventions for English to enable analysis of prosodic features across a variety of
Englishes (Grabe, 2000). While IP-final falls and continuation rises are well-established
constructs in the language and speech sciences, it is worth explaining the tonal plateau boundary
tone category further.
Phrase-final tonal plateaus in British English varieties occur variably in
the place of more “canonical” continuation rises and are described as truncated continuation rises
(Grabe et al., 2000).
[Footnote: The debate about whether or not English tonal plateaus finalize an intermediate phrase as opposed to a “full-fledged” IP is immaterial to the discussion here because, whatever the case may be, the prosodic unit in question meets the necessary criteria of simultaneously transmitting a complete message and being of relatively short duration, thus allowing for frequent ticking around it.]
Visually, these tonal events appear as a continuation of the phrase’s pitch
accent till the phrase-final edge. Figure 36 below presents the three-category IP-boundary tone
system for varieties of British English. Phrase-final boundary tones were reliably produced by all
participants in this study. The Praat analysis windows displaying acoustic data were always sized
to depict 6.5-7 seconds of material before identification and labelling of boundary tones was
performed so as to maintain consistency in visual inspection.
Figure 36. Spectrogram and transcript of utterances demonstrating phrase-final boundary tone
categories for varieties of British English in the IViE transcription system that were used to delimit
intonational phrases inside chunks containing speech. Pitch track overlaid in cyan. Blue shading over
verbal tic. Green shading over words hosting a boundary tone. Top panel: continuation rise on word light
and final fall on word colors. Bottom panel: two instances of tonal plateaus on children and happy.
Phrase-initial edges were not coded explicitly. However, the presence and location of
phrase-initial boundaries in a chunk can be inferred on the basis of phrase-final boundaries.
Specifically, any true word that immediately follows an IP-final word inside a chunk is
necessarily an IP-initial word; the very first word in a chunk is also necessarily IP-initial.
4.2.3. Analysis
Speech-proximal tics were manually labeled according to their position relative to phrase
boundaries (Figure 37). Tics are Faux-Initial or Faux-Final if they are left-adjacent to a phrase-
initial word or right-adjacent to a phrase-final word, respectively. Recall that in a previous step
recordings are separated into utterance “chunks” containing only tics and/or words (no pauses).
Instances of “biscuit” that were chunk-final are unambiguously Faux-Final. Chunk-initial tics
are unambiguously Faux-Initial. Tics that occurred between two phrases inside a chunk are
ambiguously Faux-Initial or Faux-Final; these were coded as Faux-Final by default. Tics
anywhere else inside a phrase are classified as Faux-Medial. Faux-Medial tics are necessarily
interruptive in this analysis; this point is addressed in the discussion section of the present chapter.
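The coding rules above can be sketched as a function. The input representation, a time-ordered list of (kind, is_ip_final) pairs per chunk, is an assumption for illustration; note that under these rules a chunk-internal tic can only be Faux-Initial when it is chunk-initial, since any other position adjacent to a phrase-initial word is the ambiguous case coded Faux-Final by default.

```python
def code_faux_prosody(items):
    """Code each tic in a pause-free chunk by Faux-Prosodic position.
    items: time-ordered ('TIC', None) or ('WORD', is_ip_final) pairs,
    where is_ip_final marks words hosting a phrase-final boundary tone.
    Chunk-final tics and tics right-adjacent to an IP-final word
    (including the ambiguous between-phrase case) are Faux-Final;
    chunk-initial tics are Faux-Initial; the rest are Faux-Medial."""
    codes = []
    for i, (kind, _) in enumerate(items):
        if kind != "TIC":
            continue
        prev_item = items[i - 1] if i > 0 else None
        after_final = (prev_item is not None
                       and prev_item[0] == "WORD" and prev_item[1])
        if i == len(items) - 1 or after_final:
            codes.append("Faux-Final")
        elif i == 0:
            codes.append("Faux-Initial")
        else:
            codes.append("Faux-Medial")
    return codes

# Hypothetical chunk: a tic between an IP-final word and the next phrase
chunk = [("WORD", False), ("WORD", True), ("TIC", None), ("WORD", False)]
print(code_faux_prosody(chunk))  # → ['Faux-Final']
```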
“Biscuit” tic duration was compared across Faux-Prosodic groups with a one-way ANOVA
implemented in the R coding environment (R Core Team, 2021). Original scripts used the rstatix
package (Kassambara, 2020) for analysis.
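The analysis itself was run in R with rstatix; purely as a language-agnostic sketch of what the test computes, the one-way F statistic can be reconstructed from between- and within-group sums of squares. The duration values below are synthetic placeholders, not the “biscuit” data.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic and degrees of freedom for a list of
    samples, from between- and within-group sums of squares."""
    all_vals = [v for g in groups for v in g]
    n, k = len(all_vals), len(groups)
    grand = sum(all_vals) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), (df_b, df_w)

# Hypothetical tic durations (ms) for the three Faux-Prosodic groups
faux_initial = [410, 395, 420, 405]
faux_medial = [400, 415, 398, 407]
faux_final = [408, 402, 412, 399]
f_stat, dfs = one_way_anova_f([faux_initial, faux_medial, faux_final])
```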
Figure 37. Faux-Final (blue text and shading) and Faux-Medial (red text and shading) instances of
“biscuit”. Spectrogram overlaid with pitch track in cyan. Green shading and text indicates phrase-final
boundaries.
4.3. Results
A one-way ANOVA compared the effect of Faux Prosody on “biscuit” tic duration and
found no statistically significant differences between group means (F(2,426) = 0.267, p = .766).
Figure 38 shows that acoustic “biscuit” duration is not lengthened at a boundary (Faux-Initial
and Faux-Final) relative to “biscuit” durations that are one or more true-words away from a
boundary (Faux-Medial).
Table 32. Counts, average duration in milliseconds, and standard deviation of average for “biscuit” tics
across Faux-Prosodic positions.
Figure 38. Acoustic duration of “biscuit” tics across three Faux-prosodic positions. One-way
ANOVA found no significant differences between groups.
4.4. Discussion
As predicted, acoustic duration of the tic “biscuit” does not vary systematically as a function
of Faux-Prosodic context. Tics adjacent to boundaries (Faux-Initial and Faux-Final) are not
lengthened relative to Faux-Medial “biscuit” tics. These results are consistent with the notion
that phrasal prosody doesn’t modulate the expression of verbal tic constrictions. Under the
assumption that instances of true-word “biscuit” are lengthened phrase-finally by this speaker,
these findings suggest that verbal tics are not prosodified.
In theory, instances of phrase production by typical adult talkers are the outcome of skilled
mental processes working in concert to achieve transmission of a message with a particular
intended meaning. Thought of this way, prosodic grouping processes/tasks actively exclude
elements that don’t form part of intended messages or that detract from them. To visualize
what is happening, one can conjecture an expanded gestural model in which coordination
between vocal tic tasks and phrase juncture tasks is captured by modeling tic actions as elements
that can be triggered by phrase boundaries. (This is akin to accounts of pause postures at
boundaries; e.g., Katsika, Krivokapić, Mooshammer, Tiede, & Goldstein, 2014.) Under the
assumption that tic vocal tract action has stable spatial, temporal, and timing gestural properties
(i.e., that tics can be modeled in the same terms as phonological gestures), a tic’s gestural
molecule could be triggered or initiated once a π-gesture reaches a certain level of deactivation,
i.e., when the slowing derived from the boundary is no longer active. In this speculative account,
the clock-slowing force exerted by π-gestures cannot then influence the individual constrictions
that underlie production of a verbal tic.
Figure 39. Conjectured triggering of tic vocalization by decreasing activation of π-gesture that
embodies the phrase boundary.
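The conjecture in Figure 39 can be made concrete with a toy simulation. Everything here is invented for illustration: the activation curve, its time constants, and the trigger threshold are not fitted to any data; the sketch shows only how a tic could be timed to launch once the π-gesture's clock-slowing influence has decayed.

```python
import math

def pi_gesture_activation(t, t_boundary=0.0, rise=0.05, fall=0.15):
    """Toy activation curve for a pi-gesture centered on a boundary:
    exponential rise before the boundary, exponential decay after."""
    if t <= t_boundary:
        return math.exp((t - t_boundary) / rise)
    return math.exp(-(t - t_boundary) / fall)

def tic_trigger_time(threshold=0.2, dt=0.001):
    """First time after the boundary at which activation drops below
    threshold; on the conjectured account, a tic's gestural molecule
    could be initiated here, outside the pi-gesture's purview."""
    t = 0.0
    while pi_gesture_activation(t) >= threshold:
        t += dt
    return t

t_star = tic_trigger_time()
# with fall = 0.15 s and threshold = 0.2, t_star is roughly
# 0.15 * ln(5), i.e. about 0.24 s after the boundary
```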
“Biscuit” tic duration may not be modulated by prosodic gestures in the way that true words
are, but their proximity to speech does impact their duration in that speech-proximal “biscuit”
tics are significantly longer than distal “biscuit” tics. The left panel of Figure 40 below presents
speech-proximal “biscuit” duration (repeated from Figure 38 above) alongside distal “biscuit”
duration for reference. Distal “biscuit” duration is shorter than speech-proximal “biscuit”
duration. A one-way ANOVA including the Distal group yielded significant variation among
conditions, F(3,459) = 4.97, p = .002. A post hoc Tukey test showed that Distal “biscuit”
duration differed significantly from Faux-Initial (p = .013), Faux-Medial (p = .001), and
Faux-Final (p = .001) “biscuit” duration. Given that speech-proximal instances of “biscuit” are of
significantly longer duration than distal instances, comparable duration across Faux-Prosodic
categories is best interpreted as evidence that speech-proximal tics aren’t susceptible to
modulation in a speech-like way, even though proximity to speech does affect their duration.
One way to interpret systematic differences in verbal tic duration as a function of proximity
to speech is to factor in possible tic-system constraints. Specifically, relatively shortened distal
tics may point to the possibility of a general preference for fast ticking. As verbal tics are not
intended to convey meaning, there is little to no reason to speak slowly, clearly, or audibly; there
is no interlocutor to consider. As such, the tic system could have an overall tendency to satisfy
tic urges as quickly as possible without a concern for correctness.
The discussion in the second
chapter of this dissertation noted that clinical observations of tic movements and actions have
tended to highlight their rapid and abrupt nature; it is reasonable that the demands of efficient
tic-urge satisfaction impose durational requirements. Having said that, between speech-proximal
and distal instances of ticcing, distal tics are expected to reflect tic-system-internal
constraints more faithfully. A different possibility is that verbal tics have no “preferred”
durational properties and that the relatively shorter duration of distal tics reflects optimization
with respect to a preferred pause duration for monadic speech (in the presence of an unfamiliar
listener). Assume for a moment that pause postures are the active pausing vocal task. If there is a
dis-preference for lengthy pauses, then pause postures could be timed to be short. The processes
ensuring that pause postures are not unduly lengthened could be influencing distal tics the same
way. These possibilities are not mutually exclusive.
[Footnote: It is also possible, in contrast, that tics must be executed in just the right manner in order to be effective at reducing the sensations of discomfort they are meant to alleviate.]
A set of findings reported in the preceding chapter that speaks to the issue of tic-system
constraints is revisited here. That study found that average f0 is significantly higher for
speech-proximal verbal tic stressed vowels relative to true-speech stressed vowels for all four tickers that
participated. However, distal and speech-proximal vocal and verbal tics were all pooled into a
single tic group. To discover if the differences in duration between speech-proximal and distal
tics are also mirrored in the f0 domain, a one-way ANOVA compared “biscuit” stressed vowel f0
across the same groups and also found significant differences [F(3,459) = 11.8, p < .001]. Post-
hoc Tukey test results show that Distal “biscuit” stressed vowel f0 is significantly higher than
Faux-Initial, Faux-Medial, and Faux-Final “biscuit,” all at p < .001.
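The omnibus comparison and post-hoc tests reported above have the following shape. The dissertation's analyses were run in R; the sketch below is a Python analogue using simulated f0 values (the group means and spreads are invented for illustration and are not the study's data):

```python
# Illustrative sketch of the f0 comparison: one-way ANOVA across tic groups,
# then Tukey HSD post-hoc tests. The f0 values are simulated, NOT the
# dissertation's data; only the group labels follow the text.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = {
    "Distal":       rng.normal(260, 15, 40),  # hypothetically higher f0 (Hz)
    "Faux-Initial": rng.normal(222, 15, 40),
    "Faux-Medial":  rng.normal(225, 15, 40),
    "Faux-Final":   rng.normal(220, 15, 40),
}

# Omnibus test: does mean stressed-vowel f0 differ across the four groups?
F, p = stats.f_oneway(*groups.values())
df_error = sum(len(g) for g in groups.values()) - len(groups)
print(f"F(3, {df_error}) = {F:.1f}, p = {p:.2g}")

# Post-hoc pairwise comparisons (scipy >= 1.11 provides tukey_hsd)
tukey = stats.tukey_hsd(*groups.values())
labels = list(groups)
for i in range(1, len(labels)):  # compare Distal (index 0) to each Faux group
    print(f"Distal vs {labels[i]}: p = {tukey.pvalue[0, i]:.2g}")
```

With well-separated group means, the omnibus F is large and each Distal-versus-Faux comparison is significant, mirroring the pattern of results described in the text.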
Figure 40. “Biscuit” tic duration (left) and stressed vowel f0 (right) by group including distal tics.
The general finding regarding phonatory acoustics in co-speech ticcing is that talkers who tic
speak at a significantly lower average voice pitch than the pitch at which they tic; Figure 40 adds
the nuance that distal tics are pulling the tic-system average up. Speech-proximal tic f0 falling
between distal tic and speech averages provides evidence of cross-system “blending” of
parameters.
4.5. Conclusion
Lack of evidence for speech-like patterns of durational variability due to prosodic context
supports the notion that tics do not form part of (phrasal) prosody—they are “un-prosodified”.
This suggests a useful working model for what a verbal tic is.
Chapter 5. General Discussion
Systems of action interact when they are co-active—when we talk and walk at the same time,
for example. Physiological systems of action that function to maintain homeostasis (e.g.,
micturition, coughing), are active throughout the day and night though we may rarely have
conscious awareness of them. Systems of action also interact intentionally, as when a talker
performs other activities while talking. It has been observed that, through all of this, humans
manage to “make sense”, that is, to not act in an entirely self-defeating manner (Dale & Kello,
2018). Thus, adaptive cross-modal interaction must occur. Co-speech ticking by adults with
Tourette syndrome demonstrates that action systems with disparate aims and requirements can
interact in a way that is cooperative—that is, where the tasks of both interacting systems are
optimally achieved. It appears that with practice, flexibility on the part of both the tic and the
speech systems can be harnessed to address potential tic intrusions or interference.
Agents believe themselves to be doing some action, and the identity of that action “[…]
exerts a selecting and guiding force on subsequent action” (Vallacher & Wegner, 1987:3).
Cognitive-psychological research in the 1980s in the area of “everyday action planning” (e.g.,
Schwartz, 2006) sought to understand the dynamics of what humans are doing (their observable
behavior) and what humans believe themselves to be doing (the identity of their explicitly
endorsed action) and how these two relate to each other. Act Identity Theory (AIT; Vallacher &
Wegner, 1987, 1989) is motivated by the observation that while humans appear to be doing
many things (i.e., actions) at the same time, all the time, the complex of actions is not a “random
assemblage of unrelated elements” (Vallacher & Wegner, 1987:4). Instead, all of the actions in
play form an action hierarchy that is led by a prepotent task—what agents believe themselves to
be doing.
30
Figure 41. Three consecutive breakfast moments under familiar circumstances. At each moment, the
prepotent task recruits tasks from lower levels. Automation most developed for lower-level tasks.
Thickness of arrows connecting moments represents how much attention is paid to that transition.
Studies carried out by the AIT authors showed that there is systematicity in
the relation between the action an agent endorses and features of the circumstance. For
example, participants were asked to perform common actions under typical circumstances as
well as under novel ones, and were queried about which action entity, out of a list, they endorsed;
their responses showed that moving between relatively higher and lower action entities is systematic. An action
like “drinking coffee” can be considered low-level relative to “having breakfast,” in which case
the former provides specification for the latter. “Take a sip of coffee” could also be the prepotent
task, which is to say that it exploits lower-level skills like “holding the mug to my lips” and
30
The authors couch their theory in traditional dualist terms; for them the action hierarchy is a cognitive
representation they refer to as the action’s “identity structure.” The view here is slightly different, namely,
we understand the action hierarchy that is in place in the current moment to be determined by
affordances/effectivities.
“swallowing” to accomplish its goal (to take a sip of the coffee). In the context of “having
breakfast” in your own home, “drinking coffee” is largely automated. In the context of “having
breakfast” in your new partner’s home, “drinking coffee” will require overt planning. Similarly,
experienced chopstick users endorsed action entities like “having a meal” and “gaining
nourishment” after performing an eating task under typical (for them) circumstances. However,
when they were given an extra small pair of chopsticks, they began to endorse lower-level
entities like “putting food in my mouth.” Inexperienced chopstick users, meanwhile, endorsed
lower-level entities in both conditions like “chewing,” “swallowing,” and “putting food in my
mouth”. Three principles arose from this research (summarized in Vallacher & Wegner, 1987:5-
6). First, action is maintained with respect to its prepotent identity; this point is revisited below.
Second, there is a tendency for action to be maintained at the highest possible level. Researchers
working in the AIT framework attribute this to a preference for “conceptual understanding of
action’s meaning” (Vallacher & Wegner, 1989: 661).
31
Switching from a relatively higher to a
relatively lower prepotent action is an indication that a change in the circumstance has made the
higher-level entity contextually inappropriate for some reason (e.g., sudden change in
environment). Relatedly, the third principle gleaned from this work is that when an action cannot
be maintained at a relatively higher level, a lower-level entity becomes prepotent.
31
Russian neurophysiologist Nikolai Bernstein advanced the same principle but on the basis of a dis-
preference for allotting attention to relatively low-level tasks, specifically tasks whose outcome is overt
movement, such as phonological gestures in speech articulation.
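The three principles above can be given a toy formalization. The sketch below is not part of AIT itself; it is a minimal illustration, with invented task names, of an action hierarchy in which a prepotent task recruits lower-level tasks and in which disruption pushes prepotency down a level (the third principle):

```python
# Toy formalization of an AIT-style action hierarchy. The task names and the
# disruption rule are illustrative inventions, not claims about AIT proper.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    subtasks: list = field(default_factory=list)  # lower-level tasks recruited

def prepotent(task: Task, disrupted: bool = False) -> Task:
    """Principle 2: maintain action at the highest level possible.
    Principle 3: under disruption, a lower-level entity becomes prepotent."""
    if disrupted and task.subtasks:
        # e.g., extra-small chopsticks: drop from "having a meal" down to
        # "putting food in my mouth"
        return task.subtasks[0]
    return task

sip = Task("taking a sip of coffee",
           [Task("holding the mug to my lips"), Task("swallowing")])
breakfast = Task("having breakfast", [sip, Task("reading the news")])

print(prepotent(breakfast).name)                  # familiar circumstance
print(prepotent(breakfast, disrupted=True).name)  # novel circumstance
```

Under familiar circumstances the highest-level entity remains prepotent; a change in circumstance hands prepotency to a lower-level entity, matching the chopstick findings described above.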
Figure 42. Three consecutive breakfast moments under unfamiliar circumstances. At each moment,
the prepotent task recruits tasks from lower levels. Automation most developed for lower-level tasks.
Thickness of arrows connecting moments represents how much attention is paid to that transition.
Linguists will of course be familiar with the concept of hierarchy in behavior; in fact,
linguists are uniquely positioned to consider these topics because centuries of linguistic inquiry
have led to the discovery of theoretically and empirically grounded informational structures in
the speech signal. The phonological hierarchy, which represents structural levels including
gestures, syllables, feet and phrases, can be conceptualized as an action hierarchy as well.
The tickers who participated in the experiments reported here were instructed to perform
spontaneous, narrative, and read speech while ticking freely, a set of circumstances
manufactured to engender construal of an over-arching co-speech ticking task. In theory, this
task can now exert control over ticking and speaking; the experiments in this dissertation
examined co-speech ticking data with an eye toward empirically validating three proposed
measurable indices of optimally coordinated co-speech ticking.
Control over movement and action demands participation of both a prepotent action entity—
the leading task that is evaluated for correctness and completion—and task-specific background
corrections assigned by that entity. But which action is prepotent can change from moment to
moment (Latash et al., 1996; Profeta & Turvey, 2018; Vallacher & Wegner, 1987, 1989). Any
given occurrence of an action will see it occupy a relatively high or low level in the current
action hierarchy. In the case of speech, this hierarchy starts with gestures at the lowest level and
can go up to the highest levels of personal agency. Any leading-level task can achieve dexterity
by developing its suite of task/goal-specific adjustments, led by lower levels, that serve the
leading level so that it may accomplish its intended task goal in a wide variety of contexts
without the need for (conscious or subconscious) attention (Latash et al., 1996:208). Importantly,
in theory, any task at one level can be recruited by a relatively higher-level task to serve the
function of a sensory-based correction, meaning that any task is susceptible to automation in the
right circumstance.
32
Through practice and experience in task performance across a wide variety
of circumstances, these corrections become automatized, which for Bernstein means that their
control has been “pushed down” far enough to be out of reach of cognitive planning (Latash et
al., 1996:192). Automatisms function to free up supervisory systems to evaluate task
performance and the continued likelihood of accomplishing goals in the current (ongoing)
circumstance. Thus, in the long run, it is the development and automation of the suite of
background corrections that imparts skill to task performance. Production of phrases is
automated in many discourse situations, which is the same as saying that it is a highly developed
skill.
These simple observations have important consequences. The drive towards automatization
32
Bernstein asserts that the reverse is also true; in theory, automatisms can undergo de-automation, the
result of which is the (re-)emergence of that action entity as a leading-level task. In adult human behavior
this occurs whenever destructive forces break the skill (e.g., changes to the plant).
is desirable because the more a leading level task can rely on automatisms to function, the better
it can perform, eventually becoming automatized itself and available for recruitment by other
leading level tasks. While it is the case that there is evaluation of performance of lower-level
tasks, this evaluation is fully automatized (e.g., Lombard effect, phrase final lengthening). Thus
it is the “smartness” of background automatisms that is the foundation of any higher-level skill.
5.1 Concluding remarks
This dissertation identified novel signatures of adaptive/cooperative interaction between
Tourette’s vocal ticking and speech in acoustic recordings of adults talking while ticking freely.
When tics occur at close temporal distances to true speech, they are found to reliably respect
prosodic phrase boundaries—tics occurred internal to phrases much less often than is predicted
by chance. This shows that there is optimization with respect to the relative timing of actions
across two disparate domains of action. The finding of little to no token-to-token durational
variability in speech-proximal verbal tics suggests that these tic-words are not subject to phrase
prosody, even though their temporal occurrence is often organized around the outer edges of
phrase boundaries. In other words, ticking is sensitive to prosody but prosody may ignore ticking
as part of its function to structure information (i.e., keep tics out of phrasal structures). Evidence
was also uncovered for tics’ being relegated to a separate acoustic, and perhaps informational,
channel. Whether the signs of falsetto voice register in verbal tic phonation reflect skilled
adaptation, physiological factors related to ticking, individual/specific tic voice targets, or some
combination of all of the above, the outcome is that the intended/dialogic semantics being
generated by a talker can be kept separate from any referential meaning being generated by tics.
These properties of co-speech free-ticcing by adults with Tourette’s are interpreted as evidence
that skilled compensatory strategies have developed.
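The chance baseline invoked above can be sketched as a Monte Carlo permutation test. This is not the dissertation's actual procedure; it is one way such a baseline could be built, with invented phrase intervals and tic times: scatter the observed number of tics uniformly over the recording and ask how often a random placement yields as few phrase-internal tics as observed.

```python
# Sketch of a chance baseline for tic placement (illustrative only; the phrase
# intervals and tic times below are invented, not the dissertation's data).
import random

random.seed(0)

def frac_internal(tic_times, phrases, margin=0.1):
    """Fraction of tics falling inside a phrase, more than `margin` seconds
    away from either boundary (i.e., genuinely phrase-internal)."""
    def internal(t):
        return any(start + margin < t < end - margin for start, end in phrases)
    return sum(internal(t) for t in tic_times) / len(tic_times)

phrases = [(0.0, 2.5), (3.0, 5.8), (6.4, 9.0)]   # hypothetical (start, end) in s
observed_tics = [2.7, 2.9, 5.9, 6.1, 6.2, 9.2]   # hypothetical tic times in s
obs = frac_internal(observed_tics, phrases)

# Chance baseline: place the same number of tics uniformly at random
total_duration = 9.5
sims = [frac_internal([random.uniform(0, total_duration) for _ in observed_tics],
                      phrases)
        for _ in range(10_000)]
p = sum(s <= obs for s in sims) / len(sims)  # how often chance is this low
print(f"observed internal fraction = {obs:.2f}, permutation p = {p:.4f}")
```

In this toy example all tics sit at or beyond phrase edges, so the observed phrase-internal fraction is far below what uniform placement produces, which is the logic behind the "less often than chance" claim.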
When comparing tics close to speech to those far from speech, it becomes clear that the goals
of tic actions and articulatory goals may be blending, leading to a sort of “undershoot” of the tic
goals. On their own, tics tend to be higher in pitch and very short; in the temporal vicinity of
speech, they tend to be relatively lower in pitch (though still significantly higher than surrounding
speech) and longer (relative to distal tics). There is also some evidence that the speech system
re-orients itself to anticipated ticking. For example, the three tickers who experienced the fewest
phrase-level interruptions had consistently shorter intonational phrases relative to the ticker who
experienced frequent interruptions of all kinds. Shorter intonational phrases lead to an increase in
the number of available positions for tic occurrence that are cooperative. It is possible that
experienced tickers learn adaptive phrasing strategies like formulating messages in fewer
words or breaking up intonational phrases into intermediate phrases. In short, ticking and
speaking on their own are different than ticking and speaking simultaneously. This is expected if
control is defined in circumstance.
References
Alipour, F., Finnegan, E. M., & Scherer, R. C. (2009). Aerodynamic and acoustic effects of
abrupt frequency changes in excised larynges. Journal of Speech, Language, and Hearing
Research, 52(2), 465–481. https://doi.org/10.1044/1092-4388(2008/07-0212)
Aubanel, V., Cooke, M., Villegas, J., & Lecumberri, M. L. G. (2011). Conversing in the
presence of a competing conversation: Effects on speech production. Proceedings of the
Annual Conference of the International Speech Communication Association,
INTERSPEECH, (August), 2833–2836.
Berry, D. A., Herzel, H., Titze, I. R., & Story, B. H. (1996). Bifurcations in Excised Larynx
Experiments. Journal of Voice, 10(2), 129–138.
Boersma, P. (2013). Acoustic analysis. In R. J. Podesva & D. Sharma (Eds.), Research methods
in linguistics (pp. 1–10). Cambridge University Press.
Boersma, P., & Van Heuven, V. (2001). Speak and unSpeak with PRAAT. Glot International,
5(9), 341–347.
Bořil, T., & Skarnitzl, R. (2016). Tools rPraat and mPraat. In P. Sojka, A. Horák, I. Kopeček, &
K. Pala (Eds.), Text, Speech, and Dialogue (pp. 367–374). Springer International
Publishing.
Brabson, L. A., Brown, J. L., Capriotti, M. R., Ramanujam, K., Himle, M. B., Nicotra, C. M., …
Specht, M. W. (2016). Patterned changes in urge ratings with tic suppression in youth with
chronic tic disorders. Journal of Behavior Therapy and Experimental Psychiatry, 50, 162–
170. https://doi.org/10.1016/j.jbtep.2015.07.004
Brandt, V. C., Niessen, E., Ganos, C., Kahl, U., Bäumer, T., & Münchau, A. (2014). Altered
synaptic plasticity in Tourette’s syndrome and its relationship to motor skill learning. PloS
One, 9(5). https://doi.org/10.1371/journal.pone.0098417
Bronfeld, M., Yael, D., Belelovsky, K., & Bar-Gad, I. (2013). Motor tics evoked by striatal
disinhibition in the rat. Frontiers in Systems Neuroscience, 7(September), 50.
https://doi.org/10.3389/fnsys.2013.00050
Brooks, S. M. (2011). Perspective on the human cough reflex. Cough, 7(10).
https://doi.org/10.1186/1745-9974-7-10
Browman, C. P., & Goldstein, L. (1989). Articulatory gestures as phonological units. Phonology,
6(2). https://doi.org/10.1017/S0952675700001019
Browman, C. P., & Goldstein, L. (1992). Articulatory Phonology: An Overview. Haskins
Laboratories Status Report on Speech Research.
Browman, C. P., & Goldstein, L. M. (1990). Tiers in articulatory phonology, with some
implications for casual speech. In J. Kingston & M. E. Beckman (Eds.), Papers in
Laboratory Phonology I: Between the grammar and the physics of speech. Cambridge, UK:
Cambridge University Press.
Byrd, D., & Krivokapić, J. (2021). Cracking Prosody in Articulatory Phonology. Annual Review
of Linguistics, 7, 31–52. https://doi.org/10.1146/annurev-linguistics-030920
Byrd, D., & Saltzman, E. (1998). Intragestural dynamics of multiple prosodic boundaries.
Journal of Phonetics, 26(2), 173–199. https://doi.org/10.1006/jpho.1998.0071
Byrd, D., & Saltzman, E. (2003). The elastic phrase: modeling the dynamics of boundary-
adjacent lengthening. Journal of Phonetics, 31(2), 149–180. https://doi.org/10.1016/S0095-
4470(02)00085-2
Cavanna, A. E., Black, K. J., Hallett, M., & Voon, V. (2017). Neurobiology of the Premonitory
Urge in Tourette’s Syndrome: Pathophysiology and Treatment Implications. J
Neuropsychiatry Clin Neurosci, 29(2), 95–104.
https://doi.org/10.1176/appi.neuropsych.16070141
Clark, H. H., & Fox Tree, J. E. (2002). Using uh and um in spontaneous speaking. Cognition, 84,
73–111. Retrieved from www.elsevier.com/locate/cognit
Cohen, S. C., Leckman, J. F., & Bloch, M. H. (2013). Clinical assessment of Tourette syndrome
and tic disorders. Neuroscience and Biobehavioral Reviews, 37, 997–1007.
https://doi.org/10.1016/j.neubiorev.2012.11.013
Colton, R. H. (1972). Spectral characteristics of the modal and falsetto registers. Folia
Phoniatrica, 24, 337–344.
Colton, R. H. (1973). Vocal intensity in the modal and falsetto registers. Folia Phoniatrica, 25,
62–70.
Dale, R., & Kello, C. T. (2018). “How do humans make sense?” multiscale dynamics and
emergent meaning. New Ideas in Psychology, 50, 61–72.
https://doi.org/10.1016/j.newideapsych.2017.09.002
De Jong, N. H., Groenhout, R., Schoonen, R., & Hulstijn, J. H. (2015). Second language fluency:
Speaking style or proficiency? Correcting measures of second language fluency for first
language behavior. Applied Psycholinguistics, 36(2), 223–243.
https://doi.org/10.1017/S0142716413000210
De Jong, N. H., Pacilly, J., & Heeren, W. (2021). PRAAT scripts to measure speed fluency and
breakdown fluency in speech automatically. Assessment in Education: Principles, Policy &
Practice, 28(4), 456–476. https://doi.org/10.1080/0969594X.2021.1951162
De Krom, G. (1993). A cepstrum-based technique for determining a harmonics-to-noise ratio in
speech signals. Journal of Speech and Hearing Research, 36(2), 254–266.
https://doi.org/10.1044/JSHR.3602.254
De Nil, L. F., Sasisekaran, J., Van Lieshout, P. H. H. M., & Sandor, P. (2005). Speech
disfluencies in individuals with Tourette syndrome. Journal of Psychosomatic Research,
58(1), 97–102. https://doi.org/10.1016/J.JPSYCHORES.2004.06.002
Debruyne, F., & Buekers, R. (1998). Interdependency between intensity and pitch in the normal
speaking voice. Acta Otorhinolaryngol Belg., 52(3), 201–205.
Deguchi, S. (2011). Mechanism of and threshold biomechanical conditions for falsetto voice
onset. PLoS ONE, 6(3). https://doi.org/10.1371/journal.pone.0017503
Eapen, V., & Črnčec, R. (2009). Tourette syndrome in children and adolescents: Special
considerations. Journal of Psychosomatic Research, 67(6), 525–532.
https://doi.org/10.1016/j.jpsychores.2009.08.003
Eckert, P., & Labov, W. (2017). Phonetics, phonology and social meaning. Journal of
Sociolinguistics, 21(4). https://doi.org/10.1111/josl.12244
Farooqui, A. A., & Manly, T. (2018). We do as we construe: extended behavior construed as one
task is executed as one cognitive entity. Psychological Research, 1–20.
https://doi.org/10.1007/s00426-018-1051-2
Fougeron, C., & Keating, P. A. (1997). Articulatory strengthening at edges of prosodic domains.
Journal of the Acoustical Society of America, 101(6).
Friendly, M. (2002). Corrgrams: Exploratory displays for correlation matrices. The American
Statistician, 56, 316–324.
Garson, J. (2016). A Critical Overview of Biological Functions. Springer International
Publishing AG Switzerland. Retrieved from http://www.springer.com/series/13349
Garson, J. (2019). What Biological Functions Are and Why They Matter. Cambridge, UK:
Cambridge University Press.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
https://doi.org/10.4324/9781315740218
Godar, S. C., & Bortolato, M. (2017, May 1). What makes you tic? Translational approaches to
study the role of stress and contextual triggers in Tourette syndrome. Neuroscience and
Biobehavioral Reviews. Elsevier Ltd. https://doi.org/10.1016/j.neubiorev.2016.10.003
Grabe, E. (2000). Intonational variation in urban dialects of English spoken in the British Isles.
Journal of Phonetics, 28(2), 161–185.
Grabe, E., Post, B., Nolan, F., & Farrar, K. (2000). Pitch accent realization in four varieties of
British English. Journal of Phonetics, 28(2), 161–185.
https://doi.org/10.1006/JPHO.2000.0111
Graybiel, A. M. (2008). Habits, rituals, and the evaluative brain. Annual Review of Neuroscience,
31, 359–387. https://doi.org/10.1146/annurev.neuro.29.051605.112851
Gupta, N., & Aron, A. R. (2011). Urges for food and money spill over into motor system
excitability before action is taken. European Journal of Neuroscience, 33(1), 183–188.
https://doi.org/10.1111/j.1460-9568.2010.07510.x
Hartmann, A., & Worbe, Y. (2018, August 1). Tourette syndrome: Clinical spectrum,
mechanisms and personalized treatments. Current Opinion in Neurology. Lippincott
Williams and Wilkins. https://doi.org/10.1097/WCO.0000000000000575
Hashemiyoon, R., Kuhn, J., & Visser-Vandewalle, V. (2017). Putting the Pieces Together in
Gilles de la Tourette Syndrome: Exploring the Link Between Clinical Observations and the
Biological Basis of Dysfunction. Brain Topography, 30(1), 3–29.
https://doi.org/10.1007/s10548-016-0525-z
Hawks, J. W., & Miller, J. D. (1998). A formant bandwidth estimation procedure for vowel
synthesis. The Journal of the Acoustical Society of America, 97(2), 1343.
https://doi.org/10.1121/1.412986
Hollien, H. (2014). Vocal Fold Dynamics for Frequency Change. Journal of Voice, 28(4), 395–
405. https://doi.org/10.1016/j.jvoice.2013.12.005
Hotchkin, C., & Parks, S. (2013). The Lombard effect and other noise-induced vocal
modifications: insight from mammalian communication systems. Biol. Rev, 88, 809–824.
https://doi.org/10.1111/brv.12026
Iseli, M., & Alwan, A. (2004). An improved correction formula for the estimation of harmonic
magnitudes and its application to open quotient estimation. In Proceedings of International
Congress of Phonetic Sciences, Vol. 1 (pp. 10–13).
Israelashvili, M., & Bar-Gad, I. (2015). Corticostriatal Divergent Function in Determining the
Temporal and Spatial Properties of Motor Tics. Journal of Neuroscience.
https://doi.org/10.1523/JNEUROSCI.2770-15.2015
Jankovic, J. (1997). Phenomenology and Classification of Tics. Neurologic Clinics, 15(2), 267–
275. https://doi.org/10.1016/S0733-8619(05)70311-X
Jankovic, J., & Kurlan, R. (2011, May). Tourette syndrome: Evolving concepts. Movement
Disorders. https://doi.org/10.1002/mds.23618
Kassambara, A. (2020). rstatix: Pipe-Friendly Framework for Basic Statistical Tests. Retrieved
from https://cran.r-project.org/package=rstatix
Katsika, A., Krivokapić, J., Mooshammer, C., Tiede, M., & Goldstein, L. (2014). The
coordination of boundary tones and its interaction with prominence. Journal of Phonetics,
44(1), 62–82. https://doi.org/10.1016/j.wocn.2014.03.003
Keating, P. (2014). Acoustic measures of falsetto voice [poster presentation]. In 167th Meeting
of the Acoustical Society of America. Providence, RI, United States. Retrieved from
https://acousticalsociety.org/program-167th-meeting-acoustical-society-america/
Lange, F., Seer, C., Müller-Vahl, K., & Kopp, B. (2017). Cognitive flexibility and its
electrophysiological correlates in Gilles de la Tourette syndrome. Developmental Cognitive
Neuroscience, 27, 78–90. https://doi.org/10.1016/j.dcn.2017.08.008
Latash, M. L., Turvey, M. T., & Bernstein, N. A. (1996). Dexterity and its development.
Mahwah, NJ: L. Erlbaum Associates.
Leckman, J. F., de Lotbinière, A. J., Marek, K., Gracco, C., Scahill, L., & Cohen, D. J. (1993).
Severe disturbances in speech, swallowing, and gait following stereotactic infrathalamic
lesions in Gilles de la Tourette’s syndrome. Neurology, 43(5), 890–894.
https://doi.org/10.1212/wnl.43.5.890
Leckman, James F., Vaccarino, F. M., Kalanithi, P. S. A., & Rothenberger, A. (2006).
Annotation: Tourette syndrome: a relentless drumbeat - driven by misguided brain
oscillations. Journal of Child Psychology and Psychiatry, 47(6), 537–550.
https://doi.org/10.1111/j.1469-7610.2006.01620.x
Lee, Y., Oya, M., Kaburagi, T., Hidaka, S., & Nakagawa, T. (2021). Differences Among Mixed,
Chest, and Falsetto Registers: A Multiparametric Study. Journal of Voice.
https://doi.org/10.1016/j.jvoice.2020.12.028
Martino, D., Ganos, C., & Pringsheim, T. M. (2017). Tourette Syndrome and Chronic Tic
Disorders: The Clinical Spectrum Beyond Tics. In International Review of Neurobiology
(Vol. 134, pp. 1461–1490). Elsevier. https://doi.org/10.1016/bs.irn.2017.05.006
McCairn, K. W., Bronfeld, M., Belelovsky, K., & Bar-Gad, I. (2009a). The neurophysiological
correlates of motor tics following focal striatal disinhibition. Brain, 132(8), 2125–2138.
https://doi.org/10.1093/brain/awp142
McCairn, K. W., Bronfeld, M., Belelovsky, K., & Bar-Gad, I. (2009b). The neurophysiological
correlates of motor tics following focal striatal disinhibition. Brain, 132(8), 2125–2138.
https://doi.org/10.1093/brain/awp142
Müller-Vahl, K. R., Sambrani, T., & Jakubovski, E. (2019). Tic disorders revisited:
introduction of the term “tic spectrum disorders.” European Child and Adolescent
Psychiatry, 28, 1129–1135. https://doi.org/10.1007/s00787-018-01272-7
Mulligan, H. F., Anderson, T. J., Jones, R. D., Williams, M. J., & Donaldson, I. M. (2003). Tics
and developmental stuttering. Parkinsonism & Related Disorders, 9(5), 281–289.
https://doi.org/10.1016/S1353-8020(03)00002-6
Murdoch, D. J., & Chow, E. D. (1996). A graphical display of large correlation matrices. The
American Statistician, 50, 178–180.
Neiman, M., Robb, M. P., Lerman, J., & Duffy, R. (1997). Acoustic examination of naturalistic
modal and falsetto voice registers. Logopedics Phoniatrics Vocology, 22(3), 135–138.
https://doi.org/10.3109/14015439709075325
Niccolai, V., Korczok, S., Finis, J., Jonas, M., Thomalla, G., Siebner, H. R., … Biermann-Ruben,
K. (2019). A peek into premonitory urges in Tourette syndrome: Temporal evolution of
neurophysiological oscillatory signatures. Parkinsonism and Related Disorders, 65, 153–
158. https://doi.org/10.1016/j.parkreldis.2019.05.039
Peterson, B. S., & Leckman, J. F. (1998). The temporal dynamics of tics in Gilles de la Tourette
syndrome. Biological Psychiatry, 44(12), 1337–1348. https://doi.org/10.1016/S0006-
3223(98)00176-0
Pick, H. L., Siegel, G. M., Fox, P. W., Garber, S. R., & Kearney, J. K. (1989). Inhibiting the
Lombard effect. Journal of the Acoustical Society of America, 85, 894.
https://doi.org/10.1121/1.397561
Podesva, R. J. (2007). Phonation type as a stylistic variable: The use of falsetto in constructing a
persona. Journal of Sociolinguistics, 11(4), 478–504.
Pogorelov, V., Xu, M., Smith, H. R., Buchanan, G. F., & Pittenger, C. (2015). Corticostriatal
interactions in the generation of tic-like behaviors after local striatal disinhibition.
Experimental Neurology, 265, 122–128. https://doi.org/10.1016/j.expneurol.2015.01.001
Profeta, V. L. S., & Turvey, M. T. (2018). Bernstein’s levels of movement construction: A
contemporary perspective. Human Movement Science, 57, 111–133.
https://doi.org/10.1016/j.humov.2017.11.013
Reese, H. E., Scahill, L., Peterson, A. L., Crowe, K., Woods, D. W., Piacentini, J., … Wilhelm,
S. (2014). The premonitory urge to tic: Measurement, characteristics, and correlates in older
adolescents and adults. Behavior Therapy, 45(2), 177–186.
https://doi.org/10.1016/j.beth.2013.09.002
Robertson, M. M. (2008a). The prevalence and epidemiology of Gilles de la Tourette syndrome.
Part 2: Tentative explanations for differing prevalence figures in GTS, including the
possible effects of psychopathology, aetiology, cultural differences, and differing
phenotypes. Journal of Psychosomatic Research, 65(5), 473–486.
https://doi.org/10.1016/j.jpsychores.2008.03.007
Robertson, M. M. (2008b, November 1). The prevalence and epidemiology of Gilles de la
Tourette syndrome. Part 1: The epidemiological and prevalence studies. Journal of
Psychosomatic Research. Elsevier. https://doi.org/10.1016/j.jpsychores.2008.03.006
Robinson, S., & Hedderly, T. (2016). Novel psychological formulation and treatment of “tic
attacks” in tourette syndrome. Frontiers in Pediatrics, 4(MAY).
https://doi.org/10.3389/fped.2016.00046
Rose, R. L. (1998). The communicative value of filled pauses in spontaneous speech. University
of Birmingham.
Saltzman, E., & Kelso, J. A. (1987). Skilled actions: a task-dynamic approach. Psychological
Review, 94, 84–106. https://doi.org/10.1037/0033-295X.94.1.84
Sambrani, T., Jakubovski, E., & Müller-Vahl, K. R. (2016). New insights into clinical
characteristics of Gilles de la Tourette syndrome: Findings in 1032 patients from a single
German center. Frontiers in Neuroscience, 10. https://doi.org/10.3389/fnins.2016.00415
Scahill, L. D., Simmons, E. S., & Volkmar, F. R. (2013). Yale Global Tic Severity Scale. In
Encyclopedia of Autism Spectrum Disorders (pp. 3415–3415). Springer New York.
https://doi.org/10.1007/978-1-4419-1698-3_1279
Schüller, T., Gruendler, T. O. J., Huster, R., Baldermann, J. C., Huys, D., Ullsperger, M., &
Kuhn, J. (2018). Altered electrophysiological correlates of motor inhibition and
performance monitoring in Tourette’s syndrome. Clinical Neurophysiology, 129(9).
https://doi.org/10.1016/j.clinph.2018.06.002
Schwartz, M. F. (2006, February). The cognitive neuropsychology of everyday action and
planning. Cognitive Neuropsychology. https://doi.org/10.1080/02643290500202623
Shaw, R. (2001). Processes, Acts, and Experiences: Three Stances on the Problem of
Intentionality. Ecological Psychology, 13(4), 275–314.
https://doi.org/10.1207/S15326969ECO1304_02
Shriberg, E. E. (1994). Preliminaries to a theory of speech disfluencies. University of California
at Berkeley. Retrieved from ftp://130.107.33.205/pub/papers/shriberg-thesis.pdf
Shue, Y., Keating, P., & Vicenik, C. (2009). VOICESAUCE: A program for voice analysis. The
Journal of the Acoustical Society of America, 126(4), 2221.
https://doi.org/10.1121/1.3248865
Specht, M. W., Woods, D. W., Nicotra, C. M., Kelly, L. M., Ricketts, E. J., Conelea, C. A., …
Walkup, J. T. (2013). Effects of tic suppression: Ability to suppress, rebound, negative
reinforcement, and habituation to the premonitory urge. Behaviour Research and Therapy,
51(1), 24–30. https://doi.org/10.1016/j.brat.2012.09.009
Spencer, M. L., & Titze, I. R. (2001). An investigation of a modal-falsetto register transition
hypothesis using helox gas. Journal of Voice, 15(1), 15–24.
https://doi.org/10.1016/S0892-1997(01)00003-0
Stathopoulos, E. T., Huber, J. E., Richardson, K., Kamphaus, J., DeCicco, D., Darling, M., …
Sussman, J. E. (2014). Increased vocal intensity due to the Lombard effect in speakers with
Parkinson’s disease: Simultaneous laryngeal and respiratory strategies. Journal of
Communication Disorders, 48(1), 1–17. https://doi.org/10.1016/j.jcomdis.2013.12.001
Stross, B. (2013). Falsetto voice and observational logic: Motivated meanings. Language in
Society, 42(2), 139–162. https://doi.org/10.1017/S004740451300002X
Sun, X. (2002). Pitch determination and voice quality analysis using subharmonic-to-harmonic
ratio. In Proceedings of the International Conference on Acoustics, Speech and Signal
Processing (Vol. 1, pp. 333–336). https://doi.org/10.1109/icassp.2002.5743722
Švec, J. G., Schutte, H. K., & Miller, D. G. (1999). On pitch jumps between chest and falsetto
registers in voice: Data from living and excised human larynges. The Journal of the
Acoustical Society of America, 106(3), 1523–1531. https://doi.org/10.1121/1.427149
Audacity Team. (2017). Audacity: Free audio editor and recorder [Computer software]. Retrieved
from http://www.audacityteam.org/
R Core Team. (2021). R: A language and environment for statistical computing. Vienna, Austria:
R Foundation for Statistical Computing. Retrieved from https://www.r-project.org/
Titze, I. R. (2014). Bi-stable vocal fold adduction: A mechanism of modal-falsetto register shifts
and mixed registration. Journal of the Acoustical Society of America, 135(4).
https://doi.org/10.1121/1.4868355
Vallacher, R. R., & Wegner, D. M. (1987). What Do People Think They’re Doing? Action
Identification and Human Behavior. Psychological Review, 94(1), 1–15.
Vallacher, R. R., & Wegner, D. M. (1989). Levels of Personal Agency: Individual Variation in
Action Identification. Journal of Personality and Social Psychology, 57(4), 660–671.
Vicenik, C., Lin, S., Keating, P., & Shue, Y. (2021). Online documentation for VoiceSauce.
Weide, R. (1998). The CMU pronunciation dictionary. Carnegie Mellon University.
Zhang, Z. (2016). Cause-effect relationship between vocal fold physiology and voice production
in a three-dimensional phonation model. Journal of the Acoustical Society of America, 139(4),
1493–1507. https://doi.org/10.1121/1.4944754
Appendices
This appendix provides counts and general descriptions of the four tic inventories analyzed
in the studies in this dissertation, along with the method of their collection. Acoustic recordings
were collected of adult speakers of British English diagnosed with Tourette syndrome (3F, 1M)
performing passage readings, telling personal narratives, and describing unfamiliar scenes while
ticking freely. Each participant's tic inventory consists of the vocal and verbal tics that occurred
during these speech tasks.
The start of a task was defined as the end of the researcher's prompt (e.g., "Please describe
this picture in as much detail as possible"). In the case of readings, the end of a task was
identified with the end of the last sentence in the text. In the case of picture descriptions and
personal narratives, the end of a task was defined as the participant's utterance indicating that
they were finished (e.g., "And that's it", "Ok I'm done"). All four participants reliably produced
these statements of their own accord.
The study consisted of 12 speech tasks of three types: passage readings, personal narratives,
and picture descriptions. There were four tasks within each type; these were randomly ordered
within four blocks. Participants performed the same blocks in the same order. Participants B, C,
and D completed all four blocks.[33] Participant A completed only one block of tasks.[34]
The table below shows the number of unique tic labels, the total number of tics produced,
and the total number of true words contained in each participant's co-speech ticking sample.
[33] Acoustic recordings of Participants B, C, and D performing the Beach Scene picture description
task were corrupted and could not be analyzed. Thus, 11 speech tasks are reported for these
participants even though they completed all four blocks.
[34] Participant A left the study early.
Figure 43. Data collection protocol. Orange squares – passage readings; blue squares – personal
narratives; yellow squares – picture descriptions. Participants decided during which between-block
break to take lunch. Recording was stopped only for breaks that required the participant to leave the
room (e.g., bathroom, lunch).
Passage reading tasks place inherent time limits on task performance. Readings of the
Rainbow Passage, which all four participants performed, ranged in duration from roughly 130 to
188 seconds. In contrast, participants varied widely with respect to the amount of time they took
to complete tasks on which no limits were placed (i.e., picture descriptions and personal
narratives). Picture descriptions of the Pool Party scene, for example, ranged in duration from
88 to 421 seconds.
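The Tics/Sec column in the table that follows is simply the tic count divided by the task duration in seconds. A minimal sketch of the computation, using two rows copied from Table 33 (Participant B's Park and Pool picture descriptions):

```python
# Tic rate per task: tic count divided by task duration in seconds.
# Rows are copied from Table 33 (Participant B, picture description tasks).
tasks = [
    {"task": "Park", "duration_s": 82.502, "words": 198, "tics": 57},
    {"task": "Pool", "duration_s": 88.784, "words": 202, "tics": 51},
]
for t in tasks:
    t["tics_per_sec"] = round(t["tics"] / t["duration_s"], 3)

print([(t["task"], t["tics_per_sec"]) for t in tasks])
# [('Park', 0.691), ('Pool', 0.574)]
```

The rounded values match the Tics/Sec column reported in the table.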
Table 33. Duration of tasks performed by each participant and count of words and tics that occurred.
Participant Task Type Task Task Duration (s) Word Count Tic Count Tics/Sec
A Narrative Proud 262.132 826 0 0.004
Picture Pool 124.015 204 72 0.581
Reading Rainbow 165.035 388 31 0.188
B Narrative Embarrassed 131.671 343 60 0.456
Joy 125.511 366 53 0.422
Proud 95.387 293 27 0.283
Sad 64.870 176 19 0.293
Picture Animal 57.857 151 29 0.501
Park 82.502 198 57 0.691
Pool 88.784 202 51 0.574
Reading Comma 142.471 450 44 0.309
Grandfather 55.852 150 15 0.269
Northwind 47.054 136 24 0.510
Rainbow 130.331 385 31 0.238
C Narrative Embarrassed 91.362 234 65 0.711
Joy 149.780 394 110 0.734
Proud 157.472 441 130 0.826
Sad 108.434 190 104 0.959
Picture Animal 239.017 591 157 0.657
Park 308.908 827 171 0.554
Pool 421.811 1227 164 0.389
Reading Comma 174.234 410 184 1.056
Grandfather 85.550 181 77 0.900
Northwind 60.103 154 60 0.998
Rainbow 188.941 437 194 1.027
D Narrative Embarrassed 113.857 366 12 0.105
Joy 75.547 228 7 0.093
Proud 143.973 381 11 0.076
Picture Animal 147.080 319 41 0.279
Park 348.887 806 66 0.189
Pool 274.235 691 19 0.069
Reading Comma 179.452 481 67 0.373
Grandfather 101.734 239 38 0.374
Northwind 40.103 131 5 0.125
Rainbow 155.746 444 39 0.250
In Figure 44, task durations (top panel), word counts (middle panel), and tic counts (bottom
panel) are plotted by task type. Inspection of task duration shows that Participant A spent more
time on her personal narrative than the other three participants did, on average, across their four
personal narratives; notably, no vocal or verbal tics were produced. Participants C and D spent
more time on their picture descriptions relative to Participants A and B. Word count findings
mirror task duration results: the longer a participant spends on a task, the more words are spoken.
Tic count, on the other hand, does not increase with task duration or word count, as already noted
with regard to Participant A's personal narrative. Patterns of ticking by task type are discussed in
the second chapter of the dissertation.
Figure 44. Task duration (top), word count (middle), and tic count (bottom) by task type for each
participant.
Table 34 presents general information regarding tic events in each participant's inventory.
Two points should be noted with regard to the analytical coding of tic events. First, individual
vocal noise tics were collapsed into a single vocal-noise category ("NS" label); this obscures the
fact that vocal noise tics of different kinds occur (e.g., yips, grunts, squeaks). Second, both
single-word and whole-phrase verbal tics are possible. In the Total Tics column in Table 34, the
individual tic-words making up tic phrases are counted separately. In contrast, the Unique Tics
column indicates how many unique tic events occurred, treating tic phrases as unitary events.
Participant B produced the greatest number of unique tics. Participant C produced the greatest
number of tics and words overall. Participant D produced the smallest number of both unique tics
and total tic events.
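The distinction between the Unique Tics and Total Tics coding schemes can be sketched as follows; the tic sequence here is a hypothetical illustration, not a participant's actual data:

```python
# Each list element is one tic event; a multi-word tic phrase is one event.
tic_events = ["BISCUIT", "FUCK A DUCK", "BISCUIT", "HEY", "FUCK A DUCK"]

# Unique tic labels: phrases treated as unitary events.
unique_labels = len(set(tic_events))

# Total tics: the individual tic-words making up phrases counted separately.
total_tic_words = sum(len(event.split()) for event in tic_events)

print(unique_labels, total_tic_words)  # 3 9
```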
Table 34. Participant unique/total tic count, word count, and count of speech tasks performed.
Participant Unique Tic Labels Total Tics Total Words Tasks Performed
A 17 48 1035 3
B 62 112 2126 11
C 37 648 3540 11
D 12 33 2609 11
Lists of unique tics for each participant are presented in the tables that follow. Tics are
ranked by their relative frequency within each task type (yellow = picture descriptions, blue =
personal narratives, orange = passage readings).
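The per-label mean and standard deviation columns in these tables can be computed as sketched below. The durations are hypothetical examples; as in the tables, the standard deviation is undefined (shown as "/") for labels with only one token:

```python
from statistics import mean, stdev

# Hypothetical token-by-token acoustic durations (ms), grouped by tic label.
durations = {
    "HELLO": [612.1, 655.3, 638.9, 699.6],
    "OUCH": [434.5],  # single token: standard deviation undefined
}

for label, vals in durations.items():
    m = round(mean(vals), 3)
    sd = round(stdev(vals), 3) if len(vals) > 1 else "/"
    print(label, m, sd)
```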
Participant A
Table 35. Unique vocal and verbal tics and their frequency by speech task type. The average acoustic
duration of each unique tic, as well as the average f0 of its stressed vowel, is presented for reference.
Task Type Tic Label Count
Acoustic Duration (ms) Stressed Vowel Mean f0 (Hz)
Mean Sd Mean Sd
Picture
NO 7 409.394 46.609 324.374 76.471
FUCK A DUCK 2 1199.087 0.145 446.448 0.842
FULL SCREEN 2 1252.085 222.535 421.308 44.444
HA 2 280.392 113.554 495.564 29.554
HOP 2 370.275 37.305 455.531 121.306
NOW 2 350.384 44.778 349.635 170.880
ARE YOU TRYNA DO A POOL PARTY 1 2074.721 / 356.077 /
OUCH 1 434.495 / 475.154 /
PARTY 1 769.557 / 296.978 /
POOL 1 375.251 / 468.238 /
POPE 1 761.335 / 427.620 /
WHOA 1 871.178 / 269.974 /
Reading
HELLO 4 651.489 32.714 481.399 20.075
POPE 3 653.785 93.066 495.705 29.089
OUCH 2 477.562 41.267 461.259 67.321
BOWS 1 743.211 / 464.088 /
FUCK A DUCK 1 1025.937 / 489.477 /
HAHA 1 614.937 / 360.184 /
OKAY 1 425.899 / 325.399 /
Participant B
Table 36. Unique vocal and verbal tics and their frequency by speech task type. The average acoustic
duration of each unique tic, as well as the average f0 of its stressed vowel, is presented for reference.
Task Type Tic Label Count
Acoustic Duration (ms) Stressed Vowel Mean f0 (Hz)
Mean Sd Mean Sd
Narratives NS 22 616.183 276.701 / /
HAPPY 9 592.520 114.281 285.143 50.130
FUNKY 4 596.126 88.691 288.581 61.451
KIND 3 585.786 144.792 472.511 93.142
ASIAN 3 736.340 293.273 409.696 99.051
PORRIDGE 2 604.152 112.767 285.039 53.991
HAPPY BABY 2 1117.395 118.292 295.546 23.191
CUNT 2 581.792 74.944 363.219 57.028
KIND WOMEN 1 1585.460 / 367.394 /
TWO 1 532.435 / 428.854 /
PRIVATE TIME 1 837.388 / 207.737 /
SHAG 1 397.046 / 295.425 /
PRIVATE 1 824.394 / 227.861 /
PIRATES 1 723.834 / 270.361 /
PINE 1 548.752 / 290.263 /
PATRICK 1 688.436 / 252.888 /
MONKEY 1 690.580 / 390.330 /
LLAMA 1 338.081 / 235.256 /
IT 1 234.651 / 202.475 /
IONE 1 575.267 / 356.864 /
HAPPEN 1 689.267 / 403.696 /
FUNKING 1 718.367 / 216.164 /
FINGERS 1 598.501 / 235.294 /
CONE 1 650.323 / 169.855 /
CHEESE 1 529.007 / 412.328 /
BREAK 1 542.472 / 302.630 /
Picture NS 17 598.106 225.563 / /
HAPPY 8 535.998 70.167 251.047 18.052
KIND 3 598.272 17.140 319.112 21.200
KIND WOMEN 1 998.931 / 259.289 /
PUNK ASS BYRON 1 1394.810 / 285.947 /
PIRATES 1 737.950 / 232.681 /
PIRATE 1 492.411 / 222.952 /
PICNIC 1 409.071 / 383.380 /
OKAY 1 589.641 / 244.585 /
MONKEY 1 548.753 / 235.264 /
LITTLE 1 489.209 / 337.281 /
IONE 1 650.773 / 362.351 /
HEY 1 474.471 / 359.596 /
FIGHTING 1 1028.900 / 237.805 /
DENMARK 1 517.977 / 246.802 /
CUNT 1 706.098 / 264.811 /
CUNNILINGUS 1 987.755 / 318.062 /
COCKTAILS 1 694.879 / 188.133 /
COCKTAIL 1 722.449 / 204.274 /
CHURCH 1 494.412 / 203.573 /
CHEESE 1 505.954 / 225.616 /
AH 1 887.568 / 339.691 /
Reading NS 9 597.149 150.070 / /
HATE 5 485.449 180.495 289.812 56.071
IONE 3 721.082 258.815 293.252 129.137
KIND 2 758.968 108.312 351.221 110.983
HER 2 306.762 26.809 347.015 97.096
PRIVATE TIME 1 906.558 / 185.364 /
SIMON 1 694.802 / 270.829 /
NORTH 1 448.980 / 339.501 /
MARY 1 476.966 / 295.586 /
LLAMA 1 558.730 / 103.364 /
HI 1 638.548 / 385.615 /
HATE CRIME 1 837.232 / 257.843 /
GRANDMA 1 744.272 / 273.144 /
FUNKY 1 506.155 / 198.157 /
FUCK 1 482.202 / 260.872 /
EY 1 326.417 / 241.917 /
DIE 1 394.315 / 277.694 /
CUNT 1 842.333 / 273.692 /
CRIMES 1 423.920 / 168.842 /
ARISTOTLE 1 887.982 / 247.211 /
Participant C
Table 37. Unique vocal and verbal tics and their frequency by speech task type. The average acoustic
duration of each unique tic, as well as the average f0 of its stressed vowel, is presented for reference.
Task Type Tic Label Count
Acoustic Duration (ms) Stressed Vowel Mean f0 (Hz)
Mean Sd Mean Sd
Narrative BISCUIT 122 349.338 61.402 305.661 101.912
FUCK 32 366.171 114.992 301.338 52.885
HEY 14 257.639 82.222 320.810 75.749
HEDGEHOG 4 594.544 171.155 323.571 70.029
SAUSAGE 4 687.216 240.229 366.786 88.399
I LOVE CATS INNIT 2 948.144 15.468 337.667 34.967
FUCK_IT 2 389.660 36.406 357.995 16.945
BEE 1 277.540 / 398.268 /
BISCUIT_disf 1 238.808 / 189.031 /
Picture BISCUIT 184 343.391 56.441 269.573 64.850
FUCK 23 321.940 84.013 316.914 68.847
FUCK_IT 5 456.600 50.979 355.300 55.343
SAUSAGE 4 693.963 252.879 431.687 85.082
HEY 3 243.402 80.016 355.070 31.932
HEDGEHOG 2 368.330 7.425 342.545 98.804
BEANS 1 638.549 / 349.280 /
BISCUIT_disf 1 245.523 / 411.601 /
BISCUITS 1 572.500 / 457.935 /
I LOVE CATS INNIT 1 853.161 / 292.116 /
FUCK A SHEEP 1 498.570 / 337.989 /
FUCKING 1 303.810 / 274.696 /
NANDOS 1 784.623 / 198.818 /
Reading BISCUIT 157 306.065 45.927 305.383 78.881
HEY 25 222.655 48.876 368.326 62.998
FUCK 14 299.045 50.236 285.340 51.499
SAUSAGE 9 578.771 80.211 346.518 94.575
HEDGEHOG 4 374.908 56.962 313.555 51.517
FUCK_IT 3 269.866 39.839 306.769 90.128
NS 3 223.653 21.468 / /
BEAN 1 288.664 / 424.925 /
BEE 1 551.637 / 392.852 /
BISCUIT_disf 1 380.029 / 241.503 /
COMMA 1 415.407 / 149.044 /
GRANDFATHER 1 598.639 / 290.501 /
PASSAGE 1 782.213 / 224.284 /
READY 1 231.173 / 260.313 /
SCIENCE 1 698.700 / 143.503 /
Participant D
Table 38. Unique vocal and verbal tics and their frequency by speech task type. The average acoustic
duration of each unique tic, as well as the average f0 of its stressed vowel, is presented for reference.
Task Type Tic Label Count
Acoustic Duration (ms) Stressed Vowel Mean f0 (Hz)
Mean Sd Mean Sd
Narrative
FUCK_OFF 2 450.456 148.119 185.916 88.225
NS 2 435.083 9.357 / /
Picture
NS 30 536.934 252.366 / /
FUCK_OFF 5 619.678 3.769 162.623 62.745
BLAH 3 226.152 66.432 102.843 11.232
CUNT 1 625.609 / 184.706 /
DRUGS 1 526.185 / 188.549 /
Reading
NS 29 485.062 195.913 / /
FUCK 9 317.641 112.768 209.714 27.819
FUCK_OFF 2 357.814 44.265 157.963 74.315
CHECK YOURSELF BEFORE YOU WRECK YOURSELF 1 762.308 162.265 228.714 65.427
BANANA 1 724.661 / 187.864 /
CUNT 1 651.181 / 208.682 /
YOU CUNT 1 556.229 / 140.567 /
WARM 1 798.186 / 232.542 /
Abstract
Adults with Tourette syndrome produce unwanted movements and vocalizations called tics. Tics do not correspond to the ticker's own behavioral goals and, as a result, they appear inappropriate in context. For some people, tics are often preceded by an uncomfortable sensation that grows in intensity until the tic response is released. Tics occur on a background of typical goal-directed behavior, including speech, but ticking and speaking appear to be at cross-purposes. Strictly speaking, production of vocal tics (i.e., tic vocal tract movements that have an audible result) cannot overlap in time with speech production because the two kinds of behavior have opposed aims whilst requiring action by the same set of effectors. Premonitory urges to tic, however, frequently co-occur with intentions to speak because, like visceral urges, urges to tic are orthogonal to intentions to act. Speech planning and production processes in tickers therefore co-mingle with what could be competing urges to vocalize. The broad objective of this dissertation is to determine whether, and how, ticking and speaking can manifest "cooperatively" in light of these circumstances. To put it another way, this dissertation aims to understand whether tic production can be wrangled in service of communicative goals. So-called "cooperative" interactions between ticking and speaking allow the tasks of each system to be achieved. Using a corpus of acoustic recordings of adult tickers performing a variety of monadic speech tasks while ticking freely, linguistically informed analyses were carried out, each of which probed a different aspect of co-speech ticking for specific signs of optimization. Taken together, the results suggest that the observed systematic interactions promoted optimization.
Three signs of systematic and optimized interaction between ticking and speaking were identified in co-speech ticking data. First, for most tickers, the vast majority of tic-words occurring during running speech are located immediately before or immediately after prosodic phrase boundaries, which is to say that tics systematically occur around prosodic phrases, not interior to them. Ticking around prosodic phrases ensures that a talker's intended linguistic message is produced correctly while still allowing for frequent tic urge satisfaction—an optimal outcome given the circumstances. The distributional pattern is suggestive of temporary deferment of tics to the end of a phrase, a sort of accommodation on the part of the tic system. But there is also evidence that the speech system itself re-organizes to accommodate potential tics through the use of adaptive prosodic phrasing. When ticking around phrases, shorter phrases mean more frequent tic production. At least one participant may be using this principle to her advantage: the shorter the phrases in this dataset, the fewer the interruptions by tics. In contrast, one participant who shows no adaptive changes to the size of phrases in response to ticking has very frequent tic interruptions. A second predicted sign of optimization was found by comparing the phonatory characteristics of stressed vowels in verbal tics and true words. It was found that the former are distinguishable from the latter in that verbal tic stressed vowels display acoustic signatures of falsetto voice. By producing verbal tics along a non-speech acoustic channel, talkers who tic can segregate intended, dialogic meaning from unintended referential meaning. Segregated tic and speech acoustic channels were observed in all four case studies—even in the case of the one participant who consistently failed to keep tics out of prosodic phrases.
A third sign of cooperative interaction concerns the token-to-token durational variability of verbal tics, which does not appear word-like. Proximity to prosodic phrase boundaries does not induce lengthening in verbal tics, suggesting that these tics may be "unprosodified" with respect to phrasal prosody, even if their temporal occurrence is in part determined by the presence of prosodic boundaries.
Participants differ with respect to the amount of experience they have in free-ticking (on its own) and co-speech free-ticking, and differences across participants in co-speech ticking patterns align with expectations given apparent skill level. For instance, one participant produced hundreds more tics than the others but also experienced few interruptions; this participant free-tics by default and works in public speaking. A participant who reported rarely free-ticking, in contrast, produced mostly interruptive tics. These differences suggest that the observed interactions between ticking and speaking reflect compensatory strategies that have been developed to ensure the quality of speech (e.g., fluency, clarity) despite frequent ticking. Taken together, the results presented in this dissertation support the notion that in order to optimally achieve the tasks of ticking and speaking, tic and speech actions are co-orchestrated.
Asset Metadata
Creator
Llorens Monteserin, Mairym (author)
Core Title
Signs of skilled adaptation in the co-speech ticking of adults with Tourette's
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Linguistics
Publication Date
11/16/2022
Defense Date
09/06/2022
Publisher
University of Southern California (original), University of Southern California Libraries (digital)
Tag
OAI-PMH Harvest,phonetics,prosody,speech production,Tourette's,verbal tics,vocal tics
Format
theses
(aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Goldstein, Louis (committee chair), Byrd, Dani (committee member), Iskarous, Khalil (committee member), Narayanan, Shri (committee member)
Creator Email
llorensm@usc.edu,mairymllo@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC112485550
Unique identifier
UC112485550
Identifier
etd-LlorensMon-11320.pdf (filename)
Legacy Identifier
etd-LlorensMon-11320
Document Type
Dissertation
Rights
Llorens Monteserin, Mairym
Internet Media Type
application/pdf
Type
texts
Source
20221121-usctheses-batch-992 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)