SOUND SEQUENCE ADAPTATION IN LOANWORD PHONOLOGY
By
Daylen Riggs
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree of
DOCTOR OF PHILOSOPHY
(LINGUISTICS)
May 2014
Copyright 2014 Daylen Riggs
Table of Contents
Acknowledgements iv
List of Tables vii
List of Figures viii
Abstract ix
List of Abbreviations xi
1 INTRODUCTION…………………….………………………………………………1
2 DATA, METHODOLOGY, AND TYPOLOGY…………………………………………. 5
2.1 The Cross-Linguistic Corpus: Data and Methodology ………………………5
2.1.1 Data…………………………………………………………………... 6
2.1.2 Sources……………………………………………………………….. 8
2.1.3 Data Properties………………………………………………………. 9
2.1.4 Data Specifics….…………………………………………………….. 15
2.1.5 Hypotheses and Data Analysis………………………………………. 17
2.1.6 Corpus Normalization……………………………………………….. 18
2.2 General Findings: Typology of Process…………………………………….. 23
2.2.1 Cross-Linguistic Trends…………….………………………………... 23
2.2.2 The Problem of Grouping Results: Simpson’s Paradox…………….. 27
2.3 Typology of Language Process Preference………………………………….. 31
2.3.1 Initial Considerations………………………………………………… 31
2.3.2 The Classification Method…………………………………………… 35
2.3.3 Language Classification……………………………………………… 40
2.4 Typological Trends……………………………………………………………49
2.4.1 The Typology of Language Process Preference…………………….. 49
2.4.2 The Typology as a Function of Phonological Material ……………... 53
2.5 Limitations of the typological investigation…………………………………. 56
2.6 An analysis of conformity to the L1 phonology…………………………….. 58
2.6.1 Data and Methodology………………………………………………. 58
2.6.2 Main Findings of the Conformity Analysis………………………….. 59
2.7 Conclusion: A Cue-Based Approach………………………………………... 62
3 LOANWORD ADAPTATION AND THE PRIMACY OF PERCEPTUAL CUES……………. 64
3.1 Loanwords and Borrowing…………………………………………………... 64
3.2 The Cue Hypothesis…………………………………………………………. 68
3.3 Perceptual Cues and the Predictions of the Cue Hypothesis …………………74
3.3.1 Perceptual Cues and Cue Robustness………………………………... 75
3.3.2 Cue Robustness as a Function of Word Position…………………….. 77
3.3.3 Predictions of the Cue Hypothesis…………………………………… 70
3.4 Testing the Predictions……………………………………………………….. 85
3.4.1 Ratio Predictions……………………………………………………... 88
3.5 Summary and Conclusion……………………………………………………. 94
4 CASE STUDY: SOUND SEQUENCE ADAPTATION IN TONGAN.……………………... 98
4.1 Tongan as a case study………………………………………………………. 98
4.2 Tongan adaptation trends…………………………………………………….. 99
4.3 Fine-grained investigation of Tongan Patterns ………………………………108
4.3.1 Data and Analysis……………………………………………………. 111
4.3.2 General Trends……………………………………………………….. 112
4.3.3 Word-Initial Consonant Clusters…………………………………….. 113
4.3.4 Word-Medial Consonant Clusters……………………………………. 114
4.3.5 Word-Final Codas……………………………………………………. 117
4.3.6 Word-Final Consonant Clusters……………………………………… 120
4.4 Summary and Conclusion……………………………………………………. 125
5 THEORETICAL CONNECTIONS AND CONCLUSIONS………………………………… 128
5.1 Summary of Findings………………………………………………………… 128
5.2 Theoretical Connections……………………………………………………... 132
5.2.1 Gestural Clock Slowing in the Borrowed Word……………………... 132
5.2.2 Optimality Theory……………………………………………………. 135
5.3 Connection with Theories of Loanword Adaptation………………………… 141
REFERENCES………………………………………………………………………... 149
APPENDICES………………………………………………………………………… 160
Acknowledgements
Receiving a PhD in linguistics is a goal I set many years ago, at the age of fifteen
or sixteen. Numerous people have since then been instrumental in the achievement of
this goal. This dissertation could not, and would not, have been written if it were not for
the assistance, guidance, support, and encouragement of many people, to all of whom I am
deeply grateful.
First and foremost, I thank my advisor and dissertation chair, Rachel Walker. The
true magnitude of her assistance and contribution to this dissertation transcends words that
I can put on paper. Her patience with me has been nothing short of angelic. Her
commitment to me has been nothing short of heroic. She is a graciously powerful
teacher, a brilliant scholar, and a magnificent person who is both humble and humbling. I
am forever thankful for the fact that this dissertation could not have happened with any
other advisor but her.
I thank Louis Goldstein, who has assisted and advised me on many projects at
USC. Louis was essential in the development of my research on loanword phonology
and my dissertation. He both helped me to get back on course when I was off, and
provided insights that allowed me to proceed on a better course. Louis’ ability to
(immediately) see things in my research that I had not (yet) considered enriched my
dissertation, on top of being a profound demonstration of his genius.
I thank Stephen Finlay for serving on my dissertation and qualifying committees;
Stephen provided insightful angles of thought, and recognized the Wittgensteinian nature
of some of my thought, for both of which I am grateful. I also thank Abigail Kaun, who
served on my M.A. and qualifying committees and inspired the typological
investigation of loanword adaptation that ultimately ended up being the focus of my
dissertation.
I owe a great debt of gratitude to Dani Byrd. To list everything she has done for
me would genuinely take pages. She saw something in me early on, as an undergraduate,
and continued to see it in me, even when I didn’t. It was an amazing pleasure working
with her on several projects. It is a humbling and absolute honor to have my name next
to hers on publications. I thank her for being instrumental in my life and in my academic,
intellectual, and personal development, and for everything else.
Many other faculty members at USC have also been crucial in my academic and
intellectual development, as well as important to me personally. I thank Ed Finegan, the
wisest man I’ve ever known; his interactions with me were continuously invaluable and
profound. I thank Elena Guerzoni, for sharing her thoughts and her wit with me (and for
giving me a BMW). I am also grateful for knowing the late Jean-Roger Vergnaud. He
was the first teacher of linguistics I had, and continued to teach me until his passing.
Somewhere in between teaching C-Command on the first day of Intro to Linguistics, to
discussing the !-gesture with me in my third year in grad school, he put the idea in my
mind that linguistics is ultimately a branch of cognitive neuroscience; I cherish the
respect he had for me as well as his influence on me. I also thank the other many
excellent and influential teachers I have had, especially Rachel Walker, Anna Łubowicz,
Bridget Copley, Louis Goldstein, Todd Haskell, Jack Hawkins, Elsi Kaiser, Peter
Ladefoged and Jennifer Perlmutter.
I especially thank Robert Armstrong, the first person to teach me language. In the
first few days of his Spanish I class, a spark went off in my mind, which set my trajectory
for the next eighteen years. Señor Armstrong: En la tercera semana del semestre, cuando
empecé a aprender el subjuntivo, me dijo "Espero que tengas éxito." Este éxito se debe a
usted.
I also thank friends and colleagues in the USC phonetics and phonology group.
My projects in grad school all benefited from assistance and discussion in PhonLunch
and the phonetics lab. I especially thank Melissa Frazier, Khalil Iskarous, Karen Jesney,
Jelena Krivokapić, Ben Parrell, Anna Łubowicz, and Sam Tilsen. I also thank my
collaborators on projects in the phonetics lab and in the Department of Electrical
Engineering: Jason Adams, Erik Bresch, Argyro Katsika, Sungbok Lee, and Shrikanth
Narayanan.
In my journey at USC, I have been fortunate to meet and know a number of
friends and colleagues who have been influential, supportive, and important in my life. I
thank Abe Kazemzadeh, who was there for me – literally, at USC – the whole time. Nate
Dumas was the first scholar I had the pleasure of knowing; when we were undergrads at
USC, we shared the same determination to get into grad school, a very meaningful
experience for which I am grateful. My friendship with Aaron Cornell started over
Bertrand Russell and has continued through many ups and downs and profound life
experiences in multiple countries and states. Aaron has always been instrumental. And
so have many other brilliant people, with whom I am truly fortunate to have crossed
paths; I thank Rebeka Campos-Astorkiza, Ed Holsinger, Michal Temkin Martinez, Philip
Potamites, Ann Sawyer, Magdalena Pire Schmidt, Laura Tejada, and Stephen Tobin.
Many other people were important for this dissertation as well as for my time in grad
school. I thank Janet Anderson, Nick Bartlett, Amanda Bloom, James Grime, Xiao He,
Ben Herzberg, Lucy Kim, Joseph Peha, Joyce Perez, Meghan Sensenbach, Russ
Timmins, Richard Tippy, and Barbara Tomaszewicz. To everyone listed above, and
those I may have unintentionally omitted, thank you.
I thank my parents, Craig and Tamara Riggs. For everything. This dissertation is
dedicated to them.
D.R.
List of Tables
Table Page
TABLE 1. List of languages by family…………………………………………… 7
TABLE 2. Coda Adaptations……………………………………………………… 14
TABLE 3. Epenthesis in consonant clusters……………………………………… 14
TABLE 4. Deletion in consonant clusters………………………………………… 14
TABLE 5. Alternation/Substitution in consonant clusters………………………… 15
TABLE 6. Metathesis in consonant clusters……………………………………… 15
TABLE 7. Retention in consonant clusters………………………………………… 15
TABLE 8. Data example from Swahili and Malayalam…………………………… 19
TABLE 9. Swahili and Adjusted Malayalam Data………………………………… 20
TABLE 10. Total observations/processes the corpus, across all languages……… 23
TABLE 11. Hypothetical Language Data………………………………………… 29
TABLE 12. Normalized Hypothetical Language Data…………………………… 30
TABLE 13. Typology of possible languages……………………………………… 33
TABLE 14. Retention Dominant Subtypes………………………………………… 39
TABLE 15. Step 1 Results: Retention vs Adaptation…………………………… 40
TABLE 16: Adapting languages: Epenthesis vs Deletion……………………… 41
TABLE 17. Pairwise Comparisons for Retention>Epenthesis>Deletion Languages 43
TABLE 18. Pairwise Comparisons for Retention>Deletion>Epenthesis Languages 44
TABLE 19. Classification summary………………………………………………... 45
TABLE 20. |Retention Dominant| Subtype Classification…………………………. 48
TABLE 21. Typological results…………………………………………………….. 50
TABLE 22. A Typology of Types………………………………………………….. 52
TABLE 23. Ratio Variables……………………………………………………….. 82
TABLE 24. Hypothetical Language Data………………………………………….. 84
TABLE 25. Indonesian Consonant Clusters……………………………………….. 85
TABLE 26. Total Deletion Counts…………………………………………………. 112
TABLE 27. Medial Consonant Cluster Deletion Counts…………………………... 115
TABLE 28. Medial Consonant Cluster Deletion Counts…………………………... 118
TABLE 29. Final Consonant Cluster Deletion Counts…………………………….. 121
TABLE 30. Consonant cluster sonority scores…………………………………… 124
TABLE 31. Sonority scoring analysis results……………………………………… 124
TABLE 32. Constraint Violation Tallies…………………………………………… 140
List of Figures
Figure Page
FIGURE 1. Total process observations across all languages……………………….. 25
FIGURE 2. Direct Division Normalization means per language…………………… 26
FIGURE 3. T-Score Scaled Normalized Means, averaged across language………... 27
FIGURE 4. Flowchart demonstrating the classification method…………………… 39
FIGURE 5. Phonological material spectrum……………………………………….. 54
FIGURE 6. Bar graph showing the distribution of language super-types…………... 55
FIGURE 7. Distribution plotted as a function……………………………………… 55
FIGURE 8. Diagram of “crystals” ………………………………………………… 78
FIGURE 9. Counts for normalized Data: Direct Division………………………… 87
FIGURE 10. T-value scaling normalization………………………………………… 87
FIGURE 11. Cross-Language Means, Retention to Adaptation……………………. 88
FIGURE 12. Boxplot: Retention to Adaptation……………………………………. 89
FIGURE 13. Means: Retention to Epenthesis……………………………………… 90
FIGURE 14. Individual Languages, Retention to Epenthesis………………………. 91
FIGURE 15: Retention to Deletion………………………………………………… 92
FIGURE 16. Cross-linguistic Percentage of Epenthesis…………………………… 93
FIGURE 17. Boxplot for Percentage of Epenthesis ANOVA……………………… 93
FIGURE 18. Percentage of Deletion for Statistically Significant Language……….. 94
FIGURE 19. Tongan totals………………………………………………………….. 101
FIGURE 20. Tongan Consonant Cluster adaptation……………………………….. 102
FIGURE 21. Direct Division Normalized totals……………………………………. 103
FIGURE 22. T-Score Normalized Totals…………………………………………… 104
FIGURE 23. Percentage of Epenthesis……………………………………………… 105
FIGURE 24. Tongan Deletion and Preserve Cue…………………………………… 107
FIGURE 25. Tongan Epenthesis and Enhance Perceptibility……………………… 108
FIGURE 26. Distribution of Deletion in Tongan…………………………………… 113
FIGURE 27. Distribution of deletion in word final consonant clusters…………….. 122
FIGURE 28. Sonority Score Results………………………………………………... 125
FIGURE 29. Gestural Score for “spams” ………………………………………….. 135
Abstract
This dissertation investigates the phonology of loanword adaptation of sound
sequences. When speakers borrow words that contain phonotactically marked sequences
of sounds, there are a number of different ways by which they may adapt the foreign
word into their native language. The types of adaptation that occur cross-linguistically,
and the range and distribution of their occurrence, are the focal point of this study. Loanword
data from fifty-three languages were collected, analyzed, and assembled into a typology
of adaptation strategies, as well as a typology of languages based on their adaptation
tendencies. The trends in both of these typologies indicate a strong cross-linguistic bias
against consonant deletion in loanword adaptation; sequences of sounds in foreign words
(e.g. consonant clusters) tend to be adapted faithfully in loanwords. When a repair
happens, the repair is almost always epenthesis. It is argued that this is due to a general
need to preserve the linguistic information in a foreign word when it is first used by
speakers of the borrowing language, in order to facilitate communication and
comprehension of a word that is unfamiliar.
In addition to providing cross-linguistic data and a typological analysis of
loanword adaptation, this dissertation contributes to phonology by providing a method of
classifying languages based on their treatment of foreign words containing
phonotactically marked sequences of sounds. Loanword adaptation data are highly
variable; no language in this study behaves consistently with respect to how it handles
sequences of sounds in foreign words. It is thus not always immediately clear which
adaptation strategy a language prefers. A method involving a series of statistical
tests is developed and employed in order to determine the classification of languages by
their preferred adaptation strategy.
This dissertation also provides an analysis of consonant cluster repair strategy
with respect to location in the word. This analysis reveals various word position
asymmetries in loanword adaptation. Word-final consonant clusters are more likely to be
repaired than word-initial and word-medial consonant clusters; deletion is observed most
frequently in word-final position. That is, deletion is generally intolerable as a repair, but
less so for word-final consonant clusters. It is argued that this is due to the relative
strength of perceptual cues. Where perceptual cues are weakest, i.e. word-finally,
deletion is more tolerable as a repair, as deleting a weakly cued consonant has relatively
minor perceptual consequences.
A case study of loanword adaptation in Tongan is also provided in order to further
examine the role of perceptual cues in describing adaptation patterns. The analysis of
Tongan confirms the primacy of perceptual cues in describing loanword adaptation
patterns, in addition to showing that a complete description of a language’s loanword
adaptation behavior need also include language-specific properties that may not be
detected in a broad typological analysis. In other words, it is argued that the need to
preserve linguistic information, i.e. perceptual cues, is necessary but not sufficient for a
complete understanding of sound sequence adaptation in loanwords. The phonological,
sociological, and historical properties of the borrowing language also play an important
role in determining how sequences of sounds in foreign words are adapted.
List of Abbreviations
CH: The Cue Hypothesis
EP: Enhance Perceptibility
EV: Epenthetic Vowel
FSS: Foreign Sound Sequence
L1: First Language
L2: Second Language
Lb: Borrowing Language
Ls: Source Language
LW: Loanword
LWA: Loanword Adaptation
OT: Optimality Theory
PC: Preserve Cue
SW: Source Word
“Mele kalikimaka is the thing to say, on a bright, Hawaiian Christmas day.”
- Bing Crosby
1 INTRODUCTION. Languages vary widely as to the types of syllables they permit in their
words. The tolerance of permissible syllable types (or sequences of sounds) can be construed as
a spectrum ranging from a simple sequence of an optional consonant followed by a vowel, to a
non-vocalic nucleus flanked by multiple consonants. Hawaiian and English serve as examples of
this variation. Hawaiian tolerates only (C)V syllables, e.g. [ʔae] ‘yes’ (Kaiao 2003), whereas
English allows a vowel flanked by up to three consonants, e.g. [stɹeɪŋθs] ‘strengths.’ When
speakers of one language borrow a word from another language, they must deal with sequences
of sounds in a foreign language that may or may not be permissible in their own language. A
robust case in point is shown above in the English word [kɹɪs(t)məs] ‘Christmas,’ borrowed into
Hawaiian as [kalikimaka]. Such robust changes have been made to the English word that many
(perhaps most) English speakers do not recognize that word as the English word “Christmas.”
The English [ɹ] sound has been changed to [l] and [s] has been changed to [k]. Additionally,
the sequence of sounds in the English word also underwent a change in the conversion to Hawaiian.
Consonants that are adjacent to one another or at the end of the English word are no longer so
in Hawaiian. The word-initial stop+liquid [kɹ] sequence in English is separated by a vowel in
the Hawaiian borrowing; and where the English word ends with a consonant [s], the Hawaiian
word ends with a vowel.
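The sequence-level differences just described can be made concrete with a short sketch. The Python snippet below is an illustration only and is not part of the method used in this dissertation: the vowel inventory, the function names, and the simplification of treating each IPA character as one segment are all assumptions. The two words are given in IPA as [kɹɪsməs] (omitting the optional [t]) and [kalikimaka].

```python
# Assumption: this small vowel set covers only the two example words.
VOWELS = set("aeiouɪə")


def consonant_clusters(word):
    """Return every maximal run of two or more adjacent consonants."""
    clusters = []
    run = ""
    for segment in word:
        if segment in VOWELS:
            # A vowel closes the current consonant run.
            if len(run) > 1:
                clusters.append(run)
            run = ""
        else:
            run += segment
    if len(run) > 1:
        clusters.append(run)
    return clusters


def ends_in_consonant(word):
    """True if the last segment of the word is not a vowel."""
    return word[-1] not in VOWELS


english = "kɹɪsməs"      # 'Christmas'
hawaiian = "kalikimaka"  # the Hawaiian borrowing

print(consonant_clusters(english))   # ['kɹ', 'sm']
print(consonant_clusters(hawaiian))  # []
print(ends_in_consonant(english), ends_in_consonant(hawaiian))  # True False
```

The English form contains two consonant clusters and a final consonant, whereas the Hawaiian form contains neither, which is the (C)V pattern described above.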
This dissertation investigates such phonological changes in Loanword Adaptation
(LWA), sound sequence (or syllable level) alternations that occur when one language borrows a
word from another language. Specifically, it examines the typology of LWA of sequences of
sounds (specifically, consonants) from both a cross-linguistic and within-language perspective,
addressing the type of sound sequence changes that occur in the world’s languages, the extent to
which such changes occur, and the patterns of occurrence.
The adaptation of loanwords is both an important and theoretically interesting topic of
investigation for a variety of reasons, potentially informing not only phonological theory, but
language and human cognition as well. This is because in some cases, LWA can be thought of
as a phenomenon where speakers must confront linguistically unfamiliar structures. Speakers
must therefore be both creative and discerning when borrowing a foreign word, navigating
different possibilities for handling the unfamiliar structure. The choice that is made and the
reasons why the choice was made potentially reveal properties of phonology and grammar
that would otherwise remain undiscovered in an inquiry limited to native phonological grammar,
i.e. the grammar of the borrowing language (Lb).
It is argued that the typological patterns of LWA revealed in this dissertation are
influenced by two factors: (1) the phonotactic properties of the borrowing language (Lb), and (2)
the communicative imperative to maximize the probability of comprehension of the word being
borrowed. These two factors, independently and combined, contribute to the description and
explanation of both the observed cross-linguistic patterns and within-language patterns.
This dissertation contributes to phonological theory and the study of LWA in a number of
ways. Firstly, it provides data on the cross-linguistic adaptation of Loanwords (LWs) from fifty-
three genetically and geographically diverse languages, revealing the commonality (and rarity) of
various strategies employed in LWA. Additionally, I provide a method for classifying languages
in terms of their LWA strategy preference; the classified languages are compiled into a typology of languages. The
trends in the typology indicate universal characteristics of how speakers adapt syllable structure
in LWs, shedding light on the nature of LWA.
I also compare the LWA preferences of individual languages against their native
phonotactic properties/tolerances. This comparison reveals the extent to which LWA patterns
mirror the syllable-level phonology of the native grammar of the Lb, showing that LWA
processes often do not conform to aspects of the Lb, a fact that has further significance for the
study of LWA.
Additionally, I provide data on the type/location of syllable structure involved in LWA.
That is, I examine consonants and consonant clusters as they occur in different syllable/word
positions. This study of the differences of how sequences of sounds are handled in onset, coda,
and word-medial position reveals key differences that suggest the primacy of perceptual cues in
the LWA process.
Another contribution is the comparison of cross-linguistic trends against within-language
trends. I provide an in-depth case study of loanword adaptation in Tongan. The similarity
between the cross-linguistic and within-language (Tongan) patterns further reveals aspects of the
characteristics and properties of LWA, in addition to showing the role of and extent to which
language-specific properties are involved in LWA.
This dissertation is organized as follows. In Chapter 2, I describe the corpus of data that
is the main focus of analysis in the dissertation, including the sources used, and how the data
were collected and organized. I also describe the methodologies for handling and analyzing the
data, as well as the methodology used to classify languages by their preference of LWA process.
I then examine the general trends observed in this cross-linguistic corpus, and show how a
statistical problem known as Simpson’s Paradox (Simpson 1951) necessitates a language-
focused typological analysis, i.e. a typology of language process preference. Such a typology is
then constructed and the patterns of typology are discussed. The classification of the languages
in the typology is compared against the phonotactics of native words in the languages. I argue
that the observations and analyses suggest the primacy of perceptual cues in LWA, an idea that is
further explored in subsequent chapters.
In Chapter 3, I lay out the hypothesis of the primacy of perceptual cues in LWA, which I
call “The Cue Hypothesis.” This hypothesis makes certain predictions as to how sequences of
sounds are adapted with respect to their position within the word. I lay out these predictions, and
then test them against the data in the cross-linguistic corpus. The results of these tests
demonstrate that perceptual cues do indeed play a crucial role in LWA.
In Chapter 4, I provide a case study of syllable structure in Tongan. Tongan, like
Hawaiian, has obligate (C)V syllable structure, thus making it an ideal language to study in
depth. I test the predictions of the Cue Hypothesis in Tongan, and further explore the primacy of
perceptual cues in LWA. In this chapter, I also show that understanding idiosyncratic properties
of specific LWs and language-specific markedness requirements is crucial for a complete
understanding of LWA in Tongan, and by extension LWA in general. Further, I compare the
patterns observed in Tongan against the cross-linguistic patterns, demonstrating that the primacy
of perceptual cues in LWA shows up on both micro- and macro-levels.
Chapter 5 provides a conclusion to the dissertation. In this chapter, I connect the data and
analysis to phonological theory, showing that formal theories of phonology are relevant and
applicable in LWA phonology, specifically that of the P-Map and work in this tradition (Steriade
1999, 2001/2009; Côté 2000). I also sketch a formal analysis of the LWA process in
Articulatory Phonology (Browman and Goldstein 1993) and Optimality Theory (Prince and
Smolensky 1993/2004). I additionally connect the observations and analyses in the dissertation
to theories of LWA, mainly the Perceptual Theory (Silverman 1992; Yip 1993, 2002;
Kenstowicz 2003, among others) and the Bilingual Theory (Paradis and LaCharité 1997,
LaCharité and Paradis 2002), showing that findings in this dissertation are mostly consistent with
the Perceptual Theory of LWA. I conclude the dissertation by suggesting avenues for future
investigation of LWA, considering the implications that such research potentially has for an
understanding of the nature of the human mind.
2 DATA, METHODOLOGY, AND TYPOLOGY. This chapter investigates the cross-linguistic
typology of LWA, providing a typology of process, and more importantly, a typology of
language process preference. The main goal of the study in this chapter is to investigate the
processes that speakers use in adapting loanwords (LWs), and to further discover how common
processes are with respect to one another, and why such commonalities exist.
In Section §2.1, I outline the data and methodology that serve as a basis for the typology,
including descriptions of how and where the data were collected, the languages used, and the
rationale for such methods. §2.2 provides the general results of the typology, that is, the typology
of process, independent of individual languages. This section shows how a statistical
problem/paradox known as Simpson’s Paradox (Simpson 1951) necessitates not only a typology
of process, but a typology of language process preference (stated more succinctly, a typology of
language). §2.3 provides this typology of languages, detailing the methodology used for
classifying languages, and then using this methodology to classify languages in the corpus. The
results of the classification are then given in §2.4, followed by a discussion of the typological
trends, which point to a mandate to retain information in the source word (SW) when adapting a
loanword (LW). Yet before any conclusions are made, the limitations and potential drawbacks
of the typological investigation in this chapter are discussed in §2.5. These limitations are
partially addressed in §2.6, where an analysis is presented comparing the classification of the
languages’ LWA behavior to the phonotactic properties of the languages’ native words. The
analysis in this section, in conjunction with the typological trends, suggests that perceptual cues
play a crucial role in the LWA process, which is the main finding and conclusion of this chapter,
outlined in §2.7.
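Simpson's Paradox can be illustrated with a small numerical sketch before turning to the corpus. The counts below are invented for illustration and do not come from the corpus; the language labels, the position categories, and the helper names are likewise assumptions. Within each hypothetical language the repair rate is higher word-initially than word-finally, yet pooling the two languages reverses the comparison, which is why grouped totals alone can mislead.

```python
from fractions import Fraction

# Hypothetical (repaired, total) counts per language and word position.
# These numbers are invented and are NOT taken from the corpus.
counts = {
    "Language A": {"initial": (8, 10), "final": (70, 100)},
    "Language B": {"initial": (20, 100), "final": (1, 10)},
}


def rate(repaired, total):
    """Exact proportion of observations that were repaired."""
    return Fraction(repaired, total)


def pooled_rate(position):
    """Repair rate for one position after summing counts over all languages."""
    repaired = sum(by_pos[position][0] for by_pos in counts.values())
    total = sum(by_pos[position][1] for by_pos in counts.values())
    return Fraction(repaired, total)


for language, by_pos in counts.items():
    initial = rate(*by_pos["initial"])
    final = rate(*by_pos["final"])
    # Within each language, the initial rate exceeds the final rate.
    print(language, float(initial), float(final), initial > final)

# Pooled over languages, the ordering flips: 28/110 versus 71/110.
print("pooled", pooled_rate("initial") < pooled_rate("final"))
```

The reversal arises because the two hypothetical languages contribute very different numbers of observations to each position, so the pooled rates are dominated by a different language in each case. Classifying each language separately avoids this.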
2.1 THE CROSS-LINGUISTIC CORPUS: DATA AND METHODOLOGY. This section describes
the data and methodology used to construct the cross-linguistic corpus, including how the data
were collected and selected, the sources used, and the languages included in the corpus. §2.1.3
lays out the possible responses (i.e. repairs, or lack of a repair) to a SW containing a
phonotactically marked sequence of sounds, and gives examples from the corpus of these
responses. The method of investigation and hypotheses are given in §2.1.5, followed by a
description of two normalization techniques used in the analyses.
2.1.1 DATA. The main typological analysis presented in this study is based on a corpus
of 8,528 loanwords that I compiled from fifty-three genetically and geographically
diverse languages. The primary criteria for the selection of languages were the availability of LW
data and the type of these data. Firstly, the languages had to display adaptation of consonant
clusters in loanwords; languages that adapt foreign words completely faithfully were not of
theoretical interest. Of interest here is what happens when a language adapts a foreign syllable
structure, and why that happens. Thus, languages with highly permissive syllable structure, such
as Russian and Arabic, were generally not chosen for analysis. It should be noted that the corpus
constructed did indeed contain many instances of faithful LWA. However, the languages that
were selected must have also shown some type of repair process. Consider data from Turkish,
where faithful LWA occurs (1c), along with epenthesis (1a) and deletion (1b), with all three
happening with the same types of sounds and in the same position.
(1) The variability of LWA: Turkish (Iz and Hony 1978).
a. Epenthesis: “skeleton”¹ [skɛlɛtən] → [iskelet]
b. Deletion/substitution: “schizophrenia” [skɪtzofɹɛnia] → [ʃizofreni]
c. Non-Adaptation: “sketch” [skɛtʃ] → [sketʃ]
An additional selection criterion of the languages in the corpus was to obtain a
geographically, typologically, and genetically diverse set of languages. Languages that had
ample LW data were not excluded from the corpus; however, languages were explicitly sought after
so that the corpus contained a wide spectrum of language families and parts of the globe. All six
continents are represented in the corpus. Seventeen different language families are represented
in the corpus, along with two creoles, and two isolate languages.
Geographical and genetic diversity was not sought after in and of itself. Rather, this
diversity was sought so as to maximize variation in the properties of the native phonology of the
languages. For example, European languages tend to have relatively permissive syllable
structure, and conversely, Austronesian languages tend to have relatively restrictive syllable
structure (Hawkins 1990; Clark 1990). A typology of only European languages or a typology of
only Austronesian languages would ostensibly look quite different, as it is reasonable to assume
that the phonotactic properties of a language will, at least to some extent, affect how speakers of
that language borrow words. Because of this, geographic and genetic diversity was sought after
so that any trends observed in the typology could be more strongly adduced as trends of LWA in
general, and not trends due to the properties of the languages in the corpus.
¹ English words are transcribed based on my intuitions regarding my American English dialect.
The following table lists the languages investigated by family. A more detailed list,
organized by language, that includes family, countries of origin, data source, and number of
datapoints, can be found in Appendix A.
Classification       Languages                                          Count
Afro-Asiatic         Gawwada, Tarifyt Berber, Hausa, Iraqw, Maltese     5
Altaic               Korean, Japanese                                   2
Isolate              Basque, Ket                                        2
Bantu                Shona, Swahili                                     2
Amerindian           Kali’na, Hup, Imbabura Quechua, Yaqui, Mapudungan  5
Caucasian            Archi, Bezhta, Georgian                            3
Creole               Saramaccan, Seychelles Creole                      2
Dravidian            Kannada, Malayalam, Tamil                          3
Indo-Aryan           Gujarati, Marathi                                  2
Indo-European        Armenian, Dutch, English, Romanian, Irish, Welsh,  9
                     Lower Sorbian, Macedonian, Selice Romani
Niger-Congo          Isindebele                                         1
Nilo-Saharan         Kanuri                                             1
Australian           Gurindji                                           1
Sino-Tibetan-Burman  Thai, Manange                                      2
Tungusic             Oroqen                                             1
Turkic               Azerbaijani, Turkish, Sakha                        3
Uralic               Finnish, Kildin Saami                              2
Austronesian         Cebuano, Javanese, Malagasy, Malay, Indonesian,    6
                     Hawaiian
Austro-Asiatic       Ceq Wong                                           1
TABLE 1. List of languages by family, according to the Ethnologue database (Lewis et al. 2013).
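The tallies in Table 1 can be checked mechanically. The sketch below transcribes the language lists from the table into a Python dictionary (the variable names are assumptions of this illustration, not part of the corpus tooling) and counts the languages overall, as well as the classification rows other than Creole and Isolate.

```python
# Language lists transcribed from Table 1; variable names are illustrative.
table_1 = {
    "Afro-Asiatic": ["Gawwada", "Tarifyt Berber", "Hausa", "Iraqw", "Maltese"],
    "Altaic": ["Korean", "Japanese"],
    "Isolate": ["Basque", "Ket"],
    "Bantu": ["Shona", "Swahili"],
    "Amerindian": ["Kali'na", "Hup", "Imbabura Quechua", "Yaqui", "Mapudungan"],
    "Caucasian": ["Archi", "Bezhta", "Georgian"],
    "Creole": ["Saramaccan", "Seychelles Creole"],
    "Dravidian": ["Kannada", "Malayalam", "Tamil"],
    "Indo-Aryan": ["Gujarati", "Marathi"],
    "Indo-European": ["Armenian", "Dutch", "English", "Romanian", "Irish",
                      "Welsh", "Lower Sorbian", "Macedonian", "Selice Romani"],
    "Niger-Congo": ["Isindebele"],
    "Nilo-Saharan": ["Kanuri"],
    "Australian": ["Gurindji"],
    "Sino-Tibetan-Burman": ["Thai", "Manange"],
    "Tungusic": ["Oroqen"],
    "Turkic": ["Azerbaijani", "Turkish", "Sakha"],
    "Uralic": ["Finnish", "Kildin Saami"],
    "Austronesian": ["Cebuano", "Javanese", "Malagasy", "Malay",
                     "Indonesian", "Hawaiian"],
    "Austro-Asiatic": ["Ceq Wong"],
}

# Total number of languages listed across all rows.
total_languages = sum(len(languages) for languages in table_1.values())

# Rows that name a language family (creoles and isolates are set aside).
families = [name for name in table_1 if name not in ("Creole", "Isolate")]

print(total_languages)  # 53
print(len(families))    # 17
```

Both figures agree with the prose: fifty-three languages, and seventeen families alongside the creoles and the isolates.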
2.1.2 SOURCES. A combination of three different types of sources was used to gather
data. The main source was the World Loanword Database, which provided the majority of the data
(Haspelmath and Tadmor 2009; http://wold.livingsources.org/). This is an online database of
loanwords that provides translations of the same words for forty-one different languages,
marking various properties of each word, most importantly, a word’s status as a loanword.
Thirty-three of the languages in the corpus come from this database. I manually went through
the entire vocabulary in this database for each language, marking the properties of interest,
discussed below.
Secondly, a collection of dictionaries, vocabularies, and articles was used. Five English-
to-Foreign Language dictionaries were used to collect data from Indonesian, Malay, Malayalam,
Turkish, and Welsh. Two methods were used to acquire data from these sources. Dictionaries
were scanned for words that were obviously borrowed from English, and properties of interest
were marked and recorded. Secondly, I generated a list of 240 words that are likely to be
loanwords in non-Indo-European languages. These words were judged “likely” to be borrowed from
English/European languages into other languages based on semantic/usage properties. For
example, Christian words such as “Christ” and “sacrament”; technical, scientific, and technology
words such as “chlorine,” “internet,” and “matrix”; Western proper names such as “Flemish” and
“Stockholm”; and Western cultural words such as “jazz” and “spaghetti” were all
used. This list can be found in Appendix B. I looked up each word individually, and if it was a
borrowing, recorded its properties. Three vocabulary books were used to collect data from
Japanese, Korean, and Hawaiian. These books were read page by page in alphabetical order
until 1) 120 words of interest were collected, or 2) the end of the vocabulary/book was reached.
Take for example Korean (Pae 1968). The first 120 loanwords containing a word-initial cluster
were collected and recorded. Word-final clusters are rarer, so the entire vocabulary was mined
for them, yielding fifty-six words containing word-final clusters. Data from Shona was collected by
mining the entire vocabulary provided in Uffmann (2004), and similarly for Isindebele in
Mahlangu (2007).
The third source of loanword data came from Google Translate
(http://translate.google.com). This is an online translation service provided by Google, in which
words and/or phrases are input in one language, and automatically translated into another
language. I entered the same list of 240 words above into Google Translate, and the translations
that were obviously borrowings were recorded for properties of interest. This method was used
exclusively for eleven languages (Armenian, Azerbaijani, Cebuano, Finnish, Georgian, Gujarati,
Irish, Javanese, Kannada, Maltese, and Thai). Google Translate data from three other languages
(Romanian, Turkish, and Welsh) was used to supplement the dictionary scans (Turkish, Welsh)
and the vocabulary from the World Loanword Database (Romanian). A detailed list of the
observation counts for each language in the cross-linguistic corpus can be found in Appendix C.
2.1.3 DATA PROPERTIES. Four different types of syllable structure were recorded, serving
as independent variables of interest:
(2) Independent variables: English word: “crystals”
a. Word-initial consonant clusters: [kɹɪstəlz] (the cluster [kɹ])
b. Word-final consonant clusters: [kɹɪstəlz] (the cluster [lz])
c. Word-final codas: [kɹɪstəl] (the coda [l])
d. Word-medial CC sequences: [kɹɪstəlz] (the sequence [st])
2b and 2c both deal with sounds at the ends of words, but they are different in that 2c deals with
singleton coda consonants, and 2b deals with consonant clusters (word finally, and in a coda).
Both of these types of sequences may be repaired in LWA, as they diverge from the
phonologically and typologically preferred CV syllable (Blevins 1995; Greenberg 1978). The
description in 2d, that is “word-medial CC sequences,” is intentionally ambiguous. It is
described as such here because consonant-consonant sequences in the middle of a word could be
two different things, a word-medial syllable-initial CC cluster, or a word-medial coda followed
by an onset consonant. That is, two different syllabifications are possible: #V.CCV# and
#VC.CV#. Take for example Macedonian’s borrowing of the word “nuclear”: [nuklearna]. The
/kl/ sequence could be syllabified as [nu.kle.ar.na] or [nuk.le.ar.na]. As it was not immediately
apparent which syllabification is appropriate, word-medial CC sequences were categorized
uniformly as the same unit, and are henceforth referred to as “word medial consonant clusters.”
Whether or not they are actually consonant clusters or coda+onset sequences is not of
importance for the analyses, as will become clear in subsequent chapters.
The main dependent variable of interest was the type of process that a language used to
adapt syllable structure in the source word. For example, coda consonants are marked, often
repaired in native grammar (Jacobson 1968; Clements and Keyser 1983), typologically less
common than onsets (Greenberg 1978), and acquired relatively late by children (Ingram 1999).
Foreign words containing coda consonants can be handled in a number of ways. A
schematic is given below, with a hypothetical source word.
(3) Coda Adaptation
    Possible Processes    Source Word    Loan Word
    a) Epenthesis         tasap          tasapa
    b) Deletion           tasap          tasa
    c) Alternation        tasap          tasam
    d) Metathesis         tasap          taspa
    e) Non-Adaptation     tasap          tasap
A language may insert a vowel to change a coda consonant into an onset consonant (3a). A
language may avoid the coda consonant by not pronouncing it, as in 3b. If a language allows
only coda consonants of a certain type, such as Mandarin Chinese, where only nasals are allowed in
coda position (Li & Thompson 1981), then it may substitute the sound, as in 3c. Also, a
language may switch the temporal order of segments, changing the source word from a
prohibited syllable structure to one that is permitted, i.e. metathesis, shown in 3d. Perhaps most
surprisingly, a language may do nothing at all, and retain the coda consonant faithfully, shown in
3e. Indeed, all of these processes are observed in loanword data. Actual examples of coda
adaptation are given below, but first, the possible data range for consonant clusters is described.
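The schematic in (3) can also be read as a small decision procedure. The Python sketch below is illustrative only: the function name `classify_coda_adaptation`, the five-vowel inventory, and the string-comparison rules are assumptions made for this example, not the coding procedure used for the corpus, and it is only meant to handle toy forms like the hypothetical words in (3).

```python
# Toy classifier for the coda-adaptation schematic in (3).
# Assumptions: forms are plain strings with one character per segment,
# and the vowel inventory is the hypothetical set below.
VOWELS = set("aeiou")


def classify_coda_adaptation(source: str, loan: str) -> str:
    """Label how a word-final coda in `source` was treated in `loan`."""
    if loan == source:
        return "non-adaptation"
    # Epenthesis: the loan is the source plus added final vowel(s).
    added = loan[len(source):]
    if loan.startswith(source) and added and all(ch in VOWELS for ch in added):
        return "epenthesis"
    # Deletion: the loan is the source minus its final segment(s).
    if source.startswith(loan) and len(loan) < len(source):
        return "deletion"
    if len(loan) == len(source):
        # Alternation: only the final (coda) segment differs.
        if loan[:-1] == source[:-1]:
            return "alternation"
        # Metathesis: same segments, different order.
        if sorted(loan) == sorted(source):
            return "metathesis"
    return "unclassified"


# The five hypothetical loan forms from (3), all from the source "tasap".
for loan in ["tasapa", "tasa", "tasam", "taspa", "tasap"]:
    print(loan, classify_coda_adaptation("tasap", loan))
```

Real loanword data would need segment-level transcriptions and manual judgment; the sketch only makes the five categories in (3) concrete.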
Consonant clusters are subject to the same repair strategies, with the addition of one more:
coalescence. A sequence of two distinct consonants in a cluster may become one, sharing
properties of both. This, along with the other possibilities, is shown in 4. It should be noted
here that the positions “initial,” “medial,” and “final” refer to the location of the consonant
cluster in the SW, and not (necessarily) to the location of the repair.
(4) Consonant Cluster adaptation
    Process                         Source Word   Loan Word
a)  i.   Epenthesis: Initial        traka         təraka, ətraka
    ii.  Epenthesis: Medial         tanpo         tanəpo
    iii. Epenthesis: Final          talt          talət, talətə
b)  i.   Deletion: Initial          traka         taka, raka
    ii.  Deletion: Medial           tarka         taka, tara
    iii. Deletion: Final            talt          tal, tat, ta
c)  i.   Substitution: Initial      traka         twaka
    ii.  Substitution: Medial       tanpo         tampo
    iii. Substitution: Final        talt          tant
d)  i.   Non-Adapt: Initial         traka         traka
    ii.  Non-Adapt: Medial          tanpo         tanpo
    iii. Non-Adapt: Final           talt          talt
e)  i.   Metathesis: Initial        traka         tarka
    ii.  Metathesis: Medial         tarka         takra
    iii. Metathesis: Final-VC       tapalt        taplat
f)  i.   Coalescence: Initial       mpe           be
    ii.  Coalescence: Medial        tanpo         tabo
    iii. Coalescence: Final         talk          tag
There are some things to take note of in this illustration. For initial epenthesis, 4a.i, an
epenthetic vowel may be inserted in between the two consonants, or before the consonant cluster.
Although both happen, inserting the vowel in between the two consonants is by far the more
common. Both phenomena were recorded as the same process because
the goal of this study was to investigate what types of processes occur, such as epenthesis,
deletion, etc. For example in 4a.i, what matters is the fact that epenthesis was the chosen repair.
Repair was the focus of the typological analysis, not intra-positional specifics of the repair.
For final epenthesis (4a.iii), two possibilities exist and were indeed observed: “single”
epenthesis breaking up the cluster, resulting in a coda consonant, and “double” epenthesis,
resulting in a CəCə sequence. The latter was by far the more common, as languages that avoid
consonant clusters (epenthesizing into them) also tend to avoid coda consonants. Both
alternatives were recorded as the same phenomenon, simply as a single case of epenthesis; repair
was the focus of the typological analysis, as something categorical that did or did not happen,
rather than the extent to which or the location in which it happened.
For final deletion, shown in 4b.iii, a language may delete one of the consonants in a
cluster, or both. This was recorded as the same phenomenon in the cross-linguistic data,
although cases of “double deletion” were noted. Such cases were quite rare: they occurred
mainly in Javanese and Hawaiian, and very sporadically throughout the remainder of the
corpus. There are also two examples of this “double deletion” in Tongan, which are discussed in
Chapter 4.
Substitution is demonstrated in 4c. Initial substitution (4c.i) was quite rare, occurring
mostly in cases where an r+C sequence in Arabic changed to aC in the borrowing language, most
commonly Berber. Initial substitution likewise occurred in one token in Tongan: “drill” →
[tuila] (Churchward 1959). Medial and final substitutions (4c.ii, 4c.iii) were almost entirely
cases of nasal place assimilation in an N+C sequence, and/or a voiceless stop becoming voiced in
an N+C sequence.
For initial and final metathesis (4e.i, 4e.iii), it should be noted that what is being
metathesized is not the two consonants involved in the cluster, but one of the consonants and an
adjacent vowel. The consonants in question are in the same order, but are no longer adjacent; the
vowel that preceded the consonant cluster in the source word is in between the two consonants in
the loanword. A second type of final metathesis could happen, for example [tapast] → [tapats].
The first type, called “Final-VC” metathesis in Table 6, was almost always the case when
segments in a loanword switched temporal order. This is not surprising if the purpose of
metathesis is to repair a consonant cluster, which the “Final-CC” type (shown in Table 6) fails to
do.² Medial metathesis (4e.ii) ostensibly occurred when a language preferred a V.CCV
sequence to a VC.CV sequence, or vice versa.
Coalescence (4f) was exceedingly rare, occurring only in four words in the corpus, in
medial consonant clusters in Ceq Wong, for example, Malay [tanjuŋ] → Ceq Wong [taɲuŋ]
‘valley’ (Kruspe 2009). Other cases were discovered, but could have been due to orthographic
and/or transcriptional conventions. These cases were not recorded; as with all data, if I was not
completely sure about the phonological entities and processes involved, the item was not recorded.
However, the Malay to Ceq Wong case does appear to be a true case of coalescence: according to
Clynes and Deterding (2011), Malay lacks a palatal nasal, but has phonemic /n/ and /j/. For (4f.i) and
(4f.iii), a repair in which a voiced nasal and a voiceless obstruent become a voiced obstruent
could happen. There were cases that looked like this, but I was unable to determine if they were
true cases of coalescence, or cases of deletion plus an independent segmental change. These
cases were not recorded, by the same criterion of certainty expressed above.
² The same is true for word-initial consonant clusters. All of these cases involved metathesis of a consonant and a
vowel, rather than the two consonants in the cluster. Because metathesis occurred very infrequently (as will be
shown below), encoding this distinction in the analysis would be trivial.
Above it was mentioned that 3e, that is, non-adaptation, may be surprising. Prima facie,
one would expect that if a language allows coda consonants in its native phonology, then it
would retain foreign coda consonants faithfully. Likewise, if a language prohibits coda
consonants (and/or clusters) in its native phonology, then it would necessarily repair a coda
consonant (or consonant cluster). However, this is not the case. In fact, a robust property of
LWA is that languages repair foreign syllable structure even when they do not have to.
That is, a language may allow consonant clusters or coda consonants in its native phonology, but
nonetheless repair identical sequences and structures when borrowing words from other
languages. Take for example Maltese, which allows codas, including [m], in the native
phonology: [kuddiem] ‘before.’ Maltese faithfully retains coda consonants, for example
“helium” → [heljum], but not all of the time: “atom” → [atumu]. This optional adaptation is
cross-linguistically common.³ Although non-adaptation is not a repair of a LW, it is included in the
analysis along with the repairs, as it is a way in which languages handle phonological structure in SWs.
This non-adaptation (or faithful adaptation) phenomenon can be thought of conceptually as
“retention,” in that a LW retains the structure in the SW. Such phenomena will be called
“retention” throughout, as an analogue to “epenthesis,” “deletion,” etc.⁴
³ This may be thought of as a type of The Emergence of The Unmarked effect (McCarthy and Prince 1994). This
phenomenon of ostensibly gratuitous epenthesis is an important aspect in understanding LWA of sound sequences;
it is returned to more fully later in this chapter (§2.5), and in Chapter 3.
⁴ Retention, as used here and throughout, refers exclusively to loanwords in which a phonotactically marked sound
sequence in the SW is retained in the LW, independently of the phonotactic properties of native words in the
borrowing language. For example, consider the word-initial consonant cluster in the English word “plastic”
borrowed into Dutch as [plœstɪk] (van der Sijs 2009), and into Swahili as [plastiki] (Schadeberg 2009). Both were
counted as observations of Retention, despite the fact that Dutch allows word-initial consonant clusters (e.g.
[pl{xən] ‘to plough’), but native Swahili words do not allow consonant clusters (Mwita 2009).
All of the possible processes are indeed observed in the LW data, and in almost every
position. Examples from languages in the corpus are given below for codas in Table 2.
Consonant cluster adaptation examples are given in Tables 3-7. Coalescence is not
included because of its rarity, and because it was discussed in the Ceq Wong example above.
Process          Contact                  Adaptation                Gloss
Epenthesis       English → Tongan         [dəpasɪt] → [tipositi]    ‘deposit’
Deletion         Jaminjun → Gurindji      [wamulaŋ] → [wamula]      ‘young woman’
Alternation      English → Javanese       [tɹʊmpɪt] → [slompet]     ‘trumpet’
Metathesis       Russian → Kildin Saami   [voditʲ] → [vodte]        ‘to lead’
Non-Adaptation   Avar → Bezhta            [majdan] → [majdan]       ‘the valley’
TABLE 2. Coda adaptations
Position   Pattern   Contact             Adaptation               Gloss
Initial    #CeC      English → Swahili   [bɹʊʃ] → [buruʃi]        ‘brush’
Initial    #eCC      English → Turkish   [skɛlɛtən] → [iskelet]   ‘skeleton’
Medial     CeC       Malay → Ceq Wong    [askar] → [ʔasəkar]      ‘soldier’
Final      CeC#      Arabic → Kanuri     [baħr] → [baħar]         ‘the sea’
Final      CeCe#     English → Hausa     [silk] → [siliki]        ‘silk’
TABLE 3. Epenthesis in consonant clusters
Position   Target   Contact                       Adaptation                 Gloss
Initial    #CC      Arabic → T. Berber            [ħrəq] → [ħaq]             ‘stream’
Initial    #CC      Swahili → Iraqw               [mfereji] → [fere:ji]      ‘the ditch’
Medial     VCCV     Arabic → Kanuri               [ʃamʕa] → [ʃame]           ‘candle’
Medial     VCCV     Georgian → Bezhta             [k"ombali] → [k"obala]     ‘axle’
Final      CC#      English → Maltese             [krusəfɪks] → [kursifis]   ‘crucifix’
Final      CC#      Old Norse → Middle English    [samr] → [sem]             ‘same’
TABLE 4. Deletion in consonant clusters
Position   Target   Contact                  Adaptation                   Gloss
Initial    #CC      Arabic → T. Berber       [rbib] → [abib]              ‘stepson’
Initial    #CC      English → Maltese        [flɛmɪʃ] → [fjamiŋ]          ‘Flemish’
Medial     VCCV     Spanish → Im. Quechua    [suegros] → [swidrus]        ‘parents-in-law’
Medial     VCCV     Arabic → Kanuri          [jinzir] → [jinjir]          ‘chain’
Final      CC#      English → Azerbaijani    [ʊltrəsaʊnd] → [ultrases]    ‘ultrasound’
TABLE 5. Alternation/Substitution in consonant clusters
Position   Contact                                 Adaptation          Gloss
Initial    English → Kanuri                        [bɹɪk] → [bərki]    ‘brick’
Medial     Portuguese → Hup                        [eskada] → [sikada] ‘the ladder’
Final-CC   Anglo-Norman French → Middle English    [bodnə] → [bond]    ‘boundary’
Final-VC   Moroccan Arabic → Tarifiyt Berber       [ʃerk] → [aʃraq]    ‘to tan’
                                                   [ʒəhd] → [ʒhəd]     ‘strong’
TABLE 6. Metathesis in consonant clusters
Position   Contact                  Adaptation              Gloss
Initial    Russian → Kildin Saami   [kniga] → [kniga]       ‘the book’
Medial     Spanish → Cebuano        [pulpito] → [pulpito]   ‘pulpit’
Final      Ngaliwurru → Gurindji    [bujarl] → [pujarl]     ‘lazy’
TABLE 7. Retention in consonant clusters
2.1.4 DATA SPECIFICS. Other specifics regarding the data are described as follows.
Consonant clusters consisting of three or more consonants, e.g. CCCV sequences, were
not considered. This was mainly due to reasons of simplicity. Such sequences commonly show
a combination of epenthesis and deletion in LWA, where one consonant is deleted, but the other
two are retained but separated by an epenthetic vowel. As the goal of this study is to investigate
the commonality of various processes in LWA, categorizing processes as either epenthesis or
deletion, and not some hybrid category, more parsimoniously addresses this goal.
Borrowings from English, where a coda, word medial, or word final consonant cluster
contained /r/ were not included in the data. This was done for reasons of precision. Take for
example the word “Mars” borrowed into Tongan as [masi] (Churchward 1959). At first glance
this may appear to be a case of deletion of a final consonant cluster. However, the variety of
English that Tongan borrowed from was a coda-/r/-less variety, such as dialects spoken outside
of North America. In fact, orthographic “r” often goes unpronounced in loanword data, another
example coming from Korean: “bar girl” → [pakkol] (Pae 1969). Because of the abundance of
English borrowings in other languages and the volume of loanwords handled, it was not known
whether such cases were true cases of deletion, or mere apparent cases of deletion. Thus, such
data was not included. However, when “r” occurred in word initial consonant clusters, like
“bromine,” there was no such ambiguity, and this data was included.
Homorganic nasal+stop sequences were treated slightly differently than other types of
consonant clusters. Due to the properties of the consonants involved, NC sequences tend to be
treated differently than other consonant clusters in LW phonology. Malagasy is exemplary of
this: word-medial consonant clusters are treated by non-adaptation (5a), deletion (5b), and
epenthesis (5c). However, all word-medial NC sequences are adapted faithfully, as shown in 5d.
(5) Word-medial consonant clusters in Malagasy (Adelaar 2009)
a. Non-Adaptation: [plastik] → [plastika] ‘plastic’ From French
b. Deletion: [latsa] → [lasa] ‘to leave’ From Bantu
c. Epenthesis: [ʃaʁbõ] → [saribao] ‘the charcoal’ From French
d. Non-Adapt: [lambu] → [lambo] ‘the boar’ From Banjarese
For Malagasy, 5a was counted as a case of non-adaptation, but 5d was not counted as
anything, i.e. it was excluded from the corpus. This was also the case for a number of other
languages in the corpus that showed a similar pattern of tolerance of NC clusters only.
However, NC sequences were counted for a language when there was at least one example of
that language repairing them. Ceq Wong provides an illustrative case of this.
(6) Word-medial NC clusters in Ceq Wong (Kruspe 2009)
a. NC Retention: [bintaŋ] → [bintaŋ] ‘the star’ From Malay
b. NC Deletion: [tanduk] → [tanok] ‘the horn’ From Malay
For Ceq Wong and like languages, the NC cluster such as in 6a was counted as a case of
retention. The rationale behind this was that counting NC sequences in the Malagasy-type
languages would skew the data and obscure the true pattern. What I wanted to know was
whether a language was, for example, an epenthesizing or a deleting language. Thus, including
NC sequences in the count along with stop+l or stop+stop sequences, when a language repaired
the latter but retained the former, would make it more difficult to detect the repair strategies of
that language’s loanword phonology, which was the focus of the analysis.
2.1.5 HYPOTHESES AND DATA ANALYSIS. The goal of the first part of the study is to
determine the typology of loanword adaptation, that is, what types of languages exist when
classified by how they adapt consonant clusters and codas in foreign words. Here, “type” refers to what
type of process a language uses/prefers to adapt syllable structure in a foreign word, e.g. whether
a language is an epenthesizing language, a deleting language, etc. In order to classify languages
as such, the null hypothesis was assumed throughout. That is, it was assumed that there was no
bias for one process or another. Recall that the languages were selected necessarily because they
showed variation in how they adapt foreign words. The distribution of repair processes was
therefore assumed to be random and evenly distributed. That is, given 120 observations of a
language borrowing a word with a consonant cluster (and excluding phoneme substitution and
metathesis), we expect about 40 (1/3) of the observations to show epenthesis, 40 to show
deletion, and 40 to show non-adaptation. In other words, when a speaker adapts a foreign
consonant cluster (or coda), the speaker freely and randomly chooses whether and how to adapt
it. If a language does indeed have a preferred way to treat foreign consonant clusters, the null
hypothesis will be rejected. Only then may we classify a language as one type or another.
In order to test the null hypothesis, Pearson’s Chi-Square Goodness-of-Fit test was
employed (Pearson 1900; National Institute of Standards and Technology 2012). This statistical
test was used for various reasons. First of all, Pearson’s Chi-Square (χ²) test compares the
distribution of data against an expected distribution. For this study, the expected distribution was
an even distribution: for any set of data, the expected value for each category was the mean of all
the values in that set. If the distribution of processes is not even/random, then the null hypothesis is
rejected, demonstrating a bias towards one (or two) processes; Pearson’s Chi-Square test
provides a clear indication of whether or not the null hypothesis is to be rejected. Secondly,
Pearson’s Chi-Square test is designed for categorical data. The data in this study was indeed
categorical, involving process counts and language counts. Thirdly, the data in this study was
treated as observations rather than behavior. Standard ANOVA tests and other tests using the
general linear model work well for continuous data, such as the effect of some variable upon
linguistic behavior. The Pearson’s Chi-Square test compares expected outcomes versus
observed outcomes, which is ideal for the main questions addressed in this study, that is, whether
the patterns in the observations are random, or if they are statistically meaningful.
The p-value and the Chi-Square statistic are reported. The level of significance is
assumed to be p < 0.05. The Chi-Square (χ²) statistic is calculated by squaring the difference
between the observed and expected values for each data point, dividing each of these values by
the corresponding expected value, and adding these quotients together. It is usually a measure of
the robustness of the effect: the larger the χ² value, the stronger the effect.⁵ All inferential and
descriptive statistics were computed using MATLAB, except where noted.
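The calculation just described can be sketched in a few lines of Python. This is a minimal illustration rather than the MATLAB analysis used for the dissertation: the function names and the counts are hypothetical, and the p-value helper assumes exactly three categories (2 degrees of freedom), for which the upper-tail probability of the chi-square distribution has the closed form exp(-χ²/2).

```python
import math


def chi_square_gof(observed):
    """Pearson chi-square statistic against an even expected distribution."""
    expected = sum(observed) / len(observed)  # mean of the observed counts
    return sum((o - expected) ** 2 / expected for o in observed)


def p_value_df2(chi2):
    """Upper-tail p-value for a chi-square statistic with 2 degrees of freedom.

    For df = 2 the survival function has the closed form exp(-chi2 / 2).
    """
    return math.exp(-chi2 / 2)


# Hypothetical counts (not corpus data): epenthesis, deletion, retention.
# With 120 observations, the null hypothesis expects 40 in each category.
observed = [70, 20, 30]
chi2 = chi_square_gof(observed)
p = p_value_df2(chi2)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, reject null: {p < 0.05}")
```

With these made-up counts the statistic is 35, far beyond the p < 0.05 threshold, so such a language would be classified as showing a process preference.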
2.1.6 CORPUS NORMALIZATION. For certain analyses done in this dissertation,
normalization of the cross-linguistic corpus was necessary, so that a language with a large
number of datapoints would not skew the results in one direction or another. For this chapter,
normalization was necessary for the typology of process investigation.⁶ The number of
datapoints per language in the corpus varied widely. This was due to two factors. First, different
sources and methods of collection were used, as described above. Secondly, individual
languages vary widely as to how many loanwords they have in their vocabulary: the lexicon of
Mandarin Chinese (Wiebusch 2009) is composed of approximately 2% borrowings, whereas that
of Selice Romani contains up to 63% (Elšík 2009). Of the fifty-three languages in the corpus, the
language with the lowest number of datapoints was Hup (28), and the highest was Indonesian
(627). The mean across all languages was 160.906, with a median of 138, a standard deviation
of 109.698 and mean absolute deviation of 75.958. Because of this wide variation, normalization
was required to obtain more accurate and meaningful results. Two different normalization
methods were used.
⁵ However, the size of the χ² value is based on the size of the values in the data set, so “larger” is necessarily
relative.
⁶ Normalization was also needed for analyses in Chapter 4; to avoid redundancy, the normalization technique used
throughout is described here.
Before the two methods of normalization are explained, it should again be noted that the
rationale for normalization was to weigh all languages as the same. That is, the purpose of
normalization was to manipulate the magnitude of the data so that the general, cross-corpus
results would be reflective of the true rate of process occurrence. Thus, the total number of data
points in each language was what was normalized; the values for the individual processes, i.e.
retention, epenthesis, deletion, and “other,”⁷ were then scaled/adjusted against the normalized
total. Consider data from Malayalam and Swahili in Table 8.
Language     Epenthesis   Deletion   Retention   Other   Total
Malayalam    5            4          21          1       31
Swahili      190          2          27          1       220
Total        195          6          48          2       251
TABLE 8. Data example from Swahili and Malayalam
Simply examining the raw totals (and ignoring Retention and Other for expository
simplification) one concludes that epenthesis is by far the most common process, and that
deletion is very rare; epenthesis happens at a ratio of 195:6, or 32.5:1. However, this conclusion
is not necessarily accurate due to the fact that Swahili has a strong bias towards epenthesis, and
a large number of datapoints. More importantly, it obscures the fact that epenthesis and deletion
happen at roughly the same rate in Malayalam. If Malayalam had the same number of
datapoints as Swahili, i.e. 220, then the totals would be different. Table 9 adjusts the Malayalam
data so that the proportion of epenthesis, deletion, retention, and “other,” are equivalent (rounded
to the nearest whole number), and the total number of datapoints equals that of Swahili. The
adjustment is done by simple algebra.
⁷ Metathesis, substitution, and coalescence are lumped together as “other” because their occurrence is so rare,
which will be shown in §2.3.
(7) Equation used for adjusting the data for normalization
$$PL_{adj} = \frac{PL_{obs}}{TotL_{obs}} \times TotL_{norm}$$
Where $PL_{obs}$ = number of observations of process P in language L,
$TotL_{obs}$ = total datapoints (observations) for language L,
$TotL_{norm}$ = the normalized number of total observations in language L,
$PL_{adj}$ = adjusted value for process P in language L, calculated from the
normalized total observations for L
Language     Epenthesis   Deletion   Retention   Other   Total
Malayalam    35           28         149         7       220
Swahili      190          2          27          1       220
Total        225          30         176         8       440
TABLE 9. Swahili and Adjusted Malayalam Data
Again, we observe that epenthesis is more common than deletion; however, the magnitude
is much smaller. The epenthesis to deletion ratio is 225:30, or 7.5:1, which is a more accurate
figure regarding the commonality of epenthesis and deletion. The two corpus normalization
techniques used for certain analyses in this study are analogous to the adjustment done here,
except that for the mini-example, Malayalam was adjusted based on Swahili’s total. That is,
$TotL_{norm}$ in this example was 220.
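The adjustment in this mini-example can be reproduced with a short Python sketch. The function name `adjust_counts` is hypothetical, and the scaling rule (each observed count divided by the language total, then multiplied by the normalized total) is inferred from the definitions in (7) and the values in Table 9 rather than taken from the original analysis scripts.

```python
def adjust_counts(counts, total_norm):
    """Scale each process count so the language total equals total_norm."""
    total_obs = sum(counts.values())
    return {process: n / total_obs * total_norm for process, n in counts.items()}


# Raw counts from Table 8.
malayalam = {"epenthesis": 5, "deletion": 4, "retention": 21, "other": 1}
swahili = {"epenthesis": 190, "deletion": 2, "retention": 27, "other": 1}

# Adjust Malayalam to Swahili's total (220), rounding to whole numbers as in Table 9.
malayalam_adj = {p: round(v) for p, v in adjust_counts(malayalam, 220).items()}
print(malayalam_adj)

# Pooled epenthesis-to-deletion ratio after adjustment.
epenthesis = malayalam_adj["epenthesis"] + swahili["epenthesis"]
deletion = malayalam_adj["deletion"] + swahili["deletion"]
print(f"epenthesis:deletion = {epenthesis}:{deletion} ({epenthesis / deletion}:1)")
```

Because each value is rounded separately, the adjusted Malayalam counts sum to 219 rather than exactly 220; Table 9 reports the normalized total of 220.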
Another way to think of this method, which is perhaps descriptively more accurate, is as a
process of scaling and then normalization. The total number of datapoints in each language is
adjusted to a scale that weighs each language equally, and then the value for each process in each
language is normalized according to that scale. Either way, what is important is that each
language’s data was weighed the same, no matter how many loanwords a language contributes to
the corpus. Likewise, tendencies cannot be “washed out” by languages that have a large number
of datapoints and a strong bias towards a certain process.
The two normalization processes, then, simply differ in how $TotL_{norm}$ is calculated. The
first method is called Direct Division Normalization. This method takes the mean across all
languages (160.9), and normalizes individual languages against that mean. Simply, the
percentage of observations $PL_{obs}$ of process P in an individual language L is normalized against
the mean of all languages. $TotL_{norm}$ in the equation above is the same for each language: 160.9.
(8) Equation for Direct Division Normalization
$$PL_{adj} = \frac{PL_{obs}}{TotL_{obs}} \times 160.9$$
This allows for direct comparison across languages. Again, Swahili and Malayalam
serve as examples. In a comparison across all languages, these two languages contribute a total
of 195 observations of epenthesis and 6 observations of deletion, skewing the total data in favor
of epenthesis, and obscuring the fact that Malayalam epenthesizes and deletes at about an equal
rate. Normalized with the Direct Division method, Malayalam contributes 25.95 datapoints for
epenthesis, and 20.76 for deletion. For Swahili, 138.96 cases of epenthesis and 1.46 cases of
deletion are added to cross-language totals. Epenthesis is still the majority repair, but deletion is
more proportionately represented.
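The Direct Division figures quoted above can be checked with the following Python sketch. The function name `direct_division` is hypothetical; the constant 160.9 is the corpus mean reported in the text, and the counts are those in Table 8.

```python
CORPUS_MEAN = 160.9  # mean datapoints per language, as reported in the text


def direct_division(counts, mean_total=CORPUS_MEAN):
    """Scale a language's process counts so its total equals the corpus mean."""
    total_obs = sum(counts.values())
    return {process: n / total_obs * mean_total for process, n in counts.items()}


# Raw counts from Table 8.
languages = {
    "Malayalam": {"epenthesis": 5, "deletion": 4, "retention": 21, "other": 1},
    "Swahili": {"epenthesis": 190, "deletion": 2, "retention": 27, "other": 1},
}

for name, counts in languages.items():
    adj = direct_division(counts)
    print(name, round(adj["epenthesis"], 2), round(adj["deletion"], 2))
```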
The second method used for normalizing the data is a type of T-Score Normalization
(Mortensen and Gade 1993). In some ways this is a more informative technique for normalizing
the data in this study, as it yields trends that are easier to observe, fitting values on a scale of 1-100,
with a mean of 50; patterns in the data can be directly conceptualized as percentages. The type
of T-Score scaling done here adjusts the results of Z-Score scaling, which is a type of
normalization itself. However, Z-Score scaling is not useful for analysis of the current data set,
as it results in small and negative values, obscuring any directional patterns. Thus, the Z-Scores
were transformed to the more transparent T-Scores values.
This was done as follows. For the total datapoints/observations in each language, a Z-Score
was calculated. The Z-Score of a single value V in a set of data S is V minus the mean of
S, then divided by the standard deviation of S. To get the T-Score, this value is then multiplied
by 10, and 50 is added to it. This is shown in the equations below.
(9) Equation for calculating Z-Score
$$V_{z} = \frac{V - S_{mean}}{S_{std}}$$
(10) Equation for calculating T-Score
$$V_{t} = 10 \times V_{z} + 50$$
Here, V is the total number of observations/datapoints in a language, for example 220 for
Swahili; S is the set of all 53 language totals, whose mean ($S_{mean}$) is 160.9 and standard deviation
($S_{std}$) is 109.7. These values were used to calculate a Z-Score for each language, $V_z$. This value
$V_z$ is then put through a second equation: it is multiplied by 10 and 50 is added, yielding the
final T-Score value, $V_t$. This value was specific/unique to each language, and the observations
of the various processes within each language were adjusted according to this value, per the
algebraic equation above. With this method, $TotL_{norm}$ in (7) is $V_t$ here (in (10)). As with the
Direct Division method, this method of T-Score Scaling weighs the observations across
languages equally, regardless of how many datapoints are in each language. Again using
Malayalam and Swahili, the epenthesis to deletion ratios go from 5:4 and 150:2 to 6.15:4.92 and
47.83:0.5, respectively. Swahili’s bias for epenthesis still shows up, but the fact that Swahili has more data
has been neutralized.
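The two-step transformation can be illustrated with a short Python sketch. It uses only the figures stated in the text (a Swahili total of 220, a corpus mean of 160.9, and a standard deviation of 109.7); the function names `z_score` and `t_score` are hypothetical and are introduced here purely for illustration.

```python
def z_score(value, mean, std):
    """Distance of a language total from the corpus mean, in standard deviations."""
    return (value - mean) / std


def t_score(value, mean, std):
    """Rescale the Z-Score so that the corpus mean maps to 50."""
    return 10 * z_score(value, mean, std) + 50


# Figures quoted in the text: Swahili total, corpus mean, corpus standard deviation.
swahili_z = z_score(220, 160.9, 109.7)  # about 0.539
swahili_t = t_score(220, 160.9, 109.7)  # about 55.39

print(f"Z-Score: {swahili_z:.3f}, T-Score: {swahili_t:.2f}")
```

A language whose total equals the corpus mean receives a T-Score of exactly 50, which is what makes values easy to compare across languages.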
Both the Direct Division method and the T-Score scaling method have the effect of
giving confidence to any observed process bias. That is, if a bias for epenthesis and against
deletion is observed, we may more confidently claim that this is a true bias in favor of
epenthesis, rather than an effect of there simply being more epenthesis observations in languages
that have a large number of total datapoints. These two methods were both used because each has
its own advantage. With the T-Score scaling method, the advantage is that relative
proportionality between patterns is more transparent, in that the values produced by T-Score
Scaling fall between 0 and 100, with the mean value for each language being 50. The Direct
Division method has an advantage in that the values are closer to the actual observation values,
allowing for a better estimate of the relative magnitude of differences between patterns. Both are
relevant for the typology of process, discussed below.
2.2 GENERAL FINDINGS: TYPOLOGY OF PROCESS. Recall from above that one of the goals
of this chapter – and of the dissertation, in general – is to develop and investigate a typology of
LWA processes, and to explain any biases towards one process or another that are found. This
section does this, providing a typology of the processes observed in LWs, independently of
language. These data are given and discussed below.
A strong bias for retention (or non-adaptation) and for epenthesis was found; deletion is
cross-linguistically dispreferred, as are other possible repairs, such as metathesis, phoneme
alternation/substitution, and coalescence. However, it is argued that these trends are only
partially telling, as cross-linguistic LWA data are subject to a conceptual/organizational problem
known as Simpson’s Paradox, which I explain below. Further, I argue that this paradox
necessitates a language-focused analysis, i.e. a typology of language process preference.
2.2.1 CROSS-LINGUISTIC TRENDS. There are a total of 8,538 data points in the loanword
corpus. Among these, six different phenomena are observed. These are listed in Table 10,
with the raw (non-normalized) counts for each one, along with the percentage of the corpus they
represent.
Observation        Count    Percentage
Non-Adaptation     5140     60.2%
Epenthesis         2307     27.02%
Deletion            907     10.62%
Metathesis           96      1.12%
Alternation          74      0.88%
Coalescence           4      0.05%
TABLE 10. Total observations/processes in the corpus, across all languages
The four instances of coalescence all occurred in Ceq Wong, where a /n+j/ sequence
became [ɲ], for example, Malay [tanjuŋ] → Ceq Wong [taɲuŋ] ‘valley’. Metathesis,
alternation/substitution, and coalescence were so rare (making up about 2% of the corpus) that
they are treated as one category, “others.” For all subsequent analyses, instances of the “others”
category were not included in the analysis, unless the “other” processes made up at least 5% of
the observations in a language. There were five such languages where this was the case: Ceq
Wong, Gawwada, Iraqw, Kanuri, and Thai. The languages and data were treated that way as the
typology of language process preference analysis aimed to discover what process is
dominant/preferred for each language. For 48 of the languages in the corpus, the observations of
“other” were so few that it is clear (a priori) that these languages strongly dispreferred
metathesis, substitution, and coalescence; including them in the typology would be redundant
and meaningless. However, for Ceq Wong, Gawwada, Iraqw, Kanuri, and Thai, the “other”
processes occurred frequently enough to be considered meaningful and necessary in the
classification of these languages. This indeed turned out to be the case for Kanuri, as will be
shown below.
The general trend can be described as a preference for retention, or non-adaptation, with a
secondary preference for epenthesis. This distribution of processes was statistically significant,
with the “others” category included (p < .0001, χ² = 3,407.4), and when the “others” category was
excluded (p < .0001, χ² = 1,699.9).
FIGURE 1. Total process observations across all languages, indicating a strong preference for
non-adaptation, and a preference for epenthesis over any other repair
Excluding cases of non-adaptation, and just comparing epenthesis, deletion, and “others,”
the pattern is likewise significant (p < .0001, χ² = 1,128.1), as is the comparison of the two main
strategies, epenthesis and deletion (p < .0001, χ² = 320.1). These results suggest that languages
prefer to faithfully adapt foreign consonant clusters, and when they do alter the sequence of
sounds in a SW, inserting a vowel (i.e. epenthesis) is the preferred alteration (repair).
Despite the strength of these patterns, any conclusion about process preference is
tentative, as these are based on raw counts. The raw data show that non-adaptation is the
dominant pattern for languages adopting foreign syllable structure. However, this may not
accurately describe true cross-linguistic preferences; for example, it could be the case that
languages with a large number of datapoints, such as Indonesian (627), Shona (457), and Tarifiyt
Berber (331), have strong biases for non-adaptation that may not be true of languages with a
smaller number of datapoints. This is precisely the problem discussed above in the section dealing
with corpus normalization – and so such normalization was done.
Normalized using the Direct Division method, the results were still significant and in the
same direction (p < .0001, χ² = 1,629.4). Non-adaptation was by far the most frequent
observation (5,077.5 normalized counts), followed by epenthesis (2,298.3) and deletion (936.7).
Because the data were normalized so that all languages had an equal number of data points
(160.9), within-language means could be directly compared in a meaningful way. The average
number of epenthesis observations per language was 43.36, compared to 17.67 for deletion and
95.8 for non-adaptation, as shown in Figure 2. This distribution was likewise significant (p < .0001,
χ² = 30.74). The T-Score scaling method also yielded results that were statistically significant
and in the same direction (p = .0079, χ² = 9.6745), as shown in Figure 3.
FIGURE 2. Direct Division Normalization means per language, indicating a preference for Non-
Adaptation and Against Deletion
FIGURE 3. T-Score Scaled Normalized Means, averaged across language
The normalized data confirm the trends in the raw data: in the corpus representing a
sample of the world’s languages, the most common way speakers adapt foreign consonant
clusters (and codas) is to do nothing at all, that is, to adapt them as they exist in the donor
language. When a repair occurs, epenthesis is more likely to be chosen than deletion. Other
options, such as metathesis, substitution, and coalescence, rarely occur. However, examining
broad cross-linguistic trends, although useful and informative, is not as meaningful or
statistically sound as deriving a typology based on language, rather than process. The reasons
for this are explained in the following section.
2.2.2 THE PROBLEM OF GROUPING RESULTS: SIMPSON’S PARADOX. The problem of
summing the overall results and comparing the trends has already been discussed: because the
fifty-three languages used in the corpus had a widely varied number of samples, one type of
observation (i.e., epenthesis, deletion, “other,” and non-adaptation) may be over-represented in
the raw counts. Scaling and normalizing this data ostensibly fixes this problem. However, this
remedy is only a superficial fix. A wide-scale cross-corpus typological analysis of LWA, such as the
one done in this study, is problematic due to a statistical phenomenon known as Simpson’s
Paradox (Simpson 1951). Conceptually, Simpson’s Paradox is the problem that, if a set of data
has different independent variables that are ontologically distinct, an aggregate analysis of the
effect of those independent variables upon the dependent variables may not necessarily reveal
sound or accurate findings. Another way of stating this is that different equally valid ways of
organizing and analyzing data can yield different findings. The Stanford Encyclopedia of Philosophy provides
the following summary of Simpson’s Paradox: “An association between a pair of variables can
consistently be inverted in each subpopulation of a population when the population is
partitioned” (Malinas and Bigelow 2012).
Above, phonological process was the target of analysis – that is, a typology of process
was given, where Retention, epenthesis, and deletion were the types (or categories) of the
typology. I show below that Simpson’s Paradox necessitates that language-type should also be
the target of analysis. That is, for a more complete representation and analysis of the typology of
LWA, an analysis of language process preference is necessary: the “types” of the typology need
necessarily also be categories such as “epenthesis preferring languages,” “retention preferring
languages,” “deletion preferring languages,” etc.
Consider a hypothetical drug trial study. Two drugs, Drug A and Drug B, are being
tested to see which is a better cure for some disease. Two trials are done, yielding the following
results. In Trial 1, Drug A cures 63/90 (70%) of people, and Drug B cures 8/10 (80%) of people.
In Trial 2, Drug A cures 4/10 (40%) of people, and Drug B cures 45/90 (50%) of people. This is
illustrated below.
           Drug A                   Drug B
           Cure Ratio   Cure %      Cure Ratio   Cure %
Trial 1    63/90        70%         8/10         80%
Trial 2    4/10         40%         45/90        50%
TABLE 11. Data from a hypothetical drug study, illustrating Simpson’s Paradox
It appears as though Drug B is the better drug, curing 80% versus 70% for Drug A in Trial 1, and
50% versus 40% in Trial 2. That is, considering each trial separately, Drug B is the better drug.
However, if the results are combined, we see that Drug A cures 67/100 people, and Drug B cures
53/100 people. Drug A cures 67% overall, and Drug B cures 53% overall; Drug A thus appears
to be the better drug.⁸ This is the paradox.
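The reversal in the hypothetical drug study can be checked directly. The following Python sketch uses only the cure counts given above; the variable names are illustrative, and exact fractions are used so that no rounding affects the comparison.

```python
from fractions import Fraction

# (cured, treated) per drug in each trial, as given in the hypothetical study.
trials = {
    "Trial 1": {"Drug A": (63, 90), "Drug B": (8, 10)},
    "Trial 2": {"Drug A": (4, 10), "Drug B": (45, 90)},
}


def cure_rate(cured, treated):
    return Fraction(cured, treated)


# Which drug has the higher cure rate within each trial?
per_trial_winner = {
    name: max(arms, key=lambda drug: cure_rate(*arms[drug]))
    for name, arms in trials.items()
}

# Pool the two trials and compare again.
combined = {}
for arms in trials.values():
    for drug, (cured, treated) in arms.items():
        c, t = combined.get(drug, (0, 0))
        combined[drug] = (c + cured, t + treated)

overall_winner = max(combined, key=lambda drug: cure_rate(*combined[drug]))

print(per_trial_winner)  # Drug B wins in each trial
print(combined)          # Drug A: 67/100, Drug B: 53/100
print(overall_winner)    # Drug A wins once the trials are pooled
```

The reversal arises because the two drugs were tested on very different numbers of people in each trial, so pooling weights the trials unequally.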
This statistical problem/paradox applies to any cross-linguistic study where the
distribution of phonological processes within languages is the target of investigation. Consider
another hypothetical dataset, this one resembling the corpus investigated in this study. Five
languages show either epenthesis or deletion in repairing consonant clusters.
Language   Epenthesis Total   Deletion Total   Total Observations
1          97                 7                104
2          12                 28               40
3          240                4                244
4          8                  31               39
5          13                 32               45
Sum        370                102              472
TABLE 11. Hypothetical Language Data
Epenthesis appears to be the preferred repair strategy used by these languages, even by a
wide margin: 78.39% of the repairs are epenthesis, versus only 21.61% deletion. However, if
we look at each individual language, a different conclusion is drawn: Deletion is preferred.
Languages 1 and 3 are epenthesis-dominant, but languages 2, 4, and 5 are deletion-dominant.
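The contrast between the aggregate view and the per-language view can be reproduced with a few lines of Python. This is a minimal sketch over the hypothetical counts, taking Language 3 as 240 epenthesis and 4 deletion observations (the values consistent with the stated row total of 244 and column sum of 102); the variable names are illustrative.

```python
# (epenthesis, deletion) counts for the five hypothetical languages.
data = {1: (97, 7), 2: (12, 28), 3: (240, 4), 4: (8, 31), 5: (13, 32)}

total_epenthesis = sum(e for e, _ in data.values())
total_deletion = sum(d for _, d in data.values())

# Aggregate view: share of all repairs that are epenthesis.
aggregate_share = total_epenthesis / (total_epenthesis + total_deletion)

# Language view: which languages use deletion more often than epenthesis?
deletion_dominant = [lang for lang, (e, d) in data.items() if d > e]

print(f"Epenthesis share of all repairs: {aggregate_share:.2%}")
print(f"Deletion-dominant languages: {deletion_dominant}")
```

Counting tokens favors epenthesis, while counting languages favors deletion, because Languages 1 and 3 contribute most of the tokens.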
One might initially assume that normalization might resolve this paradox; after all, the
total number of observations per language varies, just like it does in the actual corpus. However,
the same paradox applies. Below is a table that normalizes the hypothetical language data using
the Direct Division and T-Score Scaling Methods.
⁸ Special thanks to Dr. James Grime for data and consultation.
           Raw Counts              Direct Division         T-Score Scaling
Language   Epenthesis   Deletion   Epenthesis   Deletion   Epenthesis   Deletion
1          97           7          88.05        6.35       47.65        3.44
2          12           28         28.32        66.1       13.14        30.67
3          240          4          92.85        1.55       65.91        1.1
4          8            31         19.36        75.04      8.96         34.74
5          13           32         27.27        67.13      12.82        31.56
sums       370          102        255.85       216.15     148.49       101.5
TABLE 12. Normalized Hypothetical Language Data
Here we see the same pattern. In each of the normalization methods, epenthesis appears to be
the dominant pattern (although the difference between epenthesis and deletion is not as large as it
is in the raw data). However, if we look at individual languages, deletion is the preferred repair,
in the raw data as well as in the normalized data. It should be noted that the data in this example
were manipulated to clearly demonstrate the problem of Simpson’s Paradox. The actual data
may or may not be similar to this. Because this problem exists, however, a language-focused
analysis is warranted for the sake of analytical thoroughness.
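The normalized sums for the hypothetical data can be recomputed with the sketch below. It rests on assumptions that should be stated plainly: Direct Division is taken to scale each count by the mean language total divided by that language's total, T-Score Scaling is taken to scale each count by the language's T-Score divided by its total, and the standard deviation is taken to be the sample (n - 1) standard deviation. The function names are hypothetical.

```python
from statistics import mean, stdev

# (epenthesis, deletion) counts per hypothetical language.
raw = {1: (97, 7), 2: (12, 28), 3: (240, 4), 4: (8, 31), 5: (13, 32)}

totals = {lang: e + d for lang, (e, d) in raw.items()}
grand_mean = mean(list(totals.values()))
sample_std = stdev(list(totals.values()))  # assumption: sample (n - 1) standard deviation


def direct_division(count, total):
    """Assumed form: rescale so every language has the mean number of datapoints."""
    return count * grand_mean / total


def t_score_scaled(count, total):
    """Assumed form: rescale so every language total equals its T-Score."""
    t_value = 10 * (total - grand_mean) / sample_std + 50
    return count * t_value / total


def normalize(method):
    return {
        lang: (method(e, totals[lang]), method(d, totals[lang]))
        for lang, (e, d) in raw.items()
    }


def column_sums(table):
    return (sum(e for e, _ in table.values()), sum(d for _, d in table.values()))


dd, ts = normalize(direct_division), normalize(t_score_scaled)
dd_sums, ts_sums = column_sums(dd), column_sums(ts)

# Within-language ratios are untouched, so the same languages stay deletion-dominant.
deletion_dominant = {
    name: [lang for lang, (e, d) in table.items() if d > e]
    for name, table in (("raw", raw), ("direct division", dd), ("t-score", ts))
}

print([round(x, 2) for x in dd_sums])
print([round(x, 2) for x in ts_sums])
print(deletion_dominant)
```

Both methods rescale each language by a single factor, which is why they narrow the aggregate gap without changing which repair any individual language prefers.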
The problem of Simpson’s Paradox is not specific to the current study: it applies to any
cross-linguistic study that counts the number of observed phonological processes with an uneven
(variable) distribution, a fact inherent to loanword phonology. Perhaps the problem would be
remedied if every single loanword in each language investigated were collected and included in
the analysis. Even if this were possible, it was not done in the construction of the corpus
examined in this study.
In sum, the purpose of this study is to investigate the cross-linguistic adaptation of
syllable structure. Examining process/repair strategy as the target of investigation is less than
ideal for the reasons explained in this section, i.e., Simpson’s Paradox. That is not to say that the
above is invalid and should be discarded. The process-focused analysis does indeed provide
insight into the nature of LWA: epenthesis and non-adaptation are the most common
cross-linguistic patterns. However, a language-focused analysis is also
necessary; the conclusions drawn from the process-focused
analysis are only partial, and may indeed be rejected by the results of a language-focused
analysis. That is, to understand the nature of sound sequence adaptation in loanword phonology
from a cross-linguistic perspective, we must investigate what types of languages exist, how
common the types are, and why the types exist. Thus, we next turn the focus to a typology of
language process preference. The remainder of this chapter is dedicated to the development,
exploration, and analysis of such a typology.
2.3 TYPOLOGY OF LANGUAGE PROCESS PREFERENCE. In this section, a typology of
languages according to their preference for how they handle sound sequences in foreign words is
examined. Relevant facts regarding LWA of syllable structure (or sound sequences) and
previous findings in this chapter, as well as aspects of the cross-linguistic corpus, are presented as
the necessary building blocks for a typology of language process preference. A
methodology for classifying languages in terms of their preference, which uses a two-step
process of Chi-Square tests, is explained. This method is then used to classify the languages in
the cross-linguistic corpus, resulting in the desired language-focused typology.
2.3.1 INITIAL CONSIDERATIONS. There are three crucial properties of the corpus and of
LWA in general to consider in developing a typology of languages based on how they adapt
sequences of sounds in foreign words.
1) Three types of adaptation are relevant for 48 of the 53 languages in the corpus.
Languages handle foreign sequences of sounds in three ways: these 48 languages delete
consonants, epenthesize vowels after (or in between) consonants, or they do nothing at all,
retaining the syllable structure in the SW. Other processes, mainly metathesis and phoneme
substitution, occur very infrequently. Metathesis and substitution are only observed in 174 tokens
in the entire corpus of 8,525 tokens (2.04%). Because these cases are so rare, they are not
included, except for the five languages in which the “other” category makes up 5% or more of
the data, that is, Ceq Wong, Gawwada, Iraqw, Kanuri, and Thai. In other words, the “other”
patterns will only be considered when classifying these five languages. The other forty-eight
languages will be classified based only on the three majority processes, Epenthesis, Deletion, and
non-adaptation, referred to henceforth as “Retention.”⁹
⁹ Capitalization is used to differentiate two slightly different things. Capital “Epenthesis” refers to a category in the
classification and typology; lowercase “epenthesis” refers to a phonological process.
2) No language shows uniform and consistent LWA of syllable structure. That is, no
language shows only Deletion, only Epenthesis, or only Retention. Excluding the “other”
category, 50/53 of the languages in the corpus show all three processes. Japanese and Shona
have no cases of retention, only employing epenthesis and deletion to adapt loanwords.
Manange has no cases of deletion, only showing retention and epenthesis. For the other fifty
languages, epenthesis, deletion, and retention are all used to adapt foreign consonant clusters.
3) The preference for process varies widely. There can be a preference for one adaptation
over all others; two adaptations could be equally preferred to the exclusion of a third, and so on.
Although there are three main process categories, Epenthesis, Deletion, and Retention, the
number of possible types of languages is greater than 3!=6. For example, a language could
prefer Epenthesis and Deletion equally. Thus, the typology needs to consider types that include
both exclusive process preference and inclusive process preference. In other words, there could
be a three-way preference, for example, Epenthesis>Deletion>Retention. In addition, the
processes can be grouped, for example (Epenthesis, Deletion)>Retention, or
Epenthesis>(Deletion, Retention), etc. So, the number of possible languages classified this way
is 3!=6 strict orderings, plus the 6 ways in which the processes may be grouped two versus one
(three in which a tied pair is preferred over the third process, and three in which one process is
preferred over a tied pair), plus a grouping of all three together (that is, no preference for process).
Given this, there are 13 possible languages.
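The count of 13 can be confirmed by enumerating every ranking of the three processes in which ties are allowed. The Python sketch below is illustrative only; the function name `weak_orderings` and the enumeration strategy (each gap between neighbours in a permutation is either a strict preference or a tie) are choices made here and are not part of the classification method itself.

```python
from itertools import permutations

PROCESSES = ("Epenthesis", "Deletion", "Retention")


def weak_orderings(items):
    """Return every ranking of items in which ties are allowed."""
    results = set()
    n = len(items)
    for perm in permutations(items):
        # Each of the n - 1 gaps between neighbours is either '>' or a tie.
        for mask in range(2 ** (n - 1)):
            groups = [[perm[0]]]
            for i in range(1, n):
                if mask & (1 << (i - 1)):
                    groups[-1].append(perm[i])  # tied with the previous item
                else:
                    groups.append([perm[i]])    # strictly dispreferred
            # frozensets make tied groups order-insensitive, removing duplicates.
            results.add(tuple(frozenset(g) for g in groups))
    return results


orderings = weak_orderings(PROCESSES)
by_tiers = {k: sum(1 for o in orderings if len(o) == k) for k in (1, 2, 3)}

print(len(orderings))  # total number of possible language types
print(by_tiers)        # all tied / two-versus-one groupings / strict orderings
```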
     Language Type                       Description
1    Epenthesis Dominant                 Language uses Epenthesis in almost all cases
2    Epenthesis-Retention Dominant       Language uses Epenthesis in the majority of cases;
                                         Retention is used in a statistically significant minority
3    Epenthesis & Retention Dominant     Language uses Epenthesis and Retention about equally;
                                         there is a statistically significant bias against Deletion
4    Epenthesis-Deletion Dominant        Language uses Epenthesis in the majority of cases;
                                         Deletion is used in a statistically significant minority
5    Epenthesis & Deletion Dominant;     Language uses Epenthesis and Deletion about equally;
     Adapting, Process Neutral           there is a statistically significant bias against Retention
6    Deletion Dominant                   Language uses Deletion in almost all cases
7    Deletion-Epenthesis Dominant        Language uses Deletion in the majority of cases;
                                         Epenthesis is used in a statistically significant minority
8    Deletion-Retention Dominant         Language uses Deletion in the majority of cases;
                                         Retention is used in a statistically significant minority
9    Deletion & Retention Dominant       Language uses Deletion and Retention about equally;
                                         and there is a statistically significant bias against Epenthesis
10   Retention Dominant                  Language uses Retention in almost all cases
11   Retention-Epenthesis Dominant       Language uses Retention in the majority of cases;
                                         Epenthesis is used in a statistically significant minority
12   Retention-Deletion Dominant         Language uses Retention in the majority of cases;
                                         Deletion is used in a statistically significant minority
13   Neutral                             Language uses Epenthesis, Deletion, and Retention about equally;
                                         No statistically significant bias for process
TABLE 13. Typology of possible languages
For clarity, these types/categories, when used within the text, are denoted with a vertical
line, e.g. |Epenthesis Dominant|. There are some subtle details in the typology that one
might miss. Consider, for example, the types |Epenthesis-Retention Dominant| and |Epenthesis
& Retention Dominant|. From a nomenclature perspective, one might assume that these are one
and the same thing. However, they represent distinct patterns; the punctuation used in their names
is necessary, informative, and meaningful. For the |Epenthesis & Retention Dominant| category,
there is a statistical bias against Deletion and for both Epenthesis and Retention, when grouped
together, but no statistical bias between Epenthesis and Retention when compared individually:
(Epenthesis, Retention)>Deletion. For the |Epenthesis-Retention Dominant| category, there is a
bias towards both Epenthesis and Retention, and against Deletion; when all three values are
compared, the value for Epenthesis is statistically significantly larger than the value for
Retention, which is in turn statistically significantly larger than the value for Deletion:
Epenthesis>Retention>Deletion.
Additionally, there is a meaningful difference between the categories |Epenthesis-
Retention Dominant| and |Retention-Epenthesis Dominant|. Per above, languages classified as
such show a tripartite distinction in process preference: the ordering of the processes with a
hyphen is demonstrative of a language’s LWA behavior. However, for types with an ampersand,
i.e. |Epenthesis & Retention Dominant| and |Deletion & Retention Dominant|, the ordering of the
processes in the names of these types bears no meaning. For example, an |Epenthesis & Retention
Dominant| language shows a statistically significant bias against Deletion, but no statistical
difference between Epenthesis and Retention. The name of this category/type could be
|Retention & Epenthesis Dominant|; and the label |Deletion & Retention Dominant| could be
|Retention & Deletion Dominant|, but, mostly as a matter of convention, I chose to put the actual
loanword repair first.
This situation is the same for the type/category |Epenthesis & Deletion Dominant|;
languages of this type show a statistically significant bias against Retention, but no bias is
detected when Epenthesis and Deletion are compared. However, the name |Epenthesis &
Deletion Dominant| is only listed as such for expository ease and convenience. For this type of
language, sound sequences in the SW are adapted/repaired in the LW in almost all cases, but the
language shows no strong preference for the type of repair. Essentially, languages of this type do
not show a bias for how sequences of sounds in the SW are repaired, as long as they are repaired.
Due to this process-neutrality, languages that show this pattern can be described as “Adapting,
Process Neutral.” This phrase is just as descriptive as |Epenthesis & Deletion Dominant|, and it
has the additional benefit of being more intuitive. Thus, from here on, the name |Adapting,
Process Neutral| will be used to describe languages of this type.¹⁰ Granted, the names for the
other types of languages may not be as transparent or clear in the abstract. However, the
rationale for the details described above should be made clearer as the typology is developed
below.
¹⁰ Using the name |Adapting, Process Neutral| has the added benefit of resulting in a simpler and more intuitive
typological organization of languages. This will be made clearer below, after all of the languages have been
classified.
2.3.2 THE CLASSIFICATION METHOD. The method of classification I present here employs
a two-step process of Chi-Square tests that determine the proper classification of language
process preference. The first step uses the results of a Chi-Square test to place languages in one
of three different categories. The second step uses a Chi-Square test that is specific to each one
of these categories. This is described in more detail as follows.
The general, language-independent trends given above show that Retention (also called
non-adaptation above) is by far the most widespread of all the processes. The first step to
classify each language, then, is to determine if Retention is the dominant pattern in a language.
That is, is a language an adapting language or a retaining language? To test this, the number of
observations of Retention in each language is compared to the number of cases of adaptation;
Epenthesis and Deletion values are lumped together as one category in the Chi-Square test, and
compared against the number of Retention observations. This Chi-Square test yields three
possible results, placing languages into three categories for the second step of statistical testing.
This second step involves procedures that are specific to the category determined by the first
step. The three categories are given abbreviations for convenience and in order to refer back to
them throughout the text.
(11) The three possible results of the step 1 test
a. Significant in the direction of Retention (S.R.)
b. Significant in the direction of Adaptation (S.A.)
c. Not Significant (N.S.)
It should be noted here that these three categories S.R., S.A., and N.S. only refer to the
results of the first-step Chi-Square test. Other statistical tests may be not significant, but the
label “N.S.” refers only to the result of the initial test.
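The first step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the test is taken to be a goodness-of-fit Chi-Square test of the two counts against an equal split (one degree of freedom), and the significance threshold `alpha` is set to 0.05. Neither the exact form of the test nor the threshold is specified above, so results from this sketch need not match those reported in this chapter. The function names are hypothetical.

```python
import math


def chi_square_two_way(a, b):
    """Goodness-of-fit Chi-Square for two counts against an equal split (1 df)."""
    if a + b == 0:
        raise ValueError("at least one observation is required")
    expected = (a + b) / 2
    stat = ((a - expected) ** 2 + (b - expected) ** 2) / expected
    # For one degree of freedom, the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def step_one(retention, epenthesis, deletion, alpha=0.05):
    """Return 'S.R.', 'S.A.', or 'N.S.' for the Retention versus Adaptation test."""
    adaptation = epenthesis + deletion  # Epenthesis and Deletion are lumped together
    _, p_value = chi_square_two_way(retention, adaptation)
    if p_value >= alpha:
        return "N.S."
    return "S.R." if retention > adaptation else "S.A."


print(step_one(retention=80, epenthesis=10, deletion=10))  # strongly retaining
print(step_one(retention=5, epenthesis=60, deletion=35))   # strongly adapting
print(step_one(retention=9, epenthesis=3, deletion=4))     # 9 versus 7
```

The counts in the first two calls are invented for illustration; the third uses the 9 versus 7 comparison discussed later in this section.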
For languages that are significant in the direction of adaptation in the first step
Chi-Square test, noted S.A., a subsequent Chi-Square test is done to determine whether Epenthesis
versus Deletion is significant in these languages.¹¹ Again, three results are possible: Significant
in the direction of Epenthesis, significant in the direction of Deletion, and not significant. This
determines three classifications: |Epenthesis Dominant|, |Deletion Dominant|, and |Adapting,
Process Neutral|, respectively.
¹¹ If an adapting language happens to be one of the four languages mentioned above, for which the “other”
category makes up at least 5% of the total data (Ceq Wong, Iraqw, Kanuri, and Thai), it is tested separately,
comparing Epenthesis vs Deletion vs Other. Likewise, when one of these languages is involved in further tests
described below, separate tests are run that include Other as a category of comparison in the Chi-Square test.
For the languages that were not significant in the initial test, i.e. category N.S. above, there
is no statistical difference between Retention and Adaptation. Thus, to classify these languages, a
three-way Chi-Square test is done comparing Epenthesis vs Deletion vs Retention. If a
language is not significant for this test, it means that Retention, Epenthesis, and Deletion occur
about equally within the language, thus classifying the language as a |Neutral| language.
For the category N.S. languages that are statistically significant in the three-way
comparison, individual language data patterns are examined. That is, the raw counts of the data
are examined to determine which process has the most observations, the second most observations,
and the third most observations. There are six (3!=6) possible patterns, shown below.
(12) Possible patterns for N.S. Languages
A B C
i. Epenthesis>Retention>Deletion
ii. Epenthesis>Deletion>Retention
iii. Retention>Epenthesis>Deletion
iv. Retention>Deletion>Epenthesis
v. Deletion>Epenthesis>Retention
vi. Deletion>Retention>Epenthesis
For each of these possibilities, pair-wise comparisons are done. That is, three separate
pair-wise Chi-Square tests are done comparing A versus B, B versus C, and A versus C. The
significance of these determines the classification of the language.
Consider the example in 12.i for an illustration. If all three pair-wise tests are significant,
that is, if AvsB (Epenthesis versus Retention) and BvsC (Retention versus Deletion) and AvsC
(Epenthesis versus Deletion) tests are significant, this places a language into the
|Epenthesis-Retention Dominant| category. In other words, there is a statistical bias for Epenthesis over
Retention, and a statistically significant bias for Retention over Deletion; Epenthesis and
Retention are the preferred adaptation strategies, but Epenthesis is preferred to Retention at a
statistically significant rate. Epenthesis is the dominant pattern, but Retention makes up a
statistically significant minority. The same goes for every other type 12.ii-12.vi: If AvsB is
significant, BvsC is significant, and AvsC is significant, then the trend of A>B>C is significant,
which, by definition, is an |A-B Dominant| language.
Consider another scenario, again with a language that shows the pattern in 12.i. If AvsB
(Epenthesis versus Retention) is not significant, but BvsC (Retention versus Deletion) and AvsC
(Epenthesis versus Deletion) are significant, this places the language in the |Epenthesis &
Retention Dominant| category: Epenthesis and Retention are predominantly used, but there is no
statistical distinction between the two. In other words, Deletion is dispreferred, set apart from
Epenthesis and Retention, but Epenthesis and Retention are one and the same thing, at least from a
statistical perspective. The LWA of the language involved retains (adapts faithfully) and
epenthesizes about equally, but shows a significant bias against Deletion. The same goes for
other patterns: If AvsB is not significant, and BvsC and AvsC are significant, this indicates that
A and B are equally preferred over C, which is, by definition, an |A & B Dominant| language.
Consider a final scenario, this time with 12.iii, where the pattern is
Retention>Epenthesis>Deletion. If AvsB (Retention versus Epenthesis) is significant, but BvsC
(Epenthesis versus Deletion) is not significant,¹² then this places the language in the |Retention
Dominant| category. This might seem surprising and/or problematic in that the |Retention
Dominant| category is established for languages of type S.R., discussed below. However, this
classification is justified here: some languages will be not significant in the first step simply due
to low counts, which cause the Chi-Square test to report Retention vs. Adaptation as not
significant. For example, if a language has 9 observations of Retention, 4 observations of
Deletion, and 3 observations of Epenthesis, the first-step Chi-Square test will compare 9 versus
7, resulting in non-significance, placing the language into the N.S. category. However, when
Retention is compared separately against Epenthesis and Deletion (i.e. 9 versus 4, and 9 versus
3), significance will be reached, but the comparison of Epenthesis and Deletion (4 vs 3) will not
be significant, justifying the language as a |Retention Dominant| language. This happens for a
language in the classification, as will be shown below.
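The three scenarios above can be summarized as a small decision function. In the Python sketch below, the outcomes of the three pair-wise tests are passed in as booleans, so no particular statistical test is assumed; the function name `classify_ns`, the canonical ordering used for ampersand names, and the "unclassified" fallback for outcome combinations not discussed in the text are all assumptions made for illustration.

```python
# Convention assumed for ampersand names: the repair is listed before Retention.
CANONICAL = ["Epenthesis", "Deletion", "Retention"]


def classify_ns(counts, ab_sig, bc_sig, ac_sig):
    """Classify an N.S. language from its counts and pair-wise test outcomes.

    counts maps each process to its raw count; A, B, and C are the processes
    ranked from most to least frequent.
    """
    a, b, _ = sorted(counts, key=counts.get, reverse=True)
    if ab_sig and bc_sig and ac_sig:
        return f"|{a}-{b} Dominant|"  # A>B>C, all differences significant
    if not ab_sig and bc_sig and ac_sig:
        if {a, b} == {"Epenthesis", "Deletion"}:
            return "|Adapting, Process Neutral|"  # name adopted in the text for this pair
        first, second = sorted((a, b), key=CANONICAL.index)
        return f"|{first} & {second} Dominant|"  # (A, B)>C
    if ab_sig and ac_sig and not bc_sig:
        return f"|{a} Dominant|"  # A>(B, C)
    return "unclassified"  # combination not covered by the scenarios above


# Invented counts for pattern 12.i, with two different sets of test outcomes.
pattern_i = {"Epenthesis": 40, "Retention": 25, "Deletion": 5}
print(classify_ns(pattern_i, True, True, True))
print(classify_ns(pattern_i, False, True, True))

# The 9/4/3 example, with the test outcomes as described in the text.
print(classify_ns({"Retention": 9, "Deletion": 4, "Epenthesis": 3}, True, False, True))
```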
¹² In such a scenario, if the comparison of AvsB is significant, but the comparison of BvsC is not significant, it must
also be the case that the comparison of AvsC is significant. Recall that these letters are variables that represent
integers, where A>B>C. If A is significantly larger than B, and B is larger than C, A must therefore be significantly
larger than C. The same logical implication holds for other comparisons as well. For example, if AvsB is not
significant, but BvsC is significant, AvsC must be significant, given A>B>C.
Something not immediately obvious but crucial to the classification of N.S. languages
must be addressed at this point. The method described above primarily classifies languages
based on bias for a process. With |Retention Dominant| languages, there is a bias for Retention.
However, a bias against a process may likewise be informative for the proper classification of a
language. This is indeed the case for the language types with the ampersand in their name, e.g.
|Epenthesis & Retention Dominant|. For a language of this type, there is a statistically significant
bias against Deletion, but not for Epenthesis or Retention. This may at first glance seem
problematic. However, given the goals of this study as stated above, avoidance of certain
processes in LWA is just as informative as preference for certain processes. Additionally, I
contend that this fact is a merit of the classification methodology presented here. Abstractly,
if two things x and y happen at unequal rates, where x happens more frequently than y, to claim
that x is the preferred thing is equivalent to claiming that y is the avoided thing. In other words,
there is no a priori way to differentiate preference and dispreference/avoidance. More concretely
and in terms of the present study, if epenthesis happens frequently and deletion happens
infrequently, the proposition that the grammar prefers epenthesis is no more meaningful or
insightful than the proposition that the grammar disprefers deletion. The classification
methodology presented here potentially dispels this ambiguity via statistical tests on observed
patterns.
Last to be addressed are languages that were significant in the direction of Retention in
the initial Chi-Square test, i.e. category S.R. These languages are given the classification of
|Retention Dominant| as their main classification. However, a second step may also be done for
these languages in order to achieve a more fine-grained understanding of the cross-linguistic
nature of syllable structure adaptation. That is, to achieve a fuller understanding of the
distribution of Epenthesis, Deletion, and “other,” the |Retention Dominant| languages were
subjected to further testing. For these languages, the Retention datapoints were excluded from
analysis, and a Chi-Square test was done comparing Epenthesis vs Deletion (and Other for the
five languages mentioned above) in these languages. This allows for a classification of subtypes
within the |Retention Dominant| category. That is, the typology of possible languages is extended
with subtypes of this category, which is labeled category 10 in the table of possible languages
(Table 13).
10      Retention Dominant                       Description
10-ep   Retention Dominant; Subtype Epenthesis   In a Retention dominant language, Epenthesis is statistically more common than Deletion
10-del  Retention Dominant; Subtype Deletion     In a Retention dominant language, Deletion is statistically more common than Epenthesis
10-n    Retention Dominant; Subtype Neutral      In a Retention dominant language, Epenthesis and Deletion occur equally.
TABLE 14. |Retention Dominant| subtypes
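The subtype decision in Table 14 can be sketched in Python as follows. This is an illustration only, not the author's code: the helper names (`uniform_chi2`, `retention_subtype`), the form of the statistic (observed counts tabulated against an even split), the .05 critical value, and the example counts are all assumptions made for the sketch.

```python
CRITICAL_DF1 = 3.841  # chi-square critical value for df = 1 at alpha = .05


def uniform_chi2(counts):
    """Chi-square statistic for observed counts against an even split.

    Assumption: the observed row is tabulated against a row of equal
    expected values, which simplifies to sum((o - e)^2 / (o + e)).
    """
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / (o + expected) for o in counts)


def retention_subtype(epenthesis, deletion):
    """Return the Table 14 label for a |Retention Dominant| language."""
    if epenthesis + deletion == 0:
        return "10-n"  # no repairs observed, so no basis for a bias
    if uniform_chi2([epenthesis, deletion]) <= CRITICAL_DF1:
        return "10-n"  # Epenthesis and Deletion not distinguishable
    return "10-ep" if epenthesis > deletion else "10-del"


# Hypothetical counts, not taken from the corpus.
print(retention_subtype(20, 3))
print(retention_subtype(3, 20))
print(retention_subtype(6, 5))
```

Retention counts never enter the function, which mirrors the exclusion of the Retention datapoints described above.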
In all, this method provides a basis for the typological classification and range of the
languages in the corpus, which hopefully is suggestive about the typological classification and
range of the world’s languages, in general. Below, a flow chart illustrating this method is given.
FIGURE 4. Flowchart demonstrating the classification method.
2.3.3 LANGUAGE CLASSIFICATION. As described above, the first step in the typology is to
test for Retention versus adaptation. That is, the Epenthesis, Deletion, and Other observations
for each language were summed, and this sum was compared against the Retention values for
each language. This test was significant for 36 of the 53 languages in the corpus, with 26
significant in the direction of Retention and 10 in the direction of Adaptation; the
remaining 17 were not significant. The p-value, Chi-Square statistic, and means for each
language are given in Appendix D. The results for the initial step/test divide the languages into
three groups for further testing given below by category and in alphabetical order.
Category   Test Result   Languages
S.R.   Significant in the direction of Retention   Archi, Armenian, Azerbaijani, Ceq Wong, Dutch, English, Gurindji, Imbabura Quechua, Indonesian, Irish, Javanese, Ket, Kildin Saami, Lower Serbian, Macedonian, Malay, Maltese, Manange, Mapudungan, Romanian, Sakha, Seychelles Creole, Tarifyt Berber, Thai, Turkish, Welsh
S.A.   Significant in the direction of Adaptation   Bezhta, Hausa, Hawaiian, Insindebele, Japanese, Kali’na, Malagasy, Saramaccan, Shona, Swahili
N.S.   Not significant   Basque, Cebuano, Finnish, Gawwada, Georgian, Gujarati, Hup, Iraqw, Kannada, Kanuri, Korean, Malayalam, Marathi, Oroqen, Selice Romani, Tamil, Yaqui
TABLE 15. Step 1 Results: Retention vs Adaptation
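The step 1 test can be sketched in Python. The sketch is hedged: the function names are hypothetical, and the statistic is assumed to compare the observed counts against an even split without a continuity correction. The counts are the ones this chapter reports for Irish (122 Retention, 13 Epenthesis, 2 Deletion) and Malayalam (21 Retention, 5 Epenthesis, 4 Deletion, 1 other).

```python
import math


def uniform_chi2(counts):
    """Chi-square statistic for observed counts against an even split
    (assumed form: sum((o - e)^2 / (o + e)), no continuity correction)."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / (o + expected) for o in counts)


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def step1(retention, epenthesis, deletion, other=0, alpha=0.05):
    """Compare Retention against the summed adaptation processes."""
    adaptation = epenthesis + deletion + other
    stat = uniform_chi2([retention, adaptation])
    p = p_value_df1(stat)
    if p >= alpha:
        category = "N.S."
    elif retention > adaptation:
        category = "S.R."
    else:
        category = "S.A."
    return category, stat, p


irish = step1(122, 13, 2)
malayalam = step1(21, 5, 4, other=1)
print("Irish:", irish[0], round(irish[1], 3))
print("Malayalam:", malayalam[0], round(malayalam[1], 3))
```

Under these assumptions Irish falls in category S.R. and Malayalam in N.S., in line with Table 15.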
The second step is to test the languages within each of these categories. We turn first to
category S.A., that is, languages that were significant in the direction of adaptation. For these
languages, a Chi-Square test comparing Epenthesis and Deletion was run. The test was significant
for nine of the ten languages; the only language for which it was not significant was Bezhta. So
for this language, comparing Retention and adaptation is significant, showing a bias for
adaptation, but comparing Epenthesis and Deletion is not, showing no bias. (There is a trend in
favor of Deletion for Bezhta (58 vs 87), but the Chi-Square test did not reach significance at the
5% level.) Thus, Bezhta is classified as an |Adapting, Process Neutral| language, per the
definitions given above. Of the remaining nine languages, all but one (Saramaccan) were biased
towards Epenthesis, classifying Saramaccan as |Deletion Dominant| and the rest as |Epenthesis
Dominant|. These classifications are given in Table 16. The data, p-values, and Chi-Square
statistics can be viewed in Appendix E.
Category   Language(s)
Epenthesis Dominant   Hausa, Hawaiian, Insindebele, Japanese, Kali’na, Malagasy, Shona, Swahili
Deletion Dominant   Saramaccan
Adapting, Process Neutral   Bezhta
TABLE 16. Adapting languages: Epenthesis vs Deletion
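As an illustration of the second step for adapting languages, the Python sketch below applies an Epenthesis versus Deletion comparison to the Bezhta counts cited above (58 Epenthesis, 87 Deletion). The function names and the form of the statistic (observed counts against an even split, no continuity correction) are assumptions of the sketch, not details confirmed in this section.

```python
import math


def uniform_chi2(counts):
    """Chi-square statistic for observed counts against an even split
    (assumed form: sum((o - e)^2 / (o + e)))."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / (o + expected) for o in counts)


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def classify_adapting(epenthesis, deletion, alpha=0.05):
    """Label a category S.A. language by its Epenthesis/Deletion bias."""
    stat = uniform_chi2([epenthesis, deletion])
    p = p_value_df1(stat)
    if p >= alpha:
        label = "|Adapting, Process Neutral|"
    elif epenthesis > deletion:
        label = "|Epenthesis Dominant|"
    else:
        label = "|Deletion Dominant|"
    return label, stat, p


label, stat, p = classify_adapting(58, 87)
print(label, round(stat, 3), round(p, 3))
```

The trend towards Deletion is visible in the raw counts, but under these assumptions the p-value stays above .05, so the sketch returns the process-neutral label.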
Next, we turn to languages of category N.S., where the initial test comparing Retention
versus adaptation was not significant. To sort out these languages, a three-way Chi-Square test
was done, comparing Retention vs Epenthesis vs Deletion¹³. The full results of these tests can be
seen in Appendix F. If the test is not significant for a language, it indicates an even
distribution between Retention, Epenthesis, and Deletion. This was true for two languages: Hup
and Iraqw, resulting in their classification as |Neutral|. The numbers of observations of the
different process types are fairly even in these two languages, especially in Iraqw, justifying its
classification as a |Neutral| language. However, Hup has a slight bias towards Deletion, with 5
observations of Epenthesis, 14 observations of Deletion, 2 observations of metathesis, and 7
observations of Retention. It could be that Hup is a |Deletion Dominant| language, and that both
the step 1 and step 2 tests for Hup did not reach statistical significance because of the low
number of datapoints in this language. Further inquiry into the Hup language and its loanwords
is necessary to sort this out. Because of this ambiguity, Hup will be unclassified in the final
typology.
Thirteen of the category N.S. languages were significant in the three-way Chi-Square
comparison, and two in the four-way Chi-Square comparison (Kanuri and Gawwada). Recall
that the procedure for classifying these languages is to run pair-wise comparisons based on the
raw observation trends in each language, as demonstrated above in 12. A subset of the logically
possible patterns in 12 is represented in the actual languages, with the addition of two
languages that included a fourth observation: metathesis occurred at a relatively high rate in
Kanuri and Gawwada, and so Metathesis was included in the comparisons for these languages.
13
For Gawwada and Kanuri, a four-way test was done that included the “other” category, as these were the
languages that had a number of metathesis and/or substitution observations higher than the 5% threshold.
(13) Trends for Remaining Languages
a. Retention>Epenthesis>Deletion
Languages: Basque, Cebuano, Finnish, Georgian, Gujarati, Korean,
Malayalam, Marathi, Oroqen, Selice Romani, Tamil
b. Retention>Deletion>Epenthesis
Languages: Kannada, Yaqui
c. Epenthesis>Retention>Metathesis>Deletion
Language: Gawwada
d. Retention>Epenthesis>Metathesis>Deletion
Language: Kanuri
Recall from above that languages of this type are classified by comparing three sets of
pairwise statistical comparisons against the trends in the data. The pairwise comparisons for the
languages in 13a are shown in Table 17. Full inferential statistics data and means can be found
in Appendix G.
Language Retention vs Epenthesis Epenthesis vs Deletion Retention vs Deletion
Basque n.s. p < .0001 p < .0001
Cebuano n.s. p < .0001 p < .0001
Finnish n.s. p < .0001 p < .0001
Georgian n.s. p < .0001 p < .0001
Gujarati n.s. p < .0001 p < .0001
Korean n.s. p < .0001 p < .0001
Malayalam p = 0.019704 n.s. p = 0.010574
Marathi n.s. p < .0001 p < .0001
Oroqen p = 0.033598 n.s. p = 0.0050693
Selice Romani p = 0.019819 p < .0001 p < .0001
Tamil n.s. p = 0.019556 p = 0.00025035
TABLE 17. Pairwise Comparisons for Retention>Epenthesis>Deletion Languages
Basque, Cebuano, Finnish, Georgian, Gujarati, Korean, Marathi, and Tamil all showed
the same pattern: Retention versus Epenthesis was not significant, but Epenthesis vs. Deletion
was. This means that the grammar for these languages prefers both Epenthesis and Retention,
and avoids Deletion. Because there is no statistical significance between Retention and
Epenthesis, these languages are |Epenthesis & Retention Dominant| languages.
Selice Romani showed a significant difference between Retention and Epenthesis, and
between Epenthesis and Deletion. Statistically, there was a three-way difference among these
processes, with Retention being the most preferred, Deletion the least preferred, and Epenthesis
in the middle. This classifies Selice Romani as |Retention-Epenthesis Dominant|.
Malayalam and Oroqen showed the same pattern, where the comparison between
Retention and Epenthesis was significant, as was the comparison between Retention and
Deletion, but the comparison between Epenthesis and Deletion was not significant. These two
languages have a raw trend of Retention>Epenthesis>Deletion, but because the test comparing
Epenthesis and Deletion was not significant, these two processes are treated as occurring at
statistically indistinguishable rates. A more accurate way of representing these two languages
is, then, Retention>(Epenthesis, Deletion). These languages are then by definition |Retention
Dominant|¹⁴.
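The decisions made so far for the Retention>Epenthesis>Deletion languages depend only on which pairwise comparisons in Table 17 are significant. The Python sketch below restates that mapping as a lookup; the function name `classify_red` and the boolean encoding of Table 17 are illustrative choices, and the pattern in which neither comparison is significant is left unassigned because it does not occur in Table 17.

```python
def classify_red(ret_vs_ep_sig, ep_vs_del_sig):
    """Map significance results to a type, for the R > E > D trend."""
    if ret_vs_ep_sig and ep_vs_del_sig:
        return "|Retention-Epenthesis Dominant|"
    if ret_vs_ep_sig:
        return "|Retention Dominant|"
    if ep_vs_del_sig:
        return "|Epenthesis & Retention Dominant|"
    return None  # pattern not attested in Table 17


# (Retention vs Epenthesis significant?, Epenthesis vs Deletion significant?)
TABLE_17 = {
    "Basque": (False, True),
    "Cebuano": (False, True),
    "Finnish": (False, True),
    "Georgian": (False, True),
    "Gujarati": (False, True),
    "Korean": (False, True),
    "Malayalam": (True, False),
    "Marathi": (False, True),
    "Oroqen": (True, False),
    "Selice Romani": (True, True),
    "Tamil": (False, True),
}

classified = {lang: classify_red(*flags) for lang, flags in TABLE_17.items()}
for lang, label in classified.items():
    print(f"{lang}: {label}")
```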
Yet to be classified are the languages in 13b. Kannada and Yaqui show a
Retention>Deletion>Epenthesis pattern. The statistical results are given in Table 18.
Language Retention vs Deletion Deletion vs Epenthesis Retention vs Epenthesis
Kannada p = 0.0018447 n.s. p < .0001
Yaqui n.s. p < .0001 p < .0001
TABLE 18. Pairwise Comparisons for Retention>Deletion>Epenthesis Languages
Although the trend for Kannada is slightly different than that of Oroqen and Malayalam, the
distribution of (non-)significant results is the same. For Kannada, Retention is categorically
different than both Epenthesis and Deletion, but Epenthesis and Deletion are categorically the
same. This indicates that the Kannada pattern is Retention>(Deletion, Epenthesis), classifying it
as a |Retention Dominant| Language.
For Yaqui, the pattern is analogous to that of Basque, Cebuano, and the other |Epenthesis &
Retention Dominant| languages, except that for Yaqui, Deletion is more common than
Epenthesis. There is a statistically significant bias against Epenthesis; the non-significance of
the Retention versus Deletion comparison suggests that these processes occur at similar rates in
the LW observations, which by extension suggests that the grammar prefers either Retention or
Deletion (or has a bias against Epenthesis), indicating that Yaqui is a |Retention & Deletion
Dominant| language.
14
Note that this is precisely the situation referred to above, where languages with a low number of total datapoints
aren’t detected by the “first pass” statistical analysis as |Retention Dominant|, but are determined to be |Retention
Dominant| in step two. Malayalam has 5 observations of epenthesis, 4 observations of deletion, 1 other, and 21
observations of retention. Oroqen has 6 observations of epenthesis, 3 observations of deletion, 1 other, and 21
observations of retention. Comparing these data points individually resulted in statistical significance, but comparing
non-adaptation versus adaptation did not. In other words, 21vs5vs4vs1 (the observations in Malayalam) is a meaningful
trend when the values are expected to be evenly distributed, but 21vs(5+4+1) is not.
Gawwada shows a trend of Epenthesis>Retention>Metathesis>Deletion, with 38
observations of Epenthesis, 23 observations of Retention, four observations of Metathesis, and
one observation of Deletion. Comparing Epenthesis versus Retention is not significant (p =
.1712), but the Retention versus Metathesis comparison is significant (p = .0057; χ² = 7.63).
Further, the Metathesis versus Deletion comparison is not significant (p = .32). In other words,
Gawwada shows the pattern (A,B)>(C), where C includes both Metathesis and Deletion. The
language prefers either Epenthesis or Retention, and has a bias against both Metathesis and
Deletion. Gawwada is thus an |Epenthesis & Retention Dominant| language.
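The Gawwada analysis can be traced step by step in Python. This is a sketch under stated assumptions: the statistic is taken to compare each pair of counts against an even split without a continuity correction (a choice that happens to return the p-values and the χ² value quoted above), and grouping processes by adjacent comparisons alone is a simplification of the full pairwise procedure. The helper names are hypothetical.

```python
import math


def uniform_chi2(counts):
    """Chi-square statistic for observed counts against an even split
    (assumed form: sum((o - e)^2 / (o + e)))."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / (o + expected) for o in counts)


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def tiers(ranked, alpha=0.05):
    """Group processes, ranked by count, into tiers.

    A new tier starts wherever two adjacent processes differ significantly.
    """
    groups = [[ranked[0][0]]]
    for (_, a), (name_b, b) in zip(ranked, ranked[1:]):
        if p_value_df1(uniform_chi2([a, b])) < alpha:
            groups.append([name_b])
        else:
            groups[-1].append(name_b)
    return groups


gawwada = [("Epenthesis", 38), ("Retention", 23), ("Metathesis", 4), ("Deletion", 1)]
gawwada_tiers = tiers(gawwada)
print(gawwada_tiers)
```

The first tier corresponds to (A,B) and the second to (C) in the pattern described above.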
The remaining language, Kanuri, is similar to Gawwada in that Metathesis is involved.
However, Kanuri requires a somewhat more complex analysis. Recall from above in 13d that
Kanuri shows a Retention>Epenthesis>Metathesis>Deletion trend. The test comparing
Retention versus Epenthesis was not significant, and neither was the test comparing Epenthesis
and Metathesis. However, the test of Retention versus Deletion was significant (p = 0.0013,
χ² = 10.394). Likewise, the test comparing Epenthesis versus Deletion was significant
(p = 0.0247, χ² = 5.0481)¹⁵. This indicates that Kanuri prefers anything but Deletion, and that
the language treats Retention, Epenthesis, and Metathesis the same. In other words, Kanuri has
a bias against Deletion, but is neutral with respect to other possible repairs. Kanuri therefore
belongs in a new category: |Epenthesis, Retention, & Metathesis Dominant|. This category is
not listed in the table above, as cases of “other” such as metathesis were not considered among
the factorial possibilities. The above analyses of the category N.S. languages are summarized in
Table 19.
15
Tables reporting the p-values and lack of significance are therefore not given here, but can be seen in Appendix
G.
Category   Language(s)
|Epenthesis & Retention Dominant|   Basque, Cebuano, Finnish, Georgian, Gujarati, Korean, Marathi, Tamil, Gawwada
|Retention-Epenthesis Dominant|   Selice Romani
|Retention Dominant|   Malayalam, Oroqen, Kannada
|Retention & Deletion Dominant|   Yaqui
|Epenthesis, Retention, & Metathesis Dominant|   Kanuri
|Neutral|   Iraqw
TABLE 19. Classification summary
Finally, we return to the S.R. languages, that is, the languages that were significant in the
direction of Retention as determined by step 1 of the typological classification. As discussed
above in the methodology section, the classification resulting from the step 1 analysis is
sufficient to classify these languages as |Retention Dominant|. However, a further analysis was
pursued that examined the subtype within the |Retention Dominant| category. In other words, the
data indicate that in the majority of cases, Retention is the way these languages handle sequences
of sounds in LWA. However, for a deeper understanding of LWA, the analysis of the |Retention
Dominant| subtype investigates what happens when these languages deviate from their
preference for Retention. That is, in the relatively few cases in which they “repair” syllable
structure in the SW, how commonly do they repair via Epenthesis versus Deletion, or by some
other method?
In order to investigate this, the Retention observations of these languages were excluded.
That is, Chi-Square tests were done comparing just their Epenthesis versus Deletion counts. For
Thai and Ceq Wong, Epenthesis versus Deletion versus Metathesis was also examined.
For 15 of the 29 Retention languages, the test was significant, with six in the direction of
Deletion, and nine in the direction of Epenthesis. For Ceq Wong, metathesis and Deletion were significant,
classifying that language as a Deletion-Metathesis subtype. All other languages – that is, the
languages for which the test was not significant – are classified as subtype neutral. These
subtypes are shown below in Table 20, and statistics are given in Appendix H.
It should be noted that while these |Retention Dominant| subtypes bear similar names to
the other established language types, they are indeed different¹⁶. For example, |Retention
Dominant; subtype Epenthesis| has a similar name to |Retention-Epenthesis Dominant|; the
only difference between the two names is the punctuation and the word “subtype” in the former.
For the |Retention Dominant; subtype Epenthesis| category, these languages prefer Retention to
any other form of adaptation, but when they do indeed alter the SW to form a LW, they prefer
Epenthesis to Deletion at a significantly higher level. That is, Retention occurs in most cases; in
the minority of cases where Retention does not occur, Epenthesis occurs more frequently than
Deletion. For |Retention-Epenthesis Dominant| languages, Retention or Epenthesis occurs in
most cases.
The differences between these two categories may further be shown by thinking of the
types/categories as predictors of future behavior. Consider speakers of two hypothetical
languages A and B, adapting a set of words containing consonant clusters. Language A is
classified as |Retention Dominant; subtype Epenthesis|, and Language B is classified as
|Retention-Epenthesis Dominant|. For each word in the set, if we predict that the speaker of
Language A will adapt a SW containing a consonant cluster faithfully (i.e. Retention), our
prediction would turn out to be correct in nearly all cases. Conversely, if we make the same
prediction for a speaker of Language B, our prediction would be correct in most cases; for a
|Retention-Epenthesis Dominant| language, Retention happens most of the time, but Epenthesis
happens at a statistically significant rate.
Further, although these two categories are descriptively similar, they are distinguished in
the classification method employed here. Consider two languages in the corpus, Irish and Selice
Romani. Irish is classified as |Retention Dominant; subtype Epenthesis|, and Selice Romani is
classified as |Retention-Epenthesis Dominant|. Irish has 122 observations of Retention, thirteen
observations of Epenthesis, and two observations of Deletion. Selice Romani has 108
observations of Retention, 65 observations of Epenthesis, and eighteen observations of Deletion.
Per the rationale above, the first step in the classification is to determine if Retention is the
dominant pattern. This is done by comparing Retention versus adaptation (Epenthesis plus
Deletion). For Irish, the first step compares the values 122 versus 15, which is statistically
significant (p < .0001, χ² = 49.303), immediately classifying Irish as |Retention Dominant|. In the
15 cases where an adaptation occurred for Irish, 13 were observations of Epenthesis versus only
2 for Deletion, statistically significant in the direction of Epenthesis (p = .016, χ² = 5.822), giving
Irish the “subtype Epenthesis” label in its classification of |Retention Dominant; subtype
Epenthesis|.
16
This is true for other types and subtypes as well, e.g. |Retention Dominant; subtype Deletion| and |Retention-
Deletion Dominant|.
For Selice Romani, 108 is compared to 83 (65 observations of Epenthesis plus eighteen
observations of Deletion), which is not significant (p = .2, χ² = 1.643). Comparing Selice
Romani’s three values of 108, 65, and 18 is significant (p < .0001, χ² = 36.999), indicating that
Selice Romani is not neutral in terms of its adaptation preference (and is thus not a |Neutral|
language). Comparing Retention (108) versus Epenthesis (65) is significant (p = .02, χ² = 5.428),
and so is comparing Epenthesis (65) versus Deletion (18) (p = .0014, χ² = 14.467), indicating a
primary bias for Retention and a secondary bias for Epenthesis, which makes Selice Romani a
|Retention-Epenthesis Dominant| language.
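The sequence of tests for Selice Romani can be written out as a short Python sketch. The exact test setup is not spelled out in this passage, so the form of the statistic is an assumption: the observed counts are set against an even split and no continuity correction is applied. That choice reproduces the four χ² values quoted for Selice Romani; the helper name and the tabled critical values are supplied for the sketch.

```python
def uniform_chi2(counts):
    """Chi-square statistic for observed counts against an even split
    (assumed form: sum((o - e)^2 / (o + e)))."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / (o + expected) for o in counts)


# Chi-square critical values at alpha = .05, keyed by degrees of freedom.
CRITICAL = {1: 3.841, 2: 5.991}

retention, epenthesis, deletion = 108, 65, 18

tests = {
    "Retention vs adaptation": [retention, epenthesis + deletion],
    "Retention vs Epenthesis vs Deletion": [retention, epenthesis, deletion],
    "Retention vs Epenthesis": [retention, epenthesis],
    "Epenthesis vs Deletion": [epenthesis, deletion],
}

results = {}
for name, counts in tests.items():
    stat = uniform_chi2(counts)
    significant = stat > CRITICAL[len(counts) - 1]
    results[name] = (stat, significant)
    print(f"{name}: chi2 = {stat:.3f}, significant = {significant}")
```

Only the first comparison fails to reach significance, which is why Selice Romani passes from category N.S. in step 1 to |Retention-Epenthesis Dominant| in step 2.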
Although the |Retention Dominant| subtypes examine a minority of the data, and often
a very small minority (as in Irish), it is argued that such classifications are nonetheless
meaningful. As the goal of this study is partially to examine the commonality of processes in
LW, it is informative that when a language with permissive syllable structure that tends to
adapt foreign sound sequences faithfully, such as Irish, does indeed “repair” a LW, the chosen
repair is Epenthesis rather than Deletion. In fact, languages that pattern this way are more
common than |Retention Dominant| languages that prefer Deletion over Epenthesis. More
significantly, this mirrors the trends of the adapting languages: Epenthesis is common, and
Deletion is rare. Such patterns are discussed more fully in the subsequent sections.
Subtype   Languages
Retention Dominant; subtype Neutral   Archi, Armenian, Azerbaijani, Dutch, Imbabura Quechua, Javanese, Kannada, Ket, Mapudungan, Oroqen, Tarifyt Berber, Thai, Turkish, Welsh, Malayalam
Retention Dominant; subtype Epenthesis   Indonesian, Irish, Macedonian, Malay, Malayalam, Maltese, Manange, Romanian, Sakha
Retention Dominant; subtype Deletion   English, Gurindji, Kildin Saami, Lower Serbian, Seychelles Creole
Retention Dominant; subtype Deletion-Metathesis   Ceq Wong
TABLE 20. |Retention Dominant| Subtype Classification
The classification is complete: All 52 languages have been classified according to the
method developed in this chapter¹⁷. In the following section, all of the above pieces are
assembled into a typology of language process-preference, the main goal and focus of analysis of
this chapter.
2.4 TYPOLOGICAL TRENDS. This section provides various ways of visualizing the
typological trends revealed in the previous section. The typology of possible languages that was
given above in Table 13 is filled out, with noticeable gaps pointed out. In order to explain these
gaps, the typology of languages is condensed into a typology of types, or “second order”
typology, further demonstrating a clear avoidance of deletion as a repair in LWA of syllable
structure, and a preference for epenthesis and faithful adaptation (i.e. Retention). Further, it is
argued that these patterns and preferences are best viewed as a function of the amount of
phonological material in the SW that is retained in the LW, and that adding material to the SW
by way of epenthesis is relatively benign compared to subtracting phonological material from the
SW by way of deletion. This way of thinking about the typological distribution points to a
mandate to preserve phonological information, a mandate that is central to the LWA process.
But before any conclusions are drawn, potential limitations and problematic issues of the
typology investigated in this chapter are discussed.
17
Recall that Hup was excluded.
2.4.1 THE TYPOLOGY OF LANGUAGE PROCESS PREFERENCE. The table below gives the
complete and final typology of the languages investigated in this study. Kanuri was placed in
with the |Epenthesis & Retention Dominant| languages, denoted with an asterisk, although it was
originally classified as |Epenthesis, Retention, and Metathesis Dominant|. This was done for
parsimony and is based on the similarity between Metathesis and Retention. For Retention,
nothing is added to or subtracted from the source word (SW); for Metathesis, nothing is added or
subtracted either, and only the order of segments is switched. The word “Dominant” has been
omitted from the combined categories; for example, |Retention-Epenthesis Dominant| is simply
listed as |Retention-Epenthesis| in the table. The two labels refer to the same type.
Type   #   Languages
Epenthesis Dominant   8   Hausa, Hawaiian, Insindebele, Japanese, Kali’na, Malagasy, Shona, Swahili
Epenthesis-Retention   0
Epenthesis-Deletion   0
Epenthesis & Retention   10   Basque, Cebuano, Finnish, Georgian, Gujarati, Korean, Marathi, Gawwada, Tamil, Kanuri*
Epenthesis & Deletion   0
Deletion Dominant   1   Saramaccan
Deletion-Epenthesis   0
Deletion-Retention   0
Deletion & Retention   1   Yaqui
Neutral   1   Iraqw
Adapting, Process Neutral   1   Bezhta
Retention-Epenthesis   1   Selice Romani
Retention-Deletion   0
Retention Dominant   29
  Subtype Neutral   14   Archi, Armenian, Azerbaijani, Dutch, Imbabura Quechua, Javanese, Kannada, Ket, Mapudungan, Oroqen, Tarifyt Berber, Thai, Turkish, Welsh
  Subtype Epenthesis   9   Indonesian, Irish, Macedonian, Malay, Malayalam, Maltese, Manange, Romanian, Sakha
  Subtype Deletion   5   English, Gurindji, Kildin Saami, Lower Serbian, Seychelles Creole
  Subtype Deletion-Metathesis   1   Ceq Wong
TABLE 21. Typological results
There are a few things to take note of in the final typology. First, there are a number of
gaps, that is, possible types that are not represented by any language: six to be precise. This is
not necessarily problematic, as the set of possible grammars in linguistic theory is larger than the
set of grammars that actually occur. Secondly, languages are not evenly distributed among the
types that do have representative languages. Five types are only represented by one language,
and three types are represented by the remainder of the languages. That is, 47 of the 52
languages fit into just three categories: |Retention Dominant|, |Epenthesis & Retention
Dominant|, and |Epenthesis Dominant|, and more than half are classified as |Retention
Dominant|. Just one language has a strong bias for Deletion, being classified as |Deletion
Dominant|: Saramaccan¹⁸. Third, the distribution of language types – that is, the typology of
language process preference – directly mirrors the typology of process, given above in §2.2.
Non-Adaptation/Retention is by far the most common, followed by Epenthesis, followed by
Deletion, both in terms of the processes that occur cross-linguistically, and the processes that
occur within languages.
The main goal of this chapter, and of the dissertation in general, is not just to provide a
typology of the LWA of sound sequences, but to explain the patterns in the typology. That is,
why are Retention and Epenthesis so strongly preferred, and why is Deletion so rare?
To further investigate the distribution of types, a “second order” typology can be
constructed based on the idea of how likely a language is to repair a foreign sequence of sounds
(i.e. consonant clusters and codas), and how strong its preference for repair strategy is. I
propose that this “super-typology” has just eight types, written in the text with double vertical
lines “||” to distinguish them from the types previously established. These are given below, along
with their definition, and which types they supersede. The acronym “FSS” is used as shorthand
for “Foreign Sound Sequences.” Ceq Wong, the |Retention Dominant, subtype Deletion-
Metathesis| language is included with the |Retention Dominant, subtype Deletion| languages.
18
Saramaccan is a creole language. Its “loanwords” may not actually be true loanwords; it seems within the realm
of possibility that phonological changes from an established language to a pidgin to a creole are subject to different
pressures than the LWA process.
Super-type: ||Epenthesis Default||
  Definition: A language repairs FSS with Epenthesis
  Subtypes: |Epenthesis Dominant|
  Count, Languages: 8: Hausa, Hawaiian, Insindebele, Japanese, Kali’na, Malagasy, Shona, Swahili
Super-type: ||Epenthesis Strong||
  Definition: A language both repairs and retains FSS; Epenthesis is the preferred repair
  Subtypes: |Epenthesis-Retention|, |Epenthesis & Retention|
  Count, Languages: 11: Selice Romani, Basque, Cebuano, Finnish, Georgian, Gujarati, Korean, Marathi, Gawwada, Tamil, Kanuri*
Super-type: ||Epenthesis Weak||
  Definition: A language mostly retains FSS; when a repair happens, Epenthesis is used
  Subtypes: |Retention Dominant; subtype Epenthesis|
  Count, Languages: 9: Indonesian, Irish, Macedonian, Malay, Malayalam, Maltese, Manange, Romanian, Sakha
Super-type: ||Deletion Default||
  Definition: A language repairs FSS with Deletion
  Subtypes: |Deletion Dominant|
  Count, Languages: 1: Saramaccan
Super-type: ||Deletion Strong||
  Definition: A language both repairs and retains foreign syllable structure; Deletion is the preferred repair
  Subtypes: |Deletion-Retention|, |Deletion & Retention|
  Count, Languages: 1: Yaqui
Super-type: ||Deletion Weak||
  Definition: A language mostly retains FSS; when a repair happens, Deletion is the preferred repair
  Subtypes: |Retention Dominant; subtype Deletion|
  Count, Languages: 6: English, Gurindji, Kildin Saami, Lower Serbian, Seychelles Creole, Ceq Wong
Super-type: ||Non-Adapting||
  Definition: A language retains FSS in nearly all cases
  Subtypes: |Retention Dominant; subtype Neutral|
  Count, Languages: 14: Archi, Armenian, Azerbaijani, Dutch, Imbabura Quechua, Javanese, Kannada, Ket, Mapudungan, Oroqen, Tarifyt Berber, Thai, Turkish, Welsh
Super-type: ||Neutral||
  Definition: A language displays no preference for how it adapts FSS
  Subtypes: |Neutral|, |Adapting, Process Neutral|
  Count, Languages: 2: Iraqw, Bezhta
TABLE 22. A Typology of Types
These super-types are again skewed in the same direction, displaying the same pattern
seen throughout this chapter, except that Epenthesis has a greater representation. Fourteen
languages are ||Non-Adapting||, two languages are ||Neutral||, 28 are of the Epenthesis type, and
eight are of the Deletion type. Again, Deletion is strongly avoided as an option for repairing
syllable structure. The remainder of this chapter, and the next, seeks to explain why this is.
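The totals per family of super-types can be checked by summing the counts listed in Table 22. The Python below is a bookkeeping sketch only; the dictionary layout and the helper name `family_total` are illustrative choices, while the counts are copied from the table.

```python
# Language counts per super-type, as listed in Table 22.
SUPER_TYPES = {
    "Epenthesis Default": 8,
    "Epenthesis Strong": 11,
    "Epenthesis Weak": 9,
    "Deletion Default": 1,
    "Deletion Strong": 1,
    "Deletion Weak": 6,
    "Non-Adapting": 14,
    "Neutral": 2,
}


def family_total(prefix):
    """Sum the counts of all super-types whose name starts with prefix."""
    return sum(n for name, n in SUPER_TYPES.items() if name.startswith(prefix))


epenthesis_total = family_total("Epenthesis")
deletion_total = family_total("Deletion")
grand_total = sum(SUPER_TYPES.values())

print("Epenthesis types:", epenthesis_total)
print("Deletion types:", deletion_total)
print("All languages:", grand_total)
```

The grand total matches the 52 classified languages, with Hup excluded.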
2.4.2 THE TYPOLOGY AS A FUNCTION OF PHONOLOGICAL MATERIAL. In order to get at
this question, it is useful to think of the typology not as a typology of processes, but as a
typology of the amount of phonological material in the LW. When a language adapts a source
word (SW) by using Epenthesis, all of the phonological material in the SW is contained in the
LW, plus additional phonological material, that is, the epenthetic vowel. Likewise, when a
language adapts a SW faithfully, all of the phonological material in the SW remains in the LW.
However, with Deletion, there is less phonological material in the LW than there is in the SW. The
categories in the typology, and in the super-typology given in Table 22, can be thought of as a
function of the amount of phonological material in the LW. For example, if a language is of the
type ||Epenthesis Default|| or ||Epenthesis Strong||, its loanwords will have a relatively large
amount of phonological material as compared to the source word, and the ||Deletion|| side of the
spectrum will have less. Consider Figure 5 below, which organizes the super-types along a
spectrum representing the amount of phonological material in LWs.
FIGURE 5. Phonological material spectrum
The more leftward a language is on this spectrum, the more likely it is that any given LW in the
language will have more phonological material than the SW; the more rightward a language
is, the more likely it is that a LW will have less. Clearly, the number of languages is skewed to
the left, and this distribution is statistically significant (p = 0.03011, χ² = 13.95777). Figures 6
and 7 show this distribution as a bar graph and as a function, demonstrating a correlation
between phonological material and number of languages: languages with relatively less
phonological material in the LW are relatively fewer in number.
FIGURE 6. Bar graph showing the distribution of language super-types
FIGURE 7. Distribution plotted as a function
The function in Figure 7 nearly looks like a standard distribution – except that the right
half drops off. Most importantly, the point at which the plot drops off is the same point at which
LWs lose information contained in the SW, by way of repairing via deletion. The LWA process
thus seems to be driven by a mandate to preserve information in the source word. Languages
that do exactly that are the most common: the plot peaks with the number of languages that
adapt syllable structure in foreign words faithfully. Languages that preserve all of the
phonological material in the SW, as well as adding new material in the LW, i.e. Epenthesis-
languages, are tolerated. Languages that do not preserve information in the SW – that is,
languages that employ Deletion – are exceedingly rare.
There are two possible ways to think of this pattern: 1) Avoidance of Deletion, and 2)
Preference for Epenthesis and Non-Adaptation. Although both are descriptively accurate, I
contend that the second one is more insightful. Consider what Epenthesis and Retention/Non-Adaptation
have in common: in both phenomena, all of the segments in the Source
Word (SW) are preserved in the Loanword (LW). With Deletion, segments are lost. If we
further consider Epenthesis and Retention to be one and the same thing, that is, the same
category of phenomenon, then the typology becomes remarkably robust. Call this
Epenthesis+Retention category “segment preserving,” and set it against the Deletion
languages, the “non segment preserving” category. In the typology of 52 languages (recall that Hup
was excluded), 48 languages are “segment preserving,” and just two languages are “non segment
preserving.” The Chi-Square test indicates that this distribution is not random (p = 2.21 × 10⁻⁷,
χ² = 26.839). Even if the subtypes within the |Retention Dominant| category are considered, with
the deletion subtypes considered “non segment preserving,” the distribution is nonetheless
robust, with 42 languages considered “segment preserving” and 8 considered “non segment
preserving” (p = 2.99 × 10⁻⁴, χ² = 13.071). Clearly there is a bias to preserve segments.
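As a check on the arithmetic, the reported p-values can be recomputed from the reported χ² statistics. The sketch below is illustrative and not part of the original analysis. It assumes one degree of freedom for the two “segment preserving” comparisons and six degrees of freedom for the spectrum distribution reported earlier in this section; the degrees of freedom are not stated in the text, and the function name chi2_sf is hypothetical.

```python
import math

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square statistic.

    Only df == 1 and even df are handled, because both have simple
    closed forms; this is a sketch, not a general implementation.
    """
    half = stat / 2.0
    if df == 1:
        # For one degree of freedom the tail is a complementary error function.
        return math.erfc(math.sqrt(half))
    if df > 0 and df % 2 == 0:
        # For even df the tail is exp(-x/2) times a finite power series in x/2.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch handles only df == 1 or even df")

# Statistics as reported in the text; the df values are assumptions.
print(chi2_sf(26.839, 1))    # 48 vs. 2 languages
print(chi2_sf(13.071, 1))    # 42 vs. 8 languages
print(chi2_sf(13.95777, 6))  # phonological material spectrum
```

Under these assumed degrees of freedom, the computed values agree closely with the p-values reported in the text.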
Yet, before any strong claims can be made regarding the nature of LWA, the limitations
of the typology investigated in this study need to be addressed.
2.5 LIMITATIONS OF THE TYPOLOGICAL INVESTIGATION. The main typology derived and
investigated in this chapter – that is, the typology of language process preference – suggests
that some common mechanism is responsible for the LWA process, a proposition that would explain
the distribution and patterns of the languages observed thus far. The results of the typological
investigation show that Retention is the preferred way of handling words in a foreign language
that contain marked syllable structure. Of the 53 languages investigated in this study, 29 of them
are classified as |Retention Dominant|, meaning that a statistically significant majority of the LW
observations in these languages are LWs that faithfully adapt the syllable structure of the SW.
Further, 41 of the 53 languages show some form of Retention; on top of the 29 that are
|Retention Dominant|, one language shows a primary bias for Retention and a secondary bias for
Epenthesis (Selice Romani), one language shows a bias for Retention and Deletion (Yaqui), and
ten others show a bias for both Retention and Epenthesis (i.e. those classified as |Epenthesis &
Retention Dominant|). The same bias for Retention is likewise seen in the typology of process,
independent of languages, described in §2.2, where both the raw process counts and the
normalized/scaled process counts show a strong tendency to faithfully adapt foreign sound
sequences. These results could be considered to support the proposition that there is some
universal, cross-linguistic grammatical mechanism underpinning LWA that demands faithful
adaptation (i.e. Retention) of consonant clusters and coda consonants in foreign words.
However, such a conclusion – or any similar conclusion – is weak at best. This is
because what could be a key component in determining the shape of the LWA typology was not
addressed. This key component is the properties of the grammars of the languages examined. In
other words, the explanatory power of the analysis is limited in that the phonotactic allowances
of the native grammars of the languages investigated in the study could – and most likely do, to
some extent – play a crucial role in determining how speakers of these languages handle syllable
structure in foreign words.
Although this is problematic across the board, it is most problematic for the classification
of the |Retention Dominant| languages. It could be that the languages that are |Retention
Dominant| in their LWA behavior are Retention Dominant specifically because the (native)
grammar permits marked syllable structure. This problem was alluded to in §2.1, where I
explained that I sought genetic and geographical diversity in the corpus in order to maximize the
probability that the languages analyzed showed diversity with respect to how they handle
syllable structure in their native words. I gave the example of a hypothetical corpus that
contained mainly European languages, suggesting that such a corpus of LWA in these languages
may not be reflective of a true typology of LWA, as European languages tend to be liberal in
terms of syllable structure, which may transfer to how the syllable structure of foreign words is
borrowed into these languages. This prediction was largely borne out: nine European languages
were examined in the corpus, and eight of them are classified |Retention Dominant|, with the sole
exception being Selice Romani, which is classified as |Retention-Epenthesis Dominant|.
The Epenthesis-preferring languages are also subject to the same uncontrolled factor.
Among the |Epenthesis Dominant| languages are Hawaiian and Japanese, both languages that are
well known to have restrictive syllable structure. In other words, speakers of Hawaiian, Japanese,
and similar languages may not show much Retention when adapting foreign sound sequences
because they cannot pronounce marked syllable structure, as consonant clusters are prohibited in
their native language. Nevertheless, the preference for Epenthesis as opposed to Deletion is not
as readily explained, and is potentially meaningful for the study of LWA. Yet in all, the scope of
the explanatory power of the typological study is limited, as the properties of the native
grammars of the languages in the corpus were not a dimension included in the analysis.
2.6 AN ANALYSIS OF CONFORMITY TO THE NATIVE PHONOLOGY. In consideration of the
problems explained above, an additional analysis was conducted that included the properties of
the native phonology of the languages in the cross-linguistic corpus. That is, the extent to which
the LW observations in each language conform to the phonotactic properties of native (non-
borrowed) phonology was investigated. The findings of this investigation both address and
partially remedy the issues explained in the previous section, in addition to providing insightful
data regarding the nature of LWA. This study is explicated below.
2.6.1 DATA AND METHODOLOGY. Properties of the native phonology of the languages
were not examined up front in this study mainly because many of these languages lack
published, complete phonological descriptions. However, relevant grammatical properties can
nonetheless be adduced obliquely by examining surface forms of the languages. For example, if
a language contains consonant clusters in the surface forms (i.e. published vocabulary), then
obviously the language permits consonant clusters. Conversely, if a language lacks consonant
clusters in the vocabulary items examined, then it may be assumed with limited certainty that the
grammar of that language prohibits consonant clusters. For the less well-studied languages in
the corpus, words in the native vocabulary were collected and examined for the presence and
absence of four different aspects of syllable structure: word-initial consonant clusters, word-
medial consonant-consonant sequences, word-final consonant clusters, and word-final codas.
Data for this analysis were collected in a similar way to the LW data, as described in §2.1
above. For the languages in the WOLD database, the words that were marked as native
vocabulary items were compiled as a list to be used for the analysis. For the languages for which
Google Translate was available, Chapter 1 of the novel Moby Dick was translated by Google
Translate, resulting in a set of words that was used for the analysis. This text was chosen
because the novel is freely available online and contains a large and varied vocabulary. The
languages for which the WOLD method and the Google Translate method were used are
listed in Appendix A. For Shona and Insindebele, online dictionaries were scanned for words
containing the relevant properties of analysis. Malayalam was not included in this analysis due
to lack of availability of materials demonstrating native words in this language.
A Python script was written that took as input a set of words for a given language, and output a
list of words containing the four properties of interest listed above (i.e. word position).¹⁹ Each
output file was manually checked for accuracy, ensuring that the words in the generated list were not
obviously loanwords, and accurately demonstrated the presence of consonant clusters in the
given language. For example, words that the script listed as containing consonant clusters but that did not
actually contain them, such as those with affricates spelled /ts/ in the vocabularies or final /ŋ/ spelled
as “ng”, were removed, so that reasonably accurate information regarding the presence or absence
of syllable structure in a given language was used.
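The kind of script described above can be sketched as follows. This is a hypothetical reconstruction, not the script actually used in the study: the vowel set, the function names, the property labels, and the optional digraph list are all assumptions made for illustration. Each word is reduced to a consonant/vowel skeleton based on its spelling, and the skeleton is then checked for the four properties of interest.

```python
import re

# Assumption: a single orthographic vowel set; real data would need one per language.
VOWELS = "aeiou"

def cv_skeleton(word, vowels=VOWELS, digraphs=()):
    """Reduce a spelled word to a string of C (consonant) and V (vowel) symbols."""
    w = word.lower()
    # Treat each listed digraph (e.g. "ng") as one consonant by collapsing it to "#".
    for d in digraphs:
        w = w.replace(d, "#")
    return "".join(
        "V" if ch in vowels else "C"
        for ch in w
        if ch.isalpha() or ch == "#"
    )

def syllable_properties(word, vowels=VOWELS, digraphs=()):
    """Return which of the four syllable-structure properties a word shows."""
    cv = cv_skeleton(word, vowels, digraphs)
    found = set()
    if cv.startswith("CC"):
        found.add("initial_cluster")   # word-initial consonant cluster
    if re.search(r"VC{2,}V", cv):
        found.add("medial_cc")         # word-medial consonant-consonant sequence
    if cv.endswith("CC"):
        found.add("final_cluster")     # word-final consonant cluster
    if cv.endswith("C"):
        found.add("final_coda")        # word-final coda
    return found

def scan(words, **kwargs):
    """Group words by the properties they show, for later manual checking."""
    report = {}
    for w in words:
        for prop in syllable_properties(w, **kwargs):
            report.setdefault(prop, []).append(w)
    return report

# Spellings taken from examples in this chapter.
print(scan(["trapo", "eskatzen", "tost"]))
```

A purely orthographic scan of this kind over-counts clusters wherever one sound is spelled with two letters, which is why the manual check described above was needed. Passing, for example, digraphs=("ng",) makes the sketch treat a final “ng” as a single coda consonant rather than as a cluster.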
The presence and/or absence of consonant clusters (or coda consonants) in the list of
native words was compared to the presence and/or absence of consonant clusters in the LW
corpus. The full extent to which the LW data mirrored the native word data is beyond the scope
of this section. However, four relevant findings and data are presented below.
2.6.2 MAIN FINDINGS OF THE CONFORMITY ANALYSIS. Comparing the LW data against
native words for each language resulted in four findings that are relevant for addressing the
issues explained above, as well as for the typology of LWA of syllable structure in general.
1) The proposition that |Retention Dominant| languages are Retention Dominant due to
the phonotactic properties of the L1 is mostly but not entirely confirmed. Twelve of the
¹⁹ Special thanks to Ben Herzberg for assistance with this.
languages classified as |Retention Dominant| have multiple examples of native words containing
consonant clusters in word-initial, word-medial, and word-final position, as well as word final
codas. This is true for Archi, Armenian, Azerbaijani, Dutch, English, Kannada, Lower Serbian,
Macedonian, Romanian, Thai, Turkish, and Welsh. This indicates that the limitations of the LW
typological analysis described above are applicable for these languages. However, this is not
true for all languages, as demonstrated by other relevant findings.
2) The tolerance of consonant clusters in the L1 phonology is not necessarily predictive
that the language is |Retention Dominant| in the LW phonology. Kanuri, Basque, Bezhta, and
Iraqw all tolerate consonant clusters as well as coda consonants, but repair LWs in the majority
of cases, and in all positions. Examples from Basque are given in 14.
(14) Basque Vocabulary
a. Word-initial consonant clusters
i. Native word: [greziaren] ‘Greeks’
ii. Native Word: [trapo] ‘clothes’
iii. Loanword: ‘Cross/crucifix’ → [gurutze]
b. Word-final codas
i. Native word: [eskatzen] ‘required’
ii. Loanword: “electron” → [elektroi]
iii. Loanword: “penicillin” → [penizilina]
Native Basque vocabulary contains many words ending in [n], e.g. 14b.i, yet the same
sound is sometimes repaired in LWs with Deletion (14b.ii) and Epenthesis (14b.iii). Likewise,
stop+r sequences appear in native Basque words (14a.i, 14a.ii), but similar sequences are
repaired in LWs (14a.iii). If the properties of the native phonology (i.e. tolerance of consonant
clusters) necessarily inform how LWs are adapted, then these languages should not behave the
way they do in LWA – they should be classified as |Retention Dominant|, but are not.²⁰
3) Conversely, the classification of a language as |Retention Dominant| is not necessarily
predictive of the properties of its native phonology. Sakha and Imbabura Quechua contain no
evidence of consonant clusters (i.e. there were no consonant clusters in the words examined in
²⁰ Granted, this analysis did not control for the consonants that were involved. That is, having consonant clusters
(or codas) in the native vocabulary was treated as binary, where a gradient description might be more accurate.
Some languages may permit only some types of consonant clusters in the native vocabulary, which would require
them to repair other types in LWs. However, parallel cases such as Basque final /n/ (14b) indeed exist.
this analysis), yet tend to adapt consonant clusters in LWs faithfully, classifying them as
|Retention Dominant|. This suggests that LWA processes can sometimes be lexically non-conforming,
retaining consonant clusters in LWs even though such sequences are absent from
words in the native lexicon. This is independently confirmed for Swahili by Mwita (2009),
who claims that native Kiswahili words prohibit consonant clusters, but consonant clusters are
tolerated in borrowings. For example, the English word “skirt” is adapted as [sketi] in Swahili,
whereas the language contains no native examples of [sk] onset sequences.
4) Fifteen |Retention Dominant| languages show only partial conformity to the native phonology.
Javanese, Ket, Oroqen, Tarifiyt Berber, Indonesian, Irish, Malay, Maltese, Manange,
Mapudungan, Gurindji, Kildin Saami, Lower Serbian, Seychelles Creole, and Ceq Wong all have
examples of consonant clusters in native words, but repair similar or identical consonant clusters
in a significant number of LWs. For example, Tarifiyt Berber allows word-initial consonant clusters,
but in the LW corpus, the language has 48 observations of Retention, and 60 observations of
adaptation, including Deletion, shown in 14. Another example is provided by Irish, which
allows word-final consonant clusters in native words, but in the LW corpus, Irish has 14
observations of Retention, and 12 observations of adaptation, including Epenthesis, one example
for which is given in 15.
(14) Tarifiyt Berber vocabulary
a. Native word: [fri] ‘to cut’
b. Loanword: [frəq] → [faq] ‘to separate’ From Moroccan Arabic
(15) Irish vocabulary
a. Native word: [teaʃt] ‘to come’
b. Loanword: [tost] → [tosta] ‘toast’ From English
These four findings show that the situation involving the |Retention Dominant| languages
is more complex than simply being cases of a language handling foreign sound sequences in the
same way it handles syllable structure in the native grammar. Although this is the case for many
of the |Retention Dominant| languages, it is not the case for all. Additionally, some of the
|Retention Dominant| languages conform to the properties of the native grammar in certain
positions in LWs, but show a preference for adaptation in others.
The short analysis of conformity to native phonotactic requirements done here does not
completely ameliorate the limitations of the typological analysis presented in this chapter, but it does
partially address the issue. The classification of the |Retention Dominant| languages is
not trivial. More broadly, the findings of this analysis point to the idea that LWA is not merely a
reflex of the native grammar, as a number of languages allow consonant clusters and codas in
their native words, yet tend to repair analogous syllable structures in LWs. Likewise, languages
are sometimes non-conforming in the opposite direction, allowing consonant clusters and coda
consonants in LWs, whereas such sequences are prohibited in native words. It is argued that this
non-conformity is related to the trends in the typology discussed in the analysis above, which indicate
the primacy of information preservation in LWA. This connection will be made more clear in
the following chapter, which more fully discusses the primacy of perceptual cues in LWA. But
first, the idea that LWA is governed by the primacy of perceptual cues is introduced in the
following section.
2.7 CONCLUSION: A CUE-BASED APPROACH. In this chapter, a strong preference against
deletion, and for epenthesis and non-adaptation was observed in many ways. The earlier part of
the chapter, which gave a process-focused analysis, showed this trend. For raw counts and
normalized counts, Retention is by far the most common phenomenon observed in LWA, and
Deletion is by far the least common, with Epenthesis somewhere in the middle.
The language-focused analysis – that is, the typology of language process preference –
likewise follows this same pattern, where Epenthesis and Retention are preferred to Deletion.
Typologically, epenthesis-preferring languages are more common than deletion-preferring
languages. And when languages are classified by how much information they preserve in the
loanword, the pattern is even stronger. “Segment-Preserving” languages far outnumber “Non-
Segment preserving” languages. These findings can be explained by positing that speakers tend
to behave in certain ways when adapting a foreign word into their native language. Although
there are exceptions (and a couple of exceptional languages, i.e. Yaqui and Saramaccan), the
general tendency in LWA is to proceed in the manner described in 16:
(16) The LWA Procedure
a. Adapt a loanword as faithfully as possible.
b. When change is made, use epenthesis.
The explanation for why speakers tend to proceed this way is based on the imperative to
preserve information in the SW when adapting it as a LW. In preceding sections, information
preservation was posited to be the force that explained the observed patterns. Deletion is rare
because information is lost when this repair is used, and Epenthesis and Retention are common
because information is not lost: the information in the SW is preserved in the LW. The question
that follows, then, is what this information is.
When a word is uttered, an acoustic signal is produced. This acoustic signal has various
properties that the listener uses to decipher the signal into meaningful language. In other words,
certain properties of the acoustic signal convey information as to the type of speech sounds
uttered. For example, voice onset time (VOT) is a perceptual cue that speakers may use to
distinguish voicing contrasts in stop consonants. The lag between the time at which a bilabial
stop is released and the vocal folds start vibrating cues speakers as to whether the spoken sound
is a /b/ or a /p/ (Johnson 1997). Perceptual cues are the vessels of information.
If information preservation is the driving force that affects how speakers tend to adapt
LWs, and if perceptual cues are the locus of this information, then it follows that perceptual cues
are paramount in LWA. The claim regarding the nature of LWA is that speakers are not only
sensitive to the perceptual cues in a SW, but manipulate the SW in ways that increase the
strength, quality, and number of perceptual cues, thereby making the information in the SW
more perceptible.
Positing the primacy of perceptual cues is capable of explaining the patterns observed in
this chapter. Cross-linguistically, speakers tend to avoid deletion as a repair for foreign sound
sequences, as deleting a segment from the source word eliminates all of the perceptual cues in
the deleted segment. Likewise, when a loanword must be repaired, epenthesis is far more
common than deletion, as epenthesis preserves the perceptual cues in the SW, on top of having
the additional benefit of making segments in the SW more perceptible. For example, if a foreign
word with the shape CVCC is adapted using epenthesis, i.e. CVCəC, the perceptibility of the two
final consonants is enhanced; perceptual cues as to the manner and place of these consonants
increase in strength, quality, and number, thereby enhancing the perceptibility of the consonants.
This claim that the primacy of perceptual cues tends to inform LWA, thereby explaining the
patterns in this chapter, is further explicated and explored in the following chapters.
3 LOANWORD ADAPTATION AND THE PRIMACY OF PERCEPTUAL CUES. The previous
chapter examined the cross-linguistic typology of sound sequence adaptation in loanwords. Both
across and within languages, it was shown that a robust pattern exists for how languages adapt
foreign sequences of sounds. Retention and Epenthesis are preferred to Deletion. This chapter
explains the typological patterns and the robustness of the typological patterns by positing the
primacy of perceptual cues in the LWA process (Wright 2004). That is, perceptual cues and cue
robustness are proposed to be crucial for describing and explaining various observations
regarding LWA. The main claim is that speakers are highly sensitive to perceptual cues in a
foreign word, as cues are crucial to the perception and comprehension of a word being borrowed
into the native grammar; speakers thus borrow words from foreign languages in a manner that
preserves cues to phonological information, and adapt words in ways that increase the strength,
quality, and number of perceptual cues, in order to enhance the perceptibility of a LW. This
approach to LWA is called the Cue Hypothesis. The basic ideas behind the Cue Hypothesis have
connections to other research on phonological theory; such connections will be discussed in
Chapter 5, but here, the general descriptive approach is given.
Section §3.1 explains some of the foundations for the proposition of the primacy of
perceptual cues in LWA, including conceptual, historical, sociological, and linguistic factors
relevant for LWA. §3.2 then lays out the Cue Hypothesis, providing illustrations and explaining
why it is able to account for the observed cross-linguistic patterns in LWA. The Cue Hypothesis
makes specific predictions as to what patterns should occur in LWA. These predictions are laid
out in §3.3 and tested in §3.4. Section §3.5 provides an interpretation of the results and a
conclusion.
3.1 LOANWORDS AND BORROWING. The proposition that perceptual cues are essential in
describing and explaining the observed patterns of LWA is based on the fact that language is
a social and communicative phenomenon. The purpose of language/speech is for one speaker to
transmit a message to others. Essential for the successful transmission of messages is that the
speaker is understood. For comprehension to successfully occur, speech must be made
perceptible.
In speech, shortcuts and reduction are frequently used: words are omitted, truncated, and
otherwise altered. One need look no further than everyday English to observe this. The phrase
“it is” becomes “it’s,” “psychology” becomes “psych,” “… ten percent possible” [tɛn pɹsɛnt
pasɪbl̩] becomes [tɛm pᵊsɛʔ pasɪbl̩], and so on. Speakers automatically and unconsciously
minimize articulatory effort, a phenomenon ascribed to the cognitive force of ease-of-articulation
(Välimaa-Blum 2005).
Yet there is a limit to how much shorthand and how many shortcuts can be used in speech. If
the articulation of a message is reduced too much, then the transmission of the message may be
compromised: the listener may not perceive the message accurately, or at all.
Ease-of-articulation is kept in check by another force, ease-of-perception.²¹ Speech must be sufficiently
perceptible so that the message is understood. Speakers minimize articulatory effort in
conveying a message, but not to an extent that compromises the comprehension of the message.
Ease-of-articulation and ease-of-perception are the cognitive forces underpinning message
transmission between two or more speakers of the same language; one keeps the other in check.
However, the situation is slightly different when loanwords (LWs) are involved. By
definition, LWs are foreign words that are at some point not present in the lexicon of the
borrowing language (Lb); they are necessarily unfamiliar to speakers of the Lb at the point in
time when they are initially borrowed, and may contain unfamiliar sounds, and unfamiliar
sequences of sounds.
The claim put forth here is that, because of the unfamiliarity of the borrowed/foreign
word (or source word; SW), the weighting of ease-of-articulation with respect to ease-of-
perception changes. When speaking known/lexical words, speakers of the Lb are relatively free
to truncate and/or reduce such words; the chances of misperception are relatively low due to the
familiarity (or status in the lexicon). For a novel/foreign word, the chances of misperception are
relatively higher. For communication to succeed and the speaker to be understood,
the ease-of-perception force becomes stronger, constraining ease-of-articulation more than
usual. The unfamiliarity of the foreign word causes ease-of-perception to be weighted more
(with respect to ease-of-articulation) when used by speakers of the Lb. Because of this, ease-of-
articulation is weaker when it comes in conflict with ease-of-perception, in LWA.
Speech is perceived by interpreting perceptual cues contained within the acoustic signal.
The Cue Hypothesis then claims that speakers manipulate unfamiliar words (SWs) in order to
²¹ This may also (and more precisely) be explained by a force to maximize the distinctiveness of contrasts
(Flemming 1995, 2004).
enhance their perceptibility. They do this in two related but distinct ways. 1) Speakers preserve
perceptual cues in the SW, and 2) speakers alter the SW in ways that enhance the perceptibility
of the sounds in the SW, increasing perceptual cues in the SW in strength, quality, and number.
This claim is capable of explaining several phenomena observed in the previous chapter.
It explains why deletion is rare: deletion removes a segment from the SW, thereby removing its
cues, thereby removing information that could be important for comprehension. This is also why
Retention is so common, for inverse reasons.
The above proposition likewise explains the commonality of epenthesis: Epenthesis is
the favored repair not just because it preserves all of the segments in the SW, but also because it
enhances the perceptibility of the SW, thereby facilitating ease-of-perception. It does this by
increasing the strength, quality, and number of perceptual cues as to the place and manner of the
consonants involved. For example, consider a word-initial C₁C₂V sequence that is adapted as
C₁əC₂V. The epenthetic vowel adds CV formant transitions to C₁, and conversely VC formant
transitions to C₂. Formant transitions provide perceptual cues for the place and manner of
consonants, especially for obstruents (Wright 2004). In this situation, the perceptual cues for the
consonants are made stronger and increase in quality and number, via the (CV) formant
transition. Epenthesis therefore enhances the perceptibility of consonants.
A central claim that I make in this chapter is that the choice of repair, specifically
epenthesis, happens not only for reasons of minimizing articulatory difficulty for a speaker
pronouncing a foreign word. Normally, epenthesis is attributed to ease-of-articulation; here, I
claim that in LWA, it happens, at least some of the time, for ease-of-perception. When speakers
adapt a SW via epenthesis, they enhance the perceptibility of the consonants in the SW, thereby
increasing the probability of being understood. That is, I claim that epenthesis in loanwords does
not always happen to satisfy phonotactic constraints of the native Lb phonology; it sometimes
happens to enhance perceptibility via the manipulation of perceptual cues. In other words,
epenthesis in a LW is compelled by one (or both) of the following: 1) the phonotactic constraints
of the native Lb phonology (ease-of-articulation), 2) the necessity to enhance the perceptibility of
sounds in the SW (ease-of-perception).
A fundamental demonstration of this was shown in the previous chapter. Irish tolerates
consonant clusters in all positions, including word-final consonant clusters consisting of a
sibilant and an obstruent, for example [teaʃt] ‘to come’, [geist] ‘entailed.’ Yet in the LW
phonology, Retention and Epenthesis happen at similar rates. Most telling is the illustration of
the borrowing of the English word “toast” into Irish: [tost] → [tosta]. This suggests that
epenthesis in LWA sometimes happens for reasons other than the native grammar of the Lb. For
the adult Irish speaker, a word-final [Vst]# sequence is well-formed, yet the word “toast”
containing such a sequence has been “repaired” via epenthesis. A theory of LWA that posits that
epenthesis happens not only to conform to the phonotactic properties of the Lb, and/or for ease-
of-articulation, but in order to enhance perceptibility via increasing the strength, quality, and
number of perceptual cues, explains this phenomenon in Irish, as well as others.
It is important to note that the terms “epenthesis,” “deletion,” and “non-adaptation/retention,”
as used in this study to describe LWA, differ fundamentally from
how such terms are used in the analysis of native phonology. In native phonological analysis,
“epenthesis” describes an active/online grammatical process. Different from this is the term
“epenthesis” in LWA, which describes an observed difference between a lexical entry in the Lb
and a foreign word from which that lexical entry came.
Consider for example English plural morphology. The English plural morpheme /-z/ is
attached to singular nouns to make them plural: “dog” /dag/ + /-z/ → [dagz]. However, when a
word ends with a sibilant, an epenthetic vowel is inserted between the sibilant and the plural
morpheme: “dish” /dɪʃ/ + /-z/ → [dɪʃəz]. In most theories of phonology, this is an active
process done online²² when a speaker concatenates morphemes.
However, I argue that this is not necessarily true with loanword phonology. Consider
“epenthesis” in a Yoruba word for “brick,” borrowed from English: [bɹɪk] → [biriki] (Salami
1972). This sort of epenthesis (and epenthesis in LWs in general) is not an active, online process
in the way that epenthesis in the plural formation of English “dish” is. It could be that at one
point in the history of Yoruba, when Yoruba speakers were talking to English speakers and other
Yoruba speakers about bricks, it was. However in present-day Yoruba, it is lexicalized; the first
and third [i] in [biriki] are regular, lexical vowels, just like the second vowel is, and other [i]’s in
Yoruba. Although it is impossible to know the precise circumstances under which this word was
borrowed from English into Yoruba, the claim made here is that epenthesis in LWs is sometimes
(but not necessarily always) an epiphenomenon of hyper-articulation. In the example of [bɹɪk]
²² This approach to native phonology has empirical validity in the “wug” test (Berko 1958); English speakers
behave the same whether a word is real or nonce.
becoming [biriki], speakers of Yoruba were unfamiliar with this foreign (English) word. Per
above, enforcement of ease-of-perception becomes stronger due to the unfamiliarity of the word.
Hyper-articulation is one way of increasing ease-of-perception. Hyper-articulated, the gestures
involved in producing the [b] and the [r] are spread out in time, resulting in a vowel sound
between the two consonants.²³ Likewise, the hyper-articulation affects the [k], resulting in the
release of the word-final consonant, which is then lexicalized as a regular vowel once Yoruba
speakers are sufficiently familiar with the word. In this example, epenthesis into the
consonant cluster [br] of the English word does not necessarily occur because of phonotactic constraints of the
Lb; it happens due to hyper-articulation, which is caused by the need to increase the
perceptibility of a foreign word.²⁴ This claim and its implications are further developed and
explored in the following sections.
3.2 THE CUE HYPOTHESIS. As seen in the loanword data, both across and within
languages, a robust pattern exists for how languages adapt sequences of sounds in foreign words.
Except for two languages (Saramaccan and Yaqui), all languages showed a tendency to adapt
foreign sequences of sounds faithfully, and when faithful adaptation does not occur, epenthesis is
by far the preferred repair over deletion. Cross-linguistically, there seems to be a procedure that
speakers follow when adapting foreign words into their native language. This procedure can be
described by two statements:
(17) The LWA Procedure
a. Adapt a loanword as faithfully as possible.
b. When change is made, use epenthesis.
It is argued that this pattern, which adequately describes general observational trends in LWA, is
due to two well-formedness conditions that speakers tend to obey when borrowing a word in
23
In such situations, it may be more accurate to describe the vowel that is inserted in between consonants as an
excrescent vowel (Levin 1987), rather than an epenthetic vowel.
24
A situation that English speakers (or speakers of any other language) may experience in everyday life serves to
further exemplify this. Imagine you are speaking with someone about dinner plans at a restaurant called “Plate”
over a terrible cell phone connection. The hearer does not understand the speaker and expresses confusion as to the
name of the restaurant. So that the speaker is understood, s/he hyper-articulates the name of the restaurant, saying
each consonant slowly and clearly, spreading the gestures out in time, resulting in a pronunciation of the word that
could be transcribed as [pʰəlei…tʰə]. To facilitate communication, hyper-articulation occurs, resulting in the
preservation and enhancement of perceptual cues. The argument made here is that this is parallel to how LWA
proceeds in many, if not most cases.
from a foreign language²⁵. The first well-formedness condition is similar to a faithfulness
constraint to the phonological information in the SW. That is, perceptual cues to phonological
information in the SW must be preserved in the LW.
(18) Preserve Cue: Every perceptual cue in the source word must exist in the loanword.
This condition accounts for both parts of the LWA pattern above: for 17a, when a loanword is
adapted completely faithfully, every perceptual cue in the SW is present in the LW; all of the
information is preserved. Likewise, when a loanword employs epenthesis to repair foreign
sound sequences, all of the segments (thus information/cues) of the SW are preserved, and so
Preserve Cue (PC) is also satisfied: all of the information from the SW is there, but with the
addition of epenthetic vowels.
However, Preserve Cue alone fails to explain certain phenomena in LWA. Again we
return to the example from Irish. A faithful borrowing of the English word “toast” as *[tost] in
Irish satisfies the PC condition, as all of the cues in the SW exist in the LW. This phenomenon
of gratuitous²⁶ epenthesis is fairly common in LW data. Languages that allow (marked)
sequences of sounds/consonants in their native phonology nonetheless commonly repair similar
or identical sequences in LWs.
A robust case in point comes from Gawwada (Tosco 2009). Gawwada allows coda
consonants, and has plenty in its native vocabulary, shown in 19, but nonetheless repairs final
codas in LWs, shown in 20.
25
It should be noted that “well-formedness” here is not the same as it is in Optimality Theory, i.e. the antithesis of
Markedness. Here it means something akin to being felicitous, or being “a well formed loanword.”
26
“Gratuitous epenthesis” means epenthesis that is not driven by the Lb phonotactic constraints.
(19) Native Gawwada Words: Coda consonants retained (Tosco 2009)
a. [Îil] ‘to burn’
b. [hok] ‘to light’
c. [/imas] ‘to extinguish’
(20) Gawwada Loanwords: Coda Consonants avoided (All examples borrowed from
Amharic; Tosco 2009)
a. [kinin] → [kinine] ‘the medicine’²⁷
b. [kəprit] → [kipra:te] ‘the match’
c. [kis] → [ki:se] ‘the pocket’
This cannot be explained by a Coda Condition: sonorants (a), stops (b), and fricatives (c)
all show this native word-loanword asymmetry. In fact, for the 32 loanwords borrowed from a
SW containing a coda, 31 show epenthesis, one shows deletion, and none show Retention. If one
only had access to these data for Gawwada, one would, perhaps justifiably so, incorrectly predict
that Gawwada disallows coda consonants in native words.
This phenomenon is not limited to codas. The “repair when you don’t have to”
phenomenon happens frequently for consonant clusters in many languages as well, occurring in
LWs for a language that permits analogous consonant clusters in native words. In the corpus for
Irish, 11 loanwords with a final consonant cluster show epenthesis, 14 show retention, and 1
shows deletion.
Georgian provides another example of frequent epenthesis involving consonant clusters
in LWs, where identical environments in native words permit such consonant clusters. Georgian
allows consonant clusters of all types in native words, such as [kedavt] ‘see’ and [tsdilobs] ‘trying,’
yet repairs final consonant clusters in loanwords, as in [pæɹədaks] → [paradoksi] ‘paradox,’ and,
perhaps most tellingly, does so at an astonishingly high rate. All 15 observations of adaptation
of final consonant clusters in Georgian were observations of epenthesis; no word-final consonant
clusters were retained faithfully. Georgian speakers can pronounce word-final obstruent-
obstruent sequences, as evidenced by their presence in native Georgian words, yet they repair
similar sequences in LWs by epenthesis. Identical examples can be found in a wide spectrum of
languages. Epenthesis happens to sequences of sounds not due to the native phonotactic properties
of the Lb; something else compels epenthesis.
27
An Amharic LW from English: “quinine.”
It is argued here that such cases of epenthesis are compelled not by the native phonology
of the Lb, but by a need to strengthen the weak cues of the consonants in the SW. Epenthesis adds
cues to consonant place and manner; it also makes such cues more audible and more perceptible.
That is, epenthesis enhances the perceptibility of consonants in the SW. Consider three examples of
epenthesis in word-final consonant clusters, shown in 21.
(21) Epenthesis in word-final consonant clusters
a. Hausa: [baħr] → [bahar] From Arabic: ‘the sea’
b. Swahili: [saks] → [saksi] From English: “socks”
c. Hausa: [sɪlk] → [siliki] From English: “silk”
The perceptual cues for word-final consonants in the source words are relatively weak.
The perceptual cues for consonants are more robust for CV sequences than for VC sequences.
The formant transitions, release burst, and other acoustic properties are more audible, and thus
more perceptible, when a consonant is followed by a vowel than when a consonant is not
followed by a vowel. Similarly, the second consonant in a VC₁C₂ sequence is perceptually even
weaker, lacking vowel formant transitions and, often, release information. C₂ only has its own
internal cues, and relatively weak C-C cues, for it to be perceived. In 21a, the cues for both the
consonants in the final cluster are enhanced, as they increase in number and robustness. The
fricative is made more perceptible. The LWA process changes the SW from a state where this
sound has its own internal cues and VC and CC transition cues, to a state where it has CV cues,
which are the most robust, auditorily noticeable, and thus perceptible, of all possible sequences for
consonants and vowels. Likewise, the perceptual cues of the [r] are enhanced, adding VC cues
to the cues it already has.
The adaptations in 21b and 21c also serve to enhance the perceptibility of the consonants
in the clusters. In 21b, the final [s] goes from having no consonant-to-vowel formant transitions,
to having CV formant transition cues. And in 21c, the perceptibility of the consonants is
enhanced the most²⁸. The VCC sequence becomes a VCVCV sequence: “The alternating
28
A plausible explanation for why “double epenthesis” happens in 21c but not in 21a is also
perceptually based: approximants such as [r] have stronger internal cues than stops such as [k],
and so the need to enhance the perceptibility is not as strong. Thus, in “silk” there is a greater need
to enhance the cues of [k], thus resulting in double epenthesis. In other words, double epenthesis
in Hausa’s borrowing of “silk” is not because of the prohibition of codas, but to enhance the cues
of the voiceless stop consonant.
consonant and vowel (CVCV) pattern… is the best in terms of sheer number and redundancy of
cues in the signal. At each transition from vowel to consonant there are numerous cues to both
the vowel’s quality and the consonant’s place, manner, and voicing” (Wright 2004; 49).
When adapting sequences of sounds in foreign words, speakers do indeed epenthesize
due to the phonotactics of their native grammar; this explains cases such as the epenthetic schwa
in the English pronunciation of the Polish city “Gdansk” [gədænsk]. However, the claim
advanced here is that articulatory difficulty is not the only reason for epenthesis in LWA. When
adapting foreign consonant clusters (and codas), speakers also epenthesize to enhance the
perceptibility of the SW.
(22) Enhance Perceptibility: Consonants in the SW must be made as perceptible as
possible
As consonants are perceived via perceptual cues, the perceptual cues of the SW are manipulated
to satisfy this condition. Specifically, vowels are inserted before and/or after consonants, which
has the effect of making the perceptual cues more robust (and/or numerous), thereby enhancing
the perceptibility of the consonant.
The proposition explaining the observational patterns is that Enhance Perceptibility (EP)
and Preserve Cue (PC) are highly prioritized in LWA, playing a key role in the shaping of the
patterns observed in the cross-linguistic LW data²⁹. Preserve Cue (PC) favors non-adaptation,
which is by far the most common observation in LWs. PC likewise favors epenthesis over
deletion, as epenthesis satisfies PC, whereas deletion violates it. Enhance Perceptibility (EP)
favors the epenthesis preference as well, in addition to accounting for cases of gratuitous
epenthesis, i.e., epenthesis that is not motivated by well-formedness conditions of the borrowing
language (Lb).
PC and EP are not absolute imperatives, but highly prioritized conditions for what should
happen in order to make the SW as perceptible as possible, in order to enhance the probability of
communicative success. In other words, PC and EP are not necessarily required for
communicative success; communicative success may still be possible without satisfying the PC
and EP conditions. The claim advanced here is that the distribution and patterning of LWs in
this study are due to the perceptual conditions PC and EP acting in combination with established
29
Why PC and EP are prioritized in LWA phonology is discussed in more detail in Chapter 5 (§5.2.1), where I
argue that gestural clock slowing (Byrd and Saltzman 2003) is an important factor in LWA phonology.
properties of grammar/phonology, i.e. faithfulness and responses to phonotactic markedness.
The effect of EP, that is, epenthesis taking place to enhance the perceptibility of consonants, is
limited by faithfulness to the SW (Kenstowicz 2005). Ubiquitous epenthesis in LWs may cause
a perceptual departure from the SW sufficient to impede comprehension. Although
EP mandates change to the SW, too much change in the LW may obscure comprehension³⁰.
Likewise, some cases of epenthesis and deletion in LWs are responses to phonotactic
markedness. To produce a phonotactically marked sequence requires articulatory effort; speakers
may avoid a phonotactically marked sequence by epenthesizing into a consonant cluster, e.g. the
English pronunciation of “Gdansk” as [gədænsk]. Additionally, cases of deletion may also be a
response to phonotactic markedness. Speakers may disregard perceptual considerations and
minimize articulatory difficulty by being “lazy” (Kirchner 1997; 2004), failing to preserve a
consonant in the SW when adapting a LW.
I assume that the process of integrating a foreign word (SW) into the lexicon of a Lb –
that is, the creation of a LW – is a process that is done by different speakers at different times
and for different reasons. Thus, the description above is not, nor is it meant to be, a description
of grammar, the grammar of any given language, or a grammar possessed by a speaker of any given
language. To intertwine all of the factors above into a coherent system of grammar, if possible,
would be at best an oversimplification, and at worst incapable of explaining the trends observed
in the typology of LWA. It is acknowledged that the hypothesis put forth here has little
explanatory (and/or predictive) power for individual loanwords. Rather, it is a hypothesis of how
speakers tend to behave when adapting LWs: speakers tend to Preserve Cues and Enhance
Perceptibility in LWA, but may not always do so. This tendency is manifested in cross-linguistic
and within-language patterns of LWA. The claim put forth here is that the tendency to adapt
LWs a certain way results in observational patterns, and such patterns tend to be a function of the
presence and/or robustness of perceptual cues. This is further explicated below in sections §3.3
and §3.4, as well as in Chapter 5.
This hypothesis, called the “Cue Hypothesis,” in which phonotactic markedness,
faithfulness to the SW, and the two conditions PC and EP determine how a word in the Source
Language (Ls) is adapted into the Borrowing Language (Lb), makes specific predictions about where
30
This is analogous to the Hawaiian borrowing [mele kalikimaka], discussed previously, where the SW (or source
phrase) “Merry Christmas” has undergone so much change as to not be recognized.
and when epenthesis, deletion and retention are likely to occur. These predictions are spelled out
in subsequent sections, and then tested against the data in the cross-linguistic corpus.
3.3 PERCEPTUAL CUES AND THE PREDICTIONS OF THE CUE HYPOTHESIS. The Cue
Hypothesis posits the existence of two conditions that influence how speakers of a language Lb
borrow words from other languages containing marked syllable structure, that is, consonant
clusters and codas. These conditions are derived from the necessity to maximize the probability
of successful comprehension of a foreign word when it is first used by speakers of the Lb. The
probability of successful comprehension is increased by preserving the cues to phonological
information in the SW (Preserve Cue), and making cues to the phonological information in the
SW more easily perceived (Enhance Perceptibility). PC and EP are of course not the sole
factors that determine how speakers adapt foreign words into their native language. Deletion
still happens – cues are not always preserved. And epenthesis doesn’t always happen –
consonants aren’t always made more perceptible by the augmentation of their perceptual cues.
Yet because communicative success is at stake, the PC and EP conditions on LWA are more
important for certain sounds and sequences of sounds. Perceptually weak consonants and
consonants in perceptually weak positions benefit the most from EP. Conversely, consonants
that are perceptually strongest contribute the most information used for comprehension;
preserving their perceptual cues, as called for by PC, is more necessary than preserving those of
perceptually weak consonants. In other words, deleting a perceptually strong consonant is more likely to impede
communicative success than deleting perceptually weaker consonants. This proposal is captured
by adding the following addenda to the definitions of the PC and EP conditions:
(23) Preserve Cue: Every perceptual cue in the SW must exist in the loanword
a) The perceptually stronger a consonant is, the more important it is to preserve its
cues
(24) Enhance Perceptibility: Consonants in the SW must be made as perceptible as
possible
a) The perceptually weaker a consonant is, the more important it is to enhance its
perceptibility
The relative perceptibility of consonants is determined by the strength, quality, and number of
their perceptual cues. Thus, like PC, EP is sensitive to the perceptual cues in the SW. PC and
EP can thus be thought of as inverses of one another; both are sensitive to the robustness of cues,
but in opposite directions. I argue that this hypothesis is crucial for understanding and explaining
not only the distribution of language types, but the distribution of process type – that is,
Retention, Epenthesis, and Deletion – within a language. To proceed, we must first consider what
makes a cue strong or weak.
3.3.1 PERCEPTUAL CUES AND CUE ROBUSTNESS. The quality of consonants – that is, their
place of articulation, manner of articulation, and voicing – is interpreted by the mind based on
perceptual cues in the acoustic signal. Consonants have cues as to their quality in and of
themselves, independently of context; these are known as internal perceptual cues. Perceptual
cues for consonants may also depend on the context/environment. For example, a consonant that
precedes a vowel is more audible than the same consonant preceding another consonant,
especially so for stops and certain fricatives (Wright 2004). It is acknowledged that internal
perceptual cues are indeed significant, and most certainly so for LWA as well. However, the focus
of this chapter will be the context-dependent perceptual cues of consonants, i.e. the relative
strength and robustness of perceptual cues as a function of position within the syllable and
word³¹.
Internal cues aside, the perceptibility of consonants, as determined by the strength,
quality, and number of perceptual cues, is a function of their environment. For example, consonants
in syllable onset position (a CV sequence) have stronger, more audible cues than identical
consonants in coda position, i.e. a VC sequence. The release of a consonant gesture into a vowel
in a CV sequence amplifies the perceptual cues of the consonant, especially for stops, fricatives,
and affricates. The closure from a vowel into a consonant, that is, a VC sequence, has no such
auditory boost; the perceptual cues of a consonant followed by a vowel are thus inherently
stronger than those for consonants not followed by a vowel (Wright 2004). The
robustness/strength of perceptual cues can perhaps be quantified, but what matters for the sake of
the current analysis is relative strength. Relatively, the perceptual cues in a CV sequence are
stronger than the perceptual cues in a VC sequence.
CV and VC sequences can be translated to where they appear within the syllable and
word. A consonant in a CV sequence is typically part of the syllable onset, and a consonant in a
31
The role of internal (context-free) perceptual cues in LWA will be addressed in Chapter 4.
VC sequence at the end of a word is typically part of the syllable coda. Perceptual cues in
syllable onset position are thus stronger than those in coda position. As explained by Wright,
“…in any but the best listening environments, they [coda transitions] will stand a poorer chance
of reliably transmitting information about a particular contrast” (2004; 46).
A consonant that is flanked by vowels (that is, a VCV sequence) is in a context where its
cues are most robust, as it contains VC and CV formant transitions, cues that are essential for
manner and place information. Additionally, consonants in VCV context are adjacent to vowel
sounds, which are articulated with a relatively wider vocal tract. The wider the vocal tract is, the
louder the acoustic signal is; this greater amplitude of the acoustic signal results in stronger cues
(thus more perceptible consonants). For these reasons, consonants in a VCV context – i.e. word-
medial position, when a word is spoken in isolation – are perceptually the strongest, as their cues
are robust and numerous.
The focus of the study in this chapter is on four different types of sound sequences:
word-initial consonant clusters, word-medial consonant-consonant sequences, word-final
consonant clusters, and word-final codas. Each of these sequences varies in terms of the relative
perceptibility of the consonants involved, due to the availability, strength, and quality of acoustic
cues, described as follows.
First, consider word-initial consonant clusters and word-final consonant clusters. Per the
Sonority Sequencing Principle (Jespersen 1904; Hooper 1976; Selkirk 1984; Blevins 1995),
sounds closer to the vowel nucleus of a syllable are generally higher in sonority than sounds
farther away from the syllable nucleus. In a C₁C₂VC₃C₄ sequence, C₁ and C₄ are expected to be
less sonorous consonants than C₂ and C₃, respectively.³² The edge-most consonants are thus
more likely to be stops and fricatives, and the vowel-adjacent consonants are more likely to be
glides, liquids, nasals, or approximants. To demonstrate this, consider the word “plant” [plænt].
Though both /p/ and /t/ are voiceless stop consonants, the perceptual cues of the /p/ in this word
are more robust than those of /t/. As described above, CV sequences amplify the perceptual cues
of the consonant, as the consonant is released into a sonorous sound. In [plænt], the /p/ is
likewise released into a sonorous sound. The amplification of its cues is not as great as if it were
32
There are exceptions to this tendency, most notably for sibilant fricatives, e.g. /s/, /ʃ/, which commonly occur
edgemost in such positions. In an s+stop sequence, such as in the word “stop,” the sibilant is more sonorous than
the stop.
released into a vowel, but the amplification is there, nonetheless. The /t/ receives no such
amplification; the tongue tip gesture for the /t/ may not even be released at all.
Likewise, sonorants such as /n/ and /l/ may also vary in terms of the relative robustness of
cues as a function of their location within the syllable/word. Consider the monosyllabic nonce
word [plapl], where the first /l/ and the second /l/ differ in their relative perceptibility due to the
cues available to them. The first /l/ is followed by a vowel, a position that amplifies the cues for
consonants (albeit more robustly so for obstruents than for sonorants), where the second /l/ is
preceded by a consonant, and not adjacent to a vowel. Describing a sequence of [pla], Wright
states “This ordering ensures that information about the liquid is not lost as result of overlap (as
might be the case if it were to precede the stop), since a portion of it overlaps with the following
vowel, and it creates a signal in which any one segment is distributed redundantly throughout the
signal… ” (2004; 49). The ordering of the /l/ and then the /a/, i.e. [pla] or [plæ], ensures that
information about the sonorant (i.e. the first /l/) will show up on the vowel, thus increasing its
perceptibility. However, the ordering of the second /l/ after the obstruent /p/, [apl], lacks such
insurance. The cues for the second /l/ are not distributed throughout the vowel signal in the same
way as they are for the first /l/ in [plæpl]. The second (word-final) /l/ here is therefore relatively less
perceptible than the first (word-initial) /l/, despite the fact that both sounds are sonorants. Regardless of
the segments involved, word-position plays an important role in determining the robustness of
perceptual cues, where consonants in word-initial position are more strongly cued.
Perhaps even more relevant to the relative robustness of perceptual cues is the fact that
consonants and consonant clusters in onset position are usually longer in duration than
consonants and consonant clusters in coda position (Byrd 1994; Byrd et al. 2005; Keating et al.
2003). The cues for consonants in onset position are thus perceptually stronger: because they
exist for a longer duration in time, they are more audible. These metrics for assessing the
robustness of cues have direct implications for the current study, which are expressed below.
3.3.2 CUE ROBUSTNESS AS A FUNCTION OF WORD POSITION. To summarize the
information above and to place it in context with the current study, consider the following figure,
an illustration of the English word “crystals.”
FIGURE 8. Diagram of “crystals”
The consonant clusters in this word all vary in terms of the relative robustness of their
perceptual cues. The cues are weakest for the consonant cluster at the end of the word, that is,
/lz/, C₅ and C₆. Excluding internal cues, C₅ only has VC cues, which are relatively weak, and C₆
has no vowel formant transition cues, nor does it have any cues that are normally present in the
release of a consonant. In general, consonants in coda position, and by extension consonant
clusters in coda position, have relatively weak cues³³.
Stronger are the cues for /kɹ/, that is, C₁ and C₂. With this specific example, the stop
consonant /k/ has release cues into the approximant /ɹ/, and the /ɹ/ has cues that show up in V₁.
Due to gestural overlap, both consonants are in a position where their cues are strengthened, that
is, syllable onset position. Abstracting away from the specific example in English, cues in such
word-initial C₁C₂V₁ position are likely in general to be relatively robust, as C₂ is likely to be a
sonorant, which is likely to contain C₁’s release cues³⁴.
The position where cues are strongest is the word-medial position. C₃ has VC formant
transition cues. C₄ has CV formant transition cues: the consonantal gesture is released into V₂.
Because of this, word-medial consonant clusters have the strongest, most perceptually robust
cues, and the largest number of cues. Perhaps most crucially, word-medial consonant clusters
are flanked by loud vowel³⁵ sounds, whereas word-initial clusters and word-final clusters (at
least ones spoken in isolation) are not.
The relative robustness of perceptual cues of consonants in consonant clusters is thus a
function of the consonant cluster’s position within the word. Word-final consonant clusters have
33
As previously noted, the investigation in this chapter is limited to context-dependent cues, i.e. cues derived from
word position. Internal cues are not investigated because data in the cross-linguistic corpus were not sensitive to
this. The role of internal cues thus remains open for future investigation.
34
C₁ may not always be an obstruent, but the generality holds, per sonority sequencing.
35
Excluding cases of a non-vocalic nucleus.
the weakest cues, and word-medial consonant clusters, due to the facts discussed above, can be
considered to have the strongest perceptual cues. Word-initial consonant clusters are then
somewhere in between.
Recall the claims above regarding the well-formedness conditions on LWA. Preserve
Cue requires that perceptual cues in the SW be retained/preserved in the LW. The stronger
perceptual cues for a consonant are, the more significant the consonant is for comprehension
(and communicative success). And the more significant the information is for comprehension,
the more important it is to preserve that information in the LW. In other words, it is worse to
delete a consonant in a perceptually strong position than it is to delete a consonant in a
perceptually weak position. Deleting a consonant in word-final position is not as bad as deleting
a consonant in word-medial or word-initial position. Thus, the PC condition to preserve cues
(and thus the consonants) is most likely to be satisfied in word-medial position, as this is where
cues are strongest, and least likely to be satisfied in word-final position, where cues for
consonants are weakest.
Enhance Perceptibility is similar but the inverse. The weaker the perceptual cues of a
consonant are, the more necessary it is to enhance the perceptibility of that consonant. Applied
to consonant clusters in different word-positions, Enhance Perceptibility is most necessary in
word-final consonant clusters; its effect (i.e. compelling epenthesis) is therefore most likely
word-finally. In the following section, these claims are translated into predictions, which are
then tested against data in the corpus.
3.3.3 PREDICTIONS OF THE CUE HYPOTHESIS. Preserve Cue demands that consonants and
their perceptual cues in the SW be present in the LW. Similar to a constraint in Optimality
Theory, PC is not an absolute imperative that borrowers must follow without exception. It is
indeed violable, but its violation is more severe in certain situations. To maximize the
probability of communicative success, phonological information, as signaled by perceptual cues,
should be preserved in loanwords, and the more salient the information is in the SW, the more
important it is for comprehension. According to the Cue Hypothesis, then, the more robust a
consonant’s cues are, the more likely it is to be preserved. Conversely with Enhance
Perceptibility, consonants whose cues are few and/or weak require enhancement more than
consonants whose cues are numerous and strong.
To say that all LWA data is determined by the two perceptual cue imperatives would be
both a gross oversimplification and inaccurate. As argued above, EP compels epenthesis in
LWs, but it is not the sole factor for compelling epenthesis. Other aspects of the phonology are
indeed relevant. PC and EP are not absolutes. There are indeed word medial clusters that delete
consonants (Arabic [ʃam?a] → Kanuri [ʃame] ‘the candle’; Löhr and Wolff 2009), and there are
word-final clusters that do not epenthesize, thus failing to enhance the perceptibility of the
relatively weak consonants (“asphalt” → [asfalt], Azerbaijani). The predictions that the Cue
Hypothesis makes are not about individual loanwords, but about statistical trends in loanwords.
In other words, not every speaker who borrows a word containing a foreign sound sequence is
going to adapt that word in a way that satisfies PC and EP, in the same way, every time.
Faithfulness to the SW does indeed play a role, which is why epenthesis is not observed in every
single consonant cluster in every single LW. Likewise, phonotactic markedness is undoubtedly
involved as well, compelling simplification of consonant clusters by deletion (as well as some
cases of epenthesis). However, the Cue Hypothesis predicts that epenthesis and deletion are
more likely to occur in certain positions than others. Thus, the prediction that the Cue
Hypothesis makes is that, over a wide range of data, such as the one in the loanword corpus
investigated in this study, there will be statistical trends that fall out from PC and EP. These are
listed below.
(25) Predictions of the Cue Hypothesis
a) Deletion is most likely in word-final consonant clusters, and least likely in
word-medial consonant clusters
b) Retention is most likely in word-medial consonant clusters, and least likely in
word-final consonant clusters.
c) Epenthesis is most likely in word-final clusters, and least likely in word-medial
clusters.
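The reasoning that leads from the cue-robustness ranking of §3.3.2 to the predictions in 25 can be sketched in a few lines of Python. This is only an illustrative sketch: the names `CUE_STRENGTH` and `derive_predictions` are hypothetical, and the integer ranks are placeholders, since the text establishes only the relative ordering (word-final weakest, word-initial intermediate, word-medial strongest) and not numerical values.

```python
# Ordinal cue strength by word position (higher = more robust cues).
# Only the ordering comes from the text; the integers are placeholders.
CUE_STRENGTH = {"final": 1, "initial": 2, "medial": 3}


def derive_predictions(strength):
    """Map a cue-strength ranking onto the three predictions in (25)."""
    weakest = min(strength, key=strength.get)
    strongest = max(strength, key=strength.get)
    return {
        # PC: strongly cued consonants are preserved, so deletion targets weak positions.
        "deletion": {"most_likely": weakest, "least_likely": strongest},
        # PC also favors leaving strongly cued clusters unchanged.
        "retention": {"most_likely": strongest, "least_likely": weakest},
        # EP: weakly cued consonants call for enhancement, i.e. epenthesis.
        "epenthesis": {"most_likely": weakest, "least_likely": strongest},
    }


predictions = derive_predictions(CUE_STRENGTH)
for process, p in predictions.items():
    print(f"{process}: most likely {p['most_likely']}, least likely {p['least_likely']}")
```

The sketch deliberately says nothing about word-initial clusters beyond placing them between the two extremes, which matches the generality of the predictions in 25.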
The likelihood of adaptation is also expected to be sensitive to the segments involved, not just
their position within the word. However, as mentioned above, this factor is not included in the
analysis. The predictions in 25 are quite general; however, more specific and testable
predictions fall out from these.
(26) Specific Predictions of the Cue Hypothesis
I. Prediction 1: a) Cross-linguistically, there will be more observations of
deletion in word-final consonant clusters than in word-initial consonant clusters,
and b) there will be more observations of deletion in word-initial consonant
clusters than in word-medial clusters.
II. Prediction 2: a) Cross-linguistically, there will be more observations of
epenthesis in word final consonant clusters than in word-initial consonant
clusters, and b) there will be more observations of epenthesis in word initial
consonant clusters than in word medial consonant clusters.
These predictions will be tested. Whether these predictions are confirmed or rejected may not
necessarily be informative, however. This is because the word-position data are uneven. There
are 3,244 total observations of processes involving word-initial consonant clusters, 4,976 total
observations of processes involving word-medial clusters, and 1,994 involving word-final
clusters. Given this, one would predict that there would be the fewest observations of anything
word-finally, and the most observations of anything word-medially, simply because there are more
observations word-medially, and fewer observations word-finally.
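The bias introduced by uneven totals can be illustrated with a short Python sketch. The three totals are those reported above; the uniform rate of 0.25 (`ASSUMED_RATE`) is an invented value, chosen only to show that raw counts would rank the positions medial > initial > final even if a process were equally likely in every position.

```python
# Total observations per word position, as reported in the text.
TOTALS = {"initial": 3244, "medial": 4976, "final": 1994}

# Hypothetical: assume the SAME rate of some process in every position.
ASSUMED_RATE = 0.25


def expected_counts(totals, rate):
    """Expected number of observations of a process under a uniform rate."""
    return {pos: n * rate for pos, n in totals.items()}


counts = expected_counts(TOTALS, ASSUMED_RATE)

# Ranking by raw count simply mirrors the size of each subsample.
ranked_by_count = sorted(counts, key=counts.get, reverse=True)

# Dividing by the subsample size recovers the (identical) underlying rate.
proportions = {pos: counts[pos] / TOTALS[pos] for pos in TOTALS}

print(ranked_by_count)
print(proportions)
```

Raw counts therefore cannot distinguish an effect of word position from an effect of sample size, whereas proportions can.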
Thus, what is crucial to testing the predictions of the Cue Hypothesis is to examine the
ratio of process occurrence. Five different variables were created based on ratios between types
of observation counts in the data. These variables both test the predictions
made by the Cue Hypothesis and cancel out any numerical bias of one level of the independent
variable (that is, word position) relative to another. Each ratio variable makes a
specific prediction. The variables are given and explained as follows, where R = Retention, E =
Epenthesis, D = Deletion, A = Adaptation, and T = Total.
Shorthand   Ratio                      Definition
R:A         Retention to Adaptation    (# of R observations) / (# of E observations + # of D observations)
R:E         Retention to Epenthesis    (# of R observations) / (# of E observations)
R:D         Retention to Deletion      (# of R observations) / (# of D observations)
E:T         Percentage of Epenthesis   (# of E observations) / (# of T observations)
D:T         Percentage of Deletion     (# of D observations) / (# of T observations)
TABLE 23. Ratio Variables
Each of these ratios/values is calculated for each word-position, initial, medial, and final. The
Cue Hypothesis makes predictions about the distribution of these values, both cross-linguistically
and within languages.
(27) Ratio-based predictions to be tested
Prediction 3: The Retention to Adaptation value will be largest word-medially,
and smallest word-finally.
Prediction 4: The Retention to Epenthesis value will be largest word-medially,
and smallest word-finally.
Prediction 5: The Retention to Deletion value will be largest word-medially, and
smallest word-finally.
Prediction 6: The percentage of Epenthesis tokens will be largest word-finally,
and smallest word-medially.
Prediction 7: The percentage of Deletion tokens will be largest word-finally, and
smallest word-medially.
Crucial to understanding these predictions is the fact that Preserve Cue and Enhance
Perceptibility are in conflict with other phonological factors. PC is in conflict with phonotactic
markedness. Phonotactic markedness (or syllable well-formedness) demands that consonant
clusters be simplified, but doing so (with deletion) would fail to preserve phonological
information in the SW. As stated above, PC is strongest where consonants are perceptually
strongest. Thus, cluster simplification via deletion, compelled by phonotactic markedness, is
most intolerable word-medially, where consonants/cues are strongest, and least intolerable
word-finally, where consonants are perceptually weakest.
Enhance Perceptibility is in conflict with faithfulness to the SW. Because LWA necessarily
involves a situation in which speakers are communicating using a foreign/unfamiliar word,
inserting epenthetic vowels in every position would decrease the probability of successful
communication; it would make the unfamiliar word even more unfamiliar. Gratuitous epenthesis
would result in a form that only weakly resembles the foreign/unfamiliar SW. So as not to impede
communicative success, speakers of Lb pronounce the foreign word as it is pronounced in
Ls; making changes to the SW, such as inserting vowels that do not exist in the SW, is
undesirable.³⁶
However, changing the SW may indeed be desirable if it increases the probability of
communicative success. Epenthesis to enhance the perceptibility of weak consonants (which is a
change that is made to the SW, thus violating faithfulness) can indeed increase the probability of
communicative success. It is thus predicted that the faithfulness violation that happens when a
vowel is inserted into a consonant cluster is least intolerable word-finally, and most intolerable
word-medially.
Following from this, it is predicted that Retention will be most likely word-medially, and
that both Epenthesis and Deletion will be most likely word-finally. That is, there is predicted to
be a higher number of observations of Retention as compared to other processes in word medial
position. The number of Retention observations divided by the number of adaptation
observations is an indicator of this, as the higher the number in the numerator, and the lower the
number in the denominator, the higher the value will be for this variable. Consider the following
table, representing data from a hypothetical language, intentionally constructed to verify these
hypotheses.
36
This proposition not only has the merit of explaining the Retention bias seen in Chapter 2, it explains non-
structure preserving Retention bias, that is, the languages that retain consonant clusters in LWs even though
consonant clusters are prohibited in native words, e.g. Swahili, Sakha, and Imbabura Quechua.
Observation   Word-Medial   Word-Final
Retention     20            30
Epenthesis    3             10
Deletion      2             5
TABLE 24. Hypothetical Language Data
In this toy language there are 20 observations of Retention, 3 observations of Epenthesis, and 2
observations of Deletion word-medially. The value for Retention-to-Adaptation word-medially
is thus 20/(3+2) = 4. Word-finally there are 30 observations of Retention, 10 observations of
Epenthesis, and 5 observations of Deletion. The value here is thus 30/(10+5) = 2. This would
confirm the prediction made by PC, as the value is higher word-medially than word-finally; that
is, the ratio of Retention-to-Adaptation is highest in word-medial position, and lowest in
word-final position. This is the rationale behind Prediction 3.
Predictions 4 and 5 also test the Cue Hypothesis and have a similar rationale. They are
designed to test the difference, and perhaps relative strengths, of PC and EP, as Deletion violates
PC and EP, whereas PC is vacuously satisfied when EP is satisfied. Considering the same toy
values as above, the value for Retention-to-Epenthesis is 20/3 = 6.667 word-medially, and 30/10
= 3 word-finally. For Retention-to-Deletion, the value is 20/2 = 10 word-medially, and 30/5 =
6 word-finally.
Predictions 6 and 7 look at percentage, i.e. the number of observations of a single process
compared to the total number of process observations. The Cue Hypothesis predicts that
repair/adaptation is most likely to happen word-finally, as consonants are weakest word-finally.
Therefore, per PC, deleting word-final consonants is least intolerable, and per EP, epenthesizing
in word-final position is most desirable. These two measures are the inverses of the three
ratio measures above. Where Retention-to-Adaptation is predicted to be highest word-medially, and
lowest word-finally, Adaptation-to-Retention (that is, the percentage of occurrence of a repair)
is predicted to be the opposite.
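The five ratio variables can be sketched in a few lines of Python. This is only an illustration of the definitions in Table 23 applied to the toy language: the function name `ratio_variables` and the dictionary layout are assumptions made here, the word-final Epenthesis count is taken as the 10 used in the running text, and T is assumed to be the sum R + E + D for a position. Zero denominators are not handled.

```python
# Toy counts from the hypothetical language, using the word-final
# Epenthesis count of 10 given in the running text.
toy = {
    "medial": {"R": 20, "E": 3, "D": 2},
    "final": {"R": 30, "E": 10, "D": 5},
}


def ratio_variables(counts):
    """Return the five ratio variables of Table 23 for one word position.

    Assumes T = R + E + D and that E and D are non-zero.
    """
    r, e, d = counts["R"], counts["E"], counts["D"]
    total = r + e + d  # T: all process observations in this position
    return {
        "R:A": r / (e + d),  # Retention to Adaptation
        "R:E": r / e,        # Retention to Epenthesis
        "R:D": r / d,        # Retention to Deletion
        "E:T": e / total,    # percentage of Epenthesis
        "D:T": d / total,    # percentage of Deletion
    }


toy_ratios = {pos: ratio_variables(c) for pos, c in toy.items()}
for pos, values in toy_ratios.items():
    print(pos, {name: round(v, 3) for name, v in values.items()})
```

Under these assumptions the three Retention ratios come out higher word-medially, and both percentages come out higher word-finally, which is the direction Predictions 3 through 7 expect.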
To more fully illustrate these variables and predictions, consider actual data from
Indonesian, given in Table 25.
Indonesian   Initial CCs   Medial CCs   Final CCs
Retention    17            182          3
Epenthesis   21            8            12
Deletion     7             6            16
R:A          .607          13           .107
R:E          .81           22.75        .25
R:D          2.429         30.333       .188
E:T          .466          .04          .387
D:T          .155          .031         .516
TABLE 25: Indonesian Consonant Clusters
Prediction 3: Confirmed. The value for R:A is largest for word-medial position, and
smallest for word-final position, with word-initial position in between.
Prediction 4: Confirmed. The value for R:E is largest for word-medial position, and smallest
for word-final position, with word-initial position in between.
Prediction 5: Confirmed. The value for R:D is largest for word-medial position, and
smallest for word-final position, with word-initial position in between.
Prediction 6: Partially confirmed. The percentage of Epenthesis is lowest word-medially
(4%), as predicted, but highest word-initially (46.6%) rather than word-finally.
Prediction 7: Confirmed. The percentage of Deletion is highest word-finally (51.6%), and
lowest word-medially (3.1%).
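The way each prediction is scored against the Indonesian counts can be sketched as follows. This is a minimal illustration, not the procedure used for the corpus: the names `ratios`, `extremes`, and `expected` are hypothetical, and "confirmed", "partially confirmed", and "not confirmed" are assigned simply by checking whether the largest and smallest values fall in the predicted positions. Computed directly from the raw counts, the word-medial R:E value is 182/8 = 22.75.

```python
# Raw counts from Table 25 (Indonesian consonant clusters).
indonesian = {
    "initial": {"R": 17, "E": 21, "D": 7},
    "medial": {"R": 182, "E": 8, "D": 6},
    "final": {"R": 3, "E": 12, "D": 16},
}


def ratios(c):
    """Five ratio variables for one position, with T = R + E + D."""
    total = c["R"] + c["E"] + c["D"]
    return {
        "R:A": c["R"] / (c["E"] + c["D"]),
        "R:E": c["R"] / c["E"],
        "R:D": c["R"] / c["D"],
        "E:T": c["E"] / total,
        "D:T": c["D"] / total,
    }


by_position = {pos: ratios(c) for pos, c in indonesian.items()}


def extremes(variable):
    """Return (position with the largest value, position with the smallest)."""
    ranked = sorted(by_position, key=lambda pos: by_position[pos][variable])
    return ranked[-1], ranked[0]


# (largest, smallest) positions expected by Predictions 3 through 7.
expected = {
    "R:A": ("medial", "final"),
    "R:E": ("medial", "final"),
    "R:D": ("medial", "final"),
    "E:T": ("final", "medial"),
    "D:T": ("final", "medial"),
}

labels = {2: "confirmed", 1: "partially confirmed", 0: "not confirmed"}
outcome = {}
for variable, (want_max, want_min) in expected.items():
    got_max, got_min = extremes(variable)
    hits = (got_max == want_max) + (got_min == want_min)  # 0, 1, or 2
    outcome[variable] = labels[hits]
    print(variable, outcome[variable])
```

With these counts the sketch labels E:T as partially confirmed and the other four variables as confirmed.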
Below, these hypotheses will be tested both for individual languages, and across the
entire corpus.
3.4 TESTING THE PREDICTIONS. In this section, the above predictions are tested against
the data, for the entire corpus and for individual languages. Each prediction is repeated below
with an explanation of precisely how it was tested (including statistical tests, normalization, etc),
and the results are given and interpreted. The overall results are summarized at the end of this
section.
Predictions 1 and 2 are tested below for normalized totals, using the two methods
described in Chapter 2. However, the Direct Division Normalization was different
here. Because simple coda data were excluded in the present analysis, the total against which
each observation value was scaled was different. This was the sum of all the consonant
cluster observations (4,788) divided by the number of languages in the corpus (53), that is, 90.34.
Prediction 1: a) Cross-linguistically, there will be more observations of Deletion in
word-final consonant clusters than in word-initial consonant clusters, and b) there will be more
observations of Deletion in word-initial consonant clusters than word-medial clusters. Result:
For both forms of normalization, this prediction is partially confirmed. The number of
observations of Deletion is larger for Final CCs than it is for Initial CCs. However, Initial CCs
had a lower Deletion count than Medial CCs.
Prediction 2: a) Cross-linguistically, there will be more observations of Epenthesis in
word-final consonant clusters than in word-initial consonant clusters, and b) there will be more
observations of Epenthesis in word-initial consonant clusters than word-medial clusters. Result:
This prediction is confirmed for both forms of normalization. The number of epenthesis
observations is highest in Final CCs, and lowest in Medial CCs. The following two graphs show
this, for Direct Division Normalization and T-Score Scaling Normalization. Included in these
graphs are the values for deletion and retention as well.
FIGURE 9. Counts for normalized Data: Direct Division
FIGURE 10. T-value scaling normalization.
3.4.1 RATIO PREDICTIONS. As discussed above, the predictions with regard to the ratios
of counts are more informative and meaningful. In order to test these cross-linguistically, the
ratio value for each variable in each language was considered to be a data point in a one-way
ANOVA, with position as the independent variable. If the ANOVA was significant with the
mean values in the predicted direction, this was confirmation of the ratio hypotheses.
For individual languages, Chi-Square tests were done on each ratio variable as though
they were observations, comparing Initial CCs, Medial CCs, and Final CCs. For the languages
that were statistically significant, the distribution of means for the ratios was examined to
confirm individual hypotheses.
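For readers unfamiliar with the test, a one-way ANOVA over per-language ratio values can be sketched as below. This is the textbook formulation with position treated as a three-level factor, written in plain Python; it is not a reconstruction of the analysis reported here, whose exact setup is not spelled out, and the input values are invented purely for illustration. The function name `one_way_anova_f` is an assumption.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical R:A values for three made-up languages per position
# (NOT corpus data).
initial = [1.0, 2.0, 3.0]
medial = [5.0, 6.0, 7.0]
final = [2.0, 3.0, 4.0]

f_stat, df_b, df_w = one_way_anova_f([initial, medial, final])
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

A large F only indicates that the position means differ; whether they differ in the predicted direction still has to be read off the means themselves, as is done in the text.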
Prediction 3: The Retention to Adaptation value will be largest word medially, and
smallest word-finally. Result: This Prediction is confirmed. The mean value across all
languages is largest word medially, second largest word-initially, and smallest word-finally
(p<.0001, F(1, 156) = 12.695).
FIGURE 11. Cross-Language Means, Retention to Adaptation
For within-language testing, 30 out of 53 languages were statistically significant for the
Chi-Square test. The means for the significant languages were in the same direction as the general
trends, unsurprisingly. A bar graph for the means of this test, and a plot for the ratios of each
language, are given in Appendix J.
Of importance is a fact shown in the following boxplot of these data: the third box
represents word-final values for Retention to Adaptation. Compared to the first two boxes, the
third box is quite compact, meaning that the values for word-final position across the languages
were very similar. This confirms what the Cue Hypothesis predicts: that PC and EP are
most active in word-final position, across all languages, resulting in a more even distribution of
the values of Retention, Epenthesis, and Deletion.
FIGURE 12. Boxplot: Retention to Adaptation
Prediction 4: The Retention to Epenthesis value will be largest word medially, and
smallest word finally. Result: This prediction is confirmed: p<.0001, F(1, 156) = 14.893.
FIGURE 13. Means: Retention to Epenthesis
For individual languages, the Retention to Epenthesis value was significant for 38 out of 53 languages
in the corpus. A plot for each language that was statistically significant is given below. This
figure is quite busy; what is important to notice, however, is that the values for final CCs are
mostly low, and the values for word-medial CCs are mostly high. Tracing most of the individual
languages, you will see that the plot goes from a middle position on the y-axis, then up, then all
the way down.³⁷ This pattern is predicted by the Cue Hypothesis. The bar graph for the mean
values of this figure is given in Appendix J.
37
A few languages do not follow the cross-linguistic trend. For these languages, this was mainly because the
languages showed a combination of Retention and Epenthesis in certain positions, but only Retention or Epenthesis
in others. For example, Japanese and Korean showed Retention, Epenthesis, and Deletion in word-medial CCs and
word-final CCs, but only Epenthesis in word-initial CCs. The value for Retention to Epenthesis for these languages
in word-initial position was thus 0, resulting in a difference in pattern. Other languages that patterned differently
likewise showed such positional asymmetries.
FIGURE 14. Individual Languages, Retention to Epenthesis
Prediction 5: The Retention to Deletion value will be largest word-medially, and smallest
word-finally. Result: This prediction is confirmed: p < .0001, F(1, 156) = 15.031. The
Chi-Square test was significant for 38 out of 53 languages; a bar graph, plot, and boxplot for these
languages are given in Appendix J.
FIGURE 15: Retention to Deletion
Prediction 6: The percentage of Epenthesis tokens will be lowest word-medially and
highest word-finally. Result: This prediction is confirmed (p = .000013, F(1, 155) = 12.0659).
This was highly significant cross-linguistically, with 48 individual languages showing statistical
significance. The boxplot below shows that the word-medial percentage of epenthesis across
languages is not just low, but it is uniformly low, whereas the percentage of epenthesis varies
much more across languages in the other two positions.
FIGURE 16. Cross-linguistic Percentage of Epenthesis
FIGURE 17. Boxplot for Percentage of Epenthesis ANOVA
Prediction 7: The percentage of Deletion tokens will be largest word-finally, and
smallest word-medially. Result: This prediction is not confirmed. The ANOVA is not
significant. However, 28 languages were significant for the Chi-Square test. These 28
languages were included in another ANOVA, with the others excluded, but the
ANOVA was still not significant. The 28 languages that were significant partially confirmed the
hypothesis: the word-final percentages were the highest, but the word-initial percentages,
not the word-medial ones, were the lowest. This is shown below for the 28 languages that were
individually significant. The overall data also showed this trend.
FIGURE 18. Percentage of Deletion for Statistically Significant Languages
3.5 SUMMARY AND CONCLUSION. This chapter started with a claim meant to explain the
typological patterns observed in Chapter 2. When adapting sequences of sounds in foreign
words, the general preference is to retain such sequences in the LW. When a repair happens, that
repair is usually epenthesis; deletion is rarely employed. This was seen both in the typology of
process, and the typology of languages according to their LWA repair preference. It was argued
that these patterns were due to the necessity to preserve phonological information in the SW
when adapting it into a LW, a claim further refined to the primacy of perceptual cues in LWA,
called the Cue Hypothesis. This hypothesis makes various predictions as to where and when a
repair should happen, which were given above and then tested.
The predictions laid out above are largely confirmed by the data, with only a few
exceptions: the Cue Hypothesis is verified for the majority of predictions. Loanword adaptation of sound
sequences is sensitive to the types of sequence involved, and to where the sequence is within the
word. The likelihood of a consonant cluster being repaired in a LW is a function of that consonant
cluster’s position within the word. However, word position in and of itself is not what is important;
the relative strength of perceptual cues, and thus the relative perceptibility of the consonants,
explains the patterns observed in this chapter.
However, not all of the predictions were confirmed: Prediction 7, which predicted that
the percentage of Deletion tokens would be largest word-finally, and smallest word-medially, was
only partially confirmed. The data showed a slightly different pattern from what was predicted.
Although word-final position had the largest percentage of deletion observations, word-initial
position, not word-medial position, had the smallest.
The fact that this prediction was not confirmed is not necessarily detrimental to the Cue
Hypothesis, however. The basis of the Cue Hypothesis is the idea that the preservation of
phonological information is key in explaining how speakers adapt LWs; the more important that
this information is for perception/word-recognition, the more important it is to preserve the
information. Above, the strength, quality, and number of perceptual cues were argued to be the
basis of this information preservation. However, word-position may be equally important.
Consider again Figure 18, which is repeated below.
FIGURE 18. Percentage of Deletion for Statistically Significant Languages
This figure illustrates the pattern for languages that had a statistically significant
difference between the three word positions. It shows that the percentage of deletion in
consonant clusters, and by extension the likelihood of a consonant being deleted, is a function
of position in the word, in a directly linear fashion. Word-initial consonant clusters are least
likely to undergo deletion, and word-final consonant clusters are most likely. If the proposition
that deletion (and non-adaptation and epenthesis) is at least partially determined by a
consonant’s significance in perception (or word-recognition) is true, then the pattern in Figure 18
is consistent with this.
Linear word-position is posited to be crucial for word recognition, given the Cohort
Model of word recognition (Marslen-Wilson 1987). In this model, phonemes are processed in
real time, until a word is recognized. For example, in the word “loanword,” at first all words in
the lexicon starting with /l/ are considered as candidates, then all words starting with /lo/, and so on until a
uniqueness point is reached, i.e. /lonw/, narrowing down the possibilities to just one. This implies
that the beginning of a word is most important for its comprehension, with decreasing importance
as the word continues in time.
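The narrowing process described for “loanword” can be sketched as a simple prefix filter. This is a toy illustration of the idea only, not an implementation of the Cohort Model as published: letters stand in for phonemes, the six-word lexicon is invented, and the function name `uniqueness_point` is an assumption.

```python
def uniqueness_point(word, lexicon):
    """Return the length of the shortest prefix of `word` that matches no
    other lexicon entry, or None if the word never becomes unique."""
    cohort = set(lexicon)  # every entry starts out as a candidate
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        # Keep only the candidates still compatible with the input so far.
        cohort = {w for w in cohort if w.startswith(prefix)}
        if cohort == {word}:
            return i
    return None


# Hypothetical mini-lexicon; spelling is used as a stand-in for phonemes.
lexicon = ["loanword", "loan", "load", "lone", "lot", "local"]

point = uniqueness_point("loanword", lexicon)
print(point, "loanword"[:point])
```

In this toy lexicon the cohort shrinks from six candidates to one by the fifth symbol, so material after that point contributes nothing further to identifying the word, which is the asymmetry the argument relies on.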
Above, the claim was made that the foundation for Enhance Perceptibility and Preserve Cue
was to increase the success of communication. It then seems not only possible, but probable, that
these conditions are sensitive not just to the strength, quality, and number of perceptual cues of
consonants in a SW, but also to a consonant’s location within the SW. In other words, although the
perceptual cues for consonants in word-medial position are strongest, and thus privileged,
consonants in word-initial position may also be privileged (and more so), as they are the most
important for word recognition (and thus communicative success), given the Cohort Model of
word-recognition. In short, although Prediction 7 was not verified, the patterns of deletion
percentage are nonetheless consistent with the proposition that maximizing the probability of
communicative success guides the LWA process.
The importance of word-position in LWA is further examined in the following chapter,
which investigates LWA in Tongan as a case study. The Tongan case study investigates the role
of the relative perceptibility of consonants independently of word-position, a factor that was not
addressed in the cross-linguistic study. Additionally, the claim was made above that EP and PC
are not the sole factors that determine how speakers proceed in LWA, that other factors (such as
phonotactic markedness) are indeed involved as well. The Tongan case study that follows
further illustrates this, providing a more complete investigation of the loanword adaptation of
sound sequences.
4 CASE STUDY: SOUND SEQUENCE ADAPTATION IN TONGAN. This chapter provides an in-depth
case study of sound sequence adaptation in Tongan, an Austronesian language spoken in
the South Pacific nation of Tonga. One of the central claims of this dissertation is that loanword
adaptation (LWA) is driven by four factors: Preserve Cue (PC), Enhance Perceptibility (EP),
phonotactic markedness, and faithfulness to the SW. If this is true, then the effect of one or more of
these should be seen in every language. Here we investigate to what extent PC, EP, and
markedness are responsible for the patterns of LWA in Tongan.
Section §1 explains why an in-depth case study is warranted on theoretical and meta-
theoretical (methodological) levels. This section also explains why Tongan is an ideal language
for such a case study, likewise providing facts about Tongan necessary for the analysis. Section
§2 investigates sound sequence adaptation in the language, examining the data against
predictions made by the Cue Hypothesis (CH). The focus of this section is mainly the same as in
the previous chapter: that is, the predictions the Cue Hypothesis makes with respect to a
consonant cluster’s location within the word. The Cue Hypothesis can be evaluated at an even
more fine-grained level, or lower level in the phonological hierarchy. That is, the CH makes
predictions about specific consonants and classes of consonants in LWA; these are explored and
evaluated in Section §3. The concluding Section §4 connects the findings of this chapter with
those of previous chapters, and explores what these connections imply for the nature of LWA.
4.1 TONGAN AS A CASE STUDY. Tongan is an Austronesian language; this language
family tends to not allow marked syllable structure (Clark 1990). This prohibition of marked
syllable structure seems to be true for LWA as well, as demonstrated above in the cross-
linguistic study for Hawaiian and other languages. The preferred syllable shape for these
languages is (C)V. However, Tongan is unique in that this is not merely a preference: it is an
absolute. Tongan syllables are obligatorily CV(V); syllables lacking an orthographic onset
consonant are pronounced with a glottal stop in onset position (Feldman 1978). Unlike the
languages in the cross-linguistic corpus, Tongan does not show Retention of foreign consonant
clusters and codas; it repairs all consonant clusters and codas through epenthesis, deletion, or
substitution. The prohibition of marked syllable structure in Tongan is so robust that even
homorganic NC clusters are not retained.
(28) Tongan Repair of homorganic NC clusters
a. “hemp” → [hemipa]
b. “bond-store” → [ponite]
c. “bank” → [baŋike]
In parallel to its limited syllable structure, the segment inventory of Tongan is also quite
limited. Tongan has only the five canonical vowels [a e i o u] and twelve consonants [p t k m n ŋ
f v s l h ʔ] (Feldman 1978). The main reason why Tongan is ideal for a case study of LWA
is thus obvious: words borrowed into Tongan must go through robust changes to fit Tongan
phonology.
Likewise, the properties of Tongan allow for the discovery of factors involved in LWA
that occur independently of the Cue Hypothesis. For example, the likelihood of the deletion
of a consonant may not be solely due to the strength of that consonant’s cues: a consonant’s
status in the grammar/inventory of Tongan might also play a role. It seems plausible that
Tongan speakers may be more likely to delete consonants that are not in the Tongan inventory
than they are consonants that are in the inventory, regardless of the perceptual cues involved.
The data for this chapter come from a corpus of 1,476 English loanwords (LWs) in
Tongan, which was given to me by Kie Zuraw. It was constructed by students at UCLA from a
scan/search of an English-Tongan dictionary (Churchward 1959), for an independent project not
related to syllable structure adaptation. From this corpus, 1,340 tokens of interest³⁸ were
extracted to be used in this study.
4.2 TONGAN ADAPTATION TRENDS. This section investigates LWA in Tongan in the same
way as Chapter 3. It examines the predictions of the Cue Hypothesis regarding word-position,
independently of the segments involved.
Tongan adapts English consonant clusters and codas using three strategies: epenthesis,
deletion, and substitution. All of these processes occur in every position type, for consonant
clusters and word-final codas. Total counts and example processes are shown below.
38
That is, English words containing consonant clusters and codas; borrowings of English words consisting of only
open syllables were excluded, as their adaptation was not of theoretical interest.
(29) Tongan process examples
a) Initial CCs: 202 total observations
i. Epenthesis: “flu” → [fulu]
ii. Deletion: “plus” → [posi]
iii. Substitution: “drill” → [tuila]
b) Medial CCs: 294 total observations
i. Epenthesis: “taxi” → [takisi]
ii. Deletion: “soldier” → [sota]
iii. Substitution: “object” → [ʔopiesi]
c) Final CCs: 159 total observations
i. Epenthesis: “fox” → [fokisi]
ii. Deletion: “civics” → [siviki]
iii. Substitution: “subject” → [sapiesi]³⁹
d) Codas: 687 total observations
i. Epenthesis: “acid” → [ʔasita]
ii. Deletion: “Jesus” → [sisu]
iii. Substitution: “stove” → [sitou]
Epenthesis is by far the preferred repair in Tongan; deletion occasionally happens, and
substitution is very rare. Of the 1,340 observations of LWA, 1,180 show epenthesis, 155 show
deletion, and 5 show substitution (p < .0001, χ² = 904.877). Excluding observations of
substitution, the epenthesis bias is likewise significant (p < .0001, χ² = 463.031). Because
substitution has such low counts, it may be considered inconsequential; it is excluded from
all subsequent analyses in this chapter.
Word-initial consonant clusters show this same bias for epenthesis, with 196 observations
of epenthesis versus only five observations of deletion (97.05% epenthesis; p < .0001,
χ² = 117.208). Word-medial consonant clusters show a similar trend, with 257 observations of
epenthesis versus 35 observations of deletion (87.71% epenthesis; p < .0001, χ² = 98.645). The
same is true for coda consonants, with 661 cases of epenthesis, and 26 cases of deletion (96.07%
39
Although this was recorded as a case of substitution, it is unclear what is actually going on here. “Subject” and
“object” are the only two words that show this substitution-esque process in final position; substitution only occurs
in 5 words in the entire corpus. What is actually going on here is thus not important for the analysis.
epenthesis; p < .0001, χ² = 348.579). However, word-final consonant clusters behaved
differently, with 67 observations of epenthesis, and 89 observations of deletion. This distribution
is not statistically significant.
FIGURE 19. Tongan totals
These patterns largely confirm the predictions made by the Cue Hypothesis. Deletion
is relatively rare, as explained by Preserve Cue, and it is most common in word-final position,
where cues are weakest, as predicted. In all other positions, epenthesis is strongly preferred.
The overall results and general trends classify Tongan as an |Epenthesis Dominant| language, and
||Epenthesis Default|| for the super-type classification. Each individual position/structure will be
addressed below. But first, we return to the ratio analysis done in the previous chapter.
The same general ratio analysis done above with the cross-linguistic data can be done
with Tongan. However, not all of the predictions above can be tested with Tongan, as this
language does not show Retention. The ratio predictions involving Retention are thus not
applicable. However, the other predictions can still be tested against the Tongan data. Also,
predictions about general trends need to be revised as they contain the phrase “cross-linguistic.”
The revised predictions are given and examined below. Coda-consonant data is not included in
this analysis, so as to have a direct parallel with the cross-linguistic analysis.
FIGURE 20. Tongan Consonant Cluster adaptation
Prediction 1: (a) There will be most observations of deletion word-finally, (b) and fewest
observations of deletion word-medially. Result: This prediction is partially confirmed: (1a) is
confirmed, as there are most observations of deletion in word-final position; however, (1b) is
not, as there are more observations of deletion word-medially than word-initially.
Prediction 2: (a) There will be most observations of epenthesis word-finally, and (b)
there will be fewest observations of epenthesis word-medially. Result: This prediction is not
confirmed.
The fact that Predictions 1 and 2 were not entirely confirmed is not necessarily
informative, as the three different word-positions are not equally represented. More informative
is to investigate normalized totals. Two methods of normalization were used: Direct Division
and T-Score Scaling. These were applied in a slightly different way than for the cross-linguistic
analysis. The cross-linguistic analysis adjusted observations within a language against the total
observations in that language. Using this method for Tongan, there would be no adjustment. So,
for Tongan, observation totals were adjusted against total observations for each position.⁴⁰
FIGURE 21. Direct Division Normalized totals
For Direct Division Normalization, Prediction 1 is partially confirmed. (1a) is
confirmed: word-final position has the most observations of deletion. However, (1b) is not
confirmed: word-initial position, not word-medial position, has the fewest observations of
deletion. Prediction 2 is partially confirmed. (2a) is not confirmed, as word-initial position
has slightly more observations of epenthesis, but (2b) is confirmed, as word-final position has the
fewest observations of epenthesis.
40
See Chapter 2 for reference and more detail. For the Direct Division method, observations for each observation
were adjusted against the mean of total observations across all positions: 230.3. For the T-Score Scaling method,
observations were adjusted against 44.872 for word-initials, 61.524 for word-medials, and 43.605 for word-finals.
FIGURE 22. T-Score Normalized Totals
The T-Score Normalized method yields slightly different results. Prediction 2 is
confirmed with this normalization method. Word-medial position has the most cases of
epenthesis, and word-final position has the fewest.
However, Prediction 1 is only partially confirmed. (1a) is confirmed in that word-final
position had the most observations of deletion, but (1b) is not, as word-initial position had the
fewest observations of deletion. To further investigate this, the predictions regarding
percentages⁴¹ were investigated.
Prediction 6: The percentage of Epenthesis will be highest word-medially and lowest
word-finally. This prediction is partially confirmed: word-final position does indeed have the
lowest percentage of epenthesis; however, word-initial position has the highest. This distribution is
significant (p = .002, χ² = 12.45). Because epenthesis and deletion were the only processes that
occurred in Tongan (excluding substitution), the percentage of deletion is simply the complement of
the percentage of epenthesis. Thus, Prediction 7, which predicts that the percentage of
41
Recall that Tongan does not show Retention in LWA; thus the ratio variables (e.g. Retention to Adaptation) are
not applicable here; Predictions 3-5 cannot be tested.
deletion will be highest word-finally, and lowest word-medially, is partially redundant, being
partially confirmed in the same manner. The distributions are displayed below.
FIGURE 23. Percentage of Epenthesis
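The redundancy between the two percentage predictions in Tongan can be made concrete with a short sketch. It uses the epenthesis and deletion counts for consonant clusters reported earlier in this chapter; the variable names are assumptions, and because the percentages here are computed over epenthesis plus deletion only, they may differ slightly from the figures quoted in the text.

```python
# Epenthesis (E) and deletion (D) counts for Tongan consonant clusters,
# as reported in the text (substitution and codas excluded).
tongan = {
    "initial": {"E": 196, "D": 5},
    "medial": {"E": 257, "D": 35},
    "final": {"E": 67, "D": 89},
}

pct_e = {pos: c["E"] / (c["E"] + c["D"]) for pos, c in tongan.items()}
pct_d = {pos: c["D"] / (c["E"] + c["D"]) for pos, c in tongan.items()}

# With only two processes, each position's percentages sum to 1, so ranking
# positions by epenthesis (high to low) is the same as ranking them by
# deletion (low to high).
order_e = sorted(pct_e, key=pct_e.get, reverse=True)  # highest epenthesis first
order_d = sorted(pct_d, key=pct_d.get)                # lowest deletion first
print(order_e, order_d)
```

Both rankings place word-initial position first and word-final position last, which is why testing one percentage prediction settles the other.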
If examined in a strictly categorical fashion, none of the hypotheses was confirmed.
However, these hypotheses all had two parts, one of which was confirmed and the other of
which was not, suggesting partial confirmation. The parts of the predictions about word-final
position were all confirmed, demonstrating that deletion is most common and epenthesis is least
common in word-final consonant clusters. The patterns with respect to word-initial and
word-medial position were the opposite of what was predicted. In word-initial position, deletion was
very rare, rarer than in word-medial position.
Even though these predictions were only partially confirmed (and thus, strictly speaking, falsified), I contend
that the patterns in the data should nonetheless be interpreted as supporting the Cue Hypothesis. The
predictions that were tested were constructed under the idea that both PC and EP are
sensitive only to the relative strength or robustness of perceptual cues. However, the previous
chapter suggested that PC and EP are also sensitive to word position, for the sake of lexical
access. Per the Cohort Model of word recognition (Marslen-Wilson 1987), the earlier a
consonant/phoneme is within a word, the more important it is for word recognition, in a linear
fashion. That is, word-initial consonants are highly important for word recognition, and word-
final consonants are relatively less important.
The versions of PC and EP that give importance to a cue’s role in word recognition,
rather than the raw acoustic/perceptual robustness of the cue, neatly explain the Tongan patterns.
These versions are stated below, excluding the (a) clauses that were given above in (23) and (24).
(30) Preserve Cue: Every perceptual cue in the source word must exist in the
loanword
b) The more important a cue is for word recognition, the more important it is to
preserve its cues.
(31) Enhance Perceptibility: Consonants in the SW must be made as perceptible as
possible
b) The more important a consonant is for word-recognition, the more important it
is to enhance its perceptibility
Re-examining the epenthesis percentage and deletion percentage data, we see that these
versions of PC and EP explain the patterns. For PC(b), the requirement to preserve cues is
strongest word-initially and weakest word-finally; thus, deletion should be least likely word-
initially and most likely word-finally. This is the case in the data.
FIGURE 24. Tongan Deletion and Preserve Cue
For the word-position versions of PC and EP, that is, PC(b) and EP(b) stated above, the
strength (or level of activation) is identical for the two. EP is strongest word-initially, and
weakest word-finally. Word-initial cues are most important for word recognition, and so the
imperative to enhance them is strongest in word-initial position. Conversely, word-final cues are least
important for word recognition, and so the necessity to enhance them is weakest. Epenthesis is thus
most likely to occur word-initially, and least likely to occur word-finally. This is the Tongan
pattern: the percentage of epenthesis in Tongan consonant clusters is directly correlated with the
relative strength of EP (and PC), as shown in the figure below.
FIGURE 25. Tongan Epenthesis and Enhance Perceptibility
I conclude that Tongan provides evidence for the idea that the conditions responsible for how
speakers of a language adapt sequences of sounds in LWs reference perceptual cues. However, the
relative strength and robustness of perceptual cues is not the sole factor explaining word-
position patterns, as was previously suggested. Speakers likewise seem sensitive to a perceptual
cue’s importance for word recognition. The suggestion made in Chapter 3 is now a central claim:
There exist two versions of Preserve Cue and Enhance Perceptibility, one regarding cue
strength, and the other regarding a consonant’s importance for word recognition. The latter
explains the distribution of epenthesis and deletion across word position in Tongan. This claim
is discussed in more detail in the conclusion of this chapter. First, the next section examines PC
and EP at a more detailed level in each position, examining the specific consonants involved in
epenthesis and deletion.
4.3 FINE-GRAINED INVESTIGATION OF TONGAN: POSSIBILITIES AND PREDICTIONS. In this
section, the adaptation of consonant clusters and codas in Tongan is examined at an even finer
level of detail, zooming in on the specific properties of epenthesis and deletion in each syllable
structure/position. Preserve Cue mandates that consonants in the source word be preserved in
the loanword; the more perceptible the consonant is (in terms of its perceptual cues), the more
important it is to preserve it. The analysis above examined the role of perceptual cues as a
function of context; the robustness of the perceptual cues of a consonant can be assessed (at least
in part) by that consonant’s position with respect to other sounds. For example, in a C₁C₂V
sequence C₂ is more perceptible than C₁, as C₂ contains a CV formant transition. However, this
is somewhat of an oversimplification, as the quality of C₁ and C₂, that is, the internal perceptual
cues of consonants (Wright 2004), also influences their perceptibility.
Here, the quality of consonants is examined. Rather than examining the internal
consonant cues per se, the sonority of consonants is examined, as investigating sonority allows
for a simpler and more transparent analysis. This substitution of (internal) perceptual cues with
sonority is warranted under the assumption that there is a direct and linear correlation between
sonority and perceptibility. In other words, the categories, as ordered in the Sonority Hierarchy
(given in 32), accurately reflect differing levels of perceptibility. The correlation between
sonority and perceptibility is supported by the description of sonority given in Ladefoged (2006):
“The sonority of a sound is its loudness relative to that of other sounds… the loudness of
a sound depends mainly on its acoustic intensity (the amount of acoustic energy present).
The sonority of a sound can be estimated from measurements of the acoustic intensity of
a group of sounds that have been spoken on comparable pitches and with comparable
degrees of length and stress” (2006: 239; italics added).
Following from this, if the perceptibility of a sound/consonant is (at least partially) a function of
its loudness and/or acoustic intensity, then the perceptibility of a sound/consonant is a function of
sonority.
I thus contend that the Cue Hypothesis predicts that less sonorous consonants, such as stops
and fricatives, are more likely to undergo deletion than more sonorous consonants, such as
nasals, approximants, etc. In other words, the sonority of a consonant is predicted to play a role
in determining whether or not it gets deleted: The more rightward a consonant is on the
following hierarchy, the more likely its deletion.
(32) The Sonority Hierarchy (Hooper 1976; Selkirk 1984, Clements 1992)
vowels>glides>liquids>nasals>fricatives>stops
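The hierarchy in (32) can be read as an ordinal ranking. The following is a minimal Python sketch of that reading; the integer ranks and the function name `more_likely_deleted` are illustrative assumptions rather than part of the analysis, and only the ordering of the classes is taken from the text.

```python
# Sonority Hierarchy from (32), most sonorous first.
# The integer ranks are an illustrative assumption: only the ordering matters.
SONORITY_HIERARCHY = ["vowel", "glide", "liquid", "nasal", "fricative", "stop"]
RANK = {cls: i for i, cls in enumerate(SONORITY_HIERARCHY)}


def more_likely_deleted(class_a: str, class_b: str) -> str:
    """Return whichever class sits further right (less sonorous) on the hierarchy.

    Under the prediction stated in the text, that class is the likelier
    target of deletion. Ties return class_a.
    """
    return class_a if RANK[class_a] >= RANK[class_b] else class_b


print(more_likely_deleted("stop", "nasal"))        # stop
print(more_likely_deleted("liquid", "fricative"))  # fricative
```

The sketch deliberately says nothing about how much more likely deletion is; it encodes only the direction of the comparison.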
As stated above, a consonant’s context, or position with respect to other sounds, is also
expected to play a role in how it is treated in the LW (this was the focus of the analysis in
Chapter 3). This factor will also be investigated in this chapter, in addition to sonority. For
example, consider the English word “apt.” The two consonants in this word are equal in terms of
their position on the sonority hierarchy, and more or less equal in terms of the strength of their
internal cues, as they are both voiceless stops. However, due to their context, the perceptual cues
for /p/ are stronger than those for /t/, as /p/ has vowel formant transitions, whereas /t/ does not.
We thus predict that the /t/ is more likely to undergo deletion. Abstracting away from these
particular consonants, we predict that when deletion occurs in a word-final consonant cluster, the
second consonant is more likely to be deleted than the first consonant, regardless of the relative
sonority.
Because English syllables/words tend to obey sonority sequencing (English: Carr 1999;
sonority: Clements 1992), distinguishing the sonority and the context of a consonant is especially
relevant for the investigation of word-medial consonant-consonant sequences (referred to here
and throughout as word-medial consonant clusters). Consider two English words, “apron” and
“shipment,” syllabified as [ei.pɹən] and [ʃip.mɛnt]. Because the [p] in “apron” is syllabified in
syllable onset position, and the same sound in “shipment” is syllabified in coda position, we
predict that the latter is more likely to undergo deletion than the former. That is, if deletion
occurs, it is more likely to occur for a word such as [ʃip.mɛnt] than for a word such as [ei.pɹən],
due to the fact that the word-medial stop is syllabified⁴² as a coda in the former, and as part of the
onset in the latter⁴³.
The above aspects of sonority and syllabification/context may be translated into a set of
predictions that can be tested against the Tongan data.
42
It should be noted that a description of this that does not reference syllabification, e.g. Licensing-by-Cue (Steriade
1999), may be more appropriate and accurate. However, for reasons of simplification, the concept of syllabification
will be used (and is only applicable for the analysis of word-medial consonant clusters).
43
See Chapter 3.3.1 for a more detailed explanation for why this is.
(33) Narrow-scope Sonority scale predictions
Prediction 10: The likelihood of deletion of a consonant will be a function of
that consonant’s position on the sonority hierarchy
Prediction 10a: Stops are more likely to be deleted than fricatives
Prediction 10b: Fricatives are more likely to be deleted than nasals
Prediction 10c: Nasals are more likely to be deleted than liquids
Prediction 10d: Liquids are more likely to be deleted than glides
The extent to which these can be precisely tested depends on the amount of data. Recall that
deletion is relatively rare in Tongan, occurring in just 155 out of 1,339 tokens (11.501%). And
these 155 are spread out across four different types of structure/position (word-initial CCs, word-
medial CCs, word-final CCs, word-final codas). Thus, a wider-scope version of Prediction 10 is
most likely needed. This prediction utilizes the notion of obstruents and sonorants as natural
classes, and is equally informative in testing the Cue Hypothesis.
(34) Sonority scale prediction
Prediction 11: Obstruents are more likely to be deleted than sonorants.
Recall that a consonant’s position within a syllable, i.e. its affiliation as part of a syllable
onset or syllable coda, is also predicted to possibly play a role in determining deletion patterns.
These predictions are given as follows.
(35) Syllable position/affiliation prediction
Prediction 12: Consonants in a cluster are less likely to be deleted if adjacent to
a vowel.
Prediction 12a: When deletion occurs in a word-initial C₁C₂V cluster, C₁ is more
likely to be deleted than C₂.
Prediction 12b: When deletion occurs in a word-final VC₁C₂ cluster, C₂ is more
likely to be deleted than C₁.
Prediction 13: Consonants in onset position are less likely to be deleted than
consonants in coda position.
4.3.1 DATA AND ANALYSIS. This section examines the specifics of epenthesis and
deletion in Tongan, testing the above predictions. It likewise examines and explains patterns that
do not necessarily fall under the Cue Hypothesis. Because epenthesis is by far the default repair
strategy in Tongan, deletion is mainly the focus of analysis here: this section addresses what
happens when deletion occurs. That is, when deletion happens, what consonants does it happen
to, and why? First, the general deletion trends are given, across the entire Tongan corpus.
Following this, trends for specific syllable structures/positions are given.
4.3.2 GENERAL TRENDS. In the Tongan corpus, there are 155 observations of deletion.
Two of these were deletions of an entire word-final consonant cluster: “New Zealand” → [nuʔu
sila], and “cement” → [sima]⁴⁴. These were not counted in the present analysis.
Across the entire corpus, deletion follows a pattern that is perfectly predicted by
Prediction 10; all of the sub-predictions (10a, 10b, etc) are confirmed/verified. There were more
observations of deletion of a stop than a fricative, a fricative than a nasal, a nasal than a lateral,
and a lateral than an approximant. There were no observations of deletion of a glide. Prediction
10 is confirmed. This is shown in the following table, giving the counts for each consonant
type.
Consonant Class Count
Stops 56
Fricatives 48
Nasals 30
Laterals (/l/) 12
Approximants (/r/) 7
Glides 0
Table 26. Total Deletion Counts
This distribution is statistically significant (with glide value: p < .0001, χ² = 59.557;
without glide value: p < .0001, χ² = 34.195). The likelihood of deletion in Tongan – correlated
with the observations of deletion – is a direct function of the sonority hierarchy. This is shown
below (hierarchy order reversed).
44
These “double deletion” cases could be merely apparent cases of deletion; the dialect of English from which Tongan
may have borrowed these words, i.e. that which is spoken in the South Pacific, may truncate the ends of words. For
example, “New Zealand” is often pronounced in New Zealand English as [niu zilə]. Tongan speakers may thus be
borrowing such forms faithfully (thanks to Stephen Finlay for pointing this out).
FIGURE 26. Distribution of Deletion in Tongan
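As a quick check on the ordering claimed for Prediction 10, the counts in Table 26 can be tested for a strictly decreasing pattern from stops to glides. The sketch below is illustrative only: the variable and function names are assumptions, and it checks the ordering of the counts, not their statistical significance.

```python
# Deletion counts from Table 26, ordered from least to most sonorous.
deletion_counts = [
    ("stops", 56),
    ("fricatives", 48),
    ("nasals", 30),
    ("laterals", 12),
    ("approximants", 7),
    ("glides", 0),
]


def strictly_decreasing(counts):
    """True if each class has fewer deletions than the class before it."""
    values = [n for _, n in counts]
    return all(a > b for a, b in zip(values, values[1:]))


total = sum(n for _, n in deletion_counts)
print(strictly_decreasing(deletion_counts))  # True
print(total)  # 153, i.e. 155 minus the two excluded whole-cluster deletions
```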
Prediction 11 is likewise confirmed by these data, with 104 deletions of an obstruent, and
48 deletions of a sonorant, a significant distribution (p = .0012, χ² = 10.678).
4.3.3 WORD-INITIAL CONSONANT CLUSTERS. Epenthesis is by far the most common
observation in word-initial consonant clusters: there are 196 examples of epenthesis to only five
examples of deletion. In four of these, the second consonant in the cluster is deleted, and in the
fifth, the first consonant is deleted. The deletion observations are given below.
(36) Deletion in Word-Initial Consonant Clusters
a. Deletion of second consonant
“blackboard” → [pakipoe]
“block tobacco” → [paki]
“plus” → [posi]
“trillion” → [tiliona]
b. Deletion of first consonant
“sphinx” → [viŋi]
These data partially confirm the predictions of the Cue Hypothesis. The forms in 36a
deleted the consonant that was the most sonorous in the cluster, i.e. /l/ or /r/, the opposite of what is
predicted by Prediction 11. Prediction 11 is thus not confirmed. The form in 36b, however, is
consistent with Prediction 12a.
For the cases in 36a, it should be noted that the Cohort Model approach to the Cue
Hypothesis could explain these forms. The word-initial consonant – the first consonant in the
word – is retained. Above, it was shown that in Tongan, the occurrence of deletion is a function
of word-position, where deletion is least likely to occur in word-initial position. This was
ascribed to the importance of word-initial position for word-recognition. On the level of the
individual segment, per the Cohort Model of word recognition, the first consonant is more
important than the second. I contend that this explains these somewhat aberrant cases in Tongan.
However, it should be noted that such cases are very few; the fact that such cases of word-initial
deletion are so rare (5/202, about 2.5%) independently confirms the Cue Hypothesis, as PC (and EP)
prefer epenthesis to deletion.
4.3.4 WORD-MEDIAL CONSONANT CLUSTERS. There are 35 observations of deletion in
word-medial consonant clusters, versus 257 cases of epenthesis, a significant distribution that
confirms the Cue Hypothesis (p < .0001, χ² = 98.645). Examining the deletion cases in detail, we
see that consonant class is not an indicator of deletion.
Consonant Class Count
Stops 11
Fricatives 2
Nasals 13
Laterals (/l/) 3
Approximants (/r/) 6
TABLE 27. Medial Consonant Cluster Deletion Counts
This distribution is not significant. Prediction 10 is not confirmed. The difference between obstruents (13)
and sonorants (22) is also not significant; Prediction 11 is not confirmed.
However, it seems as though syllabification (or context) is crucial for explaining deletion
observations. Out of the 35 observations of deletion, twenty-one are deletions of the first
consonant, and all of these are syllabified in English as VC.CV. Some examples are given below.
(37) Deletion of first consonant in medial CC
a. “sol.dier” → [sota]
b. “com.pass” → [kapasa]
c. “ad.miral” → [ʔamelali]
d. “eucalyp.tus” → [ʔiukaletusi]
These data suggest that a consonant’s affiliation in a syllable, as an onset or as a coda,
which corresponds to the strength or robustness of its perceptual cues, is a factor determining whether
or not it gets deleted. Thus, Prediction 13 is confirmed.
The remaining fourteen cases of medial consonant cluster deletion, where the second
member of the CC sequence is deleted, require closer scrutiny to understand. These are given
below.
(38) Deletion of second consonant in Medial CC
a. NC sequences: 5
“December” → [tisema]
“September” → [sepitema]
“ebondy” → [ʔeponi]
“lavender” → [laveni]
“sandwich” → [sanuisi]
b. Stop+r sequences: 6
“algebra” → [ʔasipa]
“pomegranate” → [pamikanite]
“February” → [febueli]
“library” → [laipeili]
“apron” → [ʔepani]
“petroleum” → [petoliume]
c. Others: 3
“diphthong” → [tifoŋi]
“leghorn” → [lekone]
“protestant” → [palotisani]
The cases in 38a, that is, the NC sequences, may be explained by sonority: nasals are
more sonorous than stops, and so the stop was deleted in these words.
For 38b, i.e. deletion of the /r/ in stop+/r/ sequences, a few possible explanations exist.
No form of /r/ is in the Tongan inventory, and English /ɹ/ is quite a difficult sound to
pronounce, as evidenced by its cross-linguistic rarity and the difficulty children have
producing/acquiring this sound (Ladefoged 2001). This could explain its avoidance – hence
deletion. The deletion of /r/ in “February,” “library,” and “petroleum” could be due to the
Obligatory Contour Principle (OCP; Goldsmith 1976). Tongan substitutes /l/ for the English /r/;
retaining the /r/ in “library,” for example, would result in something like *[laipeoleili], an OCP-
violating word. As for the “February” example, this is most likely an example of apparent
deletion. Tongan borrowed from British English. Much as in American English, “February” is
pronounced without the first /r/ in casual British English speech (Stephen Tobin, personal
communication).
The final three examples (38c) can all be explained with sonority: the more
sonorous/perceptually-robust sound is the one that is retained. This is especially true for
“leghorn,” as /h/ is a particularly faint sound. The same comparison holds for /s/ and /t/ in
“protestant”: /s/ is much more perceptible than is /t/, as sibilant fricatives are more acoustically
robust than stops (Shadle 1985). The word for “diphthong” contains a sequence of [fθ]; /θ/ is not
a phoneme in Tongan, but /f/ is: if speakers are going to use deletion to simplify a consonant
cluster, then it makes sense that they would choose to delete the sound that is foreign to them, and
retain the sound that is known to them.
Overall, the data for word-medial consonant clusters suggest that PC and EP are active at
the level of the segment in determining which gets deleted. Deletion is most likely to occur for a
word-medial consonant cluster if that consonant cluster is syllabified as a VC.CV sequence. If
such a sequence is a VN.CV sequence, the nasal is retained, explained by the higher sonority of
the nasal as compared to a stop. The data also suggest that language specific factors and
independent markedness factors play a role, albeit occasionally. The status of a sound in the
Tongan phoneme inventory explains some of the data, as does the OCP. These ideas
are returned to later in the conclusion section of this chapter.
4.3.5 WORD-FINAL CODAS. Word-final codas displayed a strong bias towards epenthesis.
In the corpus there are 660 observations of epenthesis to only 25 observations of deletion. There
does not seem to be any obvious reason for why these 25 deletions occurred; nothing specific
about these consonants, from a phonological/sonority perspective, stands out. Totals are given
below.
Consonant Class Count
Stops 3
Fricatives 9
Nasals 11
Laterals (/l/) 2
Approximants (/r/) 0
Table 28. Word-Final Coda Deletion Counts
This distribution is statistically significant (p = .038, χ² = 10.179), but not in any
meaningful way – fricatives and nasals seem to be deleted more often than other sounds. Since
Tongan borrowed from a variety of English that lacks coda /r/, the fact that there were zero
deletions of approximants is irrelevant. If this datapoint is removed, the distribution becomes
non-significant (p = 0.162). And comparing obstruents versus sonorants, there is no obvious
bias, with 12 being of the former and 13 being of the latter.
What gets deleted in the coda in Tongan seems to be idiosyncratic, language-specific, and
random. Four patterns can be sorted out in the 26 observations.
(39) Deletion in Word-Final Codas
a. Highly technical/scientific words: 11
i. “calcium carbide” → [kabai]
ii. “epicurean” → [ʔepikulio]
iii. “harmonium” → [hamoni]
b. Christian/Bible words: 7
i. “communion” → [komunio]
ii. “absolution” → [ʔapesolusio]
iii. “Jesus” → [sisu]
iv. “centurion” → [senitulio]
c. Possible OCP effects: 3
i. “narcissus” → [nasisi], *[nasisisi]
ii. “matches” → [masi], *[masisi]
iii. “mustard” → [musita], *[musitata]
e. No obvious pattern: 5
i. “flannel” → [falani]
ii. “ace (in cards)” → [hai]
iii. “dungarees” → [taŋali]
iv. “station” → [setasio]
v. “commission” → [komisio]
The occurrence of deletion for the words in 39c can be explained by an OCP effect –
retaining the final consonant in the English word would result in a repetition of identical
consonants, the most egregious case of which would be the word “narcissus.”
Lexical specification/category may also play a role in how speakers of
Tongan adapt sequences of sounds in foreign words. Ten of the word-final coda deletions are
words that are highly technical and scientific. Seven of the words have to do with
Catholicism/Christianity, and/or words that are likely to be used just in reference to The Bible,
for example “centurion.”
There are two possible explanations for this. The first is that these words are somehow
exceptional. Perhaps they were borrowed at a different time than other Tongan borrowings,
perhaps they were borrowed from a different group of people (i.e. missionaries), or perhaps the
people who borrowed them behave differently. Regarding the second explanation, it should be
noted that Catholic/Bible and scientific/technical words make up a significant portion of the
Tongan loanword corpus. This makes sense under the idea that speakers borrow words from
other languages when the words do not exist in their language, i.e. lexical gaps. Then, if deletion
happens randomly to a small percentage of loanwords, a certain portion of the words showing
deletion will happen to be words of this type, due to random chance.
In order to further investigate this, all of the 155 words that showed deletion were reviewed
and marked for whether a word was a Christian/Bible word or a scientific/technical word. Some
of these have already been shown above, such as the words for “diphthong” → [tifoŋi] and
“protestant” → [palotisani], in the section on word-medial consonant cluster deletion.
Additionally, all of the 1,476 loanwords in the Tongan corpus were reviewed and marked for the
same properties. Percentages were compared to determine if Christian/Bible words and/or
scientific/technical words are overrepresented in the set of loanwords that show deletion.
Scientific/technical words make up 29 out of the 155 observations of deletion, about
18.71%. The entire corpus of Tongan loanwords contains 189 scientific/technical words, about
12.8%. Christian/Bible words make up 20 of the 155 observations of deletion (12.9%), whereas
they make up 119 words in the corpus (8.1%). This suggests that the history of these words is
somehow exceptional – they are more likely to display deletion than any given random word in
the Tongan corpus⁴⁵. Why this is, and how this occurred, is beyond the scope of the
current investigation. However, what is important to note is that the distribution of patterns of
LWA can be influenced by idiosyncratic and specific aspects of a language’s demographics and
contact history.
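The comparison of percentages above can be reproduced directly from the reported counts. The following Python sketch is illustrative; the names `representation`, `DELETION_TOTAL`, and `CORPUS_TOTAL` are assumptions, and the ratio it returns is a descriptive measure only, not a significance test.

```python
# Counts reported in the text.
DELETION_TOTAL = 155   # loanwords showing deletion
CORPUS_TOTAL = 1476    # all loanwords in the Tongan corpus

categories = {
    # name: (count among deletion words, count in whole corpus)
    "scientific/technical": (29, 189),
    "Christian/Bible": (20, 119),
}


def representation(in_deletion, in_corpus):
    """Return (% of deletion set, % of corpus, ratio of the two percentages)."""
    pct_del = 100 * in_deletion / DELETION_TOTAL
    pct_all = 100 * in_corpus / CORPUS_TOTAL
    return pct_del, pct_all, pct_del / pct_all


for name, (d, c) in categories.items():
    pct_del, pct_all, ratio = representation(d, c)
    print(f"{name}: {pct_del:.2f}% of deletions vs {pct_all:.2f}% of corpus "
          f"(ratio {ratio:.2f})")
```

A ratio above 1 indicates that a category is overrepresented among the words showing deletion, relative to its share of the corpus.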
The five words in 39e also seem to be random. There does not seem to be any obvious
category they fit into, phonological or semantic. Two of the coda deletions – “flannel” and
“dungarees” – are part of a semantic class dealing with clothing/textiles. This is ostensibly just a
coincidence, as there are other clothing/textile words that epenthesize to avoid final consonants,
e.g., “drill (clothing)” → [tilili], “nylon” → [nailone].
The word for “ace” → [hai] may be a case of metathesis and substitution, as no obvious
source exists for the initial /h/. There could have been an opaque process where the /s/ and the
/ai/ switched positions, and the /s/ became /h/. This is probably not the case, however. There are
other examples of Tongan epenthesizing word-initial /h/, both of which are before a low vowel:
“iron (clothes)” → [haeane], “arrowroot” → [halaulutu]. Recall Tongan’s necessity for onsets:
vowel-initial words epenthesize a glottal stop at the beginning. It is then more likely that these,
including the case of “ace,” are a form of epenthesis to achieve a canonical
syllable with an onset. Why the /s/ is deleted is therefore attributed to random chance.
4.3.6 WORD-FINAL CONSONANT CLUSTERS. Word-final consonant clusters displayed a
pattern that was very different from the other positions/structures. For all of the others, there
was a strong bias for epenthesis, and deletion was rare. However, for word-final consonant
clusters, there were more observations of deletion (89) than there were of epenthesis (67). This
deletion-bias is not statistically significant. Because of this, both epenthesis and deletion are
investigated in detail.
45
It should be noted that comparing the percentage values for each, for example, 18.71 versus 12.8 for
scientific/technical words, is not significant in a Chi-Square test; however, the Chi-Square test is meant to handle
observation counts, not (necessarily) computed values, such as percentages.
Firstly, a data issue needs to be addressed. Three words were included in the total count
as observations of deletion, which are most likely cases of apparent deletion.
(40) Word-final consonant cluster apparent deletion
“palm tree” → [paːme]
“balm” → [paʔame]
“psalm” → [saːme]
In these words, the /l/ is mainly orthographical: only in very careful, precise situations, such as
citation form, is the lateral tongue tip gesture for the /l/ pronounced. Tongue dorsum gestures for
the /l/ may still be present; I speculate that this is why Tongan adapted these words with a long
vowel (or a /vʔv/ sequence, which could perhaps be due to contrast preservation, so as to not
neutralize “palm” and “balm”). In short, these are most likely cases of apparent deletion, rather
than actual deletion. These are excluded from subsequent analyses⁴⁶.
Examining which consonants undergo deletion is telling: The deletion observations are
in direct correlation with the sonority hierarchy. This is shown in the following table and figure.
Consonant Class Count
Stops 44
Fricatives 35
Nasals 5
Laterals (/l/) 2
TABLE 29. Final Consonant Cluster Deletion Counts
46
These loanwords suggest falsification of the Bilingual Hypothesis of LWA (Paradis and LaCharité 1997), which
will be addressed in Chapter 5.
FIGURE 27. Distribution of deletion in word final consonant clusters
Prediction 10 is confirmed (p < .0001, χ² = 37.409). Prediction 11, which compares
obstruents versus sonorants, is also confirmed, with 79 deletions for obstruents to only seven
deletions for sonorants (p < .0001, χ² = 36.543).
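The obstruent and sonorant totals cited for Prediction 11 follow from grouping the classes in Table 29. A minimal sketch of that grouping is given here; the variable names are assumptions, and the percentage share is added purely for illustration.

```python
# Deletion counts from Table 29 (word-final consonant clusters).
final_cc_deletions = {"stops": 44, "fricatives": 35, "nasals": 5, "laterals": 2}

# Natural-class membership used by Prediction 11.
OBSTRUENTS = {"stops", "fricatives"}

obstruent_total = sum(
    n for cls, n in final_cc_deletions.items() if cls in OBSTRUENTS
)
sonorant_total = sum(
    n for cls, n in final_cc_deletions.items() if cls not in OBSTRUENTS
)

print(obstruent_total, sonorant_total)  # 79 7

# Share of deletions that target an obstruent, as a percentage.
share = 100 * obstruent_total / (obstruent_total + sonorant_total)
print(round(share, 1))  # 91.9
```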
Prediction 12b is likewise confirmed: when deletion happened for a VC₁C₂ sequence, it
happened more often for C₂ than for C₁. There were seven deletions of C₁, and 79 deletions of C₂
(p < .0001, χ² = 36.543)⁴⁷.
To further investigate the adaptation of word-final consonant clusters, an analysis was
done that compared which consonant clusters were repaired with epenthesis versus which were
repaired with deletion. The theory advanced here is that the effect of PC and EP may or may not
show up on individual loanwords, but across a set of data, their effect will (almost) always be
seen, and the larger the set, the more readily their effect will be seen. Although word-final
consonant clusters show a non-significant bias for deletion, the effects of PC and EP should
nonetheless show up. PC mandates that the more robust a cue/consonant is, the more important
it is to preserve that consonant (and its cues). We thus predict that epenthesis is more likely to
occur in consonant clusters whose cues are louder and more numerous, and deletion is more
likely to occur in consonant clusters whose cues are quieter and less numerous. The segmental
analysis in this chapter is based on sonority: it is predicted that consonant clusters that are overall
highly sonorous are more likely to undergo epenthesis than consonant clusters that are overall
less sonorous. This prediction is given below as the 13th prediction.
47
Although sonority and vowel adjacency are very much linked, it is just a coincidence that these values happen to
be the same as the obstruent-sonorant values that confirmed Prediction 11. Consider an /nt/ sequence, treated two
different ways: “percent” → [peseti], “pheasant” → [feseni].
(41) Consonant Cluster Sonority Prediction
Prediction 13: Epenthesis is more likely to happen in consonant clusters that
have a relatively high sonority, and less likely to happen in consonant clusters that
have a relatively low sonority.
In order to test this prediction, the following analysis was done. For all of the word-final
consonant clusters in Tongan, the consonants in the cluster were examined, and classified as one
of three categories: obstruents, sibilants, and sonorants. Previous analyses classified
sibilants as obstruents, but here, sibilants are classified as a separate category. Although /s/ and /ʃ/ are
fricatives, and fricatives are generally considered to be obstruents, these sounds are highly
sonorous compared to other fricatives and stops (Shadle 1985, Wright 2004). These sounds have
relatively very high energy at frequencies above 4,000Hz, whereas stops and other fricatives do
not (Ladefoged 2001). Phonologically, they behave differently than other fricatives. For
example, English allows a sonority sequence violation involving sibilants, but not other
fricatives: “star” [staɹ], but *[ftaɹ]. Likewise, they are known to sometimes behave aberrantly
with respect to other fricatives in loanword phonology (Rose & Demuth 2006). Classifying
these sounds as separate from obstruents, in order to test the prediction above, is thus warranted.
The three categories were then given a score based on their sonority. Obstruents have the
lowest sonority, sonorants have the highest, and sibilants are somewhere in between. Thus,
obstruents were given a score of 1, sibilants a score of 2, and sonorants a score of 3. Consonant
clusters were then given a score, which was simply the sum of the scores of the individual
consonants in the cluster. This is shown in the following table. Examples are from actual
English words in the Tongan corpus
Consonant cluster Example Sonority Score
Obstruent+Obstruent “draft” → [talafi] 2
Obstruent+Sibilant “fox” → [fokisi] 3
Sibilant+Obstruent “west” → [uesite] 3
Obstruent+Sonorant “logarithm” → [lokalimi] 4
Sonorant+Obstruent “mint” → [miniti] 4
Sonorant+Sibilant “France” → [falanise] 5
Sonorant+Sonorant “film” → [filimi] 6
TABLE 30. Consonant Cluster Sonority Scores
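The scoring scheme in Table 30 can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used for the analysis; the names `SONORITY_WEIGHT` and `cluster_sonority_score` are hypothetical, and the class labels are assumed to have been assigned to each consonant by hand.

```python
# Weights as described in the text: obstruent = 1, sibilant = 2, sonorant = 3.
# The dictionary and function names are illustrative assumptions.
SONORITY_WEIGHT = {"obstruent": 1, "sibilant": 2, "sonorant": 3}


def cluster_sonority_score(classes):
    """Return the sonority score of a cluster: the sum of its members' weights.

    `classes` holds one class label per consonant, in order, e.g.
    ("sonorant", "sibilant") for the final cluster of "France".
    """
    return sum(SONORITY_WEIGHT[label] for label in classes)


# The seven cluster types of Table 30, keyed by their class sequence.
TABLE_30_TYPES = [
    ("obstruent", "obstruent"),
    ("obstruent", "sibilant"),
    ("sibilant", "obstruent"),
    ("obstruent", "sonorant"),
    ("sonorant", "obstruent"),
    ("sonorant", "sibilant"),
    ("sonorant", "sonorant"),
]

for cluster_type in TABLE_30_TYPES:
    print("+".join(cluster_type), cluster_sonority_score(cluster_type))
```

Because the score is a simple sum, the order of the consonants does not matter, which is why Obstruent+Sibilant and Sibilant+Obstruent receive the same score.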
Per Prediction 13, the higher the sonority score, the more likely epenthesis is to occur, and the
lower the sonority score, the less likely epenthesis is to occur (or the more likely deletion is
to occur). Epenthesis percentages for each of these consonant cluster types were calculated. That
is, for each consonant cluster type (e.g. Obstruent+Obstruent), the number of epenthesis
observations for that type was divided by the total number of consonant clusters of that type,
then multiplied by 100. For example, there were 7 total obstruent-obstruent consonant cluster
sequences in the data; 5 showed deletion, and 2 showed epenthesis. So, the epenthesis percentage for
this consonant cluster type was 2/7 × 100 = 28.6. Data for consonant clusters that had an
equal sonority value was combined, i.e. 3 (Obstruent+Sibilant and Sibilant+Obstruent) and 4
(Obstruent+Sonorant and Sonorant+Obstruent). The results are given in the following table.
Sonority Score    Epenthesis (%)
2                 28.6
3                 31.3
4                 53.1
5                 63.3
6                 100.0
TABLE 31. Sonority scoring analysis results
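The percentage calculation described above, including the pooling of cluster types that share a sonority score, can be illustrated with a minimal Python sketch. The function name `epenthesis_percentage` is hypothetical, and only the Obstruent+Obstruent counts (2 epenthesis observations out of 7 clusters) are taken from the text; the counts behind the other rows are not reproduced here.

```python
def epenthesis_percentage(observations):
    """Return the epenthesis percentage for one sonority score.

    `observations` is a list of (epenthesis_count, total_count) pairs,
    one pair per cluster type sharing that score. Counts are pooled
    before dividing, as described in the text for scores 3 and 4.
    """
    epenthesis = sum(e for e, _ in observations)
    total = sum(t for _, t in observations)
    if total == 0:
        raise ValueError("no clusters observed for this sonority score")
    return 100.0 * epenthesis / total


# Obstruent+Obstruent (score 2): 2 of 7 clusters showed epenthesis.
print(round(epenthesis_percentage([(2, 7)]), 1))  # 28.6
```

Pooling the raw counts, rather than averaging two separate percentages, keeps cluster types with few observations from being over-weighted.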
FIGURE 28. Sonority Score Results
This distribution is statistically significant (χ² = 29.884, p < .0001). Likewise, it is in the
direction that is predicted: the more sonorous overall a word-final consonant cluster is, the
more likely it is to undergo epenthesis. All Sonorant+Sonorant clusters showed epenthesis;
63.3% of Sonorant+Sibilant clusters showed epenthesis, and so on.
Prediction 13 is confirmed. PC and EP are active in Tongan on the level of the individual
consonant: in a perceptually weak position, i.e. word-final clusters, the effect of PC and EP is
nonetheless observed. In this perceptually weak position, perceptually strong consonants (or
more sonorous ones) are more likely to be retained, and perceptually weak consonants (or less
sonorous ones) are more likely to be deleted. The Cue Hypothesis is once again verified.
4.4 SUMMARY AND CONCLUSION. This chapter demonstrated that an in-depth analysis of
a language such as Tongan enhances the overall understanding of LWA of syllable structure.
For one, details were examined that could not have been examined in the cross-linguistic corpus.
The cross-linguistic analysis showed that PC and EP are sensitive to a consonant cluster’s
location within the word. Here, it was shown that PC and EP are sensitive to the properties of
the consonants involved in a cluster: they likewise operate on a micro-level, as shown in the
previous section, in Figure 28.
This chapter also demonstrates how factors unrelated to sonority/cues (or the quality of
consonants per se) are relevant to a fuller understanding of loanword phonology. Some of the
data shown above seem to be due to independent markedness constraints, such as the OCP, and
the prohibition of marked segments such as /ɹ/ and /θ/. Tongan has a strict CV(V) syllable
structure; Retention is thus not an option for Tongan, a fact due to Tongan’s specific phonotactic
properties. Other language-specific markedness factors are involved as well. Tongan sometimes
deletes a consonant from English if that consonant is not in the Tongan consonant inventory,
regardless of the consideration of its perceptual cues.
Additionally, language-specific idiosyncrasies were also found to have a role: certain
types of words behave differently in the loanword data, such as scientific/technical words and
Christian/bible words, demonstrating the importance of understanding the history of language
contact in describing and analyzing LWA.
In addition to these micro-level facts, the investigation of Tongan also provided
confirmation of an idea that was posited in the previous chapter. That is, the Cue Hypothesis
operates not only on the relative robustness/strength of perceptual cues, but on the importance of
a cue for word-recognition. Cross-linguistically, it was shown that word-medial position is
privileged, as this is where perceptual cues are the most robust. In Tongan, however, word-initial
position is privileged, as this position is key for word recognition under the Cohort Model.
Preserve Cue and Enhance Perceptibility thus come in two distinct but related varieties, either or
both of which may determine how LWA proceeds in any given language.
In sum, this chapter enhanced the understanding of LWA of sound sequences. A
generalization of all of the observations in the dissertation can now be described in a single,
simple proposition, stated in 44.
(44) Loanword Adaptation of Sound Sequences
The range and patterns of LWA of sound sequences are mainly guided by two
conditions, Preserve Cue and Enhance Perceptibility, which interact with
markedness and faithfulness to the SW; language-specific factors, such as
phonotactic requirements and lexical idiosyncrasies, also influence the borrowing
process.
5 THEORETICAL CONNECTIONS AND CONCLUSIONS. This dissertation has been primarily
data-focused, presenting and analyzing cross-linguistic data on loanword adaptation (LWA), as
well as data on LWA in Tongan, providing various ways of organizing and analyzing the data.
This chapter provides sketches of how the observed LWA phenomena may be connected to
formal phonological theory. Many of the observations and claims made in the dissertation
regarding LWA have direct correlates with established theories and frameworks of native
phonology, specifically Articulatory Phonology (Browman and Goldstein 1992), and Optimality
Theory (Prince and Smolensky 1993/2004). Further, the perceptual, cue-based approach to the
data and observations regarding LWA has direct correlates with the Licensing-by-Cue approach
to phonological patterns (Steriade 1999), and the P-Map theory of correspondence (P-Map:
Steriade 2001/2009; Correspondence Theory: McCarthy and Prince 1995), as well as others that
have followed in this tradition, such as Fleischhacker (2000) and Côté (2000). Some
connections of the LWA typology and the Cue Hypothesis to these more formal
approaches to phonology are outlined here.
In Section §5.1, I review the main findings and claims of the dissertation. Section §5.2
connects these findings and claims to formal phonological theory. In this section I provide a
description of how the LWA process may proceed in terms of Articulatory Phonology. I then
turn to an inquiry in Optimality Theory and the P-Map, showing how loanword (LW)
observations are an example of the Too-Many-Solutions problem (Steriade 2001), which may be
remedied in similar ways as it is in native phonology. §5.3 connects the findings of the
dissertation to theories of LWA in formal phonology, specifically the Perceptual Theory of LWA
(Silverman 1992; Yip 1993, 2002; Kenstowicz 2003, 2007) and the Bilingual Theory of LWA
(Paradis and LaCharité 1997). I argue that the typology of sound sequence adaptation
investigated in this dissertation is mostly consistent with the Perceptual Theory.
5.1 SUMMARY OF FINDINGS. The main goal of this dissertation was to examine the
adaptation of sound sequences (specifically, those involving syllable structure) in loanwords
from a cross-linguistic perspective, as well as to provide an explanation of observational
patterns. This goal was accomplished mainly by providing a typology of LWA of syllable
structure, which included a typology of process and a typology of languages based on their
process preference. The general explanation given for the observations regarding these
typologies invoked the primacy of perceptual cues in the loanword process.
Both the typology of process and the typology of language process preference show the
same robust trend, where generally, LWs are adapted as faithfully as possible. That is, non-
adaptation, or retaining marked syllable structure such as consonant clusters and coda
consonants, was the most common observation, by a fairly wide margin. When non-adaptation
did not happen, and an alteration/repair was made to the source word in adapting the loanword,
this repair was most commonly epenthesis. Epenthesis occurred at a significantly higher rate in
the data than deletion. Other possible repairs, such as metathesis, phoneme substitution, and
coalescence, were observed rarely. This trend, in which foreign sound sequences are retained and
any repair that does occur tends to be epenthesis, holds for both typologies. Simple counts of
process observations, both raw and normalized, showed this trend. Likewise, the language types
show this trend. Out of the 53 languages investigated in the study, the majority (29) were
classified as |Retention Dominant|, meaning they borrowed words from other languages
faithfully in a statistically significant majority of LWs. The second most common pattern was
one where both Retention and epenthesis occurred, with 11 of the languages in the corpus
showing a dual preference for either one of these repair strategies (or lack thereof). Eight
languages preferred epenthesis exclusively. Just two languages (Iraqw and Bezhta) showed no
strong preference for any strategy, and perhaps most remarkably, only two languages tended
towards deletion. Yaqui favored both deletion and Retention, and Saramaccan favored deletion
over any other repair strategy.
The second main finding was presented in Chapter 3, where clear asymmetries were
observed with respect to word position. It was shown that the repair strategy employed depends
on the type of structure, i.e. whether it is a word-initial consonant cluster, a word-final consonant
cluster, or a word-medial sequence of two consonants. Word-final consonant clusters were the
most susceptible to repair: both deletion and epenthesis occurred at a significantly higher rate
for word-final consonant clusters than for similar sequences in other positions in the word.
Word-medial consonant-consonant sequences (referred to as word-medial consonant clusters)
were the least susceptible to repair, showing the strongest bias towards retention. Word-initial
consonant clusters behaved somewhere in between word-medial consonant clusters and word-
final consonant clusters, still showing a bias towards Retention, but being adapted more
frequently than word-medial consonant clusters, but less frequently than word-final consonant
clusters. It was also shown that when a repair (i.e. non-Retention) happens in this position,
Epenthesis is more likely to occur than when a repair happens in word-final position. This
property can be explained in Optimality Theory by the interaction of Faithfulness constraints and
other constraints posited by Côté (2000); this will be discussed below.
The case study of Tongan revealed similar patterns, where epenthesis was strongly
preferred to deletion, and deletion was most likely to happen in word-final consonant clusters.
The analysis of Tongan also revealed other important facts relevant for LWA. For one, it
examined LWA patterns on the level of the segment. The investigation in this chapter found that
the sonority of a consonant, which roughly correlates to the strength/robustness of internal
perceptual cues of a consonant, is likewise key in explaining patterns observed in the language’s
loanword phonology. Tongan also showed that additional factors are sometimes
relevant in describing the LWA phonology. For example, the absence of an English phoneme in
Tongan was explanatory for some cases of deletion. Independent markedness
constraints, such as that which is enforced by the Obligatory Contour Principle, likewise play a
role in explaining certain patterns. And finally, the Tongan data suggest that historical and
sociological factors can be relevant in a full description of LWA. With Tongan, words that
might have been introduced to Tongan speakers by missionaries or other specific demographics
(i.e. “scientific/technical” and “Christian/Bible” words), behave slightly differently than other
LWs in the language. The precise reason for this is beyond the scope of the current
investigation, but what is crucial to note here is that LWA is necessarily a social (or
sociolinguistic) phenomenon. I suggest that this is necessary for an accurate description of the
LWA process, and that this property is beyond the scope of formal phonology. This was
discussed above in Chapter 3 in different terms, and will be returned to below.
An additional analysis was done on the extent to which LWA observations within
languages conform to the native phonology of the language (or the speakers) doing the
borrowing. It was found that mostly, LWs were adapted in accordance with the rules/constraint
rankings of the native language. However, in many cases, LWA behavior did not conform to the
native phonology. It was shown that many languages that allow marked syllable structure,
including word-initial consonant clusters, word-final consonant clusters, and word-final codas,
nonetheless repair similar and identical structures when borrowing words from foreign
languages, and some languages even do this in the majority of cases. Georgian was the most
robust example of this: Word-final consonant clusters of all types, even those that disobey
Sonority Sequencing, are observed in native Georgian words, but epenthesis was observed in all
observations of LW adaptation of word-final consonant clusters. And perhaps even more
consequentially, LWA is such that marked sequences of sounds that are prohibited in the native
lexicon may be tolerated in LWs. This was most remarkable for the phonology of Sakha and
Imbabura Quechua. In all 909 native Sakha words provided by Pakendorf and Novgorodov
(2009), and in all 1,000 native words of Imbabura Quechua provided by Rendón (2009), there
was not a single word that contained either a word-initial consonant cluster, or a word-final
consonant cluster; yet these two languages retained foreign consonant clusters at a rate high
enough for them to be classified as |Retention Dominant|. It seems as though these two
languages prohibit consonant clusters, yet when they borrowed words from other languages,
consonant clusters were adapted faithfully. Perhaps the single most telling example of this
comes from Hawaiian, a language that adapts LWs so much that “Merry Christmas” → [mele
kalikimaka] is unrecognizable as the common English phrase to the untrained ear. The general
description of such a form is that Hawaiian prohibits coda consonants and consonant clusters, as
well as [r] and [s], which become [l] and [k] in the above adaptation. However, this
generalization is shattered by a single loanword provided by Parker Jones (2009): The word
“Christ” has been borrowed into Hawaiian as [kristo], with the [r], and the [s], and adjacent
consonants as they are in the source word.
This non-phonotactically conforming aspect of Hawaiian and other languages, as well as
the distribution of patterns in the typology, are explained by a hypothesis presented in the
dissertation that is external to formal linguistic theory. Stated tersely, this hypothesis claims
speakers tend to hyper-articulate words borrowed from foreign languages, and this hyper-
articulation influences the phonological shape of the LW. The data in this dissertation are in
line with this hypothesis. The Retention bias, the non-phonotactically conforming nature of
LWA, the epenthesis preference, the cases of gratuitous epenthesis, and the word-position
asymmetries could have resulted from hyper-articulation.
One might then inquire as to why this hyper-articulation occurs. I contend that two
separate, uncontentious facts about language and cognition are relevant for the explanation.
1) By definition, a loanword in any language Lb is a word that at some point in history
was not in the lexicon of Lb. When a speaker of Lb borrows a foreign word, they are necessarily
unfamiliar with this word. It is by definition foreign: strange, unknown, unusual, unique, and/or
new.
2) The human mind unequivocally handles foreign/strange/unfamiliar things differently
than it handles things with which it is familiar. The mind responds differently to new stimuli than
it does to well-known stimuli: compare the experience of being in an unfamiliar city with being
in one’s neighborhood. This is similar to what is known in cognitive science as “the oddball
effect” (Schindel et al. 2011; Tse et al. 2004), where in an experimental setting, people react
automatically and systematically to novel, unusual, or unexpected stimuli; their behavior/actions,
mental computations, and perception of time all slow down. It is thus reasonable to assume that
the same goes for language, as language/speech involves action and mental computation (as well
as perception). In essence, LW phonology is a type of “oddball effect.” Generally, hyper-
articulation occurs in LWA due to the above; over the course of time, this has resulted in the
patterns of sound sequence adaptation observed in this study.⁴⁸ This claim is further explored
below, where the possible hyper-articulation of LWs is examined within the framework of
Articulatory Phonology.
5.2 THEORETICAL CONNECTIONS. This section aims to connect the findings of the
dissertation to formal theories of phonology. I argue that the description of the LWA process
given in Chapter 3 is crucial to the understanding of the LW observations and patterns, and that
this description is congruent with a type of gestural clock slowing in the Articulatory Phonology
framework. I then provide a partial Optimality Theoretic analysis of LWA, first of the general
observations and the typology of language process preference, and then the word-position
asymmetries seen in Chapter 3.
5.2.1 GESTURAL CLOCK-SLOWING IN THE BORROWED WORD. In Chapter 3, and here in
§5.1, I argue that an accurate description of LWA requires an understanding of both the social
and cognitive factors involved when a speaker of one language borrows a word from a foreign
⁴⁸ Granted, such a claim, although ostensibly plausible, is mostly speculative; the validity of this hypothesis is left
as a topic for future research.
language. I make a distinction that, as far as I am aware, is not usually made in
the LW literature. That is, there is a key difference between a borrowed word and a loanword.
When a speaker of one language uses a word from another (foreign) language to refer to some
concept, this speaker is, in all likelihood, unfamiliar with the foreign word. The borrowed word
is treated differently precisely because it is foreign/unfamiliar, an automatic cognitive response
of the mind to unfamiliarity. If the borrowed word is used enough, it eventually will become
fossilized and enter the lexicon, at which point it deserves the label “loanword.”
Consider the following sentences, meant to illustrate this point. Transcription of the
relevant words is given below the sentences.
(45) I saw the French painting at the museum.
[fɹɛntʃ ˈpeɪntɪŋ]
(46) I saw the Lefebvre at the museum.
[ləˈfɛb.vrᵊ]
These are meant to be read with identical intonation, with focus on the fifth syllable (hence the
italics in “painting”). Assuming that one is not familiar with the late 19th-century French painter
Jules Joseph Lefebvre, these two sentences are likely not to be treated with the same prosody
when read aloud. This is because “Lefebvre” is a foreign/unfamiliar word, with syllable
structure that an English speaker is probably not used to dealing with. In 46, the prosody is
likely to change when one pronounces “Lefebvre.” One is likely to slow down the rate of
speech, handling every sound and the sequences of sounds differently than one would handle a
phrase containing familiar sounds and sound sequences such as [fɹɛntʃ ˈpeɪntɪŋ] “French
painting.” It is argued that it is precisely this phenomenon that occurs when one speaker
borrows a word from another language – the word is hyper-articulated, slowed down, and
handled with precision and care, due to its unfamiliarity.
In Chapter 3, an example was given of the borrowing of the English word “brick” [bɹɪk]
into Yoruba as [biriki].⁴⁹ The claim here is that just as English speakers do with “Lefebvre,”
speakers of Yoruba may have handled the borrowed word “brick” differently, pronouncing the
word in a hyper-articulated fashion, where the articulatory gestures are both more carefully
executed and slowed down in time. This slowing down, or gestural clock-slowing, leads to
⁴⁹ I am not claiming that this is necessarily what happened with this LW, for that is likely impossible to know; this
word serves as an example for illustration.
gestures that are more spread out in time. The gestures in the initial consonant cluster [bɹ]
become more separated, resulting in a vocalic sound in between them. Likewise, the hyper-
articulation, or clock-slowing, of the [k] sound results in a more audible release, which also may
result in something like an excrescent vowel. This hyper-articulated form is then fossilized into
the Yoruba lexicon as [biriki]. It is then a “loanword,” the form that phonologists study.
A corollary claim about this is that when speakers borrow a word from a foreign
language, and use this word in conversation, the borrowed word is hyper-articulated not just
because it is foreign in and of itself. It is hyper-articulated because doing so makes it more
likely that the speaker is understood by the interlocutor. In other words, because the word is
unfamiliar, the hyper-articulation/clock-slowing maximizes the probability that the unfamiliar
word is understood. As an epiphenomenon of this, the perceptual cues in the word are
preserved, as the gestures in the word are more precisely executed, and the consonants in the
word are perceptibly enhanced by the excrescent vowel sound that occurs due to the clock-
slowing, which adds vowel formant transitions to the consonant gestures, thereby making them
more perceptible. Here we have the basis for Preserve Cue and Enhance Perceptibility, which
show up as certain patterns of epenthesis and deletion in the examination of loanwords.
This can be accounted for in Articulatory Phonology terms by positing that
unfamiliar/foreign/borrowed words are co-articulated with some sort of prosodic gesture, similar
to the π-gesture (Byrd and Saltzman 2003) or μ-gesture (Saltzman et al. 2008), which causes the
clock slowing. The precise nature and formulation of this prosodic gesture are beyond the scope
of the current investigation, but provide an avenue for future inquiry.
A gestural approach to LW phonology also possibly accounts for another fact: word-
final consonant clusters are more likely to be repaired than consonant clusters in other positions,
and the repair is predominantly epenthesis. Consider the following figure, a gestural score of the
word “spams” [spæmz], adapted from the gestural scores of “span” in Nam et al. (2012) and “ten
themes” in Browman and Goldstein (1989).
FIGURE 29. Gestural Score for “spams”
Due to the anti-phase timing relationship with the vowel, the gestures of [m] and [z] are already
more spread out in time than the gestures in the word-initial cluster [sp]. The gestural clock-
slowing that happens when a foreign word is borrowed could therefore affect the word-final
gestures more, as these gestures are already more spread out in time. Further spreading them out
in time due to the clock slowing could be more likely to result in a vowel sound in between them.
Over a large number of borrowings, then, more “epenthetic” vowels would be observed in word-
final consonant clusters than word-initial consonant clusters. And such is precisely what is
observed in the cross-linguistic corpus of data. Further exploration of this hypothesis is left open
for future research.
5.2.2 OPTIMALITY THEORY. The Cue Hypothesis put forth in this dissertation also has
precursors in and connections to Optimality Theory. The LWA patterns observed in Chapter 2
may be considered a type of the “Too-Many-Solutions” problem, and are likewise addressed by the
P-Map Theory of Correspondence (Steriade 2001/2009; McCarthy and Prince 1995). Additionally,
the word position asymmetries observed in Chapter 3 may also be explained by aspects of
perceptual similarity in line with the P-Map. Further, Preserve Cue and Enhance Perceptibility
have other correlates with research in Optimality Theory, especially with Côté (2000). These are
all outlined below.
5.2.2.1 TOO-MANY-SOLUTIONS. Steriade (2001/2009) points out that the original
formulation of Optimality Theory (Prince and Smolensky 1993/2004; OT) has a problem she
calls the “Too-Many-Solutions” problem. Briefly stated, this is a problem of empirical accuracy.
Of all of the possible solutions to handling marked phonological structure, only a subset of those
solutions are attested in language. For example, word-final voiced obstruents, such as the /b/ in
[tab] are marked. Any number of possible repairs could be used to solve this problem, such as
nasalization [tam], deletion [ta], epenthesis [tabə], yet only one is attested in language, i.e.
devoicing: [tap]. The solution to this problem she offers is known as the P-Map, where the
knowledge of perceptual similarity between sounds informs Correspondence Theory (McCarthy
and Prince 1995), thereby limiting the possible solutions to those that are actually attested in
language.
A version of the “Too-Many-Solutions” problem can be seen in patterns of LWA syllable
structure observed in this dissertation. Consider a loanword such as [smapo], which may be
repaired via several different strategies.
(47) Possible LW repairs of [smapo]
a. Epenthesis: [smapo] → [simapo]
b. Deletion: [smapo] → [mapo]
c. Metathesis: [smapo] → [sampo]
d. Lenition/substitution: [smapo] → [swapo]
All of these possible repairs are indeed observed in the LW corpus; however, there is a strong
preference for epenthesis (47a), with deletion (47b) sometimes occurring, and metathesis (47c)
and substitution (47d) occurring very rarely. Independently of the idea of perceptual similarity,
one would expect the above repairs to occur at equal rates, yet as was shown above, they
certainly do not.
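The four candidate repairs in (47) can be generated mechanically, which makes the point that nothing in the candidate set itself favors one repair over another. The following Python sketch is purely illustrative: the function name `repair_candidates` is hypothetical, each character is assumed to stand for one segment, the input is assumed to begin with a two-consonant cluster followed by a vowel, and the epenthetic vowel [i] and the substitute [w] are simply taken from the examples in (47).

```python
def repair_candidates(word, epenthetic_vowel="i", substitute="w"):
    """Return the four repairs of (47) for a word of the shape C1 C2 V ...

    Assumes one character per segment and a word-initial two-consonant
    cluster followed by a vowel (e.g. "smapo").
    """
    c1, c2, vowel, rest = word[0], word[1], word[2], word[3:]
    return {
        # (47a) break up the cluster with a vowel
        "epenthesis": c1 + epenthetic_vowel + c2 + vowel + rest,
        # (47b) delete the first consonant of the cluster
        "deletion": c2 + vowel + rest,
        # (47c) swap the second consonant and the following vowel
        "metathesis": c1 + vowel + c2 + rest,
        # (47d) replace the second consonant with a glide
        "substitution": c1 + substitute + vowel + rest,
    }


for repair, form in repair_candidates("smapo").items():
    print(repair, form)
```

The sketch only enumerates the candidates; it says nothing about their relative frequency, which is exactly the gap that perceptual similarity is meant to fill.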
The idea of perceptual similarity is able to account for the rarity of metathesis and
substitution, and the commonality of epenthesis. The rarity of substitution (47d) can be
explained by the fact that it alters a segment in the source word (SW), a relatively large
perceptual departure compared to the other forms, in addition to the fact that it does not remedy
the consonant cluster. The rarity of metathesis (47c) is also explained in similar terms, in that
the perceptual departure is greater than necessary. This is confirmed by experimental evidence
in Fleischhacker (2000), where English speakers judged various alterations to the word “flip”
in terms of similarity. A metathesized form/candidate such as “filp” was judged to be less
similar to “flip” than a deletion form/candidate such as “fip.” Thus, epenthesis and deletion are
the best in terms of perceptual similarity.
The preference for epenthesis over deletion can also be explained by the relative
perceptual departure from the SW. Relevant here are facts regarding the quality of epenthetic
vowels (EV) in loanwords. Riggs (2010/2013) argues that the quality of epenthetic vowels in
loanwords is frequently a minimal perceptual departure from the SW. That is, epenthetic vowels
in LWs are usually reduced, minimal, or of relatively short duration, such as [ə], [i], and [i].
Likewise, epenthetic vowels in LWs are often copies of adjacent vowels and consonants in the SW.
For example, the quality of the EV in Shona tends to match the features of adjacent consonants,
such as a labial vowel after a labial consonant as in “item” → [aitumu], or a coronal vowel after
a coronal consonant, such as “ice” → [aizi] (Uffmann 2004). In other words, vowels of
matching qualities are minimally salient, or a minimal perceptual departure, from the source
word, as compared to other possible EV qualities. This EV quality phenomenon was frequent in
the LW corpus used in this study, although such data was not investigated.
In the deletion candidate (47b), all of the features and perceptual cues of the [s] are lost, a
relatively salient discrepancy between the LW and the SW. In the epenthesis candidate
(47a), by contrast, all of the features and perceptual cues of the SW are maintained; the only difference is that
they now exist for a longer duration in time, occurring over both the [s] and the [i]. For the
candidate 47a, the only perceptual change from the SW is then a temporal one, which is a lesser
change than any other of the repair options. Further, the concept of Enhance Perceptibility adds
to explaining the epenthesis bias seen in LWA; not only is epenthesis a minimal perceptual
departure from the SW, it enhances the perceptibility of the SW. In other words, epenthesis
serves two purposes: it repairs the SW with a minimal perceptual departure, and it
enhances the perceptibility of the LW. I propose that it is for these reasons that epenthesis is
strongly preferred to deletion. Related to this is the Retention bias. In a Retention candidate, i.e.
[smapo] → [smapo], no change is made, and no change equates to no perceptual departure.
5.2.2.2 THE CUE HYPOTHESIS, PERCEPTUAL SIMILARITY, AND CUE ENHANCEMENT. The
claims made in this dissertation have further connections to Optimality Theory and the P-Map.
The Cue Hypothesis, which posits the conditions of Preserve Cue and Enhance Perceptibility,
was shown to behave asymmetrically with respect to word-position. Consonants (and their
corresponding perceptual cues) were more likely to be preserved in word-medial and word-initial
consonant clusters. Conversely, they were more likely to be deleted in word-final consonant
clusters than in any other position. It is claimed that this is because Preserve Cue is sensitive to
the cues of the consonants involved. Where consonants are more strongly cued, their
preservation is more necessary, and where they are perceptually weaker, their preservation is less
necessary.
This aspect of Preserve Cue can be implemented in Optimality Theoretic terms via the P-
Map. The P-Map is posited to be a mental representation of the distinctiveness of various
contrasts, which informs a ranking of correspondence (Faithfulness) constraints, which tends
towards minimal change in repairs. Broadly, phonological processes that result in a minimal
perceptual change are preferred to those that result in a large perceptual change. In essence, this
is parallel to the effect of Preserve Cue acting on different word positions across a large set of
loanwords. Deletion of a consonant in a word-final consonant cluster results in a minimal
perceptual change, as compared to deletion of a consonant in other positions.
In the P-Map, deletion is evaluated by comparing a consonant against the result of
deleting the consonant, i.e. nothing, expressed by the symbol Ø. If a consonant is perceptually
strong, e.g. one that is followed by a vowel, as in C₁V, then deleting this consonant is a relatively
greater change than deleting a consonant in a perceptually weak position, e.g. one that follows
another consonant, C₃ in VC₂C₃#. The perceptual difference between C₁ and Ø is greater than the
perceptual difference between C₃ and Ø: C-Ø/_V > C-Ø/C_#.
Constraints prohibiting deletion that are sensitive to these contexts are capable of
explaining the word-position asymmetries observed in this dissertation. For example, consider
two correspondence constraints projected by the P-Map, prohibiting the deletion of consonants in
the two environments discussed in the preceding paragraph:
(48) MAX C/_V
Deletion of a consonant before a vowel is prohibited.
(49) MAX C/C_#
Deletion of a word-final consonant after a consonant is prohibited.
The perceptual change resulting from a violation of the constraint in 48 is larger than the one
resulting from a violation of the constraint in 49. Per the P-Map, the constraint in 48 is
universally ranked over the constraint in 49. When speakers adapt LWs, they more frequently
violate 49 than 48, as a violation of 49 is less serious. Across time, then, there should be more
observations of deletion of a consonant in a word-final consonant cluster than anywhere else; and
this is what the data show.
In more formal terms, in order for deletion to occur, a markedness constraint compelling
deletion must be ranked above one (or both) of these constraints. In the LWA process, if
markedness is ranked probabilistically with respect to PC each time a speaker borrows a foreign
word, then it is easier for markedness to outrank MAX C/ C_#. In other words, if the constraints
in 48 and 49 have fixed ranking values, and if markedness receives a random value each time a
SW is adapted into a LW, the markedness constraint will be more frequently ranked above MAX
C/ C_# than it will be ranked above MAX C/_V. Thus, deletion in a final consonant cluster will
occur more often than deletion in an initial (or medial) cluster. This is the pattern observed in
the data. Precisely how this is formally captured and implemented in Optimality Theory remains
open for future research.
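The probabilistic ranking scenario described above can be illustrated with a minimal simulation sketch. The numeric ranking values, the uniform distribution, and the function name are assumptions made purely for illustration; only the fixed order of the constraints in 48 and 49 comes from the text, and this is not a claim about how the ranking is implemented in Optimality Theory.

```python
import random

# Hypothetical fixed ranking values (assumed for illustration only).
# The only property taken from the text is that (48) outranks (49).
FIXED_RANKING = {
    "MAX C/_V": 0.8,    # (48): deletion before a vowel is prohibited
    "MAX C/ C_#": 0.4,  # (49): deletion of a final C after a C is prohibited
}


def simulate(trials=10_000, seed=0):
    """Count how often a randomly valued markedness constraint
    outranks each MAX constraint across simulated borrowings."""
    rng = random.Random(seed)
    counts = {name: 0 for name in FIXED_RANKING}
    for _ in range(trials):
        # Markedness receives a fresh random value for each adaptation
        # (a uniform draw is an assumption, not part of the proposal).
        markedness = rng.random()
        for name, value in FIXED_RANKING.items():
            if markedness > value:
                # Markedness outranks this MAX constraint: deletion is possible.
                counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, n in simulate().items():
        print(f"markedness outranks {name} in {n} trials")
```

Because any markedness value that exceeds the higher constraint also exceeds the lower one, the count for MAX C/ C_# can never be smaller than the count for MAX C/_V under these assumptions, which mirrors the asymmetry described in the text.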
Enhance Perceptibility claims that epenthesis in LWs is sometimes compelled by the
need to enhance the perceptibility of perceptually weak consonants. Inserting an epenthetic
vowel into a consonant cluster does this by enhancing the perceptual cues of the consonants in
the cluster, adding vowel formant transition cues that previously did not exist in the SW. EP
likewise has a precursor/correlate in the P-Map and Optimality Theory. Côté (2000) proposes
Cue Enhancement as a principle within the P-Map theory. Cue Enhancement is used to explain
various patterns of epenthesis in consonant clusters in different positions; in positions where the
cues of a consonant are weak, epenthesis is compelled to enhance the cues of that consonant.
Two of the Cue Enhancement constraints used in Côté (2000) have direct applicability and
explanatory power in the LW data. These two constraints are given below.
(50) C$V
A consonant is adjacent to a vowel
(51) C%V
A consonant is followed by a vowel
According to Côté, C$V is stringently ranked over C%V. These constraints and their universal
ranking are capable of explaining patterns of epenthesis in consonant clusters in different
locations in the word, as seen in Chapter 3. Consider the three word positions that were
examined, word-initial, word-final, and word-medial, in the following table that tallies constraint
violations.
CC word position    C$V    C%V
a. #C₁C₂V            *      *
b. #VC₁C₂V                  *
c. VC₁C₂#            *      **
TABLE 32. Constraint Violation Tallies
According to these constraints, (b) is the best candidate, and (c) is the worst candidate. C₁ in (a)
incurs a violation of both constraints, as this consonant is neither adjacent to a vowel nor
followed by one. C₁ in (b) incurs a violation of the lower ranked constraint C%V, as it is not
followed by a vowel. Both consonants in (c) incur a violation of C%V, as neither precedes a
vowel, and C₂ in (c) additionally incurs a violation of C$V, as it is not adjacent to any vowel.
Thus, in terms of Cue Enhancement, (c) in Table 32 is the worst candidate. That is, the
fully faithful candidate (FFC; McCarthy 1999) for the input VCC (c) incurs a superset of the
violations that are incurred by the input CCV (a), which in turn is worse off than the FFC for the
input of VCCV (b). Because it is the worst candidate in terms of these constraints, one could
think of it as the most deserving of epenthesis. In terms of the Cue Hypothesis combined with
the Cue Enhancement constraints, the change that an epenthetic vowel makes to a SW does
more work for word-final consonant clusters than for other consonant clusters. Thus, epenthesis
in final clusters is less intolerable⁵⁰ than epenthesis in word-medial clusters, which is precisely
how Enhance Perceptibility was formulated above.
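The violation tallies in Table 32 can be reproduced mechanically. The following is a minimal sketch, assuming that each word position is written as a string over C, V, and the word-boundary symbol #; the function name and the string encoding are illustrative assumptions, while the two constraint definitions follow 50 and 51.

```python
def tally(template):
    """Return (C$V violations, C%V violations) for a CV template.

    '#' marks a word edge; anything that is not 'V' counts as a non-vowel
    neighbor. The template encoding is an assumption for illustration.
    """
    adjacent_violations = 0   # C$V: a consonant is adjacent to a vowel
    followed_violations = 0   # C%V: a consonant is followed by a vowel
    for i, segment in enumerate(template):
        if segment != "C":
            continue
        previous = template[i - 1] if i > 0 else "#"
        following = template[i + 1] if i + 1 < len(template) else "#"
        if previous != "V" and following != "V":
            adjacent_violations += 1
        if following != "V":
            followed_violations += 1
    return adjacent_violations, followed_violations


if __name__ == "__main__":
    # The three word positions as they are written in Table 32.
    for label, template in [("a", "#CCV"), ("b", "#VCCV"), ("c", "VCC#")]:
        print(label, template, tally(template))
```

Under this encoding, the word-final cluster in (c) incurs the most violations and the medial cluster in (b) the fewest, matching the tallies in the table.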
In all, we see that LWA is not a phonological phenomenon that occurs independently of
native phonological grammars. The Cue Hypothesis is not disconnected from L1/Native
grammatical principles and theories of phonology; it indeed has direct correlates and
precursors in phonological theory, and has the potential to be accounted for by Optimality
Theory, especially versions of OT that invoke the P-Map theory of Correspondence.
5.3 CONNECTION WITH THEORIES OF LOANWORD ADAPTATION. The above section
sketched some connections between the cross-linguistic patterns regarding LWA of sound sequences
and formal phonological theory, especially one that posits the primacy of perception in phonology,
i.e. the P-Map. In the previous chapters, I argued for the primacy of perception in LWA,
specifically the importance of perceptual cues in describing the data. Given all of this, it may
justifiably be concluded that perception plays a role in LWA, if not the main role. And in fact,
this is precisely what the more widely accepted theory of LWA claims as well.
As mentioned above, there are two main theories regarding the nature of LWA, the
Perceptual Theory, and the Bilingual Theory. The main difference between these two theories
regards the type of information that speakers use when borrowing a word from a foreign
language and integrating it into their native language. The Bilingual Theory, originally put forth
by Paradis and LaCharité (1997), posits that bilinguals – who are proficient enough to have
abstract phonological knowledge of their second language (L2) – operate over phonemic
representations in the L2 to adapt LWs. In opposition to this is the Perceptual Theory, originally
put forth by Silverman (1992), and since developed by Yip (1993, 2002), Kang (2004),
Kenstowicz (2001, 2005, 2007), and others, which posits that speakers use their perception of the
raw acoustic signal of a foreign word as the basis for how they adapt that word into their native
language.
The findings of this dissertation are generally aligned with the Perceptual Theory of
LWA, both empirically and theoretically. The primacy of perceptual cues approach, i.e. the Cue
Hypothesis, falls within the Perceptual Theory of LWA. This is especially true for the
observations in Chapter 3, where word-position asymmetries in LWA were found. Epenthesis
and deletion were found to be most common word-finally, and less common in other positions.
⁵⁰ Recall that EP is in conflict with Faithfulness to the SW.
It was argued that this is due to the primacy of perceptual cues in LWA. A consonant cluster in
word-final position contains consonants that are the least perceptible. Per the Cue Hypothesis,
preserving these consonants is of least importance, explaining the frequency of deletion in
this position. Likewise, epenthesis in a word-final consonant cluster is of most importance, in
order to enhance the perceptibility of these relatively weak consonants, explaining the prevalence
of epenthesis word-finally. The analysis of LWA in Tongan likewise pointed to the importance
of perception in how LWs are adapted. Similar word position asymmetries were found: deletion
most frequently occurs in word-final consonant clusters in Tongan. Tongan also showed a
tendency where more sonorous sounds (such as nasals and approximants) are more likely to be
retained, and less sonorous sounds (such as stops and fricatives) are more likely to be deleted.
The distribution of epenthesis and deletion in word-final consonant clusters in Tongan was
proposed to be a function of the overall sonority of the consonant cluster. If there is a correlation
between sonority and perceptibility (as was argued for in Chapter 4), the conclusion that
perception plays a key role in LWA is further supported. The findings in this dissertation thus
resonate more closely with the Perceptual Theory of LWA than with the Bilingual Theory.
In addition to this, many LWA data independently seem to be consistent with the
Perceptual Theory. In the 9,918 LWs I examined in this study (in addition to others), I
encountered many data that are ostensibly more consistent with the Perceptual Theory.
However, some data suggest validity of the Bilingual Theory (which will be discussed below).
Data on the adaptation of palatalized consonants in Russian SWs perhaps most robustly
and uniformly support the perceptual account of LWA. When borrowing Russian words
containing palatalized consonants, languages commonly retain the consonant, but not its
secondary articulation (i.e. the palatalization). Consider Ket LWs borrowed from Russian
(Vajda and Nefedov 2009).
(52) Ket Loanwords from Russian
a. [tʃajnik] → [sajnik] ‘the kettle’
b. [kruglij] → [kruglaj] ‘round’
c. [tʲurʲma] → [turma] ‘the prison’
d. [pomogatʲ] → [pomogata] ‘to help’
Ket speakers can and do pronounce the palatal glide [j], as evidenced by the forms in 52a and
52b; the palatal sound is retained when it is a full phoneme. However, when it is part of the
secondary articulation of a consonant, for example the final [tʲ] in 52d, Ket speakers have
adapted the words with plain consonants, i.e. without the palatalization. This is readily explained
in perceptual terms, as well as within the Cue Hypothesis presented in this dissertation. The
perceptual cues for the phoneme [j] are more robust than for the secondary articulation [ʲ]. Per
Preserve Cue, omitting/deleting the secondary articulation [ʲ] is relatively less egregious than
omitting/deleting the full phoneme [j]. Likewise, these data may be argued to support the
Perceptual Theory over the Bilingual Theory. If Ket speakers used Russian (L2) phonemic
representations to adapt LWs, one might expect the palatal feature to be retained, as they would
ostensibly have phonological knowledge of the secondary articulation, however weakly it is cued
in the acoustic signal. They do not behave this way, however, which may be explained in
perceptual terms: It could be that Ket speakers retained the phoneme [j] because it is
perceptually robust. [j] has a relatively obvious presence in the acoustic signal, whereas the
palatalization of a consonant does not.
Similar data in Bezhta (Comrie and Khalilov 2009) likewise support a perceptual account
of LWA.
(53) Bezhta Loanwords from Russian
a. [artelʲ] → [artel] ‘the workshop’
b. [krovatʲ] → [kuruwat] ‘the bed’
c. [pʲatnitsa] → [pijatnitsa] ‘Friday’
d. [balʲnitsa] → [balintsa] ‘the hospital’
e. [sudʲja] → [sudija] ‘the judge’
These data are different from Ket in that palatalization is sometimes retained in the LW.
However, it is not always retained: when palatalization is on the word-final consonant in the
SW, it is not retained in the LW, as seen in 53a and 53b. When the palatalized phoneme occurs
word-initially (53c) or word-medially (53d and 53e) in the SW, it is retained in the LW, showing
up as the (palatal) vowel [i]. This pattern can be explained in terms of perception: where
palatalization is weakly cued, i.e. word-finally – where the secondary articulation is not followed
by a vowel – it is lost. Where it is more strongly cued (word-initially and internally), it is
retained. It thus appears as though speakers are not using phonemic representation in their L2
(Russian); rather, they are using information in the acoustic signal, a view consistent with both
the Cue Hypothesis and the Perceptual Theory of LWA.
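The positional generalization for Bezhta can be checked against the five forms in 53 with a short sketch. The segmentation of the source words, the function name, and the boolean coding of whether palatalization is retained are assumptions made for illustration; the rule itself (palatalization is lost word-finally and retained elsewhere) is the one described in the text, and the sketch does not attempt to predict the full loanword forms.

```python
PALATALIZATION = "ʲ"


def predicts_retention(source_segments):
    """Positional rule: palatalization is retained unless the palatalized
    consonant is the last segment of the source word."""
    index = next(
        i for i, seg in enumerate(source_segments)
        if seg.endswith(PALATALIZATION)
    )
    return index != len(source_segments) - 1


# Source words from (53), hand-segmented (an assumption), paired with
# whether the text reports palatalization as retained in the loanword.
BEZHTA_53 = [
    (["a", "r", "t", "e", "lʲ"], False),            # 53a
    (["k", "r", "o", "v", "a", "tʲ"], False),       # 53b
    (["pʲ", "a", "t", "n", "i", "ts", "a"], True),  # 53c
    (["b", "a", "lʲ", "n", "i", "ts", "a"], True),  # 53d
    (["s", "u", "dʲ", "j", "a"], True),             # 53e
]

if __name__ == "__main__":
    for segments, observed in BEZHTA_53:
        word = "".join(segments)
        print(word, predicts_retention(segments), observed)
```

For these five forms the positional rule and the reported outcomes agree, which is all the sketch is meant to show.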
Sakha seems to behave in the same way as Bezhta, with slight differences, shown in 54
(Pakendorf and Novgorodov 2009). Palatalization is not retained word-finally (54a and 54b), but
it is in other positions (in a less uniform way than Bezhta).
(54) Sakha Loanwords from Russian
a. [pilʲ] → [bi:l] ‘dust’
b. [tsepʲ] → [siap] ‘the chain’
c. [sudʲja] → [sudʒaya] ‘the judge’
d. [svadʲba] → [siba:jba] ‘the wedding’
e. [svinʲa] → [sibin!a] ‘the pig’
f. [brus] → [buru:s] ‘the whetstone’
g. [brʲuki] → [byryke] ‘the trousers’
The form in 54d is similar to Bezhta, in that the palatalization shows up as a [j] in the
loanword⁵¹. The LW in 54c can be said to retain the palatalization of the source word. That
is, [dʲ] in Russian was adapted as alveolo-palatal [dʒ] in Sakha. The same is true for 54e, where
the palatalization shows up on the nasal. 54g can be described as the palatalization in the SW
showing up on the vowel(s) in the LW. A regular [bru] sequence in Russian is adapted with
epenthesis, with the quality of the vowel remaining the same (54f). However, in 54g, it appears
as though the palatalization is retained in the vowel. The round vowel in the SW (that is, [u])
remains a round vowel in the LW; however, the vowel is fronted, becoming [y], ostensibly
because of the frontness of the secondary articulation in the SW. If speakers of Sakha were
using L2 (Russian) phonemic representations in the LWA process, one might expect the
treatment of the palatalized consonant to be stable/constant, as phonemes are stable and constant.
However, this consonant is borrowed in different, dynamic ways, possibly a reflection of the
dynamic nature of the acoustic signal (which comes from the dynamic and overlapping nature of
speech). Either way, a perceptual account of Sakha and Bezhta explains why the (palatalization)
contrast is neutralized word-finally in LWs, but shows up in non-final positions.
⁵¹ Note, however, that the alveolar sound [d] was not retained, but was perhaps encoded in the length of the vowel,
due to compensatory lengthening.
Other types of data from other languages likewise seem to support the Perceptual Theory
of LWA. This may be seen in how a number of languages adapt word-final consonants in
French. Certain words in French that end in an orthographic (but silent) consonant may be
described as having the consonant in the underlying form, which is normally deleted in the
surface form (Tranel 1981).
(55) French consonant deletion
Orthographic form⁵²      Underlying form     Surface form
a. “le bouton”           [lə buton]          [lə butõ]
b. “le bouton est là”    [lə buton e la]     [lə buton e la]
c. “le permis”           [lə pɛʁmiz]         [lə pɛʁmi]
d. “le permis est là”    [lə pɛʁmiz e la]    [lə pɛʁmiz e la]
In the word for “the button” in 55a, the word is pronounced without a consonant, but with a nasal
vowel. If this word precedes a vowel-initial word, the /n/ is retained, as in 55b. The same goes
for the word for “the permit” in 55c. Though not usually pronounced (55c), the /z/ surfaces
before a vowel-initial word, as in 55d.
If this analysis of French is correct, then it follows that bilingual speakers whose L2 is
French would have phonological knowledge of the presence of the silent phoneme. The
Bilingual Theory may then predict that the silent phoneme may show up in LWs, as speakers use
their phonological knowledge of the L2 in LWA. This prediction is partially verified. Consider
loanwords in Hausa (Awagana and Wolff 2009).
(56) Hausa Loanwords from French
a. ‘bouton’ [butɔ̃] → [buto] ‘the button’
b. ‘permis’ [pɛʁmi] → [fa:rmi] ‘the permit’
c. ‘maçon’ [masɔ̃] → [ma:son] ‘the mason’
d. ‘président’ [pʁezidɛ̃] → [farzidan] ‘the president’
At first glance, these data may appear to be ambiguous, supporting both the Perceptual Theory
and the Bilingual Theory. 56a and 56b may be explained by the Perceptual Theory: as the
acoustic signal lacks a nasal stop consonant, so do the loanwords. However, 56c is a form that
may be predicted by the Bilingual Theory. A sound (/n/) that is not in the acoustic signal, but is
part of the phonemic representation of the word in the L2, is indeed in the loanwords.
⁵² Glosses/translations - 55a: “the button”; 55b: “the button is there”; 55c: “the permit”; 55d: “the permit is there.”
This dichotomy may indeed be accurate; more data on Hausa is needed to confidently
determine the correct description. However, I contend that the data in 56 are best explained by
the Perceptual Theory. All of the Hausa loanwords from French of type 56c, that is, the LWs
that reflect the orthographic but silent consonant in French, are words that end in a nasal. In
spoken French, this is normally realized without the tongue tip gesture of the /n/, but with a
velum lowering gesture, resulting in a heavily nasalized vowel, e.g. “bouton” [butɔ̃]. Hausa does
not have phonemic nasal vowels (Greenberg 1941). Then, in order to more precisely mimic the
French word, a nasal stop is pronounced⁵³. This has the effect of replicating the nasality of the
vowel in the French word. Assuming that Hausa is not articulatorily aberrant, pronouncing the
word with a final nasal, as in [ma:son], will necessarily lead to (at least partial) nasalization of
the /o/: the velum lowering gesture of /n/ starts during the vowel sound. [ma:son] could
therefore be more accurately transcribed as [ma:sõn]. In this way, the LW is made more similar
to the SW. Having the final /n/ also independently ensures the redundancy of cues to nasality⁵⁴.
In other words, [ma:son] is a well formed borrowing of the French word “maçon” because the /n/
in the LW serves a functional and linguistic purpose, but *[permiz] is not a well formed
borrowing of the French word “permis,” as pronouncing this word with a final /z/ results in little
or no effect on the perception of the vowel. Pronouncing the LW as *[permiz] would thus be
gratuitous, on top of creating a phonotactically marked closed syllable.
Support for this claim comes from the word in 56d, that is, the word for “president”
borrowed into Hausa as [farzidan]. The /n/ achieves the same thing as it does for [ma:son],
replicating the nasal vowel in the SW, and preserving phonological information (i.e. the
nasality). That is, the /n/ in [ma:son] may not necessarily come from the /n/ in the borrower’s L2
grammar. This proposition explains why the /t/ is absent in the word for “president.” If the
borrower were using (only) phonemes to create the LW, then there is no obvious explanation for
why the form is [farzidan], and not *[farzidanta], *[farzidat], or something else. The /t/ is not in
the LW because it is not in the acoustic signal. The /n/ is in the loanword because nasality is in
the acoustic signal.
At this point, this proposition cannot be assessed with much confidence. The discovery
of a Hausa LW in which an obstruent that is orthographic but silent in the SW is pronounced, such as
“permis” being adapted as *[permiz], would falsify this hypothesis. Conversely, lack of such
words would support it. The data I currently have for Hausa are insufficient to decide one way or the other.
⁵³ The concept of “mimic” is formalized within the Perceptual Theory of LWA by Yip (2002).
⁵⁴ I contend that this happens because it maximizes the probability of communicative success, through minimizing
possible confusion, by having a LW that is a minimal perceptual departure from the SW.
However, another language with similar data and patterns both illuminates and further
complicates the matter. Consider Malagasy LWs borrowed from French (Adelaar 2009).
(57) Malagasy loanwords from French
a. ‘cadenas’ [kadena] → [kadana] ‘the padlock’
b. ‘robinet’ [ʁobine] → [roabine] ‘the faucet’
c. ‘jupon’ [ʒupɔ̃] → [zipo] ‘the dress’
d. ‘lion’ [liɔ̃] → [liona] ‘the lion’
e. ‘éléphant’ [elefɛ̃] → [elefanta] ‘the elephant’
57a and 57b are parallel to 56a and 56b: as there is no final consonant in the surface form of
the SW (i.e. the acoustic signal), there is no final consonant in the LW. 57c is the same way,
except that a nasal (and nasal vowel) is involved. 57d is counter to this: a nasal stop that is not
in the normal pronunciation of the French word is in the loanword. But the most important
datum is 57e, the borrowing of ‘éléphant’ [elefɛ̃] → [elefanta]. This loanword lends support
to the Bilingual Theory. The /t/ only exists in a phonemic representation (and/or in the
orthography); it is (ostensibly) not present in the acoustic signal, yet shows up in the loanword⁵⁵.
Malagasy is not the only language that has data that are consistent with the Bilingual
Theory. Similar data that could be interpreted as consistent with the Bilingual Theory exist for
French loanwords in Kalina and Tarifiyt Berber. And it is most likely not a coincidence that all
such examples come from borrowings from French. The Bilingual Theory was originally
formulated in response to similar data, that is, French loanwords in Fula (Paradis and LaCharité
1997). However, although the French language and its speakers may be a factor/influence, there
are indeed LWs from other SWs that are best explained under the lens of bilingualism, in
addition to data that are best explained perceptually⁵⁶.
From this, I believe it is safe to assume that neither the Perceptual Theory nor the
Bilingual Theory is a truism. The data and analysis in the previous chapters, and the data
discussed here, are mostly in line with the Perceptual Theory. Speech perception must play at
least some role in the LWA process. However, it is far from accurate to claim that it plays the
only role in LWA.
⁵⁵ The form in 57e could also be due to speakers using merely their knowledge of the orthography of the Lb, and not
necessarily any L2 grammatical knowledge. Disentangling this is left open as a task for future research.
⁵⁶ For example, see 40, which showed English loanwords in Tongan that are best described in perceptual terms.
I conclude that loanwords and loanword phonology, like any human phenomenon, are
complex and influenced by a range of different factors. Loanword adaptation seems to be mostly
driven by perceptual factors, yet some cases exist that suggest the primacy of bilingualism in the
description of loanword adaptation. It thus seems plausible that a unified theory of loanword
adaptation is necessary, where both the tangible (acoustics, perception) and the abstract
(grammatical knowledge) work in harmony to explain patterns like Malagasy. However, uniting
the material (body/matter) with the phenomenological (mind/experience) is a challenging
problem, remaining unsolved since the Enlightenment. These and related issues are thus left
open for future exploration.
REFERENCES
ADELAAR, ALEXANDER. 2009. Malagasy vocabulary. World Loanword Database, ed. by Martin
Haspelmath, and Uri Tadmor. Online: http://wold.livingsources.org/.
AWAGANA, ARI, and H. EKKEHARD WOLFF (WITH DORIS LÖHR). 2009. Hausa vocabulary. World
Loanword Database, ed. by Martin Haspelmath, and Uri Tadmor. Online:
http://wold.livingsources.org/.
BLEVINS, JULIETTE. 1995. The syllable in phonological theory. In John A. Goldsmith (ed.) The
Handbook of Phonological Theory, Blackwell Handbooks in Linguistics 2.206-44.
Cambridge: Cambridge University Press.
BERKO, JEAN. 1958. The child’s learning of English morphology. Word 14.150-177.
BROWMAN, CATHERINE, and LOUIS GOLDSTEIN. 1992. Articulatory Phonology: An overview.
Phonetica 49.155-80.
BROWMAN, CATHERINE, and LOUIS GOLDSTEIN. 1998. Articulatory gestures as phonological
units. Phonology 6.201-51.
BYRD, DANI. 1994. Articulatory timing in English consonant sequences. Los Angeles: UCLA
Working Papers in Linguistics (UCLA Dissertation).
BYRD, DANI, and ELLIOTT SALTZMAN. 2003. The Elastic Phrase: Modeling the dynamics of
boundary adjacent lengthening. Journal of Phonetics 31.149-80.
BYRD, DANI, SUNGBOK LEE, DAYLEN RIGGS, and JASON ADAMS. 2005. Interacting effects of
syllable and phrase position on consonant articulation. Journal of the Acoustical Society
of America 118.3860-73.
CARR, PHILIP. 1999. English phonetics and phonology: An Introduction. Oxford: Blackwell.
CHURCHWARD, C. MAXWELL. 1959. Tongan dictionary. London: Oxford University Press.
CLARK, ROSS. 1990. Austronesian Languages. The world’s major languages, ed. by Bernard
Comrie, 899-912. Oxford: Oxford University Press.
CLEMENTS, GEORGE N. 1992. The sonority cycle and syllable organization. Phonologica 1988,
ed. by Wolfgang Dressler, Hans Luschützky, Oskar E. Pfeiffer & John R. Rennison, 63–
76. Cambridge: Cambridge University Press.
CLEMENTS, GEORGE N., and SAMUEL J. KEYSER. 1983. CV Phonology. Cambridge, MA: MIT
Press.
CLYNES, ADRIAN, and DAVID DETERDING. 2011. Standard Malay (Brunei). Journal of the
International Phonetic Association 41.259–68.
COMRIE, BERNARD, and MADZHID KHALILOV. 2009. Bezhta vocabulary. World Loanword
Database, ed. by Martin Haspelmath, and Uri Tadmor. Online:
http://wold.livingsources.org/.
CÔTÉ, MARIE-HÉLÈNE. 2000. Consonant cluster phonotactics: A perceptual approach.
Cambridge, MA: MIT Dissertation.
ELŠÍK, VIKTOR. 2009. Selice Romani vocabulary. World Loanword Database, ed. by Martin
Haspelmath, and Uri Tadmor. Online: http://wold.livingsources.org/
EVANS, TOSHIE. 1997. A dictionary of Japanese loanwords. Westport, CT: Greenwood.
FELDMAN, HARRY. 1978. Some notes on Tongan phonology. Oceanic Linguistics 17.133-9.
FLEISCHHACKER, HEIDI. 2000. The location of epenthetic vowels with respect to consonant
clusters: an auditory similarity account. Los Angeles: UCLA MS.
FLEMMING, EDWARD. 1995. Auditory representations in phonology. Los Angeles: UCLA
Dissertation.
FLEMMING, EDWARD. 2004. Contrast and perceptual distinctiveness. Phonetically based
phonology, ed. by Bruce Hayes, Robert Kirchner, and Donca Steriade. Cambridge:
Cambridge University Press.
GOLDSMITH, JOHN. 1976. Autosegmental phonology. Bloomington: Indiana University
Linguistics Club.
GOOGLE TRANSLATE. Online: http://translate.google.com.
GREENBERG, JOSEPH. 1941. Some problems in Hausa phonology. Language 17.316-23.
GREENBERG, JOSEPH. 1978. Some generalizations concerning initial and final consonant
clusters. Universals of human language II: Phonology, ed. by Joseph Greenberg, 243-
80. Stanford: Stanford University Press.
GUNDERT, HERMANN. 1992. A Malayalam and English dictionary. New Delhi: Asian
Educational Services
HASPELMATH, MARTIN, and URI TADMOR (eds). 2009. World Loanword Database. Online:
http://wold.livingsources.org/.
HAWKINS, JOHN A. 1990. Germanic languages. The world’s major languages, ed. by Bernard
Comrie, 68-76. Oxford: Oxford University Press.
HESELWOOD, BARRY. 2006. Final schwa and R-sandhi in RP English. Leeds Working Papers in
Linguistics & Phonetics 11.78-95.
HOOPER, JOAN B. 1976. An introduction to natural generative phonology. New York:
Academic Press.
IZ, FAHIR, and H.C. HONY. 1954. A Turkish-English dictionary. Oxford: Oxford University
Press.
JAKOBSON, ROMAN. 1968. Child language: aphasia and phonological universals. The Hague:
Mouton.
JESPERSEN, OTTO. 1904. Lehrbuch der Phonetik. Leipzig and Berlin: Teubner
JOHNSON, KEITH. 1997. Acoustic and auditory phonetics. Oxford: Blackwell.
GRIJNS, C.D., and JAN W. DE VRIES. 2007. Loan-words in Indonesian and Malay. Bijdragen tot
de Taal-, Land- en Volkenkunde, ed. by Russell Jones. Leiden: KITLV Press.
INGRAM, DAVID. 1999. Phonological acquisition. The Development of language, ed. by M.
Barrett. East Sussex: Psychology Press.
KAIAO, MAMAKA. 2003. Modern Hawaiian vocabulary. Honolulu: University of Hawaii Press.
KANG, YOONJUNG. 2004. Perceptual similarity in loanword adaptation: English postvocalic
word-final stops in Korean. Phonology, 20.219-74.
KEATING, PATRICIA, TAEHONG CHO, CECILE FOUGERON, and CHAI-SHUNE HSU. 2003. Domain-
initial strengthening in four languages. Laboratory Phonology 6.145-63. Cambridge:
Cambridge University Press.
KENSTOWICZ, MICHAEL. 2003. The role of perception in loanword phonology. Studies in African
Linguistics 32.95-112.
KENSTOWICZ, MICHAEL. 2005. The Phonetics and Phonology of Korean Loanword Adaptation.
Proceedings of the First European Conference on Korean Linguistics, ed. by S.J. Rhee.
Seoul: Hankook Publishing Company.
KENSTOWICZ, MICHAEL. 2007. Salience and similarity in loanword phonology: A case study
from Fijian. Language Sciences 29.316-340.
KIRCHNER, ROBERT. 1997. Contrastiveness and Faithfulness. Phonology 14.83-111.
KIRCHNER, ROBERT. 2004. Consonant Lenition. Phonetically based phonology, ed. by Bruce
Hayes, Robert Kirchner, and Donca Steriade. Cambridge: Cambridge University Press.
KRUSPE, NICOLE. 2009. Ceq Wong vocabulary. World Loanword Database, ed. by Martin
Haspelmath, and Uri Tadmor. Online: http://wold.livingsources.org/.
LACHARITÉ, DARLENE, and CAROLE PARADIS. 2002. Addressing and disconfirming some
predictions of phonetics for loanword adaptation. Langues et Linguistique 28.71-91.
LADEFOGED, PETER. 2001. Vowels and consonants. Oxford: Blackwell.
LADEFOGED, PETER. 2006. A course in phonetics. Boston: Thomson Wadsworth.
LADEFOGED, PETER, and IAN MADDIESON. 1995. The sounds of the world’s languages. Oxford:
Blackwell.
LEVIN, JULIETTE. 1987. Between epenthetic and excrescent vowels. Proceedings of the West
Coast Conference on Formal Linguistics 6, 187-201.
LEWIS, M. PAUL, GARY F. SIMONS, and CHARLES D. FENNIG (eds.). 2013. Ethnologue: Languages
of the World, Seventeenth edition. Dallas, Texas: SIL International. Online
version: http://www.ethnologue.com.
LI, CHARLES, and SANDRA THOMPSON. 1981. Mandarin Chinese: A functional reference
grammar. Berkeley: University of California Press.
LÖHR, DORIS, and H. EKKEHARD WOLFF. 2009. Kanuri vocabulary. World Loanword Database,
ed. by Martin Haspelmath, and Uri Tadmor. Online: http://wold.livingsources.org/.
MAHLANGU, KATJE SPONGO. 2007. Adoption of loanwords in isiNdebele. Pretoria:
University of Pretoria. MS.
MALINAS, GARY, and JOHN BIGELOW. 2012. Simpson’s Paradox. The Stanford Encyclopedia of
Philosophy, ed. by Edward N. Zalta. Online:
http://plato.stanford.edu/archives/win2012/entries/paradox-simpson/.
MARSLEN-WILSON, WILLIAM. 1987. Functional parallelism in spoken word recognition.
Cognition 25.75-192.
MCCARTHY, JOHN J. 1979. Formal problems in Semitic phonology and morphology.
Cambridge, MA: MIT Dissertation.
MCCARTHY, JOHN J. 1999. Sympathy and phonological opacity. Phonology 19.273-92.
MCCARTHY, JOHN J. 2000. Harmonic serialism and parallelism. Proceedings of the North East
Linguistics Society 30, ed. by Masako Hirotani, Amherst: GLSA Publications, 501-24.
MCCARTHY, JOHN J., and ALAN PRINCE. 1994. The emergence of the unmarked: Optimality in
prosodic morphology. In Proceedings of the North East Linguistics Society 24, Amherst:
GLSA Publications, 333-379.
MCCARTHY, JOHN J., and ALAN PRINCE. 1995. Faithfulness and reduplicative identity.
University of Massachusetts Occasional Papers in Linguistics 18: Papers in Optimality
Theory, ed. by Jill Beckman, Suzanne Urbanczyk, and Laura Walsh Dickey, Amherst,
249-238
MORTENSEN, ERIK LYKKE, and ANDERS GADE. 1993. On the relation between demographic
variables and neuropsychological test performance. Scandinavian Journal of Psychology
34.305-7.
MWITA, LEONARD CHACHA. 2009. The adaptation of Swahili loanwords from Arabic: A
constraint-based analysis. The Journal of Pan African Studies 8.46-61.
NAM, HOSUNG; VIKRAMJIT MITRA; MARK TIEDE; MARK HASEGAWA-JOHNSON; CAROL ESPY-
WILSON; ELLIOT SALTZMAN, and LOUIS GOLDSTEIN. 2012. A procedure for estimating
gestural scores from speech acoustics. Journal of the Acoustical Society of America
132.3980-9.
NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY. 2012. Engineering Statistics
Handbook. Online: http://www.itl.nist.gov/div898/handbook/
NGUYEN, DINH-HOA. 1990. Vietnamese. The world’s major languages, ed. by Bernard Comrie,
777-796. Oxford: Oxford University Press.
PAE, YANG SO. 1968. English Loanwords in Korean. Ann Arbor: University Microfilms Inc.
PAKENDORF, BRIGITTE, and INNOKENTIJ NOVGORODOV. 2009. Sakha vocabulary. World
Loanword Database, ed. by Martin Haspelmath, and Uri Tadmor. Online:
http://wold.livingsources.org/
PARADIS, CAROLE, and DARLENE LACHARITÉ. 1997. Preservation and minimality in loanword
adaptation. Journal of Linguistics 33.379-430.
PARKER JONES, OIWI. 2009. Hawaiian vocabulary. World Loanword Database, ed. by Martin
Haspelmath, and Uri Tadmor. Online: http://wold.livingsources.org/
PEARSON, KARL. 1900. On the criterion that a given system of deviations from the probable in
the case of a correlated system of variables is such that it can be reasonably supposed to
have arisen from random sampling. Philosophical Magazine Series 5 50.157–75.
PRINCE, ALAN, and PAUL SMOLENSKY. 1993. Optimality Theory: Constraint Interaction in
Generative Grammar. New Brunswick: Rutgers University, Baltimore: Johns Hopkins
University, ms.
PRINCE, ALAN, and PAUL SMOLENSKY. 2004. Optimality Theory: Constraint Interaction in
Generative Grammar. Oxford: Wiley-Blackwell.
RENDÓN, JORGE GOMEZ. 2009. Imbabura Quechua vocabulary. World Loanword Database, ed.
by Martin Haspelmath, and Uri Tadmor. Online: http://wold.livingsources.org/
RIGGS, DAYLEN. 2010. Minimal Salience and the quality of epenthetic vowels in loanwords.
Los Angeles: University of Southern California, MS
RIGGS, DAYLEN. 2013. Minimal salience and the quality of epenthetic vowels in loanwords. The
Proceedings of the 41st Meeting of the North East Linguistic Society (NELS 41) Volume
II, ed. by Y. Fainleib, N. LaCara and Y. Park, 123-36.
ROSE, YVAN, and KATHERINE DEMUTH. 2006. Vowel epenthesis in loanword adaptation:
Representation and phonetic considerations. Lingua 116.1112-39.
SALAMI, ADEBISI. 1972. Vowel and consonant harmony and vowel restriction in assimilated
English loan words in Yoruba. African Language Studies 13.162-81.
SALTZMAN, ELLIOT, HOSUNG NAM, JELENA KRIVOKAPIC, and LOUIS GOLDSTEIN. 2008. A
task-dynamic toolkit for modeling the effects of prosodic structure on articulation.
Proceedings of the 4th International Conference on Speech Prosody (Speech Prosody
2008), Campinas, 175–84.
SCHADEBERG, THILO. 2009. Swahili vocabulary. World Loanword Database, ed. by Martin
Haspelmath, and Uri Tadmor. Online: http://wold.livingsources.org/
SCHINDEL, RYAN, JEMMA ROWLANDS, and DEREK H. ARNOLD. 2011. The oddball effect:
Perceived duration and predictive coding. Journal of Vision 11.1-9.
SELKIRK, ELIZABETH. 1984. On the Major Class Features and Syllable Theory. Language Sound
Structure: Studies in Phonology, ed. by Mark Aronoff and Richard T. Oehrle, 107-136.
Cambridge, MA: MIT Press.
SHADLE, CHRISTINE. 1985. The acoustics of fricative consonants. Research Lab of Electronics
Technical Report 506.45-52.
SILVERMAN, DANIEL. 1992. Multiple scansions in loanword phonology: evidence from
Cantonese. Phonology 9.289-328.
SIMPSON, EDWARD. 1951. The Interpretation of the Interaction in Contingency Tables. Journal
of the Royal Statistical Society, Series B 13.238-41.
STERIADE, DONCA. 1999. Phonetics in phonology: the case of laryngeal neutralization. UCLA
Working Papers in Linguistics 2, ed. by Matthew K. Gordon, 25-246.
STERIADE, DONCA. 2001. The phonology of perceptibility effects: The P-Map and its
consequences for constraint organization. Los Angeles: UCLA MS
STERIADE, DONCA. 2009. The phonology of perceptibility effects: The P-Map and its
consequences for constraint organization. The Nature of the Word, ed. by Kristin Hanson
and Sharon Inkelas. Cambridge, MA: MIT Press.
TOSCO, MAURO. 2009. Gawwada vocabulary. World Loanword Database, ed. by Martin
Haspelmath, and Uri Tadmor. Online: http://wold.livingsources.org/.
TSE, PETER ULRIC, JAMES INTRILIGATOR, JOSEE RIVEST, and PATRICK CAVANAGH. 2004.
Attention and the subjective expansion of time. Perception & Psychophysics 66.1171-89.
TRANEL, BERNARD. 1981. Concreteness in Generative Phonology: Evidence from French.
Berkeley and Los Angeles: The University of California Press.
UFFMANN, CHRISTIAN. 2004. Vowel epenthesis in loanword phonology. Marburg:
Philipps-Universität dissertation.
UFFMANN, CHRISTIAN. 2006. Epenthetic vowel quality in loanwords: Empirical and formal
issues. Lingua 116.1079-1111.
VAJDA, EDWARD, and ANDREY NEFEDOV. 2009. Ket vocabulary. World Loanword Database,
ed. by Martin Haspelmath, and Uri Tadmor. Online: http://wold.livingsources.org/.
VAN DER SIJS, NICOLE. 2009. Dutch vocabulary. World Loanword Database, ed. by Martin
Haspelmath, and Uri Tadmor. Online: http://wold.livingsources.org/.
VÄLIMAA-BLUM, RIITA. 2005. Cognitive phonology in construction grammar. Berlin: Mouton
de Gruyter.
WIEBUSCH, THEKLA. 2009. Mandarin Chinese vocabulary. World Loanword Database, ed. by
Martin Haspelmath, and Uri Tadmor. Online: http://wold.livingsources.org/.
WRIGHT, RICHARD. 2004. A Review of Perceptual Cues and Cue Robustness. Phonetically
based phonology, ed. by Bruce Hayes, Robert Kirchner, and Donca Steriade. Cambridge:
Cambridge University Press
YIP, MOIRA. 1993. Cantonese loanword phonology and optimality theory. Journal of
East Asian Linguistics 2.261-91.
YIP, MOIRA. 2002. Perceptual influences in Cantonese loanword phonology. The Journal
of the Phonetic Society of Japan. Special issue on Aspects of loanword phonology 6.4–21.
APPENDICES
Appendix A: List of languages
WOLD = World Loanword Database
GT = Google Translate
Language Genetic Classification Location Source(s) Number of tokens
Archi Caucasian Dagestan, Russia WOLD 163
Armenian Indo-European Armenia GT 82
Azerbaijani Turkic Azerbaijan GT 124
Basque Isolate Spain, France GT 129
Bezhta Caucasian Dagestan, Russia WOLD 163
Cebuano Austronesian Philippines GT 65
Ceq Wong Austro-Asiatic, Mon-Khmer Malaysia WOLD 296
Dutch Indo-European Netherlands, Belgium, etc WOLD 184
English Indo-European UK, USA, etc WOLD 389
Finnish Uralic Finland GT 142
Gawwada Afro-Asiatic Ethiopia WOLD 66
Georgian Caucasian Georgia GT 97
Gujarati Indo-Aryan India GT 129
Gurindji Australian, Pama-Nyungan Australia WOLD 172
Hausa Afro-Asiatic, Chadic Nigeria WOLD 139
Hawaiian Austronesian, Polynesian Hawaii WOLD, Kaiao 2003 152
Hup Nadahup Brazil WOLD 28
Imbabura Quechua Quechuan Peru WOLD 198
Indonesian Austronesian, Malayic Indonesia WOLD, Grijns et al (2007) 627
Iraqw Afro-Asiatic, Cushitic Tanzania WOLD 33
Irish Indo-European, Celtic Ireland GT 138
Isindebele Niger-Congo South Africa Mahlangu (2007) 105
Japanese Altaic, Isolate Japan WOLD, Evans (1997) 259
Javanese Austronesian Java GT 115
Kali’na (Carib) Cariban Brazil WOLD 103
Kannada Dravidian India GT 97
Kanuri Nilo-Saharan Nigeria, Niger, Chad, etc WOLD 102
Ket Isolate Russia WOLD 86
Kildin Saami Uralic Russia WOLD 224
Korean Altaic/Isolate Korean Peninsula Pae (1968) 271
Lower Serbian Indo-European, Slavic Serbia WOLD 224
Macedonian Indo-European, Slavic Macedonia, Albania, etc WOLD, GT 150
Malagasy Austronesian Madagascar WOLD 103
Malay Austronesian, Malayic Malaysia Grijns et al (2007) 118
Malayalam Dravidian India Gundert (1992) 31
Maltese Afro-Asiatic, Semitic Malta GT 139
Manange Tibeto-Burman Nepal WOLD 44
Mapudungan (Mapuche) Araucanian Chile, Argentina WOLD 102
Marathi Indo-Aryan India GT 65
Oroqen Tungusic China WOLD 31
Romanian Indo-European Romania WOLD, GT 214
Sakha Turkic Russia WOLD 273
Saramaccan Creole Suriname, French Guiana WOLD 184
Selice Romani Indo-European, Slavic Slovakia WOLD 192
Seychelles Creole Creole Seychelles WOLD 123
Shona Bantu Zimbabwe, Mozambique, etc Uffmann (2004) 457
Swahili Bantu Kenya, Tanzania, etc WOLD 220
Tamil Dravidian India WOLD 67
Tarifyt Berber Afro-Asiatic Morocco WOLD 331
Thai Sino-Tibetan Thailand GT 117
Turkish Turkic Turkey GT, Iz & Hony (2008) 151
Welsh Indo-European, Celtic Wales, UK GT 124
Yaqui Uto-Aztecan Mexico WOLD 190
Appendix B: Google Translate and dictionary-search words
Initial CCs Medial CCs Final CCs Codas
blasphemy Afghanistan absinth (absinthe) accordion
blouse Albania advent airplane
Blues album Afrikaans album
bratwurst alcohol alcoholism angel
Britain Algebra algorithm aspirin
broker asphalt ambulance atom
Bromine asthma apartment basketball
Brussels bacteria asphalt bishop
Chlorine Bangladesh baptism blouse
Christ baptism capitalism boycott
chromosome baseball catechism bus
clarinet basketball cement calcium
clerk Belgium Christ calculus
clinic benzene church carbide
club biscuit cobalt cathedral
cream Bulgaria communism catholic
crepe cancer convent chocolate
crucifix captain crucifix chromosome
dram cathedral disk (floppy disk) clarinet
drama chimpanzee exorcism clinic
Flemish confession (confessional) fax club
Fluorine convent film condom
Franc delta France credit
France Denmark golf crepe
fresco discotheque Greenland devil
globalism doctor inch discotheque
glycerin emphysema jeans electron
gram England lens gospel
gravity football lent gram
Greenland fresco lozenge helium
Kroner (currency) gospel Mars hormone
plastic hectare matrix hospital
Platinum hospital New Zealand hydrogen
Plutonium Israel paradox internet
Prague magnesium parliament jazz
president magnet pound (currency) magnet
pretzel microscope president morphine
Priest monsoon priest nun
prism neutron prism parish
professor nuclear province pastel
propeller passport racist penicillin
protein pastel rhythm photograph
scalpel petroleum (petrol) sacrament pilot
schizophrenia phosphorus saint pope
scone pingpong (ping-pong, ping pong) science pretzel
sketch plastic second (there are sixty seconds in one minute) Sabbath
Slovenia pulpit sex salad
spaghetti purgatory small pox (smallpox) scalpel
spinach rugby socialism shilling
sponge sacrament sponge stethoscope
sports saxophone Stockholm tampon
stethoscope scalpel syringe telephone
The train sulfur tent tennis
tractor symphony the post the train
tram telemetry toast truck
transplant testament tongs trumpet
trigonometry trumpet tournament vaccine
trinity vaccine transplant violin
truck vodka ultrasound virus
trumpet website Welsh website
Appendix C: Corpus Totals
Archi Initial CCs Medial CCs Final CCs Codas Total
Epenth 3 0 0 4 7
Retention 1 50 11 89 151
Deletion 0 4 0 1 5
Other 0 0 0 0 0
Total 4 54 11 94 163
Armenian Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 1 4 3 8
Retention 24 17 14 17 72
Deletion 1 1 0 0 2
Other 0 0 0 0 0
Total 25 19 18 20 82
Azerbaijani Initial CCs Medial CCs Final CCs Codas Total
Epenth 1 2 3 2 8
Retention 32 26 20 34 112
Deletion 1 1 1 0 3
Other 0 0 1 0 1
Total 34 29 25 36 124
Basque Initial CCs Medial CCs Final CCs Codas Total
Epenth 10 1 18 17 46
Retention 28 35 5 3 71
Deletion 0 1 2 4 7
Other 0 2 1 2 5
Total 38 39 26 26 129
Bezhta Initial CCs Medial CCs Final CCs Codas Total
Epenth 13 2 1 42 58
Retention 1 3 7 3 14
Deletion 5 81 1 0 87
Other 0 3 1 0 4
Total 19 89 10 45 163
Cebuano Initial CCs Medial CCs Final CCs Codas Total
Epenth 1 2 16 6 25
Retention 14 15 3 5 37
Deletion 0 2 0 0 2
Other 0 0 1 0 1
Total 15 19 20 11 65
Ceq Wong Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 1 0 0 1
Retention 1 19 0 233 253
Deletion 2 23 0 0 25
Other 1 11 0 5 17
Total 4 54 0 238 296
Dutch Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 3 5 1 9
Retention 27 47 17 76 167
Deletion 0 2 4 1 7
Other 0 0 0 1 1
Total 27 52 26 79 184
English Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 1 1 2 4
Retention 72 97 34 154 357
Deletion 1 4 9 2 16
Other 1 2 4 5 12
Total 74 104 48 163 389
Finnish Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 0 23 29 52
Retention 36 34 4 8 82
Deletion 2 0 1 2 5
Other 1 1 0 1 3
Total 39 35 28 40 142
Gawwada Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 1 6 31 38
Retention 0 23 0 0 23
Deletion 0 0 0 1 1
Other 0 4 0 0 4
Total 0 28 6 32 66
Georgian Initial CCs Medial CCs Final CCs Codas Total
Epenth 2 1 15 22 40
Retention 24 28 0 3 55
Deletion 1 0 0 1 2
Other 0 0 0 0 0
Total 27 29 15 26 97
Gujarati Initial CCs Medial CCs Final CCs Codas Total
Epenth 3 6 23 25 57
Retention 31 30 0 8 69
Deletion 0 0 1 2 3
Other 0 0 0 0 0
Total 34 36 24 35 129
Gurindji Initial CCs Medial CCs Final CCs Codas Total
Epenth 1 0 0 5 6
Retention 0 64 23 57 144
Deletion 2 1 5 12 20
Other 0 2 0 0 2
Total 3 67 28 74 172
Hausa Initial CCs Medial CCs Final CCs Codas Total
Epenth 10 4 9 54 77
Retention 0 13 0 28 41
Deletion 0 5 6 7 18
Other 1 1 0 1 3
Total 11 23 15 90 139
Hawaiian Initial CCs Medial CCs Final CCs Codas Total
Epenth 11 10 5 93 119
Retention 0 1 0 1 2
Deletion 5 7 7 12 31
Other 0 0 0 0 0
Total 16 18 12 106 152
Hup Initial CCs Medial CCs Final CCs Codas Total
Epenth 3 1 0 1 5
Retention 0 4 0 3 7
Deletion 1 9 0 4 14
Other 0 2 0 0 2
Total 4 16 0 8 28
Imbabura Quechua Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 0 0 5 5
Retention 34 112 0 44 190
Deletion 0 0 0 1 1
Other 0 2 0 0 2
Total 34 114 0 50 198
Indonesian Initial CCs Medial CCs Final CCs Codas Total
Epenth 21 8 12 3 44
Retention 17 182 3 332 534
Deletion 7 6 16 3 32
Other 4 7 0 6 17
Total 49 203 31 344 627
Iraqw Initial CCs Medial CCs Final CCs Codas Total
Epenth 7 1 0 0 8
Retention 2 6 0 2 10
Deletion 5 1 0 0 6
Other 9 0 0 0 9
Total 23 8 0 2 33
Irish Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 0 11 2 13
Retention 41 34 14 33 122
Deletion 0 0 1 1 2
Other 0 1 0 0 1
Total 41 35 26 36 138
Isindebele Initial CCs Medial CCs Final CCs Codas Total
Epenth 35 0 23 26 84
Retention 7 12 0 0 19
Deletion 1 0 1 0 2
Other 0 0 0 0 0
Total 43 12 24 26 105
Japanese Initial CCs Medial CCs Final CCs Codas Total
Epenth 58 33 38 84 213
Retention 0 0 0 0 0
Deletion 0 33 6 2 41
Other 0 5 0 0 5
Total 58 71 44 86 259
Javanese Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 0 6 0 6
Retention 32 25 8 25 90
Deletion 1 0 12 1 14
Other 1 0 0 4 5
Total 34 25 26 30 115
Kali'na Initial CCs Medial CCs Final CCs Codas Total
Epenth 21 27 2 18 68
Retention 2 19 0 10 31
Deletion 0 3 1 0 4
Other 0 0 0 0 0
Total 23 49 3 28 103
Kannada Initial CCs Medial CCs Final CCs Codas Total
Epenth 1 0 8 4 13
Retention 24 24 13 0 61
Deletion 0 0 1 21 22
Other 0 0 0 1 1
Total 25 24 22 26 97
Kanuri Initial CCs Medial CCs Final CCs Codas Total
Epenth 6 1 9 15 31
Retention 2 9 0 32 43
Deletion 0 6 2 3 11
Other 5 3 9 0 17
Total 13 19 20 50 102
Ket Initial CCs Medial CCs Final CCs Codas Total
Epenth 4 1 1 0 6
Retention 4 29 1 34 68
Deletion 10 1 0 0 11
Other 1 0 0 0 1
Total 19 31 2 34 86
Kildin Saami Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 0 3 1 4
Retention 16 58 13 105 192
Deletion 9 0 4 12 25
Other 2 0 0 1 3
Total 27 58 20 119 224
Korean Initial CCs Medial CCs Final CCs Codas Total
Epenth 34 26 2 48 110
Retention 0 34 49 68 151
Deletion 0 0 5 5 10
Other 0 0 0 0 0
Total 34 60 56 121 271
Lower Serbian Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 0 3 7 10
Retention 59 69 23 3 154
Deletion 2 3 3 47 55
Other 0 1 1 3 5
Total 61 73 30 60 224
Macedonian Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 1 11 8 20
Retention 48 42 14 22 126
Deletion 2 0 1 0 3
Other 0 0 1 0 1
Total 50 43 27 30 150
Malagasy Initial CCs Medial CCs Final CCs Codas Total
Epenth 5 16 8 48 77
Retention 2 14 0 0 16
Deletion 2 5 0 3 10
Other 0 0 0 0 0
Total 9 35 8 51 103
Malay Initial CCs Medial CCs Final CCs Codas Total
Epenth 2 0 8 4 14
Retention 2 47 0 47 96
Deletion 0 0 0 6 6
Other 0 2 0 0 2
Total 4 49 8 57 118
Malayalam Initial CCs Medial CCs Final CCs Codas Total
Epenth 1 4 0 0 5
Retention 1 6 0 14 21
Deletion 0 1 0 3 4
Other 0 1 0 0 1
Total 2 12 0 17 31
Maltese Initial CCs Medial CCs Final CCs Codas Total
Epenth 3 3 20 14 40
Retention 30 30 8 19 87
Deletion 0 2 3 3 8
Other 2 0 0 2 4
Total 35 35 31 38 139
Manange Initial CCs Medial CCs Final CCs Codas Total
Epenth 1 0 2 3 6
Retention 1 11 0 25 37
Deletion 0 0 0 0 0
Other 0 1 0 0 1
Total 2 12 2 28 44
Mapudungan Initial CCs Medial CCs Final CCs Codas Total
Epenth 14 7 0 2 23
Retention 3 32 0 32 67
Deletion 0 7 0 5 12
Other 0 0 0 0 0
Total 17 46 0 39 102
Marathi Initial CCs Medial CCs Final CCs Codas Total
Epenth 1 5 17 8 31
Retention 22 9 1 0 32
Deletion 0 0 1 0 1
Other 0 1 0 0 1
Total 23 15 19 8 65
Oroqen Initial CCs Medial CCs Final CCs Codas Total
Epenth 1 2 1 2 6
Retention 0 6 0 15 21
Deletion 0 1 1 1 3
Other 0 1 0 0 1
Total 1 10 2 18 31
Romanian Initial CCs Medial CCs Final CCs Codas Total
Epenth 1 2 1 17 21
Retention 59 78 4 46 187
Deletion 0 2 1 2 5
Other 1 0 0 0 1
Total 61 82 6 65 214
Sakha Initial CCs Medial CCs Final CCs Codas Total
Epenth 45 14 7 2 68
Retention 4 60 3 110 177
Deletion 1 16 4 1 22
Other 0 5 0 1 6
Total 50 95 14 114 273
Saramaccan Initial CCs Medial CCs Final CCs Codas Total
Epenth 17 9 2 10 38
Retention 0 2 0 7 9
Deletion 27 25 0 78 130
Other 1 2 0 4 7
Total 45 38 2 99 184
Selice Romani Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 0 15 50 65
Retention 48 53 3 4 108
Deletion 5 11 2 0 18
Other 0 0 0 1 1
Total 53 64 20 55 192
Seychelles Creole Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 1 0 0 1
Retention 15 27 3 63 108
Deletion 6 2 6 0 14
Other 0 0 0 0 0
Total 21 30 9 63 123
Shona Initial CCs Medial CCs Final CCs Codas Total
Epenth 120 109 80 120 429
Retention 0 0 0 0 0
Deletion 0 10 17 0 27
Other 0 1 0 0 1
Total 120 120 97 120 457
Swahili Initial CCs Medial CCs Final CCs Codas Total
Epenth 4 19 28 139 190
Retention 8 14 0 5 27
Deletion 0 2 0 0 2
Other 0 1 0 0 1
Total 12 36 28 144 220
Tamil Initial CCs Medial CCs Final CCs Codas Total
Epenth 18 3 2 0 23
Retention 2 18 6 12 38
Deletion 0 1 4 1 6
Other 0 0 0 0 0
Total 20 22 12 13 67
Tarifyt Berber Initial CCs Medial CCs Final CCs Codas Total
Epenth 47 1 1 6 55
Retention 48 20 5 130 203
Deletion 8 8 10 33 59
Other 5 5 4 0 14
Total 108 34 20 169 331
Thai Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 0 2 0 2
Retention 19 28 17 33 97
Deletion 3 0 4 0 7
Other 9 2 0 0 11
Total 31 30 23 33 117
Turkish Initial CCs Medial CCs Final CCs Codas Total
Epenth 1 3 9 3 16
Retention 33 38 17 40 128
Deletion 1 2 2 2 7
Other 0 0 0 0 0
Total 35 43 28 45 151
Welsh Initial CCs Medial CCs Final CCs Codas Total
Epenth 0 0 1 0 1
Retention 36 31 22 31 120
Deletion 0 0 2 1 3
Other 0 0 0 0 0
Total 36 31 25 32 124
Yaqui Initial CCs Medial CCs Final CCs Codas Total
Epenth 1 0 0 0 1
Retention 17 91 0 5 113
Deletion 11 22 0 43 76
Other 0 0 0 0 0
Total 29 113 0 48 190
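As a reading aid, the sketch below shows how one block of the corpus totals can be held in a small Python structure, checked for internal consistency, and collapsed into adapted versus retained counts. The dictionary layout and the function names are illustrative assumptions, not part of the original analysis; the figures are the Archi rows from this appendix, and treating "adapted" as epenthesis plus deletion plus other is an inference from the totals.

```python
# Illustrative only: the layout and names are assumptions; the figures are the Archi block.
POSITIONS = ["Initial CCs", "Medial CCs", "Final CCs", "Codas"]

archi = {
    "Epenth":    [3, 0, 0, 4],
    "Retention": [1, 50, 11, 89],
    "Deletion":  [0, 4, 0, 1],
    "Other":     [0, 0, 0, 0],
}

def process_totals(block):
    """Sum each process row across the four positions."""
    return {process: sum(counts) for process, counts in block.items()}

def position_totals(block):
    """Sum each position column across the four processes."""
    return [sum(column) for column in zip(*block.values())]

def adapted_vs_retained(block):
    """Collapse to (adapted, retained), taking adapted = epenthesis + deletion + other."""
    totals = process_totals(block)
    adapted = totals["Epenth"] + totals["Deletion"] + totals["Other"]
    return adapted, totals["Retention"]

print(process_totals(archi))
print(dict(zip(POSITIONS, position_totals(archi))))
print(adapted_vs_retained(archi))
```

For Archi the column sums agree with the printed Total row, and the collapsed pair is 12 adapted against 151 retained.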
Appendix D: Chi-Square test for Adaptation vs. Non Adaptation
In the Sig column, 0 = significant and 1 = not significant.
Language Sig P-Value χ² Value Adapt obs Non-A obs Interpretation
Archi 0 0 72.4357 12 151 bias for retention
Armenian 0 0 27.3476 10 72 bias for retention
Azerbaijani 0 0 48.1516 12 112 bias for retention
Basque 1 0.4177 0.6567 58 71 no bias for retention or adaptation
Bezhta 0 0 67.4762 149 14 bias for adaptation
Cebuano 1 0.4288 0.6261 28 37 no bias for retention or adaptation
Ceq Wong 0 0 85.2163 43 253 bias for retention
Dutch 0 0 73.3237 17 167 bias for retention
English 0 0 164.4647 32 357 bias for retention
Finnish 1 0.1904 1.7145 60 82 no bias for retention or adaptation
Gawwada 1 0.0782 3.1015 43 23 no bias for retention or adaptation
Georgian 1 0.3496 0.8751 42 55 no bias for retention or adaptation
Gujarati 1 0.575 0.3143 60 69 no bias for retention or adaptation
Gurindji 0 0 44.1349 28 144 bias for retention
Hausa 0 0.0005 12.1999 98 41 bias for adaptation
Hawaiian 0 0 94.4352 150 2 bias for adaptation
Hup 1 0.0533 3.7333 21 7 no bias for retention or adaptation
Imbabura Quechua 0 0 106.0465 8 190 bias for retention
Indonesian 0 0 176.976 93 534 bias for retention
Iraqw 1 0.1026 2.664 23 10 no bias for retention or adaptation
Irish 0 0 47.7539 16 122 bias for retention
Isindebele 0 0 23.7987 86 19 bias for adaptation
Japanese 0 0 172.6667 259 0 bias for adaptation
Javanese 0 0 19.964 25 90 bias for retention
Kali'na (aka Carib) 0 0.0036 8.4968 72 31 bias for adaptation
Kannada 1 0.0703 3.2761 36 61 no bias for retention or adaptation
Kanuri 1 0.2611 1.2627 59 43 no bias for retention or adaptation
Ket 0 0.0001 15.8765 18 68 bias for retention
Kildin Saami 0 0 65.4971 32 192 bias for retention
Korean 1 0.1823 1.7789 120 151 no bias for retention or adaptation
Lower Serbian 0 0.0001 16.3239 70 154 bias for retention
Macedonian 0 0 39.213 24 126 bias for retention
Malagasy 0 0 27.7696 87 16 bias for adaptation
Malay 0 0 25.7335 22 96 bias for retention
Malayalam 1 0.1557 2.015 10 21 no bias for retention or adaptation
Maltese 0 0.0343 4.4774 52 87 bias for retention
Manange 0 0.0007 11.5722 7 37 bias for retention
Mapudungan (aka Mapuche) 0 0.0433 4.0826 35 67 bias for retention
Marathi 1 0.9301 0.0077 33 32 no bias for retention or adaptation
Oroqen 1 0.1557 2.015 10 21 no bias for retention or adaptation
Romanian 0 0 69.5299 27 187 bias for retention
Sakha 0 0.0005 12.2869 96 177 bias for retention
Saramaccan 0 0 94.0094 175 9 bias for adaptation
Selice Romani 1 0.2198 1.5059 84 108 no bias for retention or adaptation
Seychelles Creole 0 0 41.0213 15 108 bias for retention
Shona 0 0 304.6667 457 0 bias for adaptation
Swahili 0 0 73.0206 193 27 bias for adaptation
Tamil 1 0.4358 0.6072 29 38 no bias for retention or adaptation
Tarifyt Berber 0 0.0033 8.6075 128 203 bias for retention
Thai 0 0 28.4143 20 97 bias for retention
Turkish 0 0 41.5265 23 128 bias for retention
Welsh 0 0 69.4532 4 120 bias for retention
Yaqui 1 0.0636 3.4414 77 113 no bias for retention or adaptation
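The statistics in this table can be recomputed from the two observed counts alone. The sketch below rests on an inference from the reported numbers, not on a procedure stated in this appendix: the values are consistent with an uncorrected Pearson chi-square on a 2×2 table whose first row holds the observed counts and whose second row is an even split of the same total, evaluated with one degree of freedom. The function names are hypothetical.

```python
import math

def chi2_vs_even_split(a, b):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [n/2, n/2]] with n = a + b. Assumed method, inferred from the table."""
    half = (a + b) / 2
    # Each column contributes (observed - n/2)^2 / (observed + n/2).
    return sum((o - half) ** 2 / (o + half) for o in (a, b))

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# (language, adapted, retained), copied from the table.
rows = [("Archi", 12, 151), ("Basque", 58, 71), ("Japanese", 259, 0)]
for language, adapted, retained in rows:
    chi2 = chi2_vs_even_split(adapted, retained)
    print(f"{language}: chi2 = {chi2:.4f}, p = {p_value_df1(chi2):.4f}")
```

Under this assumption the three rows reproduce the tabulated values 72.4357, 0.6567 and 172.6667. A conventional one-sample goodness-of-fit test against an even split would give larger statistics for the same counts, so the two should not be confused when comparing results.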
Appendix E: Chi-Square results for Epenthesis vs Deletion
Language Sig P-Val χ² Epenthesis Deletion Interpretation
Saramaccan 0 0 21.9367 38 130 Deletion Dominant
Hausa 0 0 21.7929 77 18 Epenthesis Dominant
Hawaiian 0 0 28.2435 119 31 Epenthesis Dominant
Isindebele 0 0 50.5918 84 2 Epenthesis Dominant
Japanese 0 0 67.8421 213 41 Epenthesis Dominant
Kali'na 0 0 35.4462 68 4 Epenthesis Dominant
Malagasy 0 0 30.2899 77 10 Epenthesis Dominant
Shona 0 0 220.5716 429 27 Epenthesis Dominant
Swahili 0 0 121.7242 190 2 Epenthesis Dominant
Bezhta 1 0.1461 2.1122 58 87 Process Neutral
Appendix F: Test results for step-three languages
Language Sig P-Value χ² Epenth Del Other Ret
Basque 0 0 53.1373 46 7 5 71
Cebuano 0 0 34.5503 25 2 1 37
Finnish 0 0 71.9178 52 5 3 82
Gawwada 0 0 30.9018 38 1 4 23
Georgian 0 0 58.9018 40 2 0 55
Gujarati 0 0 76.7236 57 3 0 69
Hup 1 0.142 5.4444 5 14 2 7
Iraqw 1 0.9056 0.5595 8 6 9 10
Kannada 0 0 40.7579 13 22 1 61
Kanuri 0 0.0059 12.4665 31 11 17 43
Korean 0 0 152.3698 110 10 0 151
Malayalam 0 0.0044 13.1036 5 4 1 21
Marathi 0 0 36.7095 31 1 1 32
Oroqen 0 0.0034 13.6352 6 3 1 21
Selice Romani 0 0 84.3524 65 18 1 108
Tamil 0 0 31.0601 23 6 0 38
Yaqui 0 0 125.3899 1 76 0 113
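The four-way statistics in this table appear to follow the same logic as the two-way tests, extended to four categories. As a hedged sketch: the reported values are consistent with a Pearson chi-square on a 2×4 table that pairs the observed counts with a uniform split of the same total, evaluated with three degrees of freedom. This is inferred from the numbers and is not stated in the appendix; the function names are hypothetical.

```python
import math

def chi2_vs_uniform(observed):
    """Pearson chi-square for the 2 x k table whose rows are the observed
    counts and a uniform split of the same total. Assumed method."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / (o + expected) for o in observed)

def p_value_df3(chi2):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom
    (closed form, valid for df = 3 only)."""
    return (math.erfc(math.sqrt(chi2 / 2))
            + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))

# Counts in the column order Epenth, Del, O, Ret, copied from the table.
rows = {"Basque": [46, 7, 5, 71], "Hup": [5, 14, 2, 7], "Iraqw": [8, 6, 9, 10]}
for language, counts in rows.items():
    chi2 = chi2_vs_uniform(counts)
    print(f"{language}: chi2 = {chi2:.4f}, p = {p_value_df3(chi2):.4f}")
```

With these inputs the sketch agrees with the tabulated statistics for Basque (53.1373), Hup (5.4444, p about 0.142) and Iraqw (0.5595, p about 0.9056), which is the basis for the assumption.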
Appendix G: 2x2 test results for step-three languages
Language Epenthesis Deletion Other Retention Pattern
Basque 46 7 5 71 red
Cebuano 25 2 1 37 red
Finnish 52 5 3 82 red
Gawwada 38 1 4 23 ermd
Georgian 40 2 0 55 red
Gujarati 57 3 0 69 red
Kannada 13 22 1 61 rde
Kanuri 31 11 17 43 remd
Korean 110 10 0 151 red
Malayalam 5 4 1 21 red
Marathi 31 1 1 32 red
Oroqen 6 3 1 21 red
Selice Romani 65 18 1 108 red
Tamil 23 6 0 38 red
Yaqui 1 76 0 113 rde
Test: Epenthesis vs Deletion
Language sig p-value χ²
Basque 0 4.63E-05 16.596
Cebuano 0 0.00054142 11.967
Finnish 0 1.35E-06 23.345
Gawwada 0 1.95E-06 22.647
Georgian 0 3.33E-06 21.614
Gujarati 0 3.39E-08 30.47
Kannada 1 0.27805 1.1766
Kanuri 0 0.024653 5.0481
Korean 0 1.24E-12 50.42
Malayalam 1 0.81338 0.055728
Marathi 0 2.18E-05 18.023
Oroqen 1 0.47329 0.51429
Selice Romani 0 0.00014264 14.467
Tamil 0 0.019556 5.4511
Yaqui 0 4.52E-12 47.883
Deletion vs Retention
Language sig p-val χ²
Basque 0 1.92E-08 31.57
Cebuano 0 9.23E-06 19.665
Finnish 0 7.54E-11 42.373
Gawwada 0 0.0003532 12.765
Georgian 0 2.06E-08 31.435
Gujarati 0 6.08E-10 38.295
Kannada 0 0.0018447 9.6979
Kanuri 0 0.0012643 10.394
Korean 0 0 76.39
Malayalam 0 0.010574 6.5355
Marathi 0 1.54E-05 18.682
Oroqen 0 0.0050693 7.8545
Selice Romani 0 1.28E-09 36.842
Tamil 0 0.00025035 13.41
Yaqui 1 0.055843 3.6567
Epenthesis vs Retention
Language sig p-val χ²
Basque 1 0.10024 2.7018
Cebuano 1 0.27894 1.1723
Finnish 1 0.065164 3.4008
Gawwada 1 0.17118 1.8726
Georgian 1 0.275 1.1916
Gujarati 1 0.44918 0.57273
Kannada 0 3.03E-05 17.398
Kanuri 1 0.32234 0.97941
Korean 1 0.071848 3.2403
Malayalam 0 0.019704 5.4379
Marathi 1 0.92901 0.007937
Oroqen 0 0.033598 4.5151
Selice Romani 0 0.019819 5.4278
Tamil 1 0.17118 1.8726
Yaqui 0 0 72.516
Epenthesis vs Metathesis
Language sig p χ²
Gawwada 0 4.97E-05 16.45830508
Kanuri 1 0.148651827 2.086031042
Metathesis vs Retention: Kanuri
sig p χ²
0 0.00574134 7.629745597
Metathesis vs Deletion: Kanuri
sig p χ²
1 0.419997372 0.650322581
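The pairwise comparisons in this appendix can be sketched as a loop over every pair of processes for one language. The code below assumes the same two-row chi-square as the other appendices (observed counts against an even split, one degree of freedom) and a 0.05 threshold for the sig column. It also assumes that the Pattern column lists the processes from most to least frequent. All of these are inferences from the tabulated values, and the function names are hypothetical.

```python
import math
from itertools import combinations

def chi2_pair(a, b):
    """Uncorrected Pearson chi-square for [[a, b], [n/2, n/2]], n = a + b (assumed method)."""
    half = (a + b) / 2
    return sum((o - half) ** 2 / (o + half) for o in (a, b))

def pairwise_tests(counts, alpha=0.05):
    """Test every pair of processes; returns {(proc1, proc2): (chi2, p, significant)}."""
    results = {}
    for (name1, c1), (name2, c2) in combinations(counts.items(), 2):
        chi2 = chi2_pair(c1, c2)
        p = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 degree of freedom
        results[(name1, name2)] = (chi2, p, p < alpha)
    return results

def frequency_pattern(counts):
    """First letters of the processes, most frequent first (assumed reading of 'Pattern')."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    return "".join(name[0].lower() for name in ranked)

# Basque counts copied from the first table of this appendix ("Other" left out).
basque = {"Epenthesis": 46, "Deletion": 7, "Retention": 71}
basque_results = pairwise_tests(basque)
for pair, (chi2, p, significant) in basque_results.items():
    verdict = "significant" if significant else "not significant"
    print(pair, f"chi2 = {chi2:.4f}", f"p = {p:.3g}", verdict)
print(frequency_pattern(basque))
```

For Basque this gives statistics close to the tabulated 16.596, 2.7018 and 31.57, with only the epenthesis versus retention comparison falling short of significance, and the ranking yields the pattern string "red".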
Appendix H: Epenthesis vs. Deletion Chi-Square for all languages
Language sig p-value χ² Epenth Del
Archi 1 0.682 0.1678 4 1
Armenian 1 0.1596 1.978 3 0
Azerbaijani 1 0.2737 1.1983 2 0
Basque 0 0 16.5956 17 4
Bezhta 1 0.087 2.9293 42 0
Cebuano 0 0.0005 11.9673 6 0
Ceq Wong 0 0.0002 14.0752 0 0
Dutch 1 0.7232 0.1255 1 1
English 0 0.0467 3.956 2 2
Finnish 0 0 23.3453 29 2
Gawwada 0 0 22.6473 31 1
Georgian 0 0 21.6137 22 1
Gujarati 0 0 30.4702 25 2
Gurindji 0 0.0438 4.0638 5 12
Hausa 0 0 20.2762 54 7
Hawaiian 0 0 28.2435 93 12
Hup 1 0.1329 2.2583 1 4
Imbabura Quechua 1 0.2207 1.5 5 1
Indonesian 1 0.3289 0.9533 3 3
Iraqw 1 0.7047 0.1436 0 0
Irish 0 0.0309 4.6598 2 1
Isindebele 0 0 50.5918 26 0
Japanese 0 0 65.7768 84 2
Javanese 1 0.1967 1.6667 0 1
Kali'na (aka Carib) 0 0 35.4462 18 0
Kannada 1 0.2781 1.1766 4 21
Kanuri 0 0.0247 5.0481 15 3
Ket 1 0.386 0.7515 0 0
Kildin Saami 0 0.0031 8.7506 1 12
Korean 0 0 50.4202 48 5
Lower Serbian 0 0 17.6975 7 47
Macedonian 0 0.007 7.2764 8 0
Malagasy 0 0 30.2899 48 3
Malay 1 0.1967 1.6667 4 6
Malayalam 1 0.8134 0.0557 0 3
Maltese 0 0.0005 12 14 3
Manange 0 0.0455 4 3 0
Mapudungan (aka Mapuche) 1 0.1831 1.7723 2 5
Marathi 0 0 18.0225 8 0
Oroqen 1 0.4733 0.5143 2 1
Romanian 0 0.0197 5.4379 17 2
Sakha 0 0.0004 12.5769 2 1
Saramaccan 0 0 27.2321 10 78
Selice Romani 0 0.0001 14.467 50 0
Seychelles Creole 0 0.0084 6.9357 0 0
Shona 0 0 219.9284 120 0
Swahili 0 0 121.0584 139 0
Tamil 0 0.0196 5.4511 0 1
Tarifyt Berber 1 0.7911 0.0702 6 33
Thai 1 0.2199 1.505 0 0
Turkish 1 0.176 1.831 3 2
Welsh 1 0.4652 0.5333 0 1
Yaqui 0 0 47.8829 0 43
Appendix J: Additional figures
Figure: Ratio plots for each language: Retention to Adaptation (action)
Figure: Bar graph of Retention to Epenthesis for individually significant languages
Figure (8c) – mean values for statistically significant languages, Retention to Epenthesis.
Figure: Ratios for individual significant languages: Retention to Deletion
Abstract
This dissertation investigates the phonology of loanword adaptation of sound sequences. When speakers borrow words that contain phonotactically marked sequences of sounds, there are a number of different ways by which they may adapt the foreign word into their native language. The types of adaptation that occur cross-linguistically, and the range and distribution of their occurrence, are the focal point of this study. Loanword data from fifty-three languages were collected, analyzed, and assembled into a typology of adaptation strategies, as well as a typology of languages based on their adaptation tendencies. The trends in both of these typologies indicate a strong cross-linguistic bias against consonant deletion in loanword adaptation.
Asset Metadata
Creator: Riggs, Daylen (author)
Core Title: Sound sequence adaptation in loanword phonology
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Linguistics
Publication Date: 04/17/2014
Defense Date: 12/18/2013
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: loanword, loanword adaptation, loanword phonology, loanword typology, loanwords, OAI-PMH Harvest, phonology, sound sequence adaptation, syllable structure, Typology
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Walker, Rachel (committee chair), Finlay, Stephen (committee member), Goldstein, Louis (committee member)
Creator Email: daylen.riggs@gmail.com, daylenri@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-380869
Unique identifier: UC11295767
Identifier: etd-RiggsDayle-2372.pdf (filename), usctheses-c3-380869 (legacy record id)
Legacy Identifier: etd-RiggsDayle-2372.pdf
Dmrecord: 380869
Document Type: Dissertation
Rights: Riggs, Daylen
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA