Improving Frequency Resolution in Cochlear Implants with Implications for
Cochlear Implant Front- and Back-End Processing
by
Susan Rebekah Subrahmanyam Bissmeyer
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BIOMEDICAL ENGINEERING)
December 2022
Copyright 2022 Susan Rebekah Subrahmanyam Bissmeyer
Dedication
To my loves, Jamie and Tommy
Acknowledgements
I would like to thank my advisor, Dr. Raymond Goldsworthy. I have no idea where I would be
without your mentorship—you took me on as a volunteer, Master’s student, research assistant,
project manager, and PhD student. Thank you for 8 years of influence, guidance, and direction—
for always having faith in me and pushing me to be better.
Thank you to my past and present committee members, Dr. Gerald Loeb, Dr. Laurie Eisenberg,
Dr. Leonid Litvak, Dr. Chris Shera, and Dr. Radha Kalluri for your guidance and advising
throughout my PhD.
Thank you to the normal hearing and cochlear implant subjects who worked tirelessly on my
tests and training.
Thank you to God, my husband Jamie—my rock and encourager every step of the way (and
fellow PhD candidate), my Mom and Grandma who instilled in me a love for music and people,
my Dad who encouraged me academically throughout my life, my brother—my comedic relief
and fellow PhD candidate, my grandfathers who inspired me to help people hear better, and my
darling baby boy who joined me during this journey and who I cannot imagine life without.
I would like to acknowledge both of my parents for watching my son for countless hours this
past year while I worked. I express my utmost thanks. Love you both!
Thank you to my many funding sources: the USC Caruso Department of Otolaryngology, the
Hearing and Communication Neuroscience T32 Training Program, TAship with Jesse Yen in the
Department of Biomedical Engineering, and finally and most thankfully Dr. Goldsworthy’s
TFS4CIs R01!
Thank you ALL from the bottom of my heart!
Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abbreviations
Abstract
Chapter 1: General Introduction
    Cochlear Implants
        Brief History of Development
        Brief Overview of the Current Device
    Cochlear Implant Clinical Use and Outcomes
        Variability in Outcomes and Abilities
        Overall Good for Speech in Quiet, Poor for Speech in Noise and Pitch Perception
        Brief History of Front-end Noise Reduction in Hearing Assistive Technology
        Brief Look at Back-end Frequency Encoding in CI Signal Processing
    Exploring Frequency Resolution with Electrode Psychophysics
    Purpose of this Dissertation
Chapter 2: A Novel Adaptive Beamforming Algorithm Improves Signal-to-Noise Ratio while Preserving Cues Necessary for Localization
    Introduction
    Methods
        Subjects
        Materials
        Binaural Fennec Algorithm
        Speech Reception Thresholds
        Lateralization Thresholds
    Acoustic Analyses
    Results
        Speech Reception Thresholds
        Lateralization Thresholds
        Psychometric Curve Fitting
    Discussion
    Conclusion
Chapter 3: The Effects of Musical Interval Identification Training and Musical Ability on Psychophysical Performance
    Introduction
    Methods
        Overview
        Participants
        Training
        Pre- and Post-Training Assessments
            Calibration Procedures
            Pure Tone Frequency Discrimination
            Fundamental Frequency Discrimination
            Tonal and Rhythm Comparisons
            Interval Identification
        The Goldsmith Musical Sophistication Index
    Results
        Data Analysis
        Pure Tone Detection Thresholds
        Pure Tone Frequency Discrimination
        Fundamental Frequency Discrimination
        Tonal and Rhythm Comparisons
        Interval Identification
        Correlation Analysis
        Details of the Training Program
    Discussion
Chapter 4: The Effects of Individual Differences and Perceptual Learning on Stimulation Rate Discrimination in Cochlear Implant Users
    Introduction
    Methods
        Subjects
        Psychophysical Testing
        Detection Thresholds and Comfort Levels as a Function of Stimulation Rate
        Rate Discrimination Thresholds
        Psychophysical Training of Stimulation Rate Discrimination
        Forward-masked Detection Thresholds
        Statistical Methods
    Results
        Rate Discrimination Thresholds
        Rate Discrimination Improves through Experience
        Forward-Masked Detection Thresholds
            Psychophysically Derived Metric: Forward-Masked Threshold Slopes
        Detection Thresholds and Comfort Levels as a Function of Stimulation Rate
            Psychophysically Derived Metric: Average Detection Thresholds
            Psychophysically Derived Metric: Multi-Pulse Integration
        Correlation Analysis between Psychophysically Derived Metrics and Rate Discrimination
        Exploratory Correlation Analysis among Psychophysically Derived Metrics
    Discussion
        Comparison with Stimulation Rate Discrimination Literature
        Lack of Correlation between Rate Discrimination and Other Psychophysical Measures
        Psychophysical Training Improves Stimulation Rate Discrimination
    Conclusions
Chapter 5: Combining Stimulation Place and Rate Improves Frequency Discrimination in Cochlear Implant Users
    Introduction
    Experiment 1: Melodic Contour Identification
        General Methods
            Subjects
            Procedure
            Loudness Balancing
            Stimuli
            Analyses
        Results
    Experiment 2: Frequency Discrimination
        General Methods
            Subjects
            Procedure
            Loudness Balancing
            Stimuli
            Analyses
        Results
    Discussion
        Coordinated Place and Rate of Stimulation for CIs
        Does Broad Stimulation Provide Better Access to Rate Pitch Cues?
    Conclusions
Chapter 6: The Effect of Stimulation Rate Training on Cochlear Implant Frequency Discrimination
    Introduction
    Methods
        Subjects
        Overview of Protocol
        Electrode Psychophysical Loudness Balancing
        Electrode Psychophysical Stimuli
        Training Procedure
        Electrode Psychophysical Assessments
            Frequency Discrimination
            Intensity Discrimination
            Melodic Contour Identification
            Fundamental Frequency Discrimination with Vowel Formants
            Pitch Matching
        Acoustic Pitch Assessments
            Pure Tone Loudness Scaling
            Pure Tone Detection
            Pure Tone Frequency Discrimination
            Fundamental Frequency Discrimination
            Pitch Ranking of Piano Notes
            Melodic Contour Identification
        Acoustic Speech Assessments
            Pure Tone Loudness Scaling and Detection
            Pure Tone Intensity Discrimination
            Sentence Completion in Background Noise
    Results
        Analyses
        Training
        Electrode Psychophysical Assessments
            Frequency Discrimination
            Intensity Discrimination
            Melodic Contour Identification
            Fundamental Frequency Discrimination with Vowel Formants
            Pitch Matching
        Acoustic Pitch Assessments
            Pure Tone Loudness Scaling and Detection Thresholds
            Pure Tone Frequency Discrimination
            Fundamental Frequency Discrimination
            Pitch Ranking of Piano Notes
            Melodic Contour Identification
        Acoustic Speech Assessments
            Pure Tone Scaling and Detection Thresholds
            Pure Tone Intensity Discrimination
            Sentence Completion in Background Noise
    Discussion
        Why did the progression throughout stimulation rate training not transfer to the trained task?
    Conclusion
General Discussion/Conclusions
    What are the challenges of implementing combined place and rate coding in CI signal processing?
    Culmination of Dissertation Topics: Spatial Hearing and Cues
References
Appendices
    Appendix A
        Chapter 2
            Article and Copyright Details
    Appendix B
        Chapter 3
            Article and Copyright Details
            Supplementary Materials
    Appendix C
        Chapter 4
            Article and Copyright Details
    Appendix D
        Chapter 5
            Article and Copyright Details
            Supplementary Materials
List of Tables

Table 3.1. Subject Demographics
Table 3.2. Interval Notation with the Corresponding Semitone Spacing between Notes
Table 3.3. Correlations between Results from Different Procedures Averaged across Conditions
Table 3.4. Procedure Correlations for No Known Hearing Loss
Table 3.5. Procedure Correlations for CI Users
Table 4.1. Subject Demographics
Table 5.1. Subject Information
Table 6.1. Subject Information
Supplemental Table B.1. Interval Training Levels with Semitone Spacing between Notes and Base Note Frequency Range
List of Figures

Figure 1.1: History of Cochlear Implants
Figure 1.2: Diagram of a Speech Processor and Cochlear Implant
Figure 1.3: Comparison of CI Electrode Arrays Across Implant Companies
Figure 1.4: Sound Coding in Cochlear Implants
Figure 2.1: Polar Plot of the Attenuation of the Binaural Fennec Algorithm
Figure 2.2: Components of Target and Noise Before and After Processing Shows Preservation of Binaural Cues
Figure 2.3: Acoustic Analysis Showing Target and Noise Components Before Processing
Figure 2.4: Attenuation of the Target and Noise Masker at Different Angles and Reverberation Times
Figure 2.5: Speech Reception Thresholds were Significantly Better with Binaural Fennec Processing at all Reverberation Times
Figure 2.6: Lateralization Thresholds were Improved in Anechoic Conditions with Binaural Fennec Processing, while not Impeded in Reverberant Conditions
Figure 2.7: Psychometric Functions Show the Difference in Detection Accuracy for Both Speech Reception and Lateralization Thresholds Before and After Binaural Fennec Processing
Figure 3.1: Visualizations of Musical Notes
Figure 3.2: Stimulus Level Associated with Detection Thresholds
Figure 3.3: Pure Tone Frequency Discrimination Thresholds
Figure 3.4: Fundamental Frequency Discrimination Thresholds
Figure 3.5: Tonal and Rhythm Comparisons
Figure 3.6: Interval Identification
Figure 3.7: Comparisons of Individual Results from Different Procedures
Figure 3.8: Musical Sophistication Index (MSI) vs Individual Results from Procedures
Figure 3.9: Number of Cumulative Failed Runs across Levels for Individual Participants
Figure 4.1: Individual Rate Discrimination Thresholds
Figure 4.2: Boxplot of Median Rate Discrimination Thresholds
Figure 4.3: Comparing the Effect of Training on Rate Discrimination in Goldsworthy and Shannon (2014) and the Present Study
Figure 4.4: Individual Forward-Masked Thresholds
Figure 4.5: Individual Detection Threshold Levels
Figure 4.6: Correlation Analysis between Rate Discrimination and Psychophysically Derived Metrics of Frequency Tuning
Figure 5.1: Example Melodic Contour Identification Stimuli for Experimental Conditions of Place, Rate, and Combined Place-Rate
Figure 5.2: Internote Frequency Spacing Thresholds for Melodic Contour Identification
Figure 5.3: Individual Internote Frequency Spacing Thresholds for Melodic Contour Identification
Figure 5.4: Example Frequency Discrimination Stimuli for Experimental Conditions of Place, Rate, and Combined Place-Rate
Figure 5.5: Frequency Discrimination Thresholds with Multi-Electrode Stimuli
Figure 5.6: Individual Frequency Discrimination as a Function of Frequency
Figure 5.7: Single and Multi-electrode Rate Discrimination as a Function of Frequency
Figure 5.8: Correlations between Forward Masking and Frequency Discrimination
Figure 5.9: Correlations between Rate Discrimination and Individual Metrics of Hearing Loss and CI Experiences
Figure 6.1: Example Stimuli for Procedures
Figure 6.2: Frequency Discrimination across Sessions and Ears
Figure 6.3: Intensity Discrimination across Sessions and Ears
Figure 6.4: Melodic Contour Identification across Sessions and Ears
Figure 6.5: Fundamental Frequency Discrimination with Vowel Formants across Sessions and Ears
Figure 6.6: Pitch Matching with Place and Rate Frequency Cues across Sessions and Ears
Figure 6.7: Acoustic Pure Tone Detection
Figure 6.8: Acoustic Pure Tone Frequency Discrimination for 110-3520 Hz
Figure 6.9: Acoustic Fundamental Frequency Discrimination
Figure 6.10: Acoustic Piano Note Frequency Discrimination
Figure 6.11: Acoustic Melodic Contour Identification with Piano Notes
Figure 6.12: Acoustic Pure Tone Detection for 500-4000 Hz
Figure 6.13: Acoustic Pure Tone Intensity Discrimination
Figure 6.14: Acoustic Sentence Completion in Background Noise
Supplementary Figure B.1: Image and Explanation of Website
Supplementary Figure D.1: Example Mapping Interface
Abbreviations
AB – Advanced Bionics Cochlear Implant Company
ACE – Advanced Combination Encoder
ANOVA – Analysis of Variance
AOI – angle of incidence
B – the rate by which the current decreases over the frequency range (seen in equations, i.e.,
Equation 3.1)
BP – Bipolar Mode
CI – Cochlear Implant
cm – centimeter
CRM – Coordinate Response Measure
dB – decibel
dB HL – decibels hearing level
dB SPL – decibels sound pressure level
dB SNR – decibels signal-to-noise ratio
DT – Discrimination Thresholds or Detection Thresholds (depending on the context/chapter)
FD – Frequency Discrimination
FDT – Frequency Discrimination Thresholds
F0DT – Fundamental Frequency Discrimination Thresholds
FM – Forward Masking
FSP – Fine Structure Processing
HiRes – High Resolution
HRTF – Head-related Transfer Function
Hz – cycles/second
II – Interval Identification
ILD – Interaural Level Difference (also referred to as inter-microphone amplitude difference)
ITD – Interaural Timing Difference (also referred to as inter-microphone timing difference)
L – lower limit of the subject’s dynamic range (seen in equations, i.e., Equation 3.1)
MCI – Melodic Contour Identification
MP – Monopolar Mode
MPI – Multi-pulse Integration
MSI – Goldsmith Musical Sophistication Index Self-Report Inventory
ms – milliseconds
µs – microseconds
NH – normal hearing
NIC – Nucleus Implant Communicator
pps – pulses-per-second
Q – related to the current level at the reference mapping frequency (100 Hz in Equation 3.1)
R-SPIN – Revised Speech Perception in Noise
RC – Rhythm Comparisons
SAS – Simultaneous Analog Stimulation
SNHL – Sensorineural Hearing Loss
SNR – Signal-to-Noise Ratio
SRT – Speech Reception Threshold
T60 – reverberation time
TC – Tonal Comparisons
TFS – Temporal Fine Structure
U – upper limit of the subject’s dynamic range (seen in equations, i.e., Equation 3.1)
USC CIRI – University of Southern California CI Research Interface
v – controls asymptotic growth (seen in equations, i.e., Equation 3.1)
x – frequency expressed as log2(frequency/100) (seen in equations, i.e., Equation 3.1)
Abstract
The main difficulties voiced by cochlear implant (CI) users are speech comprehension in
noise and pitch perception. The goal of this thesis is to address these issues in five studies
by improving spectral resolution, with potential applications to front- and back-end
cochlear implant processing. In these studies, a novel noise reduction algorithm is developed to
improve the clarity of speech in noise, and the effects of training and frequency encoding on
frequency resolution are explored in CI users.
The first study evaluated a front-end adaptive binaural beamforming algorithm that uses a
spectral analysis of incoming sounds to improve speech reception in noise while preserving the
cues necessary to localize sounds in space. This algorithm adopts a "look-to-listen" approach,
preserving speech presented in front of the listener, even in reverberant conditions.
As frequency resolution supports all aspects of hearing, the next step was to explore
frequency resolution in CI users. The second study considered the effect of training on musical
interval identification and tested for correlations between psychophysics and musical abilities in
both normal hearing (NH) and CI users. The results indicated strong correlations between
measures of frequency resolution and interval identification; however, only a small effect of
training on interval identification was observed for the CI users. The results also showed that the CI
users performed significantly worse than NH listeners at all tasks. This is likely due, at least in part,
to poor frequency resolution through the processor, so the next step was to explore how
frequency encoding could be improved in CIs.
CI stimulation (back-end processing) is primarily driven by temporal envelopes and
electrode place, while stimulation rate is generally limited. CIs provide the means to functionally
separate and determine the individual contributions of electrode place and stimulation rate as
potential cues for pitch perception. CI users can perceive pitch differences associated with
changes in stimulation rate, but sensitivity generally diminishes above 300 pulses-per-second
(pps), although it can be improved through psychophysical training. The next three described
studies further explore training stimulation rate as a cue for pitch perception and whether
stimulating with combined electrode place and stimulation rate can improve performance at pitch
judgment tasks. The third study demonstrated the ability of CI users to perform pitch ranking
with stimulation rates up to 800 pps. Discrimination thresholds were around 10% at low rates
and worsened at higher rates. CI users were also tested with metrics of frequency tuning, which
correlated well with pitch ranking performance. Training at stimulation rate pitch ranking
yielded a significant improvement at low rates. The fourth study focused on frequency coding
with place-of-excitation and stimulation rate from 100 to 1600 Hz. Combining these two cues
provided a significant improvement below 400 Hz, transitioning to performance similar to
place-of-excitation alone above 400 Hz. The fifth study was a preliminary examination of the extent to
which improving sensitivity to stimulation rates through training can be useful for frequency
discrimination. Performance was assessed before and after training through benchmark tests
which explored pitch, melody, loudness, and speech comprehension. The primary hypothesis is
that improving sensitivity to stimulation rate will improve frequency discrimination in CI users.
These studies contributed to improving CI processing (1) through a novel front-end noise
reduction algorithm that improved speech comprehension in noise and (2) by exploring ways
to improve frequency resolution with implications for back-end CI signal processing. Together,
they contribute to improving hearing through CIs and have implications for future strategies to
improve CI outcomes.
Chapter 1: General Introduction
Cochlear Implants
Brief History of Development
Electrical stimulation of the auditory system to produce auditory sensations dates back
centuries. It was first attempted by Giuseppe Veratti in 1747 to help reduce a patient's tinnitus and
hearing loss, followed shortly by Benjamin Wilson in 1748, and then, famously, by Alessandro Volta
in 1800, who stimulated his ear canals with a 50-volt circuit and heard the sound of boiling soup
(Hainarosie et al., 2014; Loeb, 1990; Marchese-Ragona et al., 2019; Mudry & Mills, 2013;
Volta, 1800; B. S. Wilson & Dorman, 2018). This beginning paved the way for the auditory
prosthesis.
Figure 1.1: History of Cochlear Implants (Niparko, 2009; B. S. Wilson & Dorman, 2018)
Djourno and Eyriès were the first to implant an electrode array to stimulate the
vestibulocochlear nerve in 1957. They predicted the imminent development of the cochlear
implant (CI), concluding that “the electrical stimulation of the cochlea itself…would without
doubt allow the construction of a possible mechanism for electrical hearing” (Djourno et al.,
1957; Eisen, 2003; Hainarosie et al., 2014; House, 1976; Mudry & Mills, 2013). This inspired
the first true cochlear implantation performed by Drs. William House and John Doyle on January
9, 1961 (House, 1976; Mudry & Mills, 2013; Shannon, 2015). Cochlear implantation boomed
after this (see Figure 1.1) with research from multiple groups (Stanford, UCSF, Graeme Clark,
Burian, Hochmair, and Hochmair-Desoyer, and more) developing single and multi-channel
devices to directly stimulate the auditory nerve (Niparko, 2009; B. S. Wilson & Dorman, 2018).
Graeme Clark's research, begun in 1967, led to the first commercially available multi-electrode
implantation in 1978 and eventually to the founding of Cochlear Corporation in 1984, with the first
Nucleus device implanted in 1985. Research by a team of scientists at UCSF (Robin Michelson,
Michael Merzenich, Robert Schindler, Charles Byers, Stephen Rebscher, Gerald Loeb, Mark
White, Robert Shannon, and many more) led to the Advanced Bionics Clarion multi-electrode
device, implanted from 1996 onward. Finally, research by Kurt Burian, Ingeborg Hochmair-
Desoyer, and Erwin Hochmair eventually led to the MED-EL device in 1982 (Eshraghi et al.,
2012; Hainarosie et al., 2014; Loeb, 1990; Merzenich, 2015; Mudry & Mills, 2013; Niparko,
2009; B. S. Wilson & Dorman, 2008, 2018). This is not to say the path to restoring partial
hearing with CIs was without difficulty and opposition. Cochlear implantation was controversial
and, as reported in a personal reflection by Dr. Robert Schindler, many prominent scientists were
doubtful that a CI would ever produce anything but noise (Ramsden, 2013; Schindler, 1999).
This is especially remarkable when we consider that the CI has now been regarded as the best
treatment for severe to profound sensorineural hearing loss (SNHL) for almost 55 years!
The implant array and sound processors have evolved considerably since the inception of the CI.
The 1980s were filled with comparisons of single vs multi-electrode devices in terms of
outcomes (B. S. Wilson & Dorman, 2018). Amazingly, some star users of single-electrode
devices, like the 3M/House device, could achieve similar speech recognition as star users of the
multi-electrode Cochlear Nucleus device (Tyler et al., 1989; Tyler & Moore, 1992). Overall,
multi-electrode devices produced better speech recognition and won out, leading to the last
commercial single-electrode House device being made in 1985. The 1990s brought the search for
better signal processing strategies (B. S. Wilson & Dorman, 2018) and after numerous studies,
the CI companies solidified the strategies that work best for their devices and users. Over the
years, the internal implant has gone from being encased in ceramic and titanium to silicone, and
the external processor has gone from large, body-worn units to small units that can be clipped to
a t-shirt, worn behind the ear, or even worn off the ear, resting directly against the
implant magnet (Hainarosie et al., 2014). The biggest changes in the last 20 years have been
earlier implantation (with babies now implanted at 6-9 months), the increased prevalence of
bilateral implantation, and the direct delivery of sound to the CI processor (by audio cable or
Bluetooth streaming).
Brief Overview of the Current Device
Today, the cochlear implant (CI) is the most successful neural prosthesis, having restored
hearing to one million people with hearing loss (Carlyon & Goehring, 2021; Zeng, 2022). The
CI, shown in Figure 1.2, is an implantable device which broadly stimulates the approximately
30,000 auditory nerve fibers with 12-22 electrodes, depending on the implant company. An
external speech processor captures sound through external microphones, processes the sound
through a filter bank, and converts it to radio waves, which are then transmitted across the skin
to the receiver. The implant then delivers electrical pulses to the auditory nerve from the distinct
electrodes.

Figure 1.2: Diagram of a Speech Processor and Cochlear Implant (Cochlear Implants, 2022)

Three companies have FDA approval in the U.S. and make up the majority of the
commercial CI market: Cochlear Corporation, Advanced Bionics (AB), and MED-EL
(Hainarosie et al., 2014). These three companies differ in the number of electrodes and array
length, shown in Figure 1.3, and in the signal processing strategy employed (see Figure 1.4 in
the frequency encoding section). Even with these differences, no single company clearly comes
out ahead in terms of patient outcomes (Eshraghi et al., 2012; Withers et al., 2011).

Figure 1.3: Comparison of CI Electrode Arrays Across Implant Companies (Modified from Shin et al., 2021)
Cochlear Implant Clinical Use and Outcomes
Variability in Outcomes and Abilities
While the implant has been undeniably successful, there remain substantial individual
differences and variability in outcomes with little correlation to hearing history or physiology,
with some individuals gaining almost no benefit from implantation and others
achieving near NH performance (Moberly et al., 2016; Pisoni et al., 2018; B. S. Wilson &
Dorman, 2008). This variability in outcomes is clearly shown in the ability of CI users to take
advantage of the resolution present in the CI. A speech in noise task showed that even the best CI
users were only able to take advantage of around 7-10 channels (or electrodes) of information,
while NH listeners could use up to 20 channels of information to receive a benefit (Friesen et al.,
2001; Shannon et al., 2004). There also remain limitations to the quality of hearing that can be
achieved even in the best performers (Boisvert et al., 2020). The CI produces very broad
stimulation compared to the fine-grained activation of the approximately 30,000 auditory nerve
fibers in normal hearing. The CI electrode array (see Figure 1.3) generally makes it only around
1.25 turns into the cochlea (though MED-EL has an electrode array that makes it around 2
turns), creating a mismatched place of stimulation (relative to characteristic frequency). This
leads to sounds that are spectrally shifted and degraded (Moberly et al., 2016), although there is
evidence of plasticity in terms of spectral shift in CI users (Reiss et al., 2014). These factors all
paint the picture of a device that seems improbable as a means of delivering usable sound.
Overall Good for Speech in Quiet, Poor for Speech in Noise and Pitch Perception
Even with these improbable odds, CIs not only restore the ability to hear sounds but also allow
implant users to understand speech, with 50-60% word recognition in quiet, on average (B. S.
Wilson & Dorman, 2008). While CIs provide relatively good restoration of speech
comprehension in quiet, speech comprehension in noise and the ability to appreciate music
remain elusive for most CI users (A. Caldwell & Nittrouer, 2013; Cullington & Zeng, 2008;
Fetterman & Domico, 2002; Q.-J. Fu & Nogaki, 2005; Gfeller et al., 2007a; Torkildsen et
al., 2019; B. S. Wilson & Dorman, 2008; Zeng et al., 2008). This is understandable, as speech
recognition in noise and pitch judgments in CI users have been shown to be highly variable, even
in controlled testing environments. Reverberation also increases the noisiness of an environment
adversely affecting both speech recognition and pitch perception (Hazrati & Loizou, 2012; Xu et
al., 2021). The difficulty with speech in noise, especially in reverberant environments such as
restaurants or concert halls, causes increased listening effort in individuals with hearing loss (do
Nascimento & Bevilacqua, 2005; Wendt et al., 2017) and affects quality of life for CI users (do
Nascimento & Bevilacqua, 2005). For pitch perception, while a minority of CI users can make
pitch judgments at near NH level (~1% for pure tone discrimination and ~5% for harmonic
complex discrimination), many cannot distinguish between tones multiple semitones apart (1
semitone = 5.95% difference) (Gfeller, Turner, et al., 2002; Goldsworthy, 2015; Goldsworthy et
al., 2013; Penninger et al., 2013; Pretorius & Hanekom, 2008; Wagner et al., 2021). It is
unsurprising, then, that melody recognition and the perception of musical emotion are difficult
for many CI users, since a difference in frequency cannot be reliably judged as a change in pitch (Ambert-Dahan et
al., 2015; Gfeller et al., 2005, 2007a; Gfeller, Turner, et al., 2002; Looi et al., 2004). CI users
have also expressed dissatisfaction with listening to music post-implantation (Gfeller et al., 2000,
2019; Looi & She, 2010), leading to decreased time spent listening to music (Lassaletta et al., 2007,
2008) and an adverse effect on quality of life, depending on individual awareness of
resources for CI users (Andries et al., 2021; Gfeller et al., 2019; Lassaletta et al., 2007, 2008;
Looi & She, 2010). This presents a great need for exploring ways to improve noise reduction and
pitch perception in CI users (Nogueira et al., 2019).
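The semitone figure quoted above follows directly from equal temperament, in which each semitone corresponds to a frequency ratio of the twelfth root of two:

\[
2^{1/12} - 1 \approx 0.0595,
\]

so adjacent notes differ by roughly 5.95% in frequency, a difference that many CI users cannot reliably detect.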
Brief History of Front-end Noise Reduction in Hearing Assistive Technology
Front-end noise reduction has long been a goal in hearing assistive
technology, but it generally breaks down in environments with reverberation and multiple noise
sources (for reviews: Bentler, 2005; Levitt, 2001). Multiple microphone signal processing
selectively enhances sounds based on spatial location (for reviews: Brandstein and Ward, 2001;
Van Veen and Buckley, 1988) and can be subdivided into fixed (linear combinations of
microphones, e.g., cardioid or dipole) (Chung, 2004; Chung et al., 2004, 2006; Chung & Zeng,
2009; Desloge et al., 1997; Kates, 1993; Soede, Berkhout, et al., 1993; Soede, Bilsen, et al.,
1993; Stadler & Rabinowitz, 1993) and adaptive algorithms (dynamically adjust filter weights
used for combining the microphone signals to minimize output noise power, e.g., null-steering
beamformers) (Frost, 1972; Griffiths & Jim, 1982). These adaptive algorithms have evolved over
the years being used on many clinical CI and hearing aid processors (Greenberg & Zurek, 1992;
Hersbach et al., 2012; Kates & Weiss, 1996; Kokkinakis et al., 2012; Kompis & Dillier, 1994;
Spriet et al., 2007; Vanden Berghe & Wouters, 1998; Welker et al., 1997).
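To make the fixed versus adaptive distinction concrete, the sketch below implements both for a two-microphone array in Python: a fixed differential (cardioid-like) combination, and an adaptive normalized-LMS canceller that continually adjusts its filter weights to minimize output noise power. This is a minimal illustration of the two classes under simplifying assumptions, not a reimplementation of any cited algorithm; the delay, tap count, and step size are hypothetical.

import numpy as np

def fixed_cardioid(front, rear, delay_samples=1):
    # Fixed beamformer: delay the rear microphone by the front-to-rear
    # travel time, then subtract. Sound from behind cancels; sound from
    # the front is preserved. The weights never change.
    rear_delayed = np.roll(rear, delay_samples)
    rear_delayed[:delay_samples] = 0.0
    return front - rear_delayed

def adaptive_canceller(primary, noise_ref, n_taps=16, mu=0.1, eps=1e-8):
    # Adaptive stage: filter a noise reference and subtract it from the
    # primary signal, updating weights (normalized LMS) so that output
    # noise power is minimized, as in null-steering beamformers.
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for n in range(n_taps, len(primary)):
        x = noise_ref[n - n_taps:n][::-1]        # most recent reference samples
        e = primary[n] - np.dot(w, x)            # error = enhanced output sample
        w += mu * e * x / (np.dot(x, x) + eps)   # NLMS weight update
        out[n] = e
    return out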
Adaptive algorithms provide good speech reception benefits in noise, but are susceptible
to multiple noise sources and increasing reverberation levels (Desmond et al., 2014; Greenberg
& Zurek, 1992; Hamacher et al., 1997; Hazrati & Loizou, 2012; Kokkinakis & Loizou, 2010; R.
J. van Hoesel & Clark, 1995; Wouters & Vanden Berghe, 2001). A class of target-isolating
beamformer algorithms inspired by models of binaural hearing (Jeffress, 1948) has been
developed (Kollmeier et al., 1993; Kollmeier & Koch, 1994) to preserve or attenuate the
components dominated by target or by masker energy, respectively, even in reverberant
environments with multiple noise sources (Goldsworthy, 2014; Goldsworthy et al., 2014;
Lockwood et al., 2004). A difficult aspect in noise reduction algorithm development is the
preservation of binaural cues necessary for sound localization (Kidd et al., 2015; Klasen et al.,
2006, 2007; Kokkinakis et al., 2012; Szurley et al., 2016; Thiemann et al., 2016; Van den
Bogaert et al., 2007, 2008), which will be addressed in this dissertation.
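The following sketch illustrates this class of algorithm under simplifying assumptions; it is a generic illustration, not the Binaural Fennec algorithm itself, and the window size, tolerances, and attenuation floor are hypothetical. Both ear signals are analyzed with a short-time Fourier transform; time-frequency components whose interaural phase and level differences are consistent with a frontal target are preserved, and the rest are attenuated. Because the same real-valued gain is applied to both ears, the ITDs and ILDs of the output are left intact.

import numpy as np
from scipy.signal import stft, istft

def binaural_tf_mask(left, right, fs, phase_tol=0.5, level_tol_db=6.0, floor=0.1):
    # Analyze both ears into time-frequency components.
    f, t, L = stft(left, fs, nperseg=256)
    _, _, R = stft(right, fs, nperseg=256)
    # Interaural phase and level differences per component; a frontal
    # target yields values near zero for both.
    ipd = np.angle(L * np.conj(R))
    ild = 20.0 * np.log10((np.abs(L) + 1e-12) / (np.abs(R) + 1e-12))
    frontal = (np.abs(ipd) < phase_tol) & (np.abs(ild) < level_tol_db)
    # One real-valued gain per component, applied identically to both
    # ears so the binaural cues of the output are preserved.
    gain = np.where(frontal, 1.0, floor)
    _, left_out = istft(L * gain, fs, nperseg=256)
    _, right_out = istft(R * gain, fs, nperseg=256)
    return left_out, right_out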
Brief Look at Back-end Frequency Encoding in CI Signal Processing
Back-end sound coding in CI signal processing has varied considerably since the
beginning of the CI, as companies have considered the differences between the NH and CI
auditory systems and attempted to provide better frequency encoding. In the NH auditory
system, frequency is represented inextricably in the place and timing codes at the level of the
auditory nerve. The place code is relayed through the tonotopic organization (or place-frequency
mapping) of the cochlea by encoding low frequencies in the apex and high frequencies in the
base which persists through the ascending auditory pathway (Clopton et al., 1974; Fekete et al.,
1984; Greenwood, 1990; Liberman, 1982; Muniak et al., 2016; Ryugo & May, 1993). The
temporal code derives from the remarkable phase-locked synchrony of the auditory nerve to
acoustic frequencies as high as 5 kHz (van den Honert & Stypulkowski, 1987; Dynes &
Delgutte, 1992; Dreyer & Delgutte, 2006; Hill et al., 1989; Shepherd & Javel, 1997; Rose et al.,
1967; Palmer & Russell, 1986; Heinz et al., 2001; Loeb et al., 1983), although there is active
debate as to the upper limit of usable temporal fine structure (TFS) (Verschooten et al., 2019).
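The tonotopic place-frequency mapping cited above is commonly summarized by Greenwood's (1990) function, which for the human cochlea is approximately

\[
F \;=\; A\,(10^{a x} - k) \;\approx\; 165.4\,(10^{2.1 x} - 0.88)\ \text{Hz},
\]

where \(x\) is the proportional distance along the basilar membrane from apex (\(x = 0\)) to base (\(x = 1\)).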
Figure 1.4: Sound Coding in Cochlear Implants (Wouters et al., 2015)
The CI speech processor divides the audio signal with a set of bandpass filters and
employs the tonotopic organization of the cochlea for the coding of place (Wouters et al., 2015).
This place coding is coarse due to the large size of CI electrodes and broadness of stimulation
compared to the plentiful, microscopic auditory nerve fibers (Zeng, 2017). The manner of
temporal frequency encoding varies by CI company. Temporal information can be categorized
into the low-frequency temporal envelope, mid-frequency periodicity, and TFS (Wouters et al.,
2015). Cochlear Corporation's ACE (Advanced Combination Encoder) strategy and AB's
HiRes120 (High Resolution) strategy use temporal envelopes to modulate high-frequency pulse
trains, while MED-EL’s FSP (Fine Structure Processing) seeks to also encode periodicity cues
on a set of apical electrodes (Wouters et al., 2015).
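A minimal sketch of this envelope-based coding is shown below for a single channel. It is a schematic of the general approach described above, not an emulation of ACE, HiRes120, or FSP; the filter order, band edges, and pulse rate are illustrative assumptions only.

import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def envelope_channel(audio, fs, f_lo, f_hi, pulse_rate=900):
    # Bandpass one analysis channel of the filter bank.
    sos = butter(4, [f_lo, f_hi], btype='bandpass', fs=fs, output='sos')
    band = sosfilt(sos, audio)
    # Extract the temporal envelope; the fine structure within the
    # band is discarded, as in envelope-based strategies.
    env = np.abs(hilbert(band))
    # Sample the envelope at a fixed high-rate pulse train: the pulse
    # amplitudes delivered on this electrode carry the envelope.
    pulse_times = np.arange(0.0, len(audio) / fs, 1.0 / pulse_rate)
    pulse_amps = env[(pulse_times * fs).astype(int)]
    return pulse_times, pulse_amps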
A study done in the Bionic Ear Lab characterized the stimulation patterns of the three
main CI companies for pure and complex tones (Goldsworthy & Bissmeyer, in review). Pure
tones primarily provided place cues, with timing cues discarded with Cochlear devices and with
variable amounts of temporal cues provided by AB and MED-EL devices. Stimulation for AB
and Cochlear devices was relatively constrained to 2-3 electrodes compared to the broader
stimulation pattern observed with MED-EL. AB provided some temporal encoding of frequency
information with lower rates, but the synchrony of stimulation to the input frequency was
relatively low compared to MED-EL devices. For complex tones with low harmonics, place
pitch cues could be observed in the patterning of electrical stimulation across electrodes
beginning at 220 Hz for AB, 300 Hz for Cochlear, and 440 Hz for MED-EL. In general, both
place and timing cues for pitch were encoded in the resulting stimulation. For complex tones
with high harmonics, depending on the filter specifications of the sound processing emulation,
the resulting temporal cues for pitch could be encoded with exceptional precision. The relative
depth of stimulation was visibly deeper for Cochlear and MED-EL devices compared to that
provided by AB, likely driven by narrower spectral filtering used with Fidelity 120. No place-of-
excitation cues associated with the fundamental frequency were present in the stimulation,
providing a clean comparison of performance with only temporal cues.
A coding strategy called simultaneous analog stimulation (SAS) was developed in the
1980s to introduce temporal fine structure but resulted in loudness variation and overstimulation
(Waltzman, 2006). This strategy was later implemented by AB in the 1990s while seeking to
avoid former issues with stimulation (Clark, 2006). SAS was compared to continuous interleaved sampling (CIS) in multiple
studies with some studies showing some preference of SAS (Battmer et al., 1999, 2000), while
others showed minimal preference (Osberger & Fisher, 2000; Zwolan et al., 2005). Interestingly,
one study showed a preference for SAS among those with a shorter duration of deafness
before implantation (Osberger & Fisher, 2000). SAS led to lower threshold and comfort levels
(Battmer et al., 1999), and could be a jumping-off point for a new TFS processing strategy
(Imennov et al., 2013; Von Wallenberg et al., 1990). There are challenges and considerations
in pursuing the encoding of TFS in CI signal processing (Laneau et al., 2006; Merzenich, 1983;
Moon & Hong, 2014; Rubinstein et al., 1999; Wouters et al., 2013), including the trade-off
between providing detailed TFS with increased channel interaction versus less detailed TFS with
reduced channel interaction (Loizou et al., 2003), which should be explored in future works.
Exploring Frequency Resolution with Electrode Psychophysics
CIs provide the means to separate and determine the individual contributions of place and
rate as potential cues for pitch perception. This can provide ways to explore how to improve
back-end signal processing in CIs. Studies looking at CI place-of-excitation have shown
tonotopic progression with basal electrodes heard as higher in pitch compared to apical
electrodes (Nelson et al., 1995; Tong & Clark, 1985). Pairs of electrodes simultaneously
stimulated or closely interleaved provide intermediate place cue percepts (Kwon & van den
Honert, 2006; Landsberger & Srinivasan, 2009; Macherey & Carlyon, 2010; H. J. McDermott &
McKay, 1994; Srinivasan et al., 2012). With this method, CI users can generally discriminate
place-of-excitation differences of less than 1 electrode based on pitch (Kenway et al., 2015;
Laneau & Wouters, 2004; Townshend et al., 1987a).
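The intermediate place percepts described above are often produced by current steering, in which a fixed total current is divided between two adjacent electrodes. A minimal sketch follows (a generic illustration, not any manufacturer's implementation; the function and variable names are hypothetical):

def steer_current(total_current_ua, alpha):
    # Split a total current between two adjacent electrodes. alpha = 0
    # delivers everything to the apical electrode, alpha = 1 to the basal
    # electrode; intermediate values shift the effective place of
    # excitation, and hence the perceived pitch, between the two contacts.
    apical = (1.0 - alpha) * total_current_ua
    basal = alpha * total_current_ua
    return apical, basal

Sweeping alpha in small steps probes place resolution finer than the physical electrode spacing, which is how discrimination of less than one electrode is measured.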
CI users can perceive changes in pitch associated with stimulation rate, but sensitivity
generally diminishes above 300 Hz (Carlyon et al., 2010; Laneau et al., 2004; Macherey &
Carlyon, 2014; H. J. McDermott & McKay, 1997; McKay et al., 2000; Shannon, 1983; Tong et
al., 1982; Tong & Clark, 1985; Zeng, 2002). This effectively means that an octave change in
stimulation rate will not always be perceived as an octave change in pitch. However, since many
clinical processors limit TFS cues, it is possible that the perception of pitch associated with
stimulation rate may require experience (Goldsworthy & Shannon, 2014; Wouters et al., 2015).
Studies have explored the combination of place and rate cues for pitch judgments with
varying results. Many studies posited place and rate of stimulation to be perceptually orthogonal,
in that both can be used to manipulate pitch percepts, but that they do not combine
synergistically (Landsberger et al., 2018; Macherey et al., 2011; McKay et al., 2000; Tong et al.,
1983). While other studies have found that pitch perception was strongly a function of both place
and rate cues but with variations in saturation and sound quality (Erfanian Saeedi et al., 2017;
Fearn & Wolfe, 2000; Landsberger et al., 2016; Luo et al., 2012; Schatzer et al., 2014; Stohl et
al., 2008; Swanson et al., 2019). By whatever mechanism place and rate combine to form a pitch
percept, these studies conclude that some combination of these two cues could improve signal
processing strategies, opening the window to better pitch perception in CI users (Erfanian Saeedi
et al., 2017; Luo et al., 2012; Rader et al., 2016; Stohl et al., 2008).
Purpose of this Dissertation
Although many CI users achieve excellent speech recognition in quiet, they have trouble in
noisy, reverberant environments and difficulty with pitch perception. This has a significant effect
on the quality of life of CI users. These issues could be caused by any number of factors, with the
factors addressed in this work being those related to CI front- and back-end processing.
To address the issues in background noise, a front-end binaural noise reduction
algorithm, referred to as Binaural Fennec, was designed to be robust to moderate levels of
reverberation while preserving binaural cues for sound localization.
As frequency resolution supports all aspects of hearing, the next step was to explore the
existing frequency resolution (based on front- and back-end signal processing) in CI users. The
next study explored acoustic psychophysical assessment tasks and interval identification training
with NH listeners and CI users (through the clinical processor from all three implant companies).
It also explored how performance at all tasks correlated with musical experience. The results
indicated strong correlations between measures of pitch resolution and interval identification;
however, only a small effect of training on interval identification was observed for the CI users.
The results also showed that the CI users performed significantly worse than NH listeners at all
pitch judgment tasks. This is likely due, at least in part, to poor frequency resolution through the
processor, so the next step was to explore how frequency resolution could be tested and
improved in CI users. This could open the door to the potential of future improvement in CI
back-end processing.
Ways to improve frequency resolution and potentially back-end processing were explored
over three subsequent electrode psychophysical studies. The first electrode psychophysical study
looked at the effect of individual differences and stimulation rate training on pitch ranking with
stimulation rate. The second electrode psychophysical study looked at the performance of CI
users on melodic contour identification with typical clinical frequency allocation and simple rate
pitch ranking with expanded frequency allocation to provide better access to low-frequency place
cues. The third electrode psychophysical study did a preliminary exploration of longitudinal
training at stimulation rate pitch ranking with assessment tasks consisting of a variety of
electrode and acoustic psychophysical pitch judgment tasks.
These five studies begin to address the capabilities of the current processor and ways to
improve front- and back-end processing in CIs with the goals of improving noise reduction and
pitch perception. These studies contribute to improving hearing through CIs and further our
understanding of the relative importance of place-of-excitation and stimulation rate coding for
frequency resolution. Discussion will focus on future directions based on this dissertation as well
as the challenges to implementing the implications of the electrode psychophysical studies in CI
back-end processing.
Chapter 2: A Novel Adaptive Beamforming Algorithm Improves
Signal-to-Noise Ratio while Preserving Cues Necessary for
Localization
The work described in this chapter was published in the Journal of the Acoustical Society of
America.
Bissmeyer, S. R. S., and Goldsworthy, R. L. (2017). “Adaptive spatial filtering improves speech
reception in noise while preserving binaural cues,” The Journal of the Acoustical Society of
America, 142, 1441–1453. doi:10.1121/1.5002691, with the permission of AIP Publishing
Introduction
People with hearing loss struggle to take advantage of situations where background noise
is spatially separate from desired speech (e.g., Arbogast et al., 2005; Marrone et al., 2008a,
2008b; Neher et al., 2011; Best et al., 2012; Woods et al., 2013; Kidd et al., 2015). Studies have
compared speech reception when the desired speech and unwanted masker have the same
location versus when they are spatially separate (Brungart, 2001; Freyman et al., 1999, 2001;
Hawley et al., 2004). These studies have demonstrated that people with NH have better speech
reception when the masker is spatially separate from the desired speech, an effect referred to as
“spatial release from masking” (Hawley et al., 2004; Hirsh et al., 1950; Kidd et al., 2010;
Swaminathan et al., 2016). People with hearing loss, even those treated with CIs or hearing aids,
may exhibit little to no spatial release from masking (Loizou et al., 2009; Marrone et al., 2008c;
Rothpletz et al., 2012). A long-term clinical goal is to restore sufficient hearing to those with
hearing loss so that they may benefit from such spatial release from masking. However, a more
immediate solution to this problem is the use of multiple microphone signal processing, or
directional microphones, to provide spatial filtering of a sound before it is presented to the
hearing-impaired listener.
There is a substantial history concerning multiple microphone signal processing for
selectively enhancing sounds based on spatial location (for reviews: Brandstein and Ward, 2001;
Van Veen and Buckley, 1988). Multiple microphone signal processing can be subdivided into
fixed and adaptive algorithms. In fixed algorithms, linear combinations of microphones, or
microphone ports, are used to form a spatial filter that is independent of the input acoustics.
Common examples of fixed spatial filters include cardioid and dipole response patterns, which
have been implemented and shown to provide consistent speech reception benefits for both CI
and hearing aid users (Chung, 2004; Chung et al., 2004, 2006; Chung & Zeng, 2009; Desloge et
al., 1997; Kates, 1993; Soede, Berkhout, et al., 1993; Soede, Bilsen, et al., 1993; Stadler &
Rabinowitz, 1993).
Closely related to fixed algorithms, the earliest adaptive multiple microphone algorithms
were developed to dynamically adjust filter weights used for combining the microphone signals
to minimize output noise power. These early adaptive beamforming algorithms have generally
been referred to as null-steering beamformers (Frost, 1972; Griffiths & Jim, 1982), which have
evolved over the years and have been used on many clinical CI and hearing aid processors
(Greenberg & Zurek, 1992; Hersbach et al., 2012; Kates & Weiss, 1996; Kokkinakis et al., 2012;
Kompis & Dillier, 1994; Spriet et al., 2007; Vanden Berghe & Wouters, 1998; Welker et al.,
1997). Such null-steering beamformers have been shown to provide speech reception benefits in
background noise; but benefits quickly diminish with increasing reverberation and/or number of
noise sources (Desmond et al., 2014; Greenberg & Zurek, 1992; Hamacher et al., 1997; Hazrati
& Loizou, 2012; Kokkinakis & Loizou, 2010; R. J. van Hoesel & Clark, 1995; Wouters &
Vanden Berghe, 2001).
In contrast to null-steering beamformers, a relatively new class of beamforming
algorithms has been developed using a fundamentally different approach. Instead of slowly
adapting the steering of spatial nulls, these algorithms use relatively rapid spectrotemporal signal
analysis to determine which components are dominated by target or by masker energy and then
preserve or attenuate the components accordingly. This general approach was inspired by models
of binaural hearing (Jeffress, 1948) leading to the pioneering work of Kollmeier and colleagues
(Kollmeier et al., 1993; Kollmeier & Koch, 1994). Since the signal processing objective of this
class of beamformer is to isolate the target speech and to suppress all other sounds, these
algorithms are appropriately referred to as target-isolating beamformers. Lockwood and colleagues (2004)
conducted a systematic study that included a fixed beamformer, two null-steering beamformers,
and two target-isolating beamformers and demonstrated that the target-isolating beamformers
were relatively robust to reverberation compared to null-steering beamformers. Further
developing this class of beamformer, Goldsworthy and colleagues (Goldsworthy, 2014;
Goldsworthy et al., 2014) developed an algorithm, referred to as Fennec, based on analysis of
inter-microphone phase differences between closely-spaced microphones in a behind-the-ear
capsule. They demonstrated that the Fennec algorithm provided speech reception benefits for CI
users even with moderate levels of reverberation and as many as 11 noise sources.
The purpose of the study presented in this article was to evaluate a binaural version of the
Fennec algorithm designed to improve the target to masker ratio when the target is in front of the
listener and the masker is spatially separate, while preserving binaural cues. Previous studies that
have considered configuring spatial filtering to preserve binaural cues have had varying levels of
success. Desloge and colleagues (1997) demonstrated that fixed beamforming provides a degree
of noise suppression while preserving binaural cues. However, subsequent work concerning
adaptive spatial filtering has generally been unsuccessful toward achieving both noise
suppression and preservation of binaural cues. One modification put forth by Welker and
colleagues (1997) and revisited by Kidd and colleagues (2015) was to divide the acoustic
spectrum into two regions and to implement noise suppression in one spectral region while
preserving binaural cues in the other spectral region. That approach circumvents the problem to
an extent by either performing spatial filtering within a spectral region or preserving the binaural
cues, but both objectives are never achieved within the same spectral region.
A different approach for spatial filtering based on binaurally situated microphones using
adaptive Wiener filtering was evaluated for noise suppression while preserving binaural cues
(Klasen et al., 2006, 2007; Szurley et al., 2016; Van den Bogaert et al., 2007, 2008). This
approach, rather than dividing the acoustic spectrum into processed and unprocessed regions,
introduced a cost function to control the relative degree of noise suppression to preserve binaural
cues (Klasen et al., 2006). This approach had limited success: preserving binaural cues required
accepting a substantial loss in noise suppression (Van den Bogaert et
al., 2007). A second limitation of this approach was that while the modification preserves the
binaural cues associated with the target sound, it does not preserve the binaural cues associated
with the environmental noise sources (Klasen et al., 2007; Kokkinakis et al., 2012; Van den
Bogaert et al., 2008). A more recent study of this approach investigated the use of a remote
microphone with a high target to masker ratio to control the adaptive procedure to achieve both
noise suppression and binaural cue preservation (Szurley et al., 2016). While a theoretically
important step, it is not practical in CI and hearing aid applications to presume that the listener
will have available a remote microphone with a clean representation of the target sound.
More recently, an approach using a signal-to-noise ratio estimator that controls a binary
decision mask was evaluated for providing noise suppression while preserving binaural cues
(Thiemann et al., 2016). That evaluation demonstrated that a constrained target-isolating
beamformer could be successfully configured to provide noise suppression while preserving
binaural cues sufficient to convey sound locations for both the target speech as well as
environmental noise sources.
Inherent asymmetries between ears, such as auditory nerve survival or physical
differences, and asymmetries introduced by the clinical processors, such as adjusted gain control,
can confound interaural level and timing cues for the hearing-impaired listener. Since these
asymmetries pose a significant clinical and signal processing problem, it is essential that the spatial beamforming
algorithm does not introduce any additional interaural distortions. This goal can be achieved by
requiring that the spatial beamforming algorithm applies identical spectrotemporal noise
attenuation to the left and right ear devices at any given moment for any given frequency
component. In that manner, if the left and right ear devices at some moment in time and at some
frequency have specific interaural level and timing differences, then attenuating the left and
right ears by the same amount will preserve the original interaural differences.
There are numerous ways in which independent left and right ear beamformers could be
combined to produce a coordinated output. The approach considered in this article is
straightforward with the attenuation functions of left and right ear beamformers averaged to
produce a single attenuation function that is jointly applied to left and right ear microphone
signals. Other approaches based on acoustic analysis and dynamic switching to the ear with
better signal-to-noise ratio might be developed to enhance this approach. In the case of dynamic
switching, the algorithm could determine which ear has the higher SNR and give more weight to
that ear in the attenuation function, allowing better noise reduction due to the better-ear effect.
The present article presents acoustic analysis and perceptual results as evidence that the
Binaural Fennec algorithm can improve speech reception in noise while preserving binaural
cues. The perceptual results include measures of speech reception in noisy and reverberant
conditions, as well as measures of sound source lateralization of target speech in the presence of
a masker. For the lateralization task, subjects were asked to identify whether the target speech
was coming from the left or the right in the presence of background noise. All subjects could
readily perform this lateralization task and clearly perceived the target speech as coming from
either the left or the right; importantly, performance on this task was improved by the Binaural
Fennec algorithm when tested in an anechoic condition and was not significantly altered in the
reverberant conditions. Consequently, the results indicate that the algorithm can improve speech
reception in noise while preserving binaural cues necessary for lateralization.
Methods
Subjects
Sixteen NH subjects participated in this study. The University of Southern California’s
Institutional Review Board approved the study protocol. All subjects provided informed consent
and were paid for their participation. All subjects were native English speakers who had pure
tone audiometric thresholds of 20 dB HL or better at octave frequencies between 125 and 8000
Hz.
Materials
The Coordinate Response Measure (CRM) sentence database (Bolia et al., 2000) was
used to measure speech reception and lateralization thresholds. The CRM materials consist of
sentences of the form “Ready callsign go to color number now,” with all 256 combinations of 8
call signs (“Arrow”, “Baron”, “Charlie”, “Eagle”, “Hopper”, “Laker”, “Ringo”, “Tiger”), 4 colors
(“blue”, “green”, “red”, “white”), and 8 numbers (1 through 8). These sentence materials were
recorded using 4 female and 4 male talkers, with an average sentence length of 3 seconds. For
speech reception thresholds measured in the present study, only one of the talkers (a male) was
used for both the target and the masker speech. The rationale for using the same talker on each
trial is that in typical conversations one is aware of whom one is speaking to; while there are
circumstances (e.g., answering the phone) in which this assumption does not hold, it is typically true.
Therefore, we chose not to include talker variability as a perceptual dimension. In addition, for
speech reception and lateralization testing, the competing talker masker was also selected as the
same male speaker, but time-reversed. The rationale for using the same male talker for the
competing masker was that we were primarily interested in energetic masking and wanted to
minimize talker-specific cues that affect masking release, such as vocal tract length and voicing
cues.
Head-related transfer functions (HRTFs) to simulate spatial configurations were
generated using a room simulation method (Peterson, 1986; Shinn-Cunningham et al., 2001).
This simulation method was used since it provides a precise method for introducing certain
aspects of interaural cues, such as interaural timing and level differences associated with head
shadow, while precisely controlling reverberation levels. This simulation method does not
simulate the effect of pinnae so does not capture spectral cues associated with elevation or front
versus back asymmetries. This method does, however, provide precise control over inter-
microphone placement and reverberation, which is useful for studies of spatial filtering since it
allows primary factors of inter-microphone differences and reverberation levels to be examined
while controlling for other factors such as measurement noise often encountered with measured
HRTFs. The simulated room measured 4 x 4 x 2.6 m with a 17-cm diameter reflective sphere
located in the center of the room serving as a head model. Four microphone positions were
rendered with 2 microphones on either side of the reflective sphere. The 2 microphones on either
side of the head were separated by 1 cm in an endfire configuration (i.e., microphone array
collinear with a target that is straight ahead of the listener). HRTFs were generated from each
sound source position to each microphone position, for sound sources located 1 m away from the
center of the sphere in the azimuthal plane at every angle from 0 to 360 degrees, and for 4 different
reverberation times (T60) of 0, 400, 800, and 1200 ms. These HRTFs were used to spatialize target speech
and masker speech in the various acoustic conditions described below for acoustic analysis and
human subject testing.
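To make the spatialization step concrete, the following Python sketch renders a target and masker through such simulated impulse responses and mixes them at a specified target to masker ratio. The function and array names (render_scene, hrir_target, hrir_masker) are illustrative rather than part of the actual evaluation software, and the masker gain is set from broadband waveform power as a simplification.

import numpy as np
from scipy.signal import fftconvolve

def render_scene(target, masker, hrir_target, hrir_masker, tmr_db=0.0):
    # target, masker: mono waveforms; hrir_*: impulse responses, shape (taps, 4 mics)
    # Scale the masker so that the broadband target to masker ratio equals tmr_db.
    gain = np.sqrt(np.mean(target ** 2) / np.mean(masker ** 2)) * 10 ** (-tmr_db / 20)
    mics_t = np.stack([fftconvolve(target, hrir_target[:, m]) for m in range(4)], axis=1)
    mics_m = np.stack([fftconvolve(gain * masker, hrir_masker[:, m]) for m in range(4)], axis=1)
    n = min(len(mics_t), len(mics_m))
    # Columns correspond to left front, left back, right front, right back microphones.
    return mics_t[:n] + mics_m[:n]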
Binaural Fennec Algorithm
Goldsworthy and colleagues (2014) introduced a spatial filtering algorithm referred to as
“Fennec” that uses two microphones situated 1 cm apart in an endfire configuration. The present
article evaluates the performance of a binaural version of the Fennec algorithm, referred to as
Binaural Fennec. The Binaural Fennec algorithm uses 4 microphone signals in total, with an
endfire pair of two microphones over each ear. To prevent the binaural version of the algorithm from producing
any inherent interaural distortions, the left and right ear algorithms are combined to determine a
joint spectrotemporal attenuation that is identically applied to the left and right ear signals. By
applying the same spectrotemporal attenuation to both sides, the noise reduction processing
emphasizes or suppresses specific spectrotemporal components without modifying interaural
cues.
The Fennec algorithm compares the phase information of microphones that are 1 cm
apart and situated over the ear. The first stage of Fennec processing is to compute a short-time
Fourier transform of the front and back microphone signals. The implementation described in
this article used a Fourier transform with 46.4-ms (1024-point) Hann windows with half-window
overlap. Front and back microphone short-time Fourier transforms, $F(t,f)$ and $B(t,f)$, were
used to calculate a spectrotemporal attenuation function based on inter-microphone phase
differences.

Phase-based attenuation was calculated by estimating the angle of incidence (AOI) for
each spectrotemporal component:

$$\mathrm{AOI}(t,f) = \cos^{-1}\!\left(\frac{c}{2\pi f d}\,\angle\frac{F(t,f)}{B(t,f)}\right), \qquad \text{(Equation 2.1)}$$

where $c$ is the speed of sound, $d$ is the inter-microphone spacing, and $F(t,f)$ and $B(t,f)$ are the
front and back microphone short-time Fourier transforms, respectively. The estimated $\mathrm{AOI}(t,f)$
is then transformed to an attenuation function:

$$A_{\mathrm{phase}}(t,f) = \frac{N(\mathrm{AOI}(t,f)\,|\,0,\beta)}{N(0\,|\,0,\beta)}, \qquad \text{(Equation 2.2)}$$

where $N(x|\mu,\sigma)$ is the probability density function of a normal distribution with mean $\mu$ and
standard deviation $\sigma$. In this equation, $\beta$ is the beam width, which for the present evaluation was
set to 30 degrees. The resulting attenuation function has the shape of a normal probability density
function but with a maximum value of 1 at 0 degrees, gradually approaching a value of 0 as
the angle of incidence exceeds the beam width. Figure 2.1 illustrates the resulting attenuation as
a function of the estimated angle of incidence.
Figure 2.1: Polar Plot of the Attenuation of the Binaural Fennec Algorithm
Polar plot indicating attenuation as a function of angle of incidence for the Fennec algorithm as expressed in Equation 2.2. For
each spectrotemporal component (i.e., each time-frequency cell of the short-time Fourier transform), inter-microphone phase
differences are used to estimate angle of incidence and then converted to an attenuation term per Equations 2.1 & 2.2.
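As a concrete illustration of Equations 2.1 and 2.2, the following is a minimal Python sketch that computes the phase-based attenuation for every time-frequency cell. The array names F, B, and freqs are hypothetical stand-ins for the front and back microphone short-time Fourier transforms and the bin center frequencies; the 1-cm spacing, speed of sound, and 30-degree beam width follow the values given in the text.

import numpy as np

def fennec_attenuation(F, B, freqs, d=0.01, c=343.0, beam_deg=30.0):
    # F, B: complex STFTs (frequency x time); freqs: bin center frequencies in Hz.
    phase = np.angle(F * np.conj(B))       # inter-microphone phase difference, in radians
    f = np.maximum(freqs[:, None], 1e-9)   # guard against division by zero at DC
    # Equation 2.1: estimated angle of incidence for each spectrotemporal component.
    cos_aoi = np.clip(c * phase / (2 * np.pi * f * d), -1.0, 1.0)
    aoi_deg = np.degrees(np.arccos(cos_aoi))
    # Equation 2.2: normal-density-shaped attenuation with a maximum of 1 at 0 degrees.
    return np.exp(-0.5 * (aoi_deg / beam_deg) ** 2)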
The Fennec algorithm as described above was implemented on the left and right endfire
pairs of microphones. Specifically, two different spectrotemporal weighting functions were
calculated based on acoustic analysis of the left and right ear endfire pairs. At this point, one
potential noise reduction solution would be to implement the Fennec algorithm independently on
the left and right ear signals, but that solution would produce changes in the interaural
characteristics. To avoid introducing such interaural distortions, it is necessary to apply the same
spectrotemporal attenuation to the left and right ears. To accomplish this, the information from
the left and right algorithms needs to be combined to form a joint spectrotemporal attenuation.
There are multiple methods that might be used to combine information across ears, such
as better ear analysis, but for this first examination of the Binaural Fennec algorithm, a
straightforward averaging of spectrotemporal attenuation is used. Specifically, spectrotemporal
attenuation was calculated for the left and right ears using Eq. 2.2, then those two left and right
attenuation functions were averaged to determine a joint spectrotemporal attenuation function
(i.e., $A(t,f) = \frac{A_L(t,f) + A_R(t,f)}{2}$). This attenuation function was then multiplied by the short-time Fourier
transforms of the left and right front microphone signals. Since the same attenuation function is
multiplied with the left and right microphone signals, any inherent inter-microphone timing and
level differences will be preserved for each spectrotemporal component.
The effect of this processing is to suppress spectrotemporal regions with poor target to
masker ratios while preserving spectrotemporal regions with higher target to masker ratios. For
the evaluations conducted here, the Binaural Fennec algorithm is implemented by jointly
applying 𝐴 (𝑡 , 𝑓 ) to the left and right front microphone signals. This processed condition is
always compared to an unprocessed condition which was simply routing the left and right front
microphone signals to the left and right channels of the headphones.
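The binaural combination can then be sketched as follows, reusing fennec_attenuation from the sketch above; FL, BL, FR, and BR are hypothetical names for the front and back STFTs of the left and right endfire pairs.

def binaural_fennec(FL, BL, FR, BR, freqs):
    A_left = fennec_attenuation(FL, BL, freqs)    # left-ear endfire pair
    A_right = fennec_attenuation(FR, BR, freqs)   # right-ear endfire pair
    A = 0.5 * (A_left + A_right)                  # joint attenuation, averaged across ears
    # Applying the identical attenuation to both front-microphone STFTs preserves
    # interaural timing and level differences; inverse STFT and overlap-add follow.
    return A * FL, A * FR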
Speech Reception Thresholds
Speech reception thresholds were measured for 8 conditions consisting of each
combination of the 4 reverberation levels and with Binaural Fennec processing compared to
unprocessed conditions. The primary comparison in this measure is between Binaural Fennec
processed and unprocessed combined speech and noise. The speech reception procedure is
always administered with the presence of the masker noise. Feedback was provided during the
test with response buttons flashing green or red for correct and incorrect answers, respectively. A
target sentence was randomly selected from the CRM database always using the same male
talker. The target sentence was then filtered through the corresponding HRTF for 0 degrees. The
masker consisted of 3 randomly concatenated time-reversed sentences, always using the same male
talker as the target. The masker was then filtered through the corresponding HRTF for the 90-
degree angle of incidence.
Subjects were instructed that the masker would be coming from 90 degrees to the right
but that they were listening for the sentence coming from straight ahead. The subjects were
tested at 4 reverberation levels (T60 = 0, 400, 800, and 1200 ms). The target speech and masking
time-reversed speech were processed using the same level of reverberation. The masking speech
was time-reversed to reduce semantic cues as several studies have found substantial decrease in
informational masking using time-reversed speech (Best et al., 2012; Freyman et al., 2001;
Gallun et al., 2013; Iyer et al., 2010; Kidd et al., 2016; Marrone et al., 2008b; Swaminathan et
al., 2015).
The speech reception threshold procedure was implemented in MATLAB, the combined
speech and noise was transmitted through an ESI U24XL external sound card and presented to
listeners at 65 dB SPL through Sennheiser HD 280 pro headphones in a sound attenuating booth.
All materials were down-sampled to 16000 Hz for processing and then resampled to 44100 Hz to
avoid distortion effects associated with the digital-to-analog conversion. Subjects were unaware
of any details associated with the acoustic and signal processing aspects of the conditions on
which they were tested.
Sentences were scored correct when the subject identified both the color and number of
the sentence. The initial target to masker ratio of the procedure was set to 0 dB, which was
decreased/increased adaptively based on correct/incorrect answers. The step size of this
decrease/increase started at 4 dB, multiplied by $2^{-1/4}$ at each reversal, until it reached a value
of 2 dB on the 4th reversal and continued at that step size until the end of that run. The 1-up, 1-
down procedure continued for 8 reversals and the average target to masker ratio from the last 4
reversals was taken as the SRT for the run. The 8 study conditions were tested in random order
with 3 repetitions of each condition.
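The following Python sketch illustrates this adaptive staircase; run_trial is a hypothetical callback that presents a sentence at the given target to masker ratio and returns True when both color and number are identified correctly.

import numpy as np

def measure_srt(run_trial, start_tmr=0.0, start_step=4.0, min_step=2.0,
                n_reversals=8, n_average=4):
    tmr, step = start_tmr, start_step
    previous_correct, reversal_tmrs = None, []
    while len(reversal_tmrs) < n_reversals:
        correct = run_trial(tmr)
        if previous_correct is not None and correct != previous_correct:
            reversal_tmrs.append(tmr)                    # direction change marks a reversal
            step = max(step * 2 ** (-1 / 4), min_step)   # step shrinks to 2 dB by reversal 4
        tmr += -step if correct else step                # 1-up, 1-down rule on the TMR
        previous_correct = correct
    return float(np.mean(reversal_tmrs[-n_average:]))    # mean of the last 4 reversals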
Lateralization Thresholds
Lateralization thresholds were measured for the same 8 conditions as the speech
reception thresholds, but adjusting the target to masker ratio based on a subject’s ability to
correctly identify whether a sentence was incident from the left or from the right of the straight ahead
direction. Like the speech reception procedure, the primary comparison in this measure is
between Binaural Fennec processed and unprocessed combined speech and noise. The
lateralization is always performed with the presence of the masker. In this manner, the measured
lateralization thresholds are an indicator of the noise tolerance for lateralization; specifically,
measuring the minimum target to masker ratio at which the subjects could perform the
lateralization task.
To measure lateralization thresholds, target speech was generated either 30 degrees to the
left or to the right of the straight ahead direction and then combined with a time-reversed
competing talker at 90 degrees to the right as a masker. Based on the approximately 8° minimum
audible angle for a sound near 0° straight ahead, we chose 30°, an angle well above the minimum
audible angle, to be sure that we were measuring the noise tolerance of lateralization with and
without the Binaural Fennec algorithm, as opposed to the listeners' auditory perception limits
(Carlile et al., 2016).
Subjects were instructed to listen for the target speech, which was always the same
sentence (“Ready Charlie, go to blue one now”), and determine if the target speech was coming
from the left or from the right. All subjects reported that the target speech clearly appeared from
a distinct spatial location from the time-reversed masker, and that they could easily perform the
task until the target to masker ratio was substantially lowered. This task is relevant to listening
situations such as when a person is attending to someone talking approximately straight ahead of
them, but not precisely straight ahead. Being able to hear the lateralized position of the target
speech may facilitate the listener’s ability to attend to that speech; just as important, the ability to
lateralize the target speech should contribute to the listener’s sense of auditory space.
Lateralization thresholds were measured for the same 8 conditions used for the speech
reception thresholds, specifically the combinations of 4 reverberation levels (T60 = 0, 400, 800,
and 1200 ms) with Binaural Fennec compared to unprocessed conditions. The target sentence
was filtered with an HRTF corresponding to either 30 degrees to the left or to the right of the
straight ahead direction. Masking time-reversed speech was generated from a random selection
of CRM sentences and filtered with an HRTF corresponding to 90 degrees to the right of the
straight ahead direction. The average target to masker ratio across microphones was controlled
by the adaptive procedure. The initial value of the target to masker ratio was 12 dB. The subject
was asked the question “Is the sound coming from the left or the right?” in a two-alternative
forced-choice task.
The adaptive procedure was identical to the procedure used for speech reception
thresholds with the exception that a 2-up, 1-down procedure was used, thus converging to 70.7%
detection accuracy. This modification was made since chance performance for the lateralization
procedure was 50%, so a higher convergence point for detection accuracy was needed. In
contrast, chance performance for the speech reception procedure was only 3%, so the
1-up, 1-down procedure was sufficient. The
lateralization procedure continued for 8 reversals and the average target to masker ratio from the
last 4 reversals was taken as the threshold for the run.
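As a point of reference, the 70.7% convergence point follows from the equilibrium condition of this transformed up-down rule: the target to masker ratio is made more difficult only after two consecutive correct responses, so at the converged level

$$p^{2} = \frac{1}{2} \quad \Longrightarrow \quad p = 2^{-1/2} \approx 0.707,$$

where $p$ is the probability of a correct response on a single trial.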
Acoustic Analyses
Before examining the results of the speech reception and lateralization measures, acoustic
analyses are presented to provide insight into algorithm performance. These acoustic analyses
were implemented to quantify how well the Binaural Fennec algorithm improves the target to
masker ratio while preserving binaural cues. For these acoustic analyses, 16 different sentences
were drawn from the CRM database for both the target and masker signals. The same talker was
used for both the target and masker signal and the masker was time-reversed.
For the first acoustic analysis, the combined target plus masker signal was generated for the
anechoic condition with the target at 0 degrees and the masker at 90 degrees at a 0 dB target to masker ratio. To
analyze the effect of the Binaural Fennec algorithm on interaural differences, the front left and
front right microphone signals were compared in terms of inter-microphone timing and level
differences. Figure 2.2 plots the inter-microphone timing and level differences between the left
and right front microphones before and after Binaural Fennec processing. The points represented
in Figure 2.2 are the individual spectrotemporal components (i.e., the individual time-frequency
cells of the short-time Fourier transforms) of the signals, which are plotted by their individual
timing and level differences. The solid line indicates the average energy associated with each
inter-microphone timing or level difference. This average was calculated by summing
component energy into bins using a histogram method and weighting the terms by the
component energy. The upper left panel shows the distribution of signal power associated with
time differences between microphones.
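A minimal numpy sketch of this weighted-histogram analysis is given below, assuming hypothetical arrays itd_us (per-component inter-microphone time differences in microseconds) and power (per-component energy).

import numpy as np

def energy_weighted_histogram(itd_us, power, bins=np.linspace(-1000, 1000, 81)):
    # Sum component energy into ITD bins, weighting each entry by its power.
    hist, edges = np.histogram(itd_us.ravel(), bins=bins, weights=power.ravel())
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, hist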
Two distribution clusters occur for interaural timing differences of 0 and 750 μs,
corresponding to the target and masker locations of 0 and 90 degrees, respectively. The left
lower panel shows the corresponding distribution after Binaural Fennec processing. The target
distribution was relatively unchanged indicating that the time differences between microphones
for the target speech were preserved while the energy associated with the masker speech was
reduced. The timing differences associated with the masker speech were not changed, but the
power associated with those components was reduced.
Figure 2.2: Components of Target and Noise Before and After Processing Shows Preservation of
Binaural Cues
Binaural Fennec preservation of binaural cues for the anechoic condition. Target speech was generated at 0 degrees with a
competing talker masker at 90 degrees. The average target to masker ratio across microphones was 0 dB. Individual markers
represent individual spectrotemporal components of the short-time Fourier transform as a power versus binaural cue pair. Solid
line indicates a weighted histogram of power for these components. Subplot A shows the target at 0 μs ITD and the masker at
around 700 μs ITD. After processing in subplot C, the masker power at 700 μs is greatly attenuated, while the target power is
generally preserved. Subplot B shows the target at 0 dB ILD and the masker at around 5 dB ILD. After processing in subplot D,
the masker power at 5 dB is greatly attenuated, while the target power is generally preserved.
The inter-microphone level differences shown in the right panels of Figure 2.2 have a
distribution cluster at 0 dB corresponding to the target at 0 degrees and with a widespread
distribution between 0 and 8 dB corresponding to the masker at 90 degrees. The effect of the
Binaural Fennec algorithm was again to emphasize the spectrotemporal components having
inter-microphone time differences indicating the target signal. The inter-microphone level
differences for the target and masker were not changed by processing, but the power associated
with the masker components was reduced.
The purpose of the preceding analysis was to substantiate the claim that the Binaural
Fennec algorithm preserves interaural timing and level cues. This claim is a straightforward
consequence since the algorithm applies identical spectrotemporal attenuation to the left and
right short-time Fourier transforms; consequently, the processing necessarily will preserve
interaural differences. However, a relevant issue is the extent to which such interaural cues are
initially present in the left and right microphone signals prior to processing when the listener is in
a reverberant environment. To investigate that issue, a second acoustic analysis was implemented
using the same target and masker signals at 0 dB target to masker ratio, but considering a range
of reverberation levels (T60 = 0, 400, 800, and 1200 ms).
For this second acoustic analysis, the spectrotemporal components of the short-time
Fourier transform were divided into components that were either target or masker dominated
based on whether the target to masker ratio for each spectrotemporal component was either
greater or less than 0 dB. With this division of spectrotemporal components, a histogram analysis
of interaural timing and level differences was calculated. Figure 2.3 illustrates equal-contour
lines for the distributions of interaural timing and level differences associated with target and
masker dominated components. The upper left subplot indicates that the interaural distributions
associated with the anechoic condition have distinct distributions for the target and masker
dominated components.
This result indicates that acoustic differences are available for the Binaural Fennec algorithm to
separate the target and masker components based on phase and/or level differences. The distinction
between target and masker dominated components, however, becomes blurred with increasing
reverberation. The upper-right subplot of Figure 2.3 illustrates the distribution of interaural
timing and level cues for the 400 ms reverberation time with much greater overlap in the target
and masker dominated distributions. This blurring of interaural distributions worsens with
increasing reverberation until there is complete overlap for the 1200 ms reverberation level.
Figure 2.3: Acoustic Analysis Showing Target and Noise Components Before Processing
Acoustic analysis of inter-microphone timing and level differences for simulated rooms having reverberation times (T60) of 0,
400, 800, and 1200 ms.
A third acoustic analysis was completed to quantify the net improvement in the target to
masker ratio produced by the algorithm for different reverberation levels (T60 = 0, 400, 800, and
1200 ms) when varying masker location in the azimuthal plane. The target speech was generated
at a location of 0 degrees, while the masker was generated at angles ranging from -180 to 180
degrees in 30 degree increments. For each combination, the target and masker signals were
filtered through corresponding HRTFs and added together such that the average target to masker
ratio was 0 dB.
For these conditions, the combined target plus masker was filtered through the Binaural
Fennec algorithm, which produced a spectrotemporal attenuation function. In typical use, this
spectrotemporal attenuation would be applied to the short-time Fourier transform components of
the left and right microphone signals before inverse transforming and presenting to the listener.
For analysis purposes, this spectrotemporal attenuation was applied separately to the target and
masker signals to quantify the net effects on the overall target to masker ratio. This type of
algorithm analysis has been referred to as “yoked” processing (e.g., Greenberg and Zurek, 1992).
Figure 2.4 illustrates the overall average attenuation of the left and right front microphone
signals that results from the spectrotemporal attenuation applied separately to the target and
masker.
Figure 2.4: Attenuation of the Target and Noise Masker at Different Angles and Reverberation
Times
Binaural Fennec attenuation of target and masker signals. Each panel shows results for one of the 4 reverberant conditions
(T60 = 0, 400, 800, 1200 ms). The angle of incidence for the target speech was always 0 degrees; results are plotted for masker
angles ranging from -180 to 180 degrees in 30-degree increments.
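The yoked analysis can be sketched as follows, assuming the joint attenuation function A computed from the mixture and the separately available target and masker STFTs (hypothetical arrays T and M).

import numpy as np

def yoked_tmr_improvement(A, T, M):
    power = lambda X: np.sum(np.abs(X) ** 2)
    tmr_in = 10 * np.log10(power(T) / power(M))           # before processing
    tmr_out = 10 * np.log10(power(A * T) / power(A * M))  # after yoked attenuation
    return tmr_out - tmr_in                               # net improvement in dB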
Considering the anechoic condition, the algorithm did not suppress the masker when the
masker was within the processing beam width (±30 degrees). Once the masker was outside of the
beam width, the masker was progressively attenuated reaching maximum attenuation near ±120
degrees. Attenuation of the masker for a masker angle of 120 degrees was approximately 17 dB,
while attenuation of the target for that condition was approximately 1 dB, for a target to masker
ratio improvement of approximately 16 dB. Performance was substantially degraded for the 400
ms condition with the primary effect being that the target was progressively attenuated. This
effect was more pronounced for the 800 ms condition, with more than 8 dB of target attenuation
for all masker angles. Consequently, for that condition, the overall improvement in the target to
masker ratio was reduced to approximately 4 dB for conditions where the masker was outside of
the defined beam width. While the observed target to masker benefit was reduced compared to
the less reverberant conditions, it was still a substantial benefit. For the 1200 ms condition, the
effect was most pronounced causing more than 10 dB attenuation of the target, while the masker
was attenuated by 13 dB.
The reason that algorithm performance degrades under increasing reverberation is that
the inter-microphone timing differences used to estimate the angle of incidence become
increasingly diffuse. Target reflections combine with the direct component to indicate an angle
of incidence that is outside of the processing beam, and hence these components are attenuated. While
reverberation degrades the benefit derived from the algorithm, a consistent 3 to 4 dB
improvement is still maintained for reverberant conditions.
Results
Speech Reception Thresholds
Speech reception thresholds were measured for 8 conditions consisting of the
combinations of 4 reverberation levels with the Binaural Fennec algorithm compared to the
unprocessed condition. The upper panel of Figure 2.5 plots speech reception thresholds averaged
across repetitions and subjects for each condition. The lower panel of Figure 2.5 plots the
average difference in speech reception thresholds between Binaural Fennec and the
unprocessed condition for each reverberation level. The general trend observed in these results is
that speech reception thresholds increase with increasing levels of reverberation, while
Binaural Fennec consistently improves thresholds compared to the unprocessed condition.
Figure 2.5: Speech Reception Thresholds were Significantly Better with Binaural Fennec
Processing at all Reverberation Times
Speech reception thresholds (upper panel) and changes in speech reception thresholds derived from the Binaural Fennec
algorithm (lower panel) for normal hearing listeners with target speech at 0 degrees and masker at 90 degrees for each of the 4
reverberant conditions (T60 = 0, 400, 800, 1200 ms). Plotted speech reception thresholds are averages across repetitions and
subjects, error bars indicate the standard error of the mean across subjects.
Analysis of variance (ANOVA) was implemented on the measured speech reception
thresholds with reverberation level and algorithm (i.e., Binaural Fennec or unprocessed) as
factors and treating subject as a random blocking factor. Both reverberation level (F(3,256) =
1948.8, p < 0.001) and algorithm (F(1,256) = 528.42, p < 0.001) were significant effects
confirming the clear trends observed in Figure 2.5 that speech reception thresholds were
degraded by reverberation and improved by the Binaural Fennec algorithm. The interaction
between reverberation level and algorithm was significant (F(3,256) = 149, p < 0.001), reflecting
the fact that the Binaural Fennec algorithm provided more benefit in the anechoic condition, thus
warranting post-hoc analysis to determine if the observed speech reception benefits were
significant at the other reverberation levels.
A post-hoc multiple comparisons procedure was implemented on the statistics of the
ANOVA described in the previous paragraph using Fisher’s least significant difference method
with a significance criterion of 0.05 to compare average speech reception thresholds with the
Binaural Fennec algorithm and the unprocessed conditions for each reverberation level. The
comparisons at each reverberation level were significant. So, while the speech reception benefits
derived from the Binaural Fennec algorithm decreased with increasing reverberation, the
algorithm continued to provide significant benefits even for the highest level of reverberation
tested.
Lateralization Thresholds
Lateralization thresholds were measured for the same 8 conditions as measured for speech
reception, consisting of the combinations of 4 reverberation levels with the Binaural Fennec
algorithm compared to an unprocessed condition. The upper panel of Figure 2.6 plots
lateralization thresholds averaged across repetitions and subjects for each condition. The lower
panel of Figure 2.6 plots the average difference in lateralization thresholds between Binaural
Fennec and the unprocessed condition for each level of reverberation. The
general trend is that lateralization thresholds increased with increasing levels of reverberation,
noting that the Binaural Fennec algorithm improved thresholds for the anechoic condition, but
not for the other reverberant conditions.
Figure 2.6: Lateralization Thresholds were Improved in Anechoic Conditions with Binaural Fennec
Processing, while not Impeded in Reverberant Conditions
Lateralization thresholds (upper panel) and changes in lateralization thresholds derived from the Binaural Fennec algorithm
(lower panel) for normal hearing listeners with target speech either 30 degrees to the left or to the right and masker at 90 degrees
for each of the 4 reverberant conditions (T60 = 0, 400, 800, 1200 ms). Plotted lateralization thresholds are averages across
repetitions and subjects, error bars indicate the standard error of the mean across subjects.
An ANOVA was implemented for lateralization thresholds with reverberation level and
algorithm (i.e., Binaural Fennec or unprocessed) as factors and treating subject as a random
blocking factor. Both reverberation (F(3,256) = 523.4, p < 0.001) and algorithm (F(1,256) =
6.67, p = 0.0208) were significant, confirming the trend observed in Figure 2.6 that lateralization
thresholds were degraded by reverberation; however, the significance of algorithm is likely
weighted by the exceptional performance of the Binaural Fennec algorithm in the anechoic
condition. This suspicion was substantiated in that the interaction between reverberation and
algorithm was significant (F(3,256) = 9.44, p < 0.001), thus warranting post-hoc analysis of
algorithm comparisons at each reverberation level.
A post-hoc multiple comparisons procedure was implemented on the statistics of the
ANOVA described in the previous paragraph using Fisher’s least significant difference method
with a significance criterion of 0.05 to compare average lateralization thresholds with the
Binaural Fennec algorithm relative to the unprocessed conditions for each level of reverberation.
With this criterion, the Binaural Fennec algorithm only provided significant improvements in
lateralization thresholds in the anechoic condition. The associated p-values for comparing
thresholds with the Binaural Fennec algorithm and the unprocessed conditions at each
reverberation level were: T60 = 0 ms (p < 0.001), T60 = 400 ms (p = 0.68), T60 = 800 ms (p =
0.99), T60 = 1200 ms (p = 0.55). Stating this result in a positive manner, while the Binaural
Fennec algorithm did not improve lateralization thresholds in the reverberant conditions, it did
not degrade them either. Thus, the results indicate that the algorithm improved speech reception
thresholds while at least maintaining lateralization thresholds for the evaluated conditions.
Psychometric Curve Fitting
Additional analysis was conducted on the speech reception and lateralization thresholds to
estimate psychometric curves for the observed data. For each acoustic condition the detection
accuracy (i.e., correct/incorrect response) was analyzed as a function of the target to masker
ratio for every trial across subjects. A cumulative distribution function for a normal distribution
was fit to this detection accuracy versus target to masker ratio comparison for each condition.
Figure 2.7 illustrates the derived psychometric curves for each condition. The large benefits of
the Binaural Fennec algorithm for both speech reception and lateralization in the anechoic condition can be
clearly seen in the upper subplots. For the reverberant conditions, the speech reception
psychometric curves illustrate some improvement derived from the algorithm even in the most
reverberant condition; however, no significant improvement was derived from the algorithm on
the lateralization task. Consequently, for the present measures of speech reception and
lateralization, the evidence indicates that the algorithm can improve speech reception while at
least preserving lateralization of sound.
Figure 2.7: Psychometric Functions Show the Difference in Detection Accuracy for Both Speech
Reception and Lateralization Thresholds Before and After Binaural Fennec Processing
Psychometric functions estimated from fitting cumulative distribution functions of Gaussian distributions to the measured
detection accuracy across all trials and subjects. Specifically, for every trial, detection accuracy was analyzed as a function of
the target to masker ratio for that trial and the psychometric function was fitted by minimizing the mean squared error across
every trial. The results indicate the Binaural Fennec algorithm improves speech reception thresholds, while preserving
lateralization thresholds.
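A minimal curve-fitting sketch in Python is given below, assuming per-trial arrays tmr_db and correct (0 or 1). For simplicity it fits only the midpoint and slope of a Gaussian cumulative distribution function by least squares, omitting the chance floor (approximately 3% for speech reception and 50% for lateralization) that a fuller treatment would include.

import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def fit_psychometric(tmr_db, correct):
    # Gaussian CDF: mu is the midpoint in dB; sigma controls the slope.
    model = lambda x, mu, sigma: norm.cdf(x, loc=mu, scale=sigma)
    (mu, sigma), _ = curve_fit(model, np.asarray(tmr_db, float),
                               np.asarray(correct, float), p0=[-10.0, 5.0])
    return mu, sigma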
Discussion
The Binaural Fennec algorithm improved speech reception while preserving lateralization
for NH listeners for the conditions tested. This is an important result since adaptive spatial
filtering algorithms have had mixed results for improving speech reception while preserving the
interaural cues needed for sound localization. Certain modifications of beamforming algorithms
avoided this issue by spectrally parsing the input signals and performing noise reduction or
binaural cue preservation in different frequency regions, but not both (Kidd et al., 2015; Welker
et al., 1997). Other approaches have used a trade-off function that allows a degree of binaural
cue preservation to be maintained, but at the cost of noise suppression performance (Klasen et
al., 2006, 2007; Szurley et al., 2016; Van den Bogaert et al., 2007, 2008). The study described in
the present article demonstrates that both objectives can be achieved, even in reverberant
conditions.
There are several advantages for configuring spatial filtering for preserving binaural cues.
Perhaps the most obvious is that a listener would be able to perceive the location of
environmental sounds. This perceptual goal would be beneficial with respect to both the desired
sound being attended to as well as any unwanted environmental sounds. It is important to note
that while spatial filtering suppresses unwanted sounds, it generally does not completely remove
those suppressed sounds. Consequently, the listener would still be able to hear sounds that are
coming from locations removed from the target direction, but those sounds would simply be
conveyed with less intensity. The procedures put forth by others to only preserve the binaural
cues associated with target speech did not allow the listener to associate a location with, and
consequently orient to, other sounds in the environment.
Another advantage of preserving interaural timing and level cues is that it presumably
would facilitate the listener’s ability to segregate multiple sound sources into independent
streams. A primary rationale for spatial filtering is the recognition that with hearing loss an
individual’s ability to spatially segregate sounds is diminished; consequently, spatial filtering is
used to enhance sounds from a desired direction and, in a sense, perform the auditory stream
segregation that occurs naturally in healthy auditory physiology. However, since hearing loss can
have different degrees of severity, it would be better to provide spatial filtering preprocessing
that retains binaural cues, which would allow the listener to make use of any residual stream
segregation abilities that they retain. Individuals with mild hearing loss may have different
attenuation versus interaural cue preservation trade-offs than individuals with severe to profound
hearing loss.
This initial study of the Binaural Fennec algorithm examined speech reception and
lateralization thresholds in background noise for NH listeners. The motivation for using NH
listeners was that NH listeners have relatively homogenous sound lateralization abilities.
Specifically, the NH subjects could all perform the lateralization task and were impacted to a
similar extent by reverberation and by additive background noise. Consequently, a clear baseline
lateralization performance was determined, and it was demonstrated that the Binaural Fennec
algorithm could preserve lateralization thresholds, and even improve lateralization thresholds for
the anechoic condition. In this manner, using NH listeners for initial evaluations established an
important baseline for confirming that the Binaural Fennec algorithm does not degrade
lateralization thresholds.
This initial study of the Binaural Fennec algorithm examined performance differences for
the algorithm compared to an unprocessed condition, which was a simple routing of the left and
right front microphone signals. This comparison provides a straightforward examination of
algorithm performance that can be extrapolated to compare performance with other systems. For
example, Chung and colleagues (2006) reported a 3.5 dB benefit for a hypercardioid directional
over an omni-directional microphone. More generally, the overall improvement provided by
directional relative to omni-directional microphones is roughly 3–5 dB in real-world
environments with low reverberation for listeners with acoustic hearing (Amlani, 2001;
Bentler, 2005; Chung, 2004; Ricketts, 2001; Valente et al., 2000; Wouters et al., 1999).
Compared to this history of fixed directional microphones, the Binaural Fennec algorithm
outperforms fixed processing for anechoic conditions and low levels of reverberation, while the
approaches yield similar results for more reverberant conditions.
As the field of adaptive spatial filtering evolves, it is important to clarify differences
between emerging algorithms. The term “null-steering” has been traditionally used to describe
the class of adaptive multiple microphone algorithms that are based on relatively slow (i.e., ~500
ms) adaptations of a linear weighting of microphone signals to minimize the output noise power.
This class of spatial filtering is appropriately referred to as null-steering since an instantaneous
view of the directional response of the system would indicate a spatial null that is steered toward
the angle of incidence for the unwanted sound. So, if a masker existed at 90 degrees, the null-
steering beamformer would ideally place a null at 90 degrees to cancel that noise. However, it
would not simultaneously suppress other potential maskers at other angles. For a similar reason,
we refer to the class of algorithms developed by Kollmeier and colleagues (Kollmeier et al.,
1993; Kollmeier & Koch, 1994) and expanded by others (Goldsworthy, 2014; Goldsworthy et
al., 2014; Lockwood et al., 2004) as target-isolating beamformers since, rather than steer a
spatial null, these algorithms make a spatial beam oriented at the target sound and suppress all
other sounds.
This distinction between null-steering and target-isolating beamformers is relevant to
their performance in reverberant environments. Although both classes of beamformers are
affected by reverberation, there are differences regarding the mechanisms. Reverberation has two
mechanisms of action on null-steering beamforming. First, reverberation causes the target speech
to leak into the background noise estimator. This leakage reduces the performance of the
calculations associated with minimizing the output noise power. Second, reverberation causes
the background noise to become statistically spatially diffuse. Since null-steering beamformers
are designed to orient a spatial null to cancel a single background noise source, they are not
effective at cancelling diffuse noise. For target-isolating beamformers, the mechanism by which
reverberation affects performance is similar to the first mechanism described above. Specifically, with
reverberation the target speech leaks outside of the algorithmic beam width and is attenuated.
However, the second mechanism does not affect target-isolating beamformers since
this class of beamformer is designed to suppress sounds from all angles outside of the beam
width. Consequently, the target-isolating beamformers are likely to be relatively robust to the
fact that background noise becomes statistically diffuse with reverberation.
Conclusion
Binaural Fennec is an adaptive multiple microphone spatial filtering algorithm designed to
improve output target to masker ratio while preserving interaural timing and level differences.
Results indicated that the Binaural Fennec algorithm significantly improved speech reception
thresholds in the presence of a competing talker at 90 degrees even in reverberant conditions.
Results also indicated that the Binaural Fennec algorithm improved lateralization thresholds for
an anechoic condition, while not significantly affecting threshold performance for reverberant
conditions. This indicates that although Binaural Fennec did not improve lateralization
thresholds in reverberant conditions, it did not degrade the necessary spatial cues for
lateralization. The conclusion drawn from these results is that the Binaural Fennec algorithm can
improve speech reception in noise, while preserving lateralization performance.
Chapter 3: The Effects of Musical Interval Identification Training
and Musical Ability on Psychophysical Performance
The work described in this chapter was published in Frontiers in Neuroscience.
Bissmeyer S.R.S., Ortiz J.R., Gan H. and Goldsworthy R.L. (2022) Computer-based Musical
Interval Training Program for CI Users and Listeners with No Known Hearing Loss. Front.
Neurosci. 16:903924. doi: 10.3389/fnins.2022.903924
Introduction
CIs have successfully restored speech perception to people with severe hearing loss. Most
CI users achieve high levels of speech recognition and spoken language skills (Shannon et al.,
2004; B. S. Wilson & Dorman, 2008). However, CI users struggle to understand speech in noisy
environments and many complain about the sound of music (do Nascimento & Bevilacqua,
2005; Fetterman & Domico, 2002; Q.-J. Fu & Nogaki, 2005; Kong et al., 2004; H. J.
McDermott, 2004; Nimmons et al., 2008). Studies have shown that current CI technology is
limited in its ability to convey the musical percepts of pitch and timbre (Drennan & Rubinstein,
2008; Limb & Rubinstein, 2012). This has resulted in both pitch resolution and timbre
recognition being markedly diminished for CI users compared to their NH peers (Drennan et al.,
2008; Gfeller et al., 2007b; Gfeller, Witt, et al., 2002; Goldsworthy, 2015; Goldsworthy et al.,
2013; Limb & Roy, 2014; Luo et al., 2019; H. J. McDermott, 2004). This loss of resolution and
fidelity has several potential causes including limited number of implanted electrodes, electrode
array placement, broad current spread, sound processing designed for speech rather than music,
poor coding of timing cues for pitch, and poor neural health (M. T. Caldwell et al., 2017; Crew et
al., 2012; Dhanasingh & Jolly, 2017; Finley et al., 2008; Landsberger et al., 2015; Limb & Roy,
2014; Mangado et al., 2018; Nogueira et al., 2016; Rebscher, 2008; van der Marel et al., 2014;
Venail et al., 2015; Würfel et al., 2014; Zeng et al., 2014).
These technological and physiological constraints limit how music is transmitted by the
implant and, consequently, limit music enjoyment for CI users. Studies have assessed adult CI
users' listening habits and music enjoyment through questionnaires (Gfeller et al., 2000; Looi &
She, 2010). They found that many were dissatisfied and spent less time listening to music post-
implantation. Assessment studies have also shown that CI users have more difficulty than NH
listeners with pitch-based perceptual tasks, including frequency discrimination and melody
recognition (Gfeller et al., 2005, 2007a; Gfeller, Turner, et al., 2002; Goldsworthy, 2015;
Penninger et al., 2013).
Melody is a fundamental aspect of music made up of a sequence of musical intervals,
which relies not only on the detection and direction of pitch changes, but also on their magnitude.
Even for those who casually listen to music, identifying the magnitude between pitches is a basic
component which allows a listener to readily recognize a melody whether sung in a different
register or played in a different key. If a difference in frequency cannot reliably be heard as an
equivalent change in pitch, then the intended melody sounds cacophonous and out-of-tune. This
has been confirmed by Luo and colleagues who found that CI users perceived melodies as out-
of-tune more often than NH listeners (Luo et al., 2014). Furthermore, the ability to perceive
musical intervals also has implications for the emotion and tension conveyed by music. A single
semitone difference between two pitches will determine the tonality of the interval (e.g., major,
minor, diminished, perfect, or augmented) which, along with other important cues like timbre
and tempo, will affect the listener’s emotional response to a melody (Camarena et al., 2022; Luo
& Warner, 2020). The ability to reliably distinguish intervals requires listeners to have a
resolution of at least a semitone (J. H. McDermott et al., 2010), and it is well established that
most CI users have pitch resolution that is worse than a semitone (e.g., Pretorius and Hanekom,
2008; Goldsworthy, 2015). Without accurate perception of a musical interval, it is likely that
tonality and emotion intended to be conveyed by music will be lost and this is likely a
contributing factor to decreased musical enjoyment in CI users.
Musical interval labeling is an important skill for musicians and any individual who
desires to participate in musical activities such as playing an instrument or singing. It is difficult
to master identifying musical intervals, even in NH listeners and musicians (J. H. McDermott et
al., 2010). Given the evidence discussed that suggests that musical interval perception is
distorted in the context of melody perception for CI users, it is likely that CI users struggle to
identify musical intervals as well. It is necessary for CI users to take steps to regain access to
interval cues for musical tension and emotion. They must first undergo a period of focused aural
rehabilitation to learn how the lower-level pitch cues are provided by electrical stimulation via
their device (Gfeller, 2001), then develop the higher-level association between specific musical
intervals and intent through further musical interval training (Fujioka et al., 2004).
Despite the importance of intervals to melody, there is only a small body of research
investigating musical interval perception in CI users. Existing studies have shown that CI users
have poor interval identification compared to their NH peers, especially above middle C. Pitch
and relative intervals can be conveyed by stimulation timing (i.e., the modulation or stimulation
rate) but with much variability in pitch salience and in the upper frequency that can be conveyed
by stimulation rate (Pijl, 1997; Pijl & Schwarz, 1995a, 1995b; Todd et al., 2017). Place cues for
pitch (i.e., active electrodes and stimulation configuration) provide a strong sense of pitch but
one that is compressed compared to normal (Stupak et al., 2021). Stupak and colleagues found
consistent warping of intervals among CI users, suggesting the ability to perceive intervals is
likely not linked to duration of deafness (Stupak et al., 2021). Spitzer and colleagues investigated
musical interval distortion in CI users who had NH in their non-implanted ear (i.e., single-sided
deafness) (Spitzer et al., 2021). They found that the musical interval needed to create a match in
the implanted ear was, on average, 1.7 times greater than the corresponding interval in the
acoustic hearing ear.
Given the distorted representation of pitch and the issue of frequency compression in
current CI signal processing, experience and training may be required to improve interval
identification and enable access to melody through clinical devices. Interval identification is
challenging for NH people and CI users alike, which makes it a demanding task for training.
Moore and Amitay found that pitch training with a more difficult, or even impossible, task
resulted in more robust learning (D. R. Moore & Amitay, 2007). Musical interval training in NH
listeners has led to improvement in both the trained and untrained tasks (Little et al., 2019).
There are currently no studies investigating the effectiveness of musical interval training in CI
users.
In the present study, we use an interval labeling task to evaluate subjects’ ability to
strengthen the association between specific musical intervals and musical intent and to
consistently label intervals across an ecologically relevant musical range (i.e., the typical vocal
range of humans). We note the connection between musical intervals and musical intent does not
require the ability to label intervals, for example, a listener may readily associate a song in a
major key as happy or bright and a song in a minor key as sad or dark (Camarena et al., 2022)
without being able to label the interval pattern being used. However, given that we are interested
in the restoration of a stable interval percept in CI users, we chose to use a labeling task as an
important intermediary to quantify the consistency of interval labeling across musical octaves
when those CI users are provided with training to the interval cues. This training task requires
participants to attend to multiple musical interval presentations, associate interval magnitudes
with specific labels (e.g., major third, octave), and compare presentations to intervals heard in
preceding trials.
The present study has two objectives. First, to examine the performance on the trained
task of interval identification and on a battery of untrained musical tasks, including frequency
discrimination and tonal and rhythm comparisons before and after a two-week musical interval
training program. Second, to characterize the relationship between the dimensions of music
perception with low-level psychoacoustics and higher-level rhythm and tonal comparisons,
interval identification, and musical sophistication. The overarching hypothesis motivating this
study is that both low-level psychophysical access to pitch cues as well as higher-level labeling
of intervals limits interval identification accuracy in CI users, and, to a certain extent, those with
no known hearing loss. The results show that the low-level psychophysical tasks probing pitch
resolution serve as predictors of higher-level measures of music perception. The results also
clarify the extent to which interval training improves access to the low-level and higher-level cues
necessary for music perception. Discussion focuses on the importance of basic elements of pitch
perception for reestablishing musical interval perception for CI users and on methods for
improving training programs for musical interval identification.
Methods
Overview
Participants with no known hearing loss and CI users completed assessments before and
after two weeks of interval training. The pre- and post-assessments included measures of pure
tone detection, pure tone frequency and fundamental frequency discrimination, tonal and rhythm
comparisons, and musical interval identification (the trained task) administered on the Team
Hearing website coded in JavaScript. The measures of pure tone detection, pure tone frequency,
and fundamental frequency discrimination used synthesized stimuli generated using JavaScript.
The measures of tonal and rhythm comparisons used marimba notes rendered using Finale
Version 3.5.1 software (Coda Music) (https://www.finalemusic.com/), and the measures of
interval identification for both training and assessment used piano notes rendered using
MuseScore 3 software (https://musescore.org/en).
Figure 3.1: Visualizations of Musical Notes
The left subpanel shows the auditory nerve response to musical notes for normal hearing using physiological modeling software
(Zilany et al., 2014). The right subpanel shows CI stimulation patterns emulated using the Nucleus MATLAB Toolbox, version
4.42 (Swanson & Mauch, 2006). For both visualizations, the two notes being compared are A2 (110 Hz) and A3 (220 Hz).
Figure 3.1 shows typical normal hearing neural response patterns (left subpanel) and Cochlear
Corporation CI stimulation patterns (right subpanel) for representative musical notes,
highlighting the difference in frequency representation between the two groups. In the left
subpanel, the 110 Hz and 220 Hz place cues can be visualized at the fundamental as well as the
ascending harmonic frequencies, and temporal cues can be observed with a doubling of the rate
for 220 Hz. In the right subpanel, the place and temporal cues are not as clearly visualized, with
the harmonic structure coarsely represented and the fundamental frequencies conveyed only
through weak amplitude modulation. The place and temporal representation in CI stimulation is
poor compared to the cues available for pitch perception in the normal auditory system. This
representation reinforces the basis of the first part of the hypothesis, that CI users are limited in
low-level psychophysical access to pitch cues. A permalink for this experiment can be found at
https://www.teamhearing.org/81; after entering the site, press the “Studies” button to enter the
experiment.
Participants
Thirteen adult CI users, with six bilaterally implanted and seven unilaterally implanted,
and seven listeners with no known hearing loss took part in this experiment. All participants
completed the two-week interval training protocol. Participant ages ranged from 23 to 77 years,
with an average age of 62.9 years in the CI user group and 42.3 years in the listeners with no known
hearing loss. Relevant subject information is provided in Table 3.1. Participants provided
informed consent and were paid for their participation. The experimental protocol was approved
by the University of Southern California Institutional Review Board.
Table 3.1. Subject Demographics
Age at time of testing and age at onset of hearing loss (when applicable) are given in years. Duration of profound hearing loss prior
to implantation (when applicable) is given in years and estimated from subject interviews. SNHL = Sensorineural Hearing Loss.
Subject | Age | Gender | Etiology | Ear Tested | MSI Score | Age at Onset | Years Implanted | CI Company & Processor | Implant Model | Duration of Deafness Before Implantation | Method of Streaming
H1 | 53 | M | No Known Hearing Loss | Both Together | 3.61 | N/A | N/A | N/A | N/A | N/A | Apple Earbuds
H2 | 24 | F | No Known Hearing Loss | Both Together | 5.89 | N/A | N/A | N/A | N/A | N/A | Koss UR20 Headphones
H3 | 66 | F | No Known Hearing Loss | Both Together | 3.5 | N/A | N/A | N/A | N/A | N/A | Apple Earbuds
H4 | 54 | M | No Known Hearing Loss | Both Together | 3.83 | N/A | N/A | N/A | N/A | N/A | Free Field through Dell Optiplex 3080 Speakers
H5 | 39 | M | No Known Hearing Loss | Both Together | 4.33 | N/A | N/A | N/A | N/A | N/A | Free Field through Panasonic TV TH-50PX80U Speakers
H6 | 23 | F | No Known Hearing Loss | Both Together | 5.94 | N/A | N/A | N/A | N/A | N/A | Free Field through Yamaha HS5 Powered Studio Monitor Speaker
H7 | 37 | F | No Known Hearing Loss | Both Together | 6.39 | N/A | N/A | N/A | N/A | N/A | Beyer Dynamic DT 770 Pro Headphones
C2 | 37 | F | Unknown | Both Together | 4.78 | 15 | L:9 R:13 | Cochlear N7s | L:CI24RE (CA) R:CI24RE (CA) | L:5 R:1 | Mini Mic2
C3 | 76 | F | Progressive SNHL | Both Together | 2.11 | 40 | L:21 R:17 | Cochlear N6s | L:CI24R (CS) R:CI24RE (CA) | L:1 R:5 | Cochlear Binaural Cable
C10 | 46 | M | Ototoxic Medicine | Left | 3.28 | 12 | 33 | Cochlear N6 | CI22M-, USA | 1 | Mini Mic
C11 | 58 | F | Sudden SNHL | Right | 1.83 | 55 | 2 | AB Naida CI Q90 | HiRes Ultra 3D CI HiFocus SlimJ | 1 | AB Bluetooth
C13 | 59 | M | Mumps Disease | Right | 3.39 | 14 | 3 | MED-EL Sonnet | Sonata 2 Mi1260 | 42 | I-loop streaming
C15 | 58 | M | Ototoxic Medicine | Left | 2 | 54 | 1 | AB Naida | HiRes Ultra 3D CI with HiFocus Mid-Scala Electrode | 1 | Bluetooth/Compilot
C16 | 66 | M | Ototoxic Medicine | Left | 4.11 | 38 | 18 | Cochlear N5 | CI24R (CS) | 5 | Sony MDR-D150 Headphones
C17 | 74 | F | Unknown | Both Together | 1.78 | Birth | L:20 R:15 | Cochlear N6s | L:CI24R (CS) R:CI24RE (CA) | L:9 R:9 | Free Field through HP Computer Speakers
C18 | 72 | F | Measles In Utero | Both Together | 2.56 | Birth | L:12 R:10 | Cochlear N6s | L:CI24RE (CA) R:CI512 | L:1 R:1 | Free Field through HP Computer Speakers
C20 | 67 | F | Unknown | Both Together | 4.11 | 18 | L:4 R:5 | L:Cochlear N6 R:Cochlear N7 | L:CI522 R:CI522 | L:14 R:16 | Free Field through iPad Speakers
C22 | 65 | F | Mumps Disease | Left | 5.11 | 5 | 2 | Cochlear N7 | CI512 | 58 | Mini Mic
C28 | 77 | M | Unknown | Both Together | 3.33 | 60 | L:2 R:1 | MED-EL Rondo 3s | Synchrony 2 Mi1250 | L:1 R:2 | Bluetooth streaming using AudioLink
C32 | 63 | F | Progressive SNHL | Left | 6.28 | 20 | 13 | Cochlear N7 | CI24RE (CA) | 5 | Direct Bluetooth streaming from iPad
Training
All assessments and the musical interval training program were completed remotely by
participants using a web application. For training, participants completed six listening exercises
each day, requiring approximately 20 minutes per day for two weeks. Each listening exercise
included 20 trials of interval identification, and participants needed to identify 80% of the
intervals correctly to proceed to the next training level. Training was organized into 36
increasingly difficult levels, with fewer comparisons and larger interval spacings at lower
difficulty training levels.
Table 3.2. Interval Notation with the Corresponding Semitone Spacing between Notes
Interval | Semitone Spacing
Minor 2nd | 1
Major 2nd | 2
Minor 3rd | 3
Major 3rd | 4
Perfect 5th | 7
Octave | 12
For each trial, listeners were presented with an ascending musical interval and asked to
indicate the interval that they heard. The online interface displayed two to four response buttons
on screen depending on the level, with specific musical interval labels provided for selection. In
total, training was provided for six different ascending melodic intervals consisting of two
sequentially presented piano notes. The intervals presented and the corresponding semitone
spacings between notes are listed in Table 3.2. Practice was provided for intervals with base
notes near A2 (110 Hz), A3 (220 Hz), and A4 (440 Hz). These training levels were divided into
6 different interval groupings with 6 base note frequencies within each interval grouping. The
interval groupings, described in semitone spacing between notes, were [2,12], [2,7], [7,12],
[4,7,12], [2,4,7], and [1,2,3,4]. The base note frequencies within each interval grouping were 1)
A2 (110 Hz) no variation, 2) A2 (110 Hz) +/- 6 semitones, 3) A3 (220 Hz) no variation, 4) A3
(220 Hz) +/- 6 semitones, 5) A4 (440 Hz) no variation, and 6) A4 (440 Hz) +/- 6 semitones. See
supplementary Table B.1.1 for more information about the training levels. Feedback was
displayed after each response on the response button selected with a green check mark for
correct answers and a red “X” for wrong answers. For wrong answers, participants were given
the correct answer on screen and the option to replay the interval comparison as needed.
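To make the level structure concrete, the sketch below illustrates how a single training trial's two note frequencies could be derived from a base note, a roving range, and an interval grouping. This is a hypothetical illustration in JavaScript (the language used for the Team Hearing website), not the site's actual code; the function name and structure are assumptions.

```javascript
// Hypothetical sketch of trial stimulus selection (not the actual
// Team Hearing code): pick a roved base note and compute the two
// note frequencies for a chosen interval in equal temperament.
function makeIntervalTrial(baseHz, roveSemitones, intervalChoices) {
  // Rove the base note uniformly within +/- roveSemitones.
  const rove = (Math.random() * 2 - 1) * roveSemitones;
  const rovedBaseHz = baseHz * Math.pow(2, rove / 12);
  // Randomly select one of the interval spacings for this level.
  const semitones =
    intervalChoices[Math.floor(Math.random() * intervalChoices.length)];
  return {
    lowNoteHz: rovedBaseHz,
    highNoteHz: rovedBaseHz * Math.pow(2, semitones / 12),
    semitones: semitones,
  };
}

// Example: base note A3 (220 Hz), roved +/- 6 semitones, with the
// [4, 7, 12] interval grouping.
console.log(makeIntervalTrial(220, 6, [4, 7, 12]));
```

If the levels are numbered as described above, this example call would correspond to the A3 condition of the interval identification assessment (training level 22).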
Pre- and Post-Training Assessments
Participants completed pre- and post-training assessments to characterize the effect of
training on the trained task and on untrained measures of pitch discrimination and music
perception. The assessments included pure tone detection, pure tone frequency discrimination,
fundamental frequency discrimination, tonal and rhythm comparisons, and musical interval
identification.
Calibration Procedures
Before completing the assessments, participants completed two procedures to
characterize relative loudness levels, with their devices (computer, audio device, hearing device,
etc.) configured as they would normally listen. First, participants used a method
of adjustment to set a 1 kHz pure tone to subjective “soft,” “medium soft,” “medium,” and
“medium loud” intensity levels, expressed in dB relative to the maximum output level of the sound card without
clipping. Second, pure tone detection thresholds were measured in dB relative to the maximum
output level of the sound card at 250, 1000, and 4000 Hz to provide a comparison of relative
detection levels across frequencies. Stimuli were 400 ms sinusoids with 20 ms raised-cosine
attack and release ramps. At the beginning of a measurement run, participants set the volume to a
“soft but audible” level. The detection thresholds were then measured using a three-alternative,
three-interval, forced-choice procedure in which two of the intervals contained silence and one
interval contained the gain-adjusted tone. Participants were told via on-screen instructions to
select the interval that contained the tone. The starting gain value was a threshold level as
specified by the participant through method of adjustment. This value was reduced by 2 dB after
correct answers and increased by 6 dB after mistakes to obtain the true detection threshold level.
A run continued until three mistakes were made and the average of the last four reversals was
taken as the detection threshold. This procedure converges to 75% detection accuracy
(Kaernbach, 1991). Relative dynamic range could then be calculated by subtracting the detection
threshold from the comfortable listening intensity level set at 1000 Hz. The remainder of the
assessments and interval training were conducted at the volume the participant set as
“comfortable.”
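A minimal sketch of the staircase logic just described, assuming a hypothetical playTrial callback that presents the three-interval trial at a given gain and resolves to whether the response was correct; this illustrates the published rules, not the website's implementation (edge cases such as runs with fewer than four reversals are omitted):

```javascript
// Sketch of the detection staircase: gain moves down 2 dB after
// correct answers and up 6 dB after mistakes; the run ends after
// three mistakes and the threshold is the mean of the last four
// reversals. playTrial(gainDb) is a hypothetical async callback
// returning true for a correct 3AFC response.
async function measureDetectionThreshold(startGainDb, playTrial) {
  let gainDb = startGainDb;
  let mistakes = 0;
  let lastStep = 0; // previous step: negative = down, positive = up
  const reversals = [];
  while (mistakes < 3) {
    const correct = await playTrial(gainDb);
    const step = correct ? -2 : +6;
    if (!correct) mistakes += 1;
    // A reversal occurs whenever the step direction changes.
    if (lastStep !== 0 && Math.sign(step) !== Math.sign(lastStep)) {
      reversals.push(gainDb);
    }
    lastStep = step;
    gainDb += step;
  }
  const lastFour = reversals.slice(-4);
  return lastFour.reduce((a, b) => a + b, 0) / lastFour.length;
}
```

With these step sizes the staircase converges where the expected movement is zero, i.e., 2·p = 6·(1 − p), giving p = 0.75, which matches the cited 75% convergence point (Kaernbach, 1991).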
Pure Tone Frequency Discrimination
Pure tone frequency discrimination was measured for pure tones near 250, 1000, and
4000 Hz. Stimuli were 400 ms in duration with 20 ms raised-cosine attack and release ramps.
Discrimination was measured using a two-alternative, two-interval, forced-choice procedure
where the target stimulus had an adaptively higher frequency than the standard. Participants were
provided with on-screen instructions to choose the sound that was “higher in pitch.” Each
measurement run began with a frequency difference of 100% (an octave) between the standard
and target stimuli. This frequency difference was reduced by a factor of 2^(1/3) after correct
answers and increased by a factor of two after mistakes. For each trial, the precise frequency tested was
roved to add perturbations which contribute to the ecological relevance of the stimulus (e.g.,
vocal pitch fluctuations) while avoiding both artifactual effects (e.g., sidebands outside of the
filter, beating) and habituation to the base note frequency. The frequency roving was done within
a quarter-octave range uniformly distributed and geometrically centered on the nominal
condition frequency. Relative to the roved frequency value, the standard frequency was lowered,
and the target raised by a factor of √(1 + Δ/100). The gain of the standard and target were roved by 6 dB
based on a uniform distribution centered on the participant’s comfortable listening level. A run
ended when the participant made four mistakes and the average of the last four reversals was
taken as the discrimination threshold.
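The roving rules described above can be summarized in a short sketch (hypothetical function and parameter names; deltaPercent stands for the current adaptive frequency difference Δ in percent):

```javascript
// Sketch of one discrimination trial's stimulus parameters,
// illustrating the roving rules described above (not the actual
// test code).
function makeDiscriminationTrial(nominalHz, deltaPercent, comfortableDb) {
  // Rove the test frequency within a quarter-octave range
  // (+/- 1/8 octave), geometrically centered on the nominal frequency.
  const octaveRove = (Math.random() - 0.5) * 0.25;
  const rovedHz = nominalHz * Math.pow(2, octaveRove);
  // Standard lowered and target raised by sqrt(1 + delta/100), so the
  // target is delta percent higher than the standard.
  const ratio = Math.sqrt(1 + deltaPercent / 100);
  // Gain roved over a 6 dB range centered on the comfortable level.
  const gainDb = comfortableDb + (Math.random() - 0.5) * 6;
  return {
    standardHz: rovedHz / ratio,
    targetHz: rovedHz * ratio,
    gainDb: gainDb,
  };
}
```

Roving in this way avoids habituation to the base note frequency and discourages reliance on loudness rather than pitch cues.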
Fundamental Frequency Discrimination
Fundamental frequency discrimination was measured for fundamental frequencies near
110, 220, and 440 Hz for low-pass filtered harmonic complexes. Stimuli were 400 ms in duration
with 20 ms raised-cosine attack and release ramps. These fundamental frequencies were chosen
as representative of the fundamental frequencies used in the interval identification assessment
and training. A total of nine measurement runs were conducted consisting of three repetitions of
the three fundamental frequencies. The condition order was randomized for each repetition.
Harmonic complexes were constructed in the frequency domain by summing all non-zero
harmonics from the fundamental to 2 kHz with a low-pass filtering function. All harmonics were
of equal amplitude prior to filtering. The form of the low-pass filtering function was:
$$\mathrm{gain} = \begin{cases} 1 & \text{if } f < f_e \\ \max\!\left(0,\; 1 - \left(\log_2 f - \log_2 f_e\right)^2\right) & \text{otherwise} \end{cases} \qquad \text{(Equation 4.1)}$$
where gain is the gain expressed as a linear multiplier applied to each harmonic component, f is
the frequency of the component, and f_e is the edge frequency of the passband, which was set as
1 kHz for the low-pass filter. Note, as thus defined, the low-pass filter gain is zero above 2 kHz.
Fundamental frequency discrimination was measured using a two-alternative, two-interval,
forced-choice procedure where the target had an adaptively higher fundamental frequency
compared to the standard. The same adaptive procedure, amplitude and frequency roving, and
scoring logic were used as for pure tone frequency discrimination but with adaptive control over
fundamental frequency.
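As an illustration, Equation 4.1 can be applied directly when synthesizing the complexes; the sketch below sums the harmonics with the stated gains (a simplified outline that omits the raised-cosine ramps and level calibration; not the study's actual synthesis code):

```javascript
// Sketch: synthesize a low-pass filtered harmonic complex per
// Equation 4.1 (fe = 1 kHz passband edge; gain reaches zero at 2 kHz).
function harmonicComplex(f0, durationSec, sampleRate) {
  const fe = 1000; // passband edge frequency in Hz
  const n = Math.floor(durationSec * sampleRate);
  const samples = new Float32Array(n);
  for (let f = f0; f <= 2000; f += f0) {
    const gain = f < fe
      ? 1
      : Math.max(0, 1 - Math.pow(Math.log2(f) - Math.log2(fe), 2));
    if (gain <= 0) continue; // skip zero-gain harmonics
    for (let i = 0; i < n; i++) {
      samples[i] += gain * Math.sin(2 * Math.PI * f * (i / sampleRate));
    }
  }
  return samples; // ramps and level scaling omitted in this sketch
}
```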
Tonal and Rhythm Comparisons
Participant performance on tonal and rhythm comparisons was measured using a two-
alternative, two-interval, forced-choice procedure. The stimuli were the same as those generated
and used by Habibi and colleagues (2016). In each trial, participants were presented with two 2.5
second-long pre-rendered melodies with a marimba-like timbre, which contained 5
distinct pitches corresponding to the first 5 notes of the C major scale with fundamental
frequencies ranging from 261 to 392 Hz (Habibi et al., 2016). The melodies were either the same
or differed on a single note in terms of tonality or rhythm, and the listener had to choose between
the on-screen options: “Same” or “Different.” The tonal and rhythm comparison procedures
tested the subjects’ ability to identify deviations in either tonality or rhythm between pairs of
unfamiliar 5-note melodies based on Western classical rules (Habibi et al., 2013, 2014, 2016).
Tonal, or pitch, deviations involved the pitch change of a single note in the 5-note melody. The
pitch deviations were restricted to the first 5 notes of the C major scale. Rhythm deviations
involved the prolongation of a single note creating a delay in the subsequent note, the duration of
which was consequently shorter so that the offset time was unchanged. The duration of each note
ranged from 125 ms to 1500 ms to create rhythmic patterns. The standard melody had no
deviations in pitch or note duration. This assessment consisted of three repetitions of each set,
consisting of 24 trials, half of which were tonal comparisons and half of which were rhythm
comparisons. Performance was measured as the percentage of correct responses for each
comparison domain.
Interval Identification
Performance on musical interval identification was assessed with piano notes for three
note ranges near A2, A3, and A4 (110, 220, and 440 Hz, respectively). Participants were
presented with two sequentially played piano notes separated by 4, 7, or 12 semitones to
represent a major 3rd, perfect 5th, or octave interval, respectively. Note, these specific test
conditions corresponded to training levels 20, 22, and 24 of the training program. Responses
were collected using a three-alternative forced-choice procedure where the participant had to
choose between the on-screen options: “major 3rd,” “perfect 5th,” or “octave.” Each measurement
run consisted of 20 trials and there were three repetitions of each condition (A2, A3, A4) for a
total of nine measurement runs. The musical interval presented on each trial was randomly selected.
In total, each participant completed 180 trials during the interval identification assessment and
was presented with approximately 60 presentations of each of the three intervals utilized in this
assessment. The base note of the comparison was roved within an octave range centered on the
nominal condition note.
The Goldsmith Musical Sophistication Index
The level of prior musical experience was measured using the Goldsmith Musical
Sophistication Index Self-Report Inventory (MSI), a 39-item psychometric instrument used to
quantify the amount of musical engagement, skill, and behavior of an individual (Müllensiefen et
al., 2014). The questions on this assessment are grouped into five subscales: active engagement,
perceptual abilities, musical training, singing abilities, and emotion. Questions under the active
engagement category consider instances of deliberate interaction with music (e.g., “I listen
attentively to music for X hours per day”). The perceptual abilities category includes questions
about music listening skills (e.g., “I can tell when people sing or play out of tune”). Musical
training questions inquire about individuals’ formal and non-formal music practice experiences
(“I engaged in regular daily practice of a musical instrument including voice for X years”).
Singing abilities questions inquire about individuals’ singing skills and activities (e.g., “After
hearing a new song two or three times I can usually sing it by myself”). Questions under the
emotion category reflect on instances of active emotional responses to music (e.g., “I sometimes
choose music that can trigger shivers down my spine”). These topics together consider an
individual’s holistic musical ability, including instances of formal and non-formal music training
and engagement. The composite score of these subscales makes up an individual’s general
musical sophistication score. All items, except those assessing musical training, are scored on a
seven-point Likert scale with choices that range from completely disagree to completely agree
(Müllensiefen et al., 2014).
Results
Data Analysis
Results from each procedure were analyzed using a mixed-effect analysis of variance.
The analysis factors depended on the procedure, but all analyses included test group (CI users
versus listeners with no known hearing loss) as a between-subject factor and test session (pre-
versus post-training) as a within-subject factor. Planned comparisons were made between test
group and test session for all assessment tasks to test whether musical interval identification
training would improve the performance of the two groups on different musical tasks from pre-
to post-training. Effect sizes were calculated using Cohen’s method (Cohen, 1992), and
significance levels were adjusted for multiple comparisons using Bonferroni corrections. Comparisons
between individual results across measures were performed using Pearson's correlation
coefficients.
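For reference, the two summary statistics used throughout the Results can be sketched in a few lines (generic textbook formulas, not the study's analysis scripts):

```javascript
// Generic sketches of the effect-size and correction computations.
function mean(xs) { return xs.reduce((a, b) => a + b, 0) / xs.length; }

function variance(xs) {
  const m = mean(xs);
  return xs.reduce((a, b) => a + (b - m) ** 2, 0) / (xs.length - 1);
}

// Cohen's d: difference of group means over the pooled standard deviation.
function cohensD(groupA, groupB) {
  const pooledVar =
    ((groupA.length - 1) * variance(groupA) +
     (groupB.length - 1) * variance(groupB)) /
    (groupA.length + groupB.length - 2);
  return (mean(groupA) - mean(groupB)) / Math.sqrt(pooledVar);
}

// Bonferroni adjustment: divide alpha by the number of comparisons,
// e.g., 0.05 / 21 pairwise correlations = 0.0024 (as in Tables 3.3-3.5).
const bonferroniAlpha = (alpha, nComparisons) => alpha / nComparisons;
```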
Pure Tone Detection Thresholds
Figure 3.2: Stimulus Level Associated with Detection Thresholds
Detection thresholds for 250, 1000, and 4000 Hz for those with no known hearing loss (left subpanel) and for CI users (right
subpanel). The gain is in decibels with a gain of 100 dB corresponding to the maximum gain of the listening device. Smaller
symbols indicate individual thresholds. Individual thresholds for CI users with implants from Cochlear Corporation are
represented with a circle, AB with a diamond, and MED-EL with a square. Larger circles indicate group averages for each
session with error bars indicating standard errors of the means.
Figure 3.2 shows the pure tone detection thresholds measured as a calibration procedure
in dB relative to the sound card at 250, 1000, and 4000 Hz for those with no known hearing loss
and for CI users. The difference in average detection thresholds between groups was significant,
exhibiting a large effect size (F(1,18) = 10.1, p = 0.005, Cohen's d = 1.5), with CI users setting the
average software volume higher (29.2 ± 11.8) than those with no known hearing loss (12.3 ±
10.4). Importantly, these thresholds are measured relative to the system volume that participants
adjust their computers to for at-home listening. These results are not indicative of absolute
detection, but they do indicate that when participants adjust their computers and listening devices
to be comfortable, CI users had elevated detection thresholds. It is important to note as well that
relative to the self-selected comfortable listening level at 1000 Hz, CI users had elevated
detection thresholds, or a smaller relative dynamic range (F(1,18) = 3.14, p = 0.09, Cohen's d = 0.7).
For relative detection thresholds, the effect of frequency was significant (F(2,36) = 17.3, p < 0.001)
as was the interaction between frequency and participant group (F(2,36) = 4.1, p = 0.024).
The interaction effect is evident in the particularly elevated thresholds at 250 Hz for the CI users.
The effect of session (pre- versus post-training) was not significant (F(1,18) = 2.4, p = 0.14) nor
was the interaction between session and participant group (F(1,18) = 2.0, p = 0.17). The
interaction effect of frequency and session (pre- versus post-training) was not significant
(F(2,36) = 0.09, p = 0.91) nor was the interaction for frequency, session, and participant group
(F(2,36) = 0.4, p = 0.68).
Pure Tone Frequency Discrimination
Figure 3.3: Pure Tone Frequency Discrimination Thresholds
Pure tone frequency discrimination as percent difference on logarithmic scale (left y axis) and semitones (right y axis) for
frequencies 250, 1000, and 4000 Hz for participants with no known hearing loss (left subpanel) and for CI users (right subpanel).
Smaller symbols indicate individual thresholds. Individual thresholds for CI users with implants from Cochlear Corporation are
represented with a circle, AB with a diamond, and MED-EL with a square. Larger circles indicate group averages for each
session with error bars indicating standard errors of the means.
Figure 3.3 shows pure tone frequency discrimination for all participants before and after
training. The CI users had poorer discrimination compared to those with no known hearing loss
(F(1,18) = 12.84, p = 0.002). Average discrimination thresholds across frequencies and sessions
were 7.04% (or 1.18 semitones) for CI users and 1.05% (or 0.18 semitones) for those with no
known hearing loss (Cohen's d = 1.6). There was a small effect of frequency (F(2,36) = 1.95, p =
0.09) as well as a small effect for the interaction between frequency and participant group
(F(2,36) = 2.15, p = 0.074). The interaction effect can be seen in that discrimination improved
with increasing frequency for those with no known hearing loss, but CI users had best
discrimination near 1 kHz. The effect of test session was not significant (F(1,18) = 0.03, p = 0.87)
nor was the interaction between session and participant group (F(1,18) = 0.006, p = 0.94). The
interaction effect of frequency and session (pre- versus post-training) was not significant
(F(2,36) = 1.63, p = 0.21) nor was the interaction for frequency, session, and participant group
(F(2,36) = 0.0003, p = 0.99).
Fundamental Frequency Discrimination
Figure 3.4: Fundamental Frequency Discrimination Thresholds
Fundamental frequency discrimination as percent difference on a logarithmic scale (left y axis) and semitones (right y axis) for
fundamental frequencies 110, 220, and 440 Hz for participants with no known hearing loss (left subpanel) and for CI users (right
subpanel). Smaller symbols indicate individual thresholds. Individual thresholds for CI users with implants from Cochlear
Corporation are represented with a circle, AB with a diamond, and MED-EL with a square. Larger circles indicate group
averages for each session with error bars indicating standard errors of the means.
Figure 3.4 shows fundamental frequency discrimination thresholds for all participants
before and after training. The CI users had poorer discrimination compared to those with no
known hearing loss (F(1,18) = 19.3, p < 0.001). Average discrimination thresholds across
frequencies and sessions were 11.8% (or 1.93 semitones) for CI users and 0.9% (or 0.16
semitones) for those with no known hearing loss (Cohen's d = 2.1). The effect of fundamental
frequency was significant (F(2,36) = 8.6, p < 0.001) as was the interaction between fundamental
frequency and group (F(2,36) = 4.5, p = 0.017). The effect of fundamental frequency is evident in
that discrimination generally worsened with increasing fundamental frequency, which is more
pronounced in the CI users. The effect of test session was not significant (F(1,18) = 2.0, p = 0.18)
nor was the interaction between session and participant group (F(1,18) = 0.33, p = 0.57).
Averaged across groups and conditions, the effect of training on discrimination was small but
positive (Cohen's d = 0.13). The interaction effect of frequency and session (pre- versus post-
training) was not significant (F(2,36) = 0.49, p = 0.62) nor was the interaction for frequency,
session, and participant group (F(2,36) = 0.07, p = 0.93).
Tonal and Rhythm Comparisons
Figure 3.5: Tonal and Rhythm Comparisons
Tonal and rhythm comparisons as percentage of correct responses for listeners with no known hearing loss (left subpanel) and for
CI users (right subpanel). Smaller symbols indicate individual thresholds. Individual thresholds for CI users with implants from
Cochlear Corporation are represented with a circle, AB with a diamond, and MED-EL with a square. Larger circles indicate
group averages for each session with error bars indicating standard errors of the means.
Figure 3.5 shows performance on tonal and rhythm comparisons for all participants
before and after training. CI users had poorer performance on tonal comparisons compared to
those with no known hearing loss (F(1,18) = 13.2, p = 0.0019). Average performance across
sessions was 69.1% correct for CI users and 91.3% correct for those with no known hearing loss
(Cohen's d = 1.85). The effect of test session was not significant (F(1,18) = 0.35, p = 0.56) nor was
the interaction between session and participant group (F(1,18) = 0.01, p = 0.92). Neither group
significantly improved on tonal comparisons across sessions.
CI users also had poorer performance on rhythm comparisons compared to those with no known
hearing loss (F(1,18) = 21.5, p < 0.001). Average performance across sessions was 76.8% correct
for CI users and 92.5% correct for those with no known hearing loss (Cohen's d = 1.9). The effect
of test session was not significant (F(1,18) = 1.75, p = 0.2) nor was the interaction between
session and participant group (F(1,18) = 1.1, p = 0.31). Neither group significantly improved on
rhythm comparisons across sessions.
Interval Identification
Figure 3.6: Interval identification
Interval identification as percentage of correct responses for participants with no known hearing loss (left subpanel) and for CI
users (right subpanel) at 110, 220, and 440 Hz. Smaller symbols indicate individual thresholds. Individual thresholds for CI users
with implants from Cochlear Corporation are represented with a circle, AB with a diamond, and MED-EL with a square. Larger
circles indicate group averages for each session with error bars indicating standard errors of the means.
Figure 3.6 shows performance on interval identification for all participants before and
after training. CI users had poorer interval identification compared to those with no known
hearing loss (F(1,18) = 9.0, p = 0.009). Average performance across sessions was 52.4% correct
for CI users and 79.2% correct for those with no known hearing loss (Cohen's d = 1.5). There was
no significant effect of frequency (F(2,30) = 2.05, p = 0.15) but a small effect for the interaction between
frequency and participant group (F(2,30) = 2.94, p = 0.068). The effect of test session was not
significant (F(1,18) = 3.6, p = 0.076) nor was the interaction between session and participant
group (F(1,18) = 2.0, p = 0.17). Planned comparisons of the performance before and after training
indicated that, on average, the CI users improved from 48.6 to 58.2% correct (Cohen's d = 0.63).
The interaction effect of frequency and session (pre- versus post-training) was not significant
(F(2,30) = 0.2, p = 0.82) nor was the interaction for frequency, session, and participant group
(F(2,30) = 0.6, p = 0.56).
Correlation Analysis
Correlations were calculated between results from different procedures based on averages
across conditions. Correlations were calculated for all participants (Table 3.3) and for the two
participant groups separately (Table 3.4 (no known hearing loss) and Table 3.5 (CI)). While the
markers of statistical significance used in these tables are p < 0.05 (*), p < 0.01 (**), p < 0.0024 (^x),
and p < 0.001 (***), only the correlations with p < 0.0024 (^x) or p < 0.001 (***)
were statistically significant under the stringent Bonferroni-adjusted criterion, which adjusts alpha
from 0.05 to 0.05/21 = 0.0024 (21 pairwise comparisons among the seven measures).
Table 3.3. Correlations between Results from Different Procedures Averaged across Conditions
For clarity, only the correlation magnitudes are displayed, but all comparisons were congruent in that better performance on one
measure corresponded with better performance on another. Correlation coefficients associated with p-values less
than 0.05 are emboldened. Abbreviations: detection thresholds (DT), frequency discrimination thresholds (FDT), fundamental
frequency discrimination thresholds (F0DT), tonal comparisons (TC), rhythm comparisons (RC), interval identification (II),
musical sophistication index (MSI); p < 0.05 (*), p < 0.01 (**), p < 0.0024 (^x), and p < 0.001 (***). Note that only the
correlations with p < 0.0024 (^x) or p < 0.001 (***) were statistically significant under the stringent Bonferroni-adjusted
criterion, which adjusts alpha from 0.05 to 0.05/21 = 0.0024.
FDT F0DT TC RC II MSI
DT 0.52* 0.55* 0.41 0.59** 0.46* 0.52*
FDT 0.94*** 0.81*** 0.82*** 0.92*** 0.72***
F0DT 0.88*** 0.88*** 0.93*** 0.73***
TC 0.90*** 0.86*** 0.75***
RC 0.82*** 0.70***
II 0.75***
Table 3.4. Procedure Correlations for No Known Hearing Loss
As for Table 3.3 but only including those with no known hearing loss.
FDT F0DT TC RC II MSI
DT 0.14 0.03 0.03 0.21 0.14 0.35
FDT 0.97*** 0.88** 0.83* 0.91** 0.79*
F0DT 0.83* 0.81* 0.86* 0.69
TC 0.94^x 0.86* 0.68
RC 0.83* 0.61
II 0.93^x
Table 3.5. Procedure Correlations for CI Users
As for Table 3.3 but only including CI users.
FDT F0DT TC RC II MSI
DT 0.26 0.28 0.04 0.51 0.15 0.35
FDT 0.85*** 0.58* 0.62* 0.84*** 0.55
F0DT 0.77^x 0.76** 0.94*** 0.63*
TC 0.77^x 0.74** 0.67*
RC 0.63* 0.61*
II 0.55
Considering the correlations for all participants in Table 3.3, all correlations, except between
detection thresholds and tonal comparisons (p = 0.07), were significant, indicating the general
trend that the best performing participants were consistent across procedures. While detection
thresholds were correlated with other measures, the explained variance was not as high as for the
other comparisons. These correlations with detection thresholds were likely driven by group
effects with CI users having elevated detection thresholds and consistently poorer performance
on other measures. This notion is supported by the fact that none of the within-group correlations
were significant for comparisons with detection thresholds. The low-level measures of pure tone
and fundamental frequency discrimination were highly correlated with the higher-level measures
of tonal and rhythm comparisons and interval identification. The strength of these correlations
generally held when considering correlations within each participant group. For CI users, both
pure tone and fundamental frequency discrimination were particularly well correlated to interval
identification. The strong relationship between frequency discrimination and interval
identification suggests that training on one of these dimensions could strengthen the other,
although it is important to note that no training effects were found in this study. While
fundamental frequency discrimination produced the highest correlation with interval
identification, the other assessments were all significantly correlated with interval identification
as well. Multiple regression analyses were calculated to determine which pairs of assessments
including an interaction term provided the highest joint correlation with interval identification.
The highest correlation was observed between interval identification with a multiple regression
analysis of fundamental frequency discrimination and MSI scores, which produced a correlation
coefficient of 0.97 when the interaction between measures was included and 0.94 when the
interaction was not modeled. In general, the correlations between assessments were strongly
interdependent (additional variance was not well explained by combining measures), with the
most notable exception that jointly modeling MSI scores and fundamental frequency
discrimination produced the largest correlation.
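Written out, the joint model just described takes the standard form of a two-predictor linear regression with an interaction term; the β coefficients below are illustrative symbols (the study reports only the resulting correlation coefficients, 0.97 with and 0.94 without the interaction term):

$$\widehat{\mathrm{II}} = \beta_0 + \beta_1 \cdot \mathrm{F0DT} + \beta_2 \cdot \mathrm{MSI} + \beta_3 \cdot (\mathrm{F0DT} \times \mathrm{MSI})$$

Dropping the β3 term yields the no-interaction variant.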
Figure 3.7: Comparisons of Individual Results from Different Procedures
Procedure correlations based on averages across conditions. For each comparison, each symbol represents the average measure
for each individual participant averaged across conditions and repetitions. Individual thresholds for CI users with implants from
Cochlear Corporation are represented with a circle (red), AB with a diamond (dark red), and MED-EL with a square (light red).
As an example of specific relationships, Figure 3.7 compares performance on pure tone
frequency discrimination, tonal and rhythm comparisons, and interval identification with
fundamental frequency discrimination. Participants who had better fundamental frequency
discrimination for complex tones tended to have better performance on all other measures. As a
second example of specific relationships, Figure 3.8 compares performance on detection
thresholds, pure tone frequency discrimination, fundamental frequency discrimination, tonal
comparisons, rhythm comparisons, and interval identification with MSI scores. Participants who
had higher MSI scores—in particular, those with NH—tended to have better performance on all
other measures.
Figure 3.8: Musical Sophistication Index (MSI) vs Individual Results from Procedures
Comparisons of Musical Sophistication Index (MSI) with individual results from different procedures based on averages across
conditions. For each comparison, each symbol represents the average measure for each individual participant averaged across
conditions and repetitions. Individual thresholds for CI users with implants from Cochlear Corporation are represented with a
circle (red), AB with a diamond (dark red), and MED-EL with a square (light red).
Details of the Training Program
Figure 3.9: Number of Cumulative Failed Runs across Levels for Individual Participants
Each line ends when the participant completed two weeks of training or reached the final level of the training program. Note, the
ordinate has a different scale for the two participant groups. The assessment conditions used for interval identification correspond
to training levels of 20, 22, and 24.
Figure 3.9 shows the number of cumulative failed runs during training across difficulty
levels for individual participants. The purpose of reporting training progress in terms of
cumulative fails was to highlight which subjects had the most difficulty completing the training
task at specific difficulty training levels and overall. Subjects H2, H6, and H7 had perfect
performance on all difficulty training levels, so their data points overlap and only H7 is visible.
Subject H5 had impressive performance as well. These four subjects were all accomplished
musicians, which is reflected in their exceptional performance. The other three subjects (H1, H3,
and H4) were non-musicians, with H4 struggling the most in this group. Subject C2 was a
musician from a young age before receiving the CI, which may have contributed to this strong
performance. Subjects C16 and C32 were both avid musicians who passed most difficulty
training levels with ease. Subject C22 was an accomplished musician but also a bimodal listener
who had not had much focused rehabilitation with the CI alone, which may have contributed to the
difficulty getting past even the first level.
Discussion
The primary aim of this study was to characterize performance on assessment tasks for CI
users and listeners with no known hearing loss before and after two weeks of online musical
interval training. Pre-training and post-training assessments measured pure tone and fundamental
frequency discrimination, tonal and rhythm comparisons, and interval identification. The
overarching hypothesis motivating this study is that both low-level psychophysical access to
pitch cues as well as higher-level labeling of intervals limits identification accuracy in CI users,
and, to a certain extent, those with no known hearing loss. Strong correlations were found
between low-level measures of frequency and fundamental frequency resolution with higher-
level rhythm and tonal comparisons, interval identification, and musical sophistication, thus
supporting the first part of the overarching hypothesis. Furthermore, dedicated training on
interval identification during this study provided CI participants opportunity to build (or rebuild)
the association between interval and naming convention, along with experience with assessment
tasks requiring pitch judgments.
The strength of the relationship between interval identification and frequency
discrimination is well explained by separating the skills needed to perform interval identification
into two components. The listener must first hear the difference in pitch between two successive
notes and, second, label the magnitude of the pitch difference with the corresponding interval.
Challenged in this way, participants use increasingly fine distinctions between interval
magnitudes to determine the interval label. It was surprising then that a few listeners with no
known hearing loss had pitch resolution at or worse than 1 semitone (note, one semitone is
approximately a 6% difference in fundamental frequency). This could have been a function of
age (p < 0.02 for correlations with PT, F0, II, and MSI), experience, unknown hearing loss, or
even attention. Most CI users had pitch resolution worse than two semitones, although age
was not a factor for this group. This poor resolution makes it difficult to form magnitude judgments, except for
stark interval comparisons such as a major 2nd versus an octave. One CI user, who had pitch
resolution better than a semitone, was able to correctly label 80% of intervals on the assessment
task. While it was not guaranteed that the higher-level task of interval labeling would directly
influence performance on lower-level psychoacoustic tasks in this brief training, given the
strength of the relationship between interval identification and frequency resolution, it is possible
that more extensive practice at interval labeling may transfer to simpler tasks such as pitch
ranking and melodic contour identification, although this study did not find any evidence for this
claim. It has also been shown that incidental listening to musical materials can improve
resolution of those materials (Little et al., 2019).
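For reference, the 6% approximation mentioned above follows directly from the equal-tempered semitone ratio:

$$2^{1/12} \approx 1.05946 \quad\Rightarrow\quad \text{one semitone} \approx 5.9\% \text{ in frequency}$$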
The absence of significant learning in both participant groups should be taken into
consideration when evaluating the effectiveness of training strategies. It has been proposed that
auditory perceptual learning requires both stimulus exposure and execution of the task to be
learned—provided in the current study by task practice—and a sufficient amount of practice per
day (Wright, 2013). These requirements for learning must be balanced with common barriers to
training paradigm success—fatigue and attrition. Studies of computer-based auditory training
programs for individuals with hearing loss have varying definitions of retention and many
studies do not report their compliance level (Henshaw & Ferguson, 2013). The present study
aimed to make musical interval training accessible and convenient by providing an online
training program that participants could use at home and by limiting training sessions to 20
minutes per day for two weeks. This is a relatively brief training protocol compared to other
training programs for CI users (Looi et al., 2012). While this brief period of training likely
contributed to the 100% retention rate, it may not have provided enough practice needed for
learning, leading to the lack of improved performance on the trained task. Further investigation
into musical interval identification training in CI users is necessary to clarify the optimal amount
of daily and total training needed for learning.
An additional consideration is the difficulty of using the online interface given the age of
participants in the CI user group. Technological literacy is generally lower among older
populations and the mean age of the CI users was 62.9 years compared to 42.3 years for the
listeners with no known hearing loss. Multiple participants reported difficulty using the online
interface throughout the study. This may have made learning through the online interface
difficult and training sessions may not have progressed as intended. While age did not
significantly correlate with performance in the CI group, it did for the group with no known
hearing loss for pure tone frequency discrimination (p = 0.014), fundamental frequency
discrimination (p = 0.017), and interval identification (p = 0.001).
The assessment used for interval identification may also have been too difficult for the CI
users. The conditions for the assessment procedure presented the participants with three types of
intervals (major 3rd, perfect 5th, and octave) over three root note frequency ranges (octave ranges
centered on A2, A3, and A4). These conditions correspond to training levels 20, 22, and 24 of
the training program. Many participants in the CI group did not progress beyond level 22 within
the two-week training period. Therefore, one explanation for the lack of improvement in musical
interval identification after training is that some participants may have only been exposed to
easier levels of musical interval identification.
Our training protocol required participants to learn a difficult task in a brief amount of
time. Musical intervals are a relatively abstract concept and represent the pitch ratio, a concept
that is difficult to grasp without prior musical training. Given that it is well-known that interval
labeling is a skill that cannot be learned without dedicated musical training, a control group of
participants who did not train on interval labeling was not included. It is possible that task
familiarity had a small impact on participant performance that cannot be assessed without a
control group, since tasks in the pre- and post-training assessments were identical. However, task
familiarity is unlikely to have contributed significantly in this study given that there were no
significant improvements in performance found across sessions. Furthermore, the interval
labeling task was chosen due to its challenging nature, requiring participants to attend to multiple
musical interval stimuli in order to progress through the difficulty training levels. Studies have
suggested that an auditory task must be sufficiently difficult to result in learning, since adequate
attention is a requirement of learning, but there is evidence that exceptionally
difficult tasks can still facilitate perceptual learning (Amitay et al., 2006; D. Moore & Amitay,
2007). However, the extent to which task difficulty limits the higher-level labeling aspect of interval
identification is poorly understood.
Musical interval identification also requires a listener to distinguish between two pitches
and many listeners without prior musical training have poor resolution. McDermott and
colleagues demonstrated that NH non-musicians and even some amateur musicians had pitch
interval thresholds greater than a semitone for pure and complex tone conditions (J. H.
McDermott et al., 2010). They found that interval resolution was up to 8 times worse than
frequency resolution, indicating that the frequency resolution necessary to discriminate between
intervals of one semitone difference in width (e.g., minor second vs major second) may need to
be better than 1 semitone.
Even poorer pitch resolution is demonstrated in CI users (e.g., Pretorius and Hanekom,
2008; Goldsworthy, 2015). The ability to distinguish between two pitches is affected by the cues
(temporal and place-of-excitation) provided by the processor for different stimuli (see Figure 3.1
for representative encoding of musical notes). Cochlear Corporation (9/13 subjects) generally
discards temporal fine structure while providing temporal cues through F0 envelope modulation,
whereas MED-EL (2/13 subjects) and AB (2/13 subjects) attempt to encode more temporal cues through
their processors, especially at lower frequencies (Arnoldner et al., 2007; Gazibegovic et al.,
2010; Wouters et al., 2015). Swanson and colleagues (2019) unpacked the temporal and place-
of-excitation cues for pure tones and harmonic complexes for the Cochlear Corporation signal
processing strategy. They showed that pure tones provide only place cues to pitch, with the filter
bandwidth at different frequencies having a substantial effect on the pitch resolution. For pure
tones near 1000 Hz, variation in pure tone frequency will produce variation in the relative
amplitude of two neighboring filters, hence variation in currents on neighboring electrodes. For
pure tones near 250 Hz, this mechanism does not work as well because the lowest filter is
centered at 250 Hz, so there is no lower neighbor. For pure tones near 4000 Hz, the filters are
much wider, and if two tones are both within one filter passband, then there may be little
difference in the two corresponding stimulation patterns. This may explain the general pattern of
results in Figure 3.3, with poor resolution at both lower (<250 Hz) and higher (>4 kHz)
frequencies and better resolution between 250 Hz and 4 kHz (Pretorius & Hanekom, 2008). Our
rationale for using pure tones of 250, 1000, and 4000 Hz is to broadly characterize spectral
resolution as conveyed by place pitch cues across the electrode array. Pure tones primarily
provide place pitch cues with the exception of strategies that attempt to provide timing cues for
pure tones. While MED-EL and AB would attempt to provide temporal cues for 250 Hz pure
tones, it does not appear to have broad effect on individual performance in Figure 3.3. Harmonic
complexes below 220 Hz would have good temporal cues provided by amplitude modulation at
the fundamental because the individual harmonics are not resolved by the ACE filter bank,
between 220 and 440 would have a mixture of the two cues, and above 440 Hz would have only
place cues because the individual harmonics are resolved (Swanson et al., 2019). The results of
Figure 3.4 suggest that subjects may have been more sensitive to temporal pitch cues than place
pitch cues. Interval identification was done with musical piano notes to provide the richest
encoding of musical tonality (Helmholtz, 1885; Siedenburg et al., 2019). The cues provided by
these notes varied based on frequency, with trials presenting place, temporal, or a mixture of
cues. Although the higher-level assessments could have been designed with stimuli to isolate a
single pitch cue, as was done by Vandali and colleagues (2015), the present study focused on
providing musical notes with the most potential cues for pitch and interval judgments, leaving
the cues chosen up to each subject’s clinical signal processing strategy.
Considering the relationships between pitch resolution and music perception, this study
demonstrates that pure tone frequency and fundamental frequency discrimination are both highly
correlated with musical interval identification. This correlation is anticipated given that a musical
interval is composed of two different pitches. These correlations suggest that improving access
to low-level cues for pure tone frequency and/or fundamental frequency perception could
improve higher-level musical abilities. To improve perception of complex listening situations
and musical perception for CI users, future signal processing strategies should improve access to
stimulation cues that support pitch perception, whether that be through better coding of place-of-
excitation cues, better coding of temporal modulation cues, or a synergy of these two (Arnoldner
et al., 2007; Erfanian Saeedi et al., 2017; Firszt et al., 2009; Francart et al., 2015; Q. J. Fu &
Shannon, 1999a; Grasmeder et al., 2014; Laneau et al., 2006; Leigh et al., 2004; Lorens et al.,
2010; Luo et al., 2012; Müller et al., 2012; Rader et al., 2016; Riss et al., 2008, 2014, 2016; Stohl
et al., 2008; Vermeire et al., 2010). Concomitant with better signal processing, structured aural
rehabilitation programs should be designed to reintroduce CI users to the subtle stimulation cues
for pitch perception. Given the correlations of low-level pitch perception with higher-level
musical perception tasks, improvement in signal processing and dedicated aural rehabilitation
will likely improve musical enjoyment and appreciation for CI users.
Chapter 4: The Effects of Individual Differences and Perceptual
Learning on Stimulation Rate Discrimination in Cochlear Implant
Users
The work described in this chapter was published in PLOS ONE.
Bissmeyer S.R.S., Hossain S., Goldsworthy R.L. (2020) Perceptual learning of pitch provided by
CI stimulation rate. PLoS ONE 15(12): e0242842. https://doi.org/10.1371/journal.pone.0242842
Introduction
In the auditory system, acoustic frequency is encoded in the place-of-excitation and
timing properties of the auditory nerve response. Place coding of frequency is produced by
cochlear mechanics and captured by auditory neurons giving rise to the well-established
tonotopy of the auditory system (Muniak et al., 2016). Temporal coding of frequency is captured
by phase-locked firing synchronous to acoustic frequencies at least as high as 3 kHz (van den
Honert & Stypulkowski, 1987; Dynes & Delgutte, 1992; Dreyer & Delgutte, 2006; Hill et al.,
1989; Shepherd & Javel, 1997), and arguably as high as 10 kHz (Heinz et al., 2001), although
there is considerable debate in the field on the exact limit of temporal frequency coding
(Verschooten et al., 2019). CIs use place and timing of stimulation by allocating higher acoustic
frequencies to more basal electrodes and by modulating constant-rate stimulation with temporal
envelopes (Wouters et al., 2015). Historically, sound processing for CIs has limited temporal
cues to modulation frequencies less than 300 Hz (Shannon et al., 2004). Limiting stimulation
timing in such a manner discards TFS (B. S. Wilson et al., 2004), which if preserved might
improve pitch perception and speech comprehension in noise for CI users (Arnoldner et al.,
2007; Heng et al., 2011; Müller et al., 2012; Smith et al., 2002; A. E. Vandali et al., 2005;
Vermeire et al., 2010). The present study considers individual differences and perceptual
learning for stimulation rate discrimination when provided in a clear and consistent manner using
single-electrode stimulation.
Stimulation rate was one of the first psychophysical dimensions explored in CI science,
and studies have shown that recipients hear increasing pitch with increasing stimulation rate, but
resolution decreases above 300 Hz (Carlyon et al., 2010; H. J. McDermott & McKay, 1997;
McKay et al., 2000; Shannon, 1983; Tong et al., 1982; Tong & Clark, 1985; Townshend et al.,
1987b; Zeng, 2002). Several factors may contribute to the loss of resolution at higher rates,
lack of experience possibly among them. In general, many aspects of hearing improve with
experience. Speech comprehension and auditory awareness dramatically improve over the first
year after CI activation, and even after years of experience further benefits can be derived from
auditory training (Q.-J. Fu & Galvin, 2007; Shafiro et al., 2015; A. Vandali et al., 2015).
Training can improve psychophysical abilities and speech comprehension (Argstatter et al.,
2016; Q.-J. Fu & Galvin, 2007; Hutter et al., 2015). Since variable stimulation rate is not
typically used by clinical devices, it is possible that recipients could learn to use it if provided
access to the new information. Efforts to restore TFS to CI stimulation have demonstrated mixed
results, but certain studies indicate benefits for speech and music perception emerging with
experience (Arnoldner et al., 2007; Müller et al., 2012; Vermeire et al., 2010; Wouters et al.,
2015).
Perceptual learning has seldom been explored specifically for CI stimulation rate.
Goldsworthy and Shannon (2014) found that rate discrimination improves with training, with
benefits observed for rates as high as 3520 Hz (Goldsworthy & Shannon, 2014). This perceptual
learning of stimulation rate as a cue for pitch is similar to that shown for NH non-musicians, who
can improve their frequency discrimination from 1 to 0.1%, equivalent to musician level
performance, with training (Micheyl et al., 2006). This musician advantage has been shown to be
preserved across age and degrees of hearing loss (Bianchi et al., 2019). Considering the
remarkable plasticity of tonotopic coding of pitch (Reiss et al., 2014), one might predict
comparable plasticity of temporal pitch mechanisms (Kilgard & Merzenich, 1998).
Hypothetically, stimulation rate discrimination may also depend on the health of the
auditory nerve. Rate discrimination varies across subjects and within subjects across electrodes
(Baumann & Nobbe, 2004; Kong et al., 2009; Zeng, 2002). Variations in neural health may limit
stimulation rate sensitivity through diminished population coding and diminished neural
synchrony. The present study considers individual differences in psychophysical measures of
spatial tuning and temporal integration as predictors of stimulation rate discrimination. Spatial
tuning is quantified through forward-masked detection thresholds and average detection
thresholds, which reflect multiple aspects of spatial tuning including electrode-neural geometry,
local neural health, and tonotopic pitch associated with different places of excitation (Bierer &
Faulkner, 2010a; McKay, 2012; Zhou, 2016; Zhou et al., 2019). Others have suggested that
decoding of TFS in the ascending auditory pathway depends on precise phase relationships across
auditory nerve fibers, characteristics that diminish with impoverished neural health (Golding &
Oertel, 2012). Degradation of the spatial patterning of neural health may degrade the temporally
precise mechanisms observed in the cochlear nucleus that have been suggested as underlying
encoding of TFS into average rate codes (Golding & Oertel, 2012). Consequently, forward-masked
and detection thresholds are considered here to test the hypothesis that degradations
in spatial tuning will affect temporal pitch mechanisms that rely on precise integration times
across fibers. Further, even in the absence of gross degradations of spatial tuning, it is
hypothesized that temporal precision supported by fast-acting ion channels and vesicle release
will contribute to individual differences observed in rate discrimination. Consequently, the
psychophysically derived metric of multi-pulse integration is calculated from measured detection
thresholds (without masking) to quantify neural integration and potential neural degeneration
(Zhou & Pfingst, 2016a, 2016b).
There is some evidence that monopolar mode may improve performance for measures of
intensity discrimination, speech recognition, and rate discrimination (Lehnhardt et al., 1992;
Pfingst et al., 1997; Zhou et al., 2019; Zwolan et al., 1996), while other studies have shown no
consistent benefit from stimulation mode (Drennan et al., 2003; Macherey et al., 2011; Morris &
Pfingst, 2000; Pfingst et al., 2001). There has also been no agreed-upon advantage for electrode
location along the current electrode array, with a tendency toward an apical electrode location
benefit (Goldsworthy & Shannon, 2014; Gordon et al., 2007; Middlebrooks & Snyder, 2007,
2010; Nelson et al., 2011), while other studies focus on finding local extrema at various
individual locations along the array (Zhou, 2016; Zhou et al., 2019; Zhou & Pfingst, 2014,
2016a, 2016b). The configurations in the present study were focused on comparing bipolar to
monopolar stimulation mode and the most distal electrode locations feasible along the array.
The present study was designed to examine individual differences and perceptual
learning of stimulation rate discrimination when provided in a clear and consistent manner.
Individual differences were examined to test the hypothesis that stimulation rate discrimination
can be predicted by psychophysical measures of spatial tuning and temporal integration, which
serve as a proxy for estimating the health of the auditory nerve. The effect of psychophysical
experience over three test sessions was examined to test the hypothesis that rate pitch
discrimination improves with focused psychophysical training. The results provide insight into
the extent that variable stimulation rates could be used to improve CI pitch perception.
Methods
Subjects
Seven CI users participated in this study. Four bilateral users were tested and completed
the protocol in each ear sequentially, with the first ear tested randomly selected. All subjects
were implanted with devices from Cochlear Corporation. Relevant subject information is
provided in Table 4.1. Participants provided informed consent and were paid for their
participation. The University of Southern California’s Institutional Review Board approved the
study.
Table 4.1. Subject Demographics

| Subject | Gender | Ear tested | Etiology | Age at Onset of Hearing Loss (yrs.) | Age at Deafness | Age at Implantation | Age at Time of Testing | Bipolar Mode |
|---|---|---|---|---|---|---|---|---|
| C1 | M | Both | Meniere's | 39 | L: 46, R: 39 | L: 46, R: 43 | 46 | BP+3 |
| C2 | F | Both | Unknown | 15 | 22 | L: 23, R: 27 | 33 | BP+2 |
| C3 | F | Both | Progressive Nerve Loss | 40 | 53 | L: 54, R: 58 | 71 | BP+3 |
| C4 | M | Both | Progressive Nerve Loss | Birth | 7 | L: 44, R: 57 | 57 | BP+4 |
| C5 | M | Right | Noise Induced | 50 | 50 | 70 | 79 | BP+3 |
| C6 | F | Right | Progressive Nerve Loss | 20 | 50 | 64 | 67 | BP+3 |
| C7 | F | Left | Noise Induced | 20 | 44 | 54 | 66 | BP+1 |

Psychophysical Testing
Overview. Subjects participated in a single-electrode psychophysical protocol with all
procedures conducted using the USC CI Research Interface (Shannon, 2015; Shannon et al.,
1990). Procedures were scheduled during three sessions, with one week between sessions. All
procedures used cathodic-leading biphasic pulse trains and always provided correct-answer
feedback. Every measure was tested at two locations (apical, basal) for two stimulation
configurations (bipolar, monopolar) with counterbalancing of all locations and configurations
across test sessions. During the initial session, loudness growth functions, baseline rate
discrimination thresholds, and forward-masked thresholds were measured. On the second
session, two hours of rate discrimination training was provided, and rate discrimination and
forward-masked thresholds were measured again. The third session was identical to the second session,
with two hours of psychophysical training followed by rate discrimination and forward masking
measures.
Detection Thresholds and Comfort Levels as a Function of Stimulation Rate
Detection thresholds and comfort levels were measured as a function of stimulation rate
to provide loudness balancing in rate discrimination procedures. Detection thresholds and
comfort levels were measured using a method of adjustment. Subjects used a graphical user
interface with six sliders controlling different stimulation rates from 50 to 1600 Hz in octave
intervals. After adjusting a slider, the subject would hear a 400 ms pulse train comprised of
biphasic pulses with 50 µs phase durations and 50 µs interphase gaps, the stimulation rate
corresponding to the slider and the current level corresponding to the slider height (values
rounded to the nearest clinical unit). Subjects were instructed to adjust all six sliders to detection
threshold and then to values that were loud but comfortable. The subjects were instructed to
loudness balance their detection and loud but comfortable levels across frequencies. The bipolar
mode chosen for each subject was the narrowest configuration in which they could still reach the
loud but comfortable level (100% dynamic range). The resulting detection thresholds and
comfort levels were fit with a logistic equation of the form:
$$Y(x) = U - \frac{U - L}{\left(1 + Q\,e^{-Bx}\right)^{1/v}},$$
where U and L are the upper and lower limits of the subject’s dynamic range, Q is related to the
current level at 100 Hz, B is the rate by which the current decreases over the frequency range, x
is frequency expressed as log2(frequency/100), and v controls asymptotic growth. Various
equations were explored, and the fitted logistic equation provided the best compromise in terms of
shape for the nonlinear decrease in levels for increasing rates and had the lowest adjusted mean
squared error out of the functions considered. These were used to balance loudness in subsequent
rate discrimination procedures.
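As an illustration of this fitting step, below is a minimal Python sketch of fitting the logistic above to levels measured at octave-spaced rates. The data values, starting parameters, and function names are illustrative assumptions of this sketch, not drawn from the study's actual code.

```python
import numpy as np
from scipy.optimize import curve_fit

def loudness_logistic(x, U, L, Q, B, v):
    """Generalized logistic from the text: Y(x) = U - (U - L) / (1 + Q*exp(-B*x))**(1/v),
    where x = log2(rate / 100) and U, L bound the subject's dynamic range."""
    return U - (U - L) / (1.0 + Q * np.exp(-B * x)) ** (1.0 / v)

# Hypothetical detection thresholds (clinical units) at octave-spaced rates.
rates = np.array([50.0, 100.0, 200.0, 400.0, 800.0, 1600.0])   # Hz
levels = np.array([160.0, 152.0, 141.0, 133.0, 128.0, 126.0])  # clinical units
x = np.log2(rates / 100.0)

# Fit with loose starting values; U and L start at the data extremes.
p0 = [levels.max(), levels.min(), 1.0, 1.0, 1.0]
params, _ = curve_fit(loudness_logistic, x, levels, p0=p0, maxfev=20000)

# Interpolate the fitted curve at an arbitrary rate for loudness balancing.
level_at_300 = loudness_logistic(np.log2(300.0 / 100.0), *params)
```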
Rate Discrimination Thresholds
Rate discrimination thresholds were measured using a two-interval, two-alternative,
forced-choice procedure in which subjects were asked to select the interval that was higher in
pitch. Both the standard and target were 400 ms pulse trains comprised of biphasic pulses with
50 µs phase durations and 50 µs interphase gaps. The standard stimulation rates tested were
nominally 100, 200, 400, and 800 Hz. The rate of the target stimulus was adaptively controlled.
For each interval, the separate amplitudes of the standard and target were randomly roved
between 90 and 100% (uniform distribution) of the subject’s dynamic range as fitted by the
logistic function.
The initial difference between standard and target stimulation rates was 40%. This
difference was decreased by a step following correct responses and increased by three steps
following incorrect responses (Equation 3.1), which converges to 75% detection accuracy
(Kaernbach, 1991). The initial step size was 0.9 and was decreased by a factor of 2^{-1/2}
after each reversal until obtaining a value of √2 on the fourth reversal, at which point the
step size was held constant at √2 (i.e., requiring two correct responses to halve the rate
difference). Adaptive runs continued for a total of 8 reversals and the discrimination
threshold was calculated as the average of the last 4 reversals.
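The following Python sketch illustrates the weighted up-down logic described above. The simulated listener and the initial step factor are illustrative assumptions of this sketch, not values or code from the study.

```python
import numpy as np

rng = np.random.default_rng(1)

def staircase_rate_discrimination(p_correct_fn, base_rate=100.0,
                                  start_diff=40.0, n_reversals=8):
    """Weighted up-down staircase (Kaernbach, 1991): the percent rate difference
    is divided by `step` after a correct response and multiplied by step**3 after
    an incorrect one, converging on 75% correct. The step shrinks by 2**-0.5 at
    each reversal down to a floor of sqrt(2), after which two correct responses
    are needed to halve the rate difference."""
    diff, step = start_diff, 2.0  # initial step factor chosen for illustration
    reversals, last_dir = [], 0
    while len(reversals) < n_reversals:
        correct = rng.random() < p_correct_fn(base_rate, diff)
        direction = -1 if correct else +1
        diff = diff / step if correct else diff * step ** 3
        if last_dir and direction != last_dir:          # track reversals
            reversals.append(diff)
            step = max(step * 2 ** -0.5, np.sqrt(2))    # shrink step to the floor
        last_dir = direction
    return np.mean(reversals[-4:])  # threshold: average of the last 4 reversals

# Toy listener whose just-noticeable difference grows with base rate (assumption).
def toy_listener(base_rate, diff_percent):
    jnd = 8.0 * (base_rate / 100.0) ** 0.8
    return 1.0 / (1.0 + np.exp(-(diff_percent - jnd) / (0.3 * jnd)))

threshold = staircase_rate_discrimination(toy_listener, base_rate=200.0)
```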
Psychophysical Training of Stimulation Rate Discrimination
Psychophysical training of stimulation rate discrimination was conducted using single-
electrode rate discrimination procedures. The procedure used was a two-interval, two-alternative,
forced-choice procedure in which the stimulation rate difference between the standard and target
intervals was held constant but the base rate was adaptively increased to provide training at
increasingly higher stimulation rates (Goldsworthy & Shannon, 2014). The standard and target
stimuli were 400 ms pulse trains comprised of biphasic pulses with 50 µs phase durations and
50 µs interphase gaps. Stimulation current levels were controlled using the fitted logistic functions
to detection thresholds and comfort levels. The initial value of the standard rate was 100 Hz, and
the target stimulation rate was specified to be 20% higher than the standard (i.e., 120 Hz).
Stimulation rates were constrained between 100 and 1600 Hz. Following correct responses, both
the standard and target stimulation rates were increased by a step; following incorrect responses,
both the standard and target stimulation rates were decreased by three steps (75% detection
accuracy) (Kaernbach, 1991). The initial step size was 2^{1/3} (i.e., the base rate doubled
after 3 correct responses), but was reduced by a factor of 0.9 until obtaining 2^{1/12} (i.e.,
the base rate was increased by a semitone after each correct response). Adaptive runs continued
for 12 reversals and the upper limit of discrimination was calculated as the average of the
last 6 reversals. This procedure was conducted for 8 conditions consisting of the 4 combinations
of apical/basal electrodes and monopolar/bipolar stimulation modes, first tested using a 20%
rate difference, then tested using a 10% rate difference. Total training time was approximately
2 hours.
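A companion Python sketch of the training track follows; here the percent difference is fixed and the base rate itself adapts. Applying the step reduction at each reversal is one plausible reading of the text, and the stand-in listener is again an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

def training_run(p_correct_fn, diff_percent=20.0, n_reversals=12):
    """Training staircase: the standard/target difference is fixed while the base
    rate moves up one step after a correct response and down three steps after an
    incorrect one (75% accuracy; Kaernbach, 1991). The step starts at 2**(1/3)
    and shrinks by a factor of 0.9 down to 2**(1/12), i.e., one semitone per
    correct response, with rates constrained to 100-1600 Hz."""
    base, step = 100.0, 2 ** (1 / 3)
    reversals, last_dir = [], 0
    while len(reversals) < n_reversals:
        correct = rng.random() < p_correct_fn(base, diff_percent)
        direction = +1 if correct else -1
        base = base * step if correct else base / step ** 3
        base = min(max(base, 100.0), 1600.0)            # constrain the base rate
        if last_dir and direction != last_dir:
            reversals.append(base)
            step = max(step * 0.9, 2 ** (1 / 12))       # shrink toward semitone steps
        last_dir = direction
    return np.mean(reversals[-6:])  # upper limit: average of the last 6 reversals

# Hypothetical listener who discriminates reliably below ~600 Hz.
listener = lambda base, diff: 0.95 if base < 600.0 else 0.55
upper_limit = training_run(listener)
```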
Forward-masked Detection Thresholds
Forward-masked detection thresholds were measured for combinations of masker
locations (apical, basal) and stimulation configurations (bipolar, monopolar) using a three-
interval, three-alternative, forced-choice procedure for a set of probe electrode locations. The
masker and probe were 500 Hz pulse trains comprised of biphasic pulses with 50 µs phase
durations and 50 µs interphase gaps. The masker and probe were 200 and 20 ms in duration,
respectively. The probe was presented following the masker with the first phase of the probe
starting 2 ms after the first phase of the last masker pulse. For this procedure, two of the intervals
only contained the masker, while the randomly assigned target interval included the probe
following the masker. The probe locations evaluated were 0, 1, 2, and 4 electrodes away from the
masker electrode. The initial value of the probe stimulus was set to the subject’s comfort level in
clinical units. The level of the probe was decreased by a step following correct responses and
increased by three steps following incorrect responses, which converges to 75% detection
accuracy (Kaernbach, 1991). The initial step for a run was 8% of the subject’s dynamic range in
clinical units, and the step was decreased by a factor of 2^{-1/2} after each reversal until obtaining a
value of 2% on the fourth reversal, at which point the step was held constant at 2% of the
subject’s dynamic range. An adaptive run continued for a total of 10 reversals and the forward-
masked threshold for the run was calculated as the average of the last 6 reversals.
Statistical Methods
Stimulation rate discrimination thresholds were measured for all combinations of
stimulation modes (bipolar, monopolar), electrode locations (apical, basal), and stimulation rates
(100, 200, 400, 800 Hz). For each condition, rate discrimination thresholds were measured with
three repetitions and test sessions were repeated once a week for three weeks. Stimulation rate
training was administered each week through a staircase method that provided subjects with
training at progressively higher rates depending on their ability to discriminate rates. The statistical method
implemented for rate discrimination is a multi-factorial repeated measures analysis of variance
(ANOVA) with second-order interactions for the factors of subject, stimulation mode, electrode
location, test session, and stimulation rate. All statistics were calculated on logarithmically
transformed rate discrimination thresholds, with the rationale for using logarithmic transforms
provided by Micheyl and colleagues (Micheyl et al., 2006). Post-hoc multiple comparisons were
implemented for significant factors and interactions.
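To make the analysis concrete, below is a minimal Python sketch of the ANOVA described here, using an ordinary least squares model with subject entered as a fixed factor (as the text describes) rather than a mixed-effects model. The long-format table, column names, and synthetic data are assumptions of this sketch.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Synthesize a long-format table: one row per adaptive run (hypothetical data);
# the 11 tested ears are treated as the levels of the subject factor.
rng = np.random.default_rng(3)
rows = [
    (f"C{s}", mode, elec, sess, rate, rate / 100.0 * rng.lognormal(np.log(8.0), 0.4))
    for s in range(1, 12)
    for mode in ("bipolar", "monopolar")
    for elec in ("apical", "basal")
    for sess in (1, 2, 3)
    for rate in (100, 200, 400, 800)
    for _ in range(3)  # three repetitions per condition
]
df = pd.DataFrame(rows, columns=["subject", "mode", "electrode", "session", "rate", "threshold"])

# Log-transform thresholds (rationale per Micheyl et al., 2006).
df["log_thresh"] = np.log10(df["threshold"])

# Main effects plus all second-order (pairwise) interactions, mirroring the text.
model = ols(
    "log_thresh ~ (C(subject) + C(mode) + C(electrode) + C(session) + C(rate)) ** 2",
    data=df,
).fit()
print(sm.stats.anova_lm(model, typ=2))
```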
Three psychophysically derived metrics of spatial and temporal tuning were calculated:
detection thresholds, multi-pulse integration, and forward masking. All psychophysically derived
metrics were calculated for each subject, stimulation mode (bipolar, monopolar), and electrode
location (apical, basal). Normalized correlation coefficients were calculated for rate
discrimination thresholds and for all psychophysically derived metrics. A coefficient of rate
discrimination thresholds was calculated for each subject based on the average rate
discrimination threshold across stimulation rates. A coefficient of detection thresholds was
calculated as the mean value across stimulation rates. A coefficient of multi-pulse integration
slopes was calculated based on linear regression of the measured detection thresholds for
stimulation rates from 50 to 800 Hz. A coefficient of forward masking was generated through
fitting forward-masked thresholds with a regression line to the probe locations separated by 0, 1,
and 2 electrodes from the masker. All coefficients were normalized on a linear scale by
subtracting the average across all conditions. Analysis of variance with second-order interactions
was implemented for all metrics for the factors of subject, stimulation mode, and electrode
location. Correlation analysis was implemented using a linear correlation procedure, producing
an R value and a p value that characterize the direction and strength of the linear relationship
and its statistical significance, respectively.
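The derived metrics and the normalization step described above can be sketched as follows in Python; slope conventions, array shapes, and the example values are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def multipulse_integration_slope(rates_hz, thresholds):
    """Multi-pulse integration: slope of detection threshold vs. log2(rate),
    fit over stimulation rates from 50 to 800 Hz (no masking)."""
    r = np.asarray(rates_hz, dtype=float)
    t = np.asarray(thresholds, dtype=float)
    keep = (r >= 50) & (r <= 800)
    return np.polyfit(np.log2(r[keep]), t[keep], 1)[0]

def forward_masking_slope(separations, masked_thresholds):
    """Forward masking: slope of masked threshold vs. masker-probe separation,
    fit over separations of 0, 1, and 2 electrodes only."""
    s = np.asarray(separations, dtype=float)
    m = np.asarray(masked_thresholds, dtype=float)
    keep = s <= 2
    return np.polyfit(s[keep], m[keep], 1)[0]

def normalize_across_conditions(values):
    """Subtract a subject's mean across conditions so correlations reflect
    within-subject differences (cf. Zhou & Pfingst, 2016a, 2016b)."""
    v = np.asarray(values, dtype=float)
    return v - v.mean()

# Hypothetical per-condition values for one subject (mode x electrode):
rate_dls = normalize_across_conditions([12.0, 15.0, 9.0, 11.0])
fm_slopes = normalize_across_conditions([-0.20, -0.35, -0.15, -0.28])
r_value, p_value = stats.pearsonr(fm_slopes, rate_dls)
```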
Results
Results were collected for seven CI users, four of whom were bilateral and were tested in
each ear. Each subject completed the three-session protocol including laboratory measures of
rate discrimination training and assessment, and metrics such as forward-masked thresholds and
equal-loudness contours as a function of stimulation rate. Results examine distribution of
measures across subjects, the effect of psychophysical training on observed thresholds, and the
predictive power of psychophysically derived metrics of spatial tuning and multi-pulse
integration.
Rate Discrimination Thresholds
Figure 4.1 shows rate discrimination thresholds for each subject averaged across
repetitions and sessions. Rate discrimination thresholds vary greatly among subjects. Figure 4.2
shows individual and median rate discrimination thresholds to highlight trends across mode and
electrode.
Figure 4.1: Individual Rate Discrimination Thresholds
Individual rate discrimination thresholds by mode and electrode across frequencies
Figure 4.2: Boxplot of Median Rate Discrimination Thresholds
Boxplot showing median rate discrimination thresholds by mode and electrode across frequencies. Filled symbols represent each
implant user’s thresholds across weekly sessions and repetitions.
Analysis of variance was implemented on the measured rate discrimination thresholds
with subject, stimulation mode, electrode location, stimulation rate, and test session (i.e., week)
as factors. Subject was significant indicating substantial variability across subjects in terms of
average discrimination thresholds (F(10, 1479) = 13.23, p < 0.001). Stimulation mode was not
significant indicating that average rate discrimination thresholds were not statistically different
between bipolar and monopolar stimulation modes (F(1, 1479) = 0.2, p = 0.6827). Similarly,
electrode location was not significant indicating that average rate discrimination thresholds were
not statistically different between apical and basal electrode locations (F(1, 1479) = 0.93, p =
0.3359). Stimulation rate, as expected given the noted deterioration of discrimination with
increasing stimulation rate, was significant (F(3, 1479) = 589.6, p < 0.001). Test session was
significant (F(2, 1479) = 6.16, p = 0.0022) indicating that average rate discrimination thresholds
improved over the course of the three-session protocol.
All second-order interactions with subject were highly significant (p < 0.001) indicating
substantial individual variability in how thresholds were affected by stimulation mode, electrode
location, and auditory training. The interaction between stimulation mode and electrode location
was weakly significant (F(1, 1479) = 4.49, p = 0.0343). Post-hoc multiple comparison indicated
that this interaction between stimulation mode and electrode location was primarily driven by
lower discrimination thresholds observed for monopolar stimulation of the apical electrode.
Generally, the literature is mixed on whether a particular stimulation mode or electrode location
confers a benefit, with a tendency toward monopolar mode and apical stimulation sites performing
better, which could drive the observed interaction between monopolar stimulation and apical
electrode location (Macherey et al., 2011; Middlebrooks & Snyder, 2007; Zhou et al.,
2019).
The highly significant interactions between subject and mode and between subject and
electrode location revealed that although there was not a group benefit for stimulation mode or
electrode location, some individuals were affected by mode and location. Post-hoc multiple
comparisons revealed that although some subjects received a significant benefit for a specific
mode or electrode location as gauged by Fisher’s least significant difference criteria, the majority
of these results did not remain significant under the more stringent Bonferroni criteria.
Rate Discrimination Improves through Experience
Rate discrimination thresholds improved significantly over the course of the three-session
protocol (F(2, 1479) = 6.16, p = 0.0022). Figure 4.3 compares the results of the present study to
those of Goldsworthy and Shannon (2014). Rate discrimination
thresholds are plotted versus hours of psychophysical training for different base rates. The results
of the present study showed that training over three sessions with one week between each session
provided a significant performance benefit to implant users. Discrimination thresholds
significantly improved from week 1 to week 3, with average thresholds of 21.5% before
training and 18.26% after training, an improvement averaging 1.62 percentage points per week.
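Explicitly, the per-week figure reflects the two between-session intervals of the three-week protocol:

```latex
\[
  \frac{21.5\% - 18.26\%}{2\ \text{weeks}} = 1.62\ \text{percentage points per week.}
\]
```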
Figure 4.3: Comparing the Effect of Training on Rate Discrimination in Goldsworthy and Shannon
(2014) and the Present Study
The effect of training on rate sensitivity in present study, closed symbols, compared to rate training data in Goldsworthy and
Shannon (2014), open symbols, with a linear fit to the data shown with the root-mean-square error in shaded gray.
Forward-Masked Detection Thresholds
Forward-masked thresholds were measured for specific probe locations separated by 0, 1,
2, and 4 electrodes from the masker electrode. Figure 4.4 shows forward-masked thresholds for
the forced-choice procedure for apical and basal electrode locations and for bipolar and
monopolar stimulation modes. Thresholds were normalized on a linear scale by dividing the
forward masking function by the peak threshold shift, resulting in a scale where 0 is no masking
and 1 is maximum masking (McKay, 2012). Thresholds typically decreased monotonically with
increasing separation between masker and probe electrodes, but with noted deviations from that
rule as observed for subjects 2L, 2R, 4L, and 4R.
Figure 4.4: Individual Forward-Masked Thresholds
Forward masking protocol with individual results for CI users. These thresholds are represented as a proportion of percent
dynamic range relative to the peak of the masking function.
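A minimal Python sketch of this normalization follows; treating the unmasked threshold as the baseline for the threshold shift is one plausible reading, and the example values are hypothetical.

```python
import numpy as np

def normalize_masking_function(masked_thr, unmasked_thr):
    """Normalize a forward-masking function: compute the threshold shift
    (masked minus unmasked, in percent dynamic range) and divide by the peak
    shift, so 0 means no masking and 1 means maximum masking (McKay, 2012)."""
    shift = np.asarray(masked_thr, dtype=float) - np.asarray(unmasked_thr, dtype=float)
    return shift / shift.max()

# Hypothetical thresholds (% dynamic range) at separations of 0, 1, 2, 4 electrodes:
norm = normalize_masking_function([78.0, 65.0, 58.0, 55.0], [50.0, 50.0, 50.0, 50.0])
```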
Psychophysically Derived Metric: Forward-Masked Threshold Slopes.
Forward-masked thresholds generally saturated at a spatial separation of 2 electrodes, so
the furthest electrode was excluded from the fitted slopes. Subject was significant (F(10, 10) =
9.92, p < 0.001) indicating individual differences in spatial selectivity. Stimulation mode (F(1, 10)
= 3.11, p = 0.1083) and electrode location (F(1, 10) = 3.78, p = 0.1264) were not significant
factors. Second-order interactions between subject and stimulation mode (F(10, 10) = 1.97, p =
0.1503) and subject and electrode location (F(10, 10) = 1.86, p = 0.1719) were not significant. The
interaction between stimulation mode and electrode location (F(1, 10) = 13.21, p = 0.0058) was
significant. A post-hoc multiple comparison test indicated that this interaction between
stimulation mode and electrode location was primarily driven by steeper forward-masked
threshold slopes observed for bipolar stimulation of the apical electrode.
Detection thresholds and comfort levels as a function of stimulation rate
Figure 4.5 shows detection threshold levels as a function of stimulation rate. Detection
thresholds and comfort levels typically exhibit temporal integration across pulses by decreasing
monotonically for increasing rates.
Figure 4.5: Individual Detection Threshold Levels
Individual threshold levels across stimulation rate exhibiting integration across pulses.
Psychophysically Derived Metric: Average Detection Thresholds.
Average detection thresholds were calculated from the detection thresholds shown in
Figure 4.5. Group averages indicate similar thresholds for apical and basal electrodes when
tested in monopolar mode, but with apical sites having higher average thresholds than basal sites
when tested in bipolar mode. Subject was significant indicating the variability across subjects in
stimulation levels required to obtain detection thresholds (F(10, 10) = 9.6, p < 0.001). Stimulation
mode was significant reflecting the lower stimulation levels required to reach detection
thresholds for monopolar stimulation (F(1, 10) = 34.9, p < 0.001). Electrode location was not
significant (F(1, 10) = 2.0, p = 0.18) indicating similar thresholds for apical and basal stimulation
sites. The interaction between subject and stimulation mode was not significant (F(10, 10) = 1.8, p
= 0.17); however, the interaction between subject and electrode location was significant
indicating individual variability in terms of apical and basal stimulation levels required to reach
threshold (F(10, 10) = 7.9, p = 0.0015). The interaction between stimulation mode and electrode
location was not statistically significant (F(1, 10) = 3.2, p = 0.10).
Psychophysically Derived Metric: Multi-Pulse Integration.
Multi-pulse integration is shown in Figure 4.5 as the decrease in measured detection
thresholds with increasing stimulation rate, reflecting integration across pulses. In contrast
to the other metrics, subject was not significant (F(10, 10) = 2.2, p = 0.11) indicating lower
variability in subject performance across conditions than was the case for either average
thresholds or forward-masked tuning slopes. Neither stimulation mode (F(1, 10) = 2.6, p = 0.14)
nor electrode location (F(1, 10) = 0.1, p = 0.76) were significant. Neither the interaction between
subject and stimulation mode (F(10, 10) = 1.1, p = 0.44) nor between stimulation mode and
electrode location (F(1, 10) = 0.04, p = 0.84) were significant. The interaction between subject and
electrode location, however, was significant (F(10, 10) = 5.2, p = 0.008) indicating that individual
differences in multi-pulse integration slopes across apical and basal stimulation sites may be
promising as a potential predictor of individual variability.
Correlation Analysis between Psychophysically Derived Metrics and Rate
Discrimination
The three psychophysically derived metrics (forward-masked threshold slopes, detection
thresholds, and multi-pulse integration) were analyzed for correlation with rate discrimination
thresholds. These metrics were calculated for each subject for each combination of stimulation
mode, electrode location, and stimulation rates. We first tested the correlation between grand
averages across stimulation modes, electrode locations, and stimulation rates for each metric and
the corresponding grand average measured rate discrimination. None of those correlations were
significant. As suggested by Zhou and Pfingst (2016a, 2016b), such across-subject correlations
tend to be weak since multiple individual differences affect perceptual outcomes (Zhou &
Pfingst, 2016a, 2016b). Consequently, we then tested the correlation between normalized metric
differences and normalized rate discrimination differences. Specifically, to normalize, each
subject's average across conditions of the metrics and of the rate discrimination thresholds was
subtracted from that subject's metrics and discrimination thresholds for each electrode and
stimulation mode (Figure 4.6). None of the correlations tested were significant at a 0.05 level, let
alone the more stringent levels recommended for multiple comparisons analyses. In the
correlation between forward-masked threshold slopes and rate discrimination thresholds, the
apical electrode yielded a weak negative correlation (p = 0.06) indicating slightly lower rate
discrimination thresholds for shallower forward-masked slopes, in agreement with literature
correlating shallower forward-masked slopes with neural health (Zhou & Pfingst, 2016a, 2016b).
Consistent with the literature, there is a slight trend toward steeper multi-pulse integration slopes
correlating with lower rate discrimination thresholds (Zhou et al., 2019).
Figure 4.6: Correlation Analysis between Rate Discrimination and Psychophysically Derived
Metrics of Frequency Tuning
Correlation analyses for rate discrimination correlated with detection thresholds, forward-masked slopes, and multi-pulse
integration slopes normalized to the average across conditions for each subject.
Exploratory Correlation Analysis among Psychophysically Derived Metrics
Exploratory correlation analyses were performed among the psychophysically derived metrics
(forward-masked threshold slopes, detection thresholds, and multi-pulse integration); these
within-metric correlations yielded mildly significant results. The correlations were computed by
normalizing by the average across conditions for each subject. In the correlation between
forward-masked threshold slopes and detection thresholds, monopolar mode yielded a weak,
non-significant positive trend (p = 0.132) and the basal electrode yielded a significant positive correlation (p =
0.003) indicating slightly lower detection thresholds for steeper forward-masked threshold
slopes, consistent with the literature (Zhou, 2016; Zhou & Pfingst, 2016b). The correlation
between multi-pulse integration slopes and detection threshold produced a significant result for
both bipolar (p = 0.016) and monopolar (p = 0.003) stimulation modes and for the basal
electrode (p = 0.015) with higher detection thresholds correlating with steeper multi-pulse
integration slopes. Forward-masked threshold slopes exhibited a weak correlation to multi-pulse
integration slopes for bipolar mode (p = 0.08) and a stronger correlation for the basal electrode
(p = 0.01) with shallower forward-masked threshold slopes correlating with steeper multi-pulse
integration slopes, consistent with the literature (Zhou & Pfingst, 2016a, 2016b).
Discussion
The present study was designed to examine individual differences and perceptual
learning of stimulation rate discrimination in adult CI users. Individual differences were
examined by measuring rate discrimination at different electrode sites and configurations, while
exploring psychophysically derived metrics of spatial tuning and temporal integration. While
substantial across- and within-subject variability was observed for rate discrimination and for derived
metrics, few significant correlations were observed. One observed trend, consistent with previous
studies (Zhou et al., 2019), was that stimulation rate discrimination tended to be better for
monopolar stimulation in the apex. Perceptual learning of rate discrimination was examined
across several training sessions and discrimination thresholds improved with training. That
stimulation rate discrimination improves with experience suggests that the true limits of rate
discrimination may remain unknown until clinical devices provide such information in a clear
and consistent manner.
Comparison with Stimulation Rate Discrimination Literature
For the lower stimulation rates tested (100, 200 Hz), the discrimination thresholds of the
present study are comparable to the more sensitive thresholds reported in the literature. In the
present study, the average discrimination threshold prior to training was 8% for 100 Hz base
rates. The more sensitive thresholds reported in the literature are typically between 7 and 10% at
100 Hz (Goldsworthy & Shannon, 2014; B. C. J. Moore & Carlyon, 2005; Townshend et al.,
1987b; Venter & Hanekom, 2014). The review by Moore and Carlyon (2005) was a compilation
of 5 studies across 19 subjects (McKay et al., 1999, 2000; B. C. J. Moore & Carlyon, 2005;
Pfingst et al., 1994; R. J. M. van Hoesel & Clark, 1997; Zeng, 2002). Other studies found poorer
discrimination, with thresholds measured at 100 Hz ranging from 20 to 40% (Bahmer &
Baumann, 2013a; Cosentino et al., 2016; Stahl et al., 2016).
For the higher stimulation rates (400, 800 Hz), discrimination thresholds of the present
study are more sensitive than generally reported. Average rate discrimination thresholds of the
present study were 30% and 54%, when measured at 400 and 800 Hz, respectively. In
comparison, Townshend and colleagues (1987) reported that two of their subjects had an average
40% rate discrimination threshold, while the third subject could not discriminate rates above 175
Hz (Townshend et al., 1987b). Three studies reported rate discrimination thresholds in the range
of 47-49% (Cosentino et al., 2016; Venter & Hanekom, 2014; Zeng, 2002). Bahmer and
Baumann (2013) reported thresholds of 69.2% (Bahmer & Baumann, 2013a). Few studies tested
single-electrode rate discrimination above 400 Hz. Bahmer and Baumann (2013) reported 76.9%
at 566 Hz and Zeng (2002) reported 113% for 500 Hz (Bahmer & Baumann, 2013a; Zeng,
2002). Two of the subjects in Townshend and colleagues (1987) could discriminate rates above
400 Hz (Townshend et al., 1987b). One of their subjects was able to discriminate 19%
differences at 900 Hz and the other 15% differences at 1000 Hz. McKay and colleagues (2000)
tested at 500 and 600 Hz, but their subjects could not discriminate rates based on perceived pitch
(McKay et al., 2000).
The duration and amount of feedback seems to be a primary factor driving differences in
measured rate discrimination thresholds across studies. Thresholds measured in the current study
are consistent with most studies at lower frequencies, but more sensitive than studies which use a
brief task exposure time. The current study is based on a total of 12 hours of psychophysical
training and assessment with feedback always provided. Training was designed to gradually
increase exposure to higher stimulation rates. The extended and progressive experience provides
familiarity with the rate cue that is reflected in lower discrimination thresholds (Figures 4.1, 4.2,
and 4.3). Differences between the training protocol used in the present study and the training
protocol used in Goldsworthy and Shannon (2014) led to different amounts of training for the
highest rates under consideration. Specifically, in the prior study from our group, training was
also provided using an adaptive procedure to gradually introduce higher stimulation rates to
subjects; however, the protocol used in Goldsworthy and Shannon (2014) allowed for 32
reversals during each training run, which allowed subjects to work into progressively higher rates
and hold their performance in that region. For the present study, only 12 reversals were provided
during training; consequently, subjects received considerably less time to work into and remain in
the higher rate region. This likely explains why benefits of training were only observed for
the lower rates in the present study.
Lack of Correlation between Rate Discrimination and Other Psychophysical Measures
In the present study, psychophysically derived metrics of spatial tuning and temporal
integration were not predictive of rate discrimination thresholds, but were consistent with
previous studies (Zhou, 2016; Zhou et al., 2019; Zhou & Pfingst, 2016a, 2016b). We interpret
these results as evidence that stimulation rate, at least for rates as high as 400 to 800 Hz, is well
encoded into auditory nerve activity. It is difficult to know how much of the correlations are
driven by the plasticity of pitch versus the underlying peripheral sensitivity. It is quite possible that
over the course of training, better correlations could be found as we explore the peripheral
limitations; however, in the present study, pre- and post-training rate discrimination thresholds as
well as thresholds at different base rates were examined for correlations and the conclusions
remained the same. Studies using neural response telemetry have provided evidence that
temporal synchrony of neural response is well maintained in CI recipients at least up to 1 kHz
(Dynes & Delgutte, 1992; Goldsworthy & Shannon, 2014; Hughes et al., 2012a, 2014). If
forward-masked thresholds or multi-pulse integration quantify neural health, typical variations
do not appear to strongly affect stimulation rate discrimination. A study which made a similar
comparison found a weak relationship between multi-pulse integration slopes and rate
discrimination (Zhou et al., 2019). We interpret the lack of correlation in a positive manner, that
modest variations in neural health do not significantly impair a recipient’s ability to hear pitch
evoked by stimulation rate.
In the present study, the psychophysically derived metrics of spatial tuning and multi-
pulse integration were calculated using the most apical and most basal electrode locations. Other
studies examining metrics of spatial tuning and temporal integration have explicitly chosen
electrodes with steep and shallow slopes to consider effects at local extrema (Zhou & Pfingst,
2016a, 2016b). That approach may be more sensitive for detecting correlations since it may
identify particularly healthy or damaged regions of the auditory nerve. Similarly, other variations
in the measurement of forward-masked thresholds may affect the overall strength of correlation
(Jesteadt et al., 2005; McKay, 2012; Nelson & Donaldson, 2002; Shannon et al., 1990; Zhou &
Pfingst, 2016b).
A significant interaction between stimulation mode and electrode location occurred for
three of the measures: rate discrimination, forward-masked thresholds, and detection thresholds.
One significant point was that the lowest rate discrimination thresholds and the shallowest
forward-masked threshold slopes occurred for the same electrode-mode combinations of bipolar
basal and monopolar apical. This trend of lower rate discrimination thresholds correlating to
shallower forward-masked threshold slopes, and thus steeper multi-pulse integration, agrees with
the literature (Zhou et al., 2019; Zhou & Pfingst, 2016b). For detection thresholds, monopolar
threshold levels were similar across electrode locations, and unsurprisingly were lower than
bipolar thresholds. Detection thresholds in the bipolar configuration were lower in the base
correlating with shallower forward-masked threshold slopes and lower rate discrimination
thresholds for the same configuration and location. Overall, this provides a trend with monopolar
providing better performance in the apex and bipolar providing better performance in the base.
This trend is in agreement with Zhou and colleagues (2019) with the sites exhibiting the
shallowest forward-masked thresholds slopes, monopolar apical and bipolar basal, providing
improved rate discrimination performance over the sites with steeper forward-masked threshold
slopes (Zhou et al., 2019). As mentioned, in the literature there has been a tendency toward a
benefit from monopolar mode and apical location stimulation, which
may contribute to the better performance in the monopolar apical configuration (Macherey et al.,
2011; Middlebrooks & Snyder, 2007; Zhou et al., 2019).
A limitation of the present study concerns how electrode configuration might affect rate
discrimination, in that the comparison was made between monopolar and relatively broad bipolar
configurations. We chose to examine bipolar configurations for which comfort levels could be
mapped with relatively short pulsatile phase durations. We chose to do so to concentrate charge
in a temporally precise manner but doing so required broader bipolar configurations to be used.
Consequently, it is unclear whether the effect of electrode configuration would be more
pronounced using narrower configurations such as tripolar or quadrupolar. Another possible
limitation is that some subjects exhibited a non-monotonic pattern in their detection thresholds, which
can be observed in Figure 4.5. This non-monotonic pattern was reflected in the multi-pulse
integration metric as well and was most pronounced for subject 4. The subjects were instructed
to set their threshold as the lowest level at which they first heard the stimulus, with careful
attention given to loudness balancing. That being the case, the detection thresholds may have
been more conclusively set by another method, such as an alternative forced-choice procedure.
Psychophysical Training Improves Stimulation Rate Discrimination
Stimulation rate discrimination improved with training. While pitch discrimination has
been shown to be perceptually plastic even in NH listeners, it is possible that there is greater
potential for training pitch associated with stimulation rate since variable stimulation rates are
typically not used by CIs. Few studies have considered perceptual learning of stimulation rate,
though several studies have consider perceptual and physiological plasticity associated with
tonotopy, with attention given to the tonotopic mismatch between the acoustically and
electrically stimulated areas. Reiss and colleagues (2014) showed plasticity in the representation
of place pitch provided by the frequency allocation of the CI processor, especially over the first
two years of use (Reiss et al., 2014).
Animal studies have considered temporal coding of frequency following deafening and
implantation. Fallon and colleagues (2014) studied cats who were deafened at birth, implanted at
8 weeks, and activated 2 weeks post-surgery, after which they were stimulated constantly for 6-8
months (Fallon et al., 2014). They found that a moderate duration of deafness with cochlear
implantation had minimal overall effect on the temporal response properties of neurons, with the
only significant effect being the decreased ability of the neural population to respond to every
pulse in a pulse train. Another study showed that longer durations of deafness can have more
adverse effects on temporal response properties, but that training can provide a profound
improvement in degraded temporal processing (Vollmer & Beitel, 2011).
Given the evidence for plasticity of stimulation rate pitch perception and the evidence for
strong neural synchrony to electrical stimulation, we speculate that variable stimulation rates can
be used to improve pitch perception for CI users (Goldsworthy & Shannon, 2014; Irvine, 2018).
Attempts to restore TFS into CI stimulation have been mixed, with some studies indicating no
benefits, but others indicating benefits for speech and music perception that emerge over a year
or more of experience (Arnoldner et al., 2007; A. E. Vandali & van Hoesel, 2011, 2012; Wouters
et al., 2015). A challenge associated with CI sound processing design is to determine the extent
that temporal coding is limited by sound processing rather than by physiology. In that regard,
single-electrode psychophysics are insightful as to the physiological limits and the potential for
perceptual learning. Psychophysical training of stimulation rate pitch perception has rarely been
investigated since measures require laboratory hardware and repeat visits from subjects.
Typically, rate discrimination is assessed in acute laboratory protocols across one or two sessions
without substantial familiarization or dedicated psychophysical training (Carlyon et al., 2010;
Kong et al., 2009; Kong & Carlyon, 2010). Since CI signal processing may not adequately use
stimulation rate to encode acoustic cues, the only experience CI users may have with this cue for
higher rates is during these acute laboratory visits designed to assess its salience.
For learning to occur, in general, the cue of interest must be presented in a clear and
consistent manner and provided on a regular basis (Rosskothen-Kuhl et al., 2018; Wright et al.,
2010). The results of the present study indicate that rate discrimination improves with as little as
12 hours of training and assessment. These results are consistent with Goldsworthy and Shannon
(2014), which examined the effects of auditory training on rate discrimination thresholds of six
CI users over the course of 28 hours of training and assessment, but with notably less
improvement for the higher rates tested (Goldsworthy & Shannon, 2014). While the training in
Goldsworthy and Shannon (2014) focused on rates from 110 to 1760 Hz, the training in the
present study focused on the regions below 400 Hz due to the smaller rate differences used (10 &
20%), and interestingly the effects of training did not transfer to higher frequencies.
Psychophysical training with large rate differences (e.g., > 40%) and a large number of trials
(e.g., > 40) in the adaptive procedures would allow subjects to gradually work up to higher
stimulation rates for consistent training at those rates. Consistent, daily training of the relevant
rate cues under the right conditions may produce results similar to Goldsworthy and Shannon
(2014).
Conclusions
The present study examined individual differences and perceptual learning of stimulation
rate pitch perception. Individual differences between and within subjects based on forward-
masked thresholds and multi-pulse integration were not predictive of rate discrimination
thresholds. We interpret this finding as evidence that peripheral coding of stimulation rate does
not strongly affect rate discrimination in CI users. In contrast, rate discrimination thresholds
significantly improved for base rates of 100 and 200 Hz with relatively brief exposure and
training for associating pitch with stimulation rate. This provides further evidence for the
plasticity of temporal pitch provided by stimulation rate. Consequently, sound processing
strategies designed to encode acoustic TFS into fine timing of stimulation should be examined in
the context of perceptual learning of pitch.
Chapter 5: Combining Stimulation Place and Rate Improves
Frequency Discrimination in Cochlear Implant Users
The work described in this chapter was published in Hearing Research.
Bissmeyer, S.R.S., Goldsworthy, R.L., 2022. Combining Place and Rate of Stimulation Improves
Frequency Discrimination in CI Users. Hearing Research 424, 108583.
https://doi.org/10.1016/j.heares.2022.108583
Introduction
Though CIs have been widely successful, there are well-known deficiencies related to
speech recognition and music appreciation. Pitch perception, essential for speech and music, is
poorly provided by CIs. Poor pitch resolution diminishes speech comprehension in background
noise (A. Caldwell & Nittrouer, 2013; do Nascimento & Bevilacqua, 2005; Q.-J. Fu & Nogaki,
2005), vocal emotion recognition (Deroche et al., 2014; Gilbers et al., 2015; Luo et al., 2007),
music appreciation (Bruns et al., 2016; Gfeller et al., 2000), and, consequently, quality of life
(Ambert-Dahan et al., 2015; Lassaletta et al., 2007, 2008; Looi et al., 2008, 2012; Looi & She,
2010; Moran et al., 2016). Motivated by the essential role of pitch in hearing, the study described
here considers psychophysical cues that support frequency discrimination in CI users.
In NH, frequency is inseparably encoded in the tonotopic and temporal response
properties of the auditory nerve. The tonotopic response to frequency, or place-frequency map, is
initiated by mechanical tuning properties of the cochlea and persists throughout the ascending
auditory pathway (Clopton et al., 1974; Fekete et al., 1984; Liberman, 1982; Muniak et al., 2016;
Ryugo & May, 1993). The temporal response properties derive from the remarkable ability of the
auditory nerve to phase-lock synchronously to acoustic frequencies as high as 5 kHz (van den
Honert & Stypulkowski, 1987; Dynes & Delgutte, 1992; Dreyer & Delgutte, 2006; Hill et al.,
1989; Shepherd & Javel, 1997; Rose et al., 1967; Palmer & Russell, 1986; Heinz et al., 2001).
Although the auditory nerve can phase-lock to relatively high frequencies, there is active debate
as to the upper limit of usable temporal frequency information for tasks such as sound
localization, pitch perception, and speech perception (Verschooten et al., 2019). Because
tonotopic and temporal cues are inseparable in NH, there is debate regarding the contributions of
these cues, as well as the possible need for aligning these cues synergistically (Attneave et al.,
1971; Carlyon et al., 2012; Luo et al., 2012; McKay et al., 2000; Oxenham, 2013; Oxenham et
al., 2004, 2011; Palmer & Russell, 1986; Rose et al., 1967). By whatever mechanism that
tonotopic and temporal cues are decoded into a sense of pitch, NH listeners can discriminate pure
tones that differ by 1-5% in frequency for a wide range of frequencies (300-4000 Hz) with best
discrimination near 0.1% (Goldsworthy et al., 2013; Micheyl et al., 2006; Tyler et al., 1983).
Since tonotopic and temporal cues can be independently conveyed by CIs, there are theoretical
and practical motivations to measuring the contributions of these cues to pitch (Arnoldner et al.,
2007; Laneau et al., 2004; Litvak et al., 2003; Oxenham et al., 2004; Shannon et al., 2004; Smith
et al., 2002; Vermeire et al., 2010; B. S. Wilson et al., 2004).
Pitch perception provided by CI place-of-excitation has been studied using clinical
processors and direct electric stimulation. The smallest discriminable difference in pitch between
two frequencies is often measured as a discrimination threshold (measured in % difference from
the base frequency for the present study). A single electrode will often have a quarter to one-
third octave filter bandwidth with around 3-4 semitones allocated to each electrode (or 18.9-26%
discrimination threshold for discriminating between single electrodes). CI users can discriminate
pure tones with their clinical processors that differ by between 1 and 30% across a wide range of
frequencies (250-2000 Hz), with an average around 10%, an order of magnitude worse than NH
(Goldsworthy, 2015; Goldsworthy et al., 2013; Pretorius & Hanekom, 2008). Pure tone
frequency discrimination through the clinical processor generally relies on place-of-excitation
cues, but with the acknowledgement that some processing strategies, such as Fine Structure
Processing (FSP) for MED-EL implants, may preserve temporal cues for low-frequency pure
tones. Computer-controlled electrode psychophysics, which bypass the clinical processor, allow
specific place cues to be provided, but the stimuli may not be as familiar to the participant.
Studies have shown tonotopic progression with basal electrodes heard as higher in pitch
compared to apical electrodes (Nelson et al., 1995; Tong & Clark, 1985). Pairs of electrodes
simultaneously stimulated or closely interleaved provide intermediate place cue percepts (Kwon
& van den Honert, 2006; Landsberger & Srinivasan, 2009; Macherey & Carlyon, 2010; H. J.
McDermott & McKay, 1994; Srinivasan et al., 2012). With this method, CI users can generally
discriminate place-of-excitation differences of less than 1 electrode (Kenway et al., 2015; Laneau
& Wouters, 2004; Townshend et al., 1987a).
Studies have also examined the use of temporal cues for discriminating pitch (Bernstein
& Oxenham, 2006; Houtsma & Smurzynski, 1990; Kaernbach & Bering, 2001; Shackleton &
Carlyon, 1994). A semitone difference in Western musical notation is 5.95% from the base note
frequency. CI users can generally discriminate between harmonic complexes that differ by 5 to
30% for fundamental frequencies between 110 and 880 Hz, much worse than the 0.1 to 5%
frequency resolution observed in NH (Goldsworthy, 2015; Goldsworthy et al., 2013; Luo et al.,
2019; Micheyl et al., 2006). The extent that this poor resolution is caused by degradation of
tonotopic relative to temporal cues is unknown (Swanson et al., 2019). Studies that bypass
clinical processing and test rate discrimination directly generally conclude that the temporal pitch
mechanism is weak and unusable above 300 Hz (Carlyon et al., 2010; Laneau et al., 2004;
Macherey & Carlyon, 2014; H. J. McDermott & McKay, 1997; McKay et al., 2000; Shannon,
1983; Tong et al., 1982; Tong & Clark, 1985; Zeng, 2002). However, since many clinical
processors poorly encode temporal cues, it is possible that stimulation rate perception may
require experience (Goldsworthy & Shannon, 2014; Wouters et al., 2015).
Mechanisms for decoding a sense of pitch based on stimulation rate have been put forth
based on neural circuitry of the cochlear nucleus that receive inputs from broadly tuned regions
of the auditory nerve (Bahmer & Langner, 2009; Golding & Oertel, 2012). This has led to
speculation that multi-electrode stimulation with consistent timing information presented across
the electrode array might provide better access to stimulation rate as a cue for pitch perception
(Venter & Hanekom, 2014). The rationale was that the neural mechanisms of the cochlear
nucleus would thus have better access to neural events across fibers, which would allow neural
processing along the lines suggested by the Wever volley principle (Wever & Bray, 1937).
Evidence for an advantage for multi-electrode compared to single-electrode stimulation has been
mixed with some studies finding a small and consistent benefit of multi-electrode stimulation
(Penninger et al., 2015; Venter & Hanekom, 2014), while other studies found no significant
difference (Bahmer & Baumann, 2013b; Carlyon et al., 2010; Laneau & Wouters, 2004;
Marimuthu et al., 2016).
Hypothetically, place and rate cues for pitch may also be affected by the health of the
auditory nerve. Forward-masked thresholds reflect multiple aspects of frequency tuning
including electrode-neural geometry and local neural health (Bierer & Faulkner, 2010b;
Bissmeyer et al., 2020a; McKay, 2012; Zhou, 2016). The present study aims to explore
correlations of forward masking with pitch tasks to ascertain whether there is a relationship
between an individual’s ability to discriminate pitch and their auditory neural health.
Studies have explored the combination of place and rate cues for pitch with varying
results. Fearn and Wolfe (2000) implemented pitch scaling across electrodes and rates in which
the subjects assigned a value on a numerical scale to the pitch of each stimulus considered, from
0 for very low pitch to 100 for very high pitch. They found that pitch perception was strongly a
function of both cues, albeit with some saturation for more basally stimulated electrodes and
marked saturation for rates above 500 pulses-per-second. Landsberger and colleagues (2016)
looked at scaling pitch and quality for single-electrode stimulation in long electrode arrays
finding that different combinations of place and rate could produce similar pitch percepts but
with different sound qualities. Schatzer and colleagues (2014) showed that the ability of single-
sided deafened subjects to pitch match CI stimulation rates to a contralaterally presented acoustic
pure tone of fixed frequency was better with apical electrodes for 100-300 Hz pure tones and
basal electrodes for 450 Hz pure tones, while successful pitch matches could be made with
medial electrodes across these pure tone frequencies (Landsberger et al., 2016). Rader and
colleagues (2016) performed a similar experiment but with pitch matching acoustic pure tones to
place dependent stimulation rates. They found very close acoustic to electrode frequency pitch
matches in what they described as “unparalleled restoration of tonotopic pitch perception in CI
users with single-sided deafness” and suggested that place dependent stimulation rates in CI
signal processing could greatly improve pitch perception. Swanson and colleagues (2019)
explored the contributions of place and rate to pitch percepts with judiciously chosen audio
signals delivered through the clinical processor and found that rate and place could be used for
pitch ranking and melody recognition, but that it could not be ruled out that melody recognition
with place cues was perceived as brightness/timbre. Place and rate of stimulation have been
posited to be perceptually orthogonal, in that both can be used to manipulate pitch percepts, but
that they do not combine synergistically (Landsberger et al., 2018; Macherey et al., 2011;
McKay et al., 2000; Tong et al., 1983). There is some evidence though that place and temporal
pitch cues can be combined synergistically, though the mechanism of synergy is uncertain
(Erfanian Saeedi et al., 2017; Luo et al., 2012; Rader et al., 2016; Stohl et al., 2008). Whether the
place-rate integration is a fused synergy of the two cues for a single pitch percept or a perceptual
weighting of the individual pitch dimensions for a pitch judgment, these four studies conclude
that some combination of these two cues could improve signal processing strategies opening the
window for better pitch perception in CI users.
The present study tests the primary hypothesis that combining stimulation place and rate
improves frequency discrimination beyond either cue alone. In the first experiment, frequency
discrimination needed to identify melodic contours was measured with place and rate cues,
separately and combined. The frequency allocation used in Experiment 1 was modeled after the
default allocation used on Cochlear Corporation devices. In the second experiment, a similar
place-rate paradigm was used to test frequency discrimination needed for pitch ranking, with
frequency allocation modified to improve access to low frequencies. Beyond the primary
hypothesis that frequency discrimination is better provided by place and rate cues combined, we
tested two secondary hypotheses that 1) broadness of stimulation could improve rate
discrimination and 2) that auditory neural health as measured by forward masking has an effect
on an individual’s ability to perceive changes in pitch. The results clarify how place and rate
cues combine to improve discrimination, which should inform developments in sound processing
for CIs.
Experiment 1: Melodic Contour Identification
General Methods
Subjects
Seven CI users participated in this study. Four bilateral users were tested in each ear separately
with the first ear tested randomly selected. All subjects were implanted with devices from
Cochlear Corporation and were tested using the USC CI Research Interface which bypasses the
clinical processor to provide precise control over stimulation parameters delivered directly
through the implant (Shannon, 2015; Shannon et al., 1990). Relevant subject information is
provided in Table 5.1. C9 had single-sided deafness until age 40 (their non-implanted ear
information was provided for reference of post-lingual hearing). Participants provided informed
consent and were paid for their participation. The University of Southern California’s
Institutional Review Board approved the study.
Table 5.1. Subject Information
Age at time of testing and age at onset of hearing loss are given in years. Duration of profound hearing loss prior to implantation is given in years and estimated from subject interviews. SNHL = sensorineural hearing loss.

ID | Age | Gender | Ear Tested | Etiology | Age at Onset | Years Implanted | Implant Model | Processor | Duration of Deafness | Age at Implantation
1 | 47 | M | Both | Meniere's | 39 | L:1, R:4 | L:CI532, R:CI24RE (CA) | L:N7, R:N7 | L:1, R:4 | L:46, R:43
2 | 34 | F | Both | Unknown | 15 | L:11, R:7 | L:CI24RE (CA), R:CI24RE (CA) | L:N7, R:N7 | L:5, R:1 | L:27, R:23
3 | 72 | F | Both | Progressive SNHL | 40 | L:18, R:14 | L:CI24R (CS), R:CI24RE (CA) | L:N6, R:N6 | L:1, R:5 | L:54, R:58
4 | 58 | M | Both | Progressive SNHL | Birth | L:14, R:1 | L:CI24RE (CA), R:CI532 | L:N7, R:N7 | L:37, R:50 | L:44, R:57
5 | 80 | M | Right | Noise Induced | 40 | 10 | CI24RE (CA) | N6 | 20 | 70
8 | 70 | F | Right | Sudden SNHL | 68 | 1 | CI1522 | N7 | 1 | 68
9 | 72 | M | Right | Unknown | L:40, R:Birth | 1 | CI532 | N7 | L:7, R:62 | 71
Procedure
The threshold frequency difference that allows 75% identification accuracy for melodic
contour identification was measured for place, rate, and combined place-rate cues. The primary
hypothesis focused on testing whether melodic contour identification is better conveyed by
combined place and rate of stimulation than by either cue alone. Melodic contour identification
was measured using a one-interval, nine-alternative, forced-choice procedure. The nine melodic
contours consisted of five-note patterns including “rising,” “falling,” “flat,” “rising-flat,”
“falling-flat,” “rising-falling,” “falling-rising,” “flat-rising,” and “flat-falling” (Crew et al., 2012;
Galvin et al., 2007). These nine contours of varying difficulty were presented an equal amount of
times in pseudorandom order to measure overall realistic performance in an adaptive procedure
(Galvin et al., 2007). Nine experimental conditions were tested including all combinations of
three cue types (place, rate, and combined) and three center-note frequencies (100, 200, and 400
Hz) with the closest match to these center-note frequencies, based on Western music notation,
being G2, G3, and G4. The rationale for testing such low frequencies was the similarity of place
and temporal resolution, the self-reported best frequencies for CI users' music appreciation, and
to probe performance at ecologically relevant fundamental frequencies of voicing which cross
the range of the clinical filter bank. Conditions were repeated three times in random order.
Correct-answer feedback was provided on all trials.
For each trial within a measurement run, the amplitudes of the five notes in the contour
were randomly and independently roved between 90 and 100% (uniform distribution) of the
width of the subject's dynamic range (in units of charge per phase, decibels re 1 nanocoulomb) as
fitted by the logistic function. For each trial, the frequency of the third note of the five-note
contour was roved within a quarter octave of the condition frequency; the third note did not
change in the contours, so it was chosen for roving since the note frequencies were defined
adaptively relative to the roved frequency of the third note. The purpose of frequency roving was
to add perturbations which contribute to the ecological relevance of the stimulus (e.g., music
played in different keys, vocal pitch fluctuations) while avoiding habituation to the third note
frequency. The frequency spacing between notes in the melodic contour was adaptively
controlled based on performance in this identification task. Both the adaptive ceiling and the
initial frequency spacing between notes were 100% so that the greatest possible difference
between notes would be an octave; the internote frequency spacing for identification was
decreased by a factor of $\sqrt[3]{2}$ following correct answers and increased by a factor of 2 following
mistakes. This adaptive rule keeps the internote frequency spacing at a difficulty level such that
the procedure converges to 75% identification accuracy (Kaernbach, 1991). A
measurement run continued until 12 mistakes were made and the internote threshold was
calculated as the average frequency spacing of the last 8 reversals.
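For illustration, a minimal Python sketch of this weighted up-down rule is given below; the function name and the trial-presentation callback are assumptions for exposition, not the laboratory code.

```python
import numpy as np

def weighted_up_down(present_trial, start=100.0, ceiling=100.0,
                     max_mistakes=12, n_reversals=8):
    """Weighted up-down track converging to 75% correct (Kaernbach, 1991).

    present_trial(spacing) presents one trial at the given internote
    spacing (percent) and returns True when the response is correct.
    """
    spacing, mistakes, reversals, last_dir = start, 0, [], None
    while mistakes < max_mistakes:
        correct = present_trial(spacing)
        direction = 'down' if correct else 'up'
        if last_dir is not None and direction != last_dir:
            reversals.append(spacing)          # record a reversal
        last_dir = direction
        if correct:
            spacing /= 2 ** (1 / 3)            # shrink spacing by the cube root of 2
        else:
            spacing = min(spacing * 2, ceiling)  # double the spacing, capped at the ceiling
            mistakes += 1
    return np.mean(reversals[-n_reversals:])   # average of the last 8 reversals
```

Because the down step is one third of the up step on a logarithmic scale, the track settles where the probability of a correct response is 75%.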
Loudness Balancing
Detection thresholds and comfortable stimulation levels were measured as a function of
stimulation rate to provide loudness balancing for procedures across electrodes and rates
(Bissmeyer et al., 2020a; Goldsworthy et al., 2021, 2022). These levels were measured in
monopolar stimulation mode using a method of adjustment. Subjects used a graphical user
interface (see Supplementary Figure 1) with sliders to control and set the threshold and comfort
levels for each of the eight stimulation rates, from 50 to 6400 Hz in octave intervals. Upon
adjusting the slider, the subject would hear a change in amplitude for a 400 ms pulse train
comprised of biphasic pulses with 25 µs phase durations and 8 µs interphase gaps. This pulse
shape was designed to provide the necessary charge for stimulation over a brief phase duration.
The chosen phase duration corresponds to typical clinical processor settings, and the maximum
amplitude was 255 clinical units as defined by Cochlear Corporation. Subjects were instructed to
adjust stimulation level for detection thresholds and for comfortable levels. The resulting
detection thresholds and comfort levels were fit with a logistic equation of the form:
$$Y(x) = U - \frac{U - L}{\left(1 + Q e^{-Bx}\right)^{1/v}}, \qquad \text{(Equation 5.1)}$$
where U and L are the upper and lower limits of the subject's dynamic range (converted from clinical units to units of charge per phase), Q is related to the current level at 100 Hz, B is the rate by which the current decreases over the frequency range, x is frequency expressed as log2(frequency/100), and v controls asymptotic growth. Fitted logistic equations were used to
balance loudness for all stimuli used in the experiment.
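As a sketch, Equation 5.1 can be written directly as a function of stimulation rate; the parameter names follow the text, while the helper itself is an assumption for illustration.

```python
import numpy as np

def fitted_level(rate_hz, U, L, Q, B, v):
    """Evaluate Equation 5.1 at a stimulation rate in Hz.

    U and L bound the dynamic range in charge per phase; Q, B, and v
    shape how level declines with rate; x is octaves re 100 Hz.
    """
    x = np.log2(rate_hz / 100.0)
    return U - (U - L) / (1.0 + Q * np.exp(-B * x)) ** (1.0 / v)
```

With B greater than zero, the fitted level falls from near U at low rates toward L at high rates, matching the decrease in current needed at higher stimulation rates.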
Stimuli
The internote frequency spacing needed to support melodic contour identification (MCI)
was measured using multi-electrode stimuli. Stimuli were generated by filtering pure tones
through a filter bank with output envelopes used to define place, rate, and place-rate stimulation
patterns. Melodic contours were defined as 5-note sequences of pure tones with tones defined as
200 ms sinusoids with 100 ms raised-cosine attack and release ramps. The slow attack and
release times were used to promote a gradual recruitment of neurons to avoid hyper-synchronization to the first pulse (Carlyon & Deeks, 2013, 2015; Hughes et al., 2012b, 2014; Hughes & Laurello, 2017). Sequences were filtered through a 4th-order, 22-channel filter bank.
The filter bank used logarithmic frequency allocation with quarter-octave spacing from 200 to
6400 Hz. This logarithmic filter bank was modeled after the quasi-logarithmic frequency
allocation table used with Cochlear Corporation devices. Filtered outputs were converted to
channel envelopes using a Hilbert transform. An “N-of-M” algorithm was used to select the 3
channels with the most energy. The output envelopes were then used to control constant-rate
pulse trains comprised of pulses that were 25 µs in phase duration with 8 µs interphase gaps with
stimulation rate experimentally controlled to provide place, rate, and combined place-rate cues
for the frequencies used in the melodic contour. Example stimuli are shown in Figure 5.1.
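A condensed sketch of this processing chain using scipy is given below; the sampling rate, filter construction, and helper names are assumptions, and the pulse-train synthesis that the selected envelopes would modulate is omitted.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

fs = 48000                                   # assumed audio sampling rate
centers = 200 * 2 ** np.linspace(0, 5, 22)   # 22 log-spaced centers, 200-6400 Hz

def channel_envelopes(tone):
    """Filter a tone through the 22-channel bank; return Hilbert envelopes."""
    env = []
    for fc in centers:
        band = [fc * 2 ** -0.125, fc * 2 ** 0.125]       # ~quarter-octave band
        sos = butter(2, band, btype='bandpass', fs=fs,
                     output='sos')                       # 2nd-order design yields a 4th-order bandpass
        env.append(np.abs(hilbert(sosfilt(sos, tone))))  # envelope per channel
    return np.array(env)

def n_of_m(env, n=3):
    """N-of-M selection: keep only the n channels with the most energy."""
    keep = np.argsort(env.sum(axis=1))[-n:]
    out = np.zeros_like(env)
    out[keep] = env[keep]
    return out
```

The selected envelopes would then amplitude-modulate loudness-balanced, constant-rate pulse trains, with the pulse rate set to the note frequency for the rate and combined conditions.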
Figure 5.1: Example Melodic Contour Identification Stimuli for Experimental Conditions of Place,
Rate, and Combined Place-Rate.
The melodic contour is “rising”, but the frequency allocation table used for the melodic contour identification task, modeled after
the frequency allocation for typical Cochlear Corporation devices, has a cutoff of 200 Hz limiting the place information at the
lower frequencies and making it look more like the “flat-rising” contour for place-of-excitation cues. The first panel shows the
condition where place of stimulation is varied, and rate is held constant at the center-note frequency. The second panel shows the
condition where rate of stimulation is varied, and place of stimulation is held constant at the center-note frequency. The third
panel shows the combined place-rate condition with both place and rate covaried for all notes.
Analyses
The primary hypothesis tested is that frequency discrimination for melodic contour
identification is better with combined place and rate of stimulation than with either cue alone.
The collected data consisted of 3 repetitions of 9 conditions (3 stimulation cue types crossed with
3 center-note frequencies). Hypotheses were tested using a two-way repeated measures analysis
of variance with stimulation cue and condition frequency as within-subject factors. All statistics
were calculated on logarithmically transformed thresholds to be consistent with the underlying
perceptual scale in frequency discrimination and the use of multiplicative (rather than additive)
steps in the adaptive logic (Micheyl et al., 2006). Planned multiple comparisons were used to
quantify the effect of cue type at each frequency. Cohen’s d was used as a measure of effect size
(Cohen, 1992).
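For illustration, this analysis could be run with statsmodels roughly as follows; the DataFrame layout and column names are assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# thresholds: one row per subject x cue x frequency x repetition,
# with columns 'subject', 'cue', 'freq', and 'threshold' (percent)
thresholds['log_thr'] = np.log10(thresholds['threshold'])  # multiplicative scale
anova = AnovaRM(thresholds, depvar='log_thr', subject='subject',
                within=['cue', 'freq'], aggregate_func='mean').fit()
print(anova)
```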
Results
No clear trends emerged in the participants’ performance in the up-down procedure. With
12 mistakes necessary to finish a run, there were an average of 39.1 trials per run with a standard
deviation of 15.6 trials, and with the longest run being 63 trials long. Average internote
frequency spacing across all conditions and subjects was 35.1% with a standard deviation of
2.5%. Figure 5.2 shows internote frequency spacing thresholds needed to support melodic
contour identification with place, rate, and combined place-rate cues. These internote frequency
spacing thresholds are expressed as the percent difference from the base note frequency.
Frequency spacing thresholds were better with combined place and rate of stimulation than with
either cue alone. Cue type was significant (F(2,20) = 17.17, p < 0.001) with across frequency
averages of 52.8% for place, 38.6% for rate, and 22.7% for combined cues. The corresponding
comparisons of effect size were large and significant when comparing thresholds with combined
cues with either cue alone (Cohen's d > 0.6 for both comparisons). As a main effect, frequency was
not significant (F(2, 20) = 1.13, p = 0.34), reflecting that internote frequency spacing averaged
across cue type changed little with averages of 34.3% at 100 Hz, 39.9% at 200 Hz, and 34.0% at
400 Hz. Clearly, though, the salience of the cue changed with frequency, as manifested as a
significant interaction between cue type and frequency (F(4, 40) = 12.78, p < 0.001).
Figure 5.2: Internote Frequency Spacing Thresholds for Melodic Contour Identification
Internote frequency spacing thresholds for melodic contour identification (MCI) as a function of center-note frequency. Symbols
indicate thresholds averaged across subjects with shaded error bars showing standard errors of the means.
Figure 5.2 shows the trade-off in performance between place and rate with poor place
resolution at lower frequencies and worsening rate resolution at higher frequencies. The poor
place resolution reflects the reduced filter spacing near 100 and 200 Hz. The worsening rate
resolution at 400 Hz is balanced by an improved place resolution at that frequency. This trade-off
allows for the flat performance from combined place-rate cues as a function of frequency.
Planned multiple comparisons were conducted to test the hypothesis that the combined cue
performance was better than either cue alone at each condition frequency. This analysis indicated
that only for the 200 Hz condition was the combined cue performance significantly better than
either cue alone (place, p < 0.001; rate, p = 0.0135); for the 100 and 400 Hz conditions, the
combined cue performance was not significantly better than the stronger cue (rate, p = 0.72 at
100 Hz; place, p = 0.21 at 400 Hz). So, while performance with combined cues was better than
either cue alone as a main effect, performance was often driven by the stronger of the two cues.
Figure 5.3: Individual Internote Frequency Spacing Thresholds for Melodic Contour Identification
Individual internote frequency spacing thresholds for melodic contour as a function of center-note frequency. Symbols indicate
internote frequency spacing thresholds averaged across repetitions.
Figure 5.3 shows individual performance on melodic contour identification. Most
subjects were able to perform this task but with substantial variability across subjects and even
across ears in bilateral implant users. Only two subjects had a consistent benefit from combined
cues for all frequencies (2R and 9). These results provide insight into individual differences
using combined cues for melodic contour identification; for example, some implant users
received a combined benefit at 400 Hz at a frequency where the ability to use rate for melodic
contour identification is relatively poor (2L, 2R, 5, and 9), while one subject appeared to be
confounded by poor rate resolution as combined performance was poorer than for the place cue
alone at 400 Hz (1R). Most subjects at 400 Hz had combined cue performance consistent with
their performance with place cues alone (1L, 3L, 3R, 4L, 4R, and 8). For the bilateral subjects
(1-4), Subjects 3 and 4 exhibited markedly different performance between ears, with one ear
performing relatively well and the other ear performing near ceiling, while Subjects 1 and 2
demonstrated similar performance between their respective ears.
Experiment 2: Frequency Discrimination
General Methods
Subjects
The same subjects were tested as in Experiment 1.
Procedure
Frequency discrimination was measured using a two-interval, two-alternative, forced-
choice procedure in which subjects were asked which interval was higher in pitch. The condition
frequencies were 100, 200, 400, 800, and 1600 Hz for single and multi-electrode stimulation.
The primary hypothesis focused on testing whether frequency discrimination is better provided
by combined place and rate of stimulation than by either cue alone. This was tested with place,
rate, and place-rate stimuli, with the focus of comparing place and rate separately to the
combined place-rate stimulation. There were 15 multi-electrode conditions comprised of all
combinations of the 3 types of stimuli (place, rate, and combined place-rate) at the 5 test
frequencies. To explore the secondary hypothesis of broad stimulation improving rate
discrimination, 5 single-electrode conditions with rate only for the 5 test frequencies were tested
to be compared to the multi-electrode rate stimulus at the 5 test frequencies. Conditions were
repeated three times in random order. Correct-answer feedback was provided on all trials.
For each trial within a measurement run, the amplitudes of the standard and target were
randomly and independently roved in the same manner as Experiment 1. For each trial, the
frequency of the standard was roved within a quarter octave of the condition frequency; the
target frequency was defined adaptively higher relative to the roved standard frequency. The
initial difference that the target frequency was higher than the standard frequency was 64% with
an adaptive ceiling of 128% frequency difference. The difference for discrimination was
decreased by a factor of $\sqrt[3]{2}$ after correct answers and increased by a factor of 2 after mistakes. This adaptive rule keeps the frequency spacing for discrimination at a difficulty level such that the procedure converges to 75% correct (Kaernbach, 1991). The procedure continued until the participant made 10 mistakes and the discrimination threshold was calculated as the average of the last 8 reversals.
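A sketch of the per-trial roving described here, assuming "within a quarter octave" means a uniform rove of plus or minus one-eighth octave (the exact rove convention is an assumption, as are the helper names):

```python
import numpy as np

rng = np.random.default_rng()

def rove_trial(condition_freq, delta_percent, lower, upper):
    """Set roved levels and frequencies for one 2AFC pitch-ranking trial."""
    # levels roved independently over 90-100% of the dynamic range width
    std_level = lower + rng.uniform(0.90, 1.00) * (upper - lower)
    tgt_level = lower + rng.uniform(0.90, 1.00) * (upper - lower)
    # standard frequency roved within a quarter octave of the condition frequency
    std_freq = condition_freq * 2 ** rng.uniform(-0.125, 0.125)
    tgt_freq = std_freq * (1 + delta_percent / 100)  # target adaptively higher
    return (std_freq, std_level), (tgt_freq, tgt_level)
```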
Loudness Balancing
The detection threshold and comfort levels from Experiment 1 were used to balance
loudness in the same manner.
Stimuli
Frequency discrimination for pitch ranking was measured for loudness balanced single
and multi-electrode stimuli. Stimuli were created as described for Experiment 1 but with key
differences meant to improve place resolution at the lower frequencies. This was done by
filtering a pure tone through a filter bank and using the output envelopes to scale constant-rate
pulse trains. The pure tones were 400 ms sinusoids with 200 ms raised-cosine attack and release
ramps. Tones were filtered through a 22-channel filter bank comprised of second-order filters
logarithmically spaced one-third octave apart with center frequencies from 50 to 6400 Hz. This
filter spacing was modified from the frequency allocation typically used with Cochlear Corporation devices to provide better place coding of frequencies below 200 Hz. Since
participants were given no acclimation period to these programming changes, we chose to use a
simple pitch ranking task to measure frequency discrimination. Filtered outputs were converted
to channel envelopes using a Hilbert transform. A second processing difference from Experiment
1 was that the “N-of-M” algorithm was used to select the 5 channels (for the multi-electrode
conditions), rather than 3 channels, with the most energy to explore the potential benefit of
broader stimulation. Similar to Experiment 1, these envelopes were used to modulate constant-
rate pulse trains comprised of pulses that were 25 µs in phase duration with 8 µs interphase gaps.
The rate of the constant-rate pulse trains was experimentally controlled depending on the
condition. For the single-electrode rate only condition, the channel with the most peak energy
was used for stimulation. Example stimuli are shown in Figure 5.4.
Figure 5.4: Example Frequency Discrimination Stimuli for Experimental Conditions of Place, Rate,
and Combined Place-Rate
The first panel shows the condition where place of stimulation is varied from 100 to 200 Hz and rate is held constant at the base
frequency of 100 Hz for both the standard and target stimuli. The second panel shows the condition where rate of stimulation is
varied from 100 to 200 Hz and place of stimulation is held constant at the base frequency of 100 Hz for both the standard and
target stimuli. The third panel shows the combined place-rate condition with both place and rate covaried from 100 Hz for the
standard stimulus to 200 Hz for the target stimuli.
Analyses
The primary hypothesis tested is that frequency discrimination is better provided by
combined place and rate of stimulation than by either cue alone. Each subject completed 3
repetitions of 15 conditions consisting of every combination of stimulation cue (place, rate,
combined) and condition frequency (100, 200, 400, 800, 1600 Hz). A two-way repeated
measures analysis of variance (ANOVA) with interactions was conducted with cue type and
frequency as within-subject factors. All statistics were calculated on logarithmically transformed
thresholds (Micheyl et al., 2006). Planned multiple comparisons were conducted to examine the
effect of cue type at each frequency. Cohen’s d was used as a measure of effect size (Cohen,
1992).
A secondary hypothesis tested is that consistent stimulation rates provided on multiple
electrodes can improve rate discrimination over that with single-electrode stimulation, with the rationale that consistent timing across electrodes might give upstream neural circuits better access to rate cues. Each
subject completed 3 repetitions of stimulation rate discrimination for 10 conditions consisting of
2 stimulation configurations (single, multi) and 5 condition frequencies (100, 200, 400, 800,
1600 Hz). A two-way repeated measures ANOVA was conducted with stimulation configuration
and frequency as within-subject factors. Planned multiple comparisons were conducted to test
the effect of stimulation configuration at each frequency.
Correlation analysis was conducted between the measures of frequency discrimination
with forward-masked thresholds reported in a previous study (Bissmeyer et al., 2020a). Forward-
masked thresholds were measured as the probe detection threshold on a set of electrodes 0, 1, 2,
and 4 electrodes away from the masker, presented at a comfortable level. The metric of
frequency tuning based on forward-masked detection was calculated as the slope of the
thresholds across these electrodes. Five of the participants from the present study (4 of whom
were bilateral) took part in Bissmeyer et al., 2020, and the reported metric of frequency tuning
based on forward-masked detection was tested for correlation with the frequency discrimination
thresholds measured by pitch ranking reported in the present study. The hypothesis was that an
individual's ability to discriminate pitch would be affected by their auditory neural health, as
measured by forward masking. Correlation analysis was conducted between monopolar forward-
masked thresholds averaged across apical and basal testing sites and frequency discrimination
thresholds averaged across frequency.
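A sketch of the tuning metric and the correlation test follows; the array names holding the measured thresholds are assumptions.

```python
import numpy as np
from scipy.stats import pearsonr

def masking_slope(distances, probe_thresholds):
    """Slope of forward-masked probe thresholds vs. masker-probe distance."""
    slope, _intercept = np.polyfit(distances, probe_thresholds, 1)
    return slope

# fm_thresholds: per-ear probe thresholds at 0, 1, 2, and 4 electrodes away
# fd_thresholds: per-ear frequency discrimination averaged across frequency
slopes = np.array([masking_slope([0, 1, 2, 4], t) for t in fm_thresholds])
slopes -= slopes.mean()                 # normalize by subtracting the average
r, p = pearsonr(slopes, np.log10(fd_thresholds))
```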
Results
No clear trends emerged in the participants’ performance in the up-down procedure. With
10 mistakes necessary to complete a run, there were an average of 40.1 trials per run with a
standard deviation of 10.7 trials, and with the longest run being 60 trials long. Average frequency
discrimination across all conditions and subjects was 15.4% with a standard deviation of
3.2%. Figure 5.5 shows the benefit of combining place and rate cues compared to place or rate
cues alone. Average discrimination was better with combined place and rate cues than with
either cue alone (F(2,20) = 26.91, p < 0.001). The grand means for stimulation cue averaged across
frequency were 18.4% for place, 19.6% for rate, and 9.0% for the combined cue conditions. This
benefit of the combined cue condition was large and significant when compared to place or rate
alone (Cohen's d > 0.7 for both comparisons).
As shown in Figure 5.5, rate discrimination thresholds exhibit the characteristic trend of
worsening for higher rates. In contrast, place discrimination is relatively flat but with an average
best performance near 400 Hz, which corresponds to a location near electrode 8 for the
frequency allocation used in this experiment. Discrimination for the combined cue condition
tracks the better of the two cues with a significant and synergistic improvement measured for the
100, 200, and 400 Hz conditions. These observations were statistically confirmed with a clear
effect of frequency on discrimination thresholds with worsening thresholds for higher
frequencies (F(4,40) = 22.18, p < 0.001), and there was a significant interaction between
stimulation cue and frequency (F(8,80) = 14.79, p < 0.001).
Planned multiple comparisons were calculated to test the hypothesis that the combined
cue would provide better discrimination over either cue alone for each frequency. The multiple
comparisons test was conducted with Fisher’s least significant difference. Measured
discrimination thresholds were significantly better for the combined cue than for either cue alone
for the 100 (place, p < 0.001; rate, p = 0.02), 200 (place, p < 0.001), and 400 (place, p = 0.047;
rate, p = 0.0055) Hz conditions, except for rate discrimination at 200 Hz not reaching
significance (p = 0.074). The effect sizes of these comparisons were large (Cohen's d > 0.4). For
the 800 and 1600 Hz conditions, rate discrimination was significantly worse than for the
combined cue condition (p < 0.001), and place discrimination was not significantly different
from the combined cue condition (p = 0.38 and p = 0.59, respectively). Place and rate
discrimination were significantly different for all frequencies (p < 0.01) except for 400 Hz (p =
0.14), with the stronger cue switching between 200 and 400 Hz.
Figure 5.5: Frequency Discrimination Thresholds with Multi-Electrode Stimuli
Frequency discrimination with multi-electrode stimuli averaged across participants for the factors of stimulation cue and
frequency with shaded error bars showing standard errors of the means.
Figure 5.6 plots individual discrimination demonstrating that performance is highly
variable. For rate cues, some implant users struggle above 200 Hz (e.g., 4L), while others have
discrimination resolution better than 10% for frequencies up to 800 Hz (e.g., 1R). For place cues,
some implant users struggle with electrode discrimination and their performance is consistently
poor (e.g., 3R), while others are consistently flat hovering around 15% discrimination (e.g., 1L).
These results provide insight into individual benefit from combined cues; for example, some
implant users receive a benefit at 1600 Hz at a frequency where rate discrimination is relatively
poor (3L, 3R, 4L, 5, and 9), while others appear to be confounded by the poor rate cue and
combined performance is worse for the place cue alone at 1600 Hz (1R, 2L, 4R, and 8).
Considering differences across ears within the same participant, Subject 1L had a place-
rate benefit from 200 to 800 Hz over either cue alone while 1R did not have a significant place-
rate benefit for any frequency. Subject 4 is the only bilateral user who had similar performance
across ears and, interestingly, had poor use of place and rate cues at higher frequencies. Each
participant, and sometimes the same participant across ears, receive varying benefits from
different cues.
Figure 5.6: Individual Frequency Discrimination as a Function of Frequency
Symbols show discrimination thresholds averaged across repetitions for each cue type.
In Figure 5.7, we explore the secondary hypothesis of whether rate discrimination is
better with multi-electrode than with single-electrode stimulation. The results show that the
effect of stimulation configuration was significant (F(1,10) = 22.2, p < 0.001), with single-
electrode rate discrimination averaged across frequency (14.2%) significantly better than multi-
electrode rate discrimination (19.6%) (Cohen's d = 0.27). The effect of stimulation rate was
significant (F(4,40) = 32.1, p < 0.001), reflecting the well-established deterioration of
discrimination for increasing rates. The interaction between stimulation configuration and rate
was not significant (F(4,40) = 1.6, p = 0.19), reflecting the similar trend as a function of frequency
after adjusting for mean differences.
Planned multiple comparisons were calculated to test the significance of stimulation
configuration for each frequency with Fisher’s least significant difference. Measured
discrimination thresholds were significantly better with single-electrode than with multi-
electrode stimulation for the 100 (p = 0.004), 400 (p = 0.038), and 800 (p = 0.0025) Hz
conditions. The effect sizes of these comparisons were large (Cohen's d > 0.4 for all comparisons).
Figure 5.7: Single and Multi-electrode Rate Discrimination as a Function of Frequency
Symbols indicate discrimination thresholds averaged across subjects with shaded error bars indicating standard errors of the
means.
Forward-masked detection thresholds were examined to test the hypothesis that
degradations in frequency tuning—reflecting electrode-neural geometry, local neural health, and
tonotopic pitch associated with different places of excitation—affect temporal and tonotopic
pitch mechanisms (Bierer & Faulkner, 2010b; Bissmeyer et al., 2020a; McKay, 2012; Zhou,
2016; Zhou et al., 2019). Figure 5.8 shows the correlation of forward-masked slopes (see methods in Bissmeyer and colleagues, 2020) with frequency discrimination for the subset of 5
overlapping subjects in the present study (1-5) for all stimulation cues (place, rate, place-rate,
and single-electrode rate). Frequency discrimination based on place cues yielded a weakly
significant positive correlation (p = 0.053) indicating better than average place discrimination for
steeper than average forward-masked slopes. The correlations for rate (p = 0.008), place-rate (p =
0.017), and single-electrode rate (p = 0.004) were significant, likewise indicating better
discrimination for steeper forward-masked slopes. The consistent trend across correlations was
that steeper forward-masked slopes, or sharper frequency tuning, correlated with better than
average frequency discrimination.
Figure 5.8: Correlations between Forward Masking and Frequency Discrimination
Correlations between forward-masked threshold slopes normalized by the subtraction of the average with frequency
discrimination thresholds for subjects 1 through 5 for the stimulation cues of place, rate, place-rate, and single-electrode rate.
A correlation between performance on Experiment 1 and Experiment 2 was done to
explore whether those who performed better at simple frequency discrimination also performed
better at melodic contour identification. The positive correlation, indicating that better performance on one task was predictive of better performance on the other, held for the place cue (p = 0.062, not reaching significance), the rate cue (p = 0.037), the combined place-rate cue (p = 0.029), and performance averaged across the cue conditions (p = 0.011). Correlations were also explored for place cue vs
rate cue performance for both experiments to explore any individualized preference for tonotopic
vs temporal cues. A positive but insignificant trend was found, with subjects who performed better with tonotopic cues also performing better with temporal cues for both frequency
discrimination (p = 0.25) and melodic contour identification (p = 0.21).
Performance on both experiments was then correlated with age, duration of deafness before implantation, and duration of CI experience. Figure 5.9 shows the two correlations which
reached or neared significance, as well as the corresponding pairs to these correlations which did
not. Melodic contour identification based on rate cues yielded a significant positive correlation with duration of CI experience (p = 0.034), indicating worse rate-based thresholds for longer durations of CI experience. Frequency discrimination based on rate cues
reached a near significant positive correlation with duration of deafness before implantation (p =
0.058). Neither the correlation of melodic contour identification with duration of deafness before implantation (p = 0.63) nor that of frequency discrimination with duration of CI experience (p = 0.39) reached significance, but both are plotted to show the positive but insignificant counterparts to the significant correlations. The consistent trend across correlations was that rate
discrimination is better for shorter duration of deafness before implantation and for shorter
duration of CI experience. One possibility is that those who have longer durations of
implantation may be less sensitive to temporal cues since the processor does not encode temporal
fine structure.
Figure 5.9: Correlations between Rate Discrimination and Individual Metrics of Hearing Loss and
CI Experiences
Correlations between rate discrimination, as measured by melodic contour and simple frequency discrimination, and the metrics
of duration of deafness before implantation and CI experience.
Discussion
The primary hypothesis tested by the experiments described here is that coordinated use
of place and rate of stimulation can enhance frequency discrimination for CI users. This
hypothesis was substantiated in both experiments with significant improvements observed with
combined place and rate of stimulation. That coordinated use of place and rate can improve basic
frequency discrimination as well as melodic contour identification motivates careful
consideration of how these cues are provided by CIs. A secondary hypothesis was tested in
Experiment 2, that multi-electrode stimulation provides better access to stimulation rate cues
compared to single-electrode stimulation. The evidence indicates the contrary, that single-
electrode stimulation provides better rate discrimination. Discussion focuses on the clinical
implications of these two findings.
Coordinated Place and Rate of Stimulation for CIs
CI users hear a sense of pitch associated with both place and rate of stimulation. Place of
stimulation makes use of the basic tonotopy of the auditory system with more deeply implanted
electrodes typically evoking lower pitch percepts. In clinical programming, the way acoustic
frequency is allocated to electrodes is flexible. Different manufacturers use different rules for
frequency allocation and audiologists may tailor allocation for individuals. The default frequency
allocation for Cochlear Corporation devices uses a lower frequency edge of 188 Hz with center
frequencies of filters spaced 125 Hz apart until the middle of the array, at which point the spacing transitions to quasi-logarithmic. With such spacing, only the most apical electrode is
allocated to the region representing the typical range of fundamental frequencies of spoken
speech in adults. The default frequency allocation for AB devices uses logarithmic spacing with
a lower frequency edge of 333 Hz. A rationale for providing little or no frequency allocation
below 333 Hz is that fundamental frequencies of speech will manifest in the temporal envelopes
extracted from each band. However, few studies have considered the extent that a dense
frequency allocation in the range of fundamental frequencies for spoken speech might improve
pitch perception (Geurts & Wouters, 2004).
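To make the allocation trade-off concrete, a short sketch compares the two logarithmic filter banks used in the present study (Experiment 1: 200 to 6400 Hz; Experiment 2: 50 to 6400 Hz) by counting channels at or below 300 Hz; the 300 Hz marker for adult voice fundamentals is an illustrative choice.

```python
import numpy as np

def log_centers(f_lo, f_hi, n=22):
    """n logarithmically spaced filter center frequencies from f_lo to f_hi."""
    return f_lo * (f_hi / f_lo) ** np.linspace(0, 1, n)

exp1 = log_centers(200, 6400)   # ~quarter-octave spacing (Experiment 1)
exp2 = log_centers(50, 6400)    # one-third-octave spacing (Experiment 2)
for name, fc in (("Exp 1", exp1), ("Exp 2", exp2)):
    print(name, "channels at or below 300 Hz:", int((fc <= 300).sum()))
# prints 3 channels for Exp 1 and 8 for Exp 2
```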
In the present study, Experiment 1 considered frequency allocation similar to Cochlear
Corporation devices, while Experiment 2 considered a denser frequency allocation with
logarithmically spaced filters from 50 to 6400 Hz with one-third octave spacing providing more
resolution in the lower frequencies. With the spacing in Experiment 2, participants could, on
average, discriminate pitch changes of about 15% based on changes in place of stimulation. This
is remarkable since this was a short-term experiment without familiarization to this
cue. The trade-off that must be considered, though, is the extent that increasing the density of
allocation to low frequencies in the voice pitch range reduces the density of allocation of higher
frequencies in the range of formant frequencies. It is difficult to explore this trade-off because
frequency allocation is such a basic element of CI programming that modifying it can require
months to adjust to depending on the extent of the changes. Longitudinal systematic studies of
allocation are needed.
Stimulation rate, whether as modulation rate or as variable pulse rate, also evokes a
consistent sense of pitch for CI users. Most CIs use modulation rates of constant-rate pulsatile
stimulation to convey periodicity cues for pitch, though some strategies trigger stimulation by phase locking to the temporal fine structure of sound in each frequency
band (Arnoldner et al., 2007; R. J. M. van Hoesel & Tyler, 2003; Wouters et al., 2015). The
provision of this temporal information does not covary with place of stimulation in existing CIs.
Specifically, the place and rate of stimulation are not coordinated such that higher frequencies
cause both an increase in modulation rate and a basal shift in stimulation place (Arnoldner et al.,
2007; Riss et al., 2014). Instead, the rate of stimulation is like that of NH when listening to
unresolved harmonics, where only temporal cues are available for pitch (B. C. J. Moore &
Carlyon, 2005; Swanson et al., 2019). Evidence clearly indicates that pitch resolution is better
provided in NH when covarying place and rate cues are provided for low-numbered,
tonotopically resolved, harmonics (Bernstein & Oxenham, 2006; Houtsma & Smurzynski, 1990;
Kaernbach & Bering, 2001; Shackleton & Carlyon, 1994).
The results of the present experiments indicate that stimulation rate provides a robust cue
for detecting pitch changes at least up to 400 Hz. Discrimination of pitch changes based on
stimulation rate was, on average, better than 10% when tested near 100 and 200 Hz but degraded
to about 20% near 400 Hz. The combined use of place and rate of stimulation provided better
frequency discrimination than either cue alone for these frequencies; however, discrimination
with the combined cue was generally only marginally better than with the stronger of the two
cues. This finding suggests that optimal encoding of place and rate cues would benefit from
detailed and individualized characterization of cue strength. Such an optimization might follow
the approach presented here but with familiarization to the jointly encoded place-rate cues. The
familiarization process is important since there is clear evidence of rehabilitative plasticity
associated with both place and rate of stimulation (Goldsworthy & Shannon, 2014; Reiss et al.,
2014).
Does Broad Stimulation Provide Better Access to Rate Pitch Cues?
The present study included a component in Experiment 2 that directly compared rate
discrimination with multi-electrode and single-electrode stimulation. Contrary to the argument
for multi-electrode stimulation, the results presented here indicate a small but significant
advantage for single-electrode stimulation. Our interpretation of this finding is that single-
electrode stimulation is temporally more precise since it avoids the smearing of temporal
information that necessarily must occur with Cochlear Corporation devices, which require a 12
µs delay between pulses across electrodes (Boulet et al., 2016). Stimulation used in the present
study was presented base to apex, so would have been grossly consistent with physiological
compensatory mechanisms for delay, but, in the described experiment, delays were not tailored
to cochlear delays of characteristic frequencies estimated from electrode positions. It is possible
that the sense of pitch provided by stimulation rate using multiple electrodes could be optimized
by tailoring the stimulus delay either psychophysically or by physiological estimates. This,
however, is speculation and it may well be that the physiological compensatory mechanisms may
not exist for CI users who do not receive the traveling wave through their processor.
We further postulate that any need for stimulating broad regions of the auditory nerve to
provide sufficient across fiber excitations for upstream decoding to take place is already
provided by the broad stimulation patterns that occur for a single-electrode using monopolar
stimulation (Middlebrooks & Snyder, 2010). This is supported by the observed correlation of
forward-masked threshold slopes with frequency discrimination indicating better place and rate
discrimination with narrower fields of stimulation as quantified by steeper forward-masked
slopes. That better rate discrimination was positively correlated with steeper forward-masked
slopes suggests that both single and multi-electrode stimulation are broad enough to provide
across fiber comparisons for upstream decoding, with narrower stimulation providing an
advantage because it avoids unnecessary temporal smearing. That better place discrimination
was positively correlated with steeper forward-masked slopes suggests that place pitch
judgments partially depend on comparisons of the overall excitation pattern and not simply the
centroid of the response. The small but consistent benefit for single-electrode compared to multi-
electrode stimulation for rate discrimination highlights how a relatively narrow field of
stimulation may provide better access to both place and rate cues for CI users.
Conclusions
Two experiments were described that examined the sense of pitch conveyed by electrode
position and stimulation rate, separately and combined, for CI users. Results indicate that
frequency discrimination was generally better with place and rate cues combined than with either
cue alone; however, resolution was often dominated by the stronger of the two cues. A
synergistic benefit of combined cues was measured up to 400 Hz for the simple frequency
discrimination task. It remains unknown to what extent covarying stimulation place and rate in
clinical devices could lead to long-term benefits after optimizing frequency allocation and
providing familiarization to the newly encoded information.
Chapter 6: The Effect of Stimulation Rate Training on
Cochlear Implant Frequency Discrimination
Introduction
Cochlear implants (CIs) have been widely successful at restoring partial hearing to 1
million deaf individuals (Zeng, 2022), but there are well-known deficiencies related to speech
recognition in noise and pitch/music perception (Bruns et al., 2016; A. Caldwell & Nittrouer,
2013; Deroche et al., 2014; do Nascimento & Bevilacqua, 2005; Q.-J. Fu & Nogaki, 2005;
Gfeller et al., 2000; Gilbers et al., 2015; Luo et al., 2007). Both deficits are largely affected by the frequency cues provided to CI users (Arnoldner et al., 2007; Heng et al., 2011; Müller et al., 2012; Smith et al., 2002; A. E. Vandali et al., 2005; Vermeire et al., 2010). While the tonotopic, or place-frequency, cues (Clopton et al., 1974; Fekete et al., 1984; Liberman, 1982; Muniak et al., 2016; Ryugo & May, 1993) and temporal phase-locked cues (van den Honert & Stypulkowski, 1987; Dynes & Delgutte, 1992; Dreyer & Delgutte, 2006; Hill et al., 1989; Shepherd & Javel, 1997; Rose et al., 1967; Palmer & Russell, 1986; Heinz et al., 2001; Verschooten et al., 2019) are inseparable in NH, they can be independently conveyed by CIs. There is debate
regarding the contributions and synergy of these cues (Attneave et al., 1971; Bernstein &
Oxenham, 2006; Carlyon et al., 2012; Luo et al., 2012; McKay et al., 2000; Oxenham, 2013;
Oxenham et al., 2004, 2011; Palmer & Russell, 1986; Rose et al., 1967; Verschooten et al.,
2019), providing theoretical and practical motivations to measuring their individual contributions
to pitch judgments (Arnoldner et al., 2007; Laneau et al., 2004; Litvak et al., 2003; Oxenham et
al., 2004; Shannon et al., 2004; Smith et al., 2002; Vermeire et al., 2010; B. S. Wilson et al.,
2004).
The smallest discriminable difference in pitch between two frequencies is often measured
as a discrimination threshold (% difference from the base frequency for the present study), with a
semitone difference in Western musical notation being 5.95%. For place pitch driven by the
tonotopic progression along electrodes (Nelson et al., 1995; Tong & Clark, 1985), a single
electrode will often have a quarter to one-third octave filter bandwidth with around 3-4
semitones allocated to each electrode (or 18.9-26% discrimination threshold for discriminating
between single electrodes). Pure tone frequency discrimination through the clinical processor,
which generally relies on place-of-excitation cues, averages around 10%, an order of magnitude
worse than NH (Goldsworthy, 2015; Goldsworthy et al., 2013; Pretorius & Hanekom, 2008).
Computer-controlled electrode psychophysics bypass the clinical processor allowing specific
place cues to be provided. Pairs of electrodes simultaneously stimulated or closely interleaved
provide intermediate place cue percepts (Kwon & van den Honert, 2006; Landsberger &
Srinivasan, 2009; Macherey & Carlyon, 2010; H. J. McDermott & McKay, 1994; Srinivasan et
al., 2012) which allow for discrimination differences of less than 1 electrode (Kenway et al.,
2015; Laneau & Wouters, 2004; Townshend et al., 1987a).
Studies have examined the contribution of temporal cues to the perception of pitch
finding a weaker but present pitch mechanism (Houtsma & Smurzynski, 1990; Kaernbach & Bering, 2001;
Shackleton & Carlyon, 1994). Harmonic complex frequency discrimination through the clinical
processor varies between 5 and 30%, much worse than the 0.1 to 5% frequency
resolution observed in NH (Goldsworthy, 2015; Goldsworthy et al., 2013; Luo et al., 2019;
Micheyl et al., 2006). The extent that this poor resolution is caused by degradation of tonotopic
relative to temporal cues is unknown (Swanson et al., 2019). Studies bypassing clinical
processing to test rate discrimination directly generally conclude that the temporal pitch
mechanism is weak and unusable above 300 Hz (Carlyon et al., 2010; Laneau et al., 2004;
Macherey & Carlyon, 2014; H. J. McDermott & McKay, 1997; McKay et al., 2000; Shannon,
1983; Tong et al., 1982; Tong & Clark, 1985; Zeng, 2002).
Historically, sound processing for cochlear implants has limited temporal cues to
modulation frequencies less than 300 Hz (Shannon et al., 2004; Wouters et al., 2015). Limiting
stimulation timing in such a manner discards temporal fine structure (B. S. Wilson et al., 2004),
which if preserved, and provided inseparably with place cues as in NH, might improve pitch
perception and speech comprehension in noise for CI users (Arnoldner et al., 2007; Heng et al.,
2011; Müller et al., 2012; Smith et al., 2002; A. E. Vandali et al., 2005; Vermeire et al., 2010).
The combination of psychophysical place and rate cues for pitch perception has been explored
with some evidence of an integration of place and rate for a combined pitch percept, though the
mechanism of such integration is uncertain, with marked saturation at higher frequencies
(Bissmeyer & Goldsworthy, 2022; Erfanian Saeedi et al., 2017; Fearn & Wolfe, 2000; Luo et al.,
2012; Rader et al., 2016; Stohl et al., 2008; Swanson et al., 2019). Other studies found that the
pitch percepts related to place and rate combinations did not appear to have fusion or
proportionate contributions (Landsberger et al., 2016, 2018; Macherey et al., 2011; McKay et al.,
2000; Rader et al., 2016; Tong et al., 1983). Whether place-rate integration is a fused synergy of
the two cues for a single pitch percept or a perceptual weighting of the individual dimensions for
a pitch judgment, studies conclude that some combination of these two cues could improve
signal processing strategies opening the window for better pitch perception in CI users (Erfanian
Saeedi et al., 2017; Luo et al., 2012; Rader et al., 2016; Stohl et al., 2008).
Since many clinical processors poorly encode temporal cues, it is possible that
experience may be required for stimulation rate to contribute to a combined pitch percept. The
few studies exploring stimulation rate training point to the improved ability to use stimulation
rate for simple pitch ranking (Bissmeyer et al., 2020b; Goldsworthy & Shannon, 2014). The
effect of stimulation rate training on more complicated pitch judgment tasks or to supplement
pitch ranking in the presence of place-frequency cues has not been explored. Motivated by the
limited use of stimulation rate in CI signal processing and the evidence that stimulation rate pitch
ranking sensitivity improves with experience, the present study considers perceptual learning of
stimulation rate to improve access to the psychophysical cues that could support frequency
discrimination in CI users.
The present study tests the primary hypothesis that training at stimulation rate can
improve frequency discrimination for both stimulation rate alone and the combination of
stimulation place and rate. Performance was assessed before and after stimulation rate training
with a battery of electrode and acoustic psychophysical tasks. The electrode psychophysical
tasks explored were frequency discrimination, loudness intensity discrimination, voice pitch
discrimination of synthetic vowels, melodic contour identification, and rate and place pitch
matching. The acoustic psychophysical tasks explored were pure tone frequency discrimination,
fundamental frequency discrimination, melodic contour identification, consonant identification,
vowel identification, sentence completion in noise, and loudness intensity discrimination. The
results clarify how stimulation rate training contributes to the combined use of place and rate
cues for frequency discrimination, which should inform developments in sound processing for
CIs.
Methods
Subjects
Two bilateral CI users with devices from Cochlear Corporation participated in this
remote training study. These subjects were tested in each ear separately with the trained ear
randomly selected. All subjects were tested and trained with electrode psychophysics using the
Nucleus Implant Communicator (NIC) 4.1 System which uses a research processor to provide
precise control over stimulation parameters delivered directly through the implant (Litovsky et
al., 2017). The acoustic psychophysical assessments delivered through the subject’s personal
clinical processor were administered on the Team Hearing website coded in JavaScript. A
permalink for this experiment can be found at https://www.teamhearing.org/90; after entering
the site, press the “Studies” button to enter the experiment. Relevant subject information is
provided in Table 6.1. Participants provided informed consent and were paid for their
participation. The University of Southern California’s Institutional Review Board approved the
study.
Table 6.1: Subject Information
Overview of Protocol
Subjects participated in a 4-week stimulation rate training protocol with a battery of
electrode and acoustic psychophysical assessment tasks administered before and after training.
The electrode psychophysical assessment tasks were frequency discrimination, loudness intensity
discrimination, voice pitch discrimination of synthetic vowels, melodic contour identification,
and pitch matching of place and rate. The frequencies cues provided for each task were
determined based on task length/difficulty and are individually reported in the detailed methods
below. A modified frequency allocation table was used to improve access to low-frequency place
cues (Bissmeyer & Goldsworthy, 2022). The acoustic psychophysical assessment tasks were
pure tone frequency discrimination, fundamental frequency discrimination, melodic contour
identification, consonant identification, vowel identification, sentence completion in noise, and
loudness intensity discrimination. The cues provided for these acoustic psychophysical tasks
were based on the clinical processor settings for Cochlear Corporation (Swanson et al., 2019;
Wouters et al., 2015). All pre- and post-training assessments were delivered without correct-
answer feedback. Correct-answer feedback was only provided for the stimulation rate training
task. All adaptive procedures had an adaptive rule which converged to 75% identification or
discrimination accuracy, depending on the task (Kaernbach, 1991).
Electrode Psychophysical Loudness Balancing
Detection thresholds and comfortable stimulation levels were measured as a function of
stimulation rate to provide loudness balancing for procedures across electrodes and rates
(Bissmeyer et al., 2020b; Bissmeyer & Goldsworthy, 2022; Goldsworthy et al., 2021, 2022).
These levels were measured in monopolar stimulation mode using a method of adjustment.
Subjects used a graphical user interface (see Supplementary Figure D.1) with sliders to control
and set the threshold and comfort levels for each of the eight stimulation rates, from 55 to 7040
Hz in octave intervals. Upon adjusting the slider, the subject would hear a change in amplitude
for a 400 ms pulse train comprised of biphasic pulses with 25 µs phase durations and 8 µs
interphase gaps. This pulse shape was designed to provide the necessary charge for stimulation
over a brief phase duration. The chosen phase duration corresponds to typical clinical processor
settings, and the maximum amplitude was 255 clinical units as defined by Cochlear Corporation.
Subjects were instructed to adjust stimulation level for detection
thresholds and for comfortable levels. The resulting detection thresholds
and comfort levels were fit with a logistic equation of the form:
$$Y(x) = U - \frac{U - L}{\left(1 + Q\,e^{-Bx}\right)^{1/v}} \qquad \text{(Equation 6.1)}$$
where $U$ and $L$ are the upper and lower limits of the subject's dynamic range (converted from clinical units to units of charge per phase), $Q$ is related to the current level at 100 Hz, $B$ is the rate by which the current decreases over the frequency range, $x$ is frequency expressed as $\log_2(\text{frequency}/100)$, and $v$ controls asymptotic growth. Fitted logistic equations were used to balance loudness for all electrode psychophysical stimuli used in the experiment.
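As a concrete illustration, the fit of Equation 6.1 might be implemented as in the following Python sketch; the SciPy routine, the starting values, and the example comfort levels are illustrative assumptions rather than the laboratory's actual fitting code.

```python
# Sketch: fitting Equation 6.1 to measured comfort (or threshold) levels.
# The example levels and starting guesses below are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def loudness_curve(x, U, L, Q, B, v):
    """Equation 6.1: level versus x = log2(frequency / 100)."""
    return U - (U - L) / (1.0 + Q * np.exp(-B * x)) ** (1.0 / v)

rates = 55.0 * 2.0 ** np.arange(8)                 # 55 to 7040 Hz in octaves
x = np.log2(rates / 100.0)
levels = np.array([14.2, 13.5, 12.1, 10.8, 9.9, 9.3, 9.0, 8.8])  # dB re 1 nC

p0 = [levels.max(), levels.min(), 1.0, 1.0, 1.0]   # rough starting guesses
params, _ = curve_fit(loudness_curve, x, levels, p0=p0, bounds=(1e-6, np.inf))

# Evaluate the fitted curve at an arbitrary stimulation rate, e.g., 440 Hz.
print(loudness_curve(np.log2(440.0 / 100.0), *params))
```

A fit of this form can then be evaluated at any stimulation rate to set loudness-balanced current levels between the measured octave rates.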
Electrode Psychophysical Stimuli
Loudness balanced dual-electrode stimuli were used for all electrode psychophysical
procedures to create a virtual channel with biphasic pulses delivered sequentially from base to
apex. All stimuli were generated by filtering 400 ms pure tone sinusoids with 40 ms raised-
cosine attack and release ramps through a 22-channel filter bank comprised of second-order
filters logarithmically spaced one-third octave apart with center frequencies from 55 to 7040 Hz.
This filter spacing was modified from the default Cochlear Corporation frequency allocation
table to provide better place coding of frequencies below 200 Hz (Bissmeyer & Goldsworthy,
2022). Filtered outputs were converted to channel envelopes using a Hilbert transform. These
envelopes were used to modulate constant-rate pulse trains comprised of pulses that were 25 µs in phase duration with 8 µs interphase gaps. The rate and place of the constant-rate pulse trains
were experimentally controlled depending on the condition. The vowel procedure had 4 dual-
electrode pairs representing the fundamental and first 3 formant frequencies. All other
procedures tested with only one dual-electrode pairing.
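The envelope-extraction stage described above can be sketched as follows; the sampling rate and the second-order Butterworth bands are assumptions standing in for the research processor's actual filters, and the final modulation of the constant-rate pulse trains is omitted.

```python
# Sketch: third-octave-spaced second-order bandpass filters (55-7040 Hz)
# followed by Hilbert envelope extraction; Butterworth bands are assumed.
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

fs = 44100
n = int(0.4 * fs)                                  # 400 ms tone
t = np.arange(n) / fs
ramp = int(0.04 * fs)                              # 40 ms raised-cosine ramps
window = np.ones(n)
window[:ramp] = 0.5 * (1 - np.cos(np.pi * np.arange(ramp) / ramp))
window[-ramp:] = window[:ramp][::-1]
tone = np.sin(2 * np.pi * 220.0 * t) * window

centers = 55.0 * 2.0 ** (np.arange(22) / 3.0)      # 22 channels, 55 to 7040 Hz
envelopes = []
for fc in centers:
    lo, hi = fc * 2 ** (-1 / 6), fc * 2 ** (1 / 6) # third-octave band edges
    sos = butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
    envelopes.append(np.abs(hilbert(sosfilt(sos, tone))))

envelopes = np.array(envelopes)                    # (22, n): channel envelopes
print(envelopes.shape)
```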
Training Procedure
Psychophysical training of stimulation rate discrimination was conducted using dual-
electrode rate discrimination procedures. The training was conducted using a two-interval, two-
alternative, forced-choice procedure in which the stimulation rate difference between the
standard and target intervals was held constant but the base rate was adaptively increased to
provide training at increasingly higher stimulation rates (Bissmeyer et al., 2020b; Goldsworthy &
Shannon, 2014). The standard and target stimuli were 400 ms pulse trains comprised of biphasic
pulses with 25 µs phase durations and 8 µs interphase gaps. Stimulation current levels were
controlled and loudness balanced using the fitted logistic functions to detection thresholds and
comfort levels. For each trial within a measurement run, the amplitudes of the standard and
target were randomly and independently roved between 90 and 100% (uniform distribution) of
the width of the subject's dynamic range (in units of charge per phase, decibels re 1 nCoulomb)
as fitted by the logistic function. Stimulation rates of the standard interval were constrained
between 55 and 1760 Hz. The initial value of the standard rate was 55 Hz, and the target
stimulation rate was specified based on the stimulation rate difference training level (e.g., initial
target rate for the first level, a 100% difference, is 110 Hz). The stimulation rate difference training
levels were 100%, 80%, 60%, 40%, 20%, 10%, and 5%. The subject was required to reach a
1000 Hz base rate averaged across the 3 repetitions to move onto the next stimulation rate
difference level. Each training session consisted of 3 repetitions of the same stimulation rate
difference, determined by the training level. Following correct responses, the base stimulation rate was increased by a factor of $2^{1/12}$ (i.e., raised by one semitone); following incorrect responses, the base stimulation rate was decreased by a factor of $2^{3/12}$ (i.e., lowered by three semitones), converging to 75% accuracy (Kaernbach, 1991). Adaptive runs continued until the participant made 12 mistakes and
the upper limit of discrimination was calculated as the average of the last 4 reversals. This daily
training was conducted for 4 weeks with an effort to achieve 28 sessions for each subject. Total
training time was approximately 14 hours.
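The adaptive logic of one training run can be sketched as follows; the simulated listener is entirely hypothetical and serves only to exercise the one-semitone-up, three-semitones-down rule described above.

```python
# Sketch of the adaptive training track: the base rate moves up one semitone
# after a correct response and down three semitones after an incorrect one
# (a weighted up-down rule converging to 75% correct), stopping after 12
# mistakes. The simulated listener below is hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def simulate_listener(base_rate, pct_diff):
    """Hypothetical listener: rate discrimination degrades at higher rates."""
    p_correct = 0.5 + 0.5 / (1.0 + (base_rate / 400.0) ** 2 / (pct_diff / 20.0))
    return rng.random() < p_correct

def training_run(pct_diff, start=55.0, lo=55.0, hi=1760.0, max_mistakes=12):
    base, mistakes, reversals, last = start, 0, [], None
    while mistakes < max_mistakes:
        correct = simulate_listener(base, pct_diff)
        if last is not None and correct != last:
            reversals.append(base)                 # track reversal points
        last = correct
        if correct:
            base = min(hi, base * 2 ** (1 / 12))   # up one semitone
        else:
            base = max(lo, base * 2 ** (-3 / 12))  # down three semitones
            mistakes += 1
    return np.mean(reversals[-4:])                 # upper limit of discrimination

print(training_run(pct_diff=40.0))                 # one run at the 40% level
```

With this weighted rule, the base rate hovers near the rate at which the listener discriminates the fixed percentage difference about 75% of the time.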
Electrode Psychophysical Assessments
Figure 6.1: Example Stimuli for Procedures
Example stimuli for (A) frequency discrimination (FD), (B) intensity discrimination, (C) melodic contour identification (MCI),
and (D) fundamental frequency discrimination with vowel formants.
Frequency Discrimination
Frequency discrimination was measured using a two-interval, two-alternative, forced-
choice procedure in which subjects were asked which interval was higher in pitch. The condition
frequencies were 110, 220, 440, and 880 Hz for dual-electrode stimulation. The primary
hypothesis focused on testing whether frequency discrimination is better provided by combined
place and rate of stimulation than by either cue alone. This was tested with place, rate, and place-
rate stimuli, with the focus of comparing place and rate separately to the combined place-rate
stimulation. Example stimuli are shown in Figure 6.1A. There were 12 conditions comprised of
all combinations of the 3 types of stimuli (place, rate, and combined place-rate) at the 4 test
frequencies. Conditions were repeated three times in random order.
For each trial within a measurement run, the amplitudes of the standard and target were
randomly and independently roved between 90 and 100% (uniform distribution) of the width of
the subject's dynamic range (in units of charge per phase, decibels re 1 nCoulomb) as fitted by
the logistic function. For each trial, the frequency of the standard was roved within a quarter
octave of the condition frequency; the target frequency was defined adaptively higher relative to
the roved standard frequency. The initial difference that the target frequency was higher than the
standard frequency was 64% with an adaptive ceiling of 128% frequency difference. The
difference for discrimination was decreased by a factor of $\sqrt[3]{2}$ after correct answers and increased
by a factor of 2 after mistakes (converging to 75% accuracy, Kaernbach, 1991). The procedure
continued until the participant made 10 mistakes and the discrimination threshold was calculated
as the average of the last 8 reversals.
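A parallel sketch of this assessment staircase is given below; only the roving ranges and adaptive factors are taken from the text, and the listener model is again hypothetical.

```python
# Sketch of the adaptive frequency-difference track: the standard is roved
# within a quarter-octave-wide window around the condition frequency, the
# target sits Delta percent above it, and Delta shrinks by a factor of 2**(1/3)
# after correct answers and doubles after mistakes. The listener is hypothetical.
import numpy as np

rng = np.random.default_rng(1)

def trial(condition_hz, delta_pct):
    standard = condition_hz * 2.0 ** rng.uniform(-0.125, 0.125)
    target = standard * (1.0 + delta_pct / 100.0)
    # Hypothetical listener with roughly a 10% internal discrimination limen:
    p_correct = 0.5 + 0.5 * (delta_pct / (delta_pct + 10.0))
    return rng.random() < p_correct, standard, target

def threshold_run(condition_hz, start_pct=64.0, ceiling=128.0, max_mistakes=10):
    delta, mistakes, reversals, last = start_pct, 0, [], None
    while mistakes < max_mistakes:
        correct, _, _ = trial(condition_hz, delta)
        if last is not None and correct != last:
            reversals.append(delta)
        last = correct
        if correct:
            delta /= 2.0 ** (1.0 / 3.0)            # converge to 75% correct
        else:
            delta = min(ceiling, delta * 2.0)
            mistakes += 1
    return np.mean(reversals[-8:])                 # threshold (% difference)

print(threshold_run(440.0))
```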
Intensity Discrimination
Intensity discrimination was measured using a two-interval, two-alternative, forced-
choice procedure in which subjects were asked which interval was louder. Example
stimuli are shown in Figure 6.1B. The condition frequencies were 110, 220, 440, and 880 Hz for
dual-electrode stimulation. There were 4 conditions comprised of loudness discrimination at the
4 test frequencies. Conditions were repeated three times in random order.
For each trial within a measurement run, the amplitudes of the standard and target were
randomly and independently roved between 90 and 100% (uniform distribution) of the width of
the subject's dynamic range (in units of charge per phase, decibels re 1 nCoulomb) as fitted by the logistic function. Frequency roving was not applied in this task, and the frequency was not
changed between the standard and target presentations. The initial loudness difference was 64%
difference of the dynamic range with an adaptive ceiling of 100% difference. The difference for
intensity discrimination was decreased by a factor of $\sqrt[3]{2}$ after correct answers and increased by a
factor of 2 after mistakes (converging to 75% accuracy, Kaernbach, 1991). The procedure
continued until the participant made 4 mistakes and the discrimination threshold was calculated
as the average of the last 4 reversals.
Melodic Contour Identification
Melodic contour identification was measured using a one-interval, nine-alternative,
forced-choice procedure. The nine melodic contours consisted of five-note patterns including
“rising,” “falling,” “flat,” “rising-flat,” “falling-flat,” “rising-falling,” “falling-rising,” “flat-
rising,” and “flat-falling” (Crew et al., 2012; Galvin et al., 2007). These nine contours of varying
difficulty were presented twice in pseudorandom order to measure overall realistic performance
with a total of 18 trials in a measurement run (Galvin et al., 2007). Twenty-four possible
experimental conditions were tested with four center-note frequencies and six internote semitone
spacing levels for the stimulation rate cue type only.
The four center-note frequencies presented were 110, 220, 440, and 880 Hz with the match to
these center-note frequencies, based on Western music notation, being A2, A3, A4, and A5. The
six internote semitone spacing levels were 12, 9, 6, 3, 2, and 1 semitone apart. Example stimuli
are shown in Figure 6.1C. For each frequency tested, the subject had to achieve a percent correct
of 50% to move on to the next internote semitone spacing level, with chance being 11%.
For each trial within a measurement run, the amplitudes of the five notes in the contour
were randomly and independently roved between 90 and 100% (uniform distribution) of the
width of the subject's dynamic range (in units of charge per phase, decibels re 1 nCoulomb) as
fitted by the logistic function. For each trial, the frequency of the third note of the five-note
contour was roved within a quarter octave of the condition frequency; because the third note is common to all contours, it was chosen as the anchor from which the other note frequencies were defined. The purpose of frequency roving was
to add perturbations which contribute to the ecological relevance of the stimulus (e.g., music
played in different keys, vocal pitch fluctuations) while avoiding habituation to the third note
frequency. The frequency spacing between notes in the melodic contour was controlled by the
internote semitone spacing level.
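The note frequencies for one trial can be generated as sketched below; the semitone step patterns assigned to each contour are inferred from the contour names and may differ in detail from the stimuli actually used.

```python
# Sketch: building the nine five-note contours around a roved third-note
# frequency, with a given internote spacing in semitones. The step patterns
# are inferred from the contour names in the text.
import numpy as np

CONTOURS = {                        # semitone steps relative to the third note
    "rising":         [-2, -1, 0, 1, 2],
    "falling":        [2, 1, 0, -1, -2],
    "flat":           [0, 0, 0, 0, 0],
    "rising-flat":    [-2, -1, 0, 0, 0],
    "falling-flat":   [2, 1, 0, 0, 0],
    "rising-falling": [-2, -1, 0, -1, -2],
    "falling-rising": [2, 1, 0, 1, 2],
    "flat-rising":    [0, 0, 0, 1, 2],
    "flat-falling":   [0, 0, 0, -1, -2],
}

def contour_frequencies(name, center_hz, spacing_semitones, rng):
    third = center_hz * 2.0 ** rng.uniform(-0.125, 0.125)  # quarter-octave rove
    steps = np.array(CONTOURS[name]) * spacing_semitones
    return third * 2.0 ** (steps / 12.0)

rng = np.random.default_rng(2)
print(np.round(contour_frequencies("rising", 220.0, 3, rng), 1))
```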
Fundamental Frequency Discrimination with Vowel Formants
Fundamental frequency discrimination with vowel formants was measured using a two-
interval, two-alternative, forced-choice procedure in which subjects were asked which interval
was higher in pitch. The fundamental frequency conditions were 110, 220, and 440 Hz, chosen
as representative of spoken speech fundamental frequencies. This was tested with place, rate, and
place-rate stimuli for constant and variable vowel formants between the standard and target
intervals. There were 18 dual-electrode conditions comprised of all combinations of the 3 types
of stimuli (place, rate, and combined place-rate) at the 3 test frequencies for the constant and
variable vowel formants. Conditions were repeated three times in random order.
Voice pitch discrimination of synthetic vowels was measured in the presence of constant
and varying formant frequencies. The first three formant frequencies were chosen for six vowels
with IPA symbols: ɑ, æ, ə, ɝ, i, and u—specified as [730, 1090, 2440], [660, 1720, 2410], [520,
1190, 2390], [490, 1350, 1690], [270, 2290, 3010], and [300, 870, 2240], respectively. To adjust
for changes in formant frequencies that occur with higher fundamental frequencies, formant
frequency values were increased by 12.3% (2 semitones) and 33.5% (5 semitones) when using
fundamental frequencies near 220 and 440 Hz, respectively. These three formant frequencies
were represented in three dyads with different place-of-excitation cues but with the standard and
target stimulation rates. The constant vowel condition presented two vowels with identical
formant frequencies and the variable vowel condition presented two vowels with different
formant frequencies. Example stimuli are shown in Figure 6.1D.
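The formant lookup and the pitch-dependent upward shift can be written compactly, as in the sketch below; the function name and threshold logic are illustrative.

```python
# Sketch: the vowel formant table from the text and the upward formant shift
# applied at higher fundamental frequencies (+2 semitones near 220 Hz,
# +5 semitones near 440 Hz). The helper name is illustrative.
FORMANTS_HZ = {            # first three formants per vowel (IPA symbol)
    "ɑ": (730, 1090, 2440),
    "æ": (660, 1720, 2410),
    "ə": (520, 1190, 2390),
    "ɝ": (490, 1350, 1690),
    "i": (270, 2290, 3010),
    "u": (300, 870, 2240),
}

def shifted_formants(vowel, f0_hz):
    """Scale formants by ~12.3% (2 st) near 220 Hz and ~33.5% (5 st) near 440 Hz."""
    if f0_hz >= 440.0:
        scale = 2.0 ** (5.0 / 12.0)    # +5 semitones, about 33.5%
    elif f0_hz >= 220.0:
        scale = 2.0 ** (2.0 / 12.0)    # +2 semitones, about 12.3%
    else:
        scale = 1.0
    return tuple(f * scale for f in FORMANTS_HZ[vowel])

print([round(f, 1) for f in shifted_formants("ɑ", 220.0)])
```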
For each trial within a measurement run, the amplitudes of the standard and target were
randomly and independently roved between 90 and 100% (uniform distribution) of the width of
the subject's dynamic range (in units of charge per phase, decibels re 1 nCoulomb) as fitted by
the logistic function. For each trial, the frequency of the standard was roved within a quarter
octave of the condition frequency; the target frequency was defined adaptively higher relative to
the roved standard frequency. The initial difference that the target frequency was higher than the
standard frequency was 64% with an adaptive ceiling of 128% frequency difference. The
difference for discrimination was decreased by a factor of $\sqrt[3]{2}$ after correct answers and increased
by a factor of 2 after mistakes (converging to 75% accuracy, Kaernbach, 1991). The procedure
continued until the participant made 10 mistakes and the discrimination threshold was calculated
as the average of the last 8 reversals.
Pitch Matching
Pitch matching was measured using a novel interface in which subjects were asked to
pitch match a target interval to the standard. The standard was manipulated in one dimension of
frequency, either place or rate, then the subject was given the control to match the pitch of the
target with the other dimension of frequency. The goal was to manipulate the two opposing
dimensions of frequency, place and rate, and observe the effect of cue type on these
manipulations and whether they were affected by stimulation rate training. Because these
dimensions are not equivalent in the pitch change derived from a given frequency change, nor in how the two dimensions are weighted, there were no technically correct answers. The condition
frequency was 220 Hz with the rationale being that rate and place weighting would be more
similar at this frequency, but that since this frequency is near the 300 Hz limit, training could
also provide a benefit and difference across sessions. There were 12 conditions with 2 cue
dimensions (place and rate) and 6 standard semitone differences from the nominal frequency (-6,
-3, -1, 1, 3, and 6 semitones). Example stimuli for frequency discrimination, which is a similar
paradigm of place and rate manipulation, are shown in Figure 6.1A. Conditions were repeated
three times in random order. No amplitude or frequency roving was used, but loudness balancing
was still implemented.
Acoustic Pitch Assessments
A permalink for the entire acoustic experiment test set can be found at:
https://www.teamhearing.org/90. Upon entering the site, click the “Studies” button to start the
experiment. No correct-answer feedback was provided for any assessment tasks.
Pure Tone Loudness Scaling
Pure tone loudness scaling of an 880 Hz sinusoid was measured to characterize loudness
growth. Participants were provided with an application interface to set gain values that produced
tones that were “Soft”, “Medium Soft”, “Medium”, and “Medium Loud” for an 880 Hz pure tone
that was 400 ms in duration with 40 ms raised-cosine attack and release ramps.
Pure Tone Detection
Pure tone detection thresholds were measured for 400 ms sinusoids with 40 ms raised-
cosine attack and release ramps. Detection thresholds were measured for octave frequencies from
110 to 3520 Hz, based on the fundamental frequencies of voicing as well as pitches essential to
music. An application interface was provided at the beginning of each measurement run allowing
participants to set the stimulus gain to be “soft but audible”. From the starting gain value,
detection thresholds were measured using a three-alternative, three-interval, forced-choice
procedure where two of the intervals contained silence and the target interval contained the gain-
adjusted tone. The gain was reduced by a step following correct answers and was increased by
three times this step following wrong answers (converging to 75% accuracy, Kaernbach, 1991).
The initial step was 6 dB and was decreased by 2 dB following the first correct answer following
each mistake while limiting the smallest step to 2 dB. A run continued until the participant made
3 mistakes and the gain value at the end of the run was taken as the detection threshold.
Pure Tone Frequency Discrimination
Pure tone frequency discrimination thresholds were measured for pure tones near octave
frequencies from 110 to 3520 Hz. These six condition frequencies were measured with three
repetitions. Stimuli were 400 ms sinusoids with 40 ms raised-cosine attack and release ramps. An
application interface was provided at the beginning of a run allowing participants to set the
stimulus gain to be “comfortable”. Discrimination thresholds were measured using a two-
alternative, two-interval, forced-choice procedure with the target having an adaptively higher
frequency compared to the standard.
Participants were instructed to choose the interval that was “higher in pitch”. At the
beginning of a run, the adaptive frequency difference was 100% (an octave). This frequency
difference was reduced by a factor of $2^{-1/3}$ following correct answers and increased by a factor of 2 following wrong answers. For each trial, a roved frequency value was selected from a quarter-
2 following wrong answers. For each trial, a roved frequency value was selected from a quarter-
octave-wide uniform distribution geometrically centered on the condition frequency. Relative to
this roved frequency value, the standard frequency was lowered, and the target raised by
$\sqrt{1 + \Delta/100}$. The gains for the standard and target intervals were independently roved by 6 dB,
also based on a uniform distribution, centered on the comfortable listening level. A run continued
until the participant made four mistakes and the average of the last four reversals was taken as
the discrimination threshold.
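One trial's stimulus parameters can be computed as in the sketch below; squaring the symmetric split shows that the target ends up exactly $\Delta$ percent above the standard.

```python
# Sketch of one acoustic frequency-discrimination trial: a roved center is
# split symmetrically so the target sits Delta percent above the standard,
# and the two interval gains are roved independently over a 6 dB window.
import numpy as np

rng = np.random.default_rng(3)

def fd_trial(condition_hz, delta_pct, comfortable_db):
    center = condition_hz * 2.0 ** rng.uniform(-0.125, 0.125)
    split = np.sqrt(1.0 + delta_pct / 100.0)
    standard_hz = center / split               # lowered by sqrt(1 + Delta/100)
    target_hz = center * split                 # raised by sqrt(1 + Delta/100)
    gains_db = comfortable_db + rng.uniform(-3.0, 3.0, size=2)  # 6 dB rove
    return standard_hz, target_hz, gains_db

print(fd_trial(440.0, 100.0, comfortable_db=-20.0))
```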
Fundamental Frequency Discrimination
Fundamental frequency discrimination thresholds were measured for harmonic
complexes for fundamental frequencies near 110, 220, 440, and 880 Hz for high-pass filtered
complexes. These fundamental frequencies were chosen as representative of spoken speech, with
the addition of 880 to correspond to the electrode psychophysical assessment. The low-pass
condition was chosen as representative of spoken speech with a high-frequency roll-off
(Swanson et al., 2019). A total of 12 measurement runs were conducted consisting of 3
repetitions of the 4 fundamental frequencies. The condition order was randomized for each
repetition.
Harmonic complexes were constructed in the frequency domain by summing all non-zero
harmonics from the fundamental to 10 kHz with a high-pass filtering function.
The high-pass filter was defined as:
$$\text{gain}(f) = \begin{cases} 1 & \text{if } f > f_e \\ \max\!\left(0,\ 1 - \left(\log_2 f - \log_2 f_e\right)^2\right) & \text{otherwise} \end{cases}$$
with the edge frequency $f_e$ specified as 4 kHz, resulting in a gain of 0 for component frequencies of
2 kHz and lower.
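The construction can be sketched as follows; component phases are an assumption, since the text does not specify them.

```python
# Sketch: building the high-pass filtered harmonic complex in the frequency
# domain, using the gain function above (edge frequency 4 kHz, zero gain at
# and below 2 kHz). Sine phases are assumed.
import numpy as np

def hp_gain(f_hz, edge_hz=4000.0):
    if f_hz > edge_hz:
        return 1.0
    return max(0.0, 1.0 - (np.log2(f_hz) - np.log2(edge_hz)) ** 2)

def harmonic_complex(f0_hz, fs=44100, dur_s=0.4):
    t = np.arange(int(dur_s * fs)) / fs
    signal = np.zeros_like(t)
    for k in range(1, int(10000.0 // f0_hz) + 1):  # harmonics up to 10 kHz
        g = hp_gain(k * f0_hz)
        if g > 0.0:                                # sum only non-zero harmonics
            signal += g * np.sin(2 * np.pi * k * f0_hz * t)
    return signal

x = harmonic_complex(220.0)
print(np.max(np.abs(x)))
```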
Fundamental frequency discrimination thresholds were measured using a two-interval,
two-alternative, forced-choice procedure in which participants were instructed to choose the
interval that was “higher in pitch”. The same adaptive decision and scoring logic as for pure tone
frequency discrimination, described above, were used to calculate discrimination thresholds for
fundamental frequency.
Pitch Ranking of Piano Notes
Pitch ranking of piano notes was measured for center frequencies near 110, 220, and
440 Hz. A base note was chosen for each interval within an octave range centered around the
center frequency. The piano notes were rendered through MuseScore and pitch shifted in
MATLAB, keeping the spectral information the same and altering only the temporal information.
A total of 6 measurement runs were conducted consisting of 2 repetitions of the 3 center
frequencies. The condition order was randomized for each repetition.
Discrimination thresholds were measured using a two-interval, two-alternative, forced-
choice procedure in which participants were instructed to choose the interval that was “higher in
pitch”. The same adaptive decision and scoring logic as for pure tone and fundamental frequency
discrimination, described above, were used to calculate discrimination thresholds for pitch
ranking of piano notes.
Melodic Contour Identification
Melodic contour identification performance was measured for MIDI piano notes within a
one-and-a-quarter-octave frequency range centered around 220 Hz. Stimuli were five-note sequences, with each note being a 200 ms sinusoid with 40 ms raised-cosine attack and release ramps. Performance was measured for 4, 2, and 1 semitone internote differences. While the
ordering of the semitone difference conditions from more to less favorable may provide a slight
familiarization advantage (which also might be offset by fatigue), we note that our primary
hypothesis concerns the effect of the rate training paradigm on this test, which is not included in
the training paradigm. An application interface was provided at the beginning of a run allowing
participants to set the stimulus gain to be “comfortable”. Identification performance was
measured using a nine-alternative, one-interval, forced-choice procedure. Participants were
instructed to choose the melodic contour that they heard. Each measurement run consisted of 18
trials, randomly presenting each of the 9 contours twice. Results were reported in percent correct.
Acoustic Speech Assessments
Pure Tone Loudness Scaling and Detection
Pure tone loudness scaling and pure tone detection thresholds were obtained in the same
manner as for the pitch assessments. The frequency for pure tone loudness scaling was 1000 Hz and
pure tone detection thresholds were obtained for 500, 2000, and 4000 Hz, based on the
frequencies essential to speech understanding.
Pure Tone Intensity Discrimination
Pure tone intensity discrimination thresholds were measured for pure tones near octave
frequencies from 110 to 880 Hz. These four condition frequencies were measured with three
repetitions. Stimuli were 400 ms sinusoids with 40 ms raised-cosine attack and release ramps. An
application interface was provided at the beginning of a run allowing participants to set the
stimulus gain to be “comfortable”. Discrimination thresholds were measured using a two-
alternative, two-interval, forced-choice procedure with the target having an adaptively higher
intensity compared to the standard.
Participants were instructed to choose the interval that was louder. At the beginning of a
run, the adaptive intensity difference was 12 dB, with an adaptive ceiling of 24 dB. This intensity
difference was reduced by a factor of $2^{-1/3}$ following correct answers and increased by a factor of 2 following wrong answers (converging to 75% accuracy, Kaernbach, 1991). For each standard
2 following wrong answers (converging to 75% accuracy, Kaernbach, 1991). For each standard
and target stimulus, a separate roved frequency value was randomly selected from a quarter-
octave-wide uniform distribution geometrically centered on the condition frequency. The gain
for the standard and target intervals was co-roved by 6 dB, based on a uniform distribution,
centered on the comfortable listening level. The target interval intensity was then raised by
$\sqrt{1 + \Delta/100}$. A run continued until the participant made four mistakes and the average of the
last four reversals was taken as the intensity discrimination threshold.
Sentence Completion in Background Noise
Speech reception thresholds (SRTs) were measured for a sentence completion task using
speech materials from the Revised Speech Perception in Noise (R-SPIN) corpus (Bilger, 1984;
Wilson et al., 2012) in the presence of two-talker background noise (Leibold & Buss, 2013). The
user interface presented 25 different word options, and participants were asked to choose the
word that ended the last spoken sentence. The R-SPIN corpus contains sentence materials that
include both high and low amounts of contextual information. Only the materials with low
context information were used in the present study. All words were presented at the same RMS
value as a 1 kHz pure tone that was specified as “Medium” in loudness. SRTs were measured using
an adaptive procedure, described below, in which the level of the target speech was adaptively
varied in a two-talker babble. The two-talker babble used was produced by two adult female
voices reading passages from children’s books (Leibold & Buss, 2013), whereas the target
speech consisted of isolated words spoken by an adult male voice. The SNR was specified based
on the root mean square value of the word and noise samples. For a given SNR, the word and
noise samples were combined and scaled such that the total output power was set equal to the
subject-specified “Medium” loudness level for a 1 kHz pure tone as described in the loudness
scaling section. The initial signal to noise ratio between the spoken sentence and background
noise was set to 12 dB, with an adaptive ceiling of 24 dB, and was decreased by 2 dB after
correct responses and increased by 6 dB after incorrect responses (converging to 75% accuracy,
Kaernbach, 1991). The procedure continued until the participants made four incorrect responses
and the average of the last four reversals was taken as the SRT.
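The level-setting logic can be sketched as below, with random noise standing in for the actual R-SPIN words and two-talker babble.

```python
# Sketch of the SNR mixing described above: word and babble are combined at a
# specified SNR based on their RMS values, then the mixture is rescaled so its
# total power matches the listener's "Medium" loudness reference. The signals
# here are random placeholders for the actual speech materials.
import numpy as np

rng = np.random.default_rng(4)

def rms(x):
    return np.sqrt(np.mean(x ** 2))

def mix_at_snr(word, babble, snr_db, medium_rms):
    babble = babble * (rms(word) / rms(babble)) * 10.0 ** (-snr_db / 20.0)
    mixture = word + babble
    return mixture * (medium_rms / rms(mixture))   # match "Medium" power

word = rng.standard_normal(44100)                  # placeholder signals
babble = rng.standard_normal(44100)
out = mix_at_snr(word, babble, snr_db=12.0, medium_rms=0.05)
print(round(rms(out), 4))
```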
Results
Analyses
The primary hypothesis tested is that stimulation rate training improves frequency
discrimination with rate of stimulation and combined place and rate of stimulation. The battery
of psychophysical assessments was given to study the effect of stimulation rate training on pitch
judgment, loudness judgment, and speech tasks. A repeated measures analysis of variance
(ANOVA) with interactions was conducted with factors varying for each assessment. All
statistics for discrimination assessments were calculated on logarithmically transformed
thresholds (Micheyl et al., 2006). Cohen's d was used as the measure of effect size
(Cohen, 1992).
Planned multiple comparisons were calculated for electrode psychophysics to test the
hypothesis that the combined cue would provide better discrimination over either cue alone for
each frequency, when applicable. The multiple comparisons test was conducted with Fisher’s
least significant difference.
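For concreteness, the effect-size computation on log-transformed thresholds might look like the following sketch; the threshold values are hypothetical.

```python
# Sketch of the effect-size computation: discrimination thresholds are
# log-transformed before statistics (Micheyl et al., 2006) and Cohen's d is
# taken as the standardized mean difference. The data below are hypothetical.
import numpy as np

def cohens_d(a, b):
    a, b = np.log10(a), np.log10(b)                # log-transform thresholds
    pooled = np.sqrt(((a.size - 1) * a.var(ddof=1) +
                      (b.size - 1) * b.var(ddof=1)) /
                     (a.size + b.size - 2))
    return (a.mean() - b.mean()) / pooled

pre = np.array([19.3, 25.0, 14.2, 31.0])           # hypothetical thresholds (%)
post = np.array([17.0, 20.5, 13.8, 26.0])
print(round(cohens_d(pre, post), 2))
```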
Training
All subjects completed four weeks of training, with Subject A completing 25 and Subject
B completing 28 training sessions. Both subjects reached the level of 40% difference from base
frequency. Subject A completed 8 sessions and Subject B 9 sessions at the 40% level before the
end of the 4-week protocol. In a previous study (Bissmeyer et al., 2020b), subjects had an
average of about 40% discrimination at 400 Hz. It is unsurprising, then, that although subjects were able to pass the training levels above a 40% difference, they may need more focused, longitudinal training to reach the smaller differences. It remains to be
seen whether this would improve performance at the trained and assessment tasks. The level of
training in the present study was designed to provide optimal and daily experience at higher
frequencies without causing fatigue. Although the subjects were not fatigued with the training, it
may have not been enough to provide lasting improvement for the assessment tasks.
Electrode Psychophysical Assessments
Frequency Discrimination
Figure 6.2: Frequency Discrimination across Sessions and Ears
Frequency discrimination thresholds with dual-electrode stimuli averaged across the participants for the factors of stimulation cue
and frequency with error bars showing standard errors of the means. Each subplot represents the ear tested (whether trained or
non-trained) in either pre- or post-assessment testing.
Figure 6.2 shows frequency discrimination before and after training. Discrimination
thresholds averaged across all conditions and subjects were 19.30% before and 17.02% after training. Figure 6.2 shows that training did not significantly improve performance across sessions (F(1,263) = 1.35, p = 0.25, Cohen's d = 0.23). There was also no difference between the trained and untrained ear for the bilateral CI users (F(1,263) = 0.01, p = 0.9). There was an effect of frequency
on performance (F(3,263) = 3.52, p = 0.016). There was also an effect of cue type on performance
(F(2,263) = 30.54, p < 0.001), with average discrimination generally being better with combined
place and rate cues than with either cue alone. The grand means for stimulation cue averaged
across all other conditions were 28.67% for place, 20.16% for rate, and 10.31% for the combined
cue conditions. This benefit of the combined cue condition was large and significant when compared to place (Cohen's d = 0.93) or rate alone (Cohen's d = 0.63).
As shown in Figure 6.2, rate discrimination thresholds exhibit the characteristic trend of
worsening for higher rates. In contrast, place discrimination improved for increasing rates with
an average best performance near 880 Hz, which corresponds to a location near electrode 13 for
the frequency allocation used in this experiment. Neither of these trends changed between
sessions. Discrimination for the combined cue condition generally tracks the better of the two
cues, with a significant and synergistic combined improvement measured for the 440 Hz
condition (place, p = 0.041; rate, p = 0.016). These observations were statistically confirmed
with a clear effect of frequency on discrimination thresholds, which worsened for
higher frequencies (F(4,40) = 22.18, p < 0.001), and there was a significant interaction between
stimulation cue and frequency (F(8,80) = 14.79, p < 0.001). All other second order interactions
were not significant.
Planned multiple comparisons were calculated to test the hypothesis that the combined
cue would provide better discrimination over either cue alone for each frequency. The multiple
comparisons test was conducted with Fisher’s least significant difference. Measured
discrimination thresholds were significantly better for the combined cue than for at least one cue
alone for the 110 (place, p < 0.001), 220 (place, p < 0.001), 440 (place, p = 0.041; rate, p =
0.016) Hz, and 880 (rate, p < 0.001) Hz conditions. The effect sizes of these comparisons were
large (Cohen's d > 0.4). Place and rate discrimination were significantly different for all
frequencies (p < 0.001) with the exception of 440 Hz (p = 0.71), with the stronger cue switching
between 220 and 440 Hz.
Intensity Discrimination
Figure 6.3: Intensity Discrimination across Sessions and Ears
Intensity discrimination thresholds with dual-electrode stimuli averaged across the participants for the factor of frequency with
error bars showing standard errors of the means. Each subplot represents the ear tested (whether trained or non-trained) in either
pre- or post-assessment testing.
Figure 6.3 shows intensity discrimination before and after training. Interestingly for this
control condition, session had an effect on performance (F(1,83) = 10.49, p = 0.0017, Cohen's d = 0.7). There was no difference between the trained and untrained ear for the bilateral CI users
(F(1,83) = 1.15, p = 0.28). There was a significant effect of frequency with increasing frequency
resulting in better loudness discrimination (F(3,83) = 10.35, p < 0.001).
Melodic Contour Identification
Figure 6.4: Melodic Contour Identification across Sessions and Ears
Melodic contour identification with dual-electrode stimuli measured as percent correct averaged across the participants for the
factors of frequency and semitone spacing with error bars showing standard errors of the means. Each subplot represents the ear
tested (whether trained or non-trained) in either pre- or post-assessment testing.
Figure 6.4 shows percent correct for melodic contour identification across stimulation
rate semitone spacings for 110 to 880 Hz in octave spacings. There was no effect of session on
performance (F(2,20) = 17.17, p = 0.21). The ear tested had a significant effect on performance
(F(2,20) = 17.17, p < 0.001), with better identification performance in the ear tested second (always the untrained ear in these bilateral users) than in the trained ear. This suggests that limited experience with the task, rather than a carryover of stimulation rate training, drove the better melodic contour identification.
Semitone spacing between notes (F(2,20) = 17.17, p < 0.001) and the base frequency of notes
(F(2,20) = 17.17, p < 0.001) were significant factors affecting performance. No second order
interactions were significant.
Fundamental Frequency Discrimination with Vowel Formants
Figure 6.5: Fundamental Frequency Discrimination with Vowel Formants across Sessions and Ears
Fundamental frequency discrimination thresholds with vowel formants with dual-electrode stimuli averaged across the
participants for the factors of stimulation cue and frequency with error bars showing standard errors of the means. Each subplot
represents the ear tested (whether trained or non-trained) in either pre- or post-assessment testing.
Figure 6.5 shows the fundamental frequency discrimination thresholds before and after
training for both the trained and non-trained ears. Fundamental frequency discrimination with
vowel formants before and after training averaged across all conditions including subject was
23.68% before and 23.06% after. There was no effect of stimulation rate training on the
performance across sessions (F(1,405) = 0.15, p = 0.7, Cohen's d = 0.024). There was also no
difference between the trained and untrained ear for the bilateral CI users (F(1,405) = 1.53, p =
0.22). There was no effect of method (constant vs variable vowel formants) on performance (F(1,
405) = 1.98, p = 0.16). There was an effect of frequency on performance (F(2, 405) = 24.37, p <
0.001). There was also an effect of cue type on performance (F(2, 405) = 255.24, p < 0.001). The
grand means for stimulation cue averaged across all other conditions were 68.56% for place,
14.86% for rate, and 12.53% for the combined cue conditions. This benefit of the combined cue condition was large and significant when compared to place alone (p < 0.001, Cohen's d = 2.38) but small when compared to rate alone (p = 0.098, Cohen's d = 0.19).
There was a significant interaction between mode and frequency (F(4, 405) = 17.56, p <
0.001). As shown in Figure 6.5, fundamental frequency rate discrimination thresholds exhibit the
characteristic trend of worsening at 440 Hz, with the combined cue following that trend as rate is
the stronger cue in this task. In contrast, fundamental frequency place discrimination was poor
for this task and only began improving slightly at 440 Hz. The performance change at 440 Hz for
both rate and place drove this significant interaction, but place discrimination did not improve
enough to cause a synergistic improvement over rate alone for the combined condition. Neither
of these trends changed between sessions.
Pitch Matching
Figure 6.6: Pitch Matching with Place and Rate Frequency Cues across Sessions and Ears
Pitch matching with dual-electrode stimuli averaged across the participants showing different subplots for the factor of
stimulation cue perturbation done in semitone spacing. Each subplot represents the ear tested (whether trained or non-trained) in
either pre- or post-assessment testing, with error bars showing standard errors of the means. The red color denotes place
manipulation after rate perturbation and the blue color denotes rate manipulation after place perturbation. The diamonds represent
data from the trained ear, and the squares represent data from the non-trained ear.
Figure 6.6 shows the pitch matching manipulations before and after training for both the
trained and non-trained ears. This was a unique test in that there were no correct answers. We
analyzed the data with a linear fit: an R value of -1 would indicate that the place and rate manipulations were equivalent, with a change in stimulation rate matched by an equal but opposite change in place based on the frequency allocation. Subjects reported this to be a very difficult task, as place and rate are two dimensions of pitch that do not produce equivalent pitch changes for an equivalent change in frequency.
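A sketch of this analysis follows; the matching responses are hypothetical and serve only to illustrate the regression toward a slope of -1.

```python
# Sketch of the pitch-matching analysis: the standard's perturbation (in
# semitones) is regressed against the subject's matching adjustment in the
# other dimension; a slope near -1 would indicate an equal-but-opposite
# match. The responses below are hypothetical.
import numpy as np

perturbation = np.array([-6, -3, -1, 1, 3, 6], dtype=float)  # standard offsets
match = np.array([5.1, 2.4, 1.2, -0.8, -2.9, -5.6])          # subject matches

slope, intercept = np.polyfit(perturbation, match, 1)
r = np.corrcoef(perturbation, match)[0, 1]
print(round(slope, 2), round(r, 2))
```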
There was no effect of session (F(1,9) = 81.5, p = 0.07), ear (trained and non-trained) for
the bilateral CI users (F(1,9) = 1.45, p = 0.49), or manipulation cue (F(1,9) = 13.83, p = 0.17). There
were two significant second order interactions with manipulation cue, for ear (F(1,9) = 5.25, p =
0.048) and session (F(1,9) = 21.24, p = 0.0013). The interaction between manipulation cue and ear
was driven by the untrained ear being closer to R = 1 than the trained ear (p < 0.001).
Interestingly, the interaction between manipulation cue and session was driven by progress
toward R = 1 between sessions, showing an effect of session potentially providing a more
equivalent comparison between place and rate (p < 0.001).
Acoustic Pitch Assessments
Pure Tone Loudness Scaling and Detection Thresholds
Figure 6.7: Acoustic Pure Tone Detection
Pure tone detection thresholds for the acoustic pitch assessment in octave steps from 110 to 3520 Hz averaged across the
participants shown across frequency with error bars showing standard errors of the means.
Figure 6.7 shows pure tone detection thresholds for the pitch assessment. Neither session
(F(1,29) = 0.88, p = 0.52) nor ear (F(1,29) = 1.56, p = 0.48) influenced performance. There was a
significant effect of frequency with increasing frequency resulting in lower detection thresholds
(F(5,29) = 75.94, p < 0.001). The only significant second-order interaction was between ear and
session (F(1,29) = 5.49, p = 0.026), due to the untrained ear having worse thresholds in the pre-
assessment.
Pure Tone Frequency Discrimination
Figure 6.8: Acoustic Pure Tone Frequency Discrimination for 110-3520 Hz
Pure tone frequency discrimination thresholds for the acoustic pitch assessment averaged across the participants shown across
frequency with error bars showing standard errors of the means.
Figure 6.8 shows pure tone frequency discrimination thresholds. There was no significant
effect of session on performance (F(1,77) = 0.08, p = 0.78). There was a small difference between
the trained and untrained ear for the bilateral CI users (F(1,77) = 4.73, p = 0.03). There was a
significant effect of frequency on performance (F(5,77) = 9.61, p < 0.001). There were no
significant second order interactions.
Fundamental Frequency Discrimination
Figure 6.9: Acoustic Fundamental Frequency Discrimination
Fundamental frequency discrimination thresholds for the acoustic pitch assessment averaged across the participants shown across
frequency with error bars showing standard errors of the means.
Figure 6.9 shows fundamental frequency discrimination thresholds. There was no effect
of session on performance (F(1,83) = 0.26, p = 0.61). There was a small difference between the
trained and untrained ear for the bilateral CI users (F(1,83) = 5.43, p = 0.02). There was a
significant effect of frequency on performance (F(3,83) = 11.09, p < 0.001), with participants
performing worse at 440 Hz than at the other frequencies. There were no significant second order
interactions.
Pitch Ranking of Piano Notes
Figure 6.10: Acoustic Piano Note Frequency Discrimination
Piano note frequency discrimination thresholds for the acoustic pitch assessment averaged across the participants shown across
frequency with error bars showing standard errors of the means.
Figure 6.10 shows pitch ranking of piano notes reported in discrimination thresholds.
There was no effect of session on performance (F(1,38) = 0.1, p = 0.75). There was no difference
between the trained and untrained ear for the bilateral CI users (F(1,38) = 0.1, p = 0.28). There was
a significant effect of frequency on performance (F(2,38) = 3.59, p = 0.037), with participants
performing worse at 220 Hz. There were no significant second order interactions.
Melodic Contour Identification
Figure 6.11: Acoustic Melodic Contour Identification with Piano Notes
Melodic contour identification for the acoustic pitch assessment reported in percent correct, averaged across the participants
shown across frequency with error bars showing standard errors of the means.
Figure 6.11 shows melodic contour identification reported in percent correct. There was
no effect of session (F(1,14) = 0.07, p = 0.8), ear (F(1,14) = 1.74, p = 0.21), or frequency (F(2,14) =
0.32, p = 0.73) on performance. There were no significant second order interactions.
Acoustic Speech Assessments
Pure Tone Scaling and Detection Thresholds
Figure 6.12: Acoustic Pure Tone Detection for 500-4000 Hz
Pure tone detection thresholds for the acoustic speech frequencies for octave intervals from 500 to 4000 Hz averaged across the
participants shown across frequency with error bars showing standard errors of the means.
Figure 6.12 shows pure tone detection thresholds for the speech assessment. Session had
no effect on performance (F(1,19) = 0, p = 1). There was a difference between the trained and
untrained ear for the bilateral CI users (F(1,19) = 6.49, p = 0.02), with the trained ear having lower
thresholds than the non-trained ear. There was a significant effect of frequency with increasing
frequency resulting in lower detection thresholds (F(3,19) = 3.38, p = 0.04). There were no
significant second order interactions.
Pure Tone Intensity Discrimination
Figure 6.13: Acoustic Pure Tone Intensity Discrimination
Pure tone intensity discrimination thresholds for the speech assessment averaged across the participants shown across frequency
with error bars showing standard errors of the means.
Figure 6.13 shows pure tone intensity discrimination thresholds. Neither session (F(1,83) =
3.74, p = 0.06) nor ear (F(1,83) = 1.87, p = 0.18) influenced performance. There was a significant
effect of frequency with increasing frequency resulting in better loudness discrimination (F(3,83) =
3.87, p = 0.01). There was a second order interaction between ear and session (F(1,83) = 4.29, p =
0.04) driven by slightly better performance in the non-trained ear during the first session.
Sentence Completion in Background Noise
Figure 6.14: Acoustic Sentence Completion in Background Noise
Sentence completion in background noise for the speech assessment reported in dB SNR averaged across the participants shown
across frequency with error bars showing standard errors of the means.
Figure 6.14 shows sentence completion in competing noise. This task did not yield any
significant effects or interactions. The task remained difficult for all subjects regardless of ear
(F(1,20) = 10.67, p = 0.57) or session (F(1,20) = 16.67, p = 0.5).
Discussion
Why did the progression through stimulation rate training not transfer to the trained task?
The first issue to address is whether we were truly training the implant users on a pitch judgment task with pitch cues. A potential argument to the contrary is that training may simply have taught them to associate a non-pitch difference in timbre with an imposed terminology of pitch. Arguing that the cue was indeed pitch, CI users performed well at pitch ranking based on stimulation rate (the subsequently trained cue) even in the pre-training assessments, where no feedback was provided.
Another potential reason for poor transference is the time spent training, which was relatively low compared to Goldsworthy & Shannon (2014). That study had subjects complete 28 hours of in-person training at 2 hours/session, twice per week, for 4 weeks, while the present study had subjects complete 14 hours remotely at 30 min/day for 4 weeks.
The third potential reason is the personal motivation of each subject. Keeping subjects
motivated to test and train at home was difficult, with a retention rate of 33%. The amount of pre-
and post-assessment testing was intense, which could have also decreased the motivation to test.
In previous studies done in the Bionic Ear Lab, subjects have mostly been brought into the lab,
increasing the interaction and personal vested interest in the results and discussion. The remote
setup of this project reflected both the shift toward remote testing during the pandemic, especially for older subjects, and a protocol designed around small amounts of daily
training. In the next iteration of this study, based on the results of this preliminary exploration,
the assessment testing would be pared down so that the subjects would be brought into the lab
for a single day of assessment testing both before and after training. The remote training period
would be lengthened to match the time spent in the Goldsworthy and Shannon (2014) study,
which would potentially call for a mid-training assessment where the subject would be brought
back into the lab.
Conclusion
Motivated by the essential role of pitch in hearing and the difficulty that CI users have with pitch
perception, the present study offers a preliminary exploration of providing experience with
stimulation rate for pitch judgments, a cue which is not consistently provided in CI signal
processing. This ongoing study showed an improvement in the ability of CI users to integrate
place and rate, but not an experience-driven improvement at frequency discrimination with
stimulation rate alone. The relatively weak stimulation rate cue can be combined with place to
provide an improvement in frequency discrimination performance over either cue alone. This
combined benefit motivates exploring the potential of encoding combined rate and place cues on
the apical electrodes, balanced with providing the necessary place resolution in the 1-2 kHz
range for speech understanding.
General Discussion/Conclusions
This dissertation focused on improving frequency resolution, with implications for front- and back-end CI processing. The ultimate goal was to provide the means to improve noise
reduction and pitch perception, which are the issues most commonly faced by CI users. The
Binaural Fennec algorithm was developed to improve speech recognition in a reverberant
environment with multiple noise sources, while preserving the cues necessary for localization.
The capability of CI users to perform pitch judgment tasks was measured through musical
interval identification with training, simple rate discrimination with training, combined place-rate
melodic contour identification with the Cochlear Co. frequency allocation as well as place-rate
frequency discrimination with improved low-frequency place resolution, and an exploratory
study into longitudinal stimulation rate training for the improvement of combined place-rate
frequency discrimination. These studies explore both the ability to perceive pitch through the processor and how frequency resolution can be improved in CIs. While there was limited
support of rate discrimination training improving access to frequency cues for pitch judgments,
this dissertation does demonstrate that combining place and rate at low frequencies can improve
frequency discrimination (tested through pitch ranking) over place or rate alone.
Some avenues should be explored based on the topics in this dissertation. The next step
for front-end processing with noise reduction would be encoding the Binaural Fennec noise
reduction algorithm into hearing assistive technology and performing tests in controlled and
uncontrolled environments. The next steps for back-end processing with pitch perception are (1)
continuing to characterize whether trained stimulation rate pitch can be useful for more difficult
pitch judgment or musical tasks and (2) exploring the feasibility of and potential improvement
from implementing variable stimulation rate and better low-frequency place resolution in CI
signal processing. This dissertation showed that the combination of place and rate, with a
modified frequency allocation table, improved low-frequency perception of pitch changes over
either cue alone. This along with the evidence from MED-EL’s FSP motivates the exploratory
implementation of combined place-rate coding on a pair or set of apical electrodes. These
channels could better encode fundamental frequency of voicing, vocal emotion, and low-
frequency instrumental and vocal musical compositions. I plan to continue to help oversee the
preliminary study of longitudinal stimulation rate training as described in Chapter 6.
What are the challenges of implementing combined place and rate
coding in CI signal processing?
Currently, the only commercially available signal processing strategy that attempts to encode temporal fine structure (TFS) is MED-EL's Fine Structure Processing (FSP) strategy. It does so on the four most apical electrodes, which are placed well into the second turn of the cochlea because of MED-EL's long electrode array. This introduces TFS near the place of stimulation most conditioned to receive fine timing information congruently with place information.
There are challenges and considerations toward the end of encoding TFS in CI signal
processing (Laneau et al., 2006; Merzenich, 1983; Moon & Hong, 2014; Rubinstein et al., 1999;
Wouters et al., 2013), including the trade-off between providing detailed TFS with increased
channel interaction versus less detailed TFS with reduced channel interaction (Loizou et al.,
2003). For a place-rate signal processing strategy to be optimal, the fundamental frequency
would need to be known, which is not currently possible in a real-world scenario. Other issues
remain, such as the difficulty of entraining neurons to the first pulse of the stimulus (Hughes et al., 2012b; Rubinstein & Hong, 2003; B. S. Wilson et al., 1997). It has been proposed that measures such as different stimulation pulse shapes could improve the stochasticity of firing and provide better temporal information, but this has yet to be established and implemented. Some
work has been done to show that conditioning the neurons with subthreshold pulses, in a similar
way to the noise which is present in the NH auditory system, could improve the responsiveness
of the system. This could provide better amplitude resolution with reduced thresholds and
improved dynamic range, which could improve the definition of temporal encoding and result in
less channel interaction for spectral coding (Drennan & Rubinstein, 2008; Hong & Rubinstein,
2006; Karg et al., 2013).
The challenge in improving place resolution is that increasing resolution at the fundamental frequencies makes it difficult to retain the necessary resolution at the formant frequencies. It also creates an even larger place-to-characteristic-frequency mismatch than is
currently present in CI signal processing (Moberly et al., 2016). This would introduce distortion,
which is, at least in the short-term, unlikely to provide improvement (Q. J. Fu & Shannon,
1999b; Q.-J. Fu & Shannon, 1999), although there is evidence of plasticity in terms of spectral shift in CI users (Reiss et al., 2014). The study presented in Chapter 5 showed that increasing
place resolution below 400 Hz as well as combining place and rate of stimulation improved
performance at frequency discrimination over either cue alone, so it is important to consider the
issues presented here and look toward addressing them in future works (Bissmeyer &
Goldsworthy, 2022). Overall, finding ways to improve the current CI signal processing, such as
deeper modulation of the temporal envelope, would potentially also provide a benefit without
introducing greater place-frequency mismatches, albeit not to the extent that place-rate signal
processing could potentially improve resolution.
Culmination of Dissertation Topics: Spatial Hearing and Cues
The culmination of the topics in this dissertation would be to improve spatial hearing
through noise reduction that preserves binaural cues, synchronized bilateral processors for level
and timing differences (Dwyer et al., 2021), and the introduction of the temporal information
needed to encode interaural timing differences between ears (Churchill et al., 2014; Warnecke et
al., 2020). Some studies which did a short-term exploration of synchronized processors
(Dennison et al., 2022) and of TFS encoding (Ausili et al., 2020) did not find a benefit, but it is
likely that all three suggestions would need to be implemented and CI users would need
experience/training with this form of spatial hearing to receive a benefit (Rosskothen-Kuhl et al.,
2021; Sunwoo et al., 2021). This does raise a question though of the practicality and feasibility
of implementing three new elements into an already time-intensive fitting process. The current
fitting process takes an audiologist about 30-90 minutes per session (Hoppe et al., 2017;
Vaerenberg et al., 2014). Currently, the temporal information and processor coordination necessary to provide fine interaural timing cues between ears has not been implemented
by any implant company and would need to be before this is possible. It is likely that
coordination of processors (once made possible) would include looking at electrode insertion
mismatches, frequency allocation matching, and gain matching between ears. This alone could
take upwards of 2-3 hours. There is a likelihood that an improved interface and coordinated
mapping of both ears (e.g., matching each electrode gain to the loudness associated with the gain
of the corresponding contralateral electrode) could cut down time spent considerably. There is
also a possibility of a simpler matching interface built into the app, where the CI user could
adjust gain and frequency differences between ears in real-time to get the best match, with safety
parameters in place determined by their audiologist. This would also cut down on the additional
initial fitting time as well as allow the CI user to adjust things in a realistic sound environment.
This may help cut down on the difference in sound quality at the clinic versus at home, which is
one of the common complaints I have heard personally from CI users. The spatial hearing
training portion would be offered to the individual after a proper fitting and could be offloaded
onto a research team. In the USC Otolaryngology department, this would create a stronger
relationship between the clinic and research departments, and the research team could implement
goal-based training strategies which provide individualized training and benchmarks. This is just
one potential fleshed-out way to improve CI processing and outcomes based on the topics in this
dissertation.
The issues of noise reduction and music appreciation have a significant effect on the
quality of life in CI users, so it is important to continue to explore how noisy situations and
music can be better processed and encoded in CIs.
References
Ambert-Dahan, E., Giraud, A.-L., Sterkers, O., & Samson, S. (2015). Judgment of musical
emotions after cochlear implantation in adults with progressive deafness. Frontiers in
Psychology, 6(181), 1–11. https://doi.org/10.3389/fpsyg.2015.00181
Amitay, S., Irwin, A., & Moore, D. R. (2006). Discrimination learning induced by training with
identical stimuli. Nature Neuroscience, 9(11), 1446–1448.
https://doi.org/10.1038/nn1787
Amlani, A. M. (2001). Efficacy of directional microphone hearing aids: A meta-analytic
perspective. Journal of the American Academy of Audiology, 12(4), 202–214.
Andries, E., Gilles, A., Topsakal, V., Vanderveken, O. M., Van de Heyning, P., Van Rompaey,
V., & Mertens, G. (2021). Systematic Review of Quality of Life Assessments after
Cochlear Implantation in Older Adults. Audiology and Neurotology, 26(2), 61–75.
https://doi.org/10.1159/000508433
Arbogast, T. L., Mason, C. R., & Kidd, G. (2005). The effect of spatial separation on
informational masking of speech in normal-hearing and hearing-impaired listeners. The
Journal of the Acoustical Society of America, 117(4), 2169–2180.
https://doi.org/10.1121/1.1861598
Argstatter, H., Hutter, E., & Grapp, M. (2016). Music therapy in the early rehabilitation of adult
cochlear implant (CI) users: Individual training and band project. Nordic Journal of
Music Therapy, 25, 9–9. https://doi.org/10.1080/08098131.2016.1179879
Arnoldner, C., Riss, D., Brunner, M., Durisin, M., Baumgartner, W.-D., & Hamzavi, J.-S.
(2007). Speech and music perception with the new fine structure speech coding strategy:
Preliminary results. Acta Oto-Laryngologica, 127(12), 1298–1303.
https://doi.org/10.1080/00016480701275261
Attneave, F., & Olson, R. K. (1971). Pitch as a medium: A new approach to psychophysical
scaling. American Journal of Psychology, 84, 147–166.
Ausili, S. A., Agterberg, M. J. H., Engel, A., Voelter, C., Thomas, J. P., Brill, S., Snik, A. F. M.,
Dazert, S., Van Opstal, A. J., & Mylanus, E. A. M. (2020). Spatial Hearing by Bilateral
Cochlear Implant Users With Temporal Fine-Structure Processing. Frontiers in
Neurology, 11. https://doi.org/10.3389/fneur.2020.00915
Bahmer, A., & Baumann, U. (2013). New parallel stimulation strategies revisited: Effect of
synchronous multi electrode stimulation on rate discrimination in cochlear implant users.
Cochlear Implants International, 14(3), 142–149.
https://doi.org/10.1179/1754762812Y.0000000011
Bahmer, A., & Langner, G. (2009). A simulation of chopper neurons in the cochlear nucleus
with wideband input from onset neurons. Biological Cybernetics, 100(1), 21–33.
https://doi.org/10.1007/s00422-008-0276-3
Battmer, R. D., Goldring, J. E., Kanert, W., Meyer, V., Bertram, B., & Lenarz, T. (2000).
Simultaneous Analog Stimulation Pilot Pediatric Study. Annals of Otology, Rhinology &
Laryngology, 109(12_suppl), 58–60. https://doi.org/10.1177/0003489400109S1224
Battmer, R. D., Haake, P., Zilberman, Y., & Lenarz, T. (1999). Simultaneous Analog Stimulation
(SAS)–Continuous Interleaved Sampler (CIS) Pilot Comparison Study in Europe. Annals
of Otology, Rhinology & Laryngology, 108(4_suppl), 69–73.
https://doi.org/10.1177/00034894991080S414
Baumann, U., & Nobbe, A. (2004). Pulse rate discrimination with deeply inserted electrode
arrays. Hearing Research, 196(1–2), 49–57. https://doi.org/10.1016/j.heares.2004.06.008
Bentler, R. A. (2005). Effectiveness of directional microphones and noise reduction schemes in
hearing aids: A systematic review of the evidence. Journal of the American Academy of
Audiology, 16(7), 473–484. https://doi.org/10.3766/jaaa.16.7.7
Bernstein, J. G. W., & Oxenham, A. J. (2006). The relationship between frequency selectivity
and pitch discrimination: Effects of stimulus level. The Journal of the Acoustical Society
of America, 120(6), 3916–3928. https://doi.org/10.1121/1.2372451
Best, V., Marrone, N., Mason, C. R., & Kidd, G. (2012). The influence of non-spatial factors on
measures of spatial release from masking. The Journal of the Acoustical Society of
America, 131(4), 3103–3110. https://doi.org/10.1121/1.3693656
Bianchi, F., Carney, L. H., Dau, T., & Santurette, S. (2019). Effects of Musical Training and
Hearing Loss on Fundamental Frequency Discrimination and Temporal Fine Structure
Processing: Psychophysics and Modeling. Journal of the Association for Research in
Otolaryngology, 20(3), 263–277. https://doi.org/10.1007/s10162-018-00710-2
Bierer, J. A., & Faulkner, K. F. (2010). Identifying Cochlear Implant Channels with Poor
Electrode-Neuron Interface: Partial Tripolar, Single-Channel Thresholds and
Psychophysical Tuning Curves. Ear and Hearing, 31(2), 247–258.
https://doi.org/10.1097/AUD.0b013e3181c7daf4
Bilger, R. C. (1984). Speech recognition test development. (ASHA Reports 14; Speech
Recognition by the Hearing Impaired.). American Speech-Language-Hearing
Association. https://www.asha.org/siteassets/publications/ashareports14.pdf
Bissmeyer, S. R. S., & Goldsworthy, R. L. (2022). Combining Place and Rate of Stimulation
Improves Frequency Discrimination in Cochlear Implant Users. Hearing Research, 424,
108583. https://doi.org/10.1016/j.heares.2022.108583
Bissmeyer, S. R. S., Hossain, S., & Goldsworthy, R. L. (2020). Perceptual learning of pitch
provided by cochlear implant stimulation rate. PLOS ONE, 15(12), e0242842.
https://doi.org/10.1371/journal.pone.0242842
Boisvert, I., Reis, M., Au, A., Cowan, R., & Dowell, R. C. (2020). Cochlear implantation
outcomes in adults: A scoping review. PLOS ONE, 15(5), e0232421.
https://doi.org/10.1371/journal.pone.0232421
Bolia, R. S., Nelson, W. T., Ericson, M. A., & Simpson, B. D. (2000). A speech corpus for
multitalker communications research. The Journal of the Acoustical Society of America,
107(2), 1065–1066.
Boulet, J., White, M., & Bruce, I. C. (2016). Temporal Considerations for Stimulating Spiral
Ganglion Neurons with Cochlear Implants. Journal of the Association for Research in
Otolaryngology, 17(1), 1–17. https://doi.org/10.1007/s10162-015-0545-5
Brandstein, M., & Ward, D. (2001). Microphone Arrays: Signal Processing Techniques and
Applications. Springer Science & Business Media.
Brungart, D. S. (2001). Informational and energetic masking effects in the perception of two
simultaneous talkers. The Journal of the Acoustical Society of America, 109(3), 1101–
1109. https://doi.org/10.1121/1.1345696
Bruns, L., Mürbe, D., & Hahne, A. (2016). Understanding music with cochlear implants.
Scientific Reports, 6. https://doi.org/10.1038/srep32026
Caldwell, A., & Nittrouer, S. (2013). Speech Perception in Noise by Children With Cochlear
Implants. Journal of Speech, Language, and Hearing Research, 56(1), 13–30.
https://doi.org/10.1044/1092-4388(2012/11-0338)
Caldwell, M. T., Jiam, N. T., & Limb, C. J. (2017). Assessment and improvement of sound
quality in cochlear implant users. Laryngoscope Investigative Otolaryngology, 2(3), 119–
124. https://doi.org/10.1002/lio2.71
Camarena, A., Manchala, G., Papadopoulos, J., O’Connell, S. R., & Goldsworthy, R. L. (2022).
Pleasantness Ratings of Musical Dyads in Cochlear Implant Users. Brain Sciences, 12(1),
Article 1. https://doi.org/10.3390/brainsci12010033
Carlile, S., Fox, A., Orchard-Mills, E., Leung, J., & Alais, D. (2016). Six Degrees of Auditory
Spatial Separation. Journal of the Association for Research in Otolaryngology: JARO,
17(3), 209–221. https://doi.org/10.1007/s10162-016-0560-1
Carlyon, R. P., & Deeks, J. M. (2013). Relationships Between Auditory Nerve Activity and
Temporal Pitch Perception in Cochlear Implant Users. In B. C. J. Moore, R. D. Patterson,
I. M. Winter, R. P. Carlyon, & H. E. Gockel (Eds.), Basic Aspects of Hearing (pp. 363–
371). Springer. https://doi.org/10.1007/978-1-4614-1590-9_40
Carlyon, R. P., & Deeks, J. M. (2015). Combined neural and behavioural measures of temporal
pitch perception in cochlear implant users. The Journal of the Acoustical Society of
America, 138(5), 2885–2905. https://doi.org/10.1121/1.4934275
Carlyon, R. P., Deeks, J. M., & McKay, C. M. (2010). The upper limit of temporal pitch for
cochlear-implant listeners: Stimulus duration, conditioner pulses, and the number of
electrodes stimulated. The Journal of the Acoustical Society of America, 127(3), 1469–
1478. https://doi.org/10.1121/1.3291981
Carlyon, R. P., & Goehring, T. (2021). Cochlear Implant Research and Development in the
Twenty-first Century: A Critical Update. Journal of the Association for Research in
Otolaryngology, 22(5), 481–508. https://doi.org/10.1007/s10162-021-00811-5
Carlyon, R. P., Long, C. J., & Micheyl, C. (2012). Across-Channel Timing Differences as a
Potential Code for the Frequency of Pure Tones. Journal of the Association for Research
in Otolaryngology, 13(2), 159–171. https://doi.org/10.1007/s10162-011-0305-0
Chung, K. (2004). Challenges and recent developments in hearing aids. Part I. Speech
understanding in noise, microphone technologies and noise reduction algorithms. Trends
in Amplification, 8(3), 83–124. https://doi.org/10.1177/108471380400800302
Chung, K., & Zeng, F. G. (2009). Using hearing aid adaptive directional microphones to enhance
cochlear implant performance. Hearing Research, 250(1–2), 27–37.
https://doi.org/10.1016/j.heares.2009.01.005
Chung, K., Zeng, F.-G., & Acker, K. N. (2006). Effects of directional microphone and adaptive
multichannel noise reduction algorithm on cochlear implant performance. The Journal of
the Acoustical Society of America, 120(4), 2216–2216. https://doi.org/10.1121/1.2258500
Chung, K., Zeng, F.-G., & Waltzman, S. (2004). Using hearing aid directional microphones and
noise reduction algorithms to enhance cochlear implant performance. Acoustics Research
Letters Online, 5(2), 56–61. https://doi.org/10.1121/1.1666869
Churchill, T. H., Kan, A., Goupell, M. J., & Litovsky, R. Y. (2014). Spatial hearing benefits
demonstrated with presentation of acoustic temporal fine structure cues in bilateral
cochlear implant listeners. The Journal of the Acoustical Society of America, 136(3),
1246–1256. https://doi.org/10.1121/1.4892764
Clark, G. (2006). Cochlear Implants: Fundamentals and Applications. Springer Science &
Business Media.
Clopton, B. M., Winfield, J. A., & Flammino, F. J. (1974). Tonotopic organization: Review and
analysis. Brain Research, 76(1), 1–20. https://doi.org/10.1016/0006-8993(74)90509-5
Cohen, J. (1992). Statistical Power Analysis. Current Directions in Psychological Science, 1(3),
98–101. https://doi.org/10.1111/1467-8721.ep10768783
Cosentino, S., Carlyon, R. P., Deeks, J. M., Parkinson, W., & Bierer, J. A. (2016). Rate
discrimination, gap detection and ranking of temporal pitch in cochlear implant users.
Journal of the Association for Research in Otolaryngology, 17(4), 371–382.
https://doi.org/10.1007/s10162-016-0569-5
Crew, J. D., Galvin, J. J., & Fu, Q.-J. (2012). Channel interaction limits melodic pitch perception
in simulated cochlear implants. The Journal of the Acoustical Society of America, 132(5),
EL429–EL435. https://doi.org/10.1121/1.4758770
Cullington, H. E., & Zeng, F.-G. (2008). Speech recognition with varying numbers and types of
competing talkers by normal-hearing, cochlear-implant, and implant simulation subjects.
The Journal of the Acoustical Society of America, 123(1), 450–461.
https://doi.org/10.1121/1.2805617
Dennison, S. R., Jones, H. G., Kan, A., & Litovsky, R. Y. (2022). The Impact of Synchronized
Cochlear Implant Sampling and Stimulation on Free-Field Spatial Hearing Outcomes:
Comparing the ciPDA Research Processor to Clinical Processors. Ear & Hearing, 43(4),
1262–1272. https://doi.org/10.1097/AUD.0000000000001179
Deroche, M. L. D., Lu, H.-P., Limb, C. J., Lin, Y.-S., & Chatterjee, M. (2014). Deficits in the
pitch sensitivity of cochlear-implanted children speaking English or Mandarin. Frontiers
in Neuroscience, 8, 282. https://doi.org/10.3389/fnins.2014.00282
Desloge, J. G., Rabinowitz, W. M., & Zurek, P. M. (1997). Microphone-array hearing aids with
binaural output—Part I: Fixed-processing systems. IEEE Transactions on Speech and
Audio Processing, 5(6), 529–542.
Desmond, J. M., Collins, L. M., & Throckmorton, C. S. (2014). The effects of reverberant self-
and overlap-masking on speech recognition in cochlear implant listeners. The Journal of
the Acoustical Society of America, 135(6), EL304-10. https://doi.org/10.1121/1.4879673
Dhanasingh, A., & Jolly, C. (2017). An overview of cochlear implant electrode array designs.
Hearing Research, 356, 93–103. https://doi.org/10.1016/j.heares.2017.10.005
Djourno, A., Eyries, C., & Vallancien, B. (1957). De l'excitation électrique du nerf cochléaire
chez l'homme, par induction à distance, à l'aide d'un micro-bobinage inclus à demeure
[On electrical excitation of the cochlear nerve in humans, by remote induction, using a
permanently implanted micro-coil]. C R Seances Soc Biol Fil, 151, 423–425.
do Nascimento, L. T., & Bevilacqua, M. C. (2005). Evaluation of speech perception in noise in
cochlear implanted adults. Brazilian Journal of Otorhinolaryngology, 71(4), 432–438.
https://doi.org/10.1016/S1808-8694(15)31195-2
Drennan, W. R., Gatehouse, S., & Lever, C. (2003). Perceptual segregation of competing speech
sounds: The role of spatial location. The Journal of the Acoustical Society of America,
114(4), 2178–2189. https://doi.org/10.1121/1.1609994
Drennan, W. R., Longnion, J. K., Ruffin, C., & Rubinstein, J. T. (2008). Discrimination of
Schroeder-Phase Harmonic Complexes by Normal-Hearing and Cochlear-Implant
Listeners. Journal of the Association for Research in Otolaryngology, 9(1), 138–149.
https://doi.org/10.1007/s10162-007-0107-6
Drennan, W. R., & Rubinstein, J. T. (2008). Music perception in cochlear implant users and its
relationship with psychophysical capabilities. Journal of Rehabilitation Research and
Development, 45(5), 779–789.
Dreyer, A., & Delgutte, B. (2006). Phase Locking of Auditory-Nerve Fibers to the Envelopes of
High-Frequency Sounds: Implications for Sound Localization. Journal of
Neurophysiology, 96(5), 2327–2341. https://doi.org/10.1152/jn.00326.2006
Dwyer, R. T., Chen, C., Hehrmann, P., Dwyer, N. C., & Gifford, R. H. (2021). Synchronized
Automatic Gain Control in Bilateral Cochlear Implant Recipients Yields Significant
Benefit in Static and Dynamic Listening Conditions. Trends in Hearing, 25,
23312165211014139. https://doi.org/10.1177/23312165211014139
Dynes, S. B. C., & Delgutte, B. (1992). Phase-locking of auditory-nerve discharges to sinusoidal
electric stimulation of the cochlea. Hearing Research, 58(1), 79–90.
https://doi.org/10.1016/0378-5955(92)90011-B
Eisen, M. D. (2003). Djourno, Eyries, and the First Implanted Electrical Neural Stimulator to
Restore Hearing. Otology & Neurotology, 24(3), 500–506.
https://doi.org/10.1097/00129492-200305000-00025
Erfanian Saeedi, N., Blamey, P. J., Burkitt, A. N., & Grayden, D. B. (2017). An integrated model
of pitch perception incorporating place and temporal pitch codes with application to
cochlear implant research. Hearing Research, 344, 135–147.
https://doi.org/10.1016/j.heares.2016.11.005
Eshraghi, A. A., Nazarian, R., Telischi, F. F., Rajguru, S. M., Truy, E., & Gupta, C. (2012). The
cochlear implant: Historical aspects and future prospects. Anatomical Record (Hoboken,
N.J. : 2007), 295(11), 1967–1980. https://doi.org/10.1002/ar.22580
Fallon, J. B., Shepherd, R. K., Nayagam, D. A. X., Wise, A. K., Heffer, L. F., Landry, T. G., &
Irvine, D. R. F. (2014). Effects of deafness and cochlear implant use on temporal
response characteristics in cat primary auditory cortex. Hearing Research, 315, 1–9.
https://doi.org/10.1016/j.heares.2014.06.001
Fearn, R., & Wolfe, J. (2000). Relative Importance of Rate and Place: Experiments Using Pitch
Scaling Techniques with Cochlear Implant Recipients. Annals of Otology, Rhinology &
Laryngology, 109(12_suppl), 51–53. https://doi.org/10.1177/0003489400109S1221
Fekete, D. M., Rouiller, E. M., Liberman, M. C., & Ryugo, D. K. (1984). The central projections
of intracellularly labeled auditory nerve fibers in cats. Journal of Comparative
Neurology, 229(3), 432–450. https://doi.org/10.1002/cne.902290311
Fetterman, B. L., & Domico, E. H. (2002). Speech recognition in background noise of cochlear
implant patients. Otolaryngology--Head and Neck Surgery: Official Journal of American
Academy of Otolaryngology-Head and Neck Surgery, 126(3), 257–263.
https://doi.org/10.1067/mhn.2002.123044
Finley, C. C., Holden, T. A., Holden, L. K., Whiting, B. R., Chole, R. A., Neely, G. J., Hullar, T.
E., & Skinner, M. W. (2008). Role of Electrode Placement as a Contributor to Variability
in Cochlear Implant Outcomes. Otology & Neurotology, 29(7), 920–928.
https://doi.org/10.1097/MAO.0b013e318184f492
Firszt, J. B., Holden, L. K., Reeder, R. M., & Skinner, M. W. (2009). Speech Recognition in
Cochlear Implant Recipients: Comparison of Standard HiRes and HiRes 120 Sound
Processing. Otology & Neurotology, 30(2), 146.
https://doi.org/10.1097/MAO.0b013e3181924ff8
Francart, T., Osses, A., & Wouters, J. (2015). Speech perception with F0mod, a cochlear implant
pitch coding strategy. International Journal of Audiology, 54(6), 424–432.
https://doi.org/10.3109/14992027.2014.989455
Freyman, R. L., Balakrishnan, U., & Helfer, K. S. (2001). Spatial release from informational
masking in speech recognition. The Journal of the Acoustical Society of America, 109(5),
2112–2122. https://doi.org/10.1121/1.1354984
Freyman, R. L., Helfer, K. S., McCall, D. D., & Clifton, R. K. (1999). The role of perceived
spatial separation in the unmasking of speech. The Journal of the Acoustical Society of
America, 106(6), 3578–3588. https://doi.org/10.1121/1.428211
Friesen, L. M., Shannon, R. V., Baskent, D., & Wang, X. (2001). Speech recognition in noise as
a function of the number of spectral channels: Comparison of acoustic hearing and
cochlear implants. The Journal of the Acoustical Society of America, 110(2), 1150–1163.
https://doi.org/10.1121/1.1381538
Frost, O. L. (1972). An algorithm for linearly constrained adaptive array processing. Proceedings
of the IEEE, 60(8), 926–935. https://doi.org/10.1109/PROC.1972.8817
Fu, Q. J., & Shannon, R. V. (1999a). Effects of electrode configuration and frequency allocation
on vowel recognition with the Nucleus-22 cochlear implant. Ear and Hearing, 20(4),
332–344. https://doi.org/10.1097/00003446-199908000-00006
Fu, Q. J., & Shannon, R. V. (1999b). Effects of electrode location and spacing on phoneme
recognition with the Nucleus-22 cochlear implant. Ear and Hearing, 20(4), 321–331.
https://doi.org/10.1097/00003446-199908000-00005
Fu, Q.-J., & Galvin, J. J. (2007). Perceptual Learning and Auditory Training in Cochlear Implant
Recipients. Trends in Amplification, 11(3), 193–205.
https://doi.org/10.1177/1084713807301379
Fu, Q.-J., & Nogaki, G. (2005). Noise Susceptibility of Cochlear Implant Users: The Role of
Spectral Resolution and Smearing. JARO: Journal of the Association for Research in
Otolaryngology, 6(1), 19–27. https://doi.org/10.1007/s10162-004-5024-3
Fu, Q.-J., & Shannon, R. V. (1999). Recognition of spectrally degraded and frequency-shifted
vowels in acoustic and electric hearing. The Journal of the Acoustical Society of America,
105(3), 1889–1900. https://doi.org/10.1121/1.426725
Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2004). Musical training enhances
automatic encoding of melodic contour and interval structure. Journal of Cognitive
Neuroscience, 16(6), 1010–1021. https://doi.org/10.1162/0898929041502706
Gallun, F., Diedesch, A., Kampel, S., & Jakien, K. (2013). Independent impacts of age and
hearing loss on spatial release in a complex auditory environment. Frontiers in
Neuroscience, 7. https://doi.org/10.3389/fnins.2013.00252
Galvin, J. J. I., Fu, Q.-J., & Nogaki, G. (2007). Melodic Contour Identification by Cochlear
Implant Listeners. Ear and Hearing, 28(3), 302.
https://doi.org/10.1097/01.aud.0000261689.35445.20
Gazibegovic, D., Arnold, L., Rocca, C., & Boyle, P. (2010). Evaluation of Music Perception in
Adult Users of HiRes® 120 and Previous Generations of Advanced Bionics® Sound
Coding Strategies. Cochlear Implants International, 11(sup1), 296–301.
https://doi.org/10.1179/146701010X12671177989354
Geurts, L., & Wouters, J. (2004). Better place-coding of the fundamental frequency in cochlear
implants. The Journal of the Acoustical Society of America, 115(2), 844–852.
https://doi.org/10.1121/1.1642623
Gfeller, K. (2001). Aural Rehabilitation of Music Listening for Adult Cochlear Implant
Recipients: Addressing Learner Characteristics. Music Therapy Perspectives, 19(2), 88–
95. https://doi.org/10.1093/mtp/19.2.88
Gfeller, K., Christ, A., Knutson, J. F., Witt, S., Murray, K. T., & Tyler, R. S. (2000). Musical
backgrounds, listening habits, and aesthetic enjoyment of adult cochlear implant
recipients. Journal of the American Academy of Audiology, 11(7), 390–406.
Gfeller, K., Driscoll, V., & Schwalje, A. (2019). Adult Cochlear Implant Recipients’
Perspectives on Experiences With Music in Everyday Life: A Multifaceted and Dynamic
Phenomenon. Frontiers in Neuroscience, 13.
https://doi.org/10.3389/fnins.2019.01229
Gfeller, K., Olszewski, C., Rychener, M., Sena, K., Knutson, J. F., Witt, S., & Macpherson, B.
(2005). Recognition of “real-world” musical excerpts by cochlear implant recipients and
normal-hearing adults. Ear and Hearing, 26(3), 237–250.
https://doi.org/10.1097/00003446-200506000-00001
Gfeller, K., Turner, C., Mehr, M., Woodworth, G., Fearn, R., Knutson, J. F., Witt, S., &
Stordahl, J. (2002). Recognition of familiar melodies by adult cochlear implant recipients
and normal-hearing adults. Cochlear Implants International, 3(1), 29–53.
https://doi.org/10.1002/cii.50
Gfeller, K., Turner, C., Oleson, J., Zhang, X., Gantz, B., Froman, R., & Olszewski, C. (2007).
Accuracy of Cochlear Implant Recipients on Pitch Perception, Melody Recognition, and
Speech Reception in Noise. Ear and Hearing, 28(3), 412–423.
https://doi.org/10.1097/AUD.0b013e3180479318
Gfeller, K., Witt, S., Adamek, M., Mehr, M., Rogers, J., Stordahl, J., & Ringgenberg, S. (2002).
Effects of training on timbre recognition and appraisal by postlingually deafened cochlear
implant recipients. Journal of the American Academy of Audiology, 13(3), 132–145.
Gilbers, S., Fuller, C., Gilbers, D., Broersma, M., Goudbeek, M., Free, R., & Başkent, D. (2015).
Normal-Hearing Listeners’ and Cochlear Implant Users’ Perception of Pitch Cues in
Emotional Speech. I-Perception, 6(5), 1–19. https://doi.org/10.1177/0301006615599139
Golding, N. L., & Oertel, D. (2012). Synaptic integration in dendrites: Exceptional need for
speed. The Journal of Physiology, 590(Pt 22), 5563–5569.
https://doi.org/10.1113/jphysiol.2012.229328
Goldsworthy, R. L. (2014). Two-Microphone Spatial Filtering Improves Speech Reception for
Cochlear-Implant Users in Reverberant Conditions With Multiple Noise Sources. Trends
in Hearing, 18. https://doi.org/10.1177/2331216514555489
Goldsworthy, R. L. (2015). Correlations Between Pitch and Phoneme Perception in Cochlear
Implant Users and Their Normal Hearing Peers. Journal of the Association for Research
in Otolaryngology: JARO, 16(6), 797–809. https://doi.org/10.1007/s10162-015-0541-9
Goldsworthy, R. L., & Bissmeyer, S. R. S. (in review). Cochlear implant users can effectively
combine place and timing cues for pitch perception. Ear and Hearing.
Goldsworthy, R. L., Bissmeyer, S. R. S., & Camarena, A. (2022). Advantages of Pulse Rate
Compared to Modulation Frequency for Temporal Pitch Perception in Cochlear Implant
Users. Journal of the Association for Research in Otolaryngology.
https://doi.org/10.1007/s10162-021-00828-w
Goldsworthy, R. L., Camarena, A., & Bissmeyer, S. R. S. (2021). Pitch perception is more robust
to interference and better resolved when provided by pulse rate than by modulation
frequency of cochlear implant stimulation. Hearing Research, 409, 108319.
https://doi.org/10.1016/j.heares.2021.108319
Goldsworthy, R. L., Delhorne, L. A., Braida, L. D., & Reed, C. M. (2013). Psychoacoustic and
Phoneme Identification Measures in Cochlear-Implant and Normal-Hearing Listeners.
Trends in Amplification, 17(1), 27–44. https://doi.org/10.1177/1084713813477244
Goldsworthy, R. L., Delhorne, L. A., Desloge, J. G., & Braida, L. D. (2014). Two-microphone
spatial filtering provides speech reception benefits for cochlear implant users in difficult
acoustic environments. The Journal of the Acoustical Society of America, 136(2), 867–
876. https://doi.org/10.1121/1.4887453
Goldsworthy, R. L., & Shannon, R. V. (2014). Training improves cochlear implant rate
discrimination on a psychophysical task. The Journal of the Acoustical Society of
America, 135(1), 334–341. https://doi.org/10.1121/1.4835735
Gordon, K. A., Papsin, B. C., & Harrison, R. V. (2007). Auditory brainstem activity and
development evoked by apical versus basal cochlear implant electrode stimulation in
children. Clinical Neurophysiology, 118(8), 1671–1684.
https://doi.org/10.1016/j.clinph.2007.04.030
Grasmeder, M. L., Verschuur, C. A., & Batty, V. B. (2014). Optimizing frequency-to-electrode
allocation for individual cochlear implant users. The Journal of the Acoustical Society of
America, 136(6), 3313–3324. https://doi.org/10.1121/1.4900831
Greenberg, J. E., & Zurek, P. M. (1992). Evaluation of an adaptive beamforming method for
hearing aids. The Journal of the Acoustical Society of America, 91(3), 1662–1676.
https://doi.org/10.1121/1.402446
Greenwood, D. D. (1990). A cochlear frequency‐position function for several species—29 years
later. The Journal of the Acoustical Society of America, 87(6), 2592–2605.
https://doi.org/10.1121/1.399052
Griffiths, L., & Jim, C. (1982). An alternative approach to linearly constrained adaptive
beamforming. IEEE Transactions on Antennas and Propagation, 30(1), 27–34.
https://doi.org/10.1109/TAP.1982.1142739
Habibi, A., Cahn, B. R., Damasio, A., & Damasio, H. (2016). Neural correlates of accelerated
auditory processing in children engaged in music training. Developmental Cognitive
Neuroscience, 21, 1–14. https://doi.org/10.1016/j.dcn.2016.04.003
Habibi, A., Wirantana, V., & Starr, A. (2013). Cortical Activity During Perception of Musical
Pitch: Comparing Musicians and Nonmusicians. Music Perception: An Interdisciplinary
Journal, 30(5), 463–479. https://doi.org/10.1525/mp.2013.30.5.463
Habibi, A., Wirantana, V., & Starr, A. (2014). Cortical Activity during Perception of Musical
Rhythm; Comparing Musicians and Non-musicians. Psychomusicology, 24(2), 125–135.
https://doi.org/10.1037/pmu0000046
Hainarosie, M., Zainea, V., & Hainarosie, R. (2014). The evolution of cochlear implant
technology and its clinical relevance. Journal of Medicine and Life, 7(Spec Iss 2), 1–4.
Hamacher, V., Doering, W. H., Mauer, G., Fleischmann, H., & Hennecke, J. (1997). Evaluation
of noise reduction systems for cochlear implant users in different acoustic environment.
The American Journal of Otology, 18(6 Suppl), S46-49.
Hawley, M. L., Litovsky, R. Y., & Culling, J. F. (2004). The benefit of binaural hearing in a
cocktail party: Effect of location and type of interferer. Journal of the Acoustical Society
of America, 115(2), 833–843. https://doi.org/10.1121/1.1639908
Hazrati, O., & Loizou, P. C. (2012). The combined effects of reverberation and noise on speech
intelligibility by cochlear implant listeners. International Journal of Audiology,
51(January), 437–443. https://doi.org/10.3109/14992027.2012.658972
Heinz, M. G., Colburn, H. S., & Carney, L. H. (2001). Evaluating Auditory Performance Limits:
I. One-Parameter Discrimination Using a Computational Model for the Auditory Nerve.
Neural Computation, 13(10), 2273–2316. https://doi.org/10.1162/089976601750541804
Helmholtz, H. von. (1885). On the Sensations of Tone as a Physiological Basis for the Theory of
Music. Longmans, Green.
Heng, J., Cantarero, G., Elhilali, M., & Limb, C. J. (2011). Impaired perception of temporal fine
structure and musical timbre in cochlear implant users. Hearing Research, 280(1–2),
192–200. https://doi.org/10.1016/j.heares.2011.05.017
Henshaw, H., & Ferguson, M. A. (2013). Efficacy of Individual Computer-Based Auditory
Training for People with Hearing Loss: A Systematic Review of the Evidence. PLoS
ONE, 8(5), e62836. https://doi.org/10.1371/journal.pone.0062836
Hersbach, A. A., Arora, K., Mauger, S. J., & Dawson, P. W. (2012). Combining Directional
Microphone and Single-Channel Noise Reduction Algorithms: A Clinical Evaluation in
Difficult Listening Conditions With Cochlear Implant Users. Ear and Hearing, 33(4),
e13. https://doi.org/10.1097/AUD.0b013e31824b9e21
Hill, K. G., Stange, G., & Mo, J. (1989). Temporal synchronization in the primary auditory
response in the pigeon. Hearing Research, 39(1), 63–73. https://doi.org/10.1016/0378-
5955(89)90082-8
Hirsh, I. J., Rosenblith, W. A., & Ward, W. D. (1950). The Masking of Clicks by Pure Tones and
Bands of Noise. The Journal of the Acoustical Society of America, 22(5), 631–637.
https://doi.org/10.1121/1.1906662
Hong, R. S., & Rubinstein, J. T. (2006). Conditioning pulse trains in cochlear implants: Effects
on loudness growth. Otology & Neurotology: Official Publication of the American
Otological Society, American Neurotology Society [and] European Academy of Otology
and Neurotology, 27(1), 50–56. https://doi.org/10.1097/01.mao.0000187045.73791.db
Hoppe, U., Liebscher, T., & Hornung, J. (2017). Anpassung von Cochleaimplantatsystemen
[Fitting of cochlear implant systems]. HNO, 65(7), 546–551.
https://doi.org/10.1007/s00106-016-0226-7
House, W. F. (1976). Cochlear Implants. Annals of Otology, Rhinology & Laryngology,
85(3_suppl), 3–3. https://doi.org/10.1177/00034894760850S303
Houtsma, A. J. M., & Smurzynski, J. (1990). Pitch identification and discrimination for complex
tones with many harmonics. The Journal of the Acoustical Society of America, 87(1),
304–310. https://doi.org/10.1121/1.399297
Hughes, M. L., Baudhuin, J. L., & Goehring, J. L. (2014). The relation between auditory-nerve
temporal responses and perceptual rate integration in cochlear implants. Hearing
Research, 316, 44–56. https://doi.org/10.1016/j.heares.2014.07.007
Hughes, M. L., Castioni, E. E., Goehring, J. L., & Baudhuin, J. L. (2012). Temporal response
properties of the auditory nerve: Data from human cochlear-implant recipients. Hearing
Research, 285(1), 46–57. https://doi.org/10.1016/j.heares.2012.01.010
Hughes, M. L., & Laurello, S. A. (2017). Effect of stimulus level on the temporal response
properties of the auditory nerve in cochlear implants. Hearing Research, 351, 116–129.
https://doi.org/10.1016/j.heares.2017.06.004
Hutter, E., Argstatter, H., Grapp, M., & Plinkert, P. K. (2015). Music therapy as specific and
complementary training for adults after cochlear implantation: A pilot study. Cochlear
Implants International, 16(sup3), S13–S21.
https://doi.org/10.1179/1467010015Z.000000000261
Imennov, N. S., Won, J. H., Drennan, W. R., Jameyson, E., & Rubinstein, J. T. (2013). Detection
of acoustic temporal fine structure by cochlear implant listeners: Behavioral results and
computational modeling. Hearing Research, 298, 60–72.
https://doi.org/10.1016/j.heares.2013.01.004
Irvine, D. R. F. (2018). Plasticity in the auditory system. Hearing Research, 362, 61–73.
https://doi.org/10.1016/j.heares.2017.10.011
Iyer, N., Brungart, D. S., & Simpson, B. D. (2010). Effects of target-masker contextual similarity
on the multimasker penalty in a three-talker diotic listening task. The Journal of the
Acoustical Society of America, 128(5), 2998–3010. https://doi.org/10.1121/1.3479547
Jeffress, L. A. (1948). A place theory of sound localization. Journal of Comparative and
Physiological Psychology, 41(1), 35–39. https://doi.org/10.1037/h0061495
Jesteadt, W., Schairer, K. S., & Neff, D. L. (2005). Effect of variability in level on forward
masking and on increment detection. The Journal of the Acoustical Society of America,
118(1), 325–337. https://doi.org/10.1121/1.1928709
Kaernbach, C. (1991). Simple adaptive testing with the weighted up-down method. Perception &
Psychophysics, 49(3), 227–229. https://doi.org/10.3758/BF03214307
Kaernbach, C., & Bering, C. (2001). Exploring the temporal mechanism involved in the pitch of
unresolved harmonics. The Journal of the Acoustical Society of America, 110(2), 1039–
1048. https://doi.org/10.1121/1.1381535
Karg, S. A., Lackner, C., & Hemmert, W. (2013). Temporal interaction in electrical hearing
elucidates auditory nerve dynamics in humans. Hearing Research, 299, 10–18.
https://doi.org/10.1016/j.heares.2013.01.015
Kates, J. M. (1993). Superdirective arrays for hearing aids. Proceedings of IEEE Workshop on
Applications of Signal Processing to Audio and Acoustics, 73–76.
https://doi.org/10.1109/ASPAA.1993.379993
Kates, J. M., & Weiss, M. R. (1996). A comparison of hearing-aid array processing techniques.
The Journal of the Acoustical Society of America, 99(5), 3138–3148.
https://doi.org/10.1121/1.414798
Kenway, B., Tam, Y. C., Vanat, Z., Harris, F., Gray, R., Birchall, J., Carlyon, R., & Axon, P.
(2015). Pitch Discrimination: An Independent Factor in Cochlear Implant Performance
Outcomes. Otology & Neurotology, 36(9), 1472–1479.
https://doi.org/10.1097/MAO.0000000000000845
Kidd, G., Mason, C. R., Best, V., & Marrone, N. (2010). Stimulus factors influencing spatial
release from speech-on-speech masking. The Journal of the Acoustical Society of
America, 128(4), 1965–1978. https://doi.org/10.1121/1.3478781
Kidd, G., Mason, C. R., Best, V., & Swaminathan, J. (2015). Benefits of Acoustic Beamforming
for Solving the Cocktail Party Problem. Trends in Hearing, 19(0), 1–15.
https://doi.org/10.1177/2331216515593385
Kidd, G., Mason, C. R., Swaminathan, J., Roverud, E., Clayton, K. K., & Best, V. (2016).
Determining the energetic and informational components of speech-on-speech masking.
The Journal of the Acoustical Society of America, 140(1), 132.
https://doi.org/10.1121/1.4954748
Kilgard, M. P., & Merzenich, M. M. (1998). Plasticity of temporal information processing in the
primary auditory cortex. Nature Neuroscience, 1(8), 727–731.
https://doi.org/10.1038/3729
Klasen, T. J., Doclo, S., Van den Bogaert, T., Moonen, M., & Wouters, J. (2006). Binaural
Multi-Channel Wiener Filtering for Hearing Aids: Preserving Interaural Time and Level
Differences. 2006 IEEE International Conference on Acoustics Speech and Signal
Processing Proceedings, 5, V–V. https://doi.org/10.1109/ICASSP.2006.1661233
Klasen, T. J., Van den Bogaert, T., Moonen, M., & Wouters, J. (2007). Binaural noise reduction
algorithms for hearing aids that preserve interaural time delay cues. IEEE Transactions
on Signal Processing, 55(4), 1579–1585. https://doi.org/10.1109/TSP.2006.888897
Kokkinakis, K., Azimi, B., Hu, Y., & Friedland, D. R. (2012). Single and Multiple Microphone
Noise Reduction Strategies in Cochlear Implants. Trends in Amplification, 16(2), 102–
116. https://doi.org/10.1177/1084713812456906
Kokkinakis, K., & Loizou, P. C. (2010). Multi-microphone adaptive noise reduction strategies
for coordinated stimulation in bilateral cochlear implant devices. The Journal of the
Acoustical Society of America, 127(5), 3136–3144. https://doi.org/10.1121/1.3372727
Kollmeier, B., & Koch, R. (1994). Speech enhancement based on physiological and
psychoacoustical models of modulation perception and binaural interaction. The Journal
of the Acoustical Society of America, 95(3), 1593–1602. https://doi.org/10.1121/1.408546
Kollmeier, B., Peissig, J., & Hohmann, V. (1993). Real-time multiband dynamic compression
and noise reduction for binaural hearing aids. Journal of Rehabilitation Research and
Development, 30(1), 82–94.
Kompis, M., & Dillier, N. (1994). Noise reduction for hearing aids: Combining directional
microphones with an adaptive beamformer. The Journal of the Acoustical Society of
America, 96(3), 1910–1913. https://doi.org/10.1121/1.410204
Kong, Y.-Y., & Carlyon, R. P. (2010). Temporal pitch perception at high rates in cochlear
implants. The Journal of the Acoustical Society of America, 127(5), 3114–3123.
https://doi.org/10.1121/1.3372713
Kong, Y.-Y., Cruz, R., Jones, J. A., & Zeng, F.-G. (2004). Music perception with temporal cues
in acoustic and electric hearing. Ear and Hearing, 25(2), 173–185.
https://doi.org/10.1097/01.aud.0000120365.97792.2f
Kong, Y.-Y., Deeks, J. M., Axon, P. R., & Carlyon, R. P. (2009). Limits of temporal pitch in
cochlear implants. The Journal of the Acoustical Society of America, 125(3), 1649–1657.
https://doi.org/10.1121/1.3068457
Kwon, B. J., & van den Honert, C. (2006). Dual-electrode pitch discrimination with sequential
interleaved stimulation by cochlear implant users. The Journal of the Acoustical Society
of America, 120(1), EL1–EL6. https://doi.org/10.1121/1.2208152
Landsberger, D. M., Marozeau, J., Mertens, G., & Van de Heyning, P. (2018). The relationship
between time and place coding with cochlear implants with long electrode arrays. The
Journal of the Acoustical Society of America, 144(6), EL509–EL514.
https://doi.org/10.1121/1.5081472
Landsberger, D. M., & Srinivasan, A. G. (2009). Virtual Channel Discrimination is Improved by
Current Focusing in Cochlear Implant Recipients. Hearing Research, 254(1–2), 34–41.
https://doi.org/10.1016/j.heares.2009.04.007
Landsberger, D. M., Svrakic, M., Roland, J. T. J., & Svirsky, M. (2015). The Relationship
between Insertion Angles, Default Frequency Allocations, and Spiral Ganglion Place
Pitch in Cochlear Implants. Ear and Hearing, 36(5), e207–e213.
https://doi.org/10.1097/AUD.0000000000000163
Landsberger, D. M., Vermeire, K., Claes, A., Van Rompaey, V., & Van de Heyning, P. (2016).
Qualities of Single Electrode Stimulation as a Function of Rate and Place of Stimulation
with a Cochlear Implant. Ear and Hearing, 37(3), e149–e159.
https://doi.org/10.1097/AUD.0000000000000250
Laneau, J., & Wouters, J. (2004). Multichannel Place Pitch Sensitivity in Cochlear Implant
Recipients. Journal of the Association for Research in Otolaryngology, 5(3), 285–294.
https://doi.org/10.1007/s10162-004-4049-y
Laneau, J., Wouters, J., & Moonen, M. (2004). Relative contributions of temporal and place
pitch cues to fundamental frequency discrimination in cochlear implantees. The Journal
of the Acoustical Society of America, 116(6), 3606–3619.
https://doi.org/10.1121/1.1823311
Laneau, J., Wouters, J., & Moonen, M. (2006). Improved Music Perception with Explicit Pitch
Coding in Cochlear Implants. Audiology & Neurotology, 11(1), 38–52.
Lassaletta, L., Castro, A., Bastarrica, M., Pérez-Mora, R., Herrán, B., Sanz, L., de Sarriá, M. J.,
& Gavilán, J. (2008). Changes in listening habits and quality of musical sound after
cochlear implantation. Otolaryngology–Head and Neck Surgery, 138(3), 363–367.
https://doi.org/10.1016/j.otohns.2007.11.028
Lassaletta, L., Castro, A., Bastarrica, M., Pérez-Mora, R., Madero, R., De Sarriá, J., & Gavilán,
J. (2007). Does music perception have an impact on quality of life following cochlear
implantation? Acta Oto-Laryngologica, 127(7), 682–686.
https://doi.org/10.1080/00016480601002112
Lehnhardt, E., Gnadeberg, D., Battmer, R. D., & von Wallenberg, E. (1992). Experience with the
cochlear miniature speech processor in adults and children together with a comparison of
unipolar and bipolar modes. ORL; Journal for Oto-Rhino-Laryngology and Its Related
Specialties, 54(6), 308–313. https://doi.org/10.1159/000276320
Leibold, L. J., & Buss, E. (2013). Children’s Identification of Consonants in a Speech-Shaped
Noise or a Two-Talker Masker. Journal of Speech, Language, and Hearing Research,
56(4), 1144–1155. https://doi.org/10.1044/1092-4388(2012/12-0011)
Leigh, J. R., Henshall, K. R., & McKay, C. M. (2004). Optimizing Frequency-to-Electrode
Allocation in Cochlear Implants. Journal of the American Academy of Audiology, 15(8),
574–584. https://doi.org/10.3766/jaaa.15.8.5
Levitt, H. (2001). Noise reduction in hearing aids: A review. Journal of Rehabilitation Research
and Development, 38(1), 111–121.
Liberman, M. C. (1982). The cochlear frequency map for the cat: Labeling auditory‐nerve fibers
of known characteristic frequency. The Journal of the Acoustical Society of America,
72(5), 1441–1449. https://doi.org/10.1121/1.388677
Limb, C. J., & Roy, A. T. (2014). Technological, biological, and acoustical constraints to music
perception in cochlear implant users. Hearing Research, 308, 13–26.
https://doi.org/10.1016/j.heares.2013.04.009
Limb, C. J., & Rubinstein, J. T. (2012). Current Research on Music Perception in Cochlear
Implant Users. Otolaryngologic Clinics of North America, 45(1), 129–140.
https://doi.org/10.1016/j.otc.2011.08.021
Litovsky, R. Y., Goupell, M. J., Kan, A., & Landsberger, D. M. (2017). Use of Research
Interfaces for Psychophysical Studies With Cochlear-Implant Users. Trends in Hearing,
21, 2331216517736464. https://doi.org/10.1177/2331216517736464
Little, D. F., Cheng, H. H., & Wright, B. A. (2019). Inducing musical-interval learning by
combining task practice with periods of stimulus exposure alone. Attention, Perception,
& Psychophysics, 81(1), 344–357. https://doi.org/10.3758/s13414-018-1584-x
Litvak, L. M., Delgutte, B., & Eddington, D. K. (2003). Improved temporal coding of sinusoids
in electric stimulation of the auditory nerve using desynchronizing pulse trains. The
Journal of the Acoustical Society of America, 114(4 Pt 1), 2079–2098.
https://doi.org/10.1121/1.1612493
Lockwood, M. E., Jones, D. L., Bilger, R. C., Lansing, C. R., O’Brien, W. D., Wheeler, B. C., &
Feng, A. S. (2004). Performance of time- and frequency-domain binaural beamformers
based on recorded signals from real rooms. The Journal of the Acoustical Society of
America, 115(1), 379–391. https://doi.org/10.1121/1.1624064
Loeb, G. E. (1990). Cochlear prosthetics. Annual Review of Neuroscience, 13, 357–371.
https://doi.org/10.1146/annurev.ne.13.030190.002041
Loeb, G. E., White, M. W., & Jenkins, W. M. (1983). Biophysical Considerations in Electrical
Stimulation of the Auditory Nervous System. Annals of the New York Academy of
Sciences, 405(1), 123–136. https://doi.org/10.1111/j.1749-6632.1983.tb31625.x
Loizou, P. C., Hu, Y., Litovsky, R., Yu, G., Peters, R., Lake, J., & Roland, P. (2009). Speech
recognition by bilateral cochlear implant users in a cocktail-party setting. The Journal of
the Acoustical Society of America, 125(1), 372–383. https://doi.org/10.1121/1.3036175
Loizou, P. C., Stickney, G., Mishra, L., & Assmann, P. (2003). Comparison of Speech
Processing Strategies Used in the Clarion Implant Processor. Ear and Hearing, 24(1),
12–19. https://doi.org/10.1097/01.AUD.0000052900.42380.50
Looi, V., Gfeller, K., & Driscoll, V. D. (2012). Music Appreciation and Training for Cochlear
Implant Recipients: A Review. Seminars in Hearing, 33(4), 307–334.
https://doi.org/10.1055/s-0032-1329222
Looi, V., McDermott, H., McKay, C., & Hickson, L. (2004). Pitch discrimination and melody
recognition by cochlear implant users. International Congress Series, 1273, 197–200.
https://doi.org/10.1016/j.ics.2004.08.038
Looi, V., McDermott, H., McKay, C., & Hickson, L. (2008). Music perception of cochlear
implant users compared with that of hearing aid users. Ear and Hearing, 29(3), 421–434.
https://doi.org/10.1097/AUD.0b013e31816a0d0b
Looi, V., & She, J. (2010). Music perception of cochlear implant users: A questionnaire, and its
implications for a music training program. International Journal of Audiology, 49(2),
116–128. https://doi.org/10.3109/14992020903405987
Lorens, A., Zgoda, M., Obrycka, A., & Skarżynski, H. (2010). Fine Structure Processing
improves speech perception as well as objective and subjective benefits in pediatric
MED-EL COMBI 40+ users. International Journal of Pediatric Otorhinolaryngology,
74(12), 1372–1378. https://doi.org/10.1016/j.ijporl.2010.09.005
Luo, X., Fu, Q.-J., & Galvin, J. J. (2007). Vocal Emotion Recognition by Normal-Hearing
Listeners and Cochlear Implant Users. Trends
in Amplification, 11(4), 301–315. https://doi.org/10.1177/1084713807305301
Luo, X., Masterson, M. E., & Wu, C.-C. (2014). Melodic interval perception by normal-hearing
listeners and cochlear implant users. The Journal of the Acoustical Society of America,
136(4), 1831–1844. https://doi.org/10.1121/1.4894738
Luo, X., Padilla, M., & Landsberger, D. M. (2012). Pitch contour identification with combined
place and temporal cues using cochlear implants. The Journal of the Acoustical Society of
America, 131(2), 1325–1336. https://doi.org/10.1121/1.3672708
Luo, X., Soslowsky, S., & Pulling, K. R. (2019). Interaction Between Pitch and Timbre
Perception in Normal-Hearing Listeners and Cochlear Implant Users. Journal of the
Association for Research in Otolaryngology, 20(1), 57–72.
https://doi.org/10.1007/s10162-018-00701-3
Luo, X., & Warner, B. (2020). Effect of instrument timbre on musical emotion recognition in
normal-hearing listeners and cochlear implant users. The Journal of the Acoustical
Society of America, 147(6), EL535–EL539. https://doi.org/10.1121/10.0001475
Macherey, O., & Carlyon, R. P. (2010). Temporal pitch percepts elicited by dual-channel
stimulation of a cochlear implant. The Journal of the Acoustical Society of America,
127(1), 339–349. https://doi.org/10.1121/1.3269042
Macherey, O., & Carlyon, R. P. (2014). Re-examining the upper limit of temporal pitch. The
Journal of the Acoustical Society of America, 136(6), 3186–3199.
https://doi.org/10.1121/1.4900917
Macherey, O., Deeks, J. M., & Carlyon, R. P. (2011). Extending the Limits of Place and
Temporal Pitch Perception in Cochlear Implant Users. Journal of the Association for
Research in Otolaryngology, 12(2), 233–251. https://doi.org/10.1007/s10162-010-0248-x
Mangado, N., Pons-Prats, J., Coma, M., Mistrík, P., Piella, G., Ceresa, M., & González Ballester,
M. Á. (2018). Computational Evaluation of Cochlear Implant Surgery Outcomes
Accounting for Uncertainty and Parameter Variability. Frontiers in Physiology, 9, 498.
https://doi.org/10.3389/fphys.2018.00498
Marchese-Ragona, R., Pendolino, A. L., Mudry, A., & Martini, A. (2019). The Father of the
Electrical Stimulation of the Ear. Otology & Neurotology, 40(3), 404–406.
https://doi.org/10.1097/MAO.0000000000002153
Marimuthu, V., Swanson, B. A., & Mannell, R. (2016). Cochlear Implant Rate Pitch and Melody
Perception as a Function of Place and Number of Electrodes. Trends in Hearing, 20,
2331216516643085. https://doi.org/10.1177/2331216516643085
Marrone, N., Mason, C. R., & Kidd, G. (2008a). Evaluating the Benefit of Hearing Aids in
Solving the Cocktail Party Problem. Trends in Amplification, 12(4), 300–315.
https://doi.org/10.1177/1084713808325880
Marrone, N., Mason, C. R., & Kidd, G. (2008b). Tuning in the spatial dimension: Evidence from
a masked speech identification task. The Journal of the Acoustical Society of America,
124(2), 1146–1158. https://doi.org/10.1121/1.2945710
Marrone, N., Mason, C. R., & Kidd, G. (2008c). The effects of hearing loss and age on the
benefit of spatial separation between multiple talkers in reverberant rooms. The Journal
of the Acoustical Society of America, 124(5), 3064–3075.
https://doi.org/10.1121/1.2980441
McDermott, H. J. (2004). Music Perception with Cochlear Implants: A Review. Trends in
Amplification, 8(2), 49–82. https://doi.org/10.1177/108471380400800203
McDermott, H. J., & McKay, C. M. (1994). Pitch ranking with nonsimultaneous dual‐electrode
electrical stimulation of the cochlea. The Journal of the Acoustical Society of America,
96(1), 155–162. https://doi.org/10.1121/1.410475
McDermott, H. J., & McKay, C. M. (1997). Musical pitch perception with electrical stimulation
of the cochlea. The Journal of the Acoustical Society of America, 101(3), 1622–1631.
https://doi.org/10.1121/1.418177
McDermott, J. H., Keebler, M. V., Micheyl, C., & Oxenham, A. J. (2010). Musical intervals and
relative pitch: Frequency resolution, not interval resolution, is special. The Journal of the
Acoustical Society of America, 128(4), 1943–1951. https://doi.org/10.1121/1.3478785
McKay, C. M. (2012). Forward masking as a method of measuring place specificity of neural
excitation in cochlear implants: A review of methods and interpretation. The Journal of
the Acoustical Society of America, 131(3), 2209–2224. https://doi.org/10.1121/1.3683248
McKay, C. M., McDermott, H. J., & Carlyon, R. P. (2000). Place and temporal cues in pitch
perception: Are they truly independent? Acoustics Research Letters Online, 1(1), 25–30.
https://doi.org/10.1121/1.1318742
McKay, C. M., O’Brien, A., & James, C. J. (1999). Effect of current level on electrode
discrimination in electrical stimulation. Hearing Research, 136(1), 159–164.
https://doi.org/10.1016/S0378-5955(99)00121-5
Merzenich, M. M. (1983). Coding of Sound in a Cochlear Prosthesis: Some Theoretical and
Practical Considerations. Annals of the New York Academy of Sciences, 405(1), 502–508.
https://doi.org/10.1111/j.1749-6632.1983.tb31665.x
Merzenich, M. M. (2015). Early UCSF contributions to the development of multiple-channel
cochlear implants. Hearing Research, 322, 39–46.
https://doi.org/10.1016/j.heares.2014.12.008
Micheyl, C., Delhommeau, K., Perrot, X., & Oxenham, A. J. (2006). Influence of musical and
psychoacoustical training on pitch discrimination. Hearing Research, 219(1), 36–47.
https://doi.org/10.1016/j.heares.2006.05.004
Middlebrooks, J. C., & Snyder, R. L. (2007). Auditory Prosthesis with a Penetrating Nerve
Array. Journal for the Association for Research in Otolaryngology, 8(2), 258–279.
https://doi.org/10.1007/s10162-007-0070-2
Middlebrooks, J. C., & Snyder, R. L. (2010). Selective Electrical Stimulation of the Auditory
Nerve Activates a Pathway Specialized for High Temporal Acuity. Journal of
Neuroscience, 30(5), 1937–1946. https://doi.org/10.1523/JNEUROSCI.4949-09.2010
Moberly, A. C., Bates, C., Harris, M. S., & Pisoni, D. B. (2016). The enigma of poor
performance by adults with cochlear implants. Otology and Neurotology, 37(10), 1522–
1528. https://doi.org/10.1097/MAO.0000000000001211
Moon, I. J., & Hong, S. H. (2014). What Is Temporal Fine Structure and Why Is It Important?
Korean Journal of Audiology, 18(1), 1–7. https://doi.org/10.7874/kja.2014.18.1.1
Moore, B. C. J., & Carlyon, R. P. (2005). Perception of Pitch by People with Cochlear Hearing
Loss and by Cochlear Implant Users. In Pitch (pp. 234–277). Springer, New York, NY.
https://doi.org/10.1007/0-387-28958-5_7
Moore, D. R., & Amitay, S. (2007). Auditory Training: Rules and Applications. Seminars in
Hearing, 28(2), 99–109. https://doi.org/10.1055/s-2007-973436
Moran, M., Rousset, A., & Looi, V. (2016). Music appreciation and music listening in prelingual
and postlingually deaf adult cochlear implant recipients. International Journal of
Audiology, 55(sup2), S57–S63. https://doi.org/10.3109/14992027.2016.1157630
Morris, D. J., & Pfingst, B. E. (2000). Effects of Electrode Configuration and Stimulus Level on
Rate and Level Discrimination with Cochlear Implants. Journal of the Association for
Research in Otolaryngology, 1(3), 211–223. https://doi.org/10.1007/s101620010022
Mudry, A., & Mills, M. (2013). The Early History of the Cochlear Implant: A Retrospective.
JAMA Otolaryngology–Head & Neck Surgery, 139(5), 446–453.
https://doi.org/10.1001/jamaoto.2013.293
Müllensiefen, D., Gingras, B., Musil, J., & Stewart, L. (2014). The musicality of non-musicians:
An index for assessing musical sophistication in the general population. PloS One, 9(2),
e89642. https://doi.org/10.1371/journal.pone.0089642
Müller, J., Brill, S., Hagen, R., Moeltner, A., Brockmeier, S., Stark, T., Helbig, S., Maurer, J.,
Zahnert, T., Zierhofer, C., Nopp, P., & Anderson, I. (2012). Clinical Trial Results with
the MED-EL Fine Structure Processing Coding Strategy in Experienced Cochlear
Implant Users. ORL: Journal for Oto-Rhino-Laryngology and Its Related Specialties,
74(4), 185–198. https://doi.org/10.1159/000337089
Muniak, M. A., Connelly, C. J., Suthakar, K., Milinkeviciute, G., Ayeni, F. E., & Ryugo, D. K.
(2016). Central Projections of Spiral Ganglion Neurons. In The Primary Auditory
Neurons of the Mammalian Cochlea (pp. 157–190). Springer, New York, NY.
https://doi.org/10.1007/978-1-4939-3031-9_6
Neher, T., Laugesen, S., Jensen, N. S., & Kragelund, L. (2011). Can basic auditory and cognitive
measures predict hearing-impaired listeners’ localization and spatial speech recognition
abilities? The Journal of the Acoustical Society of America, 130(3), 1542–1558.
https://doi.org/10.1121/1.3608122
Nelson, D. A., & Donaldson, G. S. (2002). Psychophysical recovery from pulse-train forward
masking in electric hearing. The Journal of the Acoustical Society of America, 112(6),
2932–2947. https://doi.org/10.1121/1.1514935
Nelson, D. A., Kreft, H. A., Anderson, E. S., & Donaldson, G. S. (2011). Spatial tuning curves
from apical, middle, and basal electrodes in cochlear implant users. The Journal of the
Acoustical Society of America, 129(6), 3916–3933. https://doi.org/10.1121/1.3583503
Nelson, D. A., Van Tasell, D. J., Schroder, A. C., Soli, S., & Levine, S. (1995). Electrode
ranking of ‘“place pitch”’ and speech recognition in electrical hearing. The Journal of the
Acoustical Society of America, 98(4), 1987–1999. https://doi.org/10.1121/1.413317
Nimmons, G. L., Kang, R. S., Drennan, W. R., Longnion, J., Ruffin, C., Worman, T., Yueh, B.,
& Rubinstein, J. T. (2008). Clinical Assessment of Music Perception in Cochlear Implant
Listeners. Otology & Neurotology, 29(2), 149–155.
https://doi.org/10.1097/mao.0b013e31812f7244
Niparko, J. K. (2009). Cochlear Implants: Principles & Practices. Lippincott Williams &
Wilkins.
Nogueira, W., Nagathil, A., & Martin, R. (2019). Making Music More Accessible for Cochlear
Implant Listeners: Recent Developments. IEEE Signal Processing Magazine, 36(1), 115–
127. https://doi.org/10.1109/MSP.2018.2874059
Nogueira, W., Schurzig, D., Büchner, A., Penninger, R. T., & Würfel, W. (2016). Validation of a
Cochlear Implant Patient-Specific Model of the Voltage Distribution in a Clinical Setting.
Frontiers in Bioengineering and Biotechnology, 4, 84.
https://doi.org/10.3389/fbioe.2016.00084
Osberger, M. J., & Fisher, L. (2000). New directions in speech processing: Patient performance
with simultaneous analog stimulation. The Annals of Otology, Rhinology & Laryngology,
109(12), 70–73.
Oxenham, A. J. (2013). Revisiting place and temporal theories of pitch. Acoustical Science and
Technology, 34(6), 388–396. https://doi.org/10.1250/ast.34.388
Oxenham, A. J., Bernstein, J. G. W., & Penagos, H. (2004). Correct tonotopic representation is
necessary for complex pitch perception. Proceedings of the National Academy of
Sciences, 101(5), 1421–1425. https://doi.org/10.1073/pnas.0306958101
Oxenham, A. J., Micheyl, C., Keebler, M. V., Loper, A., & Santurette, S. (2011). Pitch
perception beyond the traditional existence region of pitch. Proceedings of the National
Academy of Sciences of the United States of America, 108(18), 7629–7634.
https://doi.org/10.1073/pnas.1015291108
Palmer, A. R., & Russell, I. J. (1986). Phase-locking in the cochlear nerve of the guinea-pig and
its relation to the receptor potential of inner hair-cells. Hearing Research, 24(1), 1–15.
https://doi.org/10.1016/0378-5955(86)90002-X
Penninger, R. T., Chien, W. W., Jiradejvong, P., Boeke, E., Carver, C. L., & Limb, C. J. (2013).
Perception of Pure Tones and Iterated Rippled Noise for Normal Hearing and Cochlear
Implant Users. Trends in Amplification, 17(1), 45–53.
https://doi.org/10.1177/1084713813482759
Penninger, R. T., Kludt, E., Büchner, A., & Nogueira, W. (2015). Stimulating on multiple
electrodes can improve temporal pitch perception. International Journal of Audiology,
54(6), 376–383. https://doi.org/10.3109/14992027.2014.997313
Peterson, P. M. (1986). Simulating the response of multiple microphones to a single acoustic
source in a reverberant room. The Journal of the Acoustical Society of America, 80(5),
1527–1529. https://doi.org/10.1121/1.394357
Pfingst, B. E., Franck, K. H., Xu, L., Bauer, E. M., & Zwolan, T. A. (2001). Effects of Electrode
Configuration and Place of Stimulation on Speech Perception with Cochlear Prostheses.
Journal of the Association for Research in Otolaryngology, 2(2), 87–103.
https://doi.org/10.1007/s101620010065
Pfingst, B. E., Holloway, L. A., Poopat, N., Subramanya, A. R., Warren, M. F., & Zwolan, T. A.
(1994). Effects of stimulus level on nonspectral frequency discrimination by human
subjects. Hearing Research, 78(2), 197–209. https://doi.org/10.1016/0378-
5955(94)90026-4
Pfingst, B. E., Zwolan, T. A., & Holloway, L. A. (1997). Effects of stimulus configuration on
psychophysical operating levels and on speech recognition with cochlear implants.
Hearing Research, 112(1), 247–260. https://doi.org/10.1016/S0378-5955(97)00122-6
Pijl, S. (1997). Labeling of Musical Interval Size by Cochlear Implant Patients and Normally
Hearing Subjects. Ear and Hearing, 18(5), 364–372.
Pijl, S., & Schwarz, D. W. F. (1995a). Melody recognition and musical interval perception by
deaf subjects stimulated with electrical pulse trains through single cochlear implant
electrodes. The Journal of the Acoustical Society of America, 98(2), 886–895.
https://doi.org/10.1121/1.413514
Pijl, S., & Schwarz, D. W. F. (1995b). Intonation of musical intervals by musical intervals by
deaf subjects stimulated with single bipolar cochlear implant electrodes. Hearing
Research, 89(1), 203–211. https://doi.org/10.1016/0378-5955(95)00138-9
Pisoni, D. B., Kronenberger, W. G., Harris, M. S., & Moberly, A. C. (2018). Three challenges
for future research on cochlear implants. World Journal of Otorhinolaryngology - Head
and Neck Surgery, 3(4), 240–254. https://doi.org/10.1016/j.wjorl.2017.12.010
Pretorius, L. L., & Hanekom, J. J. (2008). Free field frequency discrimination abilities of
cochlear implant users. Hearing Research, 244(1–2), 77–84.
https://doi.org/10.1016/j.heares.2008.07.005
Rader, T., Döge, J., Adel, Y., Weissgerber, T., & Baumann, U. (2016). Place dependent
stimulation rates improve pitch perception in cochlear implantees with single-sided
deafness. Hearing Research, 339, 94–103. https://doi.org/10.1016/j.heares.2016.06.013
206
Ramsden, R. T. (2013). History of cochlear implantation. Cochlear Implants International,
14(sup4), 3–5. https://doi.org/10.1179/1467010013Z.000000000140
Rebscher, S. J. (2008). Considerations for design of future cochlear implant electrode arrays:
Electrode array stiffness, size,. The Journal of Rehabilitation Research and Development,
45(5), 731–748. https://doi.org/10.1682/JRRD.2007.08.0119
Reiss, L. A. J., Turner, C. W., Karsten, S. A., & Gantz, B. J. (2014). Plasticity in human pitch
perception induced by tonotopically mismatched electro-acoustic stimulation.
Neuroscience, 256, 43–52. https://doi.org/10.1016/j.neuroscience.2013.10.024
Ricketts, T. A. (2001). Directional Hearing Aids. Trends in Amplification, 5(4), 139–176.
https://doi.org/10.1177/108471380100500401
Riss, D., Arnoldner, C., Baumgartner, W.-D., Kaider, A., & Hamzavi, J.-S. (2008). A New Fine
Structure Speech Coding Strategy: Speech Perception at a Reduced Number of Channels.
29(6), 5.
Riss, D., Hamzavi, J.-S., Blineder, M., Flak, S., Baumgartner, W.-D., Kaider, A., & Arnoldner,
C. (2016). Effects of Stimulation Rate With the FS4 and HDCIS Coding Strategies in
Cochlear Implant Recipients: Otology & Neurotology, 37(7), 882–888.
https://doi.org/10.1097/MAO.0000000000001107
Riss, D., Hamzavi, J.-S., Blineder, M., Honeder, C., Ehrenreich, I., Kaider, A., Baumgartner, W.-
D., Gstoettner, W., & Arnoldner, C. (2014). FS4, FS4-p, and FSP: A 4-Month Crossover
Study of 3 Fine Structure Sound-Coding Strategies. Ear and Hearing, 35(6), e272.
https://doi.org/10.1097/AUD.0000000000000063
Rose, J. E., Brugge, J. F., Anderson, D. J., & Hind, J. E. (1967). Phase-locked response to low-
frequency tones in single auditory nerve fibers of the squirrel monkey. Journal of
Neurophysiology, 30(4), 769–793. https://doi.org/10.1152/jn.1967.30.4.769
Rosskothen-Kuhl, N., Buck, A. N., Li, K., & Schnupp, J. W. (2021). Microsecond interaural time
difference discrimination restored by cochlear implants after neonatal deafness. ELife, 10,
e59300. https://doi.org/10.7554/eLife.59300
Rosskothen-Kuhl, N., Buck, A. N., Li, K., & Schnupp, J. W. H. (2018). Microsecond Interaural
Time Difference Discrimination Restored by Cochlear Implants After Neonatal Deafness.
BioRxiv, 498105. https://doi.org/10.1101/498105
Rothpletz, A. M., Wightman, F. L., & Kistler, D. J. (2012). Informational Masking and Spatial
Hearing in Listeners with and without Unilateral Hearing Loss. Journal of Speech,
Language, and Hearing Research, 55(2), 511–531. https://doi.org/10.1044/1092-
4388(2011/10-0205)
Rubinstein, J. T., & Hong, R. (2003). Signal Coding in Cochlear Implants: Exploiting Stochastic
Effects of Electrical Stimulation. Annals of Otology, Rhinology & Laryngology,
112(9_suppl), 14–19. https://doi.org/10.1177/00034894031120S904
207
Rubinstein, J. T., Wilson, B. S., Finley, C. C., & Abbas, P. J. (1999). Pseudospontaneous
activity: Stochastic independence of auditory nerve fibers with electrical stimulation.
Hearing Research, 127(1), 108–118. https://doi.org/10.1016/S0378-5955(98)00185-3
Ryugo, D. K., & May, S. K. (1993). The projections of intracellularly labeled auditory nerve
fibers to the dorsal cochlear nucleus of cats. Journal of Comparative Neurology, 329(1),
20–35. https://doi.org/10.1002/cne.903290103
Schatzer, R., Vermeire, K., Visser, D., Krenmayr, A., Kals, M., Voormolen, M., Van de
Heyning, P., & Zierhofer, C. (2014). Electric-acoustic pitch comparisons in single-sided-
deaf cochlear implant users: Frequency-place functions and rate pitch. Hearing Research,
309, 26–35. https://doi.org/10.1016/j.heares.2013.11.003
Schindler, R. A. (1999). Personal Reflections on Cochlear Implants. Annals of Otology,
Rhinology & Laryngology, 108(4_suppl), 4–7.
https://doi.org/10.1177/00034894991080S402
Shackleton, T. M., & Carlyon, R. P. (1994). The role of resolved and unresolved harmonics in
pitch perception and frequency modulation discrimination. The Journal of the Acoustical
Society of America, 95(6), 3529–3540. https://doi.org/10.1121/1.409970
Shafiro, V., Sheft, S., Kuvadia, S., & Gygi, B. (2015). Environmental Sound Training in
Cochlear Implant Users. Journal of Speech, Language and Hearing Research (Online);
Rockville, 58(2), 509–519. http://dx.doi.org/10.1044/2015_JSLHR-H-14-0312
Shannon, R. V. (1983). Multichannel electrical stimulation of the auditory nerve in man. II.
Channel interaction. Hearing Research, 12(1), 1–16. https://doi.org/10.1016/0378-
5955(83)90115-6
Shannon, R. V. (2015). Auditory Implant Research at the House Ear Institute 1989–2013.
Hearing Research, 322, 57–66. https://doi.org/10.1016/j.heares.2014.11.003
Shannon, R. V., Adams, D. D., Ferrel, R. L., Palumbo, R. L., & Grandgenett, M. (1990). A
computer interface for psychophysical and speech research with the Nucleus cochlear
implant. The Journal of the Acoustical Society of America, 87(2), 905–907.
https://doi.org/10.1121/1.398902
Shannon, R. V., Fu, Q.-J., Galvin, J., & Friesen, L. (2004). Speech Perception with Cochlear
Implants. In Cochlear Implants: Auditory Prostheses and Electric Hearing (pp. 334–
376). Springer, New York, NY. https://doi.org/10.1007/978-0-387-22585-2_8
Shepherd, R. K., & Javel, E. (1997). Electrical stimulation of the auditory nerve. I. Correlation of
physiological responses with cochlear status. Hearing Research, 108(1), 112–144.
https://doi.org/10.1016/S0378-5955(97)00046-4
Shin, S., Ha, Y., Choi, G., Hyun, J., Kim, S., Oh, S.-H., & Min, K.-S. (2021). Manufacturable
32-Channel Cochlear Electrode Array and Preliminary Assessment of Its Feasibility for
Clinical Use. Micromachines, 12(7), Article 7. https://doi.org/10.3390/mi12070778
208
Shinn-Cunningham, B. G., Desloge, J. G., & Kopco, N. (2001). Empirical and modeled acoustic
transfer functions in a simple room: Effects of distance and direction. Proceedings of the
2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics
(Cat. No.01TH8575), 183–186. https://doi.org/10.1109/ASPAA.2001.969573
Siedenburg, K., Saitis, C., & McAdams, S. (2019). The Present, Past, and Future of Timbre
Research. In K. Siedenburg, C. Saitis, S. McAdams, A. N. Popper, & R. R. Fay (Eds.),
Timbre: Acoustics, Perception, and Cognition (pp. 1–19). Springer International
Publishing. https://doi.org/10.1007/978-3-030-14832-4_1
Smith, Z. M., Delgutte, B., & Oxenham, A. J. (2002). Chimaeric sounds reveal dichotomies in
auditory perception. Nature, 416(6876), 87–90. https://doi.org/10.1038/416087a
Soede, W., Berkhout, A. J., & Bilsen, F. A. (1993). Development of a directional hearing
instrument based on array technology. The Journal of the Acoustical Society of America,
94(2), 785–798. https://doi.org/10.1121/1.408180
Soede, W., Bilsen, F. A., & Berkhout, A. J. (1993). Assessment of a directional microphone
array for hearing-impaired listeners. The Journal of the Acoustical Society of America,
94(2 Pt 1), 799–808. https://doi.org/10.1121/1.408181
Spitzer, E. R., Galvin, J. J., Friedmann, D. R., & Landsberger, D. M. (2021). Melodic interval
perception with acoustic and electric hearing in bimodal and single-sided deaf cochlear
implant listeners. Hearing Research, 400, 108136.
https://doi.org/10.1016/j.heares.2020.108136
Spriet, A., Van Deun, L., Eftaxiadis, K., Laneau, J., Moonen, M., van Dijk, B., van Wieringen,
A., & Wouters, J. (2007). Speech understanding in background noise with the two-
microphone adaptive beamformer BEAM in the Nucleus Freedom Cochlear Implant
System. Ear and Hearing, 28(1), 62–72.
https://doi.org/10.1097/01.aud.0000252470.54246.54
Srinivasan, A. G., Shannon, R. V., & Landsberger, D. M. (2012). Improving Virtual Channel
Discrimination in a Multi-Channel Context. Hearing Research, 286(1–2), 19–29.
Stadler, R. W., & Rabinowitz, W. M. (1993). On the potential of fixed arrays for hearing aids.
The Journal of the Acoustical Society of America, 94(3), 1332–1342.
https://doi.org/10.1121/1.408161
Stahl, P., Macherey, O., Meunier, S., & Roman, S. (2016). Rate discrimination at low pulse rates
in normal-hearing and cochlear implant listeners: Influence of intracochlear stimulation
site. The Journal of the Acoustical Society of America, 139(4), 1578–1591.
https://doi.org/10.1121/1.4944564
Stohl, J. S., Throckmorton, C. S., & Collins, L. M. (2008). Assessing the pitch structure
associated with multiple rates and places for cochlear implant users. The Journal of the
Acoustical Society of America, 123(2), 1043–1053. https://doi.org/10.1121/1.2821980
209
Stupak, N., Todd, A. E., & Landsberger, D. M. (2021). Place-Pitch Interval Perception With a
Cochlear Implant. Ear and Hearing, 42(2), 301–312.
https://doi.org/10.1097/AUD.0000000000000922
Sunwoo, W., Delgutte, B., & Chung, Y. (2021). Chronic Bilateral Cochlear Implant Stimulation
Partially Restores Neural Binaural Sensitivity in Neonatally-Deaf Rabbits. Journal of
Neuroscience, 41(16), 3651–3664. https://doi.org/10.1523/JNEUROSCI.1076-20.2021
Swaminathan, J., Mason, C. R., Streeter, T. M., Best, V., Kidd, J., & Patel, A. D. (2015). Musical
training, individual differences and the cocktail party problem. Scientific Reports, 5(1),
Article 1. https://doi.org/10.1038/srep11628
Swaminathan, J., Mason, C. R., Streeter, T. M., Best, V., Roverud, E., & Kidd, G. (2016). Role
of Binaural Temporal Fine Structure and Envelope Cues in Cocktail-Party Listening. The
Journal of Neuroscience : The Official Journal of the Society for Neuroscience, 36(31),
8250–8257. https://doi.org/10.1523/JNEUROSCI.4421-15.2016
Swanson, B. A., Marimuthu, V. M. R., & Mannell, R. H. (2019). Place and Temporal Cues in
Cochlear Implant Pitch and Melody Perception. Frontiers in Neuroscience, 13(1266), 1–
18. https://doi.org/10.3389/fnins.2019.01266
Swanson, B. A., & Mauch, H. (2006). ftware user manual Nucleus Matlab Toolbox 4.20 so . Lane
Cove NSW, Australia, Cochlear Ltd.
http://scholar.google.com/citations?view_op=view_citation&hl=en&user=Yji9ZBQAAA
AJ&cstart=20&pagesize=80&sortby=pubdate&citation_for_view=Yji9ZBQAAAAJ:9Zl
FYXVOiuMC
Szurley, J., Bertrand, A., Van DIjk, B., & Moonen, M. (2016). Binaural noise cue preservation in
a binaural noise reduction system with a remote microphone signal. IEEE/ACM
Transactions on Audio Speech and Language Processing, 24(5), 952–966.
https://doi.org/10.1109/TASLP.2016.2535199
Thiemann, J., Müller, M., Marquardt, D., Doclo, S., & van de Par, S. (2016). Speech
enhancement for multimicrophone binaural hearing aids aiming to preserve the spatial
auditory scene. EURASIP Journal on Advances in Signal Processing, 2016(1), 12–12.
https://doi.org/10.1186/s13634-016-0314-6
Todd, A. E., Mertens, G., Van de Heyning, P., & Landsberger, D. M. (2017). Encoding a Melody
Using Only Temporal Information for Cochlear-Implant and Normal-Hearing Listeners.
Trends in Hearing, 21, 2331216517739745. https://doi.org/10.1177/2331216517739745
Tong, Y. C., Blamey, P. J., Dowell, R. C., & Clark, G. M. (1983). Psychophysical studies
evaluating the feasibility of a speech processing strategy for a multiple‐channel cochlear
implant. The Journal of the Acoustical Society of America, 74(1), 73–80.
https://doi.org/10.1121/1.389620
210
Tong, Y. C., & Clark, G. M. (1985). Absolute identification of electric pulse rates and electrode
positions by cochlear implant patients. The Journal of the Acoustical Society of America,
77(5), 1881–1888. https://doi.org/10.1121/1.391939
Tong, Y. C., Clark, G. M., Blamey, P. J., Busby, P. A., & Dowell, R. C. (1982). Psychophysical
studies for two multiple‐channel cochlear implant patients. The Journal of the Acoustical
Society of America, 71(1), 153–160. https://doi.org/10.1121/1.387342
Torkildsen, J. von K., Hitchins, A., Myhrum, M., & Wie, O. B. (2019). Speech-in-Noise
Perception in Children With Cochlear Implants, Hearing Aids, Developmental Language
Disorder and Typical Development: The Effects of Linguistic and Cognitive Abilities.
Frontiers in Psychology, 10, 2530. https://doi.org/10.3389/fpsyg.2019.02530
Townshend, B., Cotter, N., Van Compernolle, D., & White, R. L. (1987a). Pitch perception by
cochlear implant subjects. The Journal of the Acoustical Society of America, 82(1), 106–
115. https://doi.org/10.1121/1.395554
Townshend, B., Cotter, N., Van Compernolle, D., & White, R. L. (1987b). Pitch perception by
cochlear implant subjects. The Journal of the Acoustical Society of America, 82(1), 106–
115. https://doi.org/10.1121/1.395554
Tyler, R. S., & Moore, B. C. J. (1992). Consonant recognition by some of the better
cochlear‐implant patients. The Journal of the Acoustical Society of America, 92(6), 3068–
3077. https://doi.org/10.1121/1.404203
Tyler, R. S., Moore, B. C. J., & Kuk, F. K. (1989). Performance of Some of the Better Cochlear-
Implant Patients. Journal of Speech, Language, and Hearing Research, 32(4), 887–911.
https://doi.org/10.1044/jshr.3204.887
Tyler, R. S., Wood, E. J., & Fernandes, M. (1983). Frequency resolution and discrimination of
constant and dynamic tones in normal and hearing‐impaired listeners. The Journal of the
Acoustical Society of America, 74(4), 1190–1199. https://doi.org/10.1121/1.390043
Vaerenberg, B., Smits, C., De Ceulaer, G., Zir, E., Harman, S., Jaspers, N., Tam, Y., Dillon, M.,
Wesarg, T., Martin-Bonniot, D., Gärtner, L., Cozma, S., Kosaner, J., Prentiss, S.,
Sasidharan, P., Briaire, J. J., Bradley, J., Debruyne, J., Hollow, R., … Govaerts, P. J.
(2014). Cochlear Implant Programming: A Global Survey on the State of the Art. The
Scientific World Journal, 2014, e501738. https://doi.org/10.1155/2014/501738
Valente, M., Schuchman, G., Potts, L. G., & Beck, L. B. (2000). Performance of dual-
microphone in-the-ear hearing aids. Journal of the American Academy of Audiology,
11(4), 181–189.
Van den Bogaert, T., Doclo, S., Wouters, J., & Moonen, M. (2008). The effect of
multimicrophone noise reduction systems on sound source localization by users of
binaural hearing aids. The Journal of the Acoustical Society of America, 124(1), 484–497.
https://doi.org/10.1121/1.2931962
211
Van den Bogaert, T., Wouters, J., Doclo, S., & Moonen, M. (2007). Binaural Cue Preservation
for Hearing Aids using an Interaural Transfer Function Multichannel Wiener Filter. 2007
IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP
’07, 4, IV-565-IV–568. https://doi.org/10.1109/ICASSP.2007.366975
van den Honert, C., & Stypulkowski, P. H. (1987). Temporal response patterns of single auditory
nerve fibers elicited by periodic electrical stimuli. Hearing Research, 29(2), 207–222.
https://doi.org/10.1016/0378-5955(87)90168-7
van der Marel, K. S., Briaire, J. J., Wolterbeek, R., Snel-Bongers, J., Verbist, B. M., & Frijns, J.
H. M. (2014). Diversity in Cochlear Morphology and Its Influence on Cochlear Implant
Electrode Position. Ear and Hearing, 35(1), e9.
https://doi.org/10.1097/01.aud.0000436256.06395.63
van Hoesel, R. J., & Clark, G. M. (1995). Evaluation of a portable two-microphone adaptive
beamforming speech processor with cochlear implant patients. The Journal of the
Acoustical Society of America, 97(4), 2498–2503. https://doi.org/10.1121/1.411970
van Hoesel, R. J. M., & Clark, G. M. (1997). Psychophysical studies with two binaural cochlear
implant subjects. The Journal of the Acoustical Society of America, 102(1), 495–507.
https://doi.org/10.1121/1.419611
van Hoesel, R. J. M., & Tyler, R. S. (2003). Speech perception, localization, and lateralization
with bilateral cochlear implants. The Journal of the Acoustical Society of America,
113(3), 1617–1630. https://doi.org/10.1121/1.1539520
Van Veen, B. D., & Buckley, K. M. (1988). Beamforming: A versatile approach to spatial
filtering. IEEE ASSP Magazine, 5(2), 4–24. https://doi.org/10.1109/53.665
Vandali, A. E., Sucher, C., Tsang, D. J., McKay, C. M., Chew, J. W. D., & McDermott, H. J.
(2005). Pitch ranking ability of cochlear implant recipients: A comparison of sound-
processing strategies. The Journal of the Acoustical Society of America, 117(5), 3126–
3138. https://doi.org/10.1121/1.1874632
Vandali, A. E., & van Hoesel, R. J. M. (2011). Development of a temporal fundamental
frequency coding strategy for cochlear implants. The Journal of the Acoustical Society of
America, 129(6), 4023–4036. https://doi.org/10.1121/1.3573988
Vandali, A. E., & van Hoesel, R. J. M. (2012). Enhancement of temporal cues to pitch in
cochlear implants: Effects on pitch ranking. The Journal of the Acoustical Society of
America, 132(1), 392–402. https://doi.org/10.1121/1.4718452
Vandali, A., Sly, D., Cowan, R., & van Hoesel, R. (2015). Training of Cochlear Implant Users to
Improve Pitch Perception in the Presence of Competing Place Cues. Ear and Hearing,
36(2), e1–e13. https://doi.org/10.1097/AUD.0000000000000109
212
Vanden Berghe, J., & Wouters, J. (1998). An adaptive noise canceller for hearing aids using two
nearby microphones. The Journal of the Acoustical Society of America, 103(6), 3621–
3626. https://doi.org/10.1121/1.423066
Venail, F., Mura, T., Akkari, M., Mathiolon, C., Menjot de Champfleur, S., Piron, J. P., Sicard,
M., Sterkers-Artieres, F., Mondain, M., & Uziel, A. (2015). Modeling of Auditory
Neuron Response Thresholds with Cochlear Implants. BioMed Research International,
2015, e394687. https://doi.org/10.1155/2015/394687
Venter, P. J., & Hanekom, J. J. (2014). Is There a Fundamental 300 Hz Limit to Pulse Rate
Discrimination in Cochlear Implants? Journal of the Association for Research in
Otolaryngology, 15(5), 849–866. https://doi.org/10.1007/s10162-014-0468-6
Vermeire, K., Kleine Punte, A., & Van de Heyning, P. (2010). Better Speech Recognition in
Noise with the Fine Structure Processing Coding Strategy. ORL, 72(6), 305–311.
https://doi.org/10.1159/000319748
Verschooten, E., Shamma, S., Oxenham, A. J., Moore, B. C. J., Joris, P. X., Heinz, M. G., &
Plack, C. J. (2019). The upper frequency limit for the use of phase locking to code
temporal fine structure in humans: A compilation of viewpoints. Hearing Research, 377,
109–121. https://doi.org/10.1016/j.heares.2019.03.011
Vollmer, M., & Beitel, R. E. (2011). Behavioral training restores temporal processing in auditory
cortex of long-deaf cats. Journal of Neurophysiology, 106(5), 2423–2436.
https://doi.org/10.1152/jn.00565.2011
Volta, A. (1800). On the Electricity Excited by the Mere Contact of Conducting Substances of
Different Kinds. Abstracts of the Papers Printed in the Philosophical Transactions of the
Royal Society of London, 1, 27–29.
Von Wallenberg, E. L., Hochmair, E. S., & Hochmair-Desoyer, I. J. (1990). Initial Results with
Simultaneous Analog and Pulsatile Stimulation of the Cochlea. Acta Oto-Laryngologica,
109(sup469), 140–149. https://doi.org/10.1080/00016489.1990.12088421
Wagner, L., Altindal, R., Plontke, S. K., & Rahne, T. (2021). Pure tone discrimination with
cochlear implants and filter-band spread. Scientific Reports, 11(1), 1–8.
https://doi.org/10.1038/s41598-021-99799-4
Waltzman, S. B. (2006). Cochlear implants: Current status. Expert Review of Medical Devices,
3(5), 647–655. https://doi.org/10.1586/17434440.3.5.647
Warnecke, M., Peng, Z. E., & Litovsky, R. Y. (2020). The impact of temporal fine structure and
signal envelope on auditory motion perception. PLOS ONE, 15(8), e0238125.
https://doi.org/10.1371/journal.pone.0238125
Welker, D. P., Greenberg, J. E., Desloge, J. G., & Zurek, P. M. (1997). Microphone-array
hearing aids with binaural output. II. A two-microphone adaptive system. IEEE
213
Transactions on Speech and Audio Processing, 5(6), 543–551.
https://doi.org/10.1109/89.641299
Wendt, D., Hietkamp, R. K., & Lunner, T. (2017). Impact of Noise and Noise Reduction on
Processing Effort: A Pupillometry Study. Ear and Hearing, 38(6), 690–700.
https://doi.org/10.1097/AUD.0000000000000454
Wever, E. G., & Bray, C. W. (1937). The Perception of Low Tones and the Resonance-Volley
Theory. The Journal of Psychology, 3(1), 101–114.
https://doi.org/10.1080/00223980.1937.9917483
Wilson, B. S., & Dorman, M. F. (2008). Cochlear implants: A remarkable past and a brilliant
future. Hearing Research, 242(1–2), 3–21. https://doi.org/10.1016/j.heares.2008.06.005
Wilson, B. S., & Dorman, M. F. (2018). A Brief History of the Cochlear Implant and Related
Treatments. In Neuromodulation (Second Edition) (pp. 1197–1207). Elsevier.
https://doi.org/10.1016/B978-0-12-805353-9.00099-1
Wilson, B. S., Finley, C. C., Lawson, D. T., & Zerbi, M. (1997). Temporal representations with
cochlear implants. The American Journal of Otology, 18(6 Suppl), S30-34.
Wilson, B. S., Sun, X., Schatzer, R., & Wolford, R. D. (2004). Representation of fine structure or
fine frequency information with cochlear implants. International Congress Series, 1273,
3–6. https://doi.org/10.1016/j.ics.2004.08.018
Wilson, R. H., McArdle, R., Watts, K. L., & Smith, S. L. (2012). The Revised Speech Perception
in Noise Test (R-SPIN) in a Multiple Signal-to-Noise Ratio Paradigm. Journal of the
American Academy of Audiology, 23(08), 590–605. https://doi.org/10.3766/jaaa.23.7.9
Withers, S. J., Gibson, W. P., Greenberg, S. L., & Bray, M. (2011). Comparison of outcomes in a
case of bilateral cochlear implantation using devices manufactured by two different
implant companies (Cochlear Corporation and Med-El). Cochlear Implants International,
12(2), 124–126. https://doi.org/10.1179/146701010X12711475887315
Woods, W. S., Kalluri, S., Pentony, S., & Nooraei, N. (2013). Predicting the effect of hearing
loss and audibility on amplified speech reception in a multi-talker listening scenario. The
Journal of the Acoustical Society of America, 133(6), 4268–4278.
https://doi.org/10.1121/1.4803859
Wouters, J., Doclo, S., Koning, R., & Francart, T. (2013). Sound Processing for Better Coding of
Monaural and Binaural Cues in Auditory Prostheses. Proceedings of the IEEE, 101(9),
1986–1997. https://doi.org/10.1109/JPROC.2013.2257635
Wouters, J., Litière, L., & van Wieringen, A. (1999). Speech intelligibility in noisy environments
with one- and two-microphone hearing aids. Audiology: Official Organ of the
International Society of Audiology, 38(2), 91–98.
https://doi.org/10.3109/00206099909073008
214
Wouters, J., McDermott, H. J., & Francart, T. (2015). Sound Coding in Cochlear Implants: From
electric pulses to hearing. IEEE Signal Processing Magazine, 32(2), 67–80.
https://doi.org/10.1109/MSP.2014.2371671
Wouters, J., & Vanden Berghe, J. (2001). Speech recognition in noise for cochlear implantees
with a two-microphone monaural adaptive noise reduction system. Ear and Hearing,
22(5), 420–430. https://doi.org/10.1097/00003446-200110000-00006
Wright, B. A. (2013). Induction of auditory perceptual learning. Proceedings of the International
Symposium on Auditory and Audiological Research, 4, 1–11.
Wright, B. A., Sabin, A. T., Zhang, Y., Marrone, N., & Fitzgerald, M. B. (2010). Enhancing
Perceptual Learning by Combining Practice with Periods of Additional Sensory
Stimulation. Journal of Neuroscience, 30(38), 12868–12877.
https://doi.org/10.1523/JNEUROSCI.0487-10.2010
Würfel, W., Lanfermann, H., Lenarz, T., & Majdani, O. (2014). Cochlear length determination
using Cone Beam Computed Tomography in a clinical setting. Hearing Research, 316,
65–72. https://doi.org/10.1016/j.heares.2014.07.013
Xu, L., Luo, J., Xie, D., Chao, X., Wang, R., Zahorik, P., & Luo, X. (2021). Reverberation
Degrades Pitch Perception but Not Mandarin Tone and Vowel Recognition of Cochlear
Implant Users. Ear & Hearing, Publish Ahead of Print.
https://doi.org/10.1097/AUD.0000000000001173
Zeng, F.-G. (2002). Temporal pitch in electric hearing. Hearing Research, 174(1–2), 101–106.
https://doi.org/10.1016/S0378-5955(02)00644-5
Zeng, F.-G. (2017). Challenges in Improving Cochlear Implant Performance and Accessibility.
IEEE Transactions on Biomedical Engineering, 64(8), 1662–1664.
https://doi.org/10.1109/TBME.2017.2718939
Zeng, F.-G. (2022). Celebrating the one millionth cochlear implant. JASA Express Letters, 2(7),
077201. https://doi.org/10.1121/10.0012825
Zeng, F.-G., Rebscher, S., Harrison, W. V., Sun, X., & Feng, H. (2008). Cochlear
Implants:System Design, Integration and Evaluation. IEEE Reviews in Biomedical
Engineering, 1, 115–142. https://doi.org/10.1109/RBME.2008.2008250
Zeng, F.-G., Tang, Q., Lu, T., & Bensmaia, S. J. (2014). Abnormal Pitch Perception Produced by
Cochlear Implant Stimulation. PLoS ONE, 9(2), e88662.
https://doi.org/10.1371/journal.pone.0088662
Zhou, N. (2016). Monopolar Detection Thresholds Predict Spatial Selectivity of Neural
Excitation in Cochlear Implants: Implications for Speech Recognition. PLOS ONE,
11(10), e0165476. https://doi.org/10.1371/journal.pone.0165476
215
Zhou, N., Mathews, J., & Dong, L. (2019). Pulse-rate discrimination deficit in cochlear implant
users: Is the upper limit of pitch peripheral or central? Hearing Research, 371, 1–10.
https://doi.org/10.1016/j.heares.2018.10.018
Zhou, N., & Pfingst, B. E. (2014). Relationship between multipulse integration and speech
recognition with cochlear implants. The Journal of the Acoustical Society of America,
136(3), 1257–1268. https://doi.org/10.1121/1.4890640
Zhou, N., & Pfingst, B. E. (2016a). Evaluating multipulse integration as a neural-health correlate
in human cochlear-implant users: Relationship to forward-masking recovery. The Journal
of the Acoustical Society of America, 139(3), EL70–EL75.
https://doi.org/10.1121/1.4943783
Zhou, N., & Pfingst, B. E. (2016b). Evaluating multipulse integration as a neural-health correlate
in human cochlear-implant users: Relationship to spatial selectivity. The Journal of the
Acoustical Society of America, 140(3), 1537–1547. https://doi.org/10.1121/1.4962230
Zilany, M. S. A., Bruce, I. C., & Carney, L. H. (2014). Updated parameters and expanded
simulation options for a model of the auditory periphery. The Journal of the Acoustical
Society of America, 135(1), 283–286. https://doi.org/10.1121/1.4837815
Zwolan, T. A., Kileny, P. R., Ashbaugh, C., & Telian, S. A. (1996). Patient performance with the
Cochlear Corporation “20 + 2” implant: Bipolar versus monopolar activation. The
American Journal of Otology, 17(5), 717–723.
Zwolan, T. A., Kileny, P. R., Smith, S., Waltzman, S., Chute, P., Domico, E., Firszt, J., Hodges,
A., Mills, D., Whearty, M., Osberger, M. J., & Fisher, L. (2005). Comparison of
Continuous Interleaved Sampling and Simultaneous Analog Stimulation Speech
Processing Strategies in Newly Implanted Adults with a Clarion 1.2 Cochlear Implant.
Otology & Neurotology, 26(3), 455–465.
https://doi.org/10.1097/01.mao.0000169794.76072.16
216
Appendices
Appendix A
Chapter 2
Article and Copyright Details
The author of this manuscript has the ability to reuse this manuscript for this dissertation (https://publishing-aip-org.libproxy2.usc.edu/resources/researchers/rights-and-permissions/permissions/).

Citation in journal format: Bissmeyer, S. R. S., and Goldsworthy, R. L. (2017). "Adaptive spatial filtering improves speech reception in noise while preserving binaural cues," The Journal of the Acoustical Society of America, 142, 1441–1453. doi:10.1121/1.5002691, with the permission of AIP Publishing.
Appendix B
Chapter 3
Article and Copyright Details
Copyright allows for the republication of this article within this dissertation (https://www.frontiersin.org/guidelines/policies-and-publication-ethics/).

Citation in journal format: Bissmeyer, S. R. S., Ortiz, J. R., Gan, H., and Goldsworthy, R. L. (2022). Computer-based musical interval training program for CI users and listeners with no known hearing loss. Front. Neurosci. 16, 903924. doi: 10.3389/fnins.2022.903924.
Supplementary Materials
Supplemental Table B.1: Interval Training Levels with Semitone Spacing between Notes and Base Note Frequency Range

Level   Semitone Spacing   Base Note Frequency Range
1       2, 12              110 Hz
2       2, 12              78-156 Hz
3       2, 12              220 Hz
4       2, 12              156-311 Hz
5       2, 12              440 Hz
6       2, 12              311-622 Hz
7       2, 7               110 Hz
8       2, 7               78-156 Hz
9       2, 7               220 Hz
10      2, 7               156-311 Hz
11      2, 7               440 Hz
12      2, 7               311-622 Hz
13      7, 12              110 Hz
14      7, 12              78-156 Hz
15      7, 12              220 Hz
16      7, 12              156-311 Hz
17      7, 12              440 Hz
18      7, 12              311-622 Hz
19      4, 7, 12           110 Hz
20      4, 7, 12           78-156 Hz
21      4, 7, 12           220 Hz
22      4, 7, 12           156-311 Hz
23      4, 7, 12           440 Hz
24      4, 7, 12           311-622 Hz
25      2, 4, 7            110 Hz
26      2, 4, 7            78-156 Hz
27      2, 4, 7            220 Hz
28      2, 4, 7            156-311 Hz
29      2, 4, 7            440 Hz
30      2, 4, 7            311-622 Hz
31      1, 2, 3, 4         110 Hz
32      1, 2, 3, 4         78-156 Hz
33      1, 2, 3, 4         220 Hz
34      1, 2, 3, 4         156-311 Hz
35      1, 2, 3, 4         440 Hz
36      1, 2, 3, 4         311-622 Hz
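The note frequencies behind each level follow from standard 12-tone equal temperament, in which a note s semitones above a base frequency f0 lies at f0 * 2^(s/12). The following minimal Python sketch reproduces that relation; the function name and the worked example are illustrative only and are not taken from the training software.

def interval_notes(base_hz, semitones):
    """Return the base note plus the notes the given numbers of
    semitones above it, in Hz (12-tone equal temperament)."""
    return [base_hz * 2.0 ** (s / 12.0) for s in (0, *semitones)]

# Level 19 of the table: spacings of 4, 7, and 12 semitones over 110 Hz.
print([round(f, 1) for f in interval_notes(110.0, (4, 7, 12))])
# -> [110.0, 138.6, 164.8, 220.0]

Note that each ranged base-note entry in the table (78-156 Hz, 156-311 Hz, 311-622 Hz) spans exactly one octave.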
Supplementary Figure B.1: Image and Explanation of Website
Appendix C
Chapter 4
Article and Copyright Details
Copyright allows for the republication of this article within this dissertation (https://journals-plos-org.libproxy2.usc.edu/plosone/s/licenses-and-copyright).

Citation in journal format: Bissmeyer S.R.S., Hossain S., Goldsworthy R.L. Perceptual learning of pitch provided by CI stimulation rate. PLOS ONE. 2020;15: e0242842. doi:10.1371/journal.pone.0242842.
Appendix D
Chapter 5
Article and Copyright Details
Hearing Research permits authors to circulate their manuscripts internally within their institutions and to use them in dissertations, provided the article is clearly cited (https://www.elsevier.com/about/policies/copyright; https://www.elsevier.com/journals/hearing-research/0378-5955/guide-for-authors#txt10010).

Citation in journal format: Bissmeyer, S.R.S., Goldsworthy, R.L., 2022. Combining Place and Rate of Stimulation Improves Frequency Discrimination in CI Users. Hearing Research 424, 108583. https://doi.org/10.1016/j.heares.2022.108583.
Supplementary Materials
Supplementary Figure D.1: Example Mapping Interface
Graphical user interface to set the threshold and comfort levels for each of the eight stimulation rates, from 50 to 6400 Hz in octave intervals.
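As a quick check of the caption's arithmetic, the eight octave-spaced rates can be generated directly; this snippet is illustrative and is not part of the mapping interface itself.

# Eight stimulation rates from 50 to 6400 Hz in octave intervals.
rates_hz = [50 * 2 ** k for k in range(8)]
print(rates_hz)  # [50, 100, 200, 400, 800, 1600, 3200, 6400]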
Abstract

The main difficulties voiced by cochlear implant (CI) users are speech comprehension in noise and music perception. The goal of this thesis is to address these issues across five studies by improving frequency resolution, with implications for both front- and back-end CI processing. In these studies, a novel noise reduction algorithm was developed to improve the clarity of speech in noise, and the effects of training and frequency encoding on frequency resolution were explored in CI users. The novel binaural noise reduction algorithm performed spectral analysis of incoming sounds to improve speech reception in noise while preserving the cues necessary to localize sounds in space. The second study examined the frequency cues available to normal hearing (NH) listeners and CI users and their effect on psychophysical performance and musical interval identification; CI users performed significantly worse than NH listeners on all tasks, likely due in part to poor frequency resolution through the processor. The final three studies explored training stimulation rate as a frequency cue and tested whether stimulating with combined electrode place and stimulation rate improves frequency discrimination. While training stimulation rate produced varying results, combining electrode place and stimulation rate provided a significant improvement up to 400 Hz. Together, these studies contribute to improving CI processing (1) through a novel front-end noise reduction algorithm and (2) by identifying ways to improve frequency resolution with implications for back-end CI signal processing, and they inform future strategies for improving CI outcomes.
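To make the combined place-and-rate idea concrete, here is a minimal sketch of how a target frequency might be assigned both a place cue and a rate cue. The electrode center-frequency table, the rate cap, and the function itself are hypothetical illustrations, not the stimulation procedure used in these studies; the cap merely echoes the finding that the combined cue helped up to roughly 400 Hz.

def place_and_rate(target_hz, rate_cap_hz=400.0):
    """Assign a target frequency a place cue (nearest electrode by a
    hypothetical center-frequency table) and a rate cue (pulse rate
    tracking the target up to a cap). Illustrative sketch only."""
    centers_hz = [250, 500, 1000, 2000, 4000, 8000]  # hypothetical map
    electrode = min(range(len(centers_hz)),
                    key=lambda i: abs(centers_hz[i] - target_hz))
    rate_hz = min(target_hz, rate_cap_hz)
    return electrode, rate_hz

print(place_and_rate(220.0))  # -> (0, 220.0): apical electrode, 220 Hz rate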