Did You Get All That?
Encoding of Amplitude Modulations at the Auditory Periphery
Predicts Hearing Outcomes
by
Andres Camarena
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
NEUROSCIENCE
August 2023
Copyright 2023 Andres Camarena
Acknowledgements
I would like to thank my advisor, Dr. Raymond Goldsworthy. You are ambitious and
practical; realistic and unafraid to reinvent the wheel. I am very fortunate to have been
trained in your lab and I will always remember your influence and guidance.
I would like to thank my past and present committee members, Dr. Gerald Loeb, Dr.
Leonid Litvak, Dr. Shri Narayanan, Dr. Jason Zevin, Dr. Laurie Eisenberg, and Dr. Dani
Byrd for your advisement and support throughout my training.
Thank you to all our research participants who sacrificially listened to “beeps and bops”
to advance our understanding of hearing.
Thank you to my funding sources: the USC Caruso Department of Otolaryngology, the
Hearing and Communication Neuroscience T32 Training Program, and for Ray holding
off on buying the research-boat to instead pay for his student through the TFS4CIs R01.
Thank you to God for blessing me more than I deserve; for the family, friends, and
opportunities that surround me.
Table of Contents
Acknowledgements
List of Tables
List of Figures
Abbreviations
Abstract
Chapter 1: General Introduction
Sound Transduction to the Auditory Nerve
Representation of Pitch at the Auditory Periphery
Spectral Excitation Along the Tonotopic Axis
Periodicity in Spectral Excitation
The Auditory Nerve Response to Temporal Periodicity
General Features of Primary Afferents in the Auditory Nerve
Vector Strength as a Metric for Measuring Envelope Synchrony
Envelope Synchrony in the Auditory Nerve
Modulation and Carrier Frequency
Modulation Depth
Interactions with Sound-Pressure Level
Severe Sensorineural Hearing Loss Limits Sound Capture at the Auditory Nerve
Electrical Stimulation of the Auditory Nerve via Cochlear Implant
History of Cochlear Implants
Hardware Components of a Cochlear Implant
General Sound Processing in Cochlear Implants
General Hearing Outcomes in Cochlear Implant Users
General Limitations in Electric Stimulation
Limitations in Modern Methods for the Extraction of Amplitude Modulation Across Cochlear Implant Manufacturers
Cochlear Limited – Advanced Combination Encoder
MED-EL – Fine Structure Processing
Advanced Bionics – HiRes Fidelity 120
Conventional Methods for Improving Hearing Outcomes Focus on Spectral Resolution
Poor Access to Temporal Pitch Cues Limits Complex Listening
Peripheral Encoding of Sound as a Factor Driving Performance
Chapter 2: Pitch Resolution and Sensitivity to Amplitude Modulation Influence Consonance/Dissonance Perception
Introduction
Methods
Participants
Materials and Procedure
Calibration Procedures
Modulation Detection
Fundamental Frequency Discrimination
Consonance Identification
Pleasantness Ratings
Speech Reception in Multi-talker Background Noise
The Goldsmith Musical Sophistication Index
Data Analysis
Results
Calibration Procedures
Modulation Detection
Fundamental Frequency Discrimination
Consonance Identification
Pleasantness Ratings
Speech Reception in Multi-talker Background Noise
Correlation Analysis
Discussion
Chapter 3: Pitch Resolution and Sensitivity to Amplitude Modulation Influence Sound Source Separation
Introduction
Methods
Participants
Remote Assessment
Loudness and Sensation Levels
Amplitude Modulation Detection
Pitch Discrimination
Temporal Jitter Detection
Speech Reception in Single and Multi-talker Background Noise
Data Analysis
Results
Modulation Detection
Frequency Discrimination
Temporal Jitter Detection with Background Noise
Speech Reception in Single and Multi-talker Background Noise
Psychophysical Thresholds for Pitch Predict Stream Segregation in Cochlear Implant Users
Discussion
Chapter 4: The Fidelity of Capture at the Auditory Nerve Predicts Hearing Performance
Introduction
EXPERIMENT I: EFFECT OF MODULATION DEPTH ON MODULATION FREQUENCY DISCRIMINATION OF SAM TONES AND SAM NOISE
Methods
Overview
Participants
Remote Assessment
Loudness and Sensation Levels
Modulation Sensitivity
Pitch Resolution
Data Analysis – Descriptive and Inferential Statistics
Predictive Analytics
Results
Modulation Sensitivity
Pitch Resolution
Modulation Sensitivity Predicts Pitch Resolution for Cochlear Implant Users
Vector Strength Predicts Pitch Resolution
EXPERIMENT II: MODULATION SENSITIVITY AND PITCH RESOLUTION PROVIDED BY SINGLE-ELECTRODE STIMULATION
Methods
Overview
Participants
General Stimuli
Modulation Sensitivity
Pitch Resolution
Data Analysis – Descriptive and Inferential Statistics
Predictive Analytics
Results
Modulation Sensitivity
Pitch Resolution
Modulation Sensitivity Predicts Pitch Resolution
Vector Strength Predicts Pitch Resolution
Discussion
Chapter 5: General Discussion
Sensitivity to Peripheral Pitch Cues Contributes to Hearing Performance
Envelope Synchrony at the Auditory Nerve is Predictive of Performance
General Framework of Modulation Enhancement Strategies
Limitations in Modulation Enhancement
A Modern Implementation of Modulation Enhancement in CI Sound Processing
Concluding Remarks
References
List of Tables
Table 2.1: Participant Information
Table 2.2: Correlation Coefficients Across Procedures
Table 2.3: Correlation Coefficients comparing MSI Performance against Perceptual Measures
Table 3.1: Participant Information
Table 3.2: Correlation Coefficients Across Procedures
Table 4.1: Participant Information for Experiment I
Table 4.2: Correlation Coefficients Comparing Pitch Resolution Across Procedures for Acoustic Stimuli
Table 4.3: Participant Information for Experiment II
List of Figures
Figure 1.1: Perceptual Space of Temporal Periodicity
Figure 1.2: Properties of Modulation Transfer Function Cutoff Frequencies of Auditory Nerve Fibers
Figure 1.3: Envelope Synchrony of the Auditory Nerve as a Function of Modulation Depth
Figure 1.4: Envelope Synchrony of the Auditory Nerve as a Function of Sound-Pressure Level and Modulation Depth
Figure 1.5: From Peripheral Fluctuation Profile to Midbrain Rate Profile
Figure 1.6: History of Cochlear Implants
Figure 1.7: Sound Processing in Cochlear Implants using the CIS Framework
Figure 1.8: Speech Recognition with Cochlear Implants Over the Years
Figure 1.9: Estimated Modulation Depth Provided by Cochlear Implant Sound Processing
Figure 1.10: Estimated Modulation Depth Provided by Cochlear Implant Sound Processing Across Implant Manufacturers
Figure 1.11: Vocal Emotion Recognition as a Function of Spectral Channels and Envelope Cutoff
Figure 2.1: Calibration Procedure to Reference Sound Levels to Sensation Levels
Figure 2.2: Sensitivity to Amplitude Modulation
Figure 2.3: Pitch Resolution for Harmonic Complexes
Figure 2.4: Consonance Identification for Pairs of Rendered Piano Notes
Figure 2.5: Pleasantness Ratings for Pairs of Rendered Piano Notes
Figure 2.6: Correlations Between the Pleasantness Profile of Individuals and the Average Ratings of the Group with No Known Hearing Loss
Figure 2.7: Speech Reception in Multi-Talker Background Noise
Figure 2.8: Correlations between Perceptual Measures and Modulation Sensitivity
Figure 2.9: Correlations between Music Sophistication and Modulation Sensitivity
Figure 3.1: Sensitivity to Amplitude Modulation
Figure 3.2: Pitch Resolution for Pure Tones and Harmonic Complexes
Figure 3.3: Stream Segregation for Tonal Stimuli as a Function of Masker Pitch Distance
Figure 3.4: Stream Segregation for Speech Material as a Function of Masker Pitch Distance
Figure 3.5: Correlations between Perceptual Measures and Modulation Sensitivity
Figure 3.6: Correlations between Perceptual Measures and Pure Tone Frequency Discrimination
Figure 3.7: Correlations between Perceptual Measures and F0 Frequency Discrimination
Figure 4.1: Computational Modeling of the Auditory Nerve Response to an Amplitude Modulated Tone
Figure 4.2: Computational Modeling of the Auditory Nerve Response to Amplitude Modulated Noise
Figure 4.3: Modulation Sensitivity to Amplitude Modulated Tones and Noise
Figure 4.4: Pitch Resolution as a Function of Modulation Depth for Acoustic Stimuli
Figure 4.5: Correlations between Pitch Resolution and Modulation Sensitivity for Acoustic Stimuli
Figure 4.6: Correlations between Pitch Resolution and Modeled Vector Strength for Acoustic Stimuli
Figure 4.7: Modulation Sensitivity to Amplitude Modulated Pulse Trains Compared Alongside Results from Experiment I
Figure 4.8: Pitch Resolution as a Function of Modulation Depth for Amplitude Modulated Pulse Trains Compared Alongside Results from Experiment I
Figure 4.9: Correlations between Pitch Resolution and Modulation Sensitivity for Amplitude Modulated Pulse Trains
Figure 4.10: Correlations between Pitch Resolution and Modeled Vector Strength across all Experiments
Abbreviations
AB – Advanced Bionics Cochlear Implant Company
ACE – Advanced Combination Encoder
AN – Auditory Nerve
ANOVA – Analysis of Variance
CI – Cochlear Implant
CF – Characteristic Frequency
dB – decibel
DT – Discrimination Thresholds / Detection Thresholds
F0 – Fundamental Frequency
FDT – Frequency Discrimination Thresholds
FSP – Fine Structure Processing
HiRes – High Resolution
Hz – cycles/second
ILD – Interaural Level Difference
ITD – Interaural Timing Difference
JDT – Jitter Detection Threshold
ms – milliseconds
MSI – Goldsmith Musical Sophistication Index Self-Report Inventory
MTF – Modulation Transfer Function
NKHL – No Known Hearing Loss
pps – pulses-per-second
SNR – Signal-to-Noise Ratio
SPL – Sound Pressure Level
SPIN – Speech Perception in Noise
SRT – Speech Reception Threshold
µs – microseconds
Abstract
Cochlear implants (CIs) restore hearing in people with sensorineural hearing loss and
largely rehabilitate speech understanding without the need for visual cues. That speech
understanding can be restored through an engineered device is impressive; however, most
recipients still express dissatisfaction with their hearing outcomes. In particular, cochlear
implant users commonly report difficulties engaging with music as well as listening in noisy
environments. These tasks are shaped by a number of acoustic cues, with a common and
prominent contributor being pitch. However, pitch is poorly conveyed by CI sound processing
and leaves cochlear implant users with a limited ability to hear sharply and robustly. The
purpose of this thesis is to characterize the features of pitch that contribute to hearing
performance and to investigate methods for improving hearing outcomes in recipients of the
cochlear implant.
The first study investigates the features of pitch that contribute to the perception of
musical consonance and dissonance. In particular, we hypothesize that sensitivity to amplitude
modulation is a driving factor in the perceived pleasantness of musical harmony. The second
study follows this framework to characterize the features of pitch that facilitate stream
segregation and investigates the relationship between low-level psychophysical thresholds for
pitch with the ability to stream segregate. The third study continues in this framework and
investigates the relationship between modulation sensitivity and pitch resolution. The final
study takes a step beyond this characterization and demonstrates improved hearing outcomes
in cochlear implant users through enhancement of temporal envelope cues. Taken together,
these studies point to sensitivity to low-level pitch cues as a limiting factor in hearing
performance and encourage advancements in sound processing that preserve the spectral and
temporal cues defining pitch.
Chapter 1: General Introduction
Speech, music, and environmental noise are perceived as sound when processed by the
auditory system. External vibrations are captured by the ear and transformed from mechanical energy into a rich electrical signal conveying loudness, pitch, timbre, and the many features that describe our listening environment. The peripheral auditory system is responsible for the capture and initial processing of sound, with the encoded information further processed by upstream structures. But what if sound is poorly captured? However ideal the listening environment, poor capture of sound by the auditory periphery can limit hearing perception—even when upstream structures are intact and functioning. In the same vein,
devices meant to restore hearing can limit perception by delivering a poor signal for the
auditory system to process. This work examines the relevance of robust encoding of amplitude
modulation at the auditory periphery in hearing performance.
Sound Transduction to the Auditory Nerve
The auditory periphery is demarcated into three regions: the outer, middle, and inner
ear. The outer ear consists of the cartilaginous pinna, the auditory canal, and the tympanic membrane, and is responsible for shaping and directing sound energy towards the middle ear. The middle ear comprises the three ossicles, which carry out the mechanotransduction of sound energy to the cochlea within the inner ear. Vibrations transmitted by the ossicles to the cochlea drive pressure changes in the fluid-filled chamber, which vibrate the organ of Corti, the sensory organ housing the hair cells. Vibration of the organ of Corti triggers the mechanically gated hair cells to transduce an electrical signal to the auditory nerve, which propagates to ascending structures.
Representation of Pitch at the Auditory Periphery
Spectral Excitation Along the Tonotopic Axis
When sound energy is transferred to the cochlea, energy corresponding to different frequencies is distributed along the length of the basilar membrane in accordance with the tissue's tuning properties. In this way, the auditory nerve is tonotopically organized, with the characteristic frequency (CF) of its fibers decreasing from base to apex. The pitch of simple stimuli such as pure tones is determined by the place of excitation along this tonotopic axis, with young normal-hearing listeners able to hear frequencies between ~20 and 20,000 Hz.
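One standard quantification of this tonotopic map is the Greenwood place-to-frequency function. The minimal Python sketch below uses the published human constants; the example positions are illustrative and this sketch is not part of the dissertation's methods:

```python
import numpy as np

def greenwood_cf(x):
    """Greenwood place-to-frequency map for the human cochlea.
    x: position along the basilar membrane as a proportion of its length
       (0 = apex, 1 = base). Returns the characteristic frequency in Hz."""
    # Standard human constants: A = 165.4, a = 2.1, k = 0.88 (Greenwood 1990)
    return 165.4 * (10 ** (2.1 * x) - 0.88)

print(np.round(greenwood_cf(np.array([0.0, 0.5, 1.0]))))
# ~[20, 1710, 20677] Hz: apex to base spans the ~20-20,000 Hz hearing range
```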
Voiced speech and musical notes are harmonic signals characterized by a more complex
spectral-excitation pattern. The pitch of a harmonic signal is most saliently derived from place-
of-excitation cues associated with the fundamental frequency and lower harmonics of the
fundamental (Cariani & Delgutte 1996a; Cariani & Delgutte 1996b; Plack & Oxenham 2005;
Oxenham et al. 2011; Cedolin & Delgutte 2010; Oxenham 2012). However, pitch perception
does not degrade at high sound levels as one might expect from purely spectral modes of pitch
extraction (Cedolin & Delgutte 2005), with a “residue pitch” persisting even when the lowest
harmonics are masked or missing (Ritsma 1962; McDermott & Oxenham 2008; Oxenham et al.
2011; Wang & Walker 2012). Therefore, for at least a certain range of periodicities, the auditory system must also perform a temporal analysis of sound to support pitch perception.
Periodicity in Spectral Excitation
The bandwidth of the frequency tuning curves broadens with increasing CF. As a result,
the higher harmonics of a complex are not spectrally resolved and no longer provide place-of-excitation cues for pitch.
However, interactions between harmonic components within a single filter result in a neural
response with a periodic fluctuation in amplitude. The repetition rate of this amplitude
modulation is a temporal cue for pitch, and can be perceived upwards of 800 Hz in normal-
hearing adults (Figure 1.1; Joris et al. 2004; Viemeister 1979). The psychophysical
modulation transfer function —how modulation detection thresholds change as a function of
modulation frequency— is low-pass in shape with a 3-dB cut-off around 50 Hz, decaying at –4
dB/octave, and extending to around 2.2 kHz (Forrest & Green 1987; Viemeister 1979; Bacon &
Viemeister 1985; Viemeister & Plack 1993). However, interpretation of these psychophysical
results requires understanding of auditory nerve responses to modulated signals.
Figure 1.1: Perceptual Space of Temporal Periodicity
Amplitude modulation (AM) stimuli generate different percepts that encompass several regions of modulation and carrier frequencies. At very low fm, most strongly near 4 Hz and disappearing around 20 Hz, a sensation of fluctuation or rhythm is produced (hatched). The rate at which the temporal envelope of fluent speech varies is also typically 4 Hz (syllables/s). Fluctuation makes a smooth transition to a percept of roughness, which starts at ∼15 Hz (bottom curved line), is strongest near 70 Hz, and disappears above 300 Hz (top curved line). Harmonic complex tones produce a pitch that corresponds to a frequency close to the fundamental frequency. However, the lower harmonics can be removed without affecting the pitch, resulting in "residue pitch" if fc and fm are chosen within the shaded region. Finally, small interaural time differences (ITD) can be detected between modulated stimuli to the two ears for a region of combinations of fm and fc that overlaps with the region for residue pitch (thick line). Note that these are regions in stimulus space where modulation is perceptually relevant, but the precise relationship of these percepts to physiological response modulation is usually unclear. For reference, the small dots indicate –10 dB cutoff values for modulation transfer functions (MTFs) of auditory nerve fibers [based on further analysis of data reported by (Joris & Yin 1992)]. Delineation of psychophysical regions is based on References (Bernstein & Trahiotis 1994; Henning & Ashton 1981; Ritsma 1962; E 1968; Zwicker & Fastl 2013). The ordinate is truncated at 4 Hz. Figure description is from (Joris et al. 2004).
The Auditory Nerve Response to Temporal Periodicity
Peripheral auditory neurons can “phase-lock” to the fine-structure of a pure tone, with
measurements in cat displaying synchrony up to 4-5 kHz (Johnson 1980). Likewise, phase-
locking can occur in response to the temporal envelope of a sinusoidally amplitude-modulated
(SAM) tone. While there is no spectral energy associated with the modulation frequency, the
amplitude of the neural response can fluctuate in synchrony with the envelope of the
modulated signal. As such, the poststimulus time (PST) histogram of a fiber responding to a SAM tone will contain energy reflecting the three spectral components of the signal (the carrier frequency and the two sidebands, fc ± fm), as well as energy at the modulation frequency. While the ability to synchronize to the envelope varies across fiber type in the
auditory nerve, these peripheral processes come together to enable robust capture of
periodicity.
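To make the stimulus structure concrete: a SAM tone, [1 + m sin(2π fm t)] · sin(2π fc t), has spectral energy only at the carrier and the two sidebands, so any fm component in the neural response arises from envelope coding rather than from stimulus energy at fm. A minimal NumPy sketch, with all parameter values illustrative:

```python
import numpy as np

fs = 44100                        # sampling rate (Hz)
fc, fm, m = 4000.0, 100.0, 1.0    # carrier, modulation frequency, modulation depth
t = np.arange(0, 0.5, 1 / fs)     # 500 ms of signal

# SAM tone: the envelope fluctuates at fm, yet the spectrum contains only the
# carrier and the two sidebands (fc - fm, fc, fc + fm) -- no energy at fm itself
sam = (1 + m * np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)

spectrum = np.abs(np.fft.rfft(sam))
freqs = np.fft.rfftfreq(sam.size, 1 / fs)
print(np.sort(freqs[np.argsort(spectrum)[-3:]]))   # [3900. 4000. 4100.]
```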
General Features of Primary Afferents in the Auditory Nerve
The auditory nerve is made up of two types of afferent fibers. Type-II fibers make up only 5-10% of the primary afferents of the auditory nerve and are thought to mediate nociception in the cochlea (Flores et al. 2015). Type-I fibers constitute the vast majority (90-95%) of the auditory nerve and are responsible for general sound capture. Type-I fibers spontaneously discharge and are further divided by their spontaneous rate (SR) of firing into low-SR and high-SR groups, sometimes with an additional medium-SR group (Liberman 1978; Kim & Molnar 1979).
Low-SR fibers typically have a high dynamic range and can be sensitive to both the temporal fine structure and the temporal envelope of an acoustic signal (Horst et al. 1985). When the signal level is near threshold, low-SR fibers are capable of encoding the repetition rate of the carrier frequency, whereas at medium levels, the neural response begins to reflect the modulation of the temporal envelope (Evans & Palmer 1980; Horst et al. 1990). More than 30–40 dB above threshold, these fibers saturate and activity representing envelope modulation degrades until it resembles the response to the carrier alone. As a result, depending on the level of the amplitude modulated signal, the activity of a low-SR fiber can represent the periodicity of either the modulator or the carrier (Delgutte n.d.). High-SR fibers, which make up ~60% of the auditory nerve in cats, begin to display activity close to the hearing threshold and are easily saturated by sound level. Much like low-SR fibers, high-SR fibers have a varying capacity to synchronize to the envelope or carrier, though, as a result of being easily saturated, synchrony is oftentimes stronger to the carrier rate than to the temporal envelope. Together, low-SR and high-SR fibers are capable of robustly capturing the temporal features of sound.
Vector Strength as a Metric for Measuring Envelope Synchrony
The simplest signal characterized by a temporal envelope is a SAM tone. While SAM
tones are less complex than speech and music, their parameterization can be more discretely
controlled. As such, SAM tones have been a powerful tool for characterizing the physiological
response to AM stimulation —in particular, how well the auditory nerve synchronizes to
periodicity in the temporal envelope.
A popular metric used to quantify envelope synchrony is "vector strength", R, which takes values between 0 and 1. In neuroscience, vector strength is typically calculated from action potentials, which, being all-or-nothing events, allow it to be defined as:

$$ \mathrm{VS}_{AN} = \left| \frac{1}{N} \sum_{i=1}^{N} e^{\,j 2 \pi f t_i} \right| $$
where $N$ is the number of action potentials, $f$ is the frequency of interest (i.e., the carrier or modulation frequency), and $t_i$ is the time at which the $i$-th action potential occurred (Goldberg & Brown 1968; van Hemmen 2013). This process considers the timing of each action potential against the stimulus period of interest and measures the extent to which firing is synchronized to the peaks of the modulation cycle. An R value of 1 indicates that spikes fired perfectly in phase with the peaks of the modulation cycle. An R value of 0 indicates that action potentials were uniformly distributed across time, and were not clustered in a manner reflecting the modulation frequency of interest. While synchronization to the envelope is largely centered at the peaks of the modulation cycle, both neural response properties and stimulus parameterization can affect neural synchrony to modulation. Using the vector strength metric, synchrony to modulation can be characterized with precise control to gain insight into how the auditory nerve responds to modulated signals.
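As an illustration, the computation can be written in a few lines of Python; the function below and the simulated spike train are illustrative assumptions, not code from this dissertation:

```python
import numpy as np

def vector_strength(spike_times, freq):
    """Vector strength R of spike times relative to a frequency of interest
    (e.g., the modulation frequency). Returns 0 for uniformly distributed
    spikes and 1 for spikes perfectly phase-locked to the stimulus cycle."""
    phases = 2 * np.pi * freq * np.asarray(spike_times)
    return np.abs(np.mean(np.exp(1j * phases)))

# Example: 500 spikes near the peaks of a 100 Hz modulation cycle,
# with 0.5 ms of timing jitter
rng = np.random.default_rng(0)
fm = 100.0
spikes = rng.integers(0, 100, 500) / fm + rng.normal(0, 0.5e-3, 500)
print(vector_strength(spikes, fm))   # ~0.95: tightly phase-locked firing
```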
Envelope Synchrony in the Auditory Nerve
The primary features of an amplitude modulated stimulus are modulation frequency,
carrier frequency, modulation depth, and sound-pressure level. Their general effect on
envelope synchrony in the cat auditory nerve (Joris & Yin 1992) is discussed below.
Modulation and Carrier Frequency
Beating of harmonically related components within a cochlear filter produces amplitude modulation in the neural response at the modulation frequency. Physiological modulation transfer functions are low-pass in shape, with the corner frequency defined by the bandwidth of the cochlear filter. This stereotyped shape is maintained across fibers of different CF and SR, with the corner frequency shifting upward with increasing CF of the stimulated fiber (Figure 1.2a; Joris & Yin 1992). This increase in the corner frequency of the modulation transfer function largely reflects the increase in filter bandwidth along the tonotopic axis (Figure 1.2b; Joris & Yin 1992).
Figure 1.2: Properties of Modulation Transfer Function Cutoff Frequencies of Auditory Nerve
Fibers
MTF 3-dB cutoff frequencies as a function of tuning curve parameters for low (triangles) and high (+) SR fibers. (a) f3dB versus CF. (b) f3dB versus tuning curve bandwidth at 10 dB above threshold. Figure description is from (Joris & Yin 1992).
Modulation Depth
Increasing modulation depth of a SAM stimulus results in a mostly monotonic increase
in envelope synchrony at the auditory nerve (Figure 1.3a; Joris & Yin 1992), though the response is typically more deeply modulated than the stimulus (Figure 1.3b). Only portions of the
stimulus that reach above threshold are reflected in the firing response of the nerve fiber.
Therefore, the neural response is effectively a rectified version of the stimulus input.
Figure 1.3: Envelope Synchrony of the Auditory Nerve as a Function of Modulation Depth
Effect of increasing m in a high-frequency fiber (CF = 20.2 kHz, SR = 53 s⁻¹). fm = 100 Hz, SPL = 49 dB. (a) Comparison of period histograms and half-wave rectified AM stimuli. (b) Graphed on a common abscissa are synchrony to fm (circles) and gain (crosses). (a) and (b) are obtained from the same responses. Dotted line indicates zero gain. Fc = CF and closed symbols indicate nonsignificant R values, unless otherwise specified. Figure description is from (Joris & Yin 1992).
Interactions with Sound-Pressure Level
Envelope synchrony changes non-monotonically with increasing presentation level of the stimulus. This synchrony-level function has a stereotypical shape, with synchrony quickly reaching its maximum before sloping downwards with increasing level. The level that provides the best envelope synchrony typically lies between the fiber's threshold to a tone at its CF and the fiber's saturation point. Above this best modulation level (BML), synchrony to the envelope decreases linearly before plateauing at high levels. This stereotyped shape is preserved even when modulation depth (Figure 1.4a) or modulation frequency (Figure 1.4b; Joris & Yin 1992) is varied.
Figure 1.4: Envelope Synchrony of the Auditory Nerve as a Function of Sound-Pressure Level
and Modulation Depth
Effect of parametric change of m (a) and fm (b) on synchrony-level functions. (a) CF = 2 kHz, SR = 27 s⁻¹, fm = 50 Hz. (b) CF = 27.2 kHz, SR = 32 s⁻¹, m = 0.99. Figure description is from (Joris & Yin 1992).
Conversational speech levels are in the range of ~55-65 dB SPL (Olsen 1998), notably greater than the best modulation levels observed by Joris and Yin (1992). However, this discrepancy may reflect the methodology used to characterize the auditory nerve response to modulation—specifically, that the carrier frequency of the modulated tone was always set at the characteristic frequency of the probed fiber. Fibers with a characteristic frequency near a harmonic or vowel formant typically show poor envelope synchrony, instead synchronizing to the locally tuned carrier frequency to the extent that phase locking allows (Delgutte & Kiang 1984; Sachs et al. 2002; Kumaresan et al. 2013). In contrast, fibers tuned between spectral peaks display strong synchrony to the stimulus F0 (Carney et al. 2015; Carney 2018), with estimates of vector strength to the periodicity of F0 as high as 0.97 (Goldsworthy 2022). In this manner, envelope synchrony remains robust even for sounds at conversational levels. The fluctuating profile of envelope synchrony across the length of the auditory nerve can also be seen in the modeled response of the inferior colliculus (Figure 1.5; Carney 2018). Neurons in the inferior colliculus (IC) can display tuning both to the stimulus fine structure and to the periodic amplitude fluctuations in the auditory nerve response (Langner & Schreiner 1988; Krishna & Semple 2000; Nelson & Carney 2007; Joris et al. 2004). Notably, their tuning for modulation frequency is in the range of voiced pitch (Langner 1992), pointing to envelope periodicity as a relevant cue for upstream processing (Delgutte et al. 1998).
Figure 1.5: From Peripheral Fluctuation Profile to Midbrain Rate Profile
From peripheral fluctuation profile to midbrain rate profile. Top: The spectrum of the vowel
/æ/ (from “had,” black line) is dominated by harmonics of the fundamental (F0 = 115 Hz) and
has an overall shape that is determined by resonances of the vocal tract (Fant 1960). The
spectral envelope (dashed line) highlights spectral peaks at the formant frequency locations.
Responses of model auditory nerve fibers with CF = 500 Hz (below F1), 700 Hz (near F1), 1200
Hz (between F1 and F2), and 1800 Hz (near F2). Responses of auditory nerve fibers tuned near
formant peaks have small low-frequency fluctuations because they are dominated by a single
harmonic (synchrony capture). Fibers tuned away from spectral peaks have responses that
fluctuate strongly at F0, in addition to phase-locking to harmonics near the characteristic
frequency. Bottom: responses of a simple IC model consisting of a bandpass filter centered at
100 Hz with a bandwidth of 100 Hz (i.e., Q = 1). Model IC neurons have large differences in rate
due to the differences in fluctuation amplitudes in each channel. The fluctuation amplitude
profile across auditory nerve frequency channels is thus converted into a rate profile across IC
neurons with bandpass modulation transfer functions (MTFs). IC neurons phase-lock to low-
frequency fluctuations (review: (Rees & Langner 2005)). Vowel waveform is from the
(Hillenbrand et al. 1995) database. Figure description is from (Carney 2018).
The response properties of the auditory nerve demonstrate a robust capacity for
envelope phase-locking in a range relevant to our natural acoustic environment (Joris et al.
2004). However, the extent to which envelope cues are preserved in the stimulation pattern conveyed by cochlear implant sound processing is unclear.
Severe Sensorineural Hearing Loss Limits Sound Capture at the Auditory Nerve
Hearing loss can occur even when the mechanotransduction of sound energy to the
cochlea is intact. This kind of peripheral hearing loss is categorized as sensorineural and can be
the result of damage caused by degenerative processes in aging, noise exposure, or ototoxic
drugs. The prevalence of disabling sensorineural hearing loss has risen since it was first reported, with global estimates growing from around 1% of the world's population in 1985 to approximately 6.1% in 2018 (Geneva: World Health Organization 2018). Amplification of sound via hearing aid is a common treatment for mild to moderate sensorineural hearing loss; however, persons with severe impairment may not benefit if their hair cells cannot be stimulated by mechanical energy. In these instances, electrical stimulation of the auditory nerve can be considered.
Electrical Stimulation of the Auditory Nerve via Cochlear Implant
Cochlear implants are a surgical intervention to restore hearing in those with severe
sensorineural hearing loss. These devices bypass the transducer structures to provide direct
stimulation to the auditory nerve, largely improving hearing and health-utility outcomes (Wyatt
et al. 1995; Saunders et al. 2016).
History of Cochlear Implants
The earliest and most prominent indicator that electrical stimulation could evoke an
auditory percept came from Alessandro Volta in 1790, in which he boldly stimulated his ears
with conductive rods driven by a 50 V battery stack. The “boom” that followed was a convincing
demonstration of electrically-driven hearing as the experiment was never repeated. But even if
the nerve could be driven to evoke a sound sensation, how the cochlea responds to different
sounds was not fully understood. By the 1900’s, cochlear anatomy as well as its tonotopicity
were more greatly understood (Corti 1851; Kolmer 1909; von Békésy 1928). In particular,
Wever and Bray noted that electric potentials in the cochlea followed the waveform of sound,
suggesting that stimuli could be designed to elicit specific percepts (Wever & Bray 1930).
In 1957, Djourno and Eyriès implanted the first device for electrical stimulation of the
auditory nerve (Djourno & Eyries 1957). Remarkably, the patient could detect sounds generated
by the single-electrode device and could coarsely discriminate between frequencies up to 1000
Hz, though they could not understand speech. This demonstration of electrically-evoked
hearing sparked the development of increasingly intricate devices (Figure 1.6; Wilson & Dorman 2008). As of December 2019, there were approximately 736,000 registered devices
implanted worldwide (NIDCD 2021) —up from 324,200 in 2012 (NIDCD 2021).
Figure 1.6: History of Cochlear Implants
Early history of cochlear implants. Developers and places of origin are shown, along with a
timeline for the various efforts. Initial stages of development are depicted with the light lines,
and clinical applications of devices are depicted with the heavy lines. Most of these devices are
no longer in use, and many of the development efforts have been discontinued. Present devices
and efforts are described in the text. (This figure is adapted from a historical model
conceptualized by Donald K. Eddington, Ph.D., of the Massachusetts Eye & Ear Infirmary, and is
used here with his permission. The figure also appeared in Niparko and Wilson, 2000, and is
reprised here with the permission of Lippincott Williams & Wilkins.) Figure description is from
(Dorman & Wilson 2004).
Hardware Components of a Cochlear Implant
The major components that make up a modern cochlear implant are (1) a microphone;
(2) a speech processor; (3) a transcutaneous transmitter that conveys power and stimulus instructions;
(4) an implanted receiver/stimulator; (5) a multi-wire cable; and (6) reference and intracochlear
electrodes (Wilson 2004).
General Sound Processing in Cochlear Implants
Continuous Interleaved Sampling (CIS) is a simple and widely used framework underlying many modern sound processing strategies (Figure 1.7; Zeng et al. 2008). Sound is captured
by the external microphone on the behind-the-ear processor which, depending on the system,
operates in a frequency range of 70-8500 Hz (Koch et al. 2004; Hochmair et al. 2006; Balkany et
al. 2007). A low-pass, pre-emphasis filter is applied with a corner frequency near 1 kHz and
rolloff around -6 dB/octave, with the signal then split by a bank of sixth-order Butterworth bandpass filters, one for each stimulation channel offered by the device (Loizou 1998).
envelopes of the filtered signals are extracted either through full-wave rectification or a Hilbert
transformation of the signal, followed by low-pass filtering with a cutoff between 125-400 Hz
(Zeng et al. 2008; Wouters et al. 2015). Patient-specific maps are used to shape the boundaries
of a nonlinear compressor, which adjusts the signal to the individual's electrical dynamic range.
In so doing, many devices can operate with an input dynamic range of ~75-80 dB —matching
the range of amplitude variations in speech and environmental sounds (Zeng et al. 2002; James
et al. 2003). The level-adjusted modulation envelope of each stimulation channel is then
applied to a fixed-rate, biphasic carrier and delivered on its corresponding stimulation channel. Notably, stimulation in CIS-related strategies is delivered in an interleaved manner.
While interleaved stimulation of a fixed-rate carrier removes the temporal fine structure of the
original signal, the approach curbs the effect of current interactions between electrodes which
commonly result in unexpected changes in loudness and pitch perception.
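The filterbank and envelope-extraction stages described above can be emulated compactly. The sketch below is a simplified illustration, not any manufacturer's implementation; the channel edges, filter orders, and 300 Hz envelope cutoff are assumed example values within the ranges cited above:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def cis_envelopes(signal, fs, edges, env_cutoff=300.0):
    """CIS-style front end: bandpass the signal into channels, then extract
    each channel's envelope by full-wave rectification and low-pass filtering.
    (Compression to the electrical dynamic range would follow this stage.)"""
    lp = butter(2, env_cutoff, btype='low', fs=fs, output='sos')
    envelopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        bp = butter(3, [lo, hi], btype='bandpass', fs=fs, output='sos')
        band = sosfilt(bp, signal)                   # one stimulation channel
        envelopes.append(sosfilt(lp, np.abs(band)))  # rectify, then smooth
    return np.array(envelopes)

# Example: 8 logarithmically spaced channels spanning 200-8000 Hz
fs = 22050
edges = np.geomspace(200, 8000, 9)
noise = np.random.default_rng(1).standard_normal(fs)   # 1 s of noise input
print(cis_envelopes(noise, fs, edges).shape)           # (8, 22050)
```

In a full CIS chain, each of these envelopes would then be compressed and used to modulate a fixed-rate biphasic pulse train on its electrode.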
Figure 1.7: Sound Processing in Cochlear Implants using the CIS Framework
Block diagram and signal processing in the continuous-interleaved-sampling (CIS) strategy.
Figure description is from (Zeng et al. 2008).
General Hearing Outcomes in Cochlear Implant Users
Modern devices have expanded on the CIS strategy, each with their own unique method
to preserve the spectral and temporal features of sound. Despite these efforts, speech
comprehension plateaus to similar levels across devices and sound processing strategies (Figure 1.8; Zeng et al. 2008), with cochlear implant users often having poorer outcomes compared to their normal-hearing peers (Dorman et al. 1991; Goldsworthy et al. 2013; Goldsworthy 2015; Barda et al. 2018). Outside the speech domain, CI users face additional deficits in music appreciation and sound source separation, as well as coarser pitch resolution (Limb & Roy 2014;
Galvin et al. 2009; Donnelly et al. 2009; Chatterjee & Oberzut 2011). These deficits in
perception are caused in part by the degeneration of the auditory nerve; however, the poor
capture of sound is exacerbated by limitations in the electrical stimulus.
Figure 1.8: Speech Recognition with Cochlear Implants Over the Years
Sentence recognition scores with a quiet background as a function of time for the 3M/House
single-electrode device (first column), the Cochlear Nucleus device (filled bars), the Advanced
Bionics Clarion device (open bars), and the Med-El device (shaded bars). Previous results before
2004 were summarized in (Zeng 2004). The latest results were obtained in the following
references: Nucleus Freedom (Balkany et al. 2007), Clarion HiRes system (Koch et al. 2004), and
Med-El Opus device (Arnoldner et al. 2007). Figure description is from (Zeng et al. 2008).
General Limitations in Electric Stimulation
Cochlear implants attempt to convey the spectral and temporal characteristics of sound,
but face limitations on both fronts. Place-of-excitation cues are limited by the number of
electrodes along the length of the auditory nerve. While the healthy auditory nerve contains
around 30,000 fibers, cochlear implants use an array of no more than 22 electrodes —further
degraded by the spread of electrical current— and are unable to provide the resolution needed
to resolve harmonic components (Wouters et al. 2015; Limb & Roy 2014; Dorman & Wilson
2004).
This hardware limitation can be partially circumvented by an adjustment in the
bandpass filterbank. In contrast to the cochlear filters seen in the healthy auditory nerve,
cochlear implants often incorporate narrow filters with minimal overlap. The rationale for this
design is to improve the spectral resolution currently limited by the small number of electrodes,
but doing so limits the number of harmonic components that interact within a filter to provide
deep modulation in the temporal envelope. As a result, periodicity in CI stimulation is often
degraded —even at frequencies designed for envelope extraction— with the remaining
modulation at times asynchronous across stimulation channels (Figure 1.9, Ref. (Milczynski et
al. 2009)). Increasing the upper bound for envelope extraction would extend the range of
periodicities captured by CIs, but would not address the degraded depth in modulation. A direct
and simple solution would be to increase the overlap between adjacent filters, improving
envelope cues at the cost of spectral resolution. The compromise made in commercial devices
leaves much to be desired, with CI users having limited access to the basic cues required for
pitch perception and complex listening. Manufacturers for cochlear implants approach this
compromise differently, each with unique methods for preserving the spectral and envelope
cues of the acoustic signal. With respect to the capture and representation of amplitude
modulation cues, the processing blocks with the most direct relation are the frequency filter
bank and the envelope extractor—with filter structure, parameterization, and additional
processing differing between implant manufacturers.
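The effect of filter bandwidth on envelope depth can be illustrated directly: when a bandpass filter passes a single harmonic of a complex, the extracted envelope is nearly flat, whereas a filter wide enough to pass neighboring harmonics yields deep beating at F0. The sketch below is illustrative only and is not the analysis behind Figure 1.9; all parameter values are assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

fs = 22050
f0 = 150.0
t = np.arange(0, 0.5, 1 / fs)
# Equal-amplitude harmonic complex up to ~5 kHz
tone = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 34))

def envelope_depth(signal, center, bw):
    """Modulation depth, (max - min)/(max + min), of the Hilbert envelope
    after bandpass filtering around `center` with bandwidth `bw` (Hz)."""
    sos = butter(3, [center - bw / 2, center + bw / 2],
                 btype='bandpass', fs=fs, output='sos')
    env = np.abs(hilbert(sosfilt(sos, signal)))
    env = env[len(env) // 4:]                 # discard the filter transient
    return (env.max() - env.min()) / (env.max() + env.min())

print(envelope_depth(tone, 2100, 100))   # ~0: one harmonic, flat envelope
print(envelope_depth(tone, 2100, 600))   # close to 1: harmonics beat at F0
```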
Figure 1.9: Estimated Modulation Depth Provided by Cochlear Implant Sound Processing
Modulation depth calculated from temporal information available in each electrodogram to
harmonic complexes with F0 ranging from 104 to 330 Hz. Each complex stimulus contained 16
harmonics, meaning that the bandwidth of the stimuli ranged from 1500 (lowest F0 stimulus) to
almost 5000 Hz (highest F0 stimulus). The stimulus waveforms were all preprocessed in the
Nucleus MATLAB Toolbox (NMT) developed by Cochlear Corp and delivered directly to the
subject’s implant. For ACE the calculations were performed with [denoted by ACE (ip)] and
without phase [dashed line denoted by ACE (oop)] information. Figure and description adapted
from (Milczynski et al. 2009).
Limitations in Modern Methods for the Extraction of Amplitude Modulation Across
Cochlear Implant Manufacturers
Cochlear Limited, MED-EL, and Advanced Bionics are three leading manufacturers of
cochlear implant systems. Each company has developed its own unique signal processing
strategies and filter bank design to provide the best possible hearing experience for users.
While the exact methods differ, the practical goal shared by these companies is the
preservation of the spectral and temporal features of sound while minimizing the processing
time, power consumption, and distortions of the processing scheme. However, technical
limitations for the extraction of amplitude modulations persist (Figure 1.10, Ref. (Goldsworthy
& Bissmeyer (in press))).
Goldsworthy and Bissmeyer (in press) quantified the extent that amplitude modulations
in the range of voiced and musical pitch were conveyed by the electrical stimulation patterns of devices from three major cochlear implant manufacturers: Cochlear Limited, MED-EL, and Advanced Bionics. While
filter specifications of the sound processing emulation can affect exact values of the encoding
of amplitude modulations, the general trend observed when comparing simulations across
input fundamental frequencies was that envelope synchrony tended to be highest for MED-EL
devices, followed by Cochlear, and then Advanced Bionics. The authors observed that the depth
of modulation was deeper and more coherent across channels for MED-EL devices compared to
that provided by the Cochlear and Advanced Bionics systems. Regarding the poorer envelope
synchrony observed in the emulation of Advanced Bionics’ sound processing, the authors posit
that synchrony is dampened by the narrow spectral filtering used to support the system’s high
number of spectral channels—even narrower than that used by Cochlear's Advanced
Combination Encoder (ACE) and MED-EL’s Fine Structure Processing (FS4-p). The general
approach and limitations of the filtering used by each of these manufacturers are discussed
below.
Figure 1.10: Estimated Modulation Depth Provided by Cochlear Implant Sound Processing
Across Implant Manufacturers
Modulation depth calculated from temporal information available in the electrical stimulation
pattern provided by various cochlear manufacturers. Stimuli were high-pass filtered harmonic
complexes with F0 ranging from 55 to 880 Hz. Figure adapted from (Goldsworthy & Bissmeyer
(in press)).
Cochlear Limited – Advanced Combination Encoder
The Advanced Combination Encoder (ACE) strategy developed for Cochlear’s Nucleus
systems uses a bandpass filter bank consisting of 22 Finite Impulse Response (FIR) filters—each
filter corresponds to an intracochlear stimulating electrode and together cover the range of
human speech frequencies (McDermott et al. 1992). The FIR filters implemented in the ACE
strategy are designed to provide high spectral resolution with minimal phase distortion, but
require the use of a sliding-window analysis—2-4 ms frames in which a Fast Fourier Transform
is performed (Zeng et al. 2008). The bandwidth of each filter is determined by the number of
filter coefficients used, with Cochlear systems typically employing narrow, sixth-order
Butterworth filters (Wilson 2004). To reduce redundancy across stimulation channels and
improve spectral resolution, adjacent filters minimally overlap such that spectral information is
represented more discretely. The rationale for this design is to improve the spectral resolution
currently limited by the small number of electrodes; however, doing so limits the number of
harmonic components that interact within a filter to provide deep modulation in the bandpass
filtered signal—extracted via Hilbert transformation for frequencies up to the analysis rate of
the device. Moreover, the length of the temporal frame required to perform the FFT can act as
a low-pass filter, limiting envelope extraction. The temporal frame determines the time
resolution of the FFT analysis—a longer temporal frame allows for a more accurate
representation of the frequency content of sound, but also results in poorer capture of
amplitude modulation (Nogueira et al. 2009). As a result, extraction of modulation frequency
within the speech range may be limited by the output of the implemented filter bank as well as
the time-course of the spectral analysis. Further deficits to modulation encoding may occur
through the channel selection process employed by the ACE processing strategy. In contrast to
CIS, the ACE strategy selects a subset of the most significant bands (typically 8 to 12) based on
the channel energy at each processing cycle (Wouters et al. 2015). However, amplitude
modulations are typically poor at the location of spectral peaks, and more prominent at
channels with lower spectral energy (Carney 2018). Therefore, the selection of only the most
prominent spectral channels may further limit modulation encoding in the ACE processing
scheme.
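As a rough illustration of the frame-based analysis and maxima selection just described, the sketch below runs a single analysis cycle of an ACE-like “n-of-m” strategy in Python. The frame length, the reduction of FFT bins to 22 channels, and all function names are simplifications for illustration, not Cochlear’s actual implementation.

```python
import numpy as np

def ace_like_frame(block, n_select=8):
    """One analysis frame of an ACE-like 'n-of-m' strategy.

    `block` is a short audio frame (here 64 samples, ~4 ms at 16 kHz).
    A Hann-windowed FFT estimates the spectrum; a real processor would
    group FFT bins into 22 channel bands per the clinical map, which is
    reduced here to taking the first 22 bins. Only the `n_select`
    highest-energy channels are kept for stimulation.
    """
    windowed = block * np.hanning(len(block))
    spectrum = np.abs(np.fft.rfft(windowed))
    channel_energy = spectrum[1:23]               # toy stand-in for 22 bands
    keep = np.argsort(channel_energy)[-n_select:]
    stimulation = np.zeros_like(channel_energy)
    stimulation[keep] = channel_energy[keep]
    return stimulation

# Example: one 4 ms frame of a 1 kHz tone at a 16 kHz sampling rate.
fs = 16000
t = np.arange(64) / fs
print(ace_like_frame(np.sin(2 * np.pi * 1000 * t)))
```

Because each frame integrates several milliseconds of signal, envelope fluctuations faster than roughly the frame rate are smoothed away, which is the low-pass effect on envelope extraction noted above.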
MED-EL – Fine Structure Processing
MED-EL uses the Fine Structure Processing (FS4-p) coding strategy, which builds upon
the CIS strategy to provide additional temporal fine structure information (Riss et al. 2014). FS4-
p employs a filter bank with 12 bandpass filters, each corresponding to an electrode in the
implant, with a bell-shaped frequency response for each stimulation channel. For all but the
most apical channels, envelope extraction is performed via half-wave rectification of the
bandpass-filtered output. In contrast to the FIR filters used by Cochlear, the infinite impulse
response (IIR) filters used by MED-EL do not have explicit time constraints to perform a spectral
analysis. As such, amplitude modulations with an IIR filter are not constrained by the time-
course of the spectral analysis. However, the non-linear phase response of the IIR filter can
introduce phase distortions in the filtered signal and result in a poor representation of slow
amplitude modulations.
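A minimal sketch of this CIS-style channel envelope extraction, assuming illustrative Butterworth orders and cutoffs rather than MED-EL’s published parameters:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def halfwave_envelope(x, fs, band):
    """IIR bandpass, half-wave rectification, then a smoothing low-pass.

    The bandpass stands in for one channel filter; the 400 Hz smoothing
    cutoff is an arbitrary example, not a manufacturer value.
    """
    sos_bp = butter(2, band, btype="bandpass", fs=fs, output="sos")
    channel = sosfilt(sos_bp, x)
    rectified = np.maximum(channel, 0.0)       # half-wave rectification
    sos_lp = butter(2, 400, btype="lowpass", fs=fs, output="sos")
    return sosfilt(sos_lp, rectified)

# Example: two components beating at 110 Hz within one band.
fs = 16000
t = np.arange(0, 0.1, 1 / fs)
x = np.sin(2 * np.pi * 990 * t) + np.sin(2 * np.pi * 1100 * t)
env = halfwave_envelope(x, fs, band=(900, 1200))  # fluctuates at ~110 Hz
```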
Additional temporal cues are provided through the processing done on the most apical
channels. The FS4-p strategy incorporates temporal fine structure information by providing
additional stimulation pulses at the zero-crossings of the input signal in the low-frequency
bands. Specifically, the four most apical channels in which the upper frequency limit is set lower
than 950 Hz are stimulated in parallel to provide up to two fine structure channels (Riss et al.
2014). On these channels, stimulation bursts are delivered in response to positive zero-
crossings in the bandpass-filtered waveform, with the shape of the modulation envelope
adjusted to approximate the half-wave rectified output that occurs in the basal channels
(Wouters et al. 2015). While the parallel stimulation used in FS4-p reduces the number of fine
structure channels compared to previous iterations of the processing scheme, the temporal
precision is improved such that nearly all zero crossings in the fine structure channels lead to
synchronous stimulation for frequencies up to 950 Hz (Riss et al. 2014). Therefore, while the
most basal channels are limited by phase distortions and the parameterization of the filter
bank, the fine structure channels are capable of strongly delivering amplitude modulation cues
that extend even beyond voiced fundamental frequencies.
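The zero-crossing rule that triggers these fine-structure bursts can be sketched as follows; burst shaping and the parallel-channel bookkeeping of FS4-p are omitted, and the sampling rate is an arbitrary example.

```python
import numpy as np

def positive_zero_crossings(channel, fs):
    """Times (in seconds) of positive-going zero crossings in a
    bandpass-filtered channel waveform, i.e., the events that would
    trigger fine-structure stimulation bursts."""
    channel = np.asarray(channel)
    crossing = (channel[:-1] <= 0) & (channel[1:] > 0)
    return np.nonzero(crossing)[0] / fs

# A 220 Hz apical channel yields a burst roughly every 4.5 ms.
fs = 16000
t = np.arange(0, 0.05, 1 / fs)
print(positive_zero_crossings(np.sin(2 * np.pi * 220 * t), fs))
```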
Advanced Bionics – HiRes Fidelity 120
The filter bank used in Advanced Bionics’ HiRes Fidelity 120 sound processing strategy is much denser than those used by Cochlear and MED-EL, with 120 channels that span
the entire range of human hearing (Nogueira et al. 2009). While the number of stimulation
channels typically matches the number of intracochlear electrodes, the Advanced Bionics
system creates more stimulation sites by steering current between physical electrodes.
However, the creation and use of additional stimulation sites requires an even finer spectral
analysis. This analysis is performed by an FIR filter bank followed by an FFT sliding window
analysis, with the Hilbert envelope computed from the FFT bins for modulations up to 2800 Hz
(Koch et al. 2004). However, while the system is capable of detecting very high periodicities, the
extent to which amplitude modulations are present in the bandpass-filtered waveform may be
affected by the compromise between spectral and temporal resolution.
Conventional Methods for Improving Hearing Outcomes Focus on Spectral Resolution
Conventional methods for improving hearing outcomes in CI users focus on
improvements to spectral resolution and do not typically co-employ improvements to envelope
cues. By varying the current between electrode pairs, a point of maximal excitation can be
created in a space between physical electrodes. Notably, this virtual channel elicits a tonotopic
pitch distinct from and intermediate to the stimulating electrodes (Donaldson et al. 2005; Firszt
et al. 2007; Koch et al. 2007). Further sharpening of the virtual channel can be provided by
taking advantage of current interactions between channels such that charge destructively
interferes at all locations aside from the intended stimulation channel (van den Honert & Kelsall
2007). It was reasonable to predict that virtual channels could improve hearing outcomes
through increased spectral resolution. However, algorithms for current steering and sharpening
have only had modest benefits (van den Honert & Kelsall 2007; Berenstein et al. 2008; Nogueira
et al. 2009; Srinivasan et al. 2013). In fact, speech comprehension and pitch perception in CI
users plateau around 6–10 spectral channels (Fishman et al. 1997; Fu et al. 1998; Faulkner et al. 2001; Friesen et al. 2001), calling into question the benefit such strategies can offer.
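The dual-electrode weighting behind such virtual channels can be sketched as below; manufacturer-specific current units, loudness balancing, and compliance limits are not modeled.

```python
def steered_currents(total_current, alpha):
    """Split current across a pair of adjacent electrodes.

    As the steering coefficient alpha moves from 0 to 1, the locus of
    excitation shifts from one electrode of the pair to the other,
    producing intermediate 'virtual channels'.
    """
    assert 0.0 <= alpha <= 1.0
    return (1.0 - alpha) * total_current, alpha * total_current

# Eight steering steps per pair across the 15 adjacent pairs of a
# 16-electrode array yield the 120 sites of HiRes Fidelity 120.
for alpha in [step / 8 for step in range(8)]:
    print(alpha, steered_currents(1.0, alpha))
```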
There is an additional tradeoff between frequency resolution and temporal resolution
when using current steering strategies. The increase in stimulation sites requires a finer spectral
analysis, often performed by a Fast Fourier Transform (FFT), with the spectral content of sound
estimated by a sliding window analysis (Nogueira et al. 2009). However, this process can act as
a low-pass filter limiting envelope extraction. Moreover, level adjustments in the electrode pair
of a virtual channel can introduce temporal fluctuations unrelated to the acoustic signal, further
smearing periodicity cues. As a result, benefits from increased spectral resolution via current
steering and sharpening may be offset by reductions in temporal envelope cues (Drennan et al.
2010; de Jong et al. 2017). Consideration must be made for the contribution of periodicity pitch
in complex listening tasks.
Poor Access to Temporal Pitch Cues Limits Complex Listening
A range of acoustic cues are often used during complex listening, though the
contribution of each of these cues is dependent on the task. For instance, while there are a
number of covarying cues that contribute to vocal emotion, the pitch of F0 is particularly
important (Williams & Stevens 1972; Murray & Arnott 1993; Banse & Scherer 1996; Yildirim et
al. 2004). However, CI users have reduced access to the spectral and temporal cues conveying
pitch and often have poorer recognition of vocal emotion (House 1994; Pereira 2000; Schorr et
al. 2004; Luo et al. 2007). While this deficit may exist in part due to degeneration of the
auditory pathway, deficits in complex listening are oftentimes mirrored in normal hearing
adults when listening through simulations of CI sound processing (Qin & Oxenham 2003; Qin &
Oxenham 2005; Stickney et al. 2007; Cullington & Zeng 2008). In these instances, poor access to
the relevant acoustic cues is the limiting factor in perception—even though the machinery for the encoding and processing of information is functional. Therefore, complex listening depends not just on the relevant cues being present, but on their fidelity as they reach the
neural sensor.
For instance, Luo et al. (2007) investigated the contribution of spectral and temporal
pitch cues in vocal emotion recognition with stimuli varying in spectral resolution and in the
upper frequency cutoff of the provided temporal envelope—normal-hearing adults listened to
a simulation of CI sound processing and CI users listened through a modified speech processor.
For both normal hearing and CI users, performance improved with additional spectral cues,
though performance quickly plateaued in CI users. Increasing the envelope filter cutoff
frequency from 50 Hz into the upper range of periodicity pitch had an immediate and sizable
benefit in performance in both groups. Notably, CI performance with the 1-channel/400-Hz
envelope filter processor produced nearly the same mean performance as the 8-channel/50-Hz
envelope filter processor.
Figure 1.11: Vocal Emotion Recognition as a Function of Spectral Channels and Envelope Cutoff
Mean vocal emotion recognition scores for A) normal-hearing and B) cochlear implant subjects
listening to amplitude-normalized speech as a function of the number of channels. The open
downward triangles show data with the low-cutoff temporal envelope filter, and the filled
upward triangles show data with the high-cutoff temporal envelope filter. The filled circle
shows mean performance when listening to amplitude-normalized speech. The error bars
represent 1 SD. The dashed horizontal line indicates chance performance level (i.e., 20% correct).
Figure and description adapted from (Luo et al. 2007).
Despite the degraded temporal pitch cues offered in CI sound processing, these data
suggest that envelope cues are still accessible to CI users and that they contribute to complex
listening. Moreover, these data suggest that access to temporal envelope cues may be able to
compensate for reduced spectral resolution and point to enhancement of temporal envelope
cues in electric stimulation as an avenue for hearing restoration.
Peripheral Encoding of Sound as a Factor Driving Performance
Meaningful improvements in hearing outcomes require effort in two areas: 1)
knowledge of the cues relevant to a listening task and 2) robust delivery of those cues in the
electric stimulus. The performed work demonstrates this process across a range of listening
tasks: consonance/dissonance perception, stream segregation, and modulation frequency
discrimination. The three studies were designed to characterize similarities and differences
between cochlear implant users and listeners with normal hearing, and to characterize relationships
between low-level psychophysical sensitivities to modulation and pitch with performance
during listening tasks. The final study takes a step beyond this characterization and
demonstrates improved hearing outcomes in CI users through enhancement of temporal
envelope cues. Taken together, these data point to sensitivity to low-level pitch cues as a
limiting factor in hearing performance and encourage advancements in sound processing that
preserve the spectral and temporal cues defining pitch.
Chapter 2: Pitch Resolution and Sensitivity to Amplitude Modulation
Influence Consonance/Dissonance Perception
The work described in this chapter was published in Brain Sciences:
Camarena, A., Manchala, G., Papadopoulos, J., O’Connell, S. R., & Goldsworthy, R. L. (2021).
Pleasantness ratings of musical dyads in cochlear implant users. Brain Sciences, 12(1), 33.
Authors: Andres Camarena¹, Grace Manchala¹, Julianne Papadopoulos¹,², Samantha R. O’Connell¹, and Raymond Goldsworthy¹*
¹ Auditory Research Center, Caruso Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
² Thornton School of Music, University of Southern California, Los Angeles, CA 90089, USA
* Author to whom correspondence should be addressed.
Introduction
Music is a powerful tool used to express and elicit emotion. It can be a deeply personal
source of enjoyment. Hearing loss, however, can dampen or distort incoming sound, and thus
greatly reduce music appreciation. Cochlear implants (CIs) restore hearing in people with
sensorineural hearing loss and largely rehabilitate speech understanding without the need for
visual cues. Despite having generally high levels of speech comprehension, music appreciation
in CI users is largely diminished compared to their normal-hearing peers (Gfeller et al. 2007;
Cullington & Zeng 2008). In particular, CI users struggle with facets of musical listening including
pitch perception, instrument identification, and melody recognition (Gfeller & Lansing 1991;
Gfeller et al. 2006; Gfeller et al. 2008). This dampened enjoyment of music has come to be
expected by both clinicians and prospective CI recipients and can be a major factor in
determining whether a person goes forward with implantation (Eisenberg 1982). Therefore,
further investigation is warranted to determine the aspects of hearing that significantly impact
how music is appreciated in the hard-of-hearing community.
The perceptual deficits that CI users face are largely caused by technological limits of the
implanted electrodes. While the healthy auditory nerve contains around 30,000 fibers, cochlear
implants use an array of no more than 22 electrodes. This limited number of electrodes along
with the spread of electrical current reduces the resolution needed to resolve harmonics (Limb
& Roy 2014; Wouters et al. 2015; Dorman & Wilson 2004). Likewise, temporal cues for pitch are
dampened by how sound processing for cochlear implants converts sound into electrical
stimulation. In healthy hearing, when multiple harmonics interact within the cochlea of the
inner ear, they produce a temporal beating at the fundamental frequency. This beating
produces deep modulations in the auditory nerve response, which is a clear temporal cue for
pitch perception. In contrast to the cochlear filters of healthy hearing, cochlear implants use
narrow filters with minimal overlap (Barda et al. 2018; Loizou 1997). Discrete filters are used to
allocate narrow frequency bands to the limited number of implanted electrodes. The
consequence, however, is that fewer harmonics interact within a filter, which results in shallow
modulation of the neural response and weakens the timing cues relevant to the perception of pitch.
Poor representation of place and timing cues for pitch and timbre in CIs has a marked
effect on music appreciation for CI users (Gfeller & Lansing 1991; Gfeller et al. 2006; Gfeller et
al. 2008; Looi & She 2010; Gfeller et al. 2012). Pitch is an important aspect of music listening—it
is the sensation of hearing a single sound from a complete range of sounds and is a building
block to create musical melodies. The pitch of a harmonic signal is most saliently derived from
place-of-excitation cues associated with the fundamental frequency and lower harmonics of the
fundamental (Cariani & Delgutte 1996a; Cariani & Delgutte 1996b; Plack & Oxenham 2005;
Oxenham et al. 2011; Cedolin & Delgutte 2010; Oxenham 2012). For higher harmonics, cochlear
filters become increasingly broad, and harmonic components no longer provide discernable
place cues for pitch. However, interactions between harmonic components within a single filter
result in a neural response with a periodic fluctuation in amplitude. The repetition rate of this
amplitude modulation is a temporal cue for pitch that can be perceived at least up to around
500 Hz in normal-hearing adults but is often lower in those with hearing impairment
(Shackleton & Carlyon 1994; Carlyon & Shackleton 1994; Carlyon & Deeks 2002; Venter &
Hanekom 2014; Zeng 2002; McKay et al. 2000; Carlyon et al. 2010).
Timbre is also a vital component to enjoying music. It is described by the American
Standards Association (1960) as “that attribute of sensation in terms of which a listener can
judge that two sounds having the same loudness and pitch are dissimilar”, and is often
associated with the character or brightness of a sound. While timbre is often defined by what it
is not (e.g., that it is not loudness or pitch), it can be clearly described by several acoustic
features. For instance, by extending the attack time of a trumpet, it becomes qualitatively like a
violin; yet the two instruments can still be distinguished by their unique spectral content even if
they were playing the same note. The timbre of a harmonic signal comprises three acoustic features that drive timbre perception: the temporal envelope, the spectral
envelope, and spectral flux (Rossing 1989; McAdams et al. 1995; Kong et al. 2011; Patil et al.
2012). The independence between pitch and timbre was demonstrated in a psychophysical
experiment by Plomp and Steeneken (1971) where they concluded that timbre has a perceptual
correlate of spectral excitation along the basilar membrane (Plomp & Steeneken 1971).
Together, pitch and timbre are both needed to provide the scaffolding that makes the
perception of voiced speech and musical notes an enjoyable experience.
Oftentimes, songs are not constructed out of single notes but composed of chords:
musical structures formed by the simultaneous presentation of two or more harmonic sounds.
Music’s perceived pleasantness is driven by how well the sources harmonize. The degree of
harmony, however, varies with the interval distance between the combined harmonics. A single
harmonic series will have overtones spaced at integer multiples of the fundamental. When
played simultaneously with another harmonic series, the components of each series may fuse
in pleasant consonance. In contrast, the overtones produced at more awkward intervals may
sound harsh and dissonant. For example, complexes with a fundamental ratio of 1:1, 1:2, or 2:3
tend to sound consonant, whereas ratios of 8:9, 8:15, or 32:45 are often considered dissonant
(Dowling & Harwood 1986; Deutsch 2007; Bidelman & Krishnan 2009; Tramo et al. 2001;
McDermott et al. 2010). When described in musical notation, unison, octave, and perfect fifth
are considered consonant, while major second, major seventh, and tritone are dissonant. In the
present study, pleasantness ratings of two-note chords referred to as “dyads” are examined in
detail.
One factor that may impact music enjoyment for CI users may be their perception of
consonance and dissonance. While CI users can rank or rate stimuli based on dissonance,
performance is often poorer or less pronounced than in normal hearing listeners (Knobloch et
al. 2018; Crew et al. 2015; Crew et al. 2016). While their absolute ratings were often lower than
those of normal-hearing listeners, Spitzer and colleagues (2008) demonstrated that elements of the profile of pleasantness ratings across intervals were shared for CI users and normal-hearing
listeners (Spitzer et al. 2008). This consistency across groups, however, was mostly in the
perception of dissonance with CI users only displaying a mild sensitivity to consonance at an
octave interval. Analysis of the modeled output of the CI suggests that spectral cues did not
contribute strongly to ratings of harmonic intervals but were instead likely driven by temporal
envelope cues. Therefore, further study is required to characterize the psychophysical cues that
lend themselves to the perceived pleasantness of musical harmony.
Studies also demonstrate that previous levels of musical experience, including active
music listening and engagement, can influence performance on musical tasks. For example,
LoPresto’s (2015) work on consonance and dissonance demonstrates that normal-hearing,
musically trained participants were more likely than non-musically trained participants to
indicate that they disliked the sound of dissonant intervals in comparison to consonant intervals
(LoPresto 2015). Music training studies with adult CI users also provide evidence that attentive
music listening and engagement can lead to improved performance on frequency change
detection and speech in noise identification (Looi et al. 2012). We, therefore, predict that
higher levels of musical experience among both normal hearing listeners and CI users will be
positively correlated with music and speech perception.
The purpose of this study was to characterize the pleasantness profile of CI users across
harmonic intervals and to determine the aspects of modulation and pitch perception that
influence the perception of consonance and dissonance. Furthermore, this study also
investigates how levels of musical sophistication influence CI users’ performance in these tasks.
Pleasantness ratings were obtained from participants with no known hearing loss and from CI
users for harmonic intervals spanning an octave. We hypothesized that sensitivity to temporal
pitch cues is a driving factor in pleasantness ratings. Specifically, we predicted that CI users with
pleasantness profiles that are most like normal-hearing listeners and those with higher levels of
musical sophistication would be those most sensitive to amplitude modulations.
Methods
Participants
Nine total CI users (R = 36–83 years old, M = 65.5 years, SD = 13.7 years, females = 5)
and eight individuals with no known hearing loss (R = 25–49 years old, M = 32.8 years, SD = 9.6
years, females = 3) took part in this experiment. Seven CI participants used Cochlear
Corporation implants (Cochlear Americas, Lone Tree, CO, USA), one used an Advanced Bionics
implant (Sonova, Los Angeles, CA, USA), and one used a Med-El implant (Med-El, Innsbruck,
Austria). Complete CI participant information is provided in Table 2.1. Participants gave
informed consent and were paid $15/hour for their participation. The experimental protocol
was approved by the University of Southern California Institutional Review Board.
Table 2.1: Participant Information
Subject information. Age at time of testing and age at onset of hearing loss are given in years.
Duration of profound hearing loss prior to implantation is given in years and estimated from
subject interviews.
Materials and Procedure
People with no known hearing loss and CI users took part in an online listening
experiment designed to characterize pleasantness ratings of musical dyads—pairs of musical
notes presented simultaneously. All testing was done through TeamHearing: a free web-based
software platform developed by our lab at USC for Aural Rehabilitation and Assessment
(www.teamhearing.org, accessed on 20 December 2021). The TeamHearing web application
includes a range of speech and pitch perception tests created to measure various aspects of
hearing including musical judgements, psychophysical discriminations, and speech reception in
various environments and noise conditions. The specific TeamHearing measures used for this
study are described in the following sections.
TeamHearing assessments were accessed on a personal computer, personal tablet, or a
mobile device. People with no known hearing loss used headphones to complete the task. CI
users were asked to complete the task in a method that was most comfortable for them, either
by listening to the task through speakers or receiving sound input directly to their processor via
Bluetooth or through a Mini Microphone device (Cochlear Americas, Lone Tree, CO, USA).
Calibration of sound levels was conducted using loudness adjustments and detection
thresholds for pure tones. Participants completed five assessments: modulation detection,
fundamental frequency discrimination, consonance identification, pleasantness ratings for
musical dyads, and speech reception thresholds on a sentence completion task in multi-talker
background noise. Total testing time was two to three hours. A permalink for this experiment
can be found at https://www.teamhearing.org/82, accessed on 20 December 2021.
Calibration Procedures
Before completing the assessments, participants completed procedures to characterize
relative loudness levels. First, participants adjusted a 1 kHz pure tone to be “soft”, “medium
soft”, “medium”, and “medium loud”. Second, pure tone detection thresholds were measured
for octave steps between 125 and 8000 Hz. Stimuli were 400 ms sinusoids with 20 ms raised-
cosine attack and release ramps. At the beginning of a measurement run, participants set the
stimulus volume to be “soft but audible”. Detection thresholds were then measured using a
three-interval, three-alternative, forced-choice procedure in which two of the intervals
contained silence and one interval contained the gain-adjusted tone. Participants were
instructed via on-screen instructions to select the interval that contained the tone. The starting
gain was set by the participant and thereafter reduced by 2 dB after correct responses and
increased by 6 dB after incorrect responses. A run continued until three mistakes were made
and the average of the last four reversals was taken as the detection threshold. This procedure
converges to 75% detection accuracy (Kaernbach 1991).
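For readers unfamiliar with this weighted up-down rule, the sketch below reproduces its logic; the simulated listener is hypothetical and stands in for a real participant. The 2-dB-down/6-dB-up asymmetry converges where p × 2 = (1 − p) × 6, i.e., at p = 0.75.

```python
import math
import random

def run_staircase(respond, start_db, step_down=2.0, step_up=6.0,
                  max_mistakes=3, n_reversals=4):
    """Weighted up-down staircase like the detection procedure above.

    `respond(level_db)` returns True for a correct response. The level
    drops `step_down` dB after a correct response and rises `step_up` dB
    after a mistake; the run ends after `max_mistakes` incorrect
    responses, and the threshold is the mean of the last reversals.
    """
    level, mistakes = start_db, 0
    last_direction, reversals = 0, []
    while mistakes < max_mistakes:
        correct = respond(level)
        direction = -1 if correct else 1
        if last_direction and direction != last_direction:
            reversals.append(level)            # direction change = reversal
        last_direction = direction
        if correct:
            level -= step_down
        else:
            mistakes += 1
            level += step_up
    tail = reversals[-n_reversals:]
    return sum(tail) / len(tail) if tail else level

# Simulated listener whose true threshold is near 20 dB.
def listener(level_db):
    return random.random() < 1 / (1 + math.exp(-(level_db - 20.0)))

print(run_staircase(listener, start_db=40.0))
```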
Modulation Detection
Modulation detection was measured for modulation frequencies near 10 and 110 Hz.
These modulation frequencies were chosen as representative of a roughness cue relevant to
harmonic distortion (10 Hz) and as representative of one of the relevant fundamental
frequencies being examined for pleasantness ratings (110 Hz). Modulation detection was
measured using a three-interval, three-alternative, forced-choice procedure where two of the
intervals contained standard stimuli without modulation and one of the intervals was
modulated with adaptively controlled modulation depth. The standard stimuli were 1 kHz pure
tones that were 400 ms in duration with 20 ms raised-cosine attack and release ramps. The
target stimulus was identically defined except being amplitude modulated. The initial
modulation depth was set to 100%. The modulation depth was decreased by a factor of ∛2 following correct answers and was increased by a factor of two following mistakes. This
adaptive logic converges to 75% detection accuracy (Kaernbach 1991). A measurement run
ended after the participant made four mistakes and the average of the last four reversals was
taken as the modulation detection thresholds. Each of the two modulation frequencies tested
(10, 110 Hz) was measured with three repetitions with conditions presented in random order.
Correct answer feedback was provided on all trials for this and all subsequent procedures
except for pleasantness ratings (as there is no correct answer for that procedure).
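A sketch of the modulated target stimulus, assuming the conventional (1 + m·sin) modulator; the exact phase and level-normalization conventions used in TeamHearing are not specified in the text and are assumptions here.

```python
import numpy as np

def am_tone(fs=44100, dur=0.4, fc=1000.0, fm=110.0, depth=1.0, ramp=0.02):
    """400 ms, 1 kHz amplitude-modulated tone with 20 ms raised-cosine
    attack and release ramps; `depth` is the modulation depth m in the
    modulator (1 + m * sin(2*pi*fm*t))."""
    t = np.arange(int(fs * dur)) / fs
    x = np.sin(2 * np.pi * fc * t) * (1.0 + depth * np.sin(2 * np.pi * fm * t))
    n = int(fs * ramp)
    window = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))
    x[:n] *= window
    x[-n:] *= window[::-1]
    return x

target = am_tone(depth=1.0)    # fully modulated target
standard = am_tone(depth=0.0)  # unmodulated standard
```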
Fundamental Frequency Discrimination
Fundamental frequency discrimination was measured for fundamental frequencies near
110, 220, and 440 Hz. These fundamental frequencies were chosen as representative of the
typical range of spoken speech and as indicative of the range over which discrimination
typically deteriorates for CI users. Discrimination was measured using a two-interval, two-
alternative, forced-choice procedure for which participants were asked which interval was
higher in pitch. The stimuli were complex tones constructed in the frequency domain by
summing all harmonics from the fundamental to 2 kHz with a low pass filtering function. The
form of the low pass filtering function was:
gain = 1 if f < f_e; 0.1^((f − f_e)^2) otherwise
where gain is the gain expressed as a linear multiplier applied to each harmonic component, f is the frequency of the component, and f_e is the edge frequency of the passband, which was set to 1 kHz for the low-pass filter. Note, as thus defined, the low-pass
filter gain is zero above 2 kHz. Each measurement run began with a fundamental frequency
difference of 100% (an octave). This difference was adaptively controlled and reduced by a
factor of ∛2 after correct answers and increased by a factor of two after incorrect responses.
For each trial, the precise fundamental frequency tested was roved with values selected from a
quarter-octave range uniformly distributed and geometrically centered on the nominal
condition frequency. Relative to the roved value, the standard fundamental frequency was
lowered, and the target raised by a factor of √(1 + Δ/100). The gains of the standard and target were roved
by 6 dB based on a uniform distribution centered on the participant’s comfortable listening
level. A run ended when the participant made four mistakes and the average frequency
difference of the last four reversals was taken as the discrimination threshold.
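Putting the stimulus construction together, the sketch below sums harmonics to 2 kHz with the low-pass gain above and applies the quarter-octave rove and the √(1 + Δ/100) offsets. Treating the gain function’s frequencies in kHz is an assumption, since the unit convention is not stated.

```python
import numpy as np

def harmonic_gain(f_khz, f_edge_khz=1.0):
    """Low-pass gain per harmonic, following the piecewise function above
    (frequencies in kHz assumed)."""
    if f_khz < f_edge_khz:
        return 1.0
    return 0.1 ** ((f_khz - f_edge_khz) ** 2)

def complex_tone(f0, fs=44100, dur=0.4, f_max=2000.0):
    """Sum harmonics of f0 up to 2 kHz, each scaled by the low-pass gain."""
    t = np.arange(int(fs * dur)) / fs
    x = np.zeros_like(t)
    for k in range(1, int(f_max // f0) + 1):
        f = k * f0
        x += harmonic_gain(f / 1000.0) * np.sin(2 * np.pi * f * t)
    return x

# Quarter-octave rove geometrically centered on the nominal frequency,
# then standard lowered and target raised by sqrt(1 + delta/100).
rng = np.random.default_rng()
nominal_f0, delta = 110.0, 10.0     # delta: percent difference on this trial
roved_f0 = nominal_f0 * 2.0 ** rng.uniform(-0.125, 0.125)
standard = complex_tone(roved_f0 / np.sqrt(1 + delta / 100))
target = complex_tone(roved_f0 * np.sqrt(1 + delta / 100))
```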
Consonance Identification
To test consonance identification, participants were asked to categorize dyads as either
consonant or dissonant. Four dyads were examined including two dyads typically labeled
as consonant (i.e., perfect fifth, octave) and two dyads typically labeled as dissonant (i.e.,
tritone, major seventh). Consonance identification was measured for root notes near 110, 220,
and 440 Hz with each dyad type measured with ten trials each. All musical stimuli were
generated using MuseScore 3 composition and notation software (https://musescore.org/en,
accessed on 20 December 2021; Musescore BVBA, Belgium). Stimuli were grand piano notes
with a duration of three seconds. Before commencing a measurement run, participants were
first provided with several examples of consonant and dissonant dyads. During piloting of the
pleasantness ratings procedures, it was noted that some CI users who had extensive musical
experience could clearly hear the difference between dyads that are typically labeled as
consonant (e.g., perfect fifth, octave) and those typically labeled dissonant (e.g., tritone, major
seventh), though they were reluctant to assign the terms “pleasant” or “unpleasant” to this
distinction.
Pleasantness Ratings
The same musical note stimuli described for consonance identification were used for
collecting pleasantness ratings for all participants. Musical dyads were formed by combining
two of the rendered piano notes with dyadic combinations including thirteen pairings of every
note combination with semitone spacing ranging from unison (i.e., combining a note with itself)
to an octave (i.e., combining a note with a note one octave higher). Dyads were organized into
experimental conditions with pleasantness ratings collected for pairings near 110, 220, and 440
Hz. Musical dyads were presented one at a time. Participants were asked to rate the
pleasantness of the dyad on a Likert scale from 0 to 6 with 0 labeled as “dissonant or
unpleasant”, 3 as “neutral”, and 6 as “consonant or pleasant”. For a measurement run, each of
the thirteen dyadic pairings (each semitone spacing from unison to octave, inclusive) were
presented twice. A total of nine measurement runs were made including three repetitions of
each of the three note ranges (110, 220, 440 Hz).
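The thirteen dyadic spacings follow equal-tempered semitone ratios; the sketch below lists the root and upper-note fundamentals for the 110 Hz condition. The rendered piano notes themselves would simply be mixed sample by sample, which is omitted here.

```python
SEMITONE_RATIO = 2 ** (1 / 12)

def dyad_frequencies(root_hz):
    """Root and upper-note fundamentals for the thirteen dyads, from
    unison (0 semitones) to the octave (12 semitones)."""
    return [(root_hz, root_hz * SEMITONE_RATIO ** n) for n in range(13)]

for root, upper in dyad_frequencies(110.0):
    print(f"{root:.1f} Hz + {upper:.1f} Hz ({upper / root:.3f}:1)")
# The perfect fifth (7 semitones) lands near the 3:2 ratio (~1.498),
# while the tritone (6 semitones) sits at ~1.414.
```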
Speech Reception in Multi-talker Background Noise
Speech reception thresholds were measured for a sentence completion task using
speech materials from the Speech Perception in Noise Test (SPIN) corpus in the presence of
multi-talker background noise (Wilson et al. 2012). The user interface presented twenty-five
different word options, and participants were asked to choose the word that ended the last
spoken sentence. The modified SPIN corpus contains sentence materials that include both high
and low amounts of contextual information. Only the materials with low context information
were used in the present study, since we are mainly concerned with the availability of low-level
perceptual cues as opposed to cognitive factors. Speech reception thresholds were measured
using an adaptive procedure. The initial signal to noise ratio between the spoken sentence and
background noise was set to 12 dB and was decreased by 2 dB after correct responses and
increased by 6 dB after incorrect responses. The procedure continued until the participants
made four incorrect responses and the average of the last four reversals was taken as the
reception threshold. This adaptive rule converges to 75% identification accuracy (Kaernbach 1991).
The Goldsmith Musical Sophistication Index
Musical experience was measured using the Goldsmith Musical Sophistication Index
Self-Report Inventory (MSI), a 39-item psychometric instrument used to quantify the amount of
musical engagement, skill, and behavior of an individual (Müllensiefen et al. 2014). The
questions on this assessment are grouped into five subscales: active engagement, perceptual
abilities, musical training, singing abilities, and emotion. Questions under the active
engagement category consider instances of deliberate interaction with music (i.e., “I listen
attentively to music for X hours per day”). The perceptual abilities category includes questions
about music listening skills (e.g., “I can tell when people sing or play out of tune”). Musical
training questions inquire about individuals’ formal and non-formal music practice experiences
(“I engaged in regular daily practice of a musical instrument including voice for X years”).
Singing abilities questions inquire about individuals’ singing skills and activities (e.g., “After
hearing a new song two or three times I can usually sing it by myself”). Questions under the
emotion category reflect on instances of active emotional responses to music (e.g., “I
sometimes choose music that can trigger shivers down my spine”). These topics together
consider an individual’s holistic musical ability, including instances of formal and non-formal
music training and engagement. The composite score of these subscales makes up an
individual’s general musical sophistication score. All items, except those assessing musical
training, are scored on a seven-point Likert scale with choices that range from “completely
disagree” to “completely agree” (Müllensiefen et al. 2014).
Data Analysis
Data processing and statistical analyses were performed in the MATLAB 2021a
programming environment (MathWorks, Inc., Natick, MA, USA). Results from each test were
analyzed using a 2 × 3 mixed analysis of variance (ANOVA) with a between-subject factor of
group (CI versus those with no known hearing loss) and a within-subject factor of measurement
repetition (three repetitions per test). Effect size was calculated using Cohen’s d. Post-hoc
Bonferroni adjustments were performed for significant main effects (Cohen 1992). Pearson’s
bivariate correlations were calculated to investigate relationships between average scores on
perceptual tests and musical sophistication measures.
Results
Calibration Procedures
Figure 2.1 compares loudness settings and pure tone detection thresholds for both
participant groups. The difference in average detection thresholds between groups was
significant (F(1,21) = 19.2, p < 0.001), with cochlear implant users setting the average software volume higher (38.2 ± 15.2) compared to those with no known hearing loss (8.3 ± 16.4).
Importantly, these thresholds are measured relative to the system volume that participants
adjust their computers to for the at-home listening procedures. These results are not indicative
of absolute detection thresholds, but they show that when participants adjust their computer
and listening device settings to be comfortable, CI users have elevated detection thresholds.
The effect of frequency was significant (F(1,21) = 21.8, p < 0.001), as was the interaction between frequency and participant group (F(6,21) = 3.2, p = 0.005). The interaction effect is evidenced by the cochlear implant users having particularly elevated thresholds for the lowest and highest frequencies tested.
Figure 2.1: Calibration Procedure to Reference Sound Levels to Sensation Levels
Comparison of loudness settings and detection thresholds for participants with no known
hearing loss and for cochlear implant users. Detection thresholds and loudness settings are
plotted in decibels relative to the maximum output volume of the testing device (computer or
tablet) with 100 dB corresponding to the maximum output. Smaller circles indicate individual
results and larger circles indicate group averages with error bars indicating standard errors of
the mean. Stars indicate group averages for loudness levels corresponding to “soft,” “medium
soft,” “medium,” and “medium loud.”
Modulation Detection
Figure 2.2 shows modulation detection thresholds for 10 and 110 Hz modulation
frequencies. Participants with no known hearing loss were more sensitive to modulations than
the cochlear implant users (F(1,20) = 17.2, p < 0.001). Modulation frequency affected sensitivity (F(1,20) = 12.2, p = 0.002), and there was a significant interaction between modulation frequency and participant group (F(1,20) = 16.7, p < 0.001). For those with no known hearing loss, modulation detection improved from 11.2% at 10 Hz to 4.5% at 110 Hz (Cohen's d = 2.8, p < 0.001); for cochlear implant users, detection slightly worsened from 18.7% to 21.0% (Cohen's d = 0.14, p = 0.63). Neither repetition nor the interaction between participant group and repetition was significant (p > 0.1 for both comparisons), indicating that the significant main effects were not influenced by an effect of learning.
Figure 2.2: Sensitivity to Amplitude Modulation
Modulation detection threshold as a percent difference for modulation frequencies of 10 and
110 Hz. The smaller circles represent individual detection thresholds. The larger circles with
error bars represent across participant averages with error bars indicating standard errors of
the means.
Fundamental Frequency Discrimination
Figure 2.3 shows fundamental frequency discrimination thresholds measured near
fundamental frequencies of 110, 220, and 440 Hz. Participants with no known hearing loss had
better discrimination than the cochlear implant users (F(1,21) = 57.0, p < 0.001). Fundamental frequency affected sensitivity (F(2,21) = 7.6, p = 0.002), and there was a strong interaction between fundamental frequency and participant group (F(2,21) = 4.2, p = 0.02). For those with no known hearing loss, discrimination thresholds were around 0.5% with little variation across fundamental frequencies (p > 0.1 for all comparisons). In contrast, for CI users, discrimination worsened from 5.5% at 110 Hz to 14.0% at 220 Hz (Cohen's d = 0.74, p = 0.01), then further worsened to 40.3% at 440 Hz (Cohen's d = 0.71, p = 0.008). Neither repetition nor the interaction between participant group and repetition was significant (p > 0.1 for both comparisons), indicating that the significant main effects were not influenced by an effect of learning.
Figure 2.3: Pitch Resolution for Harmonic Complexes
Fundamental frequency discrimination thresholds as a percent difference measured for
fundamental frequencies of 110, 220, and 440 Hz. The smaller circles represent individual
thresholds. The larger circles with error bars represent participant averages with error bars
indicating standard errors of the means.
Consonance Identification
Figure 2.4 shows identification accuracy for consonant (unison, perfect fifth, and octave)
and dissonant (major second, tritone, and major seventh) dyads. Participants with no known
hearing loss had better identification than the CI users (F(1,21) = 11.2, p = 0.003). Neither fundamental frequency (F(2,21) = 0.3, p = 0.79) nor the interaction between fundamental frequency and participant group (F(2,42) = 0.1, p = 0.94) was significant. This contrasts with fundamental frequency discrimination. Neither repetition nor the interaction between participant group and repetition was significant (p > 0.1 for both comparisons), indicating that the significant main effects were not influenced by an effect of learning.
Figure 2.4: Consonance Identification for Pairs of Rendered Piano Notes
Consonance identification as a percent correct across distinct frequencies ranging from 110 to
440 Hz. The smaller circles represent individual identification measures. The larger circles
represent participant averages with error bars indicating standard errors of the means.
Pleasantness Ratings
Figure 2.5 shows pleasantness ratings for musical dyads ranging from unison to an
octave in semitone increments. Averaged across all conditions, CI users rated dyads as less
pleasant with an average rating of 2.86 compared to 3.46 for those with no known hearing loss
(F(1,21) = 6.5, p = 0.02, d = 0.37). Importantly, the interaction between hearing group and dyadic interval was significant (F(12,252) = 7.8, p < 0.001), indicating differences in the rating profiles. These underlying differences can broadly be seen in that CI users made flatter use of the rating scale; a more detailed profile analysis is considered in the subsequent paragraph. Overall consonance ratings of both groups were similar in that unison, perfect fourth, perfect fifth, and octave were consistently rated as more pleasant, while minor second, tritone, and major seventh were consistently rated as less pleasant. Additionally, and interestingly, a main effect of note range was observed for both groups. The average consonance rating across groups and intervals was higher for ascending root notes (F(2,21) = 9.1, p < 0.001). Grand
averages of consonance ratings were 2.72, 3.12, and 3.35 for root notes near 110, 220, and 440
Hz, respectively.
Figure 2.5: Pleasantness Ratings for Pairs of Rendered Piano Notes
Pleasantness ratings on a scale from 0 to 6 for musical dyads from unison to octave. Each
subplot indicates ratings for root notes near 110, 220, and 440 Hz. The circles represent the
participant average with error bars indicating standard errors of the means.
Further analysis of the similarities between pleasantness ratings was conducted by calculating the correlation between individual ratings and the average ratings from the group with no known hearing loss. Figure 2.6 shows the correlations for each note range. For the individuals within the group with no known hearing loss, the correlations are high since each individual's ratings are correlated with their own group average; these correlations indicate the consistency within the group. In contrast, the CI users exhibited much greater variability, with some participants having pleasantness ratings within the range of the group with no known hearing loss, while other participants exhibited no or even negative correlation. Thus, some CI users have near-normal pleasantness ratings for two-note chords, while others have flat or even opposing ratings.
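A sketch of the profile-correlation metric plotted in Figure 2.6: each participant's thirteen interval ratings are correlated against the average profile of the group with no known hearing loss. The example rating vectors are hypothetical, and the published analysis may differ in detail (e.g., averaging across repetitions first).

```python
import numpy as np
from scipy.stats import pearsonr

def profile_similarity(individual_ratings, template_ratings):
    """Pearson correlation between one participant's 13-interval
    pleasantness profile (unison through octave) and the group-average
    template; returns (r, p)."""
    return pearsonr(individual_ratings, template_ratings)

# Hypothetical 0-6 Likert ratings for the thirteen dyads:
template = np.array([5.5, 1.0, 2.5, 3.0, 4.0, 4.5, 1.5, 5.0, 3.5, 3.0, 2.5, 1.0, 5.5])
listener = np.array([4.0, 2.0, 3.0, 3.0, 3.5, 4.0, 2.0, 4.5, 3.0, 3.0, 3.0, 2.0, 4.5])
r, p = profile_similarity(listener, template)
print(r, p)
```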
Figure 2.6: Correlations Between the Pleasantness Profile of Individuals and the Average
Ratings of the Group with No Known Hearing Loss
Correlation coefficient between individual pleasantness ratings and the average ratings from
the group with no known hearing loss. The smaller circles represent individual correlations. The
larger circles represent participant averages with error bars indicating the standard errors of
the means.
Speech Reception in Multi-talker Background Noise
Figure 2.7 shows speech reception thresholds for participants with no known hearing
loss and for the CI users. Participants with no known hearing loss had better speech recognition
with average thresholds of −11.0 dB compared to CI users with average thresholds of 9.2 dB (Cohen's d = 3.2, p < 0.001). Thus, the difference between group averages was more than 20 dB. Neither repetition nor the interaction between participant group and repetition was significant (p > 0.1 for both comparisons), indicating that the significant main effects were not influenced by an effect of learning.
Figure 2.7: Speech Reception in Multi-Talker Background Noise
Speech reception thresholds for participants with no known hearing loss and for CI users. The
smaller circles represent individual thresholds. The larger circle represents group averages with
error bars indicating standard errors of the means.
Correlation Analysis
Correlation analyses were conducted to consider relationships between average results
across procedures. Specifically, for each procedure, measures were averaged across repetitions
and conditions to yield a single value for each participant. These participant averages were then
used to calculate the correlation between results across procedures. For pleasantness ratings,
the correlation between individual pleasantness ratings and the group average for participants
with no known hearing loss was used as the procedural result. Table 2.2 summarizes the
correlations across procedures. All correlations were significant when considering the entire
participant pool (p < 0.001). To characterize the extent to which these strong correlations
were driven by a group effect, separate correlation analyses were conducted for the two
participant groups. For the group with no known hearing loss, significant correlations were
found between modulation detection and speech reception thresholds, between fundamental
frequency discrimination and both consonance identification and pleasantness ratings, and
between consonance identification and pleasantness ratings. For the CI users, all correlations
were generally strong with all associated p-values less than 0.1 and most less than 0.05. In
summary, strong correlations were observed between measures with the strength of
correlation generally persisting even when considering the participant groups separately.
Table 2.2: Correlation Coefficients Across Procedures
Correlation coefficients comparing individual results from different procedures averaged across
conditions. Only correlation magnitudes are displayed, but all correlation directions indicate
better performance on one measure corresponding to better performance on the other
measure unless marked with a †. Correlation analyses were performed across all participants
and within each group. Abbreviations: modulation detection thresholds (MDT), fundamental
frequency discrimination thresholds (F0DT), consonance identification (CID), pleasantness
ratings (PR), and speech reception thresholds (SRT). p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***).
All Participants          F0DT       CID        PR         SRT
MDT                       0.72***    0.76***    0.66***    0.73***
F0DT                                 0.85***    0.79***    0.84***
CID                                             0.80***    0.66***
PR                                                         0.75***

No Known Hearing Loss     F0DT       CID        PR         SRT
MDT                       0.09       0.16       0.18       0.81*
F0DT                                 0.92**     0.89**     0.09
CID                                             0.91**     0.12
PR                                                         0.34

Cochlear Implant Users    F0DT       CID        PR         SRT
MDT                       0.54*      0.69**     0.50       0.49
F0DT                                 0.80***    0.75***    0.62*
CID                                             0.70**     0.48
PR                                                         0.72**
As an example of specific correlations, Figure 2.8 compares performance on
fundamental frequency discrimination, consonance identification, pleasantness ratings, and
speech reception with modulation detection. Participants who had better modulation detection
generally performed better or as well on the other procedures.
Figure 2.8: Correlations between Perceptual Measures and Modulation Sensitivity
Comparisons of individual results from different procedures based on averages across
conditions. For each comparison, circles represent the average measure for each individual
participant averaged across conditions and repetitions.
A final correlation analysis was conducted to compare MSI with performance on
modulation detection, fundamental frequency discrimination, consonance identification,
pleasantness ratings, and speech reception in noise. The composite general musical
sophistication scores were used for correlations in data analysis. Table 2.3 shows that for
normal hearing and CI users together, there is a significant correlation between MSI scores and
all perceptual measures. The strong correlations between MSI and perceptual measures are
generally preserved in the within-group correlations with a few exceptions. For those with no
known hearing loss, the correlations between MSI and modulation detection and with speech
reception in noise were not significant. For cochlear implant users, the correlation between MSI
and speech reception in noise did not reach significance, but all other correlations with MSI
were significant.
Table 2.3: Correlation Coefficients comparing MSI Performance against Perceptual Measures
Correlation coefficients comparing MSI to performance on perceptual measures. Only
correlation magnitudes are displayed, but all correlation directions indicate higher MSI
corresponding to better performance on the perceptual measure unless marked with a †.
Abbreviations: modulation detection thresholds (MDT), fundamental frequency discrimination
thresholds (F0DT), consonance identification (CID), pleasantness ratings (PR), and speech
reception thresholds (SRT). p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***).
                             MDT        F0DT       CID        PR         SRT
All Participants       MSI   0.65***    0.83***    0.85***    0.77***    0.60**
No Known Hearing Loss  MSI   0.20       0.81*      0.96***    0.87**     0.06†
CI Users               MSI   0.56*      0.85***    0.72***    0.70***    0.45
The perceptual measures are plotted against MSI for comparison in Figure 2.9. The clear
trend is for better performance with higher MSI composite scores. The relationship is precise and well described by a linear fit for modulation detection, fundamental frequency
discrimination, consonance identification, and pleasantness ratings profile. The relationship is
less precise and did not reach significance for within-group comparisons for speech reception in
noise.
Figure 2.9: Correlations between Music Sophistication and Modulation Sensitivity
Comparisons of individual results from different procedures based on MSI scores. For each
comparison, circles represent the average measure for each individual participant averaged
across conditions and repetitions.
Discussion
The hypothesis tested by this experiment is that low-level sensitivity to modulations and
to pitch change are predictive of higher-level measures of consonance perception. We also
predicted that higher levels of musical sophistication would be positively correlated with
performance on music and speech perception tasks. The first hypothesis was supported by
evidence of strong correlations between measures of modulation detection and fundamental
frequency discrimination, along with higher-level measures such as consonance identification
and speech reception in background noise. The second hypothesis was partially supported by
strong correlations between musical sophistication scores and performance on fundamental
frequency discrimination thresholds, consonance identification, and pleasantness ratings for
both the no known hearing loss group and CI users. MSI scores for both groups were not
significantly correlated with modulation detection or speech reception thresholds. Discussion is
focused on the significance of these trends and how they relate to other aspects of hearing,
such as audibility, pitch resolution, and speech comprehension in challenging environments.
The present experiment was in part motivated by a study by Spitzer and colleagues
(2008), who examined pleasantness ratings in people who had a cochlear implant in one ear
and normal hearing in the other (Spitzer et al. 2008). In that study, the authors found that
pleasantness ratings were generally flat across musical dyads when listening with the implanted
ear. The authors noted that there were similarities in the pleasantness ratings between the
implanted and normal-hearing ear; for example, participants tended to rate minor second and
major seventh intervals as relatively dissonant in both ears. However, pleasantness ratings
were generally flat as heard through the cochlear implant. The authors speculated that access
to consonance perception provided by cochlear implants is likely mediated by modulation
sensitivity, though they did not test this hypothesis explicitly.
In the present study, the relationships between modulation and pitch sensitivities with
pleasantness ratings were explicitly considered. The results indicate that both low-level measures of modulation and pitch sensitivity are well correlated with consonance identification.
Even when considering the CI users in isolation, the correlation between pitch discrimination
and both consonance identification and the pleasantness ratings profile was exceptionally strong.
This evidence indicates that consonance perception amongst CI users is a broadly varying
dimension of hearing, with the implant users who are most sensitive to modulations and pitch
changes having the best access to consonance perception.
Results demonstrate that musical sophistication level is another factor that is strongly
correlated with CI users’ perception of consonance. This is supported by previous work by LoPresto and Firestone, which concludes that increased music training and engagement can lead
to improved consonance identification and pitch discrimination (LoPresto 2015; Firestone et al.
2020). In contrast to previous work, our results did not demonstrate in CI users a connection
between music sophistication and speech reception in noise. Similarly, music training levels
were not significantly correlated with low-level measures of modulation sensitivity as originally
predicted. Together, these findings point to music sophistication as one of several factors that
influence the perception of consonance.
A further contribution of the present study is the precise characterization of how
strongly the pleasantness ratings of CI users can align with those with no known hearing loss.
Previous studies have not considered how the general shape of pleasantness ratings as a
function of dyadic interval compares for cochlear implant users. In the present study, the
correlation between pleasantness ratings for individual CI users with a template from those
with no known hearing loss clearly indicates that CI users can have normal pleasantness ratings.
More specifically, levels of musical sophistication for CI users and those with no known hearing
loss were significantly correlated with pleasantness ratings. This indicates that music training
plays a similar role in CI users as in normal hearing populations when it comes to pleasantness
identification. However, some CI users have distinctly abnormal ratings with negative
correlation to those with no known hearing loss, suggesting a possible reversal in which dyadic
intervals sound pleasant and which sound unpleasant.
Worth noting is that a strong correlation was also observed between pitch resolution
and speech reception in multi-talker babble. The presumed mediating mechanism is that CI
users who are more sensitive to pitch change can use this access to pitch to attend to target
speech in the presence of competing talkers. While that mediating mechanism was not
explicitly tested in the present study, the strong correlations support the conjecture. However,
it is also possible that the best performing CI users are high performing on both pitch and
speech tasks without the pitch mechanism necessarily facilitating speech recognition. Further
evidence of the association, though not a causative relationship, was provided in previous work
from our laboratory (Goldsworthy et al. 2013; Goldsworthy 2015). Returning to the present
study, the strong correlations observed between pleasantness ratings profiles with speech
reception provide further evidence of the association between musical and speech domains
(Chatterjee & Peng 2008; Garadat et al. 2012; Luo et al. 2008; Milczynski et al. 2012; Vliegen et
al. 1999). We presume that these relationships are partly driven by low-level access to
psychophysical cues for modulation sensitivity and pitch resolution, though causality has not
been established.
The present study demonstrates the importance of sensitivity to modulation and pitch, in combination with music training, for consonance and dissonance identification among CI users. With the ability to discriminate between consonant and dissonant sounds in music comes the potential to identify the sounds that make music listening more pleasant for an individual (LoPresto 2015). In addition to these findings, there are certain
limitations to consider. The limited number of participants and their individual differences
should be considered. For example, some participants had years of musical training experience
and had a greater understanding of consonance and the expected pleasantness of various
musical intervals. Additionally, years of experience using cochlear implants, implant layout, and
streaming method were not controlled among participants. While the analyses conducted attempted to limit the extent to which these individual differences could affect the calculated thresholds and correlation coefficients, further studies are needed to understand those differences. Additionally, it should be noted that the only aspect of music pleasantness measured in this study was harmony at specific dyadic intervals; music perception in CI users in general is affected by various other factors such as the simultaneous presentation of musical
instruments and voices. The extent to which all factors play a role in perception should be
carefully considered when analyzing temporal cues in sound processors.
Chapter 3: Pitch Resolution and Sensitivity to Amplitude Modulation
Influence Sound Source Separation
The work described in this chapter is being prepared for submission to the Journal of the
Acoustical Society of America.
Authors: Andres Camarena¹, Matthew Fitzgerald², Takako Fujioka², and Raymond Goldsworthy¹*

¹Auditory Research Center, Caruso Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA

²Stanford Ear Institute, Department of Otolaryngology – Head & Neck Surgery, Stanford University School of Medicine, Palo Alto, California, United States of America

*Author to whom correspondence should be addressed.
Introduction
Cochlear implants (CIs) restore function to individuals with profound sensorineural
hearing loss, with many recipients able to understand speech without the support of
accompanying visual cues. However, speech reception in CI users is severely limited in the
presence of background noise (Gfeller et al. 2007; Cullington & Zeng 2008). Speech reception
thresholds (SRTs), the signal to noise ratio (SNR) at which speech recognition is at 50%, are a
common measure of stream segregation—the ability to attend to a sound source interleaved or
competing with other sound sources. SRTs in normal hearing listeners are around -15 dB when
speech is masked by one other speaker, with thresholds plateauing near -3 dB with three or
more talker maskers present (Cullington & Zeng 2008). In contrast, CI users in the same study
required target material to be louder than masker speech, even when only one competing
talker was present. However, the authors note that performance significantly improved when
the target male voice was masked by a competing female talker rather than by another male
talker. These findings are in agreement with previous work demonstrating differences in pitch
as a relevant cue in separating sound sources (Brokx & Nooteboom 1982; Vliegen et al. 1999;
Brungart et al. 2001; Drullman & Bronkhorst 2004; Gutschalk et al. 2007; Paredes-Gallardo et al.
2018). However, a similar release from masking is not observed in CI users or in normal-hearing
adults listening through simulations of CI sound processing (Qin & Oxenham 2003; Qin &
Oxenham 2005; Stickney et al. 2004; Stickney et al. 2007; Cullington & Zeng 2008). Therefore,
stream segregation in CI users is likely limited by reduced access to pitch cues at the auditory
periphery (Beauvois & Meddis 1996; Brungart et al. 2001; Pressnitzer et al. 2008; Bidet-Caulet
& Bertrand 2009; Moore & Gockel 2012).
The pitch of a complex signal, such as speech and music, is represented by a
combination of spectral and temporal cues. Place-of-excitation cues are captured at the region
of the auditory nerve corresponding to its characteristic frequency as well as at integer
multiples of the fundamental frequency, with spectral peaks resolvable up to around the 12th harmonic (Oxenham 2012). For higher harmonics, cochlear filters become increasingly broad,
and harmonic components no longer provide discernable tonotopic cues for pitch. However,
interactions between harmonic components within a single filter result in a neural response
with a periodic fluctuation in amplitude. This periodicity in the spectral excitation pattern is an
additional pitch cue with repetition rates in the range of 30 to 4000 Hz capable of supporting
melodic pitch (Attneave & Olson 1971; Pressnitzer et al. 2008).
CIs attempt to convey spectral and temporal correlates of pitch, but face limitations on
both fronts. Place-of-excitation cues are limited by the specificity of excitation along the length
of the auditory nerve. While the healthy auditory nerve contains around 30,000 fibers, CIs use
an array of no more than 22 electrodes—further degraded by the spread of electrical current—
and are unable to provide the resolution needed to resolve harmonic components (Wouters et
al. 2015; Limb & Roy 2014; Dorman & Wilson 2004). To preserve spectral resolution offered by
the limited number of electrodes, CI sound processing makes an adjustment to the bandpass
filter bank of the stimulation channels. In contrast to the cochlear filters seen in the healthy
auditory nerve, CIs incorporate relatively narrow filters with minimal overlap. However, doing
so limits the number of harmonic components that interact within a filter to provide deep
modulation in the temporal envelope. As a result, periodicity in the stimulus envelope is often
degraded—even at frequencies designed for envelope extraction—with the remaining
modulation at times asynchronous across stimulation channels (Milczynski et al. 2009). In this
manner, the depth of modulation is reduced to preserve frequency resolution, though neither cue is well conveyed.
Conventional methods for improving hearing outcomes for CI users focus on
improvements to spectral resolution and do not typically co-employ improvements to envelope
cues. These strategies take advantage of current interactions of nearby electrodes such that
charge summates at locations in between physical electrodes (Donaldson et al. 2005; Firszt et
al. 2007; Koch et al. 2007; van den Honert & Kelsall 2007). However, while these methods
increase the number of spectral channels available to the implanted ear, only modest benefits
have been observed (van den Honert & Kelsall 2007; Berenstein et al. 2008; Nogueira et al.
2009; Srinivasan et al. 2013). In fact, speech comprehension and pitch perception in CI users
plateaus around 6-10 spectral channels (Fishman et al. 1997; Fu et al. 1998; Faulkner et al. 2001; Friesen et al. 2001). It is possible, though, that the improvements to spectral resolution are offset by reductions in temporal envelope cues (Drennan et al. 2010; de Jong et al. 2017). The increased number of spectral channels provided by such strategies requires a finer spectral
analysis—often performed by a Fast Fourier Transform—with the spectral content of sound
estimated by a sliding window analysis. However, this process can act as a low-pass filter
limiting envelope extraction. Furthermore, level adjustments in the electrode pair of a virtual
channel can introduce temporal fluctuations unrelated to the acoustic signal. Rather, efforts to
improve pitch perception and hearing outcomes likely require a combination of spectral and
temporal cues to be present in the electrical stimulus. Therefore, there is strong motivation to
characterize the contribution and accessibility of both spectral and temporal envelope cues as
they relate to hearing outcomes.
Towards this goal, the present study characterizes how differences in pitch between
sound sources facilitate stream segregation. Specifically, this study investigates the relationship
between low-level psychophysical thresholds for modulation sensitivity and pitch resolution
with auditory stream segregation abilities. Psychophysical thresholds were measured for
modulation sensitivity and for pitch resolution, and auditory stream segregation abilities were
obtained from CI users and peers with no known hearing loss. We hypothesized that sensitivity
to both spectral and temporal pitch cues facilitate the ability to attend to a sound source in
background noise. Specifically, we predicted that CI users with greater sensitivity to amplitude
modulations and better resolution for pure and complex tones would be those with the
greatest ability to segregate auditory streams.
Methods
Participants
Ten CI users (R = 21-75 years old, M = 61.8 years, SD = 15.3 years, females = 7) and eight
individuals with no known hearing loss (R = 19-66 years old, M = 33.4 years, SD = 15.8 years,
females = 3) participated. Seven CI participants used Cochlear Corporation implants (Cochlear Americas, Lone Tree, CO, USA), and three used Advanced Bionics implants (Sonova, Los Angeles,
CA, USA). Bilaterally implanted CI users were tested with each ear separately. Complete CI
participant information is provided in Table 3.1. Participants gave informed consent and were
paid $15/hour for their participation. The experimental protocol was approved by the
University of Southern California Institutional Review Board (HS-19-00482, approved July 10, 2019).
Table 3.1: Participant Information
CI participant information. Age at time of testing and age at onset of hearing loss are given in
years. Duration of profound hearing loss prior to implantation is given in years and estimated
from subject interviews.
Remote Assessment
All testing was done through TeamHearing: a web application for auditory rehabilitation
and assessment (www.teamhearing.org). Remote auditory assessments followed the
recommendations of the Acoustical Society of America task force on remote testing (Peng et al.
2022). Research assistants met with study participants using video conferencing prior to study
completion. Participants were instructed to test in a quiet room free from distractions and were
provided with stereo headphones for testing (Koss UR20 Over-Ear Headphones). CI users were
asked to connect to computer audio as they normally would. When listening
through a free field speaker, CI users were asked to disable hearing technology worn on—or to
otherwise occlude—the ear contralateral to that being tested. When streaming via Bluetooth,
CI users were asked to disable any Bluetooth compatible devices worn on the contralateral ear.
Total testing time was three to four hours.
Loudness and Sensation Levels
Loudness scaling and detection thresholds were measured to reference sound levels to
sensation levels. Loudness scaling was measured by having participants adjust a 1 kHz pure
tone to be “soft,” “medium soft,” “medium,” and “medium loud.” Detection thresholds were
measured for pure tones for frequencies of 500, 1000, and 2000 Hz. Stimuli were 400
millisecond sinusoids with 20 millisecond raised-cosine attack and release ramps. Thresholds
were measured using a three-interval, three-alternative, forced-choice procedure in which two
of the intervals contained silence and one interval contained the tone. The starting level was
set by the participant to be “soft but audible” and was then reduced by 2 dB following correct
answers and increased by 6 dB after mistakes. A run continued until three mistakes were made
and the average of the last four reversals was taken as the threshold. This adaptive rule
converges to 75% detection accuracy (Kaernbach 2001).
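To make the adaptive logic concrete, the following is a minimal MATLAB sketch of one simulated run of this weighted up-down rule; the simulated listener, the starting level, and all variable names are illustrative assumptions rather than the study's actual implementation.

```matlab
% Minimal sketch of the weighted up-down rule (2 dB down after a correct
% response, 6 dB up after a mistake), which converges to 75% correct
% (Kaernbach 2001). The simulated listener (rand < 0.75) is a placeholder.
level = 40;                          % assumed starting level in dB
mistakes = 0; lastDir = 0; reversals = [];
while mistakes < 3
    correct = rand < 0.75;              % placeholder for a 3AFC trial outcome
    step = -2*correct + 6*(~correct);   % -2 dB if correct, +6 dB if mistake
    if lastDir ~= 0 && sign(step) ~= lastDir
        reversals(end+1) = level;       % record the level at each reversal
    end
    lastDir = sign(step);
    level = level + step;
    mistakes = mistakes + (~correct);
end
threshold = mean(reversals(max(1,end-3):end));  % mean of last four reversals
```

The asymmetric steps balance where the expected movement is zero, 2p = 6(1 - p), which gives p = 0.75.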
Amplitude Modulation Detection
Sensitivity to amplitude modulation was measured for modulation frequencies near 10
and 110 Hz. These modulation frequencies were chosen as representative of a roughness cue
relevant to harmonic distortion (10 Hz) and as representative of the fundamental frequency
being examined for temporal jitter detection and speech reception thresholds (110 Hz). Stimuli
were 400 milliseconds in duration with 20 milliseconds raised-cosine attack and release times.
Stimulus intervals were separated by 200 milliseconds. The carrier of the modulated sinusoids was a 1 kHz pure tone. Prior to each trial, participants were presented with a
representative stimulus from the upcoming exercise and allowed to adjust the level to be
comfortable. To minimize discrepancies in loudness between modulated and non-modulated
tones, stimuli were roved by ±6 dB.
Detection was measured using a three-interval, three-alternative, forced-choice
procedure for which participants were asked which interval was modulated (or different). Prior
to testing, participants listened to examples of modulated and unmodulated stimuli. The initial
modulation depth was 100% but decreased by a factor of $\sqrt[3]{2}$ following correct answers and
increased by a factor of two following mistakes. This adaptive logic converges to 75% detection
accuracy (Kaernbach 1991). A measurement run ended after the participant made four
mistakes and the average of the last four reversals was taken as the modulation detection
thresholds. Each of the two modulation frequencies tested (10, 110 Hz) was measured with
three repetitions with conditions presented in random order. Correct answer feedback was
provided on all trials for this and all subsequent procedures.
The modulating envelope was sinusoidal and defined according to the equations:
$$\text{modulator} = 1 - \frac{m}{2}\left(1 - \cos(2\pi f t)\right)$$

$$\text{modulator} = \max(\text{modulator},\, 0)$$
The first equation is often used with the modulation index, m, constrained between 0
and 1; however, the rectification applied in the second equation allows any modulation index
from zero to infinity to be used—amounting to sharpening of the temporal envelope. For
measuring modulation detection thresholds, modulation depth was allowed to increase beyond
100% depth if the participant needed it, in which case the above equations were used to specify
the shape of the modulation envelope.
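As a minimal illustration, the sketch below generates one modulated stimulus using these two equations; the sampling rate, carrier frequency, and parameter names are assumptions for illustration.

```matlab
% Minimal sketch of the rectified sinusoidal modulator applied to a 1 kHz
% carrier; sampling rate and modulation index are illustrative assumptions.
fs = 44100;                                   % sampling rate (Hz)
t = (0:round(0.4*fs)-1)'/fs;                  % 400 ms time vector
fm = 110;                                     % modulation frequency (Hz)
m = 1.5;                                      % index > 1 engages rectification
modulator = 1 - (m/2)*(1 - cos(2*pi*fm*t));   % first equation
modulator = max(modulator, 0);                % second equation: rectify at zero
stimulus = modulator .* sin(2*pi*1000*t);     % 1 kHz carrier
```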
Pitch Discrimination
Discrimination for the frequency of pure tones and the fundamental frequency (F0) of
complex tones were measured using a two-interval, two-alternative, forced-choice procedure
for which participants were asked which interval was higher in pitch. Pure tone frequency
discrimination thresholds (FDTs) were measured at 1000 Hz, a frequency chosen for its position near
common vowel formant frequencies. Complex tone F0 discrimination thresholds were
measured at F0 of 110 Hz. This F0 was chosen as representative of the typical range of male-
spoken speech and mirrors the center frequency of the speech material used in the speech
reception task. Complex tones were constructed in the frequency domain by summing all
harmonics from the fundamental to 2 kHz with a low pass filtering function. The form of the
low pass filtering function was:
$$\text{gain} = \begin{cases} 1 & \text{if } f < f_e \\ \max\left(0,\; 1 - \left(\log_2 f - \log_2 f_e\right)^2\right) & \text{otherwise} \end{cases}$$
where $\text{gain}$ is expressed as a linear multiplier applied to each harmonic component, $f$ is the frequency of the component, and $f_e$ is the edge frequency of the passband, which was set as 1 kHz for the low-pass filter. Note, as thus defined, the low-pass
filter gain is zero above 2 kHz. Each measurement run began with a frequency or fundamental
frequency difference of 100% (an octave). This difference was adaptively controlled and
reduced by a factor of $\sqrt[3]{2}$ after correct responses and increased by a factor of two after
incorrect responses. For each trial, the precise frequency or fundamental frequency tested was
roved with values selected from a quarter-octave range uniformly distributed and geometrically
centered on the nominal condition frequency. Relative to the roved value, the standard
frequency or fundamental frequency was lowered, and the target raised, by a factor of $\sqrt{1 + \Delta/100}$, where $\Delta$ is the percent difference. The gain
of the standard and target were roved by 6 dB based on a uniform distribution centered on the
participant’s comfortable listening level. A run ended when the participant made four incorrect
responses and the average frequency difference of the last four reversals was taken as the
discrimination threshold.
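For concreteness, a MATLAB sketch of the complex-tone synthesis under the gain function above follows; the sampling rate and variable names are illustrative assumptions.

```matlab
% Sketch of complex-tone synthesis: all harmonics of F0 up to 2 kHz, each
% weighted by the low-pass gain function with edge frequency fe = 1 kHz.
fs = 44100; f0 = 110; fe = 1000;              % assumed sampling rate
t = (0:round(0.4*fs)-1)'/fs;
tone = zeros(size(t));
for f = f0:f0:2000
    if f < fe
        g = 1;                                % unity gain in the passband
    else
        g = max(0, 1 - (log2(f) - log2(fe))^2);  % zero at and above 2 kHz
    end
    tone = tone + g*sin(2*pi*f*t);
end
```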
Temporal Jitter Detection
Modeled after work by Paredes-Gallardo et al. (2018), participants were tasked with
detecting a temporal jitter in the last tone of a train of isochronously repeating tones while in
the presence of masker noise. Across trials, the masker noise could be identical in pitch—in
pure tone frequency or fundamental frequency—or up to an octave higher. In this manner, the
procedure measures thresholds for stream segregation ability between the target and masker
within a given pitch distance (within a trial) and in relation to other step sizes used (across
trials). Pure tones were used to assess perception based purely on tonotopic cues. Harmonic
complexes were chosen for their analogous properties with speech—in that they contain both
spectral and temporal cues for pitch— while remaining relatively less complex. For both pure
tones and harmonic complexes, the stimuli were pitch shifted by 0, 3, 6, or 12 semitones
depending on the condition, with the motivation being that the greater differences in pitch
would increase masking release.
Jitter detection thresholds were measured for pure and complex tones using a two-
interval, two-alternative, forced-choice procedure for which participants were asked whether
the last note in the target stream was early or late relative to its periodicity. The task worked
adaptively in that, with the correct response, the jitter size became smaller until the threshold
was measured. The target stream consisted of 1000 Hz pure tones or complex tones with F0 at
110 Hz. Within a trial, the masker stream matched the stimulus type of the target notes—either
pure or complex—and could be 12, 6, 3, or 0 semitones higher in F0. Notes were 100 ms in
duration and were played for the target stream at a periodicity of 400 ms for a total of 1.5
seconds. Notes for the masker stream were played at a periodicity centered at 400 ms with a
jitter of +/-100 ms so that the masker stream did not directly compete with the target stream
rhythm. Each measurement run began with a temporal jitter magnitude of 100 ms for the last
note of the target stream. Additionally, the task began with the masker absent, with the signal
to noise ratio (SNR) adapting with correct responses. Once the target and masker had equal
presentation levels, jitter magnitude of the final target stream note was adaptively controlled
and reduced by a factor of $\sqrt[3]{2}$ after correct responses and increased by a factor of two after incorrect responses. However, the jitter never exceeded 100 ms; thus, thresholds near 100 ms represent ceiling performance.
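The sketch below illustrates how one target stream for this task might be constructed; the note durations and the 1.5 s trial length follow the text, while the sampling rate and remaining details are illustrative assumptions.

```matlab
% Sketch of a target stream: four 100 ms notes at a 400 ms period over
% 1.5 s, with the final note shifted early or late by the jitter magnitude.
fs = 44100; noteDur = 0.1; period = 0.4; nNotes = 4;
jitter = 0.1*(2*randi([0 1]) - 1);        % +/-100 ms starting jitter
onsets = (0:nNotes-1)*period;
onsets(end) = onsets(end) + jitter;       % jitter only the last note
n = round(fs*noteDur);
note = sin(2*pi*1000*(0:n-1)'/fs);        % 1000 Hz pure-tone target
stream = zeros(round(1.5*fs), 1);         % 1.5 s trial buffer
for k = 1:nNotes
    i0 = round(fs*onsets(k)) + 1;
    stream(i0:i0+n-1) = stream(i0:i0+n-1) + note;
end
```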
Speech Reception in Single and Multi-talker Background Noise
Speech reception thresholds were measured for spondee words in the presence of
background speech. The background speech was generated to have a high degree of
informational masking to force listeners to attend to pitch differences to segregate target
speech from competing speech. Specifically, the competing speech was generated from the
same corpus of words as the target speech. Two types of competing speech were generated:
one with a relatively sparse presentation of speech, the other with a relatively dense
presentation. For the sparse background, words were randomly selected from the spondee-
word corpus and concatenated so that the temporal spacing between words was uniformly
distributed between 0.5 and 0.6 seconds. For the dense background, words were randomly
selected and concatenated so that the temporal spacing between words was between 0.2 and
0.3 seconds. Further, for the dense spacing, selected words were randomly time-reversed with
50% probability. For both types of competing speech, the speech was pitch shifted by 0, 3, 6, or
12 semitones depending on the condition, with the motivation being that the greater
differences in pitch would increase masking release. These competing-speech samples were
generated to be 20 seconds long and were played continuously in a loop during the speech
reception procedure. Speech reception thresholds were measured using an adaptive
procedure. The initial SNR between the target speaker and background noise was set to 12 dB
and was decreased by 2 dB after correct responses and increased by 6 dB after incorrect
responses. The procedure continued until the participants made four incorrect responses and
the average of the last four reversals was taken as the reception threshold. This adaptive rule
converges to 75% identification accuracy (Kaernbach 2001).
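A sketch of the dense competing-speech generator described above follows; the cell array `words` of spondee waveforms and the sampling rate are hypothetical placeholders, not the study's actual corpus.

```matlab
% Sketch of the dense background generator: random spondees separated by
% 0.2-0.3 s gaps, each time-reversed with 50% probability, trimmed to a
% 20 s loop. `words` is a hypothetical cell array of waveforms at rate fs.
fs = 44100; targetLen = 20*fs;
bg = zeros(0, 1);
while numel(bg) < targetLen
    w = words{randi(numel(words))};              % random spondee waveform
    if rand < 0.5, w = flipud(w); end            % 50% time-reversed (dense only)
    gap = zeros(round(fs*(0.2 + 0.1*rand)), 1);  % 0.2-0.3 s spacing
    bg = [bg; w; gap];                           %#ok<AGROW>
end
bg = bg(1:targetLen);                            % trim to a 20 s loop
```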
Data Analysis
Data processing and statistical analyses were performed in the MATLAB 2021a
programming environment (MathWorks, Inc., Natick, MA, USA). Results from each test were
analyzed using a mixed analysis of variance (ANOVA) with a between-subject factor of group
(cochlear implant versus those with no known hearing loss) and a within-subject factor of
measurement repetition (three repetitions per test). Effect size was calculated using Cohen’s d.
Post-hoc Bonferroni adjustments were performed for significant main effects (Cohen 1992). For
both temporal jitter detection and speech reception, thresholds were logarithmically averaged
across repetitions for each participant.
Pearson bivariate correlations were calculated to investigate relationships between average scores on the perceptual tests (taken when the target and masker were 6 semitones apart in pitch) and either modulation detection or pitch discrimination thresholds. An initial analysis of
variance was conducted with repetition as a within-subject factor, but neither repetition nor
any of its interactions with other factors were significant.
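For concreteness, the correlation step might look like the following sketch; the matrices `jdt` and `mdt` are hypothetical, and the use of `corr` from the Statistics and Machine Learning Toolbox is an assumption.

```matlab
% Sketch of the correlation analysis: thresholds are logarithmically
% averaged across repetitions, then related across measures with Pearson
% correlation. `jdt` and `mdt` are hypothetical [participants x repetitions]
% threshold matrices.
jdtMean = exp(mean(log(jdt), 2));   % logarithmic (geometric) mean per person
mdtMean = exp(mean(log(mdt), 2));
[r, p] = corr(jdtMean, mdtMean);    % Pearson bivariate correlation
```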
Results
Modulation Detection
Figure 3.1 shows modulation detection thresholds for 10 and 110 Hz modulation
frequencies. At 10 Hz modulation frequency, cochlear implant users had similar sensitivity to
those with no known hearing loss (Cohen's d = 0.10, p = 0.23). However, the two groups performed differently when detecting modulations at 110 Hz (F(1,14) = 16.7, p < 0.001). For those with no known hearing loss, modulation detection improved from 11.6% at 10 Hz to 6.9% at 110 Hz (Cohen's d = 1.49, p = 0.036); for cochlear implant users, detection slightly worsened from 10.9% to 17.8% (Cohen's d = 0.57, p = 0.012).
Figure 3.1: Sensitivity to Amplitude Modulation
Modulation detection thresholds for sinusoidally amplitude-modulated tones. The bottom and
top edges of each box indicate 25th and 75th percentiles, respectively. Whiskers extend to the
most extreme data points not considered outliers. Outliers are plotted individually using the '+'
symbol. Black circles with error bars indicate the sample mean and standard error. The
horizontal line within each box indicates the median.
Frequency Discrimination
Figure 3.2 shows discrimination thresholds for pure tones and complex tones for
participants with no known hearing loss and for cochlear implant users. On average, cochlear
implant users had poorer pitch resolution (F(1,21) = 37.12, p < 0.001), with a grand average of 6.8% compared to 0.55% for those with no known hearing loss (Cohen's d = 2.73, p < 0.001). For those with no known hearing loss, the change from pure to complex tones had little effect on discrimination thresholds, with performance below 1% for both conditions (Cohen's d = 0.21, p = 0.59). In contrast, pitch resolution in cochlear implant users worsened from 4.29% for pure tones to 10.7% for complex tones (Cohen's d = 0.87, p = 0.002).
Figure 3.2: Pitch Resolution for Pure Tones and Harmonic Complexes
Frequency discrimination thresholds as a percent difference for 1000 Hz pure tones and 110 Hz
complex tones. The bottom and top edges of each box indicate 25th and 75th percentiles,
respectively. Whiskers extend to the most extreme data points not considered outliers. Outliers
are plotted individually using the '+' symbol. Black circles with error bars indicate sample means
and standard errors. The red horizontal line within each box indicates the sample median.
Abbreviations: fundamental frequency (F0).
Temporal Jitter Detection with Background Noise
Figure 3.3 shows temporal jitter detection thresholds as a function of pitch distance for
pure and complex tones. For masker pitch distances between 0 and 12 semitones, masking
release improved with increasing pitch distance (F(3,63) = 36.5, p < 0.001) and had a similar effect on both participant groups (Cohen's d = 0.56, p = 0.12). Masking release facilitated by increasing masker pitch distance was impacted by stimulus type (F(3,63) = 4.22, p = 0.008), with poorer masking release for complex tones than for pure tones. While both participant groups were similarly impacted by the change in stimulus, the effect on cochlear implant users was slightly larger (Cohen's d = 0.90, p < 0.001) than that observed in those with no known hearing loss (Cohen's d = 0.52, p = 0.024). Independent of masker pitch distance, streaming ability was generally greater for pure tones than for complex tones (F(3,63) = 18.3, p < 0.001). For instance, average detection thresholds across both groups, for masker pitch distances between 0 and 12 semitones, were 33.8 ms for pure tones and 47.7 ms when listening to complex tones (Cohen's d = 1.7, p < 0.001).
Figure 3.3: Stream Segregation for Tonal Stimuli as a Function of Masker Pitch Distance
Temporal jitter detection thresholds (JDT) as a function of pitch distance for pure and complex tones. The bottom and top edges of each box indicate 25th and 75th percentiles, respectively.
Whiskers extend to the most extreme data points not considered outliers. Outliers are plotted
individually using the '+' symbol. Black circles with error bars indicate sample means and
standard errors. The red horizontal line within each box indicates the sample median.
Abbreviations: fundamental frequency (F0).
Speech Reception in Single and Multi-talker Background Noise
Speech reception thresholds were measured as a function of pitch distance for single
and multi-talker background noise (Figure 3.4). For masker pitch distances between 0 and 12
semitones, masking release improved with increasing pitch distance (F(3,63) = 102.83, p < 0.001). However, this general effect was driven by the improvement in thresholds for the no known hearing loss group (F(3,63) = 52.5, p < 0.001). In contrast, cochlear implant users struggled to attend to the target sound once the background became similar in presentation level. When averaging across masker type, average speech reception thresholds for cochlear implant users changed from 2.79 dB when the masker was the same pitch as the target (0-semitone masker pitch distance) to only 0.38 dB when the masker was 12 semitones higher in pitch than the target (Cohen's d = 0.70, p = 0.027). By comparison, masking release facilitated by increasing pitch distance was more substantial for those with no known hearing loss, with average speech reception thresholds improving from -1.82 dB to -15.75 dB when the pitch of the masker changed from 0 to 12 semitones higher than the target (Cohen's d = 2.71, p < 0.001).
That this masking release was maintained in both single-talker and multi-talker
background noise suggests a general benefit from masker pitch distance independent of
informational masking. There was, though, a general effect of masker type on masking release (F(3,63) = 18.9, p < 0.001), with average speech reception thresholds, for masker pitch distances between 0 and 12 semitones, worsening in those with no known hearing loss from -16.9 dB in single-talker background noise to -8.74 dB in multi-talker background noise (Cohen's d = 1.04, p < 0.001).
Figure 3.4: Stream Segregation for Speech Material as a Function of Masker Pitch Distance
Speech reception thresholds as a function of pitch distance for single-talker and multi-talker
background noise. The bottom and top edges of each box indicate 25th and 75th percentiles,
respectively. Whiskers extend to the most extreme data points not considered outliers. Outliers
are plotted individually using the '+' symbol. Black circles with error bars indicate sample means
and standard errors. The red horizontal line within each box indicates the sample median.
Abbreviations: speech reception threshold (SRT).
Psychophysical Thresholds for Pitch Predict Stream Segregation in Cochlear Implant
Users
Correlation analyses were performed to characterize the relationship between low-level
psychophysical thresholds to pitch and stream segregation ability. For both temporal jitter
detection and speech reception thresholds, comparisons are shown for when the target and
masker stream were 6 semitones apart in pitch. Table 3.2 summarizes correlations for all
participants together and separated into groups. In general, the relationship between low-level
psychophysical thresholds for pitch and stream segregation ability were significant when
considering the entire participant pool. Likewise, these trends were mostly preserved when
considering only cochlear implant users. However, similar trends were not observed in those
with no known hearing loss. In contrast to the larger effects observed in the prior analyses, the
relationship between psychophysical sensitivity to pitch and stream segregation may be more
difficult to observe with the small number of recruited participants, particularly in those with no known hearing loss. While these observed trends in cochlear implant users
are consistent across behavioral tasks, it is difficult to generalize the observed relationships to a
normal hearing population.
Correlations for jitter detection and speech reception against modulation sensitivity are
shown in Figure 3.5. When considering the entire participant pool, sensitivity to amplitude
modulation had strong explanatory power for stream segregation ability, with individuals most
sensitive to modulation having the greatest ability to attend to a target stream in the presence
of background noise. To characterize the extent to which these correlations were driven by a
group effect, separate correlation analyses were conducted for the two participant groups. For
cochlear implant users, significant correlations were found between modulation sensitivity and
streaming of complex tones in the jitter detection task. Trends for modulation sensitivity
against speech reception in single-talker and multi-talker noise generally persisted, with those
with greater sensitivity to modulation being those with greater streaming ability. Similar trends
were not observed in those with no known hearing loss.
Figure 3.5: Correlations between Perceptual Measures and Modulation Sensitivity
Comparisons of jitter detection and speech reception against modulation sensitivity. Jitter
detection thresholds and speech reception thresholds are plotted versus modulation detection
thresholds with markers representing individual thresholds averaged across conditions and
repetitions. Panel columns represent the different comparisons made. Least-squares regression
lines are plotted to visualize trends within each participant group. Abbreviations: fundamental
frequency (F0), jitter detection threshold (JDT), speech reception threshold (SRT), and
modulation detection threshold (MDT).
Correlations for jitter detection and speech reception against pure tone frequency
discrimination are shown in Figure 3.6. Frequency discrimination for pure and complex tones
was not significantly correlated with performance in temporal jitter detection. However, a
congruent relationship was observed within cochlear implant users, with lower discrimination
thresholds associated with greater stream segregation for tonal stimuli. Significant correlations
were found between pure tone frequency discrimination and speech reception thresholds in
both single-talker and multi-talker noise, with lower discrimination thresholds associated with
greater streaming ability. Significant correlations persisted in the CI group, but were not
observed in the group with no known hearing loss.
Figure 3.6: Correlations between Perceptual Measures and Pure Tone Frequency Discrimination
As for Figure 3.5, but with correlation analyses conducted for jitter detection and speech reception against pure tone frequency discrimination. Abbreviations: fundamental frequency (F0),
jitter detection threshold (JDT), speech reception threshold (SRT), and frequency discrimination
threshold (FDT).
Correlations for jitter detection and speech reception against fundamental frequency
discrimination are shown in Figure 3.7. When considering the entire participant pool, a
congruent relationship was observed between complex frequency discrimination and
performance in the jitter detection task, with lower discrimination thresholds associated with
greater stream segregation ability. Notably, this relationship was largely preserved in cochlear
implant users, with strong correlations observed between complex frequency discrimination
and streaming of speech material. Similar trends were not observed in those with no known
hearing loss. Together, these data reaffirm the relationship between low-level sensitivity to
spectral and modulation pitch cues, and bring attention to the value in improving access to
pitch through the electrical stimulation patterns designed for hearing restoration.
Figure 3.7: Correlations between Perceptual Measures and F0 Frequency Discrimination
As for Figure 3.6 but with correlation analyses conducted for jitter detection and speech
reception against fundamental frequency discrimination. Abbreviations: fundamental frequency (F0),
jitter detection threshold (JDT), speech reception threshold (SRT), and frequency discrimination
threshold (FDT).
Table 3.2: Correlation Coefficients Across Procedures
Magnitude of correlation coefficients comparing individual results from different procedures
averaged across conditions. For clarity, only the correlation magnitudes are displayed, but all
comparisons were congruent in that better performance on one measure corresponded with
better performance on another. Correlation coefficients with p-values less than 0.05 are emboldened. Abbreviations: temporal jitter detection thresholds for pure
tones (Pure-JDT), temporal jitter detection thresholds for complex tones (F0-JDT), speech
reception thresholds for single-talker noise (ST-SRT), and speech reception thresholds for multi-
talker noise (MT-SRT). p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***).
Combined   | Pure-JDT | F0-JDT | ST-SRT  | MT-SRT
MDT        | n/a      | 0.55*  | 0.58*   | 0.66**
DT-Pure    | 0.17     | 0.31   | 0.68*** | 0.79***
DT-F0      | 0.14     | 0.23   | 0.82*** | 0.86***

CI Users   | Pure-JDT | F0-JDT | ST-SRT  | MT-SRT
MDT        | n/a      | 0.65*  | 0.47    | 0.52
DT-Pure    | 0.31     | 0.43   | 0.53*   | 0.53*
DT-F0      | 0.16     | 0.22   | 0.76*** | 0.73**

NKHL       | Pure-JDT | F0-JDT | ST-SRT  | MT-SRT
MDT        | n/a      | -0.47  | -0.44   | 0.01
DT-Pure    | -0.27    | -0.07  | -0.35   | 0.38
DT-F0      | -0.36    | -0.24  | -0.28   | 0.11
Discussion
The present study examined the psychophysical features of pitch that facilitate
streaming of stimuli of varying spectral content and temporal overlap. In particular, this study
investigated the influence of pitch differences on stream segregation in the context of
competing and non-competing maskers. The hypothesis tested by these experiments is that
pitch distance between a target and masker promotes their perception as two separate
streams. Likewise, we predicted that low-level sensitivity to modulation and pitch were
predictive of streaming ability. Overall, the results suggest that differences in pitch help to
facilitate streaming. However, cochlear implant users have poorer sensitivity to pitch and
experience poorer release from masking than those with no known hearing loss. Discussion is
focused on the characteristics of these trends and how they change with stimulus complexity
and hearing loss.
Perceptual thresholds indicative of stream segregation were significantly affected by
pitch distance for both tonal and speech material. These results are in general agreement with
a study from Vliegen and Oxenham (1999), which investigated the effect of pitch distance on
streaming ability when listening to a sequence of tone triplets (ABA ABA …). The authors found
that the percentage of reported segregation responses increased with increasing pitch
difference between tones A and B. Notably, this trend persisted through all stimulus types
including pure tones, low-pass filtered harmonic complexes, and high-pass filtered harmonic
complexes where only harmonics above the 10th were at full amplitude. In contrast to
traditional models of stream segregation which focus on spectral masking (Hartmann &
Johnson 1991; Beauvois & Meddis 1996; McCabe & Denham 1996), a growing body of evidence
suggest that spectral and temporal cues can independently contribute to the percept of
streaming (Vliegen et al. 1999; Grimault et al. 2000; Grimault et al. 2002; Roberts et al. 2002).
For those with hearing impairment, however, poor access to the spectral and temporal cues
conveying pitch may be a limiting factor in stream segregation. As such, the present study
explicitly considered sensitivity to tonotopic and modulation pitch as predictors of stream
segregation ability.
Correlation analyses demonstrated a congruent relationship between frequency
discrimination and performance on the temporal jitter task when the F0 distance between
target and masker stream was 6 semitones. Sharper pitch resolution for pure or complex tones
were beneficial to streaming irrespective of noise density, with exceptionally strong
correlations observed between frequency discrimination and streaming ability. Likewise,
sensitivity to amplitude modulation was predictive of performance across all measures, with
those most sensitive to amplitude modulation being those with greater streaming ability. Taken
together, these data point to low-level access to modulation and pitch as factors contributing to
complex listening.
The profile of masking release for the temporal jitter task is in general agreement with a
study performed by Gutschalk et al. (2007), which coupled objective measures for stream
segregation with behavioral responses of having perceived separate sounds. In that study, fMRI and MEG measures were collected during a sequential stream segregation task with
notes that shared spectral envelopes but differed in F0. Participants were asked to report when
they perceived two separate streams when listening to sequences of either (ABBBABBB…) or
(BAB-BAB-…). While fMRI and MEG measures were generally graded across pitch distances
spanning 10 semitones, behavioral responses indicative of having heard two streams typically
reached a sudden plateau with pitch distances starting at 3 semitones. Overall, their results for
stream segregation of complex tones are in agreement with trends in the present study, and
reaffirm temporal pitch alongside tonotopic cues as relevant towards stream segregation.
However, stimulation patterns offered by conventional sound processing face limitations in the
delivery of spectral and temporal pitch cues, poorly conveying harmonic structure or
modulation related to signal F0 (Chatterjee & Peng 2008; Limb & Roy 2014; Wouters et al.
2015; Barda et al. 2018).
While sensitivity to pitch generally promoted streaming ability in the present study,
cochlear implant users experienced a reduced benefit. For instance, while the profile of
masking release was generally shared between groups in the temporal jitter task, the
magnitude of masking release was diminished for cochlear implant users, with greater pitch
shifts required to match performance in the group with no known hearing loss. Moreover,
masking release when listening to speech material was wholly absent in cochlear implant users
even when the masker was not in direct competition with the target speech, as was the case
in the single-talker background noise condition. Overall, these data suggest that cochlear
implant users have reduced access to the tonotopic and modulation pitch cues relevant for
stream segregation of ecologically relevant sounds. Therefore, enhancement of low-level
psychophysical cues for pitch may be an avenue for restoring streaming ability in cochlear
implant users.
Chapter 4: The Fidelity of Capture at the Auditory Nerve Predicts
Hearing Performance
The work described in this chapter is being prepared for submission to the Journal of the
Acoustical Society of America.
Authors: Andres Camarena¹ and Raymond Goldsworthy¹*

¹Auditory Research Center, Caruso Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA

*Author to whom correspondence should be addressed.
Introduction
When sound is transferred to the cochlea, different frequencies are distributed along
the length of the basilar membrane in accordance with the tissue’s tuning properties. This
tonotopic representation of frequency is exploited in one of the most successful neural
interfaces: the cochlear implant. Commercial multi-electrode implants directly stimulate
different regions of the auditory nerve by spectrally separating incoming sound and allocating
different frequencies to specific electrodes along the length of the auditory nerve. Devices
largely use monopolar, biphasic pulse trains with constant stimulation rates—unrelated to the
temporal fine structure of the acoustic signal—and modest changes in the temporal envelope
to encode sound information. This approach restores impressive function to individuals with
profound sensorineural hearing loss. Even in ideal acoustic environments, however, cochlear
implant users struggle with perceptual deficits that include pitch, sound source separation, and
aspects of speech prosody (Donnelly et al. 2009; Galvin et al. 2009; Limb & Roy 2014; Penninger
et al. 2014; Chatterjee et al. 2015).
The pitch of harmonic sounds derives from the fundamental frequency, which is
redundantly encoded by place-of-excitation and fine-timing cues (Cariani & Delgutte 1996a;
Cariani & Delgutte 1996b; Plack & Oxenham 2005; Cedolin & Delgutte 2010; Oxenham et al.
2011; Oxenham 2012). In normal hearing, low-frequency harmonics provide a strong sense of
pitch transmitted by both resolved place-of-excitation in the auditory nerve and by phase-
locked spikes temporally synchronized to the fundamental frequency (Plack & Oxenham 2005).
High-frequency harmonics, which transmit primarily temporal cues, also provide a sense of
pitch but with poorer resolution compared to sounds containing low-frequency harmonics.
Cochlear implants convey place and timing cues for pitch, but face limitations on both fronts.
Place cues are limited by the number of electrodes, current spread, and neural health, and are
thus unlikely to provide sufficient resolution to convey tonotopically resolved harmonics of a
complex. Temporal pitch cues—provided by amplitude modulation of pulse trains—are not
constrained by existing hardware of the implanted electrode array, but these cues are limited
by sound-processing design. Narrow filters are often used to improve frequency tuning but
doing so limits the number of harmonic components that interact within a filter to provide deep
modulations. In this manner, the depth of modulation is reduced to preserve frequency
resolution, with the benefits believed to compensate for the perceptual consequences of
reduced modulations.
The importance of modulation depth has been relatively well explored in spatial hearing
compared to pitch perception. For spatial hearing, several studies have shown that
lateralization of modulated tones improves with modulation depth (Bernstein & Trahiotis 2002;
Bernstein & Trahiotis 2004; Bernstein & Trahiotis 2005; Bernstein & Trahiotis 2007; Bernstein &
Trahiotis 2009; Bernstein & Trahiotis 2010; Bernstein & Trahiotis 2014; Bernstein & Trahiotis
2017; Dietz et al. 2013). Compared to its role in spatial hearing, the role of modulation depth in
pitch perception is not well characterized—though some evidence exists (Chatterjee & Oberzut
2011). Chatterjee and Oberzut (2011) found that discrimination of modulation frequency was
generally better in those with better modulation sensitivity. Their interpretation was that
pitch discrimination ability improves with increasing pitch saliency. Assuming the strength of
the pitch conveyed by modulation increases with modulation depth, they predicted that
increasing modulation depth would increase pitch salience and improve pitch discrimination.
Towards this goal, they measured pitch discrimination provided by modulation frequency with
modulation depth circumstantially enhanced to achieve equal salience across modulation
frequencies. While the addition of compensatory modulation depth generally resulted in similar
or improved discrimination, a clear relationship between modulation depth and pitch
discrimination was not determined. The present study is designed to characterize the extent
that pitch resolution changes as a function of modulation depth. Doing so clarifies the
importance of modulation depth as a cue to be preserved or enhanced in cochlear implant
signal processing.
A real-time implementation of modulation enhancement in cochlear implant sound
processing demonstrated benefits to pitch perception while addressing many of the notable
concerns in the practical use of such strategies including the processing time to estimate F0,
loudness fluctuations, and speech intelligibility (Vandali et al. 2019). While these benefits are
argued to come about through increased access to envelope cues, how pitch perception
changes with varying access to envelope cues is not fully understood, particularly at reduced
modulation depths. Therefore, there is strong motivation to fully characterize the contribution
of amplitude modulation on pitch perception. The present study addresses this gap in
knowledge and systematically explores how modulation depth of the stimulus as well as
sensitivity to amplitude modulation affects pitch resolution.
EXPERIMENT I: EFFECT OF MODULATION DEPTH ON MODULATION FREQUENCY
DISCRIMINATION OF SAM TONES AND SAM NOISE
Methods
Overview
Experiment I characterizes modulation sensitivity and pitch resolution driven by
modulation frequency in adults with no known hearing loss and in adult cochlear implant users.
Modulation sensitivity was measured as modulation detection thresholds for amplitude-
modulated sinusoids and narrow-band noises for modulation frequencies of 55, 110, 220, and
440 Hz. Pitch resolution was measured as modulation frequency discrimination thresholds, also
for amplitude-modulated sinusoids and narrow-band noises, for modulation frequencies near
110, 220, and 440 Hz. Results were analyzed to consider how modulation sensitivity affects
pitch resolution across participant groups.
Participants
Fifteen adult cochlear implant users (R = 18-84 years old, M = 62.5 years, SD = 15.9
years, females = 9) and 13 individuals with no known hearing loss (R = 25-66 years old, M = 34.8
years, SD = 11.8 years, females = 5) took part in this study. Twelve of the cochlear implant
participants used Cochlear Corporation devices (Cochlear Americas, Lone Tree, CO, USA), two
used Advanced Bionics devices (Sonova, Los Angeles, CA, USA), and two used Med-El devices
(Med-El, Innsbruck, Austria). Participant information is provided in Table 4.1. Participants gave
informed consent and were paid $15 per hour for participating. The experimental protocol was
approved by the university Institutional Review Board.
Table 4.1: Participant Information for Experiment I
Subject information. Age at time of testing and age at onset of hearing loss are given in years. Duration of profound hearing loss prior to implantation is given in years and estimated from subject interviews.
Remote Assessment
Adults with no known hearing loss and cochlear implant users took part in an online
listening experiment designed to characterize how modulation depth and modulation
sensitivity affect pitch resolution. All testing was done through TeamHearing: a web application
for auditory rehabilitation and assessment (www.teamhearing.org). Participants were asked to
complete testing on their personal computer, mobile, or tablet device. Remote auditory
assessments followed the recommendations of the Acoustical Society of America task force on remote testing (Peng et al. 2022). To maximize the quality of data obtained remotely, subjects
were encouraged to test in an environment with low-ambient noise. All participants with no
known hearing loss used headphones. Cochlear implant users were asked to connect to computer audio as they normally would. When listening through a free field
speaker, cochlear implant users were asked to disable hearing technology worn on—or to
otherwise occlude—the ear contralateral to that being tested. When streaming via Bluetooth,
cochlear implant users were asked to disable any Bluetooth compatible devices worn on the
contralateral ear. Total testing time was three to four hours. During this testing time,
participants were encouraged to take breaks when feeling fatigued.
Loudness and Sensation Levels
Loudness scaling and detection thresholds were measured to reference sound levels to
sensation levels. Loudness scaling was measured by having participants adjust a 1 kHz pure
tone to be “soft,” “medium soft,” “medium,” and “medium loud.” Detection thresholds were
measured for pure tones for frequencies of 500, 1000, and 2000 Hz. Stimuli were 400
millisecond sinusoids with 20 millisecond raised-cosine attack and release ramps. Thresholds
were measured using a three-interval, three-alternative, forced-choice procedure in which two
of the intervals contained silence and one interval contained the tone. The starting level was
set by the participant to be “soft but audible” and was then reduced by 2 dB following correct
answers and increased by 6 dB after mistakes. A run continued until three mistakes were made
and the average of the last four reversals was taken as the threshold. This adaptive rule
converges to 75% detection accuracy (Kaernbach 2001).
Modulation Sensitivity
Modulation sensitivity was measured as modulation detection thresholds for
modulation frequencies of 55, 110, 220, and 440 Hz. Detection was measured using a three-
interval, three-alternative, forced-choice procedure for which participants were asked which
interval was modulated (or different). Prior to testing, participants listened to examples of
modulated and unmodulated stimuli. The initial modulation depth was 100% but decreased by
a factor of $\sqrt[3]{2}$ following correct answers and increased by a factor of two following mistakes.
This adaptive rule converges to 75% detection accuracy (Kaernbach 2001). A measurement run
ended after participants made four mistakes and the final depth was taken as the detection
threshold. Modulation frequencies were measured in random order with three repetitions.
Correct answer feedback was provided on all trials for this and all subsequent procedures.
The modulating envelope was sinusoidal and defined according to the equations:
$$\text{modulator} = 1 - \frac{m}{2}\left(1 - \cos(2\pi f t)\right)$$

$$\text{modulator} = \max(\text{modulator},\, 0)$$
The first equation is often used in the literature with modulation indices constrained
between 0 and 1; however, we note that the rectification applied in the second equation allows
any modulation index from zero to infinity to be used, including the special case of half-wave
rectified sinusoids, which in this context corresponds to a modulation index of two, or 200%
depth. For measuring modulation detection thresholds, modulation depth was allowed to
increase beyond 100% depth if the participant needed it, in which case the above equations
were used to specify the shape of the modulation envelope.
For all procedures, standard and target stimuli were sinusoidally amplitude-modulated
sinusoids and amplitude-modulated narrow-band noises. These two types of stimuli were used
because amplitude-modulated sinusoids provide coherent modulations for which the temporal
dynamics are well defined; however, amplitude-modulated sinusoids can produce audible
spectral sidebands for high modulation frequencies. Whereas amplitude-modulated noise
mitigates the issues of spectral sidebands, the modulations are incoherently distributed across
a frequency range. Stimuli were 400 milliseconds in duration with 20 milliseconds raised-cosine
attack and release times. Stimulus intervals were separated by 200 milliseconds.
The carrier frequency of the modulated sinusoids and the center frequency of the
narrow-band noises were 6 kHz for participants with no known hearing loss and 4 kHz for
cochlear implant users. The different carrier frequencies were used because the higher
frequency was desired for those with no known hearing loss to ensure that spectral
components were not resolved in the encoding at the level of the cochlea or auditory nerve.
This higher frequency could not consistently be used with cochlear implant users because such high frequencies are not always provided by clinical devices; consequently, 4 kHz was used for the cochlear
implant users. For the amplitude-modulated narrow-band noises, Gaussian white-noise was
passed through a quarter-octave wide, second-order, bandpass filter. A quarter-octave filter
was chosen as it is larger than estimated auditory filters in that region (Moore & Glasberg
1983). Like the presence of spectral sidebands, systematic changes in loudness can affect
perception of modulated signals; however, previous evidence indicates that neither normal-hearing, hearing-impaired, nor cochlear implant listeners benefit meaningfully from the presence of
loudness cues during tasks that utilize amplitude-modulated signals (Viemeister 1979;
Donaldson & Viemeister 2002; Schlittenlacher & Moore 2016; Monaghan et al. 2022). In
particular, Chatterjee & Oberzut (2011) show that level-roving does not have a significant effect
on modulation detection or discrimination thresholds (the two behavioral measures performed in this study) and may simply increase task difficulty. Although intensity cues
play only a minor role in temporal envelope processing, the current study incorporates ±6 dB
intensity level-roving during stimulus presentation. To further minimize cues not related to
pitch frequency, stimuli were roved in modulation frequency by 1/8th octave. Prior to each trial,
participants were presented with a representative stimulus from the upcoming exercise and
allowed to adjust the level to be comfortable. Cochlear implant users with residual hearing
were asked to use earplugs. In the present study, there were no cochlear implant users who
used free-field acoustic speakers while having residual hearing in the non-implanted ear.
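As an illustration of the narrow-band noise stimuli, a sketch follows; the sampling rate is an assumption, and an order-1 Butterworth design is used because it yields a second-order bandpass filter as described above.

```matlab
% Sketch of the amplitude-modulated narrow-band noise for CI participants:
% Gaussian noise through a quarter-octave bandpass centered at 4 kHz, then
% modulated by the rectified sinusoidal envelope. Sampling rate is assumed.
fs = 44100; fcenter = 4000; fm = 110; m = 1;
edges = fcenter * 2.^([-1 1]/8);                % quarter-octave passband
[b, a] = butter(1, edges/(fs/2), 'bandpass');   % order-1 design -> 2nd-order BP
t = (0:round(0.4*fs)-1)'/fs;
carrier = filter(b, a, randn(size(t)));
envelope = max(1 - (m/2)*(1 - cos(2*pi*fm*t)), 0);
stimulus = envelope .* carrier;
```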
Pitch Resolution
Pitch resolution was measured as modulation frequency discrimination thresholds for
modulation frequencies near 110, 220, and 440 Hz. These modulation frequencies were chosen
as representative of the typical range of spoken speech and as the range that discrimination
typically degrades for cochlear implant users. Furthermore, this range is important for music
perception as it centers upon middle C (261.6 Hz), the center note on a piano and a middle-ranged pitch that can be easily heard by both cochlear implant users and those with no known
hearing loss. Discrimination was measured using a two-interval, two-alternative, forced-choice
procedure for which participants were asked which interval was higher in pitch. As for the
measures of modulation sensitivity, stimuli were either amplitude-modulated sinusoids or
narrow-band noises. Modulation depth was an independent variable with depths of 25, 50, and
100% tested, as well as an enhanced condition where the modulation was a half-wave rectified
sinusoid. Each measurement run began with a modulation frequency difference of 100% (i.e.,
an octave). This difference was adaptively decreased by a factor of $\sqrt[3]{2}$ following correct
answers and increased by a factor of two following mistakes. This adaptive rule converges to
75% discrimination accuracy (Kaernbach 1991).
For each trial, the specific modulation frequency was roved with values selected from a
quarter-octave range uniformly distributed and geometrically centered on the condition
frequency. Relative to the roved value, the standard modulation frequency was lowered, and
the target raised, by a factor of $\sqrt{1 + \Delta/100}$, where $\Delta$ indicates the percent difference. The
intensity of the standard and target were roved by 6 dB based on a uniform distribution
centered on the participant’s comfortable listening level. A run ended when the participant
made four mistakes and the final difference was taken as the discrimination threshold.
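As a concrete illustration, the adaptive rule can be simulated with the short Python sketch below; respond is a hypothetical stand-in for a single trial (returning True for a correct answer), and the simulated listener is purely illustrative.

```python
import numpy as np

def adaptive_track(respond, start_delta=100.0, max_mistakes=4):
    """Weighted up-down track: divide the frequency difference (in percent)
    by the cube root of two after a correct answer, double it after a
    mistake; this converges near 75% correct (Kaernbach 1991). The
    difference reached when the fourth mistake ends the run is taken as
    the threshold."""
    delta, mistakes = start_delta, 0
    while mistakes < max_mistakes:
        if respond(delta):
            delta /= 2.0 ** (1.0 / 3.0)
        else:
            delta *= 2.0
            mistakes += 1
    return delta

# Illustrative listener: 75% correct when delta equals its 10% threshold.
rng = np.random.default_rng(0)
listener = lambda d: rng.random() < 0.5 + 0.5 / (1.0 + (10.0 / d) ** 2)
print(adaptive_track(listener))
```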
Data Analysis – Descriptive and Inferential Statistics
Individual results are plotted for each participant with descriptive statistics provided for
group means and standard error for each condition. When comparing results across conditions,
the effect size of the comparison is quantified as Cohen’s d (Cohen 1992).
The measures of modulation sensitivity and pitch resolution are full factorial in design.
Modulation sensitivity was measured for each participant using two stimulus types and four
modulation frequencies, with three repetitions of each. Pitch resolution was measured for each
participant using the same two stimulus types, three modulation frequencies, and four
modulation depths, with three repetitions of each. The strength of these effects and their
interactions were quantified using repeated-measures analyses of variance with stimulus type,
modulation depth (where applicable), and modulation frequency as within-subject factors and
participant group as a between-subject factor. An initial analysis of variance was conducted
with repetition as a within-subject factor, but neither repetition nor any of its interactions with
other factors were significant (p > 0.05), indicating that performance did not vary significantly across repetitions within each condition.
The primary hypothesis of modulation sensitivity predicting pitch resolution was tested
using Pearson r bivariate correlation analyses between discrimination thresholds of pitch
resolution and detection thresholds of modulation sensitivity.
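For concreteness, the sketch below illustrates this analysis pipeline in Python on synthetic stand-in data, assuming pingouin for the mixed ANOVA; pingouin's mixed_anova accepts one within-subject factor per call, so the full factorial design would require more general repeated-measures machinery. All column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd
import pingouin as pg
from scipy.stats import pearsonr

# Synthetic long-format stand-in: one threshold per participant and
# modulation frequency, with a between-subject group label.
rng = np.random.default_rng(1)
rows = [dict(participant=f"P{p:02d}",
             group="CI" if p < 10 else "NKHL",
             mod_frequency=f,
             threshold=rng.lognormal(1.0 + (p < 10), 0.3))
        for p in range(20) for f in (110, 220, 440)]
df = pd.DataFrame(rows)

# Mixed repeated-measures ANOVA (one within-subject factor shown).
aov = pg.mixed_anova(data=df, dv="threshold", within="mod_frequency",
                     between="group", subject="participant")
print(aov[["Source", "F", "p-unc"]])

# Primary hypothesis: Pearson r between per-participant detection and
# discrimination thresholds (correlated synthetic values used here).
detection = df.groupby("participant")["threshold"].mean()
discrimination = detection * rng.lognormal(0.0, 0.2, detection.size)
print(pearsonr(detection, discrimination))
```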
Predictive Analytics
Computational models were used to test the predictive power of vector strength as a
measure of synchrony in healthy physiology and in cochlear implant stimulation. Acoustic
stimuli for each experimental condition were processed through a phenomenological model of
auditory physiology (Zilany et al. 2014). The same analyses were performed using stimulation
patterns of typical cochlear implant sound processing (Swanson & Mauch 2006) followed by
current spread and a point process model of neural excitation (Litvak et al. 2007; Goldwyn et al.
2012; Goldsworthy 2022). Vector strength to the input modulation frequency was calculated
for modeled auditory-nerve activity and for cochlear implant stimulation patterns. Linear
regression was used to test how well vector strength predicted the measured behavioral thresholds for modulation frequency discrimination.
Figures 4.1 and 4.2 show modeled auditory-nerve response and cochlear-implant
stimulation for representative stimuli. Auditory-nerve activity was modeled using a
phenomenological model of auditory processing that has been developed across multiple
institutions (Zilany et al., 2014). The auditory-nerve model was implemented with 256 fibers
with logarithmically spaced characteristic frequencies from 125 to 8000 Hz. The input level for
all stimuli was specified as 65 dB SPL. The species parameter was set to human, which uses
basilar-membrane tuning based on Shera et al. (2002). The inner and outer hair-cell scaling
factor was specified to model normal hearing. The fiber type was specified as having a low-
spontaneous rate of discharge.
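The model configuration just described can be summarized in a brief sketch; run_an_model below is a hypothetical wrapper standing in for a published port of the Zilany et al. (2014) model, since interfaces differ across implementations, and the configuration names are illustrative.

```python
import numpy as np

# Characteristic frequencies: 256 fibers, log-spaced from 125 to 8000 Hz.
cfs = np.geomspace(125.0, 8000.0, 256)

# Remaining settings described above; key names are illustrative.
config = dict(
    level_db_spl=65.0,        # input level for all stimuli
    species="human",          # Shera et al. (2002) basilar-membrane tuning
    hair_cell_scaling=1.0,    # inner/outer hair cells set for normal hearing
    fiber_type="low_spont",   # low spontaneous rate of discharge
)

# Hypothetical call; substitute the interface of whichever port is used:
# rates = run_an_model(stimulus, fs, cfs, **config)
```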
Cochlear-implant stimulation was modeled using the Nucleus MATLAB Toolbox made
available by Cochlear Corporation (Lone Tree, CO). Default processing was used to emulate
Advanced Combination Encoders (ACE). Details on processing have been described elsewhere (Swanson 2006). Cochlear-implant processing was implemented with 22 channels, 8 channels selected per analysis frame, and a total stimulation rate of 14,400 pulses per second, corresponding to a channel stimulation rate of 1800 pulses per second per channel. All
parameters were the defaults specified in the version 4.42 release. Current spread
was modeled using an inverse law for voltage attenuation. Rationale for using simple models of
electrode geometry and summation of electric fields has been described elsewhere (Litvak et
al., 2007). The modeled voltage source after current spread was used to drive a point process
model of neuronal excitation as developed and described by Goldwyn et al. (2012).
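A minimal sketch of this stage follows, assuming an idealized straight-line geometry; the electrode spacing, neuron positions, and radial offset are illustrative assumptions rather than the study's parameters.

```python
import numpy as np

electrode_pos = np.linspace(0.0, 21.0, 22)   # 22 electrodes along the array (mm)
neuron_pos = np.linspace(0.0, 21.0, 256)     # modeled neuron positions (mm)
radial_offset = 1.0                          # electrode-to-neuron distance (mm)

# Inverse-law voltage attenuation: each electrode's contribution falls off
# as 1/distance; the radial offset keeps distances nonzero.
dist = np.hypot(neuron_pos[None, :] - electrode_pos[:, None], radial_offset)
spread = 1.0 / dist                          # shape: (electrodes, neurons)

# Summation of electric fields: per-frame electrode currents mixed through
# the spread matrix give the voltage that drives the point-process model
# of neural excitation.
currents = np.zeros(22)
currents[10] = 1.0                           # e.g., a single active electrode
voltage_at_neurons = currents @ spread
```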
Figure 4.1: Computational Modeling of the Auditory Nerve Response to an Amplitude
Modulated Tone
Modeled auditory-nerve response and cochlear-implant stimulation patterns for representative
stimuli. The stimulus for the auditory model was an amplitude-modulated sinusoid with a 6 kHz
carrier frequency and 220 Hz modulation frequency. Upper left panels show spectrogram
representations of average fiber firing rate for 256 fibers logarithmically spaced between 125
and 8000 Hz. Tonotopy is shown as the average firing rate across time and synchrony as the
corresponding vector strength to the modulation frequency. The stimulus for cochlear-implant
stimulation was an amplitude-modulated sinusoid with a 4 kHz carrier frequency and 220 Hz
modulation frequency. Lower left panels show electrodogram representations to the same
stimulus with individual pulses drawn for corresponding electrodes. Tonotopy is shown as
normalized charge delivered per electrode and synchrony as charge-weighted vector strength
to the input tone frequency.
Figure 4.2: Computational Modeling of the Auditory Nerve Response to Amplitude Modulated
Noise
As for Figure 4.1 but using amplitude-modulated narrow-band noise as the input stimulus.
The measures of synchrony provided for representative stimuli in Figures 4.1 and 4.2 are
based on calculations of vector strength. In neuroscience, vector strength is typically calculated from action potentials; treating these as all-or-nothing events occurring at discrete times, it can be defined as:
VS_{AN} = \left| \frac{1}{N} \sum_{i=1}^{N} e^{j 2\pi f t_i} \right|
where N is the number of action potentials, f is the frequency (or fundamental frequency) of interest, and t_i is the time of the i-th event (Goldberg and Brown, 1968; van Hemmen, 2013). This measure of vector strength is used as a prediction metric to test for
correlation with behavioral measures of modulation frequency discrimination. Vector strength
was calculated for modeled neurons both for the model of normal hearing and for the response to electrical stimulation.
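The calculation follows directly from the definition. The Python sketch below also includes one plausible form of the charge-weighted variant used for the electrodograms in Figures 4.1 and 4.2; both functions are illustrations rather than the study's analysis code.

```python
import numpy as np

def vector_strength(spike_times, f):
    """VS = |(1/N) * sum_i exp(j*2*pi*f*t_i)| over N spike times t_i (s)."""
    t = np.asarray(spike_times, dtype=float)
    return np.abs(np.exp(1j * 2 * np.pi * f * t).mean())

def charge_weighted_vector_strength(pulse_times, charges, f):
    """Weighted variant: each pulse contributes a phasor scaled by its
    delivered charge (one plausible reading of 'charge-weighted')."""
    t = np.asarray(pulse_times, dtype=float)
    q = np.asarray(charges, dtype=float)
    return np.abs((q * np.exp(1j * 2 * np.pi * f * t)).sum()) / q.sum()

# Spikes locked to one phase of a 220 Hz modulator give VS = 1;
# spikes spread uniformly over the cycle give VS near 0.
locked = np.arange(100) / 220.0
print(vector_strength(locked, 220.0))   # -> 1.0
```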
Results
Modulation Sensitivity
Figure 4.3 shows modulation detection thresholds for participants with no known
hearing loss and for cochlear implant users. On average, cochlear implant users were less sensitive to changes in modulation (F(1,32) = 39.7, p < 0.001), with average thresholds of 50.9% compared to 17.2% for those with no known hearing loss (Cohen's d = 1.89, p = 0.0015). Modulation sensitivity generally worsened with increasing modulation frequency (F(3,96) = 25.9, p < 0.001), and a significant interaction between modulation frequency and participant group (F(3,96) = 2.93, p = 0.05) indicates that one group's performance degraded more steeply as modulation frequency increased. Detection thresholds were generally better for amplitude-modulated sinusoids compared to narrow-band noise (F(1,32) = 48.2, p < 0.001), and there was not a significant interaction between stimulus type and hearing group (F(1,32) = 2.41, p = 0.13).
Figure 4.3: Modulation Sensitivity to Amplitude Modulated Tones and Noise
Modulation detection thresholds for sinusoidally amplitude-modulated tones and narrow-band
noise. The bottom and top edges of each box indicate the 25th and 75th percentiles, respectively.
Whiskers extend to the most extreme data points not considered outliers. Outliers are plotted
individually using the '+' symbol. Black circles with error bars indicate the sample mean and
standard error. The horizontal line within each box indicates the median. Abbreviations: no
known hearing loss (NKHL) and cochlear implant (CI).
Pitch Resolution
Figure 4.4 shows modulation frequency discrimination thresholds for participants with
no known hearing loss and for cochlear implant users. On average, cochlear implant users had
worse pitch resolution (F(1,32) = 39.7, p < 0.001), with a grand average of 54.3% compared to 17.5% for those with no known hearing loss (Cohen's d = 1.00, p < 0.001). For both groups, pitch resolution improved with modulation depth (F(3,32) = 228, p < 0.001). The effect of depth, however, was dampened in cochlear implant users compared to those with no known hearing loss, as evidenced by the interaction between participant group and modulation depth (F(3,32) = 44.1, p < 0.001). Further examining this interaction by comparing extreme depths for each group, average discrimination thresholds for those with no known hearing loss improved from 76.1 to 4.0% (Cohen's d = 2.38, p < 0.001) when comparing the 25% depth to half-wave rectified modulation. In contrast, the improvement for the same conditions was only from 76.1 to 20.3% (Cohen's d = 1.32, p < 0.001) for the cochlear implant users. Similarly, the improvement from 25% depth to half-wave rectified modulation was always larger for those with no known hearing loss across all comparisons.
Figure 4.4: Pitch Resolution as a Function of Modulation Depth for Acoustic Stimuli
Modulation frequency discrimination thresholds for amplitude-modulated tones and noise. The
bottom and top edges of each box indicate the 25th and 75th percentiles, respectively. Whiskers extend to the most extreme data points not considered outliers. Outliers are plotted
individually using the '+' symbol. Black circles with error bars indicate sample means and
standard errors. The red horizontal line within each box indicates the sample median.
Abbreviations: Half wave rectification (HWR), no known hearing loss (NKHL) and cochlear
implant (CI).
Modulation Sensitivity Predicts Pitch Resolution for Cochlear Implant Users
Correlation analyses were performed to characterize the relationship between
modulation sensitivity and pitch resolution including the effect of modulation depth. Figure 4.5
shows pitch resolution plotted against modulation sensitivity for all participants. For each
stimulus type, thresholds were logarithmically averaged across modulation frequencies and
repetitions for each participant. Correlations between modulation frequency discrimination and
modulation detection were calculated for each depth. Table 4.2 summarizes correlations for all participants together and separated into groups. Modulation detection was largely predictive of discrimination, with individuals most sensitive to modulation having the best pitch
resolution. For cochlear implant users, the correlation between sensitivity and pitch resolution
was significant for all comparisons, explaining between 21% and 61% of the variability in pitch
resolution across participants.
Expanding on previous work, the current study examines the effect of modulation depth
within this relationship. When discriminating between modulation frequencies, increasing
modulation depth generally improved pitch resolution; however, even those with high
sensitivity to modulation had limited resolution when modulation depth of the signal was poor.
Similarly, those who were less sensitive to modulation required deeper or enhanced
modulation before resolution improved. These data reaffirm the relationship between
modulation sensitivity and temporal pitch processing, and bring attention to the value of providing access to envelope cues in the presented stimulus.
Figure 4.5: Correlations between Pitch Resolution and Modulation Sensitivity for Acoustic
Stimuli
Comparisons of pitch resolution and modulation sensitivity. Modulation frequency
discrimination is plotted versus detection thresholds with both representing individual
thresholds averaged across conditions and repetitions. Panel columns represent the different
modulation depths and panel rows represent the different stimulus types. Least-squares regression lines are plotted to visualize trends within each participant group. Abbreviations: half-wave rectification
(HWR), sinusoidally amplitude-modulated (SAM), no known hearing loss (NKHL) and cochlear
implant (CI).
Table 4.2: Correlation Coefficients Comparing Pitch Resolution Across Procedures for Acoustic
Stimuli
Magnitude of correlation coefficients comparing individual results from modulation detection
and modulation frequency discrimination. Detection thresholds averaged across modulation
frequency were correlated with discrimination thresholds for each modulation depth condition.
Correlation coefficients with p-values less than 0.05 are emboldened and are labeled as follows:
p<0.05 (*), p<0.01 (**), and p<0.001 (***).
Modulation Depth         SAM Tones                              SAM Noise
                         25%      50%      100%     HWR        25%      50%      100%     HWR
Combined                 0.49**   0.65***  0.77***  0.72***    0.23     0.41*    0.74***  0.80***
CI Users                 0.67***  0.75***  0.78***  0.66**     0.61**   0.45*    0.49*    0.48*
No Known Hearing Loss    0.63     0.41     0.31     0.01       0.34     0.19     0.28     0.47
Vector Strength Predicts Pitch Resolution
Figure 4.6 shows pitch resolution versus vector strength of modeled auditory-nerve
activity for participants with no known hearing loss and for cochlear implant users. Vector
strength as a predictive metric of pitch resolution explains 94% of the variance in those with no
known hearing loss and 74% of the variance in cochlear implant users. One notable result of
this comparison is that the calculated vector strength of cochlear implant stimulation is often
poorer compared to the calculated vector strength of modeled auditory-nerve activity. This
indicates that while vector strength is highly predictive of behavioral thresholds for each group,
cochlear implant users are limited by poorer encoding of the temporal envelope at the auditory
periphery.
Figure 4.6: Correlations between Pitch Resolution and Modeled Vector Strength for Acoustic
Stimuli
Modulation frequency discrimination thresholds plotted against vector strength of modulation
synchrony observed in modeled auditory-nerve fibers driven by acoustic and electric
stimulation. Markers represent the average across participants, modulation frequencies, and
repetitions. Depth is conveyed by decreasing marker opacity with increasing modulation depth.
Abbreviations: sinusoidally amplitude-modulated (SAM).
EXPERIMENT II: MODULATION SENSITIVITY AND PITCH RESOLUTION PROVIDED
BY SINGLE-ELECTRODE STIMULATION
Methods
Overview
Experiment II was designed to compare modulation sensitivity and pitch resolution for
single-electrode stimulation. The motivation of Experiment 2 is to clarify any loss of modulation
encoding introduced by sound processing for cochlear implants. To this end, analogous
measures of modulation sensitivity and pitch resolution were conducted as in Experiment 1 but
using amplitude-modulated pulse trains delivered to a single electrode. Results are analyzed to
compare modulation sensitivity and pitch resolution across experiments, to characterize the
effect of modulation depth on pitch resolution, and to test the predictive power of modulation
sensitivity and encoding on individual pitch resolution.
Participants
Eight adult cochlear implant users (age range = 51–83 years, M = 67.8 years, SD = 12.4 years,
females = 3) took part in this study. All participants in Experiment II used Cochlear Corporation
implants (Cochlear Americas, Lone Tree, CO, USA). Participant information is provided in
Table 4.3. Participants gave informed consent and were paid $15 per hour for participating. The
experimental protocol was approved by the university Institutional Review Board.
Table 4.3: Participant Information for Experiment II
Subject information. Age at time of testing and age at onset of hearing loss is given in years.
Duration of profound hearing loss prior to implantation is given in years and estimated from
subject interview.
General Stimuli
For all procedures, stimuli were pulse trains, 400 milliseconds in duration, with 10
millisecond raised-cosine attack and release times. For psychophysical trials, standard and
target intervals were separated by 200 milliseconds. Pulsatile stimulation rates were 3500 Hz
for participants with N22 implants and 14 kHz for others. Frequency allocation of clinical sound
processing is such that acoustic stimuli from Experiment 1 were generally represented by
electrodes near the basal end of the array (Zeng et al. 2008). As such, stimulation in Experiment
2 was delivered to the most basal electrode consistently across procedures. Modulation depth
was specified between thresholds of audibility and comfortable stimulation levels (prompted as
“Your most comfortable listening level”, also as “a 4 out of 5”).
Prior to each trial, participants were presented with a representative stimulus from the
upcoming exercise and allowed to adjust the level to a comfortable listening level. To minimize
cues not related to pitch frequency, stimuli were roved in level by 10% of the subject’s dynamic
range, and roved in modulation frequency by 1/4 octave for the discrimination task. A rove of 10% of the electrical dynamic range was chosen to mitigate the small discrepancy in loudness that occurs with increasing modulation depth; for pulse trains modulated at 32% depth, Chatterjee and Oberzut (2011) measured a mean discrepancy in level of only 1.09 dB. Nevertheless, additional
loudness-balancing was performed for the modulation detection task and is described below.
Modulation Sensitivity
The procedure for modulation detection was analogous to that described in Experiment
I but included additional loudness balancing. Prior to each measurement run, participants
increased the stimulus level of an unmodulated pulse train to be comfortable. Participants were
then asked to adjust the level of a modulated pulse train with 100% depth until it matched the
presentation level of the unmodulated pulse train. Participants heard both stimuli sequentially
when making these adjustments with the level of the unmodulated pulse train held constant.
The difference in level between the modulated and unmodulated stimulus was used to
compensate for the decreased loudness that occurs with increasing modulation depth. The
amount of added compensation varied linearly with increasing depth:

\text{Compensation} = m \cdot (C_{mod} - C_{unmod})

where m is the modulation index, C_{mod} is the comfort level set for the modulated pulse train, and C_{unmod} is the comfort level set for the unmodulated pulse train. Level roving occurred after
loudness balancing was applied.
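As a minimal sketch, the compensation rule amounts to linear interpolation in the modulation index; the clinical-unit values in the example are invented for illustration.

```python
def level_compensation(m, comfort_mod, comfort_unmod):
    """Added level (device units) for a stimulus with modulation index m,
    scaling linearly from 0 (unmodulated) to the full measured difference
    between the loudness-balanced modulated and unmodulated trains."""
    return m * (comfort_mod - comfort_unmod)

# Example: fully modulated train balanced 6 units above the unmodulated
# train, so a 50%-depth stimulus receives half of that compensation.
print(level_compensation(0.5, comfort_mod=186.0, comfort_unmod=180.0))  # 3.0
```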
The adaptive procedure was identical to that of Experiment I but with modulation depth
applied between thresholds of audibility and comfortable stimulation levels. The initial depth
was 100%, which decreased by a factor of ∛2 following correct answers and increased by a
factor of two following mistakes. A measurement run ended after participants made four
mistakes and the final depth was taken as the detection threshold. Modulation frequencies
were measured in random order with three repetitions.
Pitch Resolution
The procedure for modulation frequency discrimination was analogous to that
described in Experiment I but using modulated pulse trains and with an added comparison for
variable pulse rates. Modulation depths included 25, 50, 100, and 200% (i.e., half-wave rectified), as well as a condition in which modulation frequency was conveyed by pulse rate. The adaptive
procedure was identical to that of Experiment 1: the initial frequency difference was 100%,
which decreased by a factor of ∛2 following correct answers and increased by a factor of two
following mistakes. A measurement run ended after participants made four mistakes and the
final frequency difference was taken as the discrimination threshold. Conditions were measured in random
order with three repetitions.
Data Analysis – Descriptive and Inferential Statistics
Individual results are plotted for each participant group with descriptive statistics
indicated for group means and standard errors. When comparing results across conditions,
effect size is quantified as Cohen’s d (Cohen 1992). For modulation detection, repeated-
measures analysis of variance was performed with modulation frequency as a within-subject
factor and participant group as a between-subject factor. For modulation frequency
discrimination, within-subject factors were modulation depth (with the condition of pulse rate
excluded) and modulation frequency, with participant group as a between-subject factor.
Pearson r bivariate correlation analyses were conducted between measures of modulation
sensitivity and pitch resolution averaged across repetitions and modulation frequency.
Correlation analyses were performed across modulation depths.
Predictive Analytics
Computational models were used to test the predictive power of vector strength of
modeled auditory-nerve response for pitch resolution. Vector strength to the input modulation
frequency was calculated for modeled auditory-nerve activity in response to cochlear implant
stimulation patterns and tested using linear regression to measured behavioral thresholds for
modulation frequency discrimination.
Results
Modulation Sensitivity
Figure 4.7 shows modulation detection thresholds for amplitude-modulated pulse trains
in cochlear implant users. Results from the first experiment are provided for comparison of the
analogous measures using amplitude-modulated sinusoids. Cochlear implant users were notably more sensitive to modulations when presented to a single electrode with modulation depth specified between detection thresholds and comfortable stimulation levels (F(1,30) = 7.42, p < 0.05). Comparing modulation sensitivity to the results from Experiment 1 for amplitude-modulated tones, detection thresholds improved from 26.0 to 5.3% at 55 Hz (Cohen's d = 1.34, p < 0.001), from 27.0 to 6.2% at 110 Hz (Cohen's d = 1.09, p = 0.0029), and from 40.4 to 19.0% at 220 Hz (Cohen's d = 0.61, p = 0.13). With diminishing returns at higher frequencies, the difference between clinical sound processing and single-electrode stimulation was notably small for the 440 Hz condition (Cohen's d = 0.082, p = 0.813).
Figure 4.7: Modulation Sensitivity to Amplitude Modulated Pulse Trains Compared Alongside
Results from Experiment I
Modulation detection thresholds for sinusoidally amplitude-modulated tones and noise (from
the first experiment) and for modulated pulse trains presented to a single electrode using direct
stimulation. The bottom and top edges of each box indicate the 25th and 75th percentiles,
respectively. Whiskers extend to the most extreme data points not considered outliers. Outliers
are plotted individually using the '+' symbol. Black circles with error bars indicate the sample
mean and standard error. The horizontal line within each box indicates the median.
Abbreviations: no known hearing loss (NKHL) and cochlear implant (CI).
Pitch Resolution
Figure 4.8 shows modulation frequency discrimination thresholds for amplitude-
modulated pulse trains, as well as analogous measures from Experiment 1. Pitch resolution
worsened with increasing modulation frequency (F(2,59) = 6.29, p < 0.01), regardless of whether stimuli were heard through the sound processor or via direct stimulation. The graded effect of modulation depth on pitch resolution seen in Experiment 1 was not observed with direct stimulation. Rather, pitch resolution was relatively good even at the smallest depth tested, rivaling the best performance measured for any condition when listening with clinical sound processing. When averaging across all modulation frequencies tested, this initial advantage is notable, even when compared to those with no known hearing loss (Cohen's d = 1.03, p = 0.004).
Figure 4.8: Pitch Resolution as a Function of Modulation Depth for Amplitude Modulated Pulse
Trains Compared Alongside Results from Experiment I
Modulation frequency discrimination thresholds for sinusoidally amplitude-modulated tones
and single-electrode pulse trains. The bottom and top edges of each box indicate the 25th and 75th percentiles, respectively. Whiskers extend to the most extreme data points not considered
outliers. Outliers are plotted individually using the '+' symbol. Black circles with error bars
indicate sample means and standard errors. The horizontal line within each box indicates the
sample median. Abbreviations: half-wave rectification (HWR), no known hearing loss (NKHL),
and cochlear implant (CI).
Modulation Sensitivity Predicts Pitch Resolution
Figure 4.9 shows measures of pitch resolution plotted versus modulation sensitivity for
direct stimulation and for the analogous measures of Experiment 1. Modulation sensitivity had
strong explanatory power for pitch resolution, with cochlear implant users most sensitive
to modulation having the best pitch resolution. This relationship appears as an extension of
that observed in Experiment 1 with tones presented through clinical sound processors, with
resolution further improving with increased modulation sensitivity. Therefore, perception of
modulation in cochlear implant users appears to be at least partially limited by the sound
processing of the implant itself.
Figure 4.9: Correlations between Pitch Resolution and Modulation Sensitivity for Amplitude
Modulated Pulse Trains
Comparisons of individual results for modulation frequency discrimination versus modulation
detection based on averages across modulation frequencies and repetitions for amplitude-
modulated pulse trains. Different panels show different modulation depths, and from left to
right are 25%, 50%, 100%, half-wave rectification (HWR), and for when modulation frequency
was conveyed by pulse rate (Pulse Rate). Markers represent the average measure for each
individual participant across repetitions and modulation frequency for modulation detection
and modulation frequency discrimination. Least-squares regression lines are plotted in black.
Vector Strength Predicts Pitch Resolution
Figure 4.10 shows average behavioral thresholds for modulation frequency
discrimination plotted versus the vector strength of modeled auditory-nerve fibers for
participants with no known hearing loss and for cochlear implant users. In this comparison, vector strength was not strongly predictive of discrimination thresholds. It is possible that cochlear implant users are less able to use the encoded information. Notably, the estimated vector strength for single-electrode stimulation is greater than that provided by the clinical sound processor. Indeed, the discrimination thresholds at the lowest modulation depth provided by pulse trains matched the performance seen at the deepest depth condition when listening through the sound processor.
Figure 4.10: Correlations between Pitch Resolution and Modeled Vector Strength across all
Experiments
Modulation frequency discrimination thresholds plotted against vector strength of synchrony to
modulation frequency observed in modeled auditory-nerve fibers for amplitude-modulated
tones and pulse trains. Markers represent averages across participants, repetitions, and
modulation frequencies. Depth is conveyed by marker opacity, with opacity decreasing with
increasing modulation depth. Abbreviations: no known hearing loss (NKHL) and cochlear
implant (CI).
Discussion
The hypothesis tested by this study is that sensitivity to modulation, as well as the
degree to which envelope cues are available in the stimulus, are predictive of temporal pitch
perception. This hypothesis was supported by evidence of strong positive correlations between
measures of modulation detection and modulation frequency discrimination, with pitch
resolution improving with increasing modulation depth. This discussion is focused on the
significance of these results and implications for sound processing for cochlear implants.
Pitch resolution provided by modulation frequency improved with deeper modulations
for both those with no known hearing loss and cochlear implant users, even when considering
poor modulation encoding by conventional sound processing. Improving access to envelope
cues through direct electrode stimulation, however, improved both modulation sensitivity and
pitch resolution. Overall, the results support the hypothesis that temporal pitch perception
improves with increased access to envelope cues, and highlight the need to preserve deep
modulations with electrical stimulation of the auditory nerve.
Our findings are in general agreement with a classic study from Patterson et al. (1978),
where they investigated the modulation depth required to discriminate between two
sinusoidally amplitude-modulated noise samples differing in modulation frequency by 20%. The results from their study indicate that pitch discrimination of
modulated signals is constrained by the audibility of the modulation, with this threshold
described as “essentially a constant multiple of modulation threshold.” However, results from
Experiment 1 suggest that there is a non-linear improvement in pitch discrimination that
extends beyond this “rate threshold”.
Across all modulation frequencies tested, pitch resolution as measured in Experiment 1
continued to improve beyond the point of detection. Notably, benefits plateaued at modulation
depths approaching 100% with further sharpening offering only mild improvements to pitch
perception. This result is in line with previous work by Landsberger (2008) who explored the
effect of envelope sharpening on modulation pitch discrimination at a fixed modulation depth.
Stimuli consisted of monopolar pulse trains delivered to single electrodes with modulations
that were either sinusoidal, sawtooth, sharpened sawtooth, or square waveforms. He found
that just noticeable differences for a 100 Hz modulator were similar for all waveforms tested,
irrespective of the position along the auditory nerve. Like the present study, these data suggest
that at sufficiently high depths, mild sharpening of the temporal envelope may not offer
appreciable improvements to pitch perception. In fact, it may take dramatic enhancements in
envelope shape to measure a benefit beyond 100% modulation (Goldsworthy et al. 2021). Of
course, few would describe the modulation provided by cochlear implant stimulation as
needlessly sharp — rather, envelope cues are often poorly conveyed through the speech
processor. As such, an explicit goal of this study was to determine the consequence of degraded
envelope cues on pitch resolution. In particular, we hypothesized that performance would be
determined by the encoding of modulation at the auditory periphery, with increased envelope
synchrony associated with greater pitch discrimination ability.
The results of the present study were well predicted by vector strength as a measure of
synchrony based on modeled auditory-nerve activity representing normal hearing or in
response to cochlear implant stimulation emulating the Nucleus system using the ACE
processing strategy. Interestingly, the increased envelope synchrony provided by modulated
pulse trains did not improve pitch resolution beyond that provided by the sound processor. The
predictive power of this metric suggests it could be used to optimize temporal coding for
cochlear implants. Specifically, sound processing could be designed to produce high levels of
synchrony as quantified by the vector strength introduced here. Synchrony measured in this
way would be high for strategies that enforce or enhance modulation (Vandali & van Hoesel
2011; Vandali et al. 2017) or for strategies based on fine timing of channel envelopes (Hochmair
et al. 2006; Smith et al. 2014; Hossain & Goldsworthy 2018; Lamping et al. 2020).
The physiological underpinning of modulation enhancement was explored by Van
Canneyt et al. (2019). They note that increasing the decay time or off time of the temporal
envelope leads to larger auditory steady state responses. Additionally, a longer off time and a
shorter attack time reduced the delay of the response. In conjunction with their objective
measures, Van Canneyt and colleagues modeled the effects of envelope shape on the neural
response. Modeled responses support the effect of decay and off time and point to a sharp
attack as a factor that increased the synchrony of the population response. Therefore, deeper
modulations may allow more neurons to recuperate from refractoriness to create a sharper
temporal response reflecting the modulation frequency of the stimulus. These findings mirror
results from both animal and human studies where increasing modulation depth of an
amplitude-modulated pulse train increased modulation in the evoked response at the auditory
periphery (Jeng et al. 2009; Tejani et al. 2017). However, the correlation between physiological
capture of the temporal envelope and perception requires further investigation. Work by Tejani
and colleagues (2017) has begun to reveal the association, with greater neural encoding of
modulation correlating with greater sensitivity to modulation for rates as high as 500 Hz;
however, similar associations have not been found for fluctuations near 20 Hz (He et al. 2023).
The present study builds upon a growing body of literature that has examined a similar effect in binaural hearing, and upon growing evidence of its physiological basis. Given the crucial role
that pitch plays in perception, new designs for sound processing should continue to push the
envelope towards deeper modulations while preserving the essential cues for speech
reception. The present article also introduces a metric of synchrony that could be used in
optimization of temporal coding. In general, while efforts have been made to refine temporal
coding for cochlear implants, much remains to be done to model the sophisticated physiological processes
that occur in healthy physiology.
Chapter 5: General Discussion
Sensory systems are inherently limited by the quality of their inputs. Therefore,
attempts to study —much less restore— sensation require effort in two areas: 1) understanding
of the cues driving sensation and 2) robust delivery of those cues to the neural sensor.
Following this principle, the purpose of the performed work was to investigate the pitch cues
used in complex listening, and to characterize the relationship between pitch sensitivity and
signal fidelity on perception. Overall, these data point to reduced sensitivity to low-level pitch
cues as a limiting factor in hearing performance and encourage advancements in sound
processing that preserve the spectral and temporal cues defining pitch. Discussion is focused on
the significance of overarching trends and their implications for speech processing strategies, as
well as anticipated challenges.
Sensitivity to Peripheral Pitch Cues Contributes to Hearing Performance
A core hypothesis evaluated in these studies was the relationship between sensitivity to
pitch and success in complex listening tasks. This hypothesis was supported by evidence of
strong correlations between perceptual thresholds and performance, with performance
improving with increased sensitivity to low-level pitch cues. In particular, CI users with the
greatest modulation sensitivity were typically those with the greatest performance.
Results from the consonance/dissonance perception study are in general agreement
with previous work demonstrating sensitivity to spectrotemporal pitch as predictive of aspects
of music perception including melody and pitch (Choi et al. 2018), although performance measured by Choi et al. (2018) was not strongly impacted when considering spectral or
temporal cues alone. Therefore, reliable access to the combination of spectral and temporal
pitch cues may be required for robust music perception. Results from the stream segregation
study are similarly in line with previous findings which demonstrate a relationship between
speech understanding and sensitivity to modulation (Won et al. 2011; Won et al. 2015). Indeed,
the authors suggest measurements of modulation sensitivity through the sound processor as a
valuable metric for predicting the efficacy of clinical sound processing strategies. However,
temporal envelope cues are poorly conveyed by CI sound processing, which limits listeners' ability to benefit from them.
For instance, the profile of masking release observed during the stream segregation
study was diminished in the CI group when listening to pure and complex tones, with greater
pitch shifts required to match performance in the group with no known hearing loss; worse still,
masking release was wholly absent in CI users when listening to speech material, even when
the masker was not in direct competition with the target speech. In a similar vein, while we
observed a strong relationship between modulation detection and consonance identification in
CI users, both modulation sensitivity and the perception of consonance were poor compared
with the group with no known hearing loss. Likewise, while modulation detection was strongly
predictive of modulation frequency discrimination, pitch perception in CI users appeared to be
limited by the modulation conveyed in the electrical stimulus. Overall, these data suggest that
hearing outcomes in CI users are limited by poor access to modulation pitch, and point to
modulation enhancement as an avenue for restoring hearing in CI users.
To this end, we investigated the effect of signal fidelity —in the form of modulation
depth— on pitch perception. A similar effort was performed by Chatterjee and Oberzut (2011), in which modulation depth was selectively enhanced to compensate for differences in
detection thresholds across tested F0s. The addition of compensatory modulation depth
generally resulted in similar or improved discrimination thresholds; however, no clear
relationship between modulation depth and pitch discrimination was measured. In contrast,
the present work observed a strong effect of modulation enhancement on pitch perception in
those with no known hearing loss as well as in CI users —even when considering poor
modulation encoding by conventional sound processing. A notable observation was that the
individuals with the greatest modulation sensitivity typically required less modulation depth in
the stimulus before reaching their best discrimination performance. However, the ultimate
factor limiting performance was signal fidelity. While individuals with no known hearing loss
were generally sensitive to modulation, their ability to discriminate modulation frequency was
limited when the stimulus envelope was shallow. In a similar vein, pitch perception was
improved in CI users when circumventing the speech processor to provide more deeply
modulated stimulation. Taken together, these data point to modulation enhancement as a
method for improving hearing outcomes in CI users.
Envelope Synchrony at the Auditory Nerve is Predictive of Performance
The predictive power of vector strength as a metric for modulation frequency
discrimination thresholds was exceptional, accounting for 94% of the variance in those with no
known hearing loss and 74% in CI users listening through the sound processor. However, the
calculated vector strength for cochlear implant stimulation was often weaker compared to
estimates for their normal-hearing peers. This suggests that while vector strength is highly
predictive of behavioral thresholds for each group, CI users are limited by poorer encoding of
the temporal envelope at the auditory periphery. Indeed, pitch perception was improved when
CI users were provided with high-fidelity stimulation that circumvented the sound processor.
The predictive power of this metric suggests it could be used to optimize the delivery of
envelope cues in CI stimulation. Specifically, sound processing could be designed to produce
high levels of synchrony as quantified by vector strength. Synchrony measured in this way
would be high for strategies that enforce or enhance modulation (Vandali & van Hoesel 2011;
Vandali et al. 2017), or for strategies based on fine timing of channel envelopes (Hossain and
Goldsworthy, 2018).
General Framework of Modulation Enhancement Strategies
Modulation enhancement strategies typically adopt the ACE framework with the
inclusion of additional processing blocks —namely for F0 estimation and the enforcement of
modulation on selected channels. The F0mod strategy is a simple and representative example
of the core features in modulation enhancement strategies (Laneau et al. 2006; Milczynski et al.
2009). Following conventional envelope extraction via ACE processing, F0mod performs an
autocorrelation of the power spectrum, computed via sliding-window FFT analysis, to
estimate the F0 of a complex signal. The estimated F0 is then applied coherently across
stimulation channels as a sinusoidal envelope modulated at 100% within the subject’s electrical
dynamic range. Modulation enhancement through the F0mod strategy has led to notable
improvements in musical pitch perception compared to ACE (Laneau et al. 2006; Milczynski et
al. 2009) while maintaining comparable sentence recognition in both quiet and noisy
environments (Milczynski et al. 2012). Encouraged by these results, more sophisticated
strategies have since been developed that consider the practical limitations of modulation
enhancement.
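As a rough illustration of this pipeline, the Python sketch below estimates F0 with a generic frame-based autocorrelation and then imposes a fully modulated sinusoidal envelope; it is a simplified stand-in for the published F0mod processing, not a reimplementation, and the mapping into the electrical dynamic range is omitted.

```python
import numpy as np

def estimate_f0(frame, fs, f0_range=(80.0, 300.0)):
    """Generic autocorrelation-based F0 estimate for one analysis frame."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[frame.size - 1:]
    lo = int(fs / f0_range[1])                 # shortest candidate period
    hi = min(int(fs / f0_range[0]), ac.size - 1)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

def enforce_modulation(channel_env, f0, fs):
    """Apply the estimated F0 coherently as a 100%-depth sinusoidal
    envelope on a channel (dynamic-range mapping omitted here)."""
    t = np.arange(channel_env.size) / fs
    return channel_env * 0.5 * (1.0 + np.sin(2 * np.pi * f0 * t))

# Example: a 40 ms frame of a 150 Hz harmonic complex.
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
x = sum(np.sin(2 * np.pi * 150 * k * t) for k in (1, 2, 3))
print(estimate_f0(x, fs))                      # close to 150 Hz
```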
Limitations in Modulation Enhancement
An early and prevalent concern during the development of modulation enhancement
strategies was the impact they would have on speech intelligibility. The initial outcomes for
modulation enhancement and envelope sharpening were mixed, with improvements to pitch
perception paired with deficits in vowel recognition and formant discrimination (Green et al.
2005). These deficits were thought to arise from poorer coding of spectral information provided
by the experimental strategy, and served both as a practical reminder and cautionary tale to
those working on modulation enhancement: gains in specific auditory assessments cannot
come at the cost of deficiencies in other aspects of hearing.
Another practical concern was the logic that gates when modulation enhancement is applied. The
acoustic input is not guaranteed to have a periodic envelope or fundamental frequency. In
these cases, modulation enhancement may search for periodicity that does not exist and distort
the signal. A solution could be to use conventional signal processing when periodicity is not
detected; however, fluctuating between sound processing strategies may reduce speech
intelligibility. Moreover, transitions between strategies may create loudness fluctuations, with
modulation enhancement being generally softer than conventional processing. To mitigate this,
modulation enhancement could be paired with a parameter that adds gain to the modulated
channel proportional to the modulation depth enforced (Vandali et al. 2019).
However, Monaghan et al. (2022) found CI users to be largely insensitive to changes in
modulation depth that occurred at the lower end of the subject’s dynamic range —for rates
between 15.625 and 250 Hz, they found participants unable to detect a doubling of modulator
rate applied to the bottom 80–90% of the envelope. Therefore, modulation enhancement
strategies may not offer a benefit when representing the pitch of soft sounds. Similarly,
strategies that apply a varying degree of modulation enhancement may not be completely
effective, and may instead require full modulation depth or sharpening of the envelope so that
envelope peaks reach the upper end of the subject’s dynamic range.
A Modern Implementation of Modulation Enhancement in CI Sound Processing
The Optimized Pitch and Language (OPAL) strategy is a modern implementation of
modulation enhancement in CI sound processing with a framework similar to F0mod (Vandali et
al. 2019). A notable difference is in the enforcement of modulation. In contrast to F0mod which
fully modulates all stimulation channels, the degree of modulation applied to each channel in
OPAL is variable, with the depth of modulation derived from a function of the harmonic signal
power within the given channel —this modulation replaces the envelopes applied by ACE
(Vandali & van Hoesel 2011; Vandali & van Hoesel 2012). The method for F0 estimation used in
OPAL is based on harmonic sieves, which are a series of filterbanks, and is more precise and
robust to noise than previously used methods. The bank of filters within each sieve are
harmonically related, with center frequencies located at integer multiples of their shared F0.
Subsequent sieves in the series are spaced one semitone apart up to approximately 300 Hz. The
sieves with the largest power become candidates for the estimated F0, with F0 estimation
errors reduced by a second series of sieves which have narrower filter widths —error rates for
this process are within ±0.5 semitones of the known signal F0 (Vandali & van Hoesel 2011).
Finally, additional channel gain is introduced to compensate for changes in loudness created by
the application of enhanced modulation, with an average of only 4.5 dB in added gain needed
to compensate for loudness changes between OPAL and ACE processing (Vandali et al. 2019).
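To make the sieve idea concrete, the following is a highly simplified single-stage sketch; the filter shape (a fixed relative bandwidth) and all parameters are assumptions for illustration, and the second, narrower sieve stage that refines the estimate is omitted.

```python
import numpy as np

def harmonic_sieve_f0(power_spectrum, freqs, f0_min, f0_max=300.0,
                      n_harmonics=8, rel_bw=0.05):
    """Score candidate F0s (spaced one semitone apart) by summing spectral
    power near integer multiples of each candidate; return the best one."""
    n_steps = int(np.floor(12 * np.log2(f0_max / f0_min))) + 1
    candidates = f0_min * 2.0 ** (np.arange(n_steps) / 12)
    scores = []
    for f0 in candidates:
        harmonic_power = 0.0
        for k in range(1, n_harmonics + 1):
            near = np.abs(freqs - k * f0) < rel_bw * k * f0
            harmonic_power += power_spectrum[near].sum()
        scores.append(harmonic_power)
    return candidates[int(np.argmax(scores))]

# Example: synthetic 220 Hz harmonic complex; the sieve recovers the F0
# to within its semitone grid (f0_min chosen to exclude subharmonics).
fs, n = 16000, 4096
t = np.arange(n) / fs
x = sum(np.sin(2 * np.pi * 220 * k * t) for k in (1, 2, 3, 4))
spec = np.abs(np.fft.rfft(x)) ** 2
f = np.fft.rfftfreq(n, 1 / fs)
print(harmonic_sieve_f0(spec, f, f0_min=150.0))   # within a semitone of 220 Hz
```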
A real-time implementation of OPAL provided significant improvement to pitch
perception compared with ACE, while preserving speech perception in quiet and in noise
(Vandali et al. 2019). Notably, only a 4-week adaptation period to OPAL was required to match
sentence perception in ACE. In summary, the real-time implementation of OPAL demonstrates
benefits to pitch perception while addressing many of the notable concerns in the practical use
of modulation enhancement strategies including the processing time to estimate F0, loudness
fluctuations, and speech intelligibility. However, further research is required to determine the
effect of modulation enhancement across complex listening tasks, as well as the most effective
method for modulation enhancement (Smith et al. 2014; Lamping et al. 2020).
Concluding Remarks
The work performed in this dissertation highlights a principle general to both the study
and practical rescue of perception and sensorimotor systems: you get what you put in. To drive
a system, you require the relevant input. But just as important, you require a high-fidelity input.
This dissertation characterizes the basic features of pitch that contribute to complex listening
and demonstrates improvements in performance with increased sensitivity and access to pitch
cues at the neural sensor.
References
Arnoldner, C., Riss, D., Brunner, M., et al. (2007). Speech and music perception with the new
fine structure speech coding strategy: preliminary results. Acta Oto-Laryngologica, 127,
1298–1303.
Attneave, F., Olson, R.K. (1971). Pitch as a Medium: A New Approach to Psychophysical Scaling.
The American Journal of Psychology, 84, 147–166.
Bacon, S.P., Viemeister, N.F. (1985). Temporal Modulation Transfer Functions in Normal-
Hearing and Hearing-Impaired Listeners. Audiology, 24, 117–134.
Balkany, T., Hodges, A., Menapace, C., et al. (2007). Nucleus Freedom North American Clinical
Trial. Otolaryngol Head Neck Surg, 136, 757–762.
Banse, R., Scherer, K.R. (1996). Acoustic profiles in vocal emotion expression. Journal of
Personality and Social Psychology, 70, 614–636.
Barda, S., Vir, D., Singh, S. (2018). Coding and Analysis of Speech in Cochlear Implant: A Review.
International Journal of Advanced Research in Computer Science, 9, 118–125.
Beauvois, M.W., Meddis, R. (1996). Computer simulation of auditory stream segregation in
alternating‐tone sequences. The Journal of the Acoustical Society of America, 99, 2270–
2280.
von Békésy, G. (1928). Zur Theorie des Hörens; die Schwingungsform der Basilarmembran.
Phys. Zeits, 29, 793–810.
Berenstein, C.K., Mens, L.H.M., Mulder, J.J.S., et al. (2008). Current Steering and Current
Focusing in Cochlear Implants: Comparison of Monopolar, Tripolar, and Virtual Channel
Electrode Configurations. Ear and Hearing, 29, 250.
Bernstein, L.R., Trahiotis, C. (2010). Accounting quantitatively for sensitivity to envelope-based
interaural temporal disparities at high frequencies. The Journal of the Acoustical Society
of America, 128, 1224.
Bernstein, L.R., Trahiotis, C. (2017). An interaural-correlation-based approach that accounts for
a wide variety of binaural detection data. The Journal of the Acoustical Society of
America, 141, 1150–1160.
Bernstein, L.R., Trahiotis, C. (1994). Detection of interaural delay in high‐frequency sinusoidally
amplitude‐modulated tones, two‐tone complexes, and bands of noise. The Journal of
the Acoustical Society of America, 95, 3561–3567.
Bernstein, L.R., Trahiotis, C. (2002). Enhancing sensitivity to interaural delays at high
frequencies by using “transposed stimuli.” The Journal of the Acoustical Society of
America, 112, 1026–1036.
Bernstein, L.R., Trahiotis, C. (2009). How sensitivity to ongoing interaural temporal disparities is
affected by manipulations of temporal features of the envelopes of high-frequency
stimuli. The Journal of the Acoustical Society of America, 125, 3234.
Bernstein, L.R., Trahiotis, C. (2005). Measures of extents of laterality for high-frequency
“transposed” stimuli under conditions of binaural interference. The Journal of the
Acoustical Society of America, 118, 1626–1635.
Bernstein, L.R., Trahiotis, C. (2014). Sensitivity to envelope-based interaural delays at high
frequencies: Center frequency affects the envelope rate-limitation. The Journal of the
Acoustical Society of America, 135, 808–816.
Bernstein, L.R., Trahiotis, C. (2004). The apparent immunity of high-frequency “transposed”
stimuli to low-frequency binaural interference. The Journal of the Acoustical Society of
America, 116, 3062–3069.
Bernstein, L.R., Trahiotis, C. (2007). Why do transposed stimuli enhance binaural processing?:
Interaural envelope correlation vs envelope normalized fourth moment. The Journal of
the Acoustical Society of America, 121, EL23–EL28.

Bidelman, G.M., Krishnan, A. (2009).
Neural Correlates of Consonance, Dissonance, and the Hierarchy of Musical Pitch in the
Human Brainstem. J. Neurosci., 29, 13165–13171.
Bidet-Caulet, A., Bertrand, O. (2009). Neurophysiological mechanisms involved in auditory
perceptual organization. Front. Neurosci., 3. Available at:
https://www.frontiersin.org/articles/10.3389/neuro.01.025.2009/full [Accessed May 1,
2019].
Brokx, J.P.L., Nooteboom, S.G. (1982). Intonation and the perceptual separation of
simultaneous voices. Journal of Phonetics, 10, 23–36.
Brungart, D.S., Simpson, B.D., Ericson, M.A., et al. (2001). Informational and energetic masking
effects in the perception of multiple simultaneous talkers. The Journal of the Acoustical
Society of America, 110, 2527–2538.
Cariani, P.A., Delgutte, B. (1996a). Neural correlates of the pitch of complex tones. I. Pitch and
pitch salience. Journal of Neurophysiology, 76, 1698–1716.
Cariani, P.A., Delgutte, B. (1996b). Neural correlates of the pitch of complex tones. II. Pitch shift,
pitch ambiguity, phase invariance, pitch circularity, rate pitch, and the dominance region
for pitch. Journal of Neurophysiology, 76, 1717–1734.
Carlyon, R.P., Deeks, J.M. (2002). Limitations on rate discrimination. The Journal of the
Acoustical Society of America, 112, 1009–1025.
Carlyon, R.P., Deeks, J.M., McKay, C.M. (2010). The upper limit of temporal pitch for cochlear-
implant listeners: Stimulus duration, conditioner pulses, and the number of electrodes
stimulated. The Journal of the Acoustical Society of America, 127, 1469–1478.
Carlyon, R.P., Shackleton, T.M. (1994). Comparing the fundamental frequencies of resolved and
unresolved harmonics: Evidence for two pitch mechanisms? The Journal of the
Acoustical Society of America, 95, 3541–3554.
Carney, L.H. (2018). Supra-Threshold Hearing and Fluctuation Profiles: Implications for
Sensorineural and Hidden Hearing Loss. JARO, 19, 331–352.
Carney, L.H., Li, T., McDonough, J.M. (2015). Speech Coding in the Brain: Representation of
Vowel Formants by Midbrain Neurons Tuned to Sound Fluctuations. eNeuro, 2. Available
at: https://www.eneuro.org/content/2/4/ENEURO.0004-15.2015 [Accessed February 7,
2023].
Cedolin, L., Delgutte, B. (2005). Pitch of Complex Tones: Rate-Place and Interspike Interval
Representations in the Auditory Nerve. Journal of Neurophysiology, 94, 347–362.
Cedolin, L., Delgutte, B. (2010). Spatiotemporal Representation of the Pitch of Harmonic
Complex Tones in the Auditory Nerve. J. Neurosci., 30, 12712–12724.
Chatterjee, M., Oberzut, C. (2011). Detection and rate discrimination of amplitude modulation
in electrical hearing. The Journal of the Acoustical Society of America, 130, 1567–1580.
Chatterjee, M., Peng, S.-C. (2008). Processing F0 with cochlear implants: Modulation frequency
discrimination and speech intonation recognition. Hearing Research, 235, 143–156.
Chatterjee, M., Zion, D.J., Deroche, M.L., et al. (2015). Voice emotion recognition by cochlear-
implanted children and their normally-hearing peers. Hearing Research, 322, 151–162.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Corti, A. (1851). Recherches sur l’organe de l’ouïe des mammifères, Akademische
Verlagsgesellschaft.
Crew, J.D., Galvin, J.J., Fu, Q.-J. (2016). Perception of Sung Speech in Bimodal Cochlear Implant
Users. Trends in Hearing, 20, 2331216516669329.
Crew, J.D., Iii, J.J.G., Landsberger, D.M., et al. (2015). Contributions of Electric and Acoustic
Hearing to Bimodal Speech and Music Perception. PLOS ONE, 10, e0120279.
Cullington, H.E., Zeng, F.-G. (2008). Speech recognition with varying numbers and types of
competing talkers by normal-hearing, cochlear-implant, and implant simulation
subjects. The Journal of the Acoustical Society of America, 123, 450–461.
Delgutte, B. Representation of speech-like sounds in the discharge patterns of auditory-nerve
fibers, 16.
Delgutte, B., Hammond, B., Cariani, P. (1998). Neural coding of the temporal envelope of
speech: Relation to modulation transfer functions. Psychophysical and Physiological
Advances in Hearing.
Delgutte, B., Kiang, N.Y.S. (1984). Speech coding in the auditory nerve: V. Vowels in background
noise. The Journal of the Acoustical Society of America, 75, 908–918.
Deutsch, D. (2007). Music Perception. Front. Biosci, 12, 4473–4482.
Djourno, A., Eyries, C. (1957). Auditory prosthesis by means of a distant electrical stimulation of
the sensory nerve with the use of an indwelt coiling. Presse Med (1893), 65, 1417.
Donaldson, G.S., Kreft, H.A., Litvak, L. (2005). Place-pitch discrimination of single- versus dual-
electrode stimuli by cochlear implant users. The Journal of the Acoustical Society of
America, 118, 623–626.
Donnelly, P.J., Guo, B.Z., Limb, C.J. (2009). Perceptual fusion of polyphonic pitch in cochlear
implant users. The Journal of the Acoustical Society of America, 126, EL128–EL133.
Dorman, M.F., Basham, M., G.E.A.R.Y., et al. (1991). Speech understanding and music
appreciation with the Ineraid cochlear implant. The Hearing Journal, 44(6), 34–37.
Dorman, M.F., Wilson, B.S. (2004). The Design and Function of Cochlear Implants: Fusing
medicine, neural science and engineering, these devices transform human speech into
an electrical code that deafened ears can understand. American Scientist, 92, 436–445.
Dowling, W.J., Harwood, J.L. (1986). Music Cognition 1st Edn., Orlando, FL: Academic Press.
Drennan, W.R., Won, J.H., Nie, K., et al. (2010). Sensitivity of psychophysical measures to signal
processor modifications in cochlear implant users. Hearing Research, 262, 1–8.
Drullman, R., Bronkhorst, A.W. (2004). Speech perception and talker segregation: Effects of
level, pitch, and tactile support with multiple simultaneous talkers. The Journal of the
Acoustical Society of America, 116, 3090–3098.
Terhardt, E. (1968). Über akustische Rauhigkeit und Schwankungsstärke. Acustica, 20, 215–224.
Eisenberg, L.S. (1982). Use of the cochlear implant by the prelingually deaf. Ann Otol Rhinol
Laryngol Suppl, 91, 62–66.
Evans, E., Palmer, A.R. (1980). Relationship between the dynamic range of cochlear nerve fibres
and their spontaneous activity. Experimental brain research, 115–118.
Fant, G. (1960). Acoustic theory of speech production. Mouton, The Hague.
Faulkner, A., Rosen, S., Wilkinson, L. (2001). Effects of the Number of Channels and Speech-to-
Noise Ratio on Rate of Connected Discourse Tracking Through a Simulated Cochlear
Implant Speech Processor. Ear and Hearing, 22, 431–438.
Firestone, G.M., McGuire, K., Liang, C., et al. (2020). A Preliminary Study of the Effects of
Attentive Music Listening on Cochlear Implant Users’ Speech Perception, Quality of Life,
and Behavioral and Objective Measures of Frequency Change Detection. Frontiers in
Human Neuroscience, 14, 110.
Firszt, J.B., Koch, D.B., Downing, M., et al. (2007). Current Steering Creates Additional Pitch
Percepts in Adult Cochlear Implant Recipients. Otology & Neurotology, 28, 629.
Fishman Kim E., Shannon Robert V., Slattery William H. (1997). Speech Recognition as a
Function of the Number of Electrodes Used in the SPEAK Cochlear Implant Speech
Processor. Journal of Speech, Language, and Hearing Research, 40, 1201–1215.
Flores, E.N., Duggan, A., Madathany, T., et al. (2015). A Non-canonical Pathway from Cochlea to
Brain Signals Tissue-Damaging Noise. Current Biology, 25, 606–612.
Forrest, T.G., Green, D.M. (1987). Detection of partially filled gaps in noise and the temporal
modulation transfer function. The Journal of the Acoustical Society of America, 82,
1933–1943.
Friesen, L.M., Shannon, R.V., Baskent, D., et al. (2001). Speech recognition in noise as a function
of the number of spectral channels: Comparison of acoustic hearing and cochlear
implants. The Journal of the Acoustical Society of America, 110, 1150–1163.
Fu, Q.-J., Shannon, R.V., Wang, X. (1998). Effects of noise and spectral resolution on vowel and
consonant recognition: Acoustic and electric hearing. The Journal of the Acoustical
Society of America, 104, 3586–3596.
Galvin, J.J., Fu, Q.-J., Oba, S.I. (2009). Effect of a competing instrument on melodic contour
identification by cochlear implant users. The Journal of the Acoustical Society of
America, 125, EL98–EL103.
140
Garadat, S.N., Zwolan, T.A., Pfingst, B.E. (2012). Across-site patterns of modulation detection:
Relation to speech recognition. The Journal of the Acoustical Society of America, 131,
4030–4041.
Geneva: World Health Organization (2018). Addressing the rising prevalence of hearing loss.
Gfeller, K., Driscoll, V., Smith, R.S., et al. (2012). The Music Experiences and Attitudes of a First
Cohort of Prelingually Deaf Adolescent and Young Adult Cochlear Implant Recipients.
Semin Hear, 33, 346–360.
Gfeller, K., Lansing, C.R. (1991). Melodic, Rhythmic, and Timbral Perception of Adult Cochlear
Implant Users. Journal of Speech, Language, and Hearing Research, 34, 916–920.
Gfeller, K., Oleson, J., Knutson, J.F., et al. (2008). Multivariate Predictors of Music Perception
and Appraisal by Adult Cochlear Implant Users. J Am Acad Audiol, 19, 120–134.
Gfeller, K., Turner, C., Oleson, J., et al. (2007). Accuracy of Cochlear Implant Recipients on Pitch
Perception, Melody Recognition, and Speech Reception in Noise. Ear and Hearing, 28,
412.
Gfeller, K.E., Olszewski, C., Turner, C., et al. (2006). Music Perception with Cochlear Implants
and Residual Hearing. AUD, 11, 12–15.
Goldberg, J.M., Brown, P.B. (1968). Functional organization of the dog superior olivary complex:
an anatomical and electrophysiological study. Journal of Neurophysiology, 31, 639–656.
Goldsworthy, R.L. (2022). Computational Modeling of Synchrony in the Auditory Nerve in
Response to Acoustic and Electric Stimulation. Frontiers in Computational Neuroscience,
16. Available at: https://www.frontiersin.org/articles/10.3389/fncom.2022.889992
[Accessed February 7, 2023].
Goldsworthy, R.L. (2015). Correlations Between Pitch and Phoneme Perception in Cochlear
Implant Users and Their Normal Hearing Peers. JARO, 16, 797–809.
Goldsworthy, R.L., Bissmeyer, S.R.S. (in press). Cochlear Implant Users can Effectively Combine
Place and Timing Cues for Pitch Perception. The Journal of the Acoustical Society of
America.
Goldsworthy, R.L., Delhorne, L.A., Braida, L.D., et al. (2013). Psychoacoustic and Phoneme
Identification Measures in Cochlear-Implant and Normal-Hearing Listeners. Trends in
Amplification, 17, 27–44.
Goldwyn, J.H., Rubinstein, J.T., Shea-Brown, E. (2012). A point process framework for modeling
electrical stimulation of the auditory nerve. Journal of Neurophysiology, 108, 1430–
1452.
141
Grimault, N., Bacon, S.P., Micheyl, C. (2002). Auditory stream segregation on the basis of
amplitude-modulation rate. The Journal of the Acoustical Society of America, 111, 1340–
1348.
Grimault, N., Micheyl, C., Carlyon, R.P., et al. (2000). Influence of peripheral resolvability on the
perceptual segregation of harmonic complex tones differing in fundamental frequency.
The Journal of the Acoustical Society of America, 108, 263–271.
Gutschalk, A., Oxenham, A.J., Micheyl, C., et al. (2007). Human Cortical Activity during
Streaming without Spectral Cues Suggests a General Neural Substrate for Auditory
Stream Segregation. J. Neurosci., 27, 13074–13081.
Hartmann, W.M., Johnson, D. (1991). Stream Segregation and Peripheral Channeling. MUSIC
PERCEPT, 9, 155–183.
He, S., Skidmore, J., Koch, B., et al. (2023). Relationships Between the Auditory Nerve Sensitivity
to Amplitude Modulation, Perceptual Amplitude Modulation Rate Discrimination
Sensitivity, and Speech Perception Performance in Postlingually Deafened Adult
Cochlear Implant Users. Ear and Hearing, 44, 371.
van Hemmen, J.L. (2013). Vector strength after Goldberg, Brown, and von Mises: biological and
mathematical perspectives. Biol Cybern, 107, 385–396.
Henning, G.B., Ashton, J. (1981). The effect of carrier and modulation frequency on
lateralization based on interaural phase and interaural group delay. Hearing Research, 4,
185–194.
Hillenbrand, J., Getty, L.A., Clark, M.J., et al. (1995). Acoustic characteristics of American English
vowels. The Journal of the Acoustical Society of America, 97, 3099–3111.
Hochmair, I., Nopp, P., Jolly, C., et al. (2006). MED-EL Cochlear Implants: State of the Art and a
Glimpse Into the Future. Trends in Amplification, 10, 201–219.
van den Honert, C., Kelsall, D.C. (2007). Focused intracochlear electric stimulation with phased
array channels. The Journal of the Acoustical Society of America, 121, 3703–3716.
Horst, J.W., Javel, E., Farley, G.R. (1990). Coding of spectral fine structure in the auditory nerve.
II: Level‐dependent nonlinear responses. The Journal of the Acoustical Society of
America, 88, 2656–2681.
Horst, J.W., Javel, E., Farley, G.R. (1985). Extraction and enhancement of spectral structure by
the cochlea. The Journal of the Acoustical Society of America, 78, 1898–1901.
Hossain, S., Goldsworthy, R.L. (2018). Factors Affecting Speech Reception in Background Noise
with a Vocoder Implementation of the FAST Algorithm. JARO, 19, 467–478.
142
House, D. (1994). Perception and production of mood in speech by cochlear implant users. In
Proceedings of the International Conference on Spoken Language Processing. (pp. 2051–
2054).
James, C.J., Skinner, M.W., Martin, L.F.A., et al. (2003). An Investigation of Input Level Range for
the Nucleus 24 Cochlear Implant System: Speech Perception Performance, Program
Preference, and Loudness Comfort Ratings. Ear and Hearing, 24, 157.
Jeng, F.-C., Abbas, P.J., Hu, N., et al. (2009). Effects of temporal properties on compound action
potentials in response to amplitude-modulated electric pulse trains in guinea pigs.
Hearing Research, 247, 47–59.
Johnson, D.H. (1980). The relationship between spike rate and synchrony in responses of
auditory‐nerve fibers to single tones. The Journal of the Acoustical Society of America,
68, 1115–1122.
de Jong, M.A.M., Briaire, J.J., Frijns, J.H.M. (2017). Take-Home Trial Comparing Fast Fourier
Transformation-Based and Filter Bank-Based Cochlear Implant Speech Coding Strategies.
BioMed Research International, 2017, e7915042.
Joris, P.X., Schreiner, C.E., Rees, A. (2004). Neural Processing of Amplitude-Modulated Sounds.
Physiological Reviews, 84, 541–577.
Joris, P.X., Yin, T.C.T. (1992). Responses to amplitude‐modulated tones in the auditory nerve of
the cat. The Journal of the Acoustical Society of America, 91, 215–232.
Kaernbach, C. (2001). Adaptive threshold estimation with unforced-choice tasks. Perception &
Psychophysics, 63, 1377–1388.
Kaernbach, C. (1991). Simple adaptive testing with the weighted up-down method. Perception
& Psychophysics, 49, 227–229.
Kim, D.O., Molnar, C.E. (1979). A population study of cochlear nerve fibers: comparison of
spatial distributions of average-rate and phase-locking measures of responses to single
tones. Journal of Neurophysiology, 42, 16–30.
Knobloch, M., Verhey, J.L., Ziese, M., et al. (2018). Musical Harmony in Electric Hearing. Music
Perception, 36, 40–52.
Koch, D.B., Downing, M., Osberger, M.J., et al. (2007). Using Current Steering to Increase
Spectral Resolution in CII and HiRes 90K Users. Ear and Hearing, 28, 38S.
Koch, D.B., Osberger, M.J., Segel, P., et al. (2004). HiResolutionTM and Conventional Sound
Processing in the HiResolutionTM Bionic Ear: Using Appropriate Outcome Measures to
Assess Speech Recognition Ability. AUD, 9, 214–223.
143
Kolmer, W. (1909). Histologische Studien am Labyrinth 74th ed., Arch. Mikroskop. Anat.
Kong, Y.-Y., Mullangi, A., Marozeau, J., et al. (2011). Temporal and Spectral Cues for Musical
Timbre Perception in Electric Hearing. Journal of Speech, Language, and Hearing
Research, 54, 981–994.
Krishna, B.S., Semple, M.N. (2000). Auditory Temporal Processing: Responses to Sinusoidally
Amplitude-Modulated Tones in the Inferior Colliculus. Journal of Neurophysiology, 84,
255–273.
Kumaresan, R., Peddinti, V.K., Cariani, P. (2013). Synchrony capture filterbank: Auditory-inspired
signal processing for tracking individual frequency components in speech. The Journal of
the Acoustical Society of America, 133, 4290–4310.
Lamping, W., Goehring, T., Marozeau, J., et al. (2020). The effect of a coding strategy that
removes temporally masked pulses on speech perception by cochlear implant users.
Hearing Research, 391, 107969.
Langner, G. (1992). Periodicity coding in the auditory system. Hearing Research, 60, 115–142.
Langner, G., Schreiner, C.E. (1988). Periodicity coding in the inferior colliculus of the cat. I.
Neuronal mechanisms. Journal of Neurophysiology, 60, 1799–1822.
Liberman, M.C. (1978). Auditory‐nerve response from cats raised in a low‐noise chamber. The
Journal of the Acoustical Society of America, 63, 442–455.
Limb, C.J., Roy, A.T. (2014). Technological, biological, and acoustical constraints to music
perception in cochlear implant users. Hearing Research, 308, 13–26.
Litvak, L.M., Spahr, A.J., Emadi, G. (2007). Loudness growth observed under partially tripolar
stimulation: Model and data from cochlear implant listeners. The Journal of the
Acoustical Society of America, 122, 967–981.
Loizou, P.C. (1998). Mimicking the human ear. IEEE Signal Processing Magazine, 15, 101–130.
Loizou, P.C. (1997). Signal processing for cochlear prosthesis: a tutorial review. In Proceedings
of 40th Midwest Symposium on Circuits and Systems. Dedicated to the Memory of
Professor Mac Van Valkenburg. Proceedings of 40th Midwest Symposium on Circuits
and Systems. Dedicated to the Memory of Professor Mac Van Valkenburg. (pp. 881–885
vol.2).
Looi, V., Gfeller, K., Driscoll, V.D. (2012). Music Appreciation and Training for Cochlear Implant
Recipients: A Review. Semin Hear, 33, 307–334.
144
Looi, V., She, J. (2010). Music perception of cochlear implant users: A questionnaire, and its
implications for a music training program. International Journal of Audiology, 49, 116–
128.
LoPresto, M.C. (2015). Measuring Musical Consonance and Dissonance. The Physics Teacher, 53,
225–229.
Luo, X., Fu, Q.-J., Galvin, J.J. (2007). Cochlear Implants Special Issue Article: Vocal Emotion
Recognition by Normal-Hearing Listeners and Cochlear Implant Users. Trends in
Amplification, 11, 301–315.
Luo, X., Fu, Q.-J., Wei, C.-G., et al. (2008). Speech Recognition and Temporal Amplitude
Modulation Processing by Mandarin-Speaking Cochlear Implant Users. Ear and Hearing,
29, 957.
McAdams, S., Winsberg, S., Donnadieu, S., et al. (1995). Perceptual scaling of synthesized
musical timbres: Common dimensions, specificities, and latent subject classes. Psychol.
Res, 58, 177–192.
McCabe, S.L., Denham, M.J. (1996). A Model of Auditory Streaming. In D. S. Touretzky, M. C.
Mozer, & M. E. Hasselmo, eds. Advances in Neural Information Processing Systems 8.
(pp. 52–58). MIT Press. Available at: http://papers.nips.cc/paper/1026-a-model-of-
auditory-streaming.pdf [Accessed May 1, 2019].
McDermott, H.J., McKay, C.M., Vandali, A.E. (1992). A new portable sound processor for the
University of Melbourne/Nucleus Limited multielectrode cochlear implant. The Journal
of the Acoustical Society of America, 91, 3367–3371.
McDermott, J.H., Lehr, A.J., Oxenham, A.J. (2010). Individual Differences Reveal the Basis of
Consonance. Current Biology, 20, 1035–1041.
McDermott, J.H., Oxenham, A.J. (2008). Spectral completion of partially masked sounds.
Proceedings of the National Academy of Sciences, 105, 5939–5944.
McKay, C.M., McDermott, H.J., Carlyon, R.P. (2000). Place and temporal cues in pitch
perception: are they truly independent? Acoustics Research Letters Online, 1, 25–30.
Milczynski, M., Chang, J.E., Wouters, J., et al. (2012). Perception of Mandarin Chinese with
cochlear implants using enhanced temporal pitch cues. Hearing Research, 285, 1–12.
Milczynski, M., Wouters, J., Wieringen, A. van (2009). Improved fundamental frequency coding
in cochlear implant signal processing. The Journal of the Acoustical Society of America,
125, 2260.
145
Moore Brian C. J., Gockel Hedwig E. (2012). Properties of auditory stream formation.
Philosophical Transactions of the Royal Society B: Biological Sciences, 367, 919–931.
Müllensiefen, D., Gingras, B., Musil, J., et al. (2014). The Musicality of Non-Musicians: An Index
for Assessing Musical Sophistication in the General Population. PLOS ONE, 9, e89642.
Murray, I.R., Arnott, J.L. (1993). Toward the simulation of emotion in synthetic speech: A review
of the literature on human vocal emotion. The Journal of the Acoustical Society of
America, 93, 1097–1108.
Nelson, P.C., Carney, L.H. (2007). Neural Rate and Timing Cues for Detection and Discrimination
of Amplitude-Modulated Tones in the Awake Rabbit Inferior Colliculus. Journal of
Neurophysiology, 97, 522–539.
NIDCD (2021). Cochlear Implants. NIDCD. Available at:
https://www.nidcd.nih.gov/health/cochlear-implants [Accessed November 10, 2022].
Nogueira, W., Litvak, L., Edler, B., et al. (2009). Signal Processing Strategies for Cochlear
Implants Using Current Steering. EURASIP J. Adv. Signal Process., 2009, 531213.
Olsen, W.O. (1998). Average Speech Levels and Spectra in Various Speaking/Listening
Conditions. American Journal of Audiology, 7, 21–25.
Oxenham, A.J. (2012). Pitch Perception. J. Neurosci., 32, 13335–13338.
Oxenham, A.J., Micheyl, C., Keebler, M.V., et al. (2011). Pitch perception beyond the traditional
existence region of pitch. Proceedings of the National Academy of Sciences, 108, 7629–
7634.
Paredes-Gallardo, A., Madsen, S.M.K., Dau, T., et al. (2018). The Role of Temporal Cues in
Voluntary Stream Segregation for Cochlear Implant Users. Trends in Hearing, 22,
2331216518773226.
Patil, K., Pressnitzer, D., Shamma, S., et al. (2012). Music in Our Ears: The Biological Bases of
Musical Timbre Perception. PLOS Computational Biology, 8, e1002759.
Patterson, R.D., Johnson‐Davies, D., Milroy, R. (1978). Amplitude‐modulated noise: The
detection of modulation versus the detection of modulation rate. The Journal of the
Acoustical Society of America, 63, 1904–1911.
Peng, Z.E., Waz, S., Buss, E., et al. (2022). FORUM: Remote testing for psychological and
physiological acoustics. The Journal of the Acoustical Society of America, 151, 3116–
3128.
146
Penninger, R., Kludt, E., Limb, C.J., et al. (2014). Perception of Polyphony With Cochlear
Implants for 2 and 3 Simultaneous Pitches. Otology & Neurotology, 431–436.
Pereira, C. (2000). The perception of vocal affect by cochlear implantees. Thieme Medica, 343–
345.
Plack, C.J., Oxenham, A.J. (2005). The Psychophysics of Pitch. In C. J. Plack, R. R. Fay, A. J.
Oxenham, et al., eds. Pitch: Neural Coding and Perception. Springer Handbook of
Auditory Research. (pp. 7–55). New York, NY: Springer. Available at:
https://doi.org/10.1007/0-387-28958-5_2 [Accessed July 28, 2021].
Plomp, R., Steeneken, H. (1971). Pitch versus Timbre. In Seventh International Congress on
Acoustics.
Pressnitzer, D., Sayles, M., Micheyl, C., et al. (2008). Perceptual Organization of Sound Begins in
the Auditory Periphery. Current Biology, 18, 1124–1128.
Qin, M.K., Oxenham, A.J. (2005). Effects of Envelope-Vocoder Processing on F0 Discrimination
and Concurrent-Vowel Identification. Ear and Hearing, 26, 451–460.
Qin, M.K., Oxenham, A.J. (2003). Effects of simulated cochlear-implant processing on speech
reception in fluctuating maskers. The Journal of the Acoustical Society of America, 114,
446–454.
Rees, A., Langner, G. (2005). Temporal Coding in the Auditory Midbrain. In J. A. Winer & C. E.
Schreiner, eds. The Inferior Colliculus. (pp. 346–376). New York, NY: Springer. Available
at: https://doi.org/10.1007/0-387-27083-3_12 [Accessed February 8, 2023].
Riss, D., Hamzavi, J.-S., Blineder, M., et al. (2014). FS4, FS4-p, and FSP: A 4-Month Crossover
Study of 3 Fine Structure Sound-Coding Strategies. Ear and Hearing, 35, e272.
Ritsma, R.J. (1962). Existence Region of the Tonal Residue. I. The Journal of the Acoustical
Society of America, 34, 1224–1229.
Roberts, B., Glasberg, B.R., Moore, B.C.J. (2002). Primitive stream segregation of tone
sequences without differences in fundamental frequency or passband. The Journal of
the Acoustical Society of America, 112, 2074–2085.
Rossing, T.D. (1989). The Science of Sound, Addison Wesley: Reading, MA.
Sachs, M.B., Bruce, I.C., Miller, R.L., et al. (2002). Biological Basis of Hearing-Aid Design. Annals
of Biomedical Engineering, 30, 157–168.
Saunders, J.E., Francis, H.W., Skarzynski, P.H. (2016). Measuring Success: Cost-Effectiveness and
Expanding Access to Cochlear Implantation. Otology & Neurotology, 37, e135.
147
Schorr, E.A., Fox, N.A., Roth, F.P. (2004). Social and emotional functioning of children with
cochlear implants: description of the sample. International Congress Series, 1273, 372–
375.
Shackleton, T.M., Carlyon, R.P. (1994). The role of resolved and unresolved harmonics in pitch
perception and frequency modulation discrimination. The Journal of the Acoustical
Society of America, 95, 3529–3540.
Smith, Z.M., Kan, A., Jones, H.G., et al. (2014). Hearing better with interaural time differences
and bilateral cochlear implants. The Journal of the Acoustical Society of America, 135,
2190–2191.
Spitzer, J., Mancuso, D., Cheng, M.-Y. (2008). Development of a Clinical Test of Musical
Perception: Appreciation of Music in Cochlear Implantees (AMICI). Journal of the
American Academy of Audiology, 19.1, 56–81.
Srinivasan, A.G., Padilla, M., Shannon, R.V., et al. (2013). Improving speech perception in noise
with current focusing in cochlear implant users. Hearing Research, 299, 29–36.
Stickney, G.S., Assmann, P.F., Chang, J., et al. (2007). Effects of cochlear implant processing and
fundamental frequency on the intelligibility of competing sentences. The Journal of the
Acoustical Society of America, 122, 1069–1078.
Stickney, G.S., Zeng, F.-G., Litovsky, R., et al. (2004). Cochlear implant speech recognition with
speech maskers. The Journal of the Acoustical Society of America, 116, 1081–1091.
Swanson, B., Mauch, H. (2006). Nucleus Matlab Toolbox 4.20 software user manual.
Tejani, V.D., Abbas, P.J., Brown, C.J. (2017). Relationship Between Peripheral and
Psychophysical Measures of Amplitude Modulation Detection in Cochlear Implant Users.
Ear and Hearing, 38, e268.
Tramo, M.J., Cariani, P.A., Delgutte, B., et al. (2001). Neurobiological Foundations for the
Theory of Harmony in Western Tonal Music. Annals of the New York Academy of
Sciences, 930, 92–116.
Vandali, A., Dawson, P., Au, A., et al. (2019). Evaluation of the Optimized Pitch and Language
Strategy in Cochlear Implant Recipients. Ear and Hearing, 40, 555–567.
Vandali, A.E., Dawson, P.W., Arora, K. (2017). Results using the OPAL strategy in Mandarin
speaking cochlear implant recipients. International Journal of Audiology, 56, S74–S85.
Vandali, A.E., van Hoesel, R.J.M. (2011). Development of a temporal fundamental frequency
coding strategy for cochlear implants. The Journal of the Acoustical Society of America,
129, 4023–4036.
148
Venter, P.J., Hanekom, J.J. (2014). Is There a Fundamental 300 Hz Limit to Pulse Rate
Discrimination in Cochlear Implants? JARO, 15, 849–866.
Viemeister, N.F. (1979). Temporal modulation transfer functions based upon modulation
thresholds. The Journal of the Acoustical Society of America, 66, 1364–1380.
Viemeister, N.F., Plack, C.J. (1993). Time Analysis. Human Psychophysics, 116–154.
Vliegen, J., Moore, B.C.J., Oxenham, A.J. (1999). The role of spectral and periodicity cues in
auditory stream segregation, measured using a temporal discrimination task. The
Journal of the Acoustical Society of America, 106, 938–945.
Vliegen, J., Oxenham, A.J. (1999). Sequential stream segregation in the absence of spectral
cues. The Journal of the Acoustical Society of America, 105, 339–346.
Wang, X., Walker, K.M.M. (2012). Neural Mechanisms for the Abstraction and Use of Pitch
Information in Auditory Cortex. J. Neurosci., 32, 13339–13342.
Wever, E.G., Bray, C.W. (1930). The nature of acoustic response: The relation between sound
frequency and frequency of impulses in the auditory nerve. Journal of Experimental
Psychology, 13, 373–387.
Williams, C., Stevens, K. (1972). Emotions and speech: some acoustical correlates. J Acoust Soc
Am, 50, 1238–1250.
Wilson, B.S. (2004). Engineering Design of Cochlear Implants. In F.-G. Zeng, A. N. Popper, & R. R.
Fay, eds. Cochlear Implants: Auditory Prostheses and Electric Hearing. Springer
Handbook of Auditory Research. (pp. 14–52). New York, NY: Springer. Available at:
https://doi.org/10.1007/978-0-387-22585-2_2 [Accessed November 11, 2022].
Wilson, B.S., Dorman, M.F. (2008). Cochlear implants: A remarkable past and a brilliant future.
Hearing Research, 242, 3–21.
Wilson, R.H., McArdle, R., Watts, K.L., et al. (2012). The Revised Speech Perception in Noise Test
(R-SPIN) in a multiple signal-to-noise ratio paradigm. J Am Acad Audiol, 23, 590–605.
Wouters, J., McDermott, H.J., Francart, T. (2015). Sound Coding in Cochlear Implants: From
electric pulses to hearing. IEEE Signal Processing Magazine, 32, 67–80.
Wyatt, J.R., Niparko, J.K., Rothman, M.L., et al. (1995). Cost effectiveness of the multichannel
cochlear implant. Am J Otol, 16, 52–62.
Yildirim, S., Bulut, M., Lee, C. (2004). An acoustic study of emotions expressed in speech. In
Proceedings of the International Conference on Spoken Language Processing.
149
Zeng, F.-G. (2002). Temporal pitch in electric hearing. Hearing Research, 174, 101–106.
Zeng, F.-G. (2004). Trends in Cochlear Implants. Trends in Amplification, 8, 1–34.
Zeng, F.-G., Grant, G., Niparko, J., et al. (2002). Speech dynamic range and its effect on cochlear
implant performance. The Journal of the Acoustical Society of America, 111, 377–386.
Zeng, F.-G., Rebscher, S., Harrison, W., et al. (2008). Cochlear Implants: System Design,
Integration, and Evaluation. IEEE Reviews in Biomedical Engineering, 1, 115–142.
Zilany, M.S.A., Bruce, I.C., Carney, L.H. (2014). Updated parameters and expanded simulation
options for a model of the auditory periphery. The Journal of the Acoustical Society of
America, 135, 283–286.
Zwicker, E., Fastl, H. (2013). Psychoacoustics: Facts and Models, Springer Science & Business
Media.
Abstract
Cochlear implants (CIs) restore hearing in people with sensorineural hearing loss and largely rehabilitate speech understanding without the need for visual cues. That speech understanding can be restored through an engineered device is impressive; however, most recipients still express dissatisfaction with their hearing outcomes. In particular, cochlear implant users commonly report difficulty engaging with music and listening in noisy environments. Both tasks depend on a number of acoustic cues, among which pitch is a common and prominent contributor. Pitch, however, is poorly conveyed by CI sound processing, leaving cochlear implant users with a limited ability to hear sharply and robustly. The purpose of this thesis is to characterize the features of pitch that contribute to hearing performance and to investigate methods for improving hearing outcomes in cochlear implant recipients. The first study investigates the features of pitch that contribute to the perception of musical consonance and dissonance; in particular, we hypothesize that sensitivity to amplitude modulation is a driving factor in the perceived pleasantness of musical harmony. The second study follows this framework to characterize the features of pitch that facilitate stream segregation and investigates the relationship between low-level psychophysical pitch thresholds and the ability to segregate streams. The third study extends this framework to investigate the relationship between modulation sensitivity and pitch resolution. The final study moves beyond characterization and demonstrates improved hearing outcomes in cochlear implant users through enhancement of temporal envelope cues. Taken together, these studies point to sensitivity to low-level pitch cues as a limiting factor in hearing performance and encourage advances in sound processing that preserve the spectral and temporal cues that define pitch.