Understanding Music Perception with Cochlear Implants with a Little Help from My Friends, Speech and Hearing Aids

Joseph David Crew

In partial fulfillment of the requirements for the degree of Doctor of Philosophy
Biomedical Engineering
University of Southern California
Los Angeles, California
May 2016

Thinking of you, Dad...
With Love to Momma and Kyle and Michael

Acknowledgements

A very special thanks to my good friend and unsuspecting mentor, John Galvin. I will always remember these years and these arguments. Thank you, Qian-Jie Fu, Bob Shannon, and David Landsberger and the rest of the extended House Ear Institute (House Mafia) family for your generous help and guidance. Thank you to my dissertation committee members and USC Biomedical Engineering faculty and staff and friends.

Executive Abstract

Cochlear implants (CIs) provide good speech understanding to profoundly deaf individuals but do not provide good pitch perception, a critical component of music perception and enjoyment. The cues necessary for speech understanding are fairly well understood, but pitch perception remains a critical area of research in CIs. In this thesis, experimental data are presented that further explore pitch and music perception with CIs.

In the first experiment, the influence of channel interaction, or spectral smearing, on melodic pitch perception was examined. The results indicated that the spectral envelope was used to rank the pitch of a stimulus in simulated CIs, as CI processing removes the fine spectral cues and harmonic relationships. In the second experiment, the effect of adding a hearing aid (HA) to a CI was examined for both speech-in-noise and melodic pitch perception. The addition of a HA improved speech perception slightly but drastically improved pitch perception. The results indicated that the fine-structure frequency cues provided by a HA contributed strongly to pitch perception, even in the presence of a competing instrument. In the third experiment, a test database of acoustic stimuli was created. The stimulus set consisted of 50 different words sung over an octave range; thus, the stimuli contained simultaneous pitch and speech information. This database was tested with normal hearing subjects, divided into two groups based on musicianship. There was no effect of musicianship on speech, but there was a large effect of musicianship for pitch perception. There were no differences across speech conditions, ranging from spoken utterances, to sung speech with a constant pitch, to sung speech with variable pitch. The melodic pitch conditions ranged from fixed timbre, either with a piano or the same word, to variable timbre, with different words across a melodic contour. Non-musician performance was significantly worse relative to musicians for melodic pitch, and performance in the fixed timbre conditions was significantly better than in the variable timbre conditions. In the fourth experiment, this newly created database was tested with CI+HA users as in the second experiment. In general, performance worsened as the tasks became increasingly difficult, but combined device use improved speech and melodic pitch perception. The bulk of speech information was provided by the CI, but the HA provided the bulk of melodic pitch cues, similar to the results observed in the second experiment. Pitch perception was much more difficult with variable timbre than fixed timbre, even with a HA. This suggested that CI+HA users still lack critical pitch processing abilities.
The results suggested that CI users lack the necessary auditory cues for good pitch processing, especially in more complex music listening situations (e.g., polyphonic music, lyrical melodies). Leveraging residual acoustic hearing via a HA can help improve pitch perception. And while CI listeners may learn to use alternative cues (spectral envelope, repetition rate cues) to perform a pitch perception task, the restoration of the missing cues (namely fine-structure and harmonic relationships) is the ultimate goal for CI device design. This improvement would likely contribute to improved quality of sound with a CI.

Table of Contents

Acknowledgements
Executive Abstract
General Introduction
    Cochlear Implants Are Good for Speech in Quiet
    Cochlear Implants Are Limited for Music
    Pitch Perception is Complex and Difficult to Measure
    Introduction to the Following Experiments
Chapter 1 - Examining the Effect of Channel Interaction on Pitch Perception in Simulated Cochlear Implants
    Introduction
    Methods
    Figure 1.1. Schematic signal processing for the acoustic CI simulation with channel interaction
    Figure 1.2. Frequency analysis of experimental stimuli
    Results
    Figure 1.3. Mean MCI performance with the CI simulation
    Discussion
    Conclusion
Chapter 2 - Speech and Music Perception with Cochlear Implants and Hearing Aids
    Introduction
    Methods
    Figure 2.1. Spectrograms and electrodograms for the No Masker condition for 1- and 3-semitone spacings
    Figure 2.2. Spectrograms and electrodograms for the A3 and A6 Masker conditions
    Results
    Figure 2.3. MCI performance for individual subjects across hearing devices and masker conditions
    Figure 2.4. Speech-in-noise results for individual subjects across hearing devices
    Discussion
    Conclusion
Chapter 3 – Development and Testing of the Sung Speech Corpus with Normal Hearing Musicians and Non-musicians
    Introduction
    Methods
    Results
    Figure 3.1. Box plots for sentence recognition and MCI scores for musicians and non-musicians
    Discussion
    Conclusion
Chapter 4 – Sung Speech EAS
    Introduction
    Methods
    Figure 4.1. Spectrograms and electrodograms for stimuli in the MCI task
    Figure 4.2. Spectrograms and electrodograms for stimuli in the Sentence Recognition task
    Results
    Figure 4.3. MCI performance for individual subjects across hearing devices and timbre condition
    Figure 4.4. Sentence identification performance for individual subjects across hearing devices and timbre condition
    Figure 4.5. Bimodal benefit for sung speech
    Discussion
        Speech Perception: Spoken vs. Sung and Constant Pitch vs. Changing Pitch
        Pitch Perception: Fixed Word vs. Random Sentences
        Bimodal Benefit for Speech
        Bimodal Benefit for Music
        Sung Speech Corpus vs. Previous Speech and Music Tasks
    Conclusion
General Discussion
    Limitations of Pitch in Cochlear Implants
    What Have We Learned about Pitch Perception with Cochlear Implants?
    Improvements to Pitch Perception for Cochlear Implant Users
    Restoring Harmonic Pitch in Cochlear Implants
References
Appendix A: "Channel Interaction Limits Melodic Pitch Perception in Simulated Cochlear Implants." JASA-EL 2012
Appendix B: "Contributions of Electric and Acoustic Hearing to Bimodal Speech and Music Perception." PLOS ONE 2015
Appendix C: "Melodic Contour Identification and Sentence Recognition Using Sung Speech." JASA-EL 2015

General Introduction

Cochlear Implants Are Good for Speech in Quiet

A cochlear implant (CI) is an incredible, life-changing medical device that restores the sensation of hearing to profoundly deaf individuals. A CI works by injecting electric currents into the cochlea which activate the spiral ganglion neurons that make up the auditory nerve. The brain interprets this neural activation pattern as sound and hearing. CIs owe their success to the fact that speech understanding is well conveyed by the device; it is now an expected clinical outcome that a patient will have good open-set speech recognition (McDermott 2004; Shannon et al., 2004). Originally, many scientists, engineers, physicians, and other medical experts did not believe that CIs would be able to provide good speech understanding on their own; these experts believed the device would serve as an aid to lip reading and nothing more.
There is a long history of speech perception research with CIs (see Fu 2002; Rubinstein 2004 for reviews). A major insight into why CIs are adequate for speech came from Shannon et al. (1995). In that study, the authors used vocoders as a model of CIs. Vocoders simulate CI processing for normal hearing (NH) listeners with nearly identical processing schemes. First, the acoustic stimulus is fed into a bank of band-pass filters; then, the temporal envelopes, the slowly varying energy over time within a particular frequency band, are extracted. Up to this point, the processing for CIs and vocoders is identical; the difference lies in the final output stage. In CIs, the temporal envelope information is used to amplitude modulate biphasic pulses at different electrodes or channels; in vocoders, the temporal envelopes are used to amplitude modulate band-pass filtered noise or sine waves corresponding to the center frequency of each band. Shannon et al. (1995) showed that good levels of speech understanding can be obtained with as few as four to eight channels. Today's CIs process and stimulate 12 to 22 frequency channels, which is good enough for speech in quiet.
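As a rough illustration of this processing chain, the following is a minimal sinewave-vocoder sketch. It is not the implementation used in Shannon et al. (1995) or in the simulations later in this thesis; the channel count, band edges, and envelope cutoff are assumptions chosen only to keep the example self-contained.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def sine_vocoder(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0, env_cutoff=160.0):
    """Minimal sinewave vocoder sketch: band-pass analysis, temporal envelope
    extraction, and envelope-modulated sine carriers at each band center."""
    # Log-spaced band edges (an assumption; real CI processors use
    # device-specific frequency allocation tables).
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    t = np.arange(len(x)) / fs
    env_lp = butter(4, env_cutoff, btype="low", fs=fs, output="sos")
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        bp = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(bp, x)                 # analysis band
        env = np.clip(sosfiltfilt(env_lp, np.abs(band)), 0.0, None)  # slow envelope
        fc = np.sqrt(lo * hi)                     # carrier at the band center
        out += env * np.sin(2 * np.pi * fc * t)   # envelope-modulated sine carrier
    return out
```

Replacing the sine carriers with band-limited noise gives a noise vocoder; in an actual CI, the same envelopes would instead modulate biphasic current pulses on the corresponding electrodes.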
CIs provide temporal envelope information across many frequency channels (Fu 2002; Shannon et al., 2004). This broad spectral pattern, known as the spectral envelope, is also well conveyed by a CI, although CIs lack the spectral resolution of NH listeners (Friesen et al., 2001), and CI spectral resolution is highly subject dependent. The coarse spectral envelope and good temporal processing provide an approximate, if crude and somewhat distorted, representation of the acoustic peripheral activation pattern. It is the ability of the human brain to recognize patterns over a wide range of inputs that allows for good levels of speech understanding with a CI. This may be intuitive, as NH listeners can understand speech from a variety of individuals with different voice pitch fundamental frequencies (F0), different accents, and different vocal tract anatomies.

While CIs provide good speech understanding in ideal settings, they struggle in more difficult, realistic listening environments. For example, while CI users can understand the lexical meaning of speech, they have difficulty identifying non-linguistic aspects of speech such as vocal emotion or voice gender (Luo et al., 2007). CI users also struggle in settings with many speakers and noisy environments (Fu and Nogaki, 2005). And different talkers can affect CI speech understanding, as CI users struggle with speaker normalization (Chang and Fu, 2006), the act of matching a particular external auditory pattern to some internal pattern template across many different talkers. At the moment, the fine details of speech are not well conveyed by CIs.

Cochlear Implants Are Limited for Music

While CIs are good for speech, many subjects claim that music perception and enjoyment are not adequate with the device. In speech, the auditory stimulus serves as an abstraction of some higher level thought, whereas in music there is no abstract representation; music is all about the fine details of the actual sound. Additionally, CI subjects generally say that the sound quality of the device is poor and unnatural; for some, this severely limits the enjoyment of sound and music with a CI (Gfeller et al., 2000). Upon activation, a common response is that the sound is high pitched and squeaky. This is likely due to the frequency warping from mapping low frequency temporal information to a higher frequency effective place of stimulation; this occurs because of the limited cochlear coverage, the extent to which electrodes can activate the lower characteristic frequency neurons. Subjects also report that the sound is alien and robotic; this likely comes from the fixed carrier nature of electric hearing.

Music is a complex, multi-dimensional art form and sensation. While the quality of music with CIs is poor, there are certain aspects of music that are fairly well conveyed by the device. CIs provide good temporal processing, as stated earlier, and they code sound level intensity fairly well. Intensity and timing can provide rhythmic cues in music, and CI listeners perform fairly well in simple rhythmic music tasks (Gfeller et al., 1997; McDermott 2004). Identifying instruments is more difficult for CI users, but they can correctly identify instruments across families (e.g., woodwinds from strings) primarily based on the temporal envelope cues; the attack of an instrument is one of the strongest dimensions of timbre (Macherey and Delpierre, 2013), and this cue is well conveyed by a CI. However, the primary area of difficulty in music with CIs is melody identification, which depends strongly on pitch perception. Pitch perception is not only a fundamental component of music, it is also related to certain aspects of speech perception, namely vocal emotion, lexical meaning in tonal languages, and speaker identification.

Pitch Perception is Complex and Difficult to Measure

Pitch perception is a longstanding debate in the auditory research community (see Oxenham 2012 for a brief review). Much of this debate stems from disagreements regarding the experimental definition of pitch. One original definition of pitch is that it can be used to rank stimuli outside of loudness and duration. This definition suffers because it is not limited enough. Acoustic stimuli can be manipulated in a number of ways, and reliably discriminating one cue and assigning that dimension greater weight in a pitch ranking task does not really elucidate singular properties of pitch. This is of great importance when considering non-harmonic, noise-like stimuli as in Burns and Viemeister (1981). A second definition, that pitch is that which conveys a melody, is also limited in experimental contexts. Successfully conveying a melody is primarily measured by familiar melody recognition, or some generalization of that task such as familiar melody identification without rhythm cues (McDermott 2004). Matching an auditory pattern to some label suffers from the same problem as before: there are a number of cues that a listener could use to match a pattern. Previous experiments have shown there are at least three "pitch" cues that can be utilized to perform this task: 1. pitch related to harmonics (see pitch of the missing fundamental; Oxenham 2012), 2. pitch related to temporal rates (see rate pitch in CIs; Zeng 2002), and 3. pitch related to the spectral centroid of non-harmonic stimuli (McDermott et al., 2008). Because the frequency of a stimulus corresponds to a location in the cochlea, the "place pitch" associated with cochlear location and the "rate pitch" associated with the fine timing cues (e.g., zero-crossings) are co-varied in acoustic hearing. These two dimensions can be separated in CIs, and this is the basis for many pitch studies in CIs (Zeng, 2002).
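Of these three cues, the spectral centroid is the simplest to formalize: it is the amplitude-weighted average frequency of a spectrum (or, in a CI, of the channel envelope pattern). A minimal sketch follows; the magnitude-spectrum weighting used here is one common convention, not necessarily the computation used in the studies cited above.

```python
import numpy as np

def spectral_centroid(x, fs):
    """Amplitude-weighted mean frequency (Hz) of the magnitude spectrum."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.sum(freqs * mag) / np.sum(mag)
```

A listener relying on this cue alone would rank a stimulus as "higher" whenever its energy shifts toward higher channels, regardless of whether its fundamental frequency has actually changed.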
One issue with familiar melodies is the fact that they might not actually be familiar. Another is that it is difficult to measure pitch resolution with this task, because altering pitch changes the melody itself. Because of these factors, Melodic Contour Identification (MCI; Galvin et al., 2009) was created. MCI is a task in which the listener identifies one of nine possible contours, defined by pitch direction. The intervals, the number of semitones between two successive notes, can be varied to produce some measure of pitch resolution. In the following four experiments, MCI is used throughout as our measure of pitch perception. While it is imperfect, it is useful in approximating one's pitch perception abilities, and thoughtful variations on the MCI task can yield strong insights.
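To make the MCI stimuli concrete, the sketch below generates the note frequencies of a contour at a given semitone spacing, starting from a root note such as A3 (220 Hz). The semitone-step patterns listed here are an illustrative reconstruction of the nine contours, not the exact definitions; see Galvin et al. (2009) for those.

```python
import numpy as np

# Illustrative semitone-step patterns for the nine contours (assumed for this
# sketch; Galvin et al., 2009 give the exact contour definitions).
CONTOURS = {
    "Rising":         [0, 1, 2, 3, 4],
    "Falling":        [4, 3, 2, 1, 0],
    "Flat":           [0, 0, 0, 0, 0],
    "Flat-Rising":    [0, 0, 0, 1, 2],
    "Flat-Falling":   [2, 2, 2, 1, 0],
    "Rising-Flat":    [0, 1, 2, 2, 2],
    "Falling-Flat":   [2, 1, 0, 0, 0],
    "Rising-Falling": [0, 1, 2, 1, 0],
    "Falling-Rising": [2, 1, 0, 1, 2],
}

def contour_frequencies(name, spacing_semitones=1, root_hz=220.0):
    """Note frequencies (Hz) for one contour; each step spans spacing_semitones."""
    steps = np.array(CONTOURS[name]) * spacing_semitones
    return root_hz * 2.0 ** (steps / 12.0)

# Example: a "Rising" contour with 3-semitone spacing rooted at A3 (220 Hz)
print(np.round(contour_frequencies("Rising", 3), 1))  # [220. 261.6 311.1 370. 440.]
```

Widening the semitone spacing makes the contour easier to identify, which is what allows MCI to approximate a listener's pitch resolution.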
For musicians, the concepts and working models of pitch and melodies coincide with harmonicity, which is the combination of multiple pitches. Harmonicity relates to the intervals between the simultaneous notes, and there is a predictable pattern of pleasing and displeasing intervals and chords (three or more notes) that forms the basis of Western music. Thus, melodic pitch perception is not simply understood by successfully identifying direction (as in MCI or isolated pitch discrimination); it is important to maintain the proper frequency relationship of the two notes. Previous studies show CI users have little ability to perceive the interval relationships in a mistuned melody paradigm (Peretz et al., 2003). While this dissertation does not approach measuring the proper intervals for CI pitch, this remains the ultimate goal of CI processing and is likely related to harmonic pitch and fine-structure cues.

Introduction to the Following Experiments

In the following chapters, a number of experiments try to elucidate critical components of pitch perception in CIs. In Chapter 1, vocoders are again used as a model of CIs; however, instead of simply changing the number of channels to approximate spectral resolution, the spectral smearing across channels is manipulated. This simulates the channel interactions and neural spreads of excitation that are an issue in electric hearing. Pitch perception is measured via MCI as a function of channel interaction. This is used to qualify the cues available to CI users in a pitch task.

One way of adding the fine-structure cues for a CI user is to leverage residual acoustic hearing. This residual acoustic hearing, usually amplified with a hearing aid (HA), has been shown to improve certain aspects of speech perception and music perception (Dorman et al., 2008). In Chapter 2, bimodal CI users, those with a CI in one ear and a HA in the opposite ear, were tested for speech understanding in noise and pitch perception with and without competing instruments. The devices were tested alone and together for a total of three hearing mode conditions: CI, HA, and CI+HA. While both HAs and CIs are auditory assist devices, they work in fundamentally different ways. Comparing the cues provided by a CI and a HA with the performance obtained with a particular device in speech and music tasks can help identify which cues are necessary for speech and music. While bimodal listening has been shown to improve speech and music, the experimental conditions and stimuli were typically very different between speech and music tasks, and these differences may obscure the contributions of either device during combined device use.

In Chapter 3, an acoustic stimuli database, called the Sung Speech Corpus (SSC), was created containing varying speech and pitch information. It is hypothesized that varying the spectral envelope patterns, by singing different words, would influence pitch perception, and that varying the pitch of the words would influence speech perception. Chapter 3 details the creation of this database and shows NH performance across a number of fixed and variable speech and melodic pitch conditions. Subjects were divided into musician and non-musician groups to test the influence of musical experience and training on performance.

In order to optimize bimodal mapping for a subject across speech and music tasks, there needs to be adequate testing of combined device performance in tasks that are sensitive to pitch and speech cues. In Chapter 4, bimodal listeners were tested for speech understanding and pitch perception with the SSC stimuli. Again, comparing the cues transmitted by HAs and CIs with performance reveals which cues are necessary for speech and pitch perception and how they are confounded by certain manipulations of the acoustic stimuli. These experiments reveal critical components of melodic pitch perception and point to future avenues of research related to pitch perception with CIs.

Chapter 1 - Examining the Effect of Channel Interaction on Pitch Perception in Simulated Cochlear Implants

The following is an executive summary of the publication titled "Channel interaction limits melodic pitch perception in simulated cochlear implants," published in the Journal of the Acoustical Society of America - Express Letters in November 2012. The final, formatted manuscript is provided in Appendix A.

Introduction

Most CI users can understand speech very well in quiet listening environments, but they have difficulty with speech perception in noisy backgrounds. CI performance is also poor for music perception, relative to NH listeners. It is thought that poor spectral resolution is a primary cause of poorer performance in difficult music tasks (Shannon et al., 2004; McDermott 2004). In CIs, spectral resolution is limited by the number of physically implanted electrodes (or virtual channels in the case of current steering) as well as by overlapping neural spreads of excitation, also referred to as channel interaction. While the effect of spectral resolution and channel interaction has been examined for speech perception in quiet and in noise (Friesen et al., 2001; Fu and Nogaki, 2005), less is known about the effects of channel interaction on melodic pitch perception. In this study, MCI performance was examined for NH subjects listening to a novel 16-channel sinewave vocoder that simulated different amounts of channel interaction. It was hypothesized that increasing amounts of channel interaction would lead to a decrease in melodic pitch perception.

Methods

Figure 1.1 shows a schematic representation of implementing channel interactions in simulated CIs. As in typical vocoders, the acoustic stimulus was fed into a bank of 16 bandpass filters with center frequencies corresponding to the frequency-to-place map of Greenwood (1990). The slowly varying envelope energy was extracted in each band, and this band-specific temporal envelope was used to amplitude modulate a sinewave carrier at the center frequency of each band. Differing degrees of channel interaction were simulated by adding variable amounts of the temporal envelope from one analysis band to other bands. The amount of envelope information added to adjacent bands depended on both the targeted degree of channel interaction and the frequency distance between bands. The corresponding output filter slopes were 24 ("slight"), 12 ("moderate"), and 6 ("severe") dB/octave.
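The sketch below illustrates this channel-interaction stage: each band's envelope leaks into every other band with a gain set by the output filter slope (dB/octave) and the distance between band center frequencies (octaves). The Greenwood-map evaluation at equally spaced cochlear positions and the mixing-matrix formulation are assumptions made to keep the example compact; the published implementation is given in Appendix A.

```python
import numpy as np

def greenwood_cf(n_channels=16, A=165.4, a=2.1, k=0.88):
    """Greenwood (1990) frequency-to-place map, evaluated at equally spaced
    relative cochlear positions (an assumption for this sketch)."""
    x = np.linspace(0.05, 0.95, n_channels)   # relative place, apex to base
    return A * (10.0 ** (a * x) - k)

def smear_envelopes(envelopes, center_freqs, slope_db_per_oct=24.0):
    """Add each band's temporal envelope to every other band, attenuated by
    slope_db_per_oct per octave of separation (24 = slight, 12 = moderate,
    6 = severe channel interaction)."""
    env = np.asarray(envelopes, dtype=float)      # shape: (n_channels, n_samples)
    cf = np.asarray(center_freqs, dtype=float)
    octaves = np.abs(np.log2(cf[:, None] / cf[None, :]))
    gains = 10.0 ** (-slope_db_per_oct * octaves / 20.0)   # the k_i gains in the text
    return gains @ env                            # mix envelopes across bands
```

The smeared envelopes would then modulate the sinewave carriers exactly as in the basic vocoder sketch in the General Introduction; a shallower slope (fewer dB/octave) spreads each envelope more broadly and flattens the spectral envelope of the output.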
Figure 1.1. Schematic signal processing for the acoustic CI simulation with channel interaction. Temporal envelope information extracted from an analysis band (only one band is shown) is added to other bands with a gain of k_i, which corresponds to the filter slope in dB/octave. The sinewave carriers were at the center frequencies of the frequency analysis bands.

Figure 1.2 shows frequency analyses for A3 (the lowest note in any contour), B3 (two semitones higher than A3), and C#4 (two semitones higher than B3) for the original, unprocessed stimuli and for the CI simulations with no, slight, and severe channel interaction. As the amount of channel interaction increased, there was a corresponding decrease in spectral envelope depth: in Figure 1.2, the difference across notes becomes smaller and smaller as the amount of channel interaction increases. Also from Figure 1.2, the unprocessed stimuli show the place coding and harmonic relationships, while the processed stimuli show the fixed carrier nature of CI processing.

Figure 1.2. Frequency analysis of experimental stimuli. From left to right, the columns indicate different notes: A3 (220 Hz), B3 (247 Hz), and C#4 (277 Hz); amplitude is shown in dBFS as a function of frequency (kHz). The top row shows the unprocessed signal. Rows 2, 3, and 4 show stimuli processed by the CI simulations with no, slight, and severe channel interaction, respectively.

Results

All subjects scored greater than 90% correct for the unprocessed stimuli. Figure 1.3 shows average MCI performance as a function of semitone spacing (left panel) and as a function of channel interaction averaged across semitone spacings (right panel). MCI performance decreased as the amount of channel interaction increased.

Figure 1.3. Mean MCI performance with the CI simulation as a function of semitone spacing (left panel) or the degree of channel interaction (right panel; performance averaged across semitone spacing conditions). The error bars indicate the standard error. The dashed line in the right panel shows mean CI performance from Zhu et al. (2011).

Discussion

As hypothesized, increasing amounts of channel interaction negatively affected melodic pitch perception. As shown in Figure 1.2, CI processing removes the fine-structure cues and harmonic relationships inherent in the original stimuli, and channel interaction further weakens the remaining spectral envelope cues available to the listener. These results suggest that reducing channel interaction may be as important a goal as increasing the number of stimulation sites. In this study, the amount of channel interaction was constant across subjects and across channels for a particular condition. For actual CI users, channel interaction might vary greatly across individuals and across electrode locations. Mean performance for CI users is shown in Figure 1.3 and comes from Zhu et al. (2011), in which the authors measured MCI performance using the same piano stimuli as in this study. Interestingly, actual CI performance was most comparable to the Slight channel interaction condition.
Note that the NH subjects who participated had no prior experience listening to vocoded sounds, while the CI subjects had years of experience with electric hearing. Thus, NH performance would probably increase with training and experience. The present results may also explain some of the variability seen in CI music studies, with some subjects having little to no channel interaction and other subjects having severe channel interaction across electrodes. There are other factors that might influence melodic pitch perception for CI users, such as frequency allocation, nerve survival, and music experience.

Conclusion

The results of Chapter 1 illustrate the difference in pitch processing between acoustic hearing and CIs. CIs have a fixed place of stimulation and no fine structure cues; thus, a CI user must weigh the changes in spectral envelope to judge the pitch of a particular stimulus. In normal hearing, the harmonic relationships present within the acoustic signal strongly influence the perception of pitch. This understanding serves as the foundation for Chapter 2, which is concerned with pitch perception in CIs when fine-structure cues are reintroduced, as with a HA.

Chapter 2 - Speech and Music Perception with Cochlear Implants and Hearing Aids

The following is an executive summary of the publication titled "Contributions of Electric and Acoustic Hearing to Bimodal Speech and Music Perception," published in the journal Public Library of Science One (PLOS ONE) in March 2015. The final, formatted manuscript is provided in Appendix B.

Introduction

In the previous study, it was shown that CIs do not provide the necessary cues for complex pitch perception. CIs transmit temporal envelope information over a broad spectral pattern, and while these cues may be used in a pitch ranking task, they are clearly not as robust as pitch processing for NH listeners. HAs provide fine spectral cues over a limited frequency range that depends on the residual acoustic hearing of an individual. These limits on frequency information eventually affect speech perception, and a HA user might opt for a CI for good speech comprehension. But the residual acoustic hearing can enable good pitch processing because it activates the normal hearing auditory system. HAs worn in conjunction with a CI have been shown to improve the perception of speech in noise (Kong et al., 2005; Kong and Carlyon, 2007; Brown and Bacon, 2009; Dorman and Gifford, 2010), but less is known about melodic pitch perception with combined device use. Kong et al. (2005) found that HA performance was better than CI performance for familiar melody identification; CI+HA performance relative to HA performance was mixed across subjects, with some doing better with the HA only and some doing better with CI+HA. Similar results were found in Dorman et al. (2008). Looi et al. (2008) and El Fata et al. (2009) found that the use of HAs with CIs improved the overall perceived quality of sound compared to a CI only. More complex music testing, such as MCI with a competing instrument, has not been fully examined. These previous studies have shown widely ranging contributions of acoustic hearing and electric hearing to speech and music perception. In this study, speech and music perception were measured with the CI only, the HA only, and CI+HA together.
It was hypothesized that combined device use would allow for better segregation of speech from noise and better segregation of a target melody from a masking instrument. Speech reception thresholds (SRTs) were measured in multi-talker babble. MCI performance was measured with no competing instrument and in two conditions with a competing instrument; one masker had a fundamental frequency range that overlapped with the target stimuli, and one masker had fundamental frequencies well above the target stimuli F0s.

Methods

Speech understanding in noise was measured using simple sentences (as in Nilsson et al., 1994) presented in multi-talker babble. The intensity of the background noise was varied to quantify SRTs, which are defined as the signal-to-noise ratio (SNR) that gives 50% of words correctly identified in sentences. During the task, if the subject correctly identified half or more of the words in a sentence, the background level was increased, and thus the SNR decreased. If a subject did not reach 50% correct, the background noise was decreased, giving a higher SNR. Thus, lower SRTs indicated better performance.
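In other words, the procedure is a one-down/one-up adaptive track on the SNR. The sketch below illustrates that logic; the starting SNR, step size, trial count, and reversal-averaging rule are illustrative assumptions rather than the exact parameters used in the study.

```python
def measure_srt(run_trial, start_snr=20.0, step_db=2.0, max_trials=20, n_reversals_avg=6):
    """One-down/one-up SRT track. run_trial(snr) must return True when the
    listener identifies at least half of the words in the sentence."""
    snr = start_snr
    last_correct = None
    reversal_snrs = []
    for _ in range(max_trials):
        correct = run_trial(snr)
        if last_correct is not None and correct != last_correct:
            reversal_snrs.append(snr)              # track direction changed: a reversal
        last_correct = correct
        snr += -step_db if correct else step_db    # harder when correct, easier when not
    # Estimate the SRT (SNR for 50% words correct) from the final reversals.
    tail = reversal_snrs[-n_reversals_avg:]
    return sum(tail) / len(tail) if tail else snr
```

The track converges on the SNR at which the listener gets half the words right about half the time, which is why a subject who cannot reach 50% correct even in quiet has no measurable SRT.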
MCI was tested with the same unprocessed piano stimuli as in the previous study; however, there were two additional masker conditions. The A3 Masker had frequencies that overlapped with the F0s of the target stimuli; the A6 Masker had an F0 that did not overlap with the F0s of the target contour. Figure 2.1 shows the "rising" contour with 1- and 3-semitone spacing. The leftmost panels show the original spectrograms, the middle panels show a simulated HA spectrogram (with a sharp audibility cutoff around 500 Hz), and the rightmost panels show a typical CI output across electrode channels (referred to as an electrodogram). Again, CIs provide no fine-structure or harmonic cues, while HAs provide little of the high frequency information necessary for consonant and word perception. Differences in the stimulation pattern across notes can be seen in the 3-semitone spacing condition, while the changes are less distinct in the 1-semitone spacing condition.

Figure 2.1. Spectrograms and electrodograms for the No Masker condition for 1- and 3-semitone spacings. The far left panel shows a schematic representation of HA and CI frequency ranges. The target contour is shown in black. The middle two panels show a spectral representation of the original stimuli (left) and simulated HA output (right). A steeply sloping hearing loss was simulated using AngelSim and is intended for illustrative purposes only. The far right panel shows an idealized electrodogram representing the electrical stimulation patterns for a CI. Electrodograms were simulated using default stimulation parameters for the Cochlear Freedom and Nucleus-24 devices: 900 Hz/channel stimulation rate, 8 maxima, frequency allocation Table 6, etc.

Figure 2.2 shows the "rising" contour with 3-semitone spacing for the A3 (overlapping) Masker and the A6 (non-overlapping) Masker. Figure 2.2 is similar to Figure 2.1; the left panel shows the original spectrogram, the middle panel shows a typical HA spectrogram, and the right panel shows a typical CI electrodogram. The A6 Masker was targeted to be audible in the CI only, but this depended on the audibility thresholds for a particular subject. The overlapping masker can be seen at the bottom of the A3 original spectrogram, and the non-overlapping masker can be seen above the changes in F0 in the A6 original spectrogram. The HA spectrograms show that only the A3 Masker is represented in the simulated HA output. The electrodograms for the two masker conditions are distinct from the no masker condition (compare the electrodograms in Figure 2.2 with those in the bottom panel of Figure 2.1).

Figure 2.2. Spectrograms and electrodograms for the A3 and A6 Masker conditions. The top half of the figure shows (from left to right) a schematic representation of the test condition in relation to the frequency ranges of the HA and the CI, a spectrogram of the original stimuli, a spectrogram of the simulated HA output, and an idealized electrodogram for the A3 Masker condition; the bottom half shows the same information for the A6 Masker condition. Figure details are similar to those of Figure 2.1. The target instrument notes are shown in black and the masking instrument notes are shown in gray.

Results

Figure 2.3 shows the MCI performance for individual subjects across the three masker conditions. Data in Figure 2.3 were averaged across the 1-, 2-, and 3-semitone spacings, as in the right panel of Figure 1.3. For all masker conditions, MCI scores were greatest for HA-only and worst for CI-only. CI+HA scores, relative to HA-only scores, were variable across subjects; some subjects seemed to focus on the better ear (CI+HA ≈ HA), others experienced perceptual interference (CI+HA < HA), and some of the poorest performing subjects seemed to experience perceptual integration (CI+HA > HA). CI-only was significantly worse than HA-only and CI+HA. Performance with the A3 Masker was significantly worse than with the No Masker condition.
In general, HAs provided the bulk of the melodic pitch performance, and the addition of a CI did not affect MCI performance in a consistent way.

Figure 2.3. MCI performance for individual subjects (S1–S8) across hearing devices and masker conditions. CI-only performance is shown by the black bars, HA-only performance is shown by the white bars, and CI+HA performance is shown by the gray bars. Mean performance is shown at the far right within each masker condition; error bars indicate standard error. MCI with No Masker is shown in the top panel, MCI with the overlapping A3 Masker is shown in the middle panel, and MCI with the non-overlapping A6 Masker is shown in the bottom panel.

Figure 2.4 shows the SRTs measured for the individual subjects across the device conditions. Some SRTs could not be obtained because subject performance was less than 50%; if a subject could never identify more than half the words even in quiet, then there was no way to complete the adaptive procedure. In general, speech performance was much greater with the CI than with the HA alone, and CI+HA scores were often slightly better than CI-only scores. There was no statistically significant improvement in SRTs from CI to CI+HA; however, this is likely due to the limited number of subjects and high across-subject variability.

Figure 2.4. Speech-in-noise results for individual subjects across hearing devices. CI-only SRTs are shown by the black bars, HA-only SRTs are shown by the white bars, and CI+HA SRTs are shown by the gray bars. Mean performance is shown at the far right; error bars indicate standard error. Asterisks indicate that SRTs could not be measured for that condition. Bars closer to the top of the graph indicate better performance.

Discussion

This study showed that the contributions of acoustic and electric hearing to bimodal speech and music performance were both subject-dependent and task-dependent. As in previous studies, CIs contributed mostly to speech perception while adding little to melodic pitch perception, whereas the opposite was true for HAs: HAs were good for pitch perception but did not provide good speech perception. Bimodal music perception was driven largely by the HA; adding a CI did little to improve MCI scores, and for a few subjects (e.g., S1 and S7), CI+HA scores were considerably lower than HA-only scores. Kong et al. (2005) found that CI+HA scores were on average better than HA-only scores; the difference may be due to the different test paradigms, familiar melody identification vs. MCI. The effect of the masker was variable across subjects and test devices; some subjects (S4, S5) had similar performance across all three conditions, while many experienced a considerable drop in performance from the No Masker to the A3 Masker (e.g., S3). The A3 (overlapping) Masker produced statistically worse performance than either the No Masker or A6 Masker conditions. In general, the HA was more robust to increasing complexity as measured by MCI. This suggests that the fine-structure cues provided by the HA can help in segregating two melodies. Measured bimodal speech perception was consistent with previous studies, which showed a slight increase in speech perception with CI+HA compared to CI-only. For most subjects, the HA provided little speech information.
This is to be expected; if speech perception with a HA were good enough, that subject would likely not qualify to receive a CI. While these data showed that HA-only was better than CI+HA for the music tasks, real-world music listening is likely maximized when using both devices. For example, music contains complex timbres, melodies, harmonies, and rhythmic patterns, as well as large dynamic ranges and frequency ranges. The frequency range of many instruments might be well outside the residual acoustic hearing range (e.g., cymbals, percussion instruments). And perhaps more importantly, much of pop music contains sung melodies, i.e., lyrics, whose lexical information might be better conveyed by a CI.

Conclusion

While CIs are better for providing the necessary cues for speech perception, HAs are indeed better for certain aspects of music perception, such as low-frequency pitch perception. This is influenced by the audibility and resolution of one's residual acoustic hearing. When comparing the cues provided by the individual devices (as seen in Figures 2.1, 2.2, and 1.2), it becomes apparent that while CI users may be able to use remaining auditory cues such as the spectral envelope, these cues are not robust to typical changes and complexities seen in everyday music. The use of a HA might restore some of the missing cues, but it is still unclear how the two devices interact in complex speech and music, as the stimuli and test paradigms were different when testing speech and music.

Chapter 3 – Development and Testing of the Sung Speech Corpus with Normal Hearing Musicians and Non-musicians

The following is an executive summary of the publication titled "Melodic contour identification and speech recognition using sung speech," published in the Journal of the Acoustical Society of America - Express Letters in September 2015. The final, formatted manuscript is provided in Appendix C.

Introduction

In the previous study, bimodal CI users (those who wear a HA in the opposite ear) were tested for speech and music perception. The results indicated that CIs deliver good speech cues while HAs deliver good low-frequency pitch cues. However, the very different test methods and stimuli might have confounded these results; a test subject may focus on the better ear for a particular task, and this may obscure the interaction between the two devices. To address these concerns, we created an acoustic stimuli database featuring variable pitch and speech information for future testing with bimodal subjects. In this chapter, we refer to different words as different timbres because vowels have different spectral envelopes which distinguish them, and spectral envelope is a key component of timbre (Macherey and Delpierre, 2013). This database allows for various test conditions of fixed or variable timbre and/or pitch. Future testing with bimodal CI and HA users could illuminate how pitch and timbre are confounded in either device and whether combined device use could disentangle these percepts. This chapter details the creation of the Sung Speech Corpus (SSC) and examines the effect of musicianship on melodic pitch perception and sentence recognition. It was hypothesized that a musicianship effect would be observed, especially as the tasks increased in complexity.

Methods

Two groups of subjects were recruited: musicians and non-musicians. Musicians were defined as those who regularly play a musical instrument, and non-musicians were defined as those who never had formal or informal music training.
Those who did not play an instrument regularly but did have some previous music experience (e.g., briefly learning to play an instrument, singing in a choir) were excluded. Thus, the subject pool represented the extremes of the music experience continuum.

The SSC consists of 50 monosyllabic words produced by an adult male. There are 5 categories ("name," "verb," "number," "color," and "article of clothing"), with 10 words within each category. Each word was sung over an octave range, F0s from A2 (110 Hz) to A3 (220 Hz), in discrete semitone steps. Natural speech utterances were also produced to compare the intelligibility of typical, spoken words to these sung words. The database allows one to create a five-word sentence that is simultaneously a five-note melody; word and sentence recognition can then be measured in addition to melodic pitch perception for these stimuli. The stimuli were post-processed to ensure the correct target pitch, consistent target duration (500 ms), and normalized amplitude.

Sentence recognition was measured using a matrix sentence test in which the subject is asked to correctly identify the target word for each of the five categories. There were four possible test conditions for sentence recognition: Spoken (containing only the natural speech utterances), Flat Contour (the same pitch across all words), Fixed Contour (the same contour, with changing pitch, across all words), and Random Contour (any of the nine possible contours). The Spoken and Flat Contour conditions were fixed pitch conditions, and the Fixed Contour and Random Contour conditions were variable pitch conditions. For scoring purposes, a subject was required to correctly identify all five of the words in the sentence for it to count as a successful trial. Thus, it would be possible to consistently get four out of the five words correct and end up with a 0% score. This scoring system served to expand the range of scores at the top end of performance: small improvements in successfully identifying words could yield large changes in the final score.

Melodic pitch perception was measured, as previously, with MCI. There were four possible test conditions for MCI: Piano (the same stimuli as in Chapters 1 and 2; in Chapter 2, this is the No Masker condition), Fixed Word (the same word across all notes of a contour across all trials), Fixed Sentence (the same sentence across all trials, but different words across a contour), and Random Sentence (a different sentence for each trial). The Piano and Fixed Word conditions were fixed timbre conditions, while the Fixed Sentence and Random Sentence conditions were variable timbre conditions.
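To illustrate how a single SSC trial can be assembled, the sketch below draws one word from each category and assigns each word the F0 required by the chosen contour, and includes the all-or-nothing sentence scoring described above. The short word lists are example entries taken from sentences quoted later in this dissertation ("Kate moves two red socks," "Mark loans five gold ties," "pink"); the full 10-word categories are documented in Appendix C.

```python
import random

# Example entries only; each SSC category actually contains 10 words (Appendix C).
CATEGORIES = {
    "name":     ["Kate", "Mark"],
    "verb":     ["moves", "loans"],
    "number":   ["two", "five"],
    "color":    ["red", "gold", "pink"],
    "clothing": ["socks", "ties"],
}

A2_HZ = 110.0  # lowest sung F0 in the corpus

def make_trial(contour_steps, spacing=1, root_hz=A2_HZ):
    """Build a five-word sentence that is simultaneously a five-note melody.
    contour_steps: semitone-step pattern, e.g. [0, 1, 2, 3, 4] for a rising contour."""
    words = [random.choice(CATEGORIES[c])
             for c in ("name", "verb", "number", "color", "clothing")]
    f0s = [root_hz * 2.0 ** (step * spacing / 12.0) for step in contour_steps]
    return list(zip(words, f0s))   # e.g. [('Mark', 110.0), ('loans', 123.5), ...]

def score_sentence(response_words, target_words):
    """All-or-nothing scoring: the trial is correct only if all five words match."""
    return int(list(response_words) == list(target_words))
```

Holding the word fixed while varying the contour yields a Fixed Word MCI trial, while drawing new words per trial yields a Random Sentence trial; the same machinery, with the contour fixed or randomized, generates the sentence recognition conditions.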
Results

Figure 3.1 shows the sentence recognition (top panels) and MCI (bottom panels) scores for the different test conditions. For sentence recognition, nearly all subjects scored 100%; there was no effect of musicianship and no effect of the different conditions. For MCI, musicians consistently outperformed non-musicians for all test conditions and scored nearly 100% in each task. For non-musicians, performance was much worse and more variable: performance was similar between the Piano and Fixed Word (fixed timbre) conditions, and similar between the Fixed Sentence and Random Sentence (variable timbre) conditions, and performance in the two fixed timbre conditions was significantly greater than in the variable timbre conditions.

Figure 3.1. Box plots for sentence recognition and MCI scores for musicians and non-musicians. Results for sentence recognition (Spoken, Flat Contour, Fixed Contour, and Random Contour) are shown in the top panels, and MCI results (Piano, Fixed Word, Fixed Sentence, and Random Sentence) are shown in the bottom panels. Musicians are labeled as M and non-musicians as NM. Each panel shows data for a different test condition. The boxes show the 25th and 75th percentiles, the error bars show the 10th and 90th percentiles, the solid line shows the median, the dashed line shows the mean, and the symbols show outliers.

Discussion

There was no observed musician benefit for speech recognition; this difference might have been obscured by ceiling effects, as nearly every subject scored 100% correct for a given speech condition. There was a strong musician benefit across all MCI test conditions. The observed difference in performance for even the simple music stimuli (Piano and Fixed Word) might have arisen from our subject exclusion criteria: people with essentially no music training were compared to musicians with decades of music experience. For the non-musicians, variations in timbre significantly affected MCI performance. The hypothesis that musicianship gives a larger advantage as the stimuli increase in complexity was supported.

Conclusion

The SSC may be a useful tool in evaluating speech perception with variable pitch and melodic pitch perception with variable timbre (i.e., different words). Because a changing spectral envelope is well conveyed by a CI and changing pitch is well conveyed by a HA, the SSC may give greater insight into combined device performance in variable pitch or timbre tasks than previous experimental methods.

Chapter 4 – Sung Speech EAS

The following data are unpublished. They are targeted for submission to Trends in Hearing.

Introduction

Chapter 1 used NH subjects listening to a vocoder simulation of a CI to argue that CI users primarily use spectral envelope cues to perform a melodic pitch task (MCI); when the place of stimulation is fixed and no fine structure cues are present, the only cue available is a global spectral envelope ranking, most likely based on the spectral centroid or some spectral weighting. One major implication of this finding is that changing the spectral envelope of a stimulus could be confused with a change in the pitch associated with that stimulus; different vowels have different spectral envelopes, so it is possible that two different vowels would also be perceived as having two different pitches with a CI. Chapter 2 showed that hearing aids (HAs) conveyed low-frequency pitch information quite well; these abilities are related to the audibility and spectral resolution of the acoustic ear (Zhang et al., 2013). Surprisingly, even though HAs provide little speech information alone, the combined use of HAs with CIs has been shown to improve speech perception relative to the CI alone. Thus, electro-acoustic stimulation (EAS) users, those who use acoustic hearing in conjunction with their CI, may give a window into the similarities and differences between speech and music (in this case, pitch) perception. In general, CIs provide the coarse spectral resolution and adequate sub-band temporal processing necessary for speech, and HAs provide low-frequency pitch information.
However, it is difficult to observe how acoustic and electric hearing are combined, as previous testing for speech and music used very different stimuli (simple sentences in background noise vs. MIDI-synthesized piano notes) and test methods (speech reception thresholds vs. MCI). In this study, bimodal subjects (a particular type of EAS subject: those who wear a CI in one ear and a HA in the opposite ear) were tested using the SSC. The SSC is made up of a matrix sentence test (5 categories with 10 words each) with each word sung at different pitches, so that an experimenter can construct a five-word sentence that is simultaneously a five-note melody. This allows for testing speech perception with constant and/or variable pitch and for testing pitch perception with constant and/or variable speech information (akin to different timbres). Chapter 3 documented the rationale and utility of the test database and showed NH speech and music perception results for a number of manipulations within the database. The results of the NH listeners with music experience showed that pitch perception was independent of spectral envelope (or, in this particular case, different words). We hypothesized that, similar to the results of Chapter 2, the HA alone would provide better pitch perception than the CI alone, and the CI alone would provide better speech understanding than the HA alone. Unlike the results of Chapter 2, we hypothesized that combined device use (CI+HA) would be better than either ear alone, because we explicitly varied the cues that are well represented by both devices. And we hypothesized that the bimodal benefit, the improvement with CI+HA relative to the better ear, would increase as the test condition becomes more complex, as access to the cues transmitted by a particular device becomes more important when multiple acoustic parameters are varied.

Methods

Subjects were tested with the CI only, the HA only, and both CI+HA. Subjects used their everyday clinical devices and settings throughout testing. All stimuli were presented at 65 dBA in a sound-treated booth. Testing was similar to that in Chapter 2 except for the use of the SSC stimuli as in Chapter 3.

Two pitch or music conditions were tested using the SSC stimuli: Fixed Word and Random Sentence. The Fixed Sentence condition was not tested because performance was very similar to Random Sentence (Chapter 3), and the Piano condition was not re-tested because those data appear in Chapter 2. The two conditions probe the effect of variable timbre on melodic pitch perception. If timbre is indeed confounded with pitch in CIs, there should be a noticeable performance drop from fixed timbre to variable timbre. Figure 4.1 shows the original spectrogram, a simulated HA spectrogram, and a simulated CI electrodogram for example stimuli in the MCI conditions, similar to Figures 2.1 and 2.2. For the Fixed Word condition, the full bandwidth spectrogram shows the F0 and its associated harmonics; the harmonic content is drastically reduced for the simulated HA, while no discernible fine-structure cues are present in the electrodogram. In the electrodogram, the spectral weighting changes very little from the lowest F0 production of "pink" to the highest F0. Thus it seems that the broad spectral envelope does not change when a word is produced at different pitches, indicating that there is little difference in the formant structure across changes in pitch.
In the Random Sentence condition, the electrodogram reveals a change in spectral envelope across different words. For example, look closely at the energy present around channels 13 through 16; "Five" and "Ties" have more energy in these bands than "Loans" and "Gold," as the hard "i" has higher-frequency formants than the hard "o." This would likely cause pitch ranking to be distorted if global spectral envelope cues were used to make a pitch judgment. The HA spectrograms show that the fine-structure spectral cues are conveyed by a HA, which would presumably allow for good MCI performance in the variable timbre condition.

Figure 4.1. Spectrograms and electrodograms for stimuli in the MCI task. The top panel shows the Fixed Word condition, and the bottom panel shows the Random Sentence condition; both are for a 3-semitone rising contour. The far left panels show a spectral representation of the original stimuli, and the middle panels show simulated HA output. A steeply sloping hearing loss was simulated using AngelSim and is intended for illustrative purposes only. The far right panels show an idealized electrodogram representing the electrical stimulation patterns for a CI.

Three speech conditions were tested using SSC stimuli: Spoken, Flat Contour, and Random Contour. The Fixed Contour condition was not tested, as Fixed Contour performance was very similar to Random Contour performance in Chapter 3. The three conditions show the influence of production (spoken vs. sung) on speech intelligibility, and the influence of steady or dynamic pitch (Flat Contour vs. Random Contour) on speech intelligibility. Figure 4.2 shows full-bandwidth spectrograms, simulated HA spectrograms, and a representative electrodogram for an example sentence in the Spoken, Flat Contour, and Random Contour conditions. In each example, the target sentence is "Kate moves two red socks." The HA spectrograms and electrodograms were simulated in the same manner as in Chapter 2.

Figure 4.2. Spectrograms and electrodograms for stimuli in the Sentence Recognition task. The natural speech utterances (Spoken) are shown in the top panel; sung speech with a constant pitch is shown in the middle panel (Flat Contour), and sung speech with a 3-semitone spacing rising contour is shown in the bottom panel (Rising Contour). The left panels show a spectral representation of the original stimuli, and the middle panels show simulated HA output. A steeply sloping hearing loss was simulated using AngelSim and is intended for illustrative purposes only. The far right panel shows an idealized electrodogram representing the electrical stimulation patterns for a CI.

In Figures 4.1 and 4.2, the electrodograms show that much of the temporal envelope and broad spectral weighting cues are transmitted by the CI.
The simulated HA spectrograms show the lack of high-frequency information but the presence of low-frequency pitch cues. Much consonant information is high frequency, and this would be lacking for a HA user with high-frequency hearing loss. Based on the CI electrodograms, the temporal envelope and spectral weighting for a particular word are similar across the speech conditions. However, it does appear that the vowels are longer in duration in the two sung conditions (Flat Contour and Random Contour) relative to the spoken stimuli. From the full-bandwidth spectrograms, normal speech features subtle frequency dynamics within a word, whereas the sung speech features a very stable frequency. The Rising Contour electrodogram is very similar to the Flat Contour electrodogram, indicating that changing the pitch of the words did not shift the spectral energy into different bands; thus, they would seem to be equally intelligible for speech comprehension.

Results

Figure 4.3 shows subject performance for the two MCI conditions. Data presented in Figure 4.3 have been averaged across the 1- to 3-semitone spacings. On average, HA-only performance was better than CI-only performance for both the Fixed Word and Random Sentence conditions; however, the difference was small when compared to the results of Chapter 2. CI+HA scores were the best on average, but again, the differences were small. CI+HA scores relative to either device alone were variable, with some subjects (e.g., C1, C4, C8) seeming to attend to the HA only (i.e., CI+HA = HA). C3 had better CI scores than HA scores and seemed to enjoy a bimodal benefit (CI+HA > CI) for the MCI tasks, while C7 and C9 showed no consistency in the best device. There was a large across-subject drop from Fixed Word to Random Sentences. A two-way repeated-measures (RM) ANOVA showed a significant main effect of test condition (i.e., Fixed Word vs. Random Sentences) [F(1,12) = 20.7, p = 0.004] but no main effect of hearing mode and no significant interaction.

Figure 4.3. MCI performance for individual subjects across hearing devices and timbre condition. CI-only performance is shown by the black bars, HA-only performance is shown by the white bars, and CI+HA performance is shown by the gray bars. Mean performance is shown at the far right for each timbre condition; error bars indicate standard error. MCI with Fixed Word is shown in the top panel, and MCI with Random Sentences is shown in the bottom panel.

Figure 4.4 shows the individual subject performance for each hearing mode across the three speech testing conditions. On average, CI performance was much greater than HA performance for all conditions, and CI+HA was usually greater than CI-only. Only subject C10 performed better with HA-only than CI-only for all conditions; C7 had HA performance similar to CI performance for both of the sung speech conditions but not for the spoken speech condition. For everyone else, speech performance was largely driven by the CI, and the addition of the HA improved performance further. There was a large drop in performance from Spoken to Flat Contour, but there was little to no drop from Flat Contour to Random Contour. A two-way RM ANOVA showed significant main effects of hearing mode [F(2,24) = 12.6, p = 0.001] and test condition [F(2,24) = 67.8, p < 0.001] as well as a significant interaction [F(4,24) = 3.1, p = 0.033].
Post-hoc paired t-tests with Bonferroni corrections showed significant differences between Spoken vs. Flat Contour (p < 0.001), Spoken vs. Random Contour (p < 0.001), CI+HA vs. HA (p < 0.001), and CI vs. HA (p = 0.024).

Figure 4.4. Sentence identification performance for individual subjects across hearing devices and speech condition. CI-only performance is shown by the black bars, HA-only performance is shown by the white bars, and CI+HA performance is shown by the gray bars. Mean performance is shown at the far right for each speech condition; error bars indicate standard error.

Figure 4.5 shows the average bimodal benefit, measured as CI+HA scores minus the better-ear (maximum of CI or HA) scores, for the Speech and Music tasks. The mean bimodal benefit (across subjects) was approximately 15 percentage points for the three Speech conditions and approximately 5 percentage points for the two Music conditions. The bimodal benefit for speech was analyzed using a two-way RM ANOVA, with hearing mode (bimodal, best single ear) and test condition (Spoken, Flat Contour, Random Contour) as factors. Results showed significant main effects of hearing mode [F(1,12) = 14.2, p = 0.009] and test condition [F(2,12) = 83.3, p < 0.001] but no significant interaction. A two-way RM ANOVA for the MCI tasks with CI+HA relative to the better ear showed a significant main effect of test condition [F(1,6) = 16.7, p = 0.006] but no main effect of hearing mode and no significant interaction. These results indicate that there is a substantial and statistically significant bimodal benefit for speech but little bimodal benefit for music performance as measured in this study.

Figure 4.5. Bimodal benefit for sung speech. The left bars show the bimodal benefit (CI+HA minus better-ear scores, in percentage points) for speech. The right bars show the bimodal benefit for music.

Discussion

The present data show differences in the perception of spoken versus sung speech, the effect of changing pitch on word and sentence identification, and the effect of changing timbre (in this paper, words) on pitch perception for subjects who wear a CI in one ear and a HA in the opposite ear. As in Chapter 2, our results show that CIs represent speech information quite well and HAs represent low-frequency pitch information quite well. However, previous studies have shown an inconsistent bimodal benefit that depended on the hearing mode and/or the test stimuli and paradigm (Mok et al., 2006; Gifford et al., 2007). Our results show that combined device use was best across conditions. However, the hypothesis that combined device use would become more important as the test conditions became more complex was not supported, as the bimodal benefit was similar across the speech conditions and similar across the melodic pitch conditions.

Speech Perception: Spoken vs. Sung and Constant Pitch vs. Changing Pitch

There was a large drop in performance from the Spoken speech condition to either of the sung speech conditions, Flat Contour or Random Contour. This indicates that normally produced speech is more intelligible than our Sung Speech stimuli.
In Chapter 3, NH listeners had little difficulty identifying sentences with either the Spoken or Sung Speech stimuli; here, however, the hearing-impaired listeners showed a significant deficit with Sung Speech relative to Spoken speech for every hearing mode. NH listeners, both musician and non-musician, experienced a ceiling effect with the sentence identification task and scored nearly 100% across the different speech conditions. No CI+HA users scored 100% in any of the conditions for any of the device configurations. Previous studies have shown that NH listeners are able to correctly identify words regardless of the native language or speaking style of the talker (Ji et al., 2014); CI listeners are much more susceptible to "atypical" speech (e.g., telephone speech, computer speech, fast speech, non-native talkers, etc.). Our results agree with previous results that show a decrease in speech recognition when the speech is altered from its normal representation. This confirms again that CI users struggle with speaker normalization; it seems as if CI users need a good match between the external, perceived auditory pattern and the internal pattern template. There were minimal differences in performance between the Flat Contour and Random Contour conditions. This suggests that the range of pitch variations tested did not negatively affect word identification; more likely, however, the pitch range was greatly compressed by the CI signal processing. The top panel of Figure 4.1 shows the word "pink" sung over a one-octave range; the spectral energy hardly changes from the lowest F0 (the leftmost word) to the highest F0 (the rightmost word), as observed in the CI electrodogram. It seems that changing the pitch does not affect the formants, at least not enough to shift energy into the next spectral band. The middle and bottom panels of Figure 4.2 show that there is little change in the electrodogram with the same word sung at different pitches. In Figure 4.2, the "two" is the same stimulus in both the Flat Contour and Rising Contour; "Kate" is 6 semitones lower and "Socks" is 6 semitones higher in pitch in the Rising Contour example. There seem to be small temporal changes but minimal spectral envelope changes, as observed in the electrodograms. The temporal changes are likely because each stimulus was recorded independently; thus, there were small differences between the same words sung at different pitches. These small changes did not seem to affect performance with the NH listeners (Chapter 3) or the CI+HA users.

Pitch Perception: Fixed Word vs. Random Sentences

There was a significant drop in MCI performance from the Fixed Word condition to the Random Sentence condition, indicating that varying the word or timbre negatively affected pitch perception. This significance held for post-hoc comparisons with the CI, HA, and CI+HA. In the HA spectrograms in Figure 4.1, the fine-structure frequency changes were present and nearly identical in the Fixed Word and Random Sentence examples. However, in the electrodograms, there is a drastic difference in spectral information between the two examples. The different vowels in the Random Sentence example show up in the electrodogram. This would make it more difficult to correctly identify the target contour if a CI user were using the spectral envelope or spectral weighting to make a pitch judgment.
Subjects C4 and C8 did not experience a large drop in performance from Fixed Word to Random Sentences with their HAs or with CI+HA; these subjects had good residual acoustic hearing and were likely able to focus on the harmonic relationships and fine-structure cues to make correct pitch judgments. C3, C7, and C8 had good CI-only performance (>75%) for the Fixed Word condition, but their CI-only scores dropped dramatically when tested with Random Sentences. These results have clear implications for testing pitch perception with hearing-impaired listeners; musical pitch is based primarily on harmonics and fine-structure cues, and these are not well conveyed by a CI. CI users must use other cues, namely spectral envelope and weak temporal cues, to make pitch judgments. Thus, introducing some jitter in the spectral envelope would decrease pitch perception performance but might be more akin to real-world music listening.

Bimodal Benefit for Speech

From Figure 4.5, a large and significant bimodal benefit was observed (~15 percentage points) in all three speech conditions. There was no significant interaction when comparing CI+HA to the better ear for speech; this can be seen in Figure 4.5, as the bimodal benefit is nearly identical for Spoken, Flat Contour, and Random Contour. The observed bimodal benefit was subject-dependent. In Figure 4.4, C3 and C4 show a ~10 percentage point bimodal benefit across all conditions, while subject C8 shows a larger bimodal benefit across conditions. C1 and C7 show a more nuanced bimodal benefit that is related to the particular test condition; for C1, there is a bimodal benefit in the Spoken and Flat Contour conditions, while there is no observed benefit for the Random Contour condition. C7 actually shows a performance drop from CI-only to CI+HA for the Spoken condition, while there is a small increase from CI-only to CI+HA for the two Sung Speech conditions. Subject C10 is the only subject whose HA performance was consistently greater than CI-only performance in the Sentence Identification task; the amount of bimodal benefit for this subject was similar for the Spoken and Flat Contour conditions but was reduced for the Random Contour condition. Taken together, these results indicate that while the HA provides very little speech information on its own, it can add to speech perception with a CI.

Bimodal Benefit for Music

There was no significant bimodal benefit for either of the MCI tasks. In Figure 4.5, the average bimodal benefit was ~5 percentage points, which was not large enough to be statistically significant relative to the better ear. For the Sung Speech stimuli, there was no significant difference observed between HA-only and CI-only performance, unlike in previous EAS music studies (Chapter 2; Kong et al., 2005). Interestingly, CI-only performance for the Fixed Word condition was better than CI-only performance with the Piano stimuli of Chapter 2. C1 scored about 25% for CI-only Piano but scored about 55% for CI-only Fixed Word; C7 scored about 25% with the Piano stimuli while she scored about 75% with the Fixed Word; C8 scored about 35% with the Piano but scored about 85% with the Fixed Word. However, these results are not universal; C3 scored nearly 100% with CI-only for the Piano but only about 80% with the Fixed Word, while C9 scored about 60% with the Piano but only about 45% with the Fixed Word. While both conditions are considered constant timbre, there are some differences between the stimuli that may account for some of these results.
The Piano stimuli featured F0s ranging from 220 Hz to 440 Hz, while the Fixed Word stimuli featured F0s ranging from 110 Hz to 220 Hz. Curiously, the low-frequency cutoff of CIs is generally around 200-300 Hz, meaning that the F0 information from the sung speech should not show up in the spectral channels of the CI except as a temporal modulation in the output bands. This would seem to indicate that Piano should be easier than Fixed Word for pitch perception; however, CIs are primarily geared toward speech perception, which may explain the better performance with speech than with piano stimuli.

Sung Speech Corpus vs. Previous Speech and Music Tasks

The current study presents the results of speech testing and music testing with novel stimuli. Previous experiments have tested speech perception with variable amounts of background noise (Chapter 2; Kong et al., 2005); in this study, rather than adding noise or multiple talkers, we explicitly manipulated the pitch of the words to examine the effect on speech identification. While there was no difference between the Flat Contour and Random Contour scores, this manipulation adds a level of complexity that mimics some of the pitch dynamics in normal speech (e.g., question vs. statement). For music testing via MCI, instead of adding a second instrument as in Chapter 2, we made the "instrument" more dynamic by changing the words that were sung. Thus, this study examines the effect of introducing variation into some of the acoustic cues present in the speech and music stimuli. There is a confounding of pitch and timbre in CIs, and the SSC can be used to examine this particular question. Therefore, this experiment allows another insight into the perception of speech and music with CIs and HAs.

Conclusion

CI users lack the pitch processing abilities of NH listeners. HAs can improve pitch perception when used in conjunction with a CI, but there are still limits to melodic pitch perception, especially when variability is introduced to the acoustic stimulus. The results indicate that pitch and timbre are confounded to some degree in CIs, though this does not affect speech perception as much as melodic pitch perception. By varying the spectral envelope, MCI performance decreases significantly for CI+HA users but not for musician NH subjects. This suggests that fine-structure and harmonic cues are critical for generalized complex pitch perception, and a goal for CI research should be the restoration of these cues. Also, when testing pitch perception with CI users, some jitter of acoustic cues needs to be introduced, as simple melody tasks likely overestimate CI users' pitch processing abilities.

General Discussion

Limitations of Pitch in Cochlear Implants

While CIs are good for speech perception, they are limited in their ability to convey complex pitch, even for a single musical instrument. Spectral resolution has been hypothesized to be a major reason for the lack of pitch processing abilities with a CI. In Chapter 1, CIs were simulated using a vocoder with differing amounts of channel interaction, and pitch perception was measured using MCI. The 16-channel, no-channel-interaction condition is somewhat of an ideal situation, and even in this condition, MCI performance was low (about 40%) with 1-semitone spacing. This suggests that CI processing fundamentally alters pitch perception, leaving it much degraded relative to normal acoustic hearing. Figure 1.2 shows the frequency content of the processed and unprocessed stimuli.
For the unprocessed stimuli, the fine-structure information and harmonic relationships are present; these cues are discarded in typical CI processing, and the processed stimuli do not contain the fine-structure cues and harmonic relationships due to the fixed place of stimulation. Performance with actual CI users was similar to NH performance with a vocoder, as seen in the dashed line in the right panel of Figure 1.3 and the CI-only scores for the No Masker condition in Figure 2.3. CIs do provide a coarse spectral envelope, which can be used to pitch rank stimuli; however, this spectral envelope can be weakened by increasing amounts of channel interaction, as seen in Figure 1.2. The results of Chapter 1 indicate that as channel interaction increases, MCI performance decreases alongside the decrease in spectral envelope depth. The right panels in Figure 2.1 show a typical CI output for the MIDI piano stimuli used in the MCI task. The rising pattern can be seen in the electrodogram for the 3-semitone spacing condition, but it is difficult to discern in the 1-semitone spacing condition. The fine-structure cues and harmonic relationships can be seen in the original and HA spectrograms, but they are not present in the CI electrodograms. The CI-only performance in Figure 2.3 suggests that CI users have some pitch perception ability, but this performance is much weaker than with the HA, which provides the fine-structure cues. CI users really struggle with pitch perception when the task and stimuli increase in complexity and more closely approximate real-world music listening conditions, such as polyphonic music. In Chapter 2, MCI was tested with one instrument and with two competing instruments, one in the same frequency range and one separated by a large F0 difference. The right panels of Figure 2.2 show the CI output for the two masker conditions. While the 3-semitone spacing condition shows a clear rising pattern for No Masker (bottom right panel of Figure 2.1), the rising pattern is obscured for the A3 Masker condition even with 3-semitone spacing; for the A6 Masker condition, the rising pattern is similar to the No Masker condition but is a little more diffuse. The CI electrodograms predict that CI-only performance might be similar between the No Masker and A6 Masker conditions and would be much worse with the A3 Masker. Actual performance followed this prediction remarkably well. The results in Figure 2.3 indicate that CI-only performance with the A6 Masker (bottom panel) is just a bit worse than with No Masker (top panel); A3 Masker CI-only performance is significantly lower and approaches chance level for many of the subjects. These results suggest that reliable pitch ranking with CIs is much more difficult with multiple instruments when there is a large degree of overlap in the frequency content of the stimuli. These results also suggest that CIs have extreme difficulty in segregating competing instruments; again, this is likely due to the lack of fine-structure cues and harmonic relationships in CI processing. CI pitch perception also worsens when the spectral envelopes are varied across notes. Chapter 3 details the development and evaluation of a novel stimulus set that contains both variable pitch and speech information. Different vowels have different spectral envelopes (Rosen et al., 1999); thus, a database of different words containing different vowels sung at different pitches allows an experimenter to measure pitch perception with fixed or variable spectral envelopes.
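To make the structure of such a test concrete, the following is a minimal sketch of how one trial could be assembled from a corpus of this kind; the word matrix entries, root F0, and contour definitions are illustrative assumptions and not the actual SSC implementation. Each note's F0 is simply the root F0 raised by the appropriate number of semitones (f = f0 * 2^(semitones/12)).

```python
# Illustrative sketch of assembling one melodic contour trial from a sung
# speech corpus. The word matrix, root F0, and contour shapes are assumptions
# for demonstration only; they do not reproduce the actual SSC stimuli.
import random

WORD_MATRIX = {                      # hypothetical subset of the 5 x 10 matrix
    "name":   ["Kate", "Mark"],
    "verb":   ["moves", "loans"],
    "number": ["two", "five"],
    "color":  ["red", "gold"],
    "object": ["socks", "ties"],
}
ROOT_F0 = 110.0                      # Hz; assumed lowest sung note (A2)
CONTOURS = {"rising": [0, 1, 2, 3, 4],
            "falling": [4, 3, 2, 1, 0],
            "rising-falling": [0, 2, 4, 2, 0]}

def note_f0(root_f0, semitones):
    """Equal-tempered frequency a given number of semitones above the root."""
    return root_f0 * 2.0 ** (semitones / 12.0)

def make_trial(contour="rising", spacing=2, fixed_word=None):
    """Return (words, F0s) for one five-note contour.
    fixed_word=None  -> Random Sentence (one word drawn per category);
    fixed_word="pink" -> Fixed Word (same token sung at every pitch)."""
    steps = CONTOURS[contour]
    f0s = [round(note_f0(ROOT_F0, s * spacing), 1) for s in steps]
    if fixed_word is not None:
        words = [fixed_word] * len(steps)
    else:
        words = [random.choice(WORD_MATRIX[c])
                 for c in ("name", "verb", "number", "color", "object")]
    return words, f0s

print(make_trial("rising", spacing=3))                    # Random Sentence
print(make_trial("rising", spacing=3, fixed_word="pink")) # Fixed Word
```

In the Fixed Word case only the F0 varies across the five notes, whereas in the Random Sentence case both the F0 and the spectral envelope (the word) vary, which is precisely the manipulation at issue here.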
The right panel of Figure 4.1 shows a typical CI output for the sung speech stimuli in two conditions: Fixed Word (same spectral envelope) and Random Sentence (varying spectral envelopes). In the Fixed Word condition, the spectral content is similar from the lowest-F0 "pink" to the highest-F0 "pink" (top right panel). This shows that the spectral envelope is similar across repetitions of the same word, which is expected because the same word contains the same vowel. This also shows that, for a fixed speaker or singer, the vowel formants do not change much as the target pitch is changed, at least with respect to the wide CI input filters. In the Random Sentence condition, the spectral content changes for different words with different vowels. Notice the dramatic difference in activation around electrodes 13-16 for the different words, especially "loans" vs. "five" and "gold" vs. "ties." Changing vowels changes the CI spectral envelope output, and this extra variation would presumably cause pitch perception to worsen if the spectral envelope is the primary cue for pitch ranking. As expected, CI-only performance in the Random Sentence condition was significantly worse relative to the Fixed Word condition. Collectively, these results suggest that pitch perception in CIs is fundamentally different from that in normal acoustic hearing. While CI users can pitch rank and pattern match melodies and melodic contours to some degree, there is a limit to this ability. Even in the easiest listening conditions, such as one simple, non-varying instrument (e.g., piano), pitch perception for CI users is much worse than for NH listeners for small F0 changes (e.g., 1 semitone). Pitch perception in CIs is further reduced when the complexity increases, such as with two instruments presented simultaneously or with one dynamic instrument (e.g., singing a melody with different words). Ultimately, the cues used by CI users to judge the pitch of a stimulus in isolation are not robust or invariant to typical manipulations seen in real-world listening conditions.

What Have We Learned about Pitch Perception with Cochlear Implants?

Pitch perception is limited in CIs, but CI users do have some ability to rank pitches and pattern match to familiar melodies and melodic contours. While CI users do not perform as well as NH listeners, they are capable of some pitch perception, depending on the task and stimuli. Sometimes, the performance is better than would be suggested by CI simulations. The results presented in Chapters 1 through 4 suggest that CI users primarily rely on spectral envelope or spectral weighting cues to perform a pitch task. In Chapter 1, as the spectral envelope depth or contrast was decreased, MCI performance decreased. In Chapter 2, as a second instrument was added, the target contours as observed in the spectral envelopes were not as apparent as with one instrument, and there was a corresponding drop in performance. Musicians maintain that the spectral envelope should not be used to rank the pitch of a stimulus, as evidenced in Figure 3.1; this confounding cue does, however, influence non-musicians in a pitch task. In Chapter 4, spectral envelopes were varied by changing the words while maintaining the target pitch; again, performance dropped relative to the fixed timbre condition. Pitch, as defined by NH musicians, is invariant to spectral envelope and is primarily based on fine-structure cues and harmonic relationships; this is not the case for CI users.
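One simple way to picture the spectral weighting cue described above is as a centroid over the channel-energy profile of each note. The sketch below is purely illustrative: the channel-energy vectors are invented (in practice they would be taken from an electrodogram), and the centroid is only one plausible stand-in for whatever weighting a listener actually applies. It shows why a vowel change can masquerade as a pitch change when only the envelope is available.

```python
# Illustrative sketch: rank notes by the centroid of their channel-energy
# profiles. The energy vectors below are invented for demonstration; real
# values would come from electrodogram frames for each sung note.
import numpy as np

def channel_centroid(energies):
    """Energy-weighted mean channel index (a crude spectral-centroid proxy)."""
    energies = np.asarray(energies, dtype=float)
    channels = np.arange(1, energies.size + 1)
    return float(np.sum(channels * energies) / np.sum(energies))

# Same vowel ("pink") at a low and a high F0: the profile barely moves.
pink_low  = [0.1, 0.9, 1.0, 0.7, 0.3, 0.1, 0.0, 0.0]
pink_high = [0.1, 0.8, 1.0, 0.8, 0.4, 0.1, 0.0, 0.0]

# Different vowels at the same F0: the profile shifts toward higher channels.
loans = [0.2, 1.0, 0.8, 0.3, 0.1, 0.0, 0.0, 0.0]
five  = [0.1, 0.4, 0.6, 0.8, 1.0, 0.6, 0.2, 0.1]

for label, vec in [("pink (low F0)", pink_low), ("pink (high F0)", pink_high),
                   ("loans", loans), ("five", five)]:
    print(f"{label:15s} centroid = {channel_centroid(vec):.2f}")

# A listener ranking "pitch" by centroid would hear little difference between
# the two "pink" tokens, but a large (spurious) pitch change from "loans" to
# "five", consistent with the Fixed Word vs. Random Sentence results above.
```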
CI users seem to be able to use other cues that are potentially unavailable to NH listeners via a vocoder simulation. While the results in Figure 1.3 indicate that ideal CI listeners would struggle with 1-semitone spacing in an MCI task, individual results in Figure 2.3 show that some CI users perform considerably better. For example, S3 scored nearly 100% with CI-only in the No Masker and A6 Masker conditions, indicating successful trials with the 1-semitone spacing. Interestingly, the Fixed Word condition in Figure 4.1 seems to show very little change in spectral envelope across F0; however, subjects were able to reach 55% performance on average. Moreover, Random Sentence CI-only performance was greater than chance level, though it approached the floor in some cases. These results suggest that some other pitch cues might be available to CI users. Because there is no fine-structure or harmonic information delivered by a CI, it stands to reason that CI users might adapt and learn to rely on the cues that are available, namely temporal repetition rate cues and spectral envelope cues. However, the use of these cues is likely idiosyncratic and subject to a number of confounds, such as the confounding of pitch and timbre. The available cues for melodic pitch perception with a CI are likely not robust to changes in temporal envelopes, spectral envelopes, and polyphonic stimuli. As such, it is likely that isolated MCI overestimates the musical pitch abilities of CI users. The pitch cues related to fine structure and harmonic relationships are the most salient for NH listeners and are robust to these typical manipulations. Thus, restoring these cues in CIs is likely to drastically improve pitch perception; their absence is probably also related to the poor sound quality reported with CIs.

Improvements to Pitch Perception for Cochlear Implant Users

Vocoders are used to simulate CI processing for NH listeners, and the spectral resolution of a vocoder can mimic the functional spectral resolution experienced by CI users by varying the number of channels (Shannon et al., 2004) or by varying the degree of channel interaction (a simplified sketch of such a simulation is given below). The results of Shannon et al. (2004) and the results of Chapter 1 show that increasing the number of independent channels, or increasing the independence of the channels, does indeed lead to an increase in pitch perception as measured by familiar melody identification or MCI. These changes to the device will not alter the fixed-carrier nature of CI processing; they will only improve the resolution of the device. As such, it would be expected that there will still be a limit to pitch perception with a CI. Currently, the best way to improve music and pitch perception in CI users seems to be adding acoustic hearing. Improving music perception with a CI through residual acoustic hearing, which has little to do with changes to electric hearing, may sound trivial, but the real-world impact is remarkable. While residual low-frequency acoustic hearing adds to speech perception, the results are much more dramatic for pitch perception. In Chapter 2 and Chapter 4, pitch perception was measured with the CI only, the HA only, and both devices together (CI+HA). MCI performance in nearly all tasks for almost all subjects was better with the HA alone than with the CI alone. Additionally, HA performance was much more robust to increases in complexity (e.g., adding a second instrument or varying the words sung in a melody), whereas CI performance fell apart.
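The sketch below illustrates the kind of sinewave vocoder with simulated channel interaction referred to above and described in Appendix A. It is a simplified, assumption-laden reading of that processing: log-spaced analysis bands stand in for the Greenwood-spaced filters, zero-phase Butterworth filters are used for convenience, and cross-channel envelope mixing is attenuated by a fixed slope in dB per octave of center-frequency separation. None of these details should be taken as the exact published implementation.

```python
# Minimal sketch of a sinewave vocoder with simulated channel interaction,
# in the spirit of the simulation described above (not the exact published code).
import numpy as np
from scipy.signal import butter, sosfiltfilt

def vocode(x, fs, n_ch=16, lo=188.0, hi=7938.0, slope_db_per_oct=12.0):
    edges = np.geomspace(lo, hi, n_ch + 1)              # log-spaced band edges
    centers = np.sqrt(edges[:-1] * edges[1:])           # geometric band centers
    env_lp = butter(4, 160.0 / (fs / 2), output="sos")  # 160-Hz envelope filter

    # Per-band temporal envelopes: bandpass -> half-wave rectify -> low-pass.
    envs = []
    for k in range(n_ch):
        bp = butter(4, [edges[k] / (fs / 2), edges[k + 1] / (fs / 2)],
                    btype="band", output="sos")
        band = sosfiltfilt(bp, x)
        envs.append(sosfiltfilt(env_lp, np.maximum(band, 0.0)))
    envs = np.array(envs)

    # Channel interaction: mix each band's envelope into its neighbors, with an
    # attenuation of slope_db_per_oct per octave of center-frequency distance.
    octave_dist = np.abs(np.log2(centers[:, None] / centers[None, :]))
    gains = 10.0 ** (-slope_db_per_oct * octave_dist / 20.0)
    smeared = gains @ envs

    # Re-synthesize with sinewave carriers at the band center frequencies.
    t = np.arange(len(x)) / fs
    y = sum(smeared[k] * np.sin(2 * np.pi * centers[k] * t) for k in range(n_ch))
    return y * (np.sqrt(np.mean(x ** 2)) / (np.sqrt(np.mean(y ** 2)) + 1e-12))

# Example: a 220-Hz complex tone processed with "moderate" channel interaction.
fs = 16000
t = np.arange(int(0.3 * fs)) / fs
tone = sum(np.sin(2 * np.pi * 220 * h * t) / h for h in range(1, 6))
out = vocode(tone, fs, slope_db_per_oct=12.0)
```

Lowering slope_db_per_oct smears the envelopes more heavily across channels, which is the manipulation that flattened the spectral envelope contrasts and degraded MCI performance in Chapter 1.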
In addition to a performance increase with HAs relative to CIs, subjects report anecdotally that the overall qualitative perception of sound is much better with both devices than with just the CI. This result is similar to the findings of Looi et al. (2007), which showed that users rated sound quality much higher with CI+HA than with the CI alone. Improving overall music and pitch perception, as well as sound quality, by using a HA with a CI is limited by a person's residual acoustic hearing. In Chapter 2 and Chapter 4, individual subject performance dropped as low-frequency audibility decreased. These results agree with previous studies (El Fata et al., 2009; Zhang et al., 2013) showing that bimodal performance is linked to the resolution and audibility of the residual acoustic hearing. Thus, there is a limit to leveraging residual acoustic hearing to improve pitch perception.

Restoring Harmonic Pitch in Cochlear Implants

Restoring pitch perception remains a holy grail of CI research. While the majority of CI research is related to speech perception, pitch perception has become an increasing focus. It is now known that pitch perception plays a critical role in real-life speech perception, whether for non-lexical information such as speaker identification and vocal emotion or for lexical information as in tonal languages. There is ongoing research related to the role of fine-structure cues in segregating speech from noise (Brown and Bacon, 2009; Swaminathan and Heinz, 2012). Pitch perception is critical for music perception, and post-lingually deafened CI users generally maintain that music does not sound as it used to. It is likely that limited CI pitch perception is directly related to the poor quality experienced with electric hearing, which results from its fixed-carrier nature and poor spatial selectivity. Harmonic pitch is the most salient pitch in NH listening and is robust to a number of stimulus manipulations. Harmonic relationships are discarded by CI processing, so it would make sense to try to restore these harmonic relationships directly. In general, pitch-related CI research and engineering solutions have not focused on this aspect. Obviously, this goal is very difficult and complex and is subject to a number of confounds. First, harmonic information and relationships need to go to the right place (Oxenham et al., 2004). In a unilateral CI user with no residual acoustic hearing, the pitch of an electrode changes over time to match the input frequency allocation (Reiss et al., 2007). It is unknown how the perceptual, cortical reorganization of pitch changes if there is some residual acoustic hearing and whether this reorganization affects the peripheral processing of pitch. Ongoing experiments with single-sided deafness (SSD) patients may reveal the time course and frequency shift associated with this phenomenon. Current steering strategies, such as in Advanced Bionics (AB) devices (Firszt et al., 2007), can be used to create virtual channels and effectively change the place of stimulation. Theoretically, an SSD patient with an AB device could pitch-match individual electrodes and virtual channels to acoustic frequencies in the opposite ear. Harmonically related frequencies could be matched to places of activation, and the perception of these electric harmonics could be tested in some rigorous way. This serves as an introductory idea from which further experiments could be performed to shed light on harmonic pitch perception with CIs.
For example, how closely can the harmonics be spaced and still elicit harmonic pitch? How many harmonics can be presented? How does spread of excitation affect harmonic pitch? Is there an influence of temporal rate on the harmonic relationships (i.e., do place and timing information both have to go to the right place)? It is likely that more closely mimicking the typical place-to-frequency map (Greenwood, 1990) is required to restore harmonic pitch. Another complication is the fact that the electrode array of a CI does not reach the lowest-frequency region at the most apical end of the cochlea. This would limit the lower end of the harmonics that could be presented by a CI. It has been hypothesized that phantom-electrode strategies with AB devices can activate the more apical auditory nerve fibers. Might this extend the lower limit of frequencies presented by a CI? The Med-El electrode array has a deeper insertion and can more effectively elicit lower-frequency percepts relative to the typical CI electrode array; can explicit timing information related to CI pulse rate effectively restore low-frequency pitch information? So far, these novel engineering implementations have not shown clear improvements to performance; further study of the benefits they provide is critical. One alternative might be to use some low-frequency residual acoustic hearing to present the F0 and lowest-order harmonics while a CI provides the higher-order harmonics. This relates directly to bimodal mapping, which is becoming an increasingly important issue. As it stands now, CIs and HAs are mapped independently; it seems likely that the parameters related to frequency allocation and activation are critical to produce a natural, singular auditory sensation from both devices. This is a rich field for questions and answers that is likely critical for further improvements to CI processing. Ultimately, complex pitch is a multichannel percept, and CI research must move to multichannel psychophysics in order to understand real-world pitch perception.

References

Brown, C. A., and Bacon, S. P. (2009). "Achieving electric-acoustic benefit with a modulated tone," Ear Hear. 30, 489-493. Burns, E. M., and Viemeister, N. F. (1981). "Played-again SAM: Further observations on the pitch of amplitude-modulated noise," J. Acoust. Soc. Am. 70, 1655-1660. Chang, Y.-P., and Fu, Q.-J. (2006). "Effects of talker variability on vowel recognition in cochlear implants," J. Speech Lang. Hear. Res. 49, 1331-1341. Dorman, M. F., and Gifford, R. H. (2010). "Combining acoustic and electric stimulation in the service of speech recognition," Int. J. Audiol. 49, 912-919. Dorman, M. F., Gifford, R. H., Spahr, A. J., and McKarns, S. A. (2008). "The benefits of combining acoustic and electric stimulation for the recognition of speech, voice and melodies," Audiol. Neurotol. 13, 105-112. El Fata, F. E., James, C. J., Laborde, M.-L., and Fraysse, B. (2009). "How much residual hearing is 'useful' for music perception with cochlear implants?" Audiol. Neurotol. 14 (suppl 1), 14-21. Firszt, J. B., Koch, D. B., Downing, M., and Litvak, L. (2007). "Current steering creates additional pitch percepts in adult cochlear implant recipients," Otol. Neurotol. 28, 629-636. Friesen, L. M., Shannon, R. V., Baskent, D., and Wang, X. (2001). "Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants," J. Acoust. Soc. Am. 110, 1150-1163. Fu, Q.-J. (2002).
“Temporal processing and speech recognition in cochlear implant users,” Neuroreport 13, 1635-1639. Fu, Q.- J., and Nogaki, G. (2005). “Noise susceptibility of cochlear implant users: The role of spectral resolution and smearing,” J. Assoc. Res. Otolaryngol. 6, 19–27. Galvin J. J., 3rd, Fu Q.-J., and Shannon, R. V. (2009). “Melodic contour identification and music perception by cochlear implant users,” Ann. N.Y. Acad. Sci. 1169, 518–533. Gfeller, K., Christ, A., Knutson, J. F., Witt, S., Murray, K. T., and Tyler, R. S. (2000). “Musical backgrounds, listening habits, and aesthetic enjoyment of adult cochlear implant recipients,” J. Am. Acad. Audiol. 7, 390–406. Gfeller, K., Woodworth, G., Witt, S., Robin, D. A., Knutson, J. F. (1997). "Perception of rhythmic and sequential pitch patterns by normally hearing adults and adult cochlear implant users," Ear Hear. 18, 252–260. 49 Gifford, R. H., Dorman, M. F., McKarns, S. A., and Spahr, A. J. (2007). “Combined electric and contralateral acoustic hearing: word and sentence recognition with bimodal hearing,” J. Speech Lang. Hear. Res. 50, 835-843. Greenwood, D. (1990). “A cochlear frequency-position function for several species—29 years later,” J. Acoust. Soc. Am. 87, 2592–2605. Heinz, M. G., and Swaminathan, J. (2009). "Quantifying envelope and fine-structure coding in auditory nerve responses to chimaeric speech," J. Assoc. Res. Otolaryngol. 10, 407-423. Ji, C., Galvin J. J. 3rd, Chang, Y.-P., Xu, A., and Fu, Q.-J. (2014). “Perception of Speech Produced by Native and Nonnative Talkers by Listeners with Normal Hearing and Listeners with Cochlear Implants,” J. Speech Lang. Hear. Res. 57, 532-554. Kong, Y.-Y., and Carlyon, R. P. (2007). “Improved speech recognition in noise in simulated binaurally combined acoustic and electric stimulation,” J. Acoust. Soc. Am. 121, 3717-3727. Kong, Y.-Y., Cruz, R., Jones, J.A., and Zeng, F.-G. (2004). “Music perception with temporal cues in acoustic and electric hearing,” Ear Hear. 25, 173–85. Kong, Y.-Y., Stickney, G. S., and Zeng, F.-G. (2005). “Speech and melody recognition in binaurally combined acoustic and electric hearing,” J. Acoust. Soc. Am. 117, 1351-1361. Looi, V., McDermott, H., McKay, C., and Hickson, L. (2007). “Comparisons of quality ratings for music by cochlear implant and hearing aid users,” Ear Hear. 28, 59S-61S. Looi, V., McDermott, H., McKay, C., and Hickson, L. (2008). “Music perception of cochlear implant users compared with that of hearing aid users,” Ear Hear. 29, 1-14. Luo, X., Fu, Q.-J., and Galvin, J. (2007). “Vocal emotion recognition by normal-hearing listeners and cochlear implant users,” Trends Amplif. 11, 301–15. Macherey, O., and Delpierre, A. (2013). “Perception of musical timbre by cochlear implant listeners: a multidimensional scaling study,” Ear Hear. 34, 426–436. McDermott, H.J. (2004). “Music perception with cochlear implants: A review,” Trends Amplif. 8, 49-82. McDermott, J. H., Lehr, A. J., and Oxenham, A. J. (2008). “Is Relative Pitch Specific to Pitch?” Psych. Science. 19, 1263-1271. Mok, M., Grayden, D., Dowell, R. C., and Lawrence, D. (2006). “Speech perception for adults who use hearing aids in conjunction with cochlear implants in opposite ears,” J. Speech Lang. Hear. Res. 49, 338- 351. Oxenham, A. J. (2012). "Pitch perception," J. Neurosci. 32, 13335-13338. 50 Oxenham, A. J., Bernstein, J. G. W., and Penagos, H. (2004). “Correct tonotopic representation is necessary for complex pitch perception,” Proc. Natl. Acad. Sci. 101, 1421–1425. Peretz, I., Champod, A. 
S., and Hyde, K. (2003). “Varieties of musical disorders: the Montreal battery of amusia,” Ann. N.Y. Acad. Sci. 999, 58–75. Reiss, L. A., Turner, C. W., Erenberg, S. R., and Gantz, B. J. (2007). “Changes in pitch with a cochlear implant over time,” J. Ass. Res. Otolaryn. 8, 241-257. Rosen, S., Faulkner, A., and Wilkinson, L. (1999). “Adaptation by normal listeners to upward spectral shifts of speech: implications for cochlear implants,” J. Acoust. Soc. Am. 106, 3629-3636. Rubenstein JT. (2004). How Cochlear Implants Encode Speech. Current Opinions in Otolaryngogology and Head and Neck Surgery . 12, 444-448 Shannon, R.V., Fu, Q.J., Galvin, J. (2004). "The Number of channels required for speech recognition depends on the difficulty of the listening situation," Acta Otolaryngol (suppl) 552, 1-5. Shannon, R.V., F.-G. Zeng, V. Kamath, et al. (1995). "Speech recognition with primarily temporal cues," Science 270, 303–304. Swaminathan, J., and Heinz, M. G. (2011). "Predicted effects of sensorineural hearing loss on across- fiber envelope coding in the auditory nerve," J. Acoust. Soc. Am. 129, 4001-4013. Zeng, F.-G. (2002). “Temporal pitch in electric hearing,” Hear. Res. 174, 101–106. Zhang, T., Spahr, A. J., Dorman, M. F., and Saoji, A. (2013). “Relationship between auditory function of nonimplanted ears and bimodal benefit,” Ear Hear. 34, 133-141. Zhu, M., Chen, B., Galvin, J., and Fu, Q.-J. (2011). “Influence of pitch, timbre and timing cues on melodic contour identification with a competing masker,” J. Acoust. Soc. Am. 130, 3562-65. 51 Appendix A: “Channel Interaction Limits Melodic Pitch Perception in Simulated Cochlear Implants.” JASA-EL 2012 Joseph D. Crew, John J. Galvin III, and Qian-Jie Fu. (2012) “Channel Interaction Limits Melodic Pitch Perception in Simulated Cochlear Implants.” Journal of the Acoustical Society of America, Volume 132, EL429-EL435 Channelinteractionlimitsmelodic pitch perceptioninsimulated cochlearimplants JosephD.Crew a),b) DepartmentofBiomedicalEngineering,UniversityofSouthernCalifornia, LosAngeles,California90089 jcrew@hei.org JohnJ.GalvinIII DepartmentofCommunicationandAuditoryNeuroscience,HouseResearchInstitute, LosAngeles,California90057 jgalvin@hei.org Qian-Jie Fu b) DepartmentofBiomedicalEngineering,UniversityofSouthernCalifornia, LosAngeles,California90089 qfu@hei.org Abstract: In cochlear implants (CIs), melodic pitch perception is lim- itedbythespectralresolution,whichinturnislimitedbythenumberof spectral channels as well as interactions between adjacent channels. Thisstudyinvestigatedtheeffectofchannelinteractiononmelodiccon- tour identification (MCI) in normal-hearing subjects listening to novel 16-channel sinewave vocoders that simulated channel interaction in CI signal processing. MCI performance worsened as the degree of channel interaction increased. Although greater numbers of spectral channels may be beneficial to melodic pitch perception, the present data suggest that it is also important to improve independence among spectral channels. V C 2012AcousticalSocietyofAmerica PACSnumbers: 43.66.Ts,43.66.Hg,43.75.Cd[DD] DateReceived: August17,2012 DateAccepted: September28,2012 1.Introduction Multi-channel cochlear implants (CIs) have restored the sensation of hearing to many profoundly deaf individuals. Most CI users are able to understand speech quite well in quiet listening conditions. 
However, CI performance in difficult listening tasks (e.g., speech understanding in noise, music perception) remains much poorer than normal hearing (NH) listeners, primarily due to the limited spectral resolution of the device (Gfeller et al., 2002; McDermott, 2004). In CIs, spectral resolution is limited by the number of physically implanted electrodes (or virtual channels in the case of current steering), as well as by the amount of channel interactions between adjacent electrodes. In CIs, channel interaction can be due to unintended electric field interactions from nearby electrodes leading to an overlapping spread of excitation of the auditory nerve. The effect of spectral resolution on speech understanding has been extensively studied in CI users (Friesen et al., 2001; Fu and Nogaki, 2005; Luo et al., 2007; Bin- gabr et al., 2008). Friesen et al. (2001) found that, for NH subjects, speech perform- ance in quiet and noise steadily improved as the number of channels was increased up to 20. For CI subjects, performance steadily improved up to seven to ten spectral a) Authortowhomcorrespondenceshouldbeaddressed. b) Also at: Department of Communication and Auditory Neuroscience, House Research Institute, Los Angeles, CA90057. J.Acoust.Soc.Am.132(5),November2012 V C 2012AcousticalSocietyofAmerica EL429 Crew et al.:JASAExpressLetters [http://dx.doi.org/10.1121/1.4758770] PublishedOnline18October 2012 Downloaded 11 Jun 2013 to 12.175.189.130. Redistribution subject to ASA license or copyright; see http://asadl.org/terms channels, beyond which there was no improvement, presumably due to channel inter- action. Luo et al. (2007) demonstrated that vocal emotion recognition, which relies strongly on voice pitch cues, worsened as the number of spectral channels was reduced, for both real CI users and NH subjects listening to CI simulations. Fu and Nogaki (2005) investigated the influence of channel interaction on speech perception in gated noise for NH subjects listening to acoustic CI simulations. They found that, for a wide range of gating frequencies, speech understanding in noise worsened as the amount of channel interaction increased. They also found that mean CI performance was most similar to that of NH subjects listening to four spectrally smeared channels, with the top CI performance similar to that of NH subjects listening to 8–16 spectrally smeared channels. Bingabr et al. (2008) simulated the spread of excitation in an acoustic CI simulation using a model based on neurophysiologic data. Again, effective spectral re- solution was reduced as the amount of channel interaction was increased, resulting in poorer speech performance in quiet and in noise. Limited spectral resolution has also been shown to negatively impact music perception. Kong et al. (2004) observed that decreasing the number of spectral chan- nels influenced melody recognition. Less is known about the effects of channel interac- tion on music perception, specifically melodic pitch perception. CI users must extract pitch from coarse spectro-temporal representations and do not have access to fine structure cues that are important for melodic pitch and timbre perception. Indeed, pitch and timbre may sometimes be confounded in CI users. Galvin et al. (2008) found that, different from NH listeners, CI users’ melodic contour identification (MCI) was significantly affected by instrument timbre. MCI performance was generally better with simpler stimulation patterns (e.g., organ) than with complex patterns (e.g., piano). 
It is unclear how channel interaction (which relates in some ways to broad stimulation pat- terns) may affect melodic pitch perception. Most recent CI research and development has been directed at improving the number of spectral channels (e.g., virtual channels) rather than reducing channel inter- action. The present study investigated the effect of channel interaction on melodic pitch perception, using the MCI task. NH subjects were tested while listening to novel 16-channel sinewave vocoders that simulated varying amounts of channel interaction. We hypothesized that, similar to previous CI speech studies, increased channel interac- tion would reduce melodic pitch salience, resulting in poorer MCI performance. 2.Methods Twenty NH subjects (aged 23–48 years) participated in the experiment. For all sub- jects, pure-tone thresholds were less<20dB for audiometric frequencies up to 8 kHz. Music pitch perception was measured using a MCI task (Galvin et al., 2007). The nine contours were “rising,” “falling,” “flat,” “rising-flat,” “falling-flat,” “rising- falling,” “falling-rising,” “flat-rising,” and “flat-falling.” Each contour consisted of five musical notes 300ms in duration with 300ms between notes. The lowest note of each contour was A3 (220Hz). The frequency spacing between successive notes in each contour was varied between one and three semitones. The source stimuli for the vocoding was a piano sound, created by musical instrument digital interface (MIDI) sampling and re-synthesis, as in Galvin et al. (2008) and Zhu et al. (2011). The CI simulations were vocoded using sinewave carriers, rather than noise- bands. Although phoneme and sentence recognition performance has been shown to be similar with sinewave and noise-band CI simulations (Dorman et al., 1997), sinewave simulations have been shown to better emulate real CI performance for pitch-related speech tasks (Luo et al., 2007). Sinewave carriers offer better stimulation site specificity and better temporal envelope representation than noise-band carriers. One concern with sinewave-vocoding is the potential for additional pitch cues provided by sidebands resulting from amplitude modulation. However, in this study, if such sideband pitch cues were available, they were equally available across all channel interaction condi- tions (the parameter of concern). Crew et al.:JASAExpressLetters [http://dx.doi.org/10.1121/1.4758770] PublishedOnline18October 2012 EL430 J.Acoust.Soc.Am.132(5),November2012 Crew et al.:Channelinteractionincochlearimplantmusic Downloaded 11 Jun 2013 to 12.175.189.130. Redistribution subject to ASA license or copyright; see http://asadl.org/terms The present study made use of a novel implementation of an acoustic CI simu- lation, in that it also simulated different amounts of channel interaction. Figure 1 shows a schematic representation of the simulated CI signal processing, which was implemented as follows. The source stimulus was fed into a bank of 16 bandpass filters equally spaced according to the frequency-to-place mapping of Greenwood (1990). The overall input frequency range was 188–7938 Hz, which corresponded to the default input frequency range of Cochlear Corp.’s (Sydney, NSW, Australia) Nucleus-24 de- vice. The slowly varying envelope energy in each band was extracted via half-wave rec- tification followed by low-pass filtering (cutoff frequency of 160 Hz). 
Different degrees of channel interaction were simulated by adding variable amounts of temporal enve- lope information extracted across analysis bands to the envelope of a particular band. This was analogous to modifying the output filter slopes in noise-band vocoding. The amount of envelope information added to adjacent bands depended not only on the targeted degree of channel interaction, but also on the frequency distance between ad- jacent bands. The output filter slopes were 24, 12, or 6dB/octave, simulating “slight,” “moderate,” and “severe” channel interaction, respectively. The temporal envelope from each band (including the targeted degree of channel interaction) was used to modulate a sinewave carrier whose frequency corresponded to the center frequency of the analysis band. The outputs were then summed and the resulting signal was normal- ized to have the same long-term root-mean-square amplitude of the input signal. Audio examples of the unprocessed stimuli and the vocoded stimuli with different amounts of channel interaction are given in Mm. 1. Mm. 1. Audio example of the rising contour with two-semitone spacing: (1) Unprocessed, followedbythe16-channelCIsimulationwith(2)no,(3)slight,(4)moderate,and(5)severe channelinteraction.Thisisafileoftype“wav”(2247kB). Figure 2 shows frequency analysis for A3 (the lowest note in any of the con- tours), B3 (two semitones higher than A3), and C#4 (two semitones higher than B3), for unprocessed stimuli and stimuli processed by the CI simulation with no, slight or severe channel interaction. These three notes are the first three notes heard in audio demo Mm. 1. As the amount of channel interaction was increased, the difference in Fig. 1. Block diagram of the signal processing for the acoustic CI simulation. Temporal envelope information extractedfromananalysisband(onlyonebandisshown)isaddedtootherbandswithagainofki,whichcorre- sponds to filter slope in dB/octave. The sinewave carriers were the center frequency of the frequency analysis band. Crew et al.:JASAExpressLetters [http://dx.doi.org/10.1121/1.4758770] PublishedOnline18October 2012 J.Acoust.Soc.Am.132(5),November2012 Crew et al.:Channelinteractionincochlearimplantmusic EL431 Downloaded 11 Jun 2013 to 12.175.189.130. Redistribution subject to ASA license or copyright; see http://asadl.org/terms spectral envelope contrast or depth across notes was reduced. When the contrast is suf- ficiently reduced (bottom row), there appears to be little change in the spectral enve- lope across the three notes. With no channel interaction (row 2), the depth is sufficient to see the peaks of the spectral envelope shift across notes. However, the representation with the acoustic CI simulation, even with no channel interaction [row 2 in Fig. 2 and the second example in Mm. 1], is much poorer than for the unprocessed stimuli [top row of Fig. 2 and the first example in Mm. 1]. Figure 2 suggests that pitch cues across notes will be better preserved as the degree of channel interaction is reduced. Each subject was initially tested using the unprocessed stimuli to familiarize the subject with the experiment and to verify that they were able to score above 90% correct for the MCI task. All subjects were tested while sitting in a sound-treated booth (IAC, Bronx, NY) and directly facing a single loudspeaker [Tannoy (Coat- bridge, Scotland, UK) Reveal]. All stimuli were presented acoustically at 65 dBA. 
The four-channel interaction conditions (none, slight, moderate, and severe) were tested in separate blocks and the test block order was randomized across subjects. During each test block, a contour was randomly selected (without replacement) from among the 54 stimuli (9 contours3 semitone spacings2 repeats) and presented to the subject, who responded by clicking on one of the nine response boxes shown onscreen. Subjects were allowed to repeat each stimulus up to three times. No preview or trial-by-trial feedback was provided. A minimum of two test blocks were tested for each channel interaction condition; if the difference in performance was greater than 10%, a third run was performed. The scores for all trials were then averaged together. Fig. 2. Frequency analysis of experimental stimuli. From the left to right, the columns indicate different notes. The top rowshows theunprocessedsignal.Rows 2, 3, and 4 show stimuliprocessedby the CI simulations with no,slight,andseverechannelinteraction,respectively. Crew et al.:JASAExpressLetters [http://dx.doi.org/10.1121/1.4758770] PublishedOnline18October 2012 EL432 J.Acoust.Soc.Am.132(5),November2012 Crew et al.:Channelinteractionincochlearimplantmusic Downloaded 11 Jun 2013 to 12.175.189.130. Redistribution subject to ASA license or copyright; see http://asadl.org/terms 3.Results All subjects scored above 90% correct with the unprocessed stimuli. Figure 3 shows mean MCI performance as a function of semitone spacing (left panel) and degree of channel interaction averaged across semitone conditions (right panel). MCI perform- ance monotonically worsened as the amount of channel interaction was increased. A two-way repeated-measures analysis of variance showed significant main effects for channel interaction [F (3,90) ¼149.4, p<0.001] and semitone spacing [F (2,90) ¼203.1, p<0.001], as well as a significant interaction [F (6,90) ¼15.6, p<0.001]. Post hoc Bon- ferroni comparisons revealed significant differences between all channel interaction conditions (p<0.001 in all cases) and between all semitone spacing conditions (p<0.001 in all cases). 4.Discussion As hypothesized, the present CI simulation results show that channel interaction can negatively affect melodic pitch perception. As illustrated in Fig. 2, spectral envelope cues are weakened by CI signal processing and further weakened by channel interac- tion. As such, increasing the number of channels may not sufficiently enhance spectral contrasts between notes. Most CI signal processing strategies use monopolar stimula- tion, which results in broader activation and greater channel interaction than with current-focused stimulation (e.g., tripolar stimulation) (Bierer, 2007). The present data suggest that reducing channel interaction may be as important a goal as increasing the number of stimulation sites. In this study, the amount of channel interaction was constant across subjects, and constant across channels within each condition. In the real CI case, channel inter- action may vary greatly across CI users, and across electrode location within CI users. Interestingly, mean performance with real CI users for the exactly the same task and stimuli was 61% correct (Zhu et al., 2011; dashed line in Fig. 3), and was most compa- rable to mean CI simulation performance with slight channel interaction in the present study. Note that the present NH subjects had no prior experience listening to vocoded sounds, compared with years of experience with electric stimulation for real CI users. 
With more experience, NH performance would probably improve, but the general trend across conditions would most likely remain. It is possible that the effects of channel interaction observed in this study may explain some of the larger variability observed in Zhu et al. (2011), with some CI users experiencing moderate-to-severe channel interaction and others experiencing very little. Of course, many other factors can contribute to CI users' melodic pitch perception (e.g., acoustic frequency allocation, electrode location, pattern of nerve survival, experience, etc.).

Fig. 3. (Color online) Mean MCI performance with the CI simulation as a function of semitone spacing (left panel) or the degree of channel interaction (right panel; performance averaged across semitone spacing conditions). The error bars indicate the standard error. The dashed line in the right-hand panel shows mean CI performance from Zhu et al. (2011).

The number of spectral channels has been shown to limit CI performance in difficult listening situations (Friesen et al., 2001; Kong et al., 2004; Luo et al., 2007). Data from the present study and from Fu and Nogaki (2005) suggest that channel interaction may also limit CI performance in situations where perception of pitch cues may be beneficial. In dynamic noise, pitch cues may help to stream a talker's voice and segregate target speech from dynamic noise or a competing talker; when channel interaction is increased, pitch cues may become less salient and segregation more difficult. In previous CI simulation studies, the amount of channel interaction did not necessarily increase as the number of channels increased, as would happen in the real CI case. In these studies, CI simulation performance improved as the number of channels increased, whereas real CI performance peaked at six to ten channels, presumably due to channel interaction. The present MCI data suggest that pitch perception worsens with increasing channel interaction. As the CI does not provide strong pitch cues, channel interaction may further weaken already poor pitch perception.

The present data imply that increasing the number of stimulation sites, whether with more electrodes or virtual channels, may not be sufficient to provide adequate pitch cues. The present study essentially simulated a discrete neural population with 16 fixed channels; the channel interaction conditions simulated increased current spread across these locations. Because the locations were fixed, the change in the spectral envelope was the dominant cue for melodic pitch. As seen in Fig. 2, as the channel interaction increased, the variance in the spectral envelope was reduced, making pitch perception more difficult. Although increasing the number of stimulation sites would seem to increase the spectral resolution, the present data suggest a need to also limit channel interaction. Current focusing (Landsberger et al., 2012) or optical stimulation (Izzo et al., 2007) may help to reduce channel interaction in CIs and other auditory neuroprostheses. In Landsberger et al. (2012), the perceptual quality of electric stimulation (e.g., clarity, purity, fullness, etc.) improved with current focusing and was correlated with reduced spread of excitation. Reducing the spread of excitation might reduce channel interaction, which, according to the present study, would improve melodic pitch perception. There may be an optimal tradeoff between the number of channels and the degree of channel interaction, as spread of excitation also occurs to some extent in acoustic hearing.

Acknowledgments

The authors thank the subjects for their participation. This work was supported by the USC Provost Fellowship, the USC Hearing and Communication Neuroscience Program, NIDCD R01-DC004993, and by the Paul Veneklasen Research Foundation.

References and links

Bierer, J. A. (2007). "Threshold and channel interaction in cochlear implant users: Evaluation of the tripolar electrode configuration," J. Acoust. Soc. Am. 121, 1642-1653.
Bingabr, M., Espinoza-Varas, B., and Loizou, P. C. (2008). "Simulating the effect of spread of excitation in cochlear implants," Hear. Res. 241, 73-79.
Dorman, M. F., Loizou, P. C., and Rainey, D. (1997). "Speech intelligibility as a function of the number of channels of stimulation for signal processors using sine-wave and noise-band outputs," J. Acoust. Soc. Am. 102, 2403-2411.
Friesen, L. M., Shannon, R. V., Baskent, D., and Wang, X. (2001). "Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants," J. Acoust. Soc. Am. 110, 1150-1163.
Fu, Q.-J., and Nogaki, G. (2005). "Noise susceptibility of cochlear implant users: The role of spectral resolution and smearing," J. Assoc. Res. Otolaryngol. 6, 19-27.
Galvin, J. J., III, Fu, Q.-J., and Nogaki, G. (2007). "Melodic contour identification by cochlear implant listeners," Ear Hear. 28, 302-319.
Galvin, J. J., III, Fu, Q.-J., and Oba, S. (2008). "Effect of instrument timbre on melodic contour identification by cochlear implant users," J. Acoust. Soc. Am. 124, EL189-EL195.
Gfeller, K., Turner, C., Mehr, M., Woodworth, G., Fearn, R., Knutson, J., Witt, S., and Stordahl, J. (2002). "Recognition of familiar melodies by adult cochlear implant recipients and normal-hearing adults," Coch. Imp. Inter. 3, 29-53.
Greenwood, D. (1990). "A cochlear frequency-position function for several species—29 years later," J. Acoust. Soc. Am. 87, 2592-2605.
Izzo, A. D., Suh, E., Pathria, J., Walsh, J. T., Whitlon, D. S., and Richter, C.-P. (2007). "Selectivity of neural stimulation in the auditory system: A comparison of optic and electric stimuli," J. Biomed. Opt. 12, 021008.
Kong, Y.-Y., Cruz, R., Jones, J. A., and Zeng, F.-G. (2004). "Music perception with temporal cues in acoustic and electric hearing," Ear Hear. 25, 173-185.
Landsberger, D. M., Padilla, M., and Srinivasan, A. G. (2012). "Reducing current spread using current focusing in cochlear implant users," Hear. Res. 284, 16-24.
Luo, X., Fu, Q.-J., and Galvin, J. (2007). "Vocal emotion recognition by normal-hearing listeners and cochlear implant users," Trends Amplif. 11, 301-315.
McDermott, H. J. (2004). "Music perception with cochlear implants: A review," Trends Amplif. 8, 49-82.
Zhu, M., Chen, B., Galvin, J., and Fu, Q.-J. (2011). "Influence of pitch, timbre and timing cues on melodic contour identification with a competing masker," J. Acoust. Soc. Am. 130, 3562-3565.

Appendix B: "Contributions of Electric and Acoustic Hearing to Bimodal Speech and Music Perception." PLOS ONE 2015

Joseph D. Crew, John J. Galvin III, David M. Landsberger, and Qian-Jie Fu. (2015) "Contributions of Electric and Acoustic Hearing to Bimodal Speech and Music Perception." Public Library of Science ONE, DOI: 10.1371/journal.pone.0120279

RESEARCH ARTICLE

Contributions of Electric and Acoustic Hearing to Bimodal Speech and Music Perception

Joseph D. Crew (1,*), John J. Galvin III (2), David M. Landsberger (3), Qian-Jie Fu (2)

1 Department of Biomedical Engineering, University of Southern California, Los Angeles, California, United States of America; 2 Department of Head and Neck Surgery, University of California-Los Angeles, Los Angeles, California, United States of America; 3 Department of Otolaryngology, New York University School of Medicine, New York, New York, United States of America

* jcrew@usc.edu

Funding: This work was supported by the University of Southern California Provost Fellowship, the University of Southern California Hearing and Communication Neuroscience Program, National Institutes of Health: National Institute on Deafness and Communication Disorders R01-DC004993, and by the Paul Veneklasen Research Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Data Availability: All relevant data are within the paper.

Abstract

Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data were collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception.

Introduction

Due to relaxing criteria for implantation, increasing numbers of cochlear implant (CI) recipients have some amount of residual acoustic hearing [1]. This acoustic hearing is often aided by a hearing aid (HA), and this "bimodal" listening or "electro-acoustic stimulation" (EAS) has been shown to benefit CI users' speech perception. CIs provide many patients with good speech understanding in quiet, easy listening conditions. However, CIs do not provide the spectro-temporal fine structure information needed to segregate speech from noise. The coarse spectral resolution also does not support complex pitch perception, which is important for musical melody perception and segregation of competing melodies and/or instruments. As a result, many CI listeners have difficulty perceiving and enjoying music [2-4]. Acoustic hearing with HAs provides low-frequency pitch cues that may work in conjunction with a CI to better represent different aspects of sound. Many bimodal listeners report that music and voices sound more natural while listening with a CI and a HA [5-6]. Presumably, the HA restores low-frequency fine-structure cues that provide useful pitch information.

The benefits of combined electric and acoustic hearing for speech perception are well documented [7-15]. In general, combined device performance is better than CI-only performance, but outcomes may be subject- and/or task-dependent. Kong et al. [9] demonstrated that the combined use of HA and CI improved sentence recognition in noise; however, this improvement was not observed at all signal-to-noise ratios (SNRs) or for all subjects. While Brown and Bacon [8] showed improved performance with EAS relative to CI-only, results were mixed for Gifford et al. [12] and Mok et al. [10]. Kong and Braida [16] tested the integration of information across ears and found improved performance in all NH subjects when low-frequency acoustic information was added to vocoded speech; however, only a few CI listeners exhibited such bimodal benefits. Gfeller et al. [17] found no significant difference between hybrid EAS users (acoustic hearing combined with a short CI electrode array in the same ear) and standard-electrode-length, unilateral CI users when measuring speech performance with adaptive music backgrounds. Zhang et al. [14] and Brown and Bacon [8] demonstrated that otherwise unintelligible low-frequency sounds provide most of the bimodal or EAS benefit, compared with electric hearing only. Taken together, these results suggest that combined electric and acoustic speech performance may benefit even from very limited and crude acoustic information provided by HAs; this limited acoustic information is even more important for music perception.

In contrast to speech recognition, music perception has not been as deeply investigated with bimodal or EAS listeners. Kong et al. [9] tested familiar melody recognition as a function of device type (HA and/or CI) and frequency range. Different from speech perception, the authors found that HA performance was better than CI performance for familiar melody recognition. Similar results were found by Dorman et al. [13] in a melody identification task, with HA-only performance better than CI-only and bimodal performance similar to HA-only. Gfeller et al. [18] showed that hybrid EAS subjects outperformed unilateral CI subjects with a standard electrode array in both melody and instrument identification tasks. Gfeller et al. [19] found better pitch discrimination by hybrid EAS subjects, compared with unilateral CI subjects with standard arrays; pitch discrimination was also significantly correlated with familiar melody recognition. Gfeller et al. [20] measured melody identification using real-world music excerpts and found a benefit when acoustic hearing was added to a CI. Kong et al. [21] used multi-dimensional scaling (MDS) to examine timbre perception in bimodal and bilateral CI users, and found no advantage to combined use of CI+HA or two CIs, compared with the better ear alone; the authors remarked that the dominant cue for timbre was the spectral envelope, which was sufficiently transmitted by a single CI.
El Fata et al. [22] divided bimodal subjects into two groups according to the audiogram of the acoustic ear. Using a melody identification task with and without lyrics, the authors found improved performance with CI+HA (relative to the CI alone) only for the group with better acoustic hearing. Interestingly, subjects in the group with poorer acoustic hearing preferred to listen to music with only the CI, while the group with better acoustic hearing preferred to listen to music with both the CI and the HA. Looi et al. [23-24] found that CI users rated music as more pleasant sounding than HA-only subjects did, even though HA users outperformed CI users on pitch and melody tasks. Looi and Radford [25] found no significant difference in pitch ranking scores between bimodal and unilateral CI users; however, the lowest tested interval was a quarter octave (3 semitones). These studies demonstrate a widely ranging contribution of acoustic hearing to bimodal and EAS users' music perception that seems to depend on several factors, including (but not limited to) the amount of residual acoustic hearing, the listening task, CI-only performance, etc. However, it is unclear how acoustic cues contribute to CI music perception. Presumably, the added acoustic information should improve melodic pitch perception and/or segregation of competing melodies.

In this study, speech and music perception were measured in bimodal subjects listening with the CI only, the HA only, or both devices (CI+HA). Sentence recognition was measured in competing multi-talker babble. Melodic pitch perception was measured using a melodic contour identification (MCI) task [26-27]. The MCI task was used to provide some quantification of bimodal subjects' functional pitch perception across device conditions. The fundamental frequency (F0) range of the target contours was constrained to be audible with the HA and with the CI. MCI performance was also measured with a competing masker; the masker F0 range either overlapped with that of the target contour (i.e., audible with the HA and the CI) or was remote from the target (i.e., audible with only the CI). We hypothesized that combined use of CI+HA would provide better speech and music performance than either device alone. We also hypothesized that, for the MCI task, segregation of the masker and target would be easier when the masker F0 range did not overlap with the target F0 range.

Methods

Subjects

Nine bimodal subjects (those with a CI in one ear and a HA in the opposite ear) participated in the study. All but one of the subjects had participated in a previous speech-related bimodal study [15], and the final subject was recruited from the general population. Table 1 shows subject demographic information. The only inclusion criteria were that the subjects used both devices on a daily basis and had more than one year of bimodal listening experience. No subject was excluded on the basis of speech scores, acoustic hearing audiogram, musical experience, etc. No subjects had any formal music training; however, subjects S1, S2, and S3 had previously participated in an MCI training study [28]. In that study, only the CI was trained and tested; in the present study, the testing of HA and CI+HA was novel. All subjects were post-lingually deafened except for S5, who was peri-lingually deafened. Thus, the subject pool represented a broad range of bimodal users: with a range of speech scores, with and without musical training, different device types, etc.

Table 1. Subject demographic information.
Subject | Age | Onset of Hearing Loss (Years) | CI Experience (Years) | CI | HA | Etiology of Hearing Loss
S1 | 79 | 14 | 13 | Advanced Bionics | Phonak | Sudden sensorineural
S2 | 79 | 35 | 12 | Cochlear | Siemens | Sensorineural, progressive
S3 | 75 | 31 | 3 | Cochlear | Resound | Noise exposure
S4 | 43 | 22 | 2 | Cochlear | Phonak | Sensorineural, genetic
S5 | 47 | 47 | 14 | Cochlear | Phonak | German measles
S6 | 70 | 8 | 1 | Advanced Bionics | Oticon | Meniere's disease
S7 | 59 | 10 | 8 | Advanced Bionics | Oticon | Ototoxicity
S8 | 65 | 35 | 8 | Advanced Bionics | Widex | Cochlear otosclerosis
S9 | 79 | 15 | 1 | Cochlear | Oticon | Familial

Ethics statement

This study and all experimental procedures were approved by the Institutional Review Board at St. Vincent's Hospital in Los Angeles, CA, USA at the time of testing. Each subject provided written consent for participation in the study. Subjects were informed of the risks and potential rewards before testing began. Subjects were compensated for their time each test day.

Audiometric thresholds

Fig 1 shows thresholds across the different listening conditions for each subject. Audiometric thresholds using warble tones were collected in sound field with the CI+HA, CI only, HA only, and unaided for each subject. For the CI, HA, and CI+HA conditions, thresholds were measured using subjects' clinical devices and settings. Subjects sat in a sound-treated booth (IAC) approximately 1 m away from, and facing towards, a single loudspeaker. CI+HA thresholds were collected first. Next, CI-only thresholds were collected; the HA was removed and that ear was plugged. Next, HA-only thresholds were collected; the CI speech processor was removed and that ear was plugged. Finally, unaided thresholds were collected by removing both devices, but plugging neither ear.

Fig 1. Audiometric thresholds for each subject for different hearing devices. CI+HA (black circles), CI (red boxes), HA (green triangles), and unaided (white triangles) thresholds are shown for each subject. Thresholds greater than 80 dB HL are not shown.

General procedure

All testing was performed with the CI only, the HA only, and CI+HA. The test order for device conditions was randomized within and across subjects. Subjects used their clinical CI and HA devices and settings. To ensure that loudness was perceived as similar across devices, subjects listened to a few melodic contours while using the CI and the HA. If a subject indicated that the HA was lower in volume relative to the CI, the subject was instructed to increase the HA volume, and this setting was used for all testing. All subjects (except for S1) indicated that the loudness was approximately equal across devices and made no adjustment to either device; subject S1 slightly increased the volume of the HA to match the loudness of the CI.

Speech stimuli and testing

Speech understanding in noise was tested using sentences from the Hearing in Noise Test (HINT) [29] presented in multi-talker speech babble. Speech reception thresholds (SRTs) were adaptively measured. The SRT was defined as the signal-to-noise ratio (SNR) that yields 50% of words correctly identified in sentences (Rule 3 from Chan et al. [30]). The speech level was fixed at 65 dBA, and the level of the background noise was varied according to subject response. The initial SNR was 20 dB. If the subject correctly identified 50% or more of the words in the test sentence, the background noise was increased by 2 dB. If the subject identified less than 50% of the words in the test sentence, the background noise was reduced by 2 dB. A minimum of three runs were collected for each hearing mode condition.
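A minimal sketch of the adaptive rule described above (1-up/1-down on noise level, 2 dB step, speech fixed at 65 dBA) follows. It is illustrative only and is not the authors' test software; the run length and the simple threshold estimate at the end are assumptions, and the actual SRTs in the study were computed according to Rule 3 of Chan et al. [30].

    def srt_run(score_sentence, n_trials=20, start_snr=20.0, step_db=2.0):
        """One adaptive SRT run (illustrative sketch of the 1-up/1-down rule).

        score_sentence(snr) -> proportion of words correct for one HINT
        sentence presented at the given SNR (noise level varied, speech fixed).
        """
        snr = start_snr
        track = []
        for _ in range(n_trials):
            track.append(snr)
            if score_sentence(snr) >= 0.5:
                snr -= step_db   # >= 50% words correct: raise the noise (lower SNR)
            else:
                snr += step_db   # < 50% correct: lower the noise (raise SNR)
        # Simple estimate (assumption): average the SNRs visited after the
        # initial descent from the 20 dB starting point.
        return sum(track[4:]) / len(track[4:])

In practice, one run of this kind would be repeated at least three times per hearing mode, as described above, and the run estimates averaged.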
Music stimuli and testing

Melodic pitch perception was tested using a melodic contour identification (MCI) task [26-27]. In the MCI task, a subject is presented with one of nine possible target contours ("rising," "falling," "flat," "rising-flat," "falling-flat," "rising-falling," "falling-rising," "flat-rising," and "flat-falling") and is asked to identify which contour was presented. Each contour consisted of 5 notes, 300 ms in duration, with a 300 ms silent interval between notes. The semitone spacing between consecutive notes in the contour was varied from 1 to 3 semitones, allowing for some measure of pitch resolution. The lowest note for each target contour was 220 Hz (A3), with a highest possible note of 440 Hz (A4). As such, the F0 range of the target contours was audible with both the HA and the CI. Each note of the target contour was played by a MIDI-synthesized piano sample. The piano was chosen because it produced the lowest scores in previous MCI studies [31], as it had the most complex spectro-temporal envelope.

Fig 2 shows the "rising" contour with 1-semitone (top row) or 3-semitone (bottom row) spacing. The far left side of Fig 2 illustrates the different contours within the HA and CI frequency ranges. The original spectrogram of the contours is shown just to the right; differences in the extent of the F0 range can be seen between the 1- and 3-semitone spacing conditions. Next to the right is a spectrogram of the contours processed by a hearing loss simulation (AngelSim from www.tigerspeech.com). A steeply sloping hearing loss was simulated (0 dB HL at 125 Hz, 20 dB HL at 250 Hz, 60 dB HL at 500 Hz, 60 dB HL at 1000 Hz, 100 dB HL at 2000 Hz, 120 dB HL at 4000 Hz, and 120 dB HL at 8000 Hz) for illustrative purposes only, and was not intended to represent any subject's audiogram. Differences in high-frequency harmonic information can be easily seen between the original and HA spectrograms. The far right of Fig 2 shows electrodograms that represent the electrical stimulation patterns given the default stimulation parameters for the Cochlear Freedom and Nucleus-24 devices, which employ an 8-of-22 channel selection strategy. [The electrodograms for subjects S1, S6, S7, and S8 would be slightly different, as they use an Advanced Bionics device (16 channels; no channel selection).] The y-axis represents electrodes from the apex (bottom) to the base (top). Differences in the stimulation pattern across notes can be seen with the 3-semitone spacing (bottom); with the 1-semitone spacing (top), the changes in the stimulation pattern are more subtle.

Fig 2. Spectrograms and electrodograms for the No Masker condition for 1- and 3-semitone spacings. The far left panel shows a schematic representation of the HA and CI frequency ranges. The target contour is shown in black. The middle two panels show a spectral representation of the original stimuli (left) and the simulated HA output (right). A steeply sloping hearing loss was simulated using AngelSim and is intended for illustrative purposes only. The far right panel shows an idealized electrodogram representing the electrical stimulation patterns for a CI. Electrodograms were simulated using default stimulation parameters for the Cochlear Freedom and Nucleus-24 devices: 900 Hz/channel stimulation rate, 8 maxima, frequency allocation Table 6, etc.
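The target-contour F0s follow directly from the parameters above (five notes, 1-3 semitone spacing, lowest note A3 = 220 Hz). The sketch below shows one way to compute them; the exact step pattern assigned to each contour name is an assumption based on the contour labels, not taken from the authors' stimulus-generation code.

    # Step pattern (in units of the semitone spacing) for each of the nine
    # five-note contours; 0 marks the lowest note of the contour.
    CONTOURS = {
        "rising":         [0, 1, 2, 3, 4],
        "falling":        [4, 3, 2, 1, 0],
        "flat":           [0, 0, 0, 0, 0],
        "rising-flat":    [0, 1, 2, 2, 2],
        "falling-flat":   [2, 1, 0, 0, 0],
        "rising-falling": [0, 1, 2, 1, 0],
        "falling-rising": [2, 1, 0, 1, 2],
        "flat-rising":    [0, 0, 0, 1, 2],
        "flat-falling":   [2, 2, 2, 1, 0],
    }

    def contour_f0s(name, semitone_spacing, base_f0=220.0):
        """Return the five note F0s (Hz) for a target contour.

        base_f0 is the lowest note (A3 = 220 Hz); each step is
        semitone_spacing semitones, so F0 = base * 2**(step * spacing / 12).
        """
        return [base_f0 * 2 ** (step * semitone_spacing / 12.0)
                for step in CONTOURS[name]]

    # Example: a rising contour with 3-semitone spacing spans A3 (220 Hz)
    # to A4 (440 Hz), the maximum target range used in the study.
    print([round(f, 1) for f in contour_f0s("rising", 3)])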
MCI was tested with and without a competing instrument masker. The masker was played by a MIDI-synthesized clarinet sample, similar to Galvin et al. [31]. Different from the piano, the spectral and temporal envelopes were less complex for the clarinet, allowing for potential timbre differences between the masker and target, as in Zhu et al. [32]. The masker contour was always "flat" (i.e., the same five notes), and was presented simultaneously with the target contour (i.e., the same onset and offset), as in Galvin et al. [33]. The masker frequency range was varied to be overlapping (A3 masker, 220 Hz) or non-overlapping (A6 masker, 1760 Hz) with the target. The A3 masker was audible with the HA and the CI, and the A6 masker was audible only with the CI for most subjects.

Fig 3 shows the "rising" contour with 3-semitone spacing with the overlapping A3 masker (top row) and the non-overlapping A6 masker (bottom row). Fig 3 is similar in presentation to Fig 2. The far left side of Fig 3 illustrates the masker and target contours in relation to the HA and CI frequency ranges; the A6 masker is shown to be in the CI range only. The original spectrogram is shown just to the right. The overlapping masker can be seen at the bottom of the A3 original spectrogram, and the non-overlapping masker can be seen above the changes in F0 in the A6 original spectrogram. The HA spectrograms show that only the overlapping A3 masker is represented in the simulated audible acoustic hearing range. The electrodograms at far right show the A3 and A6 maskers simultaneously presented with target contours. Note also the differences in the stimulation patterns with the maskers in Fig 3 relative to the target alone (bottom panel of Fig 2).

All MCI stimuli were presented acoustically at 65 dBA in a sound-treated booth. During each test block, a contour was randomly selected (without replacement) from among the 54 stimuli (9 contours x 3 semitone spacings x 2 repeats) and presented to the subject, who responded by clicking on one of the nine response boxes shown onscreen. Subjects were allowed to repeat each stimulus up to three times. Some subjects (S1, S2, S3) were familiar with the MCI task from previous experiments. For the remaining subjects, the response screen and test procedures were carefully explained. No preview or trial-by-trial feedback was provided. A minimum of three test blocks were tested for each hearing mode condition and masker condition; if the variability in performance was greater than 20%, a fourth run was performed. For each condition and subject, performance was averaged across all runs.

Results

Fig 4 shows individual subjects' SRTs across the different listening conditions; mean performance is shown at the far right. SRTs with the HA-only could not be obtained for subjects S1, S2, S5, S7, and S9, as performance was below 50% correct for SNRs greater than 30 dB; similarly, the CI-only SRT could not be obtained for subject S5. While the missing SRTs make the statistical analyses difficult, they do in fact represent a particular level of speech understanding. Failure to obtain an SRT for SNRs greater than 30 dB suggests that the subject most likely could not correctly identify 50% of the words in quiet. In general, speech performance was much poorer with the HA than with the CI. For most subjects, CI+HA speech performance was comparable to CI-only performance. For subject S4, CI+HA performance was much better than with either device alone. A one-way repeated-measures analysis of variance (RM ANOVA) showed no significant effect of
Fig.5showsindividualsubjects’MCIperformanceforthedifferentlisteningandmaskercon- ditions;meanperformanceisshownatright.DatainFig.5wereaveragedacrossthesemitone spacingconditions.Forallmaskerconditions,MCIperformancewasgenerallybestwithHA- onlyandworstwithCI-only.OnlysubjectsS3andS9performedbetterthan50%correctwith theCI-onlyforanymaskercondition;chanceperformancewas11.1%.CI+HAperformance Fig3. SpectrogramsandelectrodogramsfortheA3andA6Maskerconditions.Thetophalfofthefigureshows(fromlefttoright)aschematic representationofthetestconditioninrelationtothefrequencyrangesoftheHAandtheCI,aspectrogramoftheoriginalstimuli,aspectrogramofthe simulatedHAoutput,andanidealizedelectrodogramfortheA3Maskercondition;thebottomhalfshowsthesameinformationfortheA6Maskercondition. FiguredetailsaresimilartothedetailsofFig.2.Thetargetinstrumentnotesareshowninblackandthemaskinginstrumentnotesareshowningray. doi:10.1371/journal.pone.0120279.g003 CochlearImplant+HearingAidSpeechandMusicPerception PLOSONE|DOI:10.1371/journal.pone.0120279 March19,2015 8/18 relativetobestearperformancewasvariableacrosssubjects.Forexample,subjectS1seemedto experienceperceptualinterference(CI+HA<HA)whenlisteningwithbothdevicesinsome maskerconditions,andsubjectS4seemedtofocusonthebetterear(CI+HA"HA).Somesub- jects(S2andS5)seemedtoexperienceadditiveintegrationinbimodallistening(CI+HA>HA); interestinglythesewerethepoorestperformingsubjectsoverall.Athree-wayRMANOVA showedsignificantmaineffectsforhearingmode[F(2,16)=12.7,p<0.001],maskercondition [F(2,16)=8.5,p=0.003],andsemitonespacing[F(2,16)=25.1,p<0.001].Theobservedpower wasgreaterthan0.9forallmaineffects.Asignificantinteractionwasobservedonlybetween maskerconditionandsemitonespacing[F(4,32)=3.3,p=0.022]withanobservedpowerof 0.781.Post-hocpairedt-testsrevealedsignificantdifferencesafterBonferronicorrections (α crit =0.017)forthefollowing:CIvs.HA(p=0.0042),CIvs.CI+HA(p=0.0087),NoMasker vs.A3Masker(p=0.0065),1Semitonevs.2Semitones(p=0.0055),and1Semitonevs.3Semi- tones(p<0.001). Fig.6showstheeffectofsemitonespacingonMCIperformancefordifferenthearingmode andmaskerconditions.CI-onlyperformancewasgenerallypoorandlessvariable(exceptfor outlierS3)comparedwithHA-onlyandCI+HAperformance;thiswasmostlikelyduetofloor effects.Performancegenerallyimprovedasthesemitonespacingincreased.Performancewith theHAalonewasmuchbetterbutalsomorevariableacrosssubjects,relativetoCI-onlyperfor- mance.HA-onlyperformancegenerallyimprovedwithincreasingsemitonespacing.CI+HA performancewasgenerallypoorerwith1-semitonespacing,andbetterwiththe2-and 3-semitonespacingconditions(thoughtherewaslittledifferencebetweenthe2-and 3-semitonespacings).With1-semitonespacing,themeanHAperformancefortheNoMasker conditionwas65.2%correct,whilethemeanHAperformancefortheA3overlappingmasker conditionwas46.9%correct.Thissuggeststhatsubjectshaddifficultysegregatingtheoverlap- pingcontoursevenwhenfinestructurecueswereavailableintheHA.With1-semitonespac- ing,meanCI+HAperformancefortheNoMaskerandtheA6Maskerconditionswas60.7% and55.8%correct,respectively.Thissuggeststhatlistenerswereabletoselectivelyattendto thetargetwhenthemaskerwasspatiallyremote.Asimilarpatternofresultswasobservedwith theHA-onlyfortheNoMaskerandA6maskerconditions. Correlationalanalyseswereperformedamongthedifferenthearingconditions;datawas collapsedacrosstheNoMasker,A3masker,andA6maskerconditions.Therewasnosignifi- cantcorrelationbetweenCI-onlyandHA-onlyperformance(r 2 =0.085,p=0.140).Therewas Fig4. 
Speech-in-noiseresultsforindividualsubjectsacrosshearingdevices.CI-onlySRTsareshownbytheblackbars,HA-onlySRTsareshownby thewhitebars,andCI+HASRTsareshownbythegraybars.Meanperformanceisshownatthefarright;errorbarsindicatestandarderror.Asterisks indicatethatSRTscouldnotbemeasuredforthatcondition.Barsclosertothetopofthegraphindicatebetterperformance. doi:10.1371/journal.pone.0120279.g004 CochlearImplant+HearingAidSpeechandMusicPerception PLOSONE|DOI:10.1371/journal.pone.0120279 March19,2015 9/18 Fig5. MCIperformanceforindividualsubjectsacrosshearingdevicesandmaskercondition.CI-onlyperformanceisshownbytheblackbars,HA- onlyperformanceisshownbythewhitebars,andCI+HAperformanceisshownbythegraybars.Meanperformanceisshownatthefarrightwithineach maskercondition;errorbarsindicatestandarderror.MCIwithNoMaskerisshowninthetoppanel,MCIwiththeoverlapping,A3Maskerisshowninthe middlepanel,andMCIwiththenon-overlapping,A6Maskerisshowninthebottompanel. doi:10.1371/journal.pone.0120279.g005 CochlearImplant+HearingAidSpeechandMusicPerception PLOSONE|DOI:10.1371/journal.pone.0120279 March19,2015 10/18 Fig6. BoxplotsofMCIperformanceasafunctionofsemitonespacing,forthedifferentlisteningand maskerconditions.Thecolumnsindicatehearingdevice(CI,HA,andCI+HA)andtherowsindicatemasker condition(NoMasker,A3Masker,A6Masker).Theedgesoftheboxesrepresentthe25thand75th percentiles,thesolidlinerepresentsthemedian,thedashedlinerepresentsthemean,theerrorbarsindicate the10thand90thpercentiles,andthepointsoutsideoftheerrorbarsindicateoutliers. doi:10.1371/journal.pone.0120279.g006 CochlearImplant+HearingAidSpeechandMusicPerception PLOSONE|DOI:10.1371/journal.pone.0120279 March19,2015 11/18 asignificantcorrelationbetweenCI-onlyandCI+HAperformance(r 2 =0.247,p=0.008); however,thecorrelationwasnotsignificantwhenoutlierS3wasremovedfromtheanalysis (r 2 =0.074,p=0.190).TherewasasignificantcorrelationbetweenHA-onlyandCI+HAper- formance(r 2 =0.615,p<0.001). Fig.7showsmusicandspeechperceptionasafunctionofunaidedandaidedthresholdsin thenon-implantedear.Pure-toneaveragethresholds(PTAs)werecalculatedforeachsubject at125Hz,250Hz,and500Hz;notethatwarbletoneswereactuallyusedtomeasuresound fieldthresholds.LinearregressionswereperformedfortotheHA-onlyandtheCI+HAdata. ResultsshowedthatunaidedPTAsweremoderatelycorrelatedwithspeechandmusicperfor- mance.ThereweresignificantcorrelationsbetweenunaidedPTAsandMCIperformancefora fewconditions[HA-only,A3masker(p=0.020);CI+HA,A3masker(p=0.006);CI+HA,A6 masker(p=0.028)].UnaidedPTAswerealsosignificantlycorrelatedwithCI+HASRTscores (p=0.006).However,therewerenosignificantcorrelationsbetweenaidedPTAsandspeechor musicperformance. 
Discussion Thepresentdatashowthatthecontributionsofacousticandelectrichearingtobimodalper- formancewerebothsubject-andtask-dependent.SimilartoKongetal.[9],wefoundthatthe CIandtheHAprovidedifferingtypesandamountsofinformationforbimodallisteners,with speechinformationmostlyprovidedbytheCIandlow-frequencypitchinformationprovided bytheHA.TherewasevidencetosupportthehypothesisthatcombineduseofCI+HAprovid- edbetterperformancethanwitheitherdevicealoneforspeechbutnotformusic.Indeed,mean MCIperformanceforallmaskerconditionswaspoorer(thoughnotsignificantlypoorer)with CI+HAthanwiththebetterear(HA).Similarly,theadvantagewithCI+HAversusCI-only wasnotsignificantforSRTsinbabble.Thus,thehypothesisthataddingasecond(presumably poorer-performing)devicewouldimproveperformancewasnotsupported;meanHAperfor- mancewasgreaterthanmeanCI+HAperformanceacrossmaskerconditions.Likewise,our hypothesisthatcombineduseofCI+HAwouldimprovesegregationofcompetingmelodic contourswasnotsupported.Therewassomeevidencetosupportourhypothesisthatseparat- ingtheF0rangeofthemaskerandtargetcontours(effectivelyisolatingthemaskerontheCI side)wouldimproveMCIperformancewithacompetinginstrument(e.g.,datafromsubjects S1,S2,andS3).However,performancewiththeA6maskerwasnotsignificantlybetterthan withtheA3masker(p=0.060).Also,therewasnosignificantdifferencebetweentheA6mask- erandNoMaskerconditions,suggestingthatlistenerswereabletoselectivelyattendtothetar- getpresentedontheHAsideevenwhenamaskerwaspresentedonlytotheCIside. Asseeninotherstudies[9–10,12,16,19,22],therewasconsiderableacross-subjectandeven within-subjectvariabilityinthepresentdata.Asubjectmightperformwellinonetask,but poorlyinanother.Forexample,subjectS3performedverywellintheNoMaskerandA6 MaskerconditionsbutpoorlyintheA3Maskercondition.Likewise,subjectS2wasamiddling performerinthespeechtaskbutoneofthepoorerperformersinthemusictasks.Interestingly, theacross-subjectvariabilitywasincreasedintheMCItasksfortheHA-onlyandCI+HAcon- ditions,relativetothatwiththeCI-only(seeFig.6).Performancetendedtobeuniformlypoor withtheCI-only,withamuchwiderrangeinperformancewiththeHA-onlyortheCI+HA. CI+HAperformancewasgreaterthanCI-onlyperformanceinnearlyallcases. Bimodalspeechperception Inthisstudy,meanSRTswiththeCI+HAwereslightly(butnotsignificantly)betterthanwith theCI-only(3.4dBdifferenceonaverage,p>.05).Dormanetal.[13]foundsignificantly CochlearImplant+HearingAidSpeechandMusicPerception PLOSONE|DOI:10.1371/journal.pone.0120279 March19,2015 12/18 Fig7. Scatterplotsofmusicandspeechperformanceversusunaidedandaidedthresholdsinthenon-implantedear.ThetoprowshowsMCI performancefortheNoMasker(left),A3Masker(middle),andA6Maskerconditions(right),asafunctionofunaidedPTAsat125Hz,250Hz,and500Hz. ThesolidcirclesshowdatafortheHA-onlycondition;thesolidlineshowsthelinearregression(r 2 andp-valuesareshowninthelegendineachpanel).The opencirclesshowdatafortheCI+HAcondition;thedashedlineshowsthelinearregression.Themiddlerowshowssimilarplots,butasafunctionofaided PTAsat125Hz,250Hz,and500Hz.ThebottomrowshowsSRTsasafunctionofunaidedPTAs(left)oraidedPTAs(middle).OnlyCI+HASRTdata isshown. doi:10.1371/journal.pone.0120279.g007 CochlearImplant+HearingAidSpeechandMusicPerception PLOSONE|DOI:10.1371/journal.pone.0120279 March19,2015 13/18 betterperformancewiththeCI+HArelativetoCI-onlyforCNCwordsandAzBiosentences. 
BrownandBacon[8]alsofoundsignificantlybettersentencerecognitionscoreswithCI+HA relativetoCI-only.Moketal.[10]foundsignificantlybetterperformancewithCI+HArelative toCI-onlyforCNCphonemesandCUNYsentencesat+10dBSNR,butnotforCNCwords andCUNYsentencesat+5dBSNR.Moketal.[10]alsofoundsignificantlybetterSRTswith CI+HAversusCI-only;notethattheseauthorsmeasuredclosed-setspondeeidentificationin noisewhereasopen-setsentencerecognitionwasusedinthepresentstudy.Whilethepresent datadoesnotshowasignificantadvantagewiththeCI+HArelativetoCI-only,thestatistical powerwasverylow. Formostsubjects,theHAprovidedverylittlespeechinformation.Itisnotsurprisingthat HA-onlyperformancewaspoor;ifspeechunderstandingwassufficientlygoodwiththeHA alone,subjectswouldnothavequalifiedforcochlearimplantation.Itwasnotpossibletoobtain SRTsinsomesubjects(S1,S2,S5,S7,S9),evenatveryhighSNRs.Interestingly,somesubjects (S4,S6)demonstratedsubstantialspeechunderstandinginnoisewiththeHAalone;inthese subjects,performancewiththeHAalonewasbetterthanwiththeCIalone.Thevariabilityin HA-onlyperformancemayreflectdifferencesinauditoryprocessingotherthanaudibility(e.g., spectralresolution,temporalprocessing,etc.).Zhangetal.[34]recentlyfoundthatthespectral resolutionwithintheresidualacoustichearing,ratherthantherangeofaudibility,predicted thebenefitofcombineduseofCI+HA.Aidedandunaidedthresholds(Fig.1)suggestthatHA signalprocessingmayhavedifferedacrosssubjects.Forexample,S6hadthemosthighfre- quencyhearingintheunaidedconditionandhadthebestHA-onlySRTs;likewise,S5hadthe worstunaidedaudiogramandwasgenerallythepoorestperformer. Bimodalmusicperception WhilebimodalspeechperformanceseemedtobelargelydrivenbytheCI,bimodalmusicper- formanceseemedtobelargelydrivenbytheHA.MeanMCIperformancefortheNoMasker conditionwas77.0%correctwiththeHA-onlyand42.4%correctwiththeCI-only.Excluding “star”subjectS3,CI-onlyscoresrangedfrom21to62%correctwhileHA-onlyscoresranged from43tonearly100%correctintheNoMaskercondition.CI+HAscorestendedtobevery similartoHA-onlyscoresformostsubjects;somesubjects(S1,S7)showedconsiderablypoorer performancewithCI+HArelativetoHA-only.FromFig.6,theresolutionoftheHAisbetter than1-semitoneformostsubjects;excludingS3,theresolutionoftheCIisworsethan 3-semitones.Forallmaskerconditions,meanMCIperformancewithCI+HAwasslightly poorerthanwiththeHAalone,andpoorerstillwiththeCIalone.Incontrast,Kongetal.[9] foundthatfamiliarmelodyidentificationwasonaveragebetterwithCI+HAthanwithHA- only;interestinglysomesubjectsperformedbetterwiththeCI-onlythanwiththeHA-only. Differencesinthelisteningtasks(MCIvs.familiarmelodyrecognition)mayexplaindifference inoutcomesbetweenstudies. 
Theeffectofthemaskerswassomewhatvariableacrosssubjects.Onaverage,performance wasworsewiththeA3MaskerthanwiththeA6Masker.Meanperformancedroppedby6.8% fromNoMaskertoA6Maskerconditions;thisdeficitwasnotsignificantafterfamily-wise typeIerrorcorrection(pairedt-testwithBonferronicorrection:p=0.033,α crit =0.017).Mean performancesignificantlydroppedby19.1%fromNoMaskertoA3Masker(pairedt-testwith Bonferronicorrection:p=0.007).Somesubjectsexperiencedalargedropinperformancefor particularmaskerconditions.Forexample,subjectS3scorednearlyperfectlyinallhearing modesfortheNoMaskerandA6Maskerconditions,butexperiencedaverylargedropinper- formanceintheA3Maskercondition.OthersubjectshadsimilarperformanceintheA6Mask- erandA3Maskerconditions(e.g.,S4,S5)forallhearingconfigurations.Andformany CochlearImplant+HearingAidSpeechandMusicPerception PLOSONE|DOI:10.1371/journal.pone.0120279 March19,2015 14/18 subjects,theeffectofacompetinginstrumentdependedonwhichdevicewastested.Forsubject S1,therewasalargedropinCI+HAperformancebetweentheNoMaskerandA3Maskercon- ditions,butonlyamoderatedropinHA-onlyperformancefromtheNoMaskertotheA3 Maskercondition.Takentogether,theseresultssuggestthatthedevicescontributeddifferently toindividualsubjectperformance,dependingontheparticularlisteningtask. Audibilityandperception AsshowninFig.7,unaidedPTAsweresometimespredictiveofspeechandmusicperformance whileaidedPTAswerenot.ThissuggeststhattheHAsignalprocessing,asappliedtothese particularsubjects,maynothavebeenoptimalorconsistentacrosssubjects.Forexample,the HAgainappliedtolowfrequencieswasdifferentacrosssubjectsdespitesimilarunaided thresholds.SubjectsS2,S3,andS6allhadpoorunaidedthresholds(>60dBHL)butreceived noamplificationat125Hz.Incontrast,subjectsS1,S5,andS8received15dBormoregainat 125Hz.Largeamountsofgainweresometimesappliedathigheraudiometricfrequencies, mostlikelytoimprovespeechperceptionwiththeHA.However,suchhighlevelsofgainmay introducedistortion,whichmayreducemusicperceptionandenjoyment.Thegoodpredictive poweroftheunaidedPTAsversusthepoorpredictivepoweroftheaidedPTAssuggestssome disconnectbetweenaudibilityandintelligibilityinHAfittingand/orsignalprocessing.Tothe extentthattheunaidedthresholdsrepresentperipheralabilitiesorlimits,HAprocessingmust extendtheselimitswithoutsacrificingperformance.Thisprocessingmightbeverydifferent forspeechandmusic,especiallyinthecaseofbimodallistening. Theaboveresultsdemonstrateanumberofimplicationsformappingbimodallisteners.Au- dibilityintheacousticearseemslargelyresponsibleformelodicpitchperception.S5wasone oftheworstperformingsubjectsandhadtheleastamountofresidualacoustichearing.S4had thebestunaidedthresholdat125and250Hzandlikewisewasthebestperformingsubject withtheHAaloneacrossallmaskerconditions.ThisisconsistentwithElFataetal.[22]and Zhangetal.[34],whodemonstratedthattheaudibilityandtheresolutionoftheacousticearis clearlylinkedtobimodalperformance. 
Itseemsintuitivethataudibilityintheacousticearwouldcontributestronglytomusicper- ceptionperformance.However,therewasnoclearpatterninthepresentresultsthatlinked acousticthresholds,aidedorunaided,toMCIperformance.Audibilityaloneasmeasuredby pure-tonethresholdsmaynotexplainHA-aidedperformance.TheHAprescriptionmaydiffer greatlyamongbimodalusers,andthismaygreatlyaffectmusicperceptionwithacoustichear- ing.TheaidedandunaidedthresholdsinFig.1revealdifferentgainsettingsacrossfrequencies fordifferentsubjects.Forexample,S1exhibitsahalf-gainruleacrosslowfrequenciessuchthat theF0susedinthisstudywouldhavebeenhighlyaudiblewiththeHA.S3exhibitsaverydif- ferentHAmappingsuchthattheF0swouldhavebeenmuchlessaudiblewiththeHA.Itisun- likelythatagreaternumberofsubjectswouldimprovethecorrelationbetweenaided thresholdsandMCIperformance.ThereneedstobeconsistentcontrolofHAmappingin ordertoexaminethecontributionofaidedlowfrequencyhearingtomusicperceptionasthis mappinghasbeenshowntohaveaneffectonbimodalspeechperception[35].Whilethepres- entnumberofsubjectsissmall(n=9),itisunlikelythatagreaternumberofsubjectswould changethefundamentalrelationshipsbetweenacoustichearinginspeechandacoustichearing inmusicforbimodalusers.Futurestudieswithagreaternumberofsubjectsshouldconsider bothunaidedthresholdsandtheamountandtypeofamplificationschemetobetterrevealthe contributionsfromtheacousticear. CochlearImplant+HearingAidSpeechandMusicPerception PLOSONE|DOI:10.1371/journal.pone.0120279 March19,2015 15/18 Real-worldmusiclistening Thepresentdata,obtainedunderspecificandveryconstrainedspeechandmusicconditions, showednostatisticallysignificantadvantageforbimodallisteningoverthebetterdevicealone. Indeed,meanMCIperformancewaspoorerwithCI+HAthanwithHA-only,andmanysub- jectsperformedequallywellwiththeCI+HAaswiththeHA-only.However,anumberofsub- jectsexperiencedaconsiderableperformancedropwhenlisteningwithbothdevices.Inall threemaskerconditions,HA-onlyscoreswerewellabovetheCI+HAscoresforsubjectS1. SubjectsS6andS8alsoexperiencedadropinperformancebetweenHA-onlyandCI+HAfor theA6Maskercondition.GalvinandFu[36]foundthathigh-passfilteringMCIstimuliim- provedperformance.Theauthorssuggestedthattheimprovementmayhavebeenduetore- ducingthefrequency-to-placemismatchforlow-frequencyF0s,ortode-emphasizingapical pitchcuesthatmayhavebeenlesssalient.Zhangetal.[14]foundthatreducingthefrequency overlapbetweentheacousticearandtheCIeardidnotimprovespeechperceptionforbimodal listening.However,somebimodallisteners(e.g.,S1fromthepresentstudy)mightbenefitfrom reducinglow-frequencystimulationfromtheCIratherthanreducingthehigh-frequencyin- formationfromtheHA.Suchanoptimizationmightreducetheperceptualinterferencebe- tweendevicesforpitchperception.Thus,coordinatingtheinputfrequencyrangesbetween devicesmayimprovebimodallistening. 
AlthoughthepresentstudyfoundthatmeanMCIperformancewasslightlyworsewith CI+HAthanwiththeHA-only,mostsubjectsindicatedthattheyregularlywearbothdevices especiallywhenlisteningtomusic.WiththeCI-only,subjectsreportedthatthesoundquality was “artificial”and “alien.”AddingtheHAmadethesoundqualitymore “natural.”Thepres- entresultsmustalsobeconsideredintermsofthelimitedmaterialsandmeasures.Clearly, musiccontainsmanymorecomplexcomponentsthanthesimplemelodiccontourstestedin thepresentstudy.Musicoftencontainscomplextimbres,melodies,harmonies,andrhythmic patternsaswellaslargedynamicandfrequencyranges.Therefore,theresultsofthepresent studyshouldbeinterpretedcautiouslywithregardtomorereal-worldmusiclistening.Thefre- quencyrangeofmuchmusicmaybeinaudiblewithanHA(e.g.,cymbals,higherfrequency notes,lyrics),andwillbelargelyrepresentedbytheCI.CIshavealsobeenshownrepresentin- strumenttimbresimilarlytoNH[37].Kongetal.[21]foundthatboththeHAandCIrepre- senttheattackofaninstrumentwell,butHAsdonotrepresentthespectralenvelopeofthe instrumentaswellasaCI.ElFataetal.[22]andGfelleretal.[38]showedthatCIsbetterrepre- sentlyrics,whichareaveryimportantcomponentofpopularmusic.Thus,combineddevice useremainslikelytoproducethebestoutcomesformusicperceptionandenjoyment. Conclusion Inthepresentstudy,speechperceptioninnoiseandmelodicpitchperceptionweremeasured withtheCIalone,theHAalone,andCI+HA.Resultsshowed • MeanSRTswerebestwithCI+HA.However,therewasnosignificantdifferenceinSRTsbe- tweentheCI+HAandCI-onlyconditions,suggestingthattheHAcontributedlittletobi- modalspeechperception. • MeanMCIperformancewasbestwithHA-only,withtheCI+HAproducingslightly(butnot significantly)poorerperformance.ThissuggeststhattheCIcontributedverylittletobimodal melodicpitchperception,andinsomecases,negativelyimpactedbimodal MCIperformance. CochlearImplant+HearingAidSpeechandMusicPerception PLOSONE|DOI:10.1371/journal.pone.0120279 March19,2015 16/18 • ThepresenceofacompetinginstrumentreducedMCIperformance,especiallywhenthe maskerandtargetF0swereoverlapping. • Theresultsdemonstratethatdifferenttypesofinformationarebestrepresentedbythediffer- entdevices.Individualbimodallistenersmayattendtothesecuesdifferently,dependingon thelisteningdemands. Acknowledgments Theauthorsthankthesubjectsfortheirparticipation. AuthorContributions Conceivedanddesignedtheexperiments:JCJGDLQF.Performedtheexperiments:JC.Ana- lyzedthedata:JCJGDLQF.Contributedreagents/materials/analysistools:JGQF.Wrotethe paper:JCJG.Designedthetestsoftwaresuite:QF. References 1. DormanMF,GiffordRH.Combiningacousticandelectricstimulationintheserviceofspeechrecogni- tion.Int.J.Audiol.2010; 49:912–919.doi:10.3109/14992027.2010.509113PMID:20874053 2. GfellerK,ChristA,KnutsonJF,WittS,MurrayKT,TylerRS.Musicalbackgrounds,listeninghabits, andaestheticenjoymentofadultcochlearimplantrecipients.J.Am.Acad.Audiol.2000; 7:390–406. PMID:10976500 3. McDermottHJ.Musicperceptionwithcochlearimplants:areview.TrendsAmplif.2004; 8:49–82. PMID:15497033 4. LimbCJ,RubinsteinJT.Currentresearchonmusicperceptionincochlearimplantusers.Otolaryngol. Clin.N.Am.2012; 45:129–140. 5. ArmstrongM,PeggP,JamesC,BlameyP.Speechperceptioninnoisewithimplantandhearingaid. Am.J.Otol.1997; 18:S140–S141.PMID:9391635 6. TylerRS,ParkinsonAJ,WilsonBS,WittS,PreeceJP,NobleW.Patientsutilizingahearingaidanda cochlearimplant:speechperceptionandlocalization.EarHear.2002; 23:98–105.PMID:11951854 7. 
TurnerCW,GantzBJ,VidalC,BehrensA.Speechrecognitioninnoiseforcochlearimplantlisteners: benefitsofresidualacoustichearing.J.Acoust.Soc.Am.2004; 115:1729–1735.PMID:15101651 8. BrownCA,BaconSP.Achievingelectric-acousticbenefitwithamodulatedtone.EarHear.2009; 30: 489–493.doi:10.1097/AUD.0b013e3181ab2b87PMID:19546806 9. KongYY,StickneyGS,ZengFG.Speechandmelodyrecognitioninbinaurallycombinedacousticand electrichearing.J.Acoust.Soc.Am.2005; 117:1351–1361.PMID:15807023 10. MokM,GraydenD,DowellRC,LawrenceD.Speechperceptionforadultswhousehearingaidsincon- junctionwithcochlearimplantsinoppositeears.J.SpeechLang.Hear.Res.2006; 49:338–351. PMID:16671848 11. MokM,GalvinKL,DowellRC,McKayCM.Speechperceptionbenefitforchildrenwithacochlearim- plantandahearingaidinoppositeearsandchildrenwithbilateralcochlearimplants.Audiol.Neurotol. 2010; 15:44–56.doi:10.1159/000219487PMID:19468210 12. GiffordRH,DormanMF,McKarnsSA,SpahrAJ.Combinedelectricandcontralateralacoustichearing: wordandsentencerecognitionwithbimodalhearing.J.SpeechLang.Hear.Res.2007; 50:835–843. PMID:17675589 13. DormanMF,GiffordRH,SpahrAJ,McKarnsSA.Thebenefitsofcombiningacousticandelectricstimu- lationfortherecognitionofspeech,voiceandmelodies.Audiol.Neurotol.2008; 13:105–112.PMID: 18057874 14. ZhangT,SpahrAJ,DormanMF.Frequencyoverlapbetweenelectricandacousticstimulationand speech-perceptionbenefitinpatientswithcombinedelectricandacousticstimulation.EarHear.2010; 31:195–201.doi:10.1097/AUD.0b013e3181c4758dPMID:19915474 15. YoonYS,LiY,FuQJ.Speechrecognitionandacousticfeaturesincombinedelectricandacousticstim- ulation.J.SpeechLang.Hear.Res.2012; 55:105–124.doi:10.1044/1092-4388(2011/10-0325) PMID:22199183 CochlearImplant+HearingAidSpeechandMusicPerception PLOSONE|DOI:10.1371/journal.pone.0120279 March19,2015 17/18 16. KongYY,BraidaLD.Cross-frequencyintegrationforconsonantandvowelidentificationinbimodal hearing.J.SpeechLang.Hear.Res.2011; 54:959–980.doi:10.1044/1092-4388(2010/10-0197) PMID:21060139 17. GfellerK,TurnerC,OlesonJ,KliethermesS,DriscollV.Accuracyofcochlearimplantrecipientsin speechreceptioninthepresenceofbackgroundmusic.Ann.Otol.Rhinol.Laryngol.2012; 121: 782–791.PMID:23342550 18. GfellerKE,OlszewskiC,TurnerC,GantzB,OlsenJ.Musicperceptionwithcochlearimplantsandre- sidualhearing.Audiol.Neurotol.2006; 11(suppl1):12–15.PMID:17063005 19. GfellerK,TurnerC,OlesonJ,ZhangX,GantzB,FromanR,etal.Accuracyofcochlearimplantrecipi- entsonpitchperception,melodyrecognition,andspeechreceptioninnoise.EarHear.2007; 28: 412–423.PMID:17485990 20. GfellerK,JiangD,OlesonJ,DriscollV,OlszewskiC,KnutsonJF,etal.Theeffectsofmusicalandlin- guisticcomponentsinrecognitionofreal-worldmusicalexcerptsbycochlearimplantrecipientsand normal-hearingadults.J.MusicTher.2012; 49:68–101.PMID:22803258 21. KongYY,MullangiA,MarozeauJ.Timbreandspeechperceptioninbimodalandbilateralcochlear- implantlisteners.EarHear.2012; 33:645–659.doi:10.1097/AUD.0b013e318252caaePMID:22677814 22. ElFataFE,JamesCJ,LabordeML,FraysseB.Howmuchresidualhearingis ‘useful’formusicpercep- tionwithcochlearimplants?Audiol.Neurotol.2009; 14(suppl1):14–21. 23. LooiV,McDermottH,McKayC,HicksonL.Comparisonsofqualityratingsformusicbycochlearim- plantandhearingaidusers.EarHear.2007; 28:59S–61S.PMID:17496649 24. LooiV,McDermottH,McKayC,HicksonL.Musicperceptionofcochlearimplantuserscomparedwith thatofhearingaidusers.EarHear.2008; 29:1–14. 25. 
LooiV,RadfordCJ.Acomparisonofthespeechrecognitionandpitchrankingabilitiesofchildrenusing aunilateralcochlearimplant,bimodalstimulationorbilateralhearingaids.Int.J.Pediatr.Otorhi.2011; 75:472–482.doi:10.1016/j.ijporl.2010.12.023PMID:21300411 26. GalvinJJ3rd,FuQJ,NogakiG.Melodiccontouridentificationbycochlearimplantlisteners.EarHear. 2007; 28:302–319.PMID:17485980 27. GalvinJJ3rd,FuQJ,ShannonRV.Melodiccontouridentificationandmusicperceptionbycochlearim- plantusers.Ann.N.Y.Acad.Sci.2009; 1169:518–533.doi:10.1111/j.1749-6632.2009.04551.xPMID: 19673835 28. GalvinJJ3rd,EskridgeE,ObaS,FuQJ.Melodiccontouridentificationtrainingincochlearimplant userswithandwithoutacompetinginstrument.SeminHear.2012; 33:399–409.doi:10.1097/AUD. 0b013e31823d78fdPMID:22246139 29. NilssonM,SoliSD,SullivanJA.DevelopmentoftheHearingInNoiseTestforthemeasurementof speechreceptionthresholdsinquietandinnoise.J.Acoust.Soc.Am.1994; 95:1085–1099.PMID: 8132902 30. ChanJC,FreedDJ,VermiglioAJ,SoliSD.Evaluationofbinauralfunctionsinbilateralcochlearimplant users.Int.J.Audiol.2008; 47:296–310.doi:10.1080/14992020802075407PMID:18569102 31. GalvinJJ3rd,FuQJ,ObaS.Effectofinstrumenttimbreonmelodiccontouridentificationbycochlear implantusers.J.Acoust.Soc.Am.2008; 124:EL189–195.doi:10.1121/1.2961171PMID:19062785 32. ZhuM,ChenB,GalvinJJ3rd,FuQJ.Influenceofpitch,timbreandtimingcuesonmelodiccontour identificationwithacompetingmasker.J.Acoust.Soc.Am.2011; 130:3562–3565.doi:10.1121/1. 3658474PMID:22225012 33. GalvinJJ3rd,FuQJ,ObaS.Effectofacompetinginstrumentonmelodiccontouridentificationbyco- chlearimplantusers.J.Acoust.Soc.Am.2009; 125:EL98–EL103.doi:10.1121/1.3062148PMID: 19275282 34. ZhangT,SpahrAJ,DormanMF,SaojiA.Relationshipbetweenauditoryfunctionofnonimplantedearsand bimodalbenefit.EarHear.2013; 34:133–141.doi:10.1097/AUD.0b013e31826709afPMID:23075632 35. DillonMT,BussE,PillsburyHC,AdunkaOF,BuchmanCA,AdunkaMC.(2014)Effectsofhearingaid settingsforelectric-acousticstimulation.JAmAcadAudiol. 25:133–40.doi:10.3766/jaaa.25.2.2 PMID:24828214 36. GalvinJJ3rd,FuQJ.Effectofbandpassfilteringonmelodiccontouridentificationbycochlearimplant users.J.Acoust.Soc.Am.2011; 129:EL39–EL44.doi:10.1121/1.3531708PMID:21361410 37. MachereyO,DelpierreA.Perceptionofmusicaltimbrebycochlearimplantlisteners:amultidimensional scalingstudy.EarHear.2013; 34:426–436.doi:10.1097/AUD.0b013e31827535f8PMID:23334356 38. GfellerK,OlszewskiC,RychenerM,SenaK,KnutsonJF,WittS,etal.Recognitionof ‘real-world’musi- calexcerptsbycochlearimplantrecipientsandnormal-hearingadults.EarHear.2005; 26:237–250. PMID:15937406 CochlearImplant+HearingAidSpeechandMusicPerception PLOSONE|DOI:10.1371/journal.pone.0120279 March19,2015 18/18 78 Appendix C: “Melodic Contour Identification and Sentence Recognition Using Sung Speech.” JASA-EL 2015 Joseph D. Crew, John J. Galvin III, and Qian-Jie Fu. (2015) “Melodic Contour Identification and Sentence Recognition Using Sung Speech.” Journal of the Acoustical Society of America, Volume 138, EL347-EL351 Melodic contouridentificationandsentence recognition usingsungspeech Joseph D. Crew, 1,a) JohnJ. Galvin III, 2 andQian-JieFu 2 1 DepartmentofBiomedicalEngineering,UniversityofSouthernCalifornia, 1042DowneyWay,DenneyResearchCenter140,LosAngeles,California90089,USA 2 DepartmentofHeadandNeckSurgery,UniversityofCalifornia, LosAngeles,California90095,USA jcrew@usc.edu,jgalvin@ucla.edu,qfu@mednet.ucla.edu Abstract: For bimodal cochlear implant users, acoustic and electric hearing has been shown to contribute differently to speech and music perception. 
However, differences in test paradigms and stimuli in speech andmusictestingcanmakeitdifficulttoassesstherelativecontributions of each device. To address these concerns, the Sung Speech Corpus (SSC) was created. The SSC contains 50 monosyllable words sung over an octave range and can be used to test both speech and music percep- tion using the same stimuli. Here SSC data are presented with normal hearinglistenersandanyadvantageofmusicianshipisexamined. V C 2015AcousticalSocietyofAmerica [DOS] DateReceived: June21,2015 DateAccepted: August13,2015 1.Introduction For cochlear implant (CI) users with residual acoustic hearing, combined use of the CI and a hearing aid (HA) in the contralateral ear has been shown to improve speech and music perception (Kong et al., 2005; Dorman et al., 2008; Crew et al., 2015). The ben- efits of bimodal listening are somewhat variable, often with listeners attending to the “better ear” for different speech and music perception tasks (e.g., Crew et al., 2015). In previous bimodal studies, very different stimuli (e.g., speech versus piano notes) and perceptual tests (e.g., speech understanding in noise versus melodic contour identifica- tion) have been used to evaluate bimodal performance. The results indicate that CIs convey speech information quite well, and the HAs convey the pitch information quite well. As such, a subject is likely to focus on a single device during a particular task because that cue is better represented by a single device making it difficult to observe a bimodal benefit. It seems preferable to combine varying musical pitch and speech in- formation within a stimulus such that both devices will be needed to perform the task. This may allow better observation of the contributions and interactions between acous- tic and electric hearing. To address these concerns, we have created a database of sung monosyllable words that contain both musical pitch and speech information called the Sung Speech Corpus (SSC). The SSC allows sentence recognition to be measured with or without variations in pitch cues (fundamental frequency or F0) and melodic pitch perception to be measured with and without variation in words or timbre cues, both using the same stimuli. Thus, the SSC may help elucidate the contributions of pitch and timbre to speech and music perception. The SSC may be especially useful for evaluating speech and music perception in unilateral and bilateral CI users, bimodal listeners, and nor- mal hearing (NH) listeners with pitch processing deficits. The size of the database (100000 possible sentences with 27 possible contours) makes the SSC useful to com- pare performance across numerous experimental conditions. In this paper, we present speech and music perception results with the SSC in adult normal hearing (NH) listen- ers. Such data are important for future comparison to performance in hearing- impaired listeners (e.g., HA users, CI users, bimodal listeners, etc.). Long-term musical experience has been shown to improve both speech and music perception (Peretz et al., 2003; Kraus et al., 2009; Parbery-Clark et al., 2009), possibly because musicians are able to extract and track pitch in complex listening environments. This “musician effect” has not been consistently observed across studies (Ruggles et al., 2014) and may depend on the listening task (Fuller et al., 2014). 
The SSC contains acoustically complex stimuli in terms of pitch and timbre cues that may be weighted differently depending on the listening task (speech versus music perception); it is even possible that pitch and timbre cues may not be optimally integrated in some listeners. As such, we hypothesized that an advantage would be observed for musicians as the stimuli became more complex (e.g., melodic pitch perception with varying timbre cues, sentence recognition with varying pitch cues).

2. Methods
2.1 Subjects
Sixteen NH subjects participated in the study. All subjects had pure-tone thresholds of less than 20 dB HL at all audiometric frequencies between 125 and 4000 Hz. Subjects were divided into two categories of eight subjects each: musicians (mean age, 30.5 years; age range, 24–47 years) and non-musicians (mean age, 27.8 years; age range, 24–30 years). Musicians were defined as regularly playing a musical instrument at the time of recruitment. Non-musicians were defined as never having received formal musical training and never having informally learned to play an instrument or sung in a choir (e.g., guitar lessons). Potential subjects who had some music training but did not meet the musician criteria were excluded: they had too much training to be considered non-musicians, but played too infrequently to be considered musicians for this study.

2.2 SSC
The SSC consists of 50 sung monosyllable words produced by a single adult male that can be used to create a simple sentence with the following syntax: "name" "verb" "number" "color" "clothing" (e.g., "Bob sells three blue ties"). Each of the five categories contains ten words, and each word was sung at all 13 pitches from A2 (110 Hz) to A3 (220 Hz) in discrete semitone steps. As such, a five-word sentence can be constructed to contain a five-note melody, allowing sentence recognition and melodic contour identification (MCI) to be measured using the same stimuli. Natural speech utterances were also produced for each word to allow comparison between naturally produced and sung speech. All stimuli were 500 ms in duration. Minimal adjustments were made to the stimuli after recording to obtain the exact target F0, amplitude, and duration. Figure 1 shows the response screen for the sentence recognition test (left panel) and the MCI test (right panel).

2.3 Test procedures and conditions
Sentence recognition was measured using a closed-set matrix procedure, similar to other matrix sentence testing studies (Rader et al., 2015). To create a test sentence, a word was randomly selected from each category. Depending on the test condition, the F0 for each word was selected to create a target pitch contour. For the "flat contour" condition, the F0 was the same across all words. For the "fixed contour" condition, one of four dynamic contours (rising, rising-falling, falling-rising, and falling) was used for all sentences during testing. For the "random contour" condition, all nine possible contours were presented during testing. Sentences were also tested using naturally produced speech ("spoken").
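To make the sentence-test construction concrete, the following is a minimal sketch of how one trial per condition could be assembled. The semitone-to-frequency mapping follows equal temperament, F0 = 110 × 2^(n/12) Hz for n semitones above A2; the word lists, the specific semitone patterns assigned to the four dynamic contours, and the function names are illustrative assumptions, not details taken from the SSC recordings or test software.

```python
import random

# Illustrative word lists; each of the five SSC categories actually contains ten words.
WORDS = {
    "name":     ["Bob", "John", "Jill", "Pat", "Sam"],
    "verb":     ["sells", "wants", "buys", "finds", "gives"],
    "number":   ["two", "three", "five", "six", "nine"],
    "color":    ["blue", "brown", "green", "red", "white"],
    "clothing": ["ties", "shoes", "hats", "socks", "coats"],
}

# Assumed semitone patterns (0 = A2, 110 Hz) for the four dynamic contours named above.
DYNAMIC_CONTOURS = {
    "rising":         [0, 3, 6, 9, 12],
    "falling":        [12, 9, 6, 3, 0],
    "rising-falling": [0, 6, 12, 6, 0],
    "falling-rising": [12, 6, 0, 6, 12],
}

def semitone_to_f0(n, base_f0=110.0):
    """Equal-tempered F0 for n semitones above A2 (110 Hz)."""
    return base_f0 * 2.0 ** (n / 12.0)

def make_sentence_trial(condition, fixed_contour="rising"):
    """Draw one word per category and assign each word a target F0."""
    words = [random.choice(WORDS[cat]) for cat in
             ("name", "verb", "number", "color", "clothing")]
    if condition == "spoken":
        offsets = [None] * 5                       # natural utterances, no target F0
    elif condition == "flat contour":
        offsets = [random.randint(0, 12)] * 5      # the same F0 on every word
    elif condition == "fixed contour":
        offsets = DYNAMIC_CONTOURS[fixed_contour]  # one shape for the whole test run
    else:                                          # "random contour"
        # In the actual test, one of the nine contour shapes is drawn per trial;
        # here we approximate by drawing from the illustrative set plus a flat shape.
        offsets = random.choice(list(DYNAMIC_CONTOURS.values()) + [[6] * 5])
    f0s = [None if n is None else round(semitone_to_f0(n), 1) for n in offsets]
    return list(zip(words, f0s))

print(make_sentence_trial("fixed contour", fixed_contour="rising-falling"))
```

Under this mapping, for example, a "fixed contour" trial with the rising shape would assign F0s of roughly 110.0, 130.8, 155.6, 185.0, and 220.0 Hz to the five words.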
During testing, a test sentence was presented to the subject, who responded by clicking on the word within each category that best matched the word presented (left panel of Fig. 1). Subjects were allowed to repeat the sentence up to three times. Performance was scored based on correct recognition of the complete sentence. The sentence recognition test took approximately 6–8 min per run. Audio demo Mm. 1 presents example stimuli for each of the four sentence test conditions.
Mm. 1. Audio examples of the sentence test stimuli for the spoken, flat contour, fixed contour, and random contour conditions. This is a file of type "wav" (5355 kB).
MCI was also measured using the SSC stimuli, following the methods described in Galvin et al. (2007). The F0 spacing between notes in the contour was varied between one and three semitones. For the "fixed word" condition, 1 of the 50 words was randomly chosen, and this word was used for all notes in the contour (e.g., "Bob Bob Bob Bob Bob"). For the "fixed sentence" condition, one word from each category was randomly chosen to construct a single sentence (e.g., "Bob sells three blue ties") that was used for all contours during MCI testing. For the "random sentence" condition, words were randomly chosen from each category to create a different sentence for each contour (e.g., "Bob sells three blue ties," "John wants five brown shoes"). As a control condition, MCI was also measured with the MIDI piano sample used in Galvin et al. (2008) and Crew et al. (2015). During testing, a contour was presented to the subject, who responded by clicking on one of the nine response boxes shown onscreen (right panel of Fig. 1). Subjects were allowed to repeat the contour up to three times. MCI performance was scored in terms of overall percent correct, as well as percent correct for each semitone spacing condition. The MCI test took approximately 4–5 min per run. Audio demo Mm. 2 presents example stimuli in pairs for each of the four MCI test conditions: piano, fixed word, fixed sentence, and random sentence.
Mm. 2. Audio examples of the MCI test stimuli with three-semitone spacing for the piano, fixed word, fixed sentence, and random sentence conditions. This is a file of type "wav" (6703 kB).
Fig. 1. (Color online) Response screens for sentence recognition (left panel) and MCI (right panel). There are five categories with ten words each for the sentence recognition test; there are nine possible contours for MCI.
All subjects were tested while sitting in a sound-treated booth, directly facing a single loudspeaker. All stimuli were presented in the sound field at 65 dBA. The four sentence conditions and the four MCI conditions were tested in separate blocks, and the test block order was randomized across subjects. No preview or trial-by-trial feedback was provided. A minimum of two runs were tested for each condition; if the difference in performance between runs was greater than 10%, a third run was tested.
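Before turning to the results, a brief sketch of how the MCI stimuli can be parameterized: each trial is a five-note contour defined by a shape and a semitone spacing, with the notes carried by piano samples or sung words depending on the condition. The nine-shape set below follows the common MCI convention of combining rising, flat, and falling segments (cf. Galvin et al., 2007); the exact note patterns, word lists, and function names are illustrative assumptions rather than the SSC's actual implementation.

```python
import random

BASE_F0 = 110.0  # A2, the lowest note in the SSC range

# One plausible parametrization of the nine contour shapes: the first and
# second halves of a five-note contour each rise, stay flat, or fall (3 x 3 = 9).
DIRECTIONS = {"rising": +1, "flat": 0, "falling": -1}

def contour_offsets(first_half, second_half, spacing):
    """Semitone offsets for a five-note contour, shifted so the lowest note is 0."""
    steps = [DIRECTIONS[first_half]] * 2 + [DIRECTIONS[second_half]] * 2
    offsets = [0]
    for step in steps:
        offsets.append(offsets[-1] + step * spacing)
    low = min(offsets)
    return [o - low for o in offsets]

def contour_f0s(first_half, second_half, spacing, start_semitone=0):
    """Equal-tempered F0s in Hz, with the lowest contour note placed at the root."""
    return [round(BASE_F0 * 2 ** ((start_semitone + o) / 12.0), 1)
            for o in contour_offsets(first_half, second_half, spacing)]

def mci_trial(condition, spacing, sentence=("Bob", "sells", "three", "blue", "ties")):
    """Pair the five contour notes with carriers according to the MCI condition."""
    if condition == "piano":
        carriers = ["piano"] * 5                   # MIDI piano control condition
    elif condition == "fixed word":
        carriers = [random.choice(sentence)] * 5   # the same word on every note
    elif condition == "fixed sentence":
        carriers = list(sentence)                  # the same sentence for every contour
    else:                                          # "random sentence"
        carriers = list(sentence)                  # a new random sentence would be drawn per contour
    shape = (random.choice(list(DIRECTIONS)), random.choice(list(DIRECTIONS)))
    return shape, list(zip(carriers, contour_f0s(*shape, spacing)))

print(mci_trial("fixed word", spacing=2))
```

With a spacing of two semitones and a rising-falling shape, for instance, the note offsets are 0, 2, 4, 2, and 0 semitones relative to the starting note.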
3. Results
Figure 2 shows sentence recognition (top panels) and MCI performance (bottom panels) for musicians and non-musicians in the different test conditions.
Fig. 2. Box plots of sentence recognition (top panels) and MCI (bottom panels) scores for musicians (M) and non-musicians (NM). Each panel shows data for different test conditions. The boxes show the 25th and 75th percentiles, the error bars show the 10th and 90th percentiles, the solid line shows the median, the dashed line shows the mean, and the symbols show outliers.
For sentence recognition, performance for both groups was very good in all conditions, with musicians and most non-musicians scoring near 100% correct. A split-plot analysis of variance (ANOVA) was performed on the sentence recognition data, with subject group (musicians or non-musicians) as the between-group factor and test condition (spoken, flat contour, fixed contour, or random contour) as the within-group factor. Results showed no significant effect of subject group [F(1,14) = 3.34, p = 0.089] or test condition [F(3,14) = 1.22, p = 0.288]; there was no significant interaction [F(3,14) = 0.28, p = 0.688].
For MCI, there was a strong musician advantage. Musician performance was nearly perfect in all conditions, while non-musician performance was generally poorer and more variable. A split-plot ANOVA was performed on the MCI data, with subject group (musicians or non-musicians) as the between-group factor and test condition (piano, fixed word, fixed sentence, and random sentence) as the within-group factor. Results showed significant effects of subject group [F(1,42) = 17.604, p = 0.001] and test condition [F(3,42) = 23.05, p < 0.001]; there was a significant interaction [F(3,42) = 22.0, p < 0.001]. Because musicians scored nearly 100% correct in all test conditions, post hoc pairwise comparisons (with Bonferroni corrections) were performed only on the non-musician data. Performance in the piano and fixed word conditions was significantly better than in the fixed sentence or random sentence conditions (p < 0.001 in all four comparisons); there were no significant differences among the remaining conditions. Performance with three-semitone spacing was significantly better than with one-semitone spacing (p = 0.008), with no other significant differences.
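The split-plot ANOVAs reported above combine one between-subjects factor (subject group) with one within-subjects factor (test condition). The paper does not state which statistical software was used; as a reference point, the sketch below shows how an equivalent mixed-design ANOVA and Bonferroni-corrected post hoc comparisons could be run in Python with the pingouin package (assumed installed; in older versions the pairwise function is named pairwise_ttests). The scores in the example are fabricated placeholders, not the study's data.

```python
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
conditions = ["piano", "fixed word", "fixed sentence", "random sentence"]
rows = []
for group, base in [("musician", 97), ("non-musician", 75)]:
    for s in range(8):                       # eight subjects per group, as in the study
        for cond in conditions:
            # Made-up scores: non-musicians drop for the variable-timbre conditions.
            drop = 25 if (group == "non-musician" and "sentence" in cond) else 0
            rows.append({"subject": f"{group[0]}{s}", "group": group,
                         "condition": cond,
                         "score": base - drop + rng.normal(0, 3)})
df = pd.DataFrame(rows)  # long format: one row per subject x condition

# Split-plot (mixed) ANOVA: 'group' is between-subjects, 'condition' is within-subjects.
aov = pg.mixed_anova(data=df, dv="score", within="condition",
                     subject="subject", between="group")
print(aov.round(3))

# Bonferroni-corrected pairwise comparisons on the non-musician data only,
# mirroring the post hoc analysis reported above.
posthoc = pg.pairwise_tests(data=df[df["group"] == "non-musician"],
                            dv="score", within="condition", subject="subject",
                            padjust="bonf")
print(posthoc.round(3))
```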
4. Discussion
There was no significant musician effect for sentence recognition, even for the most complex condition (random contour), possibly due to ceiling effects. Still, the present data are in line with previous studies that tend to show small musician effects for speech, if any at all (Parbery-Clark et al., 2009; Fuller et al., 2014; Ruggles et al., 2014). Supporting our hypothesis, the musician effect for MCI became stronger as the stimuli became more complex. Musicians performed nearly perfectly in all test conditions, suggesting that musicians were better able to extract pitch information despite variations in timbre (in this case, words). Variations in timbre clearly affected non-musicians' melodic pitch perception. The present results are in agreement with a previous study showing that MCI performance in CI users was significantly affected by instrument timbre, with performance decreasing as timbre complexity increased (Galvin et al., 2008). In that study, NH performance was also more variable as timbre complexity increased, similar to the present results.
It is possible that semantic differences across trials may have added to the complexity of the MCI task. If so, performance for the fixed sentence condition (the same sentence across trials) should have been better than for the random sentence condition (different sentences across trials). For non-musicians, performance was similar for the fixed and random sentence conditions, and both were significantly poorer than for the fixed word condition (the same word across all notes and across all trials). Further, there was no significant difference between the fixed word and piano conditions, suggesting that consistent timbre cues allowed non-musicians to better extract pitch information.
The SSC may be an effective tool with which to probe the relative contributions of acoustic and electric hearing to speech and music perception in bimodal CI listeners. Pitch perception is poor with the CI alone, and the poor spectral resolution may lead to confusion between pitch and timbre cues, causing a deficit in speech and/or music perception when both cues are varied. Adding a contralateral HA may improve pitch perception, which in turn may improve speech and music performance when pitch and/or timbre cues are varied. As such, the SSC may reveal greater bimodal benefit to speech performance than observed in previous studies. And while CI signal processing may be modified to improve melodic pitch perception (e.g., semitone-spaced frequency allocation), speech perception may be negatively affected. Optimizing CI signal processing for music perception must not come at the expense of speech perception. The SSC may be used to evaluate both speech and music perception using stimuli that contain both pitch and timbre cues; as such, improvements and decrements in speech and/or music perception can be easily observed.

Acknowledgments
The authors thank the subjects for their participation. This work was supported by the NSF GK-12 Body Engineering Los Angeles program and NIDCD R01-DC004993 and R01-DC004792.

References and links
Crew, J. D., Galvin, J. J. 3rd, Landsberger, D. M., and Fu, Q.-J. (2015). "Contributions of electric and acoustic hearing to bimodal speech and music perception," PLoS One 10, e0120279.
Dorman, M. F., Gifford, R. H., Spahr, A. J., and McKarns, S. A. (2008). "The benefits of combining acoustic and electric stimulation for the recognition of speech, voice and melodies," Audiol. Neurotol. 13, 105–112.
Fuller, C. D., Galvin, J. J. 3rd, Free, R. H., and Baskent, D. (2014). "Musician effect in cochlear implant simulated gender categorization," J. Acoust. Soc. Am. 135, EL159–EL165.
Galvin, J. J. 3rd, Fu, Q.-J., and Nogaki, G. (2007). "Melodic contour identification by cochlear implant listeners," Ear Hear. 28, 302–319.
Galvin, J. J. 3rd, Fu, Q.-J., and Oba, S. (2008). "Effect of instrument timbre on melodic contour identification by cochlear implant users," J. Acoust. Soc. Am. 124, EL189–EL195.
Kong, Y.-Y., Stickney, G. S., and Zeng, F.-G. (2005). "Speech and melody recognition in binaurally combined acoustic and electric hearing," J. Acoust. Soc. Am. 117, 1351–1361.
Kraus, N., Skoe, E., Parbery-Clark, A., and Ashley, R. (2009). "Experience-induced malleability in neural encoding of pitch, timbre and timing: Implications for language and music," Ann. N.Y. Acad. Sci. 1169, 543–557.
Parbery-Clark, A., Skoe, E., Lam, C., and Kraus, N. (2009). "Musician enhancement for speech in noise," Ear Hear. 30, 653–661.
Peretz, I., Champod, A. S., and Hyde, K. (2003). "Varieties of musical disorders: The Montreal battery of evaluation of amusia," Ann. N.Y. Acad. Sci. 999, 58–75.
Rader, T., Adel, Y., Fastl, H., and Baumann, U. (2015). "Speech perception with combined electric-acoustic stimulation: A simulation and model comparison," Ear Hear., in press.
Ruggles, D. R., Freyman, R. L., and Oxenham, A. J. (2014). "Influence of musical training on understanding voiced and whispered speech in noise," PLoS One 9, e86980.
Abstract
Cochlear implants (CIs) provide good speech understanding to profoundly deaf individuals but do not provide good pitch perception, a critical component of music perception and enjoyment. The cues necessary for speech understanding are fairly well understood, but pitch perception remains a critical area of research in CIs. In this thesis, experimental data is presented that attempts to further explore pitch and music perception with CIs. ❧ In the first experiment, the influence of channel interaction or spectral smearing on melodic pitch perception was examined. The results indicated that the spectral envelope was used to rank the pitch of a stimulus in simulated CIs as CI processing removes the fine spectral cues and harmonic relationships. In the second experiment, the effect of adding a hearing aid (HA) in addition to a CI was examined for both speech-in-noise and melodic pitch perception. The addition of a HA improved speech perception slightly but drastically improved pitch perception. The results indicated that the fine-structure frequency cues provided by a HA contributed strongly to pitch perception, even in the presence of a competing instrument. ❧ In the third experiment, a test database of acoustic stimuli was created. The stimuli set consisted of 50 different words sung over an octave range. Thus the stimuli contained simultaneous pitch and speech information. This database was tested for normal hearing subjects, divided into two groups based on musicianship. There was no effect of musicianship on speech, but there was a large effect of musicianship for pitch perception. There were no differences across speech conditions, ranging from spoken utterances, to sung speech with a constant pitch, to sung speech with variable pitch. The melodic pitch conditions ranged from fixed timbre, either with a piano or the same word, to variable timbre, different words across a melodic contour. Non-musician performance was significantly worse relative to musicians for melodic pitch, and performance in the fixed timbre conditions was significantly better than in the variable timbre conditions. ❧ In the fourth experiment, this newly created database was tested with CI+HA users as in the second experiment. In general, performance worsened as the tasks became increasingly difficult, but combined device use improved speech and melodic pitch perception. The bulk of speech information was provided by the CI, but the HA provided the bulk of melodic pitch cues similar to the results observed in the second experiment. Pitch perception was much more difficult with variable timbre than fixed timbre, even with a HA. This suggested that CI+HA users still lack critical pitch processing abilities. ❧ The results suggested that CI users lack the necessary auditory cues for good pitch processing, especially in more complex music listening situations (e.g., polyphonic music, lyrical melodies). Leveraging residual acoustic hearing via a HA can help improve pitch perception. And while CI listeners may learn to use alternative cues (spectral envelope, repetition rate cues) to perform a pitch perception task, the restoration of the missing cues (namely fine-structure and harmonic relationships) is the ultimate goal for CI device design. This improvement would likely contribute to improved quality of sound with a CI.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Improving frequency resolution in cochlear implants with implications for cochlear implant front- and back-end processing
Speech enhancement and intelligibility modeling in cochlear implants
Did you get all that? Encoding of amplitude modulations at the auditory periphery predicts hearing outcomes
Intensity discrimination in single and multi-electrode patterns in cochlear implants
Toward understanding speech planning by observing its execution—representations, modeling and analysis
Emotional speech production: from data to computational models and applications
Behavior understanding from speech under constrained conditions: exploring sparse networks, transfer and unsupervised learning
Speech recognition error modeling for robust speech processing and natural language understanding applications
Selectivity for visual speech in posterior temporal cortex
Towards a high resolution retinal implant
The role of music training on behavioral and neurophysiological indices of speech-in-noise perception: a meta-analysis and randomized-control trial
The role of individual variability in tests of functional hearing
Developmental trajectories of sensory patterns in young children with and without autism spectrum disorder: a longitudinal population-based study from infancy to school age
Asset Metadata
Creator
Crew, Joseph David (author)
Core Title
Understanding music perception with cochlear implants with a little help from my friends, speech and hearing aids
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Biomedical Engineering
Publication Date
02/24/2016
Defense Date
12/03/2015
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
atypical speech,bimodal,cochlear implants,electro-acoustic stimulation,music perception,OAI-PMH Harvest,pitch perception,speech perception
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Loeb, Gerald E. (committee chair), Shannon, Robert V. (committee chair), Kalluri, Radha (committee member), Narayanan, Shrikanth S. (committee member), Nayak, Krishna S. (committee member)
Creator Email
jcrew.research@gmail.com,jcrew@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c40-216450
Unique identifier
UC11277236
Identifier
etd-CrewJoseph-4167.pdf (filename),usctheses-c40-216450 (legacy record id)
Legacy Identifier
etd-CrewJoseph-4167-0.pdf
Dmrecord
216450
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Crew, Joseph David
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA