ENGLISH PHONEME AND WORD RECOGNITION BY NONNATIVE ENGLISH SPEAKERS AS A FUNCTION OF SPECTRAL RESOLUTION AND ENGLISH EXPERIENCE

by

Monica Padilla

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BIOMEDICAL ENGINEERING)

August 2003

Copyright 2003 Monica Padilla

DEDICATION

To all my family, the big and small,
Especially to the small ones
For their tender love and kisses.

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Robert V. Shannon, for his help and support during this research. I would also like to acknowledge the people in the DAIP department who have helped me throughout these years of research: Mark Robert and Dr. Qian-Jie Fu for designing the programs used for testing, and Dr. Monita Chatterjee for her constant help, input and questioning throughout all my research. I would also like to thank all the members of my dissertation committee. I would like to especially thank all my family for their constant support and love throughout the years.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

1. INTRODUCTION
   1.1. Specific Aims

2. AMERICAN ENGLISH RECOGNITION
   2.1. Speech Recognition by Nonnative Listeners
   2.2. Speech Recognition with Reduced Spectral Information

3. MODEL DESCRIPTIONS
   3.1. Phonemic Level
   3.2. Sensory Distortion
      3.2.1. Performance Intensity Functions
      3.2.2. Plomp's SRT Model
   3.3. Context Effects
      3.3.1. Boothroyd and Nittrouer Model

4. METHOD
   4.1. Listeners
   4.2. Stimuli
   4.3. Procedure

5. RESULTS
   5.1. Main Findings
   5.2. Performance
   5.3. Performance in Quiet
   5.4. Performance Intensity Functions
   5.5. Estimated SRT
   5.6. Context Effects
      5.6.1. Estimated 'j'
      5.6.2. Estimated 'k'

6. SPEECH PERCEPTION BY NATIVE CHINESE LISTENERS
   6.1. Listeners
   6.2. Performance

7. DISCUSSION
   7.1. Vowel Space
   7.2. Distortion 'D'
   7.3. Context Effects

8. CONCLUSIONS
   8.1. Models Used
   8.2. Spectral Information
   8.3. Critical Learning Periods

REFERENCES

LIST OF TABLES

1. Categories for nonnative English speakers
2. Characteristics of the subjects tested
3. ANOVA measurements of subject significance
4. ANOVA measurements of channel significance
5. ANOVA measurements of SNR significance
6. Estimated parameters for the sigmoidal functions
7. Estimated parameters for vowels
8. Estimated parameters for consonants
9. Estimated parameters for words
10. Estimated parameters for sentences
11. Characteristics of nonnative subjects with Chinese as L1
12. Estimated parameters for the sigmoidal fitting for vowels
13. Estimated parameters for the sigmoidal fitting of consonants
14. Confusion matrix for native English listeners in full speech in quiet
15. Confusion matrix for toddler English learners in full speech in quiet
16. Confusion matrix for child English learners in full speech in quiet
17. Confusion matrix for teen English learners in full speech in quiet
18. Confusion matrix for adult English learners in full speech in quiet
19. Confusion matrix for native English listeners for eight bands in quiet
20. Confusion matrix for toddler English learners for eight bands in quiet
21. Confusion matrix for child English learners for eight bands in quiet
22. Confusion matrix for teen English learners for eight bands in quiet
23. Confusion matrix for adult English learners for eight bands in quiet
24. Parameter 'D' as a function of age of immersion in L2
25. Parameter 'D' as a function of number of bands
26. Slope of 'D' (dB per doubling of the number of channels)
27. PRT linear regressions
28. Slopes (dB/doubling) relating PRT and the log of the number of bands
29. Coefficient for linear regression
30. PRT linear regressions
31. Parameters for distortion given the age of immersion
32. Familiarity with words

LIST OF FIGURES

1. Performance intensity function
2. Speech reception threshold (SRT) for sentences in a typical listening situation
3. Different 'j' factors
4. Different 'k' factors
5. Vowel recognition in noise (2, 4 and 6 frequency bands)
6. Vowel recognition in noise (8, 16 frequency bands and full speech)
7. Consonant recognition in noise (2, 4 and 6 frequency bands)
8. Consonant recognition in noise (8, 16 frequency bands and full speech)
9. Word recognition in noise (2, 4 and 6 frequency bands)
10. Word recognition in noise (8, 16 frequency bands and full speech)
11. Sentence recognition in noise (2, 4 and 6 frequency bands)
12. Sentence recognition in noise (8, 16 frequency bands and full speech)
13. Performance in quiet for phonemes
14. Performance in quiet for words and sentences
15. PRT values (in dB) as a function of AOI in the 2nd language
16. PRT values as a function of the performance in quiet (Q)
17. Performance intensity functions for vowels of native English listeners
18. Different levels of 'D' in the SRT curves
19. Values of 'D' for phonemes, words and sentences
20. Values of 'D' for different ages of immersion in English
21. Distortion 'D' as a function of age of immersion in the 2nd language
22. Fitted curves of 'j' for words
23. Values of 'j' for all the subjects in the toddler learner group
24. Values of 'j' for words
25. Values of 'j' for sentences
26. Fitted curves for the different groups and two band conditions
27. Fitting for toddler learners with sixteen bands
28. Parameter 'k' as a function of the number of bands for all data
29. Estimated 'k' (25, 50, 75%)
30. Vowel performance in noise for native Chinese listeners
31. Consonant recognition in noise for nonnative listeners (L1 is Chinese)
32. Word and sentence recognition for sixteen bands
33. Perceived vowels for quiet, unprocessed speech
34. Perceived vowels for quiet, eight bands of spectral information
35. 'D' for different stimuli as a function of AOI
36. Linear regression fitting of 'D' as a function of number of bands
37. PRT as a function of the logarithm of the number of bands
38. PRT fitted curves
39. Independent factors of distortion
40. Distortion as a function of AOI in the 2nd language

ABSTRACT

Previous studies have shown that nonnative listeners have poorer speech recognition than native listeners, particularly under difficult listening conditions. While native English-speaking listeners can tolerate large amounts of distortion to speech, listeners for whom English is a second language (L2) may require a much longer period of learning to develop equally robust central speech patterns. The present study looked at speech perception by nonnative listeners under conditions of noise (SNRs of 15 dB, 10 dB, 5 dB and 0 dB) and reduced spectral information (2, 4, 6, 8 and 16 frequency bands). Subjects were tested using speech processed to simulate the listening condition typically encountered by cochlear implant listeners. Normal-hearing listeners whose first language was Spanish were tested with American English phonemes, words and sentences.
Results were compared with results for native English listeners tested under the same conditions. Speech perception depends on phonetic, lexical and linguistic knowledge. We wanted to determine the relative contributions of phonemic and lexical processing to speech recognition as a function of language experience by varying both the spectral resolution and the linguistic complexity of the materials. Contrary to what was initially expected, results showed that the main difficulty nonnative listeners encounter in L2 perception is in vowel perception, not in linguistic integration. Context factors were determined using the Boothroyd and Nittrouer model (1988), in which a parameter 'k' reflects the degree of predictability in the sentence material. Results suggest that early L2 learners are able to use linguistic information when they are forced to do so. Plomp's SRT (Speech Reception Threshold) model (1986) was applied to the data. Plomp defines a parameter 'D' that measures the distortion in perception of the speech stimuli. The difficulties in L2 speech perception by nonnative listeners are similar to difficulties encountered by cochlear implant patients. We do not suggest that lack of experience with an L2 is the same as suffering from a hearing loss or hearing impairment, but the results show that this simple model can account for both factors.

CHAPTER 1
INTRODUCTION

Speech pattern recognition is a robust perceptual system that develops over a long period of time. Throughout life the brain develops complex pattern recognition mechanisms that are able to overcome conditions of severe degradation of the speech signal. This pattern recognition is particularly robust in a listener's native language. In this series of experiments we tested listeners with varying degrees of English proficiency who learned English at different stages of their lives. This means that they have had different lengths of time to develop speech pattern recognition in the English language, and their speech pattern recognition in English may not be as robust as that of native adult English listeners. As there may also be critical time periods of learning, people who learn a second language after childhood may be at a disadvantage even after a long time of learning the second language.

Previous studies (Florentine, 1985a and 1985b; Flege, 1991; Mayo et al., 1997; Barinaga, 2000) have shown that the age of learning of a second language affects the listener's performance in speech perception, particularly in difficult listening conditions. However, languages differ in their phonemic structure as well as their lexical structure. It is not clear, then, whether the poorer performance in speech perception is caused by difficulty recognizing the constituent phonemes of words or by difficulty in matching the phonemes heard to lexical knowledge (Boothroyd and Nittrouer, 1988). It is important to understand the factors that limit speech understanding by nonnative listeners, especially under difficult listening conditions (airplane pilots, radio and telephone operators, diplomats and interpreters).
To do so, it is also important to understand how language experience affects the relative contributions of phonemic and lexical processing in speech recognition.

1.1. Specific Aims

Central pattern recognition mechanisms in humans take many years to become fully trained. Once fully trained, we can rely on them to properly interpret distorted sensory information, whether auditory or visual. Speech recognition depends on phonetic, lexical and linguistic knowledge. A comprehensive parametric study that explores performance in recognizing phoneme patterns and at the same time measures context effects using words and sentences has not yet been done, making it difficult to quantify the relative importance of peripheral versus central cues in speech understanding. In this series of experiments we measured phoneme, word, and sentence recognition of nonnative English listeners as a function of English experience. From this we expect to be able to determine how age of immersion in the second language affects speech recognition patterns for different types of stimuli. We will try to discern whether the difficulty in speech recognition originates in phonemic or lexical aspects of processing. Our objective is to determine the relative contributions of phonemic and lexical processing as a function of language experience by varying both the spectral resolution and the linguistic complexity of the materials.

Hearing-impaired listeners and cochlear implant patients receive stimuli with reduced spectral resolution, where the main cues present are primarily temporal acoustic cues. In cochlear implant devices the sound signal is picked up by a microphone and converted into electrical signals. These signals are delivered to electrodes inserted in the cochlea, which deliver the electrical stimuli directly to the auditory nerves to give the patient the sensation of "hearing" the "sounds." Normal-hearing listeners can be tested with speech simulating this type of stimulus (different degrees of spectral information) to try to predict the performance of cochlear implant patients with the same number of effective channels. We can then relate their performance to the performance of cochlear implant patients under normal and difficult listening conditions.

In this study, listeners were tested under conditions of added noise and spectral degradation to simulate the information that a cochlear implant patient would receive with different numbers of channels. These conditions parametrically vary the difficulty of speech recognition tasks for both native and nonnative listeners. Native listeners are better able to overcome difficult listening conditions, since their speech pattern recognition mechanisms are well developed and more robust. Adult native English listeners have shown very robust performance when tested with this kind of stimulus distortion, showing that they can also make very good use of the temporal information available in the signal. Nonnative English listeners have been tested previously with speech stimuli distorted by the presence of background noise, but they have not been tested with spectrally degraded stimuli.
Results from this study should allow us to make quantitative predictions of how a cochlear implant user with English as a second language could perceive speech, and help us design better ways to deliver speech information to them. We would be able to determine how the vowel and consonant space of the first language affects the perception of the second language and use this information when fitting patients with cochlear implants. Also, this should help us understand and predict potential errors in critical listening situations (air traffic controllers, diplomatic interpreters, etc.). This way, communication protocols could be established to avoid confusions in critical communications.

Two models were used to study both sensory and context effects in speech recognition. Plomp's SRT model should let us see how spectral degradation affects the performance of nonnative listeners in noise and in quiet (sensory effects), while the Boothroyd and Nittrouer model should let us quantify how context (linguistic information) is used by nonnative listeners compared to native listeners. Different test conditions of distortion will be used, and different categories of nonnative listeners with different amounts of experience in the second language will be tested. The combination of these different test conditions and the two models used to quantify the effects of sensory and linguistic information present in speech for nonnative English listeners will give us an understanding of how these factors interact in speech recognition. This will set up a framework to define a complete model of speech recognition both for native and nonnative English listeners, normal-hearing and hearing-impaired, in quiet and in noise.

In the next chapter, previous work on speech recognition by nonnative listeners of English, as well as some studies that tested recognition with spectrally degraded speech in adults and children, are discussed. Next the models to be used are described: the Boothroyd and Nittrouer model quantifies context effects in speech perception, and Plomp's SRT model determines speech reception thresholds under different conditions. In chapter four, we discuss the experimental method: how the stimuli are processed and the procedure for testing. The following chapter presents the results obtained. Chapter six contains some analysis of the data. The last chapter contains the conclusions of this research.

CHAPTER 2
AMERICAN ENGLISH RECOGNITION

2.1. Speech Recognition by Nonnative Listeners

Previous studies in second language learning have looked at performance in speech recognition by nonnative listeners under conditions of noise. Florentine (1985a) tested fluent English listeners of different native languages and native American English listeners using the Speech Perception in Noise (SPIN) test. Results showed that nonnative listeners had a more difficult time understanding speech in noise than the native subjects did, but no difference was found for high-predictability sentences. Florentine (1985b) found that the age of learning affects the perception of speech in noise by fluent nonnative listeners, and also that phonetic errors tend to be related to the listener's native language.
Results obtained in English for nonnative listeners of Japanese background showed that those who learned English at age 10 or later did not benefit as much from context in high-predictability sentences, while subjects who learned English as babies performed as well as native English listeners. Japanese subjects made more errors on the American English phonemes that do not exist in Japanese (/r/ and /l/ are not separate phonemes in Japanese, and there are fewer vowels in this language compared to English). These results show that phonetic performance also affects the way second language learners perceive words and sentences. Since phonemic and linguistic performance have not been studied in combination, it is difficult to separate their contributions to speech understanding in nonnative English listeners.

Studies of the second language, also referred to as the L2, have also focused on the relation between production and perception and the age of learning of the second language. Flege (1993) looked at the relation between production and perception of a phonetic contrast by nonnative listeners whose native languages do not have words ending in /d/ and /t/. A group of native English speakers and a group of subjects who learned English at an early age were tested. Results showed that nonnative speakers resemble native English speakers more closely in "perceiving" than in "producing" vowel duration differences. Flege, Munro, and MacKay (1995) looked at the relation between a person's age of learning (AOL) of a second language and the accent perceived by a group of ten native English listeners. A group of native English speakers and a group of Italian native speakers who learned English at different ages recorded a set of sentences. Flege et al. wanted to see if there was a critical period within which people can learn a second language without an accent. Talkers with an average age of learning of 7.4 years were perceived to have an accent. This study shows, then, that even when a second language is learned before puberty, an accent can be perceived in production. Previous studies have shown that production and perception of a second language are related and that perception of certain vowel features by nonnative speakers is more native-like than their production (Flege, 1993).
We can relate those results to our study, in which subjects who learned English at a very early age and have no perceived accent will be tested. The difference in performance at the phonemic level between monolingual English listeners and bilingual listeners has been studied too. Fox et al. (1993) analyzed the perceptual differences between vowel pairs for a group of monolingual English listeners and a group of native Spanish listeners classified as proficient or non-proficient in English. Results showed that the vowel space of the second language for proficient bilinguals has become more separated from other similar vowels than that of non-proficient bilinguals.

Mayo, Florentine, and Buus (1997) compared the performance in noise of native Mexican Spanish speakers who learned American English before age 6 or after age 14 with that of native American English speakers. The bilingual subjects were divided into three groups depending on their age of acquisition of the second language: the subjects who learned both languages in infancy formed the first group, those who learned English as toddlers formed the second group, and those who learned it after the age of 14 formed the third group. The SPIN test was used for all subjects. Listeners who obtained 96% correct responses in quiet were later tested in noise. The NTL (Noise Tolerance Level) for each group was defined as the point where subjects perform at 50% correct recognition, although normal conversation requires more than this to be effective. It was observed that NTL decreased with increasing age of learning of the second language. Early bilinguals tolerated more noise than bilinguals who learned after puberty, even when they had the same number of years of exposure to the language. Some differences were also seen between early bilinguals and monolinguals, which could be explained by the interaction of the two different languages in the bilingual subjects.

Critical periods of learning are also an important issue in second language acquisition. A recent article by Marcia Barinaga (2000) discusses the existence of critical periods in language learning and development. In the case of English learned as a second language, lower performance in grammar tests can already be seen in learners who started as early as age 5. A different response is also seen physically in the brain, where the response to a grammar error is seen in the left hemisphere for early learners (before age 4), while late learners show more response in the right hemisphere. Researchers clarify that different elements of language learning might not have critical periods, and late learners can still perform as well as early learners or native speakers, as in the case of semantics (the meaning of words) and of some phonemes, which can be learned well throughout life. A plot of the results obtained for this grammar test showed that scores decline with the age at which people were immersed in English and that performance flattens after puberty, suggesting a sensitive period that ends around this time. In our study, the age of learning is defined as the age at which the subject was immersed in an English-speaking environment, not the age at which he/she started some kind of English studies without real exposure to daily use of the language.

2.2. Speech Recognition with Reduced Spectral Information

Previous studies have tested native monolingual English listeners with stimuli with reduced spectral information. This type of stimulus simulates what a patient with a cochlear implant would hear. Native English listeners tested by Shannon et al. (1995) with vowels and consonants with reduced spectral information accomplished almost 100% recognition of vowels and 90% of consonants with only four frequency bands. Hearing-impaired adults fitted with cochlear implants have achieved asymptotic performance with speech delivered to four to six electrodes, as shown by Dorman and Loizou (1998a) and Fishman et al. (1998). These previous studies were done with speech signals in quiet. Dorman et al. (1998b) and Fu et al. (1998) found that to achieve the same performance levels in noise, more channels were required for normal-hearing and implanted adults, showing that more detailed spectral information is needed under difficult listening conditions.
It will be interesting to compare these results with the performance of nonnative speakers under the same conditions of noise and spectral degradation. We predict that nonnative English listeners will require more channels of information compared to native listeners.

Speech recognition in adults is a very robust process even under conditions of severe spectral degradation, but this ability probably requires many years of learning. Eisenberg et al. (2000) measured speech recognition with reduced spectral cues in two groups of normal-hearing children (5-7 and 10-12 years of age) and compared their results with results for normal-hearing adults. The spectral degradation was simulated using the technique developed by Shannon et al. (1995) to get speech signals of 4, 6, 8, 16 and 32 bands. Children were tested with sentence material taken from the Hearing In Noise Test for Children (HINT-C) that was recorded at the House Ear Institute (Gelnett et al., 1995). This database is composed of 130 sentences, spoken by a male talker, that normal-hearing children at 5 or 6 years of age can identify. Children were divided into groups that were tested in three different band conditions. Words, nonsense syllables, and digits were also used for testing. Older children and adults obtained similar results for all the tests and noise-band conditions tested. With 8 bands of information, adults obtained a mean score of 93% on the HINT-C test, while children between 10 and 12 years scored 94%. Younger children achieved an average score of 82% correct, and only two of the group of six tested with 32 bands obtained a score similar to those of adults and older children in the eight-band condition. Context effects were measured using the 'j' factor defined by Boothroyd and Nittrouer (1988). This factor defines whether all the parts of a whole (word or sentence) are needed for the whole to be recognized. The value of 'j' was higher for the younger children for the HINT-C stimuli, an indication that they were taking less advantage of the sentence context to help them in word recognition. Results from this study showed that younger children have more difficulty recognizing speech with reduced spectral information compared to older children and adults. Many years are required to integrate sensory, cognitive, and linguistic information to acquire robust speech pattern recognition.

CHAPTER 3
MODEL DESCRIPTIONS

3.1. Phonemic Level

There are two models that try to explain the performance of nonnative listeners in phoneme recognition. The Perceptual Assimilation Model (PAM) developed by Best et al. (1988) focuses on the discrimination of sounds in an unknown foreign language. This model states that in a foreign language certain pairs of sounds are easier to discriminate than other pairs. Sounds in a foreign language are perceived in agreement with similarities or differences found with sounds in the native language (Best and Strange, 1992). Sounds can be assimilated to a native category; in this case they are heard as a particular native segmental category or as a completely different exemplar of that category. Sounds can also be assimilated as an uncategorizable speech sound, when they fall between two different native categories, for example. They can also be heard as a nonspeech sound.
The Speech Learning Model (SLM) developed by Flege (1991) focuses mostly on experienced second language (L2) learners. It proposes that a phonetic contrast can be learned during the whole lifetime, and the person can then establish phonetic categories. According to this model, the earlier an L2 learner is exposed to the second language, the more likely he/she is to form a new phonetic category. L2 learners are also more likely to develop new phonetic categories for sounds that are perceptually distant from the closest native category than for sounds close to a native category.

A study by Guion et al. (2000) showed that the PAM model better explains the results for consonant pair discrimination by nonnative English listeners. The SLM model predictions were not consistent with the results. The SLM model proposes that an L2 learner must hear the phonetic differences between L1 (native language) and L2 (second language) sounds before they can establish a new L2 phonetic category. Evidence of learning was seen in only one of the three consonant pairs expected.

These two models can help us understand in part how a second language is learned and how some sounds that are very different from the categories in the first language can be learned at a later age and still form new categories. However, they do not provide a complete accounting of speech recognition performance by nonnative listeners. The PAM and SLM models look only at phonetic categories; they do not intend to explain how lexical and linguistic knowledge affects the perception of a foreign language, and they do not try to quantify these differences. Using these models we can try to qualitatively predict how phoneme recognition occurs and explain some of the confusions between certain pairs of phonemes, but we will not be able to quantify or to clearly see how age of acquisition affects the results. Also, these models do not account for difficult listening conditions like noise and spectral degradation.

3.2. Sensory Distortion

Other models have been used to predict speech recognition performance under difficult listening conditions: the Articulation Index (AI) (French and Steinberg, 1947), the Speech Transmission Index (STI) and Plomp's Speech Reception Threshold (SRT) model. These models have also been used to predict speech recognition for mild-to-moderately hearing-impaired listeners. The AI model uses a weighting function that gives different importance and different weights to different spectral bands (Kamm et al., 1985; Pavlovic et al., 1986). The STI model has been used to predict speech recognition especially under conditions of reverberation (Payton and Braida, 1999). Plomp's SRT model is the only one previously used to account for the distortion present in cochlear implant patients (Shannon and Fu, 1999). It includes a parameter to account for hearing loss due to distortion (D). Shannon and Fu (1999) showed that for native English speakers this distortion parameter was related to the number of spectral channels included in the speech signal. The same relation held for the consonants, vowels and sentences tested. These "channels," or frequency bands, are a way to simulate what cochlear implant patients hear.
Since this model worked well in explaining what type of distortion affects native English listeners under these conditions, it seemed interesting to apply it to the behavior of nonnative English listeners in speech recognition under the same conditions of distortion. We decided to use the SRT model to try to discern the sensory effects of the degraded signal on speech recognition, extending it to the case of nonnative English listeners. We will use the same type of stimuli to test nonnative listeners, to see how they perform under these conditions and to quantitatively predict their behavior and the parameters that will be important when fitting a patient with a cochlear implant. The performance data will be fitted with performance intensity functions (described below) to find the parameters that best describe the behavior in speech recognition. These parameters will also be used to determine different values in Plomp's SRT model.

3.2.1. Performance Intensity Functions

Recognition scores for consonants, vowels, words and sentences plotted as a function of signal-to-noise ratio can be fitted using a simple sigmoidal model defined by Boothroyd et al. (1996), as can be seen in figure 1. In studies by Fu et al. (1998) and Friesen et al. (2001), plots of performance as a function of SNR were fitted using this performance-intensity (PI) function,

(3.1)  %C = P_0 + (Q - P_0) / (1 + e^(-β(x - PRT)))

where P_0 is the chance level of performance, Q is the performance level in quiet, PRT is defined as the phoneme recognition threshold in dB, x is the SNR in dB, and β is related to the slope of the sigmoidal function. The parameters Q, PRT and β are estimated from the data. Preliminary results showed that PRT and Q are linearly related to each other as a function of spectral resolution (Fu et al., 1998). The value obtained for PRT represents the SNR at a 50% level of performance relative to the performance in quiet; it is the SNR at the halfway point between the chance level of performance and the performance in quiet.

[Figure 1. Performance intensity function: percent correct as a function of signal-to-noise ratio (dB).]

With the fitted values obtained for each curve using the equation above, we can also find the SNR at a 50% correct recognition score, which will be used in Plomp's model (described below).
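Although the analysis software used for this dissertation is not reproduced here, equation (3.1) is straightforward to fit with standard tools. The sketch below is a minimal illustration using scipy; the SNR values, scores, fixed chance level and starting guesses in it are hypothetical choices, not data or parameters from this study.

```python
# Minimal sketch of fitting the performance-intensity function of eq. (3.1).
# The data points and the fixed chance level below are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def pi_function(x, Q, PRT, beta, P0=1.0 / 12.0):
    """Eq. (3.1): %C = P0 + (Q - P0) / (1 + e^(-beta (x - PRT))).

    P0 is the chance level (here 1/12, as in a 12-alternative vowel test),
    Q the performance in quiet, PRT the SNR halfway between chance and Q,
    and beta the slope term."""
    return P0 + (Q - P0) / (1.0 + np.exp(-beta * (x - PRT)))

snr = np.array([-5.0, 0.0, 5.0, 10.0, 15.0])       # SNR conditions in dB
score = np.array([0.20, 0.38, 0.62, 0.78, 0.84])   # proportion correct

# Fit Q, PRT and beta; P0 keeps its default because p0 lists three values.
(Q, PRT, beta), _ = curve_fit(pi_function, snr, score, p0=[0.9, 3.0, 0.3])
print(f"Q = {Q:.2f}, PRT = {PRT:.1f} dB, beta = {beta:.2f}")
```

With Q in hand, the SNR giving 50% correct in absolute terms can be read off the same fitted curve; that is the quantity carried into Plomp's model below.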
3.2.2. Plomp's SRT Model

Plomp's SRT model (Plomp, 1978, 1979 and 1986) estimates the speech reception threshold (SRT) for normal-hearing and for hearing-impaired listeners in the presence of noise and in quiet. It also contains parameters describing the effects of a hearing aid. This model will allow us to look at the recognition performance for phonemes, words and sentences under conditions of spectral degradation and noise. Noise is the most frequent disturbing factor for speech understanding even for normal listeners, and one of the most difficult problems for listeners with a sensorineural hearing impairment. Plomp defines the SRT as the sound pressure level required for 50% speech intelligibility with the use of meaningful sentences for testing. We will also use the same definition to find the sound pressure level required to correctly recognize 50% of phonemes and words.

The threshold value (L_0) found in quiet was assumed to be determined by the ear's internal noise added to external noise. All the parameters are expressed in dB, so the effects are added (antilogarithmically), resulting in the following equation for 'normal-hearing' listeners:

(3.2)  SRT = 10 log(10^(L_0/10) + 10^((L_n - ΔL_SN)/10))

where L_0 = SRT in quiet in dB(A), L_n = sound pressure level (SPL) of the noise in dB(A), and ΔL_SN = SNR in noise (entering with a negative sign). In figure 2, the lower curve represents equation (3.2) for people with normal hearing, with L_0 = 16 dB(A) and ΔL_SN = 8 dB. In quiet and at low levels of noise the SRT is determined by the first term of the equation (L_0), while at high levels of noise (>30 dB) it is determined by the second term of the equation (ΔL_SN). A study by ter Keurs et al. (1989) confirmed that the SRT for all noise levels could be defined by measuring the SRT in quiet and at a high noise level.

Hearing loss is defined in this model as a combination of two factors. Class A refers to attenuation of all the sounds entering the ear. As shown in figure 2, it manifests itself as an elevation of the SRT in quiet. This type of hearing loss can be compensated with hearing aids. Class D is comparable to distortion (you hear something but do not understand what is being said) and shows as an increase of the SNR required for speech understanding at all noise levels. This type of hearing loss cannot be compensated by the use of hearing aids.

[Figure 2. Speech reception threshold (SRT) for sentences in a typical listening situation, as a function of the sound pressure level (SPL) of the noise in dB(A), showing the normal threshold and the Class A and Class D hearing-loss curves.]

Including the two types of hearing loss, A and D, in his model, Plomp defines the overall equation for 'hearing-impaired' listeners as:

(3.3)  SRT = 10 log(10^((L_0 + A + D)/10) + 10^((L_n - ΔL_SN + D)/10))

where A + D = hearing loss for speech in quiet in dB and D = hearing loss for speech in noise in dB. Plomp's SRT model also includes parameters to account for hearing aid effects: the acoustic gain (G) of the hearing aid, the noise introduced by the microphone, and a parameter that accounts for the properties of the hearing aid (S). We did not consider these parameters when we used Plomp's model, since we only tested normal-hearing listeners with no hearing aids. We use Plomp's equation to fit the data in the present series of experiments, since listeners are tested with spectrally distorted speech.
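The two branches of equations (3.2) and (3.3) are easy to check numerically. The sketch below, again only an illustration, evaluates the model with the normal-hearing values quoted above (L_0 = 16 dB(A), ΔL_SN = 8 dB); the Class A and Class D magnitudes are arbitrary choices for display, not values fitted in this study.

```python
# Minimal sketch of Plomp's SRT model, eqs. (3.2) and (3.3).
# A = D = 0 gives the normal-hearing curve of eq. (3.2); A shifts only the
# quiet threshold, while D shifts the whole curve, as in figure 2.
import numpy as np

def srt(Ln, L0=16.0, dLSN=8.0, A=0.0, D=0.0):
    """Eq. (3.3): 10 log(10^((L0+A+D)/10) + 10^((Ln - dLSN + D)/10))."""
    return 10.0 * np.log10(10 ** ((L0 + A + D) / 10.0) +
                           10 ** ((Ln - dLSN + D) / 10.0))

for Ln in range(0, 81, 10):   # noise SPL in dB(A)
    print(f"Ln = {Ln:2d}  normal = {srt(Ln):5.1f}  "
          f"class A = {srt(Ln, A=20.0):5.1f}  class D = {srt(Ln, D=10.0):5.1f}")
```

At low noise levels the output tracks the quiet-threshold term; above roughly 30 dB it tracks the noise term, reproducing the two-regime shape described in the text.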
3.3. Context Effects

Once speech sounds are audible, they must be combined into meaningful words and sentences. Different models look at how context and linguistic information affect speech perception. The Boothroyd and Nittrouer model was selected to quantify the effect of context in speech recognition, since it has been used for both words and sentences. The neighborhood activation model (NAM) developed by Luce and Pisoni (1998) was also considered. The NAM model only relates the recognition of phonemes to the recognition of words, depending on familiarity with the word and its frequency of occurrence in the language. In our study we use both words and sentences in English to test our subjects. Since the NAM model does not consider context effects in the recognition of words in sentences, we decided to use a simpler method, the Boothroyd and Nittrouer model, which also quantifies these context effects. The Boothroyd and Nittrouer model has also been used previously to quantify the amount of context used by hearing-impaired subjects. Most and Adi-Bensaid (2001) tested postlingually and prelingually profoundly hearing-impaired subjects with an auditory-visual stimulus in the Hebrew language. They did not find major differences between the two groups of subjects tested in the amount of context being used, and showed that both groups benefited when meaningful words and sentences were used. None of the models previously discussed have been used to predict speech perception or quantify context effects in nonnative English listeners.

Other theories try to explain how word perception occurs for nonnative listeners. A recent study by Pallier et al. (2001) looked at two existing hypotheses of how words are represented in a lexicon. These theories may allow us to explain some of the factors affecting speech perception by nonnative English listeners, although they will not allow us to quantify context effects as the Boothroyd and Nittrouer model does. The first theory (acoustic-trace theory) claims that words are recognized by directly comparing memorized detailed acoustic patterns with the pattern of the presented speech signal. In our case, then, if L2 listeners can perceive the phonetic contrasts, words that differ in only one phoneme should not be perceived as homophones. A second theory states that words are represented as abstract phonological entities: there is a prelexical phonological representation used for matching with lexical representations. When two L2 phonemes are matched to the same L1 phoneme, their representation will become the same at the prelexical level, and two different words that differ only in this contrast will be functionally homophones. Pallier et al. (2001) tested Catalan-Spanish bilinguals who learned the second language at an early age (6 years old at the latest) with Catalan words or words common to both languages. Spanish-dominant listeners processed Catalan-specific words differing in one contrast (one that does not exist in Spanish) as homophones, showing that word recognition uses a "language-specific" phonological representation. Results also showed that lexical entries are stored in the mental lexicon as abstract forms. It is important to consider different models besides the ones we will be using, in case some of the results differ from what we expected. Then we may find that the results are best predicted when considering other models as well.

3.3.1. Boothroyd and Nittrouer Model

This model (Boothroyd and Nittrouer, 1988) defines two parameters, 'k' and 'j', to quantify context effects in speech recognition. This model will be used to quantify the usage of context by nonnative English listeners as a function of experience with the language. In addition, the effect of spectral degradation will be studied by looking at how these factors change with different amounts of spectral information available.
Boothroyd and Nittrouer (1988) assume that the probability of recognizing a whole (word or sentence) is given by the joint probability of recognizing the parts or segments that form this whole. Assuming each of these parts is statistically independent and equally recognizable, we can define equation (3.4):

(3.4)  p_w = p_p^n

where p_w is the probability of recognizing a whole, p_p is the probability of recognizing parts (phonemes) in a whole, and n is the number of parts in the whole. In real words, the parts that form the word are not independent, due to coarticulation and properties of the lexicon. To be able to recognize a whole word, not all the different parts or phonemes have to be recognized, so we can define

(3.5)  p_w = p_p^j

where 1 ≤ j ≤ n. The factor 'j' relates the probability of recognition of parts in a whole (p_p) to the probability of recognizing the whole (p_w), as shown in equation (3.6):

(3.6)  j = log(p_w) / log(p_p)

In the case of words, values of 'j' closer to one mean that fewer phonemes need to be recognized to recognize a whole word. In the previous study (Boothroyd and Nittrouer, 1988) a value of 'j' of 2.5 was found for monosyllabic CVC words; subjects tested there responded as if the words they heard consisted of 2.5 phonemes instead of three independent phonemes. Figure 3 shows the curves given for different 'j' values.

[Figure 3. Different 'j' factors: whole-word recognition probability as a function of phoneme recognition probability in words.]

In a previous study (Eisenberg et al., 2000), the factor 'j' was used to relate the probability of recognizing words in sentences (parts in a whole) to the probability of recognizing a whole sentence (the whole). In our study we will also find the value of this factor in the case of sentences, to determine how many words are effectively needed to recognize a complete sentence.
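As a small numerical illustration of equations (3.4)-(3.6), the sketch below computes 'j' from a pair of scores. The phoneme and word probabilities used are hypothetical, chosen to land near the j = 2.5 reported by Boothroyd and Nittrouer (1988) for CVC words.

```python
# Minimal sketch of the 'j' factor of eqs. (3.4)-(3.6).
import numpy as np

def j_factor(p_whole, p_part):
    """Eq. (3.6): j = log(p_w) / log(p_p)."""
    return np.log(p_whole) / np.log(p_part)

p_phoneme, p_word = 0.80, 0.57   # hypothetical recognition probabilities
j = j_factor(p_word, p_phoneme)
print(f"j = {j:.2f}")            # ~2.5: the word behaves like ~2.5
                                 # independent phonemes rather than 3
```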
To quantify the effect of context in sentence recognition, Boothroyd and Nittrouer (1988) relate the probability of recognizing words presented in isolation to the probability of recognizing words in sentences (with context). Context adds information to the sensory information in the speech. In this approach, the logarithms of the error probabilities for contextual information and for the data available in the speech units are added together, so we have

(3.7)  log(1 - p_c) = log(1 - p_i) + log(1 - p_x)

where p_c is the probability of recognizing a speech unit in context, p_i is the probability of recognition without context (in isolation or in nonsense material), and p_x is the probability of recognition from context effects alone. The "whole" speech unit, the speech material and the context are presented under the same conditions (of noise or distortion), so we can further assume that log(1 - p_x) is proportional to log(1 - p_i), and equation (3.7) reduces to

(3.8)  log(1 - p_c) = k log(1 - p_i)

The factor 'k' is a parameter that reflects the degree of context or predictability in the sentence material being used. Then we can define equations (3.9) and (3.10) to determine 'k':

(3.9)  p_c = 1 - (1 - p_i)^k

(3.10)  k = log(1 - p_c) / log(1 - p_i)

In the case of the effects of context in sentences, p_c is the probability of recognizing words in a sentence and p_i is the probability of recognizing isolated words. Higher values of 'k' mean that there is more use of the context available, as shown in figure 4.

[Figure 4. Different 'k' factors: recognition probability for words in sentences as a function of the recognition probability for isolated words.]

Values of 'k' closer to one mean that the listener is not able to use the context available in the sentence material. Larger values of 'k' mean that more context is being used, so even if isolated words are harder to recognize, words in a sentence are easier to recognize, aided by the context present. Values of 'k' were found for words (the relation between phonemes in nonsense words and phonemes in meaningful words) and sentences (Boothroyd and Nittrouer, 1988). The values found suggest that sentence context is more important than word context and that semantic constraints are the single most important contextual factor. Results showed that lexical, syntactic and semantic constraints serve to increase the recognition probabilities for phonemes and words presented in noise. A recent study conducted by Grant and Seitz (2000) with hearing-impaired subjects found that the values of 'k' were affected by individual variability and that 'k' was not constant for different levels of intelligibility. We will therefore have to consider the effects of individual variability in the results obtained for 'k' to make sure that we are getting representative results for each category. Grant and Seitz (2000) calculated different values of 'k' for each level of intelligibility, and very few points were used to fit the curves. In our case, for each spectral band condition we have several points (noise and quiet conditions) with which to fit values of 'k', so we are more likely to find a representative value for this context factor.
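A matching sketch for the 'k' factor of equations (3.9) and (3.10) is given below. The isolated-word and words-in-sentences scores are hypothetical, chosen so the arithmetic is easy to follow.

```python
# Minimal sketch of the context factor 'k' of eqs. (3.9)-(3.10).
import numpy as np

def k_factor(p_context, p_isolated):
    """Eq. (3.10): k = log(1 - p_c) / log(1 - p_i)."""
    return np.log(1.0 - p_context) / np.log(1.0 - p_isolated)

p_i, p_c = 0.50, 0.875   # isolated words 50%, words in sentences 87.5%
k = k_factor(p_c, p_i)   # k = 3.0: strong use of sentence context
print(f"k = {k:.1f}, check p_c = {1.0 - (1.0 - p_i) ** k:.3f}")  # eq. (3.9)
```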
From this Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 27 we might predict that there will not be a clear distinction in performance between L2 teen and adult English learners. However, we include both categories for completeness. Our goal is to determine the role that linguistic and sensory information play in speech recognition. To do this we should also consider the importance of experience with the second language. Although it is probable that lower level or more sensory recognition patterns do not improve training after reaching puberty it is still possible that a higher or more centralized mechanism involving linguistic and lexical knowledge can still develop more robust patterns of recognition. Table 2. Characteristics of the nonnative English subjects tested. Subject Age (years) AOI (years) 12 Experience (years) Daily Use ofL2 AF1 20 4 16 80% AF2 20 3 17 99% AF3 21 3 18 99% AMI 35 0 35 95% AM2 21 0 21 98% Mean 23.40 2.00 21.40 94.2% BF4 20 9 11 40% BF5 20 7 13 80% BF6 26 5 21 90% BM3 23 5 18 90% BM4 33 5 28 70% Mean 25.5 6.75 . 20.00 72.0% CF7 24 15 9 90% CF8 21 12.5 8.5 80% CF9 22 13 9 80% CMS 21 12 9 90% CM 6 24 16 8 85% Mean 22.4 13.7 8.70 85.0% DF10 33 18 15 90% DF11 40 37 3 40% DM7 33 25 8 70% DM8 22 19 3 40% DM9 33 28 5 90% Mean 32.5 25.4 6.80 66.0% (A, B, C and D refer to the age group of the subject; F = Female; M = Male; AOI - Age o f Immersion in the second language) Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 28 At the end of the experiments we tested a total of five listeners for each group defined. In the following chapters we show the results obtained for the normative English listeners tested and a control group of five native English speakers included for comparison. All participants in the experiment have at least a background education of High School or higher and are between 18 and 40 years of age (at the time of testing). Table 2 shows some of the characteristics of the normative English speakers that have been tested. Actual age of the subjects tested and the age they were immersed (AOI) in the second language are shown. All the subjects tested use English daily in their conversations at least 40% of the time and have several years of experience with the second language. The early learner groups (toddler and child) have more years of experience with the second language when compared to late learners of English (teen and adult group). 4.2. Stimuli Listeners were tested with phonemes, words and sentences. The choice of different stimuli will allow us to differentiate between processing mechanisms used for speech recognition. Listeners were tested with 20 medial consonants in (a/C/a) context (Shannon et al., 1999). Consonants by three male and three female speakers who obtained the higher recognition scores in the previous study were chosen from the consonant database. A total of 12 vowels presented in the (h/V/d) context (Hillenbrand et a l, 1995), spoken by five male and five female speakers were used in the test. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 29 Subjects were also tested with words and sentences. The word database consisted of 10 lists of 50 words each. The sentences were taken from the HINT database, which consists of 26 lists of 10 sentences each. The speech signals were processed in different forms. 
4.3. Procedure

Listeners were tested in a sound-treated booth (IAC). Sounds were presented through headphones at a comfortable listening level (70 dBA). Two repetitions of each consonant and each vowel were presented in each test condition. The tokens were presented using the Condor software developed at the House Ear Institute (Robert, 1998). After each consonant or vowel was played, the subject selected on the screen the phoneme he or she thought had been presented. For the word and sentence tests, subjects were tested using software also designed at the House Ear Institute (Tigersoft, by Qian-Jie Fu). In these tests the subject repeated what they heard and the experimenter entered the response to obtain the final score. Test conditions of number of channels and signal-to-noise ratio were randomized and counterbalanced across subjects (Edgington, 1995).

CHAPTER 5

RESULTS

5.1. Main Findings

The largest effect of age of immersion in the second language is seen in the results for vowels. Toddler learners, who could be considered fully bilingual, had lower performance compared to native English listeners, even under conditions of no spectral distortion. Results showed a "grouping effect" in performance depending on the age of immersion in the second language, especially in the case of vowels and words and even in the case of sentences. Toddler learners and child learners performed similarly (better), while teen and adult learners tended to have similar (lower) performance. Differences in recognition were still seen between the teen and adult English learners, especially in less noisy conditions.

Spectral information becomes more important in the presence of noise. With reduced spectral information, the drop in speech recognition was greater when the noise level was increased. No effect of age of immersion in L2 was seen in the use of lexical information (small difference with native English listeners). An effect of age of immersion was seen in the use of context in the case of full speech (no spectral distortion introduced): the value of 'k' decreased as the age of immersion in the second language increased, showing that later learners were less able to make use of the context information available in the speech signal. This result was not consistent across all spectral bands.
Under difficult listening conditions, early English learners seemed to have larger values of 'k' compared to native English listeners. In most cases, the value of 'k' increased as the number of spectral bands of information was reduced. The value of the distortion parameter 'D' was higher as age of immersion in L2 increased, as if spectral (sensory) information were effectively reduced for L2 listeners.

5.2. Performance

Data were obtained for each group of nonnative listeners and are shown in figures 5 through 12, along with data obtained from five native English listeners tested with the same stimuli. The recognition performance for two, four, six, eight and sixteen bands and for full speech, for vowels, consonants, words and sentences, is shown as a function of the signal-to-noise ratio (SNR). The fitted curves and the values obtained for the parameters estimated using the sigmoidal model (equation 3.1) are included. In our study we define PRT as the SNR level that produces a 50% level of performance relative to the "performance in quiet" in the case of phonemes, words and sentences. In a more general way, we could then define PRT as the Performance Recognition Threshold.

One-way ANOVA measurements were performed on the data for all the listeners tested. As shown in table 3, no statistical significance due to subjects was found within any of the categories for the stimuli tested; this means that we can safely average the results obtained for the subjects into categories and analyze the data this way. Some subjects in the native English listener group were not tested in the most critical noisy condition (-5 dB) because they reached floor effects, but this was taken into consideration when performing the statistical analysis.

A statistically significant effect of category (native English listeners and toddler, child, teen and adult English learners) was found for vowels (F(4,871)=42.090, p<0.0001), words (F(4,854)=9.331, p<0.0001) and sentences (F(4,863)=9.653, p<0.0001). In the case of consonants, a barely significant difference was found (F(4,871)=3.297, p=0.011) if we consider the limit for statistical significance to be p=0.05. Post hoc pair-wise tests using the Bonferroni adjustment showed that a significant difference exists between the adult English learners and both the toddler learners (p=0.034) and the native English listeners (p=0.021), but not between other categories.

Table 3. ANOVA measurements for subject significance within the same category.

Category  Vowels                     Consonants                 Words                      Sentences
Native    F(4,171)=0.189, p=0.944    F(4,171)=0.867, p=0.487    F(4,154)=0.248, p=0.910    F(4,163)=0.352, p=0.843
Toddler   F(4,175)=0.416, p=0.797    F(4,175)=0.539, p=0.707    F(4,175)=0.172, p=0.953    F(4,175)=0.861, p=0.489
Child     F(4,175)=0.108, p=0.980    F(4,175)=0.254, p=0.907    F(4,175)=0.879, p=0.478    F(4,175)=0.219, p=0.928
Teen      F(4,175)=0.485, p=0.787    F(4,175)=0.085, p=0.987    F(4,175)=0.465, p=0.761    F(4,175)=0.778, p=0.541
Adult     F(4,175)=2.005, p=0.096    F(4,175)=0.198, p=0.939    F(4,175)=0.761, p=0.552    F(4,175)=0.942, p=0.441

Two-way ANOVA measurements were performed on the data to determine significant differences due to the two variables of SNR and number of channels. We found statistical significance due to both factors in the data collected.
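A subject-significance check of this kind can be reproduced with standard statistics tools. The sketch below is a minimal illustration using scipy's one-way ANOVA on invented per-subject scores; only the logic of the test, not the data, comes from the study.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# Invented percent-correct scores for the five subjects of one category,
# pooled over the 36 test conditions (replace with the real data).
subject_scores = [rng.normal(loc=70.0, scale=8.0, size=36) for _ in range(5)]

# One-way ANOVA with subject as the factor: a non-significant result
# (p > 0.05) supports averaging subjects within the category.
f_stat, p_value = f_oneway(*subject_scores)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```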
To look in more detail at the differences between the SNR and channel conditions, we also performed one-way ANOVA measurements and post hoc pair-wise tests using the Bonferroni adjustment to determine where the main differences existed between test conditions. Tables 4 and 5 show the results of the one-way ANOVA measurements for channel-condition and SNR-condition significance.

There is a statistically significant difference due to the number of channels presented for all the categories and all the stimuli presented, as shown in table 4. In the case of consonants, words and sentences, post hoc pair-wise tests showed that the main differences existed between the condition of a very distorted signal (two bands), the conditions of four, six and eight bands, and the better conditions of spectral information (sixteen bands and full, unprocessed speech). Some relation was found between the conditions of four, six and eight frequency bands, and also between the performance with sixteen bands and full speech. The performance with two frequency bands was significantly different from all other conditions tested. In the case of vowels, the different channel conditions separate more sharply, and only some relation is found between the performance with four and six bands.

Table 4. ANOVA measurements for channel significance.

Category  Vowels                      Consonants                  Words                       Sentences
Native    F(5,170)=94.41, p<0.0001    F(5,170)=20.27, p<0.0001    F(5,153)=21.39, p<0.0001    F(5,162)=15.14, p<0.0001
Toddler   F(5,174)=64.23, p<0.0001    F(5,174)=28.02, p<0.0001    F(5,174)=34.08, p<0.0001    F(5,174)=20.41, p<0.0001
Child     F(5,174)=68.67, p<0.0001    F(5,174)=31.06, p<0.0001    F(5,174)=33.81, p<0.0001    F(5,174)=19.78, p<0.0001
Teen      F(5,174)=63.11, p<0.0001    F(5,174)=27.68, p<0.0001    F(5,174)=32.67, p<0.0001    F(5,174)=22.69, p<0.0001
Adult     F(5,174)=46.84, p<0.0001    F(5,174)=32.49, p<0.0001    F(5,174)=34.53, p<0.0001    F(5,174)=24.99, p<0.0001

There is also a statistically significant difference due to the signal-to-noise ratio (SNR) of the signal presented, for all the stimuli and all the categories tested, as shown in table 5. Post hoc pair-wise tests showed that the main differences for all the stimuli tested existed between the worst SNR conditions (-5 dB and 0 dB) and the best SNR conditions (5, 10 and 15 dB and quiet). Performance at 5 dB and 10 dB was also significantly different from the performance in quiet.

Table 5. ANOVA measurements for signal-to-noise ratio significance.

Category  Vowels                     Consonants                  Words                       Sentences
Native    F(5,170)=6.51, p<0.0001    F(5,170)=16.47, p<0.0001    F(5,153)=10.05, p<0.0001    F(5,162)=6.90, p<0.0001
Toddler   F(5,174)=9.77, p<0.0001    F(5,174)=29.06, p<0.0001    F(5,174)=16.97, p<0.0001    F(5,174)=32.98, p<0.0001
Child     F(5,174)=10.43, p<0.0001   F(5,174)=28.78, p<0.0001    F(5,174)=17.28, p<0.0001    F(5,174)=36.29, p<0.0001
Teen      F(5,174)=9.93, p<0.0001    F(5,174)=31.13, p<0.0001    F(5,174)=16.66, p<0.0001    F(5,174)=26.49, p<0.0001
Adult     F(5,174)=5.78, p<0.0001    F(5,174)=20.36, p<0.0001    F(5,174)=12.64, p<0.0001    F(5,174)=25.15, p<0.0001

In the case of vowels (figures 5 and 6), there is a grouping according to the age of immersion in the second language. The teen and adult learners perform at a similar (poorer) level, while the toddler and child learners perform at a higher level. There is some difference between categories, which is significant, and this is most noticeable in the case of the teen and adult learners.
Teen learners perform better than adult learners in less noisy conditions, showing that they can take more advantage of the information present in the signal. As the spectral information in the signal increases, subjects from the teen learner group start showing better performance at lower SNRs compared to adult learners; as more spectral information becomes available to these listeners, they are less affected by noise. A way to explain this is that this group's recognition pattern is more robust to spectral degradation than that of adult learners of English, but with higher levels of noise in the signal their recognition drops to the same levels as the adult learners, so it is not completely well trained. Teen learners never reach the same level of performance as child or toddler learners. Still, the results show that there is an advantage to learning a second language before adulthood (18 years old), even if this occurs after puberty, and this is true for all the stimuli used in this study.

Figures 7 and 8 show the performance for consonants for all groups. The differences in performance in this case are very small when compared to the differences observed for vowels. Only the adult learner group shows a significant difference in performance compared with natives and early learners of English, while the other L2 learners perform almost as well as the natives, even under conditions of high levels of noise and high levels of spectral degradation. These results show that for consonants, even under conditions of noise, when all the spectral information is present in the speech signal most L2 learners can perform as well as native English listeners. The only group that is always at a disadvantage is the adult learner group, although their performance is only significantly different from that of the native monolingual English listeners and the toddler English learners. When spectral resolution is reduced, it can be seen that the recognition patterns of nonnative listeners are not as robust as the patterns developed by native English listeners, and some differences in performance appear.

Consonants rely more on temporal information than vowels, so even after the spectral information is reduced there is still more information available to listeners. This can probably help explain the poorer performance of nonnative English listeners with vowels compared with their performance with consonants. It is possible that this poorer performance is also related to the phoneme space of the first language. The vowel space in Spanish (five vowels) is less than half the vowel space present in English (twelve vowels), and differences between certain pairs of vowels are less perceptible. Fox et al. (1993) found that for proficient bilinguals the vowels of the second language became more separated from other similar vowels when compared to non-proficient bilinguals. We looked in more detail at the vowel space of the listeners tested to find an explanation for this poorer performance.

The estimated PRT values for both vowels and consonants followed our expectations. In most cases, as age of immersion in the second language increases and performance decreases, PRT values increase, showing that a higher signal-to-noise ratio is needed for better speech recognition.
Also, as spectral resolution increases, PRT values decrease, showing the importance of spectral information in speech perception.

Figure 5. Vowel recognition in noise (2, 4 and 6 frequency bands). [Percent correct as a function of SNR (dB) for each age-of-immersion group; the fitted values of Q and PRT for each curve are listed in table 7.]

Figure 6. Vowel recognition in noise (8, 16 bands and full speech).

Figure 7. Consonant recognition in noise (2, 4 and 6 frequency bands).
Figure 8. Consonant recognition in noise (8, 16 bands and full speech).

Figure 9. Word recognition in noise (2, 4 and 6 frequency bands).
Figure 10. Word recognition in noise (8, 16 bands and full speech).

Figure 11. Sentence recognition in noise (2, 4 and 6 frequency bands).

Figure 12. Sentence recognition in noise (8, 16 bands and full speech).

Figures 9 and 10 show the results for word recognition.
With reduced spectral resolution we can see that the late learners (teen and adult) perform at similar (poorer) levels, while the early learners (toddler and child learners) also have similar (better) performance. In the case of full speech, the early learners perform almost as well as the native English listeners, while the late learners are still at a disadvantage. The estimated PRT values do not always show what we expected in this case, as can be seen in table 9. We expected to find that PRT values increase with increased age of immersion in the second language. This is true for toddler, child and teen English learners in all cases except for two bands. The adult English learners, as in the previous cases, do not follow exactly what we would expect. From the plotted results we can clearly see an effect of age of acquisition of the second language.

Figures 11 and 12 show the performance in the case of sentences. With increasing spectral information (sixteen bands and full speech), the L2 learners tend to perform closer to the native group, except for the adult learners, who seem to take less advantage of the spectral information available in all cases. The same grouping observed in the case of vowels and words is seen in the performance for sentences: the toddler and child learners tend to perform similarly (better), while teen and adult English learners also perform similarly (lower). Although the teen learners were exposed to the L2 after the age of 12, they showed that they make better use of the spectral information available in the speech signal when compared to adult learners. When the signal is less noisy, teen learners of English perform better than adult learners. Their sentence recognition pattern is thus a little more robust than that of adult learners, but not when noise is also included in the signal. Toddler and child learners perform similarly in quiet and noisy conditions. For sentences, especially in the case of full speech, a ceiling effect was noticed at the better SNRs, and even the adult group reaches 100% recognition. The estimated PRT values shown in figure 15 vary more in line with what was expected in this case: higher PRT for late learners, and lower PRT for early learners and native English listeners. This does not hold in some cases for adult English learners when the spectral information is greatly reduced (four and two bands of information).

5.3. Performance in Quiet

The performance in quiet as a function of the number of bands can be fitted with sigmoidal functions, as shown in figure 13. The asymptotic value of this function is the performance in quiet with no spectral degradation for each type of stimulus:

(5.1) Q(NB) = Q_asym / (1 + e^(-α(NB - NB_50)))

Q(NB) is the percent correct in quiet as a function of the number of bands, Q_asym is the maximum performance in quiet, α is the slope of the function, NB is the number of bands (which is given) and NB_50 is the number of bands at the 50% level of the curve. Table 6 shows the estimated parameters for the sigmoidal functions, and figures 13 and 14 show the fitted curves for the different types of stimuli.
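Equation (5.1) can be fitted by nonlinear least squares. The sketch below illustrates one way to do this with scipy; the data points are invented, and full speech is arbitrarily coded as a large band count for fitting purposes.

```python
import numpy as np
from scipy.optimize import curve_fit

def q_of_bands(nb, q_asym, nb50, alpha):
    """Equation (5.1): performance in quiet as a function of the number of bands."""
    return q_asym / (1.0 + np.exp(-alpha * (nb - nb50)))

# Invented percent-correct-in-quiet scores at 2, 4, 6, 8 and 16 bands,
# with full speech arbitrarily coded as a large band count.
nb = np.array([2.0, 4.0, 6.0, 8.0, 16.0, 64.0])
q = np.array([15.0, 40.0, 55.0, 65.0, 80.0, 92.0])

(q_asym, nb50, alpha), _ = curve_fit(q_of_bands, nb, q, p0=[90.0, 4.0, 0.3])
print(f"Q_asym = {q_asym:.1f}%, NB50 = {nb50:.2f}, alpha = {alpha:.3f}")
```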
Table 6. Estimated parameters for the sigmoidal functions.

Stimuli    Listener  Q_asym (%)   NB_50       α           r²
Vowel      Native    92.66±1.17   3.11±0.24   0.36±0.04   0.996
           Toddler   85.16±3.61   4.50±0.71   0.20±0.05   0.981
           Child     82.99±4.09   4.71±0.85   0.19±0.05   0.976
           Teen      77.74±5.76   5.36±1.75   0.13±0.05   0.950
           Adult     60.05±1.21   6.98±0.36   0.18±0.02   0.997
Consonant  Native    97.35±1.23   0.22±0.77   0.28±0.05   0.990
           Toddler   96.92±2.05   1.01±0.99   0.31±0.08   0.975
           Child     94.15±2.52   0.09±1.82   0.30±0.12   0.944
           Teen      93.40±2.38   0.68±1.63   0.40±0.17   0.927
           Adult     84.20±2.74   2.29±0.88   0.41±0.14   0.953
Word       Native    95.93±1.72   3.69±0.23   0.49±0.07   0.992
           Toddler   91.84±1.87   5.35±0.22   0.37±0.04   0.995
           Child     90.76±3.31   5.47±0.30   0.57±0.11   0.986
           Teen      82.21±4.73   5.58±0.54   0.45±0.14   0.966
           Adult     67.88±4.55   6.45±0.62   0.47±0.16   0.966
Sentence   Native    100±0.35     0.97±0.48   0.71±0.11   0.993
           Toddler   99.29±0.28   2.41±0.08   0.67±0.03   0.999
           Child     98.36±1.37   1.32±0.73   0.49±0.12   0.976
           Teen      98.33±3.82   2.78±0.78   0.49±0.19   0.936
           Adult     91.39±7.27   3.72±1.09   0.43±0.25   0.867

Performance in quiet for adult learners is always lower than that of native English listeners and the other nonnative English learner groups, as can be seen in the previous figures. Especially in the case of vowels, it can be seen that the toddler and child learners (the early learners) perform similarly. Some difference in performance in quiet is seen between the two late learner groups: teen English learners show better performance than adult learners. This is most clearly seen in the case of vowels, consonants and words.

Figure 13. Performance in quiet for phonemes. [Percent correct in quiet as a function of the number of bands, with panels for vowels and consonants, by age-of-immersion group.]

Figure 14. Performance in quiet for words and sentences.

5.4. Performance Intensity Functions

We estimated PRT values for vowels, consonants, words and sentences, so in this case PRT is not only the Phoneme Recognition Threshold but can be redefined as the Performance Recognition Threshold.
Table 7. Estimated parameters for the sigmoidal fitting for vowels.

Condition  Listener  Q (%)        PRT (dB)      β           r²
Original   Native    94.00±0.57   -10.22±0.79   0.25±0.04   0.990
           Toddler   86.68±0.35   -5.04±0.09    0.36±0.02   0.999
           Child     84.54±0.60   -4.70±0.16    0.34±0.02   0.997
           Teen      78.37±1.46   -3.30±0.40    0.30±0.04   0.989
           Adult     59.78±2.04   -3.69±0.73    0.35±0.09   0.956
16 Bands   Native    90.16±1.31   -5.18±0.41    0.20±0.02   0.993
           Toddler   74.52±1.26   0.16±0.35     0.25±0.02   0.996
           Child     71.28±2.05   1.04±0.55     0.31±0.05   0.990
           Teen      59.63±2.27   1.38±0.72     0.34±0.07   0.983
           Adult     50.85±3.38   3.29±1.55     0.19±0.05   0.961
8 Bands    Native    79.58±1.66   -2.30±0.46    0.23±0.03   0.996
           Toddler   60.66±1.58   1.47±0.50     0.32±0.05   0.992
           Child     57.70±3.25   2.96±1.16     0.25±0.06   0.973
           Teen      52.36±4.19   5.43±1.95     0.17±0.05   0.953
           Adult     31.53±0.93   0.79±0.62     0.41±0.09   0.983
6 Bands    Native    68.28±0.94   -1.75±0.30    0.24±0.02   0.998
           Toddler   48.80±1.37   3.01±0.60     0.23±0.03   0.992
           Child     47.83±1.58   5.35±0.76     0.19±0.02   0.992
           Teen      38.19±0.61   7.93±0.34     0.22±0.01   0.998
           Adult     27.35±1.32   2.69±1.28     0.22±0.05   0.966
4 Bands    Native    53.62±1.58   0.10±0.64     0.26±0.04   0.994
           Toddler   38.50±0.80   1.34±0.42     0.37±0.05   0.994
           Child     35.61±0.81   3.19±0.48     0.29±0.04   0.994
           Teen      32.90±0.35   6.15±0.21     0.32±0.02   0.999
           Adult     23.10±0.85   2.42±0.83     0.56±0.17   0.971
2 Bands    Native    18.60±1.15   -0.59±0.40    0.23±0.02   0.994
           Toddler   15.06±1.84   22.60±19.10   0.07±0.07   0.862
           Child     10.66±0.78   9.92±8.85     0.12±0.12   0.587
           Teen      9.51±0.35    11.39±9.55    1.27±8.45   0.741
           Adult     10.83±0.74   9.91±5.11     0.21±0.18   0.745

Table 7 shows the estimated values of Q, PRT, β and r² (the coefficient of determination) for vowels at the different frequency resolutions. Performance in quiet decreases as age of immersion in the second language increases, for all spectral resolutions. PRT increases with age of immersion in the second language for the original speech and for sixteen bands of spectral resolution. For eight, six, four and two bands, PRT increases with age of immersion except for the adult English learners, whose results are not consistent. Toddler and child learners have a similar performance in quiet, as shown in this table. Both teen and adult learners have a lower performance even with the original speech, but teen learners perform better than adult learners. All the points of the plots for each category of listener were properly fitted, although for two frequency bands the coefficients of determination are lower than in the other cases. PRT also increases as the available spectral information is reduced, for native monolingual English listeners and for toddler, child and teen learners of English. This is not always the case for adult English learners.

Table 8 shows the estimated parameters for consonants. PRT values increase with reduced spectral resolution in almost all cases. The performance in quiet estimated for toddler, child and teen English learners is very similar to that obtained for native monolingual English listeners; a difference in performance in quiet is seen for the adult learner group. In almost all cases the estimated PRT also increases with increasing age of immersion in the second language (except six and four bands for adult learners).

Table 8. Estimated parameters for the sigmoidal fitting for consonants.

Condition  Listener  Q (%)        PRT (dB)     β            r²
Original   Native    97.35±0.20   -5.70±0.06   0.25±0.005   0.999
           Toddler   99.11±1.03   -5.69±0.31   0.19±0.01    0.996
           Child     96.90±1.17   -5.23±0.32   0.22±0.02    0.995
           Teen      96.40±1.44   -4.62±0.36   0.24±0.03    0.993
           Adult     87.46±1.02   -4.82±0.30   0.22±0.02    0.995
16 Bands   Native    95.97±2.28   1.84±0.54    0.17±0.01    0.995
           Toddler   93.42±1.75   2.62±0.41    0.17±0.01    0.997
           Child     89.98±2.38   1.74±0.57    0.18±0.02    0.994
           Teen      90.27±2.81   2.68±0.66    0.19±0.02    0.992
           Adult     80.02±2.53   3.40±0.77    0.15±0.02    0.992
8 Bands    Native    88.83±1.75   3.29±0.48    0.15±0.01    0.997
           Toddler   86.76±2.68   3.41±0.70    0.16±0.02    0.993
           Child     86.95±1.99   4.28±0.59    0.13±0.01    0.996
           Teen      87.67±2.91   5.81±0.89    0.12±0.01    0.991
           Adult     77.84±2.99   6.63±1.10    0.12±0.01    0.988
6 Bands    Native    79.76±2.54   4.38±0.73    0.16±0.02    0.992
           Toddler   81.14±1.39   4.12±0.43    0.14±0.01    0.998
           Child     81.35±1.37   6.11±0.43    0.13±0.01    0.998
           Teen      84.94±2.76   8.21±0.83    0.13±0.01    0.993
           Adult     68.18±2.73   6.16±1.05    0.13±0.02    0.987
4 Bands    Native    72.90±1.78   5.64±0.60    0.14±0.01    0.996
           Toddler   68.73±0.99   4.22±0.40    0.12±0.01    0.998
           Child     71.67±2.84   6.25±1.04    0.13±0.02    0.989
           Teen      73.40±3.29   8.11±1.00    0.16±0.02    0.987
           Adult     56.28±2.66   6.22±1.23    0.13±0.02    0.982
2 Bands    Native    45.95±1.83   6.55±0.86    0.19±0.02    0.990
           Toddler   40.85±0.84   5.30±0.50    0.16±0.01    0.997
           Child     41.34±1.36   7.93±0.76    0.17±0.02    0.993
           Teen      38.60±1.96   7.31±1.20    0.17±0.03    0.981
           Adult     33.53±1.48   8.49±1.05    0.17±0.02    0.987

Table 9 shows the estimated parameters for words. In some cases the values obtained for PRT do not increase steadily with age of immersion in English; as before, this happens for the adult English learners. Within each group the PRT increases as spectral information is reduced. Now that a complete set of data is available, the approximations found are much better and the coefficients of determination are very close to one. Toddler and child English learners have a very similar performance in quiet, which is lower than the performance of native monolingual English listeners.

Table 9. Estimated parameters for the sigmoidal fitting for words.
Condition  Listener  Q (%)        PRT (dB)      β            r²
Original   Native    97.33±2.23   -0.87±0.43    0.25±0.03    0.993
           Toddler   93.39±3.22   0.24±0.68     0.21±0.03    0.989
           Child     95.33±1.21   0.65±0.25     0.20±0.01    0.999
           Teen      88.22±6.28   1.76±1.40     0.20±0.05    0.961
           Adult     73.37±4.63   2.10±1.25     0.19±0.04    0.970
16 Bands   Native    93.70±2.68   6.42±0.49     0.22±0.02    0.996
           Toddler   88.46±5.72   9.27±1.26     0.18±0.03    0.981
           Child     86.05±4.31   10.35±1.00    0.17±0.02    0.989
           Teen      74.84±6.61   10.66±1.79    0.17±0.04    0.966
           Adult     61.32±7.18   10.74±2.67    0.15±0.05    0.939
8 Bands    Native    87.09±2.46   10.74±0.51    0.20±0.02    0.997
           Toddler   66.02±1.55   11.01±0.40    0.21±0.01    0.998
           Child     72.73±1.16   13.98±0.34    0.18±0.01    0.999
           Teen      62.69±2.85   13.24±0.70    0.25±0.04    0.992
           Adult     45.22±1.75   11.60±0.70    0.20±0.02    0.993
6 Bands    Native    70.82±2.36   11.60±0.60    0.20±0.02    0.995
           Toddler   53.42±3.84   15.50±2.16    0.14±0.04    0.977
           Child     53.18±0.96   15.17±0.44    0.17±0.01    0.999
           Teen      44.73±3.46   17.23±2.35    0.19±0.07    0.976
           Adult     32.79±1.63   17.09±1.59    0.16±0.04    0.989
4 Bands    Native    52.17±2.63   12.94±1.19    0.15±0.03    0.988
           Toddler   33.33±2.30   17.37±2.98    0.11±0.03    0.978
           Child     26.72±0.69   10.79±0.46    0.20±0.02    0.997
           Teen      26.65±1.96   14.58±1.71    0.17±0.05    0.977
           Adult     14.11±0.26   9.28±0.26     0.28±0.02    0.999
2 Bands    Native    8.38±0.61    16.66±1.88    0.21±0.08    0.979
           Toddler   5.61±0.27    21.05±3.04    0.14±0.04    0.990
           Child     4.00±0.23    15.00±0.11    2.47±12.06   0.988
           Teen      1.98±0.24    11.72±2.53    0.17±0.06    0.938
           Adult     2.41±0.31    16.90±5.28    0.11±0.06    0.924

Table 10 shows the parameters estimated for sentences. As was expected, the PRT value increases with increasing age of immersion in the English language in most cases; this does not always happen for the adult English learners. This group has a lower performance in quiet when compared with the other groups tested, and they seem to be more influenced by the reduction of spectral information.

Table 10. Estimated parameters for the sigmoidal fitting for sentences.

Condition  Listener  Q (%)         PRT (dB)     β            r²
Original   Native    99.98±0.15    -6.73±0.06   0.36±0.008   0.999
           Toddler   99.42±0.40    -3.68±0.07   0.59±0.02    0.999
           Child     100.00±0.11   -3.26±0.02   0.47±0.004   0.999
           Teen      100.00±1.44   -2.72±0.26   0.36±0.03    0.995
           Adult     96.45±1.56    -1.47±0.29   0.31±0.03    0.996
16 Bands   Native    100.00±4.65   0.34±0.75    0.37±0.09    0.975
           Toddler   98.95±1.40    0.19±0.21    0.44±0.04    0.998
           Child     96.30±2.36    0.56±0.34    0.51±0.09    0.994
           Teen      94.79±0.96    3.84±0.14    0.46±0.03    0.999
           Adult     83.62±4.57    2.76±0.85    0.36±0.09    0.979
8 Bands    Native    100.00±3.44   1.85±0.58    0.30±0.04    0.990
           Toddler   97.32±2.35    2.50±0.40    0.29±0.03    0.996
           Child     95.56±5.00    2.92±0.87    0.28±0.06    0.983
           Teen      96.06±3.56    6.66±0.62    0.23±0.03    0.994
           Adult     86.28±4.19    6.93±0.83    0.22±0.03    0.989
6 Bands    Native    96.95±3.12    3.49±0.52    0.30±0.04    0.994
           Toddler   90.70±2.72    4.07±0.52    0.24±0.02    0.995
           Child     88.86±2.53    4.77±0.44    0.31±0.03    0.996
           Teen      77.24±3.41    9.62±0.46    0.48±0.10    0.992
           Adult     59.05±3.15    7.84±0.77    0.29±0.05    0.988
4 Bands    Native    89.67±2.10    6.72±0.34    0.30±0.03    0.998
           Toddler   73.72±7.45    5.10±1.61    0.28±0.10    0.954
           Child     77.70±7.46    5.77±1.48    0.28±0.10    0.961
           Teen      64.76±2.46    8.58±0.50    0.34±0.05    0.995
           Adult     51.21±2.84    7.79±0.81    0.29±0.06    0.989
2 Bands    Native    26.53±1.52    7.99±0.77    0.33±0.07    0.988
           Toddler   24.28±2.08    5.11±1.04    0.48±0.25    0.959
           Child     24.65±0.61    8.28±0.31    0.39±0.04    0.998
           Teen      9.81±0.11     6.18±0.13    0.60±0.04    0.999
           Adult     13.63±1.95    5.20±2.08    0.34±0.20    0.911
Figure 15. PRT values (in dB) as a function of age of immersion in the second language. [Panels (a) vowels, (b) consonants, (c) words and (d) sentences, with separate curves for 2, 4, 6, 8 and 16 bands and full speech.]

Figure 15 shows the PRT values as a function of age of immersion in the second language for vowels, consonants, words and sentences (age 0 represents the native monolingual English listeners). In the case of vowels a clear difference between the PRT values for native English and nonnative listeners is seen: all second language learners have higher PRT values. For consonants and sentences, in most cases we can see an increase in PRT value with increasing age of immersion in the second language. In the plots we can also clearly see the increase in PRT with the reduction of spectral information.

The estimated values of PRT can also be plotted as a function of the recognition performance in quiet, Q. Figure 16 shows these plots for all the groups tested, showing that both parameters are affected by spectral degradation: with increasing spectral degradation the performance in quiet decreases while the PRT increases. There is a linear relationship between these two parameters, but only in the case of the native English listeners did we find a good approximation, with a significant correlation coefficient. The relationship between these two parameters changes with age of immersion in the second language. Although the slope for toddler learners is less steep than that of native English listeners, from that point on the slope becomes steeper as age of immersion in the second language increases. As was said before, the approximations are not very good in most cases for nonnative English listeners (especially for the adult group), but the plots still suggest that more information is needed to reach 50% performance as age of immersion in the second language increases. The PRT value estimated for a particular condition of spectral resolution is a "relative" SNR, since it does not exactly follow Plomp's definition of 50% performance recognition.

Figure 16. PRT values as a function of the performance in quiet (Q). Panels (a), (b), (c), (d) and (e) show the linear regressions of PRT on the performance in quiet for native monolingual English listeners and for toddler, child, teen and adult English learners, respectively.

5.5. Estimated SRT

To find the SNR value at a recognition score of 50%, we took the parameters obtained from the fitted sigmoidal functions and substituted them into the PI functions for a 50% score. In this way we found the values of the SNR at 50% recognition. The method is shown in figure 17: we find the points of 50% performance and determine the SNR at each of those points.
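Assuming the sigmoidal PI form used throughout this chapter, P(SNR) = Q / (1 + e^(-β(SNR - PRT))), the 50% point can be obtained in closed form rather than graphically. A minimal sketch, with arbitrary parameter values of the kind listed in tables 7 through 10:

```python
import math

def snr_at_50(q, prt, beta):
    """SNR (dB) at an absolute 50% score on the fitted PI function.

    q: performance in quiet (%); prt: PRT (dB); beta: slope.
    Only defined when q > 50%, i.e. when the curve actually crosses 50%.
    """
    if q <= 50.0:
        raise ValueError("50% performance is never reached in this condition")
    return prt - math.log(q / 50.0 - 1.0) / beta

# Arbitrary parameter values of the kind listed in tables 7 through 10.
print(snr_at_50(q=80.0, prt=-2.3, beta=0.23))  # about -0.1 dB
```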
Figure 17. Performance intensity functions for vowels of native English listeners. [Percent correct as a function of signal-to-noise ratio (dB) for each band condition; the SNR at the 50% point is read from each fitted curve.]

In our case, we applied equation (3.3), which defines the SRT for hearing-impaired listeners, since we are introducing a type of distortion (impairment) into the stimuli used for testing. The new equation includes only the "hearing loss" given by 'D', since 'D' represents the distortion introduced by reduced spectral resolution, while there is no loss due to attenuation of all the sounds entering the ear (class A). The new equation for our case is

(5.2) SRT = 10 log(10^((L_0 + D)/10) + 10^((L_N + ΔL_SN + D)/10))

where L_0 is the normal threshold in quiet and L_N is the level of the noise.

In previous studies (ter Keurs et al., 1989 and 1993), speech recognition as a function of noise was measured under conditions of spectral smearing. The SNR at 50% recognition is the SRT in noise, which is determined by the value of ΔL_SN, as was discussed before. The SNR found from the PI function when there is no reduction of spectral resolution defines the value of ΔL_SN for a "normal" hearing condition (no distortion).

Figure 18. Different levels of 'D' in the speech reception threshold curves. [SRT as a function of the SPL of the noise in dB(A), showing the normal threshold, a class D loss, and the band conditions.]

As shown in figure 18, when spectral degradation is introduced, the SNR found will give a combined value of D and ΔL_SN, since the second term of the equation then determines the value of the SRT at high noise levels, as can be seen from equation (5.2). We can therefore define:

(5.3) SNR_estimated = ΔL_SN + D_band-condition

We found values of D for the different numbers of spectral channels used, as was done by Shannon and Fu (1999). They found that for native English listeners 'D' increases by about 4.4 dB each time the number of spectral bands is reduced by a factor of two (16, 8 and 4 channels). This value is comparable to the estimate found by ter Keurs et al. (1993) of an increment of 4 dB in D as the amount of spectral smearing was doubled. For nonnative listeners we expected to find higher values of D. In the case of vowels especially, there is some distortion (D) even in full speech, since nonnative listeners probably "hear something, but they just don't understand what was said" (they do not perceive the difference between certain vowels). We define the ΔL_SN for native English listeners as the "normal hearing" ΔL_SN of Plomp's model. We then determined how the values obtained for D differ for each type of stimulus and how D changes (increases) when spectral resolution is reduced.

Values of D for phonemes, words and sentences were determined from the data. Figure 19 shows the values of D in dB as a function of the logarithm of the number of frequency bands (4, 6, 8 and 16 frequency bands and full, unprocessed speech) for the groups tested. The different panels show the results obtained for vowels, consonants, words and sentences.
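Once the 50% points are available, equation (5.3) makes the distortion estimate a simple subtraction. The sketch below assumes ΔL_SN is taken from the full-speech (undegraded) condition; the 50% SNRs are invented values chosen to mimic the roughly 4.4 dB growth per halving of the band count reported by Shannon and Fu (1999).

```python
def distortion_d(snr50_band, delta_l_sn):
    """Equation (5.3): D for a band condition, as SNR_estimated - delta-L_SN."""
    return snr50_band - delta_l_sn

# delta-L_SN from the full-speech condition, plus invented 50% SNRs chosen
# to mimic ~4.4 dB of added distortion per halving of the band count.
delta_l_sn = -5.0
snr50_by_bands = {16: -0.8, 8: 3.6, 4: 8.0}
for bands, snr50 in sorted(snr50_by_bands.items(), reverse=True):
    print(f"{bands:2d} bands: D = {distortion_d(snr50, delta_l_sn):.1f} dB")
```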
In the case of consonants, sentences and even vowels, D is almost a linear function of the logarithm of the number of frequency bands. In the case of words, the functions are mostly linear except at the lowest numbers of channels; a simple linear relationship is not clear if we consider these points (four channels for native monolingual English listeners and six channels for toddler and teen English learners). The values obtained for the SNR at 50% performance and for D are as expected: L2 learners have greater values than native English listeners under the same conditions, especially in the case of vowels, and with increasing age of immersion in the second language the value of D is greater. Late learners (the adult and teen groups) reached a level of 50% recognition for vowels and words only under some conditions, so a value of D could not be found for all conditions of spectral resolution.

Figure 19. Values of D for phonemes, words and sentences. [D in dB as a function of the number of bands (log scale), with panels for vowels, consonants, words and sentences and separate curves for each age-of-immersion group.]

Figure 20 shows the same results as the previous figure, displayed as separate panels for native English listeners and for toddler, child, teen and adult English learners. In this figure, the effect of age of immersion in the second language can be seen in the way 'D' increases, and a clear difference in performance can be seen. The values of 'D' for native English listeners tested with the different stimuli are close together; the only exception is the value of 'D' for words at four bands of spectral information. Clearly this condition is very difficult even for native English listeners, so it is understandable that nonnative listeners do not reach 50% performance under this reduced spectral information condition. Also, the performance for consonants is similar to that for sentences in all the categories of subjects tested.

Figure 20. Values of D for different ages of immersion in English.

For nonnative English listeners the value of 'D' obtained for vowels is higher than the values obtained for the other stimuli used for testing, even higher than the values obtained for words. The adult group is always at a disadvantage: at sixteen bands of spectral resolution they have 'D' values very similar to those found for natives tested with eight frequency bands in the case of consonants and sentences. Comparing the way 'D' increases in each panel as a function of the number of frequency bands in the stimulus signal, it can be seen that if we fit the data with straight lines, the slopes become steeper as age of immersion in the second language increases.

Figure 21. Distortion 'D' as a function of age of immersion in the second language. [Panels (a) vowels, (b) consonants, (c) words and (d) sentences, with separate curves for 4, 6, 8 and 16 bands and full speech.]
Distortion ‘ D ’ as a function of age of immersion in the second language. Figure 21, shows distortion ‘ D ’ as a function of the age of immersion in the second language. In the case of sentences and consonants, even adult English learners reach 50% performance with only four spectral bands. For vowels, at least eight spectral bands are needed to reach 50% performance. For vowels, adult English learners need at least sixteen spectral bands to reach 50% performance, for these Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 64 stimuli we can clearly see the difference in performance between teen and adult learners. Teen English learners reach 50% performance with eight spectral bands of information. The value of ‘ D ’ defined in the Plomp model has been found in previous studies to be related to the number of bands or channels presented to the subject. For normative English listeners, this parameter is also determined by other factors like age of immersion and experience with the second language as was seen in figures 19 and 20. The increment of the value of the distortion 'D' with age of immersion in the second language compared with native English listeners seems almost constant across different spectral conditions in most cases. For vowels and words the behavior is a little different. Early L2 learners (toddler and child) are the only ones to reach 50% recognition score of vowels in most of the spectral conditions. The child learners performed similarly to the toddler learners for full speech and sixteen frequency bands, but for eight bands the value of ‘ D ’ is almost 5dB higher. Then, the value of ‘ D ’ starts to diverge showing that with increased spectral degradation child learners are affected by another factor, besides the loss of information. This is even clearer in the case of late learners that only reach 50% recognition with M l speech. We found a difference in the performance of the teen and adult English learners, is different, contrary to what was expected. The teen group reached 50% recognition of words with eight bands of spectral resolution and they showed no distortion for sentences when presented with M l speech. Adult learners need at least sixteen frequency bands of spectral information to recognize 50% of the words Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 65 presented and even under optimum quiet conditions and full speech there is some distortion for sentences. This distortion ‘ D ’ could probably be compensated then by delivering an appropriate amount of channels of information to ensure significant speech recognition even in the presence of noise or improving the way the information is treated to enhance speech perception. Second language learners, need then more spectral information than native monolingual English when they are under the same conditions. 5.6. Context Effects The model presented by Boothroyd and Nittrouer (1998) was used to find values of ‘ j ’ and ‘ k ’ that define the context effects for normative listeners. The effect of learning observed by Grant et al. (2000) was not considered in the measurements made in this case since the sentences used for testing did not consist of the same words presented previously added together. 5.6.1. Estimated ‘ j ’ Figure 22, shows the fitted curves for ‘ j ’ for the case of eight bands of spectral resolution and for the case of full speech. 
Values of ‘ j ’ calculated for words are plotted in figure 24 as a function of age of immersion in the second language for different number of bands and for full speech. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 66 Pa .C 1.0 .N I . 0.8 O m ts o 0.6 a? ® S o 0.4 « Q 45 JQ S O 0.2 Q . 0.0 Age of Immersion # Native Y Toddler m Child Adult 8 B a n d s_l Full Speech: I — I — l-|»LXXX^— L J — I — I-j-i— I — 1 — lll— l-l-l— I — 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1 .0 Probability of recognizing phonemes in words (p^ Figure 22. Fitted curves of j ’ for words. Values of j ’ found using all the data do not change compared to the values we obtained when we use the average phoneme and whole word probability recognition scores, measured for each category of subjects tested. O ) •S .a o I S. s « t» o O o t f > 2 : s 1e ® ■ Q O 1.0 - - Full Speech 0.8 j = 1.73 0.6 0.4 0.2 'Y Toddler 0.0 0.8 1.0 0.4 0.6 0.0 0.2 Probability o f recognizing p h o n em es in a word Figure 23. Values of ‘ j ’ for all the subjects in the toddler learner group. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 67 Figure 23, shows the fitted curve for ‘ j ’ for the toddler learner group in full speech. The performance of all subjects in the group is shown in the plot. The values of ‘ j ’ obtained do not differ significantly between different groups and different band conditions. Smaller values of 'j' should be expected with better spectral resolution, but this was not the case. Values did not change significantly between different band conditions. 2.4 Number of Bands 0 4 Bands -Q - 6 Bands 8 Bands 16 Bands J Full speech 2.2 Q > 2.0 1.8 Words 1.6 30 25 15 20 10 5 0 Age of Immersion (years) Figure 24. Value of j ’ for words. Also, we expected that as age of immersion in English increased the value of ‘ j ' would increase, since normative English listeners should need to recognize more of the constituent phonemes to understand the word. Contrary to what was expected, what we found in almost all the different band conditions (except for four bands) is that normative English listeners have slightly smaller values of ‘ j ’ compared to native Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 68 English listeners as is shown in figure 24. Also y ’ did not increase significantly as age of immersion in the second language increased, the average y ’ for all nonnative listeners groups is very similar to each other. The values of y ’ found for native English listeners were similar to values found in other studies for monosyllabic meaningful words (Boothroyd and Nittrouer, 1988 and Eisenberg et a!., 2000). In a previous study (Eisenberg et ah, 2000) values of j ’ for sentences were also found. In this case y ’ relates the probability of recognizing whole sentences with the probability of recognizing separate words (speech units) in the sentences. We also found the values of y ’ for sentences. The probability of recognizing whole sentences (percent correct for whole sentences) was not available for native English listeners so we were not able to estimate the value of y" for this group to compare their performance with that of nonnative English listeners. 3.5 3.0 - “ y y !k , g ® s 5 s , 2.0 - - 1.5 Sentences a b a 9 f i Number o f Bands Q 4 Bands 6 B ands -fU- 8 Bands 16 B ands -JL- Pull Speech ^ — I — I — I — J— |— 9 — .. 
Figure 25. Values of 'j' for sentences, as a function of age of immersion (years), for 4, 6, 8, and 16 bands and full speech.

Looking at figure 25, we can see that these values of 'j' do not provide a simplifying picture of the differences in the performance of each category of nonnative English listeners. The values do not change significantly from one category to another or between different band conditions.

5.6.2. Estimated 'k'

Preliminary values of 'k' calculated for sentences gave us a better understanding of the context being used by each group and were similar to what we expected to find for the behavior of nonnative English listeners. The results obtained for this parameter after all data was collected for each category followed what was expected in the case of full speech, without any spectral degradation.

Figure 26. Fitted curves for the different groups (native, toddler, child, teen, adult) and two different band conditions (probability of recognizing words in isolation vs. probability of recognizing words in sentences).

Since there were ceiling effects in the recognition of words in sentences, we calculated 'k' values at points where performance reached 25, 50, and 75%. We found the SNR values where sentence performance reached those values, and we found word performance at the same levels of SNR. Figure 26 shows the fitted curves for the case of eight bands of spectral resolution and for the case of full speech for all the groups tested. In the case of full speech, a greater difference was seen in the value of 'k' between native English listeners and all the L2 learners. Even the early learners show a lower context use when compared with the native listeners.

Figure 27. Fitting for toddler learners with sixteen bands.

Values of 'k' were also obtained by fitting all the data for each category, as shown in figure 27. These values obtained with all data were very similar to the values shown in figure 26. Initial results for sentences suggested that nonnative English learners made less use of context under all conditions when compared with native listeners. With the complete data set we confirmed this behavior only for the case of full speech. We found that when spectral resolution is reduced, the values of the parameter 'k' were larger for early second language listeners (toddler and child learners) than for native English listeners. This is consistent with the performance intensity curves observed for these listeners. While early nonnative English listeners performed similarly to native listeners in the case of sentences, they have a lower performance in the case of words. Late learners show a different pattern of performance. Teen English learners show a similar use of context compared to native English listeners, while adult English listeners always show a lower use of context information, as is shown in figure 28.

Figure 28. Parameter 'k' as a function of the number of bands for all data, for each age-of-immersion group.
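In the same Boothroyd and Nittrouer (1988) framework, 'k' relates recognition of words in sentence context, p_s, to recognition of the same words in isolation, p_i, through p_s = 1 - (1 - p_i)^k. The sketch below estimates 'k' from paired scores; the numbers are hypothetical placeholders, and the log-transformed least-squares estimator is an assumption about one reasonable way to do the fit, not the study's exact procedure.

    import numpy as np

    def estimate_k(p_isolated, p_sentence):
        """Least-squares estimate of k in p_s = 1 - (1 - p_i)**k.

        Taking logs of the miss probabilities gives
        log(1 - p_s) = k * log(1 - p_i), a line through the origin.
        """
        x = np.log(1.0 - np.asarray(p_isolated, dtype=float))
        y = np.log(1.0 - np.asarray(p_sentence, dtype=float))
        return np.sum(x * y) / np.sum(x * x)

    # Hypothetical paired scores across SNR conditions for one group.
    p_isolated = [0.40, 0.55, 0.70, 0.85]   # words in isolation
    p_sentence = [0.62, 0.78, 0.90, 0.97]   # the same words in sentences

    k = estimate_k(p_isolated, p_sentence)
    print(f"estimated k = {k:.2f}")  # k > 1 means sentence context helps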
We also expected to find that 'k' always increased as spectral information in the signal is reduced. This is not what we observed when all data was used to estimate this parameter. Although 'k' values are generally higher as the number of bands decreases, the results are not completely clear. This parameter does not vary systematically with the number of frequency bands in the stimuli, but some points in the plot suggest that as the amount of spectral information in the signal is reduced, listeners tend to use more linguistic information, or are being 'forced' to use more of the context. Initial values of 'k' were calculated using all the information collected in all conditions.

Figure 29. Estimated 'k' (25, 50, 75%) as a function of age of immersion (years), for 4, 6, 8, and 16 bands and full speech.

To decrease the ceiling effect present in sentence recognition, we also calculated 'k' values at points where sentence performance reached 25, 50, and 75%. These calculated values are shown in figure 29. The calculated values for the 'k' factor were in this case very similar to the ones found when all the data was used. This approach did not give us a clearer picture of how nonnative English listeners really make use of context. Although the values of 'k' decrease as a function of age of immersion for the case of full speech, a simple equation could not be fitted as was done with the initial data. The coefficient of determination found in the fitting was very small (r² = 0.25), so we could not describe this curve using the initial value of 'k' for native English listeners and the age of immersion in the second language. We also expected to be able to fit a similar curve to the values of 'k' as a function of the number of spectral bands. With the obtained values of 'k' this is not possible. The Boothroyd and Nittrouer model (1988) is a simple model used to describe the amount of context being used by native English listeners under different listening conditions. As was seen in the present study, this model was not particularly useful in simplifying the pattern of performance of nonnative English listeners and did not present a simple picture that explains and quantifies how they use context.

CHAPTER 6
SPEECH PERCEPTION BY NATIVE CHINESE LISTENERS

Initially, one of the purposes of this study was to test listeners whose first language was Chinese and compare the results obtained for these listeners with the performance of listeners whose first language was Spanish. Time constraints did not allow us to obtain a complete set of data for Chinese speakers, but some preliminary data is available. The following figures and tables show some of this preliminary information.

6.1. Listeners

The same categories as shown in table 1 were defined for native Chinese listeners. The following table shows the characteristics of the listeners tested. None of these subjects was tested completely in all conditions, so these results are preliminary.

Table 11. Characteristics of nonnative English subjects with Chinese as L1.
Subject   Age (years)   AOI (years)   L2 Experience (years)   Daily Use of L2
BCHF1          21            10                11                   80%
Mean         21.0            10                11                   80%
DCHF2          33            30                 3                   40%
DCHF2          28          24.5               3.5                   40%
DCHM1          30            28                 2                   40%
DCHM2          26            24                 2                   80%
Mean         29.3         26.63              2.63                   50%

(A and D refer to the age group of the subject; CH = Chinese; F = Female; M = Male; AOI = Age of Immersion in the second language)

6.2. Performance

Figure 30 shows the performance for four and sixteen bands for vowel perception. Performance intensity functions were fitted to the data, and the parameters found are shown in table 12.

Figure 30. Vowel performance in noise for native Chinese listeners (percent correct vs. signal-to-noise ratio, for 4 and 16 bands; groups: native, child, and adult learners).

If we compare the parameters found for nonnative English listeners whose first language is Spanish (table 6) with the parameters shown in table 12, we can see that their performance is similar. Subjects in the same category reach similar performance in quiet. From figure 30 we can see that these nonnative listeners have a much lower performance in vowel perception when compared to native English listeners.

Table 12. Estimated parameters for the sigmoidal fitting for vowels.

Condition   Listener   Q (%)          PRT (dB)      Slope        r²
16 Bands    Native     90.16±1.31     -5.18±0.41    0.20±0.02    0.993
            Child      58.18±4.36      1.71±1.85    0.18±0.07    0.921
            Adult      52.81±5.41     -0.44±2.01    0.43±0.56    0.675
4 Bands     Native     53.62±1.58      0.10±0.64    0.26±0.04    0.994
            Child      34.78±1.67      0.42±1.23    0.46±0.26    0.873
            Adult      30.53±1.85      3.12±1.25    0.38±0.15    0.856

Figure 31 shows the performance for consonants with 4 and 16 bands.

Figure 31. Consonant recognition in noise for nonnative listeners (L1 is Chinese); percent correct vs. signal-to-noise ratio, for 4 and 16 bands.

Table 13 shows the parameters found for consonant performance by the listeners tested.

Table 13. Estimated parameters for the sigmoidal fitting of consonants.

Condition   Listener   Q (%)          PRT (dB)      Slope        r²
16 Bands    Native     95.97±2.28      1.84±0.54    0.17±0.01    0.995
            Child      89.98±2.34      1.74±0.57    0.18±0.02    0.994
            Adult      88.65±4.31      3.27±1.22    0.15±0.03    0.977
4 Bands     Native     72.90±1.78      5.64±0.60    0.14±0.01    0.996
            Child      65.96±2.23      4.34±1.03    0.12±0.01    0.970
            Adult      69.81±2.99      7.54±1.10    0.14±0.02    0.960

In the case of words and sentences, fewer conditions were tested for these listeners.
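The performance intensity functions in tables 12 and 13 are characterized by performance in quiet Q, the phoneme recognition threshold PRT (the SNR at which performance is midway between chance and Q), and a slope term. The exact functional form (equation 3.1) is defined earlier in the document; the sketch below fits a generic logistic of that character with SciPy, so the parameterization shown (a chance floor plus a logistic rise), like the data points, is an assumption for illustration.

    import numpy as np
    from scipy.optimize import curve_fit

    CHANCE = 100.0 / 12.0  # 12-alternative vowel task -> ~8.3% chance (assumption)

    def pi_function(snr, Q, PRT, slope):
        """Logistic performance-intensity function (percent correct).

        Performance rises from chance to Q; at snr == PRT it is midway
        between chance and Q, matching the PRT definition in the text.
        """
        return CHANCE + (Q - CHANCE) / (1.0 + np.exp(-slope * (snr - PRT)))

    # Hypothetical percent-correct scores across SNR conditions.
    snr = np.array([-10.0, -5.0, 0.0, 5.0, 10.0, 15.0, 20.0])
    pct = np.array([12.0, 20.0, 35.0, 48.0, 55.0, 57.0, 58.0])

    popt, _ = curve_fit(pi_function, snr, pct, p0=[60.0, 0.0, 0.3])
    Q, PRT, slope = popt
    print(f"Q = {Q:.1f}%, PRT = {PRT:.2f} dB, slope = {slope:.2f}")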
We do not have a complete set of data points to be able to see exactly how word and sentence performance is affected by noise at different band conditions.

Figure 32. Word and sentence recognition for sixteen bands (percent correct vs. signal-to-noise ratio; groups: native, child, and adult Chinese learners).

In figure 32 we show word and sentence performance for sixteen frequency bands. For nonnative listeners whose first language was Chinese, only the performance in quiet is available. If we compare performance in quiet for these nonnative listeners with the performance in quiet for nonnative listeners whose L1 is Spanish, we see that the levels are very similar for these categories. The same is true for the performance for vowels and consonants, even if the curves in these two cases are not as smooth (we do not have a complete set of data for all subjects).

CHAPTER 7
DISCUSSION

7.1. Vowel Space

Vowels are described by what are called formant frequencies: the resonance points of the vocal tract when making the vowel sounds give the formant frequencies. Vowel space is defined by plotting the second formant (F2) as a function of the first formant (F1). The vowel space of the first language probably affects the performance of nonnative English listeners in recognizing English vowels. We plotted the formants as perceived by the different groups of subjects tested. We have the formant frequencies for the talkers in the vowel database, and we also had a reference for the formant frequencies of the vowels in Spanish. We found the actual perceived formants from the confusion matrices of vowel performance for the different talkers. Looking at these plots, we can see how the vowel space for older English learners is more clustered together. For early learners of English the vowel space is more separated; it is more distinct than in the case of late learners. The vowel space for the bilingual groups tested is not as separated as that of monolingual native English listeners. This pattern becomes even clearer with a reduced number of channels: looking at the perceived vowels for the case of eight bands of spectral information, we can confirm that this is what is happening. The formant frequencies of the vowels in Spanish do not seem to interfere with the vowel space of English. The problem that nonnative listeners have with the English vowels is that English vowels are less separated and become even closer together with less experience with the language.
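A minimal sketch of the perceived-formant computation described above: each stimulus vowel's perceived (F1, F2) is taken as the confusion-weighted average of the formants of the response vowels. The formant values and the confusion row below are hypothetical placeholders, not the Hillenbrand measurements or the study's matrices.

    import numpy as np

    # Hypothetical average talker formants (Hz) for three response vowels.
    vowels   = ["heed", "hid", "head"]
    formants = np.array([[310, 2790],   # F1, F2 for "heed"
                         [430, 2480],   # F1, F2 for "hid"
                         [580, 2330]])  # F1, F2 for "head"

    # One confusion-matrix row: responses to the stimulus "hid" (counts).
    row = np.array([5, 79, 16], dtype=float)

    # Perceived formants = response-probability-weighted mean of formants.
    p = row / row.sum()
    f1_perceived, f2_perceived = p @ formants
    print(f"perceived F1 = {f1_perceived:.0f} Hz, F2 = {f2_perceived:.0f} Hz")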
Figure 33. Perceived vowels for quiet, unprocessed speech (F2 vs. F1, in Hz). Left panel: early learners (toddler and child) with Hillenbrand, native, and Spanish reference points; right panel: late learners (teen and adult).

Not all the speakers present in the Hillenbrand database were used for testing. A total of five men and five women were used, and the formant frequencies measured for their vowels were used to calculate the points in the plot. Figure 33 shows the results for quiet and full speech. We show separate panels for early learners (toddler and child learner groups) and late learners (teen and adult groups). We can see the differences that already exist in perception between native English listeners and early English learners. Figure 34 shows the results for eight bands of information in quiet.

Figure 34. Perceived vowels for quiet and eight bands of spectral information (F2 vs. F1, in Hz); early learners on the left, late learners on the right.

Since the vowel space in English is more densely clustered, nonnative listeners have a harder time separating different categories of vowels, and so they tend to cluster them together, closer to one of the categories that is clearer and easier for them to understand. The performance that we show in the plots is for the quiet case with unprocessed speech and for eight bands of information. Confusions are even greater when noise is introduced. Confusion matrices for nonnative listeners for quiet unprocessed speech and for eight bands in quiet are shown in the following tables. A confusion matrix for one of the native English listeners is also shown, so we can compare the confusions that these listeners have between different pairs of vowels with those of nonnative English listeners.

Table 14. Confusion matrix for native English listeners in full speech in quiet
(rows: stimulus; columns: response; dash = no responses).

        Heed  Hid  Head  Had  Hod  Hawed  Hood  Who'd  Hud  Heard  Hayed  Hoed
Heed      19    1     -    -    -      -     -      -    -      -      -     -
Hid        -   20     -    -    -      -     -      -    -      -      -     -
Head       -    1    39    -    -      -     -      -    -      -      -     -
Had        -    -     2   18    -      -     -      -    -      -      -     -
Hod        -    -     -    1   17      -     2      -    -      -      -     -
Hawed      -    -     -    -    1     39     -      -    -      -      -     -
Hood       -    -     -    -    -      -    20      -    -      -      -     -
Who'd      -    -     -    -    -      -     -     20    -      -      -     -
Hud        -    -     -    -    1      -     -      -   39      -      -     -
Heard      -    -     -    -    -      -     -      -    -     20      -     -
Hayed      -    -     -    -    -      -     -      -    -      -     20     -
Hoed       -    -     -    -    -      -     -      -    -      -      -    20
Table 15. Confusion matrix for toddler English learners in full speech in quiet.

        Heed  Hid  Head  Had  Hod  Hawed  Hood  Who'd  Hud  Heard  Hayed  Hoed
Heed      98    1     -    -    -      -     -      -    -      -      -     -
Hid        -  100     -    -    -      -     -      -    -      -      -     -
Head       -    -    95    5    -      -     -      -    -      -      -     -
Had        -    -    31   69    -      -     -      -    -      -      -     -
Hod        -    -     -   25   35     29     1      -   10      -      -     -
Hawed      -    -     -    -   40     58     -      2    -      -      -     -
Hood       -    -     -    -    -      -    95      -    -      1      -     -
Who'd      -    -     -    -    -      -     1     99    -      -      -     -
Hud        -    -     -    -    9      4     1      -   86      -      -     -
Heard      -    -     -    -    -      -     -      -    -    100      -     -
Hayed      -    -     -    -    -      -     -      -    -      -    100     -
Hoed       -    -     -    -    -      -     -      -    -      -      -   100

Table 16. Confusion matrix for child English learners in full speech in quiet.

        Heed  Hid  Head  Had  Hod  Hawed  Hood  Who'd  Hud  Heard  Hayed  Hoed
Heed      95    5     -    -    -      -     -      -    -      -      -     -
Hid        1   99     -    -    -      -     -      -    -      -      -     -
Head       -    -    92    6    -      -     -      -    -      1      1     -
Had        -    -    40   60    -      -     -      -    -      -      -     -
Hod        -    -     1   47   23     15     -      -   14      -      -     -
Hawed      -    -     -    -   44     51     -      -    4      -      -     1
Hood       -    -     -    -    1      -    96      1    2      -      -     -
Who'd      -    -     -    -    -      -     5     95    -      -      -     -
Hud        -    -     -    -    4      2     2      -   92      -      -     -
Heard      -    -     1    -    -      -     -      -    -     99      -     -
Hayed      -    -     -    -    -      -     -      -    -      -     99     1
Hoed       -    -     -    -    -      -     -      -    -      -      -   100

Table 17. Confusion matrix for teen English learners in full speech in quiet.

        Heed  Hid  Head  Had  Hod  Hawed  Hood  Who'd  Hud  Heard  Hayed  Hoed
Heed      86   14     -    -    -      -     -      -    -      -      -     -
Hid        5   92     1    -    -      -     1      -    1      -      -     -
Head       -    -    84   11    1      1     1      -    -      2      -     -
Had        -    -    32   51    -      -     -      -    -      -      -     -
Hod        -    -     -   38   44      -     9      -    -      9      -     -
Hawed      -    -     -    1   38     28     -      -   32      -      1     -
Hood       -    -     -    -    -      1    79      1    9      1      -     -
Who'd      -    -     -    -    -      -    13     85    2      -      -     -
Hud        -    -     -    -    8      8     6      2   76      -      -     -
Heard      -    -     -    -    -      -     -      -    -    100      -     -
Hayed      7    -     -    -    -      -     -      -    -      -     93     -
Hoed       -    1     -    -    1      3     -      -    -      -      -    95

Table 18. Confusion matrix for adult English learners in full speech in quiet.

        Heed  Hid  Head  Had  Hod  Hawed  Hood  Who'd  Hud  Heard  Hayed  Hoed
Heed      24    -     -    -    -      -     -      -    -      2      -     -
Hid       24   62     8    -    2      -     1      1    2      -      -     -
Head       -    -    52   26   14      1     2      -    4      -      1     -
Had        -    -    39   35   15      7     -      -    -      3      -     1
Hod        -    1     -   28   51     10     -      1    6      -      -     3
Hawed      -    -     -   12   22     30     2      -   31      -      -     3
Hood       -    -     -    -    1      1    57     37    4      -      -     -
Who'd      -    -     -    -    -      5    24     68    1      1      -     2
Hud        -    -     -    9   24     33     3      3   25      -      1     2
Heard      -    -     3    -    -      -     -      -    -     96      1     -
Hayed      1    -     5    -    -      -     -      -    -      -     93     1
Hoed       -    -     -    -    1     26     2      3    -      -      -    68

From the confusion matrices of stimuli tested in quiet, we can see how age of immersion in the second language affects the perception of vowels, making some vowel pairs more difficult to differentiate for nonnative English listeners. Even though we do not have the confusion matrices for all native listeners, we can clearly see that in quiet all native monolingual English listeners achieved nearly perfect recognition, and so will be similar to this representative listener. Even bilingual listeners, that is, toddler learners, already have problems with certain vowel pairs: 'had' and 'head', 'hod' and 'had', and 'hod' and 'hawed'. Child learners have the same problems that toddler learners have. Teen English learners also have problems with the vowel pair 'hud' and 'hawed', and they start confusing 'hood' and 'who'd'. Adult English learners confuse even more vowel pairs than the teen L2 learners. Even under conditions of quiet and full speech, listeners from this group confuse vowel pairs like 'heed' and 'hid'; they have more problems with 'hood' and 'who'd' and also with 'hud' and 'hawed'. In the next tables we show the confusion matrices for the different subjects tested in the case of eight bands in quiet.
Table 19. Confusion matrix for native English listeners for eight bands in quiet.

        Heed  Hid  Head  Had  Hod  Hawed  Hood  Who'd  Hud  Heard  Hayed  Hoed
Heed      17    2     -    -    -      -     -      -    -      -      1     -
Hid        -   17     2    -    -      -     1      -    -      -      -     -
Head       -    -    15    2    -      2     1      -    -      -      -     -
Had        -    -     8   12    -      -     -      -    -      -      -     -
Hod        -    -     2    1   16      -     -      1    -      -      -     -
Hawed      -    -     -    -    5     10     1      -    4      -      -     -
Hood       -    -     -    -    -      -    20      -    -      -      -     -
Who'd      -    -     -    -    -      -     3     17    -      -      -     -
Hud        -    -     -    -    2      1     5      -   12      -      -     -
Heard      -    -     -    -    -      -     1      1    -     18      -     2
Hayed      2    4     -    -    -      -     -      -    -      -     14     -
Hoed       -    -     -    -    -      -     -      6    1      -      -    13

Table 20. Confusion matrix for toddler English learners for eight bands in quiet.

        Heed  Hid  Head  Had  Hod  Hawed  Hood  Who'd  Hud  Heard  Hayed  Hoed
Heed      91    6     -    -    -      -     -      -    -      -      3     -
Hid        5   79    11    -    -      -     3      -    -      -      2     -
Head       1   10    61   23    -      1     1      1    2      -      -     -
Had        -    1    40   55    2      2     -      -    -      -      -     -
Hod        -    -     4   32   30     17     -      -   16      1      -     -
Hawed      1    -     -    1   26     42     3      -   24      1      1     1
Hood       -    4     -    -    -      -    93      -    3      -      -     -
Who'd      -    -     -    -    -      -    15     77    1      5      -     2
Hud        -    -     -    1   10      2    28      -   57      1      -     1
Heard      -    1     5    3    1      -     3      1    5     54      -     2
Hayed     36   13     9    -    -      -     -      1    -      -     41     -
Hoed       -    -     -    -    -      1     2     40    -      5      -    51

Table 21. Confusion matrix for child English learners for eight bands in quiet.

        Heed  Hid  Head  Had  Hod  Hawed  Hood  Who'd  Hud  Heard  Hayed  Hoed
Heed      93    8     -    -    -      -     -      -    -      -      2     -
Hid       19   65    13    -    -      -     1      1    -      -      1     -
Head       -    4    79   13    1      -     1      1    -      1      -     -
Had        -    1    45   52    -      -     -      -    -      2      -     -
Hod        -    -     2   30   25     18     -      -   18      -      -     1
Hawed      -    -     -    -   33     34     1      -   30      -      -     -
Hood       2    7     5    -    -      1    75     10    -      -      -     -
Who'd      1    -     -    -    -      -    14     85    -      -      -     -
Hud        -    -    13    3   12      9    11      3   45      3      -     1
Heard      -    4     7    1    -      -    13     11    -     55      1     8
Hayed     21   11     5    -    -      -     -      1    -      1     61     -
Hoed       -    -     -    -    4      1     5     25    -      -      1    64

Table 22. Confusion matrix for teen English learners for eight bands in quiet.

        Heed  Hid  Head  Had  Hod  Hawed  Hood  Who'd  Hud  Heard  Hayed  Hoed
Heed      85   11     -    -    -      -     -      -    -      -      4     -
Hid       21   65     8    -    -      -     4      -    -      -      2     -
Head       -    2    72   17    3      4     -      -    -      1      1     -
Had        1    2    51   35    3      3     -      -    1      3      -     1
Hod        -    -     5   47   23     16     1      1    7      -      -     -
Hawed      -    -     1    4   30     35     1      -   24      1      -     4
Hood       1    2     4    -    -      -    67     16    8      1      -     -
Who'd      -    -     -    -    -      -    37     49    5      3      -     6
Hud        2    1     -    -   29     13    13      1   36      3      -     2
Heard      2    7    15    4    1      1    31      6    -     25      1     7
Hayed     27   20     3    -    -      1     1      1    -      -     47     -
Hoed       -    -     -    6   11     24     2      2    -      -      -    55

Table 23. Confusion matrix for adult English learners for eight bands in quiet.

        Heed  Hid  Head  Had  Hod  Hawed  Hood  Who'd  Hud  Heard  Hayed  Hoed
Heed      64   23     6    -    -      -     -      -    -      -      6     1
Hid       27   44     7    1    4      -     8      2    2      1      3     1
Head       1    6    23   16    4      4     1      3    2      6      -     -
Had        1    1    26   11   14      1     -      4    4      2      2     -
Hod        -    -     1   41   30     10     2      -    8      4      3     1
Hawed      -    -     1   23   26     30     2      1   13      -      1     3
Hood       -    2     1    -    5      1    68     13    3      -      1     6
Who'd      -    -     -    1    -      -    54     35    3      3      1     3
Hud        -    -     -   11   29     13    14      4   19      4      2     4
Heard      1    -    10    6    4      -    41     18    5      7      -     8
Hayed     43   19    12    1    -      -     4      1    1      2     17     -
Hoed       -    -     -    -    1      1    43     37    3      1      -    14

When spectral resolution is reduced to eight bands, we can see how native monolingual English listeners start having problems with certain vowel pairs. They confuse the vowel pairs 'head' and 'had', 'hud' and 'hood', 'hawed' and 'hod', 'hawed' and 'hud', and 'hoed' and 'who'd'. Toddler L2 learners have the same confusions, but they also have problems with the vowel pairs 'heed' and 'hayed', 'hid' and 'head', and 'hod' and 'had'. Child learners have problems with the same vowel pairs, and they also start having problems with 'hood' and 'who'd' and with 'heed' and 'hid'. The teen learner group has larger confusions in these two vowel pairs and also with 'hud' and 'hod'.
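Confusion patterns like the ones being walked through here can also be pulled out of a matrix programmatically, by scanning the off-diagonal cells for any stimulus-response pair above a threshold. The 3x3 matrix below is a hypothetical toy example, not one of the tables above.

    import numpy as np

    def confused_pairs(matrix, labels, threshold=10.0):
        """Return (stimulus, response, value) for large off-diagonal cells."""
        pairs = []
        for i, stim in enumerate(labels):
            for j, resp in enumerate(labels):
                if i != j and matrix[i, j] >= threshold:
                    pairs.append((stim, resp, matrix[i, j]))
        return pairs

    # Hypothetical toy confusion matrix (% responses per stimulus row).
    labels = ["hood", "who'd", "hud"]
    m = np.array([[57.0, 37.0,  6.0],
                  [24.0, 68.0,  8.0],
                  [14.0,  4.0, 82.0]])

    for stim, resp, pct in confused_pairs(m, labels):
        print(f"'{stim}' heard as '{resp}': {pct:.0f}%")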
The adult learner group presents confusions with the same set of vowel pairs, although their confusions are even larger. They also present confusions in the vowel pair 'hoed' and 'hood'. Nonnative English listeners have more and more problems differentiating vowel pairs as age of immersion in the L2 increases.

Problems that nonnative English listeners have with certain vowel pairs in quiet become even more critical under conditions of spectral degradation. Previous studies (Boothroyd et al., 1996; ter Keurs et al., 1992) have shown that consonants are less affected than vowels by spectral smearing: spectral information is more important for vowel recognition than for consonant recognition. In our study we also reduced the amount of spectral information in the signal, and even native English listeners were more affected in vowel performance in this case. Nonnative English listeners showed even larger effects in vowel performance; even under optimal conditions, when all the spectral information was available in the signal, their performance was lower than that of native English listeners. Although many of the English consonants tested do not exist in Spanish, nonnative listeners can perform very close to native English listeners even under conditions of great distortion. Nonnative English listeners are affected in the case of vowels not only by the spectral distortion but also by the vowel space. This confirms what we saw in the previous figures: vowel space changes with experience with the second language and becomes more separated (Fox et al., 1995) for more proficient bilinguals. Even for proficient bilinguals (early learners), the vowel space is not as separated as for native English listeners, and this produces lower performance of this group under difficult listening conditions. This is true even for nonnative listeners who learned English and Spanish at the same time and were constantly exposed to English (probably in a larger or equal percentage than Spanish) even at home.

7.2. Distortion 'D'

The results found for 'D' show that a certain type of 'distortion' is introduced when a second language is learned at a later age, or, for some types of stimuli, even at a very early age. From figure 18 we can see a clear relationship between 'D' and the age of immersion in the second language. This distortion, as was seen before, can also be related to spectral degradation, since it shows up as an increment in the value of 'D' similar to the effect for hearing-impaired patients. We think it is possible to define 'D' as a function of age of immersion in the second language, so we will be able to define:

(7.1)  D_band-condition = f(Age of Immersion)

This parameter also depends on the number of channels presented to the listener. It has been shown that nonnative English listeners perform as if they received a smaller amount of spectral information when compared to native English listeners. This information will be an important consideration when fitting patients with a cochlear implant. We fitted the values of 'D' as a linear function of the age of immersion in the second language, as shown in figure 35. For some frequency band conditions good fits were not found, mostly because a value of 'D' was not available for all groups tested.
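A minimal sketch of the per-band-condition linear fits behind table 24 and figure 35, assuming ordinary least squares; the D values below are hypothetical placeholders rather than the study's measurements.

    import numpy as np

    def fit_linear(x, y):
        """Ordinary least-squares fit y = a + b*x, returning (a, b, r2)."""
        b, a = np.polyfit(x, y, 1)
        resid = y - (a + b * x)
        r2 = 1.0 - np.sum(resid**2) / np.sum((y - y.mean())**2)
        return a, b, r2

    # Group means of age of immersion (years) and hypothetical 'D' values (dB)
    # for one stimulus type at one band condition.
    aoi = np.array([0.0, 2.0, 8.0, 15.0, 25.0])   # native ... adult
    D   = np.array([0.0, 1.2, 3.0, 4.9, 7.8])

    a, b, r2 = fit_linear(aoi, D)
    print(f"D = {a:.2f} + {b:.2f}*AOI  (r2 = {r2:.3f})")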
Table 24 shows the results obtained for the different band conditions and different stimuli. We can see from the fitted curves in figure 35 and from table 24 that the distortion 'D' is not a simple function of age of immersion in the second language. If the slopes were the same, then we could determine that the contributions from experience with the language and from spectral distortion were independent, and we could separate the two effects. In this case, we can see that the two factors interact with each other in speech perception by nonnative English listeners.

Figure 35. 'D' for different stimuli (vowels, consonants, words, sentences) as a function of age of immersion in L2 (years), for 4, 6, 8, and 16 bands and full speech.

Table 24. Parameter 'D' as a function of age of immersion in L2.
(Ai denotes age of immersion in the second language.)

Stimuli     D 4 Bands              D 6 Bands              D 8 Bands              D 16 Bands             D Full Speech
Vowel       -                      -                      10.83+1.58·Ai (r²=0.973)  6.90+1.00·Ai (r²=0.910)   2.93+0.35·Ai (r²=0.770)
Consonant   15.61+0.39·Ai (r²=0.831)  12.62+0.26·Ai (r²=0.984)  10.00+0.26·Ai (r²=0.989)  7.68+0.16·Ai (r²=0.890)  -0.06+0.08·Ai (r²=0.990)
Word        -                      22.34+1.80·Ai (r²=0.497)  15.21+0.37·Ai (r²=0.710)  9.33+0.49·Ai (r²=0.975)   0.41+0.26·Ai (r²=0.952)
Sentence    12.81+0.54·Ai (r²=0.946)  10.43+0.42·Ai (r²=0.956)   8.61+0.28·Ai (r²=0.953)  6.83+0.18·Ai (r²=0.806)   1.60+0.17·Ai (r²=0.726)

In figure 36 we fitted 'D' as a linear function of the logarithm of the number of bands. The coefficients of determination for these curves are better (closer to one), so the linear approximation describes well the relationship between distortion and number of bands.

Figure 36. Linear regression fitting of 'D' as a function of the number of bands (log scale), for vowels, consonants, words, and sentences and for each age-of-immersion group.

Table 25 shows the parameters found in the linear fitting of 'D' as a function of the logarithm of the number of bands in the speech signal. Table 26 shows the slopes of the different curves obtained for 'D' as a logarithmic function of the number of bands.

Table 25. Parameter 'D' as a function of the number of bands.
(Nb denotes the number of bands.)

Stimuli     D Native                    D Toddler                   D Child                     D Teen                      D Adult
Vowel       25.00-14.75·log(Nb) (r²=0.881)  26.59-11.54·log(Nb) (r²=0.999)  32.99-15.02·log(Nb) (r²=0.975)  51.65-24.96·log(Nb) (r²=0.848)  79.92-38.23·log(Nb) (r²=1.00)
Consonant   23.04-12.90·log(Nb) (r²=0.984)  23.99-13.33·log(Nb) (r²=0.966)  25.43-14.02·log(Nb) (r²=0.988)  27.37-14.54·log(Nb) (r²=0.960)  35.44-19.04·log(Nb) (r²=0.960)
Word        40.87-24.52·log(Nb) (r²=0.782)  46.04-25.82·log(Nb) (r²=0.810)  46.94-25.94·log(Nb) (r²=0.870)  35.89-17.64·log(Nb) (r²=0.993)  51.24-24.66·log(Nb) (r²=1)
Sentence    19.16-10.69·log(Nb) (r²=0.965)  18.67-9.03·log(Nb) (r²=0.948)   19.36-9.18·log(Nb) (r²=0.952)   26.14-12.45·log(Nb) (r²=0.973)  33.53-16.75·log(Nb) (r²=0.865)

Table 26. Slope of 'D' (dB per doubling of the number of channels).
Stimuli     Native   Toddler   Child   Teen   Adult
            (dB/double)
Vowel         4.44     3.47     4.52    7.51   11.51
Consonant     3.88     4.01     4.22    4.38    5.73
Word          7.38     7.77     7.81    5.31    7.42
Sentence      3.22     2.72     2.76    3.75    5.04

We observed that, except for the case of words, the values of the slopes for native monolingual English listeners lie between three and four. Preliminary studies showed an average slope of 4.4 dB/doubling; this is close to the value we found in the case of vowels, but the slope is a little different for other stimuli. If we consider the results for consonants, vowels, and sentences, the average slope for native monolingual English listeners is 3.85 dB/doubling. The average slope for consonants, vowels, and sentences is 3.40 dB/doubling for toddler L2 learners, 3.83 dB/doubling for child learners, 5.21 dB/doubling for teen learners, and 8.22 dB/doubling for adult English learners.

In the case of words, the linear approximations found are not as good as the ones found for other stimuli (except for the teen and adult learner groups), since we considered the values of 'D' for the lowest numbers of channels. If we eliminate these conditions, the curves are clearly a linear function of the logarithm of the number of bands, and the slope (the number of dB by which 'D' decreases when we double the number of bands of information) will be smaller and most likely closer to the values found for other stimuli. From table 26 it can also be seen that the slope generally increases with age of immersion in the second language (except in the case of words): nonnative English listeners are more affected, and the distortion increases. We can probably relate the values of 'D' found for words to the values for sentences; the distortion values for words are higher than for sentences (words in sentences), which shows the use of context.

PRT for vowels and consonants can be expressed as a linear function of Q, the performance in quiet, as was seen in figure 8. The linear regressions found for consonants and vowels were not very good for the nonnative listener groups, so it is not really a good choice to use these equations to obtain more parameters for calculating the SNR at 50%. We found the following set of equations:

Native listeners:
(7.2)  PRT_consonant = 17.18 - 0.16·Q  (r² = 0.99)
(7.3)  PRT_vowel = 7.50 - 0.13·Q  (r² = 0.92)

Toddler listeners:
(7.4)  PRT_consonant = 8.80 - 0.08·Q  (r² = 0.79)
(7.5)  PRT_vowel = 4.04 - 0.05·Q  (r² = 0.37)

Child listeners:
(7.6)  PRT_consonant = 22.6 - 0.22·Q  (r² = 0.70)
(7.7)  PRT_vowel = 7.13 - 0.08·Q  (r² = 0.42)

Teen listeners:
(7.8)  PRT_consonant = 27.40 - 0.25·Q  (r² = 0.52)
(7.9)  PRT_vowel = 14.51 - 0.21·Q  (r² = 0.81)

Adult listeners:
(7.10)  PRT_consonant = 10.31 - 0.7·Q  (r² = 0.24)
(7.11)  PRT_vowel = 1.85 - 8.28·Q  (r² = 0.01)

In some cases, then, with these equations and by measuring performance at fewer conditions (quiet and low-noise conditions), avoiding uncomfortable (very difficult) conditions for normal-hearing and hearing-impaired subjects, we can find the other parameters of the performance intensity functions (equation 3.1) and calculate the SNR at 50% performance. We can use these equations mostly in the case of native English listeners. Then we can find the value of the distortion 'D'.
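As an illustration, a sketch of this kind of prediction using the sentence fits from table 24, which have the form D = a + b·AOI; the listener's age of immersion below is a made-up example, and treating the fit as a point predictor (ignoring the confidence intervals on a and b) is a simplification.

    # Sentence-stimulus fits of D (dB) vs. age of immersion, from table 24:
    # D = a + b * AOI, keyed by band condition.
    SENTENCE_FITS = {
        "4 bands":     (12.81, 0.54),
        "6 bands":     (10.43, 0.42),
        "8 bands":     (8.61, 0.28),
        "16 bands":    (6.83, 0.18),
        "full speech": (1.60, 0.17),
    }

    aoi = 20.0  # hypothetical listener, immersed at age 20

    for condition, (a, b) in SENTENCE_FITS.items():
        d = a + b * aoi
        print(f"{condition:>11}: predicted D = {d:5.2f} dB")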
Also, with the relationship found between 'D' and age of immersion, we will be able to predict the amount of distortion that could affect a particular nonnative listener and predict whether he or she would be able to perform at at least a 50% level. Figure 37 shows PRT as a function of the logarithm of the number of bands; curves for 6, 8, and 16 channels and full speech are shown. Table 27 shows the parameters obtained in the fitting. The approximations are very good except in the case of vowels for the adult English learner group.

Figure 37. PRT as a function of the logarithm of the number of bands, for consonants, vowels, words, and sentences and for each age-of-immersion group.

Table 27. PRT linear regressions. (Nb denotes the number of bands.)

Category   PRT Vowel                      PRT Consonant                   PRT Word                        PRT Sentence
Native     5.03-8.44·log(Nb) (r²=0.997)   12.43-9.78·log(Nb) (r²=0.969)   21.53-12.41·log(Nb) (r²=0.997)  11.13-9.72·log(Nb) (r²=0.980)
Toddler    8.77-7.57·log(Nb) (r²=0.986)   12.40-9.62·log(Nb) (r²=0.929)   25.09-13.71·log(Nb) (r²=0.963)  9.34-7.30·log(Nb) (r²=0.989)
Child      12.05-9.28·log(Nb) (r²=0.985)  14.42-10.82·log(Nb) (r²=0.996)  26.82-14.31·log(Nb) (r²=0.993)  10.02-7.48·log(Nb) (r²=0.980)
Teen       15.22-10.53·log(Nb) (r²=0.966) 17.25-12.13·log(Nb) (r²=0.979)  27.32-14.15·log(Nb) (r²=0.996)  17.73-11.41·log(Nb) (r²=0.985)
Adult      7.30-5.56·log(Nb) (r²=0.650)   16.17-11.36·log(Nb) (r²=0.965)  25.65-13.02·log(Nb) (r²=0.928)  14.82-9.21·log(Nb) (r²=0.977)

Table 28 shows the slopes of linear regression fits to the data for native monolingual listeners and nonnative English listeners. The slopes are comparable to the ones obtained in previous studies (Fu et al., 1998; Friesen et al., 2001) for consonants, vowels, and sentences. Only in the case of the adult nonnative category tested with vowels does the PRT change little with the number of bands.

Table 28. Slopes (dB/doubling) relating PRT and the log of the number of bands.

Stimuli     Native   Toddler   Child   Teen   Adult
            (dB/double)
Vowel        -2.54    -2.28    -2.79   -3.17   -1.67
Consonant    -2.94    -2.90    -3.26   -3.65   -3.42
Word         -3.74    -4.13    -4.31   -4.26   -3.92
Sentence     -2.93    -2.20    -2.25   -3.44   -2.77

PRT values found from the performance curves for each group tested are probably a better parameter to describe how performance drops as a function of the signal-to-noise ratio. The SRT value of 50% recognition defined by Plomp is barely reached in some of the band conditions, making the approximations found for these cases not as linear as the relationship we found in the case of PRT. This happens because PRT is defined as the SNR needed to reach a performance level midway between chance and performance in quiet. From table 28 we can see that the slopes of the different categories tested are very similar for the same type of stimuli. We refitted the linear regression for each type of stimuli while holding the slope fixed at the value found for native monolingual English listeners. The coefficient of determination (r²) for this fitting was around 0.9
in almost all cases, except for adult English learners tested with vowels, similar to the fits shown in table 27. PRT values were fitted with a line as shown in equation 7.12:

(7.12)  PRT = a0 + b0·log(Nb)

Table 29 shows the values found for the coefficient a0 for the different stimuli and categories. This coefficient is different for each task and for each group. The coefficient b0 is the slope of the curve, fixed at the value found for native English listeners.

Figure 38. PRT fitted curves (parallel lines with a common slope per stimulus) as a function of the logarithm of the number of bands, for vowels, consonants, words, and sentences and for each age-of-immersion group.

Figure 38 shows the fitted parallel curves obtained when we fit the PRT results with straight lines and all slopes are the same for each category of subjects tested, for each type of stimuli.

Table 29. Coefficient a0 for the linear regression.

Stimuli     a0 Toddler            a0 Child              a0 Teen               a0 Adult
Vowel       9.80±0.29 (r²=0.973)  11.06±0.32 (r²=0.977)  12.76±0.66 (r²=0.927)  10.67±1.14 (r²=0.478)
Consonant   12.59±0.61 (r²=0.997) 13.20±0.28 (r²=0.997)  14.49±0.57 (r²=0.997)  14.31±0.61 (r²=0.997)
Word        23.56±0.68 (r²=0.955) 24.59±0.52 (r²=0.975)  25.28±0.62 (r²=0.964)  24.94±0.84 (r²=0.926)
Sentence    12.17±0.58 (r²=0.880) 12.65±0.57 (r²=0.893)  15.75±0.50 (r²=0.964)  15.42±0.35 (r²=0.974)

Table 30 shows the values for the fitted PRT with the new slopes and the new coefficients a0.

Table 30. PRT linear regressions. (Nb denotes the number of bands.)

Category   PRT Vowel            PRT Consonant         PRT Word               PRT Sentence
Native     5.03-8.44·log(Nb)    12.43-9.78·log(Nb)    21.53-12.41·log(Nb)    11.13-9.72·log(Nb)
Toddler    9.80-8.44·log(Nb)    12.59-9.78·log(Nb)    23.56-12.41·log(Nb)    12.17-9.72·log(Nb)
Child      11.06-8.44·log(Nb)   13.20-9.78·log(Nb)    24.59-12.41·log(Nb)    12.65-9.72·log(Nb)
Teen       12.76-8.44·log(Nb)   14.49-9.78·log(Nb)    25.28-12.41·log(Nb)    15.75-9.72·log(Nb)
Adult      10.67-8.44·log(Nb)   14.31-9.78·log(Nb)    24.94-12.41·log(Nb)    15.42-9.72·log(Nb)

Using these new curves we can find a value of the distortion 'D' that is given by two independent factors: age of immersion in the second language, and the number of frequency bands in the speech signal, as shown in figure 39. The effect of learning a second language later in life on the perception of the L2 can then be modeled as a simple 'D' factor. The total distortion for a listener in a given category will be an additive result of these two distortions. The distortion given by the spectral resolution is a linear function of the logarithm of the number of bands.

Figure 39. Independent factors of distortion (nonnative and native curves vs. number of bands, log scale).

Figure 40 shows the distortion given by the age of immersion in the second language and the curves that were fitted.

Figure 40. Distortion as a function of age of immersion in the second language, for vowels, consonants, words, and sentences.
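A minimal sketch of the fixed-slope refit just described: with b0 pinned to the native-listener slope, the least-squares intercept a0 is simply the mean of PRT - b0·log10(Nb). The PRT values below are hypothetical placeholders, and plotting full speech at 64 bands is an assumption for illustration.

    import numpy as np

    def refit_intercept(nb, prt, b0):
        """Least-squares intercept a0 of PRT = a0 + b0*log10(Nb), b0 fixed."""
        x = np.log10(np.asarray(nb, dtype=float))
        y = np.asarray(prt, dtype=float)
        return np.mean(y - b0 * x)

    # Native-listener sentence slope from table 27 (note the negative sign).
    b0 = -9.72

    # Hypothetical PRT measurements (dB) for one nonnative group.
    nb  = [4, 6, 8, 16, 64]          # 64 stands in for full speech (assumption)
    prt = [6.5, 4.8, 3.5, 0.8, -5.2]

    a0 = refit_intercept(nb, prt, b0)
    print(f"PRT = {a0:.2f} + ({b0})*log10(Nb)")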
As can be seen, D_AOI increases rapidly for early ages of immersion and then asymptotes for later ages of immersion in the second language. The best fits for 'D' were found using a power function, as shown in equation 7.13. The worst fit was for the case of vowels, mainly due to the performance of the adult group. Table 31 shows the estimated parameters found for the distortion due to age of immersion.

(7.13)  D_AOI = a1·AOI^p

Table 31. Parameters for the distortion given the age of immersion.

Parameter   Vowel       Consonant   Word        Sentence
a1          3.30±1.20   0.29±0.22   1.51±0.43   0.75±0.49
p           0.25±0.14   0.62±0.26   0.30±0.11   0.58±0.23
r²          0.713       0.853       0.861       0.857

As can be seen from the two previous equations (7.12 and 7.13), the deficit in performance of nonnative English listeners compared with native monolingual listeners can be measured as a 'D' factor, which is independent of the distortion factor given by the reduced spectral resolution. These two equations, together with equation (5.1), completely describe the wide set of data obtained in this study, demonstrating that this simple model can account for the different factors that affect speech perception by nonnative English listeners, normal hearing and hearing impaired.

7.3. Context Effects

Contrary to what we initially expected, the results obtained for 'D' and 'k' for sentences cannot be related. While 'k' decreases with age of initial immersion in L2 as 'D' increases in the case of full speech, this does not hold across the different band conditions. This 'distortion' also affects the amount of context that is available to the listener, or that the listener can use in speech recognition, as shown by the values obtained for 'k'; but exactly how the use of context is affected is not clear.

A recent study by Takayanagi et al. (2002) looked at lexical and talker effects on word recognition among native and nonnative listeners, both normal-hearing and hearing-impaired. In that study, normal-hearing and hearing-impaired nonnative listeners required greater intensity in the stimuli to reach the same levels of intelligibility as native listeners. Lexical difficulty was assessed using lexically hard and easy words; a total of 150 words were used (75 easy and 75 hard words). All nonnative listeners in that study, from different background languages, arrived in the US before age 17. A difference in the ratings of familiarity with the words was found between native and nonnative listeners. Results also indicated a greater difference in performance between native and nonnative listeners for easy words than for hard words.

In our study we used a total of 500 words that are not separated into lexically hard and easy words as defined by the NAM model. We used a different type of word database, since the number of conditions used for testing was much larger (different band and noise conditions) than in the study mentioned above. We did not divide the list of words between hard and easy, since the database we used is already used clinically to test patients with hearing impairment or cochlear implants.
We collected a measure of word familiarity from the nonnative listeners tested, so we can see whether the average familiarity score for the words used for testing differs between different categories of nonnative listeners. The familiarity score should decrease with age of immersion in the second language. We defined a scale from one to five to grade familiarity with the words used in the test. The scale was defined as follows:

1- Unfamiliar word: You have never heard the word before and have no idea of its meaning.
2- Barely familiar word: You have heard the word before but you are not sure of its meaning.
3- Somewhat familiar word: You have heard the word before and are sure or almost sure of its meaning, although you probably don't use it yourself in normal conversation.
4- Familiar word: You hear the word used often and you even use it in conversations.
5- Very familiar word: You hear the word frequently and use it often too. You feel very comfortable using it.

Table 32 shows the average familiarity scores and standard deviations given by each group of nonnative English listeners. We can see from the results of the questionnaire that subjects filled out that familiarity with the words used for testing decreased as experience with English decreased. The familiarity scores for the toddler and child groups are very similar; also, if we look at figures 9 and 10, which show performance in word recognition, we can see again that their performance is very similar. The teen learner group also has a familiarity score close to that of the toddler and child groups, but their score is lower and the standard deviation is higher. This group has lower performance in word recognition than the two previous groups of nonnative English listeners, but their performance is better than that of the older learner group, which has a lower familiarity score too.

Table 32. Familiarity with words.

Nonnative Group   Average Familiarity Score   Standard Deviation
Toddler                    4.70                     0.58
Child                      4.73                     0.75
Teen                       4.62                     0.91
Adult                      4.17                     1.40

If we relate the familiarity scores to the performance in word recognition for each nonnative learner group, this factor definitely has an effect on the overall performance for words in isolation. Statistical significance of the effect of familiarity on word performance was found only for the adult learner group, whose performance is significantly different from that of the other groups tested. The words contained in the sentence database are also very familiar words, and not only monosyllabic words, which makes it easier even for adult English learners to perform better in word recognition in sentences.

One of the adult English learners tested showed a much better score in word familiarity than the other listeners in the same group. The average familiarity score for listener DF10 (from table 2) is 4.7, with a standard deviation of 1.00. This score is comparable to that of the early learner groups, although the standard deviation is larger. As can be seen in table 2, this adult English learner differs in characteristics from other listeners in the same group, which can contribute to this difference. Listeners DF10 and DM9 use English daily at least 90% of the time.
In spite of this, listener DM9 has a familiarity score similar to that of the other listeners in the same group (4.09), much lower than that of DF10. A difference between listener DF10 and the rest of the subjects in the group is the number of years of exposure to the second language. Statistical analysis did not show a significant difference in performance in word or sentence recognition within subjects in this group, not even for this subject. If we had a larger number of subjects in this group, some with the same characteristics as DF10, we might see a significant difference in performance within subjects in this category. We could then conclude that even if a second language is learned late in life, immersion for a large number of years allows continued acquisition of lexical knowledge of the second language. This knowledge of the language would allow the development of better central pattern recognition, even if performance remains at the same level as other adult English learners at the lower-level, more peripheral (phonemic) stage of pattern recognition.

CHAPTER 8
CONCLUSIONS

Contrary to what was initially expected, results from this study showed that the main difficulty in perception of a second language by nonnative English listeners is in the perception of vowels. The models applied to the full set of data suggest that the main factor affecting speech understanding by nonnative listeners is not lexical and grammatical knowledge of the language but mainly phonemic perception, specifically vowel perception. A simple model allows us to describe the whole set of data and make predictions for normal-hearing and hearing-impaired nonnative English listeners with a small number of parameters.

Previous studies that relate perception and production of a second language by nonnative English listeners (Flege, 1993 and Flege et al., 1995) have shown that perception of some vowel features is closer to native than production. In contrast, in our case, although we did not look at the production of vowels by our nonnative subjects, even subjects who did not have any perceived accent (the toddler group) showed a difference in perception when compared to native monolingual English listeners.

Results from this study also give us a better understanding of how speech pattern recognition develops over time and how plasticity in the brain plays a role in second language learning. Contrary to what was expected, listeners who learned English after puberty develop better speech recognition patterns compared to adult learners. At high levels of noise some grouping was observed under conditions of reduced spectral information, but performance for the different stimuli was better than that of adult English learners. Even after puberty there is some plasticity in the brain that allows further development of central pattern mechanisms. As said before, in our data, years of experience with the second language did not affect the results for people within the same category. Age of immersion in the second language is the primary variable that affects subject performance over all conditions, which helps us describe the performance and the exact disadvantages of L2 learners in speech recognition.

8.1.
Models Used

Contrary to what we expected, the values of 'j' and 'k' found using the Boothroyd and Nittrouer model (1988) show that nonnative listeners have relatively little difficulty with phonemic and linguistic integration. In the use of linguistic information (context), some effect of the number of spectral bands is seen for all the subjects tested. The effect of age of immersion in the second language is not clearly seen, although results suggest that early learners are able to use the linguistic information available when they are forced to do so (reduced spectral information), while late learners (especially the adult group) are not able to use the information available. A small effect of age of immersion in the second language is seen in the use of lexical information. The number of bands (spectral information) does not affect the use of lexical information or the value of the 'j' factor. Results from this study actually support instead the notion that the actual problem that nonnative English listeners encounter in second language perception is the perception of vowels.

This performance by nonnative English listeners is similar to the performance observed in previous studies for cochlear implant patients. We are not suggesting that nonnative English listeners suffer from a hearing loss or hearing impairment, but only that the same simple model can account for both cases. It is interesting to see that hearing impairment and the degree of experience of nonnative listeners with the second language affect both groups in their confusions in vowel space. The model used in this study (Plomp's model) was successfully applied to a very large set of data, and with a small number of parameters we can predict the behavior of nonnative listeners, normal hearing and hearing-impaired. This global model was able to describe the whole set of data found with different sets of stimuli and helps us understand which factors really contribute to speech perception by people with a second language. Using Plomp's model we can see how lack of experience with a second language affects their performance in English recognition. Even after learning a second language at different stages in their lives, nonnative English listeners can still develop well-trained pattern recognition for some types of stimuli. Lexical and grammatical knowledge of the second language are relatively easy problems for nonnative English listeners compared to the problems they encounter at the more sensory level with the perceptual confusion of vowels. This is a result that we did not expect when we started our study. This result probably points us to a different approach in second language teaching. Since the problems people have with a second language are mostly related to the vowel space (similar to cochlear implant patients), when teaching a second language more emphasis should be put on teaching L2 learners the phonemic-level structure of the language, that is, remapping the vowel space. More work should be done on this point when training a person in a second language, instead of focusing on linguistic integration. That way, nonnative English listeners may be able to develop better and more robust speech pattern recognition, improving their perception of the second language.

8.2.
Spectral Information

Results for nonnative listeners show that they need a larger number of spectral bands to perform at the same level as native English listeners. Although even adult learners reach 50% performance with four channels of frequency information in the case of sentences, if they are presented with isolated words (which can happen even in a normal conversation) they need at least eight channels of information to reach at least a 50% level of performance. Previous studies have shown that cochlear implant patients need more channels of information, compared to normal-hearing listeners tested with simulations of cochlear implant stimuli, to reach asymptotic performance. Previous studies with cochlear implant patients (Dorman and Loizou, 1998a; Fishman et al., 1997; Dorman et al., 1998b; Fu et al., 1998) showed that they required six to eight channels of information to reach asymptotic performance. As we said before, normal-hearing nonnative English listeners need at least eight channels to reach 50% recognition of words. Nonnative English listeners implanted with a cochlear implant would therefore be expected to need at least eight channels to be able to reach 50% performance, and probably more channels to reach asymptotic performance. Depending on the age at which they learned the second language, we would expect them to take more or less advantage of their implant. Nonnative listeners who learned English earlier in life may be able to make better use of an implant.

8.3. Critical Learning Periods

The different types of degraded stimuli presented allow us to determine how different processing mechanisms develop and how long they take to be properly trained. The results confirm the notion expressed by linguists that critical periods of learning actually differ depending on the task that the second language learner has to perform (Barinaga, 2000 and Newport, 2003). We can confirm that for the case of sentences, where context information is present in most cases, even at a late age a listener can still achieve relatively normal use of linguistic knowledge. This also happens at the phonemic level for consonants. In the case of vowels, we see that the critical period for learning is reached earlier, and even subjects considered fully bilingual are not able to recognize vowels as well as native English listeners. Contrary to what was expected, differences in the performance of the teen and adult learner groups were observed, although at high levels of noise their performance is similar. This is an important difference, since it has usually been thought that learning a second language after puberty is no different from learning it at an even
Even people who learned an L2 at this age, if implanted with a cochlear implant, may be able to take more advantage of their implant than an adult English learner. We have some pilot data for Chinese listeners at some of the same conditions and ages of immersion defined for the Spanish listeners. When we compared the broad patterns of recognition found for these subjects, their performance was similar to that of nonnative listeners whose first language is Spanish. Although Chinese listeners show different phonemic confusions than the Spanish listeners, the critical periods of learning for each type of stimulus were similar. We can conclude, then, that the main factor determining the overall performance of nonnative listeners is the age of immersion in the second language rather than the listener's first language (i.e., the effect is not language specific, at least not for Chinese and Spanish). In the study by Takayanagi et al. (2002), subjects from several first-language backgrounds were used, and similar patterns of performance due to lexical context or difficulty were observed for all of them.

Some of the nonnative listeners in the same category differ in the years they have been immersed in the second language and in their daily use of English. We also have, for each subject, the first language of both parents and the level of education. No significant differences were found among subjects within the same group (five subjects were tested in each category).

This study has given a better understanding of the factors that affect speech perception by nonnative English listeners and of how different types of stimuli are perceived. The results showed that a simple model can account for the performance of nonnative English listeners in second-language perception and can help us understand the problems they encounter. Further analysis might allow us to arrive at a simpler model that more accurately describes their behavior. The main findings of the present study are:

- Vowel space is the main factor that makes a difference in speech perception by nonnative English listeners. Although different difficulties in speech perception are associated with age of immersion in the second language, the biggest problem is vowel perception. The specific vowel confusions shown by nonnative English listeners are due to the distortion of the L2 vowel space relative to the vowel space of the L1.

- Age of immersion in the second language and spectral degradation are additive factors that affect speech perception.

- The same model that we applied to the performance of second-language learners can also be applied to hearing-impaired listeners. A simple model can explain a very broad set of conditions: second-language performance, hearing-impaired performance, spectral resolution, cochlear implant performance, and hearing in noise.

- Critical learning periods are different for different stimuli, but learning a second language before adulthood, even after puberty, allows nonnative listeners to develop better speech pattern recognition.
REFERENCES

Barinaga, M. (2000). “A critical issue for the brain,” Science 288, 2116-2119.

Best, C.T., McRoberts, G.W., and Sithole, N.M. (1988). “Examination of perceptual reorganization for nonnative speech contrasts: Zulu click discrimination by English-speaking adults and infants,” J. Exp. Psychol. Hum. Percept. Perform. 14, 345-360.

Best, C.T., and Strange, W. (1992). “Effects of phonological and phonetic factors on cross-language perception of approximants,” J. Phonetics 20, 305-330.

Boothroyd, A., Mulhearn, B., Gong, J., and Ostroff, J. (1996). “Effects of spectral smearing on phoneme and word recognition,” J. Acoust. Soc. Am. 100, 1807-1818.

Boothroyd, A., and Nittrouer, S. (1988). “Mathematical treatment of context effects in phoneme and word recognition,” J. Acoust. Soc. Am. 84(1), 101-114.

Bronkhorst, A.W., Bosman, A.J., and Smoorenburg, G.F. (1993). “A model for context effects in speech recognition,” J. Acoust. Soc. Am. 93(1), 499-509.

Dorman, M.F., and Loizou, P.C. (1998a). “The identification of consonants and vowels by cochlear implant patients using a 6-channel continuous interleaved sampling processor and by normally hearing subjects using simulations of processors with two to nine channels,” Ear and Hearing 19, 162-166.

Dorman, M.F., Loizou, P.C., Fitzke, J., and Tu, Z. (1998b). “The recognition of sentences in noise by normally hearing listeners using simulations of cochlear-implant signal processing,” J. Acoust. Soc. Am. 104, 3583-3585.

Dunn, D.S. (2001). “Statistics and data analysis for the behavioral sciences,” McGraw-Hill, 1st ed., New York, USA.

Edgington, E.S. (1995). “Randomization tests,” Marcel Dekker Inc., 3rd ed., New York, USA.

Fishman, K., Shannon, R.V., and Slattery, W.A. (1997). “Speech recognition as a function of the number of electrodes used in the SPEAK cochlear implant speech processor,” J. Speech Lang. Hear. Res. 40, 1201-1215.

Flege, J.E. (1991). “Age of learning affects the authenticity of voice-onset time (VOT) in stop consonants produced in a second language,” J. Acoust. Soc. Am. 89(1), 395-411.

Flege, J.E. (1993). “Production and perception of a novel, second language phonetic contrast,” J. Acoust. Soc. Am. 93(3), 1589-1608.

Flege, J.E., Munro, M., and MacKay, I.R.A. (1995). “Factors affecting strength of perceived foreign accent in a second language,” J. Acoust. Soc. Am. 97(5), Pt. 1, 3125-3134.

Florentine, M. (1985a). “Nonnative listeners' perception of American-English in noise,” Inter-Noise 85, Munich, pp. 1021-1024.

Florentine, M. (1985b). “Speech perception in noise by fluent, nonnative listeners,” Communication Research Lab, Northeastern University, Boston, MA, USA.

Fox, R.A., Flege, J.E., and Munro, M. (1995). “The perception of English and Spanish vowels by native English and Spanish listeners: A multidimensional scaling analysis,” J. Acoust. Soc. Am. 97(4), 2540-2551.

French, N.R., and Steinberg, J.C. (1947). “Factors governing the intelligibility of speech sounds,” J. Acoust. Soc. Am. 19, 90-119.

Friesen, L.M., Shannon, R.V., Baskent, D., and Wang, X. (2001). “Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants,” J. Acoust. Soc. Am. 110(2), 1150-1163.

Fu, Q.-J., Shannon, R.V., and Wang, X. (1998). “Effects of noise and spectral resolution on vowel and consonant recognition: Acoustic and electric hearing,” J. Acoust. Soc. Am. 104(6), 3586-3596.
Gelnett, D., Sumida, A., Nilsson, M., and Soli, S.D. (1995). “Development of the Hearing In Noise Test for Children (HINT-C),” presented at the American Academy of Audiology, Dallas, Texas.

Grant, K.W., and Seitz, P.F. (2000). “The recognition of isolated words and words in sentences: Individual variability in the use of sentence context,” J. Acoust. Soc. Am. 107(2), 1000-1011.

Guion, S.G., Flege, J.E., Akahane-Yamada, R., and Pruitt, J.C. (2000). “An investigation of current models of second language speech perception: The case of Japanese adults' perception of English consonants,” J. Acoust. Soc. Am. 107(5), Pt. 1, 2711-2724.

Hillenbrand, J., Getty, L.A., Clark, M.J., and Wheeler, K. (1995). “Acoustic characteristics of American English vowels,” J. Acoust. Soc. Am. 97(5), 3099-3110.

Kamm, C.A., Dirks, D.D., and Bell, T.S. (1985). “Speech recognition and the Articulation Index for normal and hearing-impaired listeners,” J. Acoust. Soc. Am. 77(1), 281-288.

Luce, P.A., and Pisoni, D.B. (1998). “Recognizing spoken words: The neighborhood activation model,” Ear and Hearing 19, 1-36.

Mayo, L.H., Florentine, M., and Buus, S. (1997). “Age of second-language acquisition and perception of speech in noise,” J. Speech Lang. Hear. Res. 40, 686-693.

Most, T., and Adi-Bensaid, L. (2001). “The influence of contextual information on the perception of speech by postlingually and prelingually hearing-impaired Hebrew-speaking adolescents and adults,” Ear and Hearing 22(3), 252-263.

Newport, E. (2003). “Critical thinking about critical periods: What should a critical period for language acquisition look like?” Language and Mind Conference III, Second Language Acquisition: Knowledge of a Second Language: Epistemological and Empirical Issues, University of Southern California, College of Letters, Arts and Sciences.

Pallier, C., Colome, A., and Sebastian-Galles, N. (2001). “The influence of native-language phonology on lexical access: Exemplar-based versus abstract lexical entries,” Psychological Science 12(6), 445-449.

Pavlovic, C.V., Studebaker, G.A., and Sherbecoe, R.L. (1986). “An articulation index based procedure for predicting the speech recognition performance of hearing-impaired individuals,” J. Acoust. Soc. Am. 80(1), 50-57.

Payton, K.L., and Braida, L.D. (1999). “A method to determine the speech transmission index from speech waveforms,” J. Acoust. Soc. Am. 106(6), 3637-3648.

Perkell, J., Numa, W., Vick, I., Lane, H., Balkany, T., and Gould, J. (2001). “Language-specific, hearing-related changes in vowel spaces: A preliminary study of English- and Spanish-speaking cochlear implant users,” Ear and Hearing 22(6), 461-470.

Plomp, R. (1978). “Auditory handicap of hearing impairment and the limited benefit of hearing aids,” J. Acoust. Soc. Am. 63(2), 533-549.

Plomp, R. (1986). “A signal-to-noise ratio model for the speech-reception threshold of the hearing impaired,” J. Speech Hear. Res. 29, 146-154.

Plomp, R., and Mimpen, A.M. (1979). “Speech-reception threshold for sentences as a function of age and noise level,” J. Acoust. Soc. Am. 66(5), 1333-1342.

Scholes, R. (1967). “Phoneme categorization of synthetic vocalic stimuli by speakers of Japanese, Spanish, Persian and American English,” Lang. Speech 10, 46-68.

Shannon, R.V., Zeng, F.-G., Kamath, V., Wygonski, J., and Ekelid, M. (1995).
“Speech recognition with primarily temporal cues,” Science 270, 303-304.

Shannon, R.V., Jensvold, A., Padilla, M., Robert, M., and Wang, X. (1999). “Consonant recordings for speech testing,” J. Acoust. Soc. Am. 106(6), L71-L74.

Takayanagi, S., Dirks, D.D., and Moshfegh, A. (2002). “Lexical and talker effects on word recognition among native and non-native listeners with normal and impaired hearing,” J. Speech Lang. Hear. Res. 45, 585-597.

Ter Keurs, M., Festen, J.M., and Plomp, R. (1992). “Effect of spectral envelope smearing on speech reception. I,” J. Acoust. Soc. Am. 91, 2872-2880.

Ter Keurs, M., Festen, J.M., and Plomp, R. (1993). “Effect of spectral envelope smearing on speech reception. II,” J. Acoust. Soc. Am. 93(3), 1547-1552.