BIOLOGICALLY INSPIRED AUDITORY ATTENTION
MODELS WITH APPLICATIONS IN SPEECH AND
AUDIO PROCESSING
by
Ozlem Kalinli
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
December 2009
Copyright 2009 Ozlem Kalinli
Object Description
| Title | Biologically inspired auditory attention models with applications in speech and audio processing |
| Author | Kalinli, Ozlem |
| Author email | kalinli@usc.edu; okalinli@gmail.com |
| Degree | Doctor of Philosophy |
| Document type | Dissertation |
| Degree program | Electrical Engineering |
| School | Viterbi School of Engineering |
| Date defended/completed | 2009-10-22 |
| Date submitted | 2009 |
| Restricted until | Unrestricted |
| Date published | 2009-11-13 |
| Advisor (committee chair) | Narayanan, Shrikanth S. |
| Advisor (committee member) | Kuo, C.-C. Jay; Mel, Bartlett W. |
| Abstract | Humans can precisely process and interpret complex scenes in real time despite the tremendous amount of stimuli impinging on the senses and the limited resources of the nervous system. One of the key enablers of this capability is a neural mechanism called "attention". The focus of this dissertation is to develop computational algorithms that emulate human auditory attention and to demonstrate their effectiveness in spoken language and audio processing applications. Attention allows primates to efficiently allocate their neural resources to the locations of interest in order to precisely interpret a scene or to search for a target. In a scene, some stimuli are inherently salient within the context, and they attract attention in a bottom-up manner. Saliency-driven attention is a rapid, bottom-up, task-independent process that detects the objects that perceptually pop out of a scene by differing significantly from their neighbors. The second form of attention is a top-down, task-dependent process that uses prior knowledge and learned past experience to focus attention on the target locations in a scene and so enhance processing. One of the primary contributions of this thesis is the development of a novel bottom-up auditory attention model. An auditory saliency map is proposed to model saliency-driven bottom-up auditory attention. The feature extraction structure of the attention model is inspired by the processing stages in the human auditory system. Experiments demonstrate that the bottom-up auditory attention model can successfully detect prominent syllables and words in speech. In addition, the bottom-up auditory attention model is used to detect salient acoustic events in complex acoustic scenes. Using only the selected salient events for acoustic scene classification is shown to perform better than conventional audio content processing algorithms, which process the whole signal and treat everything as equally important. The next contribution of this thesis is an analysis of the effect of task-dependent influences on auditory attention. For this purpose, a biologically plausible top-down model is proposed. The top-down attention model shares the same front end as the bottom-up auditory attention model and biases the features to mimic the influence of the task on neurons. In addition to the acoustic cues, the influence of higher-level task-dependent cues, such as lexical and syntactic information, is also incorporated into the model. The combined model achieves the highest performance on prominent syllable/word detection tasks, indicating the importance of a priori task information. Finally, an attention shift decoding method inspired by human speech recognition is proposed. In contrast to traditional automatic speech recognition systems, which decode speech fully and consecutively from left to right, the attention shift decoding method decodes speech non-consecutively using reliability criteria. To detect reliable regions of speech, a new set of features is proposed. Attention shift decoding improves automatic speech recognition performance. |
| Keyword | auditory attention; auditory saliency map; auditory gist; bottom-up auditory attention; top-down auditory attention; prominence detection; attention shift decoding; acoustic scene classification |
| Language | English |
| Part of collection | University of Southern California dissertations and theses |
| Publisher (of the original version) | University of Southern California |
| Place of publication (of the original version) | Los Angeles, California |
| Publisher (of the digital version) | University of Southern California. Libraries |
| Provenance | Electronically uploaded by the author |
| Type | texts |
| Legacy record ID | usctheses-m2739 |
| Rights | Kalinli, Ozlem |
| Repository name | Libraries, University of Southern California |
| Repository address | Los Angeles, California |
| Repository email | http://www.usc.edu/isd/libraries/services/ask_a_librarian/email/ |
| Filename | etd-Kalinli-3369 |
| Archival file | uscthesesreloadpub_Volume48/etd-Kalinli-3369.pdf |

