RECOGNITION AND CHARACTERIZATION OF
UNSTRUCTURED ENVIRONMENTAL SOUNDS
by
Selina Chu
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
May 2011
Copyright 2011 Selina Chu
Object Description
| Title | Recognition and characterization of unstructured environmental sounds |
| Author | Chu, Selina |
| Author email | selinach@usc.edu; selina.chu@gmail.com |
| Degree | Doctor of Philosophy |
| Document type | Dissertation |
| Degree program | Computer Science |
| School | Viterbi School of Engineering |
| Date defended/completed | 2010-12-01 |
| Date submitted | 2011 |
| Restricted until | Unrestricted |
| Date published | 2011-05-11 |
| Advisor (committee chair) | Narayanan, Shrikanth; Kuo, C.-C. Jay |
| Advisor (committee member) | Shahabi, Cyrus; Jenkins, Keith |
| Abstract | Environmental sounds are what we hear every day, or more generally the sounds that surround us: ambient or background audio. Humans use both vision and hearing to respond to their surroundings, a capability still quite limited in machine processing. The first step toward achieving multi-modality is the ability to process unstructured audio and recognize audio scenes (or environments). The goal of my thesis is the characterization of unstructured environmental sounds for understanding and predicting the context surrounding an agent or device, investigating the development of appropriate feature extraction algorithms and learning techniques for modeling the variations of the environment. Such an ability would have applications in content analysis and mining of multimedia data, or in improving the robustness of context-aware applications through multi-modality, such as in assistive robotics, surveillance, or mobile device-based services.; The goal of this thesis is the characterization of unstructured environmental sounds for understanding and predicting the context surrounding an agent or device. Most research on audio recognition has focused primarily on speech and music. Less attention has been paid to the challenges and opportunities in characterizing unstructured audio. My research focuses on investigating challenging issues in characterizing unstructured environmental audio and on developing novel algorithms for modeling the variations of the environment.; The first step in building a recognition system for unstructured auditory environments was to investigate techniques and audio features for working with such audio data. We begin by performing a study that explores suitable features and the feasibility of designing an automatic environment recognition system using audio information. 
In this initial investigation, I found that traditional recognition and feature extraction methods for audio were not suitable for environmental sound, which lacks the kind of structure found in speech and music (formantic and harmonic structures), thus dispelling the notion that traditional speech and music recognition techniques can simply be applied to realistic environmental sound.; Natural unstructured environmental sounds contain a large variety of sounds, which are in fact noise-like and thus are not effectively modeled by Mel-frequency cepstral coefficients (MFCCs) or other commonly used audio features, e.g., energy and zero-crossing. To achieve a more effective representation, I proposed a specialized feature extraction method for environmental sounds that uses the matching pursuit (MP) algorithm to learn the inherent structure of each type of sound, which we call MP-features. MP-features have been shown to classify sounds where frequency-domain features (e.g., MFCCs) fail, and can be advantageous when combined with MFCCs to improve overall performance.; The third component leads to our investigation of modeling and detecting the background audio. One of the goals of this research is to characterize an environment. Since many events would blend into the background, I wanted to look for a way to achieve a general model for any particular environment. Once we have an idea of the background, we can identify foreground events even if we haven’t seen these events before. Therefore, the next section proposes a framework for robust audio background modeling, which includes prediction, data knowledge, and persistent characteristics of the environment. This approach can model the background and detect foreground events, and can verify whether the predicted background is indeed the background or a foreground event that persists for a longer period of time. 
I also investigated the use of a semi-supervised learning technique to exploit unlabeled audio data.; The final component of my thesis involves investigating the use of deep learning as a way to obtain a generative model-based method for classification and to learn features within each type of sound in an unsupervised manner. Environmental sound is inherently noisy and contains relatively large amounts of overlapping events between different environments. Environmental sounds contain large variances even within a single environment type, and frequently there are no clear boundaries between some types. Traditional classification methods are generally not robust enough to handle overlapping classes. This audio, hence, requires representation by complex models. Using a deep learning architecture provides a way to obtain a generative model-based method for classification. Specifically, I considered the use of Deep Belief Networks (DBNs) to model environmental audio and investigated their applicability to noisy data to improve robustness and generalization. A framework was proposed using composite-DBNs to discover high-level representations by unsupervised learning of features characterizing the different acoustic environments and providing a hierarchical structure of sound types in a data-driven fashion. Experimental results on real data sets demonstrate its effectiveness over traditional methods, with over 90% recognition accuracy for a large number of environmental sound types. |
| Keyword | unstructured audio classification; auditory scene recognition; environmental sounds; background modeling; semi-supervised learning; deep belief networks; unsupervised feature learning; generalization; data representation; feature extraction; feature selection; MFCC; matching pursuit |
| Language | English |
| Part of collection | University of Southern California dissertations and theses |
| Publisher (of the original version) | University of Southern California |
| Place of publication (of the original version) | Los Angeles, California |
| Publisher (of the digital version) | University of Southern California. Libraries |
| Provenance | Electronically uploaded by the author |
| Type | texts |
| Legacy record ID | usctheses-m3939 |
| Rights | Chu, Selina |
| Repository name | Libraries, University of Southern California |
| Repository address | Los Angeles, California |
| Repository email | http://www.usc.edu/isd/libraries/services/ask_a_librarian/email/ |
| Filename | etd-Chu-4483 |
| Archival file | uscthesesreloadpub_Volume48/etd-Chu-4483.pdf |