ACTIVE DATA ACQUISITION FOR BUILDING LANGUAGE MODELS FOR
SPEECH RECOGNITION
by
Abhinav Sethy
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
August 2007
Copyright 2007 Abhinav Sethy
Object Description
| Title | Active data acquisition for building language models for speech recognition |
| Author | Sethy, Abhinav |
| Author email | sethy@usc.edu |
| Degree | Doctor of Philosophy |
| Document type | Dissertation |
| Degree program | Electrical Engineering |
| School | Viterbi School of Engineering |
| Date defended/completed | 2007-05-07 |
| Date submitted | 2007 |
| Restricted until | Unrestricted |
| Date published | 2007-07-26 |
| Advisor (committee chair) | Narayanan, Shrikanth |
| Advisor (committee member) | Byrd, Dani; Jenkins, Keith; Ramabhadran, Bhuvana |
| Abstract | The ability to build task-specific language models rapidly and with minimal human effort is an important factor in the fast deployment of natural language processing applications, such as speech recognition, in different domains. Although in-domain data is hard to gather, we can use large, easily accessible sources of generic text, such as the Internet (WWW) or the GigaWord corpus, to build statistical task language models through appropriate data selection and filtering methods. We propose a query generation and data weighting strategy that iteratively acquires data from such sources using a set of adaptive models, greatly improving on the performance of models built from limited in-domain data. The proposed query generation mechanism uses relative entropy to extend measures such as TF-IDF to larger text contexts and to weighted utterances/data sets. Our method also models the properties of the data source by tracking the performance of queries in every iteration. The data obtained from these sources is weighted according to its fit to the topic/domain and merged into existing models iteratively. Fitness to the task is evaluated using a combination of features in a positive-only classification framework. By including features that measure speech recognizer confusability, we attempt to select data that helps build a better discriminative language model for speech recognition. In some speech recognition applications, such as spoken document retrieval and automated call centers, it is possible to acquire a large amount of raw speech data. The manual annotation effort required to convert this speech data into text is costly and time-consuming. We present ways to merge the data acquisition process with active learning and unsupervised adaptation methods, which can significantly reduce the annotation requirement by selecting a smaller subset of the raw speech data for annotation. |
| Keyword | speech recognition; language modeling |
| Language | English |
| Part of collection | University of Southern California dissertations and theses |
| Publisher (of the original version) | University of Southern California |
| Place of publication (of the original version) | Los Angeles, California |
| Publisher (of the digital version) | University of Southern California. Libraries |
| Type | texts |
| Legacy record ID | usctheses-m689 |
| Rights | Sethy, Abhinav |
| Repository name | Libraries, University of Southern California |
| Repository address | Los Angeles, California |
| Repository email | http://www.usc.edu/isd/libraries/services/ask_a_librarian/email/ |
| Filename | etd-Sethy-20070726 |
| Archival file | uscthesesreloadpub_Volume32/etd-Sethy-20070726.pdf |
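The abstract's relative-entropy criterion for selecting generic text can be illustrated with a minimal sketch. It assumes add-one-smoothed unigram models over a shared vocabulary and a greedy single pass that keeps a candidate sentence only if adding it lowers the divergence D(P_in || Q_selected) between the in-domain model and the model of the data selected so far. This is a generic illustration of relative-entropy-based filtering, not the dissertation's implementation; the function names and the smoothing choice are hypothetical.

```python
import math
from collections import Counter


def smoothed_unigram(counts, vocab, alpha=1.0):
    """Additive (Laplace-style) smoothed unigram distribution over a fixed vocabulary."""
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """D(p || q) in nats; q must be nonzero wherever p is nonzero."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def select_sentences(in_domain, candidates, alpha=1.0):
    """Greedily keep candidates whose addition lowers D(P_in || Q_selected).

    Hypothetical helper: unigram models only, one pass in candidate order.
    """
    in_counts = Counter(w for s in in_domain for w in s.split())
    # Shared vocabulary (in-domain words plus all candidate words), so that
    # off-topic words receive only smoothed probability mass under P_in.
    vocab = set(in_counts) | {w for s in candidates for w in s.split()}
    p = smoothed_unigram(in_counts, vocab, alpha)

    selected_counts = Counter()
    current = kl_divergence(p, smoothed_unigram(selected_counts, vocab, alpha))
    selected = []
    for sent in candidates:
        trial_counts = selected_counts + Counter(sent.split())
        trial = kl_divergence(p, smoothed_unigram(trial_counts, vocab, alpha))
        if trial < current:  # accept only if the selected-data model moves closer to P_in
            selected.append(sent)
            selected_counts, current = trial_counts, trial
    return selected


if __name__ == "__main__":
    in_domain = ["flight to boston", "book a flight to denver"]
    candidates = ["cheap flight to boston", "the stock market fell today", "book a flight"]
    print(select_sentences(in_domain, candidates))
    # ['cheap flight to boston', 'book a flight']
```

Recomputing the divergence from scratch for every candidate keeps the sketch short; a practical filter over a large corpus would update it incrementally and would use higher-order n-gram models.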

