CATEGORICAL PROSODY MODELS FOR SPOKEN LANGUAGE
APPLICATIONS
by
Sankaranarayanan Ananthakrishnan
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
August 2008
Copyright 2008 Sankaranarayanan Ananthakrishnan
Object Description
| Title | Categorical prosody models for spoken language applications |
| Author | Ananthakrishnan, Sankaranarayanan |
| Author email | ananthak@usc.edu |
| Degree | Doctor of Philosophy |
| Document type | Dissertation |
| Degree program | Electrical Engineering |
| School | Viterbi School of Engineering |
| Date defended/completed | 2008-05-01 |
| Date submitted | 2008 |
| Restricted until | Unrestricted |
| Date published | 2008-06-25 |
| Advisor (committee chair) | Narayanan, Shrikanth |
| Advisor (committee member) | Jenkins, B. Keith; Byrd, Dani |
| Abstract | Prosody refers to rhythm, intonation, and lexical stress in speech, and is expressed via a broad class of supra-segmental phenomena that occur at the syllable, word, utterance and discourse levels. It is an important constituent of spoken language and plays a vital role in language understanding. However, the lack of a model linking the acoustic correlates of prosody to linguistic elements has made it difficult to incorporate prosody in spoken language systems. In this dissertation, I employ categorical representations to develop a framework for integrating prosody within spoken language applications in a systematic fashion. In this framework, prosodic events (e.g. pitch accents and phrase boundaries) are represented by discrete symbols. However, manual annotation of categorical prosody labels is a laborious, time-intensive and expensive task. Therefore, the first part of this thesis focuses on developing automatic prosody labeling tools in both supervised and unsupervised learning environments. My work focuses on detecting the presence or absence of prosodic events such as pitch accents and phrase boundaries, as well as fine-grained classification based on the ToBI annotation standard. I then describe the use of categorical prosody models in the context of automatic speech recognition (ASR). The prosody-enriched ASR developed in this thesis provides a statistically significant reduction in word error rate (WER) over a baseline system that makes no use of prosody. The lattice-enrichment implementation has the added advantage of being able to decode the word sequences and underlying prosody labels simultaneously. Sparsity problems due to the lack of sufficient prosody-annotated data hinder the development of prosody models for the enriched ASR. In order to alleviate these coverage issues, I develop novel algorithms for confidence-based adaptation and smoothing of prosodic acoustic and language models using a large, unlabeled dataset. Confidence weights extracted from confusion networks constructed from the adaptation data are used to generate fractional n-gram counts for prosodic language model smoothing, and for adaptation of the prosodic acoustic model using a weighted variant of the Expectation-Maximization (EM) algorithm. The proposed adaptation techniques significantly improve the quality and coverage of the prosody models over seed models trained from the (small) human annotated dataset. |
| Keyword | categorical prosody models; prosody labeling; prosody enriched speech recognition; confidence-based unsupervised adaptation |
| Language | English |
| Part of collection | University of Southern California dissertations and theses |
| Publisher (of the original version) | University of Southern California |
| Place of publication (of the original version) | Los Angeles, California |
| Publisher (of the digital version) | University of Southern California. Libraries |
| Type | texts |
| Legacy record ID | usctheses-m1291 |
| Rights | Ananthakrishnan, Sankaranarayanan |
| Repository name | Libraries, University of Southern California |
| Repository address | Los Angeles, California |
| Repository email | http://www.usc.edu/isd/libraries/services/ask_a_librarian/email/ |
| Filename | etd-Ananthakrishnan-20080625 |
| Archival file | uscthesesreloadpub_Volume44/etd-Ananthakrishnan-20080625.pdf |

