VISUALIZING AND MODELING VOCAL PRODUCTION DYNAMICS by Erik Bresch A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING) May 2011 Copyright 2011 Erik Bresch
Object Description
Title | Visualizing and modeling vocal production dynamics |
Author | Bresch, Erik |
Author email | bresch@usc.edu; erik.bresch@gmail.com |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Electrical Engineering |
School | Viterbi School of Engineering |
Date defended/completed | 2010-11-05 |
Date submitted | 2011 |
Restricted until | Unrestricted |
Date published | 2011-04-06 |
Advisor (committee chair) | Narayanan, Shrikanth S. |
Advisor (committee member) | Nayak, Krishna S.; Goldstein, Louis M. |
Abstract | Understanding human speech production is of fundamental importance for basic and applied research in human communication, from speech science and linguistics to clinical and engineering development. While vocal tract posture and movement can be investigated using a host of techniques, the newly developed real-time (RT) magnetic resonance imaging (MRI) technology has a particular advantage: it produces complete views of the entire moving vocal tract, including the pharyngeal structures, in a non-invasive manner. RT-MRI promises a new means of visualizing and quantifying the spatio-temporal articulatory details of speech production, and it also allows for exploring novel data-intensive, machine-learning-based computational approaches to speech production modeling. The central goal of this thesis is to develop new technological capabilities and to use these novel tools for studying human vocal tract shaping during speech production. The research, which is inherently interdisciplinary, combines technological elements (to design engineering methods and systems to acquire and process novel speech production data), experimental elements (to design linguistically meaningful studies to gather useful insights), and computational elements (to explain the observed data and design predictive capabilities). The first chapter motivates the use of RT-MRI as an emerging technique for speech production research and outlines the biomedical image acquisition and image processing challenges, potentials, and opportunities arising with the use of RT-MRI. The second part describes novel hardware technology and signal processing algorithms that were developed to facilitate synchronous speech audio recordings during RT-MRI scans. Here, the main problem lies in the loud noise produced by the MRI acquisition process. The proposed solution incorporates digital synchronization hardware and an adaptive signal processing algorithm that allows the acquisition of speech audio of satisfactory quality for further analysis. This enables joint speech-image data acquisition, which in turn allows for joint modeling of articulatory-acoustic phenomena. The third chapter addresses the extraction of relevant geometrical features from the vast stream of MR images. In the case of the commonly used midsagittal view of the human vocal tract, the geometrical features of interest are the locations of the articulators, and hence the underlying image processing problem to be solved is that of edge detection. Further complications arise from the poor MR image quality, which is compromised by the inherent trade-off between spatial resolution, temporal resolution, and signal-to-noise ratio. A solution to the edge detection problem is devised using a deformable geometrical model of the human vocal tract. Mathematically, the proposed procedure relies on designing alternate gradient vector flows for the solution of a non-linear least squares optimization problem. With the new method the human vocal tract outline can be traced automatically. Chapters 4 and 5 describe two vocal production studies using articulatory vocal tract data. The first study investigates five soprano singers' static vocal tract shaping during the sung production of vowel sounds, and it considers the much-researched theory of resonance tuning. The study successfully validates the usefulness of RT-MRI data and the data processing methods of Chapters 2 and 3. The second study focuses on the tongue shaping of English sibilant fricative sounds and reproduces previously known findings with the new RT-MRI modality. The last part of this thesis proposes a statistical framework for the modeling of articulatory speech data. Here, the main focus lies on the Coupled Hidden Markov Model as a candidate system to capture the dynamics of the multi-dimensional vocal tract shaping process. It is demonstrated that with this methodology it is possible to capture, in a data-driven way, the well-known timing signatures of the velum-oral coordination of English nasal sounds in word onset and coda positions. The thesis concludes with a brief summary of the contributions and a discussion of possible future research directions. |
Keyword | speech; speech production; MRI; magnetic resonance imaging; vocal tract |
Language | English |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m3721 |
Contributing entity | University of Southern California |
Rights | Bresch, Erik |
Repository name | Libraries, University of Southern California |
Repository address | Los Angeles, California |
Repository email | cisadmin@lib.usc.edu |
Filename | etd-Bresch-4237 |
Archival file | uscthesesreloadpub_Volume17/etd-Bresch-4237.pdf |