ROBUST SPEAKER CLUSTERING UNDER VARIATION IN DATA CHARACTERISTICS by Kyu Jeong Han A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING) December 2009 Copyright 2009 Kyu Jeong Han
Object Description
Title | Robust speaker clustering under variation in data characteristics |
Author | Han, Kyu Jeong |
Author email | kyuhan@usc.edu; kyujeong.han@gmail.com |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Electrical Engineering |
School | Viterbi School of Engineering |
Date defended/completed | 2009-07-20 |
Date submitted | 2009 |
Restricted until | Unrestricted |
Date published | 2009-11-21 |
Advisor (committee chair) | Narayanan, Shrikanth S. |
Advisor (committee member) | Kuo, C.-C. Jay; Kang, Hong-Goo; Shahabi, Cyrus |
Abstract | Speaker clustering refers to the process of classifying a set of input speech data (or speech segments) by speaker identity in an unsupervised way, based on the similarity of speaker-specific characteristics between the data. The process identifies the speech segments that come from the same speaker without any prior speaker-specific information about the given input data. This speaker-oriented, unsupervised classification of speech data can be applied in various ways as a pre-processing step for speech/speaker recognition or multimedia data segmentation/classification. Thus, speaker clustering has recently attracted much attention in the research areas of speech recognition and multimedia data processing.
One major unsolved issue in speaker clustering research is unreliable clustering performance under variation in the input speech data. In this dissertation, we address this problem within the framework of agglomerative hierarchical speaker clustering (AHSC) from two perspectives: stopping point estimation and inter-cluster distance measurement. To improve the robustness of stopping point estimation for AHSC, we propose a new statistical measure called information change rate (ICR), which improves estimation of the optimal stopping point. The ICR-based stopping point estimation method is verified, both empirically and theoretically, to be more robust to variation in the input speech data than the conventional method based on the Bayesian information criterion (BIC). To improve the robustness of inter-cluster distance measurement for AHSC, we also propose selective AHSC and incremental Gaussian mixture cluster modeling. These two approaches are proven to make speaker clustering performance much more reliable under variation in the input speech data.
Based on these results, we extend our interest to implementing a speaker diarization system that is more robust to variation in the input audio data. (Speaker diarization refers to an automated process that annotates a given audio source in terms of "who spoke when".) Focusing on speaker diarization of meeting conversation speech, we propose two refinement schemes to further improve the reliability of speaker clustering performance within the diarization framework. One is the selection of representative speech segments and the other is interaction pattern modeling between meeting participants; both are experimentally verified to enhance the reliability of speaker clustering performance and hence improve overall diarization accuracy under variation in the input audio data. |
Keyword | incremental Gaussian mixtures; information change rate; speaker clustering; speaker diarization; speaker modeling |
Language | English |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m2753 |
Contributing entity | University of Southern California |
Rights | Han, Kyu Jeong |
Repository name | Libraries, University of Southern California |
Repository address | Los Angeles, California |
Repository email | cisadmin@lib.usc.edu |
Filename | etd-Han-3367 |
Archival file | uscthesesreloadpub_Volume56/etd-Han-3367.pdf |
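The abstract above refers to agglomerative hierarchical speaker clustering with a conventional BIC-based stopping criterion as the baseline. As a rough illustration only, the sketch below shows one common textbook form of that baseline: each cluster is modeled by a single full-covariance Gaussian, the pair with the lowest ΔBIC is merged, and merging stops once no pair has a negative ΔBIC. The function names `delta_bic` and `ahsc_bic`, the `penalty_weight` parameter, and the small covariance regularizer are assumptions made for this example. The sketch is not taken from the dissertation and does not implement the ICR measure, selective AHSC, or incremental Gaussian mixture modeling.

```python
import numpy as np


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC for modeling x and y as two Gaussians versus one.

    x, y: arrays of shape (frames, dims). Positive values favor keeping
    the two clusters separate; negative values favor merging them.
    """
    z = np.vstack([x, y])
    n, d = z.shape

    def logdet(a):
        # Maximum-likelihood covariance plus a small regularizer
        # (the regularizer is an assumption, added for numerical safety).
        cov = np.cov(a, rowvar=False, bias=True) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    # Extra free parameters when using two Gaussians instead of one.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    gain = 0.5 * (n * logdet(z) - len(x) * logdet(x) - len(y) * logdet(y))
    return gain - penalty


def ahsc_bic(segments, penalty_weight=1.0):
    """Greedy bottom-up clustering that stops when no pair has delta-BIC < 0.

    Returns a list of clusters, each a list of input segment indices.
    """
    clusters = [np.asarray(s, dtype=float) for s in segments]
    members = [[i] for i in range(len(clusters))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], penalty_weight)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:
            break  # stopping point: every remaining pair looks distinct
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        members[i] = members[i] + members[j]
        del clusters[j]
        del members[j]
    return members


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for feature frames from two hypothetical speakers.
    segs = [rng.normal(0.0, 1.0, (200, 2)) for _ in range(2)]
    segs += [rng.normal(10.0, 1.0, (200, 2)) for _ in range(2)]
    print(ahsc_bic(segs))
```

In this form the stopping decision depends directly on the penalty term, whose size is set by `penalty_weight` and the amount of data in the pair being compared.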