STATISTICAL INFERENCE FOR DYNAMICAL, INTERACTING MULTI-OBJECT SYSTEMS WITH EMPHASIS ON HUMAN SMALL GROUP INTERACTIONS by Viktor Rozgić A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING) August 2011 Copyright 2011 Viktor Rozgić
Object Description
Title | Statistical inference for dynamical, interacting multi-object systems with emphasis on human small group interactions |
Author | Rozgić, Viktor |
Author email | rozgic@usc.edu;rozgic@gmail.com |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Electrical Engineering |
School | Viterbi School of Engineering |
Date defended/completed | 2011-08-09 |
Date submitted | 2011-08-09 |
Date approved | 2011-08-09 |
Restricted until | 2012-08-09 |
Date published | 2011-08-09 |
Advisor (committee chair) | Narayanan, Shrikanth S. |
Advisor (committee member) | Georgiou, Panayiotis G.; Schaal, Stefan |
Abstract | In this dissertation we propose contributions that address problems in behavioral signal processing for small-group interactions from three perspectives. We propose algorithmic contributions to general statistical inference methods for interacting dynamical systems, in particular multi-object tracking problems. We propose multi-modal, multi-channel signal processing methods to address particular aspects of small-group interaction, with emphasis on speaker segmentation and speaker/participant tracking. Finally, we present a recording environment and a collected dyadic interaction database, and propose methods for estimating approach-avoidance behavior labels from non-verbal interaction cues. ❧ In the first part of this dissertation we present a class of sequential block sampling algorithms for tracking an unknown and variable number of objects. The proposed algorithms are applicable to multi-object tracking scenarios in which the only available observations are detector outputs, and also to scenarios that combine detector outputs with more complex observations that figure in data-association-free likelihood models. The proposed algorithms provide a way to construct block proposal distributions using detection-based observations. Key parts of the proposed algorithms are methods for sampling from block proposal distributions. We propose two novel methods for this purpose: one is based on a variational approximation scheme and the other is an adaptive MCMC sampling scheme. Samples from the block proposal distributions are then used in the sequential MCMC (or SMC) framework. We tested the proposed schemes on two synthetic datasets. The results demonstrate the benefits of processing longer observation sequences in multi-object tracking problems, in a more efficient manner than the classical sequential sampling schemes. ❧ In the second part, we present a multi-target tracking algorithm for tracking multiple speakers with a microphone array. 
As the microphone array observations do not provide an easy way to design speaker location detectors, we propose a mixture particle filter for tracking multiple acoustic sources in the track-before-detect (TbD) framework. This method belongs to the same class of sequential signal processing algorithms (SMC or MCMC) as the block sampler proposed in the first part; the major difference is that the block sampler belongs to the detect-before-tracking class of algorithms. The sound source trajectories reconstructed by the mixture particle filter do not necessarily correspond to speech only. Therefore, we apply an adapted optimal change point algorithm to segment the obtained sound source trajectories into speech and non-speech segments. The algorithm is tested on a multi-participant meeting database as a separate module and as a part of a multi-modal system for automatic meeting monitoring. In both cases it provided significant improvements on the speaker detection and segmentation tasks. ❧ In the third part, we present a modality fusion algorithm that exploits complementary properties of video tracking, microphone array localization and speaker identification and solves the problem of speaker segmentation in the presence of overlapped speech. In this paper we address improvements to our multimodal system for tracking of meeting participants and speaker segmentation with a focus on the microphone array modality. We propose an algorithm that uses Directions-of-Arrival estimated for each microphone pair as observations and performs tracking of an unknown number of acoustically-active meeting participants and subsequent speaker segmentation. The proposed algorithm is unique from multiple perspectives. 
First, we suggest a hidden Markov model architecture that performs fusion of three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel likelihood model for the microphone array observations for dealing with overlapped speech. We propose a modification of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function that takes into account possible microphone occlusions. We employ the multi-object detect-before-tracking approach and use the local maxima of the modified SPR-GCC-PHAT functions as sound source detectors. Multiple detection locations are fused into the joint likelihood by joint probabilistic data association. This transforms the original speaker segmentation problem into the multi-object tracking framework, where it is solved using Bayesian filtering/smoothing methods. ❧ This concludes the exposition of the core signal processing algorithms closely related to multi-object tracking; the last part of the dissertation is dedicated to the analysis and automation of human behavior coding in small group interactions. ❧ We present a new multi-modal database for analysis of participant behaviors in dyadic interactions. This database contains multiple channels with close- and far-field audio, a high definition camera array and motion capture data. The presence of motion capture data allows precise analysis of low-level body language descriptors and their comparison with similar descriptors derived from video data. The data is manually labeled by multiple human annotators using psychology-informed guides. We analyzed the relation between approach-avoidance (A-A) behavior and various non-verbal body language and acoustic features, and the influence of the audio and video channels on experts' labeling decisions. We also analyzed the dependency of the statistical interaction descriptors and A-A labels on participants' roles. 
❧ At the end, we propose an ordinal regression (OR) algorithm and its extension applicable to time series for estimating the approach-avoidance (A-A) behavior quantifiers (labels) in human dyadic interactions. The proposed algorithm transforms the ordinal regression into multiple binary classification problems, solves them with independent score-outputting classifiers and fits the cumulative logit logistic regression model with proportional odds (CLLRMP) to the classifier score vectors. The time series extension treats labels as states of a hidden Markov model with likelihood based on the probabilistic CLLRMP output. We compare the performance of the proposed algorithm using weighted binary SVMs in the second step (SVM-OLR), its extension (HMM-SVM-OLR) and the baseline multi-class SVM. The HMM-SVM-OLR achieves the highest estimation accuracy. |
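As a rough illustration of the pairwise time-delay estimation that underlies the per-microphone-pair Directions-of-Arrival observations mentioned in the abstract, the following sketch shows plain GCC-PHAT for a single microphone pair. It is not the dissertation's modified SPR-GCC-PHAT function and does not model microphone occlusions; the function name `gcc_phat_delay` and its parameters are assumptions made for this example only.

```python
import numpy as np


def gcc_phat_delay(sig, ref, fs, max_delay=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` using GCC-PHAT.

    Hypothetical helper for illustration; a positive result means `sig` lags `ref`.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: keep only the phase.
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12

    # Back to the lag domain.
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if requested.
    max_shift = n // 2
    if max_delay is not None:
        max_shift = max(1, min(int(fs * max_delay), max_shift))

    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

Given a known microphone spacing and the speed of sound, a delay estimated this way can be converted to a direction-of-arrival angle for that pair; the peak picking here returns only the single strongest lag, whereas a multi-source detector would keep several local maxima.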
Keyword | statistical inference; multi-modal signal processing; behavioral signal processing |
Language | English |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m |
Contributing entity | University of Southern California |
Rights | Rozgić, Viktor |
Physical access | The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given. |
Repository name | University of Southern California Digital Library |
Repository address | USC Digital Library, University of Southern California, University Park Campus MC 7002, 106 University Village, Los Angeles, California 90089-7002, USA |
Repository email | cisadmin@lib.usc.edu |
Archival file | uscthesesreloadpub_Volume71/etd-RozgiVikto-267-0.pdf |
Description
Title | Page 1 |
Full text |
STATISTICAL INFERENCE FOR
DYNAMICAL, INTERACTING
MULTI-OBJECT SYSTEMS WITH
EMPHASIS ON HUMAN SMALL |