Page 176
Table 5.2: Variance for the non-parametric mutual information estimates obtained from 1000 independent realizations of the empirical process. These results are associated with the bias values reported in Table 5.2 and they follow the same organization.

n:     11         33         58         101        179        564        3164       5626
TSVQ:  1.473e-03  1.732e-03  7.941e-04  5.878e-04  2.153e-04  6.009e-05  4.245e-06  1.534e-06
GESS:  4.037e-03  2.509e-03  1.234e-03  9.908e-04  3.824e-04  6.009e-05  4.251e-06  1.992e-06
KERN:  2.428e-02  1.165e-02  6.461e-03  4.294e-03  2.258e-03  6.009e-05  9.379e-05  4.941e-05
PROD:  5.992e-02  2.431e-02  9.190e-03  4.825e-03  2.037e-03  6.009e-05  8.629e-05  6.193e-05

TSVQ:  1.406e-03  2.563e-03  1.418e-03  1.022e-03  5.302e-04  1.926e-04  3.049e-05  1.437e-05
GESS:  5.565e-03  3.369e-03  1.729e-03  1.384e-03  6.555e-04  1.926e-04  3.006e-05  1.632e-05
KERN:  2.565e-02  1.198e-02  6.465e-03  4.127e-03  2.461e-03  1.926e-04  1.145e-04  6.314e-05
PROD:  6.055e-02  2.276e-02  9.664e-03  5.346e-03  2.454e-03  1.926e-04  1.212e-04  7.653e-05

TSVQ:  1.563e-03  3.355e-03  2.268e-03  1.697e-03  9.138e-04  3.759e-04  6.863e-05  3.585e-05
GESS:  6.514e-03  4.386e-03  2.632e-03  2.079e-03  9.792e-04  3.759e-04  7.069e-05  3.894e-05
KERN:  2.820e-02  1.322e-02  7.290e-03  4.674e-03  2.497e-03  3.759e-04  1.494e-04  7.831e-05
PROD:  6.687e-02  2.631e-02  1.327e-02  6.010e-03  2.894e-03  3.759e-04  1.769e-04  1.146e-04

TSVQ:  1.007e-03  3.621e-03  2.464e-03  2.227e-03  1.333e-03  6.586e-04  1.386e-04  7.679e-05
GESS:  6.426e-03  4.900e-03  3.601e-03  3.257e-03  1.521e-03  6.586e-04  1.418e-04  8.362e-05
KERN:  3.132e-02  1.534e-02  1.042e-02  6.225e-03  3.412e-03  6.586e-04  2.345e-04  1.244e-04
PROD:  7.025e-02  2.808e-02  1.602e-02  8.991e-03  4.766e-03  6.586e-04  4.581e-04  3.277e-04

where universal strong consistency is guaranteed. In addition, we provided an empirical comparison of these constructions with respect to classical estimation techniques.
From the results presented here, it is evident that additional improvement can be obtained by choosing the design variable of this family of consistent estimates as a function of the data, so as to strike a good compromise between estimation- and approximation-error effects [22, 21]. The idea would be to shrink the gap with respect to the ideal oracle result, in which, within the domain of consistent design values, we choose the one with the best small-sample performance for the given joint distribution. Improvement can also be obtained from the inductive nature of data-driven tree-structured partitions, as explored in [18], and is theoretically motivated by results in the context of regression and classification trees [5, 51, 62]. This is an interesting direction for further research, not only to explore pruning algorithms that could improve the small-sample properties of
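The bias-variance methodology behind Table 5.2 — compute a histogram-based mutual information estimate on each of many independent realizations of the empirical process, then report the spread of those estimates — can be sketched for the simplest baseline in the comparison, a product (grid) histogram plug-in. The sketch below is illustrative only: the equal-width partition, bin count, sample sizes, and the bivariate-Gaussian sampler are assumptions for the example, not the dissertation's actual experimental setup.

```python
import math
import random

def product_histogram_mi(xs, ys, bins=8):
    # Plug-in MI estimate on a fixed equal-width product (grid) partition,
    # in the spirit of the PROD-style baseline (illustrative only).
    n = len(xs)
    lo_x, hi_x, lo_y, hi_y = min(xs), max(xs), min(ys), max(ys)

    def cell(v, lo, hi):
        # Map a sample into one of `bins` equal-width cells (clamp the max).
        return min(bins - 1, int((v - lo) / (hi - lo) * bins)) if hi > lo else 0

    joint, px, py = {}, [0] * bins, [0] * bins
    for x, y in zip(xs, ys):
        i, j = cell(x, lo_x, hi_x), cell(y, lo_y, hi_y)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1

    # I_hat = sum_ij p_ij * log(p_ij / (p_i * p_j)), over occupied cells only;
    # with p_ij = c/n this simplifies to log(c * n / (px[i] * py[j])).
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

def bias_variance(estimator, sampler, n, trials, true_value):
    # Empirical bias and variance over `trials` independent realizations,
    # mirroring how the table's variance figures are obtained.
    vals = [estimator(*sampler(n)) for _ in range(trials)]
    mean = sum(vals) / trials
    return mean - true_value, sum((v - mean) ** 2 for v in vals) / trials

if __name__ == "__main__":
    random.seed(0)
    rho = 0.8  # correlation of the bivariate Gaussian pair (an assumption)

    def sampler(n):
        xs = [random.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rho * x + math.sqrt(1 - rho * rho) * random.gauss(0.0, 1.0)
              for x in xs]
        return xs, ys

    true_mi = -0.5 * math.log(1 - rho * rho)  # closed form for Gaussians
    bias, var = bias_variance(product_histogram_mi, sampler,
                              n=500, trials=100, true_value=true_mi)
    print(f"bias={bias:.4f}  variance={var:.2e}")
```

Swapping `product_histogram_mi` for a data-driven partition scheme (statistically equivalent blocks, or a tree-structured quantizer) in the same `bias_variance` harness is exactly the kind of comparison the table summarizes.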
Object Description
Title | On optimal signal representation for statistical learning and pattern recognition |
Author | Silva, Jorge |
Author email | jorgesil@usc.edu; josilva@ing.uchile.cl |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Electrical Engineering |
School | Viterbi School of Engineering |
Date defended/completed | 2008-06-23 |
Date submitted | 2008 |
Restricted until | Unrestricted |
Date published | 2008-10-21 |
Advisor (committee chair) | Narayanan, Shrikanth S. |
Advisor (committee member) | Kuo, C.-C. Jay; Ordóñez, Fernando I. |
Abstract | This work presents contributions on two important aspects of the role of signal representation in statistical learning problems, in the context of deriving new methodologies and representations for speech recognition and the estimation of information-theoretic quantities.; The first topic focuses on the problem of optimal filter bank selection using Wavelet Packets (WPs) for speech recognition applications. We propose new results to show an estimation-approximation error tradeoff across a sequence of embedded representations. These results were used to formulate the minimum probability of error signal representation (MPE-SR) problem as a complexity regularization criterion. Restricting this criterion to filter bank selection, algorithmic solutions are provided by exploring the dyadic tree structure of WPs. These solutions are stipulated in terms of a set of conditional independence assumptions for the acoustic observation process, in particular a Markov tree property across the indexed structure of WPs. On the technical side, this work presents contributions on the extension of minimum cost tree pruning algorithms and their properties to affine tree functionals. For the experimental validation, a phone classification task confirms the suitability of Wavelet Packets as an analysis scheme for non-stationary time-series processes, and the effectiveness of the MPE-SR in providing cost-effective discriminative filter bank solutions for pattern recognition.; The second topic addresses the problem of data-dependent partitions for the estimation of mutual information and Kullback-Leibler divergence (KLD). This work proposes general histogram-based estimates considering non-product data-driven partition schemes. The main contribution is the stipulation of sufficient conditions that make these histogram-based constructions strongly consistent for both problems.
The sufficient conditions consider a combinatorial complexity indicator for partition families and the use of large-deviation-type inequalities (Vapnik-Chervonenkis inequalities). On the application side, two emblematic data-dependent constructions are derived from this result: one based on statistically equivalent blocks and the other on a tree-structured vector quantization scheme. A range of design values was stipulated to guarantee strongly consistent estimates for both frameworks. Furthermore, experimental results under controlled settings demonstrate the superiority of these data-driven techniques, in terms of a bias-variance analysis, when compared to conventional product histogram-based and kernel plug-in estimates. |
Keyword | signal representation in statistical learning; Bayes decision theory; basis selection; tree-structured bases and Wavelet packet (WP); complexity regularization; minimum cost tree pruning; family pruning problem; mutual information estimation; divergence estimation; data-dependent partitions; statistical learning theory; concentration inequalities; tree-structured vector quantization. |
Language | English |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m1684 |
Contributing entity | University of Southern California |
Rights | Silva, Jorge |
Repository name | Libraries, University of Southern California |
Repository address | Los Angeles, California |
Repository email | cisadmin@lib.usc.edu |
Filename | etd-Silva-2450 |
Archival file | uscthesesreloadpub_Volume32/etd-Silva-2450.pdf |