Learning the Geometric Structure of High Dimensional Data using the Tensor Voting Graph

by

Shay Deutsch

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

August 2016

Copyright 2016 Shay Deutsch

Table of Contents

List of Tables
List of Figures
List of Algorithms
Abstract
1 Introduction
  1.1 Limitations of Existing Methods
  1.2 Overview
  1.3 Main Contributions
    1.3.1 The Tensor Voting Graph
    1.3.2 A general framework for the intersecting Multi-Manifolds problem
    1.3.3 MFD: A new Manifold Frequency Denoising Framework based on Analyzing Manifold Frequencies
    1.3.4 Theoretical Foundation and Analysis for Manifold Denoising Based on Graph Signal Processing Tools
    1.3.5 A Unified Approach for Learning Noisy Manifolds with Singularities
  1.4 Outline
2 Related Work
  2.1 Classic Methods
  2.2 The single manifold case
  2.3 The multi-manifold case
  2.4 Manifold Denoising
3 Tensor Voting
  3.1 Introduction
  3.2 Review of the Tensor Voting Framework
4 The Tensor Voting Graph
  4.1 Introduction
  4.2 Related Work
    4.2.1 Manifold Learning
  4.3 The Tensor Voting Graph Construction
    4.3.1 Relationship with Previous works
  4.4 Experimental Results
    4.4.1 Geodesic distance comparison on inliers
    4.4.2 Comparison with outliers
    4.4.3 Computing geodesic distances on intersecting manifolds
  4.5 Clustering multiple manifolds using TVG
    4.5.1 Clustering Synthetic data
    4.5.2 Experiment with a large amount of outliers
    4.5.3 Experiments with real data: application to Motion Segmentation
    4.5.4 Clustering with corrupted trajectories
  4.6 Conclusion
5 Intersecting Manifolds: Detection, segmentation, and labeling
  5.1 Related work
  5.2 Our Approach
    5.2.1 Intersection Delineation
    5.2.2 Global representation of the smooth manifold parts
    5.2.3 Ambiguity Resolution
  5.3 Experimental Results with Ambiguity Resolution Algorithm
    5.3.1 Clustering and Junction type inference in 2D
    5.3.2 Experimental Results without outliers
    5.3.3 Experiments with outliers
    5.3.4 Experiments with manifolds embedded in high dimensional spaces
    5.3.5 Experiments with Real Data sets
  5.4 Discussion
6 Manifold Frequency Denoising
  6.1 Introduction
  6.2 Related Work
  6.3 Preliminaries
    6.3.1 Model Assumptions and Previous Results in Manifold Learning
  6.4 Spectral Graph Wavelets
  6.5 Theoretical Results
    6.5.1 Overview
    6.5.2 Noiseless Observations of a Smooth Manifold (f over G)
    6.5.3 Noisy graph signals f̃ over G
    6.5.4 Noisy graph signals f̃ over G̃
  6.6 Proposed Approach
  6.7 Experimental Results
    6.7.1 Experimental Results on local tangent space estimation
    6.7.2 Experimental Results with real datasets
  6.8 Conclusions and Future work
7 Regularizing Manifolds with Singularities
  7.1 Introduction
  7.2 Related Work
  7.3 Proposed Approach
  7.4 Theoretical Motivation
  7.5 Experimental results
    7.5.1 Experiments on manifolds with singularities with inlier noise
    7.5.2 Experiments with both outlier and inlier noise
    7.5.3 Experiments with a real data set: the cyclo-octane data
  7.6 Discussion
8 Discussion and Future Work
  8.1 Contributions
  8.2 Future Work
Appendix
Reference List

List of Tables

1.1 Evaluation of manifold learning approaches in terms of their strengths and limitations. In the bottom row, "ours" refers to the framework suggested in this thesis
4.1 Comparison with state of the art on geodesic distances: I corresponds to experiments with inlier noise, and I&O corresponds to data contaminated with outliers. The results are reported in terms of err_GD percent.
4.2 Geodesic distance comparison on intersecting manifolds
4.3 Clustering accuracy of clustering multiple manifolds
4.4 Clustering accuracy (%) of different methods on the Hopkins 155 motion segmentation database. The best are boldfaced
5.1 Classification results of junctions in 2D
5.2 Comparison with state of the art
5.3 Comparison results in the presence of outliers
5.4 Manifolds in high dimensional space
5.5 Tangent space average angular error results for the two intersecting planes and intersecting circles data (Figure 5.8)
5.6 Classification results of human activities on Motion Capture data
6.1 Local tangent space estimation error on a sphere using Tensor Voting and local PCA, before and after denoising using the MFD method
6.2 RMSE average error reconstruction results on motion capture data and Frey face datasets
7.1 Average error of local tangent space estimation using Tensor Voting before and after denoising using our method.
7.2 Clustering results of two intersecting spheres before and after denoising. In both cases we used our TVG to construct the graph.

List of Figures

3.1 Vote generation for a stick and a ball voter, and a generic vote generation. The votes are functions of the position of the voter A and receiver B and the tensor of the voter.
4.1 Overview of the suggested Tensor Voting Graph construction method
4.2 Illustration of the TVG: points b and e that are close in Euclidean distance but far in manifold distance have zero affinity, whereas the points b and c are connected with a high affinity corresponding to a small distance on the manifold.
4.3 In the example shown in Figure (a), 6 points are sampled from two distinct manifolds (points with the same color belong to the same manifold), such that for each point, among its 3 nearest Euclidean neighbors, at least one nearest neighbor belongs to a different manifold. Figure (b) plots the ground truth affinity matrix (red entries: high affinity; blue entries: low affinity). Figures (c) and (d) show the affinity matrices constructed by covariance matrix distance and by our suggested TVG, respectively.
4.4 Data sets used for estimating the geodesic distances. From left to right: Viviani's curve, Torus, Enneper's and Helicoid surfaces
4.5 Data sets of the multiple manifolds used for the clustering experiments
4.6 Examples of motion sequence frames of indoor and outdoor scenes containing two or three motions from the Hopkins 155 motion database
4.7 Clustering accuracy comparison for the cars motion sequence with corrupted trajectories
5.1 Flow chart of the proposed method
5.2 The graph of the ball eigenvalue λ_N (shown on the right hand side), as a function of the positions of the points, which correspond to two intersecting circles (shown on the left hand side)
5.3 Flow chart of the proposed Ambiguity Resolution Algorithm
5.4 Junctions dataset used in 2D
5.5 Junction type inference illustration
5.6 Manifolds dataset used in Table 5.2
5.7 Manifolds with outliers used in the experiments: (a) Intersecting Möbius bands (b) Intersecting Swiss Roll with a plane (c) Intersecting Spheres (d) Intersecting planes
5.8 Evaluation on a challenging dataset of manifolds with a small maximal principal angle reveals the degradation in performance of both linear and non-linear multi-manifold clustering methods
6.1 Illustration of our method
6.2 Scaling function (blue curve) and wavelet generating kernels g(s) for different choices of scales s
6.3 Example of a smooth manifold. For the same manifold, (a) shows the case where the normals do not extend beyond 1/τ, while (b) shows the case where the normals intersect and extend beyond 1/τ. (c) is an example of a manifold with condition number 1/τ.
6.4 Spectral Graph Wavelet coefficients corresponding to different frequency bands s of a circle. Figures (a)-(e) plot the SGW bands for an increasing number of frequency bands s of a noise-free circle, while Figures (f)-(g) plot the SGW bands for a different choice of s for a noisy circle
6.5 Top row: plot of the energy of a noiseless Swiss roll with a hole (a) in the graph Fourier transform (GFT) (b) and in the spectral wavelet domain (c). Bottom row: plot of the energy of noiseless Mocap data ([35]) (d) in the graph Fourier transform (GFT) (e) and in the spectral wavelet domain (f)
6.6 Experimental results on a circle: (a) Noisy circle (noise shown in red) (b) Results with MD (c) Results with LLD (d) Results with MFD (e) Ground truth (f) Results with MD (g) Results with LLD (h) Results with MFD
6.7 Experimental results on a sine function embedded in high dimension: comparison with different numbers of k nearest neighbors (a) Results with MD (b) Results with LLD (c) Results with MFD (d) Results with MD (e) Results with LLD (f) Results with MFD
6.8 Experimental results on helix and fish bowl manifolds: (a) Noisy fish bowl (noise shown in red) (b) Results with MD (c) Results with LLD (d) Results with MFD (e) Noisy helix (noise shown in red) (f) Results with MD (g) Results with LLD (h) Results with MFD
6.9 Zoom into the circle and helix manifolds. (a) Noisy circle (noise shown in red, ground truth in blue) (b) Results with MD (c) Results with LLD (d) Results with MFD (e) Noisy helix (noise shown in red, ground truth in blue) (f) Results with MD (g) Results with LLD (h) Results with MFD
6.10 Experimental evaluation of the RMSE reconstruction error of the noisy manifolds Swiss roll with a hole, circle, sine embedded in dimension D = 200, and a helix, using different selections of k nearest neighbors
7.1 Experimental results on two noisy intersecting circles (a) Noisy circles (b) Results with our new denoising method for singularities (c) Zoom into the local intersection area of the noisy circle (d) Zoom into the local intersection area of the denoised output after using our method
7.2 Experimental results on a noisy self-intersecting figure 8: (a) Noiseless figure eight (b) Noisy figure eight (c) Zoom into the local singularity area: results obtained using MFD with a graph based on Euclidean weights (d) Zoom into the local singularity area: results obtained using our denoising method with a graph based on local tangent space estimation using the Tensor Voting Graph
7.3 Experimental results on noisy intersecting spheres (a) Noisy intersecting spheres (b) The denoised spheres using our method.
7.4 Experimental results on a noisy self-intersecting figure 8 which was also contaminated with outliers: (a) The figure eight with both inlier and outlier noise (b) Denoised output using our method. Both outlier and inlier noise are removed, and the local intersection area is preserved
7.5 Experimental results on noisy intersecting spheres with outliers: (a) Noisy intersecting spheres with 100% outliers (b) Denoised output obtained after removing inlier and outlier noise using our method.
7.6 Experimental results on a real data set of cyclo-octane manifold data (a) Noiseless cyclo-octane (b) Noisy cyclo-octane contaminated with both inlier and outlier noise (c) Denoised output after removing both inlier and outlier noise using our method.
8.1 The example above shows convergence of the local tangent space estimation by Tensor Voting on a sphere sampled with 1000 points. As the number of iterations increases, the tangent space estimation by the iterative Tensor Voting converges to the true tangent space

List of Algorithms

1 TVG (Tensor Voting Graph construction)
2 Ambiguity Resolution Algorithm
3 Non-iterative MFD Algorithm
4 Denoising manifolds with singularities using Tikhonov regularization
5 Denoising manifolds with singularities using a non-iterative approach

Dedication

Being a student, and working on a PhD in particular, can be described, as the metaphor of the song "Hotel California" goes, as a journey from innocence to experience. I would like to thank all those who were part of my journey through the years and made this thesis possible.

Acknowledgements

First, I would like to thank my advisor, Prof. Gérard Medioni, for his guidance, advice, and for giving me the opportunity to work on a PhD thesis with such interesting research topics. As I have learned from his work on Tensor Voting and other scientific contributions, Prof. Medioni's profound vision in the Computer Vision and AI fields is way ahead of our time. It has been a rare and unique opportunity to learn from someone who has such deep insights and vast knowledge in Computer Vision. I would like to extend another special thanks to Prof. Antonio Ortega, who served on my defense committee and with whom I also collaborated on part of this thesis.
I am grateful to Antonio for many hours of meetings and guidance, working on the intersection between Manifold Learning and the new and exciting field of Graph Signal Processing, of which he is also one of the principal founders. Antonio always brings new angles and perspectives that encourage precise and critical thinking, in signal processing, in science, or just in general. It was a stimulating experience getting a window into the high intellectual sphere that Antonio lives in. I would also like to thank Prof. Aiichiro Nakano, a member of my defense committee, for numerous discussions and for his encouragement. I am also thankful to Professors Shang-Hua Teng, C.-C. Jay Kuo, and Nora Ayanian for serving on my qualification exam committee.

It is also a pleasure to thank all of the IRIS members, friends, visitors and others whom I met, and who made my time at USC such an enriching and multi-cultural experience. In particular, Iacopo Masi, for his friendship, his expertise in MATLAB and LaTeX, and for making the best pasta, as well as Tomer Levinboim, Liron Cohen, Prof. Olga Bellon and Prof. Luciano Silva, Remi Trichet, Pramod Sharma, Prithviraj Banerjee, Yinghao Cai, Bo Zhang, Younghoon Lee, Prof. Tal Hassner, Jongmoo Choi, Dian Gong and Xuemei Zhao.

I would like to thank my parents and my brother Avi, who supported me through the years in so many ways. Especially when living so far away from my home country, it is always nice to remember my family, who provide unconditional support. Finally, I would like to thank my wife, Yaeli, who made some of the challenging times throughout this journey much easier with her love, understanding, and support.

Abstract

This study addresses a range of fundamental problems in unsupervised manifold learning. Given a set of noisy points in a high dimensional space that lie near one or more possibly intersecting smooth manifolds, the challenges include learning the local geometric structure at each point, geodesic distance estimation, and clustering. These challenges are ubiquitous in unsupervised manifold learning, and many applications in computer vision, as well as other scientific applications, would benefit from a principled approach to these problems.

In the first part of this thesis we present a hybrid local-global method that leverages the algorithmic capabilities of the Tensor Voting framework. However, unlike Tensor Voting, which can learn complex structures reliably only locally, our method is capable of reliably inferring the global structure of complex manifolds using a unique graph construction called the Tensor Voting Graph (TVG). This graph provides an efficient tool to perform the desired global manifold learning tasks, such as geodesic distance estimation and clustering on complex manifolds, thus overcoming one of the main limitations of Tensor Voting as a strictly local approach. Moreover, we propose to explicitly and directly resolve the ambiguities near the intersections with a novel algorithm, which uses the TVG and the positions of the points near the manifold intersections.

In the second part of this thesis we propose a new framework for manifold denoising based on processing in the graph Fourier frequency domain, derived from the spectral decomposition of the discrete graph Laplacian. The suggested approach, called MFD, uses the Spectral Graph Wavelet transform in order to perform non-iterative denoising directly in the graph frequency domain.
To the best of our knowledge, MFD is the first attempt to use graph signal processing [55] tools for manifold denoising on unstructured domains. We provide theoretical justification for our Manifold Frequency Denoising approach on unstructured graphs and demonstrate that for smooth manifolds the coordinate signals also exhibit smoothness. This is first demonstrated in the case of noiseless observations, by proving that manifolds with smoother characteristics create more energy in the lower frequencies. Moreover, it is shown that the higher frequency wavelet coefficients decay in a way that depends on the smoothness properties of the manifold, which is explicitly tied to its curvature properties. We then provide an analysis for the case of noisy points and a noisy graph, establishing results which tie the noisy graph Laplacian to the noiseless graph Laplacian characteristics that are induced by the smoothness properties of the manifold. The suggested MFD framework has attractive features, such as robustness to the selection of the k nearest neighbors parameter on the graph, and it is computationally efficient.

Finally, the last part of this research merges the Manifold Frequency Denoising and the Tensor Voting Graph methods into a unified framework, which allows us to effectively analyze a general class of noisy manifolds with singularities, also in the presence of outliers. We demonstrate that the limitation of the Spectral Graph Wavelets with regard to the types of graph signals they can analyze can be overcome for manifolds with singularities using certain graph construction and regularization methods. The suggested approach allows us to take into account global smoothness characteristics without over-smoothing at the manifold discontinuities, which correspond to the high frequency bands of the Spectral Graph Wavelets.

Chapter 1
Introduction

Recent and important emerging applications in a wide range of scientific domains require the analysis of high dimensional data. Analysis here generally refers to inferring meaningful structures in the data, which includes tasks such as clustering, estimating meaningful distances between the data points, outlier detection, and regression. Discovering meaningful structure in the data can be very challenging, since in practice the input is given in the form of noisy, unordered, high-dimensional data points with an unknown, complex geometric structure. A common way to cope with this problem is to assume that the sampled data lies on a lower-dimensional manifold whose intrinsic dimension is much smaller than the input dimensionality of the high dimensional data, and which is equipped with different choices of inner products on tangent spaces that allow one to measure geometric quantities such as distances and angles. Manifold learning is a field which aims to extend classical linear methods such as PCA [40] and MDS [17] in order to address the general case of manifolds with non-linear structures. While Manifold Learning can provide effective tools to learn manifolds with complex structures, which can be useful in many scientific domains, a number of core challenges still remain, as there is still a large gap between manifold learning and practical applications.

Taxonomy of manifold learning approaches: To provide an overview of manifold learning strengths and limitations, it is useful to discuss the taxonomy of manifold learning approaches.
In most cases, the first step in manifold learning methods aims to learn a local neighborhood on a graph, a generic data representation that describes the relationships within the data domain. The local geometric structure at each point is defined by the setting of the local neighborhood on the graph, which is then propagated to learn the global structure. As in many computer science and other scientific domains, manifold learning methods are often split into local and global approaches. The "tension" between local and global frameworks played an important role in the evolution of Manifold Learning methods. For example, the first Manifold Learning method in chronological order, Isomap [58], is considered a global method, since it performs the embedding into a lower dimensional space using the geodesic distances between all pairs of points. On the other hand, LLE [51], arguably the second well-known method and also second in chronological order, is known as a local method, since it uses a cost function which constructs an embedding that only considers the placement of each point with respect to its neighbors. Most methods which followed Isomap and LLE (with a few exceptions of hybrid local-global methods such as Diffusion Maps [16] or Semi-Definite Embedding [64]) are considered either local or global approaches, aiming to generalize, capitalize on, and address the limitations of one of these two benchmark frameworks. For example, Laplacian Eigenmaps (LE) [4], one of the most popular methods today, which followed Isomap and LLE, was shown to be equivalent to LLE in some situations and is identified as a local method.

While the approach presented in this thesis can be considered a hybrid local-global framework, we take a different path to manifold learning than the methods described above. We postulate that it is crucial to address both subtle geometric perspectives and practical aspects of robust Manifold Learning in a unified way in order to address the current manifold learning limitations.

1.1 Limitations of Existing Methods

We now discuss in detail the specific limitations of existing Manifold Learning methods. The main challenges in Manifold Learning are the following:

Robustness to Noise. Manifold learning can provide effective tools to handle complex datasets with unknown geometric structure. However, in the presence of noise, whether inlier or outlier noise, performance degrades significantly. Outliers are data points that do not belong to the meaningful data structure, while inlier noise refers to data points which do not lie strictly on the manifold. Both can be very disruptive to the learning process. Typically, outliers are more disruptive to global methods, while inlier noise is more disruptive to local methods.

Robustness to the local neighborhood graph parameter. Most manifold learning algorithms begin by setting a local neighborhood size on the graph, typically by choosing the k nearest neighbors of each point. However, due to factors such as noise and varying sampling density, they are very sensitive to parameter selections such as the choice of the k nearest neighbors on the graph. This sensitivity often leads to significant distortion, such as connecting points which do not belong to the same local neighborhood on the manifold.
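As a concrete illustration of this first step, the sketch below builds the kind of k-nearest-neighbor affinity graph most of these methods start from (a minimal version; the function name and the Gaussian weighting are illustrative choices rather than the construction of any particular method). The single global parameter k is exactly what the sensitivity discussed above refers to: too large a k links points across unrelated parts of the data.

```python
import numpy as np

def knn_affinity(X, k, sigma=1.0):
    """Symmetric k-nearest-neighbor graph with Gaussian edge weights.
    X: (N, D) array of data points; returns an (N, N) affinity matrix."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.zeros_like(d2)
    for i in range(len(X)):
        nn = np.argsort(d2[i])[1:k + 1]                # k nearest, excluding i
        W[i, nn] = np.exp(-d2[i, nn] / (2 * sigma ** 2))
    return np.maximum(W, W.T)                           # symmetrize the graph
```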
Multiple Manifolds and Manifolds with singularities. Most manifold learning methods are based on graph constructions that connect points based on the k nearest Euclidean neighbors, and thus are not suitable for multi-manifold data or manifolds with singularities such as intersections. Learning manifolds with singularities is a very challenging problem, since the local intersection area contains a large number of ambiguities that are generally very hard to resolve. Moreover, processing non-linear manifolds with singularities is an issue that is scarcely addressed in the manifold learning literature, yet real world problems are expected to contain many such manifolds. For example, in motion segmentation problems, moving objects form different trajectories, which are low dimensional manifolds that may intersect or overlap. Even though relatively few points may be located near the intersections, their contribution can be very disruptive to the global manifold structure estimation.

Manifolds with unknown, varying dimensionality and density. The intrinsic dimensionality of the manifold is an important input for local structure estimation, and may be difficult to estimate for real world data. Moreover, the manifold may have varying intrinsic dimensionality and density, and most Manifold Learning techniques are not equipped to deal with such cases.

Low sampling rate and high curvature. Manifold learning methods rely on the quality of the local structure estimation. To provide a reliable estimation of the local manifold structure, data driven frameworks require densely sampled data points. However, in practice, in many applications the data is sparse and under-sampled, and reliable local structure estimation becomes very challenging.
However, the existing Tensor Voting framework suers from two main limitations: rst, Tensor Voting is a strictly local approach, which is not adequate to learn the global structure of manifolds reliably. Its second limitation is that, while it is robust to a large amount of outlier noise, it can be very sensitive for a non-trivial amount of inlier noise. Global structure estimation: Given a set of unstructured points, the global structure estimation in our framework use graph-based tools. Graphs provide a exible and general tool for representing and pro- cessing data, for dierent tasks that estimate the anity/relationships between all pairs of points. In our framework, the data dened on these graphs is the collection of the samples on the manifold which we term graph-signals. Processing of signals on graphs is an emerging eld with an increasing number of applications [55], which aim to extend tools from classical signal processing to weighted graphs. While signals lying on structured domains is a well-established problem, weighted graphs are used as a comprehensive abstraction to represent a collection of data that correspond to many other interesting problems in many scientic domains. In our approach, we use the Spectral Graph Wavelet transform [37], recently proposed to analyze graph-signals. Similar to time and frequency localization trade-os provided by wavelets in structured signal domains, SGW provides a trade-o between spectral and vertex domain localization. However, in comparison to wavelets in 6 the classic signal processing setting, which is mostly concerned with piece-wise smooth signals that are independent of the regular grid, the SGW transform is less exible with regards to the type of graph signals that it can consider. Specically, when using SGWs tools, the graph and signal are closely related which means that both the domain and the observations depend on the smoothness of the manifold. Therefore, SGW are guided by the structure of the underlying graph, and do not take directly into account the particular class of signals to be processed. For these reasons, it is not clear how one can eectively apply the SGW in manifold learning applications both from algorithmic and theoretical perspectives. Robust Robust handle Robust Method/Property Non- Local Global to inlier to outlier Singularities to knn Computationally linear noise? noises? graph Ecient PCA[40] 7 7 3 3 7 7 - 3 RPCA [12] 7 7 3 3 3 7 - 7 Isomap [58] 3 7 3 7 7 7 7 7 LLE [51] 3 3 7 7 7 7 7 3 LE [4] 3 3 7 7 7 7 7 3 HLLE [23] 3 3 7 7 7 7 7 3 LTSA [66] 3 3 7 7 7 7 7 3 Diusion Maps [15] 3 3 3 3 7 7 3 3 Tensor Voting [46] 3 3 7 7 3 3 3 3 Ours 3 3 3 3 3 3 3 3 Table 1.1: Evaluation of manifold learning approaches in terms of their strengths and limitations. In the bottom row "ours" refers to the framework suggested in this thesis 1.3 Main Contributions This thesis suggests a general framework for unsupervised learning, which can eectively analyze a large class of manifolds with complex geometric structures. The set of points can possibly include singularities, given in a form of unordered, unstructured set of data points which can contain both inlier and outlier noise. Our approach addresses each of 7 the manifold learning limitations discussed above, then unies the tools developed into a coherent and whole framework for manifold learning. In Table 1.1 we show evaluation of manifold learning approaches in terms of their strengths and limitations, and also compare it to our approach. 
We summarize our main contributions: 1.3.1 The Tensor Voting Graph To address the limitation of Tensor Voting as a strictly local method, we suggest con- structing a novel graph, which diuses the local information of the tensor votes and infers the global structure of the manifold. In the constructed graph, the anities between the data points are based on the contribution that was made to the local tangent space es- timation at each point in the voting process from its neighbors. This graph provides an ecient tool to perform the desired global manifold learning tasks, and is called the Tensor Voting Graph (TVG). Thus, the suggested TVG overcomes its limitation as a strictly local approach, and allows us to eciently and reliably perform global manifold structure estimation tasks such as geodesic distance estimation and clustering. 1.3.2 A general framework to the intersecting Multi-Manifolds problem Building on the proposed TVG, we suggest a general framework to handle intersecting manifolds or manifolds with singularities, which is dierent from previous research in clus- tering multi-manifolds [63] [34], [29], as we explicitly and directly resolve the ambiguities near the intersections. Although the local intersection area constitutes a small part of the manifolds, we show that it contains critical information, which is necessary to achieve good clustering performance. We demonstrate the advantage of using an explicit and 8 direct approach to resolve the local intersection ambiguities, which can successfully han- dle challenging geometric structures such as when the tangent spaces at the intersection points are not orthogonal. 1.3.3 MFD: A new Manifold Frequency Denoising Framework based on Analyzing Manifold Frequencies We propose a new framework for manifold denoising-based processing in the graph Fourier frequency domain, derived from the spectral decomposition of the discrete graph Lapla- cian. The suggested approach, called MFD, uses the Spectral Graph Wavelet transform in order to perform denoising directly in the graph frequency domain. To the best of our knowledge, MFD is the rst attempt to use graph signal processing [55] tools for manifold denoising. It is dierent from previous work on image denoising based on graph signal processing [50] in that the data is unstructured and our smoothness prior explicitly assumes that the original data lies on a smooth manifold. Our suggested Manifold Frequency Denoising framework holds attractive features, which are important for practical applications; It is robust against a wide range of pa- rameter selections (such as the choice of k nearest on the graph), it is computationally ecient (for sparse data its computational complexity is O(ND) where N is the total number of points and D is the ambient dimensionality of the data), and it does not re- quire the knowledge of the intrinsic dimensionality on the graph, which, as mentioned, can be very dicult to estimate in practice. 9 1.3.4 Theoretical Foundation and Analysis for Manifold Denoising Based on Graph Signal Processing Tools We provide theoretical justication for our Manifold Frequency Denoising approach on irregular graphs. We show that for smooth manifolds the coordinate signals also exhibit smoothness (i.e., the maximum variation across neighboring nodes is bounded). This is rst demonstrated in the case of noiseless observations, by proving that manifolds with smoother characteristics create more energy in the lower frequencies. 
Moreover, it is shown that higher frequency wavelet coecients decay in a way that depends on the smoothness properties of the manifold, which is explicitly tied to the curvature properties. The theoretical results also establish a new connection between smoothness in the graph domain and the graph frequencies using the curvature properties of the manifold. We then provide an analysis for the case of noisy points and a noisy graph, establishing re- sults which tie the noisy graph Laplacian to the noiseless graph Laplacian characteristics, induced by the smoothness manifold properties and the graph construction properties (e.g, number of k-hopes on the graph). Using these results, we nally characterize the decay of the noisy spectral graph wavelets. 1.3.5 A Unied Approach for Learning Noisy Manifolds with Singularities We combine the TVG and MFD approaches and extend our suggested approaches further into a uniform framework that eciently and reliably processes noisy manifolds with singularities, including cases where the local intersection area is noisy, and also in the presence of outlier noise. The Spectral Graph Wavelets provide vertex and spectral localization that can serve as useful tools in our MFD framework. However, in comparison 10 to wavelets in the classic signal processing setting, the SGW transform is less exible with regards to the type of graph signals it can process. We show that by using local tangent space based construction and then denoising each one of the SGW bands we can overcome this limitation, allowing us to perform manifold denoising eciently on manifolds with singularities. This framework allows us to take into account the ne-grain regularities of the manifold, and thus it avoids over-smoothing at discontinuities. With the Tensor Voting Graph used as a local tangent space graph we achieve the design goals. 1.4 Outline The rest of this thesis is organized as follows: Chapter 2 provides an overview of related research. Chapter 3 provides an introduction to Tensor Voting which serve as core tools in our framework. Chapter 4 introduces the proposed Tensor Voting Graph and illustrates the performance of the TVG approach in estimating geodesic distances of non-intersecting and self-intersecting manifolds, in clustering multiple manifolds both on synthetic data, and on real data for the problem of motion segmentation. Chapter 5 presents our general framework for learning manifolds with singularities using TVG and our suggested ambi- guity resolution algorithm, and illustrates experimental results on challenging datasets such as when the maximal principal angle between the local tangent spaces is small. Chapter 6 presents our new framework for Manifold Denoising based on Spectral Graph Wavelets, and provides the theoretical justication of our approach. We then show a wide range of experimental results, both on synthetic and real datasets, that demonstrate the advantage of our new denoising approach. Chapter 7 presents our unied framework for 11 denoising manifolds with singularities using the TVG and the MFD methods. Chapter 8 concludes our work and discusses future research. 
Chapter 2
Related Work

We provide an overview of related work in analyzing data in high dimensional space, organized into the following main categories: (i) classic methods in linear manifold learning, (ii) state of the art methods in non-linear manifold learning, which study the single manifold case, (iii) the multi-manifold case, and (iv) a review of related Manifold Denoising approaches.

2.1 Classic Methods

Principal Component Analysis (PCA) [40] is one of the most classical methods used to analyze data in high dimensional space. The key idea of PCA is to find the low-dimensional linear subspace which captures the maximum proportion of the variation in the data. It is possible to prove that the projection from the high dimensional space to a lower dimensional space keeps the greatest possible proportion of the variation in the data. PCA gives a natural dimension reduction: if all the data lie in a low dimensional linear subspace of a high dimensional space, then PCA will find that linear subspace, since the variation in the directions orthogonal to the embedded linear subspace will equal zero.

MDS (Multidimensional Scaling) [17] refers to a group of well-known methods that have found application in many scientific areas [1]. The main idea of MDS is to find a mapping from a high-dimensional space to a low-dimensional space such that the pairwise distances between the observed points are best preserved. More specifically, given a matrix of distances between data points, MDS attempts to find an embedding into a lower dimensional space such that the distances are best preserved. It uses the eigenvectors of a Gram matrix, which contains the inner products between the points in the analyzed dataset, to define a mapping of data points into an embedded space that preserves most of these inner products. The main limitation of MDS is that it is based on Euclidean distances and does not take into account the distribution of the neighboring data points. For example, if the high dimensional data lies on a curved manifold, MDS will consider two data points which are close in Euclidean distance as close, although their distance on the manifold may be large.
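As a reference point for the linear methods above, here is a minimal PCA sketch (illustrative; any standard implementation computes the same projection):

```python
import numpy as np

def pca(X, d):
    """Project X (N, D) onto its d directions of maximum variance."""
    Xc = X - X.mean(axis=0)                   # center the data
    C = (Xc.T @ Xc) / (len(X) - 1)            # sample covariance matrix
    lam, V = np.linalg.eigh(C)                # eigenvalues in ascending order
    return Xc @ V[:, ::-1][:, :d]             # top-d principal directions
```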
Then, the shortest path between two points in the graph forms a good (over)estimate of the geodesic distance between these two points, which can be computed using Dijkstra's [19] or Floyd's shortest-path algorithm. The geodesic distances are computed between all the data points, forming a pairwise geodesic distance matrix. The last stage compute a low-dimensional representation in a low dimensional space for each data point x i by applying MDS on the geodesic distance matrix. Isomap is considered as a global technique non-linear dimensionality reduction that attempts the preserve the global properties of the data. Isomap main weakness is that it may fail if the manifold is non-convex. LLE [52] is considered as a local method for non-linear dimensionality reduction that attempts to preserve local properties of the data. LLE constructs a graph representation as follows: rst, the k nearest neighbors of each point is determined. Then, each data point x i is described by writing it as a linear combination of its k nearest neighbors. Thus, LLE ts a hyperplane through each data point and its nearest neighbors. The local linear assumption implies that the reconstruction weights for each data point are invariant to translation, rotation and rescaling. The dimensionality of the embedding has to be given as a parameter, since it cannot always be estimated from the data. Moreover, the output is an embedding of the given data, but not a 15 mapping from the ambient to the embedding space. LLE is not isometric and often fails by mapping distant points close to each other. An in ux of other important works followed, such as Laplacian Eigenmaps [4], HLLE [24] and Diusion Maps [15], exploring alternative methods and dierent aspects of the manifold learning problem, where the main goal is to discover the geometry and the topology of the data. Laplacian Eigenmaps [4], similar to LLE, is a local technique to nd low dimensional representation by preserving local properties of the manifold. In Laplacian Eigenmaps, the local properties are based on the pairwise distances between near neighbors. Laplacian Eigenmaps compute a low-dimensional representation of the data in which the distances between a datapoint and its k nearest neighbors are minimized such that the distance in the low-representation data neighbor contribute more to the cost function than the distance between the data point and its second nearest neighbor. Using Spectral Graph theory, the minimization of the cost function is deed as solving an eigenproblem. The Laplacian Eigenmaps algorithm can be viewed as a generalization of LLE, since the two become identical when the weights of the graph are chosen according to the criteria of the latter. Much like LLE, the dimensionality of the manifold also has to be provided, the computed embedding are not isometric and a mapping between the two spaces is not produced. HLLE [24] developed an approach similar to LLE, which computes the Hes- sian instead of the Laplacian of the graph. HLLE starts with identifying the k nearest neighbors for each data point using Euclidean distance. Assuming that a local linear neighborhood is present at that neighborhood, it estimate a basis for the local tangent space at each point by applying PCA on its k nearest neighbors. Then, an estimator 16 for the Hessian of the manifold at the point x i in local tangent space coordinate is esti- mated, which is then orthonormalaized. 
The eigenvectors corresponding to the Hessian estimators are selected from the matrix which contain the low representation of the data. Diusion Maps [15] are based on dening a Markov random walk on the graph of the data, which is computed using the Gaussian Kernel of the data. Performing the random walk for a number of time steps provides a measure for the proximity of the data points. This measure denes the diusion distance. The diusion maps distance is based on many paths through the graphs, which makes the diusion distance more robust to inlier noise. One of the important approaches for manifold learning is tangent space learning, which includes LTSA [66], Manifold Charting [8] and more recently Riemannian Mani- fold Learning [43], non-Isometric Manifold Learning [20], and Vector Diusion Maps [56]. Local Tangent Space Alignment(LTSA) [66] is a method that that describes local proper- ties of the high dimensional data using the local tangent space of each data-point. LTSA is based on the observation that, under the assumption of local linearity of the manifold, there exists a linear mapping form the high dimensional data-point to its local tangent space, and there exists a linear mapping from the corresponding low dimensional data- point to the same local tangent space space. LTSA attempts to align these linear mapping such that they construct the local tangent space of the manifold from the low dimensional representation. Other research related to ours includes the charting algorithm of Brand [8]. Manifold Charting [8] computes a pseudo-invertible mapping of the data, as well as the intrinsic dimensionality of the manifold, which is estimated by examining the rate of growth of the number of points contained in hyper-spheres as a function of the radius. 17 Linear patches, areas of curvature and noise can be distinguished using the proposed measure. At a subsequent stage a global coordinate system for the embedding is dened. This produces a mapping between the input space and the embedding space. In [20] the data are not embedded in a lower dimensional space. Instead, the local structure of a manifold at a point is learned from neighboring observations and represented by a set of radial basis functions (RBFs) centered on K points discovered by K-means clustering [38]. The manifold can then be traversed by walking on its tangent space between and beyond the observations. Representation by RBFs without dimensionality reduction allows the algorithm to be robust to outliers and be applicable to non-isometric manifolds. More recent works include [26], [14]. The methods mentioned above can work well when the manifold has rather simple topological properties: smooth with no singularities and isometric to Euclidean space [33]. When one or more of these properties are violated, the methods mentioned above will not provide a good solution. Moreover, real world data often include situations where the data consists of manifold with intersections, multiple intersecting manifolds, or other types of singularities. 2.3 The multi- manifold case The multi-manifold case addresses the generic setting where the clusters are low-dimensional manifolds that possibly intersect or overlap. Early methods in multi-manifold clustering assumed that the manifolds are well separated [65], or that the intersecting manifolds 18 have dierent intrinsic dimension or density [31]. The case of linear intersecting mani- folds is addressed by numerous works that developed algorithms and theoretical results for these cases. 
These includes GPCA [60], Sparse Subspace Clustering (SSC) [25] [27], and Spectral Curvature Clustering [13]. GPCA t the data with a polynomial whose gradient at a point gives the normal vector to the subspace containing that point. Sparse subspace clustering uses the sparsest representation produced by l1-minimization to de- ne the anity matrix of an undirected graph. Then subspace segmentation is performed by Spectral Clustering algorithms [48] [54]. Under the assumption that the subspaces are independent, SSC shows that the sparsest representation is also block-sparse. SSC have shown state of the art results in a number of Computer Vision applications such as motion segmentation. However, the main limitation of GPCA and SSC is that can only handle linear structure. More recently, a number of works address the challenging case of clus- tering non-linear intersecting multi-manifolds. Semi Supervised Manifold Learning [32] developed a model based on the mixture of manifolds in the context of semi supervised learning, also providing theoretical prospectus to quantify the value of using unlabeled data in this multi-manifold setting. The rst step of their method is to sub-sample the data points, such that it obtains 'sub-samples centers' [32]. For the remaining data ob- tained from the subsampling procedure, ak-nearest-neighbor graph is then dened on the centers in terms of the Mahalanobis distances, and Spectral Clustering [48] is performed on the anity matrix. Unsupervised methods for multi-manifold clustering include Spec- tral clustering on multiple manifolds [63], Robust Multiple Manifold Structure Learning [34], and Spectral Clustering using local PCA [29]. Spectral Clustering on Multiple Man- ifolds [63] t a mixture ofd-dimensional ane sub-spaces to the data, which is then used 19 to estimate the tangent subspaces at each data point. [34] rst estimates local tangent space using a version local PCA and denes an anity that incorporates the self tuning method of [65]. In all of these methods , the nal stage applies Spectral Clustering to the anity matrix. Spectral Clustering using local PCA [29] introduced a variant which sub-samples the data (as in [32]) in order to detect an abrupt event of intersection, then compute the local covariance of the sub sampled data. The anity which was used in practice was based on local projection on the eigenvectors of the covariance matrix. Spec- tral Clustering using local PCA [29] also provides a theoretical analysis on the behavior of the manifolds near the intersections. The common characteristic of these methods is that they all construct an anity matrix which is based on local tangent space distance. However, these methods do not address the local intersection area explicitly, and are limited to handle cases when the maximal principal angle at the intersection is relatively large. Also, these methods are especially sensitive in the presence of outliers, where only Robust Multiple Manifold Structure Learning [34] can handle a relatively small amount of outliers. The case of the intersections problem which corresponds to 2D junctions has important applications in image understanding. Studies of human perception and recog- nition indicate that 2D junctions play a fundamental role in many tasks in perceptual grouping such as contour grouping, object detection, and shape recognition [7],[62]. The inherent characteristic of junctions is that they contain a large amount of ambiguities, which pose a very challenging task. 
A special case of the intersection problem, corresponding to 2D junctions, has important applications in image understanding. Studies of human perception and recognition indicate that 2D junctions play a fundamental role in many perceptual grouping tasks, such as contour grouping, object detection, and shape recognition [7], [62]. The inherent characteristic of junctions is that they contain a large amount of ambiguities, which poses a very challenging task. As a result, previous methods either discard or ignore this information, which is necessary to perform junction type inference in an unsupervised manner.

2.4 Manifold Denoising

Denoising is very important for practical manifold learning, as most manifold learning approaches, e.g., [58], [51], [4], [66], assume that the data lies strictly on the manifold and are known to be very sensitive to noise. Thus, a number of methods have been proposed to handle noisy manifold data. The state of the art methods include Manifold Denoising (MD) [39] and Locally Linear Denoising (LLD) [36]. Also related are statistical modeling approaches for manifold learning, such as probabilistic non-linear PCA with Gaussian process latent variable models (GP-LVM) [41] and its variants for manifold denoising [30]. Among the state of the art methods, the method most related to our work is MD, which applies a diffusion process on the graph Laplacian, using an iterative procedure that solves differential equations on the graph. It is shown that each iteration of the diffusion process in MD is equivalent to the solution of a regularization problem on the graph [39] (the regularization problem solved by MD is also known as Tikhonov regularization). The main limitation of MD is over-smoothing of the data and sensitivity to the choice of the k nearest neighbor graph construction (as mentioned in [39]), especially at high noise levels. LLD partially overcomes the over-smoothing limitation of MD by partitioning the manifold into local patches and performing the denoising locally. However, this procedure includes three stages which are performed independently: the denoised points may still be assigned to the local structure to which they belong, but the regularization procedure does not necessarily produce a locally smooth manifold structure. Also note that for the case of linear manifold denoising there exist methods which provide powerful tools, such as Robust PCA (RPCA) [12] and its many extensions; however, the main limitation of RPCA is that it assumes a globally linear model. Our work is inspired by a classical approach to wavelet-based denoising [22] and its many extensions to image denoising [11]. Some recent work [50] has explored graph-based techniques for image denoising on structured domains. However, the more general case of irregular domains is much less understood, and to the best of our knowledge, this work is the first attempt to explore spectral wavelets for manifold denoising on unstructured data.
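For concreteness, the graph regularization problem that each MD diffusion iteration is equivalent to has a simple closed form; the sketch below is a generic illustration of graph Tikhonov denoising (alpha is an illustrative regularization weight, not a parameter taken from [39]):

```python
import numpy as np

def tikhonov_denoise(W, X, alpha=1.0):
    """Solve argmin_Z ||Z - X||_F^2 + alpha * tr(Z^T L Z), whose closed
    form is (I + alpha L)^{-1} X. W: (N, N) symmetric affinity matrix;
    X: (N, D) noisy coordinates; returns the smoothed coordinates."""
    L = np.diag(W.sum(axis=1)) - W                   # graph Laplacian
    return np.linalg.solve(np.eye(len(W)) + alpha * L, X)
```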
More recently, [46] suggested a simplified vote generation scheme that allows the direct computation of votes from data in arbitrary dimensions. In the new scheme, votes are cast directly from the voter to the receiver rather than retrieved from pre-computed voting fields, and each vote carries perfect certainty in the information it conveys; uncertainty arises only from the accumulation of votes from different voters at each token. This extension of TV to higher dimensions, which overcomes the computational bottleneck in higher dimensions, is compelling for problems in machine learning and manifold learning, where many problems require efficient, local, data-driven algorithms that are robust to high levels of noise.

3.2 Review of the Tensor Voting Framework

The Tensor Voting framework consists of three important aspects:

1. Tensor representation: each point is encoded as a second-order, positive semi-definite symmetric tensor (which is equivalent to a symmetric positive semi-definite N×N matrix), and as an ellipsoid in N-D space. In the Tensor Voting framework, a tensor represents the structure of a manifold going through the point by encoding the normals to the manifold as eigenvectors of the tensor that correspond to non-zero eigenvalues, and the tangents as eigenvectors that correspond to zero eigenvalues. The tensors can be formed by the summation of the direct products \vec{v}_i \vec{v}_i^{\,t} of the eigenvectors that span the normal space of the manifold. The tensor at a point on a manifold of dimensionality d, with \vec{v}_i corresponding to the unit vectors that span the normal space, is computed as follows:

T = \sum_{i=1}^{d} \vec{v}_i \vec{v}_i^{\,t}    (3.1)

A point, which has no orientation, can be represented by a ball tensor which contains all possible normals and is encoded as the N×N identity matrix. Any token on a manifold of known dimensionality and orientation can be encoded in this representation by appropriately constructed tensors, according to Equation 3.1.

2. Information propagation: The core of the Tensor Voting framework is the way information is propagated from token to token. Given a tensor at O and a tensor at P, the vote that the point at O (the voter) casts to P (the receiver) has the orientation the receiver would have if both the voter and receiver belonged to the same structure. The stick tensor vote is the fundamental voting element from which all other voting types and voting in higher dimensions can be derived. The following equations define the stick tensor vote:

S_{vote} = DF(s, \kappa, \sigma) \begin{bmatrix} -\sin(2\theta) \\ \cos(2\theta) \end{bmatrix} \begin{bmatrix} -\sin(2\theta) & \cos(2\theta) \end{bmatrix}    (3.2)

DF(s, \kappa, \sigma) = e^{-\frac{s^2 + c\kappa^2}{\sigma^2}}, \quad \theta = \arcsin\left(\frac{\langle \vec{v}, \hat{e}_1 \rangle}{\|\vec{v}\|}\right), \quad s = \frac{\theta\|\vec{v}\|}{\sin\theta}, \quad \kappa = \frac{2\sin\theta}{\|\vec{v}\|}

In the above equations, s is the length of the arc between the voter O and the receiver P, \vec{v} is the vector connecting O and P, \hat{e}_1 is the normal vector at the voter, \kappa is the curvature, which can be computed from the radius of the osculating circle, \sigma is the scale of voting, which controls the degree of decay, and c is a constant defined in [46] that controls the decay with curvature. The magnitude of the vote is a function of proximity and smooth continuation, and is called the saliency decay function. No votes are cast if the receiver is at an angle larger than 45° with respect to the tangent of the osculating circle at the voter, in order to suppress votes that are due to high curvature or that come from unrelated points.
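To make the vote generation concrete, the following is a minimal Python sketch of the 2D stick vote of Equation 3.2, written in the voter's local frame (voter at the origin, tangent along the x-axis, unit normal ê₁ = (0, 1)); the function name and the value of c are illustrative assumptions, not part of the framework's definition.

    import numpy as np

    def stick_vote_2d(v, sigma, c=0.1):
        """Sketch of the 2D stick vote (Eq. 3.2) cast from a voter at the
        origin (tangent along x, unit normal e1 = (0, 1)) to a receiver at
        position v. Returns a 2x2 tensor; zeros when no vote is cast."""
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            return np.zeros((2, 2))
        # theta: angle between v and the voter's tangent (the x-axis)
        theta = np.arcsin(np.clip(v[1] / norm_v, -1.0, 1.0))
        if abs(theta) > np.pi / 4:        # beyond 45 degrees: vote suppressed
            return np.zeros((2, 2))
        if np.isclose(theta, 0.0):        # collinear: straight continuation
            s, kappa = norm_v, 0.0
        else:
            s = theta * norm_v / np.sin(theta)     # arc length on the osculating circle
            kappa = 2.0 * np.sin(theta) / norm_v   # curvature of that circle
        saliency = np.exp(-(s**2 + c * kappa**2) / sigma**2)   # decay DF(s, kappa, sigma)
        n = np.array([-np.sin(2 * theta), np.cos(2 * theta)])  # normal the receiver would have
        return saliency * np.outer(n, n)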
3. Voting analysis: Given an N×N second-order, symmetric, non-negative definite matrix, the type of structure encoded in it can be inferred by examining its eigensystem. Any such tensor can be decomposed as in the following equation:

T = \sum_{i} \lambda_i \hat{e}_i \hat{e}_i^{\,t} = (\lambda_1 - \lambda_2)\,\hat{e}_1\hat{e}_1^{\,t} + (\lambda_2 - \lambda_3)\,(\hat{e}_1\hat{e}_1^{\,t} + \hat{e}_2\hat{e}_2^{\,t}) + \cdots + \lambda_N\,(\hat{e}_1\hat{e}_1^{\,t} + \hat{e}_2\hat{e}_2^{\,t} + \cdots + \hat{e}_N\hat{e}_N^{\,t})

where \lambda_i are the eigenvalues in descending order of magnitude and \hat{e}_i are the corresponding eigenvectors. Based on the tensor spectral decomposition, the normal and tangent spaces, structure type, dimensionality, and outliers are derived. The estimated local intrinsic dimensionality is given by the maximum gap in the eigenvalues: if the maximum eigenvalue spread is \lambda_d - \lambda_{d+1}, the estimated local intrinsic dimensionality is N - d, and the manifold has d normals and N - d tangents. Since votes are cast using normals, the first d eigenvectors, corresponding to the largest eigenvalues, are the normals to the manifold, and the remaining ones are the tangents.
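For illustration, a minimal Python sketch of this eigensystem analysis (the function name and the handling of ties are assumptions):

    import numpy as np

    def analyze_tensor(T):
        """Sketch of the voting analysis step: decompose an N x N second-order
        tensor and estimate normals/tangents from the largest eigenvalue gap."""
        eigvals, eigvecs = np.linalg.eigh(T)                 # ascending order
        eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
        gaps = eigvals[:-1] - eigvals[1:]                    # lambda_d - lambda_{d+1}
        d = int(np.argmax(gaps)) + 1                         # dimension of the normal space
        normals = eigvecs[:, :d]                             # span the normal space
        tangents = eigvecs[:, d:]                            # span the tangent space
        return d, T.shape[0] - d, normals, tangents          # normal dim, intrinsic dim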
Intersection identification: The voting results at each tensor can be used to indicate whether a point belongs to a smooth manifold part or to a local intersection area.

Tensor Voting limitations: The main limitation of the Tensor Voting framework is that it is a strictly local method, and performing global operations, such as estimating geodesic distances and clustering, is not reliable. For example, to estimate geodesic distances on manifolds, previous methods using TV resort to an iterative, non-linear interpolation method [46] that marches on the manifold by projecting the desired direction from the starting point. As pointed out in [46], this process is very slow and unreliable, and also diverges in configurations where points on the path are in deep concavities.

Figure 3.1: Vote generation for a stick voter (a), a ball voter (b), and generic vote generation (c). The votes are functions of the positions of the voter A and the receiver B and of the tensor of the voter.

Chapter 4
The Tensor Voting Graph

In this chapter we present the Tensor Voting Graph, a novel graph construction which addresses one of the main limitations of Tensor Voting as a strictly local framework. We aim to incorporate the robust properties of Tensor Voting for inferring the local manifold geometric structure into a unified framework that can infer the global structure of manifolds with complex topologies.

4.1 Introduction

Given a set of points in a high-dimensional space that lie on one or more smooth manifolds, possibly intersecting, a number of difficult and interesting problems need to be addressed. First, we may want to estimate the local structure at each data point, which includes estimation of dimensionality and of the local normal and tangent spaces. Second, we may need to infer global properties, such as the geodesic distance between two points on the manifold. Also, given multiple manifolds, possibly intersecting, we need to define membership and distance to a new data point.

Figure 4.1: Overview of the suggested Tensor Voting Graph construction method: input points are processed by Tensor Voting to obtain the associated tensors and tangent/normal spaces, from which the Tensor Voting Graph with weights w_ij is constructed.

Additionally, although significant progress has been made in manifold learning with noise-free data [58], [52], [4], handling outlier noise is a critical and practical issue which significantly degrades the performance of state-of-the-art methods. These problems are ubiquitous in unsupervised manifold learning, and many applications, especially in computer vision, would benefit from a principled approach to them. A method that can reliably estimate the desired local properties mentioned above is Tensor Voting, a local, non-parametric, perceptual organization framework that can infer the local geometric structure from sparse and noisy data using a local voting process. However, Tensor Voting does not provide a reliable and efficient way to learn the global structure of a complex manifold. To achieve this goal, we construct a graph in which the affinity between points is based on the contribution made at each point in the voting process by its neighbors. This graph provides an efficient tool to perform the desired global manifold learning tasks, augments the Tensor Voting framework, and is called the Tensor Voting Graph (TVG). This hybrid local-global approach can efficiently encode and represent the global properties of the manifolds and also serves to identify separate manifolds where appropriate. As the TVG builds on the Tensor Voting framework, it leverages its algorithmic features, which include robustness to a large amount of outliers, reliable estimation of the tangent space, dimensionality, and structure type, and the ability to handle the simultaneous presence of multiple structure types. We summarize our contributions:

- Novel graph construction: our Tensor Voting Graph encodes the contribution made to the tangent space estimation at each point by neighboring points.
- A general framework for manifold learning: the new construction provides a general framework which incorporates a rich geometric structure into a graph that can analyze a large class of manifolds, including those with intersections.
- Global structure estimation: the TVG can be used to learn the manifold's global properties using well-known tools from graph theory. This includes efficient marching and geodesic distance estimation on manifolds using Dijkstra's algorithm, and clustering and classification using spectral clustering.

We have validated our approach on complex manifolds and on computer vision application data. Experimental results demonstrate that our method significantly outperforms the state-of-the-art methods on a set of manifolds with intricate structure. In computer vision applications we demonstrate that our method achieves results comparable to the state of the art, with much more graceful degradation when the data is contaminated with outliers.

4.2 Related Work

We present an overview of related previous research in manifold learning, manifold clustering, and Tensor Voting.

4.2.1 Manifold Learning

The central assumption in manifold learning is that the data lie on a low-dimensional manifold that captures the dependencies between the observable parameters of the data. The main problem in manifold learning is to explore the geometry and the topology of the manifold. Most methods in manifold learning are inspired by PCA and multi-dimensional scaling (MDS), based on the assumption that nonlinear manifolds can be approximated by locally linear parts. Some of the most popular approaches in manifold learning are Isomap [58], LLE [52], Laplacian Eigenmaps [4], HLLE [23], LTSA [66], and Diffusion Maps [15], and more recently Vector Diffusion Maps [56], which generalize Diffusion Maps [15] using tangent space learning. Also related are manifold clustering methods that address the limitations of the k-nearest-neighbors and ε-ball graph constructions. These include Sparse Subspace Clustering [25], Spectral Clustering on Multiple Manifolds [63], and Robust Multiple Manifold Structure Learning [34].
The main limitation of these methods is that they are sensitive to a large amount of outliers.

4.3 The Tensor Voting Graph Construction

To perform global operations on complex manifolds, possibly with singularities, we suggest constructing a graph which encapsulates affinities between data points based on local tangent space distance. In the voting process that occurs in the Tensor Voting paradigm, points emit tensor votes to their neighbors to estimate the local tangent and normal space at each point. Here, we suggest constructing a graph using the reverse tensor votes: at each point, we estimate the contribution made to the local tangent space by the neighboring points that participated in the voting process. Thus, the affinity between points is not only a function of the local tangent space orientations, but is also highly correlated with the majority of the votes that contributed to each point's local tangent space, which provides a measure of distance to these points on the local manifold.

Figure 4.2: Illustration of the TVG: points b and e that are close in Euclidean distance but far in manifold distance have zero affinity, whereas points b and c are connected with a high affinity corresponding to a small distance on the manifold.

Figure 4.2 illustrates the Tensor Voting Graph construction: points which are close in Euclidean distance but far in manifold distance have zero affinity, since the affinity is defined using local tangent space distance similarity. Points that lie close on the same manifold part have a small tangent space distance and hence a high affinity value. Formally, we are given a set of n unlabeled data points x_1, ..., x_n and the normal space at each point x_i, O_i = {(u_i)_1, ..., (u_i)_d}. Let \tilde{O}_{ij} = {(\tilde{v}_{ij})_1, ..., (\tilde{v}_{ij})_d} correspond to the subspace of the normal votes emitted from point x_j at x_i using Tensor Voting. Given these votes, the reverse tensor votes are encapsulated in the graph G = (X, W) with weights w: E → R, where the affinity value is based on the principal angles between the normal space O_i and the subspace \tilde{O}_{ij}. Given two subspaces O_i and \tilde{O}_{ij}, the maximal principal angle between them is defined as follows [49]:

f(O_i, \tilde{O}_{ij}) = \min_{u \in O_i} \max_{\tilde{v} \in \tilde{O}_{ij}} \langle u, \tilde{v} \rangle    (4.1)

We use the function in Equation 4.1 to define the affinity value w_ij in our TVG as

w_{ij} = \begin{cases} f(O_i, \tilde{O}_{ij}) & \text{if } x_j \in kNN(x_i) \text{ and } \arccos(f(O_i, \tilde{O}_{ij})) < 45^{\circ} \\ 0 & \text{otherwise} \end{cases}    (4.2)

where kNN(x_i) denotes the k nearest neighbors of x_i. The affinity matrix defines for each point its k nearest neighbors in terms of local tangent space proximity according to the largest affinities: w_{ij_1} ≥ w_{ij_2} ≥ ... ≥ w_{ij_k}.

Algorithm 1: TVG (Tensor Voting Graph construction)
Input: the data set X (possibly with outliers), the Tensor Voting scale σ, and the number k of nearest neighbors on the local tangent space.
1. Perform Tensor Voting, TV(X) (first iteration).
2. Remove outlier noise: points x_i for which λ_1 is very small are removed.
3. For each x_i, normalize its tensor's eigenvalues: λ_l ← λ_l / Σ_{i=1}^{N} λ_i.
4. Refine the tangent space estimation using a second iteration, TV(X).
5. Compute the affinity matrix W ∈ R^{N×N}, (W)_{ij} = w_ij, using Equation 4.2.
6. Fix the symmetry of the similarity matrix: W = (W + W^T)/2.
Output: the Tensor Voting Graph G = (X, W), and the local geometric structure at each point x_i: (O_i, O_i^⊥, {λ_i}_{i=1}^{N}).
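For illustration, a minimal Python sketch of steps 5–6 of Algorithm 1, under the assumption that the per-vote normal subspaces \tilde{O}_{ij} are available as orthonormal column matrices; the cosines of the principal angles are obtained from the singular values of O_i^T \tilde{O}_{ij}:

    import numpy as np

    def tvg_affinity(normals, vote_normals, knn):
        """Sketch of the TVG affinity (Eqs. 4.1-4.2). normals[i]: orthonormal
        basis (columns) of the normal space O_i; vote_normals[(i, j)]:
        orthonormal basis of the normal votes emitted from x_j at x_i;
        knn[i]: list of neighbor indices of x_i."""
        n = len(normals)
        W = np.zeros((n, n))
        for i in range(n):
            for j in knn[i]:
                s = np.linalg.svd(normals[i].T @ vote_normals[(i, j)],
                                  compute_uv=False)
                cos_max = s.min()    # cosine of the maximal principal angle
                if np.arccos(np.clip(cos_max, -1.0, 1.0)) < np.pi / 4:
                    W[i, j] = cos_max
        return (W + W.T) / 2.0       # symmetrize (step 6)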
Since the similarity graph is founded on the local tangent space, the strength of the affinity between data points can be classified into three categories: (I) points which are close on the local tangent space have small geodesic distance and a large affinity value; (II) points which are far away in geodesic distance but close in terms of Euclidean distance, such as in the vicinity of intersections or high curvature, have small affinity values; (III) points which are far in Euclidean distance have zero affinity. Thus the constructed graph summarizes both local and global relationships within the whole data set. Once the graph is built, we can estimate geodesic distances efficiently and perform clustering and classification tasks using well-known graph methods. Note that in general votes are not symmetric, so directly using the reverse tensor votes would result in a directed graph. The choice of graph construction using the reverse tensor votes may depend on the desired application. In this paper we focus on clustering and geodesic distance applications using spectral clustering and Dijkstra's algorithm [19], which rely on an undirected graph, so a k-nearest-neighbors graph is constructed by symmetrizing the pairwise affinities. Also note that the constructed graph is sparse by construction, which is a valuable property for classification purposes and is also computationally efficient.

To compute geodesic distances, the TVG is first constructed, which estimates for each point its k nearest neighbors on the local tangent space. Then, a new Euclidean distance graph is constructed, in which pairs of points are connected by an edge whose weight equals their Euclidean distance only if they are k nearest neighbors on the Tensor Voting Graph. Dijkstra's algorithm [19] is then applied to this graph to estimate the shortest-path distances. For clustering purposes, the value of an edge on the graph is set directly to the affinity value on the TVG, and the unnormalized spectral clustering method is applied to the affinity matrix. The Tensor Voting Graph G = (X, W) construction is described in pseudo-code in Algorithm 1. Note that the complexity is O(Nn log n) for the Tensor Voting computation [46] and O(n²N²d) for computing the affinities between the local tangent spaces, where n, N, and d correspond to the number of points, the ambient space dimensionality, and the normal space dimensionality, respectively.
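A minimal Python sketch of this geodesic-distance procedure, assuming the point coordinates X and the symmetrized TVG affinity W are given (SciPy's Dijkstra routine stands in for a hand-rolled implementation):

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import dijkstra

    def tvg_geodesic_distances(X, W):
        """Sketch: build a Euclidean-weighted graph restricted to TVG
        neighbors (W[i, j] > 0) and run Dijkstra to approximate the
        geodesic distances between all pairs of points."""
        n = X.shape[0]
        D = np.zeros((n, n))
        for i, j in zip(*np.nonzero(W)):
            D[i, j] = np.linalg.norm(X[i] - X[j])   # Euclidean edge length
        return dijkstra(csr_matrix(D), directed=False)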
4.3.1 Relationship with Previous Works

We highlight several properties of the chosen affinity function, and also describe the affinities used in related methods. In general, most current intersecting multi-manifold learning methods use some kind of local tangent space measure. [32] used the Hellinger distance between the probability distributions N(0, C_i) and N(0, C_j) to compare covariance matrices:

w_{ij} = 1 - \left( \frac{2^{D/2} \det(C_i C_j)^{1/4}}{\det(C_i + C_j)^{1/2}} \right)    (4.3)

where C_i and C_j are full-rank covariance matrices corresponding to the points x_i, x_j. [63] uses a nearest-neighbors graph, and the affinity value is defined as

w_{ij} = \begin{cases} \prod_{s=1}^{d} \cos^{\alpha} \theta_s(i,j) & \text{if } x_j \in kNN(x_i) \text{ and } x_i \in kNN(x_j) \\ 0 & \text{otherwise} \end{cases}    (4.4)

where θ_1(i,j) ≤ θ_2(i,j) ≤ ... ≤ θ_d(i,j) are the principal angles between the estimated tangent spaces at the points x_i and x_j, and α is a tuning parameter. [34] defines an affinity value that incorporates the self-tuning method of [65]:

w_{ij} = \exp\left( - \frac{\|x_i - x_j\|^2 + \eta^2\,\Theta^2(O_i, O_j)\,\|x_i - x_j\|_2^2}{\sigma_i \sigma_j} \right)    (4.5)

where σ_i is the distance from x_i to its k nearest neighbors, Θ(O_i, O_j) measures the principal angle between the tangent spaces O_i, O_j, and η is a tuning parameter [34]. Spectral clustering based on local PCA first sub-samples the data, then computes the sample covariance C_i at each of the sub-sampled points y_i and its orthogonal projection Q_i onto the space spanned by the top d eigenvectors of C_i. The affinity between the sub-sampled data points is given by:

w_{ij} = \exp\left( - \frac{\|y_i - y_j\|^2}{\varepsilon^2} \right) \exp\left( - \frac{\|Q_i - Q_j\|^2}{\eta^2} \right)    (4.6)

where ε is a spatial scale and η is a projection scale.

In order to achieve good performance when using spectral clustering, we need to construct an affinity matrix such that points which belong to the same local manifold part are connected with a high affinity value, while spurious connections between points which belong to distinct manifolds are removed. In other words, w_ij ≈ 1 if x_i and x_j belong to the same local part of the manifold cluster, and w_ij ≈ 0 if x_i and x_j are in distinct clusters. This can be quite challenging, since the Euclidean k nearest neighbors can include points from two distinct manifolds. As described in the previous section, the affinity function used to construct the TVG is based on the contribution made to each point's local tangent space by the normals emitted by its neighbors. This affinity has advantages in some geometric configurations in comparison to a straightforward local tangent space distance or covariance matrix. In particular, we observe that it is less sensitive to the choice of the Euclidean neighborhood when that neighborhood includes points from different clusters, where spectral clustering partitioning can be very challenging. This is elaborated in the next example. In the example (shown in Figure 4.3 for illustration), two curves are sampled with 6 points each (points with the same color belong to the same manifold; see Figure 4.3(a)). The points are sampled such that each point has, among its 3 nearest Euclidean neighbors, at least one neighbor which does not belong to its true local geometric structure. Figure 4.3(d) shows the affinity matrix obtained using the TVG, where red entries indicate a high affinity value and blue entries indicate zero or very low affinity. Although this is a challenging geometric configuration, the TVG constructs a good affinity matrix, such that points from the same cluster are connected with a high affinity value, while points from distinct manifolds are connected with a very small or zero affinity value. On the other hand, as shown in Figure 4.3(c), the affinity matrix obtained using the covariance matrices (taking the spectral norm between the covariance matrices of each pair of points [29]) is not reliable, and many points from different manifolds are connected with high affinity.

Figure 4.3: In the example shown in (a), 6 points are sampled from each of two distinct manifolds (points with the same color belong to the same manifold), such that for each point, among its 3 nearest Euclidean neighbors, at least one belongs to a different manifold. (b) plots the ground-truth affinity matrix (red entries: high affinity; blue entries: low affinity). (c) and (d) show the affinity matrices constructed by covariance-matrix distances and by our suggested TVG, respectively.
4.4 Experimental Results

4.4.1 Geodesic distance comparison on inliers

We evaluate the suggested Tensor Voting Graph and compare it to some of the state-of-the-art algorithms in manifold learning: LLE, Isomap, Laplacian Eigenmaps, HLLE, and LTSA. All these methods use the kNN graph as the first stage prior to embedding. In all experiments, we perform a numerical evaluation of the geodesic distance error between all pairwise points on the manifold, where in our method we measure the geodesic distance between the points of each pair in the input space, and for all other methods in the embedding space. For the evaluation of the geodesic distance error we use the following measure (used also in [20], [46]):

\mathrm{err}_{GD} = \sum_{i,j} \frac{|d(i,j) - d_{est}(i,j)|}{d(i,j)}

where d(i,j) corresponds to the ground-truth distance and d_{est}(i,j) is the distance estimated by a given method. We tested a wide range of values for the number k of nearest neighbors for all algorithms, and the results are reported for the best k only. For each embedding method (except Isomap), we computed a uniform scale to minimize the error between the computed distances and the true geodesic distances.

Manifold data sets: The data used for the experiments are a cylinder section which spans 150° and consists of 2000 points, a torus consisting of 2000 points, Enneper's surface with 3000 points, a Helicoid surface with 2500 points, and a torus curve with 800 points. The geodesic distances of the cylinder manifold can be computed analytically, and the other manifolds were sampled densely enough that the shortest paths computed by Dijkstra on a kNN graph provide a good approximation of the true geodesic distances.

Table 4.1: Comparison with the state of the art on geodesic distances: I corresponds to experiments with inlier noise, and I&O corresponds to data contaminated with outliers. The results are reported in terms of err_GD percent.

Data/Method   ISOMAP        LLE           LE            HLLE         LTSA          TVG
              I      I&O    I      I&O    I      I&O    I      I&O   I      I&O    I      I&O
Cylinder      0.2    4.8    8.5    7.8    15.3   17.2   8.5    -     7.1    8.5    0.25   0.76
Torus         12.0   12.28  11.7   12.3   19.4   17.8   15.8   -     11.6   11.7   0.3    0.14
Enneper's     13.4   28.3   21.4   26     29.4   65     25.5   -     22     22     0.9    1.3
Helicoid      8.4    14.8   23.4   38.8   23.8   38.8   20.6   -     24.8   19.0   0.2    0.17

Figure 4.4: Data sets used for estimating the geodesic distances. From left to right: Viviani's curve, the torus, Enneper's surface, and the Helicoid surface.

The experimental results in Table 4.1 show that the compared methods produce significant distortion, whereas the TVG performs well on these manifolds. Also note that the strictly local Tensor Voting method would fail to converge on most of the manifolds experimented with in this paper, especially those with deep concavities such as the Helicoid or Enneper's surface.
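A small Python sketch of the error measure err_GD defined above (the ground-truth and estimated distances are assumed to be given as full pairwise matrices):

    import numpy as np

    def geodesic_distance_error(D_true, D_est):
        """Relative geodesic distance error err_GD summed over all point
        pairs, ignoring the zero diagonal."""
        mask = ~np.eye(D_true.shape[0], dtype=bool)
        return np.sum(np.abs(D_true[mask] - D_est[mask]) / D_true[mask])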
4.4.2 Comparison with outliers

We apply our method in the presence of a large amount of outliers. For the cylinder, torus, Enneper's, and Helicoid surfaces, with the same numbers of inliers used in the previous section, we added 1000, 1500, 2000, and 1500 outliers, respectively. The outliers were generated according to a uniform distribution. The experimental results in Table 4.1 demonstrate that the Tensor Voting Graph can handle large amounts of outlier noise, with a very small effect on the accuracy of the estimated geodesic distances, while outlier noise degrades the competing methods, which obtained good results in the outlier-free case. The experimental results also show that the embedding methods suffer from significant distortion even in the outlier-free case. The Isomap algorithm shows good results on isometric manifolds, but is not stable in the presence of outliers. Note that in this section we have compared our method to Isomap in the embedding space. In the next section our method will be contrasted with the k-nearest-neighbors Euclidean distance graph, which is the first step of Isomap.

4.4.3 Computing geodesic distances on intersecting manifolds

We can use the TVG to estimate geodesic distances on intersecting manifolds, which we compare to the estimation of geodesic distances in the ambient space using Isomap. In both the TVG and Isomap, the shortest-path distances are estimated using Dijkstra's algorithm. In our method, pairs of points are connected with a distance equal to the Euclidean distance if and only if they are connected on the TVG, whereas in Isomap, points are connected if they are k nearest neighbors on the Euclidean graph. We test our method on Viviani's curve (Figure 4.4), which is the intersection of a sphere with a cylinder that is tangent to the sphere and passes through its center. Table 4.2 shows a comparison of the two methods on Viviani's curve. The experimental results demonstrate that the TVG is also capable of efficiently traversing intersecting manifolds or manifolds with singularities.

Table 4.2: Geodesic distance comparison on an intersecting manifold.

Data/Method        ISOMAP Err%   TVG Err%
Curve of Viviani   34            1.98

Figure 4.5: Data sets of the multiple manifolds used for the clustering experiments.

4.5 Clustering multiple manifolds using TVG

4.5.1 Clustering synthetic data

In this section we evaluate the effectiveness of the Tensor Voting Graph in clustering multiple manifolds, and show quantitative comparisons. We compare our results to Spectral Clustering (SC [48]), Generalized PCA [60], Sparse Subspace Clustering (SSC [25]), and Spectral Clustering on Multiple Manifolds (SMMC [63]). Generalized PCA and SSC represent the state of the art for linear manifold clustering, and SMMC represents the state of the art in non-linear manifold clustering.

Data: The chosen datasets are three cases of multiple manifolds: one sphere inside another, two intersecting spheres, and two intersecting planes. In each experiment we used 2,000 randomly sampled points from the two manifolds. The kernel bandwidth in spectral clustering was tested using 1, 5, 10, 15, 20, 50, 100. The sparse regularization parameter in the SSC algorithm was tested using 0.001, 0.002, 0.005, 0.01, 0.1. The best accuracy over 10 trials of the different methods is tabulated in Table 4.3.

Table 4.3: Clustering accuracy on multiple manifolds.

Data/Method               SC     GPCA   SSC    SMMC   TVG
Big-small spheres         100%   51%    56%    100%   100%
Two intersecting spheres  78%    50%    53%    96%    98%
Two intersecting planes   60%    85%    93%    99%    99%

For the Tensor Voting Graph, we computed the affinity matrix using the suggested TVG as detailed in Section 4.3. We then provide the affinity matrix as input to the unnormalized spectral clustering algorithm; the output is the manifold cluster labels.
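For completeness, a minimal Python sketch of this unnormalized spectral clustering step applied to the TVG affinity matrix (following the standard formulation of [48]; the k-means routine from scikit-learn is an implementation choice, not part of the method's definition):

    import numpy as np
    from sklearn.cluster import KMeans

    def unnormalized_spectral_clustering(W, n_clusters):
        """Sketch: cluster points from an affinity matrix W using the
        unnormalized graph Laplacian L = D - W."""
        L = np.diag(W.sum(axis=1)) - W
        eigvals, eigvecs = np.linalg.eigh(L)
        U = eigvecs[:, :n_clusters]      # eigenvectors of the smallest eigenvalues
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(U)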
4.5.2 Experiment with a large amount of outliers

We also experiment with a large amount of outliers, adding 500 outliers to the intersecting spheres, and compare to Spectral Clustering on Multiple Manifolds. The Tensor Voting Graph method efficiently removes the outliers and achieves a clustering accuracy of 95% on the intersecting spheres data, which is significantly better than the 60% obtained using the SMMC algorithm [63]. The results clearly show the advantage of the TVG over competing methods, especially in the challenging cases of intersecting manifolds and in the presence of a large amount of outliers.

4.5.3 Experiments with real data: application to motion segmentation

For real data, we apply our method to the problem of motion segmentation. In this problem, we are given a set of feature points that are tracked through a sequence of video frames. We evaluate the TVG on the Hopkins155 motion database (http://www.vision.jhu.edu/data/hopkins155/), where the goal is to segment a video sequence into multiple spatiotemporal regions corresponding to different rigid-body motions. This problem can be solved by extracting and tracking a set of n feature points {x_{fi}}_{i=1}^{n}, x_{fi} ∈ R^2, through the frames f = 1, ..., F of the video. Each data point, referred to as a feature trajectory, corresponds to a 2F-dimensional vector obtained by stacking the feature points across the video: y_i = [x_{1i}^T, x_{2i}^T, ..., x_{Fi}^T]^T ∈ R^{2F}. Under the affine projection model, all the trajectories associated with a single rigid motion live in a 3-dimensional affine subspace. The relation between a tracked feature point and the corresponding 3D coordinates of the point on the object under the affine camera model is given by

x_{fP} = A_f \begin{pmatrix} X_P \\ 1 \end{pmatrix}    (4.7)

where X_P denotes the 3D coordinates of point P and A_f ∈ R^{2×4} is the affine motion matrix at frame f. Stacking all the F tracked feature points corresponding to each point on the object in a column, the following relation is obtained:

\begin{pmatrix} x_{11} & \cdots & x_{1P} \\ \vdots & & \vdots \\ x_{F1} & \cdots & x_{FP} \end{pmatrix} = \begin{pmatrix} A_1 \\ \vdots \\ A_F \end{pmatrix} \begin{pmatrix} X_1 & \cdots & X_P \\ 1 & \cdots & 1 \end{pmatrix}

Writing the equation above as W = MS^T, it can be shown that under the affine camera model the trajectories of the feature points of a single rigid motion lie in an affine subspace of R^{2F} of dimension at most three. Hence the 3D motion segmentation problem is to cluster these P trajectories into n different groups such that the trajectories in the same group represent a single rigid motion, and thus the problem of motion segmentation reduces to the clustering of data points drawn from a union of affine subspaces.
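A short Python sketch of this data-matrix construction, with synthetic (illustrative) values: stacking 2D tracks of a single rigid motion into the 2F×P matrix W and verifying that its rank is at most 4, i.e., that the trajectories span a 3-dimensional affine subspace:

    import numpy as np

    rng = np.random.default_rng(0)
    F, P = 10, 50                                 # frames and tracked points (illustrative)
    S = np.vstack([rng.normal(size=(3, P)),       # 3D points of one rigid object,
                   np.ones((1, P))])              # in homogeneous coordinates
    A = rng.normal(size=(F, 2, 4))                # a 2x4 affine motion matrix per frame
    W = np.vstack([A[f] @ S for f in range(F)])   # 2F x P trajectory matrix
    print(np.linalg.matrix_rank(W))               # at most 4 for a single rigid motion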
The database includes 155 motion sequences of indoor and outdoor scenes containing two or three motions, which can be divided into three main categories: articulated, checkerboard, and traffic scenes. The articulated sequences contain multiple objects moving dependently in 3D space, while both the checkerboard and traffic sequences contain multiple objects moving independently. In this case, solving the motion segmentation problem is equivalent to a linear manifold clustering problem [60].

Figure 4.6: Examples of frames from motion sequences of indoor and outdoor scenes containing two or three motions from the Hopkins 155 motion database.

We first evaluate the TVG method on the 155 motion sequences without outliers, and compare it to GPCA, SSC, and SMMC. SSC and SMMC represent the state-of-the-art methods on the 155-sequence motion segmentation dataset. The comparison in Table 4.4 shows that the TVG achieves results similar to SSC and SMMC, with the results obtained by all three methods within close range of 100% on the entire dataset.

Table 4.4: Clustering accuracy (%) of different methods on the Hopkins 155 motion segmentation database.

Method   Accuracy   Two Motions   Three Motions   All
GPCA     Mean       94.12%        77.30%          90.32%
         Median     98.84%        76.69%          96.89%
SSC      Mean       99.41%        97.71%          99.03%
         Median     100%          100%            100%
SMMC     Mean       99.51%        97.38%          99.03%
         Median     100%          99.39%          100%
TVG      Mean       98.60%        97.30%          98.3%
         Median     100%          98.48%          100%

4.5.4 Clustering with corrupted trajectories

In realistic scenarios, the data points can often be corrupted due to limitations of the tracker. For example, the tracker can lose track of feature points, which leads to gross errors. We examine the robustness of our method in the presence of a large amount of corrupted trajectories and compare it to SSC. Among the state-of-the-art methods, SSC is the only one that can handle corrupted trajectories, as it has demonstrated excellent results in both the outlier and outlier-free cases. The evaluation is performed with a large amount of outliers, as follows: for a tested sequence, we randomly select and corrupt 80% of a trajectory's entries, where the number of selected trajectories that are corrupted corresponds to different percentages of the data: {20%, 25%, 30%, 35%, 40%}. For each sequence we perform ten trials. We also apply the variant of the SSC algorithm designed for clustering corrupted data. The testing is performed on two sequences of two motions from the Hopkins155 dataset, 2R3RTC_g12 and cars10_g12. Figure 4.7 shows quantitative results for different levels of outlier noise for the sequence cars10_g12. As can be seen, the Tensor Voting Graph consistently achieves very high clustering accuracy for different amounts of outliers, such that the clustering errors degrade gracefully in their presence, and it outperforms SSC at all the tested noise levels.

Figure 4.7: Clustering accuracy [%] of TVG and SSC on the cars motion sequence, as a function of the percentage of trajectories corrupted with 80% outliers.

4.6 Conclusion

We have presented a general method that allows operations to be performed on smooth manifolds regardless of their topology. By embedding the data into our novel TVG, we developed a hybrid local-global framework which overcomes one of the main limitations of the Tensor Voting framework and is capable of efficiently learning the global features of the manifold. Results compared to the state-of-the-art methods in manifold learning and clustering demonstrate that our method performs significantly better, especially for manifolds with complex topological structures and in the presence of outliers. Future work involves explicit handling of junctions and intersections.

Chapter 5
Intersecting Manifolds: Detection, Segmentation, and Labeling

Resolving intersections in high-dimensional spaces is essential in multi-manifold clustering problems that arise in many applications, such as motion segmentation in computer vision [25].
Recently, a number of multi-manifold clustering algorithms have been proposed, in which a multi-way affinity measure between data points is used to capture complex structure in the data. Typically, such methods [63], [34], [32], [29] construct an affinity based on local tangent space distance in addition to Euclidean distance. However, despite the important progress made by this research, these methods only provide satisfactory results when the angle between the tangent planes is large (typically larger than π/4). Moreover, recent work shows that even though relatively few points may be located near the intersections, their contribution can be very disruptive to the global manifold structure estimation [5]. In this chapter, we propose a general unsupervised framework to learn a large class of manifolds, including manifolds with singularities, which is also robust against a large amount of outliers. At the core of our approach is the Tensor Voting Graph, which was introduced in Chapter 4. However, while the suggested TVG allows us to learn the manifolds efficiently globally, it does not address the local intersection area explicitly, and therefore, similar to other multi-manifold learning algorithms, suffers from the same shortcomings: it is limited to handling manifolds that intersect at relatively large principal angles, and it suffers from distortion of the local tangent space estimation around the local intersection area. Our framework for handling intersecting manifolds, or manifolds with singularities, differs from previous research on clustering multi-manifolds [63], [34], [32], [29] in that we explicitly and directly resolve the ambiguities near the intersections. In particular, we argue that the positions of the points on the manifolds near the intersections contain valuable information that is necessary to achieve high-performance clustering. To resolve complex geometric structures, we suggest decomposing the problem into three main stages (see Figure 5.1 for an illustration of our overall approach). Given a set of unlabeled points with unknown geometric structure, we first employ a data-driven approach, Tensor Voting, which uses the direct communication between data points to indicate whether intersections occurred and, most importantly, to provide a reliable estimation of the local support of the intersections. Using the smooth manifold parts, we construct a graph in which the affinities between the data points are based on a local tangent space distance. The smooth manifold parts are then extracted using spectral clustering. The next stage performs an ambiguity resolution algorithm in the local singularity area, using the classified smooth manifolds and the positions of the points near the singularities. We show the advantage of our explicit and direct approach to resolving manifold intersections on a wide range of complex geometric settings, where it outperforms the state-of-the-art methods in multi-manifold clustering.

Figure 5.1: Flow chart of the proposed method: input points are processed by Tensor Voting to separate smooth areas from the intersection area; the Tensor Voting Graph and spectral clustering label the smooth parts, and ambiguity resolution labels the intersection area, yielding the final clustering.

5.1 Related work

The multi-manifold case addresses a general setting where the clusters are low-dimensional manifolds that may intersect or overlap. Many situations exist where the data is formed by a number of manifolds.
The complexity of the multi-manifold class of distributions is governed by the minimum of the manifold curvatures, the branch separations, and the overlap between distinct manifolds [32]. Early methods in multi-manifold clustering, such as [65], assumed that the manifolds are well separated. Generalized PCA [60] and Sparse Subspace Clustering [25] were suggested to address clustering of intersecting linear multi-manifolds. Recently, a number of methods were suggested to address the challenging problem of non-linear intersecting multi-manifolds. [32] developed a spectral clustering method within a semi-supervised learning framework. As complementary approaches, Robust Multiple Manifold Structure Learning [34], Spectral Clustering on Multiple Manifolds [63], and Spectral Clustering using local PCA [29] are unsupervised learning methods which propose similar approaches for clustering intersecting manifolds. Spectral Clustering using local PCA also provides a deep and elegant theoretical analysis of multi-manifold learning in the context of resolving intersections. Note, however, that the algorithm suggested in [32] uses a coarsening step, which can hinder a careful treatment of the intersections. The Tensor Voting Graph (TVG), presented in Chapter 4, was suggested to address the limitation of the local Tensor Voting method, and can perform global operations such as estimating geodesic distances or clustering on single or multiple manifolds which are intersecting. However, similar to other multi-manifold learning algorithms, the TVG does not address intersections explicitly.

Junction type inference: 2D junctions are a special case of intersections with important applications in image understanding. Studies of human perception and recognition indicate that 2D junctions play a fundamental role in many perceptual grouping tasks, such as contour grouping, object detection, and shape recognition [7], [62]. The inherent property of junctions is that their detection and characterization are ambiguous, which poses a very challenging task. As a result, previous methods either discard or ignore this information, which is necessary to perform junction type inference in an unsupervised manner.

Figure 5.2: The ball eigenvalue λ_N (shown on the right-hand side) as a function of the positions of points sampled from two intersecting circles (shown on the left-hand side).

5.2 Our Approach

We suggest a process that directly untangles the ambiguities in the local intersection area by aggregating support from the smooth manifold parts. Note that while the suggested TVG allows us to learn the manifolds efficiently globally, it does not address the local intersection area explicitly, and therefore, similar to other multi-manifold learning algorithms, suffers from the same shortcomings, such as being limited to handling manifolds intersecting at relatively large principal angles, and distortion of the local tangent space estimation around the local intersection area. Motivated by these shortcomings, we suggest an ambiguity resolution algorithm using three main processing steps, which we detail in the following sections.
Figure 5.3: Flow chart of the proposed ambiguity resolution algorithm: while the local intersection set is not empty, choose the nearest point from the intersection area, estimate a tangent with each manifold, choose the manifold with the smallest tangent space variation, and update that manifold with the classified point.

5.2.1 Intersection Delineation

The first step in our process is to estimate the dimensionality, tangent space, and normal space at every point using Tensor Voting. Given a set of unlabeled points x = {x_i}_{i=1}^{n}, x_i ∈ R^N, lying on K smooth intersecting manifolds M_1, ..., M_K, let X_J = {x_j ∈ M_i ∩ M_j | M_i ∩ M_j ≠ ∅} denote the set of intersection points. The set of points which corresponds to the support of the manifold intersections will be referred to as the decision set points, defined as

\tilde{X}_J = \{ x_j \in k_{nn}(x_i) \mid x_i \in X_J \}    (5.1)

Algorithm 2: Ambiguity Resolution Algorithm
Input: labeled manifolds {M_r, T_{M_r}}_{r=1}^{K}, unlabeled intersection area points \tilde{X}_J, nearest neighbor parameter k.
1. Set X_J^{new} = \tilde{X}_J.
while X_J^{new} ≠ ∅ do
  2. Set i = 1; find \hat{x}_j = argmin ||x_i − \hat{x}_j||_2, x_i ∈ M_i, \hat{x}_j ∈ X_J^{new}.
  3. Extract the sub-manifolds \tilde{M}_r = {x_r ∈ k_{nn}(\hat{x}_j) | x_r ∈ M_r} for all r = 1, ..., K.
  4. Estimate T_{\tilde{M}_r}(\hat{x}_j) for all r = 1, ..., K.
  5. Compute θ_r(\hat{x}_j) = Σ_{j=1}^{k} arccos(|⟨ \hat{n}^{max}_{\tilde{M}_r}(x_j), \hat{n}^{max}_{\tilde{M}_r}(\hat{x}_j) ⟩|) for all r = 1, ..., K.
  6. Add \hat{x}_j to M_j such that θ_j(\hat{x}_j) = min{θ_r(\hat{x}_j), r = 1, ..., K}.
  7. Update: X_J^{new} = X_J^{new} \ \hat{x}_j.
  8. Process steps (3–7) with i = i + 1 if i < K, else reset i = 1.
end
Output: labeled local intersection area points {\hat{x}_{j_i}}_{i=1}^{K} ∈ M_i and their corresponding tangent spaces {T_{M_i}(\hat{x}_{j_i})}_{i=1}^{K}.

To delineate intersections and their local support, we analyze the tensor at each point x_i. Votes are inconsistent only in the area of intersection, which is characterized by sharp transitions of the eigenvalues in the non-smooth parts. There are two alternatives for identifying the local intersection area. The eigenvalue λ_N is adequate to identify the local intersection area in any latent dimension, since normal votes are received in the local intersection area at different angles and directions from points lying on a different manifold (see Figure 5.2 for an illustration of the ball component eigenvalue as a function of position for the two intersecting circles). The second alternative is to use the eigenvalue λ_{d+1}, where d corresponds to the normal dimension of the manifold, to identify the local intersection area. In the smooth parts, the eigenvalue λ_{d+1} is very small, while in the local intersection area the dimensionality of the normal space increases by 1, and hence the corresponding eigenvalue λ_{d+1} is significantly larger than in the smooth parts. Note that in Figure 5.2 these two cases coincide, since λ_{d+1} equals the ball eigenvalue. To estimate which points correspond to the local intersection area, we compute the standard deviation of the eigenvalue λ_{d+1} over all points:

\sigma = \left( \frac{1}{n} \sum_{i=1}^{n} \left( \lambda_{d+1}(x_i) - \bar{\lambda}_{d+1} \right)^2 \right)^{\frac{1}{2}}    (5.2)

where λ_{d+1}(x_i) corresponds to the ball eigenvalue of point x_i and \bar{\lambda}_{d+1} corresponds to the mean of the ball eigenvalues. We identify a point x_i as belonging to the local intersection area if λ_{d+1}(x_i) > 2σ, and all such points are removed from further processing, since their geometric structure information is not reliable. Note also that this threshold is not critical, since the transitions are sharp and distinctive, and only the local intersection area points are characterized by high values of the eigenvalue λ_{d+1}.
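A small Python sketch of this thresholding rule, assuming the eigenvalues λ_{d+1}(x_i) have already been collected into an array:

    import numpy as np

    def intersection_candidates(lam_d1):
        """Sketch of the intersection delineation test (Eq. 5.2): flag points
        whose lambda_{d+1} exceeds twice the standard deviation."""
        sigma = lam_d1.std()
        return np.nonzero(lam_d1 > 2.0 * sigma)[0]   # indices of decision-set points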
5.2.2 Global representation of the smooth manifold parts

The second stage is to infer the global structure of the smooth manifold parts, from which the intersection areas have been removed. The TVG (X_C, W_C) is constructed for the points X_C corresponding to the smooth manifold parts (X_C = X \ \tilde{X}_J), i.e., the local intersection area points \tilde{X}_J are removed from X. Finally, spectral clustering is applied to the affinity matrix W_C to assign points to manifold labels.

5.2.3 Ambiguity Resolution

We elaborate on an iterative algorithm that can be cast as a semi-supervised learning algorithm which incrementally aggregates support from the labeled smooth manifold parts to determine the labels and geometric structure of the local intersection area. Based on the manifold smoothness properties, in the local intersection area the local tangent space variation is smaller among pairs of points which belong to the same manifold. In the Appendix we provide a theoretical analysis in which we prove that, given two sub-manifolds, under mild conditions the maximal principal angle between the local tangent spaces is smaller for points which belong to the same manifold in the local intersection area. Thus, given three points which are sufficiently close to the local intersection area, of which only two belong to the same manifold, the maximal principal angle will be smaller for the pair which belongs to the same manifold. This result serves as the basis for our ambiguity resolution algorithm, which allows us to untangle the manifolds in the local intersection area. Formally, our objective is to reconstruct the labels of the decision set points \tilde{X}_J and their corresponding tangent spaces T(\tilde{X}_J) such that manifold smoothness is maximized in the local intersection area. This task can be performed by minimizing the total variation of the tangent spaces. For each point in the local intersection area, we estimate its local tangent space independently using each of the nearby manifolds (which are known at this stage) and assign it to the manifold for which the total tangent space variation is minimal.

Algorithm description: We describe the algorithm for reconstructing the decision set points (the flow chart of the algorithm is illustrated in Figure 5.3). Let X_C be the labeled manifold data, and let {T_{M_i}(X_C)} be the corresponding tangent spaces. G_C = (X_C, W_C) is the Tensor Voting Graph, with W_C corresponding to the affinity matrix between the labeled manifolds. \tilde{X}_J is the set of unlabeled points which corresponds to the local intersection area. G_C = (X_C, W_C), together with the positions of the local intersection area points \tilde{X}_J, serves as input to the ambiguity resolution algorithm. The goal is to find the true labels of the points in the local intersection area and to obtain a reliable estimation of their tangent spaces. We begin by selecting the point x* from the local intersection area which is the nearest neighbor to one of the manifolds, x* = argmin ||X_C − \hat{x}_j||_2, \hat{x}_j ∈ \tilde{X}_J, and compute its tangent spaces T_{M_1}(x*), T_{M_2}(x*), ..., T_{M_K}(x*) induced by its k nearest neighbors in each one of the manifolds M_1, M_2, ..., M_K. We then classify x* as belonging to the manifold M* for which the tangent space variation θ(x*) = Σ_{j=1}^{k} arccos(|⟨ \hat{n}^{max}_{M}(x_j), \hat{n}^{max}_{M}(x*) ⟩|) is minimal. We add x* and T_{M*}(x*) to the corresponding manifold M* and remove x* from the decision set: \tilde{X}_J^{new} = \tilde{X}_J \ x*. In a similar way we process all the remaining decision set points \tilde{X}_J^{new} until the procedure is exhausted. The output is the labels of all the decision set points and their corresponding tangent spaces.
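A condensed Python sketch of this greedy loop, under simplifying assumptions: each manifold's tangent structure is summarized by one dominant unit normal per labeled point, and the tangent space induced at a candidate point is approximated by averaging the normals of its k nearest labeled neighbors (the function and variable names are hypothetical):

    import numpy as np

    def resolve_ambiguities(X_J, manifolds, normals, k=10):
        """Sketch of ambiguity resolution: greedily assign each intersection
        area point to the manifold with the smallest tangent space variation.
        manifolds[r]: labeled points of M_r (rows); normals[r]: their unit
        dominant normals (matching rows)."""
        labels = {}
        remaining = list(range(len(X_J)))
        while remaining:
            # pick the unlabeled point closest to any labeled manifold point
            all_pts = np.vstack(manifolds)
            i = min(remaining,
                    key=lambda t: np.linalg.norm(all_pts - X_J[t], axis=1).min())
            best = (np.inf, None, None)         # (variation, manifold id, normal)
            for r in range(len(manifolds)):
                idx = np.argsort(np.linalg.norm(manifolds[r] - X_J[i], axis=1))[:k]
                n_hat = normals[r][idx].mean(axis=0)
                n_hat /= np.linalg.norm(n_hat)  # normal induced at X_J[i] by M_r
                var = np.arccos(np.clip(np.abs(normals[r][idx] @ n_hat), 0, 1)).sum()
                if var < best[0]:
                    best = (var, r, n_hat)
            _, r_star, n_star = best
            labels[i] = r_star
            # update M_{r*} with the newly classified point and its normal
            manifolds[r_star] = np.vstack([manifolds[r_star], X_J[i]])
            normals[r_star] = np.vstack([normals[r_star], n_star])
            remaining.remove(i)
        return labels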
In the suggested greedy algorithm, the computational complexity amounts to estimating the tangent space using Tensor Voting for all the local intersection area points, which requires only O(jNk log k), where N is the dimension of the ambient space, k corresponds to the number of nearest neighbors, and j is the number of local intersection area points, which typically constitutes a small portion of the total number of points n.

5.3 Experimental Results with the Ambiguity Resolution Algorithm

In the following sections we show experimental results on clustering intersecting manifolds, using our extended approach of the TVG combined with the ambiguity resolution detailed in Section 5.2.

5.3.1 Clustering and junction type inference in 2D

We show clustering results and a novel application of our ambiguity resolution approach to junction type inference. The importance of junctions in perceptual organization for tasks such as occlusion detection and figure completion has long been recognized [61]; yet for the algorithmic design of automatic junction inference there remain many open questions. We first show experimental results of clustering performance on a 2D dataset which consists of typical junctions such as K, Y, V, X, and three intersecting lines (see Figure 5.4). The results, reported in Table 5.1, demonstrate our method's ability to achieve high clustering performance on these 2D junctions. Once the ambiguities near the junction are resolved, further important clustering and classification tasks can be performed, such as junction type inference. We present a scheme for junction type inference that integrates our approach with first-order Tensor Voting, which provides an estimate at each point of whether it lies on an end point of a curve. Previous research on inferring junctions and junction types includes [47] and [57]. In [57], a process to simultaneously infer curves and junctions was suggested, in which the local junction area was first trimmed, followed by a curve extension method employed to artificially reconstruct the junction area. However, this strategy yields an artificial junction and local junction area, and did not always produce satisfactory results. Moreover, prematurely discarding these points also removes crucial information about the junction's local properties, such as the accurate location of the junction, the number of connected components, and the tangent space information of the points near the junction. Using the TVG and the ambiguity resolution algorithm, we present a novel application of unsupervised junction type inference which can automatically classify junctions such as T, X, Y, V, and K. The junction type inference is performed as follows: we compute the Tensor Voting Graph, which provides the affinity graph G, and compute its graph Laplacian. The number of clusters of the affinity matrix can be obtained from the multiplicity k of the eigenvalue 0 of L [44], [65]. This provides us with the number of connected components in the affinity graph, and hence the number of manifolds intersecting at the given junction. The number of clusters is then used as an input for the smooth global structure estimation.
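A brief Python sketch of this component-counting step (the numerical tolerance is an assumption, since in practice the relevant eigenvalues are only approximately zero):

    import numpy as np

    def num_components(W, tol=1e-8):
        """Sketch: estimate the number of connected components of the
        affinity graph as the multiplicity of the eigenvalue 0 of the
        graph Laplacian L = D - W."""
        L = np.diag(W.sum(axis=1)) - W
        eigvals = np.linalg.eigvalsh(L)
        return int(np.sum(eigvals < tol))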
After the ambiguity resolution algorithm is performed and the points in the local junction area are classified, we perform first-order voting in the TV scheme [45] to infer whether a junction point is an end point or an inner point of each of the manifolds. Combining the number of manifold components with the first-order information provides a unique description from which to infer the junction type, such as T, X, Y, V, or K. An illustration of our scheme for automatic junction inference is shown in Figure 5.5.

Table 5.1: Classification results of junctions in 2D.

                            Y junction   X junction   K junction   T junction
Outside intersection area   100%         100%         100%         100%
Intersection area           99.8%        100%         99.8%        100%

5.3.2 Experimental results without outliers

We experimented with synthetic and real datasets in various challenging geometric configurations, such as when the maximal principal angle between the tangent spaces at the intersection points is smaller than 40 degrees.

Figure 5.4: Junctions dataset used in the 2D intersection experiments (panels (a)-(d)).

For comparison and evaluation against the state of the art, we experimented with the following datasets: (1) two circles intersecting at 18 degrees, (2) two planes intersecting at 40 degrees, (3) two Mobius bands, (4) two intersecting spheres, and (5) a Swiss roll intersecting with a plane. The manifolds were uniformly sampled with n = 1000 points for each plane, circle, and sphere, n = 2000 points for the Mobius bands, and n = 2000 points for the Swiss roll. Each simulation was repeated 10 times. We also compared our method to state-of-the-art algorithms in clustering multiple manifolds: Spectral Clustering on Multiple Manifolds (SMMC) [63], and SSC [25], which is a state-of-the-art method in clustering linear intersecting manifolds. For the choice of parameters, we tested the nearest neighborhood size k ∈ {10, 20, 30, 40, 50, 60, 70, 80}. For the second parameter in SMMC and SSC we tested values in {10, 20, 30, 40, 50, 60, 70, 80} and {0.001, 0.01, 0.1, 1}, respectively. The results are reported for the best choice of parameters for each method.

Figure 5.5: Junction type inference illustration: (a) X junction, (b) K junction, (c) T junction, (d) V junction, (e) illustration of junction type inference.

Note that Sparse Subspace Clustering [25] was only compared in the case of the intersecting planes, since it is only adequate for linear manifolds. In our method, we chose a scale σ such that the average number of votes received from each point in the Tensor Voting iteration equals n/20, and the number of k nearest neighbors on the Tensor Voting Graph was tested in {n/40, n/40 + 5, n/40 + 10}. We report the classification accuracy percentage for each dataset, both for the set of points which corresponds to the area near the intersection and for the rest of the points. Note that the most relevant statistic is the clustering accuracy in the area near the intersections.
The comparison results in Table 5.2 show that our method consistently outperforms the state of the art both near the intersection areas and in the smooth areas, and in particular in the challenging geometric setting where the principal angle at the intersection point is smaller than π/8 (such as in the case of the intersecting planes or the two circles).

Figure 5.6: Manifolds dataset used in Table 5.2.

Finally, we highlight both quantitative and qualitative differences between the TVG and the modified approach to handling intersections. Table 5.5 shows the average angular error of the tangent space estimates for the two intersecting planes and the two circles, using Tensor Voting and the newly proposed method. Even though the average error obtained using standard TV seems relatively marginal, the clustering performance using the TVG deteriorates as the principal angles become smaller (which is also the case for all the other existing methods). Using the new approach, the error of the tangent space is reduced and the clustering results are significantly improved. We also note that the choice of parameters is not critical, and the method is robust across a wide range of parameter selections for the k nearest neighbors on the graph and for the scale of Tensor Voting. However, truly automatic parameter selection remains an open problem for future research, as it does for all existing intersecting-manifold algorithms [32], [29].

Table 5.2: Comparison with the state of the art (classification accuracy, outside the intersection area / in the intersection area).

Dataset / Method         SSC [25]          SMMC [63]         TVG + Ambiguity Resolution
Two circles              -       -         69.47%   59.09%   100%     99.22%
Two Mobius bands         -       -         95.14%   75.3%    99.98%   98.44%
Two spheres              -       -         96.79%   80.33%   100%     98.58%
Two planes               71%     59.58%    72.22%   59.58%   99.9%    96.07%
Swiss roll and a plane   -       -         96.5%    95.57%   99.95%   95.9%

Table 5.3: Comparison results in the presence of outliers (outside the intersection area / in the intersection area).

Dataset / Method         SMMC [63]         TVG + Ambiguity Resolution
Two circles              65.91%   59.72%   99.75%   94.5%
Two Mobius bands         87.09%   62.31%   99.94%   98.09%
Two spheres              54.9%    53.9%    99.61%   90.05%
Two planes               60%      58%      94.16%   99.94%
Swiss roll and a plane   59.03%   52.37%   98.34%   97.29%

5.3.3 Experiments with outliers

We also apply our method in the presence of a large amount of outliers. Robustness to outlier noise is a critical issue in manifold clustering, and current methods are very sensitive to their presence. This shortcoming was pointed out in previous works [63], [34] as a challenging open problem and was partially addressed in [34], although using only a number of outliers equal to 10% of the number of inliers. The Tensor Voting framework, on the other hand, is robust to outliers, and since it serves as an integral part of our method, removing the outliers by examining the eigenvalues of the tensors obtained after the first TV iteration is straightforward to incorporate into our scheme. We experiment with outlier noise using the same intersecting manifolds as in the previous section.

Figure 5.7: Manifolds with outliers used in the experiments: (a) intersecting Mobius bands, (b) Swiss roll intersecting a plane, (c) intersecting spheres, (d) intersecting planes.

The two circles, Mobius bands, spheres, two planes, and Swiss-roll-and-plane manifolds are contaminated with 1000, 2000, 1500, 1500, and 1500 outliers, respectively. To remove outliers, the eigenvalues λ_1 of the tensor at each point are sorted, and we remove the points which correspond to the smallest sorted eigenvalues.
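A minimal Python sketch of this outlier-removal rule (the fraction of points kept is an assumption; the text only specifies that the points with the smallest λ_1 are discarded):

    import numpy as np

    def remove_outliers(X, lam1, keep_frac=0.6):
        """Sketch: sort points by the largest tensor eigenvalue lambda_1
        and keep the most salient fraction; a low lambda_1 indicates an
        outlier that received little voting support."""
        order = np.argsort(lam1)[::-1]                 # most salient first
        keep = order[: int(keep_frac * len(lam1))]
        return X[keep], keep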
The two circles, Mobius bands, spheres, two planes, and Swiss roll intersecting with a plane are contaminated with 1000, 2000, 1500, 1500, and 1500 outliers, respectively. To remove outliers, the eigenvalues $\lambda_1$ of the tensors obtained at each point are sorted, and we remove the points which correspond to the smallest sorted eigenvalues. The experimental results, shown in Table 5.3, demonstrate that our method is robust against outliers, while outlier noise severely affects the results of the other methods.

[Figure 5.8: Evaluation on a challenging dataset of manifolds with small maximal principal angle, revealing the degradation in performance of both linear and non-linear multi-manifold clustering methods. Panels: (a), (f) ground truth; (b), (g) Sparse Subspace Clustering; (c), (h) SMMC; (d), (i) TVG; (e), (j) TVG + Ambiguity Resolution]

5.3.4 Experiments with manifolds embedded in high dimensional spaces

We also apply our method to manifolds which reside in a high dimensional space, for two cases: 1) intersecting spheres, corresponding to 2D manifolds, and 2) intersecting hyper-spheres, corresponding to 3D manifolds. In each case, the manifolds were generated using uniform sampling with 2,000 samples per manifold in 3D and 4D, respectively, and were then embedded in 50-D using a random orthonormal matrix. The experimental results, shown in Table 5.4, demonstrate that our method remains robust when applied in the high dimensional space, both in the area near the intersections and in the smooth parts.

Table 5.4: Manifolds in high dimensional space

                          SMMC [63]                TVG + Ambiguity Resolution
Dataset                   outside    intersection  outside    intersection
                          area       area          area       area
2D sphere in 50D          95.56%     94.04%        97.64%     99.3%
3D hyper-sphere in 50D    85.5%      62.16%        98.71%     94.2%

5.3.5 Experiments with Real Data sets

For experiments with real data sets, we tested our method on the problems of human action classification and two-view motion segmentation.

Motion capture using the CMU Motion Capture data set. Classification of human motion sequences as a preprocessing step is important for many tasks in video annotation. The CMU Motion Capture data set is a popular and widely used real data set for motion capture. In order to perform the evaluation in a strictly unsupervised framework, we remove the temporal information from the data, so that the data provided corresponds to static information. In this case, the problem can be considered as clustering multiple manifolds with edge-type singularities, which correspond to abrupt changes due to the transition from one human action to a different motion activity. We chose five mixed sequences from subject 86, which include mixed activities such as walking, turning around, sitting, running, jumping, squats, and stretching. We extract approximately 500 frames per sequence, as this corresponds to two or three distinct motion activities. Each point corresponds to a human pose, which is represented by a 62-dimensional feature vector. The experimental comparisons in Table 5.6 show that our method outperforms the state of the art. The errors obtained using our framework correspond to frames which occur during transitions between different motion activities, which are difficult even for humans to evaluate.
Motion segmentation using the 155 motion segmentation benchmark. Next we show an evaluation on the problem of motion segmentation from only two views, using the 155 motion segmentation benchmark. We evaluate our method on two image sequences with perspective effects [42], and compare it to SSC [25], which showed state of the art results for motion segmentation based on feature trajectories. Segmenting motions using only two views is a challenging task, since the feature trajectories lie on quadratic surfaces of dimension at most 3 in $\mathbb{R}^4$ [2], which may be overlapping or intersecting. Applying our method to motion segmentation achieves an average classification error of 10.8%, outperforming SSC, which obtained a 20.43% classification error.

Table 5.5: Tangent space average angular error for the two intersecting planes and the two intersecting circles (Figure 5.8)

Method / sequence    two circles    two planes
Tensor Voting        0.4%           1.8%
New method           0.01%          0.17%

Table 5.6: Classification results of human activities on motion capture data

Data / Method    SMMC      TVG + Ambiguity Resolution
CMU MoCap        87.06%    96.01%

5.4 Discussion

We have presented a general method that allows us to efficiently perform operations on smooth, but not necessarily planar, manifolds, and on manifolds with singularities. Our framework is designed to learn complex geometric structures both locally and globally, and is particularly well suited to manifolds with singularities, such as multiple intersecting manifolds, and even non-manifolds. We have suggested a novel method for unsupervised clustering of intersecting multi-manifolds that explicitly addresses and resolves the ambiguities near the intersections, including convoluted geometric situations such as when the principal angle between the tangent spaces at the intersection is small. Another advantage of our approach is its robustness to a very large amount of outliers: it maintains high performance even when the number of outliers exceeds 100% of the total number of inlier points. Experimental results on a wide range of data sets demonstrate that our method performs clustering with very high accuracy in all of these situations, and significantly outperforms the state of the art, especially for manifolds with complex topological structures and in the presence of outliers. The main limitation of the current framework is robustness to inlier noise: the method may fail in the presence of large amounts of noise in the intersection area itself. Thus, one of the important directions for future work is to extend the current framework to handle inlier noise in high dimensional spaces. In addition, an algorithmic design for automatic parameter selection for the problem of intersecting manifolds, which is currently an open problem, would also benefit our framework.

Chapter 6

Manifold Frequency Denoising

In this chapter we propose a new framework for manifold denoising based on processing in the graph Fourier frequency domain. The motivation for this work is that while the Tensor Voting Graph approach presented in the previous chapters provides effective tools to analyze manifolds, it relies on local structure estimation using Tensor Voting, which may degrade significantly in the presence of a large amount of inlier noise, i.e., when the data does not lie strictly on the manifold.
Since in practice there are many applications where the data may contain a non-trivial amount of inlier noise, we consider the ability to perform efficient regularization to be a crucial building block that allows us to use the tools developed in the previous chapters.

6.1 Introduction

Manifold learning has been proposed to extend linear approaches such as PCA to address the more general case where the data lies on a non-linear manifold. Existing manifold learning algorithms [58], [51], [4], [66] provide effective tools to analyze high dimensional data with a complex structure when the data lies strictly on the manifold.

[Figure 6.1: Illustration of our method: noisy data → similarity graph W → graph Laplacian L = D − W → spectral graph wavelet transform → remove high frequency bands (threshold) → inverse spectral graph wavelet transform → denoised data]

However, in the presence of noise, i.e., when the observed data does not lie exactly on the manifold, the performance of these methods degrades significantly. Only a handful of methods have been suggested to handle noisy manifolds in a strictly unsupervised manner, e.g., [39], [36], but their main shortcoming is that they tend to over-penalize either the local or the global structure of the manifold.

In this chapter, we address the manifold denoising problem using a graph-frequency framework called Manifold Frequency Denoising (MFD). Our approach is based on processing with Spectral Graph Wavelets (SGW) [37]. Similar to the time and frequency localization trade-offs provided by wavelets in regular signal domains, SGWs provide a trade-off between spectral and vertex domain localization. In the context of machine learning, this property allows us to overcome the limitations of existing manifold denoising methods by providing a regularization framework in which the output denoised manifold is locally smooth without over-fitting the global manifold structure.

In our proposed framework (see Figure 6.1 for an illustration), we build a graph in which each vertex corresponds to one of the noisy observations, with the edge weight between two vertices a function of the distance between the corresponding observations in the ambient space. We then apply the SGW to several graph signals, where each graph signal corresponds to one of the dimensions and assigns the scalar coordinate in that dimension to the corresponding vertex. Thus, our graph is based on vector distances between observations, while denoising is applied to the observed coordinates in each dimension.

In this chapter, we theoretically justify our approach by showing that for smooth manifolds the coordinate signals also exhibit smoothness (i.e., the maximum variation across neighboring nodes is bounded). This is first demonstrated in the case of noiseless observations, by proving that manifolds with smoother characteristics lead to energy more concentrated in the lower frequencies. Moreover, it is shown that the higher frequency wavelet coefficients decay in a way that depends on the smoothness properties of the manifold. We then show that the manifold smoothness properties induce a similar decay characteristic on the spectral wavelet transform of the noisy signal.
The effect of noise is relatively small for noisy graphs in which a large fraction of the edges of the noiseless graph remain connected. Our experimental study also demonstrates that graph signal processing methods are effective for processing smooth manifolds, since in a graph signal defined on these manifolds most of the energy is concentrated in the low frequencies, making it easier to separate noise from signal information.

To the best of our knowledge, MFD is the first attempt to use graph signal processing tools ([55]) for manifold denoising. It differs from previous work on image denoising based on graph signal processing [50] in that the data is unstructured, and our smoothness prior explicitly assumes that the original data lies on a smooth manifold.

Another crucial aspect of manifold denoising is the efficiency and robustness of the process. Most current manifold denoising algorithms consist of iterative, global or semi-global operations, which may also be sensitive to parameter selection. In contrast, our denoising approach provides a fast, non-iterative process with low computational complexity that scales linearly in the number of points for sparse data. It is robust over a large range of parameter selections, in particular the selection of k, the number of nearest neighbors used to construct the graph. In addition, our approach does not require knowledge of the intrinsic dimensionality of the manifold.

Experimental results on complex manifolds and real world data, including motion capture and face expression datasets, demonstrate that our framework significantly outperforms the state of the art, so that, after denoising, it is possible to use current manifold learning approaches even for challenging complex smooth manifolds. Quantitatively, denoising using MFD significantly outperforms the state of the art denoising methods for a wide range of k nearest neighbor selections, both on synthetic and real datasets. In addition, our approach is shown to degrade gracefully as noise levels increase, while still preserving both local and global manifold structure, and to be robust to graph construction parameters (e.g., the number of k nearest neighbors used).

This chapter is organized as follows. In Section 6.2 we summarize the related work. In Section 6.3 we introduce the notation and preliminaries, and in Section 6.4 we provide an overview of Spectral Graph Wavelets. Section 6.5 presents our main theoretical results, and Section 6.6 describes our new approach for manifold denoising. The experimental results are provided in Section 6.7, and in Section 6.8 we conclude and suggest future work.

6.2 Related Work

Denoising is very important for practical manifold learning, as most manifold learning approaches, e.g., Isomap ([58]), LLE ([51]), LE ([4]), LTSA ([66]), and HLLE ([23]), assume that the data lies strictly on the manifold and are known to be very sensitive to noise. Thus a number of methods have been proposed to handle noisy manifold data. The state of the art methods include manifold denoising (MD, [39]) and locally linear denoising (LLD, [36]). Also related are statistical modeling approaches to manifold learning, such as probabilistic non-linear PCA with Gaussian process latent variable models (GP-LVM, [41]) and its variants for manifold denoising ([30]).
Among the state of the art methods, the one most related to our work is MD, which applies a diffusion process on the graph Laplacian, using an iterative procedure that solves differential equations on the graph. It is shown that each iteration of the diffusion process in MD is equivalent to the solution of a regularization problem on the graph ([39]); this graph regularization problem is also known as Tikhonov regularization. The main limitations of MD are over-smoothing of the data and sensitivity to the choice of the k nearest neighbor graph construction (as mentioned in [39]), especially at high noise levels.

Our work is inspired by a classical approach to wavelet-based denoising ([22]) and its many extensions to image denoising ([11]). Some recent work ([50]) has explored graph-based techniques for image denoising on structured domains. However, the more general case of irregular domains is much less understood, and to the best of our knowledge, this work is the first attempt to explore spectral wavelets for manifold denoising on unstructured data. Spectral Graph Wavelets (SGW, [37]) provide an efficient tool for selecting spectral- and vertex-domain localization, and are a key component of our method (see Section 6.4 for more details).

6.3 Preliminaries

Consider a set of points $x = \{x_i\}$, $i = 1, \dots, N$, $x_i \in \mathbb{R}^D$, sampled from an unknown manifold $M$. An undirected, weighted graph $G = (V, E)$ is constructed over $x$, where $V$ corresponds to the nodes and $E$ to the set of edges of the graph. The adjacency matrix $W = (w_{ij})$ consists of the weights $w_{ij}$ between node $i$ and node $j$. In this work, the weights are chosen using the Gaussian kernel function

$W_{ij} = \begin{cases} \exp\left(-\frac{\|x_i - x_j\|_2^2}{2\sigma_D^2}\right) & \text{if } x_j \in kNN(x_i) \\ 0 & \text{otherwise} \end{cases} \qquad (6.1)$

where $\|\cdot\|$ denotes the L2 distance between the points $x_i$, $x_j$, $kNN(x_i)$ denotes the k nearest neighbors of $x_i$, and $\sigma_D$ is a parameter. The degree $d(i)$ of vertex $i$ is defined as the sum of the weights of the edges connected to $i$. In order to characterize the global smoothness of a function $f \in \mathbb{R}^N$, we define its graph Laplacian quadratic form with respect to the graph as

$\|\nabla f\|^2 = \sum_{i \sim j} w_{ij}\,(f(i) - f(j))^2 = f^T L f \qquad (6.2)$

where $i \sim j$ if $i$ and $j$ are connected on the graph by an edge, and $L$ denotes the combinatorial graph Laplacian, defined as $L = D - W$, with $D$ the diagonal degree matrix with entries $d_{ii} = d(i)$. The eigenvalues and eigenvectors of $L$ are $0 = \lambda_1 \le \dots \le \lambda_N$ and $\chi_1, \dots, \chi_N$, respectively. The Graph Fourier Transform (GFT) $\hat f$ of a function $f$ (defined over the vertices of the graph $G$) is the expansion of $f$ in terms of the eigenvectors of the graph Laplacian:

$\hat f(\lambda_l) = \sum_i f(i)\,\chi_l(i) \qquad (6.3)$

6.3.1 Model Assumptions and Previous Results in Manifold Learning

We introduce some notation and recall the definitions of the condition number $1/\tau$ ([49]), which provides an efficient measure capturing both the local and global geometric properties of a manifold, and of the geodesic covering regularity of the manifold ([3]). For each $x_i \in M$, let $T_{x_i}M$ and $T^{\perp}_{x_i}M$ denote the tangent space and normal space to $M$ at $x_i$, respectively. $B_D(x_i, r)$ is an open ball in $\mathbb{R}^D$ centered at $x_i$ with radius $r$. The fiber of size $r$ at $x_i$ is defined as $L^r_{x_i} = T^{\perp}_{x_i}M \cap B_D(x_i, r)$. Given $\varepsilon > 0$, if $\varepsilon < \tau$, where $1/\tau$ is the condition number defined below, then the tube $M_\varepsilon$ around $M$ is a disjoint union of its fibers ([49]):

$M_\varepsilon = \bigcup_{x_i \in M} T^{\perp}_{x_i}M \cap B_D(x_i, \varepsilon) \qquad (6.4)$
Definition 1 (Condition Number, [49]). The condition number of a manifold $M$ is $1/\tau$, where $\tau$ is the largest number such that every point at distance less than $\tau$ from $M$ has a unique projection onto $M$. Note that $\tau$ is small, and the condition number large, if $M$ is highly curved or close to self-intersecting.

Given two points $x_i, x_j \in M$, let $d_M(x_i, x_j)$ denote the geodesic distance between the points $x_i, x_j$. Also note that, given a set of points $A$, $|A|$ denotes the number of points in $A$.

Definition 2 (Covering Number, [3]). Given $T > 0$, the covering number $G(T)$ of a compact manifold $M$ is defined as the smallest number such that there exists a set $A$, with $G(T) = |A|$, which satisfies

$\min_{a \in A} d_M(x_i, a) \le T \qquad (6.5)$

for all $x_i \in M$. Note that, given $T > 0$, in order to achieve a sufficiently dense covering of the manifold, the covering number $G(T)$ will depend on the volume and the intrinsic dimensionality of the manifold ([3]).

The following definition is often used to define localization properties in the graph domain:

Definition 3 (Shortest-path distance). The shortest-path distance, the minimum number of edges over all paths connecting $m$ and $n$, is defined as

$d_G(m, n) = \min\,\{\, s - 1 \mid k_1 = m,\ k_s = n,\ w_{k_r, k_{r+1}} > 0 \text{ for } 1 \le r < s \,\} \qquad (6.6)$

The following results, which will be useful in what follows, were obtained in [49]. Let $M$ be a smooth manifold with condition number $1/\tau$, and let $x_i, x_j \in M$. Then the following inequalities hold:

1. If $\|x_i - x_j\| \le \tau/2$, then $d_M(x_i, x_j) \le \tau\left(1 - \sqrt{1 - 2\|x_i - x_j\|/\tau}\right)$.

2. Consider $v \in B_D(x_i, \varepsilon) \cap T^{\perp}_{x_i}M \cap B_D(x_j, T)$, where $x_j \notin B_D(x_i, T)$; then $\|x_i - v\| < 2T^2/\tau$.

6.4 Spectral Graph Wavelets

Wavelet transforms defined on the vertices of arbitrary weighted graphs provide powerful tools to explore data that lies on complex domains with unknown topological structure. The first generalization of wavelets to graphs was proposed by Coifman and Maggioni [16]. Spectral Graph Wavelets (SGW) [37] define a scaling operator in the graph Fourier domain (see Figure 6.2 for an illustration), based on the eigenvectors of the graph Laplacian $L$, which can be thought of as an analog of the Fourier transform for functions on weighted graphs. SGWs are constructed using a kernel operator $T_g = g(L)$, which acts on a function $f$ by modulating each Fourier mode:

$\widehat{T_g f}(\lambda_l) = g(\lambda_l)\,\hat f(\lambda_l) \qquad (6.7)$

Given a function $f$, the wavelet coefficients take the form

$\psi_f(s, n) = (T^s_g f)(n) = \sum_{l=1}^{N} g(s\lambda_l)\,\hat f(\lambda_l)\,\chi_l(n) \qquad (6.8)$

SGWs can be computed with a fast algorithm based on approximating the scaled generating kernels by low order polynomials. The wavelet coefficients at each scale can then be computed as a polynomial of $L$ applied to the input data. When the graph is sparse, which is typically the case under the manifold learning model, the computational complexity scales linearly with the number of points, leading to a complexity of $O(N)$ ([37]) for an input signal $f \in \mathbb{R}^N$. Including a scaling function corresponding to a low pass filter operation, SGWs map an input graph signal, a vector of dimension $N$, to $N(J + 1)$ scaling and wavelet coefficients, which are computed efficiently using the Chebyshev polynomial approximation. Since $g$ is designed as a band pass filter in the spectral domain, graph wavelets are localized in the frequency domain.
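To fix ideas, the following is a minimal numpy sketch of the objects defined in (6.1)-(6.3) and (6.8): the kNN Gaussian graph, the combinatorial Laplacian, the GFT, and the wavelet coefficients, computed here by explicit eigendecomposition (O(N^3), so practical only for small N; the Chebyshev approximation of [37] is what yields the O(N) complexity noted above). The band-pass kernel g below is a simple surrogate satisfying g(0) = 0 and g(x) → 0 at infinity, not the spline design of [37], and sigma_d and the scales are illustrative parameters:

import numpy as np
from scipy.spatial import cKDTree

def knn_gaussian_graph(X, k=10, sigma_d=1.0):
    """Symmetrized kNN graph with Gaussian weights, as in (6.1)."""
    N = X.shape[0]
    _, idx = cKDTree(X).query(X, k=k + 1)   # neighbor 0 is the point itself
    W = np.zeros((N, N))
    for i in range(N):
        for j in idx[i, 1:]:
            w = np.exp(-np.sum((X[i] - X[j]) ** 2) / (2.0 * sigma_d ** 2))
            W[i, j] = W[j, i] = w           # symmetrize
    return W

def sgw_coefficients(W, f, scales, g=lambda x: x * np.exp(1.0 - x)):
    """Wavelet coefficients psi_f(s, n) of (6.8), one row per scale s."""
    L = np.diag(W.sum(axis=1)) - W          # combinatorial Laplacian L = D - W
    lam, chi = np.linalg.eigh(L)            # eigenvalues / eigenvectors of L
    f_hat = chi.T @ f                       # GFT of f, as in (6.3)
    # psi_f(s, .) = sum_l g(s * lam_l) * f_hat(lam_l) * chi_l
    return np.stack([chi @ (g(s * lam) * f_hat) for s in scales])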
The kernel function $g$ we use here is the same as the one defined in [37]. The kernel $g$ behaves as a band-pass filter: it satisfies $g(0) = 0$ and $\lim_{x \to \infty} g(x) = 0$. In order to stably represent the low frequency content of $f$ defined on the vertices of the graph, it was suggested in [37] to use a graph scaling function $h$. The scaling function $h: \mathbb{R}^+ \to \mathbb{R}$ acts as a lowpass filter and satisfies $h(0) > 0$ and $h(x) \to 0$ as $x \to \infty$. Note that the scaling function helps ensure stable recovery of the original signal $f$ from the wavelet coefficients when the scale parameter $s$ is sampled at a discrete number of values $s_j$.

[Figure 6.2: Scaling function kernel h(x) (blue curve) and wavelet generating kernels g(sx) for different choices of the scale s]

6.5 Theoretical Results

6.5.1 Overview

We theoretically justify our framework based on the following three main properties:

i) In manifolds that are smooth with respect to curvature and sampling rate, the energy is concentrated in the low frequencies of the graph.

ii) Given a noisy set of points and a noiseless graph, the noise affects all bands with a similar probability density function, which is proportional to the bandwidth size.

iii) Bounded noise perturbations of the data maintain the global manifold structure, such that nodes that were connected in the noiseless graph are still connected in the noisy graph.

We will characterize the behavior of the frequencies of the graph in both the noiseless and the noisy case. In the context of this chapter, we call a 'noiseless graph', denoted by $G$, a graph constructed from a set of noiseless observations, so that the weights of the graph are noiseless. In contrast, we call a 'noisy graph', denoted by $\tilde G$, a graph constructed from noisy observations, so that graph weights and graph connectivity are based on noisy observations. We will provide an analysis characterizing the behavior of the graph Laplacian and the SGWs for both a noiseless graph $G$ and a noisy graph $\tilde G$.

In Section 6.5.2 we set up the problem and consider the ideal scenario in which we are given a set of points that lie strictly on a smooth manifold. We demonstrate that for manifolds that are sufficiently smooth, as quantified by the condition number $1/\tau$ and the geodesic covering number (see Definitions 1 and 2), the coordinates of each point on the graph change smoothly. This allows us to bound the variation of the coordinate signals on the graph as a function of the smoothness properties of the manifold, in a way that shows that smoother manifolds lead to coordinate signals with lower variation, and thus lower graph frequencies. We then show in Theorem 1 that the energy in the higher frequencies decays in a way that depends on the smoothness properties of the manifold, which validates our assumption that there will be less energy in the high frequencies.

In Section 6.5.3 we examine the case where the sampled points are noisy and the graph is noiseless. We demonstrate in Lemma 3 that the noise affects all wavelet bands in a similar probabilistic way (similar distribution), where the total energy of the noise between adjacent scales differs by a logarithmic factor. In Section 6.5.4 we address the most general case, where we have noisy observations as graph signals and a noisy graph $\tilde G$.
Under a specific model of noise and sampling rate conditions (we assume that the noise is distributed perpendicular to the manifold), Lemma 4 and Theorem 2 prove bounds on the decay of the energy of the noisy graph Laplacian quadratic form and of the noisy spectral graph wavelets. Our results show that if the noise term is bounded in terms of the smoothness properties of the manifold (the condition number), and the sampling rate is sufficiently dense, then the energy of both the noisy graph Laplacian quadratic form and the noisy spectral wavelets decays in a similar way to the smooth, noiseless case.

6.5.2 Noiseless Observations of a Smooth Manifold ($f$ over $G$)

[Figure 6.3: Example of a smooth manifold. For the same manifold, (a) shows the case where the normals do not extend beyond r = τ, while (b) shows the case where the normals intersect when extended beyond τ. (c) is an example of a manifold with condition number 1/τ]

In the following lemma we establish a connection between the smoothness of the manifold and the smoothness of the coordinate signal $f$. This lemma also motivates our choice of denoising each of the coordinate signals in the graph domain. By using a sufficiently high sampling rate, which depends on the smoothness properties of the manifold as quantified by its condition number $1/\tau$, we obtain that points that are connected on the graph belong to a local neighborhood on the manifold, and that the corresponding coordinate signals vary smoothly.

Lemma 1. Consider a manifold $M$ with condition number $1/\tau$, sampled at the resolution of a geodesic covering number $G(T)$, where $T/\tau < 1/4$, and let $\delta = CT$ for a constant $C$ obeying $1 \le C \le \tau/4T$. Let $d_M(x_i, x_j)$ denote the geodesic distance on the manifold $M$. Then, for all $i, j \in G$ such that $d_M(x_i, x_j) < \delta$, we have

$|f(i) - f(j)| \le \frac{2CT}{1 + \sqrt{1 - 2CT/\tau}} \qquad (6.9)$

Proof. First note that, for each coordinate $r$,

$|f(i) - f(j)| \le \|x_i - x_j\| \le d_M(x_i, x_j) \qquad (6.10)$

By Proposition 6.3 in [49], we have

$d_M(x_i, x_j) \le \tau\left(1 - \sqrt{1 - 2\|x_i - x_j\|/\tau}\right) = \frac{2\|x_i - x_j\|}{1 + \sqrt{1 - 2\|x_i - x_j\|/\tau}} \qquad (6.11)$

for $x_i, x_j$ which obey $\|x_i - x_j\| \le \tau/2$. Taking $\delta = CT$, where $C$ obeys $1 \le C \le \tau/4T$, for all $x_i, x_j$ with $d_M(x_i, x_j) < \delta$ we have $\|x_i - x_j\| \le \delta \le \tau/4 < \tau/2$; since the right-hand side of (6.11) is increasing in $\|x_i - x_j\|$, we obtain

$d_M(x_i, x_j) \le \frac{2CT}{1 + \sqrt{1 - 2CT/\tau}} \qquad (6.12)$

and thus the inequality is obtained. □

Lemma 1 shows that for a manifold sampled with sufficient density, the manifold coordinate signals $f_r$ change smoothly, i.e., their local variation is bounded. Note that for a manifold with condition number $1/\tau$, the conditions $T/\tau < 1/4$ and $\|x_i - x_j\| \le \tau/2$ limit the curvature and the closeness to self-intersection. Under these conditions, if $x_i, x_j$ obey $d_M(x_i, x_j) < \delta$, then the geodesic distance is of the same order as the Euclidean distance, i.e., $d_M(x_i, x_j) \approx \|x_i - x_j\|$. Defining

$\Theta\!\left(\frac{1}{\tau}, T\right) = \frac{2CT}{1 + \sqrt{1 - 2CT/\tau}} \qquad (6.13)$

note that $\Theta(\frac{1}{\tau}, T)$ decreases as $T$ decreases. In the next lemma we bound the total variation of the coordinate signals with respect to the graph as a function of the smoothness properties of the manifold. This lemma shows that smoother manifolds lead to coordinate signals with lower variation, and thus lower graph frequencies.

Lemma 2. Given a manifold $M$ with condition number $1/\tau$, sampled at the resolution of a geodesic covering number $G(T)$, with the conditions of Lemma 1 satisfied, the following inequality holds:

$\|\nabla f\|^2 \le \Theta\!\left(\frac{1}{\tau}, T\right)\lambda_N\,C_f \qquad (6.14)$

where $C_f = \bar f^2$ is the square of the mean of the graph signal $f$.
Proof. Using the definition of the graph Laplacian quadratic form, we have

$\|\nabla f\|^2 = f^T L f = \sum_{i \sim j} w_{ij}\,(f(i) - f(j))^2;$

normalizing and using the Cauchy-Schwarz inequality (which gives $\|f\|^2 \ge N \bar f^2$), we obtain

$\frac{\sum_{i \sim j} w_{ij}(f(i) - f(j))^2}{\|f\|^2} \le \frac{\sum_{i \sim j} w_{ij}(f(i) - f(j))^2}{N \bar f^2} \qquad (6.15)$

Applying Lemma 1 with $T, C$ obeying its conditions, we can bound the coordinate signal difference terms in (6.15) for all vertices that are 1-hop neighbors on the graph:

$\frac{\sum_{i \sim j} w_{ij}(f(i) - f(j))^2}{N \bar f^2} \le \Theta\!\left(\frac{1}{\tau}, T\right)\frac{\sum_{i \sim j} w_{ij}}{N \bar f^2}$

where we used (6.13). Summing over all vertices we get $\sum_i d_i < N d_{\max}$, where $d_{\max}$ is the maximum degree, and since $d_{\max} < \lambda_N$ [9], the lemma is obtained. □

Essentially, the lemma states that if two manifolds with different $\Theta(\frac{1}{\tau}, T)$ have the same Laplacian, the coordinate signals corresponding to the smoother manifold will have less variation $\|\nabla f\|^2$, and thus more energy concentrated in the lower frequencies. This is reflected in the SGW domain as well. Note that due to this property and the localization of spectral wavelets, the corresponding wavelet transform will be smooth. Also note that the localization property of the SGW is achieved by an approximation with a polynomial of degree $K$, which leads to a $K$-localized transform in the spectral wavelet domain; the $K$-hop local neighborhood corresponds to all vertices within $K$ hops of a given vertex. Furthermore, for a given Laplacian $L$, a noise signal would have a relatively flat energy distribution across bands in the SGW domain.

Using Lemma 1 and Lemma 2, we now develop results on the localization of the spectral graph wavelet coefficients of an arbitrary band $s$. We assume that the kernel function $g(s\lambda_l)$ obeys the properties designed in [37], although the results may apply to other kernel designs. Our main assumption on the kernel function $g(s\lambda_l)$ is that it is continuous and has a zero of integer multiplicity at the origin, i.e., $g(0) = 0$ and $g^{(r)}(0) = 0$ for all $r < r_0$, for some integer $r_0 > 0$. Also note that we assume the graph is constructed as in (6.1); however, these results can also be applied using other types of distances, such as local tangent space distances.

Theorem 1. Let the wavelet coefficients be calculated from a smooth manifold with condition number $1/\tau$ and geodesic covering number $G(T)$, with the conditions of Lemma 1 satisfied, using a kernel function $g(s\lambda)$ which is non-negative on $[0, \lambda_N]$. Then the wavelet coefficients in a band $s$ obey

$\sum_n |\psi_f(s, n)|^2 \le s^2\,\Theta\!\left(\frac{1}{\tau}, T\right)\lambda_N\,C_f\,C_s \qquad (6.16)$

where $g'_s(\lambda_l)$ denotes the derivative of the kernel function $g_s(\lambda_l) = g(s\lambda_l)$; for each $l$ we have

$g_s(\lambda_l) = s\,g'_s(c_l)\,\lambda_l \qquad (6.17)$

for some $c_l$ such that $0 < c_l < s\lambda_l$, and $C_s = \sum_l g'^2_s(c_l)\,\lambda_l$.

Proof. Observe that, using the orthonormality of the eigenvectors $\chi_l$, the following equality holds for any band $s$:

$\sum_n |\psi_f(s, n)|^2 = \sum_n \left(\sum_l g(s\lambda_l)\hat f(\lambda_l)\chi_l(n)\right)\left(\sum_{l'} g(s\lambda_{l'})\hat f(\lambda_{l'})\chi_{l'}(n)\right) = \sum_l |g(s\lambda_l)|^2\,|\hat f(\lambda_l)|^2 \qquad (6.18)$

Denote by $g_s$ the function which obeys $g_s(\lambda_l) = g(s\lambda_l)$. By construction, the kernel $g$ is continuous on $[\lambda_1, \lambda_N]$, and therefore also continuous on each interval $[0, s\lambda_l]$. By the mean value theorem, for each $l$ there exists $c_l$ such that

$g(s\lambda_l) - g(0) = g'_s(c_l)\,(s\lambda_l - s\lambda_1) \qquad (6.19)$

where $g'_s$ denotes the derivative of $g_s$, and $0 < c_l < s\lambda_l$.
By the properties of spectral graph wavelets we have $g(0) = 0$, while the smallest eigenvalue of the combinatorial Laplacian obeys $\lambda_1 = 0$, and thus

$g(s\lambda_l) = s\,g'_s(c_l)\,\lambda_l \qquad (6.20)$

Using (6.20) and the Cauchy-Schwarz inequality we have:

$\sum_l |g(s\lambda_l)|^2\,|\hat f(\lambda_l)|^2 = s^2 \sum_l g'^2_s(c_l)\,\lambda_l^2\,|\hat f(\lambda_l)|^2 \le s^2 \left(\sum_l g'^2_s(c_l)\,\lambda_l\right)\left(\sum_l \lambda_l\,|\hat f(\lambda_l)|^2\right) \qquad (6.21)$

Finally, since $\sum_l \lambda_l\,|\hat f(\lambda_l)|^2 = \|\nabla f\|^2$, directly applying Lemma 2 we have

$s^2 \left(\sum_l g'^2_s(c_l)\,\lambda_l\right)\|\nabla f\|^2 \le s^2\,\Theta\!\left(\frac{1}{\tau}, T\right)\lambda_N\,C_f \sum_l g'^2_s(c_l)\,\lambda_l = s^2\,\Theta\!\left(\frac{1}{\tau}, T\right)\lambda_N\,C_f\,C_s \qquad (6.22)$

where

$C_s = \sum_l g'^2_s(c_l)\,\lambda_l \qquad (6.23)$

and therefore the inequality is obtained. □

Thus, Theorem 1 shows that the spectral graph wavelet coefficients decay as a function of the smoothness properties of the manifold, $\Theta(\frac{1}{\tau}, T)$, and of the scale $s$, which validates our assumption that the high frequency coefficients have lower energy.

Remark 1: It is important to note that this result is guaranteed under sampling rates for which the k nearest neighbors on the graph lie within a given geodesic distance on the manifold. Experimentally, the SGW transform and the suggested denoising approach exhibit similar behavior and performance in cases where the local neighborhood includes points that are far in terms of geodesic distance, provided that most of the nearest points on the graph belong to the true local neighborhood on the manifold.

Remark 2: The theoretical results stated above establish a new and interesting explicit connection between smoothness in the graph domain and the graph frequencies, via the curvature properties of the manifold (quantified by its condition number). This result implies that the graph Laplacian and the SGW bands will have more energy in the low frequencies for manifolds with slowly changing or constant curvature.

Remark 3: Note that for a sufficiently densely sampled manifold (smaller $T$, and thus larger $N$) with bounded vertex degree $d_{\max}$, since $\lambda_N \le 2 d_{\max}$ (by the Gershgorin circle theorem), the term $s^2 C_s$ will take smaller values for sufficiently small scales $s \ll 1$.

6.5.3 Noisy graph signals $\tilde f$ over $G$

In this section we consider the case where the function is noisy and the Laplacian is noiseless. We assume that the observed coordinate along dimension $r$ is affected by i.i.d. Gaussian noise, so that for each point the noisy function corresponding to coordinate $r$ is given by $\tilde f_r(i) = f_r(i) + \varepsilon_r(i)$. As before, we drop the subscript $r$ and use $\tilde f$ to denote the noisy graph signal and $\varepsilon$ the corresponding Gaussian noise. The next lemma shows that the noise term of the spectral wavelet coefficients has zero expectation, and a variance that depends on the energy of the kernel filter, in the spectral graph domain, of the corresponding band $s$. Thus, for SGWs designed using the kernel function suggested in [37], the variance of the noise term at adjacent scales differs by a logarithmic factor. Note that the results obtained in this section are given in expectation, while the analysis in all other sections is deterministic.

Lemma 3. Consider the spectral wavelet coefficients obtained from a noisy function and a noiseless Laplacian. Then the expectation of the noisy wavelet coefficient term at scale $s$ is $E[\psi_\varepsilon(s, n)] = 0$, and

$E\!\left[\sum_i |\psi_\varepsilon(s, i)|^2\right] \le s^2\,\sigma_r^2\,C_s \qquad (6.24)$

where $C_s$ is given in (6.23).
Proof. First note that, from the linearity of the spectral graph wavelets and since the noise $\varepsilon(i) \sim N(0, \sigma_r^2)$ is i.i.d., we immediately obtain

$E[\psi_\varepsilon(s, n)] = 0 \qquad (6.25)$

Next observe that, by using (6.18) and applying the mean value theorem to the function $g_s(\lambda_l) = g(s\lambda_l)$ as in Theorem 1, the following equality holds for any frequency band $s$:

$\sum_n |\psi_\varepsilon(s, n)|^2 = \sum_l |g(s\lambda_l)|^2\,|\hat\varepsilon(\lambda_l)|^2 = s^2 \sum_l (g'_s(c_l)\,\lambda_l)^2\,|\hat\varepsilon(\lambda_l)|^2 \qquad (6.26)$

Using the Cauchy-Schwarz inequality and the equality $\sum_l |\hat\varepsilon(\lambda_l)|^2 = \sum_n |\varepsilon(n)|^2$, we obtain

$s^2 \sum_l (g'_s(c_l)\,\lambda_l)^2\,|\hat\varepsilon(\lambda_l)|^2 \le s^2 \sum_l (g'_s(c_l)\,\lambda_l)^2 \sum_l |\hat\varepsilon(\lambda_l)|^2 = s^2 \sum_l (g'_s(c_l)\,\lambda_l)^2 \sum_n |\varepsilon(n)|^2 \qquad (6.27\text{-}6.28)$

Taking expectations, we obtain

$E\!\left[\sum_n |\psi_\varepsilon(s, n)|^2\right] \le s^2\,E\!\left[\sum_l (g'_s(c_l)\,\lambda_l)^2\right] E\!\left[\sum_n |\varepsilon(n)|^2\right] = s^2\,C_s\,\sigma_r^2 \qquad (6.29)$

where $\sigma_r^2$ is the variance of the noise in the corresponding coordinate dimension $r$. □

The results above show that, in the case of a noisy function and a noiseless Laplacian, the noise affects all bands with a similar probability density function, proportional to the bandwidth size (which, by construction, differs by a logarithmic factor between adjacent scales).

6.5.4 Noisy graph signals $\tilde f$ over $\tilde G$

In this section we provide an analysis for the case where both the function and the graph are noisy, under the assumption that the noisy points $\{\tilde x_i\}_{i=1}^N$, $\tilde x_i = x_i + \varepsilon_i$, are distributed uniformly along the normals to $M$, with support within the tube $M_\varepsilon$ for all $\varepsilon < \tau$. While the assumption that the noise is distributed along the normals simplifies the theoretical analysis, it is also a sufficiently general assumption from a practical standpoint; in our experiments the distribution used was even more general (the noise was distributed in all directions). It is also important to note that in practice the noiseless points $x_i$ are not available, and the results obtained in this section for the noisy case assume that the manifold is sampled sufficiently densely that the conditions of Lemma 1 are satisfied. We denote by $L(G, W)$ and $\tilde L(\tilde G, \tilde W)$ the Laplacians constructed from the noiseless graph $G$ and the noisy graph $\tilde G$, respectively. $\tilde d(i)$ denotes the degree of vertex $i$ in the noisy graph $\tilde G$. The spectral wavelet coefficients constructed using a noisy function and a noisy Laplacian are denoted by $\tilde\psi_{\tilde f}(s, n)$. Let $\|\nabla\tilde f\|^2$ denote the graph Laplacian quadratic form constructed from the noisy graph $\tilde G$ and a noisy graph signal $\tilde f$, corresponding to an arbitrary dimension $r$ of $\tilde x_i$. In the following proofs we assume that the noiseless points were sampled under the conditions given in Section 6.5.2.

Lemma 4. Let $\tilde X = \{\tilde x_i\}_{i=1}^N$ be a set of noisy points, and let $\tilde f$ be a noisy graph signal corresponding to an arbitrary dimension $r$ of $\tilde x_i$. Assume that the conditions of Lemma 1 are satisfied for the points $\{x_i\}_{i=1}^N$. Then the following inequality holds:

$\|\nabla\tilde f\|^2 \le \Theta\!\left(\frac{1}{\tau},\, T + q(\varepsilon)\right)\left(\lambda_N + d^{(2)}_{\max}\right) C_{\tilde f} \qquad (6.30)$

where

$q(\varepsilon) = 2\min_{i \in \tilde G}\max\!\left(\varepsilon_i,\ \frac{4C^2T^2}{\tau}\right) \qquad (6.31)$

$d^{(2)}_{\max} = \max_{i \in G}\,\#\{\, m \mid d_G(i, m) = 2,\ m \in G \,\} \qquad (6.32)$

and $C_{\tilde f} = \bar{\tilde f}^2$ is the square of the mean of the noisy graph signal $\tilde f$.

Proof: Using the definition of the graph Laplacian quadratic form with the noisy set of points $\tilde X$, we have

$\|\nabla\tilde f\|^2 = \tilde f^T \tilde L \tilde f = \sum_{i \sim j} \tilde w_{ij}\,(\tilde f(i) - \tilde f(j))^2 \qquad (6.33)$

Let $\tilde x_i \in \tilde X$ and denote $\tilde B = B_D(\tilde x_i, \tilde\varepsilon)$ for some $\tilde\varepsilon > 0$. By Lemma 4.1 in [49], if $v \in B_D(x_i, \varepsilon) \cap T^{\perp}_{x_i}M \cap B_D(x_j, \varepsilon)$, with $\varepsilon \le T$, and $x_j \notin B_D(x_i, \varepsilon)$, then $\|x_i - v\| < 2\varepsilon^2/\tau$.
Now take $\tilde\varepsilon = \varepsilon + 4C^2T^2/\tau$. First observe that $B_D(x_i, \varepsilon) \cap \tilde B \ne \emptyset$, since $M$ is sampled at the resolution of a geodesic cover $G(T)$. Applying Lemma 4.1 from [49] in our case with $\tilde x_i = v$, in the worst case scenario, if $x_j \notin B_D(x_i, \varepsilon)$ but $x_j \in B_D(x_i, \tilde\varepsilon)$, we obtain that $\|x_i - \tilde x_i\| \le 4C^2T^2/\tau$. For all $i, j \in \tilde G$ such that $d_M(\tilde x_i, \tilde x_j) < \tilde\varepsilon$, we have:

$|\tilde f(i) - \tilde f(j)| \le \|\tilde x_i - \tilde x_j\| = \|x_i + \varepsilon_i - x_j - \varepsilon_j\| \le \|x_i - x_j\| + 2\max\{\varepsilon_i, \varepsilon_j\} \qquad (6.34)$

where $\varepsilon_i$ denotes the noise corresponding to the point $x_i$. Denoting

$q(\varepsilon) = 2\min_{i \in \tilde G}\max\!\left(\varepsilon_i,\ \frac{4C^2T^2}{\tau}\right) \qquad (6.35)$

we have that, for all $i \in G$,

$\|\varepsilon_i\| \le \|\tilde x_i - x_i\| \le q(\varepsilon)/2 \qquad (6.36)$

Normalizing and using the Cauchy-Schwarz inequality as in Lemma 2, and then applying inequality (6.34) in (6.33), we obtain

$\frac{\sum_{i \sim j} \tilde w_{ij}\,(\tilde f(i) - \tilde f(j))^2}{\|\tilde f\|^2} \le \Theta\!\left(\frac{1}{\tau},\, T + q(\varepsilon)\right)\frac{\sum_{i \sim j} \tilde w_{ij}}{N \bar{\tilde f}^2} \qquad (6.37)$

Under the sampling conditions of Lemma 1 we obtain $4C^2T^2/\tau \le \varepsilon$, and thus every vertex in $\tilde G$ has at most $d_{\max} + d^{(2)}_{\max}$ edges, where

$d^{(2)}_{\max} = \max_{i \in G}\,\#\{\, m \mid d_G(i, m) = 2,\ m \in G \,\} \qquad (6.38)$

i.e., $d^{(2)}_{\max}$ is the maximum number of vertices of the graph $G$ which are two hops away from an arbitrary vertex. Summing over all vertices we get $\sum_i \tilde d(i) < \sum_i (d_{\max} + d^{(2)}_{\max})$, and since $d_{\max} < \lambda_N$, the lemma is obtained. □

Thus the decay of the energy of the noisy Laplacian quadratic form is governed by the smoothness properties of the manifold, the distribution of the vertex degrees in the graph through $d^{(2)}_{\max}$, and the noise term $q(\varepsilon)$. As will be shown in the next theorem, these properties control the decay of the SGW frequency bands as well, modulated by the square of the scale $s$ and by the constant $\tilde C_s$, which is essentially an estimate of the energy of the kernel filter in a band of scale $s$.

Theorem 2. Given a set of noisy points $\tilde X = \{\tilde x_i\}_{i=1}^N$ and a noisy graph signal $\tilde f$ corresponding to an arbitrary dimension $r$ of $\tilde x_i$, assume that the conditions of Lemma 1 are satisfied for the points $\{x_i\}_{i=1}^N$. Then the noisy spectral wavelets in a band $s$, calculated using a kernel function $g(s\tilde\lambda)$ which is non-negative on $[0, \tilde\lambda_N]$, obey:

$\sum_n |\tilde\psi_{\tilde f}(s, n)|^2 \le s^2\,\Theta\!\left(\frac{1}{\tau},\, T + q(\varepsilon)\right)\left(\lambda_N + d^{(2)}_{\max}\right) C_{\tilde f}\,\tilde C_s \qquad (6.39)$

where $g'_s(\tilde\lambda_l)$ denotes the derivative of the kernel function $g(s\tilde\lambda_l)$; for each $l$ we have

$g_s(\tilde\lambda_l) = s\,g'_s(\tilde c_l)\,\tilde\lambda_l \qquad (6.40)$

for $\tilde c_l$ such that $0 < \tilde c_l < s\tilde\lambda_l$, $\tilde C_s = \sum_l g'^2_s(\tilde c_l)\,\tilde\lambda_l$, and $d^{(2)}_{\max}$ is the maximum two-hop degree of the graph $G$, defined in (6.38).

Proof. We first observe that, using the definitions of the noisy wavelet coefficients, the following equality holds:

$\sum_n |\tilde\psi_{\tilde f}(s, n)|^2 = \sum_l |g(s\tilde\lambda_l)|^2\,|\hat{\tilde f}(\tilde\lambda_l)|^2 \qquad (6.41)$

where $g_s(\tilde\lambda)$ denotes the kernel function applied over the domain of the noisy eigenvalues $\tilde\lambda_l$. Using similar arguments as in Theorem 1, by the mean value theorem we obtain that for each $l$ there exists $\tilde c_l$ such that

$g_s(\tilde\lambda_l) = s\,g'_s(\tilde c_l)\,\tilde\lambda_l \qquad (6.42)$

where $g'_s$ is the derivative of $g_s$, and $0 < \tilde c_l < s\tilde\lambda_l$.
Substituting the equalities above into (6.41), and using the Cauchy-Schwarz inequality, we obtain:

$\sum_l g_s(\tilde\lambda_l)^2\,|\hat{\tilde f}(\tilde\lambda_l)|^2 = s^2 \sum_l g'^2_s(\tilde c_l)\,\tilde\lambda_l^2\,|\hat{\tilde f}(\tilde\lambda_l)|^2 \le s^2 \sum_l g'^2_s(\tilde c_l)\,\tilde\lambda_l \sum_l \tilde\lambda_l\,|\hat{\tilde f}(\tilde\lambda_l)|^2 = s^2 \sum_l g'^2_s(\tilde c_l)\,\tilde\lambda_l\,\|\nabla\tilde f\|^2 \qquad (6.43)$

Finally, using Lemma 4 we obtain

$s^2 \sum_l g'^2_s(\tilde c_l)\,\tilde\lambda_l\,\|\nabla\tilde f\|^2 \le s^2\,\Theta\!\left(\frac{1}{\tau},\, T + q(\varepsilon)\right)\left(\lambda_N + d^{(2)}_{\max}\right) C_{\tilde f}\,\tilde C_s \qquad (6.44)$

where $\tilde C_s = \sum_l g'^2_s(\tilde c_l)\,\tilde\lambda_l$. □

Remark 1: The result of Theorem 2 shows that, as long as a large fraction of the edges of the noiseless Laplacian remain connected in the noisy Laplacian, the decay of the noisy spectral graph wavelets is similar to the decay of the noiseless spectral graph wavelets. This result can also be understood through the way our graph is constructed: bounded perturbations of the data maintain a graph topology which is similar to the unknown, noiseless graph. This property, combined with the assignment of the manifold coordinates as graph signals, preserves a measure of locality, which is needed for learning the structure of manifolds.

Remark 2: Note that in the presence of noise, the decay of the graph Laplacian quadratic form and of the SGWs depends on an additional parameter, $d^{(2)}_{\max}$. Since $d^{(2)}_{\max}$ is a parameter which depends on the smoothness properties of the graph, Lemma 4 and Theorem 2 can be interpreted as follows: if the manifold is sampled sufficiently densely (when $\Theta(\frac{1}{\tau}, T)$ is sufficiently small), then the noise becomes less significant, and the decay in the noisy case is similar to the noiseless one.

Remark 3: Note that if the noise is zero, then we reduce to the noise-free case, as one would expect.

Remark 4: In the case $L = \tilde L$ (the Laplacian is noise-free and thus only the function is noisy), we obtain the bound $\|\nabla\tilde f\|^2 \le \Theta(\frac{1}{\tau}, T + q(\varepsilon))\,\lambda_N\,C_{\tilde f}$, while the noisy-function, noisy-graph case adds the term $\Theta(\frac{1}{\tau}, T + q(\varepsilon))\,d^{(2)}_{\max}\,C_{\tilde f}$.

6.6 Proposed Approach

We now describe our approach for manifold denoising. We assume that the noiseless points lie on a smooth or piecewise smooth manifold $M \subset \mathbb{R}^D$. Let $f_r(\cdot)$ denote the values of all sampled points in dimension $r$. Denoising is performed independently for each $f_r(\cdot)$. In the noisy case, we assume that we are given a set of noisy points $\tilde f_r(n) = f_r(n) + \varepsilon_r(n)$, contaminated with Gaussian noise $\varepsilon_r(n) \sim N(0, \sigma^2)$ with zero mean and variance $\sigma^2$. We assume the noise to be i.i.d. at each position and in each dimension $r$. Under this noise model, the goal is to provide an estimate $\hat f_r(i)$ of the original coordinates $f_r(i)$, given $\tilde f_r(i)$, for each $r$ and all $i$. The reconstructed manifold points are then estimated by constructing $\hat x_i$ from the estimates $\hat f_r(i)$. In what follows we describe the processing done for each of these signals; unless required for clarity, we drop the subscript $r$ and use $f$ to denote the graph signal.

Our proposed algorithm is motivated by the following properties of smooth manifolds:

(i) The energy of the manifold coordinate signals is concentrated in the low frequency spectral wavelets.

(ii) The noise power is spread with a similar probability distribution across all wavelet bands.

Property (i) is illustrated in Figure 6.5.
As can be seen in Figure 6.5(b), most of the energy is concentrated in the GFT coefficients that correspond to the smallest eigenvalues; similarly (Figure 6.5(c)), the energy in each of the wavelet frequency bands of a 6-scale spectral wavelet decomposition is concentrated in the low frequency wavelet bands.

It is also important to note the difference between our denoising strategy and the shrinkage-based methods commonly used in classical wavelet denoising algorithms. In wavelet image denoising, the signals lie on regular grids that are independent of the signal, while in our case the graph and the noise-free signal are closely related through our graph construction. Wavelet denoising for regular signals mainly deals with piecewise smooth signals, which lead to a predominantly low frequency signal with localized high frequency coefficients corresponding to the discontinuities of the piecewise smooth signal. In contrast, in our graph construction both the domain and the observations depend on the smoothness of the manifold. This has significant implications. For example, if the sampling rate along the manifold varies with the degree of smoothness, we may expect locally smooth behavior of the coordinate signals even in areas where the geometry is not as smooth. Thus we do not see SGW-domain characteristics similar to those observed in wavelet-domain representations of piecewise smooth regular-domain signals (isolated high frequency coefficients). Instead, low frequency spectral wavelet coefficients show locally smooth behavior with respect to the coordinate signals that are spatially connected on the graph, while high frequency spectral coefficients are characterized by an irregular, non-smooth pattern.

As an example, Figures 6.4(a)-(e) show the SGWs in different frequency bands of a noise-free circle. As can be seen, the spectral graph wavelets in the low frequency bands change smoothly, while the high frequency bands (s = 4, 5) are characterized by an oscillatory, non-smooth pattern. For the noisy SGWs, it can be seen that in the low frequency wavelet bands the power of the true signal is much more dominant than the noise power, while in the high frequency bands the noise power is much larger in proportion to the SGW coefficients, making it much harder to separate noise from signal content (which differs from the case of signals on regular domains). This leads to an approximation of the noise-free signal obtained by retaining the low frequency wavelet coefficients and discarding the high frequency wavelet coefficients.

Based on these properties, denoising is performed directly in the spectral graph domain, by retaining all wavelet coefficients that correspond to the low frequency wavelet bands $s \le s_0$, and discarding all wavelet coefficients in the high frequency bands $s > s_0$.
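In operator form, with $T$ denoting the SGW analysis operator (the scaling band together with the $J$ wavelet bands) and $T^{+}$ its pseudoinverse (our notation here, not the thesis's), this truncation rule can be written compactly as

$\hat f = T^{+}\,M_{s_0}\,T\,\tilde f, \qquad M_{s_0} = \mathrm{diag}\!\left(\mathbb{1}\left[\text{scaling band, or wavelet band with } s \le s_0\right]\right)$

that is, the retained coefficients pass through unchanged, the discarded bands are set to zero, and the signal is resynthesized by least squares from the retained bands.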
[Figure 6.4: Spectral graph wavelet coefficients corresponding to different frequency bands s of a circle. Panels (a)-(e) plot the SGW bands for increasing frequency band s of a noise-free circle, while panels (f)-(j) plot the SGW bands for the same choices of s for a noisy circle]

We summarize the proposed denoising algorithm for smooth manifolds in the pseudocode of Algorithm 3.

Algorithm 3: Non-iterative MFD Algorithm
Data: the data set x̃; k, the number of nearest neighbors on the graph; m, the order of the Chebyshev polynomial approximation.
1. Construct an undirected affinity graph W using Gaussian weights as in (6.1), and construct the Laplacian L from W.
2. for r = 1 to D do
3.    Assign the coordinate values f̃_r to the corresponding vertices of the graph.
4.    Transform the noisy coordinate signal using the SGW defined on L.
5.    Retain all scaling coefficients and all wavelet coefficients below the low-pass frequency s ≤ s_0 for which the total accumulated energy is above the threshold E_thresh; discard all wavelet coefficients above scale s > s_0.
6.    Take the inverse spectral graph wavelet transform of the processed coefficients.
Result: the reconstructed manifold points x̂.

[Figure 6.5: Top row: a noiseless Swiss roll with a hole (a), the energy of its coordinate signals in the graph Fourier transform (GFT) (b) and in the spectral wavelet domain (c). Bottom row: noiseless MoCap data ([35]) (d), its energy in the GFT (e) and in the spectral wavelet domain (f)]

This approach has several attractive features. In particular, it is: (1) non-iterative, i.e., denoising is performed directly in the spectral graph wavelet domain in one step; (2) robust against a wide range of k values chosen for the nearest neighbor assignment on the graph; and (3) computationally efficient, with a computational complexity of O(ND).

6.7 Experimental Results

We present experimental results on a variety of manifolds, including ones with complex geometric structure such as a fish bowl and a Swiss roll with a hole. In addition, we experimented with a sinus function embedded in a high dimensional space of D = 200.
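As a concrete illustration of Algorithm 3, the following minimal sketch implements the pipeline with an exact eigendecomposition in place of the Chebyshev approximation (practical only for small N), and with surrogate kernels g and h and illustrative parameters rather than the exact designs of [37]. With an exact eigendecomposition, zeroing the high bands and inverting via the pseudoinverse of the retained bands reduces to a single spectral response R(lambda):

import numpy as np

def mfd_denoise(X_noisy, k=20, n_scales=5, keep_bands=2, sigma_d=1.0):
    """Denoise all coordinate signals of X_noisy (an N x D array) at once."""
    N = X_noisy.shape[0]

    # Step 1: kNN Gaussian affinity graph as in (6.1), Laplacian L = D - W.
    sq = (X_noisy ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X_noisy @ X_noisy.T, 0.0)
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]            # skip self (index 0)
    W = np.zeros((N, N))
    rows = np.repeat(np.arange(N), k)
    W[rows, nn.ravel()] = np.exp(-d2[rows, nn.ravel()] / (2.0 * sigma_d ** 2))
    W = np.maximum(W, W.T)                             # symmetrize
    L = np.diag(W.sum(axis=1)) - W
    lam, chi = np.linalg.eigh(L)                       # exact GFT basis
    lmax = max(lam[-1], 1e-12)

    # Surrogate kernels: low-pass h (h(0) > 0) and band-pass g (g(0) = 0).
    h = lambda x: np.exp(-((4.0 * x / lmax) ** 4))
    g = lambda x: x * np.exp(1.0 - x)                  # peaks at x = 1
    # g(s * lam) peaks at lam = 1/s, so the largest scales (listed first)
    # give the lowest-frequency wavelet bands.
    scales = np.geomspace(40.0 / lmax, 2.0 / lmax, n_scales)

    # End-to-end spectral response of: SGW analysis; keep the scaling band
    # and the keep_bands lowest-frequency wavelet bands; zero the rest;
    # least-squares (pseudoinverse) synthesis from the retained bands.
    total = h(lam) ** 2 + sum(g(s * lam) ** 2 for s in scales)
    kept = h(lam) ** 2 + sum(g(s * lam) ** 2 for s in scales[:keep_bands])
    R = kept / np.maximum(total, 1e-12)

    # Steps 2-6 for all coordinates r at once: GFT, filter, inverse GFT.
    return chi @ (R[:, None] * (chi.T @ X_noisy))

# Example: a noisy circle.
# t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
# X = np.c_[np.cos(t), np.sin(t)] + 0.1 * np.random.randn(400, 2)
# X_hat = mfd_denoise(X)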
[Figure 6.6: Experimental results on a circle: (a) noisy circle (noise shown in red); (b) MD (k=40); (c) LLD (k=40); (d) MFD (k=40); (e) ground truth; (f) MD (k=20); (g) LLD (k=20); (h) MFD (k=20)]

[Figure 6.7: Experimental results on a sinus function embedded in high dimension, compared over different numbers of k nearest neighbors: (a) noisy data; (b) MD (k=40); (c) LLD (k=40); (d) MFD (k=40); (e) noiseless data; (f) MD (k=20); (g) MFD (k=20); (h) LLD (k=20)]

All manifolds were sampled using a uniform distribution with N = 1000 samples, which were contaminated with isotropic Gaussian noise in all dimensions. In the results shown, the sinus and the fish bowl were contaminated with noise of variance 0.2, and the circle, helix, and Swiss roll with a hole with noise of variance 0.1. We used s = 5 wavelet decomposition levels, and retained all wavelet coefficients corresponding to the lowest scales $s \le s_0$ whose total accumulated energy is above the threshold $E_{thresh(var=0.1)}$ for noise of variance 0.1, and above the threshold $E_{thresh(var=0.2)}$ for noise of variance 0.2. The order of the Chebyshev polynomial approximation used was k/2 for a k nearest neighbor graph, in order to process the manifold locally via the approximation of the spectral wavelet coefficients. For comparison and evaluation against the state of the art in manifold denoising, we compared our approach to MD ([39]) and LLD ([36]).
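For reference, synthetic test data of the kind described above can be generated as follows; this is a sketch only, as the thesis specifies uniform sampling with isotropic Gaussian noise but not the exact parameterization, so the hole location and size below are illustrative:

import numpy as np

rng = np.random.default_rng(0)
N = 1000

# Swiss roll with a hole: uniform (t, y) parameter samples, rejecting a
# disc in the parameter domain (hole location/size are illustrative).
t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, 4 * N)
y = rng.uniform(0.0, 20.0, 4 * N)
keep = (t - 3.0 * np.pi) ** 2 + (y - 10.0) ** 2 > 9.0   # punch the hole
t, y = t[keep][:N], y[keep][:N]
X = np.c_[t * np.cos(t), y, t * np.sin(t)]

# Isotropic Gaussian noise in all dimensions, variance 0.1.
X_noisy = X + rng.normal(scale=np.sqrt(0.1), size=X.shape)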
In the case of MD, we found that a small number of iterations produced better results, while a larger number of iterations produced severe over-smoothing; we therefore fixed the number of iterations to 3 and used the best results.

[Figure 6.8: Experimental results on the helix and fish bowl manifolds: (a) noisy fish bowl (noise shown in red); (b) MD (k=20); (c) LLD (k=20); (d) MFD (k=20); (e) noisy helix (noise shown in red); (f) MD (k=40); (g) LLD (k=40); (h) MFD (k=40)]

The experimental results, shown in Figures 6.6, 6.7, and 6.8, demonstrate that our method significantly outperforms the state of the art and produces a smooth reconstruction that is faithful to the true topological structure of the manifold. For the MD and LLD methods, we observe that for the more complex manifolds, or under high noise levels, either the global or the local geometric structure is severely distorted. More specifically, MD tends to over-smooth the data, especially for manifolds with complex geometry, while LLD deviates around the mean curvature of the manifold but often fails to produce a smooth reconstruction. This can also be visualized in Figure 6.9, where we zoom into the denoising results obtained by MFD and the competing methods.

[Figure 6.9: Zoom into the circle and helix manifolds: (a) noisy circle (noise shown in red, ground truth in blue); (b) MD; (c) LLD; (d) MFD; (e) noisy helix (noise shown in red, ground truth in blue); (f) MD; (g) LLD; (h) MFD]

As can be seen, MFD provides a locally smooth reconstruction which also preserves the global manifold structure, while the competing methods suffer from the limitations mentioned above. We also performed a quantitative analysis and compared the reconstruction error, in terms of the root mean square error (RMSE) of the denoised manifolds, against the ground truth. The comparison in Figure 6.10 shows the performance of MFD, LLD, and MD for even values of the k nearest neighbor graph parameter between 20 and 50, for the Swiss roll with a hole, the circle, the sinus embedded in high dimensional space, and the helix.
[Figure 6.10: RMSE reconstruction error of the noisy manifolds (Swiss roll with a hole, circle, sinus embedded in dimension D = 200, and helix) for different selections of the number of k nearest neighbors]

The comparison results in Figure 6.10 show that our method is robust over a wide range of k nearest neighbor graph selections, thanks to the multi-scale properties of the spectral wavelets. Quantitatively, MFD significantly outperforms LLD and MD for a wide range of k nearest neighbor selections.

6.7.1 Experimental Results on local tangent space estimation

Local tangent space estimation is a fundamental step for many machine learning and computer vision applications, and it can be severely distorted in the presence of noise. In this section we evaluate local tangent space estimation on noisy manifolds, and show the effect of MFD denoising in terms of the local tangent space estimation error using popular approaches such as local PCA ([66]) and Tensor Voting ([46]); more details on these local tangent space estimation methods can be found in [66] and [46]. To test the denoising effect of MFD, we add different amounts of Gaussian noise to a sphere sampled with N = 1000 points, and compare the local tangent space estimation error before and after MFD denoising using local PCA and Tensor Voting. The variances of the Gaussian noise tested are 0, 0.05, 0.1, 0.2, and 0.3. Local PCA is estimated using $k \in \{20, 30, 40, 50, 60, 70, 80\}$, and the scale of Tensor Voting is tested in $\{0.1, 0.3, 0.5, 0.7\}$. For each method, we report the best results obtained over the range of parameters tested.

Table 6.1: Local tangent space estimation error on a sphere using Tensor Voting and local PCA, before and after denoising with MFD

                  Before denoising                          After MFD
Noise variance    0       0.05    0.1     0.2     0.3       0.05    0.1     0.2     0.3
Tensor Voting     0.005   2.8     3.6     14.8    25.8      1.9     2.1     2.9     3.4
Local PCA         1.6     2.5     3.1     6.5     10.4      2.2     2.4     2.7     3.4

The experimental comparison, in terms of the average local tangent space error computed using the ground truth normals at all points of the manifold, is tabulated in Table 6.1. The results show that MFD denoising significantly reduces the local tangent space estimation error, especially for mid to high noise levels (0.1, 0.2, 0.3). We also tested the effect of denoising using MD and LLD, and the experimental results show that MFD performs significantly better in terms of local tangent space estimation using local PCA and Tensor Voting. It is also interesting to note that in the noise-free case Tensor Voting performs significantly better than local PCA, and its error even converges to zero for manifolds with constant curvature. At mid to high noise levels, local PCA is less sensitive to noise than Tensor Voting; however, both methods show gross errors at the higher noise levels, such that the local tangent space information is severely distorted. Using MFD to denoise the data prior to local tangent space estimation makes it possible to obtain a meaningful local structure estimation.
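A minimal sketch of the local PCA tangent estimator used in this comparison is given below (our illustration, not the code of [66]; k and the intrinsic dimension d are assumed known here, and the angular error is measured against the ground truth normal, e.g., x/||x|| for the sphere):

import numpy as np

def local_pca_tangent(X, i, k=30, d=2):
    """Estimate the d-dimensional tangent space at point X[i] from its k
    nearest neighbors: the top-d principal directions of the local patch."""
    dists = np.linalg.norm(X - X[i], axis=1)
    nbrs = X[np.argsort(dists)[:k + 1]]           # neighborhood incl. the point
    patch = nbrs - nbrs.mean(axis=0)              # center the local patch
    # Right singular vectors = principal directions of the patch.
    _, _, Vt = np.linalg.svd(patch, full_matrices=False)
    return Vt[:d].T                               # (D, d) tangent basis

def normal_angle_error_deg(tangent_basis, true_normal):
    """Angle (degrees) between the true normal and the estimated normal
    space: how far the true normal is from being orthogonal to the
    estimated tangent."""
    n = true_normal / np.linalg.norm(true_normal)
    proj = tangent_basis @ (tangent_basis.T @ n)  # component inside tangent
    return np.degrees(np.arcsin(np.clip(np.linalg.norm(proj), 0.0, 1.0)))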
Data / Method        MD      LLD     MFD
CMU MoCap            11.84   7.35    3.42
Frey face dataset    109.1   62.2    51.2

Table 6.2: Average RMSE reconstruction error on the motion capture and Frey face datasets.

6.7.2 Experimental Results with real datasets

For experiments with real datasets, we tested our method on the CMU Motion Capture dataset, a collection of human motion sequences, and the Frey faces dataset ([53]), a face-expression dataset. The Frey face dataset consists of low resolution faces of dimension D = 560; for the CMU Motion Capture dataset we chose 10 mixed sequences from subject 86, where the dimensionality of the data is 62. For the CMU dataset, in order to perform the evaluation in a strictly unsupervised framework, we remove the temporal information from the data. Thus the data provided corresponds to static information, which is then contaminated with Gaussian noise of variance 0.1 in all dimensions. We test our method on the corrupted sequences and compare to the MD and LLD methods. The evaluation in terms of the average RMSE error is tabulated in Table 6.2, where the results shown are the average error obtained over all dimensions and all sequences. The experimental results obtained using MFD show a significant improvement over LLD and MD.

6.8 Conclusions and Future Work

We have presented a new framework for manifold denoising which simultaneously operates in the vertex and frequency graph domains by using spectral graph wavelets. The advantage of such an approach is that it allows us to denoise the manifold locally, while taking into account the fine-grain regularity properties of the manifold. Our approach is based on the property that the energy of a smooth manifold concentrates in the low frequencies of the graph, while the noise affects all frequency bands in a similar way. The suggested MFD framework also possesses additional appealing properties: it is non-iterative and has low computational complexity, as it scales linearly in the number of points for sparse data, making it attractive for large-scale problems. Our current strategy for denoising is based on setting high frequency bands to zero, which is different from the shrinkage-based methods commonly used in classical wavelet denoising algorithms. While wavelet denoising for regular signals is mostly concerned with piecewise smooth signals that are independent of the regular grid, in our case both the domain and the observation depend on the smoothness of the manifold. Moreover, irregular domains can potentially have different characteristics. For example, in some cases locally smooth behavior of the coordinate signals can occur even in areas where the geometry is not as smooth. Our theoretical and experimental study corroborates that for a smooth manifold the energy is concentrated in the low frequencies of the graph, thus justifying our strategy of truncating the high frequencies. Experimental results on manifolds with complex geometric structure show that our approach significantly outperforms the state of the art, and is robust to a wide range of selections of the k nearest neighbors parameter on the graph. There are also many other possible future directions for our denoising approach. From a practical aspect, MFD provides an effective tool for unsupervised learning applications as a fast, non-iterative and efficient denoising method.
Moreover, it does not require knowledge of the intrinsic dimensionality of the manifold, which in many cases is not known in advance or is difficult to estimate from the noisy data. It can also be useful in semi-supervised or supervised problems, in which case the graph is not noisy, so the denoising process of the coordinates potentially becomes much easier to perform. On the other hand, the fact that the behavior of the graph frequencies is primarily determined by the choice of the graph construction leaves many possible theoretical and practical future explorations of the MFD approach using different graph constructions. For example, a possible limitation of the currently suggested denoising algorithm arises when the curvature changes very rapidly, in which case the initial graph construction may not be sufficiently reliable. This limitation may be addressed by using curvature or tangent information, and remains open for future research. Therefore, a further investigation of how the underlying graph construction affects the spectral transform properties is one of the important future research directions for this framework. Some other promising directions include designing probabilistic models in the spectral wavelet domain for smooth manifolds, addressing the case of non-Gaussian noise, and developing tight bounds for the decay of spectral wavelets at higher frequencies.

Chapter 7
Regularizing Manifolds with Singularities

In this chapter we present a unified and general framework which can effectively denoise noisy manifolds with singularities without over-smoothing at discontinuities, and which is also robust to a large amount of outliers. Using Spectral Graph Wavelets as a tool to provide localization in the vertex and spectral domains, and a graph construction that is based on the Tensor Voting Graph, we suggest a new denoising approach which allows us to effectively process much larger classes of complex manifolds than with the Manifold Frequency Denoising method suggested in Chapter 6.

7.1 Introduction

Noise poses a major challenge to discovering the hidden, unknown structure of data points residing in high dimensional space. To increase the robustness and applicability of manifold learning methods, manifold denoising methods such as MD, LLD, and more recently the MFD method described in Chapter 6, were suggested as a prior regularization step for manifold learning. As discussed in detail in the previous chapter, the main limitation of most existing manifold denoising methods such as MD and LLD is that they tend to over-penalize either the local or the global manifold structure for a non-trivial amount of noise in the data. The MFD approach presented in Chapter 6 overcomes the over-smoothing limitation of the existing state of the art by using the vertex and spectral domain localization that Spectral Graph Wavelets provide, together with the smoothness properties of the coordinate dimensions of the manifold. Put in the context of machine learning, the approach suggested by MFD provides an effective way to perform manifold regularization which can better handle over-fitting or under-fitting. However, similar to most existing methods in manifold denoising, MFD only addresses smooth manifolds, thus overlooking larger classes of piecewise-smooth manifolds which may occur in practice, for example in applications where the manifold contains self-intersections, or in the case of multiple manifolds with singularities.
In this chapter we suggest a general approach to regularizing a large class of manifolds which goes beyond the manifold smoothness assumption commonly made by manifold denoising methods. Like the MFD method suggested in Chapter 6, our approach uses Spectral Graph Wavelets (SGW) as a tool that provides vertex and spectral localization. Unlike classical wavelet tools, which allow efficient processing of piecewise-smooth signals on regular domains, Spectral Graph Wavelets are less flexible with regard to the type of signals they can process. Since typically the graph and the signal are closely related, the SGWs may exhibit locally smooth behavior even though the local geometry is not as smooth. Thus, applying the MFD method to piecewise-smooth signals may result in severe over-smoothing near the manifold discontinuities.

We suggest a framework which employs the SGW in a way that takes into account the fine-grain regularity properties of the manifold without over-smoothing at discontinuities (which correspond to the high frequency SGW bands). Our proposed manifold denoising framework learns the local neighborhood graph structure using a local tangent space affinity which is based on the Tensor Voting Graph. The suggested local tangent space graph reveals the presence of local singularity areas in the SGW bands, and thus allows us to effectively separate signal from noise for each of the manifold coordinate dimensions. Therefore our suggested approach avoids the loss of significant information which may occur using our previous denoising approach from Chapter 6, and is capable of efficiently regularizing a much wider range of manifolds. We suggest two alternative denoising methods which offer a trade-off between a fast, non-iterative method with one less parameter, and an iterative method based on Tikhonov regularization which provides an optimal global solution and higher accuracy. Also note that, in both cases, our new denoising methods do not require setting a threshold corresponding to the number of bands discarded (unlike the MFD method suggested in Chapter 6).

This chapter is organized as follows: Section 7.2 summarizes the related work. Section 7.3 describes our new approach for denoising manifolds with singularities. Section 7.4 provides the theoretical motivation for our approach. Experimental results are provided in Section 7.5, and in Section 7.6 we conclude our work.

7.2 Related Work

Noise can be very harmful to manifold learning methods. Manifold denoising methods were suggested to regularize the data and provide a reliable estimate of the noise-free manifold, in order to allow efficient processing of the data after denoising. The current manifold denoising methods, however, suffer from at least one of the following limitations:

i) They tend to over-penalize either the local or the global manifold structure, thus discarding significant features of the manifold.
ii) They are sensitive to parameter selection on the graph, such as the k nearest neighbors selection.
iii) They make restrictive model assumptions, such as the manifold being smooth everywhere, without discontinuities [5].

The Manifold Frequency Denoising (MFD) method suggested in the previous chapter uses Spectral Graph Wavelets as a tool which provides localization in the vertex and spectral domains. The MFD approach was theoretically justified by showing that under sufficient sampling conditions the manifold coordinate dimensions are also smooth.
Based on the smoothness property of smooth manifolds in the spectral wavelet domain, it was suggested to perform non-iterative denoising by discarding the high frequency wavelet bands. However, the approach suggested by MFD still suffers from the third limitation listed above, since for more general classes of manifolds with singularities, critical information may manifest in the higher frequencies of the graph, which are discarded in the MFD approach.

One difficulty one faces when using SGW tools is that the behavior in the SGW domain differs from the typically sparse characterization observed when using wavelet transforms [22] on regular signals. Also, by construction, the SGW are guided by the structure of the underlying graph, and do not directly take into account the particular class of signals to be processed. In other words, when using SGW tools the graph and the signal are closely related, which means that both the domain and the observations depend on the smoothness of the manifold. Moreover, as noted in [18], the SGW may exhibit locally smooth behavior even though the local geometry is not as smooth, making it harder to separate noise from signal in the higher frequencies of the graph.

7.3 Proposed Approach

We suggest two alternative algorithms for denoising manifolds with singularities. Consider a set of points $\mathbf{x} = \{\mathbf{x}_i\}$, $i = 1, \ldots, N$, $\mathbf{x}_i \in \mathbb{R}^D$, sampled from an unknown manifold $M$ that is locally smooth except at discontinuities, which we refer to as the manifold singularities. An undirected, weighted graph $G = (V, W)$ is constructed over $\mathbf{x}$, where $V$ corresponds to the nodes and $W$ to the set of edge weights on the graph. For each $\mathbf{x}_i \in M$, let $T_{\mathbf{x}_i}M$ and $T^{\perp}_{\mathbf{x}_i}M$ denote the tangent space and normal space estimates of $M$, respectively, and let $O_i$, $O_j$ be the subspaces corresponding to the local tangent (or normal) space estimates at $\mathbf{x}_i$ and $\mathbf{x}_j$, respectively.

In our graph construction, each vertex corresponds to one of the noisy observations of the manifold, with edge weights between two vertices based on the local tangent space distance between the corresponding observations in the ambient space. Then, we apply the SGW to coordinate graph-signals on each dimension, where in a coordinate graph signal each vertex is assigned the scalar coordinate of the corresponding point in that dimension. Thus, we take advantage of the smoothness of the coordinate signals inside the support of the smooth sub-manifolds, away from the local singularity areas. We suggest constructing the adjacency matrix $W = (w_{ij})$, whose weights $w_{ij}$ between node $i$ and node $j$ incorporate the local tangent space distance between two points, in addition to the Euclidean distance, as follows:

$$w_{ij} = \begin{cases} \exp\left(-\dfrac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma_d^2}\right) F(O_i, O_j) & \text{if } \mathbf{x}_j \in kNN(\mathbf{x}_i) \\ 0 & \text{otherwise} \end{cases} \qquad (7.1)$$

where $F(O_i, O_j)$ is the affinity function based on the local tangent or normal space estimates, $kNN(\mathbf{x}_i)$ denotes the $k$ nearest Euclidean neighbors of $\mathbf{x}_i$, $\|\cdot\|$ denotes the L2 distance between the points $\mathbf{x}_i, \mathbf{x}_j$, and $\sigma_d$ is the RBF parameter. Note that there are many possible choices for the affinity function $F$ based on the local tangent space; in our case it is chosen based on the maximal principal angle between the subspaces $O_i, O_j$, as follows [49]:

$$F(O_i, O_j) = \min_{u \in O_i} \max_{v \in O_j} \langle u, v \rangle \qquad (7.2)$$

where the estimate of the local tangent space is provided by Tensor Voting, and the affinity is estimated using the Tensor Voting Graph. A minimal sketch of this construction is given below.
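The following is a hedged sketch of Equations (7.1) and (7.2), assuming the tangent-space bases have already been estimated (e.g. by Tensor Voting, as in the thesis) and are given as orthonormal D x d arrays; the brute-force neighbor search is for clarity only:

```python
import numpy as np

def principal_angle_affinity(Oi, Oj):
    """F(O_i, O_j) of Eq. (7.2): cosine of the maximal principal angle
    between two subspaces given by orthonormal D x d basis matrices."""
    s = np.linalg.svd(Oi.T @ Oj, compute_uv=False)
    return float(np.clip(s.min(), 0.0, 1.0))

def build_weights(X, bases, k=10, sigma=1.0):
    """Adjacency matrix of Eq. (7.1): RBF weight modulated by the
    local tangent-space affinity, restricted to k nearest neighbors."""
    N = X.shape[0]
    W = np.zeros((N, N))
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    for i in range(N):
        nbrs = np.argsort(D2[i])[1:k + 1]                # skip the point itself
        for j in nbrs:
            W[i, j] = np.exp(-D2[i, j] / (2 * sigma ** 2)) \
                      * principal_angle_affinity(bases[i], bases[j])
    return np.maximum(W, W.T)        # symmetrize for an undirected graph
```

Near an intersection, bases from different manifolds yield a small affinity, so the weight (and hence the graph connection) across the singularity is suppressed.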
As before, $f_r(\cdot)$ corresponds to the values of all sampled points in dimension $r$, and denoising is performed independently for each $f_r(\cdot)$. In the noisy case, we assume that we are given a set of noisy points

$$\tilde{f}_r(n) = f_r(n) + \epsilon_r(n),$$

contaminated with Gaussian noise $\epsilon_r(n) \sim \mathcal{N}(0, \sigma^2)$ with zero mean and variance $\sigma^2$. We assume the noise to be i.i.d. at each position and for each dimension $r$. We aim to provide an estimate $\hat{f}_r(i)$ of the original coordinates $f_r(i)$ given $\tilde{f}_r(i)$, for each $r$ and for all $i$. The reconstructed manifold points $\hat{\mathbf{x}}_i$ are then assembled from the estimates $\hat{f}_r(i)$.

Note that we assume the manifold to be locally smooth except near discontinuities; under this assumption the coordinate dimensions change smoothly over the support of the sub-manifolds, as shown in Chapter 6. In each of the two suggested algorithms, the first step is to construct a local neighborhood graph in which the affinity is based on the local tangent space distance, as suggested in Equation 7.1. A brief description of the proposed algorithms follows; detailed descriptions are provided in Algorithms 4 and 5.

Algorithm I: The first algorithm applies non-iterative denoising to each of the spectral wavelet bands, replacing each spectral wavelet coefficient with the median value of its spectral wavelet coefficient neighbors. Compared to the second algorithm, its main advantage is that it is very simple and efficient to implement. Recall that the local neighborhood on the graph of a spectral graph wavelet coefficient $\psi_f(s, i)$ corresponding to $\mathbf{x}_i$ is $N(s, i)$, the set of all vertices within $s$ or fewer hops on the graph.

Algorithm II: The second algorithm applies Tikhonov regularization to each of the spectral wavelet bands. While this algorithm is iterative, it is more robust to a significantly larger amount of noise. For each denoised band, the local neighborhood of the spectral graph wavelet coefficient $\psi_f(s, i)$ corresponding to the vertex $\mathbf{x}_i$ is defined by the set $N(s, i)$.

Our new approach enjoys performance similar to MFD on smooth manifolds; however, it can efficiently process a much larger class of manifolds, allowing us to discover subtle hidden geometric features in the data, and moreover it does not require setting a threshold for how many bands need to be removed.

7.4 Theoretical Motivation

In comparison to previous works in manifold learning based on graph Laplacian processing, we provide the following interpretation of our framework as suggested in Algorithm 4. In general, spectral based methods can be cast as solving the following minimization problem:

$$\min_x \; g(x) + \mathrm{Trace}(x^T L x) \qquad (7.3)$$

Recall that in our case we process each of the coordinate dimensions of the manifold independently using SGWs. It can be shown that the energy in a spectral graph wavelet band $s$ is represented by the equality in (7.4):

Proposition 1. Let the wavelet coefficients be calculated using a kernel function $g$ which is non-negative in $[0, \lambda_{\max}]$, with the Laplacian $L$ whose eigenvalues and eigenvectors are $\lambda_l$ and $\chi_l$, respectively. Then the wavelet coefficients in a band $s$ obey

$$\sum_n |\psi_f(s,n)|^2 = s\, f^T L_{h'_s(c_\lambda)}\, f \qquad (7.4)$$

where $h_s$ is defined as $h_s = g_s^2$, $h'_s$ is the derivative of $h_s$, $s \cdot 0 < c_\lambda < s\lambda_l$, and $L_{h'_s(c_\lambda)}$ is the Laplacian whose corresponding eigenvectors and eigenvalues are $\chi_l$ and $h'_s(c_{\lambda_l})\lambda_l$, respectively.
Proof. Observe that the following equality holds for any band $s$:

$$\sum_n |\psi_f(s,n)|^2 = \sum_n \Big( \sum_l g(s\lambda_l)\hat{f}(\lambda_l)\chi_l(n) \Big)\Big( \sum_{l'} g(s\lambda_{l'})\hat{f}(\lambda_{l'})\chi_{l'}(n) \Big) = \sum_l |g(s\lambda_l)|^2 |\hat{f}(\lambda_l)|^2 \qquad (7.5)$$

Denote by $h_s$ the kernel function defined as $h_s = g_s^2$ and by $h'_s$ its derivative. By construction of the kernel $g$, it is continuous in $[\lambda_{\min}, \lambda_{\max}]$, and therefore $h_s$ is also continuous in each interval $[0, s\lambda_l]$. By the mean value theorem, for each $l$ there exists $c_{\lambda_l}$ such that

$$h(s\lambda_l) - h(0) = h'_s(c_{\lambda_l})(s\lambda_l - s \cdot 0) \qquad (7.6)$$

where $h'_s$ denotes the derivative of $h_s$, and $s \cdot 0 < c_{\lambda_l} < s\lambda_l$. By the properties of spectral wavelets we have $g(0) = 0$, and therefore $h(0) = 0$, while the smallest eigenvalue of the combinatorial Laplacian obeys $\lambda_0 = 0$; thus $h(s\lambda_l) = s\, h'_s(c_{\lambda_l})\, \lambda_l$. Using (7.6) and the Cauchy-Schwarz inequality we have:

$$\sum_l |g(s\lambda_l)|^2 |\hat{f}(\lambda_l)|^2 = \sum_l s\, h'_s(c_{\lambda_l})\, \lambda_l\, |\hat{f}(\lambda_l)|^2 = s\, f^T L_{h'_s(c_\lambda)} f \qquad (7.7)$$

where $L_{h'_s(c_\lambda)}$ is the Laplacian whose eigenvectors are the same as those of $L$ and whose eigenvalues are the eigenvalues of $L$ modulated by the value $h'_s(c_{\lambda_l})$ and the corresponding scale $s$.

Using the total energy in each SGW band $s$ given in Equation 7.7, we can compare our method through the lens of the graph Laplacian based approaches in manifold learning such as [39]. First, in our method we process each of the manifold coordinate dimensions independently, using the SGW transform. The energy of an arbitrary spectral graph wavelet band defined in (7.7), which is to be minimized, in fact corresponds to the graph Laplacian quadratic form of the Laplacian $L_{h'_s(c_\lambda)}$, which is the Laplacian whose eigenvectors are the same as those of $L$ and whose eigenvalues are the eigenvalues of $L$ modulated by the value $h'_s(c_{\lambda_l})$ and the corresponding scale $s$.

Algorithm 4: Denoising manifolds with singularities using Tikhonov regularization
  Data: the data set x~; sigma, the Tensor Voting scale; k, the number of nearest neighbors on the local tangent space; m, the order of the Chebyshev polynomial approximation; lambda, the smoothness regularization parameter.
  1. Construct W based on the local tangent space distance as in Equation 7.1. Construct L from W.
  2. for r = 1 to D do
  3.     Assign the corresponding coordinate values f~_r to the corresponding vertices on the graph.
  4.     Transform the noisy graph signal f~_r using the SGW.
  5.     for each spectral wavelet band s = 1, ..., J do
  6.         Apply Tikhonov regularization (Equation 7.8) to the corresponding spectral graph wavelet coefficients {psi_f(s,n)}, n = 1, ..., N, where the local neighborhood of each vertex n is defined by N(n, s).
  7.     Take the inverse spectral wavelet transform of the processed wavelet coefficients.
  Result: the reconstructed manifold points x^.

Algorithm 5: Denoising manifolds with singularities using a non-iterative approach
  Data: the data set x~; sigma, the Tensor Voting scale; k, the number of nearest neighbors on the local tangent space; m, the order of the Chebyshev polynomial approximation.
  1. Construct W based on the local tangent space distance as in Equation 7.1. Construct L from W.
  2. for r = 1 to D do
  3.     Assign the corresponding coordinate values f~_r to the corresponding vertices on the graph.
  4.     Transform the noisy graph signal f~_r using the SGW.
  5.     for each spectral wavelet band s = 1, ..., J do
  6.         Apply to each spectral wavelet coefficient the function F(psi_f(s,n)) = median{psi_f(s,m) | m in N(n,s)}, based on the local neighborhood N(n, s).
  7.     Take the inverse spectral wavelet transform of the processed wavelet coefficients.
  Result: the reconstructed manifold points x^.
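To make the band-wise step of Algorithms 4 and 5 concrete, here is a minimal sketch. The SGW transform is computed exactly by eigendecomposition (the thesis instead uses the Chebyshev polynomial approximation of [37]); `neighborhoods` is an assumed precomputed list of the index sets N(n, s), and the inverse transform and outer loop over dimensions are omitted:

```python
import numpy as np

def sgw_transform(L, f, kernels):
    """Exact spectral graph wavelet coefficients, one band per kernel.
    L is the N x N graph Laplacian, f an N-vector, and kernels a list
    of functions g_s applied to the eigenvalues."""
    lam, chi = np.linalg.eigh(L)
    f_hat = chi.T @ f
    return [chi @ (g(lam) * f_hat) for g in kernels]

def median_denoise_band(coeffs, neighborhoods):
    """Algorithm 5 step: replace each coefficient by the median over
    its graph neighborhood N(n, s)."""
    return np.array([np.median(coeffs[nbrs]) for nbrs in neighborhoods])

def tikhonov_denoise_band(coeffs, L, lam_reg, n_iter=50, step=0.1):
    """Algorithm 4 step (sketch): gradient descent on
    ||c - coeffs||^2 + lam_reg * c^T L c for one wavelet band."""
    c = coeffs.copy()
    for _ in range(n_iter):
        grad = 2 * (c - coeffs) + 2 * lam_reg * (L @ c)
        c -= step * grad
    return c
```

The gradient iteration above is one possible realization of the iterative regularization; a closed-form solve of (I + lam_reg * L) c = coeffs would give the same minimizer.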
We apply Tikhonov regularization to each of the spectral wavelet bands $s = 1, \ldots, J$, and for each of the manifold coordinates $f_r$, $r = 1, \ldots, D$; this can be represented by the following functional:

$$\min_{f_r} \left\{ g_s(f_r) + \mathrm{Trace}\big(s\, f_r^T L_{h'_s(c_\lambda)} f_r\big) \right\} \qquad (7.8)$$

where in Algorithm II we choose Tikhonov regularization with $g_s(f_r) = \|\hat{\psi}_{f_r}(s) - \tilde{\psi}_{f_r}(s)\|$, with $\tilde{\psi}_{f_r}(s)$ the noisy spectral wavelet band input. Note that the Laplacian $L$ is constructed using the local tangent space distance, and will exhibit ambiguous behavior in the local intersection area. These ambiguities propagate to the higher frequencies of the graph, making it easier to separate signal from noise in these bands. Thus, this approach allows us to take into account the fine-grain regularities of the manifold, using information across all the high frequency bands, integrated from all the manifold coordinate dimensions.

7.5 Experimental results

We show experimental results on a variety of manifolds with singularities and compare to state-of-the-art manifold denoising methods, including MFD. The datasets used include intersecting circles, a self-intersecting figure eight, and two intersecting spheres. We used Tensor Voting for local tangent space estimation, together with the Tensor Voting Graph. We perform the evaluation in different categories such as local tangent space estimation and clustering. All manifolds were sampled using a uniform distribution and contaminated with isotropic Gaussian noise in all dimensions. We used s = 5 wavelet decomposition levels. The order of the Chebyshev polynomial approximation was m = 5, in order to process the manifold locally via the approximation of the spectral wavelet coefficients.

7.5.1 Experiments on manifolds with singularities with inlier noise

We first demonstrate the effect of denoising in the local intersection area, where applying current approaches results in over-smoothing. Following the framework suggested in this work, we denoise each of the spectral wavelet bands using the algorithms suggested in this chapter. Also note that our approach does not require setting a threshold that retains the bands whose total accumulated energy is above a certain value, as required by the previous MFD approach. Compared to the MFD approach suggested in the previous chapter, it can be seen in Figure 7.2 that the local intersection area remains very noisy, while the new denoising approach provides a locally smooth reconstruction. Experimental results on two intersecting circles can be seen in Figure 7.1. The effect of denoising on the local structure estimation is shown in Table 7.1, where the local tangent space error over all points is compared before and after denoising. We can clearly see a significant reduction in the local tangent space estimation error using our denoising approach. Note that it can be further reduced by performing clustering and then applying more iterations, as suggested by our approach for clustering manifolds with singularities. We also show the effect of denoising in terms of clustering in Table 7.2, which reports the clustering accuracy on two intersecting spheres before and after denoising. The results show a significant improvement after denoising; in both cases, the TVG was used to construct the affinity graph prior to denoising, and a sketch of the spectral clustering step used in this evaluation follows.
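A minimal sketch of the clustering step applied to the TVG-based affinity W of Eq. (7.1), in the style of the normalized spectral clustering of [48]; the thesis does not specify this exact implementation, and scikit-learn's KMeans is used here for brevity:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, K):
    """Normalized spectral clustering on an affinity matrix W,
    following the Ng-Jordan-Weiss recipe [48]."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    _, U = np.linalg.eigh(L_sym)
    V = U[:, :K]                                   # K smallest eigenvectors
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=K, n_init=10).fit_predict(V)
```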
Figure 7.1: Experimental results on two noisy intersecting circles: (a) Noisy circles (b) Results with our new denoising method for singularities (c) Zoom into the local intersection area of the noisy circles (d) Zoom into the local intersection area of the denoised output using our method.

7.5.2 Experiments with both outlier and inlier noise

We also show experimental results in the presence of a large amount of both inlier and outlier noise. To remove outliers we use the Tensor Voting framework, which is our first step for local geometric estimation. In each experiment we added 100% outlier noise (relative to the total number of inlier points). Using the output of the eigendecomposition at each point, we classify and then reject a point as an outlier if its tensor's largest eigenvalue is not among the N largest values obtained by sorting the largest eigenvalues of all tensors; a sketch of this rejection step is given at the end of this section.

Error / Data                    Noisy Data   Denoised data
Local tangent space error       10.81        5.8
(two intersecting circles)

Table 7.1: Average error of the local tangent space estimation using Tensor Voting, before and after applying our denoising method.

Figure 7.2: Experimental results on the noisy self-intersecting figure eight: (a) Noiseless figure eight (b) Noisy figure eight (c) Zoom into the local singularity area: results obtained using MFD with a graph based on Euclidean weights (d) Zoom into the local singularity area: results obtained using our denoising method with a graph based on local tangent space estimation using the Tensor Voting Graph.

Figure 7.4 shows experimental results on a noisy figure eight, where it can be seen that even under a large amount of both inlier and outlier noise we were able to obtain a smooth reconstruction and avoid over-smoothing in the local intersection area. Figure 7.5 shows results on two intersecting spheres contaminated with a large amount of both inlier and outlier noise. The output reconstruction is shown to maintain the local and global topological properties of the manifold.

Data                        Noisy Data   Denoised data
Clustering accuracy         86%          96%
(intersecting spheres)

Table 7.2: Clustering results on two intersecting spheres before and after denoising. In both cases we used our TVG to construct the graph.

Figure 7.3: Experimental results on noisy intersecting spheres: (a) Noisy intersecting spheres (b) The denoised spheres using our method.

7.5.3 Experiments with a real data set: the cyclo-octane data

We also show results on a real data set, the cyclo-octane data [10]. The low-dimensional embedding of the cyclo-octane data without noise is shown in Figure 7.6, where it can be seen that it is composed of the union of a sphere and a Klein bottle, a non-orientable manifold. As before, we add a large amount of both inlier and outlier noise, shown in Figure 7.6. The denoised output using our method can be seen in Figure 7.6, where it is shown to recover the structure of the noiseless data.

Figure 7.4: Experimental results on the noisy self-intersecting figure eight, also contaminated with outliers: (a) The figure eight with both inlier and outlier noise (b) Denoised output using our method. Both outlier and inlier noise are removed, and the local intersection area is preserved.
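The eigenvalue-based outlier rejection described in Section 7.5.2 can be sketched as follows; `largest_eigs` is assumed to hold the largest eigenvalue of each point's tensor (e.g. from the Tensor Voting eigendecomposition), and `N` is the known number of inliers:

```python
import numpy as np

def reject_outliers(largest_eigs, N):
    """Keep the N points whose tensors have the largest leading
    eigenvalue (highest saliency); classify the rest as outliers."""
    order = np.argsort(largest_eigs)[::-1]   # descending saliency
    inlier_idx = np.sort(order[:N])
    outlier_idx = np.sort(order[N:])
    return inlier_idx, outlier_idx
```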
7.6 Discussion

In this chapter we have presented a new denoising approach which addresses the case of noisy non-linear manifolds with singularities. We demonstrated that with a graph construction based on local tangent space affinity, we can successfully denoise the graph signals corresponding to the manifold coordinates of piecewise-smooth manifolds. Thus, the proposed approach uniformly addresses the main limitations of both the TVG and MFD frameworks of the previous sections: the TVG approach, which degrades in the presence of inlier noise, and the MFD denoising approach presented in Chapter 6, which only addresses smooth manifolds and suffers from over-smoothing in the local intersection area. Future work includes testing our denoising method with additional and different graph constructions, such as the Mahalanobis distance, and characterizing the decay of the SGWs for a D-dimensional manifold input.

Figure 7.5: Experimental results on noisy intersecting spheres with outliers: (a) Noisy intersecting spheres with 100% outliers (b) Denoised output obtained after removing inlier and outlier noise using our method.

Figure 7.6: Experimental results on the real cyclo-octane manifold data set: (a) Noiseless cyclo-octane (b) Noisy cyclo-octane contaminated with both inlier and outlier noise (c) Denoised output after removing both inlier and outlier noise using our method.

Chapter 8
Discussion and Future Work

This chapter summarizes our main contributions and presents possible directions for future work.

8.1 Contributions

In this thesis we have presented a general unsupervised learning framework that addresses the problem of analyzing high dimensional inputs given in the form of an unstructured, noisy set of points. We have addressed this problem by tackling the main limitations of the existing manifold learning approaches one by one, solving challenging core problems including: estimating meaningful distances on complex manifolds, explicitly addressing manifolds with singularities, handling a large amount of both inlier and outlier noise in the data, robustness to the k nearest neighbors parameter selection in the graph, and finally addressing the case where the local singularity area is noisy. The core tools used in our framework are Tensor Voting for efficient local structure estimation and the Spectral Graph Wavelets for global structure estimation, which also provide us a tool for multi-resolution analysis on the graph. Our unified approach overcomes two of the main limitations of Tensor Voting. First, the limitation of Tensor Voting as a strictly local approach is handled by constructing a unique graph, which we coin the Tensor Voting Graph, that allows us to perform global operations on complex manifolds such as estimating geodesic distances and clustering. Second, the limitation of Tensor Voting under a large amount of inlier noise is addressed by suggesting a new method for denoising manifolds, called MFD.
This method overcomes the over-smoothing limitations of the current unsupervised denoising methods on one hand, while on the other hand it is also capable of efficiently denoising manifolds with singularities without losing critical topological properties of the manifold. We have provided a theoretical justification for our Manifold Frequency Denoising framework. First, we demonstrate that for smooth manifolds the coordinate dimensions of the manifold change smoothly, and that manifolds with smoother characteristics concentrate more energy in the lower frequencies. Using these results we characterize the decay of the spectral wavelet coefficients, where it is shown that higher frequency wavelet coefficients decay in a way that depends on the smoothness properties of the manifold, which is explicitly tied to the curvature properties. Our theoretical analysis also addresses the general case where both the graph and the graph signals are noisy (i.e., the graph is constructed using the noisy observations, and the graph signal assignments correspond to the coordinate dimensions of the noisy observations). In this case we show that under a certain model of noise and sampling rate conditions, if the noise term is bounded by the smoothness properties of the manifold and the sampling rate is sufficiently dense, then the energy of both the noisy graph Laplacian and the noisy spectral wavelets decays in a way similar to the smooth, noiseless case.

Finally, we combine the TVG and MFD approaches into a unified approach which can analyze and denoise manifolds with singularities without over-smoothing in the local singularity area, also in the presence of outliers. While the SGW used in our framework serves as a useful tool to provide localization in both the vertex and spectral domains, it is less flexible than the classical wavelet transform with regard to the type of signals it can process. In our case, both the graph and the signal are dependent, meaning that the SGW may exhibit locally smooth behavior even when the signal is not as smooth. We demonstrate that by using a graph construction based on local tangent space estimation and applying denoising to each of the spectral wavelet bands, we can successfully apply a denoising algorithm to a general class of manifolds with singularities. This suggested framework allows us to take into account the fine-grain regularities of the manifold, and thus it avoids over-smoothing at discontinuities.

8.2 Future Work

We believe that this study opens new directions for exploring the suggested framework towards robust manifold learning applications. From a practical angle, it would be interesting to apply our framework to further computer vision and robotics applications, as well as to other scientific domains. For example, it would be interesting to further employ it in expert domains such as the problem of protein folding and molecular motion, where recent research revealed that the molecule cyclooctane has the structure of self-intersecting manifolds [10]. For future research in Tensor Voting, a compelling direction is to study the convergence properties with respect to the local geometric structure, including local tangent space and dimensionality estimation. Previous common uses of Tensor Voting were in a non-iterative framework or with a heuristic choice for the number of iterations. In our study we discovered convergence properties of the TV framework as the number of iterations increases.
Even though the non-iterative approach produces a reliable local geometric estimation, our new experiments reveal that iterations can be useful for Tensor Voting and can significantly improve the local geometric structure estimation. This refers in particular to the tangent space estimation, where empirically the local tangent space estimation can be shown to converge to the ground truth for a two dimensional sphere. As an example, Figure 8.1 shows the average local normal space error estimated for a sphere as a function of the number of iterations. It can be seen that the error decreases significantly after a number of iterations.

Figure 8.1: Convergence of the local tangent space estimation by Tensor Voting on a sphere sampled with 1000 points. As the number of iterations increases, the tangent space estimate of the iterative Tensor Voting converges to the true tangent space.

For research in SGW applications in manifold learning, a further investigation of how the underlying graph construction affects the spectral transform properties is one of the important future research directions for this framework, since the behavior of the graph frequencies is primarily determined by the choice of the graph construction. Other promising directions include designing probabilistic models in the spectral wavelet domain for smooth manifolds, addressing the case of non-Gaussian noise, and developing tight bounds for the decay of spectral wavelets at higher frequencies. Another interesting direction is to build probabilistic models which attempt to denoise the graph Laplacian directly, in addition or prior to denoising the data points, as has been suggested in a number of recent works [21]. In general, learning graphs from data samples is an ill-posed problem, since there might be many solutions associating a structure to the data. However, in many applications it is possible to define probabilistically meaningful models for computing the topology of the graph. It may be useful to construct such models incorporating our denoising framework together with a probabilistic model for denoising the Laplacian directly. Another interesting and important direction for future work is to explicitly address the out-of-sample problem [59], [6], [28]. Most manifold learning approaches perform dimensionality reduction, in which case solving the out-of-sample problem is very difficult [6]. In contrast, our framework performs all operations in the ambient space. Thus, learning new points can be performed directly by estimating the local geometric structure using Tensor Voting and MFD. An explicit handling of this problem would be useful for practical applications.

Appendix: Theoretical Analysis

In this section we prove that, given two intersecting sub-manifolds, under certain conditions the maximal principal angle between the local tangent spaces is smaller for points which belong to the same manifold in the local intersection area. Thus, given three points sufficiently close to the local intersection area, where only two of the points belong to the same manifold, the maximal principal angle will be smaller between the pair that belongs to the same manifold. This result serves as a motivation for our ambiguity resolution algorithm, which allows us to untangle the manifolds in the intersection area; a small numerical illustration of the claim precedes the formal treatment below.
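As a hedged numerical illustration of this claim (not part of the formal proof), consider two planes in R^3 intersecting along a line; tangent bases taken from the same plane agree exactly, while cross-manifold pairs differ by the dihedral angle:

```python
import numpy as np

def max_principal_angle(U, V):
    """Maximal principal angle (radians) between subspaces spanned
    by the orthonormal columns of U and V."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    return np.arccos(np.clip(s.min(), -1.0, 1.0))

# Two planes intersecting along the x-axis, opened by 60 degrees.
U1 = np.array([[1, 0], [0, 1], [0, 0]], dtype=float)   # the xy-plane
c, s = np.cos(np.pi / 3), np.sin(np.pi / 3)
U2 = np.array([[1, 0], [0, c], [0, s]], dtype=float)   # rotated plane

same = max_principal_angle(U1, U1)    # 0: a same-manifold pair
cross = max_principal_angle(U1, U2)   # pi/3: a cross-manifold pair
assert same < cross                   # the inequality asserted by Lemma 2
```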
The proof is given for the case of K = 2 intersecting manifolds for simplicity, and can be extended to K > 2.

Preliminary Definitions: Let $\mathbb{R}^N$ be the ambient space. Let $\mathcal{M}_d(k)$ denote the class of connected, $C^2$ and compact $d$-dimensional manifolds without boundary embedded in $\mathbb{R}^N$, with reach at least $1/k$, a notion used to quantify smoothness [29]. Formally, the reach of $M \subset \mathbb{R}^N$ is the supremum over $r > 0$ such that, for each $y \in B(x, r)$, where

$$B(x,r) = \left\{ y \in \mathbb{R}^N : \|x - y\| < r \right\} \qquad (8.1)$$

there is a unique point in $M$ nearest $x$. For submanifolds without boundaries, the reach coincides with the condition number $1/\tau$ [29].

Given two connected, $C^2$ and compact $d$-dimensional manifolds $M_1, M_2 \in \mathcal{M}_d(k)$, define $X_J$ as the set of points where $M_1, M_2$ intersect:

$$X_J = \{ x \in M_1 \cap M_2 \mid x = s_1 = s_2,\; s_1 \in M_1,\; s_2 \in M_2 \} \qquad (8.2)$$

Let $\Theta_J$ be the set of all principal angles corresponding to the maximal principal angle $\theta_{\max}$ at the intersection:

$$\Theta_J(M_1, M_2) = \{ \theta_{\max}(T_{M_1}(s_1), T_{M_2}(s_2)) \mid s_1, s_2 \in X_J \} \qquad (8.3)$$

and also let

$$\theta^{\inf}_J = \inf_{(s_1, s_2)} \{ \theta_{\max}(s_1, s_2) \mid \theta_{\max}(s_1, s_2) \in \Theta_J \} \qquad (8.4)$$

be the infimum obtained over all possible maximal principal angles at the intersection points. To prove our main claim, we need the following lemma, proved in [29]. This result gives a bound on the maximal principal angle between local tangent spaces on a smooth manifold which is much tighter than the one obtained in [49], and therefore is the one we use.

Lemma 1 [29]. For $M \in \mathcal{M}_d(k)$ and any $s, s' \in M$,

$$\theta_{\max}\big(T_M(s), T_M(s')\big) < 2\operatorname{asin}\left( \min\left( \frac{k}{2}\|s' - s\|,\; 1 \right) \right) \qquad (8.5)$$

Lemma 2: Suppose $M_1, M_2 \in \mathcal{M}_d(k)$, assume that $\theta_J(s_1, s_2) > 0$ for all $s_1, s_2 \in X_J$, and that the intersection set has a strictly positive reach. Given three points $y, t, z$, assume without loss of generality that $y, t \in M_1$ and $z \in M_2$. Then, for $r > 0$ which obeys $r = \min\left\{ \operatorname{asin}(\theta^{\inf}_J),\; \theta^{\inf}_J / (4k),\; 1 \right\}$, we have that for all $y, t \in M_1$, $z \in M_2$ with $y, z, t \in B(x, r)$, the following inequality is satisfied:

$$\theta_{\max}\big(T_{M_1}(t), T_{M_1}(y)\big) < \theta_{\max}\big(T_{M_1}(y), T_{M_2}(z)\big) \qquad (8.6)$$

Proof. We first prove that $\inf \Theta_J = C_1 > 0$. We claim that there exists $C_1 > 0$ such that for all $x \in X_J$, where $x = s_0 = t_0$, $s_0 \in M_1$, $t_0 \in M_2$, and for all $s \in M_1$, $t \in M_2$ with $s, t \in B(x, \delta)$, we have $\theta_{\max}(T_{M_1}(s), T_{M_2}(t)) > 0$. Assume by contradiction that there exists $x \in X_J$, $x = s_0 = t_0$, such that for every $\delta > 0$ there exist $s \in M_1$, $t \in M_2$, $s, t \in B(x, \delta)$ with $\theta_{\max}(T_{M_1}(s), T_{M_2}(t)) = 0$. Thus, by this assumption, there exist sequences $\{s_n\}_{n=1}^\infty$, $\{t_n\}_{n=1}^\infty$ such that $\theta_{\max}(T_{M_1}(s_n), T_{M_2}(t_n)) = 0$ for all $n$. Since $M_2$ is compact, there exists a converging subsequence $\{t_{n_k}\}_{k=1}^\infty \subset M_2$ such that $t_{n_k} \to t_0 \in M_2$. On the other hand, we have $\theta_{\max}(T_{M_1}(s_n), T_{M_2}(t_0)) = 0$ for all $n$. Since $M_1$ is compact, $\{s_n\}_{n=1}^\infty$ has a converging subsequence $\{s_{n_k}\}_{k=1}^\infty \to s_0 \in M_1$. Thus we have $\theta_{\max}(T_{M_1}(s_{n_k}), T_{M_2}(t_0)) = 0$ while $\theta_{\max}(T_{M_1}(s_0), T_{M_2}(t_0)) > 0$. Letting $v_{s_{n_k}}, v_{t_{n_k}}$ denote the vectors corresponding to the maximal principal angles of the local tangent spaces $T_{M_1}(s_{n_k})$ and $T_{M_2}(t_{n_k})$, we have that $|\langle v_{s_{n_k}}, v_{t_0} \rangle| = 1$, $|\langle v_{s_0}, v_{t_0} \rangle| \neq 1$, and $v_{s_{n_k}} \to v_{s_0}$ for all $k$, which is a contradiction to the compactness of $M_1$. We use the results above to conclude the proof. Let $r = \min\left\{ \operatorname{asin}(\theta^{\inf}_J),\; \theta^{\inf}_J / (4k),\; 1 \right\}$. Without loss of generality, choose $s, s' \in M_1$ and $t \in M_2$ where $s, s', t \in B(x, r)$ with $x \in X_J$.
Using Lemma 1, the choice of $r$ (so that $\|s' - s\| \le 2r$ and $kr \le \theta^{\inf}_J / 4$), and the fact that the arcsine function is increasing, we have:

$$\theta_{\max}\big(T_{M_1}(s), T_{M_1}(s')\big) \le 2\operatorname{asin}\left( \min\left( \frac{k}{2}\|s' - s\|,\; 1 \right) \right) \le 2\operatorname{asin}(kr) \le 2\,(\theta^{\inf}_J / 4) = \theta^{\inf}_J / 2 \qquad (8.7)$$

On the other hand, we have that $\theta_{\max}(T_{M_1}(s), T_{M_2}(t)) > \theta^{\inf}_J / 2$ for each $s \in M_1$, $t \in M_2$, and thus $\theta_{\max}(T_{M_1}(s), T_{M_1}(s')) < \theta_{\max}(T_{M_1}(s), T_{M_2}(t))$.

Reference List

[1] Yonathan Aflalo and Ron Kimmel. Spectral multidimensional scaling. Proceedings of the National Academy of Sciences, 110(45):18052-18057, 2013.
[2] Ery Arias-Castro, Guangliang Chen, and Gilad Lerman. Spectral clustering based on local linear approximations. Electron. J. Statist., 5:1537-1587, 2011.
[3] Richard G. Baraniuk and Michael B. Wakin. Random projections of smooth manifolds. In Foundations of Computational Mathematics, pages 941-944, 2006.
[4] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373-1396, 2003.
[5] Mikhail Belkin, Qichao Que, Yusu Wang, and Xueyuan Zhou. Graph Laplacians on singular manifolds: Toward understanding complex spaces: graph Laplacians on manifolds with singularities and boundaries. CoRR, abs/1211.6727, 2012.
[6] Yoshua Bengio, Jean-François Paiement, Pascal Vincent, Olivier Delalleau, Nicolas Le Roux, and Marie Ouimet. Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering. MIT Press, Cambridge, MA, 2004.
[7] I. Biederman. Recognition-by-components: A theory of human image understanding. Psychological Review, 94:115-147, 1987.
[8] M. Brand. Charting a manifold. Advances in Neural Information Processing Systems, pages 985-992, 2003.
[9] A.E. Brouwer and W.H. Haemers. A lower bound for the Laplacian eigenvalues of a graph (proof of a conjecture by Guo). Discussion Paper 2008-27, Tilburg University, Center for Economic Research, 2008.
[10] W.M. Brown, S. Martin, S.N. Pollock, E.A. Coutsias, and J.P. Watson. Algorithmic dimensionality reduction for molecular structure analysis. Journal of Chemical Physics, 2008.
[11] Antoni Buades, Bartomeu Coll, and Jean-Michel Morel. A review of image denoising algorithms, with a new one. Multiscale Modeling & Simulation, 4(2):490-530, 2005.
[12] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? J. ACM, 58(3):11:1-11:37, June 2011.
[13] Guangliang Chen and Gilad Lerman. Spectral curvature clustering (SCC). International Journal of Computer Vision, 81(3):317-330, 2009.
[14] B. Cheng, J. Yang, S. Yan, Y. Fu, and T. Huang. Learning with l1-graph for image analysis. Transactions on Image Processing, 19(4):858-866, 2010.
[15] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, F. Warner, and S. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. In Proceedings of the National Academy of Sciences, pages 7426-7431, 2005.
[16] Ronald R. Coifman and Mauro Maggioni. Diffusion wavelets, 2004.
[17] T. F. Cox and M. A. A. Cox. Multidimensional Scaling. Chapman & Hall, London, 2001.
[18] Shay Deutsch, Antonio Ortega, and Gerard Medioni. Manifold denoising based on spectral graph wavelets. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
[19] E. Dijkstra. Communication with an Automatic Computer. PhD thesis, University of Amsterdam, 1959.
[20] P. Dollár, V. Rabaud, and S. Belongie. Non-isometric manifold learning: analysis and an algorithm.
In Proceedings of the 24th International Conference on Machine Learning, pages 241-248, 2007.
[21] Xiaowen Dong, Dorina Thanou, Pascal Frossard, and Pierre Vandergheynst. Laplacian matrix learning for smooth graph signal representation. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015, pages 3736-3740, 2015.
[22] David L. Donoho and Iain M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81:425-455, 1994.
[23] D.L. Donoho and C. Grimes. Hessian eigenmaps: Locally linear embedding techniques for high dimensional data. In Proceedings of the National Academy of Sciences of the United States of America, volume 100, pages 5591-5596, 2003.
[24] D.L. Donoho and C. Grimes. Hessian eigenmaps: Locally linear embedding techniques for high dimensional data. In Proceedings of the National Academy of Sciences of the United States of America, volume 100, page 5591, 2003.
[25] E. Elhamifar and R. Vidal. Sparse subspace clustering. In CVPR, pages 2790-2797, 2009.
[26] E. Elhamifar and R. Vidal. Sparse manifold clustering and embedding. In Advances in Neural Information Processing Systems, pages 55-63, 2011.
[27] Ehsan Elhamifar and René Vidal. Sparse subspace clustering: Algorithm, theory, and applications. CoRR, abs/1203.1005, 2012.
[28] Deniz Erdogmus, Murat Akçakaya, Suleyman Serdar Kozat, and Jan Larsen, editors. 25th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2015, Boston, MA, USA, September 17-20, 2015. IEEE, 2015.
[29] Ery Arias-Castro, Gilad Lerman, and Teng Zhang. Spectral clustering based on local PCA. In review, 2013.
[30] Yan Gao, Kap Luk Chan, and Wei-Yun Yau. Manifold denoising with Gaussian process latent variable models. In 19th International Conference on Pattern Recognition (ICPR 2008), December 8-11, 2008, Tampa, Florida, USA, pages 1-4, 2008.
[31] Aristides Gionis, Alexander Hinneburg, Spiros Papadimitriou, and Panayiotis Tsaparas. Dimension induced clustering. LWA, pages 109-110, 2005.
[32] Andrew B. Goldberg, Xiaojin Zhu, Aarti Singh, Zhiting Xu, and Robert Nowak. Multi-manifold semi-supervised learning. In AISTATS, pages 169-176, 2009.
[33] Y. Goldberg, A. Zakai, D. Kushnir, and Y. Ritov. Manifold learning: The price of normalization. Journal of Machine Learning Research, 9:1909-1939, 2008.
[34] D. Gong, X. Zhao, and G. Medioni. Robust multiple manifold structure learning. In ICML, 2012.
[35] Dian Gong and Gerard Medioni. Dynamic manifold warping for view invariant action recognition. Computer Vision, IEEE International Conference on, 0:571-578, 2011.
[36] Dian Gong, Fei Sha, and Gérard G. Medioni. Locally linear denoising on image manifolds. In AISTATS, volume 9 of JMLR Proceedings, pages 265-272. JMLR.org, 2010.
[37] David K. Hammond, Pierre Vandergheynst, and Rémi Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129-150, March 2011.
[38] J. A. Hartigan and M. A. Wong. A K-means clustering algorithm. Applied Statistics, 28:100-108, 1979.
[39] M. Hein and M. Maier. Manifold denoising. In Advances in Neural Information Processing Systems 19, pages 561-568, Cambridge, MA, USA, September 2007. Max-Planck-Gesellschaft, MIT Press.
[40] H. Hotelling. Analysis of a complex of statistical variables into principal components, 1933.
[41] Neil Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. In NIPS, 2003.
[42] Zhuwen Li, Jiaming Guo, Loong-Fah Cheong, and Steven Zhiying Zhou. Perspective motion segmentation via collaborative clustering. In IEEE International Conference on Computer Vision, ICCV 2013, Sydney, Australia, December 1-8, 2013, pages 1369-1376, 2013.
[43] T. Lin and H. Zha. Riemannian manifold learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(5):796-809, 2008.
[44] Ulrike Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395-416, December 2007.
[45] P. Mordohai and G. Medioni. Tensor Voting: A Perceptual Organization Approach to Computer Vision and Machine Learning. Morgan & Claypool Publishers, 2006.
[46] P. Mordohai and G. Medioni. Dimensionality estimation, manifold learning and function approximation using tensor voting. Journal of Machine Learning Research, 11:411-450, 2010.
[47] Philippos Mordohai and Gerard Medioni. Junction inference and classification for figure completion using tensor voting. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 4:56, 2004.
[48] A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, pages 849-856, 2001.
[49] Partha Niyogi, Stephen Smale, and Shmuel Weinberger. Finding the homology of submanifolds with high confidence from random samples. Discrete & Computational Geometry, 39(1):419-441, 2008.
[50] Jiahao Pang, Gene Cheung, Antonio Ortega, and Oscar C. Au. Optimal graph Laplacian regularization for natural image denoising. IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, 2015.
[51] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, 2000.
[52] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, 2000.
[53] Sam T. Roweis, Lawrence K. Saul, and Geoffrey E. Hinton. Global coordination of local linear models. In Thomas G. Dietterich, Suzanna Becker, and Zoubin Ghahramani, editors, Advances in Neural Information Processing Systems 14 (NIPS 2001, December 3-8, 2001, Vancouver, British Columbia, Canada), pages 889-896. MIT Press, 2001.
[54] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22:888-905, 1997.
[55] D.I. Shuman, S.K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83-98, May 2013.
[56] A. Singer and H. Wu. Vector diffusion maps and the connection Laplacian. Communications on Pure and Applied Mathematics, 65(8):1067-1144, 2012.
[57] Chi-Keung Tang and Gérard G. Medioni. Inference of integrated surface, curve, and junction descriptions from sparse 3D data. IEEE Trans. Pattern Anal. Mach. Intell., 20(11):1206-1223, 1998.
[58] J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319-2323, 2000.
[59] L.J.P. van der Maaten, E. O. Postma, and H. J. van den Herik. Dimensionality reduction: A comparative review, 2008.
[60] R. Vidal, Y. Ma, and S. Sastry. Generalized principal component analysis (GPCA), 2003.
[61] David Waltz. Understanding line drawings of scenes with shadows.
In The Psychology of Computer Vision. McGraw-Hill, 1975.
[62] David L. Waltz. Generating semantic descriptions from drawings of scenes with shadows. Technical report, Cambridge, MA, USA, 1972.
[63] Y. Wang, Y. Jiang, Y. Wu, and Z. Zhou. Spectral clustering on multiple manifolds. IEEE Transactions on Neural Networks, 22(7):1149-1161, 2011.
[64] Kilian Q. Weinberger and Lawrence K. Saul. Unsupervised learning of image manifolds by semidefinite programming. Int. J. Comput. Vision, 70(1):77-90, October 2006.
[65] L. Zelnik-Manor and P. Perona. Self-tuning spectral clustering. In Advances in Neural Information Processing Systems, pages 1601-1608, 2004.
[66] Z. Zhang and H. Zha. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM Journal on Scientific Computing, 26(1):313-338, 2005.
Abstract
This study addresses a range of fundamental problems in unsupervised manifold learning. Given a set of noisy points in a high dimensional space that lie near one or more possibly intersecting smooth manifolds, different challenges include learning the local geometric structure at each point, geodesic distance estimation, and clustering. These challenges are ubiquitous in unsupervised manifold learning, and many applications in computer vision as well as other scientific applications would benefit from a principled approach to these problems.

In the first part of this thesis we present a hybrid local-global method that leverages the algorithmic capabilities of the Tensor Voting framework. However, unlike Tensor Voting, which can learn complex structures reliably only locally, our method is capable of reliably inferring the global structure of complex manifolds using a unique graph construction called the Tensor Voting Graph (TVG). This graph provides an efficient tool to perform the desired global manifold learning tasks such as geodesic distance estimation and clustering on complex manifolds, thus overcoming one of the main limitations of Tensor Voting as a strictly local approach. Moreover, we propose to explicitly and directly resolve the ambiguities near the intersections with a novel algorithm, which uses the TVG and the positions of the points near the manifold intersections.

In the second part of this thesis we propose a new framework for manifold denoising based on processing in the graph Fourier frequency domain, derived from the spectral decomposition of the discrete graph Laplacian. The suggested approach, called MFD, uses the Spectral Graph Wavelet transform in order to perform non-iterative denoising directly in the graph frequency domain. To the best of our knowledge, MFD is the first attempt to use graph signal processing [55] tools for manifold denoising on unstructured domains. We provide theoretical justification for our Manifold Frequency Denoising approach on unstructured graphs and demonstrate that for smooth manifolds the coordinate signals also exhibit smoothness. This is first demonstrated in the case of noiseless observations, by proving that manifolds with smoother characteristics create more energy in the lower frequencies. Moreover, it is shown that higher frequency wavelet coefficients decay in a way that depends on the smoothness properties of the manifold, which is explicitly tied to the curvature properties. We then provide an analysis for the case of noisy points and a noisy graph, establishing results which tie the noisy graph Laplacian to the noiseless graph Laplacian characteristics that are induced by the smoothness properties of the manifold. The suggested MFD framework holds attractive features such as robustness to the k nearest neighbors parameter selection on the graph, and it is computationally efficient.

Finally, the last part of this research merges the Manifold Frequency Denoising and the Tensor Voting Graph methods into a uniform framework, which allows us to effectively analyze a general class of noisy manifolds with singularities, also in the presence of outliers. We demonstrate that the limitation of the Spectral Graph Wavelets in regards to the types of graph signals they can analyze can be overcome for manifolds with singularities using certain graph construction and regularization methods. The suggested approach allows us to take into account global smoothness characteristics without over-smoothing at the manifold discontinuities, which correspond to the high frequency bands of the Spectral Graph Wavelets.