SCALABLE SAMPLING AND RECONSTRUCTION FOR GRAPH SIGNALS

by

Ajinkya Jayawant

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

August 2023

Copyright 2023 Ajinkya Jayawant

Acknowledgements

Advancing from coursework in a Master's to research in a Ph.D. requires a significant leap in maturity and vision, and a good mentor is indispensable to avoid pitfalls and provide momentum. My Ph.D. advisor, Professor Antonio Ortega, has mentored me on everything ranging from problem selection and critical thinking to writing. I am thankful to him for showing me the ropes of the process of research. However, a Ph.D. is more than just a technical journey. In this journey, beyond being a first-rate advisor, I am grateful to Professor Ortega for being a good person.

Before starting my Ph.D., I was worried about being homesick staying away from family and friends in India for an extended period of time, and about being able to manage living by myself on the other side of the globe. During my stay in Los Angeles, I was thankful to be close enough to the families of Aditya and Saurabh Jayawant, who provided me with a homely atmosphere in a country I was living in for the first time. Living by yourself forces you to navigate all sorts of scenarios ranging from housing and finance to career, more or less on your own. From the first to the last year of my Ph.D., three friends - Saurav Prakash, Alexander Serrano, and Miguel Moscoso - have stuck with me through all ups and downs and helped me on various occasions. I owe it to them for making my life in the US smooth sailing.

Once I had managed to settle in Los Angeles, research proved to be a challenge in the beginning. Collaborations and discussions with Professor Salman Avestimehr and colleagues Basak Guler and Eduardo Pavez helped me get up to speed with the latest research. During a following summer internship, I got the opportunity to work with Wenqing Jiang. I am especially thankful to him for providing an exciting and encouraging atmosphere for my internship, and I am glad that I met him. For various Ph.D. milestones since then, Akshay Gadde provided me timely advice. On reaching the thesis proposal milestone, for my qualifying exam, the committee of Professors Jay Kuo, Salman Avestimehr, Keith Jenkins, Ramesh Govindan, and my advisor gave many useful suggestions and comments. In addition, pursuing a doctoral degree requires dependable academic advisement, funding¹, and logistics to proceed smoothly. For that, I want to thank Professor Richard Leahy, Tracy Charles, Andy, Gloria, Seth, and Diane.

Ph.D. students through most parts of the world suffer from anxiety and depression, but a good social circle is known to help maintain mental well-being. I was fortunate enough to have a healthy office atmosphere and social circle where I could learn new things and also wind down outside of work. For that, I thank my colleagues and friends Aamir, Benjamin, Keng-Shih, Shashank, Johanna, Laura, Ecem, Samuel, Darukeesan, Carlos, Jitin, Nitin, Vivek, Rashmi, Sagar, Pratyusha, Chaitanya, Sarath, and Dhruva.

Now as I look forward towards the completion of the Ph.D., I feel fortunate to have grown up in a family interested in science, which primarily led me to think about pursuing a Ph.D.
When it came to pursuing a Ph.D., allowing their only child to travel overseas to study for a program that does not have a fixed time limit took strength on the part of my parents. From the beginning and throughout the process, I am grateful for the support of my parents - Kirti and Anand Jayawant. This Ph.D. has been possible because of many people who have contributed along the way in their own unique way. To all these people who have accompanied me to where I am today, I thank you.

¹ This work is supported in part by NSF under grants CCF-1410009, CCF-1527874, and CCF-2009032 and by a gift from Tencent.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Graph preliminaries and notation
  1.2 Problem formulation
    1.2.1 Sampling problem
Chapter 2: Practical graph signal sampling with log-linear size scaling
  2.1 Introduction
    2.1.1 Prior work
    2.1.2 Motivation
    2.1.3 Contributions
  2.2 Problem setup
    2.2.1 Formulation
    2.2.2 Solving D-optimal objectives
  2.3 Efficient sampling set selection algorithms
    2.3.1 Incremental subset selection
    2.3.2 Approximation through distances
    2.3.3 Approximate volume maximization (AVM) through inner products
      2.3.3.1 Approximate squared coherence
      2.3.3.2 Approximate inner product matrix
      2.3.3.3 Computing low pass filtered delta signals
      2.3.3.4 Fast inner product computations
      2.3.3.5 Summary of approximations
    2.3.4 Computational complexity of AVM
      2.3.4.1 Dependence on coherence estimation accuracy
      2.3.4.2 Dependence on the number of samples
      2.3.4.3 Log-linear dependence on graph size
  2.4 Volume maximization interpretation of sampling
    2.4.1 The SP algorithm as Gaussian elimination
    2.4.2 SP algorithm as volume maximization
    2.4.3 Eigendecomposition-free methods as volume maximization
  2.5 Experimental settings
    2.5.1 Signal, Graph Models and Sampling setups
      2.5.1.1 Signal smoothness and graph topologies
      2.5.1.2 Sampling set sizes
      2.5.1.3 Classification on real-world datasets
      2.5.1.4 Effect of scaling graph sizes
    2.5.2 Initialization details
    2.5.3 Reconstruction techniques
  2.6 Results
    2.6.1 Reconstruction error
    2.6.2 Speed
    2.6.3 Effect of number of samples on execution time
  2.7 Conclusion
Chapter 3: Robust graph signal sampling
  3.1 Introduction
  3.2 System Model
  3.3 Problem formulation: Lost samples
  3.4 Performance Evaluation
  3.5 Conclusion
Chapter 4: Graph signal reconstruction with unknown signal bandwidth
  4.1 Introduction
  4.2 Problem formulation
    4.2.1 Signal model
    4.2.2 Model selection for reconstruction
    4.2.3 Bandwidth selection through reconstruction errors
  4.3 Cross-validation theory for graph signals
    4.3.1 Conventional error estimation and shortcomings
    4.3.2 Proposed error estimation
  4.4 Experiments
    4.4.1 Graph construction
    4.4.2 Set selection
    4.4.3 Results
  4.5 Conclusion
Chapter 5: Subgraph-based parallel sampling for large point clouds
  5.1 Introduction
  5.2 Problem formulation
  5.3 Distributed sampling
    5.3.1 Problem formulation
      5.3.1.1 Local sampling global reconstruction
      5.3.1.2 Proxy for optimizing the sampling over subgraphs
    5.3.2 Distributed sampling through graph modifications
  5.4 Experiments and Results
    5.4.1 Further approximations for faster sampling
      5.4.1.1 Leveraging degree information
      5.4.1.2 First order approximation to frequency spectrum
      5.4.1.3 Coherence estimation using degree and neighbors
      5.4.1.4 Reducing polynomial degrees
  5.5 Conclusion
Chapter 6: Conclusion
References
Appendices
  A Proof of eigenvector convergence
  B Justification for ignoring target bandwidth while sampling
  C Approximating Gram matrix by a diagonal matrix
  D D-optimal sampling for generic kernels

List of Tables

1.1 Linear algebra notation in this thesis
2.1 Approximation to greedy maximization of determinant. LSSS - for implementation details refer to Section 2.4.3
2.2 Types of graphs in the experiments
2.3 Execution time (secs.) for sampling, random sensor graphs
2.4 SNRs, random sensor graphs
2.5 Execution time (secs.) for sampling, community graphs
2.6 SNRs, community graphs
4.1 Setting for cross-validation experiments on sensor networks and weather data
5.1 Reconstruction PSNRs for various sampling algorithms
5.2 Algorithm execution times in secs.
6.1 Proposed algorithms and contributions
6.2 Proposed algorithms and contributions

List of Figures

1.1 Pipeline for sampling and reconstruction of graph signals. Uncolored vertices indicate unknown signal values, and colored vertices indicate known or predicted signal values.
2.1 Geometry of SP.
2.2 Comparison of eigendecomposition-free methods in the literature. x-axis: number of samples selected. y-axis: average SNR of the reconstructed signals. We do not include the entire range of SNR from WRS based reconstruction because of its comparatively wider range.
2.3 Visualizing average sampling times of four algorithms over 50 iterations on community graphs with 10 communities. Execution times for LSSS are averaged over executions for different parameter values.
2.4 Scatter plot of the SNR vs execution time for graph size 8000. Axis for execution time is reversed, so results on the top right are desirable.
2.5 Random sensor graphs with 8192 vertices and 20 nearest neighbour connections.
2.6 Random sensor graphs with 50,000 vertices and 20 nearest neighbour connections.
2.7 Community graphs with 8192 vertices and 10 communities.
2.8 Community graphs with 50,000 vertices and 10 communities.
3.1 Performance comparisons for Barabasi-Albert graph with (a) n = 100, (b) n = 200.
3.2 Performance comparisons for Erdös-Rényi graph with (a) n = 100, (b) n = 200.
4.1 Inputs to the reconstruction algorithm
4.2 Disconnected graph case for cross-validation. In this example, we sample a single graph containing two connected components (left and right). Since the random subset $S_i^c$ is disconnected from $S_i$, signal values on $S_i^c$ cannot be inferred from signal values on $S_i$.
4.3 Squared reconstruction errors vs bandwidth for bandlimited signal model. Legend is common for all the plots.
5.1 Original graph and its division into subgraphs for enabling the parallelized sampling algorithm.
5.2 Illustration of how uniform sampling works before and after partitioning.
5.3 Illustration of how AVM sampling works before and after partitioning.
5.4 AVM tends to favor sampling more isolated (lower degree) nodes. Our proposed solution of adding self-loops prevents this bias towards boundary nodes.
5.5 Illustration of how uniform sampling works before and after partitioning.
5.6 Point clouds for experiments
5.7 Estimating $\lambda_k$ using $\lambda_{max}$.
6.1 Sampling and reconstruction pipeline
6.2 Closeness to diagonal at each iteration.

Abstract

Processing data such as spatially scattered weather measurements or point clouds generated from 3D scans is a challenge due to the lack of an inherent structure to the data. Graphs are convenient tools to analyze and process such unstructured data with algorithms analogous to those in traditional signal processing, which treat the data as a graph signal. However, measuring the whole graph signal can be expensive. Observing a limited subset of graph nodes may be better in such cases, i.e., sampling the graph signal and inferring information at the remaining nodes using reconstruction algorithms.

Although graph signal sampling and reconstruction algorithms exist in the literature, making them practical enough to be used in real-life applications requires numerous theoretical and fundamental improvements. One such requirement is that the algorithm should not require substantially more execution time as the number of vertices and edges in the graph increases. Even if the algorithm execution time scales well with the graph size, some samples may get corrupted. Reconstruction of such data is challenging and requires knowledge of the signal model or its parameters, which are commonly assumed to be known in the literature but in practice have to be estimated.

In this thesis, we propose algorithms to minimize the reconstruction error when sampling in the presence of signal corruption, estimate reconstruction error as a function of the signal bandwidth, and develop scalable graph sampling algorithms, in particular algorithms amenable to parallelization. This makes the graph signal reconstruction algorithms more flexible and increases the capabilities of the graph sampling algorithms.
Through these improvements, we push the limits of graph signal sampling and reconstruction even further.

Chapter 1
Introduction

Graphs are a convenient way to represent and analyze data having irregular relationships between data points [63] and can be useful in a variety of different scenarios, such as characterizing the Web [40], semi-supervised learning [73], community detection [23], or traffic analysis [16]. We call the data associated with the nodes of a graph a graph signal [48]. Graph signals are useful in analyzing real-world systems, such as sensor networks, biological data, or machine learning systems, using tools from graph signal processing [63, 49, 26].

Due to the size of most real-world graphs, it is often infeasible to observe all the data points on the graph. In such scenarios, one needs to select a small subset of nodes, observe the corresponding samples, and make inferences about the samples at the remaining nodes in the network using the data obtained from the selected nodes. Such setups are also related to the problem of active semi-supervised learning [24], where one chooses a small set of data points to label and learns the missing labels by utilizing the labeled data along with the graph topology. The question is then how to choose the best data points to sample in order to reconstruct the underlying data structure as accurately as possible. This is known as the graph signal sampling problem [54].

Optimizing the choice of sampling set using concepts from experiment design [56] can help minimize the effect that noise in the input signal may have on the quality of signals reconstructed from observed samples. Many existing sampling set selection methods are computationally intensive because they require an eigendecomposition. For large graphs, even existing eigendecomposition-free methods are still much slower than random sampling algorithms, which are the fastest available methods. In Chapter 2, by optimizing sampling sets towards the D-optimal objective from experiment design, we propose a sampling algorithm that has complexity comparable to that of random sampling algorithms while reaching accuracy similar to existing eigendecomposition-free methods for a broad range of graph types.

While the proposed graph signal sampling algorithm in Chapter 2 improves existing state-of-the-art algorithms, some practical scenarios often place additional demands on a sampling algorithm. For example, some algorithms may encounter a situation where some selected samples are lost or unavailable due to sensor failures or adversarial erasures. To address this setting, in Chapter 3 we formulate a robust graph signal sampling problem where only a subset of the selected samples is received, and the goal is to maximize the worst-case performance. We propose a novel greedy robust sample selection algorithm and study its performance guarantees. Our numerical results demonstrate the performance improvement of the proposed algorithm over existing schemes.

Another desirable characteristic of sampling algorithms is that they require minimal assumptions. Data on graphs is often modeled as bandlimited graph signals. Predicting or reconstructing the unknown signal values for such a model requires estimating the bandwidth. In Chapter 4, we propose a signal bandwidth estimation technique. In doing so, we design a cross-validation approach with a stable graph signal reconstruction and propose a method for estimating the reconstruction errors for different choices of signal bandwidth.
Using this technique, we can estimate the reconstruction error on various real-world graphs.

Data such as point clouds often contain millions of data points and need to be downsized before processing. Downsampling algorithms for point cloud data typically demand high sampling rates and require fast processing for run-time applications. In Chapter 5, we propose parallelized algorithms for point clouds in the high sampling rate regime. We test these algorithms on various point clouds and compare them to the algorithm in Chapter 2, observing only a marginal loss in performance with an order of magnitude speedup in processing times.

1.1 Graph preliminaries and notation

A graph is defined as the pair $(V, E)$, where $V$ is the set of nodes or vertices and $E$ is the set of edges [10]. The set of edges $E$ is a subset of the set of unordered pairs of elements of $V$. A graph signal is a real-valued function defined on the vertices of the graph, $f : V \to \mathbb{R}$. We index the vertices $v \in V$ with the set $\{1, \cdots, n\}$ and define $w_{ij}$ as the weight of the edge between vertices $i$ and $j$. The $(i,j)$-th entry of the adjacency matrix $A$ of the graph is $w_{ij}$, with $w_{ii} = 0$. $A$ is $n \times n$, where $n$ is the number of vertices in the graph, which we also call the graph size. The degree matrix $D$ of a graph is an $n \times n$ diagonal matrix with diagonal entries $d_{ii} = \sum_j w_{ij}$. This thesis considers weighted undirected graphs without self-loops and with non-negative edge weights. Throughout the thesis, $I$ is the $n \times n$ identity matrix.

The combinatorial Laplacian of a graph is given by $L = D - A$, with its corresponding eigendecomposition defined as $L = U \Sigma U^T$, since the Laplacian matrix is symmetric and positive semidefinite. The eigenvalues of the Laplacian matrix are $\Sigma = \mathrm{diag}(\lambda_1, \cdots, \lambda_n)$, with $0 = \lambda_1 \le \cdots \le \lambda_n$ representing the frequencies. The column vectors of $U$ provide a frequency representation for graph signals, so that the operator $U^T$ is usually called the graph Fourier transform (GFT). The eigenvectors $u_i$ of $L$ associated with larger eigenvalues $\lambda_i$ correspond to higher frequencies, and the ones associated with lower eigenvalues correspond to lower frequencies [63, 48]. The GFT of $x$ is defined as $\tilde{x} = U^T x$ [63].

In this thesis, we represent sets using calligraphic uppercase, e.g., $\mathcal{X}$, vectors using bold lowercase, $\mathbf{x}$, matrices using bold uppercase, $\mathbf{A}$, and scalars using plain uppercase or lowercase as $x$ or $X$. Table 1.1 lists additional notation. We use $\mathrm{tr}(A)$ to denote the trace of $A$, and $\mathrm{diag}(x_1, \cdots, x_n)$ represents a square diagonal matrix with values $x_1, \cdots, x_n$ on the diagonal.

Table 1.1: Linear algebra notation in this thesis

Notation — Description
$\mathcal{X}_i$ — $\mathcal{X}$ after iteration $i$
$|\mathcal{X}|$ — Cardinality of set $\mathcal{X}$
$A_{\mathcal{X}\mathcal{Y}}$ or $A_{\mathcal{X},\mathcal{Y}}$ — Submatrix of $A$ indexed by sets $\mathcal{X}$ and $\mathcal{Y}$
$A_{ij}$ — $(i,j)$-th element of $A$
$A_{\mathcal{X}}$ — $A_{:,\mathcal{X}}$, selection of the columns of $A$
$A_i$ — $A$ after iteration $i$
$x_i$ or $x(i)$ — $i$-th element of the vector $x$
$x_{\mathcal{X}}$ or $x(\mathcal{X})$ — Subset of the vector $x$ corresponding to indices $\mathcal{X}$
$x_v$ — Vector corresponding to a vertex $v$ among a sequence of vectors indexed over the set of vertices $V$
$\|\cdot\|$ — Two/Euclidean norm of matrix or vector
$|x|, |\mathbf{x}|$ — Entry-wise absolute value of scalar $x$ or vector $\mathbf{x}$
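The constructions of this section are easy to reproduce numerically. The following is a minimal NumPy sketch (illustrative, not part of the thesis) that builds the combinatorial Laplacian of a toy weighted graph and computes the GFT of a signal; the adjacency matrix and signal are placeholders chosen for the example.

```python
import numpy as np

# Symmetric weighted adjacency matrix of a toy 4-vertex graph (w_ii = 0).
A = np.array([[0., 1., 0., 2.],
              [1., 0., 3., 0.],
              [0., 3., 0., 1.],
              [2., 0., 1., 0.]])

D = np.diag(A.sum(axis=1))   # degree matrix, d_ii = sum_j w_ij
L = D - A                    # combinatorial Laplacian L = D - A

# L is symmetric PSD, so eigh returns real eigenvalues in ascending order:
# lam holds the graph frequencies, the columns of U the GFT basis.
lam, U = np.linalg.eigh(L)

x = np.array([1., 2., 2., 1.])       # a graph signal
x_tilde = U.T @ x                    # GFT: x in the graph frequency domain
assert np.allclose(U @ x_tilde, x)   # inverse GFT recovers the signal
```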
1.2 Problem formulation

1.2.1 Sampling problem

We consider an $n$-dimensional scalar real-valued signal $x$ on the vertex set $V$. We assume that only a part of this signal is known, corresponding to a subset of vertices $S \subseteq V$. For the sake of convenience, and without loss of generality, for a given algorithm the vertices are relabeled after sampling so that their labels correspond to the order in which the vertices were chosen, $S = \{1, 2, \cdots\}$. We denote by $x_S$ and $x_{S^c}$ the known and unknown signals, respectively, where $S^c$ is the complement of $S$. Estimating $x_{S^c}$ from $x_S$ is the graph signal reconstruction problem [54]. We denote the reconstructed unknown signal by $\hat{x}_{S^c}$, and quantify its closeness to the original signal $x_{S^c}$ using the $\ell_2$ norm $\|x_{S^c} - \hat{x}_{S^c}\|_2^2$.

A popular choice for a smooth graph signal is the bandlimited signal model [2]. Bandlimited signals are represented as $x = U_F \alpha$, where $F$ is the set $\{1, \cdots, f\}$ and $\alpha$ is an $f$-dimensional vector. We call $f$ the bandwidth of the signal. For sampling bandlimited signals $x$, a sampling set satisfying the following two conditions will allow us to recover $x$ exactly: Condition i) the number of samples requested is larger than the bandwidth, that is, $|S| \geq f$; and Condition ii) the sampling set $S$ is a uniqueness set [54] corresponding to the bandwidth support $F$.

[Figure 1.1: Pipeline for sampling and reconstruction of graph signals. Uncolored vertices indicate unknown signal values, and colored vertices indicate known or predicted signal values.]

Given the observed samples $x_S$, the reconstruction is given by the least squares solution:

$$\hat{x} = U_F (U_{SF}^T U_{SF})^{-1} U_{SF}^T x_S. \quad (1.1)$$

In practice, signals are never exactly bandlimited. In this thesis, we consider the widely-studied scenario of bandlimited signals with added noise and choose sampling rates that satisfy Condition i) for the underlying noise-free signal¹. While Condition ii) is difficult to verify without computing the eigendecomposition of the Laplacian, it is likely to be satisfied if Condition i) holds. Indeed, for most graphs, except those that are either disconnected or have some symmetries (e.g., unweighted path or grid graphs), any set such that $|S| \geq f$ is a uniqueness set. Thus, similar to most practical sampling methods [2, 57, 59, 5], our sampling algorithms are not designed to return uniqueness sets satisfying Condition ii), which would guarantee exact recovery; instead, we assume that Condition i) is sufficient to guarantee exact recovery.

We consider the signal model

$$f = x + n, \quad (1.2)$$

where $x$ is bandlimited and $n$ is an $n$-dimensional noise vector. The reconstruction from the sampled signal $f_S = x_S + n_S$ is then:

$$\hat{f} = U_F (U_{SF}^T U_{SF})^{-1} U_{SF}^T (x_S + n_S). \quad (1.3)$$

This process is summarized in Figure 1.1. The original unknown signal is $f$. After sampling it, we know a few signal values, corresponding to $f_S$. Using those values, we reconstruct the signal, represented as $\hat{f}$. Since (1.1) allows us to reconstruct $x$ exactly, the error in the reconstructed signal is:

$$\hat{f} - x = U_F (U_{SF}^T U_{SF})^{-1} U_{SF}^T n_S. \quad (1.4)$$

The expected value of the corresponding error matrix, $(\hat{f} - x)(\hat{f} - x)^T$, is

$$E[(\hat{f} - x)(\hat{f} - x)^T] = U_F (U_{SF}^T U_{SF})^{-1} U_{SF}^T E[n_S n_S^T] U_{SF} (U_{SF}^T U_{SF})^{-1} U_F^T. \quad (1.5)$$

Each choice of a sampling set $S$ leads to a different error covariance matrix in (1.5).

¹ We do not consider cases where signals are not bandlimited but can be sampled and reconstructed (refer to [67] and references therein). Exploring more general models for signal sampling is left for future work.
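To make (1.2)-(1.3) concrete, here is a small NumPy sketch (an illustration under stated assumptions, not the thesis code): it generates a noisy bandlimited signal on a randomly weighted toy graph, samples it on a random set $S$ with $|S| \geq f$, and reconstructs it via least squares. It uses the full eigendecomposition, which the later chapters are precisely about avoiding.

```python
import numpy as np

rng = np.random.default_rng(0)
n, f = 20, 4                           # graph size and signal bandwidth

# Random weighted graph (assumption for the example) -> combinatorial Laplacian.
W = np.triu(rng.random((n, n)), 1)
A = W + W.T
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

alpha = rng.normal(size=f)
x = U[:, :f] @ alpha                   # bandlimited signal x = U_F alpha
noisy = x + 0.05 * rng.normal(size=n)  # observed signal f = x + n, model (1.2)

S = np.sort(rng.choice(n, size=8, replace=False))  # a sampling set, |S| >= f
U_SF = U[S, :f]

# Least squares reconstruction (1.3); lstsq computes (U_SF^T U_SF)^{-1} U_SF^T f_S.
f_hat = U[:, :f] @ np.linalg.lstsq(U_SF, noisy[S], rcond=None)[0]
print("reconstruction error:", np.linalg.norm(f_hat - x))
```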
The sampling problem consists of finding the sampling set $S$ that optimizes a scalar function $h$ of the error covariance matrix:

$$S^* = \underset{S}{\mathrm{argmax}}\; h\left( E[(\hat{f} - x)(\hat{f} - x)^T] \right). \quad (1.6)$$

In the subsequent chapters, we will study the sampling problem for different functions $h$ of the error covariance matrix and under different assumptions on the signal and the noise distributions.
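As a simple illustration of (1.5)-(1.6), the toy sketch below (my own example, not from the thesis) scores candidate sampling sets with one particular choice of $h$: the negative log pseudo-determinant of the error covariance under i.i.d. noise, in which case $K_e \propto U_F (U_{SF}^T U_{SF})^{-1} U_F^T$ and the score reduces to $\log \det(U_{SF}^T U_{SF})$, the D-optimality criterion studied in Chapter 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n, f = 20, 4
W = np.triu(rng.random((n, n)), 1)
A = W + W.T
L = np.diag(A.sum(axis=1)) - A
_, U = np.linalg.eigh(L)

def neg_log_pdet_Ke(S, f=f, U=U):
    """h(K_e) = -log pdet(K_e) for i.i.d. noise equals log det(U_SF^T U_SF)
    up to an additive constant, so larger values mean a smaller error volume."""
    U_SF = U[np.asarray(S), :f]
    sign, logdet = np.linalg.slogdet(U_SF.T @ U_SF)
    return logdet if sign > 0 else -np.inf

# Compare two candidate sampling sets of equal size:
print(neg_log_pdet_Ke([0, 1, 2, 3]), neg_log_pdet_Ke([0, 5, 10, 15]))
```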
Chapter 2
Practical graph signal sampling with log-linear size scaling

2.1 Introduction

Similar to traditional signals, a smooth graph signal can be sampled by making observations on a few graph nodes so that the signal at the remaining (non-observed) nodes can be estimated [14, 2, 67]. For this, we need to choose a set of vertices $S$, called the sampling set, on which we observe the signal values, with the goal of predicting the signal values on the other vertices (the complement of $S$, denoted $S^c$). In the presence of noise, some sampling sets lead to better signal reconstructions than others: the goal of sampling set selection is to find the best such sampling set.

For traditional discrete signals such as images and audio, downsampling by an integer factor often works well because of the implicit ordering and regular spacing in the signals (e.g., observing every other sample in a discrete-time signal). Such a structure, with ordered and evenly spaced locations of the discretized signal, is unavailable for most graph signals. As a result, the best sampling set is also unknown.

Accurately reconstructing graph signals from observed samples usually relies on the assumption that the underlying signal is smooth. Intuitively, this means that signal values at neighboring vertices are not drastically different. This is a reasonable assumption in various scenarios, such as sensor networks modeling temperature distribution, graph signals representing labels in semi-supervised learning, or preferences in social networks. This makes it possible to reconstruct these graph signals from a few observed values [54].

As we saw in Chapter 1, a common model for smooth graph signals assumes that most of their energy is localized in the subspace spanned by a subset of eigenvectors of the graph Laplacian or other graph operators [63]. Thus, the problem of selecting the best sampling set naturally translates to the problem of selecting a submatrix of the matrix of eigenvectors of the graph Laplacian [2]. Specifically, the problem reduces to a row/column subset selection similar to the linear measurement sensor selection problem [35]. In the graph signal sampling context, several papers leverage this knowledge to propose novel algorithms — [62, 14, 69, 68, 13, 71]. We refer the reader to [67] for a recent comprehensive literature review on this topic.

However, to solve the graph sampling set selection problem, row/column selection needs to be applied to the matrix of eigenvectors of the graph Laplacian (or those of some other suitable graph operator). The corresponding eigendecomposition is an $O(n^3)$ operation for an $n \times n$ matrix¹. This makes it impractical for large graphs in machine learning, social networks, and other applications, for which the cost of eigendecomposition would be prohibitive. Thus, methods that solve this subset selection problem without explicitly requiring eigendecomposition are valuable.

¹ In practice, if the signal is bandlimited to the lowest $f$ frequencies, only $f$ eigenvectors need to be computed, but even this can be a complex problem (e.g., a signal bandlimited to the top 10% frequencies of a graph with millions of nodes). We describe these as full decomposition methods for simplicity, even though in practice only a subset of eigenvectors is needed.

2.1.1 Prior work

We can group sampling set selection methods into two main classes, based on whether they require eigendecomposition or not. Some methods compute the full eigendecomposition [62, 14, 69], or instead require a sequential eigendecomposition, where one eigenvector is computed at each step [2]. Alternatively, eigendecomposition-free methods do not make use of an eigendecomposition of the Laplacian matrix [57, 71, 68, 59] and are usually faster.

Weighted Random Sampling (WRS) [57] is the fastest method but provides only guarantees on average performance, which means that it may exhibit poor reconstruction accuracy for specific instances. It also needs more samples to match the reconstruction accuracy of other eigendecomposition-free methods. Among the eigendecomposition-free methods discussed in [67], Neumann series-based sampling [71] has a high computational complexity; Binary Search with Gershgorin Disc Alignment (BS-GDA) [5] has low computational complexity for smaller graphs but cannot compete with WRS for large graphs; and Localization operator based Sampling Set Selection (LSSS) [59] achieves good performance but requires some parameter tuning to achieve optimal performance. Our proposed method can overcome these limitations: similar to [71, 5, 59], it is eigendecomposition-free, but it has complexity closer to WRS while requiring fewer parameters to tune than WRS.

Other recently proposed sampling algorithms are eigendecomposition-free but involve a different setup than what we consider in this chapter. For example, the error diffusion sampling algorithm (Algorithm 5 from [50]) achieves complexity comparable to WRS. However, the sampling set and the number of samples chosen depend on the vertex numbering in the graph, which has to be done independently of the algorithm in question. In [50], no specific vertex numbering suitable for Algorithm 5 was recommended. A random vertex numbering algorithm would be fast but may lead to suboptimal sampling set choices (similar to what may happen with random sampling). Thus, more research may be needed to identify efficient numbering algorithms. Note that other blue noise sampling algorithms [51] do not require vertex numbering, but they involve distance computations on the graph similar to Distance-Coherence (DC) in [33]. In contrast, our proposed algorithm, AVM, is independent of the vertex numbering of the graph and does not require distance computations. As another example, the algorithms proposed in [7] and [1] are designed for sampling clustered piecewise-constant graph signals. However, this chapter focuses on a bandlimited smoothness model for graph signals, with topologies not limited to clustered graphs.

2.1.2 Motivation

To motivate our methods, consider first WRS, where vertices are sampled with a probability proportional to their squared local coherence [57]. However, selecting the vertices having the highest coherence may not result in the best sampling set, because some vertices may be "redundant" (e.g., if they are close to each other on the graph). Other sampling algorithms [59] improve performance by selecting vertices based on importance while avoiding redundancy, by minimizing a notion of the overlapped area between functions centered on the sampled vertices.
In our preliminary work [33], we proposed the Distance-Coherence (DC) algorithm, which mitigates the effect of redundancy between vertices by adding new vertices to the sampling set only if they are at a sufficient distance on the graph from the previously selected nodes. While this can eliminate redundancy, it leads to higher computation costs, since distance computation is expensive. As an alternative, in this chapter we propose a novel algorithm, Approximate Volume Maximization (AVM), that replaces the distance computation with a filtering operation. Loosely speaking, our proposed scheme in AVM precomputes squared coherences as in [57], with an additional criterion to maintain separation between selected vertices using a filtering operation. The resulting complexity (see Section 2.3.4) has a log-linear dependence on the number of edges in a connected graph. The log-linear dependence is desirable because it is similar to that of WRS, which is the fastest among algorithms in the literature that use spectral information, and second only to unweighted random sampling from [57] in terms of overall speed. AVM can also be considered an efficient approximation to the D-optimality criterion [4]. In this chapter, we review the main concepts in DC, introduce AVM, and demonstrate its benefits over existing algorithms.

2.1.3 Contributions

Our main contributions are:

1. We describe our distance-based sampling DC algorithm (Section 2.3) to illustrate how to balance the frequency and vertex domain information of graphs for sampling. DC provided us with key ideas to develop the AVM algorithm and can potentially serve as the basis for hybrid algorithms.

2. We introduce a new algorithm, AVM (Algorithm 2), which can be used for any graph size or topology while requiring only a few parameters to tune. Moreover, changing the values of these parameters allows us to achieve different trade-offs between speed and accuracy.

3. Using the framework of volume-based sampling (Section 2.3), we interpret a series of algorithms — exact greedy [69], WRS, Spectral Proxies (SP) [2], LSSS, DC, and our proposed AVM — as variations of the volume maximization problem formulation (Section 2.4), and explain critical differences between existing methods and AVM.

4. AVM provides competitive reconstruction performance on a variety of graphs and sampling scenarios, improving reconstruction signal-to-noise ratio (SNR) over WRS by at least 0.6 dB and frequently significantly more (e.g., 2 dB) — Section 2.5. The practicality of AVM is apparent for larger graph sizes (e.g., of the order of a hundred thousand nodes). For the graph sizes chosen in some of our experiments (see Section 2.5.1.4), other state-of-the-art algorithms such as SP, LSSS, and BS-GDA often fail, while a complete execution is always possible for AVM. At graph sizes small enough for the other algorithms to be applied, AVM is at least 2.5 times and often orders of magnitude faster than state-of-the-art algorithms such as SP, LSSS, and BS-GDA, while sacrificing less than 0.01 dB of reconstruction SNR — Section 2.6. We explain these advantages in terms of complexity towards the end of Section 2.3, by showing that, compared to WRS, the additional computations needed by AVM scale linearly as a function of the number of edges in a connected graph.

In summary, our proposed AVM sampling algorithm has complexity comparable to the WRS sampling algorithm along with significantly better reconstruction accuracy.
It achieves this without requiring prior knowledge of the signal bandwidth, and it can be used for different graphs while requiring only a few easy-to-tune parameters.

2.2 Problem setup

2.2.1 Formulation

We follow the formulation of Section 1.2.1, with signal model (1.2) and least squares reconstruction from the signal samples as in (1.5) (see Section 1.2.1). If we further assume the individual noise entries to be independent with zero mean and equal variance, the error covariance matrix from (1.5) reduces to

$$K_e = E[(\hat{f} - x)(\hat{f} - x)^T] = c\, U_F (U_{SF}^T U_{SF})^{-1} U_F^T, \quad (2.1)$$

for a constant $c$. Different metrics of the reconstruction error $\hat{f} - x$ can be optimized by maximizing a function $h : M_{n,n}(\mathbb{R}) \to \mathbb{R}$ of $K_e$, where $M_{n,n}(\mathbb{R})$ is the set of $n \times n$ real matrices. Since $K_e$ is a function of the sampling set $S$, we wish to find an $S$ that maximizes a function of $K_e$:

$$S = \underset{S \subset V, |S| = s}{\mathrm{argmax}}\; h\left( U_F (U_{SF}^T U_{SF})^{-1} U_F^T \right). \quad (2.2)$$

Note that the set $S$ achieving optimality under general criteria in the form of (2.2) is a function of $F$, so that $S$ is optimized for reconstruction with that particular bandwidth support $F$. While we typically do not know the bandwidth of the original signal, in what follows we assume that a specific bandwidth for reconstructing the signal has been given. This assumption can be relaxed, as will be shown in Chapter 4.

A particular choice of interest is $h(K_e) = 1/\mathrm{pdet}(K_e)$, where $\mathrm{pdet}(\cdot)$ is the pseudo-determinant [45]. Since $K_e$ is singular, we use the pseudo-determinant instead of the determinant. The pseudo-determinant differs from the determinant in that it is the product of only the non-zero eigenvalues instead of all eigenvalues. With our choice of $h(\cdot)$, (2.2) is equivalent to the following maximization:

$$S = \underset{S \subset V, |S| = s}{\mathrm{argmax}}\; \det(U_{SF}^T U_{SF}). \quad (2.3)$$

This is also known as the D-optimality criterion. Maximizing the determinant leads to minimizing the confidence interval of the solution $\hat{f}$ [4], as will be seen in Appendix B. As a further advantage, the D-optimal objective leads to a novel unified view of different types of sampling algorithms proposed in the literature (Section 2.4.3). Moreover, the D-optimal objective is necessary for the approximations we need to develop algorithms achieving efficient eigendecomposition-free subset selection.

Sampling algorithms are designed to implicitly or explicitly optimize the sampling set for a particular bandwidth support. In this chapter, we denote by $R$ the bandwidth support assumed by a sampling algorithm. $R$ can be equal to the reconstruction bandwidth support $F$, in which case the objective (2.3) can be rewritten as:

$$S = \underset{S \subset V, |S| = s}{\mathrm{argmax}}\; \det(U_{SR}^T U_{SR}), \quad \text{with } R = F. \quad (2.4)$$

However, there are advantages to choosing a different $R \neq F$. For example, if we consider $R = \{1, \cdots, s\}$, so that $|R| = |S|$, we can rewrite the objective function (2.4) without changing its value, by permuting the order of the matrices:

$$S = \underset{S \subset V, |S| = s}{\mathrm{argmax}}\; \det(U_{SR} U_{SR}^T). \quad (2.5)$$

Essentially, instead of using the reconstruction frequency $f$ as the sampling frequency, we use the number of samples requested, $s$, as a proxy for the sampling frequency. As we will see, this new form (2.5) is easier to interpret and use. Since choosing $|R| = |S|$ is required, this raises concerns about the optimality of our sampling set for the original objective function. This issue will be discussed in Appendix B.
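For intuition, a brute-force greedy maximizer of the D-optimal objective (2.5) can be written in a few lines when $U$ is available. This sketch (an illustration under that assumption; the rest of the chapter is about avoiding it) adds one vertex at a time, keeping the candidate that most increases the determinant.

```python
import numpy as np

def greedy_dopt(U, s):
    """Greedily grow S to maximize det(U_SR U_SR^T) with R = {1, ..., s}, eq. (2.5).
    Needs the eigenvector matrix U, so it serves only as a reference method."""
    n = U.shape[0]
    R = slice(0, s)                    # sampling bandwidth support, |R| = s
    S = []
    for _ in range(s):
        best_v, best_det = None, -np.inf
        for v in range(n):
            if v in S:
                continue
            M = U[S + [v], R]          # rows S ∪ {v}, columns R
            d = np.linalg.det(M @ M.T)
            if d > best_det:
                best_v, best_det = v, d
        S.append(best_v)
    return S
```

Each step evaluates $O(n)$ determinants, so this reference method is far too slow for large graphs; the approximations developed next replace both the determinant updates and the need for $U$ itself.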
2.2.2 Solving D-optimal objectives

As just discussed, D-optimal subsets of matrices are determinant-maximizing subsets. The determinant measures volume, and selecting a maximum-volume submatrix is an NP-hard problem [15]. Nearly-optimal methods have been proposed in the literature [29], [19], but these are based on selecting a submatrix of rows or columns of a known matrix. Similarly, in the graph signal processing literature, several contributions [69, 13] develop algorithms for D-optimal selection assuming that $U$ is available. In contrast, the main novelty of our work is to develop greedy algorithms for approximate D-optimality, i.e., solving (2.3) without requiring explicit eigendecomposition to obtain $U$. This is made possible by specific characteristics of our problem, to be studied next.

Among graph signal sampling approaches that solve the D-optimal objective, the closest to our work is the application of Wilson's algorithm for Determinantal Point Processes (WDPP) [68], which similarly does not require explicitly computing $U$. However, our proposed technique, AVM, achieves this goal in a different way and leads to better performance. Specifically, WDPP avoids eigendecomposition by approximating the bandlimited kernel using Wilson's marginal kernel upfront [68]. This is a one-time approximation, which does not have to be updated each time nodes are added to the sampling set. This approach relies on a relation between Wilson's marginal kernel and random walks on the graph, leading to a probability of choosing sampling sets that is proportional to the determinant [68]. In contrast, AVM solves an approximate optimization at each iteration, i.e., each time a new vertex is added to the existing sampling set. Thus, AVM optimizes the cost function (2.3) at every iteration, as opposed to WDPP, which aims to optimize the expected value of the cost function.

The WDPP and AVM algorithms differ in their performance as well. AVM is a greedy algorithm, and the performance of greedy determinant maximization algorithms is known to lie within a factor of the maximum determinant [15]. In contrast, WDPP samples with probabilities proportional to the determinants, so that its average performance depends on the distribution of the determinants. In fact, for certain graph types in [68], WDPP has a worse average performance than WRS. In comparison, in our experiments, for a wide variety of graph topologies and sizes, AVM consistently outperforms WRS [57] in terms of the average reconstruction error.

2.3 Efficient sampling set selection algorithms

In what follows, we assume that the conditions for equivalence between the two objective function forms (2.4) and (2.5) are verified, so that we focus on solving (2.5).

2.3.1 Incremental subset selection

The bandwidth support for the purpose of sampling is assumed to be $R = \{1, \cdots, s\}$. Let us start by defining $d_v$, the signal obtained by applying an ideal low pass filter to the Kronecker delta function $\delta_v$ localized at vertex $v$:

$$d_v = U_R U_R^T \delta_v. \quad (2.6)$$

With this definition, the objective in (2.5) can be written as:

$$\det(U_{SR} U_{SR}^T) = \det(U_{SR} U_R^T U_R U_{SR}^T) = \det(I_S^T U_R U_R^T U_R U_R^T I_S) = \det\left( [d_1 \cdots d_s]^T [d_1 \cdots d_s] \right) = \mathrm{Vol}^2(d_1, \cdots, d_s). \quad (2.7)$$

Here, $I_S$ represents the submatrix obtained by selecting the columns of $I$ indexed by the set $S$. Thus, maximizing the determinant $\det(U_{SR} U_{SR}^T)$ is equivalent to maximizing $\mathrm{Vol}(d_1, \cdots, d_s)$, and as a consequence the set maximizing (2.7) also maximizes (2.5).
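The identity (2.7) is easy to verify numerically. The sketch below (my own check, computing $d_v$ from the eigendecomposition purely for illustration) builds the filtered deltas of (2.6) for a candidate set and confirms that their squared volume equals $\det(U_{SR} U_{SR}^T)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, s = 12, 3
W = np.triu(rng.random((n, n)), 1)
A = W + W.T
L = np.diag(A.sum(axis=1)) - A
_, U = np.linalg.eigh(L)
U_R = U[:, :s]                        # R = {1, ..., s}

def d(v):
    """Ideal low-pass filtered delta at vertex v, eq. (2.6)."""
    delta = np.zeros(n); delta[v] = 1.0
    return U_R @ (U_R.T @ delta)

S = [0, 4, 9]                         # an arbitrary candidate sampling set
D = np.column_stack([d(v) for v in S])

lhs = np.linalg.det(U[S, :s] @ U[S, :s].T)   # det(U_SR U_SR^T)
rhs = np.linalg.det(D.T @ D)                 # Vol^2(d_1, ..., d_s), eq. (2.7)
assert np.isclose(lhs, rhs)
```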
In an iterative algorithm where the goal is to select $s$ samples, consider a point where $m < s$ samples have been selected and we have to choose the next sample from among the remaining vertices. Throughout the rest of the chapter, we denote the sampling set at the end of the $m$-th iteration of an algorithm by $S_m$. Given the first $m$ chosen samples, we define the matrix $D_m = [d_1 \cdots d_m]$ and the space spanned by its vectors, $\mathcal{D}_m = \mathrm{span}(d_1, \cdots, d_m)$; both denote the state at the end of the $m$-th iteration of an algorithm. Note that both $D_m$ and $\mathcal{D}_m$ are a function of the choice of the sampling bandwidth support $R$. Next, the best column $d_v$ to be added to $D_m$ should maximize:

$$\det\left( \begin{bmatrix} D_m & d_v \end{bmatrix}^T \begin{bmatrix} D_m & d_v \end{bmatrix} \right) = \det \begin{pmatrix} D_m^T D_m & D_m^T d_v \\ d_v^T D_m & d_v^T d_v \end{pmatrix} \quad (2.8a)$$
$$= \det(D_m^T D_m)\, \det\left( d_v^T d_v - d_v^T D_m (D_m^T D_m)^{-1} D_m^T d_v \right) \quad (2.8b)$$
$$= \det(D_m^T D_m)\, \left( \|d_v\|^2 - d_v^T D_m (D_m^T D_m)^{-1} D_m^T d_v \right) \quad (2.8c)$$
$$= \det(D_m^T D_m)\, \left( \|d_v\|^2 - \|P_{\mathcal{D}_m} d_v\|^2 \right). \quad (2.8d)$$

The effect on the determinant of adding a column to $D_m$ can thus be represented as a multiplicative update (Section 11.2 [4]) in our D-optimal design. (2.8b) follows from [31] (Section 0.8.5 in the Second Edition), while (2.8d) follows because

$$P_{\mathcal{D}_m} = D_m (D_m^T D_m)^{-1} D_m^T \quad (2.9)$$

is a projection onto the space $\mathcal{D}_m$. Direct greedy determinant maximization requires selecting a vertex that maximizes the update term in (2.8c):

$$v^* = \underset{v \in S_m^c}{\mathrm{argmax}} \; \|d_v\|^2 - d_v^T D_m (D_m^T D_m)^{-1} D_m^T d_v \quad (2.10)$$

over all possible vertices $v \in S_m^c$, which requires the expensive computation of $(D_m^T D_m)^{-1}$.

The first step towards a greedy incremental vertex selection is estimating the two components, $\|d_v\|^2$ and $d_v^T D_m (D_m^T D_m)^{-1} D_m^T d_v$, of the multiplicative update. The first term $\|d_v\|^2$ is the squared coherence introduced in [57], which is estimated here using the same techniques as in [57], and is defined as

$$\|d_v\|^2 = \left\| U_R U_R^T \delta_v \right\|^2 = \left\| U_R^T \delta_v \right\|^2. \quad (2.11)$$

For the second term, the projection interpretation of (2.8d) will be useful to develop approximations that reduce complexity.
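Before introducing those approximations, a direct implementation of the exact greedy step (2.10) looks as follows; this sketch (illustrative, assuming the filtered deltas are precomputed in an $n \times n$ array) uses least squares instead of forming $(D_m^T D_m)^{-1}$ explicitly.

```python
import numpy as np

def greedy_volume_step(D_m, d_all, selected):
    """One exact greedy step (2.10): pick v maximizing ||d_v||^2 - ||P_{D_m} d_v||^2.
    d_all[:, v] holds the filtered delta d_v; D_m stacks the d_w of chosen vertices."""
    n = d_all.shape[1]
    best_v, best_gain = None, -np.inf
    for v in range(n):
        if v in selected:
            continue
        d_v = d_all[:, v]
        if D_m is None or D_m.shape[1] == 0:
            proj_energy = 0.0
        else:
            # Least squares coefficients give P_{D_m} d_v = D_m c without an inverse.
            c = np.linalg.lstsq(D_m, d_v, rcond=None)[0]
            proj_energy = np.dot(D_m @ c, D_m @ c)
        gain = np.dot(d_v, d_v) - proj_energy
        if gain > best_gain:
            best_v, best_gain = v, gain
    return best_v
```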
Additionally, we will make use of the following property of our bandlimited space to develop an approximation.

Lemma 2.1. The space of bandlimited signals $\mathrm{span}(U_R)$, equipped with the dot product, is a reproducing kernel Hilbert space (RKHS).

Proof. Defining the inner product for signals $f, g \in \mathrm{span}(U_R)$ as

$$\langle f, g \rangle = \sum_i f_i g_i, \quad (2.12)$$

$\mathrm{span}(U_R)$ is a Hilbert space. A Hilbert space further needs an existing reproducing kernel to be an RKHS. Towards that end, consider a mapping to our bandlimited space, $\phi : \mathbb{R}^n \to \mathrm{span}(U_R)$, given as:

$$\phi(x) = U_R U_R^T x, \quad (2.13)$$

where $\phi(x)$ is the orthogonal projection of $x$ onto $\mathrm{span}(U_R)$. A function $K : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ that uses that mapping and the scalar product in our Hilbert space is:

$$\forall x, y \in \mathbb{R}^n, \quad K(x, y) = \langle \phi(x), \phi(y) \rangle. \quad (2.14)$$

Now, using Theorem 4 from [8], $K$ is a reproducing kernel for our Hilbert space, and using Theorem 1 from [8] we conclude that our bandlimited space of signals is an RKHS.

Corollary 2.1. The dot product of a bandlimited signal $f \in \mathrm{span}(U_R)$ with a filtered delta $d_v$ is $f(v)$, the entry at node $v$ of the signal $f$:

$$\langle f, d_v \rangle = f(v). \quad (2.15)$$

Proof. The dot product $\langle f, d_v \rangle$ in our RKHS can be seen as the evaluation functional of $f$ at the point $v$. Using the definition of the reproducing kernel $K$, since $f \in \mathrm{span}(U_R)$ we have that $\phi(f) = f$, and thus (using the Section 2 definition and Theorem 1, Property b from [8]):

$$\langle f, d_v \rangle = \langle \phi(f), \phi(\delta_v) \rangle = \langle f, \phi(\delta_v) \rangle. \quad (2.16)$$

An evaluation functional $\langle f, \phi(x) \rangle$ for $f$ bandlimited can be simplified as:

$$\langle f, \phi(x) \rangle = \langle f, U_R U_R^T x \rangle = \langle U_R U_R^T f, x \rangle = \langle f, x \rangle. \quad (2.17)$$

Thus, from (2.16) and (2.17):

$$\langle f, d_v \rangle = \langle f, \phi(\delta_v) \rangle = \langle f, \delta_v \rangle = f(v).$$

As a consequence of (2.15), if $f = d_w$ we have:

$$\langle d_w, d_v \rangle = d_v(w) = d_w(v). \quad (2.18)$$

2.3.2 Approximation through distances

We start by proposing a distance-based algorithm (DC) based on the updates we derived in (2.8). While in principle those updates are valid only when $s = f$, in DC we apply them even when $s > f$. We assume $f$ is known, and we take the bandwidth support for the purpose of sampling to be $R = \{1, \cdots, f\}$, which is the same as the signal reconstruction bandwidth support $F$. To maximize the expression in (2.8d), we would like to select nodes that have:

1. a large squared graph coherence $\|d_v\|^2$ with respect to the $f$ frequencies (the first term in (2.8d), which is a property of each node and independent of $S_m$), and

2. a small squared magnitude of the projection onto the subspace $\mathcal{D}_m$ (which does depend on $S_m$), $\|P_{\mathcal{D}_m} d_v\|^2$, which would increase (2.8d).

$\|d_v\|^2$ varies between 0 and 1, taking the largest values at vertices that are poorly connected to the rest of the graph [57]. On the other hand, the subspace $\mathcal{D}_m$ is spanned by the filtered delta signals $d_v$ corresponding to the vertices in $S_m$. The energy of a signal $d_v$ is expected to decay as a function of distance from $v$. Therefore, for a particular energy $\|d_v\|^2$, a vertex whose inner product with the filtered delta signals corresponding to the vertices in $S_m$ is minimal will have a small $\|P_{\mathcal{D}_m} d_v\|^2$. Thus, the $d_v$ of a vertex farther away from the already sampled vertices will have less overlap with their corresponding $d_w$, which also span the space $\mathcal{D}_m$. Therefore, for a vertex $v \in S_m^c$ whose "distance" to the vertices in $S_m$ is large, the corresponding $d_v$ will have a small projection onto the space $\mathcal{D}_m$.

Algorithm 1 Distance-Coherence (DC)
1: function DC(L, s, f, d, ϵ)
2:   S ← ∅
3:   ∆ ← 0.9
4:   R ← {1, ···, f}
5:   ‖d_1‖², ···, ‖d_n‖², λ_f, coeffs ← Compute coherence(L, n, f, ϵ)
6:   while |S| < s do
7:     V_d(S) ← {v ∈ S^c | d(S, v) > ∆ · max_{u∈V} d(S, u)}
8:     v* ← argmax_{v ∈ V_d(S)} ‖d_v‖²
9:     S ← S ∪ {v*}
10:  end while
11:  return S
12: end function

Our proposed DC algorithm (Algorithm 1) consists of two stages: it first identifies vertices that are at a sufficiently large distance from the already chosen vertices in $S_m$. This helps in reducing the set size, by including only those $v \in V_d(S_m)$ that are expected to have a small $\|P_{\mathcal{D}_m} d_v\|^2$. From among those vertices, it then chooses the one with the largest value of $\|d_v\|^2$. The nodes at a sufficiently large distance from $S_m$ are defined as follows:

$$V_d(S_m) = \left\{ v \in S_m^c \,\middle|\, d(S_m, v) > \Delta \cdot \max_{u \in V} d(S_m, u) \right\}, \quad \text{where } \Delta \in [0, 1], \quad d(S_m, v) = \min_{u \in S_m} d(u, v), \quad (2.19)$$

and $d$ is the geodesic distance on the graph. The distance between two adjacent vertices $i, j$ is given by

$$d(i, j) = 1 / w(i, j). \quad (2.20)$$

The parameter $\Delta$ is used to control how many nodes can be included in $V_d(S_m)$. With a small $\Delta$, more nodes will be considered, at the cost of increased computation; with a large $\Delta$, fewer nodes will be considered, with the benefit of reduced computation. For small $\Delta$, the DC algorithm becomes similar to WRS, except that the vertices are picked in the order of their squared coherence, rather than randomly with probability proportional to their squared coherence as in [57].

The DC algorithm (Algorithm 1) provides a proof of concept of the volume maximization interpretation using coherences and distances for sampling. However, it involves computing geodesic distances on the graph, which is computationally expensive. Eliminating this bottleneck is possible by employing simpler distances, such as hop distance, or by doing away with distances altogether. We leave the first approach open for future work and develop the second approach here as the AVM algorithm (Algorithm 2).
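The candidate-set construction (2.19) can be sketched with an off-the-shelf shortest-path routine; the following is an illustrative implementation (not the thesis code, and dense for simplicity) using SciPy's Dijkstra with the edge lengths $1/w(i,j)$ of (2.20).

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def candidate_set(A, S, delta=0.9):
    """Candidate vertices V_d(S) of (2.19). A is a dense weighted adjacency
    matrix; geodesic distances use edge lengths 1/w(i,j), eq. (2.20)."""
    n = A.shape[0]
    lengths = np.divide(1.0, A, out=np.zeros_like(A), where=A > 0)
    dist = dijkstra(csr_matrix(lengths), indices=list(S))  # |S| x n distances
    d_to_S = dist.min(axis=0)              # d(S, v) = min_{u in S} d(u, v)
    threshold = delta * d_to_S.max()
    return [v for v in range(n) if v not in S and d_to_S[v] > threshold]
```

Running Dijkstra from every sampled vertex at every iteration is exactly the cost that motivates replacing distances with filtering in AVM.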
However, it involves obtaining geodesic distances on the graph, which is a computationally expensive task. Eliminating this bottleneck is possible by employing simpler distances such as hop distance, or doing away with distances altogether. We leave the first approach open for future work and develop the second approach here as the AVM algorithm (Algorithm 2). 2.3.3 Approximate volume maximization (AVM) through inner products In this section, we use a more efficient technique based on filtering, instead of computing the distance between nodes as in DC, here we assumed that the signal bandwidth for sampling was the same as the reconstruction bandwidth f. In practice, we do not know the signal bandwidth and thus also do not know the reconstruction bandwidth. To remedy this, in AVM, we use the number of samples, s, as a proxy for the signal’s bandwidth. As a result, the bandwidth support used for sampling isR={1,··· ,s}. We explained the reason behind this decoupling of the sampling and the reconstruction bandwidth in Section 2.2.1 through equations (2.3) and (2.4). AVM has the following advantages: • We can use the optimization framework we defined in Section 2.3.1. • Bynotassumingknowledgeofthereconstructionbandwidthforsampling,AVMmodels real-world sampling scenarios better. • For our chosen set of samples, we do not have to limit ourselves to one reconstruction bandwidth. 21 Algorithm 2 Approximate volume maximization (AVM) function AVM(L,s,d,ϵ ) S ←∅ R←{ 1,··· ,s} ∥d 1 ∥ 2 ,··· ,∥d n ∥ 2 ,λ s ,coeffs ← Compute coherence(L,n,s,ϵ ) while|S|ε· ¯λ do 10: if round(SS)≥ k then 11: ¯λ ← λ . 12: else 13: λ ← λ . 14: end if 15: λ ← (λ + ¯λ )/2 16: coeffs ← Polynomial Filter Coefficients(0,λ n ,λ,d ) 17: r 1 filt ,··· ,r L filt ← Poly. Filter(L,coeffs ,r 1 ),··· ,Poly. Filter(L,coeffs ,r L ) 18: SS← P n i=1 P L l=1 (r l filt ) 2 i 19: end while 20: ∥d 1 ∥ 2 ,··· ,∥d n ∥ 2 ← h P L l=1 (r l filt ) 2 1 ,··· , P L l=1 (r l filt ) 2 n i /SS 21: return ∥d 1 ∥ 2 ,··· ,∥d n ∥ 2 ,λ, coeffs 22: end function 2.3.3.2 Approximate inner product matrix We know that the volume of the parallelepiped formed by two fixed-length vectors is maxi- mized when the vectors are orthogonal to each other. Now, since vectors that optimize (2.10) also approximately maximize the volume, we expect them to be close to orthogonal. Thus, we approximate D T m D m by an orthogonal matrix (Appendix C). That is, assuming that the filtered delta signals corresponding to the previously selected vertices are approximately orthogonal we can write: D T m D m ≈ diag ∥d 1 ∥ 2 ,··· ,∥d m ∥ 2 23 and (D T m D m ) − 1 ≈ diag 1 ∥d 1 ∥ 2 ,··· , 1 ∥d m ∥ 2 , which leads to an approximation of the determinant: det D T m D m D T m d v d T v D m d T v d v ≈ det(D T m D m )det(d T v d v − d T v ˆ D m ˆ D T m d v ), (2.21) where ˆ D m is obtained from D m by normalizing the columns: ˆ D m =D m diag(1/∥d 1 ∥,··· ,1/∥d m ∥). (2.22) The second term in (2.21) can be written as: d T v ˆ D m ˆ D T m d v = ⟨d v ,d 1 ⟩ 2 ∥d 1 ∥ 2 +··· + ⟨d v ,d m ⟩ 2 ∥d m ∥ 2 , (2.23) which would be the signal energy of projected signald v on tospan(d 1 ,··· ,d m ), if the vectors d 1 ,··· ,d m were mutually orthogonal. This is consistent with our assumption that D T m D m is approximately diagonal, which would only hold exactly if the vectors form an orthogonal set. 2.3.3.3 Computing low pass filtered delta signals If U is known, then computing the low pass filtered delta signal d v is straightforward by simply using the ideal low pass filter as in (2.6). 
2.3.3.3 Computing low pass filtered delta signals

If $U$ is known, computing the low-pass filtered delta signal $d_v$ is straightforward: we simply use the ideal low pass filter as in (2.6). However, since we would like to avoid the cost of the eigendecomposition, $U$ is unknown. A polynomial approximation of the ideal low pass filter with cutoff frequency $\lambda_s$ can be computed using Function 1. Using this polynomial approximation, $\delta_v$ is filtered to obtain $d_v$.

2.3.3.4 Fast inner product computations

Maximization of (2.21) requires evaluating the inner products $\langle d_v, d_i \rangle$ in (2.23) for all $i \in S_m$ and all vertices $v$ outside $S_m$. Suppose we knew $d_i$ for the sampled vertices $i \in S_m$ and the inner products from the past iteration. The current $(m+1)$-th iteration would still need to compute $n - m$ new inner products. To avoid this computation, we use the inner product property (2.18), which allows us to simplify (2.23) as follows:

$$d_v^T \hat{D}_m \hat{D}_m^T d_v = \frac{d_1^2(v)}{\|d_1\|^2} + \cdots + \frac{d_m^2(v)}{\|d_m\|^2}.$$

By doing this, we avoid computing $\langle d_v, d_m \rangle$ for $v \in S^c$ for the newly added vertex $m$. Thus, there is no need to compute $n - m$ new inner products, while we also avoid computing $d_v$ for $v \in S_m^c$. With this, our greedy optimization step becomes:

$$v^* \leftarrow \underset{v \in S_m^c}{\mathrm{argmax}} \; \|d_v\|^2 - \sum_{w \in S_m} \frac{d_w^2(v)}{\|d_w\|^2}.$$

2.3.3.5 Summary of approximations

In summary, thanks to the approximations from Section 2.3.3.2 to Section 2.3.3.4, we do not need to compute distances, and we no longer rely on the choice of a parameter $\Delta$ as in Algorithm 1. Algorithm 2 only requires the following inputs:

1. the number of samples requested, $s$;
2. the constant $c$ specifying the number of random projections, $cs \log s$;
3. the scalar $\epsilon$ specifying the convergence criterion for the random projection iterations while computing squared coherences.

The last two inputs are specifically needed by Algorithm 1 in [57], which we use in Stage 1 (Section 2.3.3.1) to compute squared coherences.

While the inner product property is defined based on the assumption that we use an "ideal" low pass filter for reconstruction, it can also be used to maximize the volume formed by the samples of more generic kernels — see Appendix D. The approximations proposed in this section towards designing AVM can be justified if they lead to a scalable and fast algorithm. In what follows, we study the computational complexity of AVM to assess its scalability.
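As an illustration of the filtering step of Section 2.3.3.3, the sketch below approximates $d_v$ by applying a low-degree polynomial of $L$ to $\delta_v$, so only matrix-vector products are needed. The polynomial fit shown here (a plain least-squares fit to the ideal response) is my own simplification; the polynomial design in Function 1 may differ.

```python
import numpy as np

def poly_lowpass_delta(L_mat, v, lam_cut, lam_max, degree=12, num_fit=200):
    """Approximate d_v = U_R U_R^T delta_v without eigendecomposition: fit a
    degree-`degree` polynomial p to the ideal response 1{lam <= lam_cut} on
    [0, lam_max], then evaluate p(L) delta_v via repeated mat-vec products."""
    grid = np.linspace(0.0, lam_max, num_fit)
    ideal = (grid <= lam_cut).astype(float)
    coeffs = np.polyfit(grid, ideal, degree)[::-1]   # power basis, low order first

    n = L_mat.shape[0]
    x = np.zeros(n); x[v] = 1.0          # delta_v
    out = np.zeros(n)
    Lx = x.copy()                        # holds L^k delta_v
    for c in coeffs:                     # accumulate sum_k c_k L^k delta_v
        out += c * Lx
        Lx = L_mat @ Lx
    return out
```

With a sparse $L$, each of the $d$ mat-vec products costs $O(|E| + |V|)$, which is the source of the $O(d(|E| + |V|))$ per-sample term in the complexity analysis that follows.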
2.3.4.1 Dependence on coherence estimation accuracy

Stage 1 is the bottleneck of the AVM algorithm because it involves T_1 iterations to find the squared coherences, with computations in each iteration scaling as |E| log|V|, where both factors |E| and |V| grow with the graph size. A limit on the number of computations we can afford at this stage may cap the graph sizes we can consider. In this situation, we note that Stage 1 (computing squared coherences and λ_s) is an approximation, and we could select an alternative approximation requiring fewer computations instead.

2.3.4.2 Dependence on the number of samples

The complexity of a sampling algorithm naturally depends on the number of samples requested at the input, and it is reasonable to assume that an ideal sampling algorithm cannot grow sublinearly in complexity as the number of samples increases, since simply adding one sample requires O(1) computations. While a sampling algorithm's complexity may grow superlinearly with the number of samples requested (see Table III in [2]), the algorithms we compare in Section 2.6.2 grow linearly with the number of samples. AVM's complexity also scales linearly with the number of samples, as the complexity factor O(s|E|d) suggests.

2.3.4.3 Log-linear dependence on graph size

The computational complexity of O(|E| d T_1 log|V| + s|E|d) shows that AVM has a log-linear dependence on the graph size: linear in the number of edges and logarithmic in the number of vertices. This can also be stated as a log-linear dependence on the number of edges, O(|E| d T_1 log|E| + s|E|d), but O(|E| d T_1 log|V| + s|E|d) is the more accurate estimate.

So far, approximations to the volume maximization objective (2.8d) were useful to develop the DC and AVM algorithms (code for AVM is available at https://github.com/STAC-USC/Graph-signal-sampling-AVM). In the following sections, we show how other eigendecomposition-free algorithms can also be interpreted as approximations to the greedy volume maximization objective.

2.4 Volume maximization interpretation of sampling

We next study how existing graph signal sampling methods are related to volume maximization. We start by focusing on the SP algorithm from [2] and show how it can be seen as a volume maximization method. This idea is developed in Sections 2.4.1 and 2.4.2. Section 2.4.3 then considers other eigendecomposition-free methods and draws parallels with our volume maximization approach.

2.4.1 The SP algorithm as Gaussian elimination

The SP algorithm is based on the following theorem [2].

Theorem 2.1. Let L be the combinatorial Laplacian of an undirected graph. For a set S_m of size m, let U_{S_m,1:m} be full rank. Let ψ*_k be zero over S_m and a minimizing signal of the Rayleigh quotient of L^k for a positive integer k:

ψ*_k = argmin_{ψ, ψ(S_m)=0}  (ψ^T L^k ψ) / (ψ^T ψ).   (2.24)

Let the signal ψ* be a linear combination of the first m+1 eigenvectors such that ψ*(S_m) = 0. If there is a gap between the singular values, σ_{m+2} > σ_{m+1}, then ∥ψ*_k − ψ*∥_2 → 0 as k → ∞.

Proof. Refer to [2]. For the proof of ℓ2 convergence, see Appendix A.

Following (2.24), the step of the SP algorithm that leads to sampling a new vertex is

v* = argmax_{v ∈ S_m^c} |ψ*_k(v)|,

where ψ*_k is from (2.24). Consider first the ideal SP algorithm, where k → ∞ and the solution tends to the ideal bandlimited solution. In the ideal case, given a full rank U_{S_m,1:m}, from Theorem 2.1 we can always get another vertex v such that U_{S_m∪v,1:m+1} is also full rank. Thus, at every iteration, the submatrix U_{S_m,1:m} has full rank.

Figure 2.1: Geometry of SP. (a) Orthogonality of h with D̃_m. (b) Components of d_v.
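As a rough illustration of the SP step (2.24), the sketch below computes the constrained Rayleigh-quotient minimizer directly, using the fact that ψ(S_m) = 0 reduces the problem to the smallest eigenpair of (L^k) restricted to the unsampled vertices. This is a naive sketch for small graphs and small k only; the actual algorithm in [2] avoids forming L^k explicitly, and the connected-graph assumption keeps the restricted matrix positive definite.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def sp_select_next(L, sampled, k=4):
    """One naive SP-style iteration: minimize the Rayleigh quotient of L^k over
    signals that vanish on `sampled`, then pick the vertex of largest magnitude."""
    n = L.shape[0]
    comp = np.setdiff1d(np.arange(n), np.asarray(sampled, dtype=int))  # S_m^c
    Lk = sp.identity(n, format="csr")
    for _ in range(k):                       # L^k by repeated multiplication (small k)
        Lk = Lk @ sp.csr_matrix(L)
    Mk = Lk[comp][:, comp]                   # (L^k)_{S^c, S^c}, since psi(S_m) = 0
    _, vec = eigsh(Mk, k=1, which="SM")      # smallest eigenpair (slow but simple)
    return comp[int(np.argmax(np.abs(vec[:, 0])))]   # v* = argmax |psi*_k(v)|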
When k → ∞ and the S_m vertices have been selected, ψ* is given by the (m+1)th column of U′, where U′ is obtained by applying Gaussian elimination to the columns of U so that the (m+1)th column has zeros at the indices given by S_m [2]. U′ can be written as:

U′ =
⎡ u_1(1)                                              ⎤
⎢   ⋆     u′_2(2)            0                        ⎥
⎢   ⋆       ⋆       ⋱                  u_{m+2} ⋯ u_n  ⎥
⎢   ⋆       ⋆       ⋯   u′_{m+1}(m+1)                 ⎥
⎢   ⋮       ⋮            ⋮                            ⎥
⎣   ⋆       ⋆       ⋯   u′_{m+1}(n)                   ⎦   (2.25)

where ⋆ denotes arbitrary entries and 0 denotes zero entries in the corresponding matrix regions. Because we have non-zero pivots, u_1(1) through u′_{m+1}(m+1), U_{S_m,1:m} is full rank. The columns of U from m+2 to n remain intact. The concept is the same as that in [62]. Next, we explain how an iteration of the ideal SP algorithm can be seen as a volume maximization step.

2.4.2 SP algorithm as volume maximization

Consider a single stage of the SP algorithm, where the current sampling set is S_m, |S_m| = m is the number of sampled vertices, and the bandwidth support is R_m = {1, ..., m+1}. At this stage, choosing a vertex v that maximizes det(U_{S_m∪v,R_m} U^T_{S_m∪v,R_m}) is equivalent to choosing a vertex that maximizes |det(U_{S_m∪v,R_m})|. We state our results in terms of |det(U_{S_m∪v,R_m})|, which has the added advantage of making the connection with the Gaussian elimination perspective of Section 2.4.1. Now, focusing on the selection of the (m+1)th sample, we have the following result.

Proposition 2.1. The sample v* selected in the (m+1)th iteration of the ideal SP algorithm is the vertex v from S_m^c that maximizes |det(U_{S_m∪v,R_m})|.

Proof. The ideal SP algorithm selects the vertex corresponding to the maximum value among |u′_{m+1}(m+1)|, ..., |u′_{m+1}(n)|. Since S_m is given and U′_{S_m∪v,R_m} is a diagonal matrix, this also corresponds to selecting the v for which the magnitude of det(U′_{S_m∪v,R_m}) is maximal among all possible choices of v. But because U′_{S_m∪v,R_m} is obtained from U by Gaussian elimination, the two determinants are equal, i.e., |det(U_{S_m∪v,R_m})| = |det(U′_{S_m∪v,R_m})|, and since the current, (m+1)th, iteration chooses the pivot of maximum absolute value, given S_m, the selected sample maximizes |det(U_{S_m∪v,R_m})|.

We now show that the vertex v* selected in the (m+1)th iteration follows the rule:

v* = argmax_{v ∈ S_m^c}  dist( d_v, span(d_1, ..., d_m) ),

where dist(·,·) is the distance between a vector and its orthogonal projection onto a vector subspace. Thus, this optimization is equivalent to selecting a vertex v that maximizes the volume of the parallelepiped formed by d_1, ..., d_m, d_v, i.e., Vol(d_1, ..., d_m, d_v).

Let h be a unit vector along the direction of the (m+1)th column of U′ in (2.25). We are interested in finding the vertex v that maximizes |h(v)|.

Proposition 2.2. The signal value h(v) is the length of the projection of d_v onto h.

Proof. The signal h belongs to the bandlimited space, h ∈ span(U_{R_m}). Thus, using (2.15) we have that:

h(v) = ⟨h, d_v⟩.

Since h is a unit vector, the right-hand side is the projection length of d_v onto h. In summary, |h(v)| is maximized when |⟨d_v, h⟩| is maximized.

Proposition 2.3. The signal h satisfies h ∈ span(d_1, ..., d_m)^⊥ ∩ span(U_{R_m}).

Proof. All the diagonal elements in the Gaussian elimination of U_{S_m∪v,R_m} are non-zero, as seen in (2.25), so that the following equivalent statements hold:

• U_{S_m,1:m} is full rank.
• U_{R_m} U^T_{R_m} I_{S_m} has full column rank.
• span(d_1, ..., d_m) has dimension m.

The second statement follows from the first (Section 0.4.6(b) of [31]) because U_{R_m} has full column rank and U_{S_m,1:m} is nonsingular.
Given that span(d_1, ..., d_m) has dimension m, we can proceed to the orthogonality arguments. By definition, h obtained from (2.25) is zero over the set S_m, so that, from Proposition 2.1:

h(1) = 0  ⟹  ⟨h, d_1⟩ = 0,
⋮
h(m) = 0  ⟹  ⟨h, d_m⟩ = 0,

and therefore h is orthogonal to each of the vectors d_1, ..., d_m. We call the space spanned by those vectors D̃_m, defined as

D̃_m = span(d_1, ..., d_m).   (2.26)

Since the dimension of span(U_{R_m}) is m+1 and h is orthogonal to D̃_m, which has dimension m, span(h) is the orthogonal complement of D̃_m (see Fig. 2.1a for an illustration).

For this particular algorithm, R changes with the number of samples in the sampling set. At the end of the mth iteration, the bandwidth support R can be represented as

R_m = {1, ..., m+1},   (2.27)

where m is the number of samples in the current sampling set. We use D̃ and D̃_m to denote the dependence of D and D_m on R_m in addition to S_m.

Proposition 2.4. The sample v* selected in the (m+1)th iteration of SP maximizes the distance between d_v and its orthogonal projection onto D̃_m.

Proof. Since d_v ∈ span(U_{R_m}), it can be resolved into two orthogonal components belonging to the two orthogonal spaces D̃_m and span(h) (Prop. 2.3):

d_v = P_{D̃_m} d_v + ⟨d_v, h⟩ h,

where h has unit norm and P_{D̃_m} is the projection matrix onto the subspace D̃_m. Maximizing |h(v)| is equivalent to maximizing |⟨d_v, h⟩|, which can be expressed in terms of the norm of d_v and the norm of its projection onto D̃_m:

argmax_v ⟨d_v, h⟩² = argmax_v ( ∥d_v∥² − ∥P_{D̃_m} d_v∥² ).   (2.28)

Fig. 2.1b shows this orthogonality relation between d_v, |⟨h, d_v⟩|, and P_{D̃_m}(d_v). Thus, the v* chosen at the (m+1)th iteration is the one that maximizes the volume of the space spanned by the filtered delta signals:

v* = argmax_{v ∈ S_m^c}  Vol(d_1, ..., d_m, d_v).

This last step follows from the definition of the volume of a parallelepiped [52]. Although Proposition 2.4 could have been derived from the determinant property in Proposition 2.1 using (2.7), the approach via the vector orthogonal to the subspace in Propositions 2.2, 2.3, and 2.4 makes the geometry of the problem more explicit.

Algorithm 3 summarizes this volume maximization interpretation of SP. Although Algorithm 3 requires an eigendecomposition, it is helpful to see its conceptual similarity with Algorithms 1 and 2. Algorithm 3 updates R in each iteration, which can be seen as an approximation of a greedy volume maximization approach in which R is kept fixed. Hence, Algorithm 3 is expected to have sub-optimal performance compared to the greedy volume maximization approach under the D-optimality criterion. For an empirical comparison, in Section 2.5 we compare against SP, which is Algorithm 3 relaxed with a finite value of k and without requiring the full eigendecomposition.

Algorithm 3 Volume interpretation of the SP algorithm as k → ∞
function SP(L, s)
    S ← ∅
    R ← {1}
    while |S| < s do
        v* ← argmax_{v ∈ S^c} Vol(d_1, ..., d_{|S|}, d_v)
        S ← S ∪ {v*}
        R ← {1, ..., |S| + 1}
    end while
    return S
end function

In practice, the number of samples satisfies s > f, which is the setting we consider in this chapter. Note that this reconstruction requires the signal bandwidth f to be known, regardless of whether the signal is bandlimited or bandlimited with additional noise. Most reconstruction algorithms assume that this bandwidth is known [14, 57, 72]. Fundamentally, however, this is a model selection problem in which an appropriate bandlimited signal model with a fixed bandwidth f must be chosen.
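For concreteness, the bandlimited least squares reconstruction referred to above can be sketched in a few lines of Python. The dense eigendecomposition is for illustration only (the scalable methods in this thesis avoid it), and the function name is ours.

import numpy as np

def bandlimited_reconstruction(L, x_S, S, f):
    """Least squares bandlimited reconstruction: fit the first f graph
    frequencies to the known samples x_S on vertex set S, then extrapolate."""
    _, U = np.linalg.eigh(L)          # illustration only: dense eigendecomposition
    U_F = U[:, :f]                    # first f graph Fourier basis vectors
    alpha, *_ = np.linalg.lstsq(U_F[S, :], x_S, rcond=None)
    return U_F @ alpha                # x_hat = U_F (U_SF^T U_SF)^{-1} U_SF^T x_S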
4.2.3 Bandwidth selection through reconstruction errors

Although the goal of model selection for signal reconstruction is to choose f, the signal itself might not be bandlimited. As a result, there may not be any prior for the signal bandwidth. However, our primary goal is to minimize the reconstruction error:

E_{S^c} = ∥x_{S^c} − x̂_{S^c}∥²,   (4.2)

where the estimate x̂_{S^c} is a function of f, and so is E_{S^c}. To select f, we propose minimizing ∥x̂_{S^c} − x_{S^c}∥² over a set of possible values of f, so that whichever bandwidth f minimizes the error is used as the reconstruction bandwidth:

f* = argmin_f E_{S^c}.

However, minimizing ∥x̂_{S^c} − x_{S^c}∥² is impossible without knowing x_{S^c}. For that reason, we propose estimating the error ∥x̂_{S^c} − x_{S^c}∥² for different values of f using the known signal values, x_S. We limit the scope of this chapter to estimating this reconstruction error and leave the bandwidth selection itself for future work. Towards that end, we propose an estimate, Ê_{S^c}, of the reconstruction error E_{S^c} for different values of f, such that |E_{S^c} − Ê_{S^c}| is as small as possible. Cross-validation is a suitable tool for such estimation, and we consider it next.

4.3 Cross-validation theory for graph signals

To accurately estimate the reconstruction error as a function of the signal bandwidth f, it is essential to analyze in more detail the error with respect to subset selection on the set of graph vertices.

4.3.1 Conventional error estimation and shortcomings

The reconstruction error e(S^c), measured over the unknown nodes, is:

e(S^c) = x_{S^c} − x̂_{S^c} = x_{S^c} − U_{S^c F} (U^T_{SF} U_{SF})^{-1} U^T_{SF} x_S.

To estimate this error, we could split the set S further into the pairs {S_1, S_1^c}, ..., {S_k, S_k^c} such that S_i ∪ S_i^c = S for i ∈ {1, ..., k}, estimate

e(S_i^c) = x_{S_i^c} − U_{S_i^c F} (U^T_{S_i F} U_{S_i F})^{-1} U^T_{S_i F} x_{S_i},

and use the estimate

Ê_{S^c} = Σ_{i ∈ {1,...,k}} ∥e(S_i^c)∥² / k.

This would be equivalent to the standard cross-validation approach typical in linear model selection [60].

Suppose that the noise vector has the representation

n = U_F γ + U_{F^c} β.   (4.3)

Then we can conveniently separate the bandlimited and non-bandlimited components of the signal using the representation

x = U_F α′ + U_{F^c} β,   (4.4)

where

α′ = α + γ.   (4.5)

The bandlimited component of the signal has no effect on either e(S^c) or e(S_i^c). Using this representation of the signal and to simplify notation, we define

M   = U_{S^c F^c}   − U_{S^c F}   (U^T_{SF} U_{SF})^{-1}     U^T_{SF}   U_{S F^c},
M_i = U_{S_i^c F^c} − U_{S_i^c F} (U^T_{S_i F} U_{S_i F})^{-1} U^T_{S_i F} U_{S_i F^c}.

Thus, our errors are

e(S^c)   = M β,                          (4.6)
e(S_i^c) = M_i β,  i ∈ {1, ..., k}.      (4.7)

The matrices M and M_i are what mainly differentiate the errors e(S^c) and e(S_i^c). Because the subsets S_i are selected randomly, M_i can be ill-conditioned even though M is well-conditioned. This ill-conditioning often causes the cross-validation error estimate to be orders of magnitude larger than the actual error.
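A minimal sketch of this conventional scheme, reusing the bandlimited_reconstruction helper sketched earlier; the fold construction below is our own illustrative choice.

import numpy as np

def naive_cv_error(L, x, S, f, k=10, rng=None):
    """Plain k-fold cross-validation estimate: hold out one fold S_i^c of the
    known set S at a time, reconstruct it from the remaining samples S_i, and
    average the squared errors."""
    rng = np.random.default_rng(rng)
    folds = np.array_split(rng.permutation(np.asarray(S)), k)
    errs = []
    for hold_out in folds:
        S_i = np.setdiff1d(S, hold_out)
        x_hat = bandlimited_reconstruction(L, x[S_i], S_i, f)
        errs.append(np.sum((x[hold_out] - x_hat[hold_out]) ** 2))
    return np.mean(errs)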
Intuitively, this can happen in cases where S is well connected to S^c but S_i is poorly connected to S_i^c. See Figure 4.2 for a toy example where all vertices in the graph are within one hop of S, but S_i is disconnected from S_i^c. In this example, trying to reconstruct the signal on S_i^c using the known signal values on S_i is impossible, because the graph is disconnected and we only observe samples from one of the connected components. Thus, we cannot achieve a meaningful reconstruction for nodes in the other (unobserved) connected component. This can be viewed as an extreme case of ill-conditioning, since the graph is disconnected, but the issue also manifests itself in connected graphs, where it is reflected as an ill-conditioned M_i.

Figure 4.2: Disconnected graph case for cross-validation. In this example, we sample a single graph containing two connected components (left and right). Since the random subset S_i^c is disconnected from S_i, signal values on S_i^c cannot be inferred from signal values on S_i.

4.3.2 Proposed error estimation

As we noted in Section 4.3.1, averaging the error over random subsets may lead to a blow-up of the error estimate due to the ill-conditioning of the reconstruction matrices. Such ill-conditioning arises when the magnitude ∥M_i∥ in (4.7) is large. To obtain reliable estimates in the presence of ill-conditioning, we propose a weighted averaging scheme, in which we assign different importance (weight) to the reconstruction errors estimated from different random subsets. Consider the singular value decomposition of M_i and the resulting expression for the reconstruction error on the set S_i^c:

M_i = V_i Σ_i W_i^T,
e(S_i^c) = V_i Σ_i W_i^T β.   (4.8)

Figure 4.3: Squared reconstruction errors (E_{S^c} and Ê_{S^c}) vs. bandwidth for the bandlimited signal model. (a) Bandwidth 20, noise power 0.2. (b) Bandwidth 50, noise power 0.1. (c) Bandwidth 120, noise power 0.2. (d) California, day average temperature. (e) West and west north central climate regions, day average temperature. (f) US average temperature, monthly normals. The legend is common to all plots.

We can see that ∥M_i∥ will be large when the singular values in Σ_i are large. Thus, to limit the increase in ∥M_i∥ due to Σ_i, we propose to clip the singular values, leaving a singular value σ effectively unchanged if σ < 1 and clipping it to 1 if σ ≥ 1. To that end, define the weighting matrix Σ′_i with singular values

σ′ = σ if σ > 1,
σ′ = 1 if σ ≤ 1,   (4.9)

so that σ/σ′ = min(σ, 1). This preserves the changes in the magnitude of ∥e(S_i^c)∥ due to ∥β∥ but limits the effect of M_i having a large condition number. Although we decomposed M_i, it is worth keeping in mind that we only have access to e(S_i^c), and to control the magnitude of this error we can pre-multiply it by a matrix. To achieve the transformation of the singular values, we multiply e(S_i^c) by (Σ′_i)^{-1} V_i^T, using the inverse of the clipped matrix of singular values, to obtain a new error:

e_new(S_i^c) = (Σ′_i)^{-1} V_i^T e(S_i^c).   (4.10)

In (4.10), the pre-multiplication of the existing error vector e(S_i^c) by (Σ′_i)^{-1} V_i^T can be interpreted as giving more importance to certain vertices while ignoring others. Given that V_i^T is not diagonal, the weights in (Σ′_i)^{-1} are not applied directly to individual vertices; the weighting of individual vertices can instead be seen as a weighted averaging by the matrix V_i (Σ′_i)^{-1} V_i^T. Finally, we estimate the error using

Ê_{S^c} = Σ_{i ∈ {1,...,k}} ∥(Σ′_i)^{-1} V_i^T e(S_i^c)∥² / k   (4.11)
        = Σ_{i ∈ {1,...,k}} ∥e_new(S_i^c)∥² / k.   (4.12)
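The weighted estimate (4.10)-(4.12) is straightforward once the M_i matrices are available. The sketch below makes the clipping step explicit; the variable names are ours, and forming M_i densely is for illustration only.

import numpy as np

def weighted_cv_error(errors, M_list):
    """Weighted cross-validation estimate of (4.12).

    errors : list of per-fold error vectors e(S_i^c).
    M_list : list of the corresponding matrices M_i from (4.7).
    """
    total = 0.0
    for e_i, M_i in zip(errors, M_list):
        V_i, sigma, _ = np.linalg.svd(M_i)       # M_i = V_i Sigma_i W_i^T
        sigma_prime = np.maximum(sigma, 1.0)     # (4.9): sigma' = sigma if sigma > 1, else 1
        coeffs = V_i.T @ e_i                     # rotate into the left singular basis
        # (4.10): reweight so each direction is scaled by sigma/sigma' = min(sigma, 1)
        coeffs[: len(sigma)] /= sigma_prime
        total += np.sum(coeffs ** 2)
    return total / len(errors)

Directions with σ ≥ 1 are damped back to unit gain, while directions with σ < 1 are left untouched, which is exactly the behavior described after (4.8).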
4.4 Experiments

4.4.1 Graph construction

For the initial verification of our error estimation approach, we construct random regular graphs with 1000 vertices according to the RandomRegular model from [18]. We define noisy bandlimited signals with bandwidths {20, 50, 120}, signal power 1, and noise power levels 0.1 and 0.2 according to the model in (4.1). We refer to these as the synthetic graphs and signals in our experiments.

For the next experimental validation, we use publicly available climate data from the National Oceanic and Atmospheric Administration (NOAA) [70], measured by sensors throughout the United States. The sensor data consists of different weather measurements, such as average daily temperature or precipitation, along with the corresponding sensors' latitudes, longitudes, and altitudes.

Table 4.1: Settings for the cross-validation experiments on sensor networks and weather data.

Geographical region           | Signal                         | Number of samples
California                    | Avg. day temperature           | 100
West and west north central   | Avg. day temperature           | 200
US                            | Avg. temperature monthly normal| 200

Using the locations, we construct graphs by connecting each sensor to its five nearest sensor locations. The edge weights of the graph are given by e^{−d²/2σ²}, where we experimentally choose σ = 50. We calculate the distance d between measurement locations from the latitude, longitude, and altitude of the measuring stations as d = sqrt(d_f² + d_a²), where d_f is the flat distance computed using the distance package of the geopy library and d_a is the altitude difference. While constructing the graph, we drop sensors whose measurements are missing, because there is no way to verify our predictions for those sensors. The measurements we include as signals are day averages measured on 3rd January 2020 and monthly normals [20], which are average measurements for January 2010.

4.4.2 Set selection

In Section 4.2, we assumed that the signal values on a vertex set S are known. To select this set for the constructed graphs, we use the AVM algorithm from Chapter 2 to sample 200 vertices from each graph and observe the reconstruction errors for the bandwidths {10, 20, ..., 110}. The only exception is the California sensor network graph, where we sample 100 vertices and observe the reconstruction errors over the bandwidths {10, 20, ..., 80}, because the graph contains only 300 vertices. We summarize these settings in Table 4.1.

To estimate the reconstruction error using cross-validation, we partition each sampling set S into 10 subsets using the RepeatedKFold function from the model_selection package of sklearn. We measure the squared reconstruction error on each subset of the partition, for 50 different random partitions, and average the squared reconstruction errors using (4.12).
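A sketch of the sensor graph construction described in Section 4.4.1; the input arrays are assumed to hold (latitude, longitude) pairs and altitudes, and we assume kilometres for both distance terms, which the text does not specify. The O(n²) pairwise loop is for clarity, not speed.

import numpy as np
from geopy.distance import geodesic

def sensor_graph(latlon, altitude_km, k=5, sigma=50.0):
    """kNN weather-sensor graph: distances combine the geodesic ("flat")
    distance d_f with the altitude difference d_a, and the edge weights are
    exp(-d^2 / (2 sigma^2))."""
    n = len(latlon)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d_f = geodesic(latlon[i], latlon[j]).km
            d_a = altitude_km[i] - altitude_km[j]
            D[i, j] = D[j, i] = np.hypot(d_f, d_a)   # d = sqrt(d_f^2 + d_a^2)
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:          # five nearest neighbors of i
            w = np.exp(-D[i, j] ** 2 / (2 * sigma ** 2))
            W[i, j] = W[j, i] = w                    # symmetrize the kNN graph
    return W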
4.4.3 Results

The results of our estimation can be seen in Figure 4.3. The estimated cross-validation error tracks the actual error across the wide variety of graphs and graph signals we experiment with. We note that in Fig. 4.3a the actual error increases slightly, while the estimated error does not increase with it. This is due to the error weighting strategy proposed in (4.10). Since, for the problem of choosing the bandwidth, we are interested in correctly locating the lowest value of the actual error, the ability of the error estimate to track the actual error as it increases is less important than its ability to track the actual error as it decreases. More accurate error estimation could be achieved with different set selection or error weighting strategies for cross-validation, which we reserve for future work.

4.5 Conclusion

In this chapter, we proposed a way to minimize the graph signal reconstruction error without assuming knowledge of the signal bandwidth. In the process, we tailored the cross-validation method to the problem of reconstruction error estimation. Our technique accurately estimated the error as a function of the signal bandwidth for various noisy bandlimited signals, as well as for sensor networks measuring weather.

Chapter 5
Subgraph-based parallel sampling for large point clouds

5.1 Introduction

Analyzing and processing 3D structures is important for many applications, such as autonomous driving [32], geological elevation models [65], and preserving information from historic sites [6]. Point clouds are one of the simplest data structures created when scanning such 3D structures. For these reasons, processing point clouds is important for success in these applications.

To prevent loss of information at the source while capturing 3D structures, point clouds typically consist of closely spaced points. Realistic point clouds easily contain a hundred thousand to a million points. However, not all parts of a point cloud need to be rendered at the highest resolution for applications such as detecting defects in terrestrial scans [66] or segmentation [41]. Moreover, transmitting a point cloud requires minimizing the bandwidth. Point cloud subsampling is a popular approach to reduce the size of a point cloud before processing it for downstream tasks. In the past, point cloud subsampling approaches have been used for downsampling the geometry of the point cloud. However, point cloud attribute subsampling is equally important for applications such as attribute compression for immersive communication [42]. We consider attribute subsampling in this chapter.

Point clouds are often scans of 3D objects, so most points lie on the surfaces of the scanned objects. Since graphs are a good representation of manifolds, we use graphs for this point cloud subsampling application and consider the graph signal sampling and reconstruction problem. Point clouds often contain millions of data points, and graph sampling methods in the literature, including those presented in this thesis, often do not scale to graphs with millions of nodes. Therefore, fast graph algorithms need to be developed for sampling point clouds; we devote this chapter to developing such algorithms. One way to speed up the sampling of point clouds is to parallelize the sampling process. To that end, we propose dividing the point cloud into smaller point clouds, constructing graphs on the smaller point clouds, and sampling graph signals on those smaller subgraphs.

This chapter makes the following contributions to the problem of sampling point cloud attributes:

• We propose a method for parallelizing graph sampling algorithms on large graphs through partitioning of the entire graph into subgraphs.
• The proposed algorithm provides at least a 0.8 dB gain in performance over uniform sampling for all datasets we consider.
• As future work, we propose approximations to speed up the algorithm even further relative to the approach of Chapter 2.

5.2 Problem formulation

A point cloud is a set of coordinates (x, y, z) and their corresponding attributes. Attributes can contain object information, such as color, or structural information, such as surface normals. We consider the kNN graph constructed using the coordinates of the point cloud and use the conventions and notations defined in Section 1.1. For the point cloud attribute signals we consider, we use the bandlimited model with noise from (1.2), with the luminance channel as our signal x.
We want to be able to measure the values of the signal at a few nodes S and predict the entire signal x̂ from those values. We measure the error, as in (4.2), on the vertices where the signal value is unknown:

E_{S^c} = ∥x(S^c) − x̂(S^c)∥².   (5.1)

The least squares reconstruction in (1.1) is computationally expensive for the entire graph, so we adopt a simpler reconstruction for signals on point clouds. We use weighted edges to connect each point with unknown signal value to its k nearest points with known signal values. These weights are used for reconstruction, with each unknown value estimated as a weighted average of the attributes of its nearest neighbors with known attributes. More specifically,

x̂_i = ( w_{1i} x_1 + ... + w_{ki} x_k ) / ( w_{1i} + ... + w_{ki} ),

where the w_{ij} are the edge weights in the graph constructed for the point cloud and the x_i are luminance values at the points of the cloud.

Figure 5.1: Original graph G (a) and its division into subgraphs G_1 and G_2 (b), enabling the parallelized sampling algorithm.

5.3 Distributed sampling

Towards the goal of distributed sampling for graph signals, we propose dividing the point cloud into smaller point clouds. We construct graphs from these smaller point clouds, which we call subgraphs, and apply a graph sampling algorithm to each subgraph.

5.3.1 Problem formulation

5.3.1.1 Local sampling, global reconstruction

We assume that the partition of the graph into subgraphs is given. In practice, in this chapter, point clouds are partitioned using octrees of suitable depth, and each subgraph connects all points within one of the octree volumes forming the partition. For simplicity, consider Figure 5.1, where the graph is divided into two subgraphs, G_1 and G_2, with corresponding vertex sets V_1 and V_2, respectively. Instead of solving the problem of selecting the best sampling set on the entire graph, we solve the sampling problem on the individual subgraphs such that the original objective (5.1) is minimized. For a sampling set S_1 selected from graph G_1 and a sampling set S_2 selected from graph G_2, we can reconstruct x̂(S_1^c) and x̂(S_2^c). From (2.5), the sampling set selection problem for the entire graph can be formulated as:

S* = argmax_{S ⊂ V, |S| = s} det( U_{SR} U^T_{SR} ).   (5.2)

After partitioning the graph, we wish to formulate separate optimization problems on the two subgraphs of the original graph:

S*_1 = argmax_{S_1 ⊂ V_1, |S_1| = s_1} det( U_{S_1 R} U^T_{S_1 R} )   (5.3)
S*_2 = argmax_{S_2 ⊂ V_2, |S_2| = s_2} det( U_{S_2 R} U^T_{S_2 R} )   (5.4)

subject to   (5.5)

s = s_1 + s_2, and   (5.6)
S*_1 ∪ S*_2 = S* = argmax_{S ⊂ V, |S| = s} det( U_{SR} U^T_{SR} ).   (5.7)

The problem is that optimizing (5.3) and (5.4) generally does not provide an optimal solution for (5.2). This is illustrated by Figure 5.3, where we can see that independent sampling in each subgraph (Fig. 5.3b) would lead to inefficiencies in a global interpolation, since points that are neighbors in the joint graph (but happen to lie in different subgraphs) have both been sampled. To ameliorate this, we propose modifying the subgraphs so that sampling the individual subgraphs independently can better mimic global sampling over the entire graph.

Figure 5.2: Illustration of how uniform sampling works before (a) and after (b) partitioning.
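The overall scheme of (5.3)-(5.7) can be sketched as follows. The proportional budget split and the helper names are our own illustrative assumptions; the per-block sampler is assumed to build the block's kNN subgraph internally (e.g., an AVM run on that subgraph).

import numpy as np

def sample_blocks(blocks, sampler, total_s):
    """Independent per-block sampling: allocate the sample budget across
    blocks and take the union of the per-block sampling sets.

    blocks  : list of arrays of global vertex indices, one per octree block.
    sampler : callable (block, s_b) -> local indices of selected vertices.
    total_s : global sample budget s of (5.6).
    """
    n = sum(len(b) for b in blocks)
    S = []
    for b in blocks:
        s_b = max(1, round(total_s * len(b) / n))   # proportional budget (our assumption)
        local = sampler(b, s_b)                     # e.g., AVM on the block's subgraph
        S.extend(np.asarray(b)[local])              # map local indices back to global ones
    return np.array(S)

Each loop iteration is independent of the others, so the calls can be dispatched to a worker pool, which is the source of the parallel speedups reported in Section 5.4.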
5.3.1.2 A proxy for optimizing the sampling over subgraphs

We designed AVM in Chapter 2 to solve (5.2). The solution S obtained through AVM depends on the underlying graph predominantly through the d_v's. A proxy for obtaining sampling results on G_1 similar to those on G using AVM is therefore to make d_v the same on both graphs. More specifically, we would want

d_{v,G}(V_1) = d_{v,G_1},  ∀v ∈ V_1,   (5.8)

where the additional subscript denotes the graph on which d_v is evaluated. Note that d_{v,G} has length n, but we only wish to compare the vectors over G_1; hence we consider the subvector corresponding to V_1. In general, (5.8) cannot be achieved exactly. We therefore propose to modify the graph G_1 to obtain a new graph G′_1 for which (5.8) is approximately satisfied, that is,

G′*_1 = argmin_{G′_1} Σ_{v ∈ V_1} ∥ d_{v,G}(V_1) − d_{v,G′_1} ∥².   (5.9)

The operators producing the d_v's can be seen as low-pass filters with a cutoff frequency. The first step towards having the same operators on the original graph and on the subgraphs is to use the same cutoff frequency for the operator on the subgraph as on the original graph. Since we synthesize the filter using polynomials, this can be achieved by setting the cutoff frequency before computing the polynomial filter coefficients.

Figure 5.3: Illustration of how AVM sampling works before (a) and after (b) partitioning.

Figure 5.4: AVM tends to favor sampling more isolated (lower-degree) nodes (a). Our proposed solution of adding self-loops prevents this bias towards boundary nodes (b).

Since we represent filters by their corresponding polynomials, we can make the filter operator the same by using the same polynomial coefficients corresponding to a single cutoff frequency. However, we also want the results of the filtering operations on graph signals to be the same. This is more difficult to achieve because the domain of the filter has changed, and we still want the effects of the filter to be the same on the smaller domain. For that, it is enough to ensure that the signals obtained by filtering delta signals are the same.

Consider a polynomial representation of a filter, p(L_1), where the order of the polynomial is k. If the boundary of G_1 is at a distance of k hops or more from vertex 1, then p(L_1)δ_1 = p(L)δ_1. However, if vertex 1 is fewer than k hops away from the boundary, the effects of the filter are not the same. The effect of the filter depends on only two factors: the frequency response of the filter and the graph input to the filter. If we want to use the same filter for the entire signal, and also keep the effects of the filter the same for delta signals localized in the interior of the graph, we have to modify the graph Laplacian.

Figure 5.5: Illustration of how uniform sampling works before (a) and after (b) partitioning.

5.3.2 Distributed sampling through graph modifications

To improve the performance of independent sampling of subgraphs, we propose to modify the subgraphs so that the effect of polynomial filtering on the subgraphs is as close as possible to filtering on the original graph. Essentially, this entails introducing graph modifications at the boundaries between subgraphs, i.e., where edges were removed to partition the original graph. For concreteness, we consider the graph G split into graphs G_1 and G_2 as in Figure 5.1. Without loss of generality, we solve the problem for G_1. We consider changing the weights of all edges in the graph G_1 and adding self-loops to all vertices.
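Before deriving the modification, a small numerical check makes the boundary effect concrete. The 4-vertex graph and the filter coefficients below are our own toy choices: vertices {0, 1, 2} form the subgraph and the cut edge 2-3 makes vertex 2 a boundary vertex.

import numpy as np

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

W = np.array([[0., 1., 1., 0.],
              [1., 0., 0., 0.],
              [1., 0., 0., 1.],
              [0., 0., 1., 0.]])      # the edge 2-3 is cut by the partition
L = laplacian(W)
L1 = laplacian(W[:3, :3])             # subgraph Laplacian after dropping vertex 3

c0, c1 = 1.0, -0.5                    # a first-order filter p(L) = c0 I + c1 L
for v in range(3):
    delta = np.eye(4)[v]
    full = (c0 * np.eye(4) + c1 * L) @ delta
    sub = (c0 * np.eye(3) + c1 * L1) @ delta[:3]
    print(v, np.allclose(full[:3], sub))   # True, True, False: only the boundary breaks

Adding a self-loop of weight 1 (the weight of the cut edge) to vertex 2 of the subgraph, i.e., L1[2, 2] += 1, restores equality for all three vertices; this is exactly the modification derived next.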
Since our goal is to compensate for the effect of partitioning on filtering operations, we consider the simplest possible graph filter, i.e., a first-order polynomial filter:

p(L) = c_0 I + c_1 L.   (5.10)

Problem 5.1. We propose to modify the edge weights between vertices and add self-loops to vertices of G_1 such that the Laplacians of G and of the modified graph G′_1 satisfy

( (c_0 I + c_1 L) δ_i )(V_1) = (c_0 I + c_1 L′_1) δ_i,  ∀i ∈ V_1.   (5.11)

Solution. We want to compare the effects of this filter on graph vertices in G_1. Consider Figure 5.2b, where one edge connects graph G_1 with graph G_2. Let the change in edge weights be ϕ_ij, and let a self-loop of degree ϕ_i be added to each vertex i. Then

L′_1 =
⎡ w_12 + w_13 + ϕ_12 + ϕ_13 + ϕ_1    −w_12 − ϕ_12          −w_13 − ϕ_13          ⎤
⎢ −w_12 − ϕ_12                       w_12 + ϕ_12 + ϕ_2     0                     ⎥
⎣ −w_13 − ϕ_13                       0                     w_13 + ϕ_13 + ϕ_3     ⎦   (5.12)

First, consider the case of a filtered delta signal localized in the interior of G_1:

L′_1 δ_2 = [ −w_12 − ϕ_12,  w_12 + ϕ_12 + ϕ_2,  0 ]^T.   (5.13)

For these points, we can see that equality (5.11) can be achieved, i.e.,

L′_1 δ_2 = L δ_2 (V_1),   (5.14)

for ϕ_12 = 0 and ϕ_2 = 0. This means that, when matching the effects of a first-order filter, the edges and vertices in the interior of the subgraph should remain unchanged.

Next, consider the effect of the filter on a boundary vertex:

L′_1 δ_1 = [ w_12 + w_13 + ϕ_12 + ϕ_13 + ϕ_1,  −w_12 − ϕ_12,  −w_13 − ϕ_13 ]^T.   (5.15)

Equating the filtered delta signals using L and L′_1, where Φ is unknown,

L′_1 δ_1 = L δ_1 (V_1),   (5.16)

we observe that we need to add a self-loop with edge weight w_14 to achieve equality, while the edge weights remain unchanged:

L′_1 =
⎡ w_12 + w_13 + w_14    −w_12    −w_13 ⎤
⎢ −w_12                 w_12     0     ⎥
⎣ −w_13                 0        w_13  ⎦   (5.17)

More generally, the weight of the self-loop added to a vertex v in G_1 is the sum of the weights of all edges connecting v to G_2 in G. We use this self-loop modification for vertices on the boundary of the partitioned graphs in our experiments and demonstrate that the performance improves correspondingly while providing us with significant parallelization capabilities.
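The self-loop modification derived above is a few lines of code. The function below is a sketch with our own names: it takes the subgraph Laplacian, the full weight matrix, and the vertex indices of the partition, and adds to each vertex a self-loop equal to its total cut-edge weight, as in (5.17).

import numpy as np

def add_boundary_self_loops(L_sub, W_full, part):
    """Self-loop modification: every subgraph vertex receives a self-loop whose
    weight is the total weight of its edges leaving the partition in the
    full graph (zero for interior vertices, per (5.14))."""
    part = np.asarray(part)
    outside = np.setdiff1d(np.arange(W_full.shape[0]), part)
    cut_weight = W_full[np.ix_(part, outside)].sum(axis=1)   # per-vertex cut weight
    L_mod = L_sub.astype(float).copy()
    L_mod[np.diag_indices_from(L_mod)] += cut_weight          # (5.17): add the self-loops
    return L_mod

With L_mod in place, a first-order filter applied to any δ_i on the subgraph matches the restriction of the full-graph filter, as required by (5.11).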
5.4 Experiments and Results

We consider the point clouds in the Microsoft voxelized upper bodies dataset [43] and the 8i voxelized full bodies dataset [21]. The datasets consist of point clouds for different people, with multiple point clouds of each person in different postures. For our experiments, we use one point cloud for each unique person in the dataset, and we consider the luminance channel of the point cloud as our signal.

Figure 5.6: Point clouds used in the experiments.

Table 5.1: Reconstruction PSNRs (dB) for various sampling algorithms.

Dataset      | Uniform | AVM   | AVM-SL
Sarah        | 46.62   | 47.43 | 47.66
Andrew       | 34.5    | 35.19 | 35.31
Longdress    | 31.5    | 32.23 | 32.43
Redandblack  | 37.87   | 38.94 | 39.15
Soldier      | 35.8    | 37.05 | 37.29
Loot         | 39.91   | 40.9  | 41.14
David        | 45.4    | 46.36 | 46.53
Phil         | 36.62   | 37.33 | 37.42
Ricardo      | 46.59   | 47.23 | 47.55

We divide each point cloud using octrees into cubic blocks of side 2⁶ = 64. Increasing the subgraph size improves performance at the expense of more computation time; however, we do not focus on that trade-off in this chapter. We construct k nearest neighbor graphs over each of the smaller point clouds with k = 5. We sample the subgraphs created by each block using multiple algorithms and reconstruct the signal, using the weighted average reconstruction of Section 5.2, from the 10 nearest known samples.

Table 5.2: Algorithm execution times in seconds.

Dataset      | Uniform | AVM | AVM-SL
Sarah        | 0.013   | 284 | 297
Andrew       | 0.021   | 457 | 468
Longdress    | 0.013   | 194 | 193
Redandblack  | 0.011   | 161 | 160
Soldier      | 0.014   | 226 | 227
Loot         | 0.011   | 141 | 144
David        | 0.023   | 371 | 380
Phil         | 0.024   | 425 | 398
Ricardo      | 0.013   | 229 | 230

We compare the PSNR resulting from three sampling algorithms: uniform sampling [64], AVM, and AVM with self-loops (AVM-SL). We run all the AVM-based algorithms in parallel on 8 CPU cores.

From Table 5.1, we can see that AVM outperforms uniform sampling in reconstruction performance by about 1 dB in most cases. Additionally, AVM-SL attains a further gain of about 0.3 dB in most cases. In Table 5.2, we see that uniform sampling is faster than all the AVM-based algorithms, while AVM and AVM-SL take the longest to finish. However, it is worth noting that point clouds like Phil contain more than a million points, and AVM and AVM-SL finish executing within 8 minutes for all datasets we considered. In comparison, executing an unparallelized version of a fast algorithm like AVM takes hours, so the algorithms we developed finish at least ten times faster.

5.4.1 Further approximations for faster sampling

To speed up AVM-SL even further, we propose the following modifications as future work.

5.4.1.1 Leveraging degree information

Even after sampling in parallel with the self-loop modification, there is scope for making the sampling algorithm faster. The three sources of complexity in AVM are:

1. Frequency estimation,
2. Coherence estimation,
3. Evaluating polynomials on the graph.

To address these, we propose the following modifications to the AVM algorithm:

1. A first-order approximation of the frequency spectrum,
2. Coherence estimation using degrees and neighbors,
3. Reducing the polynomial degree.

These modifications can also be used in applications other than point clouds.

5.4.1.2 First-order approximation of the frequency spectrum

The computation of coherences in Algorithm 2 involves the estimation of two frequencies, λ_max and λ_k. Extreme eigenvalues like λ_max can be computed iteratively, with the complexity of each iteration linear in the number of edges of the graph [37], but estimating λ_k is more computationally expensive. To address this, λ_k can be estimated as

λ̂_k = k λ_max / n.   (5.18)

For graphs such as kNN graphs constructed from points whose pairwise distances do not show extreme variations, like those we study here, the frequency spectrum can be reasonably estimated by a linear approximation. Figures 5.7a and 5.7b illustrate this estimation for a block of the soldier point cloud.

Figure 5.7: Estimating λ_k using λ_max (eigenvalue index i vs. λ_i). (a) Eigenvalue spectrum. (b) Eigenvalue spectrum and its estimate.
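A sketch of the estimate (5.18); scipy's Lanczos-based eigsh stands in here for the extreme-eigenvalue iteration cited as [37], and the function name is ours.

import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def estimate_lambda_k(L, k):
    """Linear spectrum approximation of (5.18): estimate the k-th Laplacian
    eigenvalue from lambda_max alone, avoiding an interior eigenvalue solve."""
    n = L.shape[0]
    lam_max = eigsh(sp.csr_matrix(L), k=1, which="LA",
                    return_eigenvectors=False)[0]   # largest eigenvalue only
    return k * lam_max / n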
5.4.1.3 Coherence estimation using degrees and neighbors

Estimating the coherences in Algorithm 2 is another computationally expensive task. However, we note that coherences, graph vertex degrees, and neighborhoods are highly correlated. This is because, for the higher filter frequencies corresponding to larger sampling rates, p(L)δ_i is localized to the neighborhood around the vertex. As a result, the polynomial filters have dominant weights corresponding to the zeroth- and first-order coefficients, and these coefficients correspond to the vertex degrees and the edge weights of the neighbors of a vertex. We therefore propose estimating the coherences using a function of the vertex degrees and the 1-hop neighborhoods of the vertices. We estimate the coefficients of the polynomial using the Chebyshev polynomial approximation [55]. We know that the diagonal of the filter operator matrix gives us the squared coherences:

p(L)(i,i) = ∥d_i∥².   (5.19)

5.4.1.4 Reducing polynomial degrees

Finally, filtering signals using polynomials over the graph is another expensive computation, with cost linear in the order of the polynomial. To alleviate the computational cost, we reduce the polynomial order to 1.

5.5 Conclusion

In this chapter, we considered applications of the AVM algorithm to million-node graphs with high sampling rates. In the process, we extended AVM to be parallelizable. To improve the performance of the parallelized algorithm, we proposed modifications to the graph of the point cloud and demonstrated the performance improvements on several large-scale point clouds. In addition, we provided modifications to speed up the existing parallelized implementation by more than ten times.

Chapter 6
Conclusion

The sampling problem for traditional signals like audio and images is well studied. In this thesis, we considered the setting of sampling and reconstruction (Figure 6.1) of unstructured data represented as graph signals.

Figure 6.1: Sampling and reconstruction pipeline.

Graph signal sampling and reconstruction algorithms must work on a diverse set of graph types and sizes, with unknown signal models, under corruption of the input signal, and in real-time applications. We devote a chapter to each of these problems; see Table 6.1.

Table 6.1: Problems considered in each chapter.

Chapter | Problem considered
2       | Computationally scalable sampling algorithm
3       | Signal corruption and sample loss
4       | Unknown signal model for reconstruction
5       | Towards real-time sampling algorithms through parallelization

By considering the sampling and reconstruction problem for graph signals in various settings, such as semi-supervised learning, sensor networks, and point clouds, we showed that the sampling algorithms that traditionally existed for structured signals can also be extended to graph signals and made scalable and robust. We proposed several algorithms that improve on current state-of-the-art sampling and reconstruction strategies. Table 6.2 summarizes those algorithmic contributions.

Table 6.2: Proposed algorithms and contributions.

Algorithm        | Contribution
DC               | Sampling using graph distances
AVM              | Scalable graph sampling algorithm
Robust sampling  | Sampling preemptively against data loss
Error estimation | Towards signal reconstruction with unknown signal bandwidth
AVM-SL           | Sampling algorithm parallelization

Through the proposed algorithms, which are faster and more robust for sampling and reconstruction of graph signals, this thesis pushes the limits of graph signal processing.

References

[1] Oleksii Abramenko and Alexander Jung. Graph signal sampling via reinforcement learning. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3077-3081. IEEE, 2019.
[2] Aamir Anis, Akshay Gadde, and Antonio Ortega. Efficient sampling set selection for bandlimited graph signals using graph spectral proxies. IEEE Transactions on Signal Processing, 64(14):3775-3789, 2015.
[3] Aamir Anis, Akshay Gadde, and Antonio Ortega. Efficient sampling set selection for bandlimited graph signals using graph spectral proxies. IEEE Transactions on Signal Processing, 64(14):3775-3789, 2016.
[4] Anthony Atkinson, Alexander Donev, and Randall Tobias. Optimum experimental designs, with SAS, volume 34. Oxford University Press, 2007.
[5] Yuanchao Bai, Fen Wang, Gene Cheung, Yuji Nakatsukasa, and Wen Gao. Fast graph sampling set selection using Gershgorin disc alignment. IEEE Transactions on Signal Processing, 68:2419-2434, 2020.
[6] Ahmad Baik. From point cloud to Jeddah heritage BIM Nasif historical house: case study. Digital Applications in Archaeology and Cultural Heritage, 4:1-18, 2017.
[7] Saeed Basirian and Alexander Jung. Random walk sampling for big data over networks. In 2017 International Conference on Sampling Theory and Applications (SampTA), pages 427-431. IEEE, 2017.
[8] Alain Berlinet and Christine Thomas-Agnan. Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media, 2011.
[9] Ilija Bogunovic, Junyao Zhao, and Volkan Cevher. Robust maximization of non-submodular objectives. International Conference on Artificial Intelligence and Statistics (AISTATS'18), 84, 2018.
[10] Béla Bollobás. Modern graph theory, volume 184. Springer Science & Business Media, 2013.
[11] Stephen Boyd, Stephen P Boyd, and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004.
[12] Luiz F. O. Chamon and Alejandro Ribeiro. Greedy sampling of graph signals. IEEE Transactions on Signal Processing, 66(1):34-47, 2018.
[13] Luiz FO Chamon and Alejandro Ribeiro. Greedy sampling of graph signals. arXiv preprint arXiv:1704.01223, 2017.
[14] Siheng Chen, Rohan Varma, Aliaksei Sandryhaila, and Jelena Kovačević. Discrete signal processing on graphs: Sampling theory. IEEE Transactions on Signal Processing, 63(24):6510-6523, 2015.
[15] Ali Çivril and Malik Magdon-Ismail. On selecting a maximum volume sub-matrix of a matrix and related problems. Theoretical Computer Science, 410(47-49):4801-4811, 2009.
[16] Mark Crovella and Eric Kolaczyk. Graph wavelets for spatial traffic analysis. In INFOCOM 2003. Twenty-Second Annual Joint Conference of the IEEE Computer and Communications. IEEE Societies, volume 3, pages 1848-1857. IEEE, 2003.
[17] Abhimanyu Das and David Kempe. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. arXiv preprint arXiv:1102.3975, 2011.
[18] Michaël Defferrard, Lionel Martin, Rodrigo Pena, and Nathanaël Perraudin. PyGSP: Graph signal processing in Python.
[19] Amit Deshpande and Luis Rademacher. Efficient volume sampling for row/column subset selection. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 329-338. IEEE, 2010.
[20] Storm Dunlop. A dictionary of weather. OUP Oxford, 2008.
[21] Eugene d'Eon, Bob Harrison, Taos Myers, and Philip A Chou. 8i voxelized full bodies: a voxelized point cloud dataset. ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document WG11M40059/WG1M74006, 7(8):11, 2017.
[22] Paul Erdős and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17-60, 1960.
[23] Santo Fortunato. Community detection in graphs. Physics Reports, 486(3):75-174, 2010.
[24] Akshay Gadde, Aamir Anis, and Antonio Ortega. Active semi-supervised learning using sampling theory for graph signals. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 492-501, 2014.
[25] Akshay Gadde and Antonio Ortega. A probabilistic interpretation of sampling theory of graph signals. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3257-3261. IEEE, 2015.
[26] Fernando Gama, Antonio G Marques, Geert Leus, and Alejandro Ribeiro.
Convolutional neural network architectures for signals supported on graphs. IEEE Transactions on Signal Processing, 67(4):1034-1049, 2019.
[27] Benjamin Girault, Shrikanth Narayanan, Paulo Gonçalves, Antonio Ortega, and Eric Fleury. GraSP: A Matlab toolbox for graph signal processing. In 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), 2017.
[28] Gene H Golub and Charles F Van Loan. Matrix computations, volume 3. JHU Press, 2012.
[29] Sergei A Goreinov, Ivan V Oseledets, Dimitry V Savostyanov, Eugene E Tyrtyshnikov, and Nikolay L Zamarashkin. How to find a good submatrix. In Matrix Methods: Theory, Algorithms and Applications: Dedicated to the Memory of Gene Golub, pages 247-256. World Scientific, 2010.
[30] Trevor Hastie, Robert Tibshirani, Jerome H Friedman, and Jerome H Friedman. The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009.
[31] Roger A Horn and Charles R Johnson. Matrix analysis. Cambridge University Press, 2012.
[32] Xinyu Huang, Xinjing Cheng, Qichuan Geng, Binbin Cao, Dingfu Zhou, Peng Wang, Yuanqing Lin, and Ruigang Yang. The ApolloScape dataset for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 954-960, 2018.
[33] Ajinkya Jayawant and Antonio Ortega. A distance-based formulation for sampling signals on graphs. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6318-6322. IEEE, 2018.
[34] Ajinkya Jayawant and Antonio Ortega. Practical graph signal sampling with log-linear size scaling. Signal Processing, 194:108436, 2022.
[35] Siddharth Joshi and Stephen Boyd. Sensor selection via convex optimization. IEEE Transactions on Signal Processing, 57(2):451-462, 2008.
[36] Alexander Jung and Nguyen Tran. Localized linear regression in networked data. IEEE Signal Processing Letters, 26(7):1090-1094, 2019.
[37] Andrew V Knyazev. Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method. SIAM Journal on Scientific Computing, 23(2):517-541, 2001.
[38] Andreas Krause and Daniel Golovin. Submodular function maximization. Cambridge University Press, 2014.
[39] Andreas Krause, H Brendan McMahan, Carlos Guestrin, and Anupam Gupta. Robust submodular observation selection. Journal of Machine Learning Research, 9(Dec):2761-2801, 2008.
[40] Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Dandapani Sivakumar, Andrew Tompkins, and Eli Upfal. The web as a graph. In Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 1-10. ACM, 2000.
[41] Yun-Jou Lin, Ronald R Benziger, and Ayman Habib. Planar-based adaptive down-sampling of point clouds. Photogrammetric Engineering & Remote Sensing, 82(12):955-966, 2016.
[42] Zhi Liu, Qiyue Li, Xianfu Chen, Celimuge Wu, Susumu Ishihara, Jie Li, and Yusheng Ji. Point cloud video streaming: Challenges and solutions. IEEE Network, 35(5):202-209, 2021.
[43] Charles Loop, Qin Cai, S Orts Escolano, and Philip A Chou. Microsoft voxelized upper bodies: a voxelized point cloud dataset. ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m38673 M, 72012:2016, 2016.
[44] Javier Maroto and Antonio Ortega. Efficient worker assignment in crowdsourced data labeling using graph signal processing. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2271-2275. IEEE, 2018.
[45] Thomas Minka.
Inferring a Gaussian distribution. Media Lab Note, 1998.
[46] Sunil K Narang, Akshay Gadde, Eduard Sanou, and Antonio Ortega. Localized iterative methods for interpolation in graph structured data. In 2013 IEEE Global Conference on Signal and Information Processing, pages 491-494. IEEE, 2013.
[47] James B Orlin, Andreas S Schulz, and Rajan Udwani. Robust monotone submodular function maximization. In International Conference on Integer Programming and Combinatorial Optimization, pages 312-324. Springer, 2016.
[48] Antonio Ortega. Introduction to Graph Signal Processing. Cambridge University Press, 2022.
[49] Antonio Ortega, Pascal Frossard, Jelena Kovačević, José MF Moura, and Pierre Vandergheynst. Graph signal processing: Overview, challenges, and applications. Proceedings of the IEEE, 106(5):808-828, 2018.
[50] Alejandro Parada-Mayorga. Blue noise and optimal sampling on graphs. PhD thesis, University of Delaware, 2019.
[51] Alejandro Parada-Mayorga, Daniel L Lau, Jhony H Giraldo, and Gonzalo R Arce. Blue-noise sampling on graphs. IEEE Transactions on Signal and Information Processing over Networks, 5(3):554-569, 2019.
[52] Bo Peng. The determinant: A means to calculate volume. Recall, 21:a22, 2007.
[53] Nathanaël Perraudin, Johan Paratte, David Shuman, Lionel Martin, Vassilis Kalofolias, Pierre Vandergheynst, and David K. Hammond. GSPBox: A toolbox for signal processing on graphs, 2016.
[54] Isaac Pesenson. Sampling in Paley-Wiener spaces on combinatorial graphs. Transactions of the American Mathematical Society, 360(10):5603-5627, 2008.
[55] William H Press, William T Vetterling, Saul A Teukolsky, and Brian P Flannery. Numerical Recipes Example Book (FORTRAN). Cambridge University Press, Cambridge, 1992.
[56] Friedrich Pukelsheim. Optimal design of experiments. SIAM, 2006.
[57] Gilles Puy, Nicolas Tremblay, Rémi Gribonval, and Pierre Vandergheynst. Random sampling of bandlimited signals on graphs. Applied and Computational Harmonic Analysis, 2016.
[58] Akie Sakiyama, Yuichi Tanaka, Toshihisa Tanaka, and Antonio Ortega. Eigendecomposition-free sampling set selection for graph signals. arXiv preprint arXiv:1809.01827, 2018.
[59] Akie Sakiyama, Yuichi Tanaka, Toshihisa Tanaka, and Antonio Ortega. Eigendecomposition-free sampling set selection for graph signals. IEEE Transactions on Signal Processing, 67(10):2679-2692, 2019.
[60] Jun Shao. Linear model selection by cross-validation. Journal of the American Statistical Association, 88(422):486-494, 1993.
[61] Michael C Shewry and Henry P Wynn. Maximum entropy sampling. Journal of Applied Statistics, 14(2):165-170, 1987.
[62] Han Shomorony and A Salman Avestimehr. Sampling large data on graphs. In Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on, pages 933-936. IEEE, 2014.
[63] David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83-98, 2013.
[64] Shashank N Sridhara, Eduardo Pavez, Antonio Ortega, Ryosuke Watanabe, and Keisuke Nonaka. Point cloud attribute compression via chroma subsampling. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2579-2583. IEEE, 2022.
[65] Jason M Stoker, John C Brock, Christopher E Soulard, Kernell G Ries, Larry Sugarbaker, Wesley E Newton, Patricia K Haggerty, Kathy E Lee, and John A Young.
USGS lidar science strategy: mapping the technology to the science, volume 10. US Department of the Interior, US Geological Survey, 2015.
[66] Czesław Suchocki and Wioleta Błaszczak-Bąk. Down-sampling of point clouds for the technical diagnostics of buildings and structures. Geosciences, 9(2):70, 2019.
[67] Yuichi Tanaka, Yonina C Eldar, Antonio Ortega, and Gene Cheung. Sampling signals on graphs: From theory to applications. IEEE Signal Processing Magazine, 37(6):14-30, 2020.
[68] Nicolas Tremblay, Pierre-Olivier Amblard, and Simon Barthelme. Graph sampling with determinantal processes. arXiv preprint arXiv:1703.01594, 2017.
[69] Mikhail Tsitsvero, Sergio Barbarossa, and Paolo Di Lorenzo. Signals on graphs: Uncertainty principle and sampling. IEEE Transactions on Signal Processing, 64(18):4845-4860, 2016.
[70] Russell S. Vose, Scott Applequist, Mike Squires, Imke Durre, Matthew J. Menne, Claude N. Williams, Chris Fenimore, Karin Gleason, and Derek Arndt. Improved historical temperature and precipitation time series for U.S. climate divisions. Journal of Applied Meteorology and Climatology, 53(5):1232-1251, 2014.
[71] Fen Wang, Yongchao Wang, and Gene Cheung. A-optimal sampling and robust reconstruction for graph signals via truncated Neumann series. IEEE Signal Processing Letters, 25(5):680-684, 2018.
[72] Xiaohan Wang, Pengfei Liu, and Yuantao Gu. Local-set-based graph signal reconstruction. IEEE Transactions on Signal Processing, 63(9):2432-2444, 2015.
[73] Xiaojin Zhu, Zoubin Ghahramani, and John D Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 912-919, 2003.

Appendices

A Proof of eigenvector convergence

Lemma 6.1. There exists a signal ϕ in the orthogonal subspace to ψ*, with ϕ(S_m) = 0 and ∥ϕ∥ = 1, whose out-of-bandwidth energy is a minimum value c_0 ≠ 0.

Proof. The set of signals {ϕ : ϕ(S_m) = 0, ∥ϕ∥ = 1} is a closed set. Let [x_1, ..., x_n]^T be in the set, and consider any ϵ > 0. Then [x_1 + ϵ/2, x_2, ..., x_n]^T is in the ϵ-neighborhood; the distance exists because we are in a normed vector space. That vector does not have unit norm, so it is not in the set. Hence every ϵ-neighborhood contains a point not in the set, every point of the set is a limit point, and the set is closed.

Out-of-bandwidth energy is a continuous function on our set. Let v_1, v_2 be such that v_1, v_2 ⊥ ψ* and v_1(S_m) = 0, v_2(S_m) = 0, and suppose the Fourier coefficients of v_1 and v_2 are (α_1, ..., α_n)^T and (β_1, ..., β_n)^T. Then we want

Σ_{i=m+2}^{n} (α_i² − β_i²) < ϵ   (A.1)

for some δ with ∥v_1 − v_2∥ < δ; one can show that (A.1) holds when δ = ϵ/2. Since the set is closed and bounded and the function is continuous on it, the function attains a minimum value. The minimum value cannot be zero, because there is a unique signal ψ* with that property and we are looking in the space orthogonal to ψ*. So there is a signal with minimum out-of-bandwidth energy c_0, where c_0 > 0.

Next, this appendix gives the proof of the ℓ2 convergence claimed in Theorem 2.1.

Proof. Consider a particular step at which we have already selected the S_m vertices. The solutions of the following optimization problems are equivalent:

ψ*_k = argmin_ψ (ψ^T L^k ψ)/(ψ^T ψ) = argmin_{ψ, ∥ψ∥=1} ψ^T L^k ψ.

Therefore, we will consider solutions with ∥ψ∥ = 1. Consider the space of our signals: {ϕ ∈ R^n : ϕ(S_m) = 0} is a vector space of dimension n − m. For any k, let us represent our solution for k as ψ = α_1 ψ* + α_2 ψ^⊥.
Here ψ^⊥ is a vector in the orthogonal subspace to our vector ψ*. We can do this because we have a vector space and it has finite dimensions. One condition on our signal is that α_1² + α_2² = 1, with ∥ψ*∥ = 1 and ∥ψ^⊥∥ = 1. Furthermore, we know the Fourier transforms of the two signal components:

ψ*   →  U^T ψ*   = [γ_1, ..., γ_{m+1}, 0, ..., 0]^T = γ,
ψ^⊥  →  U^T ψ^⊥ = [β_1, ..., β_n]^T = β.

Our signal can be written as [ψ* ψ^⊥][α_1, α_2]^T = [ψ* ψ^⊥]α. The objective function becomes:

α^T [ψ* ψ^⊥]^T L^k [ψ* ψ^⊥] α = α^T [γ β]^T Σ^k [γ β] α
 = α^T [ Σ_{i=1}^{m+1} γ_i² σ_i^k     Σ_{i=1}^{m+1} γ_i β_i σ_i^k ]
       [ Σ_{i=1}^{m+1} γ_i β_i σ_i^k  Σ_{i=1}^{n} β_i² σ_i^k     ] α
 = α^T [ a  b ]
       [ b  d ] α.
This implies that as $k$ increases, the coefficient of the out-of-bandwidth component goes to zero. Because the out-of-bandwidth signal has finite energy, its contribution to the signal energy goes to zero as $\alpha_2 \to 0$. Whether $\psi_k^*$ converges to $\psi^*$ or $-\psi^*$ is a matter of convention. Hence, as $k \to \infty$, $\psi_k^* \to \psi^*$.

B Justification for ignoring target bandwidth while sampling

We know that selecting the right $U_{SF}$ matrix is essential to prevent a blow-up of the error while reconstructing using (1.1). In practice, the reconstruction bandwidth satisfies $f \leq s$. However, for our AVM sampling algorithm we chose the bandwidth to be $|R| = s$ instead. We next address why that is a logical choice with respect to D-optimality.

The matrix $U_{SS}^T U_{SS}$ is positive definite, which follows from our initial condition that the set $S$ is a uniqueness set. This provides us with the needed relations between determinants. We can see that $U_{SF}^T U_{SF}$ is a submatrix of $U_{SS}^T U_{SS}$:
\[
U_{SS}^T U_{SS} =
\begin{bmatrix}
U_{SF}^T U_{SF} & U_{SF}^T U_{S,f+1:s} \\
U_{S,f+1:s}^T U_{SF} & U_{S,f+1:s}^T U_{S,f+1:s}
\end{bmatrix}.
\]
This lets us relate the determinants of the matrix and its submatrices using Fischer's inequality, Theorem 7.8.5 in [31]:
\[
\det(U_{SS}^T U_{SS}) \leq \det(U_{SF}^T U_{SF})\,\det(U_{S,f+1:s}^T U_{S,f+1:s}). \tag{B.1}
\]
The determinant of the matrix $U_{S,f+1:s}^T U_{S,f+1:s}$ can be bounded above. Its eigenvalues are the same as the non-zero eigenvalues of $U_{S,f+1:s} U_{S,f+1:s}^T$ by Theorem 1.2.22 in [31]. By the eigenvalue interlacing Theorem 8.1.7 from [28], the eigenvalues of the matrix $U_{S,f+1:s} U_{S,f+1:s}^T$ are less than or equal to 1, because it is a submatrix of $U_{V,f+1:s} U_{V,f+1:s}^T$, whose non-zero eigenvalues are all 1. As the determinant of a matrix is the product of its eigenvalues, the following bound applies:
\[
\det(U_{S,f+1:s}^T U_{S,f+1:s}) \leq 1. \tag{B.2}
\]
Using (B.1), (B.2), and the positive definiteness of the matrices, we now have a simple lower bound for the criterion under consideration:
\[
|\det(U_{SS}^T U_{SS})| \leq |\det(U_{SF}^T U_{SF})|. \tag{B.3}
\]
Thus, for example, it is impossible for $|\det(U_{SS}^T U_{SS})|$ to equal some positive value while $|\det(U_{SF}^T U_{SF})|$ equals half of that value. To summarize, instead of aiming to maximize $|\det(U_{SF}^T U_{SF})|$, we aimed to maximize $|\det(U_{SS}^T U_{SS})|$. This intuitively worked because optimizing for a D-optimal matrix indirectly ensured controlled performance of the submatrix. In this way, due to the relation (B.3), we avoided needing to know the precise bandwidth $f$ and still managed to sample using the AVM algorithm.

C Approximating the Gram matrix by a diagonal matrix

Here we estimate how close $D_m^T D_m$ is to a diagonal matrix. Towards this goal we define a simple metric for a general matrix $A$:
\[
\text{Fraction of energy in diagonal} = \frac{\sum_i A_{ii}^2}{\sum_i \sum_j A_{ij}^2}. \tag{C.1}
\]
Since this can be a property dependent on the graph topology, we take 5 different types of graphs with 1000 vertices: scale-free, WRS sensor nearest neighbors, Erdős–Rényi, grid, and line. Using AVM, we select a varying number of samples ranging from 1 to 50. With the bandwidth $f$ taken to be 50, we average the fraction of energy (C.1) over 10 instances of each graph and plot it in Fig. 6.2. We observe that more than a 0.75 fraction of the energy lies in the diagonal of the matrix $D_m^T D_m$, which justifies this approximation.
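The metric (C.1) is simple to compute; below is a minimal Python sketch of it (our illustration: the random matrix D is only a stand-in for the filtered delta signals $d_1, \cdots, d_m$ produced by AVM, so the printed number does not reproduce Fig. 6.2).

import numpy as np

def diagonal_energy_fraction(A):
    # Fraction of the squared energy of A on its diagonal, as in (C.1).
    A = np.asarray(A, dtype=float)
    return np.sum(np.diag(A)**2) / np.sum(A**2)

# Stand-in Gram matrix: a tall random D has nearly orthogonal columns,
# so most of the energy of D^T D concentrates on the diagonal.
rng = np.random.default_rng(0)
D = rng.normal(size=(1000, 30))
print(diagonal_energy_fraction(D.T @ D))   # close to 1 in this setup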
According to our experiments, which are not presented here, the inverse $(D_m^T D_m)^{-1}$ is not as close to a diagonal matrix as $D_m^T D_m$ is.

[Figure 6.2: Closeness to diagonal at each iteration. Fraction of energy in the diagonal (roughly 0.75 to 1) versus the number of samples (1 to 50), for BA, random sensor kNN, ER, grid, and line graphs.]

Nevertheless, in place of $(D_m^T D_m)^{-1}$ we still use $\mathrm{diag}\left(1/\|d_1\|^2, \cdots, 1/\|d_m\|^2\right)$ for what it is: an approximation. Note, however, that the approximation does not hold in general for arbitrary samples. It holds when the samples are selected in a determinant-maximizing way by Algorithm 2. This approximation is suited to AVM because of its choice of sampling bandwidth $R$: as the number of samples requested increases, so does the sampling bandwidth. The higher bandwidth causes the filtered delta signals to become more localized, concentrating their energy in the diagonal and keeping the diagonal approximation reasonable and applicable.

D D-optimal sampling for generic kernels

Another graph signal model is a probabilistic distribution instead of a bandlimited model [25], [73]. In such cases, the covariance matrix is our kernel, and the subset selection problem is defined as submatrix selection on the covariance matrix. Framing the problem as entropy maximization naturally leads to a determinant maximization approach [61].

To define our problem more formally, let us restrict the space of all possible kernels to those that can be defined as $K = g(L)$, with $g$ defined on matrices but induced from a function from the non-negative reals to the positive reals, $g: \mathbb{R}_{\geq 0} \to \mathbb{R}_{>0}$. An example of such a function of $L$ is $(L + \delta I)^{-1}$. Such a function can be written as a function of the eigenvalues of the Laplacian, $K = U g(\Sigma) U^T$.

Motivated by entropy maximization in the case of the probabilistic graph signal model, suppose we wish to select a set $S$ that maximizes the determinant magnitude $|\det(K_{SS})|$. There are a few differences in solving the new problem, although most of Algorithm 2 translates well. We now wish to maximize $|\det(U_S\, g(\Sigma)\, U_S^T)|$. The expression for the determinant update remains the same as before:
\[
\det\begin{bmatrix}
D_m^T D_m & D_m^T d_v \\
d_v^T D_m & d_v^T d_v
\end{bmatrix}
\approx \det(D_m^T D_m)\,\det\!\left(d_v^T d_v - d_v^T D_m (D_m^T D_m)^{-1} D_m^T d_v\right).
\]
Only now we have to maximize the volume of the parallelepiped formed by the vectors $d_v = U g^{1/2}(\Sigma) U^T \delta_v$ for $v \in S$. The squared coherences $d_v^T d_v$ with respect to our new kernel are computed in the same way as before, by random projections. The diagonal of our new kernel matrix now approximates the matrix $D_m^T D_m$:
\[
D_m^T D_m \approx \mathrm{diag}\left((U g(\Sigma) U^T)_{11}, \cdots, (U g(\Sigma) U^T)_{nn}\right).
\]
The other difference is that the approximate update stage is given by
\[
v^* \leftarrow \arg\max_{v \in S^c}\; \|d_v\|^2 - \sum_{w \in S} \frac{\left(U g^{1/2}(\Sigma) U^T d_w\right)^2(v)}{\|d_w\|^2},
\]
with the difference resulting from the kernel not being a projection operator. So, for a generic kernel with a determinant maximization objective, Algorithm 2 works the same way with the minor modifications discussed here.
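To make the generic-kernel variant concrete, here is a minimal sketch of greedy determinant maximization for a kernel $K = g(L)$ (our illustration, not the AVM implementation: it uses the exact Schur-complement update on a dense kernel rather than the random-projection and diagonal approximations discussed above; the path graph and $\delta$ value are hypothetical choices).

import numpy as np

def greedy_d_optimal(K, s):
    # Greedily grow S to maximize det(K_SS) for a positive definite kernel K.
    n = K.shape[0]
    S = []
    for _ in range(s):
        best_v, best_gain = None, -np.inf
        for v in range(n):
            if v in S:
                continue
            if not S:
                gain = K[v, v]
            else:
                # Schur complement: det(K_{S+v,S+v}) = det(K_SS) * gain.
                Kss = K[np.ix_(S, S)]
                kv = K[S, v]
                gain = K[v, v] - kv @ np.linalg.solve(Kss, kv)
            if gain > best_gain:
                best_v, best_gain = v, gain
        S.append(best_v)
    return S

# Example kernel K = g(L) = (L + delta*I)^{-1} on a 12-node path graph.
n, delta = 12, 0.1
L = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1.0
K = np.linalg.inv(L + delta*np.eye(n))
print(greedy_d_optimal(K, 4))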
Abstract
Processing data such as spatially scattered weather measurements or point clouds generated from 3D scans is challenging due to the lack of inherent structure in the data. Graphs are convenient tools for analyzing and processing such unstructured data with algorithms analogous to those in traditional signal processing, which treat the data as a graph signal. However, measuring the whole graph signal can be expensive. Observing a limited subset of graph nodes may be better in such cases, i.e., sampling the graph signal and inferring information at the remaining nodes using reconstruction algorithms.
Although graph signal sampling and reconstruction algorithms exist in the literature, making them practical enough for real-life applications requires numerous theoretical and fundamental improvements. One such requirement is that an algorithm's execution time should not grow substantially as the number of vertices and edges in the graph increases. Even when the execution time scales well with the graph size, some samples may get corrupted. Reconstructing such data is challenging and requires knowledge of the signal model or its parameters, which are commonly assumed to be known in the literature but in practice have to be estimated.
In this thesis, we propose algorithms that minimize the reconstruction error when sampling in the presence of signal corruption, we estimate the reconstruction error as a function of the signal bandwidth, and we develop scalable graph sampling algorithms, in particular algorithms amenable to parallelization. These contributions make graph signal reconstruction algorithms more flexible and increase the capabilities of graph sampling algorithms. Through these improvements, we push the limits of graph signal sampling and reconstruction even further.