EFFICIENT TRANSFORMS FOR GRAPH SIGNALS WITH APPLICATIONS TO VIDEO CODING

by

Keng-Shih Lu

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

December 2020

Copyright 2020 Keng-Shih Lu

Dedication

To my family and my friends.

Acknowledgments

First, I would like to express my deepest gratitude and thanks to my advisor, Professor Antonio Ortega, for his continuous guidance and patience through the years I have been pursuing my Ph.D. degree. His expertise and inspiration have guided me in refining research ideas and approaches. It has been a pleasure and privilege to work with him. I owe thanks to Professor C.-C. Jay Kuo and Professor Ulrich Neumann for serving on my dissertation committee and giving me valuable feedback. I am very grateful to Professor Shrikanth Narayanan, Professor Richard Leahy, and Professor Yingying Fan for being members of my qualifying exam committee. I also thank all my teachers at USC for their instruction in a wide range of courses.

Many thanks to Dr. Debargha Mukherjee for his guidance and the valuable discussions we had during my two summer internships at Google. I am also thankful to Dr. Yue Chen and Dr. Elliott Karpolovsky for their generous support and feedback on several collaborative projects with Google.

I would like to thank my labmates in the Signal Transformation, Analysis and Compression Group, especially Jiun-Yu (Joanne), Hilmi, Eduardo, Ajinkya, Pratyusha, and Sarath, for their collaboration and brainstorming, and for being great colleagues and friends. I also thank my housemates and all my friends who have accompanied me during my Ph.D. study in LA, including Ting-Ru Lin, Po-Wen Chen, I-Hsiu (Albert) Kao, Po-Han Huang, Kuan-Wen Huang, Ming-Chun Lee, Chin-Chuan (Bill) Chang, and Kai-Che (Kevin) Lin.

Finally, special thanks to my parents and my two little sisters for their continuous support, and for letting me pursue my dream. And most importantly, to my wife, Ashley, for her infinite love, companionship, and support. My Ph.D. journey would not have been so memorable and fruitful without her.

Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract

1 Introduction
  1.1 Notations and Conventions
  1.2 Review of Graph Signal Processing
    1.2.1 Graph Fourier Transform
    1.2.2 Graph Filtering
    1.2.3 Gaussian Markov Random Fields (GMRFs)
    1.2.4 DCT and DST
  1.3 A Roadmap

2 Fast GFT based on graph symmetry and bipartition
  2.1 Haar Units in GFTs
    2.1.1 Right Haar Units
    2.1.2 Left Haar Units
  2.2 Fast GFTs Based on Graph Symmetry
    2.2.1 Graph Symmetry Based on Node Pairing
    2.2.2 Node Partitioning for Haar Units
    2.2.3 Main Approach: Decomposition of Symmetric Graphs
  2.3 Examples and Applications
    2.3.1 Bipartite Graphs
    2.3.2 Graphs with 2-Sparse Eigenvectors
    2.3.3 Symmetric Line Graphs
    2.3.4 Steerable DFTs
    2.3.5 Symmetric Grid Graphs
    2.3.6 Skeletal Graphs
    2.3.7 Search of Symmetries in General Graphs
  2.4 Experimental Results
    2.4.1 Comparison with Matrix GFT
    2.4.2 Comparison with Approximate Fast GFTs
  2.5 Conclusion

3 Data-Driven Fast GFTs for Video Coding
  3.1 Learning Fast GFTs: A General Framework
    3.1.1 Solving (3.2) with a Particular Symmetry Type
  3.2 Symmetric Line Graph Transforms (SLGT)
    3.2.1 DTTs as SLGTs
    3.2.2 Closed-Form Graph Learning Solution with Tree Topologies
  3.3 Data-driven Non-separable Fast GFTs
  3.4 Experimental Results
    3.4.1 Data-Driven SLGTs for Inter Predictive Coding
    3.4.2 Data-Driven Nonseparable GFTs for Intra Blocks
  3.5 Conclusion

4 Lapped Graph Fourier Transform
  4.1 Review of Lapped Orthogonal Transforms
  4.2 Lapped Transforms on Graphs
    4.2.1 Conditions of Perfect Reconstruction and Orthogonality
    4.2.2 Proposed LGFT Construction
  4.3 Experimental Results for LGFT
    4.3.1 Transform Coding Gain with a Particular Line Graph
    4.3.2 LGFT for Image Coding
  4.4 Conclusion

5 Efficient DCT and DST Filtering with Sparse Graph Operators
  5.1 Summary of Contributions
  5.2 Sparse DCT and DST Operators
    5.2.1 Sparse DCT-II Operators
    5.2.2 Sparse Operators of 16 DTTs
    5.2.3 Sparse 2D DCT/DST Filters
    5.2.4 Remarks on Graph Operators of Arbitrary GFTs
  5.3 New Graph Filter Designs with Sparse Operators
    5.3.1 Least Squares (LS) Graph Filter
    5.3.2 Minimax Graph Filter
    5.3.3 Weighted GFT Domain Energy Evaluation
    5.3.4 Complexity Analysis
  5.4 Experiments
    5.4.1 Filter Approximation Accuracy with Respect to Complexity
    5.4.2 Transform Type Selection in Video Coding
  5.5 Summary

6 Irregularity-Aware GFT for Image and Video Coding
  6.1 Irregularity-aware graph Fourier transform (IAGFT)
  6.2 Perceptual coding with Weighted MSE
    6.2.1 Transform Coding with IAGFT
    6.2.2 SSIM-Driven Weights for WMSE
  6.3 IAGFT Transform Bases
  6.4 Experiments
    6.4.1 Image Coding Results on JPEG
    6.4.2 Subjective Comparison: An Example
  6.5 Conclusion

7 Conclusion and Future Work
  7.1 Summary of this Thesis
  7.2 Future Work

A Proofs of Theorems and Lemmas
  A.1 Proof of Theorem 2
  A.2 Derivations for Sparse Operators of DST-IV, DST-VII, and DCT-V
    A.2.1 Sparse DST-IV Operators
    A.2.2 Sparse DST-VII Operators
    A.2.3 Sparse DCT-V Operators
  A.3 Proof of Theorem 3

B Search of Involutions in General Graphs
  B.1 Pruning based on Degree Lists
    B.1.1 Complexity Analysis
  B.2 Searching of Identical Tree Branches
    B.2.1 Algorithm
    B.2.2 Complexity Analysis
  B.3 Conclusion

C Characterization of Sparse Laplacians with a Common GFT

D Construction of Sparse Operators in Symmetric Graphs

Reference List

List of Tables

1.1 Left and right boundary conditions (b.c.) of 16 DCTs and DSTs.
1.2 Definitions of all types of DCTs and DSTs. The indices j and k range from 1 to N. Scaling factors for rows and columns are given by c_j = 1/√2 for j = 1 and 1 otherwise, and d_j = 1/√2 for j = N and 1 otherwise.
2.1 Definitions of symmetries for vectors, matrices, and graphs.
2.2 Types of symmetric N×N grids and their corresponding involutions. Indices k and l represent vertical and horizontal coordinates of grid nodes, as in Figure 2.9(a).
2.3 Runtime performance of proposed fast GFTs. The baseline for the runtime reduction rates is the matrix GFT implementation.
3.1 Parameters of learned symmetric line graphs, and those associated with DCT. The parameters are weights of self-loops and edges, as shown in Figure 3.2(b).
3.2 Average percentage of bit rate reduction of SLGTs as compared to DCT at 32dB-38dB PSNR. Results with and without frequency weighting are compared. Negative values mean compression gain.
3.3 Bit rate reduction of separable KLT, fast GFT, and hybrid DCT/ADST as compared to the 2D separable DCT, tested on residual blocks with intra mode-2 and mode-16. Negative values mean compression gain, measured in the Bjontegaard metric (percentage of bit rate reduction).
4.1 Quality comparison of different horizontal transforms on the full test image.
5.1 Matrix structure and corresponding eigenpairs of sparse operators associated to all DCTs and DSTs. The index j ranges from 1 to N.
5.2 Least squares and minimax design approaches for PGF and MPGF, with weights ρ_i on different graph frequencies.
5.3 Encoding time and quality loss (in BD rate) of different transform pruning methods. The baseline is AV1 with a full transform search (no pruning). A smaller loss is better.
5.4 Encoding time and quality loss (in BD rate) of PRUNE OPERATORS versus PRUNE 2D FAST. Smaller or negative loss is better.
6.1 Bit rate comparison with respect to the DCT-based scheme with uniform quantization table (i.e., "Uniform, DCT"). Negative numbers correspond to compression gains. Uniform and Non-uniform refer to uniform and non-uniform quantization tables, respectively.
B.1 Telephone numbers T(n) with n from 1 to 12.
B.2 Demonstration of Algorithm 5.

List of Figures

1.1 Graphs associated to (a) DCT-II, (b) DST-IV (ADST).
2.1 Fast transform using J layers of Givens rotations. The parameter 0 < θ_{i,j} ≤ π is the j-th rotation angle in the i-th butterfly stage, and Π_k are permutation operations.
2.2 Two examples of fast algorithms using butterfly stages with n = 8.
2.3 (a) The 4-node cycle graph and (b) a fast algorithm for its GFT.
2.4 Examples of symmetric graphs. (a) A graph with a bisymmetric Laplacian matrix. (b) A graph with a Laplacian satisfying (2.8).
2.5 (a) Symmetric graph decomposition for the graph in Figure 2.4(a). Red diamonds and blue squares represent nodes in V_X and V_Y, respectively. (b) The associated fast GFT diagram for G_1, where U_1^+ and U_1^- are the GFTs of G_1^+ and G_1^-, respectively.
2.6 (a) Symmetric graph decomposition for the graph in Figure 2.4(b). Red diamonds, green triangles, and blue squares represent nodes in V_X, V_Z, and V_Y, respectively. (b) The associated fast GFT diagram for G_2, where U_2^+ is the GFT of G_2^+.
2.7 An example of even and odd symmetric components on the graph in Figure 2.6. Signals x^+ and x^- are outputs of the Haar units. Signals x_even and x_odd are associated to x^+ and x^- by (2.17).
2.8 The 12-node cycle graph: (a) Graph decomposition. (b) The associated fast GFT diagram. Red diamonds, green triangles, and blue squares represent those nodes in V_X, V_Z, and V_Y, respectively, for the next stage of decomposition.
Unlabeled edges and self-loops have weights 1, and P_{c,i} are permutation operations. The two shaded sub-GFTs are (U_c^{++})^T/2 (top) and (U_c^{--})^T/2 (bottom), respectively.
2.9 (a) The axes/point of symmetry for each symmetry type, with a 6×6 grid as an example. Node indices are represented based on the image coordinate system. (b) Relationships among types of symmetries.
2.10 The bi-diagonally symmetric grid. (a) Graph decomposition. (b) The associated fast GFT diagram. Red diamonds, green triangles, and blue squares represent those nodes in V_X, V_Z, and V_Y, respectively, for the next stage of decomposition. Unlabeled edges have weights 1, and P_{b,i} are permutation operations.
2.11 The 4×4 z-shaped grid: (a) Graph decomposition. (b) The associated fast GFT diagram. Red diamonds and blue squares represent those nodes in V_X and V_Y, respectively, for the next stage of decomposition. Unlabeled edges have weights 1, and P_{z,1} is a permutation operation.
2.12 The 15-node skeletal graph: (a) Graph decomposition. (b) The associated fast GFT diagram. Red diamonds, green triangles, and blue squares represent those nodes in V_X, V_Z, and V_Y, respectively, for the next stage of decomposition. U_s^{-,1} and U_s^{-,2} are the GFTs corresponding to the two connected components of G_s^-, respectively. Unlabeled edges have weights 1, and P_{s,1} is a permutation operation.
2.13 Runtime versus sign-normalized relative error δ for different GFT implementations on different graphs. The numbers labeled alongside the markers indicate the associated numbers of Givens rotation layers.
2.14 Runtime versus empirical average error for different GFT implementations on different graphs. The numbers labeled alongside the markers indicate the associated numbers of Givens rotation layers.
3.1 Useful graph structures in modeling image and video pixels: (a) length-4 line graph, (b) 4×4 4-connected grid, (c) 4×4 8-connected grid, and (d) 4×4 6-connected grid.
3.2 (a) An arbitrary length-8 line graph. (b) A length-8 SLGT. (c) A special length-8 SLGT.
3.3 (a) The diagram of general 8×8 SLGT implementation. (b) Graphs that characterize the sub-transforms in the fast GFT diagram.
3.4 Rate-distortion performance of separable KLT, fast GFT, and hybrid DCT/ADST. The testing blocks in this figure are mode-2 intra residual blocks.
4.1 An example of Kron reduction on a line graph, where G_{S_k} is the graph obtained from Kron reduction of G with vertex subset S_k.
4.2 Transform coding gains for different transforms and block sizes M with signals modeled by a particular line graph. The KLT, GFT, and DCT shown here are defined in a block-based manner.
4.3 Basis functions of the LGFT for k = 2 with M = 8.
4.4 Subjective comparison of different transforms with QP = 30. (a) The full piecewise smooth image and a 160×60 patch. (b) Original image patch. (c) Sobel edge map. (d)-(f) Recovered image patches with different transforms.
5.1 (a) Sparse operators Z^{(j)}_{DCT-II}, (b) their associated Laplacian matrices L^{(j)}_{DCT-II} = 2I - Z^{(j)}, and (c) associated graphs G^{(j)}_{DCT-II} for the length-4 DCT-II.
5.2 Sparse graph operators with length N = 6 associated to DCT-I through DCT-VIII. Different symbols represent different values: × = -1, · = 0, △ = √2, and the remaining symbols denote 1 and 2.
5.3 Sparse graph operators with length N = 6 associated to DST-I through DST-VIII. Different symbols represent different values: + = -2, × = -1, · = 0, △ = √2, and the remaining symbol denotes 1.
5.4 Graphs associated to sparse operators of the 2D 4×4 DCT. For visualization, coordinates are slightly shifted to prevent some edges from overlapping. Self-loops are not shown in the graphs.
5.5 Sparse operators associated to the 2D 4×4 DCT. The symbol · represents 0 and the remaining symbol represents 1.
5.6 An example of PGF and MPGF fitting results on a length-12 line graph. The desired frequency response is h*(λ) = exp(-4(λ-1)²). The PGF and MPGF filters have been optimized based on (1.6) and (5.10).
5.7 Example illustrating the frequency responses of a degree K = 4 PGF with least squares (LS) and minimax criteria, with weighted or unweighted settings. The filters are defined on a length-24 line graph. In the weighted setting, weights ρ_i are chosen to be 2, 0, and 1 for passband, transition band, and stopband, respectively.
5.8 Runtime vs. approximation error for (a)(c) Tikhonov DCT filter, (b)(d) bandpass exponential DCT filter. The filters are defined on two different graphs: (a)(b) 16×16 grid, (c)(d) length-64 line graph. Different PGF degrees K, numbers of MPGF operators involved R, and ARMA iteration numbers T are labelled in the figures.
5.9 Runtime vs. maximum absolute error for various designs of ideal low-pass filter on (a) 16×16 grid, and (b) length-64 line graph.
6.1 Flow diagram of the JPEG encoder.
6.2 Values of q_i with respect to local variance, with Δ = 8.
6.3 (a) Basis, (b) Q_0-energy, and (c) variations of Q_0-IAGFT modes. We use j to denote indices of basis functions. In (b) and (c), the two curves represent quantities for pixels with q_i = 1.6 and with q_i = 0.4, respectively.
6.4 Flow diagram of our proposed scheme. Blocks highlighted in blue are new components.
6.5 Vector quantization codewords of q_i for 8×8 blocks.
6.6 An example: (a) original image, (b) local variance map, (c) q_i map, and (d) quantized q_i map with vector quantization.
6.7 RD curves for the Airplane image in (a) PSNR and (b) MS-SSIM.
6.8 Encoded image patches of (a) DCT and (b) IAGFT.
B.1 An example of pruning based on degree lists. (a) An example graph, (b) its weighted and unweighted degree lists, and (c) its vertex partitions and involutions on them.
B.2 Symmetry of a tree characterized by identical branches. (a) Two branches with a common root. (b) Two branches with adjacent roots.
Their associated involutions are φ_a = (1, 6, 8, 9, 7, 2, 5, 3, 4) and φ_b = (5, 7, 8, 6, 1, 4, 2, 3).
B.3 An example graph for demonstration of Algorithm 5. The center of this graph is {1}. Note that a topological order given by BFS from node 1 would be (1, 2, 7, 12, 3, 6, 8, 9, 13, 4, 5, 10, 11).
C.1 An illustrative example of a polyhedral cone in R³ with a vertex at 0 and 5 edges. Any element of the cone can be represented as Σ_{m=1}^{5} a_m L^{(m)} with non-negative a_m.
D.1 An illustrative example for graph operator construction based on graph symmetry. (a) The 15-node human skeletal graph G. (b) The graph G_φ associated to an alternative sparse operator by construction. All edge weights are 1.

Abstract

Graph signal processing (GSP) extends tools of conventional signal processing to graph signals, i.e., structured data residing on an irregular domain. The graph Fourier transform (GFT), as an extension of the discrete Fourier transform, defines the notion of frequency on graphs, and can be used as a tool for frequency analysis of graph signals. However, challenges are posed by the computational complexity of graph signal processing techniques. First, a high dimensional graph Fourier transform can be computationally expensive, and there are no fast algorithms for general graphs. Second, in graph filtering, which is a crucial tool in many GSP applications, an efficient implementation without GFT computation is a challenging problem that has received a great amount of attention. Third, while graph-based transforms have been shown to enhance coding efficiency in MSE-based video compression frameworks, their application in coding schemes based on perceptually driven metrics remains an open problem. With the aim of answering these questions, we propose several techniques in this work. First, we study algebraic properties of the graph Laplacian matrix that lead to fast implementations of the graph Fourier transform, based on which we can design fast algorithms when the graph satisfies certain symmetry or bipartition properties. Second, we propose data-driven approaches to obtain fast GFT solutions for efficient separable and non-separable transforms. Third, we extend the notion of lapped transforms to a graph-based setting, and propose a generalized version of the well-known lapped orthogonal transform. Fourth, we explore sparse graph operators for the well-known discrete cosine transform (DCT) and discrete sine transform (DST), which lead us to an efficient DCT/DST graph filtering approach and a fast transform type selection scheme for video encoders. Finally, we apply the irregularity-aware graph Fourier transform (IAGFT) to perceptual image coding, which gives promising experimental results in terms of a popular perceptual quality metric, the structural similarity (SSIM).

Chapter 1

Introduction

In recent years, advances in data storage, communication, and computing power have led to an explosion of data. The demand for data science tasks, such as clustering, classification, and data-driven decision making, has been rapidly increasing in various application fields, and for a wide variety of high dimensional data. In practice, for some applications such as sensor, social, and transportation networks, the data of interest is usually irregularly structured.
However, conventional signal processing tools, such as the discrete Fourier transform, are targeted at regularly spaced data, such as time series. Those signal processing tools cannot be easily applied to high dimensional data lying on an irregular domain.

Graphs offer the capability of representing relations among data samples. For example, in a sensor network, each sensor can be represented as a graph node, and the weight of the graph edge connecting two nodes can model the similarity (based on, e.g., distance) between the associated sensors. In view of this convenience, graph signal processing (GSP) [90,104,114] has become an emerging research field, where signal processing techniques such as frequency analysis and filtering are extended to signals defined on graphs.

In GSP, a data sample, represented as a vector, is called a graph signal. A graph signal is associated to a graph that models the underlying relations between elements of the signal, where each vertex corresponds to a sample and each edge represents the correlation between a pair of samples. Traditional signal processing techniques including the Fourier transform, filtering, convolution, and downsampling can be extended for graph signals. In particular, the graph Fourier transform (GFT) is a generalization of the discrete Fourier transform. The GFT provides the notion of frequency for graph signals, based on which we can perform transformations [20,36,106], frequency analysis [105], filtering [7], and sampling [1,88] for graph signals, which lead to applications in sensor networks [21], image and video processing [11], machine learning [6], and so forth.

One drawback of graph signal processing techniques is their increased complexity compared to conventional signal processing methods. In particular, the graph Fourier transform in general does not have a fast algorithm. In addition, implementation of graph filters, which usually require the GFT as a key building block, can be challenging when the graph size is large. To address these issues, the main focus of this thesis is on efficient graph Fourier transforms and graph filtering methods. The scope of this thesis, as well as a summary of contributions, is presented as follows.

• Fast GFTs (Chapter 2): Unlike the discrete Fourier transform, which is fixed and has a well-known divide-and-conquer algorithm [15] known as the fast Fourier transform (FFT), the GFT is defined based on the graph structure, and in general does not have a fast algorithm. In this chapter, we investigate conditions on the graph structure for a butterfly stage to be available in the associated GFT implementation. We show that such fast GFTs arise when the graph has certain symmetric or bipartite properties, and propose a graph decomposition approach to derive fast GFT algorithms. We provide several examples of such fast GFTs with their applications. We also compare the runtimes of these fast GFTs with respect to the matrix GFT as well as other approximate fast GFT techniques.

• Data-driven fast GFTs for video coding (Chapter 3): We consider various data-driven fast GFTs for video compression. In Section 3.1 we introduce a graph learning approach for a fast GFT within the framework of Chapter 2: given a particular butterfly stage, we learn a graph whose GFT has a fast algorithm with the specified butterfly stage. Several types of butterfly stages can be used for image and video coding applications.
For example, symmetric line graph transforms (SLGT) can be applied to 1D pixel blocks (Section 3.2). In Section 3.3, we propose an approximation method to determine the butterfly stage for the problem formulated in Section 3.1. By learning the optimal parameters of graphs from data, we obtain a compression gain for both inter- and intra-predicted residual blocks in Section 3.4.

• Lapped GFTs (Chapter 4): When the number of nodes is large, applying a GFT on the entire graph may not be practical in terms of computational complexity. A more practical solution is to divide the signal into blocks, and to apply the GFT block-wise. However, this does not take into account the correlations across the block boundaries, leading to the so-called blocking artifact. In conventional signal processing, lapped transforms [81] were proposed to eliminate blocking artifacts of block-based discrete cosine transforms (DCT). In this thesis, we extend the notion of lapped transforms to graph signals. We show that lapped graph Fourier transforms (LGFT) can be obtained and implemented under a generalized framework of the lapped orthogonal transform design [81]. Using a piecewise smooth image as an example, we show that the lapped transform leads to a compression gain compared to the DCT and the lapped orthogonal transform, while reducing blocking artifacts.

• Fast DCT and DST graph filtering (Chapter 5): We study the discrete cosine transform (DCT) and discrete sine transform (DST) and their associated filtering operations from a graph-based perspective. For each transform within the 16 types of DCT and DST, we characterize a family of sparse graph operators, where all of these operators have the DCT/DST as their eigenmatrix. The sparse operators can be viewed as graph filters operating in the DCT/DST (graph frequency) domain, but with sample (graph vertex) domain implementation. With the derived sparse graph filters, we propose a filter design approach that allows us to approximate a filter with an arbitrary frequency response by a linear combination of sparse operators that can be applied efficiently in the graph vertex domain. Experimental results demonstrate that DCT/DST filters can be approximated efficiently when the graph size is sufficiently large. In addition, we apply our method to the rate-distortion optimization process for transform type selection in a video codec, which gives a negligible compression loss with a significant encoding time saving.

• Adaptive GFT for perceptual video coding (Chapter 6): In image and video coding applications, perceptual quality metrics such as the Structural Similarity (SSIM) are typically used after encoding, but are not tied to the encoding process. We consider an alternative framework where the goal is to optimize a weighted MSE metric, where different weights can be assigned to each pixel so as to reflect their relative importance in terms of perceptual image quality. For this purpose, we propose a novel transform coding scheme based on the irregularity-aware graph Fourier transform (IAGFT), where the induced IAGFT is orthogonal, but the orthogonality is defined with respect to an inner product corresponding to the weighted MSE. We propose to use weights derived from local variances of the input image, such that the weighted MSE aligns with SSIM. In this way, the associated IAGFT can achieve a coding efficiency improvement in terms of multi-scale SSIM with respect to conventional transform coding based on DCT.
1.1 Notations and Conventions

We use bold symbols to denote vectors and matrices. Lowercase symbols with double sub-indices are used to denote matrix elements. For example, m_{i,j} denotes the (i,j) entry of the matrix M. The n×n identity matrix is denoted by I_n. The n×n order-reversal permutation matrix is denoted by

J_n = \begin{pmatrix} & & 1 \\ & \iddots & \\ 1 & & \end{pmatrix}.   (1.1)

When J_n right (resp. left) multiplies another matrix, it flips that matrix left to right (resp. up to down). It also satisfies J_n = J_n^T and J_n J_n = I. In (1.1) and in what follows, the entries not included in the matrix are meant to be zero. The subscripts of I and J matrices indicate their sizes, and may be omitted for brevity. For scalars a_1, ..., a_k and square matrices M_i with arbitrary sizes, we denote diagonal and block diagonal matrices in compact notation as

diag(a_1, ..., a_k) = \begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_k \end{pmatrix}, \qquad diag(M_1, ..., M_k) = \begin{pmatrix} M_1 & & \\ & \ddots & \\ & & M_k \end{pmatrix}.

The set of n-dimensional real-valued vectors is denoted as R^n, and the set of n×n orthogonal matrices (with columns normalized to have unit norms) is denoted as O_n. Finally, the pseudo-inverse matrix of M is denoted as M^†.

1.2 Review of Graph Signal Processing

In this thesis, all graphs we consider are assumed to be undirected and weighted. Let x be a length-n graph signal associated to a graph G(V, E, W). There are n nodes in the vertex set V, each corresponding to one of the entries of x. Each edge e_{i,j} ∈ E describes the inter-sample relation between nodes i and j. The (i,j) entry of the weight matrix, w_{i,j}, is the weight of the edge between nodes i and j. The combinatorial graph Laplacian (CGL) matrix of G is

L_CGL = D - W,   (1.2)

where the degree matrix D = diag(d_1, ..., d_n) is a diagonal matrix with

d_i = \sum_{j=1, j \neq i}^{n} w_{i,j}.   (1.3)

The symmetric normalized Laplacian is defined as 𝓛 = D^{-1/2} L_CGL D^{-1/2}. In many cases, we consider graphs with self-loops, i.e., edges going from a node to itself. We allow these self-loops to be weighted, with s_i := w_{i,i} being the weight of the self-loop on node i. This leads to the definition of the generalized graph Laplacian (GGL):

L = D - W + S,   (1.4)

where S = diag(s_1, ..., s_n) is the diagonal self-loop matrix. In this thesis, if not stated otherwise, the terms "Laplacian" or "unnormalized Laplacian" refer to the GGL, where the graph is allowed to have self-loops. A graph is simple if it does not contain any self-loops, and k-regular if all node degrees are k. The Laplacian matrix of a simple graph is called a combinatorial graph Laplacian. We call a graph bipartite if its vertices can be divided into two disjoint sets (or, two parts) S_1 and S_2 such that every edge connects a vertex in S_1 and one in S_2. An acyclic graph is a graph with no cycles. If an acyclic graph is connected, it is called a tree; otherwise, it is called a forest.

1.2.1 Graph Fourier Transform

There are several definitions of the GFT [36,90,104,114], depending on whether the graph is directed, which fundamental graph operator is used (e.g., adjacency or Laplacian matrix), and how the graph signal energy is defined. Here, we adopt the definition in [114], where the GFT is obtained from the eigen-decomposition of the graph Laplacian matrix, L = UΛU^T. Then, the i-th GFT coefficient of a graph signal x is defined as the projection of x onto u_i, the i-th column of U. In some applications, such as spectral clustering [132], it may be beneficial to use a GFT defined on the eigenvectors of the symmetric normalized Laplacian 𝓛.
However, in what follows, we use GFTs associated to the GGL unless stated otherwise.

GFT coefficients provide a frequency representation of the given signal, since GFT basis functions associated to lower (resp. higher) eigenvalues represent lower (resp. higher) variation on the graph. To see this, we note that the Laplacian quadratic form

f^T L f = \sum_{(i,j) \in E} w_{i,j} (f_i - f_j)^2 + \sum_{k=1}^{n} s_k f_k^2   (1.5)

measures the variation of the signal f on the graph. Since w_{i,j} and s_k are non-negative, L is positive semi-definite and thus f^T L f is always non-negative. We know that the eigenvectors of L are the solutions to

u_1 = \arg\min_{\|f\|=1} f^T L f, \qquad u_k = \arg\min_{f \perp u_1, \dots, u_{k-1},\ \|f\|=1} f^T L f.

Thus, the eigenvectors u_1, ..., u_n form an orthogonal basis with functions having lower to higher variations on the graph. The quantities of their corresponding variations are given by the associated eigenvalues λ_1, ..., λ_n, which are also called graph frequencies. In addition, from the graph variation quantity (1.5), we can see the connection between GFT basis functions and the graph weights. Note that, when w_{i,j}, the weight of edge (i,j), is large, the first GFT basis function u_1 tends to have similar values on entries i and j. Similarly, when there is a self-loop with a large weight on node k, then u_1 tends to have its k-th entry close to zero.

The GFT has a wide range of applications. First, based on the GFT and its frequency interpretation, graph spectral filters [51] can be defined by multiplying the GFT coefficients by the frequency response in the graph spectral domain, leading to applications such as image denoising and edge-preserving smoothing [7,89]. Second, when the graph signal is modeled by a Gaussian Markov random field (GMRF) [101], the corresponding GFT can be regarded as the orthogonal transform that optimally decorrelates the graph signal and hence reduces spatial redundancies. Based on this fact, the GFT has been applied to image and video compression [26,32,50]. Third, in machine learning, when relations between data points are modeled by a graph, the GFT can be used for data clustering [132] and dimensionality reduction for classification [85,121].

1.2.2 Graph Filtering

For a given graph with Laplacian L, we consider a 1-hop graph operator Z, which could be the adjacency or Laplacian matrix. When it is applied to a signal x to compute y = Zx, it defines an operation for each node with its 1-hop neighbors (e.g., when Z = A, y(i) = \sum_{j \in N(i)} x(j) is a sum over the neighbors of graph node i). Furthermore, it can be shown that y = Z^K x is a K-hop operation, and thus a degree-K polynomial of Z,

H = \sum_{k=0}^{K} g_k Z^k, \quad \text{with } Z^0 = I,   (1.6)

induces an operation for each node with its neighbors within K hops. The operation defined in (1.6) is thus called a graph filter, an FIR graph filter, or a polynomial graph filter (PGF). In what follows, we refer to Z as a graph operator for short¹. For the rest of this thesis, we choose Z = L or define Z as a matrix with the same eigenbasis as L; e.g., Z could be a polynomial of L such as Z = 2I - L. Let Φ = (φ_1, ..., φ_N) be the matrix of eigenvectors of Z = L, and λ_j the eigenvalue associated to φ_j; then it follows that Φ is also the eigenvector matrix of the polynomial H. That is,

H = Φ · h(Λ) · Φ^T, \quad h(Λ) := diag(h(λ_1), \dots, h(λ_N)).   (1.7)

The eigenvalue h(λ_j) of H associated to φ_j is called the frequency response of λ_j.
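As a concrete illustration of these definitions, the following minimal sketch (Python with NumPy; the small graph, its weights, and the chosen response h(λ) = e^{-λ} are illustrative assumptions, not examples from this thesis) builds a generalized graph Laplacian, computes its GFT, and applies a filter in the GFT domain as in (1.7):

```python
import numpy as np

# Illustrative weighted graph on 4 nodes: a path 0-1-2-3 with unit weights
# and a self-loop of weight 0.5 on node 0.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
s = np.array([0.5, 0., 0., 0.])            # self-loop weights s_i

D = np.diag(W.sum(axis=1))                 # degree matrix, see (1.3)
L = D - W + np.diag(s)                     # generalized graph Laplacian (1.4)

lam, Phi = np.linalg.eigh(L)               # graph frequencies and GFT basis
x = np.array([1.0, 2.0, 2.5, 3.0])         # a graph signal

x_hat = Phi.T @ x                          # forward GFT
h = np.exp(-lam)                           # frequency response h(lambda)
y = Phi @ (h * x_hat)                      # filtering as in (1.7)

# Sanity check: the same filter as a single matrix H = Phi h(Lam) Phi^T.
H = Phi @ np.diag(h) @ Phi.T
assert np.allclose(y, H @ x)
```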
Note that with y = Hx, in the GFT domain we have ŷ = h(Λ)x̂, meaning that the filter operator amplifies or reduces the signal component at frequency λ_j by h(λ_j) in the GFT domain. We also note that the definition via (1.7) generalizes the notion of a digital filter, since when Φ is the discrete Fourier transform (DFT) matrix, H reduces to the classical Fourier filter [105]. Given a desired graph frequency response h = (h_1, ..., h_N)^T, its associated polynomial coefficients in (1.6) can be obtained by solving a least squares minimization problem [105]:

g = \arg\min_{g} \| h - Ψ g \|^2, \quad \text{where } Ψ = \begin{pmatrix} 1 & λ_1 & \dots & λ_1^K \\ \vdots & \vdots & & \vdots \\ 1 & λ_N & \dots & λ_N^K \end{pmatrix},   (1.8)

with λ_j being the j-th eigenvalue of Z. The PGF operation y = Hx can be done via an efficient iterative algorithm: 1) t^{(0)} = g_K x, 2) t^{(i)} = Z t^{(i-1)} + g_{K-i} x for i = 1, ..., K, and 3) y = t^{(K)}. Its complexity is given by O(KE), with E the number of graph edges. Note that this algorithm does not require GFT computation, and its complexity depends on how sparse Z is (lower complexity for sparser Z).

¹In the literature, Z is often called a graph shift operator [34,51,104,109]. Here, we simply call it a graph operator, since its properties are different from the shift in conventional signal processing, which is always reversible, while the graph operator Z, in most cases, is not.
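The iterative PGF evaluation above is essentially Horner's rule applied to the matrix polynomial, and the least squares fit is exactly (1.8). A minimal sketch (Python/NumPy; the example graph, filter degree, and desired response are illustrative assumptions):

```python
import numpy as np

def pgf_apply(Z, g, x):
    """Apply H = sum_k g[k] Z^k to x via the iterative scheme:
    t0 = g_K x;  t_i = Z t_{i-1} + g_{K-i} x;  y = t_K.
    Only matrix-vector products with Z are needed (no GFT)."""
    K = len(g) - 1
    t = g[K] * x
    for i in range(1, K + 1):
        t = Z @ t + g[K - i] * x
    return t

def pgf_fit(lam, h_desired, K):
    """Least squares fit (1.8) of polynomial coefficients to h(lambda)."""
    Psi = np.vander(lam, K + 1, increasing=True)   # rows: [1, lam, ..., lam^K]
    g, *_ = np.linalg.lstsq(Psi, h_desired, rcond=None)
    return g

# Example on a length-8 line graph (illustrative choice).
N = 8
L = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
L[0, 0] = L[-1, -1] = 1.0                          # combinatorial Laplacian
lam, Phi = np.linalg.eigh(L)
g = pgf_fit(lam, np.exp(-2 * lam), K=4)            # low-pass target h = e^{-2*lam}
x = np.random.default_rng(0).standard_normal(N)
y = pgf_apply(L, g, x)                             # vertex-domain filtering
y_exact = Phi @ (np.exp(-2 * lam) * (Phi.T @ x))   # exact GFT-domain filtering
print(np.linalg.norm(y - y_exact))                 # small approximation error
```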
Figure 1.1: Graphs associated to (a) DCT-II, (b) DST-IV (ADST).

1.2.3 Gaussian Markov Random Fields (GMRFs)

A Gaussian Markov random field (GMRF) x ~ N(μ, Σ) is a Gaussian random vector whose inverse covariance matrix (also called a precision matrix) L = Σ^† is positive semidefinite. Each GMRF x is associated with an undirected graph G, where the relations among its entries can be characterized by L: the graph vertices correspond to elements in x, and the edges of G describe the partial correlations among elements of x:

Corr(x_i, x_j | x_{-ij}) = - \frac{l_{ij}}{\sqrt{l_{ii} l_{jj}}}, \quad i \neq j,

which is the correlation between elements i and j given all the other elements. Note that for a general GMRF, the associated graph G may have negative edge weights. When the precision matrix L of a GMRF x is a graph Laplacian matrix, i.e., its off-diagonal entries are non-positive, then x is called an attractive GMRF. In this case, the graph G associated to x has non-negative edge weights. Among attractive GMRFs, we say x is a diagonally dominant GMRF if L is diagonally dominant (i.e., |l_{ii}| > \sum_{j \neq i} |l_{ij}|); we call x an intrinsic GMRF if L is singular (i.e., L is a combinatorial Laplacian) [25]. In image and video compression, pixel data can be modeled by attractive GMRFs, whose probabilistic properties lead to a justification of the use of the DCT and ADST [99,117]. We summarize the connection between DCT/ADST and graph Laplacian matrices below.

1.2.4 DCT and DST

The discrete cosine transform (DCT) and discrete sine transform (DST) are discrete trigonometric transforms (DTT). These are orthogonal transforms that operate on a finite sequence, where the basis functions are cosine functions and sine functions, respectively. Depending on how samples are taken from continuous cosine and sine functions, eight types of DCT and eight types of DST can be defined [136]. The definitions are shown in Table 1.2, where scaling factors c_j and d_j are included to make the transforms orthogonal [99, Table A.1]. Table 1.1 shows the left and right boundary conditions (b.c.) of all DTTs, which arise from even and odd symmetries of the cosine and sine functions, and can be trivially verified based on the DTT definitions in Table 1.2. We will refer to the eight DCT types as DCT-I through DCT-VIII, and to the eight DST types as DST-I through DST-VIII.

Table 1.1: Left and right boundary conditions (b.c.) of 16 DCTs and DSTs.

Left b.c. \ Right b.c.    | φ_j(N+k)=φ_j(N-k) | φ_j(N+k)=-φ_j(N-k) | φ_j(N+k)=φ_j(N-k+1) | φ_j(N+k)=-φ_j(N-k+1)
φ_j(k)=φ_j(-k+2)          | DCT-I             | DCT-III            | DCT-V               | DCT-VII
φ_j(k)=-φ_j(-k)           | DST-III           | DST-I              | DST-VII             | DST-V
φ_j(k)=φ_j(-k+1)          | DCT-VI            | DCT-VIII           | DCT-II              | DCT-IV
φ_j(k)=-φ_j(-k+1)         | DST-VIII          | DST-VI             | DST-IV              | DST-II

Table 1.2: Definitions of all types of DCTs and DSTs. The indices j and k range from 1 to N. Scaling factors for rows and columns are given by c_j = 1/√2 for j = 1 and 1 otherwise, and d_j = 1/√2 for j = N and 1 otherwise.

DCT-I:    φ_j(k) = √(2/(N-1)) c_j c_k d_j d_k cos((j-1)(k-1)π/(N-1))
DCT-II:   φ_j(k) = u_j(k) = √(2/N) c_j cos((j-1)(k-1/2)π/N)
DCT-III:  φ_j(k) = √(2/N) c_k cos((j-1/2)(k-1)π/N)
DCT-IV:   φ_j(k) = √(2/N) cos((j-1/2)(k-1/2)π/N)
DCT-V:    φ_j(k) = (2/√(2N-1)) c_j c_k cos((j-1)(k-1)π/(N-1/2))
DCT-VI:   φ_j(k) = (2/√(2N-1)) c_j d_k cos((j-1)(k-1/2)π/(N-1/2))
DCT-VII:  φ_j(k) = (2/√(2N-1)) d_j c_k cos((j-1/2)(k-1)π/(N-1/2))
DCT-VIII: φ_j(k) = (2/√(2N+1)) cos((j-1/2)(k-1/2)π/(N+1/2))
DST-I:    φ_j(k) = √(2/(N+1)) sin(jkπ/(N+1))
DST-II:   φ_j(k) = √(2/N) d_j sin(j(k-1/2)π/N)
DST-III:  φ_j(k) = √(2/N) d_k sin((j-1/2)kπ/N)
DST-IV:   φ_j(k) = v_j(k) = √(2/N) sin((j-1/2)(k-1/2)π/N)
DST-V:    φ_j(k) = (2/√(2N+1)) sin(jkπ/(N+1/2))
DST-VI:   φ_j(k) = (2/√(2N+1)) sin(j(k-1/2)π/(N+1/2))
DST-VII:  φ_j(k) = (2/√(2N+1)) sin((j-1/2)kπ/(N+1/2))
DST-VIII: φ_j(k) = (2/√(2N-1)) d_j d_k sin((j-1/2)(k-1/2)π/(N-1/2))

Discrete Cosine Transform (DCT)

In what follows, the term DCT refers to DCT-II, the most widely used type, unless stated otherwise. For j = 1, ..., N and k = 1, ..., N, we write the k-th element of the j-th DCT-II basis function of length N as

u_j(k) = \sqrt{\frac{2}{N}} c_j \cos\left( \frac{(j-1)(k-\frac{1}{2})\pi}{N} \right),   (1.9)

with normalization constant c_j being 1/√2 for j = 1 and 1 otherwise. If those basis functions are written in vector form u_j ∈ R^N, it was pointed out in [117] that the u_j are eigenvectors of the N×N Laplacian matrix

L_DCT-II = \begin{pmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix}.   (1.10)

This means that the DCT is the GFT of a graph G_D associated to the Laplacian matrix L_DCT-II. In fact, G_D is a line graph with uniform weights; the case with N = 8 is shown in Fig. 1.1(a). The eigenvalue corresponding to u_j is ω_j = 2 - 2cos((j-1)π/N).

Discrete Sine Transform (DST)

One of the most notable applications of the DST is image and video coding. In particular, it has been shown that DST-VII optimally decorrelates 1D intra residual blocks following a Gaussian Markov model [45,49]. In [46], it was pointed out that a fast algorithm is available for DST-IV, a variation of DST-VII that can achieve a comparable compression gain. While DST-VII has been adopted by the HEVC standard [120], DST-IV is used in the AV1 and AV2 standards [10], developed by the Alliance for Open Media (AOM). In this thesis, the term ADST refers to DST-IV as in AV1. We denote by v_j a DST-IV basis function, which has the form

v_j(k) = \sqrt{\frac{2}{N}} \sin\left( \frac{(j-\frac{1}{2})(k-\frac{1}{2})\pi}{N} \right).   (1.11)

The vectors v_j are eigenvectors of

L_DST-IV = \begin{pmatrix} 3 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix},   (1.12)

with corresponding eigenvalues δ_j = 2 - 2cos((j-1/2)π/N). In addition, DST-IV is the GFT of a graph G_A. The graph G_A corresponding to N = 8 is shown in Fig. 1.1(b).
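The eigenvector relations (1.9)-(1.12) are easy to check numerically. A short sketch (Python/NumPy; N = 8 is chosen to match Fig. 1.1, and the constructions are straightforward translations of the formulas above):

```python
import numpy as np

N = 8
k = np.arange(1, N + 1)

# Line-graph Laplacians (1.10) and (1.12).
L_dct2 = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
L_dct2[0, 0] = L_dct2[-1, -1] = 1.0
L_dst4 = L_dct2.copy()
L_dst4[0, 0] = 3.0            # self-loop of weight 2 on the first node

for j in range(1, N + 1):
    c_j = 1 / np.sqrt(2) if j == 1 else 1.0
    u_j = np.sqrt(2 / N) * c_j * np.cos((j - 1) * (k - 0.5) * np.pi / N)   # (1.9)
    v_j = np.sqrt(2 / N) * np.sin((j - 0.5) * (k - 0.5) * np.pi / N)       # (1.11)
    # u_j and v_j are eigenvectors with eigenvalues omega_j and delta_j.
    assert np.allclose(L_dct2 @ u_j, (2 - 2 * np.cos((j - 1) * np.pi / N)) * u_j)
    assert np.allclose(L_dst4 @ v_j, (2 - 2 * np.cos((j - 0.5) * np.pi / N)) * v_j)
```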
1.3 A Roadmap

In the following chapters, we study various graph-based approaches, such as transformations and filtering operations based on the definitions introduced above. In Chapter 2 we explore conditions on graphs for fast GFT algorithms. Based on the results of Chapter 2, we consider efficient graph-based transforms (including the GFT and the lapped graph Fourier transform (LGFT)) with applications in image and video coding in Chapters 3 and 4. Then, we investigate efficient DTT filtering methods using graph filter design approaches in Chapter 5. Finally, we study a variation of the GFT, the irregularity-aware graph Fourier transform (IAGFT), in Chapter 6 to enhance perceptual image quality.

Chapter 2

Fast GFT based on graph symmetry and bipartition¹

The discrete Fourier transform (DFT) has a well-known divide-and-conquer algorithm: the fast Fourier transform (FFT) [15]. The availability of the FFT algorithm is very important for many signal processing problems, particularly in speech and biomedical signal processing. However, the GFT, as a generalization of the DFT, does not have fast algorithms in general. Note that the DFT basis functions are always even or odd symmetric, which can be exploited for obtaining fast algorithms. However, arbitrary graph topologies do not always lead to Laplacian eigenvectors with the same symmetry properties. The lack of fast algorithms is a significant drawback for GSP approaches, particularly when the GFT needs to be applied repeatedly. This has led researchers to investigate techniques for fast GFT computation. Magoarou et al. have proposed a series of approaches for fast GFT approximation [60-62]. The work in [61] uses a gradient-descent-based optimization approach to approximate the GFT matrix by a product of sparse matrices, while [60] refines this method such that the resulting transform matrix can approximately diagonalize the graph Laplacian. In [62], a truncated Jacobi algorithm was introduced for picking the Givens rotations used in the approximate fast GFT, leading to an implementation with the structure shown in Figure 2.1. This approach was further analyzed in [63], which demonstrates that more Givens rotations are required to approximate Laplacian eigenvectors whose corresponding eigenvalues are close.

Although these methods [60-62] are able to find approximate fast GFTs, they do so without taking advantage of structural properties of the original graph. The relation of topological properties of graphs, such as bipartition, repeated subgraphs, symmetry, and uniformity of weights, to the structure of the GFT bases is an important topic in GSP. In this chapter, we investigate graph topological properties that lead to fast implementations of the GFT, then derive the resulting fast GFT algorithms. In particular, we will show that for graphs with certain symmetry or bipartition properties, exact and fast GFTs based on Haar units (as will be defined in Section 2.1) can be designed.

¹Work in this chapter has been published in [72], with MATLAB and C code implementations available in [69].

Figure 2.1: Fast transform using J layers of Givens rotations. The parameter 0 < θ_{i,j} ≤ π is the j-th rotation angle in the i-th butterfly stage, and Π_k are permutation operations.
We propose divide-and-conquer fast GFT algorithms for symmetric graphs and demonstrate that the resulting fast GFTs lead to significant complexity reduction, with potential benefits in hardware implementation or in scenarios where the graph is fixed and the corresponding GFT is applied multiple times. We show that graphs for which such Haar-unit-based fast GFTs can be developed are useful in applications such as video coding and human action analysis. Unlike fast approximate GFTs [60-62], our fast GFTs are based on graph topological properties, and are exact. Experimental results show that as long as the desired graph symmetry property is available, our fast GFTs can provide a better speed improvement than [62].

The rest of this chapter is organized as follows. In Section 2.1 we introduce the so-called Haar unit and derive the conditions for a stage of Haar units to be available in a GFT implementation. In Section 2.2 we define the notion of symmetry for graphs that gives rise to a Haar unit stage, and propose a graph decomposition method for obtaining fast GFT algorithms. Some examples of fast GFTs are shown in Section 2.3. We present experimental results for the complexity of the derived fast GFTs in Section 2.4. Finally, Section 2.5 concludes this chapter.

Figure 2.2: Two examples of fast algorithms using butterfly stages with n = 8.

2.1 Haar Units in GFTs

An n-dimensional Givens rotation [39], commonly referred to as a butterfly [46,62,64], is a linear transformation that applies a rotation of angle θ to two coordinates, denoted as p and q. Its associated matrix Θ(p, q, θ) has the form

Θ_pp = Θ_qq = \cos θ, \quad Θ_qp = -Θ_pq = \sin θ, \quad Θ_ii = 1 \text{ for } i \neq p, q, \quad Θ_ij = 0 \text{ otherwise}.   (2.1)

A system using layers of parallel Givens rotations, such as the one in Figure 2.1, can be used to design an approximate fast transform. In particular, each Givens rotation can be implemented using three lifting steps [17], which further reduces the number of operations involved. In this work, we aim to achieve computational efficiency by using Givens rotations with angle θ = π/4, which typically require only an addition and a subtraction (the factor 1/√2 can be absorbed into other stages of the transform computation). This is also equivalent to a 2×2 Haar transform, so we refer to this operator as a Haar unit, as opposed to a general Givens rotation. We say that a butterfly stage is a stage in a transform diagram with several parallel Givens rotations or Haar units. For example, in Figure 2.2(a), we call the stage that produces the y_i from the x_i a butterfly stage, and the operator that produces y_1 and y_8 from x_1 and x_8 a Haar unit of this butterfly stage.
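In matrix terms, a Haar unit is simply Θ(p, q, π/4). As a minimal illustration (Python/NumPy; the dimension, coordinate pair, and test vector are arbitrary choices), the following sketch builds the Givens rotation matrix of (2.1) and shows that the θ = π/4 case reduces to a normalized sum and difference of the two selected coordinates:

```python
import numpy as np

def givens(n, p, q, theta):
    """Givens rotation matrix Theta(p, q, theta) of (2.1), with 0-based p, q."""
    Th = np.eye(n)
    Th[p, p] = Th[q, q] = np.cos(theta)
    Th[q, p] = np.sin(theta)
    Th[p, q] = -np.sin(theta)
    return Th

n = 8
x = np.arange(1.0, n + 1)                  # test input x_1, ..., x_8
Th = givens(n, 0, 7, np.pi / 4)            # Haar unit acting on x_1 and x_8

y = Th @ x
# With theta = pi/4, the two outputs are (x_p - x_q)/sqrt(2) and
# (x_p + x_q)/sqrt(2): one addition and one subtraction, up to scaling.
assert np.isclose(y[0], (x[0] - x[7]) / np.sqrt(2))
assert np.isclose(y[7], (x[0] + x[7]) / np.sqrt(2))
```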
This means that the GFT basis is not unique, because we can obtain another orthogonal basis for the eigenspace corresponding to eigenvalue 2. An example of another basis for this GFT is the length-4 DFT, which has a well-known fast algorithm [15]. Despite this non-uniqueness, a fast algorithm for a particular GFT basis would still be useful: we can first apply it to obtain coefficients for this particular basis, then apply an m-dimensional rotation (m×m orthogonal transform) on those m GFT coefficients associated to eigenvalues with a multiplicity m > 1, to obtain coefficients associated to another GFT basis. For example, if we apply a proper rotation to c x 2 and c x 3 in Figure 2.3(b), we can obtain the second and third DFT coefficients. In what follows, we study GFT implementations for which stages of Haar units are available. In cases where eigenvalues of high multiplicity are present, we favor the set of eigenvectors for the corresponding subspace that will lead to a more efficient implementation. Based on the structure of U C 4 , it can be seen that the GFT can be implemented using two butterfly stages, as in Figure 2.3(b). We refer to those Haar units located at the first stage (such as the one acting on x 1 and x 4 ) as left Haar units, and those located at the last stage (such as the one producingc x 1 andc x 2 ) as right Haar units. We will explore the 15 conditions that allow a GFT to be factored into terms that include left and right Haar units, which will enable us to develop techniques for designing such fast GFTs. We will show that right Haar units are associated to bipartite graphs (Section 2.1.1), while left Haar units are related to graph symmetries (Section 2.1.2). For a general graph with n nodes, we define the following n×n orthogonal matrix to represent a stage of p parallel Haar units (with p≤n/2): B n,p = 1 √ 2 I p 0 J p 0 √ 2I n−2p 0 J p 0 −I p . (2.2) Note that B > n,p = B n,p , and when we multiply a vector x = (x 1 ,...,x n ) > by B n,p , we have (B n,p · x) i = 1 √ 2 (x i +x n+1−i ), i = 1,...,p x i , i =p + 1,...,n−p 1 √ 2 (−x i +x n+1−i ), i =n−p + 1,...,n For example, B 8,4 are B 8,3 are equivalent to the butterfly stages in Figures 2.2(a) and (b), respectively, with a scaling constant 1/ √ 2. The factors √ 2 and 1/ √ 2 are included in (2.2) so that the columns of B n,p have unit norms; in this way, when an orthogonal matrix U satisfies U = B n,p ¯ U, then ¯ U is orthogonal as well, meaning that U can be factorized into a butterfly stage and another orthogonal transform. 2.1.1 Right Haar Units Let an orthogonal transform U have a right butterfly stage with p Haar units, and assume without loss of generality that the entries of input and output vectors are properly ordered. Then, in compact notation, the GFT of input x can be written as U > x = B n,p · diag(E > , F > )· x, (2.3) where E∈O n−p and F∈O p . This means that U = E 11 E 12 0 E 21 E 22 0 0 0 F 1 √ 2 I p 0 J p 0 √ 2I n−2p 0 J p 0 −I p = 1 √ 2 E 11 √ 2E 12 E 11 J p E 21 √ 2E 22 E 21 J p FJ p 0 −F , (2.4) where E 11 , E 12 , E 21 , and E 22 are subblock components of E. Recall that J flips a matrix left to right when right-multiplied. Thus, for k = 1,...,p, if we denote the k-th column 16 Table 2.1: Definitions of symmetries for vectors, matrices, and graphs. 
Subject Terminology Definition Vector v Even symmetric v = Jv Odd symmetric v =−Jv Matrix M Symmetric M = M > Centrosymmetric [5] M = JM > J Bisymmetric [5] M = M > = JM > J GraphG(V,E, W) φ-symmetric (Definition 2) w i,j =w φ(i),φ(j) ,∀i,j of U as u k = (e > k , f > n−p−k+1 ) > , then the (n−k + 1)-th column of U is (e > k ,−f > n−p−k+1 ) > . GFT matrices with this structure arise for k-regular bipartite graphs (k-RBGs): Lemma 1 ([52]). Let L be the Laplacian of a k-RBG withS 1 ={1,...,p} andS 2 = {p + 1,...,n}. If u = (u > 1 , u > 2 ) > with u 1 ∈R p and u 2 ∈R n−p is an eigenvector of L with eigenvalue λ, then ˆ u = (u > 1 ,−u > 2 ) > is an eigenvector of L with eigenvalue 2k−λ. If we consider eigenvectors of the symmetric normalized Laplacian, the same symme- try properties hold for any bipartite graph: Lemma 2 ([14]). Let L be the normalized Laplacian of a bipartite graph withS 1 = {1,...,p} andS 2 ={p + 1,...,n}. If u = (u > 1 , u > 2 ) > with u 1 ∈R p and u 2 ∈R n−p is an eigenvector ofL with eigenvalue λ, then ˆ u = (u > 1 , −u > 2 ) > is an eigenvector of L with eigenvalue 2−λ. From Lemmas 1 and 2, we know that as long as the graph satisfies certain conditions (bipartite forL or k-RBG for L), then there exists a GFT matrix with a right butterfly stage as in (2.3), even though the GFT matrix may not be unique. The matrices E and F can be obtained by multiplying B n,p in the right of a pre-computed U: diag(E, F) = U· B n,p . 2.1.2 Left Haar Units If n is even and the GFT has a butterfly stage in the left with exactly n/2 Haar units as in Figure 2.2(a), then U = B n,n/2 U + 0 0 U − ! = 1 √ 2 U + JU − JU + −U − ! , 17 where U + , U − ∈O n/2 . From the right hand side we see that each column u i of U must be either even symmetric (i.e., u i = Ju i ) or odd symmetric (i.e., u i =−Ju i ). In this case, the Laplacian must be centrosymmetric (symmetric around the center): Lemma 3 ([5]). Let n be even. An n×n matrix Q has a set of n linearly independent eigenvectors that are even or odd symmetric if and only if Q is centrosymmetric, i.e., Q = JQ > J. Note that the Laplacian matrix L of an undirected graph is symmetric (L = L > ) by nature, but an additional centrosymmetry condition (L = JL > J) is required so that Lemma 3 holds. Such a matrix with both symmetries (L = L > = JL > J) is called bisymmetric, and its entries are symmetric around both diagonals. The various types of symmetries considered in this chapter listed in Table 2.1. Lemma 3 states that for even n, a GFT can be factored to include n/2 left Haar units if and only if the associated Laplacian matrix is bisymmetric (with nodes properly ordered). We now generalize this result to the case when there are only p < n/2 Haar units in the first butterfly stage, and with a possibly odd n. Again, we assume without loss of generality that the graph nodes, input, and output variables are properly ordered (notations defined for general node ordering will be introduced in Section 2.2). We let V X ={1,...,p},V Z ={p+1,...,n−p}, andV Y ={n−p+1,...,n} be disjoint subsets of vertices. We define G := B > n,p · L· B n,p , (2.5) and denote the corresponding subblock components of L and G as L = L XX L XZ L XY L ZX L ZZ L ZY L YX L YZ L YY , G = G XX G XZ G XY G ZX G ZZ G ZY G YX G YZ G YY . (2.6) Similar to (2.3) and (2.4), the GFT matrix with a first butterfly stage of p Haar units has the form U = B n,p · diag(U + , U − ), U + ∈O n−p , U − ∈O p . (2.7) Then the following lemma describes the conditions for L to have a GFT withp left Haar units. 
Lemma 4. Let L be a graph Laplacian matrix, then there exists a GFT matrix U in the form of (2.7), i.e., the associated G in (2.5) is block-diagonal, if and only if L YY = JL XX J, L YX = JL XY J, L ZY = L ZX J. (2.8) 18 Note that whenZ is empty, i.e.,p =n/2, (2.8) implies that L has to be centrosymmetric, as in Lemma 3. Proof: If U is a GFT matrix satisfying (2.7), we denote the subblocks of U + as U + XX , U + XZ , U + ZX , and U + ZZ , and rewrite (2.7) as U = 1 √ 2 I p 0 J p 0 √ 2I n−2p 0 J p 0 −I p U + XX U + XZ 0 U + ZX U + ZZ 0 0 0 U − (2.9) Denote the matrix of eigenvalues of L as Λ = diag(Λ X , Λ Z , Λ Y ) with subblock sizes p, n− 2p, and p, respectively. Then, we can express each subblock of L by expanding L = UΛU > with (2.9), and we can trivially verify that (2.8) holds. To show the converse, we assume that (2.8) holds. Expanding the right hand side of (2.5), we can express the subblocks of G in terms of those of L. In particular, G XY = 1 2 (L XX J + JL YX J− L XY − JL YY ), G ZY = √ 2 2 (L ZX J− L ZY ). With these expressions, (2.8) implies that G XY , G ZY , and their transpose versions G YX , G YZ are all zero. This means that G is block-diagonal, and thus has an eigendecompo- sition as G = diag(V 1 , V 2 )· diag(λ 1 ,...,λ n )· diag(V 1 , V 2 ) > , where V 1 ∈O n−p and V 2 ∈O p . It follows that B n,p · diag(V 1 , V 2 ) is an eigenmatrix of L = B n,p · G· B > n,p as in (2.7). Under the conditions of (2.8), G reduces to G = L XX + L XY J √ 2L XZ 0 √ 2L > XZ L ZZ 0 0 0 L YY − JL XY , (2.10) and U + and U − are respectively the eigenmatrices of L + := L XX + L XY J √ 2L XZ √ 2L > XZ L ZZ ! , L − := L YY − JL XY . (2.11) A diagram with n = 8, p = 3 is shown in Figure 2.2(b) as an example. 19 (a)G1 (b)G2 Figure 2.4: Example of symmetric graphs. (a) A graph with a bisymmetric Laplacian matrix. (b) A graph with a Laplacian satisfying (2.8). Note that the desired properties in L correspond to certain symmetry properties in the graph topology. IfV Z is empty, Lemma 4 implies that w i,j =w n+1−i,n+1−j , ∀i∈V, j∈V. (2.12) In this case, when we plot the nodes in order on a 1D line, we can identify an axis in the middle, around which all edges and self-loops are symmetric. An example of a graph whose Laplacian is bisymmetric is shown in Figure 2.4(a). More generally, ifV Z is nonempty, then the first two equations in (2.8) indicate that the sub-matrix of L associated toV X andV Y is bisymmetric. This means thatV X and V Y contain vertices that are symmetric to each other. The third equation in (2.8) implies that when there is an edge connecting k∈V Z and i∈V X , there must be an edge with the same weight connecting k and n + 1−i∈V Y as well. An example of this type of graph is shown in Figure 2.4(b), whereV X ={1},V Z ={2, 3}, andV Y ={4}. Similar to Figure 2.4(a), we can identify a symmetry around the middle, though nodes inV Z are not paired with symmetric counterparts. Based on the observations above, we see that left Haar units are available when the graph has symmetry properties related to Lemma 4. However, Lemma 4 assumes that the nodes had been ordered properly, so that the Laplacian has the required bisymmetric structure. In general, the node labels of a graph will not be such that this condition is automatically met, even if the graph is symmetric. For example, if a different node labeling is applied to the graph of Figure 2.4(a), its corresponding Laplacian may not 20 be bisymmetric anymore. 
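To make Lemma 4 concrete, here is a small numerical sketch (Python with NumPy; the 4-node graph and its edge weights are hypothetical, chosen so that the conditions (2.8) hold, in the spirit of Figure 2.4(b)). It builds $B_{n,p}$ as in (2.9), verifies that $G = B_{n,p}^\top L B_{n,p}$ is block-diagonal, and assembles a full GFT from the eigenvectors of $L^+$ and $L^-$:

import numpy as np

# Hypothetical 4-node example in the spirit of Figure 2.4(b), 0-based:
# V_X = {0}, V_Z = {1, 2}, V_Y = {3}; weights chosen so that the
# symmetry conditions (2.8) hold (each edge from a V_Z node to node 0
# has a twin of equal weight to node 3).
n, p = 4, 1
W = np.zeros((n, n))
W[0, 1] = W[1, 0] = 2.0;  W[3, 1] = W[1, 3] = 2.0
W[0, 2] = W[2, 0] = 0.5;  W[3, 2] = W[2, 3] = 0.5
W[1, 2] = W[2, 1] = 1.0                      # edge inside V_Z
L = np.diag(W.sum(axis=1)) - W

# B_{n,p} from (2.9): p Haar units pairing the first and last p nodes.
s2 = np.sqrt(2.0)
I_p, J_p = np.eye(p), np.fliplr(np.eye(p))
B = np.zeros((n, n))
B[:p, :p] = I_p / s2;   B[:p, n-p:] = J_p / s2
B[p:n-p, p:n-p] = np.eye(n - 2*p)
B[n-p:, :p] = J_p / s2; B[n-p:, n-p:] = -I_p / s2

G = B.T @ L @ B                              # (2.5)
assert np.allclose(G[:n-p, n-p:], 0)         # block-diagonal, as Lemma 4 predicts
L_plus, L_minus = G[:n-p, :n-p], G[n-p:, n-p:]   # the blocks of (2.11)

# Full GFT as in (2.7): eigenvectors of the two small blocks, times B.
U_plus = np.linalg.eigh(L_plus)[1]
U_minus = np.linalg.eigh(L_minus)[1]
U = B @ np.block([[U_plus, np.zeros((n-p, p))],
                  [np.zeros((p, n-p)), U_minus]])
D = U.T @ L @ U                              # should be diagonal
assert np.allclose(D - np.diag(np.diag(D)), 0)

The 1/√2 scalings in B are what the Haar units implement; the two eigendecompositions act on blocks of sizes n - p and p, which is the source of the complexity reduction.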
In the next section we study methods to identify graph sym- metries directly using node pairing functions, which will allow us to design fast GFT algorithms, regardless of how the nodes are initially labeled. 2.2 Fast GFTs Based on Graph Symmetry In this section, we will characterize how Lemma 4 relates to the graph topology. In par- ticular, we define the symmetry properties observed in Figure 2.4 based on an involution (node-pairing function) in Section 2.2.1. Given an observed graph symmetry charac- terized by an involution, in Section 2.2.2 we define the node setsV X ,V Y , andV Z . In Section 2.2.3, we propose a graph decomposition approach for searching fast GFTs in stages. 2.2.1 Graph Symmetry Based on Node Pairing The symmetries of Figure 2.4 can be described in terms of complete (Figure 2.4(a)) and incomplete (Figure 2.4(b)) node pairings. Such pairings can be defined by bijective mappings that are their own inverses, namely, involutions: Definition 1 (Involution [96]). A permutation on a finite setV is called an involution if it is its own inverse, i.e., φ(φ(i)) =i for all i∈V. We will use them to identify graph symmetries. Definition 2 (φ-symmetric graph). Letφ be an involution on the vertex setV of a graph G, thenG is φ-symmetric if w i,j =w φ(i),φ(j) for all i∈V, j∈V. Note that in Definition 2 the required property has to hold also for i = j. That is, s i =w i,i =w φ(i),φ(i) =s φ(i) , meaning that the self-loops on nodesi andφ(i) are required to have the same weight. Also note that, among permutations, only involutions are valid to define the symmetry described in Definition 2, since the pairing functions that lead to the conditions in (2.8) can only be induced by involutions. 1 LetV ={1,...,n} be the vertex set, and let us denote an involution φ as φ = (φ(1),φ(2),...,φ(n)). For example, the involutions corresponding to the symmetries of 1 Note that graph symmetry can be defined differently in different contexts. In algebraic graph theory, graph symmetry is defined based on transitivity of vertices and edges [38]. In [28], a graph is called symmetric if there exists a non-identical permutation φ (not necessarily an involution) on the graph nodes that leaves the graph unaltered. These definitions are beyond the scope of this work. When we refer to graph symmetry, we always assume an involution φ is specified such that Definition 2 holds. 21 graphs in Figures 2.4(a) and (b) are φ a = (4, 3, 2, 1) and φ b = (4, 2, 3, 1), respectively. We also denote the number of available Haar units for a given φ as p φ := 1 2 ×|{i∈V : i6=φ(i)}|. (2.13) 2.2.2 Node Partitioning for Haar Units Once we observe a graph symmetry and characterize it by an involutionφ, we can identify the nodes on the axis of symmetry,V Z :={i∈V : φ(i) = i}, then partition the other nodes into two setsV X andV Y such that nodes in those sets belong to different sides of the symmetry axis. In this way, we can define an orthogonal matrix B φ as a permuted version of B n,p φ based on φ,V X ,V Y , andV Z in the following way: (B φ ) i,j = 1/ √ 2, i =j∈V X −1/ √ 2, i =j∈V Y 1, i =j∈V Z 1/ √ 2, i∈V X , j =φ(i)∈V Y 1/ √ 2, i∈V Y , j =φ(i)∈V X 0, otherwise (2.14) This means that L φ := B > φ LB φ is a permuted version of (2.10), whose block diagonal structure gives the following theorem. Theorem 1 (Block-diagonalization of Laplacian based on graph symmetry). Let the graphG with Laplacian L be φ-symmetric. Then, (L φ ) i,j = (L φ ) j,i = 0 if i∈V X ∪V Z and j∈V Y . 
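As an illustration of Definition 2 and Theorem 1, the following sketch (Python/NumPy; the weight matrix and the involution are made up for illustration) checks φ-symmetry under an arbitrary node labeling, builds $B_\phi$ as in (2.14), and verifies the zero pattern of $L_\phi$:

import numpy as np

def is_phi_symmetric(W, phi):
    """Definition 2: w_{i,j} = w_{phi(i),phi(j)} for all i, j (this
    includes i == j, i.e., paired self-loop weights must match)."""
    n = W.shape[0]
    return all(np.isclose(W[i, j], W[phi[i], phi[j]])
               for i in range(n) for j in range(n))

def haar_matrix(n, phi):
    """B_phi as in (2.14), valid for an arbitrary node labeling.
    V_Z = fixed points of phi; each 2-cycle {i, phi(i)} puts its
    smaller-labeled node in V_X and the other in V_Y."""
    VX = [i for i in range(n) if phi[i] > i]
    VY = [phi[i] for i in VX]
    VZ = [i for i in range(n) if phi[i] == i]
    B = np.zeros((n, n))
    for m in VZ:
        B[m, m] = 1.0
    for i, j in zip(VX, VY):                  # j = phi(i)
        B[i, i] = B[j, i] = 1 / np.sqrt(2)    # "sum" column i
        B[i, j] = 1 / np.sqrt(2)              # "difference" column j
        B[j, j] = -1 / np.sqrt(2)
    return B, VX, VZ, VY

# Hypothetical weighted 4-node cycle that is symmetric under the
# involution phi = (3, 2, 1, 0) (0-based node labels).
W = np.array([[0., 1., 0., 2.],
              [1., 0., 3., 0.],
              [0., 3., 0., 1.],
              [2., 0., 1., 0.]])
phi = [3, 2, 1, 0]
assert is_phi_symmetric(W, phi)
L = np.diag(W.sum(axis=1)) - W
B, VX, VZ, VY = haar_matrix(4, phi)
L_phi = B.T @ L @ B
assert np.allclose(L_phi[np.ix_(VX + VZ, VY)], 0)   # Theorem 1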
While Theorem 1 is derived based on the unnormalized Laplacian L, it holds for normalized Laplacian as well. 2.2.3 Main Approach–Decomposition of Symmetric Graphs The block-diagonalization of (2.5) maps L to G via B n,p , where G in (2.10) can be regarded as the Laplacian of a graph with two connected components, with possibly negative edge weights and respective Laplacians L + and L − defined in (2.11). Thus,G with Laplacian L is decomposed into two separate graphsG + andG − with Laplacians L + and L − , vertex setsV + :=V X ∪V Z andV − :=V Y , weight matrices W + and W − , respectively. In this way, the GFT of L can be implemented using Haar units in B n,p , followed by two sub-GFTs characterized by Laplacians L + and L − . Explicitly considering the graphs resulting from this decomposition is useful because the symmetry properties 22 in L + and L − can be further explored to determine whether additional reductions in complexity are possible. Moreover, considering the transforms after the Haar units as GFTs can potentially help us provide better interpretations of the overall GFT. By definition of graph Laplacian, we can express the self-loop weights s i and edge weights w i,j in terms of entries of the Laplacian matrix L = (l i,j ) i,j , and vice versa: s i = X j∈V l i,j , w i,j =−l i,j for i6=j, (2.15) l ii =s i + X j∈V j6=i w i,j , l i,j =−w i,j for i6=j. (2.16) Together with (2.11), we can express the self-loop/edge weights ofG + andG − in terms of those ofG, as described in the following theorem. Theorem 2. IfG isφ-symmetric with node partitionsV X ,V Y , andV Z , then the weights ofG + (with vertex setV + =V X ∪V Z ) andG − (with vertex setV − =V Y ) are given by w + i,j = w i,j +w i,φ(j) , if i∈V X , j∈V X √ 2w i,j , if i∈V X , j∈V Z or i∈V Z , j∈V X w i,j , if i∈V Z , j∈V Z , s + i = ( s i − ( √ 2− 1) P j∈V Z w i,j , if i∈V X s i + (2− √ 2) P j∈V X w i,j , if i∈V Z , w − i,j =w i,j −w i,φ(j) , ∀i,j∈V Y , i6=j s − i =s i + 2 X j∈V X w i,j + X j∈V Z w i,j , ∀i∈V Y . The proof of Theorem 2 refers to Appendix A.1. We use the toy examples of Fig- ures 2.5 and 2.6 (withV Z =∅ andV Z 6=∅, respectively) to illustrate the graph decompo- sition. Note thatG + andG − may have negative weights even ifG does not. For any signal x, we denote the “sum” (low-pass) and “difference” (high-pass) outputs of Haar units as x + and x − : ((x + ) > , (x − ) > ) > = B > φ x. For example, in Figure 2.5, x + = (y 1 ,y 2 ) > and x − = (y 3 ,y 4 ) > . In Figure 2.6, x + = (z 1 ,z 2 ,z 3 ) > and x − = (z 4 ). The graph construction of Theorem 2 creates two disconnected sub-graphs by remov- ing all edges betweenV y andV x ∪V z and preserving all other edges, but changing some of the weights and adding self-loops. Three types of cases lead to one or two edges being removed: 23 (a) (b) Figure 2.5: (a) Symmetric graph decomposition for the graph in Figure 2.4(a). Red dia- monds and blue squares represent nodes inV X andV Y , respectively. (b) The associated fast GFT diagram forG 1 , where U + 1 and U − 1 are the GFTs ofG + 1 andG − 1 , respectively. (a) (b) Figure 2.6: (a) Symmetric graph decomposition for the graph in Figure 2.4(b). Red dia- monds, green triangles, and blue squares represent nodes inV X ,V Z , andV Y , respectively. (b) The associated fast GFT diagram forG 2 , where U + 2 is the GFT ofG + 2 . 1. Edges connecting two symmetric nodes i∈V X and φ(i)∈V Y . The edge with weightb in Figure 2.5(a) is an example of this case. 
These edges are removed and lead to self-loops with twice the original weight inG − (2b in this case). 2. Two symmetric edges: each connecting a node inV X to a node inV Y . The two edges with weight c in Figure 2.5(a) are an example. These edges are removed, but lead to changes in two edge weights, with the weight of the edge in G + increasing and that of the edge inG − decreasing. Two self-loops are also added to the corresponding nodes inG − . 24 Figure 2.7: An example of even and odd symmetric components on the graph in Fig- ure 2.6. Signals x + and x − are outputs of the Haar units. Signals x even and x odd are associated to x + and x − by (2.17). 3. Two symmetric edges with a common node inV Z . Edges with weight d in Figure 2.6(a) belong to this case. This case results in a single edge being kept, with a modified edge weight and two self-loops inG + , and a self-loop inG − . Note that, the signals x + and x − correspond toG + andG − , and can be regarded as even and odd symmetric components of the original graph signal x. An example associated to Figure 2.6 is shown in Figure 2.7. We can see that a graph signal can be decomposed into two components x even and x odd , which correspond to x + and x − , respectively, by x even (i) = x + (i)/ √ 2, i∈V X x + (φ(i))/ √ 2, i∈V Y x + (i), i∈V Z , x odd (i) = x − (φ(i))/ √ 2, i∈V X −x − (i)/ √ 2, i∈V Y 0, i∈V Z (2.17) In particular, x even and x odd have even and odd symmetries based on the node pairing, i.e., x even (i) = x even (φ(i)) and x odd (i) =−x odd (φ(i)) for all i∈V. This can be consid- ered as a generalization of even and odd symmetric components decomposition for finite length time series. Components x even and x odd of the graph signal x can be regarded as intermediate results of the GFT coefficients. 25 The decomposition described in Theorem 2 enables us to search for further stages Haar units in the sub-GFTs U + and U − by inspecting their associated graphsG + and G − . Once a symmetry based on an involution is found inG + orG − , we can apply the decomposition again, and repeat until symmetry property cannot be found anymore. Some examples will be provided in the next section. 2.3 Examples and Applications In practice, graphs with distinct weights on different edges or graphs learned from data without any topology constraints are not likely to have the desired bipartition and sym- metry properties. Searching for an involution that induces symmetry in a given graph may involve a combinatorial problem because the number of involutions grows faster than polynomial withn [96]. However, bipartite and symmetric structures arise in graphs con- sidered in certain fields. Examples of bipartite graphs include tree-structured graphs, whose GFTs are useful for designing wavelet transforms on graph [112]. Involution-based symmetries can be found in graphs with regular or partially regular topologies (e.g., line, cycle, and grid graphs), graphs that are symmetric by construction (e.g., human skeletal graphs), and uniformly weighted graphs. In the following, we show several classes of graphs with these desired properties. 2.3.1 Bipartite Graphs As stated in Lemma 1, a k-regular bipartite graph has a GFT with a right butterfly stage. Let the sizes of the parts in the bipartite graph be|S 1 | =p,|S 2 | =n−p, then sub-GFTs E and F have sizes p and n−p, and the total number of multiplication operations will be p 2 + (n−p) 2 . 
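As a quick numerical check of Lemma 1, the sketch below (Python/NumPy; the example graph is the 3-regular bipartite cube graph, a hypothetical choice, with parts $S_1 = \{0,1,2,3\}$ and $S_2 = \{4,5,6,7\}$, 0-based) confirms that flipping the sign of the $S_2$ half of any Laplacian eigenvector gives an eigenvector for the mirrored eigenvalue $2k - \lambda$:

import numpy as np

# Cube graph Q3: 3-regular and bipartite (equivalently, K_{4,4} minus
# a perfect matching).  S1 = {0,1,2,3}, S2 = {4,5,6,7}.
edges = [(0,4),(0,5),(0,6),(1,4),(1,5),(1,7),
         (2,4),(2,6),(2,7),(3,5),(3,6),(3,7)]
n, p, k = 8, 4, 3
W = np.zeros((n, n))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

lam, U = np.linalg.eigh(L)
for m in range(n):
    u = U[:, m]
    u_hat = np.concatenate([u[:p], -u[p:]])   # flip the S2 half
    # Lemma 1: u_hat is an eigenvector of L with eigenvalue 2k - lam[m]
    assert np.allclose(L @ u_hat, (2*k - lam[m]) * u_hat)

With the eigenvectors paired in this way, a right butterfly stage $B_{n,p}$ splits the transform into the sub-GFTs E and F of sizes p and n - p discussed above.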
By Lemma 2, the same result can be derived for a GFT derived from the symmetric normalized Laplacian of any bipartite graph. 2.3.2 Graphs with 2-Sparse Eigenvectors Conditions for 2-sparse graph eigenvectors to exist have been studied in [124,125]: Lemma 5 ([124,125]). A Laplacian has an eigenvector u with only two nonzero elements u(i) = 1/ √ 2, u(j) =−1/ √ 2 if and only if ∀v∈V\{i,j}, w v,i =w v,j . (2.18) In fact, the condition (2.18) is equivalent to havingGφ i,j -symmetric, withφ i,j (i) =j, φ i,j (j) = i, and φ i,j (k) = k for k6= i,j. In this case, each ofV X ={i} andV Y ={j = 26 φ(i)} has only one node, and U − reduces to a one by one identity matrix. Examples provided in [124] that satisfies Lemma 5 include uniformly weighted graphs with several types of topology: 1) star graph, 2) complete graph, 3) complete bipartite graph, and 4) cycle graph. We also note that, a graph with a clique (a complete subgraph), where at least two nodes in the clique are not connected to any other nodes outside the clique, also satisfies Lemma 5. 2.3.3 Symmetric Line Graphs A Laplacian matrix associated to a line graph can be viewed as a precision matrix (inverse covariance matrix) of a first-order attractive GMRF, which can be used for modeling image and video pixels [66,143]. One special case is the line graph with uniform weights, whose GFT is the well-known DCT. If a line graphG l is symmetric around the middle, then it is φ = (n,n− 1,..., 1)- symmetricG l and its GFT has a left butterfly stage (whether n is even of odd). In Section 3.2, we will study the design of such a fast GFT on symmetric line graphs, and present coding results on inter-predicted residual blocks. 2.3.4 Steerable DFTs The Laplacian of an n-node cycle graphG c with unit weights is a circulant matrix, i.e., each of its row is circularly shifted one element to the right relative to the previous row. Due to this circulant property, the Laplacian can be diagonalized by the DFT matrix, meaning that DFT is one of the GFTs ofG c (the GFTs ofG c are not unique since some of the Laplacian eigenvalues have multiplicities greater than one). The family of all GFTs ofG c is called the set of steerable DFTs (SDFTs) [31]. Fast SDFT algorithms based on graph symmetry For anyn,G isφ-symmetric, withφ = (n,n− 1,..., 3, 2, 1), which enables us to explore fast SDFT algorithms forG c other than the FFT algorithm [15]. In fact, the trans- form shown in Figure 2.3(b) can be obtained using the proposed graph decomposition. Another example withn = 12 is shown in Figure 2.8, where two stages of Haar units are available, and some of the sub-GFTs after the first two stages can further be simplified. We also note that, for any SDFT with a length n that is a multiple of 4, the GFTs of G ++ c andG −− c are DCT-II and DST-IV, respectively, which can also be implemented using fast DCT and ADST algorithms [29,46,59]. 27 (a) (b) Figure 2.8: The 12-node cycle graph: (a) Graph decomposition. (b) The associated fast GFT diagram. Red diamonds, green triangles, and blue squares represent those nodes inV X ,V Z , andV Y , respectively, for the next stage of decomposition. Unlabeled edges and self-loops have weights 1, and P c,i are permutation operations. The two shaded sub-GFTs are (U ++ c ) > /2 (top) and (U −− c ) > /2 (bottom), respectively. 28 (a) (b) Figure 2.9: (a) The axes/point of symmetry for each symmetry type, with a 6×6 grid as an example. Node indices are represented based on the image coordinate system. (b) Relationships among types of symmetries. 
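The weight updates of Theorem 2 translate directly into code. The sketch below (Python/NumPy) decomposes a 6-node unit-weight cycle (the SDFT setting above, for a small n) using the involution $\phi(i) = (n - i) \bmod n$, the 0-based version of $(n, n-1, \dots, 2, 1)$, and checks that the spectra of $\mathcal{G}^+$ and $\mathcal{G}^-$ together recover the spectrum of $\mathcal{G}$:

import numpy as np
r2 = np.sqrt(2.0)

def decompose(W, s, phi):
    """Theorem 2: split a phi-symmetric graph (weights W, self-loops s)
    into G+ on V_X u V_Z and G- on V_Y.  Returns (W+, s+), (W-, s-)."""
    n = len(phi)
    X = [i for i in range(n) if phi[i] > i]
    Y = [phi[i] for i in X]
    Z = [i for i in range(n) if phi[i] == i]
    P = X + Z
    Wp, sp = np.zeros((len(P), len(P))), np.zeros(len(P))
    Wm, sm = np.zeros((len(Y), len(Y))), np.zeros(len(Y))
    for a, i in enumerate(P):
        for b, j in enumerate(P):
            if a == b:
                continue
            if i in X and j in X:
                Wp[a, b] = W[i, j] + W[i, phi[j]]
            elif i in Z and j in Z:
                Wp[a, b] = W[i, j]
            else:                             # one endpoint in V_Z
                Wp[a, b] = r2 * W[i, j]
        sp[a] = (s[i] - (r2 - 1) * sum(W[i, m] for m in Z) if i in X
                 else s[i] + (2 - r2) * sum(W[i, m] for m in X))
    for a, i in enumerate(Y):
        for b, j in enumerate(Y):
            if a != b:
                Wm[a, b] = W[i, j] - W[i, phi[j]]
        sm[a] = s[i] + 2 * sum(W[i, m] for m in X) + sum(W[i, m] for m in Z)
    return (Wp, sp), (Wm, sm)

def laplacian(W, s):
    return np.diag(s + W.sum(axis=1)) - W

# 6-node unit-weight cycle; phi(i) = (n - i) % n fixes nodes 0 and 3.
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
phi = [(n - i) % n for i in range(n)]
(Wp, sp), (Wm, sm) = decompose(W, np.zeros(n), phi)
ev = np.concatenate([np.linalg.eigvalsh(laplacian(Wp, sp)),
                     np.linalg.eigvalsh(laplacian(Wm, sm))])
assert np.allclose(np.sort(ev),
                   np.linalg.eigvalsh(laplacian(W, np.zeros(n))))

As expected from Section 2.2.3, $\mathcal{G}^+$ picks up √2-weighted edges and self-loops (some of them negative), while $\mathcal{G}^-$ gets a self-loop for each removed edge.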
Comparison between proposed fast SDFT algorithm and FFT

We also note that the SDFT can be implemented by applying the FFT algorithm, followed by proper rotations on graph frequencies with multiplicities 2 or larger. However, the proposed fast SDFT implementations have several advantages. First, unlike the conventional DFT, the derived GFT involves real operations only. Second, while the FFT algorithm cannot be easily applied when n is a prime number, our method gives at least one left butterfly stage for any n. Introducing this butterfly stage reduces the number of operations by half (for even n) or nearly half (for odd n). To the best of our knowledge, fast implementations of the SDFT other than the FFT algorithm have not been studied in the literature.

2.3.5 Symmetric Grid Graphs

The 2D DCT can be shown to be the GFT of an N×N uniformly weighted grid, and provides an optimal decorrelation of block data modeled by a 2D Gaussian Markov model [142]. (The term "grid" refers to a graph with $N^2$ nodes lying in a regular grid pattern within a square block, where each node can be connected to its 8-neighbors.) Despite the widespread use of the DCT, 2D non-separable transforms such as the KLT have been shown to achieve a significant compression gain over the DCT for some types of video blocks, such as intra residual blocks with diagonal prediction directions [2]. Such block-based non-separable transforms correspond to GFTs whose graphs are grids with $n = N^2$ nodes. Here, we call such a graph with $N^2$ nodes a grid, but do not impose any constraint on its topology, so as to have more flexibility in designing the transforms. One key limitation of non-separable transforms is that they typically have much higher computational complexity than separable ones. This issue can be partly addressed when the grid has symmetry properties that enable fast GFT algorithms. These grid symmetries can be defined based on different axes or points of symmetry, as shown in Figure 2.9(a), and are described as follows. We also show in Figure 2.9(b) the relationships among the different symmetry types.

Table 2.2: Types of symmetric N×N grids and their corresponding involutions. Indices k and l represent vertical and horizontal coordinates of grid nodes, as in Figure 2.9(a).

    Symmetry type             Involution
    Centrosymmetry            φ((k,l)) = (N+1-k, N+1-l)
    UD-symmetry               φ((k,l)) = (N+1-k, l)
    LR-symmetry               φ((k,l)) = (k, N+1-l)
    Diagonal symmetry         φ((k,l)) = (l, k)
    Anti-diagonal symmetry    φ((k,l)) = (N+1-l, N+1-k)

Definition 3 (Grid symmetries). An N by N grid graph is UD-symmetric if it is symmetric around the middle horizontal axis, and LR-symmetric if it is symmetric around the middle vertical axis. It is called centrosymmetric if it is symmetric around the center, diagonally symmetric if it is symmetric around the main diagonal, and anti-diagonally symmetric if it is symmetric around the anti-diagonal. When a grid is UD- and LR-symmetric, it is called UDLR-symmetric. If it is diagonally and anti-diagonally symmetric, we say it is bidiagonally symmetric. Finally, if the grid has all the symmetry properties above (symmetric around all four axes and the center), we say it is pentasymmetric.

Fast GFTs for these grids can be derived based on the involutions shown in Table 2.2, which follow directly from the grid symmetry conditions. Explicit forms of these fast GFTs can be found in [72]. GFTs with exact or partial symmetry properties can lead to a gain in energy compaction with respect to the DCT [37].
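The involutions of Table 2.2 are simple coordinate maps, so checking which symmetries a given grid satisfies is mechanical. A sketch (Python/NumPy; the raster-scan node indexing and the uniformly weighted 4-connected test grid are assumptions for illustration):

import numpy as np

N = 4
# Table 2.2 involutions on 1-based grid coordinates (k = row, l = column).
involutions = {
    "centro":        lambda k, l: (N + 1 - k, N + 1 - l),
    "UD":            lambda k, l: (N + 1 - k, l),
    "LR":            lambda k, l: (k, N + 1 - l),
    "diagonal":      lambda k, l: (l, k),
    "anti-diagonal": lambda k, l: (N + 1 - l, N + 1 - k),
}

def node(k, l):                 # raster-scan node index of pixel (k, l)
    return (k - 1) * N + (l - 1)

def grid_symmetries(W):
    """Return the Table 2.2 symmetry types that an N x N grid with
    weight matrix W satisfies (Definition 2 on pixel coordinates)."""
    found = []
    for name, phi in involutions.items():
        ok = all(np.isclose(W[node(k1, l1), node(k2, l2)],
                            W[node(*phi(k1, l1)), node(*phi(k2, l2))])
                 for k1 in range(1, N + 1) for l1 in range(1, N + 1)
                 for k2 in range(1, N + 1) for l2 in range(1, N + 1))
        if ok:
            found.append(name)
    return found

# Uniformly weighted 4-connected grid, which is pentasymmetric.
W = np.zeros((N * N, N * N))
for k in range(1, N + 1):
    for l in range(1, N + 1):
        if l < N: W[node(k, l), node(k, l+1)] = W[node(k, l+1), node(k, l)] = 1
        if k < N: W[node(k, l), node(k+1, l)] = W[node(k+1, l), node(k, l)] = 1
print(grid_symmetries(W))       # expect all five symmetry types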
In Figure 2.10, we show an example with a 4×4 gridG b that is bi-diagonally symmetric (symmetric around both diagonals). We can first decomposeG b based on the diagonal symmetry intoG + b andG − b . Then, we observe that the symmetry around the anti- diagonal remains inG + b andG − b , so further decomposition can be applied. As a result, the overall GFT has two butterfly stages of Haar units, and can be implemented using 4 sub-GFTs with length 6, 4, 4, and 2, as in Figure 2.10. 30 (a) (b) Figure 2.10: The bi-diagonally symmetric grid. (a) Graph decomposition. (b) The associated fast GFT diagram. Red diamonds, green triangles, and blue squares repre- sent those nodes inV X ,V Z , andV Y , respectively, for the next stage of decomposition. Unlabeled edges have weights 1, and P b,i are permutation operations. 31 (a) (b) Figure 2.11: The 4×4 z-shaped grid: (a) Graph decomposition. (b) The associated fast GFT diagram. Red diamonds and blue squares represent those nodes inV X andV Y , respectively, for the next stage of decomposition. Unlabeled edges have weights 1, and P z,1 is a permutation operation. 32 As another example, grid graphs with symmetry properties are also considered in the coding framework proposed in [100], where image blocks are classified intoK classes, each with an associated GFT to be applied. These GFTs are derived from graph templates including 1) 4-connected grids with horizontal and vertical edges, 2) 4-connected grids with horizontal and anti-diagonal edges as shown in the top-left of Figure 2.11(a), which we refer to as z-shaped grid, and 3) rotated and flipped versions of the z-shaped grid. Those templates are weighted graphs with the constraint that all edges with the same orientation have a common weight. Without loss of generality, all GFTs from graph templates within types 2) or 3) can be characterized by the GFT of a z-shaped grid with horizontal weights 1 and anti-diagonal weightsw. We denote this graph asG z and derive its fast GFT in Figure 2.11 based on the centrosymmetry of the grid, characterized by the involution φ(i) = N + 1−i. Note that, if we flip the nodes v 9 to v 16 up to down, thenG z becomes a left-right symmetric grid, based on which we can deriveG + z andG − z as in Figure 2.11(a). The derived fast GFT diagram in Figure 2.11(b) can thus provide a computational speedup for the coding framework in [100]. 2.3.6 Skeletal Graphs In human action analysis, the human body can be represented by a hierarchy of joints that are connected with bones. Many motion capture datasets, such as the Florence 3D dataset [110], use 3D coordinates of human joints to represent human actions. We can consider the human skeleton as a graph, and the motion vectors between consecutive frames on each nodes as a graph signals. Recent work has demonstrated that the GFT basis has localization properties corresponding to different human parts [54]. For exam- ple, the second GFT basis function has positive entries on joints in the upper body, and negative entries on those in the lower body. Thus, the resulting GFT coefficients can provide a discriminating power between different human actions. Typical skeletal graphs are symmetric by construction, so a fast GFT can be obtained, as shown in Figure 2.12, where a 15-node skeleton is considered. Note that the butterfly stage is also available for skeletal graphs with non-uniform weights or different topologies, as long as the desired symmetry properties hold. 
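For a sense of the savings from such multi-stage designs, here is a back-of-the-envelope multiplication count for the bi-diagonally symmetric grid of Figure 2.10 (a sketch: it treats the Haar units as additions, with the 1/√2 scalings folded into the sub-transforms, so it does not exactly reproduce the counts reported later in Table 2.3, where a few explicit scalings remain):

# 4x4 bi-diagonally symmetric grid (Figure 2.10): two butterfly stages
# leave four dense sub-GFTs of lengths 6, 4, 4, and 2.
direct = 16**2                            # one 16x16 matrix-vector product
staged = sum(b**2 for b in (6, 4, 4, 2))  # 72 multiplications
print(direct, staged)                     # 256 vs 72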
2.3.7 Search of Symmetries in General Graphs In the previous examples, symmetry properties of graphs can be easily identified by inspection. However, in general, and particularly for denser graphs, desired symmetry properties may not be straightforward to identify, or may not even exist. To design 33 (a) (b) Figure 2.12: The 15-node skeletal graph: (a) Graph decomposition. (b) The associated fast GFT diagram. Red diamonds, green triangles, and blue squares represent those nodes inV X ,V Z , andV Y , respectively, for the next stage of decomposition. U −,1 s and U −,2 s are the GFTs corresponding to two connected components ofG − s , respectively. Unlabeled edges have weights 1, and P s,1 is a permutation operation. fast GFTs for graphs beyond the previous examples, an algorithm for searching a valid involution would thus be useful. The number of involutions on n elements is [58, Section 5.1.4] T (n) = bn/2c X k=0 n! 2 k (n− 2k)!k! ∼ n e n/2 e √ n (4e) 1/4 , (2.19) 34 which asymptotically grows faster than a polynomial inn. This means that an exhaustive search of valid involutions among T (n) possible candidates is a combinatorial problem. Here, we provide two methods to reduce the complexity of this search. More detailed illustrations and implementations of these methods can be found in Appendix B. • Pruning based on the degree list Note that ifG is φ-symmetric, then the degrees of nodes i and φ(i) must be equal for every i. This necessary condition for φ-symmetry allows us to prune the involution search: we can compute the list of degrees first, then prune those involutions ϕ where for some i the degrees d i and d ϕ(i) are different. For graphs with many distinct weight values, we tend to have many distinct node degrees, and thus the number of involutions needed to be searched can be significantly reduced. • Searching of identical tree branches Trees (i.e., graphs with no cycles) are connected graphs that have the smallest number of edges. This sparsity property implies that symmetry on trees can be characterized by pairs of identical subtrees (i.e., branches) whose roots are common or adjacent. For example, in Figure 2.12, the two arms in the skeletal graph are identical branches that share a common root, and so are the two legs. Based on an algorithm proposed in [30], we provide in Appendix B an algorithm withO(n logn) complexity that, for any given treeG, finds all involutions φ such thatG is φ-symmetric. 2.4 Experimental Results In addition to theoretical computational complexity analysis including the number of operations, we also conduct experiments to measure empirical computation complexities of the fast GFTs. We have implemented several fast GFTs in C, in order to simulate an environment closer to hardware. 2.4.1 Comparison with Matrix GFT In the first experiment, we include the GFTs of several different graph topologies: the cycle graph with unit weights, the bi-diagonally symmetric 6-connected grid (as in Fig- ure 2.10, with a = 0.5), the z-shaped grid with w = 2, and the skeletal graph. For each graph we implement two fast GFTs with different sizes, and compare the runtime of the matrix GFT implementation and the fast GFT with butterfly stages. We include those GFTs in Figures 2.8 to 2.12, together with larger graphs with the same topology types: the 80-node cycle graph, the 8×8 bi-diagonally symmetric 4-connected grid, the 8×8 z-shaped grid, and the 25-node skeletal graph used in [111]. Detailed design of their fast 35 Table 2.3: Runtime performance of proposed fast GFTs. 
The baseline for the runtime reduction rates is the matrix GFT implementation.

    Topology           n    Number of operations            Runtime
                            Matrix (±/*)    Fast (±/*)      reduction
    Cycle              12   132/144         44/30           52.7%
                       80   6320/6400       1224/1078       79.7%
    6-connected grid   16   240/256         80/80           53.7%
                       64   4032/4096       1104/1072       68.5%
    Z-shaped grid      16   240/256         128/112         41.5%
                       64   4032/4096       2048/2048       45.0%
    Skeleton           15   210/225         96/102          45.5%
                       25   600/625         272/282         47.5%

GFTs can be extended from the examples in Figures 2.8 to 2.12. For each GFT, we generate 20000 graph signals of the appropriate length, whose entries are i.i.d. uniform random variables in the range [0, 1]. Then, we compute the percentage of runtime reduction for the symmetry-based fast GFT compared to the GFT realized by a single n×n matrix multiplication. In Table 2.3, we show, for each GFT, the number of additions (including subtractions), the number of multiplications, and the empirical computation time reduction rate compared to the matrix GFT in the C implementation. We see that the fast GFT on the skeletal graph in Figure 2.12, with one butterfly stage, leads to a 45.5% speed improvement, and that on the z-shaped grid in Figure 2.11 gives around 41.5% runtime saving. Fast GFTs on cycle graphs and 6-connected grids, which have multiple butterfly stages, yield higher runtime reduction rates. From the results in Table 2.3, we can see that the butterfly stages obtained from our proposed method lead to a significant speedup, and can be useful if the transform must be performed many times, or in a low-level or hardware implementation.

2.4.2 Comparison with Approximate Fast GFTs

In the second experiment, we compare our proposed method with an existing fast GFT approach [62] on graphs with symmetry properties. We consider two graphs for this experiment: the 8×8 bi-diagonally symmetric grid with a = 0.5, and the 8×8 z-shaped grid with w = 2. Note that, when the desired symmetry property is available, existing methods can be incorporated into the symmetry-based fast GFT scheme to speed up the computation of sub-GFTs such as $U^+$ and $U^-$.

Figure 2.13: Runtime versus sign-normalized relative error δ for different GFT implementations on different graphs: (a) 8×8 bi-diagonally symmetric 6-connected grid $\mathcal{G}_b$; (b) 8×8 z-shaped grid $\mathcal{G}_z$. The numbers labeled alongside the markers indicate the associated numbers of Givens rotation layers.

Thus, we can compare the following four GFT implementations:
1. Matrix GFT: an n×n matrix multiplication.
2. Haar-matrix GFT: symmetry-based fast GFT using Haar units, as shown in Figures 2.10(b) and 2.11(b), where the sub-GFTs are implemented by full matrix multiplications.
3. PTJ-GFT [62]: fast GFT using layers of Givens rotations found by the parallel truncated Jacobi (PTJ) algorithm, a greedy algorithm that progressively approximates $\hat{U}^\top L \hat{U}$ by a diagonal matrix. The resulting GFT can be implemented using the schematic diagram in Figure 2.1.
4. Haar-PTJ-GFT: symmetry-based fast GFT with the sub-GFTs implemented using PTJ-GFT.
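Implementation 2 (Haar-matrix GFT) is just one sparse butterfly followed by two dense block transforms. A sketch (Python/NumPy; `pairs`, `fixed`, and the sub-GFT matrices `U_plus`, `U_minus` are assumed to come from a decomposition as in Section 2.2.3):

import numpy as np

def haar_matrix_gft(x, pairs, fixed, U_plus, U_minus):
    """One butterfly stage of Haar units, then two dense sub-GFTs.
    pairs: list of 2-cycles (i, phi(i)); fixed: nodes in V_Z.
    Computes y = diag(U_plus, U_minus)^T  B_phi^T  x."""
    r2 = np.sqrt(2.0)
    xs = np.array([(x[i] + x[j]) / r2 for i, j in pairs]
                  + [x[m] for m in fixed])               # "sum" half  -> G+
    xd = np.array([(x[i] - x[j]) / r2 for i, j in pairs])  # "diff" half -> G-
    return np.concatenate([U_plus.T @ xs, U_minus.T @ xd])

Relative to the n² multiplications of implementation 1, the cost drops to (n-p)² + p² plus the Haar-unit scalings.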
For a given GFT implementation, with GFT matrix $\hat{U}$ that approximates the true GFT matrix U, we define two error metrics as follows.

1. Sign-normalized relative error (RE): we consider the relative error between two n×n orthogonal matrices,
\[ \mathrm{RE}(\hat{U}, U) = \frac{\|U - \hat{U}\|_F}{\|U\|_F} = \frac{\|\hat{U}^\top U - I\|_F}{\sqrt{n}}. \qquad (2.20) \]
Note that if $\hat{U} = -U$, the RE will be large even though the two matrices share a common eigen-structure. To avoid this sign ambiguity, we modify (2.20) by taking elementwise absolute values of $\hat{U}^\top U$:
\[ \delta(\hat{U}, U) := \frac{1}{\sqrt{n}} \left\| |\hat{U}^\top U| - I \right\|_F. \]

2. Empirical average error: let $\mathcal{X} = \{x_1, \dots, x_M\}$ be the set of input signals; we define
\[ \epsilon(\hat{U}, U, \mathcal{X}) := \frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{n} \left( |u_j^\top x_i| - |\hat{u}_j^\top x_i| \right)^2, \]
where $u_j^\top x_i$ and $\hat{u}_j^\top x_i$ are the j-th true and approximate GFT coefficients of $x_i$, and the absolute values are used to avoid the sign ambiguity.

For both graphs considered in this experiment, the eigenvalues of the Laplacians are all distinct, so there is no rotation ambiguity in the GFT basis functions. We apply the method in [62] to obtain the parameters (angles and node pairings for the Givens rotations) for PTJ-GFT and Haar-PTJ-GFT, then implement the resulting fast algorithms in C, with different numbers of layers $J \in \{0, 5, 10, \dots\}$. We use M = 20000 random samples, as in Section 2.4.1, and obtain the error metrics, δ and ε, for each GFT implementation.

Figure 2.14: Runtime versus empirical average error for different GFT implementations on different graphs: (a) 8×8 bi-diagonally symmetric 6-connected grid $\mathcal{G}_b$; (b) 8×8 z-shaped grid $\mathcal{G}_z$. The numbers labeled alongside the markers indicate the associated numbers of Givens rotation layers.

The runtime versus sign-normalized RE, and versus empirical average error, are shown in Figures 2.13 and 2.14, respectively. We note, first, that the RE drops more steadily than the empirical error as the number of layers increases. This is related to the ordering of the GFT basis functions. When more layers of Givens rotations are introduced, more GFT basis functions are ordered correctly (i.e., there is a smaller error in the final permutation operation $\Pi_{J+1}$ in Figure 2.1). Indeed, we observe that when the number of correctly ordered GFT coefficients increases, the decrease of the empirical error is usually more significant than that of the relative error. The second observation is that, for the two graphs with n = 64 nodes, the PTJ-based approach typically takes more than 20 layers to yield a sufficiently accurate GFT in terms of both error metrics. However, when more than 20 layers are used, the computational complexity becomes comparable to or higher than that of the Haar-matrix GFT, which provides exact GFT coefficients. Finally, we see that in both Figures 2.13 and 2.14, the error of Haar-PTJ-GFT drops faster than that of PTJ-GFT. This means that by exploiting the symmetry property, we obtain a significantly higher convergence rate for the PTJ algorithm. This is a reasonable consequence, as our divide-and-conquer method reduces the dimension of the problem handed to the PTJ algorithm.
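For completeness, the two metrics used in this comparison are straightforward to compute from the transform matrices; a small sketch (Python/NumPy; signals are the columns of X):

import numpy as np

def sign_normalized_re(U_hat, U):
    """delta(U_hat, U): relative error that ignores per-column signs."""
    n = U.shape[0]
    return np.linalg.norm(np.abs(U_hat.T @ U) - np.eye(n), 'fro') / np.sqrt(n)

def empirical_avg_error(U_hat, U, X):
    """Average squared gap between the magnitudes of the true and the
    approximate GFT coefficients over the M signals in the columns of X."""
    return np.mean(np.sum((np.abs(U.T @ X) - np.abs(U_hat.T @ X))**2, axis=0))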
2.5 Conclusion We have explored the relationship between the graph topology and properties in the corresponding GFT for fast GFT algorithm based on butterfly stages. In particular, we focus on a component of the butterfly stage called Haar unit, and discuss the conditions for a stage of Haar units to be available in the GFT implementation. We have shown that a graph has a right butterfly stage with Haar units if it is k-regular bipartite (or if it is bipartite when we consider the symmetric normalized graph Laplacian). On the other hand, a left butterfly stage is available if the graph has symmetry properties. We have formally defined the relevant graph symmetry based on involution, i.e., pairing of nodes. Then, we have proposed an approach, where once a graph symmetry is identified, we can decompose a graphG into two smaller graphs,G + andG − , whose GFTs corresponds to the two parallel sub-transforms after a butterfly stage of Haar units. Again, fromG + andG − we can explore subsequent butterfly stages if any desired symmetry property holds in them. Thus, this method enables us to explore butterflies stage by stage. The desired symmetry properties typically arise in graphs that are nearly regular, symmetric by construction, or uniformly weighted. We have discussed several classes of those graphs: bipartite graphs, graphs with 2-sparse eigenvectors such as star and complete graphs, symmetric line and grid graphs, cycle graphs, and skeletal graphs. Relevant applications of those GFTs include video compression and human action anal- ysis. In particular, the fast GFT of a grid graph provides an efficient implementation of a non-separable transform for video blocks; a fast GFT on skeletal graph can also speed 39 up the feature extraction procedure for further action classification tasks. Finally, we implement the fast GFT algorithms in C and compute the runtime saving for several graphs. The experiment results show that our method provides a significant computa- tion time reduction compared to the GFT computed by matrix multiplication. It also outperforms existing fast approximate GFT approaches in terms of both complexity and accuracy for graphs with desired symmetry properties. 40 Chapter 3 Data-Driven Fast GFTs for Video Coding In Chapter 2, we have discussed conditions of graphs for fast GFT algorithms. Here, we apply some of the results to an important graph signal processing application: image and video coding, with a focus on transform coding. Transform coding is a key component in image and video coding applications [41]. Its basic idea is to decorrelate pixel data using a transform kernel, and hence reduce the redundancies among pixels. The DCT [117] is still by far the most widely used transform in image and video coding due to its fast implementation. However, it has been demonstrated that better compression efficiency can be achieved by data-driven transforms that are obtained based on statistical properties of the residual blocks [2,146, 147]. The Karhunen-Lo` eve Transform (KLT), which is known to provide the optimal decorrelation, has been proposed for designing mode-dependent transforms of residual blocks [139]. However, its computational complexity is typically very high, which has limited its practical use. Recently, it has been shown that GFT is useful for decorrelating particular types of pixel data [24,33,50]. 
However, one of the major challenges of GFT in transform coding is that, similar to the KLT, those GFTs that optimally adapt to the data are usually computationally expensive. To address this complexity issue, we note that based on dis- cussions in Chapter 2, certain graph structural properties, such as symmetry and bipar- tition, would lead to a computation cost reduction in GFT implementations. Indeed, once the graph to be learned is restricted to satisfy those structural properties, a GFT with fast algorithm can be obtained. In this chapter, we study data-driven fast GFTs to enhance efficiency in transform coding techniques. To achieve this goal, we consider a graph learning problem to obtain the graph from sample data. This can be viewed as the problem of learning a data-driven GFT that adapts to statistical properties of target image/video blocks. Furthermore, we restrict the graphs to be symmetric or to have a butterfly stage so that a fast algorithm Work in this chapter has been published in [70,71]. 41 is available to compute the GFT. First, we introduce a general framework for designing data-driven fast GFT in Section 3.1. Under this framework, we optimize parameters of symmetric line graphs (as discussed in Section 2.3.3) to obtain efficient separable transforms that provide a coding gain in Section 3.2. Then, we propose a method for non-separable transform extensions with unknown butterfly stages in Section 3.3. Experimental results are shown in Section 3.4. 3.1 Learning Fast GFTs: A General Framework Graph learning is an important problem in graph signal processing [22]. This problem arises from the fact that, given real-world data such as pixel blocks, we may not know ahead of time what the best graph is to develop graph signal processing tools. In prac- tice, the graph can be learned from data based on the statistical properties [25], global smoothness [25], diffusion kernel estimation [92,108], causal dependency [84] and so on. In what follows, among different graph learning problem formulations, we adopt the one formulated in [25] and incorporate structural constraints associated to fast GFT imple- mentations. This allows us to obtain practically useful GFTs that have fast algorithms and adapt to particular set of pixel data of interest. We start by formulating a data-driven fast GFT design problem. We denote x i ’s training samples (e.g., these could be image or video blocks used for training), and μ their sample mean (which is usually assumed to be 0 when dealing with prediction residual blocks of a video). Then, we let S = 1 m P m i=1 (x i −μ)(x i −μ) > be the empirical covariance matrix. We start with the Gaussian maximum likelihood (ML) estimation problem [25] for the graph Laplacian precision matrix L: minimize L∈L(E) − log|L| † + trace(LS), (3.1) where the objective function is the negative log-likelihood function of an attractive GMRF whose precision matrix is L. In (3.1), L(E) is the set of graph Laplacians with edge setE. The pseudo-determinant (product of nonzero eigenvalues)|L| † is required here when a combinatorial graph Laplacian matrix, which is singular, is considered. Note that, for anyE, (3.1) is a convex problem, where general solution can be obtained using efficient iterative approaches [25,94]. For image and video coding, pixels are usually modeled as 1st order GMRFs, where each node is connected to its nearest neighbors in the associated graph. 
In particular, the following graph structuresE (as shown in Figure 3.1) may be considered: 42 (a) (b) (c) (d) Figure 3.1: Useful graph structures in modeling image and video pixels: (a) length-4 line graph (b) 4×4 4-connected grid, (c) 4×4 8-connected grid, and (d) 4×4 6-connected grid. • 1D line graph: it corresponds to a 1st order GMRF for 1D pixel data (a row or a column of a 2D pixel block). Its associated GFT could be used as a 1D row or column transform. • 2D 4-connected grid graph: it is associated to a 1st order GMRF for 2D pixel data. The resulting GFT corresponds to a 2D block transform. • 2D 8-connected and 6-connected grid graphs: diagonal edges can be added to the 4-connected grid to capture correlations between each pixel and its 8-neighbors. Those connections are particularly useful when the image/video content has diag- onal orientations. These graphs with length 4 or grid size 4×4 are shown in Figure 3.1. Our results from Chapter 2 for fast GFTs are mainly based on factorization of Lapla- cian L = BGB > into a matrix of left Haar units B and a block-diagonal matrix G (see 43 Section 2.1.2.). The fast GFT associated to L is U = BU G , where U G is the eigen- vector matrix of G. This speedup is led by the fact that B is sparse, and U G has the block-diagonal structure as G does. Here, we generalizing the factorization L = BGB > to L = HRH > with a sparse H (not necessarily B) and block-diagonal R, such that there is a fast GFT of L. In particular, we write L = HRH > , with constraints on H and G given by criteria C1 and C2: C1 H is orthogonal, and each of its columns is a constant multiple of a vector, on which the projection can be implemented efficiently. That is, H = G H D H , where D H is diagonal, G H has orthogonal columns, and G > H x has fast implementation. Here are two specific criteria that allow fast implementations: C1-A All entries in G H are 0, 1, or -1. C1-B G H is sparse. C2 R is block diagonal, and as sparse as possible, i.e., having many smaller blocks is preferable. Let the eigenmatrix of R be U R , then the GFT of signal x is U > R D > H G > H x. Note that multiplying by G > H is efficient, and U > R D > H has the same block diagonal structure as R does. Thus, the sparsity of R enables reduction in the computation of this GFT. Writing L = HRH > with C1 and C2 imposed, we have det(L) = det(R) and trace(LS) = trace(HRH > S) = trace(R(H > SH)). Thus, (3.1) can be rewritten as minimize H,R − log det(R) + trace(R(H > SH)) subject to R 0, HRH > i,j = 0 for (i,j) / ∈E, HRH > i,j ≤ 0 for (i,j)∈E H satisfies C1, R satisfies C2. (3.2) The problem (3.2) involves a product HRH > of unknown matrices, so it is a noncon- vex problem that may not have efficient solvers. However, when H that satisfies C1 is available, (3.2) reduces to a convex problem. This special case will be studied as follows. 3.1.1 Solving (3.2) with a Particular Symmetry Type As discussed in Section 2.2, when the graph is symmetric in a certain way, we obtain a fast GFT: by Lemma 3, L = B N GB > N with block-diagonal G if L is centrosymmetric, which is associated to symmetry properties of the graph. In this case, a fast GFT U can be obtained using the diagram in Figure 2.2(a), which can be expressed as U = B N · diag(U + , U − ), 44 where B N is the orthogonal matrix defined in (2.2) that corresponds to a butterfly stage of Haar units, and diag(U + , U − ) is a block-diagonal matrix associated to two parallel length N/2 transforms. 
From the above statement, if we restrict the graph to be symmetric, an H with the desired properties arises, and the sizes of the diagonal blocks of R are known. Then, we can show that (3.2) reduces to a convex problem. We denote $R = \mathrm{diag}(R_1, \dots, R_M)$, where element $R_i$ has size $k_i \times k_i$ and $\sum_{i=1}^M k_i = n$. We also write H as $H = (H_1, \dots, H_M)$, where $H_i$, with size $n \times k_i$, is a vertical slice of H containing $k_i$ columns. Thus, $HRH^\top = \sum_{l=1}^M H_l R_l H_l^\top$, and
\[ (HRH^\top)_{i,j} = e_i^\top \Big( \sum_{l=1}^{M} H_l R_l H_l^\top \Big) e_j = \sum_{l=1}^{M} h(i,l) R_l h(j,l)^\top, \]
where $h(i,l)$ denotes the $i$-th row of $H_l$. We define $\Theta$ as
\[ \Theta := H^\top S H = \begin{pmatrix} \Theta_{1,1} & \Theta_{1,2} & \dots & \Theta_{1,M} \\ \Theta_{2,1} & \Theta_{2,2} & \dots & \Theta_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ \Theta_{M,1} & \Theta_{M,2} & \dots & \Theta_{M,M} \end{pmatrix}, \]
where $\Theta_{i,j}$ has size $k_i \times k_j$. Then (3.2) can be reduced to a convex problem:
\[ \begin{array}{ll} \underset{R_i \in \mathbb{R}^{k_i \times k_i}}{\text{minimize}} & \displaystyle \sum_{l=1}^{M} \left[ -\log\det(R_l) + \mathrm{trace}(R_l \Theta_{l,l}) \right] \\[2mm] \text{subject to} & R_i \succ 0, \quad \displaystyle \sum_{l=1}^{M} h(i,l) R_l h(j,l)^\top \le 0, \; i \ne j. \end{array} \qquad (3.3) \]
The number of variables is reduced from $n^2 = (\sum_{l=1}^M k_l)^2$ to $\sum_{l=1}^M k_l^2$. Since H is typically sparse, the constraints in (3.3) simplify in many practical cases. Note that if $\Theta_{i,j}$ is zero for all $i \ne j$, the solution of (3.3) is the same as that of (3.1). Based on this framework for data-driven fast GFTs, we devote the next two sections to particular cases: symmetric line graphs (Section 3.2) and non-separable fast GFTs without any specified symmetry (Section 3.3).

3.2 Symmetric Line Graph Transforms (SLGT)

As introduced in Section 3.1, first-order GMRFs are associated with line graphs. Based on the connection between graph symmetry and Haar units introduced in Section 2.2, we note that, among all line graphs, those having a butterfly stage of Haar units are the symmetric line graphs. The GFT of a symmetric line graph is thus called a symmetric line graph transform (SLGT). For length 8, an arbitrary line graph can be represented as in Figure 3.2(a), and a symmetric line graph as in Figure 3.2(b).

Figure 3.2: (a) An arbitrary length-8 line graph. (b) A length-8 SLGT. (c) A special length-8 SLGT.

The flow diagram for the computation of a general 8×8 SLGT is shown in Figure 3.3, where $U^+$ and $U^-$ are 4×4 transforms characterized by the subgraphs $\mathcal{G}^+$ and $\mathcal{G}^-$, obtained from the decomposition described in Section 2.2.3. This fast GFT diagram can be extended to any even size N. (As pointed out in Section 2.3.3, a butterfly stage of Haar units also exists for a symmetric line graph with an odd number of nodes; we leave this case for future work, since transforms with odd lengths are rarely used in existing image and video coding systems.) For any even number N, both $U^+$ and $U^-$ require at most $N^2/4$ multiplications, meaning that the overall N×N SLGT requires at most $N^2/2$ multiplications. This is half the number of multiplications required by a general N×N transform such as the KLT.

Figure 3.3: (a) The diagram of a general 8×8 SLGT implementation. (b) Graphs that characterize the sub-transforms in the fast GFT diagram.

3.2.1 DTTs as SLGTs

Some well-known transforms can be viewed as SLGTs. For example, among the 16 types of DTTs, the DCT-II, DST-I, and DST-II, as defined in Section 1.2.4, correspond to the Laplacian matrices $L_{\text{DCT-II}}$ in (1.10), and
\[ L_{\text{DST-I}} = \begin{pmatrix} 2 & -1 & & \\ -1 & 2 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 2 \end{pmatrix}, \qquad L_{\text{DST-II}} = \begin{pmatrix} 3 & -1 & & \\ -1 & 2 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 3 \end{pmatrix}, \]
respectively.
Their associated graphs are uniform line graphs with two self loops with weights α = 0, α = 1, and α = 2, respectively, as shown in Figure 3.2(c). These three transforms are known to have fast algorithms for lengths satisfying certain conditions [9,134,140]. 3.2.2 Closed-Form Graph Learning Solution with Tree Topologies Here, we highlight a property that provides a closed form solution for learning a simple (i.e., loopless) line graph 2 . More generally, the close-form solution would be available if 2 The main result of this section has been published in our work [77]. 47 the graph to be learned is a tree (acyclic graph), where line graph is a particular case. Those results are potentially useful because 1) those results applies to trees that are more general than line graphs 2) trees are useful graphs that provide computational benefits. In particular, trees are the sparsest connected graphs, so graph based filters [105, 114] have the lowest complexity when implemented on trees. Furthermore, tree are bipartite, hence perfect reconstruction filterbanks on graphs can be easily implemented [87]. We consider the graph estimation problem with an ` 1 regularization term, which is more general than (3.1): minimize L∈L(E) − log|L| † + trace(LS) +αkLk 1,off , (3.4) where α is a Lagrange multiplier andkLk 1,off is the absolute sum of all off-diagonal elements of L. The role of the last regularization term in (3.4) is to promote graph sparsity, which also corresponds to an exponential prior of the GMRF parameters [25]. In fact, closed-form solution for (3.4) is available if the topology setE corresponds to a tree graph: Theorem 3. IfE corresponds to a simple tree graphG, then the graph weights for the optimal CGL solution of (3.4) are given by w s,t = h (e s − e t ) > K(e s − e t ) i −1 = h 1 N P N i=1 (x i (s)− x i (t)) 2 + 2α i −1 , (s,t)∈E, 0, (s,t) / ∈E. (3.5) where K = S +α(I− 11 > ). Proof: see Appendix A.3. Using this closed form solution, the complexity of solving a tree CGL can be sig- nificantly reduced. Given the matrix K, constructing the weights from (3.5) has an overall complexity ofO(n− 1) =O(n). In contrast, the algorithm proposed in [25] hasO(T p (n) +n 2 ) complexity per block-coordinate descent iteration, where T p (n) is the complexity of solving a nonnegative quadratic program of size n. 3.3 Data-driven Non-separable Fast GFTs In Section 3.1.1 we solved (3.2) for known H. In this section, we provide an approximate method when H in (3.2) is not known ahead of time. 48 When H is not given, (3.2) is a nonconvex problem. We propose an efficient approx- imation method as follows. We factor N as N = N 1 N 2 and approximate S by the Kronecker product of two matrices of sizes N 2 ×N 2 and N 1 ×N 1 : S≈ S 2 ⊗ S 1 . (3.6) The optimal approximation in terms of Frobenius norm is given by the optimal rank-1 approximation of ˜ S, a matrix with size N 2 2 ×N 2 1 , whose entries are those of S with rearrangement [98,131]: ˜ S = vec(S 1,1 ) > . . . vec(S N 2 ,1 ) > vec(S 1,2 ) > . . . vec(S N 2 ,N 2 ) > , with S = S 1,1 S 1,2 ... S 1,N 2 S 2,1 S 2,2 ... S 2,N 2 . . . . . . . . . . . . S N 2 ,1 S N 2 ,2 ... S N 2 ,N 2 , where S i,j ’s are N 1 ×N 1 block components of S. It can be shown that S 2 and S 1 can be obtained by solving the first left and right singular vectors of ˜ S followed by vector reshaping: Lemma 6 ([131]). For an N×N matrix S, the optimal solution of (S ∗ 1 , S ∗ 2 ) = argmin S 1 ∈R N 1 ×N 1,S 2 ∈R N 2 ×N 2 kS− S 2 ⊗ S 1 k 2 can be obtained from the rank-1 approximation of ˜ S. 
In particular, denote μ = (μ 1 ,...,μ N 2 2 ) > andν = (ν 1 ,...,ν N 2 1 ) > the first left and right singular vectors of ˜ S asso- ciated to singular value σ 1 , then S ∗ 1 = ν 1 ν N 1 +1 ... ν N 2 1 −N 1 +1 ν 2 ν N 1 +2 ... ν N 2 1 −N 1 +2 . . . . . . . . . . . . ν N 1 ν 2N 1 ... ν N 2 1 , S ∗ 2 = μ 1 μ N 2 +1 ... μ N 2 2 −N 2 +1 μ 2 μ N 2 +2 ... μ N 2 2 −N 2 +2 . . . . . . . . . . . . μ N 2 μ 2N 2 ... μ N 2 2 . Next, with S 1 and S 2 obtained from Lemma 6, we denote the eigendecomposition of S 2 as S 2 = E 2 D 2 E > 2 , then S≈ E 2 D 2 E > 2 ⊗ (I N 1 S 1 I N 1 ) = (E 2 ⊗ I N 1 ) (D 2 ⊗ S 1 ) (E 2 ⊗ I N 1 ) > . (3.7) 49 Algorithm 1 Proposed Approximation Method for Non-separable Fast GFT Require: S∈R N×N , a chosen factorization N =N 1 N 2 Ensure: (H, R) as a solution of (3.2) 1: Evaluate S 1 and S 2 such that S≈ S 2 ⊗ S 1 2: Evaluate the eigenmatrix E 2 of S 2 3: H← E 2 ⊗ I N 1 4: Solve (3.3) for R i , given H and k i =N 1 for each i 5: R← diag(R 1 ,..., R N 2 ) If N 2 is small, E 2 ⊗ I N 1 is sparse; thus, it is a legitimate matrix satisfying C1-B that approximately block-diagonalizes S to D 2 ⊗ S 1 . Therefore, we pick E 2 ⊗ I N 1 as the H matrix, and use the block structure of D 2 ⊗ S 1 , to solve (3.3) and obtain R which has M =N 2 nonzero N 1 ×N 1 diagonal blocks. We summarize this proposed method in Algorithm 1. For the choice of N 1 and N 2 , we note that it is preferable to choose N 1 ≈N 2 . With the resulting H and R matrices, the number of multiplications required for the fast GFT isN(N 1 +N 2 ), which is smaller whenN 1 andN 2 are closer to each other. In addition, whenN 1 is closer toN 2 , there are more degrees of freedom in S 1 ⊗ S 2 , giving a potentially better approximation in (3.6). If the signal is a pixel block with sizeN 1 ×N 2 , the eigenmatrices E 1 and E 2 of S 1 and S 2 characterize the column and row transforms in a separable scheme. Therefore, the GFT obtained by the proposed method can be regarded as a non-separable transform obtained from a separable one: the solver of (3.3) updates D 2 ⊗ S 1 in (3.7) into an arbitrary block diagonal matrix (not necessarily a Kronecker product matrix), which has more degrees of freedom. The resulting fast GFT can be realized by a common row transform applied to all rows followed by different column transforms applied to different columns 3 . The number of multiplications is the same as a separable transform. The number of required coefficients is NN 1 +N 2 2 , which is between the numbers in the separable case (N 2 1 +N 2 2 ) and the nonseparable case (N 2 ). 3.4 Experimental Results In this section, we apply data-driven GFTs introduced in Sections 3.2 and 3.3 to video residual blocks, and obtain the resulting rate-distortion (RD) performance. In Sec- tion 3.4.1, we obtain a data-driven separable GFT composed of SLGTs (as described in 3 Note that when the input covariance matrix has row-first node ordering, the same method will yield an approximate transform with a common column transform and different row transforms. 50 Table 3.1: Parameters of learned symmetric line graphs, and those associated with DCT. The parameters are weights of self loops and edges, as shown in Figure 3.2(b). s 1 s 2 s 3 s 4 w 1 w 2 w 3 w 4 G V 0.15 0 0.02 0.02 0.94 0.96 1 0.90 G H 0.08 0 0.01 0.03 1 0.93 1 0.94 G DCT 0 0 0 0 1 1 1 1 Table 3.2: Average percentage of bit rate reduction of SLGTs as compared to DCT at 32dB-38dB PSNR. Results with and without frequency weighting are compared. Nega- tive values mean compression gain. w/o freq. 
weighting w/ freq. weighting BQMall 0.04 -0.10 BasketballDrill 0.09 -0.19 Kimono1 -0.75 -1.40 Cactus -0.03 -0.20 ParkScene -0.82 -0.58 BQTerrace -0.31 -0.17 Average -0.39 -0.51 Section 3.2), and evaluate its rate-distortion (RD) performance on a dataset of inter pre- dicted blocks. Then, in Section 3.4.2, we apply the approach introduced in Section 3.3 to learn mode-dependent non-separable GFTs for intra predicted blocks in particular prediction modes, and present the resulting RD performance. 3.4.1 Data-Driven SLGTs for Inter Predictive Coding In this experiment, we obtain a 2D separable transform composed of a row SLGT and a column SLGT, and apply it to inter predictive coding. We use a dataset of 8× 8 inter predicted luma blocks, extracted with the HEVC test model (HM-16.9) from sev- eral video sequences: BasketballDrive, BQMall, BasketballDrill, Kimono1, Cactus, ParkScene, and BQTerrace, where BasketballDrive is used as the training set and the other sequences are used for testing. We solve the problem (3.3) with H = B N andE corresponding to a line graph topology. For the column transform, we use 1D vertical vectors of length 8 taken from the 8×8 residuals in the training set, to obtain L V , the 8×8 empirical covariance matrix. For the row transform, we repeat the same procedure using 1D horizontal vectors taken from the same training set to obtain the empirical covariance matrix L H . Since most residual blocks are expected to be well approximated by only a few low frequency eigenvectors at compression rates of interest, we modify the optimization criterion of (3.3) by increasing 51 the penalty on the approximation error on the lower frequency basis. To achieve this, we apply a simple heuristic based on frequency weighting to the covariance matrix [95]. Letting L V = U V Λ V U > V and L H = U H Λ H U > H , we replace Λ V and Λ H by Λ 2 V and Λ 2 H , respectively, then use the modified covariance matrix as input of the program (3.3) to estimate the Laplacian matrices L V and L H and hence the symmetric line graphsG V /G H . In this way, the effects of eigenvectors corresponding to low frequencies are magnified in the objective function of (3.3). As a result, the first few basis functions of the learned SLGT are expected to better approximate those of KLT. In the implementation of this experiment, self-loops are allowed to provide more degrees of freedoms for better coding results 4 . We use the CVX package [42] to obtain the parameters. Those weights of self loops and edges, as depicted in Figure 3.2(b), are shown in Table 3.1. The eigenmatrices of L V and L H are SLGTs that we use as column and row transforms for testing data. The rate-distortion (RD) performance of the proposed transform is compared with that of DCT. After the transforms, the coefficients are uniformly quantized, then entropy- encoded using the AGP encoder [102]. The AGP codec combines an alphabet parti- tioning and a set partitioning techniques, which are learned based on the distribution of the transform coefficients; thus, this codec can provide a fair comparison between different transforms. Since the column/row transform scheme is always applied to all columns/rows, no side information is required for uniquely decoding the blocks. The bitrate gains in Bjontegaard-delta measure (BD-rate) [4] at 32dB-38dB peak- signal-to-noise-ratio (PSNR), with and without frequency weighting, are shown in Table 3.2, where the coding results with DCT coefficients is used as the baseline. 
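A sketch of the frequency-weighting heuristic just described (Python/NumPy): the empirical covariance is re-synthesized with squared eigenvalues before being passed to the graph-learning problem (3.3), which magnifies the low-frequency (high-variance) components in the fit:

import numpy as np

def frequency_weighted(S):
    """Replace the eigenvalues of the empirical covariance S by their
    squares, keeping the eigenvectors, as in the heuristic of [95]."""
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(lam**2) @ V.T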
In the implementation of this experiment, self-loops are allowed in order to provide more degrees of freedom, leading to better coding results. (If self-loops are not allowed, the convex program can be solved using the closed-form solution described in Section 3.2.2.) We use the CVX package [42] to obtain the parameters. The resulting weights of self-loops and edges, as depicted in Figure 3.2(b), are shown in Table 3.1. The eigenmatrices of L_V and L_H are the SLGTs that we use as column and row transforms for the testing data.

Table 3.1: Parameters of learned symmetric line graphs, and those associated with the DCT. The parameters are weights of self-loops and edges, as shown in Figure 3.2(b).

            s1     s2     s3     s4     w1     w2     w3     w4
  G_V      0.15    0     0.02   0.02   0.94   0.96    1     0.90
  G_H      0.08    0     0.01   0.03    1     0.93    1     0.94
  G_DCT     0      0      0      0      1      1      1      1

The rate-distortion (RD) performance of the proposed transform is compared with that of the DCT. After the transforms, the coefficients are uniformly quantized, then entropy-coded using the AGP encoder [102]. The AGP codec combines alphabet partitioning and set partitioning techniques, which are learned from the distribution of the transform coefficients; this codec therefore provides a fair comparison between different transforms. Since the column/row transform scheme is always applied to all columns/rows, no side information is required to uniquely decode the blocks.

The bitrate gains in the Bjontegaard-delta measure (BD-rate) [4] at 32dB–38dB peak signal-to-noise ratio (PSNR), with and without frequency weighting, are shown in Table 3.2, where the coding result with DCT coefficients is used as the baseline.

Table 3.2: Average percentage of bit rate reduction of SLGTs as compared to the DCT at 32dB–38dB PSNR, with and without frequency weighting. Negative values mean compression gain.

                    w/o freq. weighting    w/ freq. weighting
  BQMall                  0.04                  -0.10
  BasketballDrill         0.09                  -0.19
  Kimono1                -0.75                  -1.40
  Cactus                 -0.03                  -0.20
  ParkScene              -0.82                  -0.58
  BQTerrace              -0.31                  -0.17
  Average                -0.39                  -0.51

According to the table, we make the following remarks. First, the proposed transform achieves an average bitrate gain of 0.51% compared to the DCT. On average, the gain of the SLGT is highest at a distortion level of 32dB and drops as the quality increases. Since the proposed SLGT better approximates the first few basis functions of a learned KLT, better decorrelation at lower bit rates is expected. Second, the frequency weighting procedure provides a more consistent gain across the 6 test sequences than the scheme without frequency weighting. The most significant bitrate gain is observed for the Kimono1 sequence with frequency weighting applied. Indeed, the residual signal of Kimono1 has the smallest energy on average, and such blocks are more likely to be represented by a small number of SLGT coefficients; the first few low-frequency basis functions, where the SLGT approximates the KLT most accurately, are particularly important for those smooth blocks.

Table 3.3: Bit rate reduction of separable KLT, fast GFT, and hybrid DCT/ADST as compared to the 2D separable DCT, tested on residual blocks with intra mode-2 and mode-16. Negative values mean compression gain, measured in the Bjontegaard metric (percentage of bit rate reduction).

                   Separable KLT       Fast GFT          DCT+ADST
                  Mode-2  Mode-16   Mode-2  Mode-16   Mode-2  Mode-16
  BQMall           -9.7    -6.3     -12.9    -8.2     -0.7    -1.3
  BasketballDrill -11.9    -8.6     -14.5   -10.2     -3.4    -2.5
  Crew            -17.5   -14.8     -20.7   -16.2     -3.2    -1.7
  Harbour         -26.6   -24.9     -29.7   -23.0     -1.7    -3.2
  Ice             -28.4   -11.8     -35.5   -16.8     -3.7    -1.7
  Soccer          -14.6   -12.7     -15.6   -11.2     -3.9    -1.5
  Average         -13.2   -10.4     -20.2   -11.8     -2.8    -2.1

Figure 3.4: Rate-distortion performance of separable KLT, fast GFT, and hybrid DCT/ADST. The testing blocks in this figure are mode-2 intra residual blocks.

3.4.2 Data-Driven Nonseparable GFTs for Intra Blocks

In the second experiment, we apply the learning method proposed in Section 3.3 to intra-predicted blocks. We extract intra-prediction residual blocks from the test video sequences BQMall, BasketballDrill, City, Crew, Harbour, Ice, and Soccer, again with HM-16.9. Only 8×8 luma residual blocks with intra prediction mode-2 (HOR+8) and mode-16 (HOR-6), which have nearly diagonal prediction directions, are considered. We train a fast GFT for mode-2 (resp. mode-16) based on Algorithm 1, using mode-2 (resp. mode-16) blocks from the City sequence as the training set. The convex problem (3.2) is solved using the CVX toolbox [42]. We then test the results on mode-2 (resp. mode-16) blocks from the other 6 sequences, excluding the one used for training.

To compare the results, we consider several other transforms with comparable or lower computational cost: the 2D separable DCT, the separable KLT, and the mode-dependent hybrid DCT/ADST transform. In the hybrid scheme, we apply a mode-dependent combination of DCT and ADST, as suggested in [107], for the row and column transforms. Each KLT (for mode-2 and mode-16, respectively) is pre-trained using the same training set as the fast GFT. The transform coefficients are uniformly quantized using various quantization steps that yield PSNR levels ranging from 29dB to 38dB. As in Section 3.4.1, the quantized coefficients are encoded using the AGP codec [102].
Table 3.3 shows the coding gain in BD-rate (percentage of bit rate reduction) over the 2D DCT, and Figure 3.4 shows the RD plot for mode-2 blocks. The resulting fast GFT achieves an average bit rate gain of 20.2% over the DCT for mode-2 blocks, and 11.8% for mode-16 blocks. It also outperforms the hybrid transform for every sequence and mode. In most cases, the fast GFT also gives a higher coding gain than the pre-trained separable KLT. In fact, the Laplacian constraint that requires the transform to be a GFT can be viewed as a regularization, which prevents overfitting. In contrast, a separable KLT that is learned and tested on different datasets is not regularized and may suffer from overfitting.

We also note that in this experiment the data-driven transforms (separable KLT and fast GFT) yield a very significant gain (> 10%) compared to the schemes with 2D DCT or hybrid DCT/ADST. While data-driven transforms adapt better to the statistical properties of the blocks, such a significant gain may not be easily achievable in practice. Part of the observed gain is due to the fact that the coding scheme implemented here is an open-loop system: residual blocks were extracted from the HEVC encoder and then processed using an independent entropy encoder. In addition, during block collection in the HEVC encoder, the intra prediction mode decision is based on the RD cost, which depends on the transform coefficients. The separable KLT and fast GFT, however, were trained after those blocks were collected, and the prediction modes might have been chosen differently had the RD cost been evaluated based on KLT or GFT coefficients. Thus, while data-driven transforms give a significant gain on the blocks we extracted, such a large gain may not be attainable when they are embedded in a real codec as a component of a closed-loop system.

3.5 Conclusion

In this chapter, we have investigated several data-driven fast GFTs for video coding. A general graph learning framework for fast GFTs was introduced in Section 3.1, where constraints associated with fast GFT algorithms were incorporated. We showed that if the desired butterfly stage(s), associated with given graph symmetry properties, are specified, then the problem is convex and can be solved using existing algorithms. Two approaches under this framework were considered: in Section 3.2, we learned GFTs corresponding to symmetric line graphs and introduced a learning approach that approximates KLT basis functions using SLGTs. In Section 3.3, we extended the graph learning problem to non-separable transforms. Without any prior information on graph topology or specified butterfly stage(s), we proposed an approximate solution that gives a GFT with the same complexity as a separable transform. Experimental results were presented in Section 3.4, where the learned SLGT and the non-separable fast GFT achieve consistent compression gains for inter- and intra-predicted residual signals, respectively.

Chapter 4
Lapped Graph Fourier Transform

In image and video coding, block-based transforms such as the DCT are applied to different blocks independently. As a result, the correlation between pixels on the boundary of two adjacent blocks cannot be captured. This leads to the so-called blocking artifact: an artificial discontinuity across block boundaries in the reconstructed signal. A well-known approach that addresses blocking artifacts is based on the design of lapped transforms [79,80], where a transform is applied to blocks with overlap.
This method has been shown to significantly reduce blocking artifacts. Lapped transform designs include the lapped orthogonal transform (LOT) [81] and a pre- and post-filtering framework [127], which has been adopted in the Daala codec [130].

However, while many signal processing techniques have been extended to graph-based frameworks, to our knowledge lapped transforms have not been studied in the context of graph signal processing. Similar to the block-based DCT, most existing graph-based transforms for pixel data are applied block-wise without overlap, and may lead to blocking artifacts. Thus, an LOT-like graph-based transform that can mitigate blocking artifacts would be of practical interest. In this chapter, we extend the notion of lapped transform to graph signals, with particular focus on line graphs with non-uniform weights, as in Figure 3.2(a). We propose the lapped graph Fourier transform (LGFT): a family of lapped transforms with different basis functions for different blocks, so as to capture distinct local statistical properties per block, while retaining the perfect reconstruction and orthogonality properties. We will show that the conventional LOT can be seen as approximately optimizing the transform coding gain for signals with a uniform line graph model, while our proposed LGFT is a generalization of the LOT that accommodates more general graph-based models. The LGFT matrix for each block can be obtained from a Kron reduction [23] of the graph, followed by an eigendecomposition. Experimental results show that for data associated with a non-uniform line graph model, the LGFT can provide a better transform coding gain than existing transforms such as the LOT and the DCT. The work in this chapter has been published in [73].

The rest of this chapter is organized as follows. In Section 4.1 we review basic concepts of the LOT. In Section 4.2, we formulate the optimal lapped graph Fourier transform design problem and propose an LGFT design method. Experimental results are shown in Section 4.3. Finally, we conclude this work in Section 4.4.

4.1 Review of Lapped Orthogonal Transforms

The lapped orthogonal transform (LOT) is a lapped transform with orthogonal basis functions, applied to blocks of length 2M with overlap length M. Given a block size 2M, the LOT matrix is a 2M×M matrix R = (E^T, F^T)^T, where E and F are two square matrices to be designed. Denoting by x the input signal and by y the transform domain vector, we can define a transform matrix T such that y = T^T x, and the reconstructed signal is x̂ = Ty, where
\[
\mathbf{T} =
\begin{pmatrix}
\ddots & & & \\
& \mathbf{R} & & \\
& & \mathbf{R} & \\
& & & \ddots
\end{pmatrix}
=
\begin{pmatrix}
\ddots & & & \\
\ddots & \mathbf{E} & & \\
& \mathbf{F} & \mathbf{E} & \\
& & \mathbf{F} & \ddots
\end{pmatrix},
\]
with consecutive copies of R overlapping by M rows. The transform R is a valid LOT if and only if TT^T = T^T T = I, meaning that x̂ = x and the columns of T are orthogonal. The general solution has the form [118]:
\[
\mathbf{E} = \mathbf{P}\mathbf{Q}, \qquad \mathbf{F} = (\mathbf{I} - \mathbf{P})\mathbf{Q}, \quad (4.1)
\]
where P can be any symmetric projection matrix and Q any orthogonal matrix, both of size M×M. (In the literature, the general solution is sometimes written as E = QP, F = Q(I−P). This form and (4.1) are interchangeable; we focus on (4.1) here, as it extends more easily to the graph-based design.)

Many lapped transform designs use the DCT as a key component. One example of such a design is
\[
\hat{\mathbf{E}} = \frac{1}{2}\big( \mathbf{U}_e - \mathbf{U}_o,\; (\mathbf{U}_e - \mathbf{U}_o)\mathbf{Z} \big), \qquad
\hat{\mathbf{F}} = \frac{1}{2}\big( \mathbf{J}(\mathbf{U}_e - \mathbf{U}_o),\; -\mathbf{J}(\mathbf{U}_e - \mathbf{U}_o)\mathbf{Z} \big), \quad (4.2)
\]
where U_e and U_o are M×M/2 matrices whose columns are the length-M DCT functions of even and odd symmetry, respectively. The matrix Z is a cascade of plane rotations [81] or a product of DST-IV and DCT-II [80], and J is the order-reversal permutation matrix (1.1).
Based on the fact that the DCT approximates the KLT, the design in (4.2) approximates the optimal solution characterized in [79]. The projection and orthogonal matrices corresponding to (4.2) are
\[
\hat{\mathbf{P}} = \frac{1}{2}\big( \mathbf{I} - \mathbf{U}_e\mathbf{U}_o^\top - \mathbf{U}_o\mathbf{U}_e^\top \big), \qquad
\hat{\mathbf{Q}} = (\mathbf{U}_e, -\mathbf{U}_o\mathbf{Z}). \quad (4.3)
\]

4.2 Lapped Transforms on Graphs

We now propose the lapped graph Fourier transform (LGFT) by first investigating the conditions for perfect reconstruction and orthogonality, and then incorporating the graph variation (1.5) into the optimality criterion.

4.2.1 Conditions for Perfect Reconstruction and Orthogonality

We denote a graph signal x ∈ R^{NM} as x = (x_1^T, ..., x_N^T)^T, modeled by an attractive GMRF:
\[
\begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_N \end{pmatrix}
\sim \mathcal{N}\left( \mathbf{0},\; \mathbf{L}^{\dagger} = \mathbf{C} =
\begin{pmatrix}
\mathbf{C}_{1,1} & \mathbf{C}_{1,2} & \cdots & \mathbf{C}_{1,N} \\
\mathbf{C}_{2,1} & \mathbf{C}_{2,2} & \cdots & \mathbf{C}_{2,N} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{C}_{N,1} & \mathbf{C}_{N,2} & \cdots & \mathbf{C}_{N,N}
\end{pmatrix} \right),
\]
where (x_k^T, x_{k+1}^T)^T corresponds to the k-th block of length 2M, and C_{p,q} is the (p,q)-th M×M block of the covariance matrix C. Unlike conventional LOTs, where a common model is used for all blocks, here we consider different models for different blocks, such as a line graph model with non-uniform weights (e.g., Figure 3.2(a)), where the C_{k,k} differ across k. Under this assumption, we revisit the conditions for perfect reconstruction and orthogonality.

First, we define a lapped transform matrix R_k = (E_k^T, F_k^T)^T for the k-th block (x_k^T, x_{k+1}^T)^T, where the R_k differ in order to capture distinct statistical properties per block. Based on this definition, the overall transform matrix is
\[
\mathbf{T}_{\text{LGFT}} =
\begin{pmatrix}
\ddots & & & \\
& \mathbf{R}_k & & \\
& & \mathbf{R}_{k+1} & \\
& & & \ddots
\end{pmatrix}
=
\begin{pmatrix}
\ddots & & & \\
\ddots & \mathbf{E}_k & & \\
& \mathbf{F}_k & \mathbf{E}_{k+1} & \\
& & \mathbf{F}_{k+1} & \ddots
\end{pmatrix}.
\]
Then, the output signal y = (y_1^T, ..., y_N^T)^T and the reconstructed signal x̂ = (x̂_1^T, ..., x̂_N^T)^T satisfy
\[
\mathbf{y}_k = \mathbf{E}_k^\top \mathbf{x}_k + \mathbf{F}_k^\top \mathbf{x}_{k+1},
\]
\[
\hat{\mathbf{x}}_k = \mathbf{F}_{k-1}\mathbf{y}_{k-1} + \mathbf{E}_k\mathbf{y}_k
= \big( \mathbf{E}_k\mathbf{E}_k^\top + \mathbf{F}_{k-1}\mathbf{F}_{k-1}^\top \big)\mathbf{x}_k + \mathbf{F}_{k-1}\mathbf{E}_{k-1}^\top \mathbf{x}_{k-1} + \mathbf{E}_k\mathbf{F}_k^\top \mathbf{x}_{k+1}.
\]
By comparing x_k and x̂_k, we obtain the conditions for perfect reconstruction and aliasing cancellation. In addition, the orthogonality constraint R_k^T R_k = I is equivalent to E_k^T E_k + F_k^T F_k = I. Thus, for each k, the desired transform should satisfy
\[
\mathbf{E}_k\mathbf{E}_k^\top + \mathbf{F}_{k-1}\mathbf{F}_{k-1}^\top = \mathbf{I}, \quad \text{(Perfect reconstruction)} \quad (4.4)
\]
\[
\mathbf{E}_k\mathbf{F}_k^\top = \mathbf{F}_k\mathbf{E}_k^\top = \mathbf{0}, \quad \text{(Aliasing cancellation)} \quad (4.5)
\]
\[
\mathbf{E}_k^\top\mathbf{E}_k + \mathbf{F}_k^\top\mathbf{F}_k = \mathbf{I}. \quad \text{(Orthogonality)} \quad (4.6)
\]
Due to the highly nonlinear nature of these constraints and the large number of degrees of freedom in designing E_k and F_k, we propose an LGFT construction that generalizes the LOT solution (4.1). We select
\[
\mathbf{E}_k = \mathbf{P}\mathbf{Q}_k, \qquad \mathbf{F}_k = (\mathbf{I} - \mathbf{P})\mathbf{Q}_k, \quad (4.7)
\]
where the projection matrix P is common to all k so that (4.4) can be satisfied. Based on (4.7), one can verify that (4.4)–(4.6) always hold as long as P is a symmetric projection matrix and the Q_k are orthogonal matrices. Thus, (4.7) is a sufficient condition for perfect reconstruction and orthogonality of the LGFT. A numerical check of these conditions is sketched below.
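The following minimal numpy sketch checks (4.4)–(4.6) for the construction (4.7), using a random symmetric projection P and random orthogonal Q_k (the specific P and Q_k here are placeholders, not the design of Section 4.2.2):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 8

# Random symmetric projection P = V1 V1^T and random orthogonal Q_k's.
V, _ = np.linalg.qr(rng.standard_normal((M, M)))
P = V[:, :M//2] @ V[:, :M//2].T                 # symmetric, P @ P = P
Qs = [np.linalg.qr(rng.standard_normal((M, M)))[0] for _ in range(3)]

for k in range(1, 3):
    E_k, F_k = P @ Qs[k], (np.eye(M) - P) @ Qs[k]
    F_prev = (np.eye(M) - P) @ Qs[k-1]
    assert np.allclose(E_k @ E_k.T + F_prev @ F_prev.T, np.eye(M))  # (4.4)
    assert np.allclose(E_k @ F_k.T, 0)                              # (4.5)
    assert np.allclose(E_k.T @ E_k + F_k.T @ F_k, np.eye(M))        # (4.6)
```

The check passes for any such P and Q_k because E_k E_k^T = P, F_{k-1} F_{k-1}^T = I − P, and E_k F_k^T = P(I − P) = 0.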
4.2.2 Proposed LGFT Construction

An optimality criterion for the conventional LOT, based on the transform coding gain, can be defined as [53,80]:
\[
G_{TC} = \frac{\frac{1}{M}\sum_{i=1}^{M}\sigma_{k,i}^2}{\big(\prod_{i=1}^{M}\sigma_{k,i}^2\big)^{1/M}}, \quad (4.8)
\]
where σ²_{k,i} = Var(y_k(i)) is the variance of the i-th transform coefficient in block k.

Figure 4.1: An example of Kron reduction on a line graph, where G_{S_k} is the graph obtained from Kron reduction of G with vertex subset S_k.

With x ∼ N(0, C), σ²_{k,i} is the (i,i) entry of
\[
\mathbb{E}[\mathbf{y}_k\mathbf{y}_k^\top]
= \begin{pmatrix}\mathbf{E} \\ \mathbf{F}\end{pmatrix}^{\!\top}
\begin{pmatrix}\mathbf{C}_{k,k} & \mathbf{C}_{k,k+1} \\ \mathbf{C}_{k+1,k} & \mathbf{C}_{k+1,k+1}\end{pmatrix}
\begin{pmatrix}\mathbf{E} \\ \mathbf{F}\end{pmatrix} \quad (4.9)
\]
\[
= \mathbf{Q}^\top \underbrace{\begin{pmatrix}\mathbf{P} \\ \mathbf{I}-\mathbf{P}\end{pmatrix}^{\!\top}
\begin{pmatrix}\mathbf{C}_{k,k} & \mathbf{C}_{k,k+1} \\ \mathbf{C}_{k+1,k} & \mathbf{C}_{k+1,k+1}\end{pmatrix}
\begin{pmatrix}\mathbf{P} \\ \mathbf{I}-\mathbf{P}\end{pmatrix}}_{\mathbf{G}_k} \mathbf{Q}. \quad (4.10)
\]
It has been shown [80] that, for a fixed P, the optimal Q maximizing G_TC is the eigenmatrix of G_k. In the data model considered for the conventional LOT, G_k = G is common to all k. In (4.1), Z is typically chosen as an orthogonal transform with a fast implementation such that Q̂ approximates the eigenmatrix of G.

Here, we extend the LOT design problem to LGFT design as follows. Let the signal x ∼ N(0, C = L†) be an attractive GMRF, where L is a graph Laplacian corresponding to a graph G and the C_{k,k} may differ across k. Note that the signal in block k is also an attractive GMRF of length 2M:
\[
\begin{pmatrix}\mathbf{x}_k \\ \mathbf{x}_{k+1}\end{pmatrix} \sim \mathcal{N}\big(\mathbf{0},\, \mathbf{L}_{S_k}^{\dagger}\big), \qquad
\mathbf{L}_{S_k} := \begin{pmatrix}\mathbf{C}_{k,k} & \mathbf{C}_{k,k+1} \\ \mathbf{C}_{k+1,k} & \mathbf{C}_{k+1,k+1}\end{pmatrix}^{\!\dagger}.
\]
The matrix L_{S_k} is a Laplacian obtained by Kron reduction [23] onto the graph nodes V_{S_k} = {(k−1)M+1, ..., (k+1)M} corresponding to block k. In general, the Kron reduction onto a subset of graph nodes V_S ⊂ V is the Schur complement
\[
\mathbf{L}_S = \mathbf{L}_{S,S} - \mathbf{L}_{S,S'}\,\mathbf{L}_{S',S'}^{-1}\,\mathbf{L}_{S',S}, \qquad \mathcal{V}_{S'} = \mathcal{V}\setminus\mathcal{V}_S.
\]
As in (4.9), we consider (4.7) and replace the block components of C by those of L_{S_k}. With the assumption that L_{S_k} may differ across k, we choose the Q_k that diagonalizes
\[
\mathbf{H}_k := \begin{pmatrix}\mathbf{P} \\ \mathbf{I}-\mathbf{P}\end{pmatrix}^{\!\top} \mathbf{L}_{S_k} \begin{pmatrix}\mathbf{P} \\ \mathbf{I}-\mathbf{P}\end{pmatrix}. \quad (4.11)
\]
In particular, if L corresponds to a line graph, then the Kron reduction L_{S_k} is the Laplacian of the line graph segment of G with nodes V_{S_k}. Thus, the retrieval of the k-th LGFT component R_k boils down to choosing the line graph segment of length 2M associated with block k, which may differ across k. An illustrative example is shown in Figure 4.1.

While the choice of Q_k is straightforward for a given P, finding the P that is globally optimal in terms of transform coding gain is a challenging problem, even in conventional LOT design [81]. Following the LOT design, we adopt the matrix P̂ in (4.3) associated with (4.2). With this choice, the LGFT design can be regarded as a generalization of the LOT: R_k reduces to the LOT when Q_k = (U_e, −U_o Z).

To summarize, given a line graph G with Laplacian L and block size M, the proposed LGFT is constructed as follows:
1. Pick P as in (4.3).
2. For each k, pick Q_k as the eigenmatrix of H_k in (4.11).
3. Obtain the LGFT components as given in (4.7).

Similar to the LOT, which achieves a nearly optimal G_TC when L is associated with a uniform line graph, the proposed LGFT construction, based on a fixed P = P̂, can approximately optimize G_TC when the line graph has non-uniform weights.

4.3 Experimental Results for LGFT

We evaluate the proposed LGFT on a class of line graphs where some edges have weights ε that characterize weak correlations, while the other edges have weight 1. This line graph model has been used for encoding intra-predicted images [49] and piecewise smooth images [50,144].

4.3.1 Transform Coding Gain with a Particular Line Graph

First, we consider a line graph of length n = 400 with ε = 0.05, where the weak weights are located at edges (h, h+1) with h ∈ {15, 30, 45, 60, ..., 390}. A covariance matrix C = (L + 0.2I)^{-1} is used in this experiment to avoid matrix singularity. A minimal sketch of the LGFT construction (steps 1–3 above) is given below.
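The following numpy sketch implements steps 2–3 of the construction for one block, assuming the shared projection P (e.g., P̂ from (4.3)) is given; it is a sketch of the procedure, not the thesis experiment code:

```python
import numpy as np

def kron_reduction(L, keep):
    """Kron reduction of Laplacian L onto the node subset `keep` (Schur complement)."""
    keep = np.asarray(keep)
    drop = np.setdiff1d(np.arange(L.shape[0]), keep)
    A = L[np.ix_(keep, keep)]
    B = L[np.ix_(keep, drop)]
    D = L[np.ix_(drop, drop)]
    return A - B @ np.linalg.solve(D, B.T)

def lgft_block(L, k, M, P):
    """LGFT component R_k = (E_k; F_k) for block k, given the shared projection P."""
    nodes = np.arange(k * M, (k + 2) * M)     # 0-indexed version of V_{S_k}
    L_Sk = kron_reduction(L, nodes)
    PI = np.vstack([P, np.eye(M) - P])        # the matrix (P; I-P) in (4.11)
    H_k = PI.T @ L_Sk @ PI
    _, Q_k = np.linalg.eigh(H_k)              # Q_k diagonalizes H_k
    return np.vstack([P @ Q_k, (np.eye(M) - P) @ Q_k])   # 2M x M matrix R_k
```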
We compare the transform coding gains of the LOT, LGFT, DCT, KLT, and GFT, where the latter three transforms are designed and applied in a non-overlapping, block-based manner. The LOT implementation follows (4.2), where Z is composed of a DCT-II and a DST-IV, as suggested in [79]. For each transform, block sizes of M = 4, 8, and 16 are considered.

Figure 4.2: Transform coding gains for different transforms and block sizes M with signals modeled by a particular line graph. The KLT, GFT, and DCT shown here are defined in a block-based manner.

In Figure 4.2 we plot the transform coding gain G_TC against the transform size. For this model N(0, C), G_TC is upper-bounded by 1.736, derived from the length-n KLT of the overall model. We can see that the LGFT yields the highest gain for all block sizes included in this experiment. For some block sizes larger than M = 16, the transform coding gains of the block-based KLT and GFT exceed those of the LGFT, since there are fewer block boundaries. In practical coding scenarios, additional information such as the positions of weak edges may be required as bit rate overhead; this has not been considered in the analysis here.

We show the LGFT basis functions R_2 with M = 8 in Figure 4.3. Note that R_2 is designed for the block (x_2^T, x_3^T)^T with nodes 9 to 24, where the weak edge (15,16) lies between the 7th and 8th nodes within this block. In most basis functions we can observe a discontinuity between entries 7 and 8 that corresponds to the lower correlation, showing that the LGFT basis can capture the local weak correlation in the graph topology through different choices of Q_k.

Figure 4.3: Basis functions of the LGFT for k = 2 with M = 8.

4.3.2 LGFT for Image Coding

In the second experiment, we apply the LGFT to image coding. A 480×640 piecewise smooth image from the Tsukuba dataset [97] is used. For demonstration and comparison purposes, we apply different transforms (LGFT, LOT, DCT, and GFT) horizontally, followed by a common transform (block-based DCT) vertically. We quantize the coefficients, then apply the inverse horizontal transforms and the inverse vertical block-based DCT to reconstruct the signal.

Figure 4.4: Subjective comparison of different transforms with QP=30. (a) The full piecewise smooth image and a 160×60 patch. (b) Original image patch. (c) Sobel edge map. (d) LGFT (PSNR=42.27dB), (e) LOT (PSNR=42.14dB), (f) GFT (PSNR=42.57dB), (g) DCT (PSNR=42.33dB): recovered image patches with the different transforms.

To define the line graphs for the LGFT and GFT, we apply a Sobel edge detector [40] to obtain an edge map. For each image row we define a line graph based on the image discontinuity information in the Sobel edge map: an edge weight of the line graph is set to ε = 0.7 if either of the two corresponding pixels is an edge point in the Sobel map. We use a block size M = 8 and quantization parameters (QP) ranging from 25 to 45, corresponding to quantization factors 2^{(QP−4)/6}. Note that, in a practical image coding scheme, the edge locations can be transmitted as side information to the decoder for unique decodability.
In this experiment, we only compare the distortion, not the number of bits required for encoding. This means that we can fairly compare the LGFT with the GFT, but not with the DCT and LOT, which do not require side information.

Table 4.1: Quality comparison of different horizontal transforms on the full test image.

               PSNR (dB)
  QP     LGFT     LOT     GFT     DCT
  25    48.24    48.12   48.63   48.38
  30    44.84    44.74   45.11   44.93
  35    41.94    41.83   42.11   41.91
  40    37.74    37.73   37.69   37.62
  45    33.15    33.15   33.09   33.06

Table 4.1 shows the peak signal-to-noise ratio (PSNR) of the different transforms at different QPs. We note that, similar to the GFT, which provides a higher PSNR than the DCT, the LGFT gives a higher PSNR than the LOT for almost all QPs. This gain results from the fact that the line graph model, on which the LGFT and GFT are based, better captures the discontinuities in the image signal. In Figure 4.4 we show the original image, the Sobel edge map, and the reconstructed images using the different transforms with QP=30. At this quantization level, while the LGFT does not give a higher PSNR than the GFT, it yields reduced blocking artifacts (reduced vertical image discontinuities) compared to the GFT.

4.4 Conclusion

In this chapter, we have extended the well-known lapped transform to a graph-based setting. We derived the conditions for perfect reconstruction and orthogonality of the lapped graph Fourier transform (LGFT), which is more general than the lapped orthogonal transform (LOT). We then proposed an LGFT design on an arbitrary line graph, where different transform functions are applied to different blocks to adapt to different local statistical properties. Experimental results showed that, on a non-uniform line graph, the LGFT can achieve better energy compaction than the block-based graph Fourier transform and the conventional LOT in terms of transform coding gain. Extensions to different overlap lengths and to graphs with more general topologies, as well as fast implementations of the LGFT, will be explored in future work.

Chapter 5
Efficient DCT and DST Filtering with Sparse Graph Operators

Filtering, where frequency components of a signal are attenuated or amplified, is a fundamental operation in signal processing. Similar to conventional filters in digital signal processing, which manipulate signals in the Fourier domain, a graph filter selectively attenuates or amplifies graph Fourier coefficients, and can be characterized by a frequency response that indicates how much the filter amplifies each graph frequency component. This notion of frequency selection leads to various applications, including graph signal denoising [7,89,138], classification [78] and clustering [128], and graph convolutional neural networks [19,55].

For an undirected graph, we recall (1.7) from Section 1.2.2, which represents the frequency-domain graph filter operation:
\[
\mathbf{H} = \boldsymbol{\Phi}\cdot h(\boldsymbol{\Lambda})\cdot \boldsymbol{\Phi}^\top, \qquad h(\boldsymbol{\Lambda}) := \operatorname{diag}\big(h(\lambda_1),\cdots,h(\lambda_N)\big). \quad (1.7)
\]
The operation y = Hx with input signal x and filter matrix H involves a forward graph Fourier transform (GFT) Φ^T, a frequency-selective scaling operation h(Λ), and an inverse GFT Φ. However, since fast GFT algorithms may not exist for arbitrary graphs (see Chapter 2), the GFT often introduces a very high computational overhead when the graph is large. To address this issue, graph filters can be implemented with polynomial operations in the vertex domain (recall Section 1.2.2):
\[
\mathbf{H} = \sum_{k=0}^{K} g_k \mathbf{Z}^k, \quad \text{with } \mathbf{Z}^0 = \mathbf{I}, \quad (1.6)
\]
where the g_k are coefficients and Z is called the fundamental graph operator, or graph operator for short. With this expression, graph filtering can be applied in the vertex (sample) domain via y = Hx, which does not require GFT computations. A vertex-domain sketch is given below.
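The following minimal numpy sketch evaluates (1.6) in the vertex domain, using only sparse matrix–vector products with Z (the line graph and coefficients are arbitrary placeholders):

```python
import numpy as np

def poly_graph_filter(Z, g, x):
    """Vertex-domain PGF (1.6): y = sum_k g[k] Z^k x, computed iteratively."""
    y = g[0] * x
    v = x.copy()
    for gk in g[1:]:
        v = Z @ v          # Z^k x obtained from Z^{k-1} x
        y = y + gk * v
    return y

# Example on a small uniform line graph Laplacian (Z = L).
N = 8
L = 2*np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
L[0, 0] = L[-1, -1] = 1
y = poly_graph_filter(L, g=[0.5, -0.2, 0.05], x=np.random.randn(N))
```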
A graph filter in the form of (1.6) is usually called an FIR graph filter [16,51], as it can be viewed as an analogy to conventional FIR filters of order K, which are polynomials of the delay z. In what follows, we call filters defined as in (1.6) polynomial graph filters (PGFs). (This chapter is an extended version of the work in [74,75].)

Various methods for designing vertex-domain graph filters given a desired frequency response have been studied in the literature. Least squares design of polynomial filters given a target frequency response was introduced in [105]. The recurrence relations of Chebyshev polynomials provide computational benefits in distributed filter implementations, as shown in [44,115]. In [109], an extension of graph filter operations to a node-variant setting is proposed, along with polynomial approximation approaches using convex optimization. Autoregressive moving average (ARMA) graph filters, whose frequency responses are characterized by rational polynomials, have been investigated in [51,68] in both static and time-varying settings. Design strategies for ARMA filters are further studied in [67], which provides comparisons with PGFs. Furthermore, in [16], state-of-the-art filtering methods have been extended to an edge-variant setting.

Recently, it has been pointed out that multiple operators can be obtained for cycle graphs [34] and line graphs [74]. For those graphs, multiple graph operators Z = {Z^(1), Z^(2), ..., Z^(m)} that are jointly diagonalizable (i.e., have a common eigenbasis) can be obtained. Essentially, those operators are by themselves graph filter matrices with different frequency responses. Thus, unlike (1.6), which is a polynomial of a single operator, we can design graph filters as
\[
\mathbf{H}_{\mathcal{Z},K} = p_K\big(\mathbf{Z}^{(1)}, \mathbf{Z}^{(2)}, \ldots, \mathbf{Z}^{(m)}\big), \quad (5.1)
\]
where p_K(·) stands for a multivariate polynomial of degree K with arbitrary coefficients. Iterative algorithms for the implementation of such graph filters have been further studied in [27]. Since H_{{Z},K} = p_K(Z) reduces to (1.6), the form (5.1) is a generalization of the PGF expression. We refer to (5.1) as a multivariate polynomial graph filter (MPGF).

In this chapter, we focus on filter operations based on the well-known DCTs and DSTs. Notably, the DCT is the GFT associated with a uniform line graph, which means that DCT filters are essentially graph filters. A DCT filter [8] is defined as the following operation: 1) apply the DCT, 2) scale the DCT coefficients selectively, and 3) perform the inverse DCT. While DCT filters are typically implemented using forward and inverse DCTs, we can in fact apply graph filter approaches to design and implement them. More generally, it has been shown in [99] that all members of the family of discrete trigonometric transforms (DTTs), i.e., the 8 types of DCTs and the 8 types of discrete sine transforms (DSTs), are associated with line graphs. This allows us to exploit graph filter approaches for all DTT filters, whose applications include image resizing [91], biomedical signal processing [113], medical imaging [129], and video coding [143].

Our work approaches the design of efficient sample-domain (graph vertex domain) graph filters, with particular focus on DTT filters. In particular, we show that if the GFT is one of the 16 DTTs, then, in addition to the graph operator obtained from the well-known line graph model [99], we can derive a family Z of sparse graph operators with closed-form expressions.
In this way, efficient DTT filters can be obtained using PGF and MPGF design approaches, yielding lower complexity than a DTT filter implementation in the transform domain. Experimental results in video coding are presented to demonstrate the effectiveness of the proposed DCT and DST filters.

In the remainder of this chapter, we summarize our contributions with respect to previous work in Section 5.1. In Section 5.2, we consider DTTs, for which sparse operators can be obtained by extending well-known properties of DTTs. We also extend the results to 2D DTTs and provide some remarks on sparse operators for general graphs. Section 5.3 introduces PGF and MPGF design approaches using least squares and minimax criteria. An efficient filter design for Laplacian quadratic form approximation is also presented. Experimental results are shown in Section 5.4, demonstrating the effectiveness of our methods in graph filter design as well as in video coding applications. The chapter concludes in Section 5.5.

5.1 Summary of Contributions

The first contribution of this work is to introduce novel DTT filter design methods for graph vertex domain implementation. We note that DTT filters are special cases of graph filters, which can be implemented as PGFs and MPGFs. The DTT case has not been explored in existing work on general graph filters [16,51,67,68,109,115]. We also introduce multiple sparse graph operators specific to DTTs that allow fast MPGF implementations. In addition, while in related work [12,82,99] DTT filtering is typically performed in the transform domain using convolution-multiplication properties, we introduce sample-domain DTT filter implementations based on PGF and MPGF designs, and show that our designs with low-degree polynomials lead to faster implementations than designs requiring forward and inverse DTTs, especially when the DTT size is large. Furthermore, in addition to the well-known least squares graph filter design, we propose a novel minimax design approach for both PGFs and MPGFs.

Second, we provide novel insights on multiple sparse graph operators. Motivated by previous work [27,34], we use multiple graph operators for filter design, focusing particularly on operators that are sparse. Notably, we show that for each of the 16 DTTs, whose associated graph is a uniform line graph or one of its variants, there exists a series of operators that are sparse and have closed-form expressions. We demonstrate that those operators lead to more efficient implementations, compared to conventional PGF designs, for DTT filters whose frequency responses are non-smooth (e.g., an ideal low-pass filter) or non-monotonic (e.g., a bandpass filter). Note that [27] studies MPGFs with a focus on distributed filter implementations, but investigates neither design approaches for those filters nor how sparse operators can be obtained for graphs other than cycle and Cartesian product graphs. Our work complements the study in [27] by considering 1) the case where the GFT is a DTT, corresponding to various line graphs, and 2) design approaches for MPGFs. To the best of our knowledge, sparse DTT operators other than those derived from line graphs have not been reported in the literature.

Third, we demonstrate experimentally the benefits of sparse DTT operators in image and video compression applications.
In addition to filtering, our approach can also be used to evaluate the transform-domain weighted energy given by the Laplacian quadratic form, which has been used for rate-distortion optimization in the context of image and video coding [33,50]. We implement the proposed method in AV1, a real-world codec, where our method provides a speedup in the transform type search procedure.

5.2 Sparse DCT and DST Operators

Classical PGFs can be extended to MPGFs [27] if multiple graph operators are available [34]. Let L = ΦΛΦ^T be a Laplacian with GFT Φ. We assume we have a series of graph operators Z = {Z^(k)}_{k=1}^{M} that share the eigenvectors of L but have different eigenvalues:
\[
\mathbf{Z}^{(k)} = \boldsymbol{\Phi}\,\boldsymbol{\Lambda}^{(k)}\,\boldsymbol{\Phi}^\top, \qquad
\boldsymbol{\Lambda}^{(k)} = \operatorname{diag}(\boldsymbol{\lambda}^{(k)}) = \operatorname{diag}\big(\lambda_1^{(k)},\ldots,\lambda_N^{(k)}\big),
\]
where λ^(k) = (λ_1^(k), ..., λ_N^(k))^T denotes the vector of eigenvalues of Z^(k). For polynomial degree K = 1 in (5.1), we have
\[
\mathbf{H}_{\mathcal{Z},1} = g_0\mathbf{I} + \sum_{m=1}^{M} g_m \mathbf{Z}^{(m)}, \quad (5.2)
\]
where the g_k are coefficients. For K = 2, we have
\[
\mathbf{H}_{\mathcal{Z},2} = g_0\mathbf{I} + \sum_{m=1}^{M} g_m \mathbf{Z}^{(m)}
+ g_{M+1}\mathbf{Z}^{(1)}\mathbf{Z}^{(1)} + g_{M+2}\mathbf{Z}^{(1)}\mathbf{Z}^{(2)} + \cdots + g_{2M}\mathbf{Z}^{(1)}\mathbf{Z}^{(M)}
+ g_{2M+1}\mathbf{Z}^{(2)}\mathbf{Z}^{(2)} + \cdots + g_{3M-1}\mathbf{Z}^{(2)}\mathbf{Z}^{(M)}
+ \cdots + g_{(M^2+3M)/2}\mathbf{Z}^{(M)}\mathbf{Z}^{(M)}, \quad (5.3)
\]
where the terms Z^(j)Z^(i) with j > i are not required because all operators commute, i.e., Z^(i)Z^(j) = Z^(j)Z^(i). Expressions of higher degree can be obtained with polynomial kernel expansion [47]. We also note that, since H_{{Z},K} reduces to the form of H in (1.6), H_{Z,K} is a generalization of the PGF and thus provides more degrees of freedom in the filter design procedure.

As pointed out in the introduction, DTT filters are essentially graph filters. This means that they can be implemented as PGFs (1.6) without applying any forward or inverse DTT. Next, we go one step further by introducing multiple sparse operators for each DTT, which allows DTT filters to be implemented as MPGFs.

5.2.1 Sparse DCT-II Operators

Recall that u_j denotes the DCT basis functions as in (1.9), and L_D represents the Laplacian of a uniform line graph as in (1.10). Here, we revisit a derivation from [117] verifying that u_j is an eigenvector of L_D with eigenvalue ω_j = 2 − 2cos((j−1)π/N) for each j = 1, ..., N, and then generalize it to a family of sparse graph filters that all have the u_j as eigenvectors. To verify that L_D · u_j = ω_j u_j, it suffices to consider the equivalent equation Z_DCT-II · u_j = (2 − ω_j) u_j, where
\[
\mathbf{Z}_{\text{DCT-II}} = 2\mathbf{I} - \mathbf{L}_D = \begin{pmatrix}
1 & 1 & & & \\
1 & 0 & 1 & & \\
& \ddots & \ddots & \ddots & \\
& & 1 & 0 & 1 \\
& & & 1 & 1
\end{pmatrix}. \quad (5.4)
\]
Before walking through the algebra, a quick numerical check of this eigen-structure is sketched below.
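A minimal numpy check that the DCT-II basis (1.9) diagonalizes Z_DCT-II in (5.4), with eigenvalues 2cos((j−1)π/N):

```python
import numpy as np

N = 8
p = np.arange(1, N + 1)
U = np.zeros((N, N))
for j in range(1, N + 1):
    c = 1/np.sqrt(2) if j == 1 else 1.0            # c_j in (1.9)
    U[:, j - 1] = np.sqrt(2/N) * c * np.cos((j - 1) * (p - 0.5) * np.pi / N)

Z = np.eye(N, k=1) + np.eye(N, k=-1)
Z[0, 0] = Z[-1, -1] = 1                            # boundary terms of (5.4)

D = U.T @ Z @ U                                    # should be diagonal
assert np.allclose(D, np.diag(np.diag(D)))
assert np.allclose(np.diag(D), 2*np.cos(np.arange(N) * np.pi / N))
```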
Table 5.1: Matrix structure and corresponding eigenpairs of sparse operators associated with all DCTs and DSTs. The index j ranges from 1 to N.

  GFT        Structure of Z^(ℓ)                  (Eigenvalue, Eigenvector) of Z^(ℓ)
  DCT-I      Figure 5.2(a), ℓ = 1,...,N−1        (2cos(ℓ(j−1)π/(N−1)), φ_j)
  DCT-II     Figure 5.2(b), ℓ = 1,...,N          (2cos(ℓ(j−1)π/N), φ_j)
  DCT-III    Figure 5.2(c), ℓ = 1,...,N−1        (2cos(ℓ(j−1/2)π/N), φ_j)
  DCT-IV     Figure 5.2(d), ℓ = 1,...,N−1        (2cos(ℓ(j−1/2)π/N), φ_j)
  DCT-V      Figure 5.2(e), ℓ = 1,...,N−1        (2cos(ℓ(j−1)π/(N−1/2)), φ_j)
  DCT-VI     Figure 5.2(f), ℓ = 1,...,N−1        (2cos(ℓ(j−1)π/(N−1/2)), φ_j)
  DCT-VII    Figure 5.2(g), ℓ = 1,...,N−1        (2cos(ℓ(j−1/2)π/(N−1/2)), φ_j)
  DCT-VIII   Figure 5.2(h), ℓ = 1,...,N          (2cos(ℓ(j−1/2)π/(N+1/2)), φ_j)
  DST-I      Figure 5.3(a), ℓ = 1,...,N+1        (2cos(ℓjπ/(N+1)), φ_j)
  DST-II     Figure 5.3(b), ℓ = 1,...,N          (2cos(ℓjπ/N), φ_j)
  DST-III    Figure 5.3(c), ℓ = 1,...,N−1        (2cos(ℓ(j−1/2)π/N), φ_j)
  DST-IV     Figure 5.3(d), ℓ = 1,...,N−1        (2cos(ℓ(j−1/2)π/N), φ_j)
  DST-V      Figure 5.3(e), ℓ = 1,...,N          (2cos(ℓjπ/(N+1/2)), φ_j)
  DST-VI     Figure 5.3(f), ℓ = 1,...,N          (2cos(ℓjπ/(N+1/2)), φ_j)
  DST-VII    Figure 5.3(g), ℓ = 1,...,N          (2cos(ℓ(j−1/2)π/(N+1/2)), φ_j)
  DST-VIII   Figure 5.3(h), ℓ = 1,...,N−1        (2cos(ℓ(j−1/2)π/(N−1/2)), φ_j)

For 1 ≤ p ≤ N, the p-th element of Z_DCT-II · u_j is
\[
(\mathbf{Z}_{\text{DCT-II}}\cdot\mathbf{u}_j)_p =
\begin{cases}
u_j(1) + u_j(2), & p = 1 \\
u_j(p-1) + u_j(p+1), & 2 \le p \le N-1 \\
u_j(N-1) + u_j(N), & p = N.
\end{cases}
\]
Following the expression in (1.9), we extend the definition of u_j(k) to arbitrary integers k. The even symmetry of the cosine function at 0 and π gives u_j(0) = u_j(1) and u_j(N) = u_j(N+1), and thus
\[
(\mathbf{Z}_{\text{DCT-II}}\cdot\mathbf{u}_j)_1 = u_j(0) + u_j(2), \qquad
(\mathbf{Z}_{\text{DCT-II}}\cdot\mathbf{u}_j)_N = u_j(N-1) + u_j(N+1). \quad (5.5)
\]
This means that for all p = 1, ..., N,
\[
(\mathbf{Z}_{\text{DCT-II}}\cdot\mathbf{u}_j)_p = u_j(p-1) + u_j(p+1) \quad (5.6a)
\]
\[
= \sqrt{\tfrac{2}{N}}\, c_j \left[ \cos\!\left(\tfrac{(j-1)(p-\frac{3}{2})\pi}{N}\right) + \cos\!\left(\tfrac{(j-1)(p+\frac{1}{2})\pi}{N}\right) \right] \quad (5.6b)
\]
\[
= 2\sqrt{\tfrac{2}{N}}\, c_j \cos\!\left(\tfrac{(j-1)(p-\frac{1}{2})\pi}{N}\right) \cos\!\left(\tfrac{(j-1)\pi}{N}\right) \quad (5.6c)
\]
\[
= (2-\omega_j)\, u_j(p), \quad (5.6d)
\]
which verifies Z_DCT-II · u_j = (2 − ω_j) u_j. Note that in (5.6b) we have applied the sum-to-product trigonometric identity
\[
\cos\alpha + \cos\beta = 2\cos\frac{\alpha+\beta}{2}\cos\frac{\alpha-\beta}{2}. \quad (5.7)
\]
Indeed, when u_j(p±1) is replaced by u_j(p±ℓ) in (5.6a), the same identity applies, generalizing (5.6a)–(5.6d) to
\[
u_j(p-\ell) + u_j(p+\ell)
= \sqrt{\tfrac{2}{N}}\, c_j \left[ \cos\!\left(\tfrac{(j-1)(p-\ell-\frac{1}{2})\pi}{N}\right) + \cos\!\left(\tfrac{(j-1)(p+\ell-\frac{1}{2})\pi}{N}\right) \right]
= 2\cos\!\left(\tfrac{\ell(j-1)\pi}{N}\right) u_j(p). \quad (5.8)
\]
As in (5.5), we can apply the even symmetry of the cosine function at 0 and π to replace indices p−ℓ or p+ℓ that fall outside the range [1, N] by indices within the range:
\[
u_j(p-\ell) = u_j(-p+\ell+1), \qquad u_j(p+\ell) = u_j(-p-\ell+2N+1).
\]
Then, an N×N matrix Z^(ℓ)_DCT-II can be defined such that the left-hand side of (5.8) corresponds to (Z^(ℓ)_DCT-II · u_j)_p.

Figure 5.1: (a) Sparse operators Z^(ℓ)_DCT-II, (b) their associated Laplacian matrices L^(ℓ)_DCT-II = 2I − Z^(ℓ)_DCT-II, and (c) associated graphs G^(ℓ)_DCT-II for the length-4 DCT-II.
Proposition 1. For ℓ = 1, ..., N−1, define Z^(ℓ)_DCT-II as the N×N matrix whose p-th row has only two nonzero elements, specified as follows:
\[
\big(\mathbf{Z}^{(\ell)}_{\text{DCT-II}}\big)_{p,q_1} = 1, \quad q_1 = \begin{cases} p-\ell, & \text{if } p-\ell \ge 1 \\ -p+\ell+1, & \text{otherwise,} \end{cases}
\]
\[
\big(\mathbf{Z}^{(\ell)}_{\text{DCT-II}}\big)_{p,q_2} = 1, \quad q_2 = \begin{cases} p+\ell, & \text{if } p+\ell \le N \\ -p-\ell+2N+1, & \text{otherwise.} \end{cases}
\]
This matrix Z^(ℓ)_DCT-II has eigenvectors u_j with associated eigenvalues 2cos(ℓ(j−1)π/N) for j = 1, ..., N.

Note that Z^(1)_DCT-II = Z_DCT-II as in (5.4). Taking ℓ = 2 and ℓ = 3 in Proposition 1, the nonzero elements of Z^(2)_DCT-II and Z^(3)_DCT-II form rectangle-like patterns similar to that of Z_DCT-II:
\[
\mathbf{Z}^{(2)}_{\text{DCT-II}} = \begin{pmatrix}
 & 1 & 1 & & & \\
1 & & & 1 & & \\
1 & & & & \ddots & \\
 & 1 & & & & 1 \\
 & & \ddots & & & 1 \\
 & & & 1 & 1 &
\end{pmatrix}, \qquad
\mathbf{Z}^{(3)}_{\text{DCT-II}} = \begin{pmatrix}
 & & 1 & 1 & & \\
 & 1 & & & \ddots & \\
1 & & & & & 1 \\
1 & & & & & \\
 & \ddots & & & 1 & \\
 & & 1 & 1 & &
\end{pmatrix}. \quad (5.9)
\]
For ℓ = N, the derivations in (5.8) remain valid, with Z^(N)_DCT-II = 2J. The rectangular patterns observed in (5.9) extend directly to arbitrary transform lengths N (e.g., all such operators for N = 6 are shown in Figure 5.2(b)), and the associated eigenvalues of Z^(ℓ)_DCT-II for arbitrary N are given in Table 5.1.

Example: Length-4 DCT-II Operators

We show in Figure 5.1(a) all sparse operators Z^(ℓ)_DCT-II of the DCT-II for N = 4. These matrices can be regarded as standard operators on different graphs: defining L^(ℓ)_DCT-II = 2I − Z^(ℓ)_DCT-II, we can view L^(ℓ)_DCT-II as the Laplacian matrix of a graph G^(ℓ)_DCT-II. The resulting L^(ℓ)_DCT-II and G^(ℓ)_DCT-II for the length-4 DCT-II are shown in Figures 5.1(b) and (c), respectively.

We observe that, among the graphs in Figure 5.1(c), G^(4)_DCT-II is a disconnected graph with two connected components. It is associated with the operator
\[
\mathbf{Z}^{(4)}_{\text{DCT-II}} = \boldsymbol{\Phi}_{\text{DCT-II}}\cdot\operatorname{diag}(2,-2,2,-2)\cdot\boldsymbol{\Phi}^\top_{\text{DCT-II}}.
\]
While Z^(4)_DCT-II is associated with a disconnected graph, it can still be used as a graph operator for DCT-II filters because it is diagonalized by Φ_DCT-II. However, Z^(4)_DCT-II, as well as its polynomials, has repeated eigenvalues (eigenvalues with multiplicity 2). This means that a filter whose frequency response takes distinct values (e.g., a low-pass filter with h(λ_1) > ... > h(λ_4)) cannot be realized as a PGF of Z^(4)_DCT-II. Based on this observation, operators associated with disconnected graphs, and those whose eigenvalues have high multiplicity, provide fewer degrees of freedom in PGF and MPGF filter designs than operators with distinct eigenvalues such as Z^(1)_DCT-II. A sketch implementing Proposition 1 is given below.
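A minimal numpy sketch of Proposition 1 (the function name is ours), including a verification of the stated eigenpairs:

```python
import numpy as np

def dct2_sparse_operator(N, l):
    """Construct Z^(l)_DCT-II per Proposition 1 (1 <= l <= N-1): row p has ones
    at the (possibly folded) column indices p-l and p+l."""
    Z = np.zeros((N, N))
    for p in range(1, N + 1):                    # 1-indexed, as in the text
        q1 = p - l if p - l >= 1 else -p + l + 1
        q2 = p + l if p + l <= N else -p - l + 2*N + 1
        Z[p - 1, q1 - 1] += 1
        Z[p - 1, q2 - 1] += 1
    return Z

# Verify the eigenpairs (2cos(l(j-1)pi/N), u_j) of Proposition 1.
N = 8
p = np.arange(1, N + 1)
U = np.sqrt(2/N) * np.cos(np.outer(p - 0.5, np.arange(N)) * np.pi / N)
U[:, 0] /= np.sqrt(2)
for l in range(1, N):
    D = U.T @ dct2_sparse_operator(N, l) @ U
    assert np.allclose(D, np.diag(2*np.cos(l*np.arange(N)*np.pi/N)))
```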
5.2.2 Sparse Operators of the 16 DTTs

The approach of Section 5.2.1 can be adapted to all 16 DTTs to obtain their sparse operators. For illustration, we present in Section A.2 the derivations for DST-VI, DST-VII, and DCT-V, which share the same right boundary condition as DCT-II but have different left boundary conditions (see Table 1.1). Results for DTTs with other combinations of left/right boundary conditions follow analogously. Table 5.1 lists the sparse operators and their associated eigenpairs for all DTTs.

Figure 5.2: Sparse graph operators of length N = 6 associated with DCT-I through DCT-VIII (panels (a)–(h), one per DCT type, each showing the series of operators listed in Table 5.1). Different symbols represent the values −1, 0, 1, √2, and 2.

Figure 5.3: Sparse graph operators of length N = 6 associated with DST-I through DST-VIII (panels (a)–(h)). Different symbols represent the values −2, −1, 0, 1, and √2.

Figures 5.2 and 5.3 show the operators for N = 6; these extend directly to arbitrary length. Interestingly, the nonzero entries of all sparse operators form rectangle-like patterns. Indeed, the 16 DTTs are constructed from combinations of 4 types of left boundary conditions and 4 types of right boundary conditions, which are associated with 4 types of upper-left rectangle edges and 4 types of lower-right rectangle edges in Figures 5.2 and 5.3.

Some of the DCT sparse operators and their variants in Figures 5.2 and 5.3 were already known, such as Z^(1)_DCT-I [57] and I + Z^(1)_DCT-III and I + Z^(1)_DCT-IV [43]; these results are summarized in [103] under a more general framework. In [99], left and right boundary conditions were exploited to obtain sparse matrices with DTT eigenvectors, corresponding to the first operator Z^(1) of each DTT. However, to the best of our knowledge, graph operators with ℓ > 1 (i.e., Z^(2) through Z^(N−1) for each DTT) have not been studied in the literature.

5.2.3 Sparse 2D DCT/DST Filters

In image and video coding, DTTs are often applied to 2D pixel blocks, where a combination of 1D DTTs is applied to the columns and rows of each block. Consider an N_1×N_2 block (with N_1 pixel rows and N_2 pixel columns),
\[
\mathbf{X} = \begin{pmatrix}
X_{1,1} & X_{1,2} & \ldots & X_{1,N_2} \\
X_{2,1} & X_{2,2} & \ldots & X_{2,N_2} \\
\vdots & \vdots & \ddots & \vdots \\
X_{N_1,1} & X_{N_1,2} & \ldots & X_{N_1,N_2}
\end{pmatrix}.
\]
We use a 1D vector x ∈ R^{N_1 N_2} to denote X in column-first ordering:
\[
\mathbf{x} = (X_{1,1}, X_{2,1}, \ldots, X_{N_1,1}, X_{1,2}, X_{2,2}, \ldots, X_{N_1,2}, \ldots, X_{N_1,N_2})^\top.
\]
We assume that the GFT Φ = Φ_r ⊗ Φ_c is separable, with row transform Φ_r and column transform Φ_c. In such cases, sparse operators of 2D separable GFTs can be obtained from those of the 1D transforms:

Figure 5.4: Graphs associated with sparse operators of the 2D 4×4 DCT. For visualization, coordinates are slightly shifted to prevent edges from overlapping. Self-loops are not shown in the graphs.

Proposition 2 (Sparse 2D DTT operators). Let Φ = Φ_r ⊗ Φ_c, with Φ_r and Φ_c orthogonal transforms among the 16 DTTs, and let Z_r and Z_c be the sets of sparse operators associated with Φ_r and Φ_c, respectively. Denote the eigenpairs associated with the operators of Z_r and Z_c by (λ_{r,j}, φ_{r,j}) and (λ_{c,k}, φ_{c,k}), with j = 1, ..., N_1 and k = 1, ..., N_2. Then,
\[
\mathcal{Z} = \{\mathbf{Z}_r \otimes \mathbf{Z}_c,\; \mathbf{Z}_r \in \mathcal{Z}_r,\; \mathbf{Z}_c \in \mathcal{Z}_c\}
\]
is a set of sparse operators corresponding to Φ_r ⊗ Φ_c, with associated eigenpairs (λ_{r,j} λ_{c,k}, φ_{r,j} ⊗ φ_{c,k}).

Figure 5.5: Sparse operators associated with the 2D 4×4 DCT. Symbols · and • represent 0 and 1, respectively.
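Before proving the proposition, a small numerical sketch illustrates the construction (reusing dct2_sparse_operator from the sketch in Section 5.2.1; the separable identity in the last line is standard for column-major vectorization):

```python
import numpy as np

N1, N2 = 4, 4
Z_c = dct2_sparse_operator(N1, 1)    # column operator
Z_r = dct2_sparse_operator(N2, 2)    # row operator
Z_2d = np.kron(Z_r, Z_c)             # N1*N2 x N1*N2 operator, still sparse

# Applying Z_2d to the column-first vectorization of a block X is equivalent
# to the separable operation Z_c X Z_r^T.
X = np.random.randn(N1, N2)
x = X.flatten(order='F')
Y = (Z_2d @ x).reshape(N1, N2, order='F')
assert np.allclose(Y, Z_c @ X @ Z_r.T)
```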
Proof: Let Z_r^(1), ..., Z_r^(M_1) be the sparse operators in Z_r, with associated eigenvalues contained in the vectors λ_r^(1), ..., λ_r^(M_1), respectively. Similarly, let Z_c^(1), ..., Z_c^(M_2) be those in Z_c, with eigenvalues in λ_c^(1), ..., λ_c^(M_2). We note that
\[
\mathbf{Z}_r^{(m_1)} = \boldsymbol{\Phi}_r \cdot \operatorname{diag}(\boldsymbol{\lambda}_r^{(m_1)}) \cdot \boldsymbol{\Phi}_r^\top, \quad m_1 = 1,\ldots,M_1,
\]
\[
\mathbf{Z}_c^{(m_2)} = \boldsymbol{\Phi}_c \cdot \operatorname{diag}(\boldsymbol{\lambda}_c^{(m_2)}) \cdot \boldsymbol{\Phi}_c^\top, \quad m_2 = 1,\ldots,M_2.
\]
Applying a well-known Kronecker product identity [145], we obtain
\[
\mathbf{Z}_r^{(m_1)} \otimes \mathbf{Z}_c^{(m_2)} = \boldsymbol{\Phi} \cdot \operatorname{diag}\big(\boldsymbol{\lambda}_r^{(m_1)} \otimes \boldsymbol{\lambda}_c^{(m_2)}\big) \cdot \boldsymbol{\Phi}^\top.
\]
In Proposition 2, Φ_c and Φ_r are allowed to be the same transform. An example is shown in Figures 5.4 and 5.5, where Φ_c = Φ_r is the length-4 DCT-II and Φ is the 4×4 2D DCT.

5.2.4 Remarks on Graph Operators of Arbitrary GFTs

Obtaining multiple sparse operators Z^(k) for a fixed GFT Φ ∈ R^{N×N} is a challenging problem in general. Let the graph Laplacian associated with Φ be L, and let λ_j be the eigenvalue of L associated with eigenvector φ_j. If the graph has no self-loops, the Laplacian of the complement graph [86],
\[
\mathbf{L}_c := N w_{\max}\mathbf{I} - w_{\max}\mathbf{1}\mathbf{1}^\top - \mathbf{L},
\]
has eigenpairs (0, φ_1) and (N w_max − λ_j, φ_j) for j = 2, ..., N. However, L_c is a dense matrix when L is sparse, and thus may not lead to an efficient MPGF design. Some additional remarks on the retrieval of sparse graph operators follow; more details can be found in Appendix C.

Characterization of Sparse Laplacians with a Common GFT

Extending a key result in [92], we can characterize the set of all graph Laplacians (i.e., matrices satisfying (1.4) with non-negative edge and self-loop weights) sharing a given GFT Φ as a convex polyhedral cone. In particular, the sparsest such Laplacians correspond to the edges of the polyhedral cone (i.e., where the faces of the cone meet). However, enumerating these edges is in general NP-hard, since the number of polyhedron vertices or edges can be combinatorial in N.

Construction of Sparse Operators from Symmetric Graphs

If a graph with Laplacian L satisfies the ϕ-symmetry property of Definition 2, we can construct a sparse operator in addition to L. Recall that, for an involution ϕ on the graph vertices, a graph is ϕ-symmetric if w_{i,j} = w_{ϕ(i),ϕ(j)} for all i, j ∈ V. For such a graph, a sparse operator can be constructed as follows:

Lemma 7. Given a ϕ-symmetric graph G with Laplacian L, we can construct a graph G_ϕ by connecting nodes i and j with edge weight 1 for every node pair (i,j) with ϕ(i) = j, i ≠ j. The Laplacian L_ϕ of G_ϕ then commutes with L.

The proof of this lemma and an illustrative example are presented in Appendix D. A small sketch of this construction is given below.
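A minimal numpy sketch of Lemma 7 on a ϕ-symmetric line graph of length 4 (edge weights symmetric about the center, with the involution 1↔4, 2↔3; this toy example is ours, not the one in Appendix D):

```python
import numpy as np

def pairing_operator(phi):
    """Laplacian L_phi of Lemma 7, built from an involution phi given as an
    index array with phi[phi[i]] == i (0-indexed)."""
    n = len(phi)
    L_phi = np.zeros((n, n))
    for i in range(n):
        j = phi[i]
        if j != i:
            L_phi[i, i] = 1
            L_phi[i, j] = -1
    return L_phi

# phi-symmetric line graph: edge weights (0.5, 1.0, 0.5).
L = np.diag([0.5, 1.5, 1.5, 0.5])
for i, wi in enumerate([0.5, 1.0, 0.5]):
    L[i, i+1] = L[i+1, i] = -wi
phi = np.array([3, 2, 1, 0])                 # involution 1<->4, 2<->3
L_phi = pairing_operator(phi)
assert np.allclose(L @ L_phi, L_phi @ L)     # Lemma 7: the operators commute
```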
Figure 5.6: An example of PGF and MPGF fitting results on a length-12 line graph. The desired frequency response is h*(λ) = exp(−4(λ−1)²). The PGF and MPGF filters are optimized based on (1.6) and (5.10).

5.3 New Graph Filter Designs with Sparse Operators

In this section, we introduce filter design approaches based on sparse operators for DTTs. The least squares design method is summarized in Section 5.3.1. We also propose a minimax filter design in Section 5.3.2 for both PGFs and MPGFs. Then, in Section 5.3.3, we show that weighted energy in the graph frequency domain can also be efficiently approximated using multiple graph operators.

5.3.1 Least Squares (LS) Graph Filter

For an arbitrary graph filter H* with frequency response h* = (h*(λ_1), ..., h*(λ_N))^T, we can approximate H* by a filter H_{Z,K} in (5.1) by designing a set of coefficients g as in (5.2) or (5.3). Let h(λ_j) be the frequency response corresponding to H_{Z,K}; one way to obtain g is through a least squares solution:
\[
\mathbf{g}^* = \operatorname*{argmin}_{\mathbf{g}} \sum_{j=1}^{N}\big( h^*(\lambda_j) - h(\lambda_j) \big)^2
= \operatorname*{argmin}_{\mathbf{g}} \sum_{j=1}^{N}\Big( h^*(\lambda_j) - p_K\big(\lambda_j^{(1)},\ldots,\lambda_j^{(M)}\big) \Big)^2
= \operatorname*{argmin}_{\mathbf{g}} \Big\| \mathbf{h}^* - \boldsymbol{\Pi}_K\big(\boldsymbol{\lambda}^{(1)},\ldots,\boldsymbol{\lambda}^{(M)}\big)\cdot\mathbf{g} \Big\|^2, \quad (5.10)
\]
where, for K = 1 and K = 2,
\[
\boldsymbol{\Pi}_1 = \begin{pmatrix}
1 & \lambda_1^{(1)} & \ldots & \lambda_1^{(M)} \\
\vdots & \vdots & \ddots & \vdots \\
1 & \lambda_N^{(1)} & \ldots & \lambda_N^{(M)}
\end{pmatrix}, \qquad
\boldsymbol{\Pi}_2 = \begin{pmatrix}
1 & \lambda_1^{(1)} & \ldots & \lambda_1^{(M)} & \lambda_1^{(1)}\lambda_1^{(1)} & \lambda_1^{(1)}\lambda_1^{(2)} & \ldots & \lambda_1^{(M)}\lambda_1^{(M)} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \lambda_N^{(1)} & \ldots & \lambda_N^{(M)} & \lambda_N^{(1)}\lambda_N^{(1)} & \lambda_N^{(1)}\lambda_N^{(2)} & \ldots & \lambda_N^{(M)}\lambda_N^{(M)}
\end{pmatrix}.
\]
This formulation generalizes to a weighted least squares problem, where different weights are allowed for different graph frequencies, enabling higher approximation accuracy at particular frequencies:
\[
\mathbf{g}^* = \operatorname*{argmin}_{\mathbf{g}} \sum_{j=1}^{N} \rho_j^2 \big( h^*(\lambda_j) - h(\lambda_j) \big)^2
= \operatorname*{argmin}_{\mathbf{g}} \big\| \operatorname{diag}(\boldsymbol{\rho})\big( \mathbf{h}^* - \boldsymbol{\Pi}_K\cdot\mathbf{g} \big) \big\|^2, \quad (5.11)
\]
where ρ_j ≥ 0 is the weight corresponding to λ_j. When ρ = 1, problem (5.11) reduces to (5.10).

When g is sparser (i.e., has a smaller ℓ0 norm), fewer terms are involved in the polynomial p_K, lowering the complexity of the filtering operation. This ℓ0-constrained problem can be viewed as a sparse representation of diag(ρ)h* in an overcomplete dictionary diag(ρ)Π_K. Well-known methods for this problem include the orthogonal matching pursuit (OMP) algorithm [93] and optimization with a sparsity-promoting ℓ1 constraint:
\[
\operatorname*{minimize}_{\mathbf{g}}\ \big\| \operatorname{diag}(\boldsymbol{\rho})\big( \mathbf{h}^* - \boldsymbol{\Pi}_K\cdot\mathbf{g} \big) \big\|^2
\quad \text{subject to} \quad \|\mathbf{g}\|_1 \le \tau, \quad (5.12)
\]
where τ is a pre-chosen threshold. This formulation can be viewed as an extension of its PGF counterpart [115] to the MPGF setting. Note that (5.12) is an ℓ1-constrained least squares problem (i.e., a LASSO problem), for which efficient solvers are available [126].

Compared to the conventional PGF H in (1.6), the implementation with H_{Z,K} has several advantages. First, when K = 1, the MPGF (5.2) is a linear combination of different sparse operators, which is amenable to parallelization; this contrasts with high-degree PGFs based on (1.6), which require applying the graph operator repeatedly. Second, H_{Z,K} is a generalization of H and provides more degrees of freedom, which may lead to a more accurate approximation with a lower-order polynomial. Note that, while the eigenvalues of Z^k for k = 1, 2, ... are typically all increasing (if Z = L) or all decreasing (if, for instance, Z = 2I − L), the eigenvalues of the different Z^(m) have more diverse distributions (increasing, decreasing, or non-monotonic). Thus, MPGFs are more likely to approximate filters with non-monotonic frequency responses efficiently. For example, Figure 5.6 shows the resulting PGF and MPGF for a bandpass filter: for K = 2 and K = 3, a degree-1 MPGF with K operators gives higher approximation accuracy than a degree-K PGF, at similar complexity. A sketch of the LS design is given below.
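A minimal numpy sketch of the degree-1 weighted LS design (5.11), using the DCT-II eigenvalues from Table 5.1 and a bandpass target as in Figure 5.6 (the function name and parameter choices are ours):

```python
import numpy as np

def mpgf_ls_design(eigvals, h_target, rho=None):
    """Degree-1 weighted LS MPGF design (5.11): fit h* with g0 + sum_m g_m lambda^(m).
    `eigvals` is an N x M array whose columns are the eigenvalue vectors lambda^(m)."""
    N, M = eigvals.shape
    Pi = np.hstack([np.ones((N, 1)), eigvals])     # Pi_1 in (5.10)
    rho = np.ones(N) if rho is None else rho
    g, *_ = np.linalg.lstsq(np.diag(rho) @ Pi, rho * h_target, rcond=None)
    return g

N, M = 16, 4
j = np.arange(N)
eigvals = np.stack([2*np.cos(l*j*np.pi/N) for l in range(1, M+1)], axis=1)
omega = 2 - 2*np.cos(j*np.pi/N)                    # graph frequencies of L_D
h_target = np.exp(-4*(omega - 1)**2)               # bandpass response
g = mpgf_ls_design(eigvals, h_target)
```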
5.3.2 Minimax Graph Filter

The minimax approach is a popular filter design method in classical signal processing. The goal is to design a length-K FIR filter whose frequency response G(e^{jω}) approximates the desired response H(e^{jω}) such that the maximum error within some frequency range is minimized. A standard design method is the Parks-McClellan algorithm, a variation of the Remez exchange algorithm [83].

Here, we explore minimax design criteria for graph filters. We denote by h*(λ) the desired frequency response and by g(λ) the polynomial filter that approximates h*(λ).

Polynomial Graph Filter

Let g(λ) be the PGF of degree K given by (1.6). Since the graph frequencies λ_1, ..., λ_N are discrete, we only need to minimize the maximum error between h* and g at those frequencies. In particular, we solve for the polynomial coefficients g_j:
\[
\operatorname*{minimize}_{\mathbf{g}}\ \max_i\ \rho_i \left| h^*(\lambda_i) - \sum_{j=0}^{K} g_j \lambda_i^j \right|
= \operatorname*{minimize}_{\mathbf{g}}\ \big\| \operatorname{diag}(\boldsymbol{\rho})\big(\mathbf{h}^* - \boldsymbol{\Psi}\mathbf{g}\big) \big\|_\infty,
\]
where Ψ is the matrix in (1.8), ρ_i is the weight associated with λ_i, and ||·||_∞ denotes the infinity norm. Note that when K ≥ N−1 and Ψ has full row rank, h* = Ψg can be achieved exactly with g = Ψ†h*. Otherwise, setting ε = ||diag(ρ)(h* − Ψg)||_∞, we can rewrite the problem as
\[
\operatorname*{minimize}_{\mathbf{g},\,\epsilon}\ \epsilon \quad \text{subject to} \quad -\epsilon\mathbf{1} \preceq \operatorname{diag}(\boldsymbol{\rho})\big(\mathbf{h}^* - \boldsymbol{\Psi}\mathbf{g}\big) \preceq \epsilon\mathbf{1}, \quad (5.13)
\]
whose solution can be obtained efficiently with a linear programming solver.

Figure 5.7: Example illustrating the frequency responses of a degree K = 4 PGF under least squares (LS) and minimax criteria, in weighted and unweighted settings. The filters are defined on a length-24 line graph. In the weighted setting, the weights ρ_i are chosen as 2, 0, and 1 for the passband, transition band, and stopband, respectively.

Multivariate Polynomial Graph Filter

Now let g(λ) be a graph filter with M graph operators and degree K, as in (5.1). In this case, we simply extend problem (5.13) to
\[
\operatorname*{minimize}_{\mathbf{g},\,\epsilon}\ \epsilon \quad \text{subject to} \quad -\epsilon\mathbf{1} \preceq \operatorname{diag}(\boldsymbol{\rho})\big(\mathbf{h}^* - \boldsymbol{\Pi}_K\mathbf{g}\big) \preceq \epsilon\mathbf{1}, \quad (5.14)
\]
where an ℓ1 or ℓ0 norm constraint on g can also be included.

To summarize, Table 5.2 shows the objective functions of the least squares and minimax designs for PGF and MPGF, with weights on the different graph frequencies.

Table 5.2: Least squares and minimax design approaches for PGF and MPGF, with weights ρ_i on the graph frequencies.

                    PGF                                    MPGF
  Least squares     min_g ||diag(ρ)(h* − Ψg)||²           (5.11)
  Minimax           (5.13)                                 (5.14)

Note that the least squares PGF design in Table 5.2 is a simple extension of the unweighted design (1.8) in [105]. Using an ideal low-pass filter as the target, Figure 5.7 shows a toy example with a degree-4 PGF. When different weights ρ_i are used for the passband, transition band, and stopband, approximation accuracies differ across graph frequencies. Comparing the LS and minimax results in the weighted setting, we also see that the minimax criterion yields a smaller maximum error within the passband (see the last frequency bin of the passband) and the stopband (see the first frequency bin of the stopband). A sketch of (5.13) as a linear program is given below.
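The LP in (5.13) can be handed to an off-the-shelf solver; below is a minimal sketch using scipy.optimize.linprog, with an assumed setup (length-24 line graph, ideal low-pass target with cutoff 0.5λ_max, zero weights in an assumed transition band 0.4λ_max ≤ λ ≤ 0.6λ_max):

```python
import numpy as np
from scipy.optimize import linprog

def pgf_minimax(Psi, h_target, rho):
    """Solve (5.13): variables (g, eps), minimize eps subject to
    -eps <= rho_i (h*_i - (Psi g)_i) <= eps for all i."""
    N, K1 = Psi.shape
    W = np.diag(rho)
    c = np.zeros(K1 + 1); c[-1] = 1.0              # minimize eps
    A_ub = np.block([[-W @ Psi, -np.ones((N, 1))],  #  W(h - Psi g) <= eps
                     [ W @ Psi, -np.ones((N, 1))]]) # -W(h - Psi g) <= eps
    b_ub = np.concatenate([-W @ h_target, W @ h_target])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)]*(K1 + 1))
    return res.x[:K1], res.x[-1]

N, K = 24, 4
lam = 2 - 2*np.cos(np.arange(N)*np.pi/N)           # graph frequencies
Psi = np.vander(lam, K + 1, increasing=True)       # [1, lam, ..., lam^K]
h = (lam <= 0.5*lam.max()).astype(float)           # ideal low-pass target
rho = np.where((lam >= 0.4*lam.max()) & (lam <= 0.6*lam.max()), 0.0, 1.0)
g, eps = pgf_minimax(Psi, h, rho)
```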
5.3.3 Weighted GFT Domain Energy Evaluation

Let x be a signal and Φ a GFT to be applied. We consider a weighted sum of squared GFT coefficients,
\[
\mathcal{C}_{\Phi}(\mathbf{x};\mathbf{q}) = \sum_{i=1}^{N} q_i \big(\boldsymbol{\phi}_i^\top\mathbf{x}\big)^2, \quad (5.15)
\]
where arbitrary weights q = (q_1, ..., q_N)^T can be considered. Denoting the graph Laplacian associated with Φ as L = ΦΛΦ^T, C_Φ(x; q) has a form similar to the Laplacian quadratic form (1.5), since
\[
\mathbf{x}^\top\mathbf{L}\mathbf{x} = \sum_{l=1}^{N} \lambda_l \big(\boldsymbol{\phi}_l^\top\mathbf{x}\big)^2. \quad (5.16)
\]
Note that the computation of x^T L x via (1.5) can be done in the vertex domain and does not require the GFT coefficients. This provides a lower-complexity implementation than (5.16), especially when the graph is sparse (i.e., has few edges and self-loops). Similar to the vertex-domain Laplacian quadratic form computation (1.5), we note that C_Φ(x; q) can also be realized as a quadratic form:
\[
\mathcal{C}_{\Phi}(\mathbf{x};\mathbf{q}) = \sum_{i=1}^{N} q_i \big(\boldsymbol{\phi}_i^\top\mathbf{x}\big)^2
= \mathbf{x}^\top \underbrace{\boldsymbol{\Phi}\cdot\operatorname{diag}(\mathbf{q})\cdot\boldsymbol{\Phi}^\top}_{\mathbf{H}_q}\, \mathbf{x}, \quad (5.17)
\]
where H_q can be viewed as a graph filter with frequency response h_q(λ_i) = q_i. Thus, we can approximate H_q with a sparse filter H_q̂ such that x^T H_q̂ x approximates C_Φ(x; q). For example, with a degree-1 polynomial as in (5.2), we have
\[
\mathbf{x}^\top \underbrace{\left[ g_0\mathbf{I} + \sum_{m=1}^{M} g_m \mathbf{Z}^{(m)} \right]}_{\mathbf{H}_{\hat{q}}} \mathbf{x}
= \sum_{i=1}^{N} \underbrace{\left( g_0 + \sum_{m=1}^{M} g_m \lambda_i^{(m)} \right)}_{\hat{q}_i} \big(\boldsymbol{\phi}_i^\top\mathbf{x}\big)^2. \quad (5.18)
\]
The left-hand side can be computed efficiently if only a few of the g_m are nonzero, making H_q̂ sparse. The right-hand side can be viewed as a proxy for (5.15) if the g_m are chosen such that q̂_i ≈ q_i. Such coefficients g_m can be obtained by solving (5.12) with h* = q. A minimal sketch of this proxy is given below.
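A minimal numpy sketch of the vertex-domain proxy on the left-hand side of (5.18), reusing dct2_sparse_operator from Section 5.2.1 (the coefficient values here are arbitrary placeholders; in practice they would be fitted via (5.12) so that q̂ ≈ q):

```python
import numpy as np

def weighted_energy_proxy(x, ops, g):
    """Evaluate x^T H_qhat x from (5.18) using only sparse products Z^(m) x.
    `ops` are the sparse operators Z^(m); g = [g0, g1, ..., gM]."""
    y = g[0] * x
    for gm, Z in zip(g[1:], ops):
        y = y + gm * (Z @ x)
    return float(x @ y)

N = 8
ops = [dct2_sparse_operator(N, 1), dct2_sparse_operator(N, 2)]
x = np.random.randn(N)
cost = weighted_energy_proxy(x, ops, g=[1.0, -0.5, 0.2])
```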
5.3.4 Complexity Analysis

For a graph with N nodes and E edges, it has been shown in [16] that a degree-K PGF has O(KE) complexity. For an MPGF with R terms, denote by E' the maximum sparsity (number of nonzero entries) among all operators involved. Each term of the MPGF requires at most O(KE') operations, so the overall complexity of an MPGF is O(KRE'). We note that, for DTT filters, the sparsity of all operators introduced here is at most 2N; hence the complexities of PGF and MPGF reduce to O(KN) and O(KRN), respectively. Note that O(KRN) is not a tight upper bound when many MPGF terms have degree lower than K. In addition, the polynomial degree an MPGF requires to reach a given accuracy can be lower than that of a PGF; thus, an MPGF does not necessarily have higher complexity than a PGF of similar approximation accuracy. Indeed, MPGF implementations may be further optimized by parallelizing the computations associated with different graph operators.

5.4 Experiments

We consider two experiments to validate the proposed filter design approaches. In Section 5.4.1, we evaluate the complexity of PGFs and MPGFs for DCT-II, and compare the trade-off between complexity and filter approximation accuracy against conventional implementations in the DCT domain. In Section 5.4.2, we implement DTT filters in a state-of-the-art video encoder, AV1, where we obtain a computational speedup in the transform type search.

Figure 5.8: Runtime vs approximation error for (a)(c) the Tikhonov DCT filter and (b)(d) the bandpass exponential DCT filter, defined on two different graphs: (a)(b) a 16×16 grid, (c)(d) a length-64 line graph. PGF degrees K, numbers of MPGF operators R, and ARMA iteration numbers T are labeled in the figures.

5.4.1 Filter Approximation Accuracy with Respect to Complexity

In the first experiment, we implement several DCT filters on a 16×16 grid and on a length-64 line graph. The filters are implemented in C in order to fairly evaluate computational complexity in an environment close to hardware.

Comparison among filter implementations. First, the following filters are considered:

• Tikhonov filter: given z = x + n, a noisy observation of a signal x, the denoising problem can be formulated as the regularized least squares problem
\[
\operatorname*{minimize}_{\mathbf{x}}\ \|\mathbf{x} - \mathbf{z}\|^2 + \mu\,\mathbf{x}^\top\mathbf{L}\mathbf{x}.
\]
The solution is given by x̂ = H_t z, where H_t = (I + μL)^{-1} is known as the Tikhonov graph filter, with frequency response h_t(λ) = 1/(1+μλ). Applications of the Tikhonov filter in graph signal processing include signal denoising [114], classification [78], and inter-predicted video coding [143].

• Bandpass exponential filter: bandpass graph filters are key components in M-channel graph filter banks [122,123]. Here, we consider the frequency response
\[
h_{\exp}(\lambda) = \exp\big(-\gamma(\lambda - \lambda_{pb})^2\big),
\]
where γ is a decay factor and λ_pb is the central frequency of the passband.

For the choice of parameters, we use μ = 0.25, γ = 1, and λ_pb = 0.5λ_max in this experiment, characterizing a smooth Tikhonov filter and a bandpass filter with a symmetric frequency response, respectively. The following filter implementations are compared:

• Polynomial DCT filter: given the desired frequency response, two PGF coefficient design methods are considered. The first is the least squares design (PGF, LS) [105] with the iterative implementation described in Section 1.2.2. The second is the Chebyshev polynomial approach (PGF, Chebyshev) [115], where the PGF is implemented via the recurrence relations of Chebyshev polynomials.
• Multivariate polynomial DCT filter: we consider all sparse graph operators (289 operators for the 16×16 grid and 65 operators for the length-64 line graph). Then, we obtain the least squares filter (5.10) with an ℓ0 constraint and K = 1 using orthogonal matching pursuit, with R ranging from 2 to 8.

• Autoregressive moving average (ARMA) graph filter [51]: we consider an IIR graph filter in rational polynomial form, i.e.,

H_ARMA = ( Σ_{p=0}^{P} a_p Z^p )^(−1) ( Σ_{q=0}^{Q} b_q Z^q ).

We choose polynomial degrees Q = P = 2 and consider different numbers of iterations T. The graph filter implementation is based on the conjugate gradient approach described in [67], whose complexity is O((PT + Q)E).

• Exact filter with fast DCT: the filter operation is performed by a cascade of a forward DCT, a frequency masking with h, and an inverse DCT, where the forward and inverse DCTs are implemented using well-known fast algorithms [9]. For 4×4 or 16×16 grids, 2D separable DCTs are implemented, where a fast 1D DCT is applied to all rows and columns.

In the LS designs, uniform weights ρ = 1 are used. For each graph we consider, 20000 random input signals are generated. The complexity of each graph filter method is then evaluated as the average runtime over all 20000 trials. We measure the error between approximate and exact frequency responses with the root normalized mean square error ||h_approx − h|| / ||h||.

We show in Figure 5.8 the resulting runtimes and errors, where a point closer to the origin corresponds to a better trade-off between complexity and approximation accuracy. We observe in Figures 5.8(a)(c) that low-degree PGFs accurately approximate the Tikhonov filter, whose frequency response is closer to a linear function of λ. In Figures 5.8(b)(d), for the bandpass exponential filter on the length-64 line graph, MPGF achieves a higher accuracy with lower complexity than the PGF and ARMA graph filters.

As discussed in Section 5.3.4, the complexity of PGF and MPGF grows linearly with the graph size, while the fast DCT algorithm has O(N log N) complexity. Thus, PGF and MPGF achieve a better speed performance relative to the exact filter when the graph size is larger. Note that in this experiment, a fast algorithm with O(N log N) complexity for the GFT (DCT-II) is available. However, this is not always the case for an arbitrary graph size N, nor for other types of DTTs, where a fast exact graph filter may not be available.

Evaluation of minimax designs. Next, we consider an ideal low-pass filter,

h_LP(λ) = 1 for 0 ≤ λ ≤ λ_c, and 0 otherwise,

where λ_c = 0.5 λ_max is the cut-off frequency. The weight ρ_i is chosen to be 0 in the transition band 0.4 ≤ λ_i ≤ 0.6, and 1 in the passband and stopband.
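For reference, a minimal Python sketch of the least squares PGF design above ("PGF, LS") is given below: the polynomial coefficients are fitted to the target response at the graph frequencies (optionally with weights ρ), and the filter is applied with K repeated sparse matrix-vector products. This is our own illustration of the approach, not the C implementation used in the experiments.

import numpy as np

def pgf_ls_coeffs(lam, h_target, K, rho=None):
    # Weighted least squares fit of a degree-K polynomial to the target
    # frequency response h_target, sampled at the graph frequencies lam.
    V = np.vander(lam, K + 1, increasing=True)   # V[i, k] = lam_i ** k
    if rho is not None:
        w = np.sqrt(rho)
        V, h_target = V * w[:, None], h_target * w
    g, *_ = np.linalg.lstsq(V, h_target, rcond=None)
    return g                                     # h(lam) ~ sum_k g[k] lam^k

def apply_pgf(L, g, x):
    # y = sum_k g[k] L^k x, using only repeated (sparse) mat-vec products.
    y, Lkx = g[0] * x, x
    for k in range(1, len(g)):
        Lkx = L @ Lkx
        y = y + g[k] * Lkx
    return y

For the Tikhonov filter, for instance, one would pass h_target = 1/(1 + μ·lam).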
Figure 5.9 shows the resulting runtimes and approximation errors, where the error is measured as the maximum absolute difference between approximate and desired frequency responses in the passband and stopband:

max_i ρ_i |h_approx(λ_i) − h(λ_i)|.

Figure 5.9: Runtime vs maximum absolute error for various designs of the ideal low-pass filter on (a) a 16×16 grid, and (b) a length-64 line graph.

We can see in Figure 5.9 that, as K or R increases, the maximum absolute error decreases steadily for the PGF and MPGF designs with minimax criteria. In contrast, PGF and MPGF designs with the LS criterion may exhibit non-monotonic behavior in terms of the maximum absolute error, as in Figure 5.9(a). In fact, under the LS criterion, using more sparse operators will reduce the least squares error, but does not always decrease the maximum absolute error.

Based on the results in Figures 5.8 and 5.9, we provide some remarks on the choice of DTT filter implementation:

• If the desired frequency response is close to a linear function of λ, e.g., Tikhonov filters with a small μ or graph diffusion processes [116], then a low-order PGF is sufficiently accurate and has the lowest complexity.

• If the graph size is small, or the transform length allows a fast DTT algorithm, or separable DTTs are available (e.g., on a 16×16 grid), a DTT filter with a fast DTT implementation is favorable.

• For a sufficiently large length (e.g., N = 64) and a frequency response that is non-smooth (e.g., an ideal low-pass filter) or non-monotonic (e.g., a bandpass filter), an MPGF design may fit the desired filter with a reasonable speed performance. In particular, we note that Z^(2) is a bandpass filter with passband center λ_pb = λ_max/2. Thus, an MPGF using Z^(2) provides an efficiency improvement for bandpass filters with λ_pb close to λ_max/2.

• When robustness of the frequency response in the maximum absolute error sense is an important concern, a design based on the minimax criterion is preferable.

5.4.2 Transform Type Selection in Video Coding

In the second experiment, we consider the quadratic form (5.15) as a transform type cost, and apply the method described in Section 5.3.3 to speed up transform type selection in the AV1 codec [10]. In transform coding [41], (5.15) can be used as a proxy of the bitrate cost for block-wise transform type optimization [33, 50]. In particular, we denote by x an image or video block, and by Φ the orthogonal transform applied to x. A lower bitrate cost can be achieved if Φ gives high energy compaction in the low frequencies, i.e., if the energy of Φ^⊤x is concentrated in the first few entries. Thus, the proxy cost (5.15) can be defined with positive and increasing q (0 < q_1 < ··· < q_N) to penalize large high-frequency coefficients, thus favoring transforms that place more energy in the low frequencies.
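The effect of increasing weights can be seen in a small numerical example (a toy illustration of ours, not part of the AV1 experiment): for a smooth block, the DCT concentrates energy in the low frequencies, so its cost under an increasing q is smaller than the cost of the identity transform.

import numpy as np
from scipy.fft import dct

N = 8
x = np.linspace(1.0, 2.0, N)                     # a smooth 1D block
q = np.arange(1, N + 1, dtype=float)             # increasing weights
cost_dct = np.sum(q * dct(x, norm='ortho')**2)   # Phi = DCT
cost_idtx = np.sum(q * x**2)                     # Phi = I (no transform)
print(cost_dct < cost_idtx)                      # True: DCT is favored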
AV1 includes four 1D transforms: 1) U: DCT, 2) V: ADST, 3) JV: FLIPADST, which has flipped ADST functions, and 4) I: IDTX (identity transform), where no transform is applied. For small inter-predicted blocks, all 2D combinations of 1D transforms are used. Namely, there are 16 2D transform candidates (T_col, T_row) with T_col, T_row ∈ {U, V, JV, I}, which makes the encoder computationally expensive. Recent work on encoder complexity reduction includes [65, 74, 119], which apply heuristic and data-driven techniques to prune transform types during the search.

To speed up transform type selection in AV1, for a 1D pixel block x ∈ R^N we choose the following increasing weights for (5.15):

q_i = δ_i = 2 − 2 cos( (i − 1/2)π / N ). (5.19)

(As (5.15) serves as a proxy of the actual bitrate cost, we leave out the search for optimal weights. Weights are chosen to be increasing functions because transform coefficients associated with higher frequencies typically require more bits to encode. The weights in (5.19) are used because they make Q_V = x_i^⊤ L_A x_i exactly computable in the pixel domain. In fact, we have observed experimentally that different choices among several increasing weight profiles produce similar coding results.)

Then, different transform type costs are given by (5.15) with different Φ, i.e., C_T(x; q) with T ∈ {U, V, JV, I}. This choice allows efficient computation of the exact C_V(x; q) and C_JV(x; q) through their corresponding sparse Laplacian matrices:

C_V(x; q) = x^⊤ L_A x, C_JV(x; q) = x^⊤ J L_A J x,

where J L_A J is the left-right and up-down flipped version of L_A. For the approximation of the DCT cost C_U(x; q), we obtain R = 3 nonzero polynomial coefficients g_m with degree K = 1 as in (5.18) using an exhaustive search. As a result, the costs for all 1D transforms can be computed in the pixel domain as follows:

Q_U = x_i^⊤ ( g_0 I + Σ_{m=1}^{M} g_m Z^(m)_DCT-II ) x_i,
Q_V = C_V(x_i; q) = x_i^⊤ L_A x_i,
Q_JV = C_JV(x_i; q) = x_i^⊤ J L_A J x_i,
Q_I = C_I(x_i; q) = Σ_j w_j x_i(j)², (5.20)

where M is the number of DCT operators and the vector of g_m's has only R = 3 nonzero elements.

Extending the experiment PRUNE LAPLACIAN in [74], we implemented a new experiment named PRUNE OPERATORS in AV1 (implemented on a codebase version from July 2020; available at https://aomedia-review.googlesource.com/c/aom/+/113461). We implement integer versions of the transform cost evaluation (5.20) for transform lengths 4, 8, 16, and 32. Within each 2D block, we average over all columns or rows to obtain column and row costs Q^(col)_T and Q^(row)_T with T ∈ {U, V, JV, I}. These costs are aggregated into 16 2D transform costs by summing the associated column and row costs. For example, the cost associated with vertical ADST and horizontal DCT is given by Q^(V,U) = Q^(col)_V + Q^(row)_U. Finally, we design pruning criteria where each 2D transform is pruned if its associated cost is relatively large compared to the others:

C1. For T_col, T_row ∈ {U, V, JV}, prune (T_col, T_row) if

Q^(T_col,T_row) > τ_1 ( Q^(col)_U + Q^(col)_V + Q^(col)_JV + Q^(row)_U + Q^(row)_V + Q^(row)_JV ).

C2. For T_col = I or T_row = I, prune (T_col, T_row) if

Q^(T_col,T_row) > τ_2 ( Q^(col)_U + Q^(col)_V + Q^(col)_JV + Q^(col)_I + Q^(row)_U + Q^(row)_V + Q^(row)_JV + Q^(row)_I ),

where the threshold parameters are chosen as τ_1 = 0.34 and τ_2 = 0.33. Note that the number of 1D transforms being pruned can differ from block to block. The pruning rule C1 does not depend on Q_I because IDTX tends to have a larger bitrate cost with a significantly lower computational complexity than the other transforms. Thus, the more aggressive pruning criterion C1 is applied to U, V, and JV to reduce more encoding time.
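The rules C1 and C2 amount to simple threshold tests on the aggregated costs. A Python sketch is given below (our illustration; the actual AV1 implementation uses integer arithmetic and per-length tuned code):

TAU1, TAU2 = 0.34, 0.33

def surviving_transforms(col_cost, row_cost):
    # col_cost/row_cost: dicts mapping 'U', 'V', 'JV', 'I' to the averaged
    # 1D costs of (5.20). Returns the 2D candidates kept after C1/C2.
    s1 = sum(col_cost[t] + row_cost[t] for t in ('U', 'V', 'JV'))
    s2 = s1 + col_cost['I'] + row_cost['I']
    kept = []
    for tc in ('U', 'V', 'JV', 'I'):
        for tr in ('U', 'V', 'JV', 'I'):
            q2d = col_cost[tc] + row_cost[tr]
            if tc != 'I' and tr != 'I':
                if q2d <= TAU1 * s1:          # rule C1
                    kept.append((tc, tr))
            elif q2d <= TAU2 * s2:            # rule C2
                kept.append((tc, tr))
    return kept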
This pruning scheme is evaluated using 15 benchmark test sequences: akiyo, bowing, bus, city, crew, foreman, harbour, ice, mobile, news, pamphlet, paris, soccer, students, and waterfall. The results are shown in Table 5.3, where the speed improvement is measured as the percentage of encoding time relative to the scheme without any pruning. Each number in the table is an average over several target bitrate levels: 300, 600, 1000, 1500, 2000, 2500, and 3000 kbps.

Table 5.3: Encoding time and quality loss (in BD rate) of different transform pruning methods. The baseline is AV1 with a full transform search (no pruning). A smaller loss is better.

Method                  Encoding time   Quality loss
PRUNE LAPLACIAN [74]    91.71%          0.32%
PRUNE OPERATORS         89.05%          0.31%
PRUNE 2D FAST [119]     86.78%          0.05%

Note that the proposed method yields a smaller quality loss with shorter encoding time than our previous work [74]. Our method does not outperform the state-of-the-art method PRUNE 2D FAST in terms of average BD rate, but shows a gain on particular video sequences such as bowing, as shown in Table 5.4.

Table 5.4: Encoding time and quality loss (in BD rate) of PRUNE OPERATORS versus PRUNE 2D FAST. Smaller or negative loss is better.

Sequence    Encoding time   Quality loss
akiyo       102.10%          0.00%
bowing       97.22%         -0.14%
bus         103.92%         -0.17%
city        102.36%          0.18%
crew        103.65%          0.07%
foreman     104.29%          0.07%
harbour     106.49%         -0.06%
ice         105.22%          0.30%
mobile      103.27%          0.23%
news        103.29%         -0.09%
pamphlet     97.75%          0.21%
paris       105.54%          0.21%
soccer      104.53%          0.22%
students    100.71%          0.03%
waterfall   102.34%          0.23%
Overall     102.61%          0.26%

Note that in [119], for each supported block size (N×N, N×2N, and 2N×N, with N ∈ {4, 8, 16}), a specific neural network is required to obtain the scores, involving more than 5000 parameters to be learned in total. In contrast, our approach only requires the weights q to be determined for each transform length, i.e., 4 + 8 + 16 + 32 = 60 parameters. With or without optimized weights, our model is more interpretable than the neural-network-based model, as it has a significantly smaller number of parameters, whose meaning can be readily explained.

5.5 Summary

In this chapter, we explored discrete trigonometric transform (DTT) filtering approaches using sparse graph operators. First, we introduced fundamental graph operators associated with the 8 DCTs and 8 DSTs by exploiting trigonometric properties of their transform bases. We also showed that these sparse operators can be extended to 2D separable transforms involving 1D DTTs. Considering a weighted setting for frequency response approximation, we proposed least squares and minimax approaches for both polynomial graph filter (PGF) and multivariate polynomial graph filter (MPGF) designs. We demonstrated experimentally that PGF and MPGF designs provide a speedup compared to traditional DTT filters implemented in the transform domain. We also used MPGF to design a speedup technique for transform type selection in a video encoder, where a significant complexity reduction can be obtained.
Chapter 6

Irregularity-Aware GFT for Image and Video Coding

Mean square error (MSE) is commonly used as the quality metric in many image and video coding standards. However, since it is well known that MSE does not always reflect perceptual quality, it is important to incorporate a perceptually driven metric into the coding optimization process. Based on such a metric, it would be possible to formulate a bit allocation problem with the goal of spending more bits on image regions that are perceptually more sensitive to quantization error. In the literature, this problem is typically addressed by designing perceptual quantization strategies. For example, JPEG quantization tables can be designed based on the human visual system (HVS) [133], while JPEG-2000 adopts a visual masking technique [141] that exploits self-contrast masking and neighborhood masking, leading to adaptive quantization of wavelet coefficients without any overhead. Quantization parameter (QP) adjustment is a popular approach in video codecs such as HEVC [120], in which the QP is changed per block or per coding unit.

In this chapter, we propose a novel approach based on designing transforms with the goal of optimizing a weighted mean square error (WMSE), which allows us to adapt the perceptual quality pixel-wise instead of block-wise. We make use of irregularity-aware graph Fourier transforms (IAGFTs) [36], generalized GFTs in which orthogonality is defined with respect to an inner product such that the distance between a signal and its noisy version corresponds to a WMSE instead of the MSE. This leads to a generalized Parseval's theorem, in which the quantization error energy in the IAGFT transform domain equals the pixel domain WMSE. Based on the IAGFT, we design an image coding framework where perceptual quality is characterized by choosing suitable weights for the WMSE. Under this framework, the overall perceptual quality of an image can be enhanced by weighting different pixels differently based on their perceptual importance, while the quantization step size is fixed for the entire image. We consider a noise model under which the WMSE weights are chosen to maximize the structural similarity (SSIM) [135]. Besides WMSE weights driven by SSIM, other perceptual weights may also be used to optimize other metrics. We demonstrate experimentally the benefits of our framework by modifying a JPEG encoder to incorporate these novel transforms, showing coding gains in terms of multi-scale SSIM [137]. (Work in this chapter has been published in [76].)

In Section 6.1 we give a summary of the IAGFT. In Section 6.2, we propose a weight design for the IAGFT that favors improved SSIM. Some properties of IAGFT bases are discussed in Section 6.3. In Section 6.4, we demonstrate perceptually driven IAGFT through experimental results. Finally, we conclude this chapter in Section 6.5.

6.1 Irregularity-Aware Graph Fourier Transform (IAGFT)

The IAGFT [36] is a generalization of the GFT, where the graph Fourier modes (i.e., GFT basis functions) are determined not only by the signal variation operator L, but also by a positive definite matrix Q that leads to a Q-inner product [36], ⟨x, y⟩_Q = x^⊤ Q y, and therefore to a new definition of orthogonality: x is orthogonal to y if and only if x^⊤ Q y = 0. Typically, Q is chosen to be diagonal, so that the energy of a signal x is a weighted sum of its squared components: ||x||²_Q = Σ_{i∈V} q_i |x_i|², with Q = diag(q_1, ..., q_n). This notion of generalized energy leads to a generalized GFT, i.e., the IAGFT:
Definition 4 (Generalized graph Fourier modes). Given the Hilbert space defined by the Q-inner product and a graph variation operator L, the set of (L, Q)-graph Fourier modes is defined as the solution {u_k}_k to the following sequence of minimization problems: for increasing K ∈ {1, ..., N},

minimize_{u_K} u_K^⊤ L u_K subject to U_K^⊤ Q U_K = I, (6.1)

where U_K = (u_1, ..., u_K).

Definition 5 (Irregularity-aware GFT). Let U be the matrix of (L, Q)-graph Fourier modes; the (L, Q)-GFT is F = U^⊤ Q and its inverse is F^(−1) = U.

In fact, (6.1) can be written as a generalized Rayleigh quotient minimization, whose solution can be obtained efficiently through a generalized eigenvalue problem. Note that when Q = I, F reduces to the conventional GFT of Section 1.2.1. A key property of the IAGFT is the generalized Parseval's theorem:

⟨x, y⟩_Q = ⟨x̂, ŷ⟩_I, (6.2)

with x̂ and ŷ being the (L, Q)-GFTs of x and y, respectively. Depending on the application, various choices of Q may be used. Examples include diagonal matrices of node degrees and Voronoi cell areas (see [36] for details).

Figure 6.1: Flow diagram of the JPEG encoder.

6.2 Perceptual Coding with Weighted MSE

We focus on the weighted mean square error (WMSE) as an image quality metric, where different pixels are associated with different weights. First, in Section 6.2.1, we design a transform coding scheme that optimizes the WMSE, and analyze its error in the pixel domain. We then focus on the choice of perceptual-quality-inspired weights in Section 6.2.2.

6.2.1 Transform Coding with IAGFT

We define the WMSE with weights q ∈ R^n (or, in short, the q-MSE) between a distorted signal z ∈ R^n and its reference signal x ∈ R^n as

WMSE(z, x, q) := (1/n) Σ_{i=1}^{n} q_i (z_i − x_i)² = (1/n) ⟨z − x, z − x⟩_Q, (6.3)

where Q = diag(q). When q = 1, i.e., Q = I, the WMSE reduces to the conventional MSE. We note that the right-hand side of (6.3) is a Q-inner product, so the generalized Parseval's theorem gives

WMSE(z, x, q) = (1/n) ⟨ẑ − x̂, ẑ − x̂⟩_I = (1/n) Σ_{i=1}^{n} (ẑ_i − x̂_i)².

This means that minimizing the q-MSE is equivalent to minimizing the ℓ2 error energy in the IAGFT domain. Based on this fact, we propose an image coding scheme that integrates the IAGFT into the JPEG framework. The diagram is shown in Figure 6.1, where the values in Q are quantized and transmitted as signaling overhead so that the decoder can uniquely reconstruct the image. Further implementation details are described in Section 6.4.

Next, we provide a pixel domain error analysis under a uniform quantization noise assumption. Let ε_p and ε_t be the vectors of errors within a block in the pixel and IAGFT domains, respectively. Then, the variance of the i-th element of ε_p is

E[ε_p(i)²] = E[(e_i^⊤ U ε_t)²] = tr( U^⊤ e_i e_i^⊤ U · E[ε_t ε_t^⊤] ),

where e_i is the i-th standard basis vector. Denote the quantization step size for the j-th transform coefficient by Δ_j and model the quantization noise with a uniform distribution, ε_t(j) ~ Unif(−Δ_j/2, Δ_j/2). Thus, we have E[ε_t ε_t^⊤] = diag(Δ_1², ..., Δ_n²)/12. When a uniform quantizer with Δ_i = Δ is used for all i,

E[ε_p(i)²] = (Δ²/12) tr( U^⊤ e_i e_i^⊤ U ) = (Δ²/12) tr( e_i e_i^⊤ Q^(−1) ) = Δ²/(12 q_i), (6.4)

where we have used U U^⊤ = Q^(−1), which follows from the fact that U^⊤(QU) = I = (QU)U^⊤. With (6.4), the expected WMSE for this block is

E[WMSE(z, x, q)] = (1/n) Σ_{i=1}^{n} q_i E[ε_p(i)²] = Δ²/12,

which depends only on the quantization step.
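The equivalence between the pixel domain WMSE and the transform domain error energy can be verified numerically. The Python sketch below (an illustration under our own choice of a path-graph Laplacian and an arbitrary diagonal Q) computes the (L, Q)-GFT via the generalized eigenvalue problem and checks (6.2):

import numpy as np
from scipy.linalg import eigh

n = 8
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1.0                      # path-graph Laplacian
q = np.array([1.6, 1.6, 1.2, 1.0, 1.0, 0.8, 0.4, 0.4])
Q = np.diag(q)

# Generalized eigenproblem L u = lambda Q u; eigh returns U with
# U^T Q U = I, i.e., the (L, Q)-graph Fourier modes of Definition 4.
lam, U = eigh(L, Q)
F = U.T @ Q                                    # forward IAGFT (Definition 5)

x, y = np.random.randn(n), np.random.randn(n)
print(np.allclose(x @ Q @ y, (F @ x) @ (F @ y)))   # <x,y>_Q == <x_hat,y_hat>_I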
Note that the scheme shown in Figure 6.1 can be viewed as a bit allocation method. When q_j = 2 q_i for pixels i and j within a block, the quantization error of the IAGFT coefficients tends to contribute more error to pixel i than to pixel j in the pixel domain; implicitly, this indicates that more bits are spent to accurately encode pixel j. On the other hand, if Q_ℓ = 2 Q_k for blocks k and ℓ, it can be shown that the IAGFT coefficients satisfy x̂_ℓ = √2 x̂_k, meaning that the encoder tends to use more bits for block ℓ than for block k.

6.2.2 SSIM-Driven Weights for WMSE

In this work, we adopt the structural similarity (SSIM) as the target metric for perceptual quality, and design a WMSE to optimize it. (In fact, our proposed method can be applied to any WMSE. The application of this method based on other metrics, such as Video Multimethod Assessment Fusion (VMAF), is left for future work.) SSIM is one of the most popular image quality assessment metrics, with many experiments demonstrating its better alignment with perceptual visual quality compared to MSE [135]. For a distorted image z and its reference image x, SSIM is defined as

SSIM(x, z) = (1/n) Σ_{i=1}^{n} SSIM(x_i, z_i),
SSIM(x_i, z_i) = [ (2 μ_{x_i} μ_{z_i} + c_1) / (μ_{x_i}² + μ_{z_i}² + c_1) ] · [ (2 σ_{x_i z_i} + c_2) / (σ_{x_i}² + σ_{z_i}² + c_2) ], (6.5)

where μ_{x_i} and σ_{x_i}² are the local mean and variance around pixel i, and the summation is taken over all n pixels in the image. We write z = x + ε_p, and assume that x and ε_t are independent. Based on the statistics of ε_p derived in Section 6.2.1, we have μ_{z_i} = μ_{x_i}, σ_{x_i z_i} = σ_{x_i}², and σ_{z_i}² = σ_{x_i}² + Δ²/(12 q_i). Thus, the local SSIM in (6.5) reduces to

SSIM(x_i, z_i) = (2σ_{x_i}² + c_2) / (2σ_{x_i}² + c_2 + Δ²/(12 q_i)) = q_i / (q_i + γ_i),

where γ_i = Δ² / (12(2σ_{x_i}² + c_2)). To obtain the q that maximizes the SSIM, we introduce an upper bound on Σ_i q_i as a proxy of the bitrate constraint, and solve

maximize_q (1/n) Σ_{i=1}^{n} q_i/(q_i + γ_i) subject to Σ_{i=1}^{n} q_i ≤ n. (6.6)

It can be shown that this problem is convex in q. Using the Lagrangian cost function and the Karush-Kuhn-Tucker (KKT) conditions, we can obtain a closed-form solution:

q_i = ( n + Σ_{j=1}^{n} γ_j ) √γ_i / ( Σ_{j=1}^{n} √γ_j ) − γ_i. (6.7)

While q is a high dimensional vector (with dimension n, the number of pixels in the image), (6.7) provides an efficient way to obtain the optimal solution, which depends only on the quantization step and the local variances. The computation of all q_i can be carried out in O(n) time. Figure 6.2 shows the resulting q_i for different local variance values and a fixed quantization step size. The fact that q_i decreases with local variance means that larger weights are used for pixels in uniform or smooth regions of the image, which in turn results in higher quality in those regions.

Figure 6.2: Values of q_i with respect to local variance, with Δ = 8.
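The closed-form solution (6.7) is straightforward to implement; the Python sketch below computes the weights from a map of local variances (the SSIM constant c_2 = (0.03·255)² for 8-bit images is our assumption):

import numpy as np

def ssim_driven_weights(local_var, delta, c2=(0.03 * 255) ** 2):
    # Closed-form solution (6.7): gamma_i = Delta^2 / (12 (2 sigma_i^2 + c2)),
    # q_i = (n + sum(gamma)) sqrt(gamma_i) / sum(sqrt(gamma)) - gamma_i,
    # which satisfies the constraint sum(q_i) = n by construction.
    gamma = delta ** 2 / (12.0 * (2.0 * local_var + c2))
    sq = np.sqrt(gamma)
    n = local_var.size
    return (n + gamma.sum()) * sq / sq.sum() - gamma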
6.3 IAGFT Transform Bases

In this section, we provide some remarks on IAGFT basis functions. First, we consider the case Q = kI, where the IAGFT coefficients are √k times the DCT coefficients. In this case, applying a kI-IAGFT followed by uniform quantization with step size Δ is equivalent to applying a DCT followed by quantization with step size √k Δ. Therefore, this special case reduces to a block-wise quantization step adjustment scheme, as in related work such as [48]. This means that our scheme can be viewed as a generalization of quantization step adjustment methods, one that can adapt the quality per pixel, which is finer than per-block adaptation.

As a second example, we show the 2D Q_0-IAGFT basis in Figure 6.3(a), where Q_0 is the diagonal matrix associated with a 4×4 block with WMSE weights

1.6 1.6 1.6 1.6
1.6 1.6 1.6 0.4
1.6 0.4 0.4 0.4
0.4 0.4 0.4 0.4

Note that the pixels in the top left corner have larger weights, while the weights sum to 16 as in the I-IAGFT (i.e., the DCT). In Figures 6.3(b)(c) we show the Q_0-energy and the variation of each basis function, within the top-left region (q_i = 1.6) and within the bottom-right region (q_i = 0.4). By the definition of the IAGFT (Definition 5), the functions u_j with increasing j correspond to lower to higher variations u_j^⊤ L u_j, while having the same Q_0-energy u_j^⊤ Q_0 u_j. In Figure 6.3(b) we observe that the basis functions corresponding to low to medium frequencies (i.e., u_1 to u_9) have increasing variations for pixels in the top left region. In fact, the Q_0-energy of u_1 to u_9 is highly concentrated on pixels with a larger q. On the other hand, the basis functions associated with higher frequencies (u_11 to u_16) are localized in the lower right corner, where q_i = 0.4. This means that in the pixel domain, the energy in the area with large q_i is mostly captured by low frequency IAGFT coefficients. As a result, when the importance of pixel i is characterized by q_i, the IAGFT basis functions with lower frequencies tend to represent information on the more important pixels.

Figure 6.3: (a) Basis, (b) Q_0-energy, and (c) variations of Q_0-IAGFT modes. We use j to denote indices of basis functions. In (b) and (c), the two curves represent quantities for pixels with q_i = 1.6 and with q_i = 0.4, respectively.

6.4 Experiments

6.4.1 Image Coding Results on JPEG

To demonstrate the effectiveness of the proposed framework, we apply the coding scheme illustrated in Figure 6.4 within JPEG.

Figure 6.4: Flow diagram of our proposed scheme. Blocks highlighted in blue are new components.

Note that the non-uniform quantization table in the JPEG standard was designed based on perceptual criteria for DCT coefficients. We propose a similar non-uniform quantization for IAGFTs as follows. For an IAGFT with basis functions u_k, we find their unique representations in the DCT domain, denoted u_k = Σ_{i=1}^{n} φ_ki v_i with v_i the i-th DCT basis vector. Then, we choose the quantization step associated with u_k as a weighted mean: Δ_k = Σ_{i=1}^{n} |φ_ki| Δ_i, where Δ_i is the quantization step associated with v_i. In this way, when u_k has low frequency in the DCT domain, a small quantization step size is used, and vice versa.
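A sketch of this rule for a length-n 1D transform is shown below (our illustration; for the 8×8 non-separable case the same computation applies with vectorized 2D DCT basis vectors):

import numpy as np
from scipy.fft import dct

def iagft_quant_steps(U, dct_steps):
    # Delta_k = sum_i |phi_ki| Delta_i, where phi_ki are the coefficients of
    # the IAGFT basis vector u_k (k-th column of U) in the DCT basis.
    n = U.shape[0]
    D = dct(np.eye(n), norm='ortho', axis=0)   # forward DCT-II matrix
    Phi = D @ U                                # Phi[i, k] = <v_i, u_k>
    return np.abs(Phi).T @ np.asarray(dct_steps, dtype=float)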
The Laplacian of a uniform grid graph is used as the variation operator defining the IAGFT. The weights q for the WMSE are first obtained from (6.7), then quantized using entropy-constrained vector quantization (VQ) [35] with 10 codewords trained from the 8×8 blocks of q_i values in the house image. The resulting codewords are shown in Figure 6.5, and the signaling overhead for each codeword is based on its selection frequency during VQ training. In particular, the number of bits for each codeword is ⌈−log₂(p)⌉, where p is the empirical probability of that codeword obtained during training.

Figure 6.5: Vector quantization codewords of q_i for 8×8 blocks.

For each 8×8 block of the test images, we quantize the corresponding q to the closest codeword, and apply the associated 8×8 non-separable IAGFT. We assume that the transform bases and quantization tables for the IAGFTs represented by the codewords are embedded in the codec, so no eigendecomposition or side information for bases and quantizers is required. For illustration, Figure 6.6(b) shows the local variance map obtained as in the SSIM formula (6.5), and Figures 6.6(c)(d) show the resulting q_i obtained from (6.7) and the q_i quantized with VQ, respectively.

Figure 6.6: An example: (a) original image, (b) local variance map, (c) q_i map, and (d) quantized q_i map with vector quantization.

The RD performance in terms of PSNR and multi-scale SSIM (MS-SSIM) [137] is shown in Figure 6.7.

Figure 6.7: RD curves for the Airplane image in (a) PSNR and (b) MS-SSIM.

We observe that the proposed scheme leads to a loss in PSNR, while giving a compression gain in MS-SSIM. In Table 6.1, we show the BD rate performance in PSNR, SSIM, and MS-SSIM for several benchmark images. Comparing the "Non-uniform, DCT" and "Non-uniform, IAGFT" rows, we note that with non-uniform quantization, the IAGFT-based scheme yields an MS-SSIM gain over the DCT-based scheme (i.e., standard JPEG), even though (6.6) was derived under a uniform quantization assumption.

Table 6.1: Bit rate comparison with respect to the DCT-based scheme with uniform quantization table (i.e., "Uniform, DCT"). Negative numbers correspond to compression gains. Uniform and Non-uniform refer to uniform and non-uniform quantization tables, respectively.

                              Mandrill   Lena      Airplane   Sailboat
Uniform, IAGFT     PSNR       14.78%     17.64%    20.40%     17.69%
                   SSIM        1.28%      2.18%     3.81%      1.64%
                   MS-SSIM    -2.09%     -2.25%    -8.18%     -7.70%
Non-uniform, DCT   PSNR        8.42%      5.14%     8.90%      3.82%
                   SSIM       -6.23%     -5.22%    -9.00%    -10.64%
                   MS-SSIM   -27.82%    -17.05%   -17.76%    -30.27%
Non-uniform, IAGFT PSNR       22.64%     17.96%    26.14%     14.36%
                   SSIM       -2.05%     -1.99%    -2.82%     -6.19%
                   MS-SSIM   -28.15%    -17.74%   -21.28%    -31.74%

One probable reason for the higher gain in MS-SSIM than in SSIM is that the q_i's have been smoothed by the vector quantizer, so the quantized q_i's are no longer optimal for SSIM. Alternatively, such a smoothing procedure may favor the MS-SSIM metric, which takes into account the quality of smoothed and downscaled versions of the original output image. We also note that in this experiment, the side information accounts for 6% to 8% of the bit rate; further optimization of the signaling overhead may therefore lead to a coding gain in SSIM.

6.4.2 Subjective Comparison: An Example

Here, we show an example of output images using the DCT-based and IAGFT-based schemes in Figure 6.8. These image patches are chosen to have similar bitrates, but the IAGFT-encoded image has better quality in SSIM and MS-SSIM. We can compare the images in two regions, circled by blue (smooth region) and yellow (texture region) boxes. In the smooth region, the q_i values are larger, meaning that the IAGFT scheme spends more bits on this region and produces a smaller error. Indeed, fewer blocking artifacts are observed in the blue region of the IAGFT-encoded image compared to the same region of the DCT-encoded image. On the contrary, the q_i values in the texture region are smaller, so the IAGFT produces a larger error in terms of MSE compared to the same region of the DCT-encoded image. Perceptually, however, reconstruction error in a texture region is less visible than in a smooth region. This difference in perceptual sensitivity to distortion shows how assigning different weights q_i to different pixels leads to an improvement in human perceptual quality.
6.5 Conclusion

In this chapter, we considered the weighted mean square error (WMSE) as an image quality metric, generalizing the widely used MSE. By selecting proper weights, the WMSE offers high flexibility and enables us to characterize human perceptual quality better than the MSE. In order to optimize WMSE-based image quality, we proposed a novel image coding scheme using the irregularity-aware graph Fourier transform (IAGFT). Based on the generalized Parseval's theorem, we showed optimality in terms of WMSE when uniform quantization is performed in the IAGFT domain. We then designed weights to maximize the structural similarity (SSIM) index, where the weights are determined by the local image variances and the quantization step size. When integrated into the JPEG standard, our method with the associated SSIM-driven WMSE provides a compression gain in MS-SSIM. In the future, we will extend this method to schemes with non-uniform quantization and consider integration into existing video codecs.

Figure 6.8: Encoded image patches of (a) DCT and (b) IAGFT.

Chapter 7

Conclusion and Future Work

7.1 Summary of this Thesis

The primary goal of this thesis was to provide efficient graph-based approaches, including speedup techniques for the graph Fourier transform and efficient graph filtering. Among possible applications, we focused primarily on image and video compression, where the graph Fourier transform (GFT) and its variations (such as the lapped GFT and the irregularity-aware GFT) can be applied in transform coding.

First, in Chapter 2, we explored fast GFTs with Haar units as components. In particular, we derived algebraic conditions on a graph under which Haar units emerge in its GFT. These conditions indicate that butterfly stages of Haar units arise when the graph has symmetry or bipartition properties. We then defined a notion of graph symmetry such that a graph with this symmetry property is guaranteed to have a fast GFT algorithm. We presented an approach to derive divide-and-conquer fast GFT algorithms, and showed a number of examples with potential fields of application. Our fast GFT approach is suited for graphs with regular or nearly regular topologies, uniformly weighted graphs, or graphs that are symmetric by construction. Experimental results showed that the GFT runtime can be reduced significantly using the proposed fast GFTs.

In Chapter 3, we applied efficient graph-based transforms to video coding in several aspects. We formulated a general convex problem for the associated graph learning problem. Under this framework, we considered the so-called symmetric line graph transforms (SLGTs), which are 1D transforms with a butterfly stage. We then applied the learning method to non-separable transforms, i.e., GFTs on grid graphs, and proposed an approximate fast GFT solution for the case where the butterfly stages are not known ahead of time. Experimental results showed that, by learning the parameters of the symmetric line graph, we obtain useful SLGTs that give a compression gain for inter residual blocks. In addition, we can also obtain non-separable GFTs to approximate KLTs, which, when applied to intra residual blocks with certain modes, improve compression efficiency.

In addition to block-based GFTs, we also extended classical lapped transforms (transforms on overlapping blocks) to a graph-based setting in Chapter 4. Perfect reconstruction and orthogonality properties were re-examined for lapped transforms defined on graphs.
With a focus on line graphs, we proposed the design of a lapped graph Fourier transform, which leads to a nearly optimal transform coding gain and can be applied to piecewise smooth image compression. We showed experimentally that the proposed lapped transform outperforms the DCT and the conventional lapped orthogonal transform.

In Chapter 5, we investigated DTT (DCT and DST) filters from a graph-based perspective. We showed that all types of DCTs and DSTs are associated with a family of sparse graph operators, which are DTT filters themselves. These operators allow us to design efficient approximations of DTT filters with arbitrary frequency responses using 1) polynomial graph filters (PGFs) and 2) multivariate polynomial graph filters (MPGFs). We introduced design approaches for both PGF and MPGF with least squares and minimax criteria to obtain efficient DTT filters. In experiments implemented in C, the derived DCT and DST filters yield a speedup for popular graph filters (the Tikhonov filter, the bandpass exponential filter, and the ideal low-pass filter) when the graph size is sufficiently large. Our method can also be applied to rate-distortion optimization for transform type search, yielding a significant speedup in the AV1 codec.

Finally, in Chapter 6, we studied the application of the irregularity-aware GFT (IAGFT) to perceptual coding. In particular, we pointed out that the IAGFT has an orthogonality property with respect to a weighted mean square error (WMSE), which can be used to characterize perceptual image quality. Given a certain WMSE, the generalized Parseval's theorem implies that the minimum WMSE distortion is achieved when uniform quantization is applied in the IAGFT domain. Based on these properties, we designed WMSE weights aligned with the well-known structural similarity (SSIM) index for perceptual image quality. For validation, a coding scheme was built on top of the JPEG encoder, where the IAGFT with the SSIM-driven weights is applied instead of the traditional DCT. Our preliminary results showed a promising coding gain in MS-SSIM.

7.2 Future Work

For future work, we consider the following issues and extensions:

1. Lapped GFTs for more general graphs and applications. In Section 4.2.2, we demonstrated some results on piecewise smooth images. In fact, the definition of lapped transforms on graphs applies to general graph topologies. Thus, we would like to explore a wider range of applications, and with different types of graphs. For example, image filtering can be an application, because conventional graph-based filtering cannot easily be applied without partitioning the image, especially when the number of pixels is large.

2. Extensions of the IAGFT for perceptual video coding. With the generalized Parseval's theorem, we can design a variety of useful transforms based on different WMSEs. Choosing weights that lead to perceptual quality enhancement, or to other video coding benefits, is an interesting open problem. For example, larger weights can be assigned to pixels that are used to predict many other pixels, to achieve higher prediction accuracy. Another extension would be to explore efficient methods for reducing the signaling overhead. For example, the WMSE weights may be derived from information available to both the encoder and decoder, so that no extra bits are required.

Appendix A

Proofs of Theorems and Lemmas

A.1 Proof of Theorem 2

In the following, we will repeatedly use (2.15) and (2.16).
Also recall that, in (2.11),

L_+ = [ L_XX + L_XY J    √2 L_XZ
        √2 L_XZ^⊤        L_ZZ ],  (A.1)

L_− = L_YY − J L_XY. (A.2)

From the block partition structure (2.6), we also have

(L_XX)_{i,j} = l_{i,j}, (L_XZ)_{i,j} = l_{i,p+j}, (L_XY)_{i,j} = l_{i,n−p+j}, (L_ZZ)_{i,j} = l_{p+i,p+j},
(L_YY)_{i,j} = l_{n−p+i,n−p+j}, (J L_XY)_{i,j} = l_{p+1−i,n−p+j}, (L_XY J)_{i,j} = l_{i,n+1−j}.

These changes of indices will be used in the derivations below.

• Edges of G_+. With i, j ∈ V_+ = {1, ..., n−p} and i ≠ j, we discuss three cases separately, all based on (A.1). If i, j ∈ V_X = {1, ..., p}, then

w^+_{i,j} = −(L_XX + L_XY J)_{i,j} = −(l_{i,j} + l_{i,n+1−j}) = w_{i,j} + w_{i,n+1−j}.

If i ∈ V_X = {1, ..., p} and j ∈ V_Z = {p+1, ..., n−p}, then

w^+_{i,j} = −(√2 L_XZ)_{i,j−p} = −√2 l_{i,j} = √2 w_{i,j},

and the same holds for i ∈ V_Z and j ∈ V_X. If i, j ∈ V_Z = {p+1, ..., n−p}, then

w^+_{i,j} = −(L_ZZ)_{i−p,j−p} = −l_{i,j} = w_{i,j}.

• Self-loops of G_+. To express s^+_i, we discuss the cases i ∈ V_X and i ∈ V_Z separately. If i ∈ V_X = {1, ..., p}, then from (A.1) we have

s^+_i = Σ_{j=1}^{p} (L_XX + L_XY J)_{i,j} + Σ_{j=1}^{n−2p} (√2 L_XZ)_{i,j}
      = Σ_{j=1}^{p} (l_{i,j} + l_{i,n+1−j}) + √2 Σ_{j=1}^{n−2p} l_{i,p+j}
      = s_i + Σ_{j≠i} w_{i,j} − Σ_{j=1, j≠i}^{p} w_{i,j} − Σ_{j=1}^{p} w_{i,n+1−j} − √2 Σ_{j=1}^{n−2p} w_{i,p+j}
      = s_i − (√2 − 1) Σ_{j=p+1}^{n−p} w_{i,j},

where we have used l_{i,i} = s_i + Σ_{j≠i} w_{i,j} and l_{i,j} = −w_{i,j} for i ≠ j. If i ∈ V_Z = {p+1, ..., n−p}, then (A.1) gives

s^+_i = Σ_{j=1}^{n−2p} (L_ZZ)_{i−p,j} + √2 Σ_{j=1}^{p} (L_XZ^⊤)_{i−p,j}
      = Σ_{j=1}^{n−2p} l_{i,p+j} + √2 Σ_{j=1}^{p} l_{j,i}
      = s_i + Σ_{j≠i} w_{i,j} − Σ_{j=p+1, j≠i}^{n−p} w_{i,j} − √2 Σ_{j=1}^{p} w_{i,j}
      = s_i + Σ_{j=1}^{p} w_{i,j} + Σ_{j=n−p+1}^{n} w_{i,j} − √2 Σ_{j=1}^{p} w_{i,j}
      = s_i + (2 − √2) Σ_{j=1}^{p} w_{i,j}.

• Edges of G_−. With i, j ∈ V_Y = {n−p+1, ..., n} and i ≠ j, from (A.2) we have

w^−_{i,j} = −(L_YY − J L_XY)_{i−n+p,j−n+p} = −(l_{i,j} − l_{n+1−i,j}) = w_{i,j} − w_{n+1−i,j}.

• Self-loops of G_−. Here, i ∈ V_Y = {n−p+1, ..., n}, so, by (A.2),

s^−_i = Σ_{j=1}^{p} (L_YY − J L_XY)_{i−n+p,j}
      = Σ_{j=1}^{p} (l_{i,n−p+j} − l_{n+1−i,n−p+j})
      = s_i + Σ_{j≠i} w_{i,j} − Σ_{j=n−p+1, j≠i}^{n} w_{i,j} + Σ_{j=1}^{p} w_{i,p+1−j}
      = s_i + 2 Σ_{j=1}^{p} w_{i,j} + Σ_{j=p+1}^{n−p} w_{i,j}.

The results derived above apply to the case φ = (n, n−1, ..., 1). For an arbitrary φ, we simply relabel the sub-indices according to φ. Thus, the results above can be written as in Lemma 2.

A.2 Derivations for Sparse Operators of DST-IV, DST-VII, and DCT-V

A.2.1 Sparse DST-IV Operators

Recall the definition of the DST-IV functions in (1.11):

φ_j(k) = v_j(k) = √(2/N) sin( (j − 1/2)(k − 1/2)π / N ).

As in Section 5.2.1, we can obtain

v_j(p−ℓ) + v_j(p+ℓ) = √(2/N) [ sin( (j−1/2)(p−ℓ−1/2)π/N ) + sin( (j−1/2)(p+ℓ−1/2)π/N ) ]
                    = 2 √(2/N) sin( (j−1/2)(p−1/2)π/N ) cos( ℓ(j−1/2)π/N )
                    = 2 cos( ℓ(j−1/2)π/N ) v_j(p),

where we have applied the trigonometric identity

sin α + sin β = 2 sin((α+β)/2) cos((α−β)/2). (A.3)

By the left and right boundary conditions of the DST-IV, we have

v_j(p−ℓ) = −v_j(−p+ℓ+1), v_j(p+ℓ) = v_j(−p−ℓ+2N+1),

which gives the following result:

Proposition 3. For ℓ = 1, ..., N−1, we define Z^(ℓ)_DST-IV as an N×N matrix whose p-th row has only two nonzero elements, specified as follows:

(Z^(ℓ)_DST-IV)_{p,q1} = 1 with q1 = p−ℓ, if p−ℓ ≥ 1; −1 with q1 = −p+ℓ+1, otherwise;
(Z^(ℓ)_DST-IV)_{p,q2} = 1, with q2 = p+ℓ if p+ℓ ≤ N, and q2 = −p−ℓ+2N+1 otherwise.

The corresponding eigenvalues are λ_j = 2 cos( ℓ(j−1/2)π/N ).
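Proposition 3 is easy to check numerically; the Python sketch below builds Z^(ℓ)_DST-IV and verifies the eigenvalue relation (our illustration, using dense matrices for clarity):

import numpy as np

def dst_iv_basis(N):
    k = np.arange(1, N + 1).reshape(-1, 1)     # sample index
    j = np.arange(1, N + 1).reshape(1, -1)     # frequency index
    return np.sqrt(2.0 / N) * np.sin((j - 0.5) * (k - 0.5) * np.pi / N)

def dst_iv_operator(N, ell):
    # Sparse operator of Proposition 3 (built densely for clarity).
    Z = np.zeros((N, N))
    for p in range(1, N + 1):
        if p - ell >= 1:
            Z[p - 1, p - ell - 1] += 1.0
        else:
            Z[p - 1, (-p + ell + 1) - 1] += -1.0
        q2 = p + ell if p + ell <= N else -p - ell + 2 * N + 1
        Z[p - 1, q2 - 1] += 1.0
    return Z

N, ell = 8, 3
Phi = dst_iv_basis(N)
lam = 2.0 * np.cos(ell * (np.arange(1, N + 1) - 0.5) * np.pi / N)
print(np.allclose(dst_iv_operator(N, ell) @ Phi, Phi * lam))   # True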
A.2.2 Sparse DST-VII Operators

Now, we consider the basis functions of the DST-VII:

φ_j(k) = (2/√(2N+1)) sin( (j − 1/2) k π / (N + 1/2) ).

Then, by (A.3) we have

φ_j(p−ℓ) + φ_j(p+ℓ) = (2/√(2N+1)) [ sin( (j−1/2)(p−ℓ)π/(N+1/2) ) + sin( (j−1/2)(p+ℓ)π/(N+1/2) ) ]
                    = (2/√(2N+1)) · 2 sin( (j−1/2)pπ/(N+1/2) ) cos( ℓ(j−1/2)π/(N+1/2) )
                    = 2 cos( ℓ(j−1/2)π/(N+1/2) ) φ_j(p). (A.4)

The left boundary condition of the DST-VII (i.e., φ_j(k) = −φ_j(−k)) corresponds to φ_j(p−ℓ) = −φ_j(−p+ℓ). Together with the right boundary condition φ_j(p+ℓ) = φ_j(−p−ℓ+2N+1), we have the following proposition.

Proposition 4. For ℓ = 1, ..., N−1, we define Z^(ℓ)_DST-VII as an N×N matrix whose p-th row has at most two nonzero elements, specified as follows:

(Z^(ℓ)_DST-VII)_{p,q1} = 1 with q1 = p−ℓ, if p > ℓ; −1 with q1 = −p+ℓ, if p < ℓ;
(Z^(ℓ)_DST-VII)_{p,q2} = 1, with q2 = p+ℓ if p+ℓ ≤ N, and q2 = −p−ℓ+2N+1 otherwise.

The corresponding eigenvalues are λ_j = 2 cos( ℓ(j−1/2)π/(N+1/2) ).

In Proposition 4, note that the ℓ-th row has only one nonzero element: when p = ℓ, φ_j(p−ℓ) = 0, and (A.4) reduces to

φ_j(p+ℓ) = 2 cos( ℓ(j−1/2)π/(N+1/2) ) φ_j(p).

A.2.3 Sparse DCT-V Operators

Here, the φ_j are defined as the DCT-V basis functions

φ_j(k) = (2/√(2N−1)) c_j c_k cos( (j−1)(k−1)π/(N−1/2) ).

Note that c_k = 1/√2 for k = 1 and c_k = 1 otherwise. For the trigonometric identity (5.7) to be applied, we introduce a scaling factor such that b_k c_k = 1 for all k:

b_k = √2 for k = 1, and b_k = 1 otherwise.

In this way, by (5.7) we have

b_{p−ℓ} φ_j(p−ℓ) + b_{p+ℓ} φ_j(p+ℓ)
  = (2/√(2N−1)) c_j [ cos( (j−1)(p−ℓ−1)π/(N−1/2) ) + cos( (j−1)(p+ℓ−1)π/(N−1/2) ) ]
  = (2/√(2N−1)) c_j · 2 cos( (j−1)(p−1)π/(N−1/2) ) cos( ℓ(j−1)π/(N−1/2) )
  = 2 cos( ℓ(j−1)π/(N−1/2) ) b_p φ_j(p),

so this eigenvalue equation can be written as

(b_{p−ℓ}/b_p) φ_j(p−ℓ) + (b_{p+ℓ}/b_p) φ_j(p+ℓ) = 2 cos( ℓ(j−1)π/(N−1/2) ) φ_j(p). (A.5)

The left boundary condition of the DCT-V corresponds to φ_j(p−ℓ) = φ_j(−p+ℓ+2), and the right boundary condition gives φ_j(p+ℓ) = φ_j(−p−ℓ+2N+1). Thus, (A.5) yields the following proposition:

Proposition 5. For ℓ = 1, ..., N−1, we define Z^(ℓ)_DCT-V as an N×N matrix whose p-th row has at most two nonzero elements, specified as follows:

(Z^(ℓ)_DCT-V)_{p,q1} = √2 with q1 = 1, if p−ℓ = 1; 1 with q1 = p−ℓ, if p−ℓ > 1; 1 with q1 = −p+ℓ+2, if p−ℓ ≤ 0 and p ≠ 1;
(Z^(ℓ)_DCT-V)_{p,q2} = √2 with q2 = ℓ+1, if p = 1; 1 with q2 = p+ℓ, if p ≠ 1 and p+ℓ ≤ N; 1 with q2 = −p−ℓ+2N+1, otherwise.

The corresponding eigenvalues are λ_j = 2 cos( ℓ(j−1)π/(N−1/2) ).

The values √2 in Proposition 5 arise from b_{p−ℓ}/b_p and b_{p+ℓ}/b_p on the left-hand side of (A.5). In particular, when p = ℓ+1 we have b_{p−ℓ}/b_p = √2 and b_{p+ℓ}/b_p = 1, so (A.5) gives

√2 φ_j(p−ℓ) + φ_j(p+ℓ) = 2 cos( ℓ(j−1)π/(N−1/2) ) φ_j(p).

When p = 1, b_{p−ℓ}/b_p = b_{p+ℓ}/b_p = 1/√2. In addition, by the left boundary condition, φ_j(p−ℓ) = φ_j(p+ℓ), so (A.5) reduces to

√2 φ_j(p+ℓ) = 2 cos( ℓ(j−1)π/(N−1/2) ) φ_j(p), for p = 1,

meaning that the first row of Z^(ℓ)_DCT-V has only one nonzero element.

A.3 Proof of Theorem 3

Denote the objective function of the CGL estimation problem (3.4) by J(L). It has been shown in [25] that

J(L) := −log|L|† + trace(LS) + α||L||_{1,off} = −log|L|† + trace(LK), (A.6)

where K = S + α(I − 11^⊤). The derivation from the objective in (3.4) to (A.6) is based on the fact that ||L||_{1,off} = trace(L(I − 11^⊤)) when L is a CGL.

First, we consider the oriented incidence matrix of a graph, Ξ = (ξ_1, ..., ξ_m) ∈ R^{n×m}. Each of its columns ξ_j has two nonzero elements, 1 and −1, representing an edge ε_j connecting nodes s_j and t_j:

ξ_j = e_{s_j} − e_{t_j} for ε_j = (s_j, t_j) ∈ E,

where e_i is the i-th vector of the standard basis, that is, e_i(i) = 1 and e_i(j) = 0 for j ≠ i.
We can express L in terms of the edge weights and Ξ:

L = Σ_{j=1}^{m} w_{s_j,t_j} ξ_j ξ_j^⊤ = Ξ · diag(w_{s_1,t_1}, ..., w_{s_m,t_m}) · Ξ^⊤, (A.7)

where diag(w_{s_1,t_1}, ..., w_{s_m,t_m}) is the m×m diagonal matrix formed by the w_{s_i,t_i}. Note that, as ε_j is undirected, either ξ_j = e_{s_j} − e_{t_j} or ξ_j = e_{t_j} − e_{s_j} is allowed ((A.7) holds either way, and the choice does not affect the following results). For simplicity, we define u_j := w_{s_j,t_j} to denote weights with a single subscript, and collect them in a weight vector u = (u_1, ..., u_m)^⊤ for compact notation.

Next, we note that |L|† can be simplified using Kirchhoff's matrix tree theorem:

Theorem 4 (Matrix Tree Theorem [56]). For the CGL L of a connected graph G, we have

|L|† = n Σ_{T⊆P} Π_{ε∈T} w_ε,

where P is the set of all spanning trees of G and w_ε is the weight of edge ε.

By the assumption that G is a tree, the only spanning tree of G is G itself. Thus, Theorem 4 implies that

|L|† = n Π_{j=1}^{m} u_j. (A.8)

Now, we can reduce (A.6) to

J(u) = −log( n Π_{j=1}^{m} u_j ) + trace( Σ_{j=1}^{m} u_j ξ_j ξ_j^⊤ K )
     = −log(n) − Σ_{j=1}^{m} log(u_j) + Σ_{j=1}^{m} u_j (ξ_j^⊤ K ξ_j),

which is a convex function of u. Setting its derivative with respect to u_j to zero, we obtain the optimal weights u_j = 1/(ξ_j^⊤ K ξ_j). Note that, with K = S + α(I − 11^⊤) and S = Σ_{i=1}^{N} x_i x_i^⊤ / N, the weights can be expressed in terms of the x_i:

w_{s,t} = [ (e_s − e_t)^⊤ K (e_s − e_t) ]^(−1) = (k_{s,s} + k_{t,t} − 2k_{s,t})^(−1)
        = [ (1/N) Σ_{i=1}^{N} (x_i(s) − x_i(t))² + 2α ]^(−1), for (s, t) ∈ E,

where we define δ_{s,t} := (1/N) Σ_{i=1}^{N} (x_i(s) − x_i(t))². This value is always positive given α > 0, so u ≥ 0 is satisfied, meaning that the derived form is indeed the closed-form solution of (3.4). Note that the quantity δ_{s,t} is the mean square difference between the s-th and t-th elements over all realizations x_i.
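The closed-form solution above is simple to compute; the Python sketch below (our illustration, with hypothetical variable names) returns the tree CGL weights directly from the data:

import numpy as np

def tree_cgl_weights(X, edges, alpha):
    # Closed-form weights from the proof of Theorem 3:
    # w_{s,t} = 1 / (delta_{s,t} + 2 alpha), with delta_{s,t} the mean
    # square difference between entries s and t over the N realizations.
    # X: n x N data matrix whose columns are the realizations x_i;
    # edges: list of (s, t) pairs (0-indexed) forming a tree.
    N = X.shape[1]
    return {(s, t): 1.0 / (np.sum((X[s] - X[t]) ** 2) / N + 2.0 * alpha)
            for s, t in edges}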
Appendix B

Search of Involutions in General Graphs

In this appendix, we introduce two algorithms for searching valid involutions in a general graph. The purpose of these algorithms is to identify symmetry properties of a given graph that lead to the fast graph Fourier transform (GFT) implementations discussed in Chapter 2. (Work in this appendix is included in a technical report that can be found in [69].)

As in Section 1.2, we denote by G a graph with vertex set V of n nodes, edge set E, and weighted adjacency matrix W. Weighted self-loops are allowed. The degrees of the graph nodes are denoted d_1, d_2, ..., d_n. Recall from Definition 1 that a permutation on the set V is called an involution if φ(φ(i)) = i for all i ∈ V. For V = {1, ..., n}, we write an involution φ: V → V as φ = (φ(1), φ(2), ..., φ(n)). As in Definition 2, a graph G is called φ-symmetric if w_{i,j} = w_{φ(i),φ(j)} for all i ∈ V, j ∈ V. We say that φ is a valid involution for the graph G if G is φ-symmetric.

Essentially, finding all valid involutions of a graph is a combinatorial problem, since the number of involutions T(n), also known as the telephone number (2.19), grows asymptotically faster than any polynomial in n [58] (T(n) for n from 1 to 12 is shown in Table B.1). An exhaustive search over these T(n) candidates, as described in Algorithm 2, has exponential (or higher) complexity. However, given knowledge of the graph structure and associated properties (e.g., when the graph is a tree), we can exploit useful strategies to prune the brute-force search, or even design an algorithm of polynomial complexity. In what follows, we present two such methods, in Sections B.1 and B.2, respectively. The first method takes advantage of the degree list, and prunes the exhaustive search based on the degrees of the nodes; the intuition is that two nodes paired by an involution φ in a φ-symmetric graph must have the same degree, which provides a criterion for discarding invalid involutions during the search. The second approach is designed specifically for trees, a special case where O(n log n) complexity (or O(n) complexity, when an O(n) sort is used) can be achieved; it relies on the property that the involutions of a tree can be characterized by pair(s) of identical tree branches whose roots are either common or adjacent, which enables us to search for involutions by comparing outgoing branches at each tree node, leading to a linear complexity.

Algorithm 2 Exhaustive search of graph symmetry
1: procedure Exhaustive(G = (V, E, W))
2:   Π ← ∅
3:   T ← set of all involutions on V
4:   while φ ∈ T do                           ▷ Loop over all involutions
5:     if w_{i,j} = w_{φ(i),φ(j)} for all i, j ∈ V then
6:       Π ← Π ∪ {φ}
7:     end if
8:   end while
9:   return Π                                 ▷ Set of all valid involutions
10: end procedure

Table B.1: Telephone numbers T(n) for n from 1 to 12

n      1  2  3  4   5   6   7    8    9     10    11     12
T(n)   1  2  4  10  26  76  232  764  2620  9496  35696  140152

B.1 Pruning Based on Degree Lists

From Definition 2, we obtain a necessary condition for nodes i and φ(i) to be paired: they must have the same weighted degree (the sum of the weights of the edges adjacent to a node),

d_i = Σ_{(i,j)∈E} w_{i,j} = Σ_{(i,j)∈E} w_{φ(i),φ(j)} = Σ_{(φ(i),φ(j))∈E} w_{φ(i),φ(j)} = d_{φ(i)}. (B.1)

In addition, we can also consider the unweighted degree (the number of neighbors of a node), denoted f_i for node i. Similar to (B.1), we have

f_i = Σ_{(i,j)∈E} 1(w_{i,j}) = Σ_{(i,j)∈E} 1(w_{φ(i),φ(j)}) = Σ_{(φ(i),φ(j))∈E} 1(w_{φ(i),φ(j)}) = f_{φ(i)}, (B.2)

where 1 is the indicator function, with 1(w) = 1 if w ≠ 0 and 1(w) = 0 otherwise.

Based on the two conditions (B.1) and (B.2), we obtain a useful pruning rule for the exhaustive search: only search those involutions in which all paired nodes have the same degrees and numbers of neighbors. To apply this criterion, we first partition the vertex set V into subsets {V_{d,f}}, each containing the nodes that share common weighted and unweighted degrees (d and f); a short sketch of this partition step is given after Algorithm 3. Then, for each vertex subset V_{d,f}, we list all T(|V_{d,f}|) possible involutions on it. Finally, we search over all combinations of the listed involutions across vertex subsets. Note that the number of such combinations is Π_{d,f} T(|V_{d,f}|), which is typically significantly smaller than T(n) for graphs that are not regular (i.e., whose node degrees are not all equal).

Algorithm 3 Truncated search of graph symmetry based on degree lists
1: procedure Truncated(G = (V, E, W))
2:   Compute the weighted degree list d = (d_1, ..., d_n)^⊤ with d_i = Σ_{j=1}^n w_{i,j}
3:   Compute the unweighted degree list f = (f_1, ..., f_n)^⊤ with f_i = Σ_{j=1}^n 1(w_{i,j})
4:   S ← GetTruncatedList(d, f)
5:   Π ← ∅
6:   while φ ∈ S do                           ▷ Loop over all involutions that are not pruned
7:     if w_{i,j} = w_{φ(i),φ(j)} for all i, j ∈ V then
8:       Π ← Π ∪ {φ}
9:     end if
10:  end while
11:  return Π                                 ▷ Set of all valid involutions
12: end procedure
13: procedure GetTruncatedList(d, f)
14:   V_{d,f} ← ∅ for all d and f
15:   for i = 1, ..., n do
16:     V_{d_i,f_i} ← V_{d_i,f_i} ∪ {i}
17:   end for
18:   for V_{d,f} ≠ ∅ do                       ▷ Loop over all V_{d,f}
19:     S_{d,f} ← set of all involutions on V_{d,f}
20:   end for
21:   S ← ⊗_{d,f} S_{d,f}                      ▷ Direct product of all involution sets
22:   return S                                ▷ Set of involutions that survive pruning
23: end procedure
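The partition step of Algorithm 3 can be sketched in a few lines of Python (our illustration; degrees are rounded so that they can serve as dictionary keys):

import numpy as np
from collections import defaultdict

def degree_partition(W, decimals=9):
    # Partition nodes by (weighted degree, number of neighbors), following
    # conditions (B.1) and (B.2); only involutions pairing nodes within the
    # same class V_{d,f} need to be enumerated.
    W = np.asarray(W, dtype=float)
    d = np.round(W.sum(axis=1), decimals)      # weighted degrees
    f = (W != 0).sum(axis=1)                   # numbers of neighbors
    classes = defaultdict(list)
    for i in range(W.shape[0]):
        classes[(d[i], int(f[i]))].append(i)
    return dict(classes)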
An example is shown in Figure B.1. The graph nodes have weighted degrees d ∈ {1, 2, 3, 4, 5, 7} and unweighted degrees f ∈ {1, 2, 3, 4}. As shown in Figure B.1(c), there are 8 combinations of (d, f), so the vertex set is partitioned into 8 subsets with sizes 2, 3, 2, 1, 1, 1, 1, and 1. Each node in a class of size 1 must be mapped to itself by every valid involution, which reduces the dimension of the search. On the other hand, the classes V_{1,1}, V_{2,1}, and V_{2,2} have 2, 3, and 2 elements, respectively, so there are T(2) = 2, T(3) = 4, and T(2) = 2 candidate involutions to be searched on them. As a result, the total number of involutions to be searched is reduced from T(12) = 140152 to T(2)² T(3) = 2 × 2 × 4 = 16.

Figure B.1: An example of pruning based on degree lists. (a) An example graph, (b) its weighted and unweighted degree lists, and (c) its vertex partitions and the involutions on them.

To summarize, the procedure based on this pruning criterion is given in Algorithm 3; a MATLAB implementation can be found in [69].

B.1.1 Complexity Analysis

Let the number of nodes be n and the number of edges be m. We note that, for a given involution φ, checking whether it is valid (lines 5 and 6 in Algorithm 2) requires O(m) time. This means that Algorithm 2 has an overall complexity of O(m T(n)).

To analyze the complexity of Algorithm 3, we break it into several parts:

• Retrieval of the degree lists (lines 2 and 3). This can be done by accumulating all edge weights, so the complexity is O(m) for each of d and f.

• Obtaining the truncated list of involutions (lines 13-23). This requires scanning through all nodes (lines 15-17), visiting all possible involutions for each V_{d,f} (lines 18-20), and taking the direct product of all sets of involutions (line 21). Let the number of partitions be q and the sizes of the partitions be k_1, k_2, ..., k_q. Then, these three procedures take O(n), O(Σ_{i=1}^q T(k_i)), and O(Π_{i=1}^q T(k_i)), respectively.

• Searching over the truncated set of involutions (lines 6-10). This takes O(m Π_{i=1}^q T(k_i)).

Combining the complexities above, we obtain an overall complexity of O(m Π_{i=1}^q T(k_i)). This reduces the complexity of Algorithm 2 by a factor of T(n)/Π_{i=1}^q T(k_i), where n = Σ_{i=1}^q k_i.

Figure B.2: Symmetry of a tree characterized by identical branches. (a) Two branches with a common root. (b) Two branches with adjacent roots. The associated involutions are φ_a = (1, 6, 8, 9, 7, 2, 5, 3, 4) and φ_b = (5, 7, 8, 6, 1, 4, 2, 3).

B.2 Search for Identical Tree Branches

In this section, a polynomial time algorithm (O(n log n), or O(n) with optimization) is provided for the search of involutions on trees. The algorithm rests on the fact that a symmetry of a tree can be uniquely characterized by identical branches of the tree whose roots are common (as in Figure B.2(a)) or adjacent (as in Figure B.2(b)). In what follows, we refer to such branches as adjacent identical branches (AIBs). The main algorithm is based on the fact that AIBs can be identified efficiently through a bottom-up traversal of the tree. This approach is motivated by related work in [13, 30], where so-called subtree repeats can be found in linear time. The subtree repeat problem and our problem are similar, but differ in two aspects. First, we do not allow the subtrees to overlap.
Second, from the condition of graph symmetry, we require the roots of the target subtrees to be adjacent. Thus, the algorithm we design can be regarded as a simplified version of the forward/non-overlapping stage in [30], and has the same time and space complexity.

First, we show that if a tree G is φ-symmetric, then the symmetry can be characterized by one or several pairs of adjacent identical branches in G:

Lemma 8. Let φ be a non-trivial involution (φ(i) ≠ i for some i) and let G = (V, E, W) be a φ-symmetric tree. Then, the tree nodes can be partitioned as V = V_X ∪ V_Y ∪ V_Z such that φ(i) ∈ V_Y if i ∈ V_X and φ(j) = j if j ∈ V_Z, and there is at most one edge (k, l) ∈ E with k ∈ V_X and l ∈ V_Y.

From this lemma, we know that the involution φ is associated with a pair of branches (subtrees) whose vertex sets are V_X and V_Y, respectively.

Proof. We define V_Z = {i ∈ V | φ(i) = i} ≠ V, and discuss the cases V_Z ≠ ∅ and V_Z = ∅ separately.

If V_Z is non-empty, then, since φ is non-trivial, there must be a (possibly non-unique) node k_1 ∈ V_Z connected to two nodes that are not in V_Z, denoted i_1 and j_1 = φ(i_1). We start with V_X = {i_1} and V_Y = {j_1}. Then, we apply a depth-first search (DFS) to traverse the branch rooted at i_1 away from k_1, and include the visited nodes in V_X. Note that, by φ-symmetry, every step of this DFS corresponds to a traversal step from j_1 away from k_1; in this way, we obtain an identical branch rooted at j_1, and include all of its nodes in V_Y. If there exists another node k_2 ∈ V_Z, other than k_1, connected to two nodes i_2 and j_2 = φ(i_2), we apply the same traversal procedure again and extend V_X and V_Y in the same way. This procedure can be repeated until all nodes are classified into V_X, V_Y, and V_Z.
Then, with Lemma 9, we design a bottom-up branch matching algorithm on a tree rooted at its center, which is summarized in Algorithm 5.

Algorithm 5 Finding valid involutions in a tree
1: procedure FindInvolutionInTree(G)
2:   Take a node v ∈ C = TreeCenter(G)
3:   Run BFS from v, and obtain the node list in topological order (u_1, ..., u_n)
4:   I ← ∅ ▷ List of valid involutions
5:   for i = n, ..., 1 do ▷ Build branch descriptors bottom-up
6:     if u_i has no children then
7:       S_i ← '' ▷ Empty descriptor
8:     end if
9:     if u_i has children then
10:      (t_{i,1}, ..., t_{i,c_i}) ← list of children of u_i, sorted w.r.t. their descriptors
11:      If any two branch descriptors are equal, append to I the involution characterized by the two identical branches
12:      S_i ← '(w_{u_i,t_{i,1}} S_{t_{i,1}})(w_{u_i,t_{i,2}} S_{t_{i,2}}) ... (w_{u_i,t_{i,c_i}} S_{t_{i,c_i}})' ▷ Build the descriptor by concatenating the weights and descriptors of the sorted child branches
13:    end if
14:  end for
15:  if |C| = 2 then ▷ The center has two nodes
16:    if the two branches rooted at the two center nodes are identical then
17:      Append to I the involution characterized by these two branches
18:    end if
19:  end if
20:  return I
21: end procedure

As a demonstration of this algorithm, an example on the graph of Figure B.3 is provided in Table B.2, where we show the iterations in reverse topological order of the nodes (from the bottom to the top of the tree).

Figure B.3: An example graph for the demonstration of Algorithm 5. The center of this graph is {1}. Note that a topological order given by BFS from node 1 would be (1, 2, 7, 12, 3, 6, 8, 9, 13, 4, 5, 10, 11).

Table B.2: Demonstration of Algorithm 5.

Iteration  Node  Branch descriptor                                      Comment
1          4     S_4 = ''
2          5     S_5 = ''
3          10    S_10 = ''
4          11    S_11 = ''
5          3     S_3 = (w_{3,4} S_4)(w_{3,5} S_5) = '(1)(1)'            AIB found
6          6     S_6 = ''
7          8     S_8 = ''
8          9     S_9 = (w_{9,11} S_11)(w_{9,10} S_10) = '(1)(1)'        AIB found
9          13    S_13 = ''
10         2     S_2 = (w_{2,6} S_6)(w_{2,3} S_3) = '(1)(3((1)(1)))'
11         7     S_7 = (w_{7,8} S_8)(w_{7,9} S_9) = '(1)(3((1)(1)))'
12         12    S_12 = '(1)'
13         1     S_1 = (w_{1,12} S_12)(w_{1,2} S_2)(w_{1,7} S_7)        AIB found
                     = '(1(1))(2(1)(3(1)(1)))(2(1)(3(1)(1)))'

Note that in iteration 5, the involution associated to the two AIBs is φ_1 = (1, 2, 3, 5, 4, 6, 7, ..., 13) (pairing nodes 4 and 5), while the involution found in iteration 8 is φ_2 = (1, ..., 8, 9, 11, 10, 12, 13) (pairing nodes 10 and 11). In the last iteration, the involution corresponding to the largest AIBs is φ_3 = (1, 7, 9, 10, 11, 8, 2, 6, 3, 4, 5, 12, 13), which pairs nodes 2, 3, 4, 5, and 6 with nodes 7, 9, 10, 11, and 8, respectively. Note that, when each descriptor is built, the sorting of the branch descriptors from the lower level fixes their order, so two AIBs must have identical branch descriptors. A Python sketch of this descriptor construction is given below.
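The following Python sketch illustrates the bottom-up descriptor construction of lines 5-14 (names and input conventions are our own assumptions; for brevity it reports only the root pairs of the AIBs it finds, leaving out the recovery of the full node-pairing involution and the two-center check of lines 15-19):

```python
from collections import deque

def find_aibs(adj, w, root):
    """Build branch descriptors bottom-up on the tree rooted at `root`
    and report pairs of children whose branches are identical (AIBs).
    adj: adjacency list; w: edge weights keyed by frozenset({i, j})."""
    parent, order, queue = {root: None}, [], deque([root])
    while queue:                          # BFS gives a topological order
        v = queue.popleft()
        order.append(v)
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                queue.append(u)
    desc, aib_pairs = {}, []
    for v in reversed(order):             # bottom-up traversal
        children = [u for u in adj[v] if parent.get(u) == v]
        # Descriptor of each child branch, prefixed by its edge weight,
        # then sorted so that identical branches become adjacent.
        labeled = sorted(
            (f"({w[frozenset((v, c))]}{desc[c]})", c) for c in children)
        for (s1, c1), (s2, c2) in zip(labeled, labeled[1:]):
            if s1 == s2:                  # equal descriptors: AIB found
                aib_pairs.append((c1, c2))
        desc[v] = "".join(s for s, _ in labeled)
    return desc, aib_pairs
```

Because the child descriptors are sorted before concatenation, two identical branches always produce identical strings, matching the observation above.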
B.2.2 Complexity Analysis

The algorithm consists of the following steps:

• Finding the center (line 2). This requires two BFS runs and the traversal of a path. For a tree, the number of edges is m = n − 1, so the overall complexity of these operations is O(m + n) = O(n).

• Finding a topological order using a BFS (line 3). The complexity of this step is O(n).

• Building descriptors for all nodes (lines 5-14). This loop takes n iterations to visit all tree nodes. Let p_i be the number of children of node u_i. In iteration i, sorting the descriptors takes $O(p_i \log p_i)$, and concatenating them takes $O(p_i)$. The overall complexity of this loop is

$$\sum_{i=1}^{n} p_i \log p_i = (n-1) \left[ \sum_{i=1}^{n} \frac{p_i}{n-1} \log p_i \right] = (n-1) \left[ \sum_{i=1}^{n} \frac{p_i}{n-1} \log \frac{p_i}{n-1} + \sum_{i=1}^{n} \frac{p_i}{n-1} \log (n-1) \right] = (n-1) \left[ -H_p + \frac{\sum_{i=1}^{n} p_i}{n-1} \log (n-1) \right] \leq (n-1) \log (n-1), \qquad \text{(B.3)}$$

where the identity $\sum_{i=1}^{n} p_i = n-1$ has been applied, and $H_p \geq 0$ because it is an entropy quantity. From (B.3), we obtain a complexity of $O(n \log n)$.

• Checking complete symmetry (lines 15-19). This requires a comparison of two descriptors, and takes O(n) time.

In fact, as mentioned in [30], the sorting of descriptors can be implemented by radix sort, which requires only $O(p_i)$ time due to the finite alphabet of the strings to be sorted. As a result, the overall complexity is $O(n \log n)$ with a comparison sort, or $O(n)$ with a radix sort.

B.3 Conclusion

In this appendix, we discuss two methods for finding a valid involution φ that characterizes the symmetry of a graph. The first method is based on the fact that two symmetric nodes must have the same weighted and unweighted degrees. The second approach takes advantage of the sparsity of tree graphs, in which case graph symmetry can be characterized by adjacent identical branches. We present the algorithms of both methods together with illustrative examples, and analyze their complexities. The MATLAB implementations can be found in the companion GitHub repository [69].
Appendix C

Characterization of Sparse Laplacians with a Common GFT

This appendix includes remarks on the retrieval of sparse operators for general GFTs beyond the DCT and DST. The characterization of all Laplacians that share a common GFT has been studied in the context of graph topology identification and graph diffusion process inference [18, 92, 108]. In particular, it has been shown in [92] that the set of normalized Laplacian matrices having a fixed GFT can be characterized by a convex polytope. Following a similar proof, we briefly present the counterpart result for unnormalized generalized Laplacians, where self-loops are allowed:

Theorem 5. The set of Laplacian matrices with a fixed GFT can be characterized by a convex polyhedral cone in the space of eigenvalues $(\lambda_1, \dots, \lambda_N)$.

Proof: For a given GFT $\Phi$, let the eigenvalues $\lambda_j$ of the Laplacian be variables. By the definition of the Laplacian (1.4), we can see that

$$\mathbf{L} = \Phi \cdot \operatorname{diag}(\lambda_1, \dots, \lambda_N) \cdot \Phi^\top = \sum_{k=1}^{N} \lambda_k \boldsymbol{\phi}_k \boldsymbol{\phi}_k^\top \qquad \text{(C.1)}$$

is a valid Laplacian matrix if $l_{ij} \leq 0$ for $i \neq j$ (non-negative edge weights), $l_{ii} \geq \sum_{j \neq i} |l_{ij}|$ (non-negative self-loop weights), and $\lambda_k \geq 0$ for all $k$ (non-negative graph frequencies). With the expression (C.1) we have $l_{ij} = \sum_{k=1}^{N} \lambda_k \phi_k(i) \phi_k(j)$, and thus the Laplacian conditions can be expressed in terms of the $\lambda_k$'s:

$$\sum_{k=1}^{N} \lambda_k \phi_k(i) \phi_k(j) \leq 0 \quad \text{for } i \neq j, \qquad \sum_{k=1}^{N} \lambda_k \phi_k(i)^2 \geq - \sum_{\substack{j=1 \\ j \neq i}}^{N} \sum_{k=1}^{N} \lambda_k \phi_k(i) \phi_k(j) \quad \text{for } i = 1, \dots, N, \qquad \lambda_k \geq 0, \quad k = 1, \dots, N. \qquad \text{(C.2)}$$

Figure C.1: An illustrative example of a polyhedral cone in $\mathbb{R}^3$ with a vertex at $\mathbf{0}$ and 5 edges. Any element of the cone can be represented as $\sum_{m=1}^{5} a_m \mathbf{L}^{(m)}$ with non-negative $a_m$.

These constraints on $\boldsymbol{\lambda} = (\lambda_1, \dots, \lambda_N)^\top$ are all linear, so the feasible set for $\boldsymbol{\lambda} \in \mathbb{R}^N$ is a convex polyhedron. We denote this polyhedron by $\mathcal{P}$, and highlight some of its properties:

• $\mathcal{P}$ is non-empty: $\boldsymbol{\lambda} = \mathbf{1}$ gives $\mathbf{L} = \mathbf{I}$, which is a trivial, but valid, Laplacian.

• $\boldsymbol{\lambda} = \mathbf{0}$ is the only vertex of $\mathcal{P}$: when $\lambda_j = 0$ for all $j$, equality is met in all constraints of (C.2). This means that all hyperplanes that define $\mathcal{P}$ intersect at the common point $\mathbf{0}$, which further implies that $\mathcal{P}$ has no vertex other than $\mathbf{0}$.

From the facts above, we conclude that $\mathcal{P}$ is a non-empty convex polyhedral cone. For illustration purposes, Figure C.1 visualizes the structure of a 3-dimensional polyhedral cone with 5 edges. Notably, any element of $\mathcal{P}$ can be expressed as a conical combination (a linear combination with non-negative coefficients) of elements on the edges of $\mathcal{P}$, as illustrated in Figure C.1. In particular, let $\mathcal{P}$ have $M$ edges, and let $\mathbf{L}^{(1)}, \dots, \mathbf{L}^{(M)}$ be points on different edges; then any element $\mathbf{Q} \in \mathcal{P}$ can be represented as

$$\mathbf{Q} = \sum_{m=1}^{M} a_m \mathbf{L}^{(m)}, \qquad a_m \geq 0.$$

The fact that Laplacians have non-positive off-diagonal entries implies that the $\mathbf{L}^{(m)}$'s are the sparsest Laplacians: a conical combination of two Laplacians has no fewer non-zero off-diagonal elements than either of the two individual Laplacians. Since sparse Laplacians are characterized by the edges of a polyhedral cone, we can choose the sparse filters in (5.1) to be those Laplacians: $\mathcal{Z} = \{\mathbf{L}^{(k)}\}_k$. Retrieving those matrices requires an algorithm that enumerates the vertices and edges of a polyhedron given its description. A popular algorithm for this problem is the so-called reverse search [3], which has complexity $O(rdv)$, where $r$ is the number of linear constraints in $\mathbb{R}^d$ and $v$ is the number of target vertices. In (C.2), $d = N$ and $r = (N^2 + 3N)/2$, so the complexity reduces to $O(N^3 v)$. In practice, the vertex enumeration problem is in general NP-hard, since the number of vertices $v$ can be combinatorial, up to $\binom{r}{d}$. For the purpose of efficient graph filter design, a truncated version of the algorithm of [3] may be applied to obtain a few, instead of all, vertices. The study of such a truncated algorithm is left for future work.
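To make the conditions in the proof concrete, the following Python sketch (our own illustration; not the reverse search of [3]) synthesizes $\mathbf{L}$ from a given GFT as in (C.1) and tests the Laplacian conditions of (C.2) numerically:

```python
import numpy as np

def laplacian_from_gft(Phi, lam):
    """Synthesize L = Phi diag(lam) Phi^T, as in (C.1)."""
    return (Phi * lam) @ Phi.T

def in_cone(Phi, lam, tol=1e-10):
    """Check the conditions of (C.2): non-positive off-diagonal entries,
    non-negative self-loop weights (row sums), non-negative frequencies."""
    lam = np.asarray(lam, dtype=float)
    L = laplacian_from_gft(Phi, lam)
    off = L - np.diag(np.diag(L))
    return (np.all(off <= tol)                 # l_ij <= 0 for i != j
            and np.all(L.sum(axis=1) >= -tol)  # non-negative self-loops
            and np.all(lam >= -tol))           # lambda_k >= 0

# Example: take the GFT of some Laplacian L0; lam = 1 gives L = I.
A = np.array([[0, 1, 0], [1, 0, 2], [0, 2, 0]], dtype=float)
L0 = np.diag(A.sum(axis=1)) - A
evals, Phi = np.linalg.eigh(L0)
assert in_cone(Phi, np.ones(3))   # the trivial Laplacian L = I
assert in_cone(Phi, evals)        # recovers L0, also in the cone
```

Any conical combination of feasible eigenvalue vectors passes the same test, consistent with $\mathcal{P}$ being a convex cone.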
Appendix D

Construction of Sparse Operators in Symmetric Graphs

This appendix shows a construction of sparse operators for graphs with certain symmetry properties. In Chapter 2, we highlighted that a GFT has a butterfly stage for fast implementation if the associated graph demonstrates a symmetry property based on involution. Here, we show how a sparse operator can be constructed for a φ-symmetric graph $\mathcal{G}$.

Lemma 10. Given a φ-symmetric graph $\mathcal{G}$ with Laplacian $\mathbf{L}$, we can construct a graph $\mathcal{G}_\varphi$ by connecting nodes $i$ and $j$ with edge weight 1 for all node pairs $(i, j)$ with $\varphi(i) = j$, $i \neq j$. The Laplacian $\mathbf{L}_\varphi$ of the resulting graph $\mathcal{G}_\varphi$ commutes with $\mathbf{L}$.

Proof: We note that, for $i \in \mathcal{V}$, we either have $\varphi(i) = j \neq i$ with $\varphi(j) = i$, or $\varphi(i) = i$. Without loss of generality, we order the graph vertices such that $\varphi(i) = N + 1 - i$ for $i = 1, \dots, k$ and $\varphi(i) = i$ for $i = k+1, \dots, N-k$. With this vertex ordering, we express $\mathbf{L}$ in terms of its block components,

$$\mathbf{L} = \begin{pmatrix} \mathbf{L}_{11} & \mathbf{L}_{12} & \mathbf{L}_{13} \\ \mathbf{L}_{12}^\top & \mathbf{L}_{22} & \mathbf{L}_{23} \\ \mathbf{L}_{13}^\top & \mathbf{L}_{23}^\top & \mathbf{L}_{33} \end{pmatrix},$$

where $\mathbf{L}_{11}, \mathbf{L}_{33} \in \mathbb{R}^{k \times k}$ and $\mathbf{L}_{22} \in \mathbb{R}^{(N-2k) \times (N-2k)}$. By the φ-symmetry, the block components of $\mathbf{L}$ satisfy Lemma 4:

$$\mathbf{L}_{13} = \mathbf{J} \mathbf{L}_{13}^\top \mathbf{J}, \qquad \mathbf{L}_{33} = \mathbf{J} \mathbf{L}_{11} \mathbf{J}, \qquad \mathbf{L}_{23} = \mathbf{L}_{12}^\top \mathbf{J}. \qquad \text{(D.1)}$$

We can also see that the Laplacian constructed in Lemma 10, with the same vertex ordering, is

$$\mathbf{L}_\varphi = \begin{pmatrix} \mathbf{I} & \mathbf{0} & -\mathbf{J} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} \\ -\mathbf{J} & \mathbf{0} & \mathbf{I} \end{pmatrix}.$$

Then, using (D.1), we can easily verify that

$$\mathbf{L} \mathbf{L}_\varphi = \begin{pmatrix} \mathbf{L}_{11} - \mathbf{L}_{13} \mathbf{J} & \mathbf{0} & -\mathbf{L}_{11} \mathbf{J} + \mathbf{L}_{13} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{L}_{13}^\top - \mathbf{J} \mathbf{L}_{11} & \mathbf{0} & -\mathbf{L}_{13}^\top \mathbf{J} + \mathbf{J} \mathbf{L}_{11} \mathbf{J} \end{pmatrix} = \mathbf{L}_\varphi \mathbf{L},$$

which concludes the proof.

Figure D.1: An illustrative example of graph operator construction based on graph symmetry. (a) The 15-node human skeletal graph $\mathcal{G}$. (b) The graph $\mathcal{G}_\varphi$ associated to an alternative sparse operator by construction. All edge weights are 1.

We demonstrate the construction of $\mathcal{G}_\varphi$ with an example in Figure D.1. Figure D.1(a) shows a 15-node human skeletal graph $\mathcal{G}$ [54]. A left-to-right symmetry can be observed in $\mathcal{G}$, which induces an involution $\varphi$ with $\varphi(i) = i$ for $i = 7, 8, 9$ and $\varphi(i) = 16 - i$ otherwise. With the construction in Lemma 10, we obtain the graph $\mathcal{G}_\varphi$ in Figure D.1(b) by connecting all pairs of symmetric nodes in Figure D.1(a). We denote by $\mathbf{Z}^{(1)} = \mathbf{L}$ and $\mathbf{Z}^{(2)} = \mathbf{L}_\varphi$ the Laplacians of $\mathcal{G}$ and $\mathcal{G}_\varphi$, respectively, and by $\boldsymbol{\Psi} = (\boldsymbol{\psi}_1, \dots, \boldsymbol{\psi}_{15})$ the GFT matrix of $\mathbf{L}$ with basis functions in increasing order of eigenvalues. Thus, we have

$$\mathbf{Z}^{(2)} = \boldsymbol{\Psi} \cdot \operatorname{diag}(\boldsymbol{\lambda}^{(2)}) \cdot \boldsymbol{\Psi}^\top, \qquad \boldsymbol{\lambda}^{(2)} = (0, 0, 2, 2, 0, 0, 0, 2, 2, 0, 0, 2, 2, 0, 0)^\top.$$

Since $\mathbf{Z}^{(2)}$ has only two distinct eigenvalues with high multiplicities, every polynomial of $\mathbf{Z}^{(2)}$ also has only two distinct eigenvalues, which poses a limitation for graph filter design. However, an MPGF with both $\mathbf{Z}^{(1)}$ and $\mathbf{Z}^{(2)}$ still provides more degrees of freedom than a PGF with a single operator.
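As a quick numerical check of Lemma 10 (a self-contained sketch using a small mirror-symmetric path graph of our own choosing, rather than the skeletal graph of Figure D.1):

```python
import numpy as np

def laplacian(W):
    """Combinatorial Laplacian L = D - W of a weighted adjacency matrix."""
    return np.diag(W.sum(axis=1)) - W

def pairing_laplacian(phi):
    """Laplacian of the graph connecting i and phi(i) with weight 1 for all
    i with phi(i) != i (the construction of Lemma 10); phi is 0-indexed."""
    n = len(phi)
    Wp = np.zeros((n, n))
    for i, j in enumerate(phi):
        if i != j:
            Wp[i, j] = Wp[j, i] = 1.0
    return laplacian(Wp)

# Path 0-1-2-3-4 with mirror-symmetric weights (2, 1, 1, 2), so the
# reversal phi = (4, 3, 2, 1, 0) is a valid involution.
W = np.zeros((5, 5))
for (i, j), wt in {(0, 1): 2.0, (1, 2): 1.0, (2, 3): 1.0, (3, 4): 2.0}.items():
    W[i, j] = W[j, i] = wt
L = laplacian(W)
L_phi = pairing_laplacian([4, 3, 2, 1, 0])
assert np.allclose(L @ L_phi, L_phi @ L)   # the two operators commute
```

The same check applies to the 15-node skeletal graph once its involution φ is written in this 0-indexed form.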
Reference List

[1] A. Anis, A. Gadde, and A. Ortega, "Efficient sampling set selection for bandlimited graph signals using graph spectral proxies," IEEE Transactions on Signal Processing, vol. 64, no. 14, pp. 3775–3789, July 2016.

[2] A. Arrufat, P. Philippe, and O. Déforges, "Non-separable mode dependent transforms for intra coding in HEVC," in IEEE Visual Communications and Image Processing Conference, Dec. 2014, pp. 61–64.

[3] D. Avis and K. Fukuda, "A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra," Discrete & Computational Geometry, vol. 8, pp. 295–313, Sep. 1992.

[4] G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," Proceedings of the ITU-T Video Coding Experts Group (VCEG) Thirteenth Meeting, Jan. 2001.

[5] A. Cantoni and P. Butler, "Eigenvalues and eigenvectors of symmetric centrosymmetric matrices," Linear Algebra and its Applications, vol. 13, no. 3, pp. 275–288, 1976.

[6] S. Chen, F. Cerda, P. Rizzo, J. Bielak, J. H. Garrett, and J. Kovačević, "Semi-supervised multiresolution classification using adaptive graph filtering with application to indirect bridge structural health monitoring," IEEE Transactions on Signal Processing, vol. 62, no. 11, pp. 2879–2893, June 2014.

[7] S. Chen, A. Sandryhaila, J. M. F. Moura, and J. Kovačević, "Signal denoising on graphs via graph filtering," in IEEE Global Conference on Signal and Information Processing (GlobalSIP), Dec. 2014, pp. 872–876.

[8] W.-H. Chen and S. C. Fralick, "Image enhancement using cosine transform filtering," in Image Sci. Math. Symp., Nov. 1976.

[9] W.-H. Chen, C. Smith, and S. Fralick, "A fast computational algorithm for the discrete cosine transform," IEEE Transactions on Communications, vol. 25, no. 9, pp. 1004–1009, Sep. 1977.

[10] Y. Chen, D. Mukherjee, J. Han, A. Grange, Y. Xu, S. Parker, C. Chen, H. Su, U. Joshi, C.-H. Chiang, et al., "An overview of coding tools in AV1: the first video codec from the alliance for open media," APSIPA Transactions on Signal and Information Processing, vol. 9, p. e6, 2020.

[11] G. Cheung, E. Magli, Y. Tanaka, and M. K. Ng, "Graph spectral image processing," Proceedings of the IEEE, vol. 106, no. 5, pp. 907–930, May 2018.

[12] B. Chitprasert and K. R. Rao, "Discrete cosine transform filtering," Signal Processing, vol. 19, no. 3, pp. 233–245, 1990.

[13] M. Christou, M. Crochemore, T. Flouri, C. S. Iliopoulos, J. Janoušek, B. Melichar, and S. P. Pissis, "Computing all subtree repeats in ordered trees," Information Processing Letters, vol. 112, no. 24, pp. 958–962, 2012.

[14] F. R. K. Chung, Spectral Graph Theory. American Mathematical Soc., 1997, vol. 92.

[15] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297–301, 1965.

[16] M. Coutino, E. Isufi, and G. Leus, "Advances in distributed graph filtering," IEEE Transactions on Signal Processing, vol. 67, no. 9, pp. 2320–2333, May 2019.

[17] I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps," Journal of Fourier Analysis and Applications, vol. 4, no. 3, pp. 247–269, May 1998.

[18] Y. De Castro, T. Espinasse, and P. Rochet, "Reconstructing undirected graphs from eigenspaces," J. Mach. Learn. Res., vol. 18, no. 1, pp. 1679–1702, Jan. 2017.

[19] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in Proceedings of the 30th International Conference on Neural Information Processing Systems, ser. NIPS'16. Red Hook, NY, USA: Curran Associates Inc., 2016, pp. 3844–3852.

[20] J. A. Deri and J. M. F. Moura, "Spectral projector-based graph Fourier transforms," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 6, pp. 785–795, Sep. 2017.

[21] X. Dong, A. Ortega, P. Frossard, and P. Vandergheynst, "Inference of mobility patterns via spectral graph wavelets," in IEEE International Conference on Acoustics, Speech and Signal Processing, May 2013, pp. 3118–3122.

[22] X. Dong, D. Thanou, M. Rabbat, and P. Frossard, "Learning graphs from data: A signal representation perspective," IEEE Signal Processing Magazine, vol. 36, no. 3, pp. 44–63, 2019.

[23] F. Dörfler and F. Bullo, "Kron reduction of graphs with applications to electrical networks," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 60, no. 1, pp. 150–163, 2013.

[24] H. E. Egilmez, Y.-H. Chao, and A. Ortega, "Graph-based transforms for video coding," arXiv preprint arXiv:1909.00952, 2020. [Online]. Available: https://arxiv.org/abs/1909.00952

[25] H. E. Egilmez, E. Pavez, and A. Ortega, "Graph learning from data under Laplacian and structural constraints," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 6, pp. 825–841, 2017.

[26] H. E. Egilmez, A. Said, Y. H. Chao, and A. Ortega, "Graph-based transforms for inter predicted video coding," in IEEE International Conference on Image Processing (ICIP), Sep. 2015, pp. 3992–3996.

[27] N. Emirov, C. Cheng, J. Jiang, and Q. Sun, "Polynomial graph filter of multiple shifts and distributed implementation of inverse filtering," arXiv:2003.11152, March 2020. [Online]. Available: http://arxiv.org/abs/2003.11152

[28] P. Erdős and A. Rényi, "Asymmetric graphs," Acta Mathematica Hungarica, vol. 14, no. 3-4, pp. 295–315, Sep. 1963.
[29] E. Feig and S. Winograd, "Fast algorithms for the discrete cosine transform," IEEE Transactions on Signal Processing, vol. 40, no. 9, pp. 2174–2193, Sep. 1992.

[30] T. Flouri, K. Kobert, S. P. Pissis, and A. Stamatakis, "An optimal algorithm for computing all subtree repeats in trees," in Combinatorial Algorithms, T. Lecroq and L. Mouchard, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 269–282.

[31] G. Fracastoro and E. Magli, "Steerable discrete Fourier transform," IEEE Signal Processing Letters, vol. 24, no. 3, pp. 319–323, March 2017.

[32] G. Fracastoro, D. Thanou, and P. Frossard, "Graph transform learning for image compression," in Picture Coding Symposium (PCS), Dec. 2016, pp. 1–5.

[33] G. Fracastoro, D. Thanou, and P. Frossard, "Graph transform optimization with application to image compression," IEEE Transactions on Image Processing, vol. 29, pp. 419–432, 2020.

[34] A. Gavili and X. Zhang, "On the shift operator, graph frequency, and optimal filtering in graph signal processing," IEEE Transactions on Signal Processing, vol. 65, no. 23, pp. 6303–6318, Dec. 2017.

[35] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Kluwer Academic Publishers, 1992.

[36] B. Girault, A. Ortega, and S. S. Narayanan, "Irregularity-aware graph Fourier transforms," IEEE Transactions on Signal Processing, vol. 66, no. 21, pp. 5746–5761, Nov. 2018.

[37] A. Gnutti, F. Guerrini, R. Leonardi, and A. Ortega, "Symmetry-based graph Fourier transforms for image representation," in IEEE International Conference on Image Processing (ICIP), Oct. 2018, pp. 2575–2579.

[38] C. Godsil and G. Royle, Algebraic Graph Theory, ser. Graduate Texts in Mathematics. Springer, 2001, vol. 207.

[39] G. H. Golub and C. F. Van Loan, Matrix Computations (3rd Ed.). Baltimore, MD, USA: Johns Hopkins University Press, 1996.

[40] R. C. Gonzalez and R. E. Woods, Digital Image Processing (3rd Edition). Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 2006.

[41] V. K. Goyal, "Theoretical foundations of transform coding," IEEE Signal Processing Magazine, pp. 9–21, Sep. 2001.

[42] M. Grant and S. Boyd, "CVX: Matlab software for disciplined convex programming, version 2.1," http://cvxr.com/cvx, Mar. 2014.

[43] H. Hou, "A fast recursive algorithm for computing the discrete cosine transform," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, no. 10, pp. 1455–1461, 1987.

[44] D. K. Hammond, P. Vandergheynst, and R. Gribonval, "Wavelets on graphs via spectral graph theory," Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150, 2011.

[45] J. Han, A. Saxena, V. Melkote, and K. Rose, "Jointly optimized spatial prediction and block transform for video and image coding," IEEE Trans. Image Process., vol. 21, no. 4, pp. 1874–1884, Apr. 2012.

[46] J. Han, Y. Xu, and D. Mukherjee, "A butterfly structured design of the hybrid transform coding scheme," in Picture Coding Symposium, 2013, pp. 1–4.

[47] T. Hofmann, B. Schölkopf, and A. J. Smola, "Kernel methods in machine learning," The Annals of Statistics, pp. 1171–1220, 2008.

[48] I. Höntsch and L. J. Karam, "Adaptive image coding with perceptual distortion control," IEEE Transactions on Image Processing, vol. 11, no. 3, pp. 213–222, March 2002.

[49] W. Hu, G. Cheung, and A. Ortega, "Intra-prediction and generalized graph Fourier transform for image coding," IEEE Signal Processing Letters, vol. 22, no. 11, pp. 1913–1917, Nov. 2015.
[50] W. Hu, G. Cheung, A. Ortega, and O. C. Au, "Multiresolution graph Fourier transform for compression of piecewise smooth images," IEEE Transactions on Image Processing, vol. 24, no. 1, pp. 419–433, Jan. 2015.

[51] E. Isufi, A. Loukas, A. Simonetto, and G. Leus, "Autoregressive moving average graph filtering," IEEE Transactions on Signal Processing, vol. 65, no. 2, pp. 274–288, Jan. 2017.

[52] D. Jakobson, S. D. Miller, I. Rivin, and Z. Rudnick, Eigenvalue Spacings for Regular Graphs. New York, NY: Springer New York, 1999, pp. 317–327.

[53] N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video. Prentice Hall Professional Technical Reference, 1990.

[54] J.-Y. Kao, A. Ortega, and S. S. Narayanan, "Graph-based approach for motion capture data representation and analysis," in IEEE International Conference on Image Processing (ICIP), Oct. 2014, pp. 2061–2065.

[55] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," arXiv:1609.02907 [cs, stat], Feb. 2017. [Online]. Available: http://arxiv.org/abs/1609.02907

[56] G. Kirchhoff, "Ueber die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Vertheilung galvanischer Ströme geführt wird," Annalen der Physik, vol. 148, no. 12, pp. 497–508, 1847. [Online]. Available: http://dx.doi.org/10.1002/andp.18471481202

[57] H. Kitajima, "A symmetric cosine transform," IEEE Transactions on Computers, vol. C-29, no. 4, pp. 317–323, Apr. 1980.

[58] D. E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Ed.). Redwood City, CA, USA: Addison Wesley Longman Publishing Co., Inc., 1998.

[59] C. W. Kok, "Fast algorithm for computing discrete cosine transform," IEEE Transactions on Signal Processing, vol. 45, no. 3, pp. 757–760, Mar. 1997.

[60] L. Le Magoarou and R. Gribonval, "Are there approximate fast Fourier transforms on graphs?" in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2016, pp. 4811–4815.

[61] L. Le Magoarou and R. Gribonval, "Flexible multilayer sparse approximations of matrices and applications," IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 4, pp. 688–700, 2016.

[62] L. Le Magoarou, R. Gribonval, and N. Tremblay, "Approximate fast graph Fourier transforms via multilayer sparse approximations," IEEE Transactions on Signal and Information Processing over Networks, vol. 4, no. 2, pp. 407–420, June 2018.

[63] L. Le Magoarou, N. Tremblay, and R. Gribonval, "Analyzing the approximation error of the fast graph Fourier transform," in Asilomar Conference on Signals, Systems, and Computers, 2017, pp. 45–49.

[64] B. Li, O. G. Guleryuz, J. Ehmann, and A. Vosoughi, "Layered-Givens transforms: Tunable complexity, high-performance approximation of optimal non-separable transforms," in IEEE International Conference on Image Processing (ICIP), Sep. 2017, pp. 1687–1691.

[65] B. Li, J. Han, and Y. Xu, "Fast transform type selection using conditional Laplacian distribution based rate estimation," in Applications of Digital Image Processing XLIII, vol. 11510. SPIE, 2020, pp. 461–468.

[66] S. Z. Li, Markov Random Field Modeling in Image Analysis, 3rd ed. Springer Publishing Company, Incorporated, 2009.

[67] J. Liu, E. Isufi, and G. Leus, "Filter design for autoregressive moving average graph filters," IEEE Transactions on Signal and Information Processing over Networks, vol. 5, no. 1, pp. 47–60, July 2019.
[68] A. Loukas, A. Simonetto, and G. Leus, "Distributed autoregressive moving average graph filters," IEEE Signal Processing Letters, vol. 22, no. 11, pp. 1931–1935, 2015.

[69] K.-S. Lu and A. Ortega, "Fast GFT based on graph symmetry." [Online]. Available: https://github.com/kslu/fastgft

[70] ——, "Symmetric line graph transforms for inter predictive video coding," in Picture Coding Symposium (PCS), Dec. 2016, pp. 1–5.

[71] ——, "A graph Laplacian matrix learning method for fast implementation of graph Fourier transform," in IEEE International Conference on Image Processing (ICIP), Sep. 2017, pp. 1677–1681.

[72] ——, "Fast graph Fourier transforms based on graph symmetry and bipartition," IEEE Transactions on Signal Processing, vol. 67, no. 18, pp. 4855–4869, 2019.

[73] ——, "Lapped transforms: A graph-based extension," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 5401–5405.

[74] K.-S. Lu, A. Ortega, D. Mukherjee, and Y. Chen, "Efficient rate-distortion approximation and transform type selection using Laplacian operators," in Picture Coding Symposium (PCS), June 2018, pp. 76–80.

[75] ——, "DCT and DST graph filtering with sparse shift operators," in preparation, 2020.

[76] ——, "Perceptually-inspired weighted MSE optimization using irregularity-aware graph Fourier transform," in IEEE International Conference on Image Processing (ICIP), 2020.

[77] K.-S. Lu, E. Pavez, and A. Ortega, "On learning Laplacians of tree structured graphs," in IEEE Data Science Workshop (DSW), June 2018, pp. 205–209.

[78] J. Ma, W. Huang, S. Segarra, and A. Ribeiro, "Diffusion filtering of graph signals and its use in recommendation systems," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 4563–4567.

[79] H. S. Malvar, "Lapped transforms for efficient transform/subband coding," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 6, pp. 969–978, June 1990.

[80] ——, Signal Processing with Lapped Transforms, ser. Artech House Communications Library. Artech House, 1992.

[81] H. S. Malvar and D. H. Staelin, "The LOT: transform coding without blocking effects," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 4, pp. 553–559, April 1989.

[82] S. A. Martucci, "Symmetric convolution and the discrete sine and cosine transforms," IEEE Transactions on Signal Processing, vol. 42, no. 5, pp. 1038–1051, 1994.

[83] J. McClellan, T. Parks, and L. Rabiner, "A computer program for designing optimum FIR linear phase digital filters," IEEE Transactions on Audio and Electroacoustics, vol. 21, no. 6, pp. 506–526, 1973.

[84] J. Mei and J. M. F. Moura, "Signal processing on graphs: Causal modeling of unstructured data," IEEE Transactions on Signal Processing, vol. 65, no. 8, pp. 2077–2092, 2017.

[85] M. Ménoret, N. Farrugia, B. Pasdeloup, and V. Gripon, "Evaluating graph signal processing for neuroimaging through classification and dimensionality reduction," in IEEE Global Conference on Signal and Information Processing (GlobalSIP), Nov. 2017, pp. 618–622.

[86] B. Mohar, "The Laplacian spectrum of graphs," in Graph Theory, Combinatorics, and Applications. Wiley, 1991, pp. 871–898.

[87] S. Narang and A. Ortega, "Perfect reconstruction two-channel wavelet filter banks for graph structured data," IEEE Transactions on Signal Processing, vol. 60, no. 6, pp. 2786–2799, Jun. 2012.
[88] H. Q. Nguyen and M. N. Do, "Downsampling of signals on graphs via maximum spanning trees," IEEE Transactions on Signal Processing, vol. 63, no. 1, pp. 182–191, 2015.

[89] M. Onuki, S. Ono, M. Yamagishi, and Y. Tanaka, "Graph signal denoising via trilateral filter on graph spectral domain," IEEE Transactions on Signal and Information Processing over Networks, vol. 2, no. 2, pp. 137–148, June 2016.

[90] A. Ortega, Graph Signal Processing: An Introduction. Cambridge University Press, 2021.

[91] Y. Park and H. Park, "Design and analysis of an image resizing filter in the block-DCT domain," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 2, pp. 274–279, 2004.

[92] B. Pasdeloup, V. Gripon, G. Mercier, D. Pastor, and M. G. Rabbat, "Characterization and inference of graph diffusion processes from observations of stationary signals," IEEE Transactions on Signal and Information Processing over Networks, vol. 4, no. 3, pp. 481–496, Sep. 2018.

[93] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition," in Proceedings of the 27th Annual Asilomar Conference on Signals, Systems, and Computers, 1993, pp. 40–44.

[94] E. Pavez and A. Ortega, "Generalized Laplacian precision matrix estimation for graph signal processing," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 6350–6354.

[95] E. Pavez, A. Ortega, and D. Mukherjee, "Learning separable transforms by inverse covariance estimation," in IEEE International Conference on Image Processing (ICIP), Sep. 2017.

[96] S. Pemmaraju and S. Skiena, Computational Discrete Mathematics: Combinatorics and Graph Theory with Mathematica®. New York, NY, USA: Cambridge University Press, 2003.

[97] M. Peris, S. Martull, A. Maki, Y. Ohkawa, and K. Fukui, "Towards a simulation driven stereo vision system," in Proceedings of the 21st International Conference on Pattern Recognition (ICPR), Nov. 2012, pp. 1038–1042.

[98] N. P. Pitsianis, "The Kronecker product in approximation and fast transform generation," Ph.D. dissertation, Cornell University, Ithaca, NY, USA, 1997, UMI Order No. GAX97-16143.

[99] M. Püschel and J. M. F. Moura, "The algebraic approach to the discrete cosine and sine transforms and their fast algorithms," SIAM Journal on Computing, vol. 32, no. 5, pp. 1280–1316, 2003.

[100] I. Rotondo, G. Cheung, A. Ortega, and H. E. Egilmez, "Designing sparse graphs via structure tensor for block transform coding of images," in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Dec. 2015, pp. 571–574.

[101] H. Rue and L. Held, Gaussian Markov Random Fields: Theory and Applications. CRC Press, 2005.

[102] A. Said and W. A. Pearlman, "Low-complexity waveform coding via alphabet and sample-set partitioning," in Proceedings of IEEE International Symposium on Information Theory, June 1997, pp. 61–73.

[103] V. Sanchez, P. Garcia, A. M. Peinado, J. C. Segura, and A. J. Rubio, "Diagonalizing properties of the discrete cosine transforms," IEEE Transactions on Signal Processing, vol. 43, no. 11, pp. 2631–2641, 1995.

[104] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Transactions on Signal Processing, vol. 61, no. 7, pp. 1644–1656, Apr. 2013.

[105] ——, "Discrete signal processing on graphs: Frequency analysis," IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3042–3054, Jun. 2014.
[106] S. Sardellitti, S. Barbarossa, and P. Di Lorenzo, "On the graph Fourier transform for directed graphs," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 6, pp. 796–811, Sep. 2017.

[107] A. Saxena and F. C. Fernandes, "DCT/DST-based transform coding for intra prediction in image/video coding," IEEE Transactions on Image Processing, vol. 22, no. 10, pp. 3974–3981, Oct. 2013.

[108] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro, "Network topology inference from spectral templates," IEEE Transactions on Signal and Information Processing over Networks, vol. 3, no. 3, pp. 467–483, Sep. 2017.

[109] S. Segarra, A. G. Marques, and A. Ribeiro, "Optimal graph-filter design and applications to distributed linear network operators," IEEE Transactions on Signal Processing, vol. 65, no. 15, pp. 4117–4131, Aug. 2017.

[110] L. Seidenari, V. Varano, S. Berretti, A. D. Bimbo, and P. Pala, "Recognizing actions from depth cameras as weakly aligned multi-part bag-of-poses," in IEEE Conference on Computer Vision and Pattern Recognition Workshops, June 2013, pp. 479–485.

[111] A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, "NTU RGB+D: A large scale dataset for 3D human activity analysis," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.

[112] G. Shen and A. Ortega, "Tree-based wavelets for image coding: Orthogonalization and tree selection," in Picture Coding Symposium, May 2009, pp. 1–4.

[113] H. S. Shin, C. Lee, and M. Lee, "Ideal filtering approach on DCT domain for biomedical signals: index blocked DCT filtering method (IB-DCTFM)," J. Med. Syst., vol. 34, no. 2, pp. 741–753, Aug. 2010.

[114] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, May 2013.

[115] D. I. Shuman, P. Vandergheynst, D. Kressner, and P. Frossard, "Distributed signal processing via Chebyshev polynomial approximation," IEEE Transactions on Signal and Information Processing over Networks, vol. 4, no. 4, pp. 736–751, Dec. 2018.

[116] A. Smola and R. Kondor, "Kernels and regularization on graphs," in Learning Theory and Kernel Machines, vol. 2777, Jan. 2003, pp. 144–158.

[117] G. Strang, "The discrete cosine transform," SIAM Review, vol. 41, no. 1, pp. 135–147, 1999.

[118] G. Strang and T. Nguyen, Wavelets and Filter Banks. Wellesley-Cambridge Press, 1997.

[119] H. Su, M. Chen, A. Bokov, D. Mukherjee, Y. Wang, and Y. Chen, "Machine learning accelerated transform search for AV1," in Picture Coding Symposium (PCS), 2019, pp. 1–5.

[120] G. J. Sullivan, J. R. Ohm, W. J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, Dec. 2012.

[121] T. Tanaka, T. Uehara, and Y. Tanaka, "Dimensionality reduction of sample covariance matrices by graph Fourier transform for motor imagery brain-machine interface," in IEEE Statistical Signal Processing Workshop (SSP), June 2016, pp. 1–5.

[122] Y. Tanaka and A. Sakiyama, "M-channel oversampled graph filter banks," IEEE Transactions on Signal Processing, vol. 62, no. 14, pp. 3578–3590, 2014.

[123] O. Teke and P. P. Vaidyanathan, "Extending classical multirate signal processing theory to graphs–Part II: M-channel filter banks," IEEE Transactions on Signal Processing, vol. 65, no. 2, pp. 423–437, Jan. 2017.
[124] O. Teke and P. P. Vaidyanathan, "Sparse eigenvectors of graphs," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017, pp. 3904–3908.

[125] ——, "Uncertainty principles and sparse eigenvectors of graphs," IEEE Transactions on Signal Processing, vol. 65, no. 20, pp. 5406–5420, Oct. 2017.

[126] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society (Series B), vol. 58, pp. 267–288, 1996.

[127] T. D. Tran, J. Liang, and C. Tu, "Lapped transform via time-domain pre- and post-filtering," IEEE Transactions on Signal Processing, vol. 51, no. 6, pp. 1557–1571, June 2003.

[128] N. Tremblay, G. Puy, R. Gribonval, and P. Vandergheynst, "Compressive spectral clustering," in International Conference on Machine Learning (ICML), vol. 48. JMLR.org, 2016, pp. 1002–1011.

[129] U. Tuna, S. Peltonen, and U. Ruotsalainen, "Gap-filling for the high-resolution PET sinograms with a dedicated DCT-domain filter," IEEE Transactions on Medical Imaging, vol. 29, no. 3, pp. 830–839, 2010.

[130] J. Valin, T. B. Terriberry, N. E. Egge, T. Daede, Y. Cho, C. Montgomery, and M. Bebenita, "Daala: Building a next-generation video codec from unconventional technology," in IEEE International Workshop on Multimedia Signal Processing (MMSP), Sep. 2016, pp. 1–6.

[131] C. F. Van Loan and N. Pitsianis, Approximation with Kronecker Products. Dordrecht: Springer Netherlands, 1993, pp. 293–314.

[132] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.

[133] C.-Y. Wang, S.-M. Lee, and L.-W. Chang, "Designing JPEG quantization tables based on human visual system," Signal Processing: Image Communication, vol. 16, no. 5, pp. 501–506, 2001.

[134] Z. Wang, "Fast algorithms for the discrete W transform and for the discrete Fourier transform," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 4, pp. 803–816, Aug. 1984.

[135] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.

[136] Z. Wang and B. Hunt, "The discrete W transform," Applied Mathematics and Computation, vol. 16, no. 1, pp. 19–48, 1985.

[137] Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multiscale structural similarity for image quality assessment," in The Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, vol. 2, Nov. 2003, pp. 1398–1402.

[138] A. C. Yağan and M. T. Özgen, "A spectral graph Wiener filter in graph Fourier domain for improved image denoising," in IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2016, pp. 450–454.

[139] C. Yeo, Y. H. Tan, Z. Li, and S. Rahardja, "Mode-dependent fast separable KLT for block-based intra coding," in IEEE International Symposium on Circuits and Systems (ISCAS), May 2011, pp. 621–624.

[140] P. Yip and K. Rao, "A fast computational algorithm for the discrete sine transform," IEEE Transactions on Communications, vol. 28, no. 2, pp. 304–307, Feb. 1980.

[141] W. Zeng, S. Daly, and S. Lei, "Point-wise extended visual masking for JPEG-2000 image compression," in Proceedings 2000 International Conference on Image Processing, vol. 1, Sep. 2000, pp. 657–660.
[142] C. Zhang and D. Florêncio, "Analyzing the optimality of predictive transform coding using graph-based models," IEEE Signal Processing Letters, vol. 20, no. 1, pp. 106–109, Jan. 2013.

[143] C. Zhang, D. Florêncio, and P. A. Chou, "Graph signal processing – A probabilistic framework," Technical Report, Apr. 2015.

[144] D. Zhang and J. Liang, "Graph-based transform for 2D piecewise smooth signals with random discontinuity locations," IEEE Transactions on Image Processing, vol. 26, no. 4, pp. 1679–1693, April 2017.

[145] H. Zhang and F. Ding, "On the Kronecker products and their applications," Journal of Applied Mathematics, vol. 2013, June 2013.

[146] X. Zhao, L. Zhang, S. Ma, and W. Gao, "Video coding with rate-distortion optimized transform," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 1, pp. 138–151, Jan. 2012.

[147] F. Zou, O. C. Au, C. Pang, J. Dai, X. Zhang, and L. Fang, "Rate-distortion optimized transforms based on the Lloyd-type algorithm for intra block coding," IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp. 1072–1083, Dec. 2013.
Abstract
Graph signal processing (GSP) extends tools of conventional signal processing to graph signals, i.e., structured data residing on an irregular domain. The graph Fourier transform (GFT), as an extension of the discrete Fourier transform, defines the notion of frequency on a graph, and can be used as a tool for frequency analysis of graph signals. However, challenges are posed by the computational complexity of graph signal processing techniques. First, a high dimensional graph Fourier transform can be computationally expensive, and there are no fast algorithms for general graphs. Second, in graph filtering, which is a crucial tool in many GSP applications, an efficient implementation without GFT computation is a challenging problem that has received a great amount of attention. Third, while graph-based transforms have been shown to enhance coding efficiency in MSE-based video compression frameworks, their application in coding schemes based on perceptually-driven metrics remains an open problem.

With the aim of answering these questions, we propose several techniques in this work. First, we study algebraic properties of the graph Laplacian matrix that lead to fast implementations of the graph Fourier transform, based on which we design fast GFT algorithms for graphs satisfying certain symmetry or bipartition properties. Second, we propose data-driven approaches to obtain fast GFT solutions for efficient separable and non-separable transforms. Third, we extend the notion of lapped transforms to a graph-based setting, and propose a generalized version of the well-known lapped orthogonal transform. Fourth, we explore sparse graph operators for the well-known discrete cosine transform (DCT) and discrete sine transform (DST), which lead us to an efficient DCT/DST graph filtering approach and a fast transform type selection scheme for video encoders. Finally, we apply the irregularity-aware graph Fourier transform (IAGFT) to perceptual image coding, which gives promising experimental results in terms of a popular perceptual quality metric, the structural similarity (SSIM).