Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Lifting transforms on graphs: theory and applications
(USC Thesis Other)
Lifting transforms on graphs: theory and applications
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
LIFTING TRANSFORMS ON GRAPHS: THEORY AND APPLICATIONS by Godwin Shen A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING) August 2010 Copyright 2010 Godwin Shen Dedication To all of my wonderful friends and family. ii Acknowledgments I would first like to thank my advisor, Professor Antonio Ortega, whose wonderful, tireless guidance has shaped my ideas about research, and moreover, has helped to re-shape and refine my approach to writing, analytical / critical thinking and problem solving. Thanks is also due to Professor Bhaskar Krishnamachari and Professor Ramesh Govindan forserving onmy dissertation committee, as well asto Professor C.-C. Jay Kuo and Professor Alexandros Dimakis for being members in my qualifying exam committee. It is a privilege to have their advice on my work. I would also like to thank Hua Xie, Samuel Dolinar, Matthew Klimesh, Aaron KielyandMichael Cheng fromJet Propulsion Laboratory, whohave provided great support throughout my research studies. Interactions with you have been wonder- ful. Iamalsogratefulforthemanyfruitfuldiscussions andfuntimeswehadduring my summer inPasadena. I also owe thanks to JaejoonLee andHoCheon Wey from Samsung Electronics Co., Ltd. Your support during the final year of my research has been greatly appreciated. Special thanks are also due to my colleagues in the Compression Research Group. It has been a pleasure working with all of you and I have enjoyed the nu- merous discussions we had about research, work and life in general. In particular, I would like to thank Sunil Narang for all of our memorable research collaborations and for teaching me everything I know about spectral graph theory. I also owe thanks to Woo-shik Kim for our collaborations and for teaching me so much about iii video compression. I would also like to thank Ivy Tseng, Polin Lai, Insuk Chong and Roger Pique for all of their advice, guidance and support. I also owe a special thanks to Sean McPherson, who has been a great colleague and friend through all of my time spent at USC. From the Autonomous Networks Research Group, I owe special thanks to Prof. Bhaskar Krishnamachari, Sundeep Pattem and Ying Chen for all of the unforget- table years of collaboration. The time spent in our joint efforts has truly enriched my experience at USC. I would also like to thank Paula Tarr´ ıo from Universi- dad Polit´ ecnica de Madrid, Giuseppe Valenzise from Politecnico di Milano, Alfonso S´ anchez from Universidad Polit´ ecnica de Catalunya and Javier Perez Trufero from Universidad Polit´ ecnica de Catalunya for all of the wonderful collaborations we have undertaken. I would also like to thank my mother Elena Shen and father Jen-Chi Kung for their endless love, patience, guidance and support throughout my life. Special thanks also to my brother Ernest Shen and to the rest of my family who have always given me so much love and support. Finally, I would like to thank Vanessa Hadikusumah for all of her love and support. iv Table of Contents Dedication ii Acknowledgments iii List Of Tables viii List Of Figures ix Abstract xiii Chapter 1: Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Lifting Transforms on Graphs . . . . . . . . . . . . . . . . . . . . . 5 1.3 Transform-based Distributed Data Gathering . . . . . . . . . . . . . 6 1.4 Joint Optimization of Transform and Routing . . . . . . . . . . . . 8 1.5 Graph-based Transforms for Image Coding . . . . . . . . . . . . . . 8 1.6 Outline. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 Chapter 2: Lifting Transforms on Graphs 11 2.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.2 Even/Odd Split Design . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.2.1 Tree-based Even/Odd Split . . . . . . . . . . . . . . . . . . 17 2.2.2 Graph-based Even/Odd Split . . . . . . . . . . . . . . . . . 18 2.3 Prediction Filter Design . . . . . . . . . . . . . . . . . . . . . . . . 21 2.3.1 Polynomial Prediction Filters . . . . . . . . . . . . . . . . . 23 2.3.2 Data-adaptive Prediction Filters . . . . . . . . . . . . . . . . 24 2.3.2.1 Optimal Prediction Filters . . . . . . . . . . . . . . 24 2.3.2.2 Approximating Optimal Prediction Filters . . . . . 25 2.4 Update Filter Design . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.4.1 Mean-preserving Update Filters . . . . . . . . . . . . . . . . 27 2.4.2 Orthogonalizing Update Filters . . . . . . . . . . . . . . . . 28 2.4.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 2.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 v Chapter 3: Transform-based Distributed Data Gathering 33 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.2 En-route In-network Transforms . . . . . . . . . . . . . . . . . . . . 40 3.2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 3.2.2 Definition of Unidirectional Transforms . . . . . . . . . . . . 43 3.2.3 Invertibility Conditions for Unidirectional Transforms . . . . 45 3.2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.3 Unidirectional Transform Designs . . . . . . . . . . . . . . . . . . . 54 3.3.1 Tree-based Karhunen-Lo` eve Transform . . . . . . . . . . . . 54 3.3.2 Orthogonal Unidirectional Transforms . . . . . . . . . . . . 55 3.3.3 Tree-based DPCM . . . . . . . . . . . . . . . . . . . . . . . 56 3.3.4 Unidirectional Lifting-based Wavelets . . . . . . . . . . . . . 57 3.3.5 Unidirectional 5/3-like Wavelets . . . . . . . . . . . . . . . . 61 3.4 Unidirectional Haar-like Wavelets . . . . . . . . . . . . . . . . . . . 63 3.4.1 Transform Construction . . . . . . . . . . . . . . . . . . . . 63 3.4.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 3.5 Quantization of Transform Coefficients . . . . . . . . . . . . . . . . 67 3.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.6.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . 69 3.6.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . 71 3.6.3 Comparison of Filter and Even/Odd Split Designs . . . . . . 73 3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 Chapter 4: Joint Optimization of Transform and Routing 78 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.2 Joint Routing and Transform Optimization . . . . . . . . . . . . . . 83 4.2.1 Optimization Algorithm . . . . . . . . . . . . . . . . . . . . 85 4.2.2 Feasible Set Construction . . . . . . . . . . . . . . . . . . . 86 4.2.3 Feasible Set Search . . . . . . . . . . . . . . . . . . . . . . . 87 4.3 Heuristic Approximation Algorithm . . . . . . . . . . . . . . . . . . 88 4.4 Practical Considerations . . . . . . . . . . . . . . . . . . . . . . . . 89 4.5 Evaluation of MST Performance . . . . . . . . . . . . . . . . . . . . 91 4.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 93 Chapter 5: Graph-based Transforms for Image Coding 97 5.1 Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 5.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 5.3 Tree-based Lifting Transforms . . . . . . . . . . . . . . . . . . . . . 102 5.3.1 Tree-based Transform Design . . . . . . . . . . . . . . . . . 104 5.3.1.1 Lifting Filter Design . . . . . . . . . . . . . . . . . 105 5.3.1.2 Tree Construction . . . . . . . . . . . . . . . . . . 105 5.3.1.3 Separable Tree-based Transforms . . . . . . . . . . 107 5.3.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . 108 vi 5.4 Edge-Adaptive Intra Prediction . . . . . . . . . . . . . . . . . . . . 115 5.4.1 Edge-adaptive Intra Prediction . . . . . . . . . . . . . . . . 118 5.4.1.1 Edge Detection . . . . . . . . . . . . . . . . . . . . 118 5.4.1.2 Predictor Selection . . . . . . . . . . . . . . . . . . 118 5.4.1.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . 122 5.4.2 RD Optimization . . . . . . . . . . . . . . . . . . . . . . . . 123 5.4.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . 125 5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 Chapter 6: Conclusions 129 6.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 Appendix A Additional Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 A.1 Proof of Proposition 1 . . . . . . . . . . . . . . . . . . . . . . . . . 132 A.2 Proof of Proposition 2 . . . . . . . . . . . . . . . . . . . . . . . . . 133 A.3 Proof of Proposition 4 . . . . . . . . . . . . . . . . . . . . . . . . . 133 Bibliography 137 vii List Of Tables 2.1 Table of common notation. . . . . . . . . . . . . . . . . . . . . . . . 13 5.1 Edge map bit rates (in kbps). . . . . . . . . . . . . . . . . . . . . . 126 viii List Of Figures 1.1 Irregularly spaced nodes organized onto a rooted tree. . . . . . . . . 2 1.2 Example illustrating the communications required to compute the transforms in [23,36,72,73] in a distributed manner. White nodes are even and gray nodes are odd. First even nodes must transmit data to their odd neighbors. Odd nodes receive even node data, compute transformcoefficients, then transmit those coefficients back to their even neighbors (and also to the sink). Even nodes then use these odd node coefficients to compute their own coefficients, then transmitthemtothesink. Notethateven nodesmust transmittheir own data twice. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1 Examples of splitting on multiple trees. Black center node is the sink, gray nodes are even and white nodes are odd. The first level tree is shown in (a). In the second level tree (b), the even nodes from the first level are again split and another level of transform decomposition is performed. . . . . . . . . . . . . . . . . . . . . . . 19 3.1 Exampleofroutingtreeandatreeaugmentedwithbroadcasts. Solid arrows denote forwarding links along the tree and dashed arrows denote broadcast links. . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.2 Example of causal neighborhoods for each node. Node n receives y Dn and y Bn from D n and B n , respectively, processes x(n) together with y Dn and y Bn , then forwards its transform coefficient vector y n through its ancestors inA n . . . . . . . . . . . . . . . . . . . . . . . 43 3.3 Illustration of causal neighborhoods. Noden transmits at time t(n). The left figure shows the full communication graph. The right figure showsthegraphafterremoving broadcastlinksthatviolatecausality and step by step decoding. . . . . . . . . . . . . . . . . . . . . . . . 44 ix 3.4 Example to illustrate unidirectional computations. Nodes gener- ate and transmit transform coefficients in the order specified by the transmission schedule. . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.5 Example of splitting based on the depth of the routing tree. White (odd depth) nodes are odd, gray (even depth) nodes are even and the black center node is the sink. . . . . . . . . . . . . . . . . . . . 58 3.6 Raw data example. Nodes 3 and 6 need x(2) to compute details d(3) and d(6), so they must forward raw data over 1-hop to node 2. Nodes 4 and 5 need d(3) to compute s(4) and s(5), so they must forward raw data over 2-hops. . . . . . . . . . . . . . . . . . . . . . 62 3.7 Unidirectional Computations for Haar-like Transform. In (a), nodes 3 and 6 compute a first level of transform. Then in (b), nodes 3 and 6 compute a second level of transform on smooth coefficients of their children. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 3.8 No broadcasts are used in (a), so node 11 consumes more resources when transmitting raw data x(11). Broadcasts are used in (b), so node 11 consumes less resources when transmitting detail d(11). . . 67 3.9 Average percent cost reduction ( Cr−Ct Cr ). Solid and dashed lines cor- respond to high and low spatial data correlation, respectively. Best performance achieved by Haar-like transforms, followed by 5/3-like transform and T-DPCM. High correlation data also gives greater reduction than low correlation data. . . . . . . . . . . . . . . . . . . 72 3.10 Sample networks with corresponding Cost-Distortion curves. In (a) and (c), solid lines denote forwarding links, dashed lines are broad- cast links, circles are even nodes, x’s are odd nodes, and the square center node is the sink. . . . . . . . . . . . . . . . . . . . . . . . . . 76 3.11 Filter design comparison. Circles are even nodes and x’s are odd nodes. Adaptive prediction filters do much better than fixed predic- tion filters. Orthogonalizing updates provide almost no gain. . . . . 77 3.12 Split design comparison. Circles are even nodes and x’s are odd nodes. Dashed lines denote broadcast links. Graph-based splits pro- vide some improvements over tree-based splits. . . . . . . . . . . . . 77 4.1 SPT, MST, and Combined Tree . . . . . . . . . . . . . . . . . . . . 82 4.2 Comparison of MST with RD optimal tree. . . . . . . . . . . . . . . 92 4.3 Performance Comparison of MST and RD Optimal Tree . . . . . . 93 x 4.4 JointlyoptimizednetworkwithcorrespondingCost-Distortioncurves. In (a), blue lines denote forwarding links, dashed magenta lines de- note broadcast links, green circles represent even nodes, red x’s rep- resent odd nodes, and the black center node is the sink. . . . . . . . 94 4.5 Comparison of optimized graph-based splitting and optimized rout- ing. In (a), blue lines denote forwarding links, dashed magenta lines denote broadcast links, green circles represent even nodes, red x’s represent odd nodes, and the black center node is the sink. . . . . . 95 4.6 Cost reduction ratios with routing optimization. . . . . . . . . . . . 96 5.1 Example to illustrate tree construction, where links in the tree (de- noted by blue lines between pixels) are not allowed to cross edges in the image (denoted by red dots) . . . . . . . . . . . . . . . . . . . . 108 5.2 The Peppers image (a) and its corresponding edge map (b) . . . . . 109 5.3 The Tsukuba depth map (a) and its corresponding edge map (b) . . 110 5.4 Rate-distortion curve for various transforms using peppers image. Tree-based transforms give the best performance, and orthogonaliz- ing update filters provide additional gain over mean-preserving up- date filters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 5.5 Rate-distortion curve for various transforms using depth map im- age. Tree-based transforms give the best performance, and orthogo- nalizing update filters provide additional gain over mean-preserving update filters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 5.6 Subjectiveperformancecomparisonat0.25bpp. Ourproposedmethod has a PSNR of 42.65 dB whereas the standard 9/7 transform has PSNR of 35.83 dB. This difference is clearly reflected in the recon- structed images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 5.7 The Lena image (a) and Barbara image (b). . . . . . . . . . . . . . 114 5.8 RD curves for the Lena image (a) and Barbara image (b) . . . . . . 114 5.9 Examples of blocks with different edge structure. Blocks such as those in (a) and (b) can be efficiently represented by existing intra prediction schemes. Blocks such as those in (c) are not efficiently represented. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 5.10 Predicted pixels (a-p) and predictor pixels (A-M) used in H.264. . . 116 xi 5.11 Exampleofvalidpredictors. Thissectionoftheimageconsistsoftwo flat regions separated by an edge shown by the thick solid line. In this case pixels A,B,...,K and M are all valid predictors for pixels a,b,... andi, but arenot validpredictors for pixelsh andj,k,...,p. On the other hand, pixel L is only a valid predictor for pixels h and j,k,...,p. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 5.12 Example of graph used to find valid predictors using the same sam- ple from Figure 5.11. The thick dotted line with small black circles denotes the edges. Thin solid lines between pixels represent connec- tions in the graph G. The thick solid line represents the boundary between the current block and previously coded blocks. . . . . . . . 120 5.13 Comparisonoftherate-distortioncurvesbetweentheproposedmeth- ods and H.264 AVC. x-axis: total bitrate to code two depth maps; y-axis: PSNR of luminance component between the rendered view and the ground truth. . . . . . . . . . . . . . . . . . . . . . . . . . 126 5.14 Comparisonoftherate-distortioncurvesbetweentheproposedmeth- ods and H.264 AVC using IPPP structure. x-axis: total bitrate to code two depth maps; y-axis: PSNR of luminance component be- tween the rendered view and the ground truth. . . . . . . . . . . . . 127 xii Abstract There are many scenarios in which data can be organized onto a graph or tree. Data may also be similar across neighbors in the graph, e.g., data across neigh- boring sample points may be spatially correlated. It would therefore be useful to apply some form of transform across neighboring sample points in the graph to exploit this correlation in order to achieve more compact representations. To this end, we describe a general class of de-correlating lifting transforms that can be applied to any graph or tree, and propose a variety of transform optimizations. We mainly focus on the design of tree-based lifting transform designs. Extensions to graph-based lifting transforms are also discussed. As a first application, we develop distributed graph-based transforms for efficient data gathering in wireless sensor networks (WSNs), where the goal is to transmit data from every node in the network to a collection (or sink) node along a routing tree. In particular, we (i) propose a general class of unidirectional transforms that can be computed in a distributed manner as data is routed toward the sink, and (ii) provide conditions for their invertibility. Moreover, we show that any unidirectional lifting transform is invertible, and propose a variety of tree-based lifting transform designs. By us- ing these transforms to de-correlate data in the network, the total communication cost for data gathering is significantly reduced. We also extend these tree-based lifting transforms to incorporate broadcast communication links. This leads to xiii a set of graph-based lifting transforms for WSNs. In particular, nodes incorpo- rate data received from their broadcast neighbors together with data received from their neighbors in the routing tree. By doing so, they are able to achieve more data de-correlation. By exploiting the additional broadcast communication links in this way, these graph-based liftingtransforms reduce thetotalcommunication cost even further. Inadditiontothetransformdesigns, wealsoproposeanalgorithmthatcan jointly optimize the choice of routing tree with the transform. As a second applica- tion, we also develop graph-based transforms for image compression. In particular, wefocusondesigning graph-basedtransformsthatavoidfilteringacrossedgesinan image. This reduces the number of large magnitude coefficients, which are expen- sive to code, and ultimately reduces the total bit rate while also preserving better the edge structure in the reconstructed images. To this end, we first discuss how our tree-based lifting transforms generalize existing wavelet transforms proposed for image coding, then propose algorithms to design the trees and transforms. By avoiding filtering across edges, our tree-based lifting transforms yield better coding efficiency than standard transforms (i.e., the total bit rate is reduced for a fixed reconstruction quality). We also develop an edge-adaptive intra prediction scheme that avoids computing predictions across edges in an image/video frame. Since predictions are not computed across edges, our scheme significantly reduces the number of large magnitude coefficients that must be coded. This new scheme is then incorporated with the intra prediction scheme in H.264/AVC, and is shown to increase the overall coding efficiency of H.264/AVC. Moreover, when using this new scheme to code depth map images in a multi-view video coding system (where virtual views are synthesized using video plus depth from existing views), we also see an improvement in the quality of the virtual views. xiv Chapter 1 Introduction There are many applications in which data can be organized on a graph G = (V,E) or tree T = (V,E ′ ), where the set of tree edges E ′ is a subset of E. As an example, consider a wireless sensor network (WSN) [4] where nodes are able to communicate with each other (possibly via broadcast) through wireless data transmissions. In this case, nodes serve as vertices, V, and the communication links that connect neighboring nodes form the set of edges E. A concrete example of this is illustrated in Figure 1.1, where nodes are organized onto a routing tree T. Even for standard such as images a graph interpretation is possible, i.e., we can view the pixels in the image grid as vertices in a graph or tree. Edges between pixels can then be defined, for instance, as connections between 4-connected or 8-connected neighbors. As we will soon see, a graph-based representation is useful since it allows us to generalize standard signal processing operations (e.g., filtering, transforms, de-noising) to different types of data. In this chapter, we first provide some motivation and preliminaries for a graph-based representation in Section 1.1. Wethengiveanoverview oftheworkproposedinthisthesisinSections1.2,1.3,1.4 and 1.5. 1 3 4 2 6 1 5 7 8 11 12 13 9 10 Figure 1.1: Irregularly spaced nodes organized onto a rooted tree. 1.1 Motivation Wenowdescribethebasicgraph-basedrepresentationandprovidesomemotivation for the work in this thesis. Let {x(n)} N n=1 denote a finite-dimensional signal and suppose thattheN valuesaretakenfromsomearbitrarysampling grid. Thepoints in the sampling grid (i.e., the sample points) can be organized as vertices V in a graphG,witheachvertexcorrespondingto,forexample,apointinEuclideanspace. Theconnectivity between vertices canthenberepresented byedgesE inthegraph, with the “neighbors” of each vertex defined as other vertices that are connected to it (e.g., its “one-hop” neighbors in the graph). This leads to a graph-based signal representation for x(n). Note that for standard signals (e.g., digital audio signals and images), samples aretypicallyplacedonaregularlyspacedsamplinggridonthereallineRorthereal planeR 2 . Therearemanysignalprocessingtoolsavailableforregularlyspaceddata, e.g.,filtersforprocessing[39],transformsforcompressionandanalysis[22,28,47,64]. 2 However, none of these tools can be used for irregularly spaced sampling grids. Thus, the utility of this graph-based representation is two-fold. First, it allows us to generalize existing signal processing techniques to irregularly spaced data. Since irregularly spaced sampling grids are common to many applications (e.g., WSNs [14], distributed databases [18], and de-noising of scattered data [23,36]), one overarching theme of this thesis is to develop signal processing tools that can be applied to irregularly spaced data by utilizing this graph-based representation. Secondly, graph-based representations for regularly spaced data can also be used to develop transforms that are adapted to the underlying signal structure, e.g., discontinuities (or edges) in an image. A set of “edge-adaptive” transforms is also described in this thesis. In standard applications such as image processing and compression, filtering is typically done along 1D paths. For example, in JPEG [42] and JPEG-2000 [67] filtering is performed along the rows and columns of an image. Filtering could also be applied along 1D paths that are oriented diagonally as in the Directional DCT [78], Directionlets [71] and Bandelets [43]. Note that in all of these (and related) techniques filtering is done along non-overlapping 1D paths. While these are reasonable strategies for signals on regular sampling grids, they are not easily extended to signals on irregular sampling grids. For irregular grids, it is more sensible to form the sample points onto a graph or tree, then to perform filtering along these graphs or trees. In particular, note that filtering along 1D paths is just a special case of filtering along trees with non-overlapping paths (i.e., every node in the tree has at most one child). On the other hand, more general trees will have have multiple 1D paths that merge together (i.e., nodes can have more than one child). Therefore, in this thesis the primary focus is to develop techniques for filtering along arbitrary trees. This provides a more general framework for filtering 3 since data can be filtered along overlapping 1D paths. We mainly focus on a tree- based signal representation and develop algorithms for processing data along these trees, then discuss how to extend these algorithms to more general graphs. Ofthewidevarietyofsignal processing techniques thatcanbegeneralizedusing a tree-based (and graph-based) signal representation, we only focus on designing de-correlatinglineartransformsfordatacompression. Lineartransformsareimpor- tant since data across neighboring sample points is typically well correlated, e.g., neighboring pixels in an image tend to have similar intensity, temperature read- ings tend to be very similar across neighboring nodes in a WSN, etcetera. Thus, applying a de-correlating linear transform (e.g., a DCT or wavelet transform) to the data will allow us to reduce the amount of information (i.e., bits) needed to represent it. We would also like the transforms to be localized in the sense that the transform only operates on sets of sample points that are physically close to- gether. Let x = [x(1),x(2),...,x(N)] t be a vector containing the measurements of N nodes in a graph. An example of a transform which is localized for the tree shown in Figure 1.1 is y = A 1 0 0 0 0 A 5 0 0 0 0 A 7 0 0 0 0 A 11 · x 1 x 5 x 7 x 11 , where x 1 = [x(1),x(2),x(3),x(4)] t , x 5 = [x(5),x(6)] t , x 7 = [x(7),x(8),x(9),x(10)] t and x 11 = [x(11),x(12),x(13)] t . Note that the transform operations on nodes 1 through 4 only involve data from nodes 1 to 4, hence, the transform is localized to nodes 1 to 4. Similar logic follows for other nodes. These local transforms are 4 also convenient since they can be computed in a distributed manner, thus, they are easilyamenabletodistributedcompressioninWSNs. Moreover,wecaneasilyadapt localized transforms to local signal structures such as discontinuities. This is useful for image compression. In this work, we mainly focus on developing tree-based lifting transforms since (i) they have been shown to provide goodde-correlation for multiple applications and (ii) they are local transforms. The transform designs in this thesis are geared toward two applications. The first is distributed data gathering for WSN, where the goal is to gather data from every node in the network at a central collection (or sink) node. In this applica- tion, de-correlating linear transforms are often computed in the network in order to reduce the amount of data nodes must transmit to a central collection (or sink) node. This reduces the total communication cost that nodes incur while transport- ing data to the sink node. Tree-based transforms are developed for these WSNs as well as graph-based extensions that incorporate broadcast wireless communication links. The second application is image compression, where tree-based transforms are developed in which the paths in the tree do not cross over discontinuities (i.e., edges) in the image. Since filtering along these trees avoids filtering across edges, fewer large magnitude coefficients are produced. Thus, fewer bits are needed to represent the transform coefficients. 1.2 Lifting Transforms on Graphs In terms of transform designs, our primary focus is on the design of wavelet trans- forms constructed on graphs and trees using lifting [65]. Lifting transforms are invertible by construction, and are fully specified by (i) a split step which divides the sample points into an even and odd set, (ii) a prediction step that filters data 5 from odd sample points with data from even ones (yielding detail coefficients), and (iii) an update step which filters data from even sample points with detail coeffi- cients from odd ones. Since arbitrary split, prediction and updates can be used without affecting invertibility, these transforms are very flexible. They can also be made local by only allowing odd (resp. even) sample points to use data from their even (resp. odd) neighbors in the graph. Furthermore, these lifting transforms have consistently shown excellent performance in many applications, e.g., in data de-noising [23,36] and distributed data gathering [72]. In this thesis we describe a framework that encompasses these tree-based and graph-based lifting transforms. We primarily focus on tree-based lifting transform designs, where the splitting is done along a tree and filtering operations are only performed across neighbors in the tree, though we have also proposed graph-based splitting schemes [37]. Similar techniques have also been proposed [23,36]. Thus, we also compare tree-based and graph-based lifting transforms. In terms of novel contributions, we propose new tree-based lifting transforms with (i) a new split design along an arbitrary tree, (ii) anextension ofexisting adaptivepredictionfilterstoWSNsand(iii)anovel update filter design. 1.3 Transform-basedDistributedDataGathering In terms of applications, we first apply these lifting transforms to data gathering in WSN. Notethatcomputing thetransformsin[23,36,72,73]inadistributed manner will require nodes to make many (local) data transmissions before coefficients can be encoded and transferred to the collection node along an efficient routing tree. An example of this is shown in Figure 1.2. As such, these strategies are not very efficient in terms of energy expenditure. Instead, it would be better to design 6 transforms that can be computed as data is being routed to the sink node. This will eliminate the need for excessive local data transmissions, thereby leading to distributed implementations that are more energy-efficient. 3 4 2 6 1 5 7 8 11 12 13 9 10 3 4 2 6 1 5 7 8 11 12 13 9 10 3 4 2 6 1 5 7 8 11 12 13 9 10 Forwarding Step Prediction Step Update Step x(3) x(3) x(1) x(1) x(5) x(5) x(11) x(11) x(7) x(7) x(9) x(9) d(4) d(2) d(4) d(2) d(4) d(6) d(12) d(13) d(10) d(8) d(8) d(10) [d(2) d(4)] [d(12) d(13)] [d(8) d(10)] d(6) s(3) s(3) s(5) s(11) [s(7) s(9)] [s(1) s(3)] s(9) Figure1.2: Exampleillustratingthecommunicationsrequiredtocomputethetrans- forms in [23,36,72,73] in a distributed manner. White nodes are even and gray nodes are odd. First even nodes must transmit data to their odd neighbors. Odd nodes receive even node data, compute transform coefficients, then transmit those coefficients back to their even neighbors (and also to the sink). Even nodes then use these odd node coefficients to compute their own coefficients, then transmit them to the sink. Note that even nodes must transmit their own data twice. Inthisthesiswe(i)designageneral classoftransforms(notrestricted tolifting) that can be computed in a distributed manner as data is routed to the sink node, then (ii) provide lifting transform designs that fall into this general class. While these designs are specific to WSN, they can be easily extended to other applica- tions. We assume that data is forwarded along the tree according to a transmission scheduling and that for transform computations sensor nodes can only use data that they receive before they transmit. This includes data that sensor nodes re- ceive along the tree, i.e., from descendants, as well as data overheard via broadcast communications. This leads to a general class of transforms which we refer to as unidirectional transforms; a general definition andinvertibility conditions aregiven 7 in Chapter 3. We then describe how to translate existing unidirectional trans- forms into our framework in order to demonstrate its generality. Finally, novel energy-efficient lifting transforms are proposed that provide superior performance over existing methods. 1.4 JointOptimizationofTransformandRouting The choice of routing tree also impacts the performance of unidirectional trans- forms. For example, from a routing perspective the best trees are the well-known shortest path routing trees [12] (SPTs). While SPT routing provides the minimum cost to route a fixed amount of data to the sink, transforms computed along these trees will not always provide the most compact representation of the underlying data. In particular, an SPT will provide the minimum distance from any node to the root, but will not necessarily minimize the distance between each node and its neighbors in the tree. As was pointed out in our previous work [54], if data corre- lation is inversely proportional to distance, a minimum spanning tree [12] (MST) which minimizes the sum of the distances between neighboring nodes will provide a more compact representation of the data than an SPT. Thus, routing with com- pression on a shortest path routing tree (SPT) will not necessarily provide the minimum total cost. Another major contribution of this thesis is a joint routing and transform optimization algorithm, as described in Chapter 4. 1.5 Graph-based Transforms for Image Coding The second application of this thesis is image coding. Correlation across neighbor- ing pixels in an image is typically exploited in one of two ways. First, separable 8 filtering (i.e., filtering is done first along rows, then along columns) is typically ap- plied, e.g., pixel data is filtered using separable DCT bases [47] as in JPEG [42], or with separable wavelet bases as in JPEG-2000 [67]. These bases can repre- sent smooth images with few horizontal, vertical and diagonal discontinuities very compactly. However, for images with complex discontinuities, these transforms produce many large magnitude high-pass coefficients. Large magnitude high-pass components require many bits to be encoded, i.e., they increase the total bit rate. Furthermore, quantization of these large high-pass components leads to annoying compression artifacts such as ringing. One way to deal with this is to construct transformsalonggraphsthateitheravoiddiscontinuitiesorperformfilteringparallel to them. When doing so, the number of large magnitude high-pass coefficients can be significantly reduced. The structure of the discontinuities will also be well pre- served, therebyreducing theamount ofringingartifacts. InChapter 5weconstruct trees that do not have links between pixels that are separated by discontinuities, then we design transforms along these trees. We apply these transforms to natural and depth map images, and see performance that is superior to standard separable transforms. Another way correlation in images is typically exploited is through block-based intra prediction schemes used in, for example, H.264/AVC and MPEG-4. The prediction is typically done along a fixed set of directions and the “best” direction is chosen as the final “intra prediction mode”. While these directional prediction methodscanprovideaccuratepredictionsofblockswithasinglediagonaledge(and therefore, can provide low energy prediction residuals forthese blocks), they do not provide accurate for blocks with more complex edge structure such as “L” or “V” shaped edges. Thus, we also develop an edge-adaptive intra prediction scheme that can be easily integrated with existing techniques. When applied to intra predictive 9 coding of depth map images, we see significant gains with respect to existing intra prediction schemes. 1.6 Outline Thisthesisisorganizedasfollows. Firstweprovideanoverviewofliftingtransforms in Chapter 2. We then describe the data gathering problem for WSN and propose a general framework for efficient de-correlating transforms that can be computed in the network in Chapter 3. Various tree-based and graph-based transforms are also proposed in Chapter 3. In Chapter 4 we also propose a joint routing and transform optimizationmethodfor WSN. InChapter 5 we proposeavariety oftree-based and graph-based transforms for image compression. Finally, some concluding remarks and interesting directions for future work are discussed in Chapter 6. 10 Chapter 2 Lifting Transforms on Graphs We now focus on the construction and optimization of tree-based and graph-based lifting transforms. As a starting point, in Section 2.1 we establish some definitions andnotationforliftingtransforms[65]andshowthatthesetransformsareinvertible by construction. These transforms consist of three key components. The first is a split step that divides nodes into disjoint sets of even and odd nodes. Prediction filters must also be designed with which data at odd nodes is linearly predicted from data at even nodes (yielding detail coefficients). Finally, update filters are used to linearly update data at even nodes using detail coefficients from odd nodes (yielding smooth coefficients). Multiple prediction and update steps can be used. We initially make no assumption about the relationships between these nodes, i.e., we do not assume anything about the structure of the graph nor is there any notion of relative position or distance, though some notion of this would be useful when defining the filtering operations used in the transform. Therefore, the lifting transforms presented as such are very general. InSection2.2,twosplitdesignproceduresarediscussedthatcanbeappliedtoan arbitrary rooted tree or to an arbitrary graph. These designs were first introduced byusin[37,55]. Wenotethatdatade-correlationoccursintheprediction step, i.e., 11 ifanappropriatechoiceofpredictionfilterismadeforeachoddnode,theprediction madefromitsneighbors’datawillbeveryclosetoitsowndata,hence, theresulting prediction residual (i.e., detail coefficient) will be close to zero. This is useful since small prediction residuals require very fewbits tobeencoded. Naturally, thechoice ofpredictionfilter depends onthepropertiesofthegivendata. Thus, inSection2.3 we will discuss prediction filter designs that minimize the average energy in the prediction residuals. This result and an algorithm for learning these prediction filters were described by the author in [52]. When a prediction step is applied without any update step, issues like numer- ical instability of the inverse transform [23,67] and propagation of quantization errors [16,67] will arise. This will reduce the quality of the reconstructed data. Thus, it is also important to include an update step to mitigate these effects. It is also desirable to design update filters that have certain properties such as pre- serving the average value of coefficients across multiple levels of decomposition or orthogonalitybetween low-pass (i.e., update) andhigh-pass (i.e., prediction) filters. Various update filter designs are discussed in Section 2.4. In particular, we propose an update filter design that makes the low-pass and high-pass filters orthogonal. Thisresultwasfirstintroducedbyusin[56]. Furthermore, weshowthatthischoice of update filters also minimizes the reconstruction MSE due to quantization of the transform coefficients. 2.1 Preliminaries Liftingtransformsarecomputedbysplitting (i.e., partitioning)nodesintoevenand oddsets, filtering datafromoddnodeswithdatafromeven nodestoproduce detail coefficients, and then filtering data from even nodes with details coefficients from 12 Parameter Description N Number of sample points (nodes) I Set of node indices e i i-th identity vector, e i (i) = 1,e i (j) = 0 for all j 6=i I Identity matrix 0 All zero vector x(i) Data at node i∈I ˆ x(i) Prediction of x(i) x Vector of original data T Lifting transform matrix y Vector of lifting transform coefficients E andO Set of even and odd nodes (E∩O =∅) N i Set of even (odd) neighbors of odd (even) node i p n Prediction vector (filter) for odd node n d(n) Detail coefficient of odd node n P Prediction matrix d Vector of detail coefficients u m Update vector (filter) for even node m s(m) Smooth coefficient of even node m U Update matrix Table 2.1: Table of common notation. oddnodes. Aswewill soonshow, since dataatoddnodesisonlyfilteredusing data from even nodes and vice versa, the corresponding transform is guaranteed to be invertible. We first establish some notation that will be throughout the remainder of this chapter. The notation is summarized in Table 2.1. Suppose that there are N sample points indexed by I = {1,2,...,N}. Let x(n) denote the value at sample point n, and let x = [x(1),x(2),...,x(N)] t . Let E and O be two disjoint sets of even and odd nodes, respectively. For each odd node n, let N n ⊂E be the set of even neighbors of node n. The prediction vector p n is used to produce a prediction as P m∈Nn p n (m)x(m) and this prediction is subtracted from x(n) to yield detail coefficient d(n) = x(n)− P m∈Nn p n (m)x(m). Nopredictions areperformedforevennodedata, thus,p m =0forallm∈E, where 0 is the all zero vector. Since data for odd node n is only filtered using even node 13 data in N n , we also have that p n (k) = 0 for all k ∈O∪(E−N n ). For each even node m, let N m ⊂O denote the set of odd neighbors of node m. Data from each even node m is then updated using update vector u m , yielding smooth coefficient s(m) =x(m)+ P n∈Nm u m (n)d(n). Since odd node data is not updated,u n =0 for all n∈O. Since data from even node m is only filtered with odd node data from N m , we also have thatu m (l) = 0 for all l∈E∪(O−N m ). This is all summarized in the following definition. Definition 1 (Single Step Lifting Transform). Let there be N data points x(n) indexed by n ∈ I = {1,2,...,N}. Let I be partitioned into two disjoint sets of even and odd nodes denoted byE andO respectively, i.e.,I =E∪O andE∩O =∅. For each m ∈ E, let N m ⊂ O. Similarly, for each n∈ O, N n ⊂ E. A single step lifting transform is a linear prediction step followed by a linear update step. Let p n denote the N×1 prediction operator used for node n and let u m denote the N×1 update operator for node m. The lifting transform is computed in the prediction step first, yielding detail coefficients for each n∈O as d(n) =x(n)− X i∈Nn p n (i)x(i). (2.1) In the update step, smooth coefficients are computed for each m∈E as s(m) =x(m)+ X j∈Nm u m (j)d(j). (2.2) Since N n ⊂ E for all n ∈ O, we have that p n (j) = 0, for all j ∈ O∪(E −N n ). Similarly, N m ⊂ O for all m ∈ E, hence, u m (j) = 0, for all j ∈ E ∪(O−N m ). Moreover, p m =0, for m∈E and u n =0, for n∈O. 14 Note that these operations correspond to a set of N vector inner products. For example, if e n represents the n-th identity vector (i.e., e n (n) = 1 ande n (l) = 0 for all l 6= n), then for each n ∈ O, x(n) = e t n ·x and P m∈Nn p n (m)x(m) = p t n ·x. Thus, d(n) = (e n −p n ) t ·x. Therefore, we can also express the transform as a single matrix operation y = (I +U)·(I−P)·x as shown in Proposition 1 (see Appendix A for the proof). Proposition 1 (Lifting Transform Matrices). Let the vectors u n and p n satisfy Definition 1 for all n. Let P be the N ×N prediction matrix with row n (P) = p t n . Similarly, let U be the N × N update matrix with row n (U) = u t n . The lifting transform matrix is simply T = (I+U)·(I−P) and we can compute the vector of coefficients as y =T·x. Note that the non-zero filter coefficients are completely unconstrained in Def- inition 1, thus, this represents a very general class of transforms. The inverse of I−P and I+U also exist by construction and are easily shown to be I+P and I−U, respectively, as shown in Proposition 2 (the proof is in Appendix A). Proposition 2 (Inverse Lifting Transform Matrices). Let E and O satisfy Defini- tion 1, and letP andU satisfy the assumptions in Proposition 1. Then (I−P) −1 = I+P and (I+U) −1 =I−U. We can also introduce non-zero normalization factors into the filters without affecting invertibility. In particular, if the prediction operation at node n is nor- malizedbyafactorc n,p , thisisequivalent tomultiplying thepredictionmatrixI−P by a diagonal matrixD p = diag(c 1,p ,c 2,p ,...,c N,p ). The same is true for the update matrix I+U for some diagonal matrix D u . As long as the normalization factors are non-zero (there is no practical reason why they should be zero), the overall transform y ′ =D u ·(I+U)·D p ·(I−P)·x, will be trivially invertible. 15 Corollary 1 (Lifting Filter Normalization). Let E, O, P and U be specified as in Definition 1 and Proposition 1. Let c 1,p ,c 2,p ,...,c N,p (resp. c 1,u ,c 2,u ,...,c N,u ) be a set of normalization factors for the prediction (resp. update) filters, with D p = diag(c 1,p ,c 2,p ,...,c N,p ) (resp. D u = diag(c 1,u ,c 2,u ,...,c N,u )). The normalized prediction and update filters are then given by the matrices P ′ =D p ·(I−P) and U ′ = D u ·(I+U) respectively. Moreover, (P ′ ) −1 = (I+P)·D −1 p and (U ′ ) −1 = (I−U)·D −1 u . This can be easily generalized to multiple lifting (i.e., prediction and update) steps and multiple levels of decomposition. LetO j andE j be odd and even sets of nodes, respectively, for j = 0,1,2,...,J and some positive integer J. We assume that E 0 =I and O 0 =∅. Suppose that E j−1 =E j ∪O j and E j ∩O j =∅ for all j. This provides a direct analogy to the standard dyadic decomposition. At each level j, let K j denote the number of lifting steps and let the k-th prediction and update filters be respectively denoted by the vectors p k n,j and u k m,j for all m,n ∈ I. Of course each of these should satisfy Definition 1. Moreover, letP j,k andU j,k satisfy the assumptions in Proposition 1 for each j, let d j (n) denote the detail coefficient of each n ∈ O j and let s j (m) denote the smooth coefficient of each m ∈ E j . By convention, weletP 0 =U 0 =0ands 0 (n) =x(n)foralln. Thisrepresents amulti- level transform decomposition on the original data, with the aggregate transform operations defined as y = Q J j=1 Q K j k=1 (I +U j,k )· (I−P j,k )·x. Each (I−P j,k ) and (I+U j,k ) is invertible by Proposition 2. Therefore, the overall transform is invertible. This is formally stated in Corollary 2. Note that this transform is still invertible when filter normalization is introduced. This follows by a simple extension of Corollary 1. 16 Corollary 2 (Invertible Multi-level Lifting Transforms). Let E j and O j satisfy Definition 1 and P j,k and U j,k satisfy the assumptions in Proposition 1 for all j = 0,1,2,...,J for some positive integer J, and for all k = 1,2,...,K j . Suppose thatE j−1 =E j ∪O j andE j ∩O j =∅forallj. Thenthetransformy = Q J j=1 Q K j k=1 (I+ U j,k )·(I−P j,k )·x is invertible with (I−P j,k ) −1 =I+P j,k and (I+U j,k ) −1 =I−U j,k for all j and k. 2.2 Even/Odd Split Design Assume that nodes are organized on some graph G. This graph could naturally arisefroma routingtreeasinaWSN, or couldbedefined based onsome additional information as we shall see in the case of images.. We can now investigate exactly how nodes should be split into even and odd sets. An even/odd splitting strategy on trees is described in Section 2.2.1 and a set of strategies for graphs are described in Section 2.2.2. Both the tree-based and graph-based splitting methods are used for the WSN application in Chapter 3 and an experimental evaluation is provided in Section 3.6.3. 2.2.1 Tree-based Even/Odd Split Let T denote a rooted tree with root node indexed by N +1. This provides some notion of relative position inT. In particular, every nodenwill have a parentρ(n), children C n , descendants D n , ancestors A n , and will be h(n) hops away from the root node. h(n) can also be thought of as the depth of n in T. We can use this information to define the splitting in a manner analogous to the even/odd splitting done on 1D data, e.g., where inZ, samples occurring at even integers are even and those occurring at odd integers are odd. 17 Onenaturalwayofdoinganeven/oddsplitting alongT (analogoustoeven/odd splittingin1D)istousetheparityofthedepthofeachnode. Thissplittingmethod was introduced by us in [55], where each node n for which h(n) is odd fall into the odd set O, i.e., O = {n : h(n) mod 2 = 1}. Similarly, each node m such that h(m) is even fall into the even set E, i.e., E = {m : h(m) mod 2 = 0}. An example of this split design is shown in Figure 2.1(a). Clearly O∩E = ∅, and so any prediction and update operations satisfying Definition 1 for this choice of E and O will yield an invertible transform. In general, multiple trees T j can be defined for some j = 1,2,...,J and positive integer J, with corresponding even setsE j and odd setsO j . Lifting transforms can then be defined on each T j and, by Corollary 2, each of these transforms will be invertible. Therefore, any multi-level transform constructed in this way will also be invertible. An example of splitting over multiple trees is shown in Figure 2.1. This split design is adopted later in Chapter 3 for the WSN application. 2.2.2 Graph-based Even/Odd Split While even/odd splitting on a tree is rather simple, it has its disadvantages since it willnotexploitlinksthatexistsbetweennodesthatarenotdirectlyconnectedinthe tree. Forexample, intheWSN applicationarouting treeistypically given, butdue to the broadcast nature of wireless communication [10,77], multiple nodes will be abletooverhear asingledatatransmission. Thisinduces additionalcommunication links on top of those along the routing tree, thus, a graph arises. Since it may be possible to achieve more de-correlation by doing an even/odd splitting on this graph (since nodes will typically have more neighbors on a graph than along a tree), it would generally be better to do a graph-based splitting whenever possible. 18 17 5 3 6 2 16 11 10 12 1 9 13 14 21 23 22 15 18 19 20 7 8 4 (a) Split on 1 st tree 17 5 16 10 12 23 22 19 20 7 8 4 (b) Split on 2 nd trees Figure 2.1: Examples of splitting on multiple trees. Black center node is the sink, gray nodes are even and white nodes are odd. The first level tree is shown in (a). In the second level tree (b), the even nodes from the first level are again split and another level of transform decomposition is performed. Moreover, there are other applications [23,36] where only the connectivity between nodes is given (e.g., no tree is provided); clearly a graph-based even/odd splitting is needed in these cases. Thus, we also summarize results on graph-based splitting methods for lifting transforms. Various graph-based even/odd splitting methods have been proposed in the literature [3,23,36,72,73]. The techniques in [3,72,73] are used for distributed compression in WSNs, where nodes are assumed to be randomly distributed on some subset of R 2 . In this case, roughly speaking, the nodes with the largest number of neighbors within a certain distance R are chosen as odd and the rest are chosen as even. This is done over multiple splitting stages j = 1,2,...,J until nodes can no longer be split, and it induces a series of graphs G j , each of which is used to determine the j-th level splitting. This is one example of splitting nodes along a graph G. Note that under this even/odd split design, each odd node will 19 have many even neighbors. Thus, this even/odd split is particularly useful since a very accurate prediction ˆ x(n) can be generated for each odd node n. Therefore, d(n) =x(n)−ˆ x(n) will tend to be small on average. The graph-based split design in [23] uses similar ideas (i.e., each odd node should have many even neighbors), thoughitwasdevelopedandusedspecificallyforthede-noisingofirregularlyspaced data. Alternatively, the even/odd split in [36] attempts to find an even/odd splitting in which the number of links between different even (resp. odd) nodes is minimum. Themotivationinthatworkistoutilizeasmanylinksinthegraphaspossiblewhen computing a lifting transform. Since even (resp. odd) node data is not processed usingothereven(resp. odd)nodedata, thelinksbetween differenteven(resp. odd) nodes will not be used in the transforms. Thus, the goal in that work is to find a split that minimizes the number of links between different even (resp. odd) nodes. Thisisdonebysearching forabi-partitegraph(whichpartitionsnodesintodisjoint even and odd subsets) that minimizes the “conflict fraction”, i.e., the number of links between even nodes plus the number of links between odd nodes divided by thetotalnumber oflinks inthegraph. Sinceoddnodeswill have (onaverage) more even neighbors in graph-based splits than in tree-based splits, the predictions are likely to be more accurate. Thus, there will be less energy in the detail coefficients; this reduction in energy will generally lead to more efficient signal representations (e.g., better energy compaction and coding performance). The graph-based even/odd split we proposed in [37] attempts to optimize the totalenergyconsumptioninaWSN.Thisisdonebyminimizing thenumberofeven nodesundertheconstraintthateveryoddnodehasatleastoneevenneighbor. This willtendtoproducesplitsforwhichoddnodeshaverelativelyfewevenneighbors,so itwillnotbegoodfromadatade-correlationstandpoint. However, inthecontextof 20 WSN wherethegoalistominimize thetotalenergyconsumption, usingdistributed lifting transforms with very few even nodes turns out to be very beneficial. To summarize [37], the main observations are that (i) any form of transform-based distributed data gathering requires some nodes to transmit raw data (i.e., there must be some raw data nodes), (ii) nodes that receive raw data can perform some aggregation to remove correlation as to reduce the amount of data they need to transmit (i.e., there are some aggregating nodes), and (iii) the cost to transmit raw data is typically much higher than the cost to transmit aggregated data. Thus, the main goal in that work is to minimize the number of raw data nodes (or rather, to minimize the total cost incurred by raw data nodes) under some constraints. This is a rather general framework (i.e., assignment of raw and aggregating nodes), and liftingtransformsarejustoneexamplewhereevennodesserveasrawnodesandodd nodes (which compute residual detail coefficients) act as aggregating nodes. Some comparisons of tree-based and graph-based splits will be presented in Section 3.6.3. In summary, the graph-based even/odd splitting techniques proposed in [3,23, 36,72,73] are very general and can be applied to any graph. Thus, these graph- based even/odd split designs (along with the tree-based split designs) could also be applied to other applications such as image coding. On the other hand, the graph- based split proposed in [37] was designed specifically for distributed compression in WSNs, and may not provide good performance for other applications such as image coding. 2.3 Prediction Filter Design Assume (as in Section 2.2) that a graph G = (V,E) is given, nodes are placed in Z 2 and the position of each node n is given by (i n ,j n ). As discussed before, data 21 de-correlation occurs in the prediction step, i.e., for each odd node n a prediction ˆ x(n) = P m∈Nn p n (m)x(m) and detail coefficient d(n) = x(n)− ˆ x(n) is computed and encoded. If ˆ x(n) ≈ x(n), then d(n) ≈ 0, so that it can be encoded using significantly fewer bits thanwould be needed to encodex(n). Thus, the goal inthis section is to design a prediction filterp n that produces very accurate predictions of x(n) using data from an arbitrary set of neighbors N n , i.e., we want to design p n such that ˆ x(n) = P m∈Nn p n (m)x(m) ≈ x(n). The proper choice of p n ultimately depends on how data is correlated across nodes. In this section we present two methods for computing “good” prediction filters (e.g., “good”inthesensethat|d(n)| 2 isminimized). Inthefirstmethod, weassume that data is locally very smooth, so that the data across neighboring nodes is well approximated by apolynomial function. Morespecifically, foreach oddnoden, the data from its neighbors inN n can be well approximated by a K-degree polynomial P(i,j|N n ), in which case we can accurately predict x(n) by ˆ x(n) = P(i n ,j n |N n ), i.e., d(n) =x(n)− ˆ x(n)≈ 0. This will be discussed in Section 2.3.1. If the data is not piece-wise polynomial but is spatially stationary, with the correlationbetweenanynodenandmfollowingcorrelationfunctionR XX (n,m),we can still compute good prediction filters. Thus, in the second method we describe how to compute prediction filter p n that minimizes the mean squared error of detail coefficient d(n). They are simply the well known linear minimum mean squared error (LMMSE) prediction filters [20], and are optimal in the sense that they minimize E[|d(n)| 2 ], where E[·] denotes the expected value operation. If the data is stationary but the correlation function R XX (n,m) is unknown, we can use an adaptive filter [20] to estimate the optimal filters. This will be discussed in Section 2.3.2 in the context of distributed compression for WSNs and was initially introduced by us in [52]. 22 2.3.1 Polynomial Prediction Filters If the data is nearly piece-wise polynomial, then for each odd node n, we can fit data from even neighborsN n to a 2D polynomial P(i,j|N n ), and can predict x(n) by ˆ x(n) = P(i n ,j n |N n ) with great accuracy, i.e., d(n) = x(n)− ˆ x(n) ≈ 0. As an example of this type of design, consider a planar prediction ofx(n) from{x(m)} Nn as was proposed in [3,72,73]. Suppose thatN n ={m 1 ,m 2 ,...,m |Nn| }. The best-fit plane (in the least squares sense [63]) P(i,j|N n ) = A·i+B·j +C of the data {x(m i )} m i ∈Nn can be found by solving x(m 1 ) . . . x(m k(n) ) = i m 1 j m 1 1 . . . i m |Nn| j m |Nn| 1 · A B C . (2.3) Let x Nn = x(m 1 ),x(m 2 ),...,x(m |Nn| ) t ,c = (A,B,C) t and A n = i m 1 j m 1 1 . . . i m |Nn| j m |Nn| 1 . Thenabest-fitplaneexistsaslongasrank(A t n A n ) = 3,sincethenc = (A t n A n ) −1 A t n x(N n ). Note that rank(A t n A n ) = rank(A n ) [63], so a best fit plane exists if and only if rank(A n ) = 3. Since ˆ x(n) = A · i n + B · j n + C = [i n j n 1]· c, ˆ x(n) = [i n j n 1]·(A t n A n ) −1 ·A t n ·x Nn . Thus, p n (N n ) = [i n j n 1]·(A t n A n ) −1 ·A t n t , and is only a function of the positions of n andN n . For a generalK-degree polynomial we have P(i,j|N n ) = P K k=0 P K l=0 C k,l ·i k ·j l . In this case, we must estimate (K+1) 2 of the C k,l parameters, i.e., to approximate x(n) with a K-degree polynomial fitted to {x(m i )} m i ∈Nn , n must have at least 23 (K +1) 2 neighbors. For many practical applications, sets of neighbors whose size scales as(K+1) 2 may not be feasible for largeK, so using high degree polynomials to compute predictions is not very practical. 2.3.2 Data-adaptive Prediction Filters If the data is spatially stationary with correlation between samples at node n and m given by R XX (n,m), we can still compute good prediction filters. In this case, for each odd node n, we want to find a prediction vector for x(n), that minimizes the energy in residual prediction errord(n) =x(n)− P i∈Nn p n (i)x(i), i.e., we want to find p ∗ n = argmin pn E[|x(n)− X i∈Nn p n (i)x(i)| 2 ]. (2.4) The solution is the well-known Wiener-Hopf solution [20], and is a function of the correlation R XX (i,j) =E[x(i)x(j)] between nodes i,j ∈N n . 2.3.2.1 Optimal Prediction Filters We derive the optimal solution to (2.4) for the sake of completeness. Note that x ∗ (n) = P i∈Nn p ∗ n (i)x(i) is the LMMSE estimate of x(n) only if the orthogonality principle [62] is satisfied. Thus, E{[x(n)−x ∗ (n)]x(j)} =E{[x(n)− X i∈Nn p ∗ n (i)x(i)]x(j)} = 0, for all j ∈N n (2.5) This implies that, for all j ∈ N n , E[x(n)x(j)] = P i∈Nn p ∗ n (i)E[x(i)x(j)]. Since R X (i,j) =E[x(i)x(j)], we can simplify (2.5) as X i∈Nn p ∗ n (i)R X (i,j) =R X (n,j), for all j ∈N n . (2.6) 24 LetN n ={i 1 ,i 2 ,...,i |Nn| }. Let the|N n |×|N n | matrixR n be defined byR n (k,l) = R X (i k ,i l ) and let the |N n |×1 vector r n be defined by r n (k) = R X (n,i k ). We can now express (2.6) above asR n p ∗ n (N n ) =r n . So as long asR n is invertible, we have an optimal solution for node n. If R n is positive definite, then p ∗ n (N n ) =R −1 n r n . 2.3.2.2 Approximating Optimal Prediction Filters We now describe an algorithm to estimate the optimal prediction filters in the con- text of WSN as was introduced by us in [52]. Note that estimating these statistics in a WSN will be costly in terms of delay, computation and communication. More- over, a large amount of data is generally needed to reliably estimate correlation matrices. Alternatively, we can use adaptive filters to estimate the optimal spatial prediction filters over time with no learning cost since (i) they converge to the op- timal filters for stationary data, (ii) they do not require estimates of data statistics and (iii) the filtering done at one node can be replicated at any other node (e.g., the sink) given the same prediction errors and initial prediction filters. Note that if quantization is used, then both nodes must use the same quantized prediction errors to update the filters. More specifically, in order for the sink to re-produce the same prediction filters used at an odd node n, it must use (i) the same initial prediction filter as node n, (ii) the same prediction errors that were generated by node n, and (iii) the same data that node n used to compute the prediction errors. Conditions (i) and(ii) are easily met. In the context of lifting, (iii) isalso met since the update step is always inverted before the prediction step is inverted; thus, the sink can always recover data that node n used to compute prediction errors. In this way, we can apply an adaptive filter at each odd node to estimate the optimal prediction filters without specifying any additional information to the sink. Note that it still takes time for the filters to adapt to the data well enough to produce 25 good predictions. Thus, there will be a small learning cost for nodes to initially “train” their filters and also to “re-train” their filters when data statistics change (i.e., the overall encoding rate will be higher during training periods, during which filters have not yet converged to a state that matches current data statistics.) There are many adaptive filters that we can choose from, but the step-size parameter μ often must be chosen based on data dependent parameters to ensure filter convergence. We generally will not know those parameters, thus, the most suitable choice is a normalized least mean squares adaptive filter [20] since μ need notbespecifiedbutisinsteadadaptedasthefilterisadapted. Somenotationisnow established. Supposenodesmeasuredataattimest 1 ,t 2 ,...,t M . Letx(n,m)denote the data at node n captured at time step t m . The N ×M prediction coefficient matrix for node n is given by p n , where column i, i.e., p n (:,i), is the prediction vector at the i-th time step at node n. The adaptive filter at each odd node n is then computed, fromm = 1 tom =M, asd(n,m) =x(n,m)−p t n (N n ,m)x(N n ,m) and the update equationp n (N n ,m+1) =p n (N n ,m)+˜ μ x(Nn,m)d(n,m) x t (Nn,m)x(Nn,m) , where ˜ μ is a parameter that can be used to speed up (or slow down) the rate of convergence. For correlated Gaussian data, the optimal value of ˜ μ = 1 [20]. 2.4 Update Filter Design It is also necessary to provide some form of update step after prediction in order to reduce the effects of numerical instability and propagation of quantization er- rors. We now present two update filter designs. The first one, as presented in Section 2.4.1, was proposed in [72,73]. It essentially preserves the average value of smooth coefficients across multiple levels of decomposition. This reduces the harmful effects caused by numerical instability [23], but theresulting low-pass (LP) 26 filters may notbe orthogonaltothe resulting high-pass(HP)filters. Therefore, any quantization error introduced in the HP and LP subbands will propagate through the inverse transform into both subbands. Instead, it would be better to design update filters that force LP filters to be orthogonal to HP filters. We provide such an update filter design in Section 2.4.2 and show that it always exists. This design was initially proposed by the author in [56]. Moreover, we show that this choice of update filter (for a fixed prediction filter design) minimizes the reconstruction mean squared error due to quantization of transform coefficients. Comparisons of these various update filter designs are given in Section 3.6.3 and in Section 5.3.2. As we will see in Section 5.3.2 the orthogonalizing update filter design provides a modest increase in coding efficiency. 2.4.1 Mean-preserving Update Filters A method to preserve the average value of coefficients across multiple levels of decomposition on a finite number of irregularly spaced data points was proposed in [72,73]. For every even node m ∈ E with prediction filters p n from neigh- bors n ∈ N m , the update filter which preserves the average value of the smooth coefficients is computed as a function of each p n . While this does provide some smoothingproperties, theproposeddesigndoesnotyieldLPandHPfiltersthatare mutually orthogonal. As pointed out in [16], this is problematic when quantization is introduced since it will cause the quantization errors from one subband to prop- agate into the reconstructed samples from other subbands. This will increase the total quantization error in the reconstructed data. Therefore, it is more desirable to design a lifting transform that forces the low-pass component to be orthogonal to the high-pass component. 27 2.4.2 Orthogonalizing Update Filters We now describe the orthogonalizing update filter design as was first introduced by the author in [56]. As has been discussed, it is desirable to design update filters that makes the LP signal component orthogonal to the HP signal component after each lifting step. More specifically, we would like to decompose the signal vector as x = x e +x o , with < x e ,x o >= 0. This should increase the overall energy compaction of the transform. It will also be useful when performing quantization since it essentially isolates quantization noise into each sub-band. Note that after eachliftingstep, theeven samples correspondtothe“low-pass” component andthe odd samples to the “high-pass” component. Furthermore, since I+U and I−P are invertible, T = (I +U)(I−P) is also invertible. Let row i (T) = t t i and let T −1 = ˜ t 1 ˜ t 2 ... ˜ t N . Since {t i } i∈I and { ˜ t i } i∈I form a pair of dual bases, we can represent our signal as x = P N i=1 < t i ,x > ˜ t i . An orthogonal decomposition will then be obtained if we can construct lifting filters that force < ˜ t i , ˜ t j >= 0 for any i∈E and j∈O, since then we have x =x e +x o with x e = P i∈E <t i ,x> ˜ t i , x o = P j∈O <t j ,x> ˜ t j and <x e ,x o >= 0. We would like to design update filters that provide the desired orthogonality of the dual basis vectors { ˜ t i } i∈I . To achieve this, we assume a fixed prediction filter design, then construct update filters such that the “equivalent filter” of any even node (t n for n∈E) is orthogonal to the “equivalent filter” of every odd node (t m for m ∈ O). Suppose there are N e and N o even and odd nodes, respectively, and let E ={j 1 ,j 2 ,...,j Ne } and O ={i 1 ,i 2 ,...,i No } be the sets of even and odd indices, respectively. The equivalent filter for every even node n is t n = e n + P No k=1 u n (i k )d(i k ) and the equivalent filter for every odd node m is t m = e m − p m . What we are seeking is an “orthogonalizing” update filter design for which 28 <t n ,t m >= 0 for all n∈E and m∈O. The orthogonalizing update filter design is presented in Proposition 3. This design is also sufficient to provide the desired orthogonality result as stated in Proposition 4 (the proof is in Appendix A), that is, our proposed lifting filter design ensures that <t n ,t m >= 0 for any n∈E and m∈O, and this implies that < ˜ t n , ˜ t m >= 0 for any n∈E and m∈O. Proposition3 (OrthogonalizingUpdateFilters). Let the prediction filter for every odd node be fixed and let N n = O for all n∈ E. Let t n be the equivalent filter of node n, i.e., it is the filter resulting from application of both the prediction and update step. Then the equivalent filter of an even node n (e.g., every LP filter) is orthogonal to the equivalent filter of every odd node i k (e.g., every HP filter), i.e., (e i k −p i k ) t t n = 0, ∀k = 1,2,...,N o (2.7) if and only if u n (O) = I No + ˜ P t ˜ P −1 ˜ P t e n . (2.8) Since I No + ˜ P t ˜ P −1 always exists, we always have update filters for whicht t n t m = 0, ∀m∈O,n∈E. Proof. Since x(n) =e t n x and d(i k ) = (e i k −p i k ) t x, we have that s(n) = x(n)+ No X k=1 u n (i k )d(i k ) = e t n ·x+ No X k=1 u n (i k )(e i k −p i k ) t ·x = e n + No X k=1 u n (i k )(e i k −p i k ) ! t ·x = t t n x 29 (2.7) is satisfied if and only if (e i l −p i l ) t e n + No X k=1 u n (i k )(e i k −p i k ) ! = 0, ∀i l ∈O. (2.9) Since n∈E, n6= i l . Thus, e i l (n) = 0, and so e t i l e n = 0. Since p i k (i l ) = 0 for all k by Definition 1,e t i l p i k = 0 for all k. Similarly,p i l (i k ) = 0 for all k, sop t i l e i k = 0 for all k. Therefore, (2.9) becomes e t i l No X k=1 e i k u n (i k )+p t i l No X k=1 p i k u n (i k ) =p t i l e n , ∀i l ∈O. (2.10) If we define u n (O) = [u n (i 1 ),...,u n (i No )] t and ˜ I = e i 1 ...e i No , we have that P No k=1 e i k u n (i k ) = ˜ I·u n (O) and P No k=1 p i k u n (i k ) = ˜ P·u n (O). Thus, (2.10) becomes e t i l · ˜ I·u n (O)+p t i l · ˜ P·u n (O) =p t i l ·e n , ∀i l ∈O. (2.11) This provides a set of N o linear equations in N o unknowns, and we can express (2.7) as h ˜ I t ˜ I+ ˜ P t ˜ P i ·u n (O) = ˜ P t ·e n . (2.12) Note that ˜ I t ˜ I = I No , the N o × N o identity matrix. Moreover, for any x 6= 0, x t h ˜ I t ˜ I+ ˜ P t ˜ P i x =||x|| 2 +|| ˜ Px|| 2 > 0. Thus, h I No + ˜ P t ˜ P i is positive definite and (2.8) follows. Since t m =e m −p m for all m∈O,t t n t m = 0, ∀m∈O,n∈E. Proposition 3 yields a filter design in which the equivalent filter of every even node is orthogonal to the filter of every odd node. Intuitively, this provides a decomposition of the signal into two separate subbands and one should expect or- thogonality between the signal components in each subband. In fact, orthogonality between the filters of even and odd nodes is sufficient to guarantee an orthogonal 30 decomposition as x = x e +x o with x t e x o = 0. This is formally stated in Proposi- tion 4 and is proven in the Appendix. Moreover, under a fixed prediction design and some mild assumptions about quantization noise, the filter design proposed in [16] is equivalent to the design in Proposition 3, where in [16] it was shown that this design minimizes the mean-squared reconstruction error due to quantization of the transform coefficients. Thus, our proposed update filters are also useful in coding applications. Proposition 4 (Orthogonal Decomposition). Let there be N nodes with E, O, P and U specified as in Definition 1 and Proposition 1. Let T = (I+U)(I−P) = [t 1 t 2 ... t N ] t and T −1 = ˜ t 1 ˜ t 2 ... ˜ t N . Suppose the lifting filters have been designed as in Proposition 3. Then for any vector x ∈ R N , x = x e +x o , with x e = P i∈E <t i ,x> ˜ t i , x o = P j∈O <t j ,x> ˜ t j , and <x e ,x o >= 0. 2.4.3 Discussion Some remarks are now in order. Note that the work initially proposed by the author in [56] only provided the result in Proposition 3. In this thesis, we have also provenProposition4,whichshowsthattheupdatedesignproposedinProposition3 (from [56]) provides an orthogonal decomposition ofx =x e +x o , with <x e ,x o >= 0. This is useful from a coding perspective since the quantization errors of the even (LP) components are also orthogonal to the quantization errors of the odd (HP) components, i.e., it essentially isolates the quantization errors made in one subbandfromtheother. Thisisolationofquantizationerrorsshouldintuitivelylead to minimum reconstruction error. This is in fact the case since we have arrived at the same solution as in [16] (which aims at minimizing the reconstruction error due to quantization), and the connection with the work in [16] shows why our proposed 31 update filters (and in particular, orthogonal decompositions) are useful in coding applications. 2.5 Conclusions In this chapter we have shown that lifting transforms are invertible by construc- tion (Proposition 2, Corollary 2), have proposed various methods for even/odd splitting, proposed optimized prediction filter designs, and have developed optimal update filter designs. The even/odd splitting methods described have been opti- mized with de-correlation in mind [3,23,36,72,73], and have also been optimized with the goal of minimizing total energy-consumption in WSN [37]. The proposed prediction filters are optimal in the sense that the average energy in prediction residuals is minimized [20,52]. This is useful from a coding perspective since lower energy in the prediction residuals generally leads to fewer bits needed to repre- sent them. Since these optimal filters depend on the correlation structure in the data, which is not always known, an adaptive prediction filter method was also developed [52]. These adaptive filters converge to the optimal filters under some stationarity assumptions [20]. Finally, orthogonalizing update filters were also pro- posed [56] (Proposition 3). It was shown that these filters provide an orthogonal decomposition of the input signal (Proposition 4), and moreover, this choice of fil- ters also minimizes the reconstruction MSE. These designs and optimizations are used throughout the remainder of this thesis, with the goal of (i) minimizing total energy consumption in WSN, and (ii) providing efficient image representations for image coding. 32 Chapter 3 Transform-based Distributed Data Gathering We now describe how to apply these tree-based and graph-based lifting trans- forms to WSNs. Most of the work described in this chapter was proposed by us in [37,52,55,57,58]. We focus on the data gathering problem where the goal is to collect data from every node in a WSN at a central collection (or sink) node. In particular, we assume that the nodes in the WSN are organized onto a routing tree. The application is introduced in detail in Section 3.1. We then develop a gen- eral framework for computing “unidirectional” transforms along routing trees (i.e., transformsthatarecomputedasdataisroutedtowardthesinknodealongthetree) and provide a set of conditions under which these transforms are invertible. This framework was initially introduced by us in [58], and was fully formalized in [57]. It is described in detail in Section 3.2. Once the basic framework is established, we then show how existing transforms fit into this framework. This was also described in [57], and is described in this thesis in Section 3.3. Finally, we discuss how to design unidirectional tree-based and graph-based lifting transforms, then compare the performance against existing work. Again, this work was initially proposed by the author in [57], and is described in detail in Section 3.3.4, 3.3.5 and 3.4. 33 3.1 Introduction In networks such as wireless sensor networks (WSNs), one major challenge is to gather data from a set of nodes and transfer it to a collection (or sink) node as efficientlyaspossible. Efficiencycanbemeasuredintermsofbandwidthutilization, energy consumption, etc. We refer to this as the data gathering problem. The gathering is typically done in data gathering rounds or epochs along a collection of routing paths to the sink, i.e., in every epoch each node forwards data that it has measuredalongamulti-hoppathtothesink. Asimplegatheringstrategyistohave each node route raw data to the sink in a way that minimizes some cost metric, e.g., number of hops to the sink, energy consumption. This minimizes the amount of resources nodes use to transfer raw data to the sink and is the basis for many practical systems used in WSN such as the Collection Tree Protocol (CTP) [68]. However, it has been recognized in the literature [2,4] that, in a WSN, (i) spatial data correlation may exist across neighboring nodes and (ii) nodes that are not adjacent to each other in a routing path can still communicate due to broadcasted wireless transmissions 1 . Raw data forwarding does not make use of these two facts, thus, it will not be the most efficient data gathering method in general. When spatial data correlation exists, it may be useful to apply in-network com- pression distributed across the nodes to reduce this data redundancy [4]. More specifically, nodes can exchange data with their neighbors in order to remove spa- tial data correlation. This will lead to a representation requiring fewer bits per measurement as compared to a raw data representation, also leading to reduced en- ergy consumption, bandwidth usage, delay, etc. Since nodes in a WSN are severely energy-constrained [2,4,74], some form of in-network processing that removes data 1 Data transmissions in a WSN are typically broadcast [10,77], so multiple nodes can receive a single data transmission. 34 redundancy will help reduce the amount of energy nodes consume in transmitting data to the sink. In this way the lifetime of a WSN can be extended. This could also be useful in other bandwidth-limited applications such as underwater acoustic networks [46] and structural health monitoring [34]. Generally speaking, distributed spatial compression schemes require some form of data exchange between nodes. Therefore, one needs to select both a routing strategy and a processing strategy. The routing strategy defines what data com- munications nodes need to make and the processing strategy defines how each node processes data. There are a variety of approaches available, e.g., distributed source coding (DSC) techniques [13,45], transform-based methods like Distributed KLT [15], Ken [5], PAQ [69], and wavelet-based approaches [1,3,6,7,9,55,72,73]. NotethatDSCtechniques donotrequirenodestoexchangedatainordertoachieve compression. Instead, each node can compress its own data using some statistical correlation model. Note, however, that an estimate of these models must be known at every node, so nodes will still need to do some initial data exchange in order to learnthemodels(afterwhichcompressioncanbedoneindependentlyateachnode). Our work only considers transform-based methods, which use linear transforms to decorrelate data while distributing transform computations across different nodes. While we do not consider DSC approaches, our algorithms could be useful in the training phase of these methods to estimate correlation. Ken and PAQ are exam- ples of approaches we consider, where data at each node is predicted using a linear combination of measurements from the node and measurements received from its neighbors. Similarly, the Distributed KLT, wavelet-based methods and many other related methods also use linear transforms to decorrelate data. Therefore, we can restrict ourselves to linear in-network transforms while still encompassing a general class of techniques. 35 Many of the existing transform-based methods propose a specific transform first, then design routing and processing strategies that allow the transform to be computed in the network. Some examples are the wavelet transforms proposed in [3,9,72,73], the Distributed KLT, Ken and PAQ. While these methods are good from a data decorrelation standpoint, the routing and processing strategies that are used to facilitate distributed processing may not always be efficient in terms of data transport cost. In particular, nodes may have to transmit their own data multipletimes[72,73],nodesmayneedtotransmitmultiplecopiesofthesamecoef- ficients [9], or nodes may even need to transmit data away fromthe sink [15,72,73]. As discussed in [57,72], this sort of strategy can outperform raw data gathering for very dense networks, but it can lead to significant communication overhead for small to medium sized ones (less than 200 nodes). Other related methods may also suffer from such inefficiencies. The results of our previous work [54,55] and of [72] demonstrate why transport costs cannot be ignored. One simple way to work around these issues is to first design an efficient routing tree (e.g., a shortest path routing tree, or SPT), then allow thetransform computations to occur onlyalong the routing paths in thetree. We call these types of schemes en-route in-network transforms. These transforms (e.g., the wavelet transforms in [1,6–9,55]) will typically be more efficient since they are computed as data is routed to the sink along efficient routing paths. In addition to overall efficiency, these transforms can be easily integrated on top of existing routing protocols, i.e., a routing tree can be given by a protocol, then the transform can be constructed along the tree. This allows such schemes to be easily usable in a WSN - as demonstrated by the SenZip [41] compression tool, which includes an implementation of our algorithm in [55] - as well as other types of data gathering networks [34,46]. 36 We note that all existing en-route transforms start from well-known transforms, then modify them to work on routing trees. Instead, in this work we start from a routing tree T and additional links given by broadcast (e.g., Figure 3.1). We then pose the following questions: (i) what is the full set of transforms that can be computed as data is routed toward the sink along T and (ii) what are conditions for invertibility of these transforms? The main goal of this work is to determine this general set of invertible, en-route in-network transforms. Note that in many transform-based compression systems, design or selection of a transform is con- sidered separately from the design of a quantization and encoding strategy. This is done in practice in order to simplify the system design (e.g., [67]). In general certain properties of the transform (energy compaction, orthogonality) can serve as indicatorsofachievableperformanceinthelossy case. Weadoptasimilar approach in our work, choosing to only focus on the transform design. Simple quantization and encoding schemes can then be applied to the transform coefficients, as demon- strated in our experimental results. Joint optimization of routing and compression is also possible, as in [40,48] and in Chapter 4 [54], but this is beyond the scope of this section. Here we only focus on the design of transforms for a fixed routing tree such as, e.g., an SPT. In order toformulate thisproblem, we first notethat thedatagathering process consists of data measurement at each node and routing of data to the sink along T done in accordance with some transmission scheduling, i.e., nodes transmit data along T in a certain order. Also note that data is only transmitted along T in the direction of the sink, i.e., data transmissions are unidirectional toward the sink. Moreover, each node can only process its own data with data received from other nodes that transmit before it, i.e., processing of data must be causal in accordance with the transmission schedule. In particular, before each node transmits it will 37 Forwarding Link Broadcast Link Sensor Node Sink Node After adding broadcast links 3 4 2 6 1 5 7 8 11 12 13 9 10 T A 3 4 2 6 1 5 7 8 11 12 13 9 10 T Figure 3.1: Example of routing tree and a tree augmented with broadcasts. Solid arrows denote forwarding links along the tree and dashed arrows denote broadcast links. only have access to data received from nodes that use it as a relay in a multi- hop path to the sink (i.e., “descendants”) and nodes whose data it receives but is not responsible for forwarding to the sink (i.e., “broadcast” neighbors). When- ever broadcast is used, data from a single node will often be available at multiple nodes. While this can help to decorrelate data even further (since more data will be available for transform computations at each node), it would be undesirable to transmit this same piece of data through multiple paths since this would increase the overall communication cost. Thus, in addition to causality and unidirectional- ity, the transform should also be critically sampled, i.e., the number of transform coefficients that are computed and routed to the sink is equal to the number of nodes in the network. We refer to causal, critically-sampled transforms that are computed in a unidirectional manner as unidirectional transforms. Aswewillshow, unidirectionaltransformscanbedefinedintermsoftherouting tree, the broadcast links induced by the routing and the transmission schedule. 38 Thus, given a tree and transmission schedule, the main problem we address in this work is to determine a set of necessary and sufficient conditions under which an arbitrary unidirectional transform is invertible. While unidirectional transforms have been proposed, to the best of our knowledge, none of the existing works have attempted to define the most general set of unidirectional transforms, nor has any attempt been made to find conditions under which such transforms are invertible. Ourproposedtheoryalsoincorporatestheuseofbroadcastdatainageneralsetting. This leads us to develop transforms that use broadcasts in a manner not previously considered. This contribution is discussed in detail in Section 3.2, and was initially proposed by us in [57]. In the context of wavelet transforms for WSNs, early work [1,6,7,9] developed unidirectional wavelet transforms on 1D routing paths in WSNs. Extensions to 2D routing paths on arbitrary routing trees were made by the authors in [54,55]. The superiority of these 1D [9] and 2D [55] transforms over the method in [72] (which requires a great deal of backward communication) was demonstrated in [55]. Gen- eral unidirectional transforms wereinitially proposedbyus in[58], inthecontext of liftingtransforms[65],andconditionsforsingle-level invertibleunidirectionallifting transforms were initially proposed there. However, no invertibility conditions were provided for general unidirectional transforms, nor were any conditions given for invertible multi-level unidirectional lifting transforms. We provide such conditions here (Section 3.2 and 3.3.4) as well as new transform designs (Section 3.4) that outperform previously proposed transforms. General unidirectional transforms with a set of necessary and sufficient invert- ibility conditions are presented in Section 3.2. In order to demonstrate the gener- ality of our proposed theory, Section 3.3 shows how existing unidirectional trans- forms (e.g., the tree-based KLT [52], tree-based differential pulse code modulation 39 (T-DPCM) [41,52] and lifting transforms [52,58]) can be mapped into our frame- work. Moreover, our proposed formalism is used to construct general unidirec- tional lifting transforms. Some of the inefficiencies of existing lifting transforms are then discussed. In order to address these inefficiencies, we define a new Haar-like wavelet transform in Section 3.4 which is analogous to the standard Haar wavelet when applied to 1Dpaths. As is shown in Section 3.4, our formalizationguarantees invertibility of these Haar-like transforms, and also leads to an extension which in- corporates broadcast. Section 3.6 provides experimental results that demonstrate the benefits of using our proposed transforms. 3.2 En-route In-network Transforms In this section, assuming a fixed routing tree T and schedule t(n) are given, we provide a definition of unidirectional transforms and determine conditions for their invertibility. Some notation is established in Section 3.2.1. Unidirectional trans- forms are then defined in Section 3.2.2. Section 3.2.3 presents a set of conditions under which these transforms are invertible. Throughout this discussion, the con- figuration of the network in terms routing and scheduling is assumed to be known. Section 3.2.4 addresses how this can be achieved in practice and how our approach can be used with decentralized initialization approaches. 3.2.1 Notation Assume there are N nodes in the network with a given routing tree T = (V,E T ), where V = {1,2,...,N,N +1}, each node is indexed by n ∈ I = {1,2,...,N}, the sink node is indexed by N +1, and (m,n) ∈ E T denotes an edge from node m to node n along T. We also assume that there is a graph G = (V,E) which is 40 defined by the edges in E T and any additional edges that arise from the broadcast nature of wireless communications. An example graph is shown on the right side of Figure 3.1. We observe that data gathering consists of three key components. The first is data measurement, where each node n measures some scalar data x(n) that it must send to the sink in each epoch (these ideas can be easily generalized to non-scalar data 2 ). Additionally, node n must route its data to the sink along T. The tree T is defined by assigning to every node n a parent ρ(n). We assume that these trees are provided by a standard routing protocol such as CTP. Finally, we assume that data transmissions are scheduled [10,60] in some manner, i.e., node n will transmit data to its parent ρ(n) at time t(n) according to a transmission schedule (see Definition 2). CTP is a practical example that can be viewed in terms of this formalization: nodes are assigned parents in a distributed manner, data is forwarded to the sink along the corresponding routing paths and the times at which nodes transmit serve as an implicit transmission schedule. Definition 2 (Transmission Schedule). A transmission schedule is a function t : I → {1,2,...,M slot }, such that t(n) = j when node n transmits in the j-th time slot 3 . Moreover, node n transmits data before node m whenever t(n)<t(m). Note that, along the tree T, each node has a set of descendants D n which use node n as a data relay to the sink and a set of ancestors A n that node n uses for relaying data to the sink. Also let each node n be h(n) hops away from the sink node, i.e., n has depthh(n) inT. We also letC k n denote the descendants ofn which 2 One straightforward extension is to use a “separable” transform, where a transform is first applied in one dimension (e.g., over time or across dimensions of a multivariate input) and then in the other (i.e., spatially). 3 Note that these time slots arenot necessarilyof equallength; they simply allowus to describe the order in which communications proceed in the network; before time slot t(n), node n is listening to other nodes, and at time t(n) node n starts transmitting its own data, and potentially data from its descendants in the routing tree. 41 areexactlyk hopsawayfromn, i.e.,C k n ={m∈D n |ρ k (m) =n}, whereρ k (m)isthe k-th ancestor of node m (e.g., ρ 1 (m) is the parent of m, ρ 2 (m) is the grandparent ofm, etc). For instance,C 1 n is the set of children ofn,C 2 n is the set of grandchildren of n, etc, and for simplicity we let C n =C 1 n . Also note that data can be heard via broadcast in WSNs, so we let B f n define the full set of broadcast neighbors whose data node n can overhear due to broadcast. Under thisformulation, eachnodencanprocessitsowndatawithdatareceived fromD n andB f n . This yields transform coefficients y(n) and y(m) for each descen- dant m∈D n . We make an abuse of notation by letting y(D n ) ={y(m)|m∈D n }. Since nodenis onlyresponsible forforwardingy(n)andy(D n ) to itsparentρ(n), it should not transmit any data received from broadcast neighbors. In particular, we assume that node n transmits the transform coefficient vector y n = [y(n)y(D n )] t to its parent ρ(n) at time t(n). We refer to this as critical-sampling, where in each epoch only one transform coefficient per sample per node is generated and then transmitted to the sink. Also note that y(n) and y(D n ) can be further processed at the ancestors of n. We refer to this as delayed processing. Note that data is only transmitted along T toward the sink, i.e., data relay is unidirectional toward the sink. The existence of a transmission schedule - given explicitly or implicitly - also induces a notion of causality for transform computa- tions. In particular, the computations performed at each node n can only involve x(n) and anyy m received from a node m that transmits data before node n. More specifically, nodes can only use data from m∈B f n if t(m) < t(n) (we assume that t(m ′ ) < t(n) for all m ′ ∈D n ). These constraints (i.e., causality and unidirectional relay) induce causal neighborhoods whose data each node n can use for processing, where we letB n ={m∈B f n |t(m)<t(n)} denote the set of causal broadcast neigh- bors. These can be abstracted as in Figure 3.2 where y Dn = h y t Cn(1) ... y t Cn(|Cn|) i t 42 and y Bn = h y t Bn(1) ... y t Bn(|Bn|) i t . These ideas are illustrated in Figure 3.3. For instance, nodes 4 and 12 will not receive data from node 2 before they transmit, thus, they cannot use it for processing. These are formally defined as follows. n A n D n B n y D n y B n y n Figure 3.2: Example of causal neighborhoods for each node. Node n receives y Dn and y Bn from D n and B n , respectively, processes x(n) together with y Dn and y Bn , then forwards its transform coefficient vector y n through its ancestors inA n . Definition 3 (Causal Neighborhoods). Given a routing tree T and schedule t(n), the causal neighborhood of each node n is the union of the descendantsD n and the set of causal broadcast neighbors B n = {m ∈ B f n |t(m) < t(n)}, i.e., D n ∪B n . We also define ¯ B n =B n ∪ m∈Bn D m for future discussions. 3.2.2 Definition of Unidirectional Transforms We define a unidirectional transform (not necessarily invertible) as any transform that (i) is computed unidirectionally along a tree T and (ii) satisfies causality and critical sampling. Now we canestablish the general algebraic formof unidirectional transforms. Without loss of generality, assume that node indices follow a pre-order numbering[70]onT,i.e.,D n ={n+1,n+2,...,n+|D n |}foralln(seeFigure3.3for an example). A pre-order numbering always exists, and can be found via standard 43 Forwarding Link Broadcast Link Sensor Node Sink Node After removing forbidden links 3 4 2 6 1 5 7 8 11 12 13 9 10 t(3) = 2 t(9) = 1 t(13) = 3 t(12) = 4 t(10) = 5 t(8) = 6 t(6) = 7 t(4) = 8 t(2) = 9 t(11) = 10 t(7) = 11 t(5) = 12 t(1) = 13 3 4 2 6 1 5 7 8 11 12 13 9 10 t(3) = 2 t(9) = 1 t(13) = 3 t(12) = 4 t(10) = 5 t(8) = 6 t(6) = 7 t(4) = 8 t(2) = 9 t(11) = 10 t(7) = 11 t(5) = 12 t(1) = 13 Figure 3.3: Illustration of causal neighborhoods. Node n transmits at time t(n). The left figure shows the full communication graph. The right figure shows the graph after removing broadcast links that violate causality and step by step decod- ing. algorithms [70]. For the sake of simplicity, we also assume that the transmission schedule provides a unique time slot to each node 4 , i.e., t(n)6=t(m) for all n6=m. Recall that each noden receivesy Dn andy Bn from its descendants and (causal) broadcast neighbors, respectively. Thus, inageneral unidirectional transform, each node processes its own data x(n) along with y Dn and y Bn . Then, it will transmit coefficient vector y n at time t(n). We omit t(n) from the notation of y n since the timing is implicit. In order to satisfy critical-sampling, each node should only forward 1+|D n | coefficients to the sink. Therefore, y n must be a (1+|D n |)×1 dimensional vector. A unidirectional transform can now be expressed as follows. Definition 4 (Unidirectional Transform). Let T be a routing tree with a unique time slot assignment given by t(n), and suppose that the causal neighborhood of 4 We note that the time slot assignment need not be unique. However, this assumption sig- nificantly simplifies the transform construction and invertibility conditions. It is easy to develop similar transformconstructions when multiple nodes are assignedthe sametime slots, and similar invertibility conditions arise. 44 each node is given by Definition 3. A unidirectional transform on T is a collection of local transformations done at each node n given by y n = A n B 1 n ... B |Bn| n · x(n) y Dn y Bn , (3.1) where y n has dimension (1+|D n |)×1, A n has dimension (1+|D n |)×(1+|D n |) and each B i n has dimension (1+|D n |)×(1+|D Bn(i) |). The transform is computed starting from the node at the first time slot up through the nodes in the remaining time slots k = 2,3,...,N. 3.2.3 InvertibilityConditionsforUnidirectionalTransforms We now establish a set of invertibility conditions for unidirectional transforms. Notethat these transforms arealways computed ina particular order, e.g., starting from nodes furthest from the sink (i.e., “leaf” nodes), up to nodes which are 1-hop fromthesink. Somesortofinterleavedscheduling (whereonesetofnodestransmits beforetherest)couldalsobeused[37]. Therefore, itwouldalsobedesirabletohave step by step decoding in the reverse order, since this would simplify the transform constructions. In particular, if the overall transform can be inverted by inverting the computations done at each node in the reverse order, then invertibility will be ensured by designing invertible transforms at each node. Stepbystepdecodinginthereverseorderistriviallyguaranteedwhennobroad- castdataisusedsincethetransformateachnodenissimplyy n =A n · x(n)y t Dn t . Thus, if each A n is invertible, we can invert the operations done at node n as x(n)y t Dn t = (A n ) −1 ·y n . This becomes more complicated when broadcast data is 45 used. By examining (3.1), we observe thaty n =A n · x(n)y t Dn t + h B 1 n ... B |Bn| n i · y Bn , wherey Bn = h y t Bn(1) ... y t Bn(|Bn|) i t . In order to have step by step decodability, we need to be able to recover (for every node n) x(n) and y Dn from y n and y Bn . Note that this fails whenever we cannot decode some transform coefficient vector y m from broadcast node m∈B n before decoding y n . It will also fail if the matrix operations performed at any given node are not invertible. Thus, in order to guar- antee step by step decodability, we need to ensure that (i) the matrix operations at each node are invertible, and (ii) it is possible to decode each y m before decoding y n . Aswenowshow, (i)isguaranteedbyensuringthateachA n matrixisinvertible and (ii) is guaranteed by imposing a timing condition. Proposition 5 (Step by Step Decodability). Suppose that we have the transform in Definition 4 and assume that t(ρ(m)) > t(n) for every broadcast node m∈B n . Then we can recover x(n) and y Dn as x(n)y t Dn t =A −1 n ·y n −A −1 n · B 1 n ... B |Bn| n ·y Bn (3.2) if and only if A −1 n exists. Proof. Note that the vector transmitted by any broadcast node m ∈ B n will be processed at its parent, node ρ(m), and this processing will occur at time t(ρ(m)). Moreover, noden will generate its own transform coefficient vectory n at timet(n), and by assumption we have that t(ρ(m))>t(n). Thus, it is possible to decodey m beforey n for every broadcast neighborm∈B n . It follows that, we can always form y Bn = h y t Bn(1) ... y t Bn(|Bn|) i t before decoding y n . Therefore, we can recover x(n) and y Dn as in (3.2) if and only if A −1 n exists. 46 To simplify our transform constructions, we also assume that nodes use the latest version of broadcast data that they receive, i.e., m∈B n only ifA m ∩B n =∅. This second constraint precludes the possibility that a node n receives broadcast data from node m and from an ancestor of node m. Removing the broadcast links which violatetheseconstraints gives asimplified communicationgraphasshown on therightsideofFigure3.3. Removaloftheselinkscanbedonebylocalinformation exchange within the network; examples of how this can be achieved are discussed in Section 3.2.4. Under the constraint of Prop. 5 and this second constraint, we can represent the global transformtaking place in the network asfollows. Since the time slot assignment is unique, at time t(n) only data from n and its descendants will be modified, i.e., only x(n) and y(D n ) will be changed at time t(n). Since pre- order indexing is used, we have thaty Dn = [y(n+1), ..., y(n+|D n |)] t . Therefore, the global transform computations done at timet(n) are given by (3.3), where each ˜ y i corresponds to data which is not processed at time t(n). ˜ y 1 y Bn(1) . . . ˜ y k y n ˜ y k+1 . . . y Bn(|Bn|) ˜ y K = I 0 ... 0 0 0 ... 0 0 0 I ... 0 0 0 ... 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 0 ... I 0 0 ... 0 0 0 B 1 n ... 0 A n 0 ... B |Bn| n 0 0 0 ... 0 0 I ... 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 0 ... 0 0 0 ... I 0 0 0 ... 0 0 0 ... 0 I ˜ y 1 y Bn(1) . . . ˜ y k x(n) y Dn ˜ y k+1 . . . y Bn(|Bn|) ˜ y K (3.3) 47 The global transform matrix C t(n) at time t(n) is just C t(n) = I 0 ... 0 0 0 ... 0 0 0 I ... 0 0 0 ... 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 0 ... I 0 0 ... 0 0 0 B 1 n ... 0 A n 0 ... B |Bn| n 0 0 0 ... 0 0 I ... 0 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 0 ... 0 0 0 ... I 0 0 0 ... 0 0 0 ... 0 I . (3.4) This yields the global transform coefficient vector y =C N ·C N−1 ···C 1 ·x. (3.5) Figure3.4illustratesthesecomputations. Initially,y =x = [x(1)x(2) ... x(5)] t . At times 1 and 2, nodes 3 and 5, respectively, transmit raw data to their parents. Therefore, the global matrices at times 1 and2 aresimplyC 1 =C 2 =I. At time 3, node 4 produces (3.6), where a i and b i represent arbitrary values of the transform matrix used at node 4. Then at time 4, node 2 produces transform coefficients y(2) and y(3) (and coefficient vector y 2 ) as in (3.7), where a ′ i and b ′ i are the values of the matrix used at node 2. y 4 = y(4) y(5) = b 1 a 1 a 2 b 2 a 3 a 4 · x(3) x(4) x(5) (3.6) 48 y 2 = y(2) y(3) = a ′ 1 a ′ 2 b ′ 1 b ′ 2 a ′ 3 a ′ 4 b ′ 3 b ′ 4 · x(2) x(3) y 4 (3.7) Node 1 then computes y 1 at time 5. The global transform is given by y =A 1 1 0 0 0 0 0 a ′ 1 a ′ 2 b ′ 1 b ′ 2 0 a ′ 3 a ′ 4 b ′ 3 b ′ 4 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 b 1 a 1 a 2 0 0 b 2 a 3 a 4 x(1) x(2) x(3) x(4) x(5) . (3.8) 3 5 2 4 1 t(3) = 1 t(2) = 4 t(5) = 2 t(4) = 3 t(1) = 5 x(3) x(3) x(5) y 4 y 4 y 2 y 1 Figure 3.4: Example to illustrate unidirectional computations. Nodes generate and transmit transform coefficients in the order specified by the transmission schedule. It is now simple to show that the transform is invertible if eachA n is invertible. Proposition 6 (Invertible Unidirectional Transforms). Suppose that we have the transform in Def. 4, the second timing constraint (m∈B n only ifA m ∩B n =∅) is met, and Prop. 5 is satisfied for every node n. Then the overall transform given by (3.5) is invertible. 49 Proof. Under the two broadcast timing assumptions, the global transform is given by (3.5). (3.5) is invertible if and only if every C t(n) in (3.4) is invertible. C t(n) is invertible if and only if det C t(n) 6= 0. Recall that adding a multiple of one row to another does not change the determinant [63]. Given the structure of the C t(n) matrices, using such row operations to eliminate eachB i n matrix, it is easy to show that det C t(n) = det(A n ). Moreover, Prop. 5 implies thatA n is invertible. Proposition6showsthatlocallyinvertibletransformsprovidegloballyinvertible transforms. Moreover, under our stated timing constraints, broadcast data does not affect invertibility. Therefore, broadcast data at each node n can be used in an arbitrary manner without affecting invertibility. So in order to design an invertible unidirectional transform, all that one must do is design invertible matrices A n . This is an encouraging result since it essentially means that broadcast data can be used in any way a node chooses. In particular, broadcast data can always be used to achieve more data decorrelation. 3.2.4 Discussion Thetheorypresentedthusfarassumesthattheroutingandtransmissionscheduling are known, and that all of the transform matrices are known both at the nodes and at the sink. In practice, the routing, scheduling and transforms must be initialized. Moreover, the network may need to re-configure itself if, for example, nodes die or link conditions change drastically. In addition, packet losses will often occur. Nodes typically deal with this (as in CTP) by re-transmitting a packet until an acknowledgment (ACK) is received from the intended recipient. While these three issues pose no significant problems for routing, they all have an impact on our 50 proposedtransformduetotheassumptionswemakeabouttiming. Wenowprovide some discussion of how this affects our theory and how it can be handled. We first address the impact that initialization and reconfiguration have on the routing and scheduling, as well as what can be done to address it. We assume that routing is initialized and reconfigured in a distributed manner using standard protocolssuchasCTP.DistributedschedulingprotocolsforWSNsalsoexist[60,61]. However, the resulting schedules may not be consistent with Definition 3 (i.e., they may not provide timings for which t(m) < t(n) for all m∈D n ), so in practice we would need to enforce such timings. One way to achieve this is to force nodes to suppress transmission (in a given epoch) until they have received data from all of their descendants. Another alternative would be to determine such a transmission schedule at the sink, then to disseminate the timing information to the nodes. Whenever timing and routing information is established (or re-established due to re-configuration), it is also necessary to check our main broadcast timing con- straint, i.e., m ∈ B n only if t(n) < t(ρ(m)). We describe one way in which this informationcanbedisseminatedtoeachnodeinadistributedmanner. First,when- ever the time t(n) at node n is initialized or changes, it broadcasts a small packet (i.e., a beacon) which contains t(n) to its children. Then, any child of n which broadcasts data will send the same beacon to all of its neighbors. This requires a total of 2 messages for each broadcasting node. Note that protocols such as CTP already use control beacons (in addition to data packets) to update stale routing information. Thus, nodes could potentially piggyback timing information on these control beacons whenever they are generated, or otherwise use separate control beacons to disseminate timing information. This will incur an additional cost, al- though (as was shown in [68]) the per packet cost for control beacons is typically much smaller than the cost for data forwarding. 51 Initialization and re-configuration also impacts the transform matrices that are used. Each node could transmit the values of its matrix to the sink, or vice versa, but this may be very costly. Instead, the construction of each transform matrix should be based on a small amount of information which is made common to the nodes and to the sink. For example, the values in each transform matrix could be based on the number of 1-hop neighbors that each node has [55] or the relative node positions [73]. In this way each matrix can be constructed at each node and at the sink without explicitly communicating the matrix values. However, additional information (e.g., node positions, number of neighbors) would need to be communicated to the sink whenever the network is initialized or re-configured. Forexample, eachnodecouldconstructatransformusingonlythenumberofnodes that it receives data from (as in [55,73]) and would send the set of nodes whose data it used as overhead to the sink. Then, assuming that the nodes and the sink construct the matrices according to the same rules, the sink can re-construct the matrix used at each node. Packet loss is the last practical issue which impacts our proposed transforms. We do not consider the effects of channel noise on the data since these can be handled using a wide variety of existing techniques. Moreover, packet losses and channel noise will impact other data gathering schemes (e.g., CTP), and we expect that the penalty due to packet losses will be similar in our scheme and in other data gathering schemes. Packet losses are typically handled (as in CTP) by re- transmitting a packet until an ACK is received from the desired destination. Thus, if noden does not receive data fromdescendantn+k by the time that it transmits, due to packet re-transmissions for n + k, the data from node n + k cannot be combined with data available at node n. This is equivalent to not using the data from noden+k in the transform computation (i.e.,A n (j,k+1) =A n (k+1,j) = 0 52 for allj 6=k+1 andA n (k+1,k+1) = 1) and does not affect our proposed theory. However, this change must be signaled to the sink so that it knows how to adjust A n accordingly. This can be done by including some additional information in the packet headers for node n and n+k to signify this change. Packet losses also have an impact on the use of broadcast data. Suppose that node n does not receive a data packet from broadcast neighbor b k but the packet from b k does reach the intended recipient ρ(b k ). In this case, node ρ(b k ) will send an ACK back to node b k and node b k will no longer re-transmit (note that node b k will not expect an ACK from node n). Thus, data from node b k can not be combined with data available at node n. This is equivalent to not using data from node b k in the transform computation (i.e., B k n = 0) and our proposed theory is not affected. However, node n must signal this change to the sink node so that it knows to set B k n =0. One way to work aroundthese issues (initialization, re-configurationandpacket losses) is to design transforms that can work under arbitrary timing and with arbi- traryuses ofbroadcast data. However, under arbitrarytiming anduse of broadcast data, it is no longer possible to guarantee global transform invertibility by design- ing invertible transforms at each node. More specifically, we must ensure that the transform computations done at different nodes are jointly invertible. This leads to a set of complex conditions. The cost to determine such conditions and to coor- dinate nodes so that they satisfy these conditions could be very high, perhaps even much higher than the additional coordination needed to implement our proposed transforms. However, it isstill possible todesign simple versions ofsuch transforms byusingconstructionssuchaslifting. ThetransformsdescribedinSection2.2.2[37] isoneparticular example. Giventhishighdegreeofcomplexity toensure aninvert- ible transform when using broadcast, broadcast data should probably only be used 53 with our proposed transforms if (i) it is possible to fix the timing in the network in accordance with Definition 3, and, (ii) the timing is very stable. 3.3 Unidirectional Transform Designs Proposition 6 provides simple conditions for invertible transform design, i.e.,A n is invertible for every node n. This is a simple design constraint that unifies many existing unidirectional transforms. In this section, we demonstrate how existing unidirectionaltransformscanbemappedtoourformulation. Inparticular,wefocus on the tree-based Karhunen-Lo` eve Transform (T-KLT) [52], T-DPCM [41,52] and earlyformsoftree-basedwavelet transforms[1,7,9,55]constructedusinglifting[65]. In order to exploit spatial correlation to achieve reduction in the number of bits per measurement, nodes must first exchange data. Therefore, some nodes must transmit raw data to their neighbors before any form of spatial compression can be performed. Since raw data typically requires many more bits than encoded transform coefficients, it would be desirable to minimize the number of raw data transmissions that nodes must make to facilitate distributed transform computa- tion. Therefore, our main design consideration is to minimize the number of raw data transmissions that are required to compute the transform. 3.3.1 Tree-based Karhunen-Lo` eve Transform Since transforms that achieve data decorrelation potentially lead to better coding efficiency [17], we consider now the design of unidirectional transforms that achieve the maximum amount of data decorrelation. This can be achieved by applying, at each node n, a transform A n that makes all of the coefficients in y n statistically uncorrelated (or “whitened”), e.g., by using a Karhunen-Lo` eve transform (KLT) 54 at each node, leading to the T-KLT described in our previous work [52]. In this transform, each node n computes and transmits a set of “whitened” coefficients y n , which will then have to be “unwhitened” and then re-whitened at ρ(n) to produce a new set of whitened coefficients. Whitening can be done using a KLT and unwhitening can be achieved using an inverse KLT. More specifically, this is done at each node n by (i) finding the whitening transform H n and unwhitening transforms of each child G Cn(i) , (ii) applying an unwhitening transform to each child to recover the original measurements as x Cn(i) =G Cn(i) ·y Cn(i) , and then (iii) rewhitening these measurements as y n =H n · h x(n)x t Cn(1) ...x t Cn(|Cn|) i t . Thus, y n =H n · 1 0 ··· 0 0 G Cn(1) ··· 0 . . . . . . . . . . . . 0 0 ··· G Cn(|Cn|) · x(n) y Cn(1) . . . y Cn(|Cn|) , (3.9) with A n =H n ·diag 1,G Cn(1) ,...,G Cn(|Cn|) . Each A n is trivially invertible since H n and each G Cn(i) are invertible by construction. 3.3.2 Orthogonal Unidirectional Transforms It may also be desirable to construct orthogonal transforms on an arbitrary tree T. Given the assumptions in Section 3.2, we have that the transform C t(n) is orthogonal if and only if C t(n) t ·C t(n) = C t(n) · C t(n) t = I, which holds if and only if A t n = A −1 n and B i n = 0. Thus, under the formulation in Section 3.2, a unidirectional transform is orthogonal only if broadcast data is not used. 55 3.3.3 Tree-based DPCM A simpler alternative to the T-KLT is T-DPCM [41,52]. A related DPCM based method was proposed in [31]. The method in [31] is not designed for any particular communication structure, but it can easily be adapted to take the form of a uni- directional transform. In contrast to the method in [31], the T-DPCM methods in our previous work [41,52] compute differentials directly on a tree such as an SPT. IntheT-DPCMmethodof[52],eachnodencomputesitsdifferencewithrespect to a weighted average of its children’s data, i.e., y(n) =x(n)− P m∈Cn a n (m)x(m). For this to be possible, one of two things must happen: either every node n must decode the differentials received from its children to recover x(m) for each m∈C n , or, every node n must transmit raw data two hops forward to its grandparent (at which pointy(n) can be computed) to avoid decoding data at every node. In order to avoid each node having to forward raw data two hops, at each node n, the inverse transform on the data of each childC n (i) must be computed first using the inverse matrix A Cn(i) −1 of each child. The forward transform is then designed accordingly. We can express this version of T-DPCM as: y n = 1 −a n (D n ) 0 I 1 A Cn(1) −1 ··· A Cn(|Cn|) −1 0 I ··· 0 . . . . . . . . . . . . 0 0 ··· I x(n) y Cn(1) . . . y Cn(|Cn|) . (3.10) Moreover, only leaf nodes need to forward raw data and the rest transmit only transform coefficients. Alternatively, in the T-DPCM scheme of [41], each node n first forwards raw data x(n) to its parent ρ(n), then node ρ(n) computes a differential for n and 56 forwards it to the sink, i.e., node ρ(n) computes y(n) = x(n)−a n (ρ(n))x(ρ(n)). This transform can also be mapped to our formalism as y n = 1 0 −a Dn (n) I · x(n) y Dn . (3.11) This eliminates the computational complexity of the previous T-DPCM method since no decoding of children data is required. However, every node must now forward raw data one hop. Moreover, it will not decorrelate the data as well as the first method since only data from one neighbor is used. 3.3.4 Unidirectional Lifting-based Wavelets We now describe how unidirectional wavelet transforms can be constructed under our framework as was initially proposed by us in [57]. This can be done using lifting [65]. Lifting transforms are constructed by splitting nodes into disjoint sets of even and odd nodes, by designing prediction filters, which alter odd data using even data, and update filters, which alter even data based on odd data. They are invertible by construction [65]. Recall from Chapter 2 that nodes are split into odd and even sets O and E, respectively. This can be done completely arbitrarily. One example from Chap- ter 2.2.1 [55] is to split according to the depth in the tree, e.g., as illustrated in Figure 3.5. Data at each odd node n∈O is then predicted using data from even neighborsN n ⊂E, yielding detail coefficient d(n) =x(n)− P i∈Nn p n (i)x(i). Incor- porating some broadcast data into the prediction is also useful since it allows odd nodes to achieve even further decorrelation. After the prediction step, data at each 57 even node m ∈ E is updated using details from odd neighbors N m ⊂ O, yielding smooth coefficient s(m) =x(m)+ P j∈Nm u m (j)d(j). 17 5 3 6 2 16 11 10 12 1 9 13 14 21 23 22 15 18 19 20 7 8 4 Figure 3.5: Example of splitting based on the depth of the routing tree. White (odd depth) nodes are odd, gray (even depth) nodes are even and the black center node is the sink. As was shown in Section 2.1, invertibility will be guaranteed as long as (i) odd node data is only predicted using even node data, and (ii) even node data is only updated using details from odd nodes. So if E and O is an arbitrary even and odd split, the transform computed at each node will be invertible as long as the computations satisfy (i) and (ii). We are particularly interested in designing unidirectional lifting transforms, thus, we must constrain the set of neighbors for each node to its descendants D n and its causal broadcast neighbors B n . More formally, let O n = (n∪D n )∩O be the set of odd nodes whose data is available at n from its subtree. Let E n = (n∪D n )∩E be defined similarly. Moreover, let O B n = ¯ B n ∩O denote the set of odd nodes whose data n receives via broadcast. Similarly, let E B n = ¯ B n ∩E. Then the computations at n will be invertible as long 58 as it only predicts y(O n ) fromy(E n ) and y(E B n ) and only updates y(E n ) from y(O n ) and y(O B n ). Let M n and M B n be permutation matrices such that y(O n ) y(E n ) y(O B n ) y(E B n ) = M n 0 0 M B n · x(n) y(D n ) y( ¯ B n ) . (3.12) Then node n can compute transform coefficients as y n = (M n ) t I 0 0 0 U n I U B n 0 I P n 0 P B n 0 I 0 0 0 0 I 0 0 0 0 I y(O n ) y(E n ) y(O B n ) y(E B n ) . (3.13) Bymultiplying thesematricestogether, wegety n = [A n B n ]· x(n)y t Dn y t Bn t , with A n = (M n ) t · I 0 U n I · I P n 0 I ·M n , B n = (M n ) t · 0 P B n U B n U n P B n ·M B n . Since det(A n ) = 1, single-level unidirectional lifting transforms are invertible. The transform given by (3.13) corresponds to only one level of decomposition. In particular, at each node n the transform of (3.13) will yield a set of smooth (or low-pass) coefficients {y(k)} k∈En and a set of detail (or high-pass) coefficients {y(l)} l∈On . The high-pass coefficients will typically have low energy if the original 59 data is smooth, so these can be encoded using very few bits and forwarded to the sink without any further processing. However, there will still be some correlation betweenlow-passcoefficients. Itwouldthereforebeusefultoapplyadditionallevels of transform to the low-pass coefficients at node n to achieve more decorrelation. This will reduce the number of bits needed to encode these low-pass coefficients, and will ultimately reduce the number of bits each node must transmit to the sink. Suppose each node performs an additional J levels of lifting transform on the low-pass coefficients {y(k)} k∈En . At each level j = 2,3,...,J + 1, suppose that nodes inE j−1 n are split into even and odd setsE j n andO j n , respectively. We assume thatE 1 n =E n . Foreachoddnodel∈O j n ,wepredicty(l)usingevencoefficientsfrom some set of even neighbors N j l ⊂ E j n , i.e., y(l) = y(l)− P k∈N j l p l,j (k)y(k). Then for each even node k∈E j n , we update y(k) using odd coefficients from some set of odd neighborsN j k ⊂O j n , i.e., y(k) =y(k)+ P l∈N j k u k,j (l)y(l). This decomposition is done starting from level j = 2 up to level j =J +1. For all j = 2,3,...,J +1, let M j n be a permutation matrix such that y(O j n ) y(E j n ) y(R j n ) =M j n ·y n , (3.14) where R j n = (n∪D n )− (O j n ∪E j n ) is the set of nodes whose coefficients are not modified at level j. Then we can express the level j transform computations in matrix form as y n = M j n t · I 0 0 U j n I 0 0 0 I · I P j n 0 0 I 0 0 0 I · y(O j n ) y(E j n ) y(R j n ) , (3.15) 60 where P j n and U j n represent the prediction and update operations used at level j, respectively. Bycombining(3.12),(3.13),(3.14)and(3.15),wefinallygetthaty n = [A n B n ]· x(n)y t Dn y t Bn t , with A n andB n defined in (3.16) and (3.17). A n = J+1 Y j=2 M jt n I 0 0 U j n I 0 0 0 I I P j n 0 0 I 0 0 0 I M j n M t n I 0 U n I I P n 0 I M n (3.16) B n = J+1 Y j=2 M jt n I 0 0 U j n I 0 0 0 I I P j n 0 0 I 0 0 0 I M j n M t n 0 P B n U B n U n P B n M B n (3.17) Prop. 6 implies that the overall transform is invertible if A n given in (3.16) is invertible. Since each M j n is a permutation matrix, |det(M j n )| = 1. Moreover, the remaining matrices are triangular. Thus, it easily follows that det(A n ) = 1. Therefore, unidirectional, multi-level lifting transforms are always invertible. 3.3.5 Unidirectional 5/3-like Wavelets This section describes the 5/3-like transform on a tree initially proposed in the author’s previous work [55]. First, nodes are split into odd and even sets O and E, respectively, by assigning nodes of odd depth as odd and nodes of even depth as even (as done in Section 2.2.1). This is illustrated in Figure 3.5. The transform neighborsofeachnodearesimplyN n ={ρ(n)}∪C n forevery noden. Thisprovides a 5/3-like wavelet transform on a tree since whenever predictions and updates are used along a 1D path, the transform reduces to the 5/3 wavelet transform [33]. 61 This transform can be computed in a unidirectional manner, but doing so requires that some nodes forward raw data 1 or 2 hops. This is illustrated in Figure 3.6. Nodes 4, 5, 7, 8 tx raw data 5 3 6 2 1 7 8 4 y 4 = [x(4)] y 5 = [x(5)] y 7 = [x(7)] y 8 = [x(8)] 5 3 6 2 1 7 8 4 y 3 = [x(3) x(4) x(5)] t y 6 = [x(6) x(7) x(8)] t Nodes 3, 6 tx raw data Figure 3.6: Raw data example. Nodes 3 and 6 need x(2) to compute details d(3) andd(6), so they must forward raw data over 1-hop to node 2. Nodes 4 and 5 need d(3) to compute s(4) and s(5), so they must forward raw data over 2-hops. Data from each odd node n is predicted using data x(C n ) (from children C n ) andx(ρ(n)) (from parent ρ(n)). However, odd noden will not have x(ρ(n)) locally available for processing. Therefore, we require that each odd node n transmit raw data x(n) one hop forward to its parent ρ(n), at which point node ρ(n) can compute the detail coefficient of n. Each even node m will then compute detail d(j) = x(j)− P i∈C j p l (j)x(j)−p j (m)x(m) for every child j ∈ C m . Similarly, the smooth coefficient of each even node m requires details from its parent ρ(m) and children C m , so it can not be locally computed either. Moreover, detail d(ρ(m)) can only be computed at node ρ 2 (m), i.e., at the grandparent of m. Therefore, we require that even node m transmit raw data x(m) two hops forward to ρ 2 (m), at which point d(ρ(m)) will be available and ρ 2 (m) can compute s(m) = x(m)+ P j∈{ρ(m)}∪Cm u m (j)d(j). Note that each of these operations are trivially invertible, andeasily leadto local transformmatricesA n which areinvertible by construction. However, thenumber ofrawdatatransmissionsisrelatively high, i.e., 1-hopforodd nodes and 2-hops for even nodes. We address this inefficiency in the next section. 62 3.4 Unidirectional Haar-like Wavelets We now construct a transform that addresses the inefficiency of the transform proposed in Section 3.3.5. For the transform in Section 3.3.5, raw data from even and odd nodes must be forwarded over 2-hops and 1-hop, respectively. This can be inefficient in terms of transport costs. Instead, it would be better to construct a liftingtransformthatdirectlyminimizes thenumberofrawdatatransmissions each node must make. We use the splitting method in Section 3.3.5. Note that some formofdataexchange must occurbeforethetransformcanbecomputed, i.e., evens must transmit raw data to odds, or vice versa. Suppose that even nodes forward raw data to their parents. In this case, the best we can do is to design a transform for which even nodes transmit raw data over only 1-hop, and odd nodes do not transmit any raw data. This will minimize the number of raw data transmissions that nodes need to make, leading to transforms which are more efficient than the 5/3-like transform in terms of transport costs. We note that minimizing raw data only serves as a simple proxy for the optimization. A more formal optimization which relies on this same intuition is undertaken in recent work [37]. 3.4.1 Transform Construction Adesignthatismoreefficientthanthe5/3-liketransformcanbeachievedasfollows. Note that an odd node n has data from its children C n and/or even broadcast neighbors B n ∩E locally available, so it can directly compute a detail coefficient for itself, i.e., d(n) =x(n)− P i∈Cn p n (i)x(i)− P j∈Bn∩E p n (j)x(j). Thus, the detail d(n) is computed directly at n, is encoded, and then is transmitted to the sink. These details require fewer bits for encoding than raw data, hence, this reduces the number of bits that odd nodes must transmit for their own data. Since data 63 from even node m is only used to predict data at its parent ρ(m), we simply have thatN m ={ρ(m)} ands(m) =x(m)+u m (ρ(m))d(ρ(m)). Moreover, these smooth coefficients can be computed at each odd node n. Therefore, even nodes only need to forward raw data over one hop, after which their smooth coefficients can be computed. Note that not all odd nodes will have children or even broadcast neighbors, i.e., there may exist some odd nodes n such thatC n =∅ andB n ∩E =∅. Such odd nodes can simply forward raw data x(n) to their parent ρ(n), then ρ(n) can compute their details as d(n) =x(n)−p n (ρ(n))x(ρ(n)). Thus, there may be a few odd nodes that must send raw data forward one hop. This leads to a Haar-like transform since it is the Haar wavelet transform when applied to 1D paths. Odd nodes can also perform additional levels of decomposition on the smooth coefficients of their descendants. In particular, every odd node n will locally com- pute the smooth coefficients of its children. Therefore, it can organize the smooth coefficients {s(k)} k∈Cn onto another tree T 2 n and perform more levels of transform decomposition along T 2 n . In this work, we assume T 2 n is a minimum spanning tree. Thisproducesdetailcoefficients{d 2 (k)} k∈O 2 n ,{d 3 (k)} k∈O 3 n ,...,{d J+1 (k)} k∈O J+1 n and smoothcoefficients{s J+1 (k)} k∈E J+1 n for someJ ≥ 0. Inthis way, oddnodes canfur- ther decorrelate the data of their children before they even transmit. This reduces the resources they consume in transmitting data. An example of this separable transform for J = 1 is illustrated in Figure 3.7. By choosing averaging prediction filters and the orthogonalizing update filter design in Section 2.4.2 [56], we get the global equation in (3.18). The coefficient vectory 6 is obtained in a similar manner. y 3 = 1 0 0 0 1 0 0 1 2 1 · 1 0 0 0 1 −1 0 0 1 · 1 0 0 1 3 1 0 1 3 0 1 · 1 − 1 2 − 1 2 0 1 0 0 0 1 x(3) x(4) x(5) (3.18) 64 5 3 6 2 7 8 4 y 4 = [x(4)] y 5 = [x(5)] y 7 = [x(7)] y 8 = [x(8)] d(3) = x(3) - [x(4) + x(5)] / 2 s(4) = x(4) + d(3) / 3 s(5) = x(5) + d(3) / 3 d(6) = x(6) - [x(7) + x(8)] / 2 s(7) = x(7) + d(6) / 3 s(8) = x(8) + d(6) / 3 (a) 1 st -level along T 5 2 7 8 4 y 3 = [d(3) d 2 (4) s 2 (5)] t y 6 = [d(6) d 2 (7) s 2 (8)] t d 2 (4) = s(4) - s(5) s 2 (5) = s(5) + d 2 (4) / 2 d 2 (7) = s(7) - s(8) s 2 (8) = s(8) + d 2 (7) / 2 (b) 2 nd -level “orthogonal” to T Figure 3.7: Unidirectional Computations for Haar-like Transform. In (a), nodes 3 and 6 compute a first level of transform. Then in (b), nodes 3 and 6 compute a second level of transform on smooth coefficients of their children. 3.4.2 Discussion The transform computations that each node performs can be easily mapped into our standard formy n = [A n B n ]· x(n)y t Dn y t Bn t by appropriately populating the matrices in (3.16) and (3.17). Therefore, they will always yield invertible trans- forms. For example, since each odd node n predicts its own data x(n) using data from its children C n and even broadcast neighbors B n ∩E, then updates the data of its children from its own detail, a single level transform can be expressed as y n = 1 0 u Dn (n) I 1 −p n (D n ) −p n ( ¯ B n ) 0 I 0 x(n) y Dn y Bn . (3.19) By choosing A n = 1 0 u Dn (n) I · 1 −p n (D n ) 0 I , (3.20) and B n = 1 0 u Dn (n) I · −p n ( ¯ B n ) I , (3.21) 65 we have that y n = [A n B n ]· x(n)y t Dn y t Bn t . Note that (3.19) covers all of the cases discussed in Section 3.4.1 for each odd node n, that is to say: (i)C n 6=∅ and B n ∩E 6=∅, (ii)C n =∅ andB n ∩E 6=∅, (iii)C n 6=∅ andB n ∩E =∅, and (iv)C n =∅ andB n ∩E =∅. In particular, wheneverC n 6=∅,p n (D n ) andu Dn (n) will have some non-zero entries. Otherwise, n has no descendants and so p n (D n ) and u Dn (n) will just be vectors of zeros. Similarly, whenever B n ∩E 6= ∅, p n ( ¯ B n ) will have some non-zero entries. Otherwise, n has no even broadcast neighbors andp n ( ¯ B n ) will be a vector of zeros. Similarly, each even node m may need to compute predictions for its odd chil- dren, so its computations for a single level transform can be expressed as y m = 1 0 −p Dm (m) I · x(m) y Dn . (3.22) Also note that (3.22) covers all of the cases for each even node m discussed in Section 3.4.1, i.e., when m has to compute predictions for children thenp Dm (m)6= 0, otherwise, p Dm (m) =0. Notethat, whenbroadcastdataisused, thedecorrelationachieved atoddnodes may still be comparable to the 5/3-like transform since the same number of neigh- bors (or more) will be used. Moreover, broadcasts are particularly useful for odd nodesn that have no children, i.e., n for whichC n =∅ butB n ∩E 6=∅. If broadcast data is not used when it is available, node n will have to transmit x(n) to its par- ent. Sincex(n) requiresmorebitsforencodingthandoesadetailcoefficientd(n),n will consume more resources during data transmission. By using broadcasts, these odd nodes which have no children can still use data overheard from even broadcast neighbors, allowing them to avoid transmitting raw data to their parents. This is 66 illustrated in Figure 3.8, where node 11 has no children but overhears data from node 12. The example in Figure 3.8(a) will consume more resources at node 11 than will the example in Figure 3.8(b). 11 10 12 9 x(11) [x(10), d(11)] x(12) [d(9), s(10), d(11), s(12)] (a) Without Broadcasts 11 10 12 9 x(12) d(11) [x(10), d(11)] x(12) [d(9), s(10), d(11), s(12)] (b) With Broadcasts Figure3.8: Nobroadcastsareusedin(a),sonode11consumesmoreresourceswhen transmitting raw data x(11). Broadcasts are used in (b), so node 11 consumes less resources when transmitting detail d(11). 3.5 Quantization of Transform Coefficients Quantization is also an important part of the compression process since it allows nodes to provide data to the sink at different bit rates (and correspondingly, differ- ent costs and reconstruction qualities), though lossless coding (i.e., without quanti- zation) could always be performed for the wavelet transforms in Section 3.3.4, 3.3.5 and 3.4 or the T-DPCM schemes in Section 3.3.3. In this thesis we consider both lossless and lossy coding. One major problem with quantization in WSNs is that original datawill not always beavailable forcomputation atevery node. Thus, itis possible that some nodes receive data that has already been quantized, process it, 67 thenquantizeitagain. Thiscreatestwomajorproblems. First, itcanleadtosevere propagation of quantization errors in the inverse transform computations, leading to significant degradation in the reconstructed data. Secondly, if the encoders (i.e., at the nodes) and decoder (i.e., at the sink) operate on different data (unquantized versus quantized) for adaptation of the filters or entropy coders, this will lead to a serious drift problem. Thefirstproblemcanbeeasilyhandledbyonlyusingquantizeddataforfiltering operationsateachnode, thatwaywecanavoidhavingcascaded quantizationsteps. T-DPCM is one scheme that only uses quantized data to compute predictions. In light of these facts, for the lifting transforms we also find it sensible to (i) only use quantizeddatatocomputepredictionsforoddnodedataand(ii)onlyusequantized detail coefficients to compute updates of even node data. A similar strategy would be employed over multiple levels of decomposition. In this way we can avoid this severe propagationofquantization errors. This idea wasinitially proposedin[8]for the same reason, and was used to quantize the coefficients of a unidirectional 1D, 5/3 wavelet transform. Note that transforms such as the T-KLT do not have this luxury since inverse and forward transforms are constantly being applied at every node, i.e., initially each node transmits quantized raw data, the quantized raw data is used to generate transform coefficients, then the coefficients are quantized again. Thus, propagation of quantization error is inevitable for the T-KLT. The second problem can be dealt with by using only quantized data in the filter adaptation. For example, in the adaptive prediction filter scheme described in Section 2.3.2, the prediction filters are always updated using quantized data. In this way, nodes and the sink will use the same data for adaptation, so drift problems are completely avoided. Obviously quantized data should always be used when adapting the entropy coders. 68 3.6 Experimental Results This section presents experimental results that compare the transforms proposed here against existing methods. Source code used to generate these results can be found on our web page 5 . In particular, we focus on comparing the proposed multi-level Haar-like lifting transforms against the multi-level 5/3-like transform from [55,58], the T-DPCM scheme in [52] and raw data gathering. We consider the application of distributed data gathering in WSNs. Performance is measured by total energy consumption. 3.6.1 Experimental Setup For evaluation, we consider simulated data generated from a second order AR model. This data consists of two 600×600 2D processes generated by a second order AR model with low and high spatial data correlation, e.g., nodes that are a certain distance away have higher inter-node correlation for the high correlation datathanforthelowcorrelationdata. Morespecifically,weusethesecondorderAR filter H(z) = 1 (1−ρe jω 0z −1 )(1−ρe −jω 0z −1 ) , with ρ = 0.99 and ω 0 = 99 (resp. ω 0 = 359) to produce datawith low (resp. high) spatial correlation. The nodes were placed in a 600×600grid, withnodemeasurements corresponding to thedata valuefromthe associated position in the grid. Each network used in our simulations is generated from a set of random node positions distributed in the 600×600 grid. An SPT is constructed for each set of node positions. We consider two types of networks: (i) variable radio range networks in which each node can have a different radio range, and(ii) fixed radio range networks inwhich each node has the same radio range. In thevariableradiorangecase, theradiorangethateachnodenusesfortransmission 5 http://biron.usc.edu/wiki/index.php/Wavelets on Trees 69 is defined by the distance from n to its parent in the SPT. Additional broadcast links induced by the SPT are also included, i.e., a broadcast link between node n and m exists if m is not a neighbor of n in the SPT but is within radio range of n. Inordertomeasureenergyconsumption, weusethecostmodelforWSNdevices proposed in [21,74], where the energy consumed in transmitting k bits over a distance D is E T (k,D) = E elec ·k +ε amp ·k·D 2 Joules and the energy consumed in receiving k bits is E R (k) = E elec · k Joules. The E elec · k terms capture the energy dissipated by the radio electronics to process k bits. The ε amp ·k·D 2 term captures the additional energy for signal amplification needed to ensure reasonable signal power at the receiver. WSN devices also consume energy when performing computations, but these costs are typically very small compared with transmission andreceptioncosts. Therefore, weignoretheminourcostcomputations. Alsonote that all data gathering schemes will suffer from channel noise and attenuation, so a no-channel-loss comparison is still valid. Thus, we do not consider these effects in our experiments. Comparisons are made with the Haar-like transforms of Section 3.4 against the 5/3-like transform with delayed processing proposed in [58] and the T-DPCM scheme proposed in [52]. Predictions for each of these transforms are made using theadaptivepredictionfilterdesigninSection2.3.2(proposedin[52]). Updatesare made using the “orthogonalizing” update filter design in Section 2.4.2 (proposed in[56]). Ineach epoch, we assume that each nodetransmitsM = 50 measurements taken atM different times. Also, each raw measurement is represented using B r = 12 bits. We assume each odd node encodes M detail coefficients together with an adaptive arithmetic coder. Smooth coefficients are treated like raw data, e.g., each one uses B r bits. Since we only seek to compare the performance of spatial transforms, we do not consider any temporal processing. 70 3.6.2 Simulation Results In the case of lossless compression, the average cost reduction ratios taken over multiple uniformly distributed networks are shown in Figure 3.9 for high and low data correlation. These are expressed as the average of multiple values of (C r − C t )/C r , where C t is the cost for joint routing and transform and C r is the cost for rawdataforwarding. Resultsforvariableradioranges(eachnodehasdifferentradio range) are shown in Figure 3.9(a). Results for fixed radio ranges (each node has the same radio range) are given in Figure 3.9(b). T-DPCM does the worst overall. The 5/3-like transform provides significant improvement over the simple T-DPCM scheme. The Haar-like transforms have the highest average cost reduction ratio, or equivalently, the lowest average cost. Moreover, we note that broadcast is not veryhelpful(onaverage)whennodeshavevariableradioranges(Figure3.9(a)),but thereisasignificantgainwhennodesuseafixedradiorange(Figure3.9(b)). Thisis mainlybecause, inthefixedradiorangecase, (i)therearemanymoreopportunities for using broadcast data and (ii) each node has more broadcast neighbors. Thus, broadcast is likely to be most useful when nodes are configured with a fixed radio range. Note that the amount of raw data forwarding needed to compute the Haar-like transformissignificantly reduced comparedwiththe5/3-liketransform. Therefore, theHaar-liketransformwilldo betterthanthe5/3-liketransformintermsoftrans- port costs. Granted, the 5/3-like transform will use data from more neighbors for processing, so thedecorrelation given by the5/3-liketransformwill begreater than that given by the Haar-like transform. However, in our experiments the average reduction in rate that the 5/3-like transform provides over the Haar-like transform is rather small. The Haar-like transform with broadcast also provides additional 71 0 100 200 300 400 500 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 Average Percent Cost Reduction No. of Sensors (C r −C t )/C r Haar−like Wav. w/ Broad. Haar−like Wav. 5/3−like Wav. T−DPCM (a) Variable Radio Range 0 100 200 300 400 500 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 Average Percent Cost Reduction No. of Sensors (C r −C t )/C r Haar−like Wav. w/ Broad. Haar−like Wav. 5/3−like Wav. T−DPCM (b) Fixed Radio Range Figure 3.9: Average percent cost reduction ( Cr−Ct Cr ). Solid and dashed lines cor- respond to high and low spatial data correlation, respectively. Best performance achieved by Haar-like transforms, followed by 5/3-like transform and T-DPCM. High correlation data also gives greater reduction than low correlation data. cost reduction over the Haar-like transform without broadcasts since less raw data forwardingisneededonaverage. Moreover, theamountofcostreductionachievable is higher for the high correlation data than for the low correlation data. Lossy coding is also possible and can provide even greater cost reductions while introducing some reconstruction error. In this case, we quantize transform coef- ficients with a dead-zone uniform scalar quantizer. Performance is measured by the trade-off between total cost and distortion in the reconstructed data, which we express as the signal to quantization noise ratio (SNR). Sample 50 node networks are shown in Figs. 3.10(a) and 3.10(c) and, in the case of high correlation data, the corresponding performance curves are shown in Figs. 3.10(b) and 3.10(d). The Haar-like transforms do the best among all transforms. WhenusingbroadcastswiththeHaar-liketransform,thereisanadditional1dB (resp. 2.5 dB) gain in SNR for the variable (resp. fixed) radio range network at a 72 fixed cost, i.e., by using broadcasts we can increase the quality in the reconstructed data for a fixed communication cost. Thus, for these networks, using broadcast is quite helpful. Also note that there are only 2 broadcast links used in the trans- form for the variable radio range network (Figure 3.10(a)), whereas there are over 10 broadcast links used in the fixed radio range network (Figure 3.10(c)). Thus, broadcast provides even greater gains for the fixed radio range network (2.5 dB versus 1dB) since there are more broadcast links. More generally, broadcast should provide more gains in networks where many broadcast opportunities are available. In this particular network for the variable radio range case, T-DPCM actually doesbetter thanthe5/3-like transform. Notethat in T-DPCM, only theleaf nodes forward raw data to the sink; so if there are only a few leaf nodes, the raw data forwarding cost for T-DPCM may not be very high compared with the raw data forwarding cost for the 5/3-like transform. In this particular network, only 19 of the 50 nodes are leaves in the tree. Therefore, the raw data forwarding cost for T-DPCM in this case is lower than that for the 5/3-like transform. However, on averagetherawdataforwardingcostforT-DPCMwillbeveryhigh(seeFigure3.9), leading to higher total cost on average as compared with the 5/3-like transform. 3.6.3 Comparison of Filter and Even/Odd Split Designs Thissectionprovidesexperimentalresultswhichcomparethevarioussplitandfilter designs. Weagainconsider thescenariooftransform-baseddistributeddatagather- inginWSN,andusethesameexperimentalsetupasinSection3.6.1. Figure3.11(a) shows the same 50 node network used in Section 3.6.2 and the corresponding cost- distortion curves are shown in Figure 3.11(b). 73 The data adaptive prediction filters from Section 2.3.2 are far superior to sim- ple average and planar-based prediction filters proposed in Section 2.3.1. We also see that the orthogonalizing update filters from Section 2.4.2 do not provide much improvement for this network. This is mainly because not many levels of decom- position are performed. For larger networks, there is more gain when using or- thogonalizing updates, but this gain is still not very substantial. In fact, for WSN and our particular transform constructions, we only observe at most 3 levels of decomposition being possible. Thus, this update design is not likely to improve the performance for this application. However, as we will see in Section 5.3, this does provide some significant improvements when used in image coding, mainly because (i) there are many pixels and (ii) many levels of decomposition can be applied. We also show some comparisons for various split designs in this same WSN context. In particular, we compare the simple tree-based splitting described in Section 2.2.1 with Haar-like transforms from Section 3.4 against the optimized graph-based splitting discussed in 2.2.2 (see [37] for more details). We use the same network shown in Figure 3.11(a), and adaptive prediction filters are used in all cases. The resulting graph-based split is shown in Figure 3.12(a) and the corresponding cost-distortion curves are shown in Figure 3.12(b). The graph-based splits provide some improvements over the tree-based splits, mainly because there is much less raw data forwarding in the network. 3.7 Conclusions A general class of en-route in-network (or unidirectional) transforms has been pro- posed along with a set of conditions for their invertibility. This covers a wide 74 range of existing unidirectional transforms and has also led to new transform de- signs which outperform the existing transforms in the context of data gathering in wireless sensor networks. In particular, we have used the proposed framework to provide a general class of invertible unidirectional wavelet transforms constructed usinglifting. Thesegeneralwavelet transformscanalsotakeintoaccountbroadcast data without affecting invertibility. A unidirectional Haar-like transform was also proposed which significantly reduces the amount of raw data transmissions that nodes need to make. Since raw data requires many more bits than encoded trans- form coefficients, this leads to a significant reduction in the total cost. Moreover, our proposed framework allows us to easily incorporate broadcasts into the Haar- like transforms without affecting invertibility. This use of broadcast data provides further performance improvements for certain networks. 75 0 100 200 300 400 500 600 0 100 200 300 400 500 600 Transform Structure on SPT (Variable RR) (a) Variable RR Network 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.05 0.055 5 10 15 20 25 30 35 40 45 50 Total Energy Consumption (Joules) SNR (dB) SNR vs. Energy Consumption (Variable RR) Haar−like w/ Broad. Haar−like 5/3−like T−DPCM Raw Data (b) Cost-Distortion Curves (Variable RR) 0 100 200 300 400 500 600 0 100 200 300 400 500 600 Transform Structure on SPT (Fixed RR) (c) Fixed RR Network 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 5 10 15 20 25 30 35 40 45 50 Total Energy Consumption (Joules) SNR (dB) SNR vs. Energy Consumption (Fixed RR) Haar−like w/ Broad. Haar−like 5/3−like T−DPCM Raw Data (d) Cost-Distortion Curves (Fixed RR) Figure 3.10: Sample networks with corresponding Cost-Distortion curves. In (a) and (c), solid lines denote forwarding links, dashed lines are broadcast links, circles are even nodes, x’s are odd nodes, and the square center node is the sink. 76 0 100 200 300 400 500 600 0 100 200 300 400 500 600 Transform Structure on SPT (a) Sample Network 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.05 0.055 0 10 20 30 40 50 60 Total Energy Consumption (Joules) SNR (dB) SNR vs. Energy Consumption Adaptive Pred. Planar Pred. / Orth. Update Avg. Pred. / Orth. Update Avg. Pred. / Update Raw Data (b) Performance Curves Figure 3.11: Filter design comparison. Circles are even nodes and x’s are odd nodes. Adaptive prediction filters do much better than fixed prediction filters. Orthogonalizing updates provide almost no gain. 0 100 200 300 400 500 600 0 100 200 300 400 500 600 Transform Structure on Graph (a) Graph-based Split 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.05 0.055 0 10 20 30 40 50 60 Total Energy Consumption (Joules) SNR (dB) SNR vs. Energy Consumption Tree−based Split (1−level) Tree−based Split (Multi−level) Unweighted Graph−based Split Weighted Graph−based Split Raw Data (b) Performance Curves Figure 3.12: Split design comparison. Circles are even nodes and x’s are odd nodes. Dashed lines denote broadcast links. Graph-based splits provide some improvements over tree-based splits. 77 Chapter 4 Joint Optimization of Transform and Routing In this chapter we address the problem of joint optimization of transform and rout- ing. Note that the unidirectional transforms proposed in Chapter 3 can be defined alonganyroutingtree, thus, it wouldbebest tochoosetheroutingtreejointlywith the transform. We achieve this by proposing two optimization algorithms (initially proposed by us in [54]) that search for the best choice of tree for a fixed choice of transform. 4.1 Introduction While existing transform-based methods (e.g., those proposed in Chapter 3) are capable of reducing the number of bits to be transferred to the sink, almost all of them separate transform design and routing, i.e., they define transforms first then mapthosetransformsontoefficientroutingtrees, orviceversa. Inthefirstcase,this requires nodes to transmit uncompressed data directly to a cluster head as in [15] or to a certain number of neighbors [72,73] before transform coefficients can even becomputed. If theneighbors(orcluster head) ofanodearefurtheraway fromthe sink than the node itself, additional backward transmissions of uncompressed data 78 willberequiredthatincreasethetotalcost. Theunidirectionaltransformsproposed in Chapter 3 (and [1,9,52,55,57,58]) i) can be computed on arbitrary routing trees and ii) do not require additional backward transmissions (they are computed in a unidirectional manner as data flows toward the sink). These unidirectional transforms have been shown to be more energy-efficient than the bi-directional transform in [72,73]. However, note that these unidirectional transforms consider the design of the transform and routing separately, e.g., a shortest path routing tree is chosen first and then a transform is performed over that tree. Instead, the techniques we propose in this chapter attempt to exploit the inherent interaction between different trees and the unidirectional transforms proposed in Chapter 3. Thisleadsustoapracticalapproachforjointlyoptimizingcompressionandrouting, i.e., we can aim at designing a tree with good transport cost and data correlation properties, knowing that no matter what tree is chosen a unidirectional transform can be implemented. A shortest path routing tree (SPT), guarantees that the path from a given node to the sink is most efficient for routing, but obviously does not guarantee that consecutive nodes in a path contain highly correlated data. Of course one could simply optimize the transform along this SPT, as was done in work by us in [37]. However, that optimization problem is quite different since (i) the routing is assumed fixed, (ii) any additional broadcast links that arise are added to the links on the tree in order to form a more general graph and (iii) a greedy heuristic is used to search for the minimum cost even / odd splitting on this graph. This leads to a more general transform than the tree-based wavelet transforms proposed in Sections 3.3.4, 3.3.5 and 3.4. On the other hand, the optimization algorithm proposedin thischapter assumes that atree-based transformis given (such as, e.g., the Haar-like tree-based wavelet transform proposed in Section 3.4), then searches 79 for the routing tree that minimizes the total energy consumption for this choice of transform. Again, note that the SPT does not guarantee that consecutive nodes along a routing path contain highly correlated data. For example, if data correlation is inversely proportional to distance between nodes, one would always have to route through the nearest neighbor in order to achieve maximal inter-node data corre- lation. Clearly SPT routing does not guarantee this, since this design aims to minimize distance to sink, not inter-node distance. The results in [40] corroborate this basic intuition, where it was shown that a network with high data correlation benefits most from routing with compression along shorter hops with longer overall paths. Asanalternative, we couldconsider trees thatlinktogether nodeswithhigh inter-node data correlation. Such trees can provide greater compression efficiency than an SPT. However, aggregating along these types of trees may force nodes to transmit data away from the sink, so that gains provided by the increase in de- correlation are offset by increased transmission cost. Since aggregation will occur along routing trees, there is a trade-off between trees that result in energy-efficient routing and ones that allow a transform to de-correlate data effectively. Thus, the main goal in this chapter is to find trees that effectively exploit this trade-off. In order to achieve jointly optimized routing and transform we search exhaus- tively for the lowest cost tree among a set of possible trees, for a fixed distortion levelD. For a given tree T, we use cost model proposed in Section 3.6 which speci- fies the cost for each nodento routeandcompress data tothe sink alongT. We let C T (D) denote thetotal cost to route andcompress data along treeT for some fixed distortionlevelD. Since any ofthe transformsinChapter 3 canbecomputed along arbitrary trees, a natural optimization problem is to find a tree T that minimizes the total cost C T (D) for a fixed distortion level D and fixed transform. 80 While onecouldconsider thefull set of possible trees foragiven communication graph, this set can be extremely large. The well-known matrix-tree theorem [25] (which provides the number of spanning trees for a given graph) shows that a complete graph with n nodes has n n−2 possible trees. Even if the graph is not complete, the matrix-tree theorem may still imply a very large solution space. Thus, it is not computationally feasible to consider a full solution set. To make the optimizationproblemtractable,wechooseonlytoexploretreesthatcanbeobtained bycombininglinksfromanSPTcomputedwithedgesdefinedbyphysicalinter-node distances (to minimize distance to the sink) with links from a minimum spanning tree (MST) computed with edge weights defined by inter-node data correlation (to maximize pair-wise inter-node correlation). More specifically, we design an MST using edge weights w(m,n) = 1−r m,n with r m,n the correlation coefficient between nodes m and n so that an MST corresponding to these edge weights will have a link between each node n and the neighbor of n that has maximum inter- node data correlation with n. Clearly, such an MST is “best” in the sense of maximizing pair-wise data correlation along the tree, which should help achieve improvedcompressionefficiency forourtransform. SincetheSPTwillminimizethe cost to route any amount of data from a node to the sink, we can use combinations ofsuchanMSTwithanSPTtoprovideadirecttrade-offbetweenhighcompression performance and low routing cost. To illustrate this point, consider the real network in Figure 4.1 taken from [38], where acombinationofanSPT andMSTisused forjointroutingandcompression. The SPT provides the shortest route to the sink from any node, but fails to link some nodes to their closest neighbors. This can reduce compression efficiency. The MST links those nodes to their closest neighbors, but also has some longer paths that push data away from the sink. Clearly, neither alone is sufficient to achieve 81 the best joint routing and compression performance. Instead, the trees obtained by our proposed optimization methods tend to link nodes to their closest neighbor, but in a way that preserves short paths to the sink, resulting in improved overall performance. 500 520 540 560 580 600 240 260 280 300 320 SPT 500 520 540 560 580 600 240 260 280 300 320 MST 500 520 540 560 580 600 240 260 280 300 320 Heuristic Tree 500 520 540 560 580 600 240 260 280 300 320 Optimal Tree Figure 4.1: SPT, MST, and Combined Tree In summary, in this chapter we address the joint optimization of routing and compression by using the unidirectional transform framework proposed in Chap- ter 3. In particular, we propose a two heuristic algorithms for selecting routing and transform jointly by accounting for both data correlation and routing costs. 82 4.2 Joint Routing and Transform Optimization Ourproposedoptimizationmethodisinspiredbythe“foreigncoding”techniquede- veloped in [48], where an MST is constructed with edge weights that are a function of data correlation and where data is encoded along this MST and forwarded along an SPT. Note that an MST constructed from edge weights that are inversely pro- portional to pair-wise data correlation will always connect nodes to the neighbors with which they share the highest data correlation. To see why this is so, consider Prim’s algorithm [12] for constructing an MST. Given a set of edge weights, Prim’s algorithm can construct an MST by starting from any arbitrary node n. So by starting from any node n, the second node added to the MST will be the node m such that w(n,m) ≤ w(n,k), for all k ∈ V\{n}. Since w(n,m) is minimal over all edges containing n as a vertex, the correlation between n and m is maximal. Therefore, data at n is maximally correlated with data at m. Since this can be done for arbitrary n, it follows that in an MST every node will be connected to the neighbor with which it shares the highest data correlation. It is worth noting here that an MST defined in this way only considers pair-wise correlation, and so is not necessarily optimal from a coding standpoint when using our proposed transforms (where data is filtered over multiple hops). However, it should provide a good approximation to the optimal tree. Thus, we choose to exploit our stated trade-off by searching for a minimum cost tree among a set of trees that combine a distance-based SPT and a correlation-based MST. In the technique proposed in [48], spatial correlation at each node is only ex- ploited using data fromat most one child in the routing tree. If data froma child is used, nodes perform what is called “foreign coding”. Otherwise (if no other data is used forcompression), nodesperformwhat isknownas“self coding”. Inthecaseof 83 the unidirectional transforms proposed in Chapter 3, nodes always perform foreign coding. Moreover, this foreign coding can incorporate data from more than one child and/or descendant. Since the optimal solution in [48] assumes that foreign coding uses data from only one child, it can not be directly extended to our trans- forms. Furthermore, the unidirectional transforms we consider should also provide more de-correlation since a node can compress its data using data from more than one neighbor. An MST does have some drawbacks, though. For one, it may not have as many merge points as an SPT. Since the transforms in Chapter 3 only exploit cross- path correlation at and around merge nodes, having fewer merges may actually reduce the efficiency of our transform when performed along an MST. However, an appropriate combination with an SPT should maintain these merges whenever beneficial. In fact, not all of the neighbors of a node in the MST will have high data correlation with it so some merges may actually hurt coding performance. As mentioned above, our MSTs only consider correlation over a single hop and may result in some inefficiency since our proposed transform actually filters data over multiple hops. Furthermore (as will be discussed in Section 4.5), if a predict has more neighbors it will tend to have less residual energy and so should require fewer bits. Similarly, having more neighbors at an update node can produce a smoother approximation of the original data and as such should also require fewer bits. So as an alternative to MSTs, we could develop trees that (1) preserve beneficial merges and (2) keep the number of merges at predict nodes to a minimum. These issues will be explored experimentally in Section 4.5 and could be an interesting area for future work. In Section 4.2.1, we propose an algorithm that finds the minimum cost combi- nation of an SPT and MST by computing, for every possible combination, the cost 84 of transform and routing along each tree (overlayed on an SPT) and then select- ing the lowest cost combination for a fixed distortion D. The algorithm is general enough to accommodate an arbitrary definition of edge weights used to construct the MST, i.e., w(m,n) = 1−r m,n , or w(m,n) can be physical inter-node distance, oranything elsethatallowsustoquantifythedegreeofinter-nodedatacorrelation. Since the number of combinations grows rapidly with the number of nodes, we also propose a heuristic approximation algorithm that is amenable to larger networks in Section 4.3. 4.2.1 Optimization Algorithm For a set of N nodes, let T S denote the SPT and T M denote an oriented version of the MST. Let T represent the tree which is our desired combination of T S and T M . An oriented version of the MST (T M ) is necessary to define a unidirectional transform. Basically, T M fixes the sink node N + 1 as the root and directs all edges in the MST toward the sink. We can represent each tree by defining parent functions ρ M n and ρ S n for T M and T S respectively. Under this construction, data at node n is routed to the sink through ρ M n in T M , ρ S n in T S , and ρ n in T. Thus, we define the edges in each tree by the ordered pairs (n,ρ M n ) and (n,ρ S n ) for T M and T S . Weconstruct a minimum cost tree bysearching amongall feasible combinations ofsuch edges inT M with such edges inT S . Wefirst explain how tofindthesmallest possible set of feasible combinations (i.e., combinations that result in a connected acyclic graph) in Section 4.2.2. We then provide an algorithm that searches over this feasible set to find a minimum cost solution in Section 4.2.3. 85 4.2.2 Feasible Set Construction The total number of combinations could be as many as 2 N , but many such edges in T S and T M will be the same so we may eliminate those from consideration. Furthermore, notall combinationsofsuch edgeswill producea validtree(i.e., some may result in cycles or may disconnect certain groups of nodes) so the number of combinations canbe reduced even further by eliminating invalid trees. We consider an edge (n,m) to be the same in both trees if m = ρ M n = ρ S n (i.e., the parent of node n is the same in both trees). Thus, we define V ′ = {n|ρ M n 6= ρ S n } and N ′ as the number of nodes in V ′ . We also enumerate this set as V ′ ={n 1 ,n 2 ,...,n N ′}. For each node n i ∈ V ′ , let E ′ (n i ) = { n i ,ρ M n i , n i ,ρ S n i } be the set of edges from n i to the parent ofn i in eitherT S orT M . Then the full set of combinations of edges we consider in T M and T S is given by: E =E ′ (n 1 )×E ′ (n 2 )×...×E ′ (n N ′). We reduce the search space further by eliminating combinations of edges inE that do not produce a valid tree (i.e., graphs that are disconnected or have cycles or both). We check for tree validity as follows. Let ˜ E j ∈ E, where j indexes the j-th combination of edges inE. Naturally, ˜ E j ={(n 1 ,m 1,j ),...,(n 1 ,m N ′ ,j )} for the j-th combination. Let ˜ E be the set of edges in T S that are the same in T M . Then a combination E j will be feasible only if the graph ˜ T = (V, ˜ E∪ ˜ E j ) is connected and acyclic. This is done by checking that each leaf node has a non-cyclic path to the sink (which is sufficient since this process traverses every node in the network). Otherwise, the graph ˜ T does not form a valid tree. We represent the set of feasible 86 trees by the N f × N matrix T f , where N f is the number of feasible trees and T f (m,n) is the parent of node n in the m-th feasible tree. 4.2.3 Feasible Set Search Since the full set of feasible trees is given by T f , we could then find the tree that optimizes routingandtransform byi) fixing a targetdistortion levelD (inour case, distortion is mean squared error (MSE)) , ii) computing the cost C j for performing routing and transform along every possible tree given in T f with distortion level D, and iii) choosing the tree with minimum cost. This is an exhaustive search over our set of feasible combinations of MST and SPT, and should therefore provide the minimum cost combination. Specifically, thisisdoneasfollows. LetC ∗ bethecost forthebest treefoundup to row j and initialize it as C ∗ =∞. Also let i ∗ index the row inT f corresponding toC ∗ andinitializeitasi ∗ = 0. Thenforeachrowj ofT f ,withj = 1,2,...,N f , do the following. Define the parent function ρ j (1 :N) =T f (j,1 :N) and compute C j = ComputeCost(ρ j ,D,T S ) (a function that computes the cost of doing transform and routing along the tree corresponding to ρ j with SPT T S overlayed on top). If C j < C ∗ , then C ∗ = C j and i ∗ = j. Once all feasible trees are exhausted, we can construct the tree T (which minimizes the cost for routing and transform over all feasible combinations of MST and SPT) using the parent function defined by T f (i ∗ ). The function ComputeCost(ρ j ,D,T S ) returns the total cost for using the tree defined by ρ j . This cost is computed using the cost models discussed in Section 3.6. 87 4.3 Heuristic Approximation Algorithm For large N, the feasible set from Section 4.2.2 can still be very large. This makes the problem intractable for large N, which motivates the need for a good heuristic algorithm that approximates the minimum cost algorithm in Section 4.2.1. The main goal of a good heuristic should be to choose links that provide a direct gain in coding efficiency only if the resulting increase in routing cost does not offset the gains achieved, so that a desirable balance of low cost routing and higher compression efficiency can beobtained. This canbe donereasonably well by startingfromaninitialtreeandsearching onenodeatatimefromnodesofgreatest depth (since these nodes will be further from the sink and will benefit more from efficient coding) and decrementing depth at each stage until all nodes are covered. In our case we choose SPT as the initial tree in order to preserve low routing costs, then for each node we simply determine if the cost (C T ) to use the next hop in the MST is lower than the cost to continue along the next hop in the current tree (e.g. SPT). If so, then the next hop of such a node will be the next hop along the MST (rather than the next hop along the SPT). This ensures that, for each node, any direct gains in coding efficiency will not be offset by the resulting increase in routing cost. This is clearly a greedy algorithm, and so can not guarantee that the optimal combination of an MST and SPT will be found. But at the very least, it will guarantee that the resulting tree provides lower cost transform and routing than a transform performed along the SPT. This algorithm is described formally in Algorithm 1. The final tree we seek isT and initially T =T S . This allows us to greedily choose an edge in T M over an edge in T S only if the direct gain in coding efficiency offsets the increase in routing cost. Naturally, the validity of the tree that results from switching to an edge in T M is 88 checked before further steps are taken. We also say that a parent functionρ j yields a feasible tree if the tree defined byρ j is a connected, acyclic graph. The algorithm simply searches each resulting tree and returns the lowest cost tree it finds as T. Algorithm 1 Find Heuristic Tree 1: T =T S and ρ j =ρ S j ,∀j ∈I 2: k = max(depth) and C =∞ 3: while k≥ 1 do 4: I k ={m∈I : depth(m) =k} 5: for each n∈I k do 6: ρ t n =ρ M n and ρ t j =ρ j ,∀j ∈I\{n} 7: if ρ t j yields a feasible tree then 8: C t = ComputeCost(ρ t ,D,T S ) 9: if C t <C then 10: Update T and ρ j using ρ t j 11: C =C t 12: end if 13: end if 14: end for 15: k =k−1 16: end while 17: return T 4.4 Practical Considerations There are multiple practical issues involved with joint routing and transform opti- mization,themostprevalentonesbeing(i)howtoestimatethechangeincorrelation 89 (and the corresponding difference in cost) for a given change in routing for each node, (ii) how to communicate the changes in routing to each node and (iii) how to perform the joint optimization in a distributed manner. The first issue could be solved with some training data, where the differences in correlation (and the cor- responding differences in routing costs) for different routing choices are estimated off line. In this case, the routing optimization could also be done off line and the routing information could then be flooded to the nodes from the sink. Depend- ing on the transform, these changes in correlation (and routing costs) could also be estimated in a distributed manner by having nodes exchange data with their immediate neighbors, then the optimal routing decisions could be made at each node. For example, in the T-DPCM scheme proposed in [41], each node computes a difference between its own data and that of its parent in the routing tree. Thus, the total cost for routing and compression for each node is equal to (i) the cost to route raw data to its parent plus (ii) the cost to route the difference between its own data and its parent’s data to the sink. Since the cost for each node (cost (i) plus cost (ii))onlydepends onthechoice of parent, thecostfor eachnodetojointly routeandcompressitsowndataiscompletelyindependent ofanyothernodes’cost. Thus, if each node chooses the minimum cost parent for itself, the total cost will also be minimized. Moreover, each node can exchange routing information (e.g., ETX) and some training data (used to estimate bit rate for prediction residuals), andthencanchoose theparent thatminimizes cost (i)plus cost (ii)inacompletely distributed manner. Note that care must be taken to ensure that routing loops do not occur (e.g., the best parent of node n is node m, and the best parent for node m is node n, etcetera), so some local consistency checking would also need to be done to ensure that no cycles exist. 90 For more complex transforms such as the unidirectional wavelet transforms in Chapter 3, data from multiple neighbors (both descendants and ancestors) will be used to compute the transforms. Thus, theoptimization will bedifficult to perform in a distributed manner in general. Moreover, if topology changes such as link or node failures occur, re-configuration will be needed no matter what transform is chosen. Therefore, the proposed optimization techniques are probably best suited tovery stable, staticnetworks inwhich topologychanges areinfrequent. These and other related practical issues are an interesting topic for future work. 4.5 Evaluation of MST Performance As discussed in Section 4.2, the MST is not necessarily the best tree to minimize the distortion (i.e. maximize SNR) for a given rate. This is mainly due to its lack of merges (as compared to an SPT). For the sake of computational feasibility, we consider the small 20 node network shown in Figs. 4.2(a) and 4.2(b) where nodes are indexed by the number next to them. We use the unidirectional 5/3- like transform proposed in Section 3.4 for the comparisons. We compute all of the possible spanning trees using standard algorithms [70] then perform an exhaustive search of all possible spanning trees for the best RD tree. The performance curves are shown in Figure 4.3. In this case, there are only around 1000 spanning trees, so a brute-force search is feasible. However, in general there will be very many spanning trees. For each fixed rate we compute the distortion for each tree and choose the tree with the minimum distortion. The optimal RD tree shown in Figure 4.2(b) has one more merge (node 12) than the MST shown in Figure 4.2(a) and also has a different merge with different neighbors (node 17), resulting in reconstruction quality shown in Figure 4.3. The 91 100 200 300 400 500 0 100 200 300 400 500 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 MST (a) Minimum Spanning Tree 50 100 150 200 250 300 350 400 450 500 550 0 100 200 300 400 500 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 RD Opt. Tree (b) RD Optimal Tree Figure 4.2: Comparison of MST with RD optimal tree. increase in quality is not so significant in this case, but it may be more significant as the network density grows. The fact that adding a merge improves performance is consistent with our previous discussion. These results suggest that having more merges at a given node will generally provide better performance. This is reasonable if the data considered is spatially stationary and is highly correlated across space. As discussed before, adding more neighborstoapredictnodewilltendtoproducesmallerresidualenergy. Conversely, anupdatecoefficient (lowpass)willprovideasmoothapproximationtotheoriginal data and so adding more residues (i.e. predicts) can increase smoothness which would also reduce the number of bits. All things considered, an MST can provide a good approximation to the optimal RD tree but it is clearly not optimal, mainly because it does not have many merges and because the merges it does have may not occur in the right places in the tree. Finding trees that can eliminate the shortcomings of MSTs is an interesting area for future work. 92 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 20 25 30 35 40 45 SNR vs. Rate Rate SNR Raw Data 2D SPT 2D MST 2D Opt. Rate Figure 4.3: Performance Comparison of MST and RD Optimal Tree 4.6 Experimental Results To evaluate the performance of the proposed optimization algorithm we use the sample 40 node network shown in Figure 4.4(a). We use the same set of sample data, the same cost models and the same entropy coding techniques presented in Section 3.6. We find a jointly optimized transform and routing, with the transform fixedastheHaar-likeseparablewaveletswithandalsowithoutbroadcasts. Wecom- parethejointlyoptimizedtransformsandtreesagainstanSPTusingT-DPCM,the 5/3-like separable wavelets and the Haar-like separable wavelets (with and without broadcasts). The jointly optimized network topology is shown in Figure 4.4(a) and the performance curves are shown in Figure 4.4(b). As we can see, the heuristic optimized tree with Haar-like transform is exactly the same as the full optimized tree, and gives a 2 dB improvement in SNR over the best SPT Haar-like transform (with broadcasts). Thus, in this case we see that the heuristic optimization algorithm presented in Section 4.3 provides essentially 93 0 200 400 600 0 200 400 600 SPT w/ Broadcasts 0 200 400 600 0 200 400 600 MST 0 200 400 600 0 200 400 600 Heuristic Jointly Optimal Tree 0 200 400 600 0 200 400 600 Jointly Optimal Tree (a) Sample Network 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.05 5 10 15 20 25 30 35 40 45 50 55 Total Energy Consumption (Joules) SNR (dB) SNR vs. Energy Consumption Haar−like Opt. Tree Haar−like Heuristic Opt. Tree Haar−like w/ Broad. Haar−like 5/3−like T−DPCM Raw Data (b) Cost-Distortion Curves Figure 4.4: Jointly optimized network with corresponding Cost-Distortion curves. In (a), blue lines denote forwarding links, dashed magenta lines denote broadcast links, green circles represent even nodes, red x’s represent odd nodes, and the black center node is the sink. the same performance as the full optimization algorithm in Section 4.2.1. We make similar observations for other networks (which have a few thousand feasible trees) in that the heuristic and full optimized trees are almost exactly the same 1 . This provides a clear example where the total cost can be reduced using joint transform and routing optimization. Naturally, the overall gains that can be achieved will vary from network to network. Wealsocomparethisjointroutingandtransformoptimizationalgorithmagainst the graph-based even/odd split optimization described in Section 2.2.2 [37]. The corresponding graph-based even/odd split and the cost-distortion curves are shown in Figure 4.5. In this case, the graph-based even/odd split and the routing tree optimized using the greedy heuristic give very similar performance, while the opti- mized routing tree using the full search algorithm still gives the best results. We 1 Of course it is computational infeasible to always compare the full optimization method with our proposed heuristic since the number of possible trees is generally very large. 94 make similar observations for other networks. As such, the joint optimization of routing and transform should typically provide performance that is competitive with the graph-based even/odd splitting optimization. 0 100 200 300 400 500 600 0 100 200 300 400 500 600 Transform Structure on Graph (a) Sample Network 0.01 0.015 0.02 0.025 0.03 0.035 10 15 20 25 30 35 40 45 50 55 Total Energy Consumption (Joules) SNR (dB) SNR vs. Energy Consumption Graph−based Split Haar−like Opt. Tree Haar−like Heuristic Opt. Tree Haar−like w/ Broad. Haar−like (b) Cost-Distortion Curves Figure 4.5: Comparison of optimized graph-based splitting and optimized routing. In (a), blue lines denote forwarding links, dashed magenta lines denote broadcast links, green circles represent even nodes, red x’s represent odd nodes, and the black center node is the sink. In order to get a better sense of what will happen on average, we also consider thecostreductionforlossless codingaveragedover multiplerandomnetworks. This is shown in Figure 4.6 for the high correlation data. With routing optimization, we see an average reduction in total cost of around 4% with respect to the Haar- like transformonan SPT. This suggests that routing optimizationmay not provide significantperformanceimprovementsonaverage. Thisconclusionisalsoconsistent withpreviouswork[79], whichshowed thatroutingwithcompression alonganSPT is nearly optimal for varying degrees of inter-node data correlation. Our results are simply a reflection of that. 95 0 100 200 300 400 500 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 Average Percent Cost Reduction No. of Sensors (C r −C t )/C r Haar−like Opt. Tree Haar−like Wav. w/ Broad. Haar−like Wav. 5/3−like Wav. T−DPCM Figure 4.6: Cost reduction ratios with routing optimization. 96 Chapter 5 Graph-based Transforms for Image Coding The main focus of this chapter is on developing graph-based transforms for effi- cient image representations. Efficient representations are typically achieved in one of two ways. The first common method is to apply a separable transform to an image in order to de-correlate data across neighboring pixels. The second method is to divide the image into blocks, to predict the pixel values in each block using pixel values from neighboring blocks, then to compute prediction residuals. Both of these techniques have been adopted in standards such as JPEG, JPEG-2000 and H.264/AVC, andtendprovide very efficient representations forsmoothimageswith simple discontinuities (oredges) such ashorizontal orvertical edges. However, they do not provide efficient representations for image regions with more complex edge structure. In particular, they tend to produce many large magnitude coefficients in regions with more complex edges, and these require many bits to be encoded. Moreover, quantization of these large magnitude coefficients lead to annoying arti- facts near edges in the reconstructed images, e.g., the well-known ringing artifacts. Note that filtering across edges in the image is the main source of inefficiency in both of these schemes. Thus, the main goal of this chapter is to develop graph- based transforms that avoid filtering across edges. This should reduce the number 97 of large magnitude coefficients, hence reducing the total bit rate needed to repre- sent the transform coefficients. This should also better preserve the edge structure in the reconstructed images. To this end, we make two contributions, summarized as follows. 5.1 Overview The first part of this chapter discusses a separable, edge-adaptive, tree-based lift- ing transform. Note that in all of the existing image and video coding standards, a standard separable transform (i.e., a transform that consists of row-wise filter- ing followed by column-wise filtering, or vice versa) is used to de-correlate data across neighboring image pixels. These separable transforms can efficiently repre- sentsmoothimagesregionswithonlyhorizontalorverticaledges,i.e.,thereareonly a few large magnitude transform coefficients. This is mainly due to the shape of the corresponding basis functions, i.e., only a few of the basis functions are smooth or are separated into smooth regions separated by only horizontal or vertical edges. However, forregionswithmorecomplex edgestructure, they tendtoproducemany large magnitude transform coefficients. In order to achieve an efficient representa- tion of regions with more complex edges, we need to adapt the transforms to the edge structure. In the first part of this chapter (Section 5.3, which summarizes the work proposed by the author in [53,56]) we focus on developing edge-adaptive tree- based lifting transforms, i.e., tree-based transforms that avoid filtering across edges as to avoid creating large magnitude high-pass coefficients. This leads to higher overall coding efficiency (i.e., fewer bits to achieve a fixed reconstruction quality) than the standard separable wavelet transforms used in JPEG-2000 [67]. 98 Another way of efficiently representing images is by dividing the image into blocks, predicting the pixel values in each block using pixel values from neighbor- ing blocks, then computing and transmitting prediction residuals. These types of schemes are referred to as intra prediction schemes and are a part of standards such as H.264/AVC and MPEG-4. In these standard schemes, predictions are al- ways computed in a single direction and in order to account for directional edge information in each block (e.g., diagonal edges), a fixed set of prediction directions is used and the “best” one is chosen. These directional predictions will be effective for encoding image blocks where there is only a single diagonal edge, since then a very accurate prediction can be produced and the prediction residuals will be very small. This provides an efficient representation of each block since small pre- diction residuals will require many fewer bits for encoding than the original pixel values. However, in regions with more complex edge structure such as “L” shaped or “V” shaped edges, these directional prediction modes will produce large predic- tion residuals at pixels near the edges, and these large residuals will require many bits to be encoded. Note that complex edge structure poses the same problems for the fixed directional predictions as it does for the standard separable transforms (since many large magnitude coefficients are produced), and we can address it in a similar manner by not computing predictions across edges. As such, in the second portion of this chapter (Section 5.4, which describes the work presented by the au- thor in [51]) we propose an edge-adaptive intra prediction scheme that can produce accurate predictions for blocks with more complex (non-diagonal) edge structure. As an additional application, we also consider using the edge-adaptive trans- form and intra prediction scheme for coding depth map images used in multi-view video coding systems. Recent advances in multi-view video have generated interest in new applications such as 3D-TV [59], bringing a plethora of 3D video services 99 closer to reality. However, the amount of data captured in such systems is typi- cally very large, making it difficult to store and transmit. We can decrease this data by (i) reducing the number of views, and (ii) using data taken from actual cameras at known positions to interpolate intermediate views. Note that view in- terpolation techniques such as depth-image-based rendering [76] require accurate position information in 3D space for objects in the scene, i.e., they require depth map information in addition to 2D pixel positions. These depth maps are an extra source of data and should therefore be compressed. However, compression artifacts indepth maps(especially aroundedges) canleadtoannoying visual artifactsin the interpolated views [26,27,29]. Thus, these edge-adaptive schemes can also be used toefficiently compress depthmapswhile also preserving theedgeinformation. Also note that the depth maps are actually never displayed and are only used for inter- polation, so it is also important to consider the trade-off between depth map rate and interpolation distortion. Therefore, we optimize this trade-off by leveraging results from [26,27]. Wefirstobservethatdepthmapstypicallyconsistofnearlyconstantregionssep- arated by edges, i.e., depth maps are nearly piece-wise constant signals. Moreover, preserving edges in depth maps often yields view interpolations that are percep- tually better. Thus, using transforms and/or prediction schemes that exploit this piece-wise constant assumption while also preserving edges should lead to better interpolation results. As is shown in work by us [50], using edge-adaptive lift- ing transforms leads to more efficient depth map coding while also improving the quality of the synthesized views. Similar improvements were also observed for the edge-adaptive intra-prediction scheme proposed by us in [51]. The performance im- provements areevaluatedexperimentally inSection5.3fortheedge-adaptivelifting transforms and in Section 5.4 for the edge-adaptive intra prediction scheme. 100 5.2 Preliminaries We first establish some preliminary concepts andnotations that arecommon to the edge-adaptive tree-based lifting transform and the edge-adaptive intra prediction scheme. Let X denote the original image and denote each pixel by the pair (i,j), with its intensity value given by value X(i,j). Let N r and N c denote the number of rows and columns of X, respectively. Note that both of these methods rely on a graph-based signal representation. Therefore, let G = (V,E) denote an undirected graph, where the setV represents the vertices (i.e., the pixels) in the graph and the setE denotestheset ofconnectionsbetween pixels. Also leteach[(i,j),(i ′ ,j ′ )]∈E denote a connection from pixel (i,j) to (i ′ ,j ′ ) in V. Note that the edge-adaptive lifting transform and the edge-adaptive intra pre- diction scheme both avoid filtering across edges, thus, they both require knowledge of edge locations. We compute edge locations using the edge detector in work done by the author in [50], which is just a modification of the one used in [32]. This technique defines edges to be at fractional locations between pixels. In particular, edges around pixel (i,j) exist at (i,j±0.5), (i±0.5,j) and (i±0.5,j±0.5), and an edge exists whenever|X(i,j)−X(i,j±1)|>T,|X(i,j)−X(i±1,j)|>T and |X(i,j)−X(i±1,j±1)| > T, respectively, for some threshold T. This leads to a binary edge map B which has 2N r rows and 2N c columns and specifies an edge between pixel (i,j) and(i,j+1) if andonly ifB(2i,2j+1) = 1. Otherwise, thereis no edge between them andB(2i,2j+1) = 0. Note that this is equivalent to stating that an edge exists at location (i,j +0.5), just that we index it using integers by multiplying by 2, e.g., 2· (i,j + 0.5) = (2i,2j + 1). Similarly, B(2i +1,2j) = 1 implies that there is an edge between pixel (i,j) and (i+1,j),B(2i+1,2j+1) = 1 implies that there is an edge between pixel (i,j) and (i+1,j +1), etcetera. Note 101 thatB(2i,2j) = 0 for alli,j and ifB(2i±1,2j) = 0, then there is no edge between pixel (i,j) and (i±1,j). In a similar way, B(2i,2j±1) = 0 implies that there is no edge between pixel (i,j) and (i,j±1), and B(2i±1,2j±1) = 0 implies that there is no edge between pixel (i,j) and (i± 1,j± 1). Since our transforms are constructed using this edge information, the edge map must be encoded and sent to the decoder so that it can re-produce the same transform used at the encoder. 5.3 Tree-based Lifting Transforms When using standard separable wavelet transforms in images with complex con- tours, significant amounts of energy may be present in all high pass sub-bands, mostly concentrated in large coefficients near contours. Encoding these coefficients is costly in terms of rate. Reduction of such high pass sub-band energy can be achieved by applying separable filtering along directions on which pixels exhibit low intensity variation. Directionlets [71] and Bandelets [43] achieve this using (i) a block-based segmentation defining dominant directional flow of pixel intensity in each block and (ii) separable filtering along a grid aligned with each directional flow. In Directionlets integer lattices are used, while in Bandelets the original grid is re-mapped onto a grid aligned with directional flows, onwhich separable filtering is applied. In both of these methods a set of “good” paths for traversing the pixels is defined (i.e., one that exhibits low variation in pixel intensity) and then filtering is applied along those paths. Both of these approaches are constrained in that they assume that a set of parallel paths can approximate well the geometric flow of an image in a given region, e.g., a block. Moreover, both of these transforms provide superior performance to standard separable wavelets mainly because they do not 102 filter across edges. Thus, one can still achieve competitive coding performance to these transforms by performing edge-avoiding filtering as is done in, e.g., the shape-adaptiveDWT[30]andtheworkin[11],whereedge-avoidingfilteringisdone only along rows and columns. The main contribution in this section is to propose techniques that avoid filtering across edges while also loosening the constraint of parallel paths by allowing non-parallel paths for filtering (unlike [11,30,33,71]) and by not restricting flows to be uniform within blocks (unlike [33,71]). This leads us to a set of edge-adaptive tree-based wavelet transforms, where the main goal is to design trees which do not cross over discontinuities in an image, and then to perform filtering along these trees. This serves to reduce the amount of energy in the high-pass subbands, and ultimately leads to greater coding efficiency. Thesetypesoftreeswereinitiallydevelopedbytheauthorin[53]wheremultiple minimum spanning trees (MSTs) with different orientations were used to provide a separable transform. This approach has two major drawbacks. First, lifting-based prediction and update filters lead to an invertible representation but, given the structures of the trees (e.g., the merge points), the corresponding basis functions tend to be far from orthogonal (unlike similar lifting structures for 1D regularly sampled signals). This results in signal energy being spread evenly across sub- bands which reduces energy compaction. Second, the MST traversal does not guarantee that downsampled pixel locations will be regularly spaced in successive levels of decomposition. Thus filtering is often applied across pixels that are far apart. Ourwork in[56]improved uponthisby(i)designing orthogonalizingupdate filters (as detailed in Section 2.4.2), and by (ii) designing trees that provide regular downsampling patterns. In this section we describe the work in [56]. Section 5.3.1 describes the tree- based transform construction. We then discuss tree constructions that produce 103 regular downsampling patterns while avoiding crossing over discontinuities in Sec- tion 5.3.1.2. Some experimental results are shown in Section 5.3.2 and some con- cluding remarks are provided in Section 5.5. 5.3.1 Tree-based Transform Design Wenowdescribehowtree-basedwaveletssuchasthoseproposedinSection3.3.4can be applied to images. Assume a tree T has been selected that allows us to traverse thepixels intheimagefromleavestothetreeroot. Supposeweindextherootnode of T by (MN +1,MN +1). LetC m,n and ρ m,n denote the set of children and the parent of pixel (m,n) in T, respectively, and letN m,n =C m,n ∪{ρ m,n }. Finally, let depth(m,n) be the depth of pixel (m,n) in T, with depth(MN +1,MN +1) = 0. A lifting transform can then be developed along T as discussed in Section 2.2. Let O and E denote the sets of odd and even pixels and let p m,n and u k,l denote the prediction and update operators at pixels (m,n) ∈ O and (k,l) ∈ E, respectively. For the sake of simplicity, we only allow pixels to use data from their one-hop neighbors, i.e., for all odd pixels (m,n) we require p m,n (m,n) = 1 and p m,n (i,j) = 0 for all (i,j) / ∈C m,n ∪{ρ m,n }. Similarly, for eachu m,n . The transform is computed as follows. For each (k,l)∈P: d(k,l) =X(k,l)− X (it,jt)∈N k,l p k,l (i t ,j t )X(i t ,j t ) (5.1) and given every d(k,l(, for each (m,n)∈U we have: s(m,n) =X(m,n)+ X (it,jt)∈Nm,n u m,n (i t ,j t )d(i t ,j t ). (5.2) 104 5.3.1.1 Lifting Filter Design Note that images are locally smooth in that around each pixel (m,n), the values of neighboring pixels tend to be similar to X(m,n). Thus, whenever an odd pixel (m,n)hasatleast3neighborsinthetreetheplanar-basedpredictionfilterdiscussed inSection 2.3.1canbeused toprovide a goodprediction ofits data. If anoddpixel (m,n) has at most 2 neighbors, then clearly a planar based prediction cannot be generated, so for these pixels we use a simple average. More specifically, whenever |N m,n | > 3, p m,n (N m,n ) = − [m n 1]· A t m,n A m,n −1 ·A t m,n t , where N m,n = {(i 1 ,j 1 ),(i 2 ,j 2 ),...,(i |Nm,n| ,j |Nm,n| )} and A m,n = i 1 j 1 1 . . . . . . . . . i |Nm,n| j |Nm,n| 1 . Whenever |N m,n | = 1 or |N m,n | = 2, p m,n (N m,n ) =− 1 |Nm,n| . For the update filters, we choose the orthogonalizing update filter design proposed in Section 2.4.2. 5.3.1.2 Tree Construction In this thesis, we compute a set of discontinuities D is defined using edges in an image. We use the edge detector and the corresponding graph described in Sec- tion 5.2. A tree is then constructed this graph (which has no links which cross over edges). In early work by us [53], MSTs were used to avoid filtering across disconti- nuities. However, when applying multiple levels of decomposition by subsampling the MSTs based on parity of depth, there is no guarantee that pixels will be regu- larly spaced at each level. This causes the sampling grid at each level to be highly irregular, i.e., the distance (in the image) between pixels that are neighbors in the 105 tree can vary significantly. As a result, it is possible that a group of pixels that are neighbors in the tree are located far from each other in the image. This loss of “spatial locality” in the filtering operations does not occur when using separable 2D transform (for which the resulting 2D filters in a given subband have the same spatial localization at all positions). This was addressed in later work by the author [56] by constructing (horizontal and vertical) trees that avoid filtering across major discontinuities while preserving regularsampling gridsover multiple levels ofdecomposition. Thistreeconstruction process is now described for one level of decomposition. The binary edge map is encodedusing,e.g.,JBIG[24]andissentassideinformationsothatthedecodercan construct the same trees. Sample trees are shown in Figure 5.1 with discontinuities shown by red dots. The horizontal tree is constructed starting from pixels in the right-most column of the image to pixels in the left-most column. Assume N is even and that pixels in the right-most column are even. We want pixels in an even (odd) column to be even (odd) as to preserve column-wise parity. To do so, we just need to specify a parent for each pixel such that the link between itself and its parent does not cross over discontinuities and does not induce cycles. Since construction progresses from right to left, a natural set of parental candidates for pixel (m,n) is a set of pixels to its left given by {(k,n−1)|m l ≤ k≤ m u } for m l ≤ m≤ m u . Let Cand m,n denote this set of candidates. To avoid filtering across discontinuities, we must eliminate any candidate (k,n−1) such that a discontinuity point exists between (m,n) and (k,n−1). So ifL k,n−1 m,n is the set of points on the line segment between (m,n) and (k,n−1), we have Cand m,n ={(k,n−1)|m l ≤k≤m u ,L k,n−1 m,n ∩D =∅.} (5.3) 106 If Cand m,n 6= ∅, the parent of (m,n) is chosen as the pixel in Cand m,n closest to (m,n). Figure 5.1 (b) shows an example of this horizontal tree when m l = m−2 and m u =m+2. If Cand m,n =∅ (no valid parental candidates), we do the following. If n is odd, the parent of (m,n) is the sink. In the example of Figure 5.1 (a), all of the pixels in the first column have no valid parental candidates, thus, their parent is simply the sink. If n is even, then setting the parent of (m,n) as the sink will not preserve the proper parity. Instead, we can set the parent of (m,n) as (m,n+1), in which case the parent of (m,n+1) must be set as the sink to avoid creating a cycle. For the same example, pixel (4,4) is even with Cand 4,4 =∅ so its parent is (4,5) and the parent of (4,5) is the sink. Each vertical tree is constructed in a similar manner. Note how the horizontal andvertical treesinFigure5.1(b), (c) and(d) avoidfiltering across discontinuities. 5.3.1.3 Separable Tree-based Transforms Note that we can compute the transform over multiple trees, with different orien- tations, with a different tree used at each level of decomposition. We now outline a general framework for computing general 2D transforms along multiple trees in a separable manner, with the main goal of exploiting directionality in images. A simple example of such separable trees is shown in Figure 5.1. Suppose we are given an arbitrary method for constructing trees that follow the geometric flow in an image given a prespecified root node (or more generally, set of root nodes). For a one level decomposition (using a tree in one direction followed by 2 trees in an- other), we would first apply 1 level of decomposition along one tree T 1 (as shown in Fig 5.1(b)) oriented in one direction, then split the set of coefficients in T 1 into even (low pass) and odd (high pass) subsets (according to their depths in T 1 ) and 107 (a) Toy Image with Edges 0 2 4 6 8 0 2 4 6 8 (b) Horizontal Tree 0 2 4 6 8 0 2 4 6 8 (c) Vertical Tree on Odds 0 2 4 6 8 0 2 4 6 8 (d) Vertical Tree on Evens Figure5.1: Exampletoillustratetreeconstruction, wherelinks inthetree(denoted by blue lines between pixels) are not allowed to cross edges in the image (denoted by red dots) then run a one level transform along a second tree T 1,l (as shown in Fig 5.1(c)) and third tree T 1,h (as shown in Fig 5.1(d)) over the even and odd subsets respectively. 5.3.2 Experimental Results We compare the performance of our proposed tree-based lifting transform against thestandard9/7separabletransform[67],thestandard5/3separabletransform[67] andsecondgenerationbandelets[44] 1 . Allofthetransformcoefficientsareencoded usingSPIHT[49] 2 . Forthetree-basedtransforms,wealsocomparemeanpreserving update filters (Section 2.4.1) against orthogonalizing update filters (Section 2.4.2). 1 http://www.cmap.polytechnique.fr/peyre/bandelets/ 2 http://www.cipr.rpi.edu/research/SPIHT/ 108 The edge maps are generated using the edge detector described in [32] and are encoded using JBIG [24]. Five levels of decomposition are used in all cases. We evaluatetherelative performanceonthestandardpeppersimageandalso using the ground truth depth map taken from the Middlebury data set 3 . The peppers image andits corresponding edgemap areshown inFigure5.2(a) and5.2(b), respectively. The Tsukuba depth map and its edge map are shown in Figure 5.3(a) and 5.3(b), respectively. (a) Peppers image (b) Edge map Figure 5.2: The Peppers image (a) and its corresponding edge map (b) Coding performance is shown in Figure 5.4 and 5.5 for the peppers and tsukuba images, respectively. The edge threshold used here is T = 30. Despite the addi- tional bits for edge information, the tree-based transforms still gives performance far superior to the standard transforms, with up to 2 dB increase in PSNR for the peppers image and 7 dB increase for the tsukuba image. Most of this gain comes from not filtering across edges. Second generation bandelets [44] only provides up to 0.2 dB improvement over the standard transforms. Note that that this scheme 3 http://vision.middlebury.edu/stereo/ 109 (a) Tsukuba depth map (b) Edge map Figure 5.3: The Tsukuba depth map (a) and its corresponding edge map (b) does processing in the wavelet domain by searching for the “best” mapping of a square block onto a 1D line, then applies an orthogonal 1D wavelet transform on the corresponding 1D signal. However, as was observed in their work, most of the resulting 1D signals still have many sharp transitions (e.g., edges). Thus, it is still likely to have some large wavelet coefficients, though the number of total number of large wavelet coefficients is reduced. On the other hand, our transform operates in the spatial domain by avoiding filtering across known (i.e., detected) edge loca- tions. Thus, for these types of piece-wise smooth images with very little texture, our transforms tend to produce fewer large wavelet coefficients than second gen- eration bandelets, hence the better performance. However, for images with more directional texture information, bandelets is likely to do better than our proposed transforms. We will examine this shortly by using test images that contain a large amount of texture. The orthogonalizing update filters provide an additional 0.1 dB to 1.5 dB im- provement in PSNR over non orthogonalizing update filters, and more gain is seen at lower bit rates. This is not surprising since the orthogonalizing update filters 110 from Section 2.4.2 showed that this choice of update filters minimizes the recon- struction MSE due to quantization. Thus, we obtain performance improvements by (i) not filtering across edges, (ii) using merges and (iii) using orthogonalizing update filter designs. It is worth noting that using our proposed transform on trees with merges does better than on trees with no merges, but only provides another 0.05 dB gain on average. Thus, using merges may not provide significant improve- ments in general. The reconstructed depth maps at 0.25 bpp are also shown in Figure 5.6. Clearly the reconstruction using our tree-based transform looks much better and has many fewer ringing artifacts. 0 0.2 0.4 0.6 0.8 1 18 20 22 24 26 28 30 32 34 36 38 bpp PSNR PSNR vs. bpp Tree−based Bandelets Standard Figure5.4: Rate-distortioncurve forvarioustransformsusingpeppersimage. Tree- based transforms give the best performance, and orthogonalizing update filters provide additional gain over mean-preserving update filters. 111 0 0.2 0.4 0.6 0.8 1 25 30 35 40 45 50 55 bpp PSNR PSNR vs. bpp Tree−based (Orth. Update) Tree−based (Non−orth. Update) Standard Figure 5.5: Rate-distortion curve for various transforms using depth map image. Tree-basedtransformsgivethebestperformance, andorthogonalizingupdatefilters provide additional gain over mean-preserving update filters. For the piece-wise smoothimages shown inFigures 5.2(a) and 5.3(a), our trans- form outperforms the standard wavelet filters and 2nd generation bandelets. How- ever, second generation bandelets will probably outperform our proposed method for images with many textured regions, particularly since edge detection will not be very reliable for textured regions. We examine this using the standard Lena and Barbara images shown in Figures 5.7(a) and 5.7(b), respectively. The corre- sponding RD curves are shown in Figures 5.8(a) and 5.8(b), respectively. For the Lena image, 2nd generation bandelets does slightly better at lower bitrates and is the same at higher rates. On the other hand, our edge-adaptive lifting transform is worse at lower rates but better at higher rates. The main reason it is worse at lower ratesis because of thehighedge mapbit rate(around0.018bpp), thus, much 112 (a) Tree-based transform (b) Standard 9/7 Figure 5.6: Subjective performance comparison at 0.25 bpp. Our proposed method has a PSNR of 42.65 dB whereas the standard 9/7 transform has PSNR of 35.83 dB. This difference is clearly reflected in the reconstructed images of the bitrate is consumed in coding the edge information. For the Barbara image, our edge-adaptive transform performs about the same as the standard transform and bandelets at lower bitrates, but (just as with the Lena image) does better at higher bitrates. Themaincause forworseperformanceat lower bitratesisthesame as with the Lena image. Therefore, our edge-adaptive lifting transform should con- sistently outperform existing methods for piece-wise smooth images with very little texture, but not necessarily for images with textured regions at lower bitrates. 113 (a) Lena image (b) Barbara image Figure 5.7: The Lena image (a) and Barbara image (b) 0 0.2 0.4 0.6 0.8 1 24 26 28 30 32 34 36 38 40 bpp PSNR PSNR vs. bpp Tree−based Bandelets Standard (a) Lena RD Curve 0 0.2 0.4 0.6 0.8 1 20 25 30 35 bpp PSNR PSNR vs. bpp Tree−based Bandelets Standard (b) Barbara RD Curve Figure 5.8: RD curves for the Lena image (a) and Barbara image (b) 114 5.4 Edge-Adaptive Intra Prediction We now introduce our edge-adaptive intra prediction scheme originally proposed in [51]. This follows the same spirit as the wavelet transforms proposed in Sec- tion 5.3 in that filtering across edges is avoided. However, the algorithms in this section provide a set of edge-avoiding graphs, whereas in Section 5.3 a set of edge- avoiding trees is provided. Moreover, only edge-avoiding prediction is considered. Edge-adaptive wavelet transforms have been proposed in Section 5.3 as well as in [32,50]. However, note that these transforms are not easily amenable to block based processing as used in H.264. Platelets [75] have been proposed for efficient representation of piece-wise planar images, and these ideas were extended to depth map encoding in [35]. The coding results of [35] are quite good with respect to JPEG-2000. However, the edges are approximated by lines, hence, they become “smoothed out”. This may lead to worse interpolation results. Moreover, only the parameters for the planar approximation aresent without any residue, so there will always be some fixed approximation error. In this section we seek to develop a block-based, edge-aware intra prediction scheme that can preserve the edge infor- mation in images while also being easy to integrate with H.264. TheintrapredictionmodesinH.264providegoodpredictionsforblocksconsist- ing of flat regions or regions with only horizontal, vertical or diagonal edges, e.g., Figure 5.9(a) and 5.9(b). However, those modes do not provide good predictions for blocks with arbitrary edge shapes, e.g., Figure 5.9(c). Figure 5.10 shows the predicted pixels (pixels a-p) and predictor pixels (pixels A-M) used in H.264 4×4 intra prediction. In blocks such as Figure 5.9(c) there will often be edges between pixels and their predictors. This leads to large prediction residuals which require more bits to be represented. Moreover, when used to encode depth map images 115 used for view synthesis in a multi-view video coding system, quantization of these large residuals produces ringing artifacts around the depth map edges, and these tend to produce annoying visual artifacts in the synthesized views. To address this inefficiency, we develop an edge-aware intra prediction scheme that provides accurate predictions for blocks with arbitrary edge shapes. Our proposed scheme provides an exact representation (up to quantization errors, unlike [35]) and can be easily integrated with H.264 coding tools (unlike [32,50]). No such method has been developed yet to the best of our knowledge. (a) Horizontal (b) Diagonal (c) Arbitrary Figure 5.9: Examples of blocks with different edge structure. Blocks such as those in (a) and (b) can be efficiently represented by existing intra prediction schemes. Blocks such as those in (c) are not efficiently represented. e f g h i j k l m n o p I J K L a b c d M A B C D E F G H Figure 5.10: Predicted pixels (a-p) and predictor pixels (A-M) used in H.264. 116 We separate the design of edge-aware intra prediction schemes into three parts: (i) detection of edge locations, (ii) identification of “valid predictors” which do not have an edge between themselves and a given pixel, and (iii) prediction of the intensity of a pixel using the intensities of its “valid predictors”. Edge locations are found using the technique described in Section 5.2. Valid predictors for each predicted pixel are then identified by finding paths (in a graph) to predictor pixels that do not cross edges. A method for choosing a prediction among the valid predictors is also proposed in the case of depth map images, though more general prediction choices could be made for other types of images. In fact, the primary focus of this section is on efficient, edge-preserving depth map coding for use in multi-view video coding systems, though the ideas presented here can be easily extended to more general types of images. Since depth maps typically consist of nearly flat regions separated by edges, we assume that depth maps are piece-wise constant signals. In this case, the best prediction for a given pixel is just the intensity value of any of its valid predictors. In order to optimize the trade-off between depth map bitrate and interpolation distortion, we leverage results from previous work [26,27]. In particular, we use the distortion metric proposed in [27] in the rate-distortion (RD) optimized mode selection. Thisyieldsanadditionalimprovementontopofwhatthenewedge-aware intra prediction scheme provides. This section is organized as follows. Section 5.4.1 describes our proposed edge- adaptive intra prediction scheme. The optimization between depth map bitrate and interpolation distortion is described in Section 5.4.2. Section 5.4.3 shows ex- perimental results demonstrating thegains of our proposed methods. In particular, we are able to reduce the bit rates for coded depth maps by up to 37% for a fixed interpolated PSNR. Finally, some concluding remarks are made in Section 5.5. 117 5.4.1 Edge-adaptive Intra Prediction We describe our scheme only in the case of 4×4 intra prediction, but it can be easily extended to other block sizes. We separate the design of an edge-aware intra prediction scheme into three parts. First we must find edge locations using an edge detector as described in Section 5.4.1.1. Once we compute the edge locations, we must then determine the set of “valid predictors” for each pixel in a given block. This is done using a graph-based representation of the pixels in a block, as described in Section 5.4.1.2. Finally, for every pixel in a block, we must determine thepredictionvalueforeachvalidpredictor. ThisisalsodescribedinSection5.4.1.2 under a piece-wise constant signal model for depth maps. 5.4.1.1 Edge Detection We compute edge locations using the edge detector in Section 5.2, which is just a modification of the one used in [32]. This edge detector produces a binary edge map that must be encoded and sent to the decoder so that it can produce the same (edge-adaptive) predictions used at the encoder. However, for many test sequences we observe that the number of bits required to encode the edge map for the entire image is very large. Thus, it is necessary to control the amount of edge information that is generated. We describe a block-based coding scheme for the edge information in Section 5.4.1.3. 5.4.1.2 Predictor Selection In order to find accurate predictors for each pixel, we must first identify valid prediction neighbors. Then, given the set of valid predictors, we must compute prediction values. We identify the “valid predictors” of each pixel as follows. First, 118 for each block we form an undirected graph G = (V,E) which has as vertices the pixels a-p in the given block and the pixels A-M from previously coded blocks (see Figure 5.10). For each pixel (i,j), we restrict the connections to be only with its left, right, top and bottom neighbors, i.e., pixel (i,j) can only have connections with pixel (i,j−1), (i,j+1), (i−1,j) and (i+1,j). Then for pixel (i,j), we define a connection between (i,j) and (i,j− 1) if and only if there is no edge between them, i.e., [(i,j),(i,j−1)] ∈ E if and only if B(2i,2j−1) = 0. The connections between (i,j) and (i,j−1), (i+1,j) and (i−1,j) are defined in a similar manner. As an example, consider the image in Figure 5.11, where two constant regions are separated by an edge. For the example shown in Figure 5.11 the resulting graph used for searching for valid neighbors is shown in Figure 5.12. Note that pixel m only has connections to pixels n and L. It is not connected to pixel i since there is an edge between them. e f g h i j k l m n o p I J K L a b c d M A B C D E F G H Figure 5.11: Example of valid predictors. This section of the image consists of two flat regions separated by an edge shown by the thick solid line. In this case pixels A,B,...,K andM areallvalidpredictorsforpixelsa,b,...andi, but arenotvalid predictors for pixels h and j,k,...,p. On the other hand, pixel L is only a valid predictor for pixels h and j,k,...,p. 119 a b c h n m l k j i g f e d o p L K J H F E D C B A M G I Figure 5.12: Example of graph used to find valid predictors using the same sample from Figure 5.11. The thick dotted line with small black circles denotes the edges. Thin solid lines between pixels represent connections in the graph G. The thick solid line represents the boundary between the current block and previously coded blocks. We use this graph-based representation to emphasize the generality of our ap- proach, where the goal is to (i) find all the valid prediction neighbors of each pixel at different distances, then to (ii) compute a prediction based onthe values of these valid predictors. However, other techniques could be used that would be easier to implement in practice. Using this representation, we can systematically determine if an edge exists between a predicted pixel and a predictor pixel as follows. Note that there is no edge between two pixels if a path between them exists in G. This can be checked as follows. First let A(u,v) denote the adjacency matrix of graph G, where A(u,v) = 1 if [u,v] ∈ E, and A(u,v) = 0 if [u,v] / ∈ E. For a general graphG and adjacency matrixA, a path of length k exists between vertex u andv ifA k (u,v)> 0 [70]. Thus, for each pixel a-p, we can use this to determine if paths exist to pixels A-M. The set of valid neighbors for a given pixel is simply the set of pixels A-M which have a path to it. This entire process is described in detail in Algorithm 2, where for each pixel x, V x denotes the set of valid predictors of x. A 120 valid predictor for each pixel can then be chosen among those in the setV x . In the implementation of our algorithm, we set k max = 8. Algorithm 2 Find Valid Predictors 1: Perform edge detection as described in Section 5.2 2: Form the adjacency matrix A as described in Section 5.2 3: for each pixel x =a,b,...,p do 4: V x =∅ 5: for each pixel y =A,B,..., do 6: for k = 1,2,k max do 7: if A k (x,y)> 0 then 8: V x =V x ∪{y} 9: Break for loop 10: end if 11: end for 12: end for 13: end for Figure 5.12 provides an example of this. For instance, pixel p has a path of length 4 to pixel L, but it has no paths to pixels A-K nor to pixel M. Thus, pixel L is the only valid predictor for pixel p. Note that this technique can be used to identify the set of valid prediction neighbors for other block sizes (e.g., it can be used for edge-adaptive intra prediction in 16×16 blocks). If there exists a single pixel that does not have a path to any pixel A-M, then the edge-adaptive intra prediction scheme obviously can not be used. In such cases, only standard prediction modes are used. 121 Now that we have a way to determine the valid predictors of each pixel, all that remainsistodeterminewhichpredictorstouseandhowtocomputethepredictions. If a pixel has multiple valid prediction neighbors, then it is possible to predict the intensity at each pixelusing the values from multiple prediction neighbors. However, we note that depth maps typically consist of flat regions separated by edges, i.e., depth maps are nearly piece-wise constant signals. Thus, if pixel (i ′ ,j ′ ) is a valid predictor of pixel (i,j), then there is no edge between (i ′ ,j ′ ) and (i,j), hence, X(i,j) ≈ X(i ′ ,j ′ ) and the prediction error is almost zero. As such, one valid prediction neighbor is sufficient to predict each pixel. Thus, if a pixel has multiple valid predictors, we simply choose the one which is the least number of hops away in the graph G as its predictor. For example, in Figure 5.11 and 5.12, pixels c and g will use C as its predictor, pixel d will use pixel D, etc. In the event of a tie, we choose the lowest indexed pixel as the predictor, e.g., pixel a will be predicted by pixel A, pixel f will be predicted from pixel B. If a given depth map is not actually piece-wise constant (e.g., it is piece-wise planar or piece-wise smooth), we can compute more accurate predictions by using valuesfrommultiplevalidpredictionneighbors. Thiscanbedoneusingasimpleav- erage, a linear / planar approximation, or even by using the spectral representation of the graph G [19]. This remains a topic for future work. 5.4.1.3 Discussion Recall that edge-adaptive prediction is likely to be more useful in blocks with ar- bitrary edge shapes, e.g., Figure 5.9(c). Since it may only be useful for these types of blocks, it would be better to decide whether edge data should be sent on a block by block basis. This way we only need to encode edge data for such blocks and that choice can be encoded with a simple mode bit. In order to compute these 122 predictions for a given block, we also need the edge information from neighboring reconstructed blocks. For example, in order to find the valid predictors for pixels a-p in Figure 5.11, we also need to know the edge information from pixels A-M. However, note that the edge information from neighboring blocks can be derived from the decoded data. Furthermore, these reconstructed blocks will also be avail- able at the decoder. Thus, as long as the encoder (i) encodes edge data for such a block and (ii) derives edge data from neighboring blocks by using the reconstructed values, it will be possible for the decoder to regenerate the same edge information used at the encoder. More importantly, edge data only needs to be sent for blocks in which it helps. We can also add our proposed prediction scheme as another mode and use it with the RD-optimized mode selections in H.264. This allows us to further reduce edge information if the RD cost for such blocks (which includes the bits needed to represent the edges) is not the minimum cost among all intra prediction modes. Therefore, we only encode edge information for a block if (i) it has complex edge shapes and (ii) edge-adaptive prediction yields the minimum cost. The corresponding edge maps can be encoded using schemes such as binary arithmetic coding. 5.4.2 RD Optimization When a depth map is compressed using lossy coding, distortion occurs in the re- constructed depth map. However, depth map decoding errors affect video quality only indirectly, since depth maps are used for interpolation of intermediate views (and the depth maps themselves are not displayed). In previous work [27] it was shown that the depth map error causes translation error in the rendering process, and using this translation error the resulted distortion in the rendered view can be 123 Δx im Δy im 1 = 1 Z p (x im ,y im )+ΔZ p (x im ,y im ) − 1 Z p (x im ,y im ) A p ′R p ′{T p −T p ′} = ΔL p (x im ,y im ) 255 1 Z near − 1 Z far A p ′R p ′{T p −T p ′}. (5.4) estimated by comparing two video pixel values - one at the same location as the depth map value to be coded and the other translated due to depth map error, both in the video belongs to the same view as depth map. First, thetranslationerror canbefoundusing theintrinsic andextrinsic camera parameters along with nearest and farthest depth values in the scene as in (5.4), where Δx im and Δy im are translation errors (due to distortion in the decoded depth map) in the x and y direction, respectively, at image coordinate (x im ,y im ). Z denotes the depth value, ΔL is depth map distortion at that pixel, and p and p ′ indicateviewindices. ThecameraintrinsicparametersaredenotedA,andextrinsic parameters are the rotation matrix R and translation vector T. Z near and Z far indicates the nearest and farthest depth value in the scene, respectively. Note that these parameters can be calculated once for every depth map values in the scene so that they can be used to compute the distortion corresponding to each depth map value. Then, using this translation error, the distortion in the rendered view can be estimated by measuring the distance between pixel at position (x im ,y im ) and a pixel translated to (x im +Δx im ,y im +Δy im ), where bothpixels belong to thesame video frame (refer to [27] for details). WeapplythisnewdistortionmetrictotheRDoptimizedmodeselectionprocess todecidewhethertheproposedintrapredictionmodewillbeusedforeach4×4and 124 16×16 blocks. That is, when the RD cost is calculated, the estimated distortion at the rendered view is used with the bitrate to code the depth map. 5.4.3 Experimental Results An implementation of our proposed prediction method has been made based on H.264 / AVC (joint model reference software ver. 13.2) and the improvements that our techniques provide are demonstrated by testing with the ‘Ballet’ and ‘Break- dancers’ sequences provided by Microsoft Research [80]. We encode 15 video and depth map frames with intra only coding using QP = 24,28,32 and 36, where the proposed intra prediction applied only to depth map coding. The View Synthesis Reference Software (VSRS) 3.0 [66] is used to generate the rendered view between two coded views. We apply our proposed scheme only to 4×4 and 16×16 blocks which have bothhorizontal andvertical edges. Standardprediction modes areused for all other blocks. A binary edge map is generated for each depth map frame as described in Section 5.4.1.1. For each 4×4 (or 16×16) block which has horizon- tal and vertical edges, the corresponding block in the edge map is encoded with a binary arithmetic coder. No edge information is encoded for blocks which have no edges or only horizontal or vertical edges. In order to facilitate a quick im- plementation, we have simply replaced the horizontal up mode with our proposed prediction mode for 4× 4 blocks and replaced the planar prediction mode with our proposed prediction mode for 16×16 blocks. The results for the ‘Ballet’ and ‘Breakdancer’ sequences are shown in Figure 5.13. Our proposed intra prediction provides Bjontegaard Delta Bit Rate (BDBR) reduction of 26% and 8% for ‘Ballet’ and ‘Breakdancer’ sequences, respectively. The proposed intra prediction with new 125 distortion metric yields BDBR reduction of 29% and 16% for ‘Ballet’ and ‘Break- dancer’ sequences, respectively. The edge map bit rates for these sequences are shown in Table 5.1. Note how the edge coding scheme proposed in Section 5.4.1.3 decreases the amount of edge data as the QP value increases. This is reasonable since at lower (resp. higher) rates the edge map bit rate consumes most of (resp. less of) the total bit rate. Breakdancers 30.5 30.6 30.7 30.8 30.9 31 31.1 31.2 1000 2000 3000 4000 5000 Kbps PSNR Y H.264/AVC NewIntra NewIntra+NewRD Ballet 31.5 32 32.5 33 33.5 1000 2000 3000 4000 5000 6000 Kbps PSNR Y H.264/AVC NewIntra NewIntra+NewRD Figure 5.13: Comparison of the rate-distortion curves between the proposed meth- ods and H.264 AVC. x-axis: total bitrate to code two depth maps; y-axis: PSNR of luminance component between the rendered view and the ground truth. QP 24 28 32 36 Edges (Ballet) 483 481 469 383 Edges (Break) 403 351 177 17 Table 5.1: Edge map bit rates (in kbps). When using our new intra mode with inter frames in an IPPP structure, we see similar gains as shown in Figure 5.14. For this IPPP structure, we see lower bit rates than for all I frame structure (for fixed interpolation quality). Note that in H.264,intramodesarealsousedininter(P)frames. Thus, ournewintraprediction mode is also utilized to improve the coding efficiency for P frames. 126 Breakdancers 30.5 30.6 30.7 30.8 30.9 31 31.1 0 1000 2000 3000 4000 Kbps PSNR Y H.264/AVC NewIntra Ballet 31.5 32 32.5 33 33.5 0 1000 2000 3000 4000 5000 Kbps PSNR Y H.264/AVC NewIntra Figure 5.14: Comparison of the rate-distortion curves between the proposed meth- ods and H.264 AVC using IPPP structure. x-axis: total bitrate to code two depth maps; y-axis: PSNR of luminance component between the rendered view and the ground truth. 5.5 Conclusions A general class of separable edge-adaptive tree-based wavelet transforms has been provided that provides superior RD performance over existing methods. This is due in part to the orthogonalizing update filter design as well as by using trees that avoid filtering across discontinuities. It mainly provides improvements for piece- wise smooth images with little to no texture, but it is not always better for images with textured regions. Moreover, merges do provide some improvements, but the gains are not significant as compared with a tree without any path merges. Thus, using trees with merges may not be useful when performing separable, tree-based transforms along rows and columns. However, it may be useful for certain classes of images. This would be an interesting topic for future work. An edge-adaptive intra prediction scheme for depth map coding was also pro- posed, and has been integrated with H.264. The edge-adaptive scheme is also con- structed from an edge map, and a graph-based representation of the pixels based on the edge information. The edge information is encoded on a per block basis, and is directly accounted for in the optimized mode selection used in H.264. This 127 new scheme (without the new distortion metric) provides up to a 33% reduction in total bitrate for a fixed PSNR in the interpolated views. An existing distortion metric has also been employed that takes into account the quality of the interpo- lated views and when used with our proposed can provide up to 37% reduction in total bitrate. This demonstrates the efficacy of the proposed edge-adaptive intra prediction scheme. 128 Chapter 6 Conclusions A set of de-correlating tree-based and graph-based transforms have been proposed which can be applied to irregularly (and regularly) spaced data. In particular, a general set of lifting transforms on graphs and trees were proposed in Chapter 2. These transforms are general in that they can be constructed for any graph or any tree. More specifically, general purpose split designs were proposedandother alter- native methods were discussed in Section 2.2. Prediction filters were also proposed in Section 2.3 that provide a high degree of data de-correlation for certain classes of signals. Moreover, we proposed a novel update filter design in Section 2.4.2 that makes the LP and HP component orthogonal after each lifting step. InChapter 3, general unidirectional transformsandspecific transformconstruc- tionswerethenproposedforuseindistributedcompressionWSN.Thesetransforms are able to handle irregularly spaced node locations, and lead to energy-efficient, distributedimplementations sincetheycanbecomputedinaunidirectionalmanner along a given routing tree, i.e., they can be computed as data is routed toward the sink. Itisexactlybecauseofthisunidirectionalitythattheproposedtransformscan consistently outperform existing distributed transforms for WSN. Since the trans- forms are constructed along trees, they provide an easy way into joint optimization 129 of transform and routing. As such, we also proposed the joint optimization algo- rithms described in Chapter 4. Note that the lifting transforms we have proposed under this unidirectional framework are only bi-orthogonal. A method for con- structing orthogonal unidirectional transforms was provided in Section 3.3.2, but only one particular transform construction was given in Section 3.3.1. As a final application, edge-adaptive graph-based transforms were proposed for use in image coding. These transforms are edge-adaptive in that they avoid filteringacrossedges. Inparticular,edge-avoidingtree-basedliftingtransformswere constructedforimagesinSection5.3. Theseprovedeffectivetoolsforcodingcertain images, e.g., smooth images with very sharp discontinuities, since the number of large high frequency coefficients can be greatly reduced. This led to very efficient image representations which significantly outperform standard transforms. Edge- avoiding graphs were also constructed in Section 5.4 for use in edge-adaptive intra prediction for efficient depth map coding. In particular, we construct graphs that avoidcrossing edges, thencompute predictions alongthese graphs. Thisalso allows us to significantly reduce the number of large high frequency coefficients, thereby leading to very efficient representations of depth map images and, ultimately, to higher coding efficiency. Moreover, the compression artifacts around edges (e.g., ringingartifacts)arealsosignificantlyreduced,andthisleadstobetterinterpolation quality in rendered views. 6.1 Future Work Thereareafewinteresting directionsforfuturework. Oneistodesignmoregeneral types of orthogonal unidirectional transforms. The joint routing and transform op- timization algorithm could also be improved by (i) developing methods to perform 130 the optimization in a distributed manner and (ii) finding better high correlation trees to use in place of the MST for the joint optimization process. Currently the edge-avoiding tree-based lifting transforms for image coding can only efficiently represent piece-wise smooth images without significant texture. In particular, this methodis abletooutperformdirectional transformsfor imagesthat have very complex edge structure such as, e.g., depth map images. However, since edge detection typically fails in textured regions, this transform will not provide an efficient representation for textured image regions. In particular, existing direc- tional transforms can efficiently represent oriented textured regions such as stripes. Thus, combining the edge-avoiding lifting transforms with directional transforms could be one new way to achieve gains from both techniques. The edge-adaptive intra prediction scheme is capable of representing nearly piece-wise constant images. However, since only one predictor is used to predict each pixel, the resulting predictions will not be as accurate for images that have morevariationinpixelintensitysuchas,e.g.,piece-wiseplanarimages. Thisscheme can be improved by using more neighbors for computing predictions. Moreover, boththetree-based liftingtransformandtheedge-adaptiveintra prediction scheme both use edge information to perform edge-avoiding filtering, both could benefit from improved edge coding tools. 131 Appendix A Additional Proofs This appendix contains additional proofs for Chapter 2. A.1 Proof of Proposition 1 Lete i denotethei-thidentity vector, i.e.,e i (i) = 1ande i (j) = 0forallj 6=i. Note thatx(n) =e t n ·xand P m∈Nn p n (m)x(m) =p t n ·x, thus, we cancompute the detail coefficientasd(n) =x(n)−p t n ·x= (e n −p n ) t ·x. Thus, ifweletP = [p 1 p 2 ... p N ] t denote the N ×N prediction matrix and I be the N ×N identity matrix, we can compute detail coefficients as d = (I−P)·x. Note that d(m) =x(m) for m∈E. Furthermore, x(m) = e t m ·x and P n∈Nm u m (n)d(n) = u t m ·d = u t m ·(I−P)·x, so we can compute the smooth coefficient as s(m) = x(m)+ P n∈Nm u m (n)d(n) = (e t m +u t m ·(I−P))·x. Sincem∈E,row m (P) =p t m =0 t . Thus, (i)e t m ·(I−P) =e t m , (ii)e t m +u t m ·(I−P) = (e m +u m ) t ·(I−P), and (iii)s(m) = (e m +u m ) t ·(I−P)·x. Therefore, if we let U = [u 1 u 2 ... u N ] t be the N × N update matrix, the full transform matrix is T = (I+U)·(I−P) and the transform coefficient vector is y = (I+U)·(I−P)·x. 132 A.2 Proof of Proposition 2 We only prove that (I−P) −1 =I+P. That (I+U) −1 =I−U follows from similar arguments. Let N o =|O| and N e = |E|. Without loss of generality, suppose that O ={1,2,...,N o } andE ={N o +1,N o +2,...,N}. Under Definition 1, we have thatp n (O) = 0 for every n∈O and p m =0 for all m∈E. Thus, P = P(O,O) P(O,E) P(E,O) P(E,E) = 0 P(O,E) 0 0 . Thus, I±P = I No 0 0 I Ne ± 0 P(O,E) 0 0 = I No ±P(O,E) 0 I Ne , whereI No andI Ne denote the N o ×N o and N e ×N e identity matrices, respectively. Therefore, (I+P)·(I−P) = (I−P)·(I+P) =I. A.3 Proof of Proposition 4 We first prove Lemma 1, then use it to prove Proposition 4. 133 Lemma 1 (Orthogonality of the Dual Basis). Let K, L and N be non-negative integers such that K +L = N. Let A ={a 1 ,a 2 ,...,a K } be a set of linearly inde- pendent vectors in R N and letB ={b 1 ,b 2 ,...,b L } be a set of linearly independent vectors orthogonal to vectors in A, i.e., <a i ,b j >= 0 for all i and j. Define A = a t 1 a t 2 . . . a t K , B = b t 1 b t 2 . . . b t L andC = A B . Let C −1 = [˜ c 1 ˜ c 2 ... ˜ c N ]. Then CC t = AA t 0 0 t BB t , where 0 is the K×L zero matrix, and moreover, (C −1 ) t C −1 = (AA t ) −1 0 0 t (BB t ) −1 , Therefore, we also have a similar orthogonality relationship between the dual basis vectors, i.e., < ˜ c i ,˜ c j >= 0 for any i = 1,2,...,K and j =K +1,K +2,...,N. Proof. Since every vector inA is orthogonal to every vector inB, the vectors inA are linearly independent of the vectors inB. Therefore, C has N linearly indepen- dent columns and C −1 exists. Moreover, AB t =0 and BA t =0 t . Therefore, CC t = AA t 0 0 t BB t . 134 A t has linearly independent columns, so for anyx6=0,A t x6=0. Thus,x t AA t x> 0, which implies that AA t is positive definite. Similarly, BB t is positive definite. By basic matrix properties (C −1 ) t C −1 = (C t ) −1 C −1 = (CC t ) −1 , thus (C −1 ) t C −1 = (AA t ) −1 0 0 t (BB t ) −1 . This implies that < ˜ c i ,˜ c j >= 0 for any i≤K and j ≥K +1. The proof of Proposition 4 is as follows. Let N o = |O| and N e = |E|. The wavelet coefficient vector isw = (I+U)(I−P)x=Tx. SinceT is invertible, T −1 exists and so x = T −1 w = N X i=1 w(i) ˜ t i = N X i=1 <t i ,x> ˜ t i = X i∈E <t i ,x> ˜ t i + X j∈O <t j ,x> ˜ t j = x e +x o Without loss of generality, suppose thatO ={1,2,...,N o } andE ={N o +1,N o + 2,...,N}. Now define A = t t 1 t t 2 . . . t t No ,B = t t No+1 t t No+2 . . . t t N andT = A B . 135 Proposition 3 implies that AB t =0 and BA t =0 t and Lemma 1 implies that (T −1 ) t T −1 = (AA t ) −1 0 0 t (BB t ) −1 . Therefore, < ˜ t i , ˜ t j >= 0 for all i∈O and j ∈E. Moreover, <x e ,x o >= 0. 136 Bibliography [1] J. Acimovic, B. Beferull-Lozano, and R. Cristescu. Adaptive distributed al- gorithms for power-efficient data gathering in sensor networks. Intl. Conf. on Wireless Networks, Comm. and Mobile Computing, 2:946–951, June 2005. [2] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. A survey on sensor networks. IEEE Communication Magazine, 40(8):102–114, August 2002. [3] R. Baraniuk, A. Cohen, and R. Wagner. Approximation and compression of scattered data by meshless multiscale decompositions. Applied Computational Harmonic Analysis, 25(2):133–147, September 2008. [4] C. Chong and S. P. Kumar. Sensor networks: Evolution, opportunities, and challenges. Proceedings of the IEEE, 91(8):1247–1256, August 2003. [5] D. Chu, A. Deshpande, J. Hellerstein, and W. Hong. Approximate data col- lection in sensor networks using probabilistic models. In IEEE International Conference on Data Engineering (ICDE), pages 3–7. IEEE, 2006. [6] A. Ciancio. Distributed Wavelet Compression Algorithms for Wireless Sensor Networks. PhD thesis, University of Southern California, 2006. [7] A. Ciancio and A. Ortega. A flexible distributed wavelet compression algo- rithm for wireless sensor networks using lifting. In Proc. of ICASSP’04, 2004. [8] A. Ciancio and A. Ortega. Distributed wavelet compression algorithm for wireless multihop sensor networks based on lifting. In Proc. of ICASSP’05, 2005. [9] A. Ciancio, S. Pattem, A. Ortega, and B. Krishnamachari. Energy-efficient data representation and routing for wireless sensor networks based on a dis- tributed wavelet compression algorithm. In Proc. of IPSN’06, April 2006. [10] I. Cidon andM. Sidi. Distributed assignment algorithms formulti-hop packet- radio networks. IEEE Transactions on Computers, 38(10), October 1989. 137 [11] R.L. Claypoole, G.M. Davis, W. Sweldens, and R.G. Baraniuk. Nonlinear wavelet transforms for image coding via lifting. IEEE Transactions on Image Processing, 12(12):1449–1459, December 2003. [12] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2nd edition, 2001. [13] R. Cristescu, B. Beferull-Lozaon, and M. Vetterli. Networked Slepian-Wolf: Theory, algorithms, andscalinglaws. IEEE Transactions on Information The- ory, 51(12):4057–4073, December 2005. [14] D. Estrin, D. Ganesan, S. Ratnasamy, and H. Wang. Coping with irregu- lar spatio-temporal sampling in sensor networks. ACM SIGCOMM Comput. Commun. Rev., 34(1):125–130, January 2004. [15] M. Gastpar, P. Dragotti, and M. Vetterli. The distributed Karhunen-Lo` eve transform. IEEE Transactions on Information Theory, 52(12):5177–5196, De- cember 2006. [16] B. Girodand S. Han. Optimum update for motion-compensated lifting. IEEE Signal Processing Letters, 12(2):150–153, February 2005. [17] V.K. Goyal. Theoretical foundations of transform coding. IEEE Signal Pro- cessing Magazine, 18(5):9–21, September 2001. [18] R. Gummadi, X. Li, R. Govindan, C. Shahabi, and W. Hong. Energy-efficient data organization and query processing in sensor networks. SIGBED Review, 2(1), January 2005. [19] D. K. Hammond, P. Vandergheynst, and R. Gribonval. Wavelets on graphs via spectral graph theory. Technical Report arXiv:0912.3848, Dec 2009. [20] S. Haykin. Adaptive Filter Theory. Prentice Hall, 4th edition, 2004. [21] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan. Energy-efficient routing protocols for wireless microsensor networks. In Proc. of Hawaii Intl. Conf. on Sys. Sciences, January 2000. [22] A.K. Jain. Fundamentals of Digital Image Processing. Prentice Hall, 1989. [23] M. Jansen, G. Nason, and B. Silverman. Scattered data smoothing by empir- ical Bayesian shrinkage of second generation wavelet coefficients. In Wavelets: Applications in Signal and Image Processing IX, Proc. of SPIE, 2001. [24] JBIG. Progressive Bi-level Image Compression. In ISO/IEC ITU Recommen- dation T.82, 1993. 138 [25] D. Jungnickel. Graphs, Networks and Algorithms. Springer-Verlag Press, 2nd edition, 2004. [26] W.-S. Kim, A. Ortega, P. Lai, D. Tian, and C. Gomila. Depth map distortion analysis for view rendering and depth coding. In Proc. of ICIP’09, 2009. [27] W.-S. Kim, A. Ortega, P. Lai, D. Tian, and C. Gomila. Depth map coding with distortion estimation of rendered view. In Proc. of VIPC’10, 2010. [28] J. Kovacevic and M. Vetterli. Wavelets and Subband Coding. Prentice Hall, 1995. [29] P. Lai, A. Ortega, C. Dorea, P. Yin, and C. Gomila. Improving view rendering quality and coding efficiency by suppressing compression artifacts in depth- image coding. In Proc. of VCIP’09, 2009. [30] S. Li and W. Li. Shape-adaptive discrete wavelet transforms for arbitrarily shaped visual object coding. IEEE Transactions on Circuits and Systems for Video Technology, 10(5):725–743, August 2000. [31] H. Luo, Y.C. Tong, and G. Pottie. A two-stage DPCM scheme for wireless sensor networks. In Proc. of ICASSP’05, March 2005. [32] M. Maitre and M.N. Do. Joint encoding of the depth image based representa- tion using shape-adaptive wavelets. In Proc. of ICIP’08, 2008. [33] S. Mallat. A Wavelet Tour of Signal Processing. Elsevier, 2nd edition, 1999. [34] K.Mechitov, W.Kim,G.Agha,andT.Nagayama. High-frequencydistributed sensing for structure monitoring. In In Proc. First Intl. Workshop on Net- worked Sensing Systems (INSS), 2004. [35] Y. Morvan, P.H.N. de With, and D. Farin. Platelet-based coding of depth maps for the transmission of multiview images. volume 6055. SPIE, 2006. [36] S.K.NarangandA.Ortega. Liftingbasedwavelet transformsongraphs. InTo Appear in Proc. of Asia Pacific Signal and Information Processing Association (APSIPA), October 2009. [37] S.K. Narang, G. Shen, and A. Ortega. Unidirectional graph-based wavelet transforms for efficient data gathering in sensor networks. In Proc. of ICASSP’10, March 2010. [38] H. M. on Great Duck Island. Online data-set located at http://www.greatduckisland.net. 139 [39] A.V. Oppenheim and R.W. Schafer. Discrete-time Signal Processing. Prentice Hall, 3rd edition, 2009. [40] S. Pattem, B. Krishnamachari, and R. Govindan. The impact of spatial corre- lation on routing with compression in wireless sensor networks. ACM Trans- actions on Sensor Networks, 4(4):60–66, August 2008. [41] S. Pattem, G. Shen, Y. Chen, B. Krishnamachari, and A. Ortega. Senzip: An architecture for distributed en-route compression in wireless sensor networks. In Workshop on Sensor Networks for Earth and Space Science Applications (ESSA), April 2009. [42] W.B. Pennebaker and J.L. Mitchell. JPEG Still Image Data Compression Standard. Van Nostrand Reinhold, 1993. [43] E. Le Pennec and S. Mallat. Sparse geometric image representations with bandelets. IEEE Transactions on Image Processing, 14(4):423– 438, April 2005. [44] G. Peyre and S. Mallat. Surface compression with geometric bandelets. In SIGGRAPH ’05, 2005. [45] S.S. Pradhan, J. Kusuma, and K. Ramchandran. Distributed compression in a dense microsensor network. IEEE Signal Processing Magazine, pages 51–60, March 2002. [46] J.G.Proakis,E.M.Sozer,J.A.Rice,andM.Stojanovic. Shallowwateracoustic networks. IEEE Communications Magazine, 39(11):114–119, 2001. [47] K.R. Rao and P. Yip. Discrete Cosine Transform: Algorithms, Advantages, Applications. Academic, 1990. [48] P. Rickenbach and R. Wattenhofer. Gathering correlated data in sensor net- works. In Proceedings of the 2004 Joint Workshop on Foundations of Mobile Computing, October 2004. [49] A. Said and W.A. Pearlman. A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Transactions on Circuits and Systems for Video Technology, 6(3):243–250, June 1996. [50] A. Sanchez, G. Shen, andA. Ortega. Edge-preserving depth-mapcoding using graph-based wavelets. In Proc. of Asilomar’09, 2009. [51] G. Shen, W.-S. Kim, A. Ortega, J. Lee, and H.C. Wey. Edge-aware intra predictionfordepth-mapcoding. InToAppearin Proc. ofICIP’10,September 2010. 140 [52] G. Shen, S. Narang, and A. Ortega. Adaptive distributed transforms for irreg- ularly sampled wireless sensor networks. In Proc. of ICASSP’09, April 2009. [53] G. Shen and A. Ortega. Compact image representation using wavelet lifting along arbitrary trees. In Proc. of ICIP’08, October 2008. [54] G. Shen and A. Ortega. Joint routing and 2D transform optimization for irregular sensor network grids using wavelet lifting. In IPSN ’08, April 2008. [55] G. Shen and A. Ortega. Optimized distributed 2D transforms for irregularly sampled sensor network grids using wavelet lifting. In Proc. of ICASSP’08, April 2008. [56] G. Shen and A. Ortega. Tree-based wavelets for image coding: Orthogonal- ization and tree selection. In Proc. of PCS’09, May 2009. [57] G. Shen and A. Ortega. Transform-based distributed data gathering. IEEE Transactions on Signal Processing, 58(7):3802–3815, July 2010. [58] G. Shen, S. Pattem, and A. Ortega. Energy-efficient graph-based wavelets for distributed coding in wireless sensor networks. In Proc. of ICASSP’09, April 2009. [59] A. Smolic, K. Mueller, N. Stefanoski, J. Ostermann, A. Gotchev, G.B. Akar, G. Triantafyllidis, and A. Koz. Coding algorithms for 3DTV: A survey. Cir- cuits and Systems for Video Technology, IEEE Transactions on, 17(11):1606– 1621, Nov. 2007. [60] K. Sohrabi, J. Gao, V. Ailawadhi, and G. J. Pottie. Protocols for self- organization of a wireless sensor network. IEEE Personal Communications, 7(5), October 2000. [61] A. Sridharan and B. Krishnamachari. Max-min fair collision-free scheduling forwireless sensor networks. In Proc. of IEEE IPCCC Workshop on Multi-hop Wireless Networks, April 2004. [62] H.StarkandJ.W.Woods. ProbabilityandRandomProcesseswithApplications to Signal Processing. Prentice Hall, 3rd edition, 2002. [63] G. Strang. Linear Algebra and its Applications. Thomson Learning, 3rd edi- tion, 1988. [64] G. Strang and T. Nguyen. Wavelets and Filter Banks. Wellesley-Cambridge Press, 1997. 141 [65] W. Sweldens. The lifting scheme: A construction of second generation wavelets. Tech. report 1995:6, Industrial Mathematics Initiative, Department of Mathematics, University of South Carolina, 1995. [66] M. Tanimoto, T. Fujii, and K. Suzuki. View synthesis algorithm in view synthesis reference software2.0(VSRS2.0). ISO/IECJTC1/SC29/WG11, Feb. 2009. [67] D. Taubman and D. Marcellin. JPEG2000: Image Compression Fundamen- tals, Standards and Practice. Kluwer Academic Publishers, 1st edition, 2001. [68] TinyOS-2. Collection tree protocol. http://www.tinyos.net/tinyos-2.x/doc/. [69] D.TuloneandS.Madden. PAQ:Timeseriesforecastingforapproximatequery answering in sensor networks. In Proceedings of the European Conference in Wireless Sensor Networks (EWSN), pages 21–37. IEEE, February 2006. [70] G. Valiente. Algorithms on Trees and Graphs. Springer, 1st edition, 2002. [71] V. Velisavljevic, B. Beferull-Lozano, M. Vetterli, and P.L. Dragotti. Direc- tionlets: Anisotropic multidirectional representation with separable filtering. IEEE Transactions on Image Processing, 15(7):1916– 1933, July 2006. [72] R.Wagner, R. Baraniuk, S. Du, D.B.Johnson, andA. Cohen. Anarchitecture for distributed wavelet analysis and processing in sensor networks. In IPSN ’06, April 2006. [73] R.Wagner, H.Choi, R.Baraniuk, andV.Delouille. Distributedwavelet trans- form for irregular sensor network grids. In IEEE Stat. Sig. Proc. Workshop (SSP), July 2005. [74] A. Wang and A. Chandraksan. Energy-efficient DSPs for wireless sensor net- works. IEEE Signal Processing Magazine, 19(4):68–78, July 2002. [75] R. Willett and R. Nowak. Platelets: A multiscale approach for recovering edges and surfaces in photon-limited medical imaging. IEEE Transactions on Medical Imaging, 22(3):332–350, March 2003. [76] K. Yamamoto, M. Kitahara, H. Kimata, T. Yendo, T. Fujii, M. Tanimoto, S.Shimizu, K.Kamikura, andY. Yashima. Multiview video codingusing view interpolationandcolorcorrection. Circuits and Systems for Video Technology, IEEE Transactions on, 17(11):1436–1449, Nov. 2007. [77] W. Ye, J. Heidemann, and D. Estrin. An energy-efficient MAC protocol for wireless sensor networks. In INFOCOM ’02, 2002. 142 [78] B. Zeng and J. Fu. Directional discrete cosine transforms for image coding. In Proc. of ICME’06, 2006. [79] Y. Zhu, K. Sundaresan, and R. Sivakumar. Practical limits on achievable energy improvements and useable delay tolerance in correlation aware data gathering in wireless sensor networks. In IEEE SECON’05, September 2005. [80] L. Zitnick, S.B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski. High- quality video view interpolation using a layered representation. ACM Trans- actions on Graphics, 23(3), Aug. 2004. 143
Abstract (if available)
Abstract
There are many scenarios in which data can be organized onto a graph or tree. Data may also be similar across neighbors in the graph, e.g., data across neighboring sample points may be spatially correlated. It would therefore be useful to apply some form of transform across neighboring sample points in the graph to exploit this correlation in order to achieve more compact representations. To this end, we describe a general class of de-correlating lifting transforms that can be applied to any graph or tree, and propose a variety of transform optimizations. We mainly focus on the design of tree-based lifting transform designs. Extensions to graph-based lifting transforms are also discussed. As a first application, we develop distributed graph-based transforms for efficient data gathering in wireless sensor networks (WSNs), where the goal is to transmit data from every node in the network to a collection (or sink) node along a routing tree. In particular, we (i) propose a general class of unidirectional transforms that can be computed in a distributed manner as data is routed toward the sink, and (ii) provide conditions for their invertibility. Moreover, we show that any unidirectional lifting transform is invertible, and propose a variety of tree-based lifting transform designs. By using these transforms to de-correlate data in the network, the total communication cost for data gathering is significantly reduced. We also extend these tree-based lifting transforms to incorporate broadcast communication links. This leads to a set of graph-based lifting transforms for WSNs. In particular, nodes incorporate data received from their broadcast neighbors together with data received from their neighbors in the routing tree. By doing so, they are able to achieve more data de-correlation. By exploiting the additional broadcast communication links in this way, these graph-based lifting transforms reduce the total communication cost even further.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Distributed wavelet compression algorithms for wireless sensor networks
PDF
Application-driven compressed sensing
PDF
Multichannel data collection for throughput maximization in wireless sensor networks
PDF
Compression of signal on graphs with the application to image and video coding
PDF
Efficient transforms for graph signals with applications to video coding
PDF
Graph-based models and transforms for signal/data processing with applications to video coding
PDF
Critically sampled wavelet filterbanks on graphs
PDF
Efficient graph learning: theory and performance evaluation
PDF
On location support and one-hop data collection in wireless sensor networks
PDF
Aging analysis in large-scale wireless sensor networks
PDF
Joint routing and compression in sensor networks: from theory to practice
PDF
Transport layer rate control protocols for wireless sensor networks: from theory to practice
PDF
Robust routing and energy management in wireless sensor networks
PDF
Dynamic routing and rate control in stochastic network optimization: from theory to practice
PDF
Relative positioning, network formation, and routing in robotic wireless networks
PDF
Efficient and accurate in-network processing for monitoring applications in wireless sensor networks
PDF
Realistic modeling of wireless communication graphs for the design of efficient sensor network routing protocols
PDF
Efficient data collection in wireless sensor networks: modeling and algorithms
PDF
Robust and efficient geographic routing for wireless networks
PDF
Sampling theory for graph signals with applications to semi-supervised learning
Asset Metadata
Creator
Shen, Godwin
(author)
Core Title
Lifting transforms on graphs: theory and applications
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
08/06/2010
Defense Date
05/10/2010
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
data compression,filter design,OAI-PMH Harvest,wavelet transforms,wireless sensor networks
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Ortega, Antonio (
committee chair
), Dimakis, Alexandros G. (
committee member
), Govindan, Ramesh (
committee member
), Krishnamachari, Bhaskar (
committee member
), Kuo, C.-C. Jay (
committee member
)
Creator Email
godwinsh@gmail.com,godwinsh@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m3316
Unique identifier
UC1432166
Identifier
etd-Shen-3874 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-370041 (legacy record id),usctheses-m3316 (legacy record id)
Legacy Identifier
etd-Shen-3874.pdf
Dmrecord
370041
Document Type
Dissertation
Rights
Shen, Godwin
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu
Tags
data compression
filter design
wavelet transforms
wireless sensor networks