DISTRIBUTED SOURCE CODING FOR IMAGE AND VIDEO APPLICATIONS

by Ngai-Man Cheung

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING), Aug 2008. Copyright 2008 Ngai-Man Cheung

Dedication

To my wife Tao.

Acknowledgements

I would like to thank my advisor, Prof. Antonio Ortega, for his guidance and suggestions throughout the course of this work. I will always remember his enthusiasm in research discussions, and I am grateful that he was critical of my work. I would also like to thank Prof. C.-C. Jay Kuo and Prof. Aiichiro Nakano for serving on my dissertation committee, and Prof. Sanjit Mitra and Prof. Zhen Zhang for serving on my qualifying exam committee. It is indeed a great privilege to have their valuable comments and feedback on my work. I would like to thank my mentors Dr. Yiliang Bao, Dr. Dake He, and Dr. Ashish Jagmohan for making my internship experience at Nokia Research Center and IBM Research pleasant and productive. I would also like to thank Dr. Sam Dolinar, Dr. Aaron Kiely, Dr. Hua Xie, and Dr. Matthew Klimesh of NASA-JPL for helpful research discussions. Thanks also to all my group members and friends for their help, care, encouragement, and feedback. I would like to thank my parents for their unconditional love and support; it is my lasting regret that I have not been able to spend much time with you these years. I would like to thank my son Long; it has been a pleasure seeing him grow up. Finally, I would like to express my deepest gratitude to my wife Tao. There were many times when I felt tired and wanted to give up; without your support and care I could never have faced all the difficulties. I am indebted to you for sacrificing so much, and worrying so much, without ever tiring of me. I am happy we hung in there until the end, and life has been pleasant together with you.
Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract

Chapter 1: Introduction
1.1 Motivation
1.2 Distributed Source Coding
1.2.1 Slepian-Wolf Theorem
1.2.2 Constructive Coding Algorithm
1.2.2.1 Coset Example
1.2.2.2 Main Ideas and General Steps
1.2.2.3 Practical Algorithms Using Error Correcting Codes
1.2.3 Properties of Distributed Source Coding
1.2.3.1 Encoding Requiring Correlation Information Only
1.2.3.2 Systems Robust to Uncertainty on Predictor
1.3 Hyperspectral Imagery
1.3.1 Review of Hyperspectral Imagery Compression Approaches
1.3.1.1 Inter-band Prediction Approaches
1.3.1.2 3D Wavelet Approaches
1.4 Contributions of This Research

Chapter 2: Efficient Wavelet-based Predictive Slepian-Wolf Coding for Hyperspectral Imagery
2.1 Introduction
2.1.1 Motivation
2.1.2 Our Contributions and Related Works
2.2 System Overview
2.3 Codec Architecture
2.4 Estimating the Correlation and Encoder Complexity Comparison
2.4.1 Estimation of Predictor Coefficients and Correlation
2.4.1.1 Estimation of Predictor Coefficients
2.4.1.2 Estimation of Crossover Probability
2.4.2 Encoder Complexity Comparison
2.4.2.1 Comparison with Inter-band Prediction
2.4.2.2 Comparison with 3D Wavelet Approaches
2.5 Experimental Results of Baseline Codec
2.5.1 Rate-Distortion Comparison with 3D Wavelet Approaches
2.5.2 Rate-Distortion Comparison with 2D Wavelet Approaches
2.5.3 Preservation of Spectral Signature
2.6 Adaptive Coding Strategy
2.6.1 Refinement/Sign Bits Compression
2.6.2 Significance Maps Compression and Raw Bitplane Coding
2.6.3 Modeling
2.6.4 Adaptive Coding Scheme
2.7 Experimental Results of Adaptive Codec
2.8 Conclusions

Chapter 3: Flexible Video Decoding: A Distributed Source Coding Approach
3.1 Introduction
3.1.1 Motivation
3.1.2 Flexible Decoding: Challenges
3.1.3 Our Contributions and Related Work
3.2 Flexible Decoding Based on DSC: Intuition
3.3 Theoretical Performance
3.4 Proposed Algorithms
3.4.1 Motion Estimation and Macroblock Classification
3.4.2 Direct Coefficient Coding (DCC)
3.4.3 Significant Coefficient Coding (SCC)
3.4.4 Bit-plane Compression
3.4.5 Model and Conditional Probability Estimation
3.5 Application Scenarios
3.6 Experimental Results and Discussion
3.6.1 Viewpoint Switching in Multiview Video Coding
3.6.2 Forward/Backward Video Playback
3.7 Conclusions and Future Work

Chapter 4: Correlation Estimation for Distributed Source Coding Under Rate and Complexity Constraints Using Sampling-based Techniques
4.1 Introduction
4.1.1 Motivation
4.1.2 Correlation Information Models in Practical DSC Applications
4.1.3 Correlation Estimation in Distributed Source Coding
4.1.4 Our Contributions and Related Work
4.2 Single Binary Source: Rate Penalty Analysis
4.2.1 Problem Definition
4.2.2 Correlation Estimation
4.2.3 Rate Penalty Analysis
4.2.4 Experiment
4.3 Multiple Binary Sources: Rate Penalty Analysis and Sample Allocation
4.3.1 Problem Definition
4.3.2 Optimal Sample Allocation Strategy
4.3.3 Rate Penalty Analysis - Multiple Sources
4.3.4 Experiments
4.4 Continuous Input Source: Model-based Estimation
4.4.1 General Approach
4.4.2 Experiments
4.5 Model-based Estimation on Structured Bit-planes
4.5.1 Model-based Estimation for Refinement/Sign Bit-planes
4.5.2 Model-based Estimation for Wavelet-based Applications
4.5.2.1 Estimation with Adequate Samples
4.5.2.2 Estimation without Adequate Samples
4.6 Hyperspectral Image Compression Experiments
4.6.1 DSC-based Hyperspectral Image Compression
4.6.2 Sample Allocation Experiments
4.6.3 Model-based Estimation Experiments
4.7 Extensions to Other Correlation Models
4.7.1 Rate Penalty Model for Non-equiprobable Input
4.7.2 Model-based Estimation with Previously Decoded Bit-planes as Side Information
4.7.3 Model-based Estimation with Continuous Side Information in Joint Decoding
4.8 Conclusions

Chapter 5: Conclusions and Future Work

Appendix A: Derivation of Bounds on the Estimation Error
Appendix B: Derivation of Optimal Sample Allocation
Bibliography

List of Tables

3.1 Comparison of DSC-based low-complexity encoding and flexible decoding.
4.1 Kolmogorov-Smirnov (K-S) test statistics for ΔH: Mobile DC, QP=24. Numbers in parentheses indicate cases where the K-S statistic is larger than the 1% significance cutoff value, 0.1608, and therefore does not pass the K-S test. Those are cases where n·p ≥ 4 does not hold. Note that bit position 6 corresponds to a more significant bit-plane.

List of Figures

1.1 Compression using CLP. The encoder computes the difference between the input source X and the predictor Y and communicates the difference (prediction residue) to the decoder.
1.2 Slepian-Wolf theorem.
1.3 An example using cosets to illustrate the main ideas of constructive DSC algorithms.
1.4 Practical coding algorithms using error correcting codes.
1.5 Decoding error due to inaccurate correlation information.
1.6 Robustness property of DSC.
1.7 Hyperspectral image.
1.8 Mean square residuals after simple image alignment and subtraction.
2.1 The proposed parallel encoding system. Each processor compresses one spatial image at a time.
2.2 Block diagram of SW-SPIHT.
2.3 Bit-plane coding in SW-SPIHT.
2.4 Encoding using the proposed system.
2.5 Example of conditional entropy of different bit-planes.
2.6 MSE under a first-order linear predictor for a typical spectral band.
2.7 Example of estimation of crossover probability. Note that for all bit-planes it is possible to achieve an accurate estimate using a small percentage of the data in a bit-plane.
2.8 Encoding using inter-band prediction.
2.9 Rate-distortion curves of SW-SPIHT and predictive 3D-SPIHT - Cuprite.
2.10 Rate-distortion curves of SW-SPIHT and predictive 3D-SPIHT - Moffet Field.
2.11 Rate-distortion curves of SW-SPIHT and predictive 3D-SPIHT - Lunar Lake.
2.12 Inter-band SNR fluctuation under 3D-SPIHT.
2.13 Rate-distortion curves of SW-SPIHT and predictive 2D-SPIHT - Cuprite.
2.14 Rate-distortion curves of SW-SPIHT and predictive 2D-SPIHT - Moffet Field.
2.15 Rate-distortion curves of SW-SPIHT and predictive 2D-SPIHT - Lunar Lake.
2.16 Rate-distortion curves of SW-SPIHT and SPIHT (Site: Cuprite, View: SC01, Band: 40).
2.17 Rate-distortion curves of SW-SPIHT and SPIHT (Site: Cuprite, View: SC01, Band: 133).
2.18 Rate-distortion curves of SW-SPIHT and SPIHT (Site: Cuprite, View: SC01, Band: 190).
2.19 Classification performance (Site: Lunar Lake, View: SC02).
2.20 Bit extraction.
2.21 Coding configurations.
2.22 Probability that a refinement/sign bit is zero or one.
2.23 Events of refinement/sign bit crossover.
2.24 Bias in significance maps.
2.25 Raw bit-plane crossover.
2.26 Modeling results.
2.27 Adaptive coding scheme.
2.28 Coding performance: Cuprite. Correlation information is estimated by a model-based approach discussed in Chapter 4.
2.29 Coding performance: Lunar. Correlation information is estimated by a model-based approach discussed in Chapter 4.
3.1 Multiview video applications - viewpoint switching may require a compression scheme to support several different decoding orders: (a) users stay in the same view during playback; (b), (c) users switch between adjacent views during playback.
3.2 Forward and backward frame-by-frame playback.
3.3 Robust video transmission using multiple decoding paths.
3.4 Problem formulation for flexible decoding. Any one of the candidate predictors Y_0, Y_1, ..., Y_{N-1} may be present at the decoder, but the encoder does not know which one.
3.5 Compression of input source X: (a) CLP coding; (b) DSC from the virtual channel perspective; (c) DSC approach to the flexible decoding problem.
3.6 Source networks: (a) Source network of Slepian-Wolf [21,63]. Csiszar and Korner [20,21] suggest the achievable rate H(X|Y) for communicating X is universally attainable. (b) Source networks of flexible decoding with predictor uncertainty. The same universal codes can be used to communicate X in any of these networks at a rate H(X|Y_i).
3.7 Theoretical performance of intra coding, CLP, and DSC in a flexible decoding scenario: multiview video coding as in Figure 3.1. (a) Previously reconstructed frames of neighboring views are used as predictor candidates following the depicted order; (b) entropy of the quantized DCT coefficients (as an estimate of the encoding rate) vs. number of predictor candidates. The results are the average of 30 frames using Akko&Kayo, view 28. (c) Entropy of each frame in the case of three predictor candidates.
3.8 Theoretical performance of intra coding, CLP, and DSC in a flexible decoding scenario: robust video transmission as in Figure 3.3. (a) Past reconstructed frames are used as predictor candidates following the depicted order; (b) entropy of the quantized DCT coefficients (as an estimate of the encoding rate) vs. number of predictor candidates. The results are the average of 30 frames using Coastguard. (c) Entropy of each frame in the case of three predictor candidates.
3.9 Proposed encoding algorithm to encode a macroblock M of the current frame.
3.10 Encoding macroblock M using DSC.
3.11 Estimating the conditional probability p(b(l) | Y_i, b(l+1), b(l+2), ...).
3.12 Different coding structures in a multiview video application. Shaded video frames are those that need to be decoded to enable the depicted decoding path. (a) Simulcast; (b) new coding structure enabled by the proposed tools, where "S" denotes those video frames that can be decoded using any one of the predictor candidates. Note that in simulcast, the decoder needs to decode some extra video frames that users did not request, e.g., the first three frames in the v-th view in the depicted example. In client-server applications, the bitstream corresponding to these extra frames also needs to be sent to the client.
3.13 Simulation results of multiview video coding: Akko&Kayo. The results are the average of the first 30 frames (at the switching points) from 3 different views (views 27-29) arbitrarily chosen from all the available views.
3.14 Simulation results of multiview video coding: Ballroom. The results are the average of the first 30 frames (at the switching points) from 3 different views (views 3-5) arbitrarily chosen from all the available views.
3.15 Drifting experiment using Akko&Kayo view 28: (a) CLP; (b) DSC. GOP size is 30 frames. Note that with DSC, the PSNR is almost the same in the switching and non-switching cases.
3.16 Scaling experiment: in this experiment, the temporally and spatially adjacent reconstructed frames are used as predictor candidates following the depicted order.
3.17 Scaling experiment using sequence Akko&Kayo view 29. The PSNR of the different schemes is comparable - Intra: 35.07dB, CLP: 34.78dB, DSC: 34.79dB.
3.18 Simulation results of forward/backward video playback: Coastguard. Results are reported for the average of the first 30 frames. Note that H.263 inter-frame coding cannot support flexible decoding - the results are shown here for reference only.
3.19 Simulation results of forward/backward video playback: Stefan. Results are reported for the average of the first 30 frames. Note that H.263 inter-frame coding cannot support flexible decoding - the results are shown here for reference only.
3.20 Drifting experiment using the Stefan sequence: (a) CLP; (b) DSC. The figure shows the PSNR of the reconstructed frames in the first GOP. Note that with DSC, the PSNR is almost the same in the switching and non-switching cases.
4.1 (a) An example of Slepian-Wolf coding exploiting binary correlation. Boxes "Q" and "Q^-1" denote quantization and inverse quantization respectively. Boxes "B" and "B^-1" denote binarization and its inverse respectively. (b) An example of Slepian-Wolf coding exploiting continuous correlation.
4.2 Applying sampling to distributed video coding: when encoding the current frame, we randomly sample n macroblocks of the current frame to undergo motion estimation, with n much smaller than the total number of macroblocks. By doing so, we obtain n pairs of samples (X, Y), where X is the current macroblock and Y is the corresponding motion-compensated predictor. From these n sample pairs the encoding rate can be estimated. Note that here the sampling cost associated with each data sample is not primarily due to data exchange but to the computation in motion search.
4.3 Encoding rate function H(.). The same Δp_l would have a larger impact on the estimated rate if the true p_l is small. The optimal sample allocation takes this characteristic into account.
4.4 Reduction in rate penalty (bits) using the optimal sample allocation, with L = 4, K_l = 16384, n_T = 128 or 1024 (i.e., 0.20% or 1.56% of the total, respectively), N_E = 100, and p = 0.25.
4.5 (a) Crossover probability estimation for raw bit-planes. A_i are the events that lead to the occurrence of a crossover between X and Y at significance level l. (b) Refinement bit-plane crossover probability estimation: probability of crossover and X_i is already significant. (c) Sign bit-plane crossover probability estimation: probability of sign crossover and X_i becomes significant.
4.6 Comparing estimation accuracy between the direct approaches and the model-based approach.
4.7 Pr(U(l,x)) when (a) floor(|x| / 2^l) is odd; (b) floor(|x| / 2^l) is even.
4.8 The DSC-based hyperspectral image compression with direct correlation estimation.
4.9 DSC-based hyperspectral image compression with model-based estimation.
4.10 Sample allocation experiments - (a) Lunar (reflectance data), (b) Moffet (radiance data), using 0.25% total samples. An adaptive sample allocation scheme, using the proposed optimal sample allocation strategy in Section 4.3 with a priori information from the previously encoded band, is compared with even sample allocation. Here "Exact" denotes the cases when the exact correlation information is used to determine the coding rate, i.e., no rate penalty.
4.11 Coding efficiency comparison: (a) Cuprite (radiance data); (b) Lunar (reflectance data). The model-based system is compared with the original system, which uses all samples in direct estimation and exact crossover probabilities to determine the encoding rate, i.e., no rate penalty. Coding performance of several other wavelet systems is also presented for reference.
4.12 (a) Encoding of bit-plane b_X(l) with both the previously decoded bit-planes b_X(l+1), ..., b_X(l+m) and those of the correlated source b_Y(l), b_Y(l+1), ..., b_Y(l+m) as SI. (b) Joint p.d.f. between the input and all the SI.
4.13 Entropy/conditional entropy of the video bit-plane data used in the experiment in Section 4.2.4, i.e., X and Y are the quantized DCT coefficients in a current frame and the corresponding quantized coefficients in the motion-compensated predictors in the reference frame, respectively, using the 2nd AC coefficients of Mobile (720x576, 30 fps), at QP=12: (a) without SI, i.e., intra coding ("No SI"); (b) using only the corresponding bit-plane as SI, i.e., m = 0 ("SI: Corr. bit-plane"); (c), (d), (e): using the corresponding and one, two, or three previously decoded bit-planes as SI, i.e., m = 1, 2, or 3 respectively. The results suggest further improvements with m > 1 could be negligible.
4.14 Extension of model-based estimation: the events leading to the occurrence of <b_X(l+m) ... b_X(l+1) b_X(l)> = i and <b_Y(l+m) ... b_Y(l+1) b_Y(l)> = j correspond to the region A_{i,j} in the sample space of X and Y.
4.15 The events leading to the occurrence of <b_X(l+m) ... b_X(l+1) b_X(l)> = i correspond to regions A_i, subsets of X.

Abstract

Many video compression schemes (e.g., the recent H.264/AVC standard) and volumetric image coding algorithms are based on a closed-loop prediction (CLP) framework. While CLP-based schemes can achieve state-of-the-art coding efficiency, they are inadequate for some important emerging applications such as wireless video and multiview video, which have new requirements including low-complexity encoding, robustness to transmission errors, and flexible decoding, among others. In this research we investigate new video and image compression algorithms based on distributed source coding (DSC), and we demonstrate that the proposed algorithms can overcome some of the deficiencies of CLP-based systems while achieving competitive coding performance.

The first part of this thesis discusses our work exploring DSC principles for designing hyperspectral imagery compression algorithms, with an eye toward an efficient and parallel encoder implementation with modest memory requirements. Using DSC tools allows encoding to proceed in "open loop", and this facilitates parallel compression of spectral bands in multi-processor configurations. We demonstrate that our proposed DSC techniques can be adaptively combined with set partitioning of wavelet coefficients to exploit spatial and spectral correlation.
Our latest results show the proposed algorithm can achieve coding efficiency comparable to that of a simple 3D wavelet codec developed at NASA-JPL.

The second part of this thesis investigates DSC-based coding algorithms to address the flexible decoding problem in video applications. Here, the encoder needs to compress a current frame under uncertainty about the predictor available at the decoder. Flexible decoding is relevant in a number of applications including multiview video, frame-by-frame forward and backward video playback, and robust video transmission. The proposed algorithm incorporates novel macroblock mode switching and significance coding within the DSC framework. This, combined with a judicious exploitation of correlation statistics, allows us to outperform other competing solutions.

The third part of this thesis proposes solutions to the correlation estimation problem in DSC, an important subject for practical DSC applications. We formulate the rate-constrained correlation estimation problem in a DSC framework, and propose information exchange strategies that minimize the rate penalty due to inaccurate estimation. We also propose a novel model-based method for correlation estimation in the context of DSC. We demonstrate that the model-based estimation can achieve accurate estimates with minimal computational and data exchange requirements.

Chapter 1: Introduction

1.1 Motivation

In the past decade, both academia and industry have devoted substantial research and standardization efforts to the development of multimedia compression algorithms. Some well-known examples are the MPEG video coding standards developed by ISO/IEC and the H.26X video coding standards developed by the ITU. Digital video and image compression have become central technologies in a variety of applications, including consumer electronics (e.g., DVD, digital still cameras), the Internet (e.g., JPEG-compressed pictures, streaming video), distance learning, surveillance and security, and remote sensing.
Conventionally, video compression standards (e.g., the recent H.264/AVC standard) and many volumetric image coding algorithms are based on a closed-loop prediction (CLP) framework [80,82]. In CLP systems, the encoder computes the difference between the input source and a predictor available at both the encoder and decoder, and communicates this difference, or prediction residue, to the decoder (Figure 1.1). Compression schemes based on CLP have demonstrated state-of-the-art coding performance.

Figure 1.1: Compression using CLP. The encoder computes the difference Z = X - Y between the input source X and the predictor Y and communicates the difference (prediction residue) to the decoder, at rate R >= H(X|Y).

CLP-based algorithms, however, can be inadequate for several important emerging applications. For example, CLP schemes are vulnerable to transmission errors, and it is non-trivial to communicate CLP-compressed data over the lossy channels encountered in wireless video applications [66,75,87]. Moreover, CLP systems may lack the decoding flexibility required by some emerging applications such as multiview video [35,70]. Furthermore, CLP coding algorithms are inherently sequential, which makes it difficult to achieve parallel encoding of slices in volumetric image compression [51].

The purpose of this research is to investigate novel video and image compression algorithms that address the aforementioned issues in the conventional CLP-based compression framework. Specifically, we propose hyperspectral image and video compression algorithms based on distributed source coding (DSC). We demonstrate that the proposed DSC-based algorithms can overcome some of the deficiencies of conventional schemes while achieving competitive compression efficiency. One of the central problems in DSC is to estimate, during encoding, the correlation between the input and the predictor (side information) available only at the decoder [30,85].
Therefore, we also propose different correlation estimation strategies based on sampling techniques.

Hyperspectral imagery is usually highly correlated, in some cases within each spectral band, but in particular across neighboring frequency bands [38,51]. In Chapter 2 we propose to use DSC to exploit this correlation, with an eye toward a parallel encoding implementation with modest memory requirements [8,14,67]. We apply DSC principles to hyperspectral images by encoding individual spectral bands under the assumption that these bands are correlated. As will be discussed, using DSC tools allows the encoder to operate in "open loop" without requiring access to decoded versions of (spectrally) neighboring bands, and this facilitates parallel encoding of spectral bands on multi-processor architectures. We first compute the parameters of a linear predictor to estimate the current spectral band from a neighboring one, and estimate the correlation between these two bands (after prediction). Then a wavelet transform is applied and a bit-plane representation is used for the resulting wavelet coefficients. We observe that in typical hyperspectral images, bit-planes of the same frequency and significance located in neighboring spectral bands are correlated. We exploit this correlation by using low-density parity-check (LDPC) based Slepian-Wolf codes [43,46]. The code rates are chosen based on the estimated correlation. We demonstrate that set partitioning of wavelet coefficients, such as that introduced in the popular SPIHT algorithm, can be combined with our proposed DSC techniques so that coefficient significance information is sent independently for all spectral bands, while sign and refinement bits can be coded using adaptive combinations of DSC and zerotree coding. Our latest results suggest that our proposed algorithm can achieve coding efficiency comparable to that of a simple 3D wavelet codec developed at NASA-JPL.
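As a concrete illustration of the per-band correlation estimation described above, the sketch below fits a first-order linear predictor between two bands and measures the bit-plane crossover probabilities that would drive the Slepian-Wolf code rates. The synthetic data, the least-squares fit, and the raw (non-wavelet) bit-plane extraction are simplifying assumptions for illustration, not the thesis's actual wavelet-domain pipeline.

```python
import numpy as np

def estimate_predictor(prev_band, curr_band):
    """Fit a first-order linear predictor curr ~ a * prev + b by least
    squares (a simplified stand-in for the predictor-coefficient
    estimation step described in the text)."""
    x = prev_band.ravel().astype(float)
    y = curr_band.ravel().astype(float)
    a, b = np.polyfit(x, y, 1)
    return a, b

def crossover_probability(curr_band, predicted, level):
    """Fraction of positions where bit-plane `level` of the source and
    of its predictor disagree -- the binary 'crossover' probability
    that determines the Slepian-Wolf code rate for that bit-plane."""
    bx = (curr_band.astype(np.int64) >> level) & 1
    by = (np.round(predicted).astype(np.int64) >> level) & 1
    return float(np.mean(bx != by))

# Toy example: two synthetic correlated "spectral bands".
rng = np.random.default_rng(0)
prev = rng.integers(0, 4096, size=(64, 64))
curr = (0.9 * prev + 50 + rng.normal(0, 8, size=prev.shape)).astype(np.int64)

a, b = estimate_predictor(prev, curr)
pred = a * prev + b
probs = {}
for level in (0, 4, 8):
    probs[level] = crossover_probability(curr, pred, level)
    print(f"bit-plane {level}: crossover prob = {probs[level]:.3f}")
```

Higher (more significant) bit-planes show lower crossover probability, which is why they can be Slepian-Wolf coded at lower rates, mirroring the behavior the chapter exploits.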
In Chapter 3 we investigate compression techniques to support flexible video decoding [10-12,16]. In these, the encoder generates a single compressed bitstream that can be decoded in several different ways, so that users or decoders can choose among several available decoding paths. Flexible decoding has several advantages, including improved accessibility of the compressed data for emerging applications (e.g., multiview video) and enhanced robustness for video communication. Flexible decoding, however, makes it difficult for compression algorithms to exploit temporal redundancy: when the decoder can choose among different decoding paths, the encoder no longer knows deterministically which previously reconstructed frames will be available for decoding the current frame. Therefore, to support flexible decoding, the encoder needs to operate under uncertainty about the decoder's predictor status. We propose to address flexible decoding based on DSC. The main advantage of a DSC approach to flexible decoding is that the information communicated from the encoder to the decoder (namely, the parity bits) is independent of any specific predictor. By "decoupling" the compressed information from the predictor, we demonstrate, theoretically and experimentally, that DSC can lead to a solution that compares favorably in coding efficiency with one based on the conventional CLP approach, where multiple prediction residues are sent, one for each possible predictor available at the decoder. The main novelties of the proposed algorithm are that it incorporates different macroblock modes and significance coding within the DSC framework. This, combined with a judicious exploitation of correlation statistics, allows us to achieve competitive coding performance. Experimental results on multiview video coding and forward/backward video playback suggest the proposed DSC-based solution can outperform flexible decoding techniques based on CLP coding.
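The rate argument above can be made concrete with a toy calculation: a CLP encoder that must support N decoding paths ships one residue per candidate predictor, while a DSC encoder sizes a single parity stream for the worst-case correlation, decodable with whichever predictor turns out to be present. The conditional-entropy figures below are invented for illustration only.

```python
# Hypothetical per-predictor conditional entropies H(X|Y_i) in
# bits/symbol; illustrative numbers, not measurements from the thesis.
cond_entropy = {"Y0": 1.2, "Y1": 1.5, "Y2": 1.4}

# CLP must ship one prediction residue per possible predictor, since
# the encoder cannot know which decoding path the user will take.
clp_rate = sum(cond_entropy.values())

# DSC sends a single parity bitstream sized for the worst-case
# (largest) conditional entropy among the predictor candidates.
dsc_rate = max(cond_entropy.values())

print(f"CLP total: {clp_rate:.1f} bits/symbol")  # 4.1
print(f"DSC total: {dsc_rate:.1f} bits/symbol")  # 1.5
```

The gap widens as the number of candidate predictors grows, which is the essence of the theoretical comparison in Section 3.3.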
Central to practical DSC applications is the correlation information between the input and side information, which usually has to be estimated at the encoder in order to determine the encoding rate [30,85]. Coding efficiency depends strongly on the accuracy of this correlation estimation. While error in estimation is inevitable, the impact of estimation error on compression efficiency has not been sufficiently studied for the DSC problem. In Chapter 4, we study correlation estimation subject to rate and complexity constraints, and its impact on coding efficiency in a DSC framework for practical distributed image and video applications [9,15]. We focus, in particular, on applications where binary correlation models are exploited for Slepian-Wolf coding and sampling techniques are used to estimate the correlation, while extension to other correlation models will also be briefly discussed. In the first part of Chapter 4 we investigate the compression of binary data. We first propose a model to characterize the relationship between the number of samples used in estimation and the coding rate penalty, in the case of encoding of a single binary source. The model is then extended to scenarios where multiple binary sources are compressed, and based on the model we propose an algorithm to determine the number of samples allocated to different sources so that the overall rate penalty can be minimized, subject to a constraint on the total number of samples. The second part of Chapter 4 studies compression of continuous-valued data. We propose a model-based estimation for the particular but important situations where binary bit-planes are extracted from a continuous-valued input source, and each bit-plane is compressed using DSC. The proposed model-based method first estimates the source and correlation noise models using continuous-valued samples, and then uses the models to derive the bit-plane statistics analytically.
We also extend the model-based estimation to the cases where bit-planes are extracted based on the significance of the data, similar to those commonly used in wavelet-based applications. Experimental results demonstrate the effectiveness of the proposed algorithms.

The rest of this chapter gives a brief review of related topics and summarizes the contributions of this thesis.

1.2 Distributed Source Coding

DSC addresses the problem of compression of correlated sources that are not co-located. The information-theoretic foundations of DSC were laid out in the 1970s in the pioneering works of Slepian and Wolf [63], and Wyner and Ziv [84]. Driven by its potential for some emerging applications (sensor networks, wireless video, etc.), DSC has attracted much attention recently.

In this section we briefly review some information-theoretic results of DSC most relevant to our applications. We also discuss the basic ideas of constructive DSC algorithms, and properties of DSC essential to practical applications.

1.2.1 Slepian-Wolf Theorem

Here we illustrate a particular case of the Slepian-Wolf theorem which is most relevant to practical DSC applications, often referred to as (lossless) source coding with decoder side information. Consider the set-up in Figure 1.1, where we try to losslessly compress an i.i.d. random source X^n = {X_1, X_2, ..., X_n} with another correlated i.i.d. random source Y^n = {Y_1, Y_2, ..., Y_n} available at both the encoder and the decoder. That is, {X_i, Y_i}, i = 1, ..., n, are i.i.d. ~ p(x, y), and X and Y are discrete random variables. In this case, we can use CLP (e.g., DPCM) to compress X^n, i.e., we use Y^n to predict X^n and then encode the prediction residue. The theoretical lower bound on the lossless encoding rate is H(X|Y).

In distributed source coding, instead, we consider the situation where Y^n is available only at the decoder (Figure 1.2).

Figure 1.2: Slepian-Wolf theorem: the encoder compresses X at rate R ≥ H(X|Y), with Y available only at the decoder.
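To make the bound concrete, the following sketch computes H(X|Y) for a toy joint distribution of our own choosing (not from the thesis): X uniform on {0, ..., 255} and Y = (X + U) mod 256 with U uniform on {−4, ..., 3}, so that, given Y, X is restricted to 8 equally likely values and H(X|Y) = 3 bits.

```python
import math

# Toy joint distribution (our assumption, chosen for illustration):
# X uniform on {0..255}; Y = (X + U) mod 256 with U uniform on {-4..3},
# i.e. Y - X is (circularly) uniform in [-4, 4).
N, offsets = 256, range(-4, 4)
p_xy = {}
for x in range(N):
    for u in offsets:
        p_xy[(x, (x + u) % N)] = 1.0 / (N * len(offsets))

def entropy(p):
    return -sum(v * math.log2(v) for v in p.values())

# Marginal of Y, then use the identity H(X|Y) = H(X,Y) - H(Y).
p_y = {}
for (x, y), v in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + v

h_x_given_y = entropy(p_xy) - entropy(p_y)
print(h_x_given_y)  # 3.0 bits: the lower bound on the lossless rate for X
```

Intra coding of X alone would need H(X) = 8 bits per symbol; the Slepian-Wolf limit of 3 bits is what both the CLP scheme and the DSC scheme discussed next aim for.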
This situation appears to pose a more difficult coding problem than that of Figure 1.1. However, Slepian and Wolf [63] have shown that, theoretically, we can achieve the same lower bound as in the previous case, i.e., H(X|Y), even though the actual realization of Y^n that will be available at the decoder is not known during encoding. The Slepian-Wolf theorem thus suggests that efficient encoding is indeed possible even when the encoder does not have precise knowledge of the side information Y^n available at the decoder.

1.2.2 Constructive Coding Algorithm

The Slepian-Wolf theorem states that, theoretically, the best achievable rates are the same with or without Y^n at the encoder. However, in practice, how can we compress X^n when Y^n is available only at the decoder, and achieve the same performance as when Y^n is available at both the encoder and the decoder? Most constructive algorithms for Slepian-Wolf coding are based on channel coding [1,27,43,56,57,83]. We will first illustrate the idea
In this particular example, we partition the space into 8 cosets: fA;B;C;:::;G;Hg,andeachcosetincludesseveralreconstructionlevelsseparated by the same distance. As will be discussed, the number of cosets is determined by the correlation between X and Y. To communicate X, we transmit the label of the coset which X belongs to (C in Figure 1.3), which would require 3 bits (since there are 8 cosets). At the decoder, we receive the coset label and observe the side-information Y. The coset label suggests X can be any one of the members in the coset. To disambiguate this information, we select the one coset member that is closest to Y and declare it as the reconstructed value for X. By doing so, one can check that we can always recover X 8 X=2 A B C D E F G H A B C D G H C C 0 1 2 … 255 … P(X,Y): 4 > Y-X ≥ -4 (i) X distributes uniformly in [0,255]. Intra coding requires 8 bits to encode X. X=2 … (ii) To encode X using DSC, we partition the input space into cosets: {A,B,C,…,G, H}. X=2 … (iii) Encoder conveys X by sending the label of the coset which X belongs to (i.e., C). … C C Y (iv) Decoder selects the member closest to Y and declares it as the reconstructed value for X. Encoder: Decoder: X ^ Figure 1.3: An example using coset to illustrate the main ideas of constructive DSC algorithms. exactlyaslongasX andY obeythecorrelationstructure. Therefore,wecancommunicate X using 3 bits, same as CLP, with Y available only at the decoder. 1.2.2.2 Main Ideas and General Steps As illustrated by the coset example, the main ideas of DSC are that the encoder sends ambiguous information to achieve rate savings, and the decoder would disambiguate the information using the side information. In the coset example, the ambiguous information is the coset label representing a group of reconstruction levels. 
As long as the members within a coset are sufficiently far apart, we are able to recover the original value by selecting the member that is identified by the coset label and is closest to the side information. The general coding steps of DSC can be summarized as follows:

1. Partition the input space into cosets;
2. Ensure that members within each coset are separated by some minimum distance;
3. The encoder sends the coset label to the decoder;
4. The decoder selects the coset member closest to the side information, and declares it as the reconstructed value.

1.2.2.3 Practical Algorithms Using Error Correcting Codes

Practical coding algorithms employing error correcting codes follow the same basic ideas and general steps discussed above [28,58,85]. For example, to encode an n-bit binary vector X with a linear (n, k) binary error correcting code defined by the parity-check matrix H, we compute S = XH^T, where the (n−k)-bit S is the syndrome. The syndrome serves the same function as the coset label in the coset example. From coding theory, of all the 2^n possible X, 2^k of them will have the same syndrome, so we can partition the input space of X into 2^(n−k) cosets according to the syndrome (Figure 1.4). Moreover, the minimum Hamming distance between members is the same for each coset.

To communicate X, we send the syndrome of the coset to which X belongs. At the decoder, we receive the syndrome and observe the side information Y. To disambiguate the information, we select the member of the coset closest to Y (Figure 1.4).

Figure 1.4: Practical coding algorithms using error correcting codes: (i) 2^(n−k) cosets in the space of n-bit vectors, each identified by a coset label (syndrome); (ii) the encoder sends the syndrome of the coset to which X belongs; (iii) the decoder selects the member closest to Y.

Here
we achieve compression by sending the (n−k)-bit S instead of the original n-bit source X, and the encoding rate is (n−k)/n, which should be no less than the Slepian-Wolf limit.

Note that in order to achieve performance close to the Slepian-Wolf limit, error-correcting-code based algorithms would need to use very large block lengths, e.g., n of the order of 10^5 [85]. These may not be suitable for practical image and video applications, where the correlation between symbols is non-stationary and may vary significantly between small blocks. Recent work has proposed to address distributed compression based on arithmetic coding, which may achieve good coding performance with much smaller block sizes, e.g., n of the order of 10^3 [29].

1.2.3 Properties of Distributed Source Coding

In this section we highlight some of the properties of DSC that are most useful for some emerging applications.

1.2.3.1 Encoding Requiring Correlation Information Only

The most notable property of DSC is that encoding requires, in addition to the input source itself, only the correlation information between the source and its side information. In particular, the exact realization of the side information is not needed during encoding, as illustrated by the coset example. This property makes DSC useful for application scenarios where encoders do not have access to the predictors. An example of this scenario is the compression of sensor data [55,85]. Consider a dense sensor field where individual sensors acquire and transmit information to a central node for processing. Since the information could be highly correlated, the sensors may opt to coordinate with their neighbors and remove the redundancy in the collected information before sending it to the central node, so that lower transmission rates can be achieved.
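The syndrome construction of Section 1.2.2.3 can be made concrete with a small (7, 4) Hamming code (a sketch; this particular parity-check matrix H is one standard choice, and the function names are ours). Here n = 7 and k = 4, so the 3-bit syndrome S = XH^T replaces the 7-bit X, a rate of (n−k)/n = 3/7, and decoding succeeds whenever X and the side information Y differ in at most one bit position:

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code; column j is the binary
# representation of j+1, with the least significant bit in row 0.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def sw_encode(x):
    # The 3-bit syndrome S = X H^T identifies the coset of the 7-bit X.
    return H @ x % 2

def sw_decode(s, y):
    # Syndrome of the "error" pattern X xor Y; if X and Y differ in one
    # position, d equals the corresponding column of H, which spells out
    # that position in binary.
    d = (s + H @ y) % 2
    pos = int(d[0] + 2 * d[1] + 4 * d[2])
    x_hat = y.copy()
    if pos:
        x_hat[pos - 1] ^= 1
    return x_hat

x = np.array([1, 0, 1, 1, 0, 0, 1])
y = x.copy()
y[4] ^= 1                                   # side information off by one bit
assert (sw_decode(sw_encode(x), y) == x).all()
print("recovered the 7-bit X from its 3-bit syndrome plus Y")
```

Finding the coset member closest to Y reduces, for a linear code, to correcting Y with the standard decoder of the code, which is why Slepian-Wolf coding inherits the machinery of channel coding.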
However, the conventional CLP approach would incur substantial local communication between neighboring sensors, since the exact value of a predictor (Y) available at one node needs to be communicated to a neighboring node in order to encode X. Using DSC, communication between sensors could be avoided if the correlation information is available at the sensors, leading to a significant reduction in energy consumption. In practice, correlation information may need to be estimated, and some aspects of this correlation estimation are studied in this research [9,15] (Chapter 4).

Another example of a scenario where predictors could be inaccessible during encoding is low complexity video encoding. Conventional video coding standards (e.g., MPEG, H.26x) follow the CLP approach: to encode a block in the current frame, the encoder uses motion estimation to find the optimal predictor block in the previously reconstructed frames. The prediction residue is then computed, compressed and communicated to the decoder. Motion estimation is a well-established technique to exploit the temporal correlation between neighboring frames and achieve high compression ratios, but it incurs substantial computational complexity, leading to asymmetric coding frameworks where encoders are more complex than decoders. This poses challenges to some emerging applications such as video sensor networks or mobile video, where encoders need to operate under power constraints and low complexity encoding would be more suitable. In contrast to CLP, DSC requires only correlation information during encoding, and if the encoder can acquire such information with minimal computation, low complexity encoding can be achieved [4,28,59].

It should be noted that the coding efficiency of DSC depends strongly on the accuracy of the correlation information, and decoding errors may occur if the correlation is not known accurately. We illustrate this with the coset example discussed in Section 1.2.2.1.
Recall that in this example the "correct" correlation between the source and the side information is 4 > Y − X ≥ −4. Suppose the encoder has obtained inaccurate correlation information about the input data, say, 2 > Y − X ≥ −2, i.e., the encoder thought X and Y were more correlated than they actually are. Based on this inaccurate correlation information, the encoder determines the Slepian-Wolf limit, H(X|Y), to be 2 bits, and uses four cosets to communicate X (Figure 1.5). Following the same coding steps, the decoder receives the coset label, and to disambiguate this coset information the decoder selects the member closest to Y and declares it as the reconstructed value of X. However, in this case, the reconstructed value could be different from the original one, as illustrated in Figure 1.5, and a decoding error could occur. Therefore, in DSC, if the encoder over-estimates the correlation a decoding error can occur. On the other hand, if the encoder under-estimates the correlation, we can still reconstruct X correctly at the decoder, but we suffer a penalty in terms of coding performance.

Figure 1.5: Decoding error due to inaccurate correlation information: (i) X is distributed uniformly in [0, 255], with true correlation 4 > Y − X ≥ −4; (ii) the encoder uses only four cosets {A, B, C, D} to convey X; (iii) the encoder sends the label of the coset to which X belongs (i.e., C); (iv) the decoder selects the member closest to Y and makes a decoding error.

1.2.3.2 Systems Robust to Uncertainty on Predictor

Another property of DSC is that DSC systems tend to be more robust to errors in the predictor, or to uncertainty about the predictor. As illustrated in the coset example, in DSC the compressed information (i.e., the coset label or syndrome) is computed based on the correlation information, instead of directly from the predictor as in CLP. Therefore, DSC is an approach that exploits the correlation without using the predictor directly in the encoding.
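Returning to the over-estimation scenario of Figure 1.5, it can be reproduced numerically with a self-contained toy sketch (function names are ours): the true correlation allows Y − X ∈ [−4, 4), but an encoder that wrongly assumes 2 > Y − X ≥ −2 uses only 4 cosets of stride 4, so a wrong coset member can lie closer to Y than X does.

```python
# Encoder over-estimates the correlation (assumes |Y - X| < 2) and
# therefore uses only 4 cosets of stride 4 (2 bits instead of 3).
def encode_overestimated(x):
    return x % 4

def decode_overestimated(label, y):
    # Decoder picks the coset member closest to the side information Y.
    return min(range(label, 256, 4), key=lambda m: abs(m - y))

x, y = 2, 5          # a valid pair under the TRUE model: Y - X = 3 is in [-4, 4)
x_hat = decode_overestimated(encode_overestimated(x), y)
print(x_hat)         # 6, not 2: a decoding error caused by over-estimated correlation
```

With stride 4, coset members are only 4 apart, so a side-information offset of 3 is enough to pull the decoder toward the wrong member; with the correct stride of 8 this cannot happen.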
By "de-coupling" the predictor from the encoding process and from the compressed information, DSC systems can be robust to errors in, or uncertainty about, the predictor. For example, it is possible to reconstruct the source exactly even when the side information is corrupted by errors. This can be illustrated by the coset example (Figure 1.6). Recall that at the decoder we select the member closest to the side information Y as the reconstructed value for X. Suppose Y is corrupted by noise N. If the noise power is less than some limit, we can still reconstruct X exactly, as illustrated in Figure 1.6. This property can be exploited in a DSC based video communication system, where X would be the current frame and Y would be the reference frame [60,62,78]. In this case, even if there is a transmission error in the reference frame, it is still possible to reconstruct the current frame exactly and prevent any error propagation. This is in contrast to CLP systems such as MPEG/H.26x video coding schemes, where errors would propagate until a video frame (or block) is intra-coded. Note that DSC exploits inter-symbol correlation (between X and Y) when encoding X, and thus can achieve better coding efficiency than intra-coding of X.

Figure 1.6: Robustness property of DSC: the decoder can still recover X from the noise-corrupted side information Y + N.

1.3 Hyperspectral Imagery

In Chapter 2 we propose DSC-based compression algorithms for hyperspectral imagery. We first briefly review the basics of hyperspectral image compression in this section. Hyperspectral images consist of hundreds of spatial images, each acquired at a particular frequency (spectral band) (Figure 1.7). Therefore, in hyperspectral data sets, the pixel values along the spectral direction depict the spectra of the captured objects, and this spectral information can be used in classification applications, e.g., identification of surface materials. The raw data size of hyperspectral images is non-trivial.
For example, the images captured by AVIRIS (Airborne Visible/Infrared Imaging Spectrometer, operated by NASA) include 224 spectral bands, and a single hyperspectral image can contain up to 140 Mbytes of raw data [40]. Therefore, efficient compression is necessary for practical hyperspectral imagery applications. In addition, hyperspectral images are usually captured by satellites or spacecraft that use embedded processors with limited resources, so encoding complexity is an important issue in hyperspectral image compression. Furthermore, due to the large amount of raw data, high speed encoding is desirable, and one approach to speed up encoding is to perform parallel compression on several processors. This will be one of the topics investigated in this research.

Figure 1.7: Hyperspectral image: relative brightness as a function of wavelength for each spatial location.

1.3.1 Review of Hyperspectral Imagery Compression Approaches

In a hyperspectral dataset many spectral bands are highly correlated. This is shown in Figure 1.8, where image mean-square residuals after simple alignment are shown for two different views of a site. Neighboring bands tend to be correlated, and the degree of correlation varies relatively slowly over a broad range of spectral regions. Thus, exploiting inter-band correlation using, for example, inter-band prediction followed by 2D compression [51] or 3D wavelet decompositions [69], has proven to be a popular approach to compress hyperspectral images.¹

1.3.1.1 Inter-band Prediction Approaches

In inter-band prediction approaches, a band is predicted using previously encoded bands and the resulting prediction residuals are encoded using standard image coding techniques (transformation followed by quantization and entropy coding).
Since, typically, the prediction residue has much lower energy than the original band, encoding the residue usually requires fewer bits than encoding the original band.

¹ As will be illustrated later, it is easy to modify an algorithm that exploits cross-band correlation so that it operates independently in each frame when correlation is low, as is the case in some spectral regions in Fig. 1.8.

Figure 1.8: Mean square residuals (in dB) versus spectral band index, after simple image alignment and subtraction, for the Cuprite SC01 and SC02 scenes.

Inter-band prediction approaches are analogous to the standard MPEG/H.26x compression algorithms for video, but motion estimation/compensation is not necessary, since co-located pixels in different spectral bands represent the same ground object (at different frequencies).

Inter-band prediction approaches can achieve high compression ratios with moderate memory requirements. However, there are several drawbacks. First of all, inter-band prediction methods need to generate exact copies of the decoded bands at the encoder, so encoders need to perform decoding as well, and decoding complexity could be significant, e.g., comparable to encoding complexity. [51] has proposed using only full bit-planes to form the predictors at the encoder and decoder. This avoids bit-plane decoding at the encoder. However, since this approach does not utilize fractional bit-plane information in reconstruction, the predictors in general have worse quality than those of conventional inter-band prediction methods, leading to a degradation in coding performance. Second, inter-band predictive methods are inherently serial, since each band is encoded based on a predictor obtained from previously decoded bands. Therefore, it is difficult to scale up the processing speed of an inter-band predictive encoder to handle the high data rate generated by hyperspectral imaging instruments.
Furthermore, it is difficult to achieve efficient rate scalability. This is because bit-rate scaling by on-the-fly truncation of the bit-stream during transmission may lead to a different reconstruction at the decoder, resulting in drift.

1.3.1.2 3D Wavelet Approaches

3D wavelet methods, including 3D-SPECK, 3D-SPIHT [69], and 3D-ICER [38,39] developed by the NASA Jet Propulsion Laboratory, provide alternatives to predictive techniques. 3D wavelet methods can also exploit inter-band correlation by performing filtering across spectral bands, with the expectation that most of the signal energy will be concentrated in low-pass subbands (corresponding to low spatial and "cross-band" frequencies).

As an example of a 3D wavelet approach, in 3D-ICER [38,39] a modified 3D Mallat decomposition is first applied to the image cube. Then mean values are subtracted from the spatial planes of the spatially low-pass sub-bands to account for the "systematic difference" between image bands [39]. Bit-plane coding is then applied to the transform coefficients, with each coefficient bit adaptively entropy-coded based on its estimated probability-of-zero statistics. The probability is estimated by classifying the current bit to be encoded into one of several contexts according to the significance status of the current coefficient and its spectral neighbors. A different probability model is associated with each context.

While 3D wavelet methods can achieve good compression efficiency with excellent scalability, a main disadvantage is that they lead to complex memory management issues. A naive implementation would consist of loading several spectral bands into memory so as to perform cross-band filtering, leading to expensive memory requirements. More sophisticated approaches are possible, e.g., loading simultaneously only the subbands corresponding to a given spatial frequency in the various spectral bands, but these approaches have the drawback of requiring numerous iterations of memory access.
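The memory gap can be quantified with the AVIRIS figures cited in Section 1.3 (224 bands, about 140 Mbytes of raw data per image). Assuming a 614 × 512-pixel scene at 16 bits per sample (our assumption; it is consistent with the quoted 140 Mbytes), a naive 3D wavelet encoder that buffers the whole cube needs over two hundred times the memory of an encoder that processes one band at a time:

```python
# Scene dimensions 614 x 512 are an assumption, not stated in the thesis;
# band count (224) and ~140 MB raw size are from the AVIRIS description above.
bands, rows, cols, bytes_per_sample = 224, 614, 512, 2

per_band = rows * cols * bytes_per_sample   # one spectral band resident in memory
full_cube = bands * per_band                # whole cube, as naive cross-band filtering needs

print(f"one band : {per_band / 1e6:.2f} MB")    # 0.63 MB
print(f"full cube: {full_cube / 1e6:.1f} MB")   # 140.8 MB, matching the cited raw size
```

This back-of-the-envelope comparison is the motivation for the band-at-a-time, DSC-based design described in the next section.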
1.4 Contributions of This Research

In this research we propose novel image and video coding algorithms based on DSC in order to address some of the deficiencies of the conventional compression framework. We also address the correlation estimation problem in practical DSC applications. The main contributions of this research include the following:

1. Wavelet-based Slepian-Wolf coding for hyperspectral imagery [8,14,67].

• We exploit the DSC principle to design hyperspectral imagery compression with an eye to an efficient parallel encoder implementation. We combine set partitioning of wavelet coefficients with our proposed DSC techniques to achieve competitive coding performance. Our proposed system can achieve coding efficiency comparable to that of a simple 3D wavelet codec developed at NASA-JPL.

• We propose adaptive coding strategies that optimally combine DSC with intra coding for wavelet-based DSC applications. Experimental results demonstrate that these adaptive strategies can lead to up to 4dB improvement compared with a non-adaptive system.

2. Flexible video decoding [10-13,16].

• We investigate compression techniques to support flexible video decoding based on DSC. The proposed algorithm incorporates different macroblock modes and significance coding within the DSC framework, and combined with a judicious exploitation of correlation statistics the proposed algorithm can outperform flexible decoding techniques based on conventional CLP coding.

• We study the information-theoretically achievable rate bound for the flexible decoding problem under predictor uncertainty.

3. Sampling-based correlation estimation for DSC [9,15,17].

• Within the DSC framework, we propose models to characterize the relationship between the number of samples used in estimation and the coding rate penalty.

• For compression of multiple binary sources under a constraint on the total number of samples, we propose an algorithm to determine the number of samples allocated to different sources so that the minimum overall rate penalty can be achieved.
• We propose model-based estimation for distributed coding of continuous-valued input sources. Experimental results, including some based on real image data, demonstrate the effectiveness of the proposed algorithms.

The rest of the thesis is organized as follows. We discuss hyperspectral image compression in Chapter 2. Chapter 3 discusses how to address flexible decoding based on DSC. Chapter 4 studies the correlation estimation problem in DSC. Finally, Chapter 5 concludes the research and discusses future work.

Chapter 2
Efficient Wavelet-based Predictive Slepian-Wolf Coding for Hyperspectral Imagery

2.1 Introduction

2.1.1 Motivation

In this chapter we propose novel compression algorithms for hyperspectral imagery that facilitate parallel and low complexity encoding, while achieving competitive compression performance. Our proposed techniques use wavelet-based encoding to enable lossy-to-lossless, scalable encoding of the spectral bands. This is combined with distributed source coding techniques [63], which are used to exploit the inter-band correlation. As discussed in Section 1.2, Slepian and Wolf [63] proved that two correlated sources can be optimally encoded even if the encoder only has access to the two sources separately. This counter-intuitive result permits, in principle, significant complexity and communication overhead reductions in parallel encoding configurations, while preserving the ability to optimally compress the data (approaching the same performance as conventional schemes based on a predictive framework) by exploiting the redundancy in the correlated spatial images at adjacent spectral bands. These advantages are particularly important for hyperspectral imagery compression, where low complexity, high speed encoding is most needed. Other applications of Slepian-Wolf coding include data aggregation in sensor networks (e.g., [55,85]) and video coding (e.g., [28,59]). In the video coding applications, the correlated sources are successive video frames.
In this work, the correlated sources are successive bands of hyperspectral imagery.

2.1.2 Our Contributions and Related Work

Our proposed scheme, set-partitioning in hierarchical trees with Slepian-Wolf coding (SW-SPIHT), is an extension of the well-known SPIHT algorithm [61]. SW-SPIHT first uses an iterative set-partitioning algorithm to extract bit-planes. Bit-planes at the same bit position in neighboring bands are shown to be correlated. Once the first spectral band, which is encoded independently, is available to the joint decoder, bit-planes can be extracted from it, and successive bit-planes at corresponding subbands and significance levels from the second spectral band can be decoded. All bit-planes other than those from the first spectral band are encoded independently using an LDPC based Slepian-Wolf code [43,45] and jointly decoded by a sum-product decoding algorithm. As an example of coding performance, for the NASA AVIRIS hyperspectral image data set, at medium to high quality, our baseline SW-SPIHT can achieve up to 5dB gain compared to 2D-SPIHT on individual bands, while the adaptive SW-SPIHT codec to be discussed in Section 2.6 can achieve up to 8dB gain compared to 2D-SPIHT and performance comparable to a simple 3D wavelet system developed at NASA-JPL [8].

Note that when all bit-planes are encoded, SW-SPIHT can also provide lossless compression. In many hyperspectral imaging applications preserving the spectral signature is important (e.g., the spectral signature may be used for classification, and preserving classification rates becomes important [69]). SW-SPIHT provides flexibility in the choice of operating points, so that the rate can be selected in order to preserve the spectral signature. A detailed analysis is presented in Section 2.5.3, which demonstrates that SW-SPIHT can provide a more uniform distortion profile across bands than 3D wavelet techniques. This is shown to be advantageous in terms of preserving the spectral signature.
To the best of our knowledge, we were the first to propose the application of DSC techniques in the context of hyperspectral imagery [14,67]; applying DSC to lossless hyperspectral image compression was also proposed in [6,47]. Another key novelty of our work is that we combine (i) DSC techniques operating on binary data and (ii) bit-plane successive refinement encoding based on set partitioning, a technique that has been broadly used in wavelet-based image coding. These two techniques achieve good coding efficiency by exploiting different characteristics of the input data, namely, spatial and frequency localization of wavelet coefficient energy (set partitioning) and correlation across spectral bands (DSC). We show that by combining these techniques, so that DSC is applied where it provides the most gain, better performance is achieved than if DSC were applied directly to "raw" bit-planes (i.e., complete bit-planes, rather than set-partitioned ones). More specifically, our proposed codec relies on standard set-partitioning techniques to signal the location of "significant" wavelet coefficients, while using DSC to encode signs and refinement bits.

Note that DSC techniques require the encoder to have information about the correlation between the source being encoded and the side information available at the decoder. In our application, the side information, i.e., the neighboring bands, is actually available at the encoder, and thus the correlation can be estimated exactly. However, estimating this correlation accurately may involve a significant overhead in terms of memory and complexity at the encoder. Thus, another important novelty of our work is that we take into account the cost involved in estimating inter-band correlation. In this chapter we discuss a direct approach to estimating the correlation and demonstrate that it results in minimal losses in compression performance. In Chapter 4 we will discuss a model-based estimation, and illustrate how the model-based approach can facilitate parallel compression of spectral bands.
The proposed hyperspectral image coding algorithm has potential advantages over competing techniques that exploit cross-band correlation, such as inter-band predictive methods and 3D wavelet techniques.

In inter-band prediction approaches [51], a band is predicted using previously encoded bands and the resulting prediction residuals are encoded using standard image coding techniques. As discussed in Section 1.3.1, inter-band prediction approaches have several drawbacks, namely (i) high encoding complexity due to the requirement of replicating the decoder reconstruction, (ii) inherently sequential encoding, and (iii) difficulty in achieving rate scalability. Our proposed DSC based algorithms can, however, address these shortcomings. First, DSC requires only access to correlation statistics, and these statistics can be reliably estimated with low complexity from uncoded data, as will be shown. Second, a DSC approach has the potential to enable parallel encoding with multiple processors. Specifically, once the inter-band correlations have been estimated, each processor can in principle proceed independently. While correlation estimation requires data exchange across bands, this process can be much simpler than encoding/decoding, as we will discuss in more detail. This inherent parallelism can facilitate hardware implementations and greatly increases the on-board encoding speed. Third, our proposed algorithms facilitate scalability. We apply DSC to bit-planes extracted from wavelet coefficient data. A given bit-plane in a given subband depends only on the same bit-plane in a neighboring spectral band. Thus, once the hyperspectral data has been encoded, efficient rate scalability can be achieved by decoding all spectral bands up to the same bit-plane resolution level. Note that the rate scalability problem in hyperspectral imaging is analogous to that in video compression scenarios, for which DSC techniques have also been proposed recently [76,77].
3D wavelet methods provide alternatives to predictive techniques. As discussed in Chapter 1.3.1, some drawbacks of 3D wavelet methods are that they lead to expensive memory requirements or complex memory management issues. In contrast, our proposed DSC-based algorithms only require storing in memory a single spectral band at a time, once the correlation statistics are estimated. These lower memory requirements could potentially lead to lower power consumption at the encoder, since a substantial amount of off-chip memory access would be avoided. This is particularly important because off-chip memory accesses often consume up to one order of magnitude more power than on-chip data accesses [54].

This chapter is organized as follows. We first present an overview of the proposed system in Section 2.2, the baseline codec in Section 2.3, and our prediction and estimation model in Section 2.4. Implementation and experimental results for the baseline codec are described in Section 2.5. Section 2.6 discusses the adaptive codec. Section 2.7 presents the experimental results for the adaptive hyperspectral image codec. Section 2.8 concludes this chapter.

2.2 System Overview

Figure 2.1 shows an overview of the proposed encoding system. The proposed encoder consists of multiple processors, and each processor compresses one entire spatial image at a time, using the algorithms to be discussed in detail in Section 2.3. As discussed in Section 1.2, with a DSC approach the encoder needs only the correlation information to exploit inter-band redundancy. In particular, if this correlation information is available, each encoding thread compressing one spatial image would be able to proceed in parallel, and parallel encoding of the hyperspectral dataset can be achieved. Therefore, one key question is how to estimate the correlation information efficiently during encoding.
It is, however, nontrivial to estimate this correlation information, for the following reasons:

- The spatial images of different spectral bands reside at different processors, and the communication bandwidth between the processors could be limited;
- Data exchanges between the processors may impact parallel processing, as the processors may have to remain idle while waiting for the data.

To address these constraints, in Sections 2.4 and 4.4 we will propose several techniques to achieve accurate estimation of the correlation with a small amount of inter-processor communication overhead and minimal dependencies between different encoding threads.

Figure 2.1: The proposed parallel encoding system. Each processor compresses one spatial image at a time.

2.3 Codec Architecture

Consider two hyperspectral bands, X and Y, and denote by X̂ the reconstruction of band X at the decoder, which will be used to produce the side information to decode Y. (Note that, as will be discussed later, decoding is possible with many reconstructions of X at the decoder; as coarser versions of X are used, the reconstruction of Y will be correspondingly coarser.) This side information, X̂′, is generated by linear prediction X̂′ = αX̂ + β, where α and β will be estimated at the encoder.

Let us assume first that the correlation statistics are known to both the encoder and the decoder. In particular, assume that for every set of binary data to be encoded (e.g., a bit-plane or part of a bit-plane extracted from Y), we have access to the "crossover probabilities", i.e., the probabilities that two bits in corresponding bit-plane positions of X̂′ and Y, respectively, are not equal. These crossover probabilities will tend to be different at each level of significance (i.e., the crossover probability will tend to increase from MSB to LSB bit-planes). Section 2.4 will present techniques to estimate efficiently both
crossover probabilities and prediction parameters from the input data; these techniques require processing only a small fraction of the pixels in spectral bands X and Y, so that the computation overhead is kept low. This also facilitates rate scalability, i.e., multiple operating points can be achieved with a single embedded bitstream.

In our work we use SPIHT [61], a well known wavelet-based image coding algorithm, as a starting point. Similar ideas could be applied to other image coding algorithms that achieve successive refinement of information by representing data in bit-planes. In each pass, SPIHT uses a significance test on wavelet coefficients to partition them into two sets: the significant set and the insignificant set. Bits corresponding to significance information are entropy coded and output by the encoder; they allow the decoder to update the list of coefficients in the significant set.

A block diagram of our proposed system is shown in Figure 2.2. Band X is encoded and decoded independently (i.e., without information from any other band) using a wavelet transform and SPIHT coding. The reconstructed band X̂ will then be used to form the side information to decode Y. As for band Y, the first step is again a wavelet transform T(f, n), where f is the filter used in the transform and n is the number of transformation levels. Then SW-SPIHT successively updates the set of significant wavelet coefficients of Y at each pass. As shown in Figure 2.3, at the end of each iteration, a sign bit-plane, a refinement bit-plane and the corresponding significance bits are generated. Sign bits and refinement bits are encoded using an LDPC-based Slepian-Wolf code and the corresponding syndrome bits are output to the bitstream. However, significance bits are encoded independently using intra coding (in particular, zero tree coding in our implementation), i.e., exactly as they would have been coded in a standard SPIHT approach.
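The crossover probabilities just described can be made concrete with a small sketch. The toy model below is our own construction for illustration (the band sizes, predictor coefficients and Gaussian correlation noise are invented, not taken from the thesis data): it forms the linear side information X′ = αX + β and measures how often corresponding bit-plane positions of X′ and Y disagree.

```python
import random

def bitplane(values, i):
    """Extract bit i (0 = LSB) from each non-negative integer sample."""
    return [(v >> i) & 1 for v in values]

def crossover_probability(side_bits, source_bits):
    """Fraction of positions where the two bit-planes disagree."""
    return sum(a != b for a, b in zip(side_bits, source_bits)) / len(source_bits)

# Toy correlated "bands": Y is a noisy affine function of X (12-bit samples).
random.seed(0)
X = [random.randrange(0, 4096) for _ in range(20000)]
Y = [min(4095, max(0, round(0.9 * x + 50 + random.gauss(0, 8)))) for x in X]

# Side information from the linear predictor X' = alpha*X + beta.
alpha, beta = 0.9, 50.0
X_side = [min(4095, max(0, round(alpha * x + beta))) for x in X]

for i in (11, 8, 4, 0):
    p = crossover_probability(bitplane(X_side, i), bitplane(Y, i))
    print(f"bit {i:2d}: crossover = {p:.3f}")
```

As the text notes, the crossover probability tends to increase from the MSB toward the LSB bit-planes, which is exactly what this toy model exhibits.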
Figure 2.2: Block diagram of SW-SPIHT.

Figure 2.3: Bit-plane coding in SW-SPIHT.

In Section 2.6 we will analyze the conditions under which intra coding of the significance information could be more efficient than exploiting the inter-band correlation, and propose an adaptive combination of the techniques.

In what follows, b^w, b^w_i, and b^w_i(l) denote a bit-plane, the i-th bit-plane, and the l-th bit of the i-th bit-plane of image W, respectively. Also in what follows, unless otherwise stated, bit-planes are sets of sign bits and refinement bits as generated after set partitioning at a given level of significance. This is illustrated by Figure 2.3. The encoder comprises the following steps (see Figure 2.4):

E-1. Estimation of the predictor coefficients α and β using a subset of the information in X and Y;
E-2. Application of the prediction coefficients to obtain the wavelet transform coefficients of X′;
E-3. Computation of the wavelet transform of Y;
E-4. At each iteration, set partitioning of the wavelet coefficients of Y to extract bit-planes b^y_i (1 ≤ i ≤ m);
E-5. Application of the significance tree of Y to the wavelet coefficients of X′ to extract bit-planes b^x_i (1 ≤ i ≤ m);
E-6. Computation of p̂_i, the estimated crossover probability of the bit-plane pair (b^x_i, b^y_i) (1 ≤ i ≤ m) of X′ and Y, respectively;
E-7. Determination of the Slepian-Wolf coding rate based on the estimated crossover probability;
E-8. Generation of the parity-check matrix for b^y_i (1 ≤ i ≤ m).

The compressed bitstream generated for Y includes, for each coding pass, the corresponding significance map and the syndromes generated for the sign and refinement bit-planes.
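Step E-8 amounts to multiplying a bit-plane by a sparse parity-check matrix over GF(2) and transmitting only the resulting syndrome. A minimal sketch (the tiny matrix and bit-plane below are invented for illustration; the actual codec uses irregular LDPC codes whose random seeds are signaled to the decoder):

```python
def syndrome(parity_rows, bits):
    """s = H*b over GF(2), with H stored as sparse rows: each row lists
    the column indices holding a 1, so the cost is linear in the number
    of ones in H."""
    return [sum(bits[j] for j in row) % 2 for row in parity_rows]

# Toy sparse parity-check matrix: 4 checks on an 8-bit bit-plane, so the
# transmitted representation is 4 syndrome bits instead of 8 source bits.
H = [[0, 2, 5], [1, 3, 6], [2, 4, 7], [0, 3, 5, 7]]
b_y = [1, 0, 1, 1, 0, 0, 1, 0]
print(syndrome(H, b_y))
```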
Note that in this algorithm it is not necessary to have access to an encoded version of X.

Figure 2.4: Encoding using the proposed system.

Also, we will discuss in Section 2.4 how the prediction coefficients and crossover probabilities can be estimated with low complexity.

At the decoder, the reconstructed X̂′ is transformed using T(f, n), i.e., the same wavelet transformation used on Y at the encoder. Then the significance tree of Y (not X) is used to parse the wavelet coefficients of X̂′ in order to extract the bit-planes to be used as side information. Note that the significance tree is sent to the decoder directly (i.e., coded in "intra" mode) and thus will be available without requiring any side information. This is an important aspect of our algorithm because we have chosen to partition Y into sets before applying Slepian-Wolf coding techniques to some of the data. Thus, in order to produce the "right" side information for decoding we must apply the same set partitioning to X̂′. The LDPC sum-product algorithm (SPA) is used to decode the bit-planes of Y given the syndrome bits and the side-information bit-planes from X̂′.

When all bit-planes are decoded and the coefficients have been refined to a desired quality level, the decoder applies the inverse wavelet transform T⁻¹(f, n) to reconstruct Ŷ, an estimate of Y. Since Slepian-Wolf coding is used to code these bit-planes, they can be
Note also that simple quality scalability can be achieved with our scheme; since any bit-plane in Y is encoded based on a single bit-plane in X, we can scale the rate by stopping the bit-plane re¯nement at the same level of signi¯cance in both X and Y. SW-SPIHT can also provide lossless compression for hyperspectral imagery when all bit-planes are coded, provided that an integer-to-integer wavelet transform [7] is used. Note that the least signi¯cant bit-planes tend to be uncorrelated from image to image and also have near maximum entropy; thus, in lossless applications, these bit-planes can be sent uncoded. Crossover probabilities are used by the encoder to determine the compression rate. This rate determines which parity-check matrix should be used for a given bit-plane. In SW-SPIHT, irregular Gallager codes are used. A table is built o²ine that associates di®erentcrossoverprobabilitieswithrandomseedsforproperparity-checkmatrices. Once the crossover probability between a bit-plane and its corresponding side-information bit- plane is obtained, a proper parity-check matrix can be selected at run-time. To make sure the same parity-check matrix is used at the decoder, the random seed used by the encoder to generate the parity-check matrix is sent to the decoder. To match the exact bit-plane width, column puncturing and splitting is used on the parity-check matrix. In summary our decoder comprises the following steps: D-1. Application of prediction coe±cients to obtain ^ X 0 =® ^ X +¯; D-2. Transformation of ^ X 0 using the same wavelet transform used for Y at the encoder; 34 D-3. Application of the signi¯cance tree of Y to the wavelet coe±cients of ^ X 0 to extract m bit-planes b x i (1·i·m); D-4. Computation of a priori probability Pr(b y i (j) = 0jb x i (j)) for b x i (j) = 0 or 1. using SPA. Note that our proposed technique can be also extended to support multiple sources of side information. 
For example, if we consider encoding each bit-plane of the current band, n, which we denote X_n, using the corresponding bit-planes in the two previous bands, n−1 and n−2, denoted X_{n−1} and X_{n−2}, respectively, we could in theory achieve an encoding rate close to H(X_n | X_{n−1}, X_{n−2}), and this would be smaller than that obtained using only a single side information, H(X_n | X_{n−i}), i = 1, 2. This would require a minimal increase in complexity at the encoder (due to the computation of additional prediction coefficients and crossover probabilities) but would lead to an increase in decoder complexity. We tested this approach for the datasets considered in this chapter, and observed that the gains may not justify the additional complexity at the decoder, except in lossless or near-lossless coding operation. For most bit-planes, using band n−1 alone as side information already leads to significant compression gains and a relatively small conditional entropy, H(X_n | X_{n−1}). In our observation, the additional compression gain when using X_{n−2} as additional side information, i.e., H(X_n | X_{n−1}) − H(X_n | X_{n−1}, X_{n−2}), tends to be relatively small. As an example, Figure 2.5 shows H(X_n | X_{n−1}) and H(X_n | X_{n−1}, X_{n−2}) at different bit-planes of typical spectral bands. As shown in the figure, the reduction in coding rate achievable when using multiple bands as side information is only around 0.01 bits/sample in the more significant bit-planes, which for many lossy compression applications would not justify the additional complexity at the decoder. As for the less significant bit-planes, the reduction in conditional entropy when using multiple bands as side information is larger (up to 0.05 bits/sample), so that in lossless or near-lossless scenarios multiple side information may be useful.

Figure 2.5: Example of conditional entropy of different bit-planes.
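The conditional entropies H(X_n | X_{n−1}) and H(X_n | X_{n−1}, X_{n−2}) compared above can be estimated empirically from sampled bit-planes. The sketch below uses a toy model of our own devising (three bands simulated as independently corrupted copies of a common hidden bit-plane, with invented flip probabilities), so the numbers only illustrate the qualitative effect of adding a second side-information band:

```python
import math
import random
from collections import Counter

def conditional_entropy(target, side):
    """Empirical H(target | side) in bits/sample; `side` is a list of
    tuples of conditioning symbols, aligned with `target`."""
    n = len(target)
    joint = Counter(zip(side, target))
    marginal = Counter(side)
    return -sum((c / n) * math.log2(c / marginal[s])
                for (s, t), c in joint.items())

# Toy model: three bands are independently noisy copies of a hidden plane Z.
random.seed(2)
flip = lambda b, p: b ^ (random.random() < p)
Z = [random.randrange(2) for _ in range(50000)]
x_n   = [flip(b, 0.10) for b in Z]
x_nm1 = [flip(b, 0.10) for b in Z]
x_nm2 = [flip(b, 0.10) for b in Z]

h_one = conditional_entropy(x_n, [(a,) for a in x_nm1])
h_two = conditional_entropy(x_n, list(zip(x_nm1, x_nm2)))
print(f"H(Xn|Xn-1) ~ {h_one:.3f}, H(Xn|Xn-1,Xn-2) ~ {h_two:.3f}")
```

The second estimate is never larger than the first (conditioning cannot increase empirical entropy), mirroring the small gains reported in the text.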
Given that we are not focusing specifically on the near-lossless or lossless case, the rest of this chapter describes our design and experimental results based on a single band used as side information.

2.4 Estimating the Correlation and Encoder Complexity Comparison

The performance of DSC techniques depends strongly on the estimation of correlation and prediction parameters. In our system, we need to estimate two sets of parameters, namely, (i) the linear prediction coefficients, α and β, and (ii) the bit-plane crossover probabilities. In this section, we demonstrate that accurate estimation of the correlation parameters can be achieved using techniques involving a limited number of data transfers and computations. Because this estimation is accurate and requires low complexity, our proposed DSC techniques compare favorably with inter-band predictive approaches, which usually involve a substantial amount of data transfer (e.g., if a whole spectral band is predicted using another spectral band, then all pixels in the predictor image need to be fetched in order to generate a prediction residue). Reduction in the amount of data transfer is particularly important for applications operating in embedded environments, such as hyperspectral imagery compression in satellites. In these applications the encoder may only have enough internal memory to accommodate the current spectral band (since the application programs and operating system may have occupied significant portions of the internal memory). In order to perform prediction, the system would need to fetch the relevant information from neighboring bands, which is likely to be stored in external memory. Such external memory accesses usually lead to substantial power consumption and delay. For example, while some sophisticated CPU/DSPs can handle multiple arithmetic operations in a single cycle, accessing external memory data may incur a latency of the order of tens of cycles [64].
So it is desirable to reduce the total amount of data exchanged, which translates into a reduction in overall system complexity.

In what follows we present low complexity techniques for estimating the prediction coefficients and the correlation. We also compare the encoder complexity of the proposed system with that of two competing techniques, namely those based on inter-band prediction and 3D wavelets.

2.4.1 Estimation of Predictor Coefficients and Correlation

The encoder can determine a rough level of correlation after it estimates α and β by computing an estimate of the residual energy after prediction. If this energy is above a certain threshold, the spectral band can be coded in intra mode (i.e., independently of other bands), with the coding mode reverting to DSC mode when the residual energy goes under the threshold. For example, Band 162 in Figure 1.8 can be coded in intra mode. Note that in the real data sets we have considered, a majority of bands can be coded using DSC (e.g., 95% of bands in the Cuprite data set we use in our experiments).

2.4.1.1 Estimation of Predictor Coefficients

As discussed earlier, we use a linear predictor X′ = αX + β to generate the side information for Y. The least-squares technique can be used to calculate α and β. In order to reduce the complexity (and data exchange requirements) of this process, we first down-sample the spectral bands and use only pixels in the down-sampled bands for estimation. As shown in Figure 2.6, with only 0.32% of the pixels, the resulting predictor can achieve a prediction mean square error (MSE) within 0.05 of that of the optimal predictor (i.e., that computed using all pixels in X and Y). By using only a small fraction of the data we reduce data exchange and computation in the least-squares calculation, without compromising the performance of the predictor (or its impact on the crossover probability estimation).
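The down-sampled least-squares fit can be sketched as follows (a toy pair of 16-bit bands with invented parameters α = 0.85, β = 300; the sampling step is chosen so that roughly 0.32% of the pixels are used, as in the text):

```python
import random

def fit_linear_predictor(x, y, step=1):
    """Closed-form least-squares (alpha, beta) for y ~ alpha*x + beta,
    fitted on every `step`-th pixel only (the down-sampling trick)."""
    xs, ys = x[::step], y[::step]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    alpha = sxy / sxx
    return alpha, my - alpha * mx

# Toy correlated bands (alpha = 0.85, beta = 300 by construction).
random.seed(1)
X = [random.randrange(0, 65536) for _ in range(256 * 256)]
Y = [0.85 * v + 300 + random.gauss(0, 40) for v in X]

a_full, b_full = fit_linear_predictor(X, Y)            # all pixels
a_sub, b_sub = fit_linear_predictor(X, Y, step=311)    # ~0.32% of pixels
print(f"full: ({a_full:.4f}, {b_full:.1f})  subsampled: ({a_sub:.4f}, {b_sub:.1f})")
```

Even with only a few hundred pixels, the subsampled estimate lands very close to the full-data fit, which is the effect Figure 2.6 reports.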
The overhead due to downsampling the data is usually negligible, as downsampling can be accomplished by incrementing the access position in data memory by a constant, and nowadays many CPU/DSPs have built-in hardware to support this operation with negligible overhead.

Figure 2.6: MSE under the first-order linear predictor for a typical spectral band.

2.4.1.2 Estimation of Crossover Probability

We now consider estimation of the crossover probabilities at the encoder. These are needed to select an appropriate Slepian-Wolf coding rate at the encoder and to initialize the SPA at the decoder. In this section we first discuss a direct approach for correlation estimation, while in Chapter 4 we will discuss another, model-based estimation. To achieve low cost estimation we propose that only a small portion of the bit-plane data (generated by set partitioning) be exchanged between spectral bands. Note that, since set partitioning tends to "scramble" the ordering of coefficients, estimates of the crossover probability after set partitioning are in general reliable (specifically, set partitioning orders the transform coefficients according to the bit levels at which they become significant and the zero trees they belong to [61]). We use the upper bound of the 95% statistical confidence interval as our estimate.
Specifically, the upper bound of the (1−ω)×100% confidence interval for a population proportion is given by [50]

p̂_i = s_i/n_i + z_{ω/2} √(p_i(1−p_i)/n_i)    (2.1)
    ≈ s_i/n_i + z_{ω/2} √((s_i/n_i)(1 − s_i/n_i)/n_i)    (2.2)

Here p̂_i is the estimate of the crossover probability of the bit-plane pair (b^x_i, b^y_i), n_i is the number of samples exchanged in estimating p_i, s_i is the number of exchanged samples for which a crossover occurs, and z_{ω/2} is a constant that depends on the chosen confidence interval, e.g., z_{ω/2} = 1.96 when we use a 95% confidence interval. Note that we choose the upper bound as the estimator to minimize the risk of decoding failure, at the expense of some encoding rate penalty. Statistically, with this estimation, we are (1−ω)×100% confident that the true crossover probability p_i is within s_i/n_i ± z_{ω/2} √(p_i(1−p_i)/n_i). Hence the estimation error, Δp_i = p̂_i − p_i, is bounded by 0 ≤ Δp_i ≤ 2 z_{ω/2} √(p_i(1−p_i)/n_i) with probability 1−ω. In addition, it can be shown that (refer to Appendix A for details):

Pr(Δp_i < 0) = ω/2    (2.3)
Pr(Δp_i > 2 z_{ω/2} √(p_i(1−p_i)/n_i)) = ω/2,    (2.4)

which allows us to bound in a systematic way the probability of decoding error and the probability of incurring a large encoding rate penalty. Since the estimation process consists of simply counting occurrences of crossovers in small portions of two bit-planes, the overall estimation overhead is small.

Figure 2.7: Example of estimation of crossover probability. Note that for all bit-planes it is possible to achieve an accurate estimate using a small percentage of the data in a bit-plane.

As an example of the accuracy of crossover probability estimation using this low complexity technique, Figure 2.7 shows a typical estimation result using different percentages of data from a bit-plane.
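The estimator of Eqs. (2.1)-(2.2), together with a coding rate chosen as a margin over the Slepian-Wolf limit H(p̂), can be sketched as follows (the observed counts are invented; the helper names are ours):

```python
import math

def crossover_upper_bound(s, n, z=1.96):
    """Upper end of the 95% confidence interval for the crossover
    probability (Eqs. (2.1)-(2.2)): s observed crossovers in n exchanged
    bits, deliberately biased upward to avoid decoding failure."""
    p = s / n
    return p + z * math.sqrt(p * (1 - p) / n)

def binary_entropy(p):
    """H(p) in bits, the Slepian-Wolf limit for an i.i.d. crossover model."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Example: 12 crossovers observed in 1000 exchanged bits of a bit-plane pair.
p_hat = crossover_upper_bound(12, 1000)
rate = binary_entropy(p_hat) + 0.05     # safety margin over the SW limit
print(f"p_hat = {p_hat:.4f}, coding rate = {rate:.3f} bits/bit")
```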
As an example, with 5% of the bits exchanged the crossover probability estimate is within 0.003 of the actual crossover probability. Since we choose the compression rate to leave a margin of about 0.05 bits over the Slepian-Wolf limit (as estimated by H(p̂_i), since we assume the source model as in [43]), this estimation accuracy is sufficient. In addition, we also tested this technique in our coding performance experiments (details in Section 2.5). There we use around 10% of the data in a bit-plane for correlation estimation, and our experimental results show that the estimates are accurate enough that no decoding errors occur.

2.4.2 Encoder Complexity Comparison

In this section we compare the encoder complexity of our proposed scheme to that of inter-band prediction and 3D wavelet approaches.

2.4.2.1 Comparison with Inter-band Prediction

Inter-band prediction approaches need to generate exact copies of the decoded bands at the encoder, so the encoder needs to perform decoding as well. To encode the current band Y using a neighboring band X for prediction, the inter-band encoder requires the following steps (Figure 2.8):

I-1. Estimation of the predictor coefficients α* and β* (in this case approximate techniques could also be used, as long as the chosen parameters are communicated to the decoder);
I-2. Application of the prediction coefficients to obtain X̂′ = α*X̂ + β*;
I-3. Computation of Y − X̂′ to generate the residue;
I-4. Transformation of the residue using the wavelet transform;
I-5. Set partitioning on the wavelet coefficients of the residue; output of the bitstream;
I-6. Inverse set partitioning;
I-7. Inverse transformation;
I-8. Adding X̂′ to the output of the inverse transformation to generate Ŷ.
Comparing the encoding steps of our proposed scheme (Figure 2.4) with the inter-band prediction approach (Figure 2.8), we can make the following observations:

(i) Both schemes need to compute the wavelet transform and perform bit-plane encoding of the current band Y (Steps (E-3) and (E-4) in Figure 2.4, Steps (I-4) and (I-5) in Figure 2.8).

Figure 2.8: Encoding using inter-band prediction.

(ii) The inter-band prediction approach has to perform an inverse wavelet transform (I-7). In our proposed scheme, we need the wavelet coefficients of the linear predictor X′ for correlation estimation. However, a forward transformation is not necessary here, since the wavelet coefficients of X have been computed during the compression of the previous band, and we can compute the wavelet coefficients of X′ simply by

T(X′) = T(αX + β) = αT(X) + βT(1)    (2.5)

where T denotes the wavelet transformation and 1 is a vector of ones. We pre-compute βT(1) and use it for all the coefficients in a spectral band. Therefore, computing T(X′) in our proposed algorithm requires one multiplication and one addition per coefficient, as suggested by (2.5), and this amount of computation is in general less than that of the inverse transform (I-7) in the inter-band prediction approach.

(iii) In the inter-band approach, the encoder needs to perform bit-plane decoding (I-6). In our proposed system we apply the significance tree of Y to the wavelet coefficients of X′ to extract bit-planes (E-5), for crossover probability estimation. Note that in (E-5) we merely extract coefficients according to the significance tree of Y, and no significance test on partitions is required, so this is similar to (I-6) in terms of complexity.
We would like to emphasize that the complexity of our system can be further reduced by avoiding bit-plane extraction, since there are low complexity alternatives for correlation estimation, as will be discussed in Chapter 4.

(iv) The inter-band prediction approach requires subtracting the predictor from the current band to compute the residue (I-3), and then adding the predictor back to the reconstructed residue (I-8). Since the subtraction/addition has to be performed on every pixel, the complexity here is of the order of the amount of data in one band. On the other hand, our proposed scheme needs only a small portion of the data to estimate the crossover probabilities (E-6). Also, the complexity of generating the syndromes (E-8) is linear (since the parity-check matrix is sparse), and is of the order of the number of bit-planes we need to encode, which is usually small, since in many lossy compression applications only the most significant bit-planes are transmitted.

Based on the above comparisons, we conclude that our scheme requires lower encoding complexity than inter-band prediction approaches.

Footnote 3: [51] has proposed using only full bit-planes to form the predictor. This could avoid bit-plane decoding at the encoder, but leads to performance degradation. In the general case, when one wants to truncate in the middle of a bit-plane, some decoding of the significance information or bookkeeping is necessary to determine the order of the wavelet coefficients.

2.4.2.2 Comparison with 3D Wavelet Approaches

3D wavelet approaches operate on multiple spectral bands at the same time. This usually incurs substantial external memory access overheads for storing intermediate results. For example, using 3D wavelet approaches, the 3D wavelet coefficients need to be computed first, followed by set partitioning of the 3D wavelet coefficients. Since the internal memory may not be able to accommodate several bands of 3D wavelet transform coefficients, they need to be transferred back and forth between external and internal memory.
In contrast, our proposed scheme operates on each single spectral band independently once the inter-band correlation has been estimated, and the wavelet transformation and bit-plane encoding of a single spectral band can in general be completed entirely in internal memory, without incurring external memory accesses for storing intermediate data (as an example, storing the transform coefficients of a single 512×512, 16-bit spectral band would need a 512×512×24-bit buffer memory, or 768 Kbytes; note that transform coefficients have larger dynamic ranges than the original pixel data). Hence the data access overheads in our scheme are much smaller than those involved in a 3D wavelet approach.

2.5 Experimental Results of Baseline Codec

We have implemented SW-SPIHT and applied it to 16-bit hyperspectral images. Our C program implementations of the set partitioning and bit extraction algorithms are derived from a MATLAB implementation of 2D-SPIHT [73]. The SPA we used for SW-SPIHT is based on the algorithm in Section III-A of [46], modified according to [68] for Slepian-Wolf decoding.

In our experiments we use data sets originally comprising 224 spectral bands, each of size 614×512 pixels. Due to constraints of the implementation of our codecs (e.g., our implementation of the wavelet transform can handle only power-of-two dimension data), in the experiments we compress 512×512 pixels in each band, and in total 192 bands starting from band number 33. Experimental results use SNR and PSNR for the comparison on individual frames, and multiband SNR (MSNR) and multiband peak SNR (MPSNR) for the whole spectrum. These quantities are defined as follows:

MSE = E[(x − x̂)^2]    (2.6)
SNR = 10 log10(E[x^2] / MSE)    (2.7)
PSNR = 10 log10((65535)^2 / MSE)    (2.8)

where E(·) is the expectation operator over pixels from an image band, x is the 16-bit value of a source pixel and x̂ is the 16-bit value of the reconstructed pixel of x.
Also,

MSNR = 10 log10(E[x^2] / MSE)    (2.10)
MPSNR = 10 log10((65535)^2 / MSE)    (2.11)

where now E(·) is the expectation operator over pixels from all spectral bands. The rates for an individual image band are measured in bits per pixel (bpp) and those for the whole spectrum in bits per pixel per band (bpppb).

The outline of this experimental study is as follows. First, we provide a comparison in terms of rate-distortion performance between SW-SPIHT and predictive 3D-SPIHT. Second, we compare SW-SPIHT with predictive 2D-SPIHT. In these experiments, we use different scenes and sites from the NASA AVIRIS data set, including Cuprite Radiance (SC01), Moffet Field Radiance (SC03) and Lunar Lake Reflectance (SC02). In order to describe these alternative codecs and our implementations of them, we need the following notation:

1. A denotes a general image band.
2. B_i denotes the i-th image band from the spectrum.
3. 1 is the vector with all elements equal to 1. Its dimension is set to the number of pixels used by the least-squares predictor.
4. V(A) is the function that vectorizes a fixed number of pixels from image band A.
5. For the predictor image band A and source image band B_i, α(A, B_i) is the prediction slope coefficient and β(A, B_i) is the prediction intercept coefficient.
6. B′_i(A) denotes the band after regression using least-squares prediction, where the design matrix is given by X = (1, V(A)). Recall that the least-squares coefficients are given by (β, α)^τ = (X^τ X)^{−1} X^τ V(B_i), where τ is the transpose operator.
7. The regression residual of the least-squares predictor for the i-th band can be computed as B_i − B′_i.

Figure 2.9: Rate-distortion curves of SW-SPIHT and predictive 3D-SPIHT - Cuprite.
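The quality metrics of Eqs. (2.6)-(2.11) can be sketched directly (the pixel values below are invented; for the multiband variants, pixels from all spectral bands are pooled into a single list):

```python
import math

def mse(orig, recon):
    return sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)

def msnr(orig, recon):
    """Multiband SNR (Eq. (2.10)): signal energy over MSE, with the
    expectation taken over pixels pooled from all spectral bands."""
    energy = sum(a * a for a in orig) / len(orig)
    return 10 * math.log10(energy / mse(orig, recon))

def mpsnr(orig, recon):
    """Multiband peak SNR (Eq. (2.11)) for 16-bit samples (peak 65535)."""
    return 10 * math.log10(65535 ** 2 / mse(orig, recon))

orig = [1000, 2000, 40000, 65535, 123, 9999]
recon = [998, 2005, 40010, 65530, 120, 10002]
print(f"MSNR = {msnr(orig, recon):.2f} dB, MPSNR = {mpsnr(orig, recon):.2f} dB")
```

Since the peak energy 65535^2 is at least the mean signal energy, MPSNR is never below MSNR for 16-bit data.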
2.5.1 Rate-Distortion Comparison with 3D Wavelet Approaches

Before presenting the rate-distortion comparison of SW-SPIHT with a predictive variant of 3D-SPIHT, we briefly describe these codecs and how we implemented them.

We modify 3D-SPIHT to adjust the bands taking into account their correlation. Thus, instead of operating on the original bands (B_1, B_2, ...), we apply the wavelet transform and encoding to a new set of bands (B′_1, B′_2, ...), obtained as follows:

1. B′_1 = B_1.
2. For all i > 1, B′_i = α(B_i, B′_{i−1}) B_i + β(B_i, B′_{i−1}), and α(B_i, B′_{i−1}) and β(B_i, B′_{i−1}) are directly encoded into the bitstream.

We use this predictive 3D-SPIHT approach so as to better "align" all spectral bands, so that the wavelet transform can better exploit the inter-band correlation.

Figures 2.9 to 2.11 provide coding performance comparisons for the radiance data from the Cuprite and Moffet Field sites, and the reflectance data from the Lunar Lake site (log scale is used for the rate to facilitate the comparison at low bit-rates).

Figure 2.10: Rate-distortion curves of SW-SPIHT and predictive 3D-SPIHT - Moffet Field.

Figure 2.11: Rate-distortion curves of SW-SPIHT and predictive 3D-SPIHT - Lunar Lake.

To obtain the results for our predictive 3D-SPIHT, we first align the spectral bands as discussed in the previous paragraph and compress B′_1, B′_2, ... with an implementation of 3D-SPIHT available in the public domain [23,24]. It can be seen that SW-SPIHT performs competitively, with some gain over 3D-SPIHT in most rate regions. In addition, SW-SPIHT has moderate memory requirements for encoding.
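The band alignment recursion above can be sketched as follows (toy five-pixel bands; `fit` is our own plain least-squares helper, and the bands are invented so that the second is an exact affine copy of the first):

```python
def fit(x, y):
    """Least-squares (alpha, beta) for y ~ alpha*x + beta."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    alpha = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return alpha, my - alpha * mx

def align_bands(bands):
    """B'_1 = B_1; B'_i = alpha_i*B_i + beta_i, regressed against the
    previously aligned band; the (alpha_i, beta_i) pairs are the side
    information that would go into the bitstream."""
    aligned = [list(bands[0])]
    overhead = []
    for band in bands[1:]:
        a, b = fit(band, aligned[-1])
        aligned.append([a * v + b for v in band])
        overhead.append((a, b))
    return aligned, overhead

# Band 2 is an exact affine copy of band 1, so alignment recovers band 1:
b1 = [10.0, 50.0, 30.0, 90.0, 20.0]
b2 = [2 * v + 5 for v in b1]
aligned, overhead = align_bands([b1, b2])
print(overhead[0], aligned[1])
```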
It should be noted that the performance of 3D-SPIHT can be improved by applying entropy coding (e.g., arithmetic coding) to the output bits. Similarly, we can improve SW-SPIHT by applying entropy coding to the significance bit information. Also note that results for 3D-SPIHT without prediction (not included here) are close to those of predictive 3D-SPIHT, with a marginal loss at low bit-rates.

It is well known that wavelet set-partitioning based codecs can precisely control the bit-rate. In other words, the SNR can be kept at a required level when the bit-rate is allowed to change. However, this only holds for the global SNR, and not necessarily for different parts of the encoded stream. In the case of 3D-SPIHT, the SNR of individual spectral bands can actually fluctuate significantly for a given target global SNR (variations of up to 5 dB are possible; see Figure 2.12 for an example). Another salient feature of SW-SPIHT is that it allows targeting individual band SNRs, so that fluctuations across bands can be kept very small (e.g., within 1 dB). Note that these variations are undesirable, as they could destroy the spectral signatures that are of primary interest in the analysis of hyperspectral imagery. Refer to Section 2.5.3 for an example of how SW-SPIHT is better at preserving these spectral signatures.

Figure 2.12: Inter-band SNR fluctuation under 3D-SPIHT.

2.5.2 Rate-Distortion Comparison with 2D Wavelet Approaches

We have also compared SW-SPIHT with two other 2D wavelet based codecs. The first is the standard 2D-SPIHT codec, which operates independently on all spectral bands, without cross-band prediction. The second is the predictive 2D-SPIHT codec, which operates as follows:

1. The first image band B_1 is encoded as is.
2. B̂_{i−1}, the reconstruction of image band B_{i−1}, is used to obtain a predictor for B_i, B′_i.
3.
A 2D-SPIHT codec is applied to B_i − B'_i for all i > 1; if the residual energy is above a certain threshold, then B_i is encoded directly.
4. The prediction coefficients α(B̂_{i-1}; B_i) and β(B̂_{i-1}; B_i) are sent as overhead.

Note that the predictor used in predictive 2D-SPIHT is the preceding image band, which is different from the predictor used in the predictive 3D-SPIHT codec. The 2D-SPIHT we used to compress the images (in the intra-band codec) or the residues (in the predictive codec) is the same MATLAB implementation from which our SW-SPIHT was derived [73].

Figure 2.13: Rate-distortion curves of SW-SPIHT and predictive 2D-SPIHT - Cuprite (radiance, Scene SC01).

Figures 2.13 to 2.15 provide comparisons based on the radiance data from the Cuprite and Moffet Field sites, and the reflectance data from the Lunar Lake site. For Cuprite, SW-SPIHT achieves some gain at middle-range bit-rates, but suffers a marginal loss at high bit-rates. The coding performance of predictive 2D-SPIHT improves at high bit-rates thanks to the better quality reconstruction used as predictor. For the Moffet Field and Lunar Lake sites, SW-SPIHT achieves marginal gains consistently, demonstrating competitive rate-distortion performance^5.

^5 Note that there are several significant differences between our system and predictive 2D-SPIHT. Our scheme applies set-partitioning to the transform coefficients of the original band, and DSC is used to further compress the sign/refinement bits after set-partitioning. In contrast, predictive 2D-SPIHT applies the wavelet transform to the prediction residue and then uses set-partitioning as the only "entropy coding" tool (i.e., sign/refinement bits are not compressed; this is true for 3D-SPIHT as well).
Since, typically, the prediction residue has much smaller energy than the original band, set-partitioning on the residue can lead to fewer bits required to represent the significance tree, as well as fewer refinement and sign bits, compared to set-partitioning on the original band. This gain is more significant at low rates. Moreover, while our proposed algorithms ignore the dependency between noise symbols (we model the correlation noise as a simple i.i.d. source), predictive 2D-SPIHT exploits the spatial correlation that might remain in the residue, leading to some additional coding gain.

Figure 2.14: Rate-distortion curves of SW-SPIHT and predictive 2D-SPIHT - Moffet Field (radiance, Scene SC03).

Figure 2.15: Rate-distortion curves of SW-SPIHT and predictive 2D-SPIHT - Lunar Lake (reflectance, Scene SC02).

Figure 2.16: Rate-distortion curves of SW-SPIHT and SPIHT (Site: Cuprite, View: SC01, Band: 40).

Figures 2.16 to 2.18 compare the performance of SPIHT and SW-SPIHT on individual image bands. We selected three pairs of bands from different spectral regions where the levels of correlation are different, as also shown in Figure 1.8. We did not select bands in spectral regions where the predictor sees large surges in mean square residuals, since these bands have low correlation and intra-coding is used instead. As shown in the figures, SW-SPIHT outperforms SPIHT significantly, with up to 5 dB gain in some rate regions. There are some variations in the PSNR gain due to variations of the energy among these images and of the correlations between the images in these pairs.
Note that some advanced 3D wavelet methods (e.g., 3D-ICER, developed at NASA-JPL [38]) also exploit this dependency between noise symbols, through a modified 3D decomposition, to improve performance.

Figure 2.17: Rate-distortion curves of SW-SPIHT and SPIHT (Site: Cuprite, View: SC01, Band: 133).

Figure 2.18: Rate-distortion curves of SW-SPIHT and SPIHT (Site: Cuprite, View: SC01, Band: 190).

Figure 2.19: Classification performance (Site: Lunar Lake, View: SC02).

2.5.3 Preservation of Spectral Signature

As mentioned earlier, SW-SPIHT allows encoding with very consistent quality across bands, a property that cannot be guaranteed with 3D-SPIHT. To illustrate the potential advantages of SW-SPIHT in terms of signature preservation, we have also assessed its performance in a remote sensing classification application. We tested our system with the Spectral Angle Mapper (SAM) algorithm [18], a well-known algorithm designed to measure the similarity between an unknown test spectrum and a reference spectrum. Similar to the set-up in [69], we assume the classification results of the original image are correct, and measure the number of pixels of the reconstructed image that have the same classification results as the original image pixels. Figure 2.19 provides a comparison in terms of classification performance. As shown in the figure, our proposed approach outperforms 3D-SPIHT in general. This is because our approach can keep the variation of SNR across bands small. As a result, spectral signatures can be better preserved.
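The classification check just described can be sketched as follows. This is a minimal illustration, not the evaluation code: SAM assigns each pixel to the reference spectrum with the smallest spectral angle, and the agreement rate treats the original image's labels as ground truth, as in the set-up above. Function names are ours.

```python
import math

def spectral_angle(s, r):
    """Angle between a test spectrum s and a reference spectrum r."""
    dot = sum(a * b for a, b in zip(s, r))
    ns = math.sqrt(sum(a * a for a in s))
    nr = math.sqrt(sum(b * b for b in r))
    return math.acos(max(-1.0, min(1.0, dot / (ns * nr))))

def classify(spectrum, references):
    """SAM classification: index of the reference with the smallest angle."""
    return min(range(len(references)),
               key=lambda k: spectral_angle(spectrum, references[k]))

def agreement_rate(original_pixels, reconstructed_pixels, references):
    """Fraction of pixels whose reconstructed-image classification matches
    the classification of the corresponding original pixel (the original
    image's labels are taken as correct)."""
    same = sum(classify(o, references) == classify(r, references)
               for o, r in zip(original_pixels, reconstructed_pixels))
    return same / len(original_pixels)
```

A reconstruction with small, evenly distributed per-band error leaves the angles, and hence the labels, largely unchanged, which is why consistent per-band SNR matters here.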
2.6 Adaptive Coding Strategy

In Section 2.3 we discussed a compression algorithm that applies DSC to sign and refinement bits, and intra coding to significance information. We now discuss a coding scheme which adaptively applies DSC or intra coding to bit-plane data according to bit-plane statistics and correlation, with the algorithm discussed in Section 2.3 being a particular case in which switching does not occur, as will be discussed. Experimental results suggest the adaptive coding strategy can lead to considerable improvements in the hyperspectral image system. To justify the adaptive coding strategy, we first analyze the coding gains of intra coding/DSC tools under different bit extraction scenarios. In Figure 2.20, each column corresponds to a wavelet coefficient, and we extract and encode the bits bit-plane by bit-plane, starting from the most significant bit-plane. As is usually done in wavelet image compression, we extract a sign bit only when the corresponding coefficient becomes significant. The extracted sign bits can be encoded by DSC or intra coding (e.g., arithmetic coding). As will be discussed in detail in the next section, it is generally more efficient to encode sign bits using DSC.

Figure 2.20: Bit extraction. Each column is a wavelet coefficient; bits are labeled as significance map, refinement, or sign ("s") bits, from the most significant to the least significant bit-plane, at significance level l.

Figure 2.21: Coding configurations (a)-(f). Magnitude bits are extracted either as raw bit-planes or as significance maps plus refinement bit-planes, and each part is compressed with either DSC or intra coding.

The magnitude bits can be extracted in two different ways. We can extract and encode all the magnitude bits of the same significance level (i.e., a raw bit-plane) in one single pass.
Alternatively, we can partition the magnitude bits into significance maps and refinement bits, and encode them separately using DSC or intra coding (zerotree coding in our case). Therefore, it is possible to compress the magnitude bits using several different coding configurations, each representing a possible combination of DSC and intra coding under different bit extraction scenarios (Figure 2.21). Our goal is to select and appropriately combine some of these configurations so that the optimal overall coding performance can be achieved. In what follows we will first quickly dismiss several configurations and then examine the remaining ones in more detail. As will be discussed, the optimal coding strategy involves judicious application of configurations (a) and (e) depending on bit significance levels and wavelet subbands, leading to the proposed adaptive coding scheme.

In Figure 2.21, both configurations (b) and (d) utilize only intra coding. However, (d) has been found to be more efficient, as it exploits the differences in the zero-th order statistics of the significance maps and refinement bits. In addition, there exist efficient methods to jointly encode the bits and convey the overhead for classifying the bits (e.g., set-partitioning in SPIHT). Therefore, we eliminate (b). We also eliminate (c) and (f), as both utilize DSC to encode significance maps, which could potentially result in a vulnerable system. This is because significance maps carry important structural information about the positions of significance and refinement bits. While a single error in the significance maps could lead to incorrect decoding of all the remaining bits, DSC usually has a small but non-zero probability of decoding failure. This is true in particular in our application, where it is infeasible to adopt the feedback architecture proposed in the literature [28] due to long delay.
In the following sections, we discuss sign bit compression by DSC or intra coding, and magnitude bit compression using configurations (a), (d) and (e). Note that the difference between (d) and (e) lies in the compression of refinement bits, which will be discussed in Section 2.6.1, while the differences between (a) and (e) lie in the extraction and compression of significance bits, which will be discussed in Section 2.6.2.

2.6.1 Refinement/Sign Bits Compression

It is well known that the refinement bits of wavelet coefficients are almost random (i.e., the marginal probability of the bits is close to 0.5), and this can be shown by inspecting the distributions of the wavelet coefficients. Wavelet coefficients can be modeled by a Laplacian distribution. Figure 2.22(a) shows how to estimate, from the p.d.f. of X_i (the coefficients in the i-th subband), the probability that a refinement bit is zero and the probability that a refinement bit is one. As shown in the figure, the probabilities are almost the same, and hence intra coding cannot achieve much compression for refinement bits in general. Similarly, when a coefficient becomes significant, the probabilities of the coefficient being positive and being negative are almost the same (Figure 2.22(b)). Therefore, sign bits are almost random, with marginal probability close to 0.5, and intra coding in general cannot achieve much compression for sign bits either^6.

Figure 2.22: Probability of a refinement/sign bit being zero or one: (a) refinement bit, estimated from f_{|X_i|}(x) around the interval [2^l, 2^{l+1}); (b) sign bit, estimated from f_{X_i}(x) over [−2^{l+1}, −2^l] and [2^l, 2^{l+1}].

On the other hand, the source X_i and the SI Y_i are correlated (Y_i are the i-th wavelet subband coefficients of the previous spectral band after linear prediction), and samples of (X_i, Y_i) concentrate mostly near the diagonal in a scatter plot. Therefore, it is possible to compress refinement/sign bits by exploiting inter-band correlation.
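To make the near-0.5 marginal concrete: under the Laplacian model, coefficient magnitudes are exponentially distributed, and the refinement bit at level l (the bit of weight 2^l, for coefficients already significant) is zero on the intervals [kT, kT + T/2) with T = 2^{l+1}, k ≥ 1. Summing the geometric series gives a closed form; this derivation is our sketch of the estimate illustrated in Figure 2.22(a).

```python
import math

def refinement_bit_p0(beta, l):
    """P(refinement bit at significance level l equals 0), assuming
    coefficient magnitudes |X| ~ Exponential(beta) (Laplacian coefficients)
    and conditioning on the coefficient being significant (|X| >= 2^(l+1)).

    The bit of weight 2^l is 0 on [k*T, k*T + T/2), T = 2^(l+1), k >= 1;
    summing the geometric series yields 1 / (1 + exp(-beta * T / 2)),
    which approaches 0.5 as beta * 2^l becomes small."""
    T = 2.0 ** (l + 1)
    return 1.0 / (1.0 + math.exp(-beta * T / 2.0))
```

For typical values where refinement bits dominate (beta · 2^l small), the result is barely above 0.5, so intra coding has almost nothing to exploit, consistent with the discussion above.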
For refinement bits, the events of crossover correspond to the regions A_j in the sample space of X_i and Y_i in Figure 2.23(a). Usually there are only a few samples that occur inside the off-diagonal crossover regions. Therefore, the crossover probability is usually small, and substantial compression of refinement bits can be achieved by DSC^7.

^6 Note that some wavelet-based image codecs, e.g., JPEG 2000, assign different probability models to refinement and sign bits depending on information from the neighboring coefficients, and achieve some rate reduction through this conditional coding [71].

Figure 2.23: Events of refinement/sign bit crossover: (a) refinement bit crossover regions A_0, A_1, A_2, ... in the (|X_i|, |Y_i|) plane; (b) sign bit crossover regions A_0, A_1 in the (X_i, Y_i) plane.

Similarly, for the sign bits, the crossover events correspond to the regions A_j in the sample space of X_i and Y_i in Figure 2.23(b).
The probability of sign crossover is usually small, except at the lowest significance levels (small l), when the crossover regions are near the origin (A_j starts at |X_i| = 2^l) and more samples occur inside the sign crossover regions. Therefore, in our system, we compress sign and refinement bits with DSC to exploit inter-band correlation. This eliminates configuration (d) for magnitude bit compression, and only (a) and (e) are left for further evaluation.

^7 Under some assumptions, the theoretical compression rate of DSC for a binary source is given by H(p), where p ≤ 0.5 is the crossover probability between the source and the SI [43]. Therefore, the compression efficiency of DSC increases as the crossover probability becomes smaller.

2.6.2 Significance Map Compression and Raw Bit-plane Coding

The significance map is biased toward zero in general. This can be verified from the distribution of the coefficients (Figure 2.24(a)). Therefore, intra coding can lead to effective compression for significance maps. However, the bias decreases at the lower significance levels (when l is small), as shown in Figure 2.24(b). Accordingly, intra coding becomes less efficient when coding the least significant bit-planes. In addition, the bias decreases as the variance of the coefficients increases (see Figure 2.24(b); a more rigorous mathematical justification will be given later). Therefore, for low-pass subbands and subbands at higher decomposition levels, intra coding may not be very efficient.

Figure 2.24: Bias in significance maps: p(b=0) vs. p(b=1) estimated from f_{|X_i|}(x) over [0, 2^l) and [2^l, 2^{l+1}) at (a) the most significant levels and (b) the least significant levels or for large variance.

Alternatively, instead of partitioning into significance and refinement bits, we can extract the magnitude bits as raw bit-planes and apply DSC to exploit inter-band correlation. For a raw bit-plane, the events of bit crossover correspond to the regions A_j in Figure 2.25(a).
The probability of raw bit crossover is usually small, as there are only a few samples (X_i, Y_i) that occur inside the A_j. Therefore, DSC can achieve compression. On the other hand, when the significance level l is small, the area of each crossover region decreases (the A_j are square regions with side length 2^l), and the regions become more evenly distributed over the sample space (Figure 2.25(b)). As a result, more samples occur within the crossover regions, and DSC also becomes less efficient as we encode the least significant bit-planes.

Figure 2.25: Raw bit-plane crossover regions A_0, A_1, A_2, ... in the (|X_i|, |Y_i|) plane: (a) most significant levels; (b) least significant levels.

To summarize, both intra coding of significance information and inter-band coding of raw bit-planes can potentially achieve considerable compression for magnitude bits, and configurations (a) and (e) are promising candidates. However, it is unclear which one we should use in different situations to achieve the optimal coding performance.
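The crossover probabilities just described can also be counted directly from co-located samples. The following is a sample-based stand-in for integrating the joint p.d.f. f_{X_i Y_i}(x, y) over the regions A_j (the text derives model-based estimates in Chapter 4; the empirical counting here is our simplification), together with the H(p) rate bound from footnote 7:

```python
import math

def binary_entropy(p):
    """H(p) in bits: the theoretical DSC rate for crossover probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def raw_crossover_rate(xs, ys, l):
    """Empirical crossover probability of the raw magnitude bit-plane at
    significance level l: the fraction of (x, y) sample pairs whose bit of
    weight 2^l differs, i.e., pairs falling inside the off-diagonal square
    regions A_j of the |X|-|Y| plane."""
    def bit(v):
        return (int(abs(v)) >> l) & 1
    return sum(bit(x) != bit(y) for x, y in zip(xs, ys)) / len(xs)

def dsc_bits_raw_plane(n_coeffs, p_raw, margin=0.05):
    """Estimated coded bits for a raw bit-plane coded with DSC:
    N_i * (H(p_raw) + m). The margin value is illustrative."""
    return n_coeffs * (binary_entropy(p_raw) + margin)
```

As l decreases, the bits of |x| and |y| agree less often even for highly correlated samples, so `raw_crossover_rate` rises toward 0.5 and H(p) toward 1 bit, matching the qualitative behavior above.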
Therefore, we propose modeling techniques to precisely analyze their performance for different source characteristics and correlation levels.

2.6.3 Modeling

The modeling techniques estimate the number of coded bits for configurations (a) and (e). Recall that in configuration (a) we encode the entire raw bit-plane using DSC, whereas in (e) we partition the bits into refinement bits and significance maps and encode them separately using DSC and intra coding, respectively (Figure 2.21). To determine the optimal coding strategy, we estimate and compare the number of coded bits at each significance level l and for each wavelet subband i. The estimated number of coded bits for configuration (a) is given by:

    N_i × (H(p_raw(l, i)) + m),    (2.12)

where N_i is the number of uncoded raw bits at a given significance level in wavelet subband i (hence N_i is equal to the number of coefficients in subband i), and p_raw(l, i) is the estimated crossover probability of the raw bit-plane at significance level l for coefficients in wavelet subband i. H(p_raw(l, i)) is the theoretical compression rate using DSC, and we add a margin m to account for the performance of practical systems. The estimates for p_raw(l, i) can be derived by integrating the joint p.d.f. f_{X_i Y_i}(x, y) over the crossover regions A_j shown in Figure 2.25. Details will be provided in Chapter 4.4, where we discuss how to estimate crossover probabilities for the purpose of determining the encoding rate. The estimated number of coded bits for configuration (e) is given by:

    N_ref(l, i) × (H(p_ref(l, i)) + m) + N_signif(l, i) × γ(l, i),    (2.13)

where N_ref(l, i) and N_signif(l, i) are the numbers of uncoded refinement bits and significance map bits at significance level l in subband i, respectively, p_ref(l, i) is the estimate of the refinement bit crossover probability, and γ(l, i) is the compression ratio achieved by intra coding. H(p_ref(l, i)) + m is the compression ratio achieved by a practical DSC scheme, and we can estimate p_ref(l, i) as in Chapter 4. We model the significance map bits as an i.i.d.
binary source and estimate γ(l, i) by H(p_0(l, i)), where p_0(l, i) is the probability that a significance map bit is zero. Assuming the wavelet coefficients in subband i are Laplacian distributed with parameter β_i, i.e., f_{X_i}(x) = (β_i/2) e^{−β_i |x|}, it follows from the definitions of refinement bits and significance maps that N_ref(l, i), N_signif(l, i) and p_0(l, i) can be estimated by

    N_ref(l, i)    = N_i exp(−β_i 2^{l+1}),    (2.14)
    N_signif(l, i) = N_i (1 − exp(−β_i 2^{l+1})),    (2.15)
    p_0(l, i)      = (1 − exp(−β_i 2^l)) / (1 − exp(−β_i 2^{l+1})).    (2.16)

We found that H(p_0(l, i)) is a good estimate of the compression efficiency of zerotree coding.

2.6.4 Adaptive Coding Scheme

We compare (2.12) and (2.13) to determine the optimal coding configuration at each significance level and for each wavelet subband. Figure 2.26 shows the number of coded bits at different significance levels for coding configurations (a) and (e) in two wavelet subbands. The numbers are estimated using (2.12) and (2.13). As shown in the figures, at the most significant levels both schemes can achieve substantial compression, but by compressing the significance map using intra coding it is possible to achieve better coding gain. On the other hand, at the middle significance levels, coding the entire raw bit-plane with DSC achieves better results. As for the least significant bit-planes, neither scheme can achieve much compression, as the bits there are equally likely and do not have much correlation with the corresponding SI.

Based on these modeling results, we propose an adaptive coding scheme: when coding the most significant bit-planes at the beginning, we partition the magnitude bits into refinement bits and significance maps, and apply DSC and intra coding (zerotree coding), respectively (i.e., configuration (e)). Later, at the middle significance levels, we switch to compressing the entire raw bit-planes using DSC (i.e., configuration (a)) (Figure 2.27).
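Equations (2.12)-(2.16) and the per-level configuration choice can be sketched as follows. The margin m and the crossover probabilities p_raw, p_ref are inputs here; in the full system they come from the Chapter 4 models, so the concrete values below are placeholders.

```python
import math

def h2(p):
    """Binary entropy H(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bit_counts(n_i, beta_i, l):
    """Eqs. (2.14)-(2.16): expected refinement / significance-map bit counts
    and significance-map zero probability at level l, for Laplacian
    coefficients with parameter beta_i in subband i."""
    t1 = math.exp(-beta_i * 2.0 ** (l + 1))
    n_ref = n_i * t1
    n_signif = n_i * (1.0 - t1)
    p0 = (1.0 - math.exp(-beta_i * 2.0 ** l)) / (1.0 - t1)
    return n_ref, n_signif, p0

def coded_bits_config_a(n_i, p_raw, m):
    """Eq. (2.12): entire raw bit-plane coded with DSC."""
    return n_i * (h2(p_raw) + m)

def coded_bits_config_e(n_i, beta_i, l, p_ref, m):
    """Eq. (2.13): refinement bits DSC-coded; significance map intra-coded,
    with gamma(l, i) approximated by H(p0(l, i)) as in the text."""
    n_ref, n_signif, p0 = bit_counts(n_i, beta_i, l)
    return n_ref * (h2(p_ref) + m) + n_signif * h2(p0)

def pick_config(n_i, beta_i, l, p_raw, p_ref, m=0.05):
    """Choose the cheaper configuration at one (level, subband) pair."""
    a = coded_bits_config_a(n_i, p_raw, m)
    e = coded_bits_config_e(n_i, beta_i, l, p_ref, m)
    return ('a', a) if a < e else ('e', e)
```

Running `pick_config` down the significance levels of a subband reproduces the switching behavior described above: (e) wins while the significance map is strongly biased, and (a) takes over once the bias fades but the raw-bit crossover is still small.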
We use (2.12) and (2.13) to determine the significance level at which configuration switching should occur. Note that for different subbands, switching could occur at different significance levels. Switching would occur earlier (at a higher significance level) for subbands at high decomposition levels, as intra coding is less efficient there^8. Intuitively, in high decomposition level subbands, coefficients become significant earlier, and zerotree coding becomes inefficient as much partitioning would be needed.

2.7 Experimental Results of Adaptive Codec

This section presents the experimental results of the adaptive DSC-based hyperspectral image codec. We use (2.12) and (2.13) to determine the significance level at which configuration switching occurs for each subband. We compare with the non-adaptive scheme of Section 2.3 [67], which uses zerotree coding for significance maps at all significance levels (i.e., configuration (e) only). We use the NASA AVIRIS image data-sets in the experiment [38]. The original image consists of 224 spectral bands, and each spectral band consists of 614×512 16-bit pixels. In the experiment, we compress 512×512 pixels in each band. Figures 2.28 and 2.29 show some of the results of compressing the images Cuprite (radiance data) and Lunar (reflectance data). Here MPSNR = 10 log10(65535^2/MSE), where MSE is the mean squared error between all the original and reconstructed bands.

^8 Recall that p_0(l, i) is the probability that a significance map bit is zero. We can rewrite (2.16) as p_0(l, i) = (1 − τ)/(1 − τ^2) = 1/(1 + τ), with τ = exp(−β_i 2^l), hence 0 ≤ τ ≤ 1. It can be seen that p_0(l, i) decreases monotonically with increasing τ, with p_0(l, i) = 1 when τ = 0 and lim_{τ→1} p_0(l, i) = 0.5. Accordingly, intra coding becomes less efficient for significance maps when τ is large, i.e., when l is small (at the least significant levels) and when β_i is small (when the coefficient distributions have large variances, as in low-pass subbands and high decomposition level subbands).
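The MPSNR metric defined above can be written as a short sketch (bands are taken as flat pixel lists here for simplicity):

```python
import math

def mpsnr(original_bands, reconstructed_bands):
    """MPSNR = 10 * log10(65535^2 / MSE), with the MSE taken jointly over
    all bands of the 16-bit hyperspectral image, as defined in the text."""
    total, count = 0.0, 0
    for orig, recon in zip(original_bands, reconstructed_bands):
        for o, r in zip(orig, recon):
            total += (o - r) ** 2
            count += 1
    return 10.0 * math.log10(65535.0 ** 2 / (total / count))
```

Because the MSE is pooled over all bands, a high MPSNR can still hide large per-band PSNR swings, which is exactly the fluctuation issue discussed in Section 2.5.1.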
As shown in the figures, the adaptive coding scheme can provide considerable and consistent improvements in all cases, with up to 4 dB gain at some bit-rates.

We also compare the DSC-based systems with several 3D wavelet systems (3D ICER) developed by NASA-JPL [38]. As shown in Figure 2.28, the DSC-based system with adaptive coding is comparable to a simple 3D wavelet system (FY04 3D ICER) in terms of coding efficiency. The simple 3D wavelet system uses the standard dyadic wavelet decomposition and a context-adaptive entropy coding scheme to compress the coefficient bits. However, there is still a performance gap when comparing the DSC-based systems to a more recent and sophisticated version of 3D wavelet coding (Latest 3D ICER). The more recent 3D wavelet codec developed at NASA-JPL exploits the spatial correlation remaining in the correlation noise [38]. This could be one direction for improving the DSC-based systems, which currently use a simple i.i.d. model for the correlation noise and ignore the dependency between correlation noise symbols. We also compare the DSC-based systems with 2D SPIHT, and the DSC-based systems can achieve 8 dB gains at some bit-rates.

2.8 Conclusions

In this chapter, we have demonstrated a viable approach for the compression of hyperspectral imagery. A novel scheme called SW-SPIHT is proposed. Our scheme can facilitate parallel encoding with modest memory requirements. As for coding performance, we have compared our scheme with several existing techniques. Experimental results show that our scheme can achieve competitive coding efficiency. In particular, our scheme is comparable to a simple 3D wavelet codec developed by NASA-JPL in terms of compression performance. Furthermore, our scheme can preserve spectral signatures and obtain good classification performance.

Acknowledgments

The authors would like to thank Sam Dolinar, Aaron Kiely and Hua Xie of NASA-JPL for helpful discussions and for providing data for comparison.
The authors would also like to thank Nazeeh Aranki of NASA-JPL for providing access to the AVIRIS hyperspectral images.

Figure 2.26: Modeling results - estimated number of coded bits vs. significance level for configurations (a) and (e): (a) the 4th subband; (b) the 14th subband.

Figure 2.27: Adaptive coding scheme. The most significant bit-planes use configuration (e) (significance map intra-coded, refinement bits DSC-coded); the middle levels switch to configuration (a) (raw bit-plane DSC-coded); the least significant bit-planes are not coded.

Figure 2.28: Coding performance: Cuprite (radiance, Scene SC01, Bands 131-138), comparing DSC (adaptive), DSC (non-adaptive), Latest 3D ICER (Apr 06), FY04 3D ICER, and SPIHT. Correlation information is estimated by a model-based approach discussed in Chapter 4.
We focus on coding tools to generate a single compressed bit-stream that can be decoded in several di®erent ways, i.e., we assume it is not possible to request at decoding time (via feedback) coded data matching the chosen decoding order. 1 A trivial approach to enable °exible decoding would be to encode every frame independently, as an Intra frame. 72 Flexible decoding can be useful for several applications. Notably, it improves the accessibility of the compressed data, which is important for several emerging applications andforsomenovelimagerydatasets[53]. Forexample,somemultiviewvideoapplications such as free viewpoint TV [35,70] aim to enable users to play back di®erent views and to switchbetweendi®erentviewsduringplayback. Inordertofacilitatethesefreeviewpoint switchings,itisdesirableforthecompressedmultiviewvideodatatobedecodableinsev- eraldi®erentorders, correspondingto di®erentview switchingscenarios[10](Figure 3.1). As another example, new video applications which support forward and backward frame- by-frame playback can bene¯t from compression schemes that allow both forward and backward decoding [16] (Figure 3.2). Moreover,°exibledecodingcanbeusedtoachievemorerobustvideocommunications, in applications where some reference frames may be corrupted during transmission. If a compression scheme can support multiple decoding paths the decoder would be able to recover the current frame using any of several possible error-free references (Figure 3.3) [79]. 3.1.2 Flexible Decoding: Challenges State-of-the-art video coding algorithms exploit redundancy between neighboring frames to achieve compression [82]. Flexible decoding makes it di±cult to exploit this kind of interframe redundancy because decoders can choose di®erent decoding paths, each leading to a di®erent set of previously decoded frames. 
Thus at the time of encoding there will be uncertainty about which frames can be used to predict the current frame (as there is no guarantee that those same frames will be available at decoding time). 73 Decoding path View Time X Y 1 Y 0 Y 2 X X Y 2 Y 2 Y 0 Y 0 Y 1 Y 1 View View (a) (b) (c) Figure3.1: Multiviewvideoapplications-viewpointswitchingmayrequireacompression schemetosupportseveraldi®erentdecodingorders: (a)usersstayinthesameviewduring playback; (b), (c) users switch between adjacent views during playback. X Y 0 Y 1 time … Decoding path Figure 3.2: Forward and backward frame-by-frame playback. X Y 0 Y 1 time … Decoding path Figure 3.3: Robust video transmission using multiple decoding paths. 74 For example, in multiview video applications, depending on whether the user continues requesting the same view as in Figure 3.1(a), or switches views as in Figure 3.1(b) or Figure 3.1(c), either the previous reconstructed frame of the same view (Y 1 ) or that of another view (Y 0 or Y 2 ) would be available as predictor for decoding the current frame X. However, since it is up to the users to choose among di®erent decoding paths, the encoderwouldnotknowexactlywhichreconstructedframeswillbeavailablefordecoding X. Similarly, in a forward/backward video playback application, either the \past" or the \future" reconstructed frame will be available at the decoder to serve as the predictor, dependingonwhetherthedataisbeingplayedbackintheforwardorbackwarddirection (Figure3.2). Sinceuserscanchoosetoplaybackineitherdirection,theencoderwouldnot know which reconstructed frame will be available at the decoder. Similar scenarios can also arise in low delay video communication, where decoder feedback could be infeasible. Inthesecasestheencodermaynothaveanyinformationregardingwhichreferenceframes have arrived at the decoder error-free and would be available for decoding the current frame (Figure 3.3). 
In short, °exible decoding, while desirable, results in uncertainty on the predictor status at decoder. Figure 3.4 depicts the general formulation of the °exible decoding problem. When compressing an input source X (the current video frame), the encoder has access to a number of correlated sources Y 0 ;Y 1 ;:::;Y N¡1 (previously decoded video frames) to serve as predictors for encoding X. Here each Y k is associated with a possible decoding path. However, of these predictor candidates, only one will be available at the decoder de- pending on the decoding path it takes. Crucially, since the encoder does not have any information regarding the chosen decoding path, it does not know which Y k will be used 75 Encoder Decoder X X ^ Y 0 , Y 1 , Y 2 , …, Y N-1 Y 0 or Y 1 or Y 2 … or Y N-1 (Encoder does not know which one) Figure 3.4: Problem formulation for °exible decoding. Either one of the candidate pre- dictors Y 0 ;Y 1 ;:::;Y N¡1 will be present at the decoder, but encoder does not know which one. at the decoder. Our goal is to investigate coding algorithms such that the encoder can operate under this kind of uncertainty about predictor status at the decoder. In order to support °exible decoding within a conventional closed-loop prediction (CLP)framework,e.g.,motion-compensatedpredictive(MCP)videocodingsystemssuch as MPEG or H.26X, the encoder may send all the possible prediction residues fZ i ;i=0 to N ¡1g to the decoder, where Z i = X ¡Y i (following the notations in Figure 3.4), so that X can be recovered no matter which Y i is available at the decoder. Each Z i would correspond to a P-frame in these video coding standards. Note that it is indeed necessary for the encoder to communicate all the N possible prediction residues to the decoder. This is because, in CLP, a prediction residue would be \tied" to a speci¯c predictor. 
For example, if Y_k is the available predictor at the decoder, then we can only use Z_k during the decoding process to recover X without causing significant mismatch. Therefore, in the case of predictor uncertainty, the encoder would need to send multiple prediction residues. Thus there are two potential issues with the CLP approach. First, coding performance is degraded because multiple prediction residues are included in the bitstream. Specifically, the overhead to support flexible decoding tends to increase with the number of candidate predictors (or the number of possible decoding paths). Second, this approach may cause drifting. This is because, in practical video compression systems, the encoder would send the quantized versions of Z_i, denoted Ẑ_i, to the decoder. Therefore, the reconstructed sources X̂_i = Ẑ_i + Y_i would be slightly different when different Y_i are used as predictors. Drifting may occur when X̂_i is used as a reference for decoding future frames.

The H.264 video compression standard has defined SP- and SI-frames to support functionalities such as random access or error recovery that were originally supported by I-frames [37]. Essentially, SP-frames follow the CLP coding approach we just discussed, but with modifications such that X̂_i can be identically reconstructed from different Y_i's using its corresponding Z_i (here Z_i corresponds to a primary or secondary SP-frame). This is achieved by using a different prediction loop from that in conventional P-frames (e.g., SP-frames compute the prediction residue w.r.t. the quantized reconstruction in the transform domain, whereas P-frames would compute it w.r.t. the original image in the pixel domain [37]). However, this causes some penalty in coding performance, and the compression efficiency of SP-frames is in general worse than that of P-frames [37].
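The drifting issue described above - the reconstructions X̂_i = Ẑ_i + Y_i differing across predictors once residues are quantized - can be seen in a toy scalar sketch. The sample values and the uniform quantizer below are hypothetical, purely for illustration:

```python
def quantize(z, step):
    """Uniform scalar quantizer: map z to its nearest reconstruction level."""
    return step * round(z / step)

def clp_reconstructions(x, predictors, step):
    """For each candidate predictor y_i, quantize the residue z_i = x - y_i and
    reconstruct x_hat_i = y_i + Q(z_i), as a CLP decoder would do."""
    return [y + quantize(x - y, step) for y in predictors]

# Hypothetical sample value x = 100 with two candidate predictors.
print(clp_reconstructions(100, [90, 97], step=8))  # [98, 97]: reconstructions differ
```

With a lossless residue the two paths would agree; quantization makes the reconstructed value depend on which predictor was used, which is exactly the source of drift.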
To support flexible decoding, different SP-frame bits (each corresponding to a different Y_i) need to be generated and sent to the decoder, similar to CLP coding; therefore, H.264 SP-frames would incur a comparable amount of overhead to CLP coding. It should be noted that most H.264 SP-frame applications assume the availability of feedback from the decoder (e.g., Zhou et al. [87]), so that the encoder does know which predictor is available at the decoder and transmits only one of the Z_i. In short, H.264 SP-frames could be inefficient for supporting flexible decoding when there is no feedback.

3.1.3 Our Contributions and Related Work

In this chapter, we propose to apply distributed source coding (DSC) [28,56,63,84,85] to address the flexible decoding problem, where the encoder has access to all predictors Y_k, which will play the role of side information (SI) at the decoder, but there is uncertainty as to which one will be used for decoding. One of the main challenges for DSC-based applications has proven to be achieving competitive compression efficiency [28]. To address this challenge, our proposed algorithm incorporates novel macroblock modes and significance coding into the DSC framework. This, along with careful exploitation of correlation statistics, allows us to achieve significant performance improvements. Using multiview video and forward/backward video playback as examples, we demonstrate that the proposed algorithm can outperform, in terms of coding efficiency, techniques based on CLP coding such as those discussed above. Moreover, the proposed algorithm incurs only a small amount of drifting. In particular, DSC-coded macroblocks lead to the same reconstruction no matter which predictor candidate Y_k is used. DSC has been studied extensively for enabling low-complexity video encoding, e.g., Puri and Ramchandran [59] and Aaron et al. [4].
However, there are significant differences between low-complexity encoding and flexible decoding, as summarized in Table 3.1, which will lead us to a different solution. DSC has also been proposed to provide random access in compression of image-based rendering data/light fields in Jagmohan et al. [36] and Aaron et al. [2], and of multiview video data in Guo et al. [32]. This prior work, however, assumes that the encoder has knowledge of the predictor status at the decoder, notably through feedback, while in our case the encoder needs to operate with unknown predictor status. Recent work by Wang et al. [79] has proposed a DSC-based approach to address the problem of robust video transmission by allowing a video block to be decoded using more than one predictor block. While the general philosophy is similar to ours, different assumptions are made. In particular, this work assumes the encoder knows the probability that each predictor will be used, as determined by the packet erasure probability (whereas we assume all predictors are equally likely to be used). This information is exploited to reduce the coding rate. In addition, the specific tools used are different from those proposed here. Recent work by Naman and Taubman [52] has proposed to enhance decoding flexibility and accessibility using intra coding and conditional replenishment. Some information-theoretic aspects of flexible decoding have also been studied independently in the recent work of Draper and Martinian [22], which we will briefly discuss. However, no practical coding algorithm is proposed in that work. Reverse playback of video, specifically focusing on MPEG coding algorithms, was discussed in Wee and Vasudev [81], Lin et al. [42] and Fu et al. [25]. Our previous work, Cheung et al. [16], has also proposed to apply DSC to enable forward/backward video playback. The algorithm proposed in the present chapter is, however, considerably different and significantly more efficient.
Among the key improvements are the introduction of macroblock modes and significance coding, a different approach to exploiting the correlation between source and side information, a different way to partition the input symbols and estimate the source bits' conditional probabilities, and a minimum-MSE dequantization. This chapter is organized as follows. In Section 3.2 we discuss how DSC can address flexible decoding. A comparison of theoretically achievable performances is provided in Section 3.3. In Section 3.4 we present the proposed compression algorithm. Section 3.5 discusses briefly some application scenarios. Section 3.6 presents the experimental results and Section 3.7 concludes the work.

Table 3.1: Comparison of DSC-based low-complexity encoding and flexible decoding.

                      DSC-based low-complexity      DSC-based flexible video
                      video encoding [4,59]         decoding
  Key objective       Low-complexity video          Generate a single bitstream
                      encoding for mobile video,    to support multiple decoding
                      video sensors, etc.           paths for forward and
                                                    backward video playback,
                                                    multiview video, video
                                                    transmission, etc.
  Encoding            Most target applications      Not a primary issue. Most
  complexity          require low-complexity,       target applications may use
                      real-time encoding.           off-line encoding.
  Encoder access      SI not accessible by the      Encoder has access to all
  to the side         encoder due to complexity     the SI candidates. However,
  information         constraints.                  the exact one to be used at
                                                    the decoder is unknown to
                                                    the encoder.

3.2 Flexible Decoding Based on DSC: Intuition

In conventional CLP coding, the encoder computes a prediction residue Z = X - Y, between source X and predictor Y, and communicates Z to the decoder (Figure 3.5(a)). DSC approaches the same compression problem from a "virtual communication channel" perspective [28,85]. Specifically, X is viewed as the input to a channel with correlation noise Z, and Y as the output of the channel (Figure 3.5(b)). Therefore, to recover X from Y, the encoder would send parity information to the decoder.
That is, in DSC, the encoder would communicate X using parity information. Significantly, the parity information is independent of the specific Y being observed: the parity information is computed entirely from X, taking into account the statistics of Z.² In particular, what matters in the DSC approach is, analogous to data communication, the amount of parity information corresponding to the statistics of Z. Thus the decoder will be able to recover X as long as a sufficient amount of parity information has been communicated. In short, in DSC, the information communicated from encoder to decoder is independent of a specific Y, in contrast to CLP, where the encoder would communicate to the decoder the prediction residue, which is tied to a specific Y.

To understand how DSC can tackle flexible decoding with N predictor candidates, consider N virtual channels, each corresponding to a predictor candidate Y_i (Figure 3.5(c)). Each channel is characterized by the correlation noise Z_i = X - Y_i. In order to recover X from any of these channels, the encoder would need to send an amount of parity sufficient for all the channels. In particular, the encoder would need to transmit enough parity information to allow decoding of the worst-case Z_i. Doing so, X can be recovered no matter which Y_i is available at the decoder. Note that the encoder only needs to know the statistics of all the Z_i to determine the amount of parity information, and this is feasible since X and all Y_i are accessible at the encoder in our problem formulation. In particular, the encoder does not need to know which Y_i is actually present at the decoder. Since the parity information is independent of a specific Y_i, the same parity information, generated based on the worst-case Z_i, can be used to communicate X no matter which Y_i is available at the decoder.

² As will be discussed in Section 3.4, we encode the bit-plane representation of X using DSC, and parity information is computed by XOR-ing subsets of bits in the bit-planes of X.
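As a minimal, self-contained illustration of this syndrome/parity idea, the sketch below uses the tiny (7,4) Hamming code in place of the long LDPC codes used in the actual system, and assumes the side information differs from the source block in at most one bit:

```python
# Parity-check matrix of the (7,4) Hamming code; column j is the binary
# representation of j+1, so a single-bit error is located by its syndrome.
H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]

def syndrome(bits):
    """s = H * bits (mod 2): 3 parity bits summarizing a 7-bit block."""
    return tuple(sum(h * b for h, b in zip(row, bits)) % 2 for row in H)

def sw_decode(y, s):
    """Recover x from side information y (assumed to differ from x in at most
    one position) and the 3-bit syndrome s sent by the encoder."""
    diff = tuple(a ^ b for a, b in zip(syndrome(y), s))
    if diff == (0, 0, 0):
        return list(y)                         # y already equals x
    pos = int("".join(map(str, diff)), 2) - 1  # H column matching the syndrome
    x = list(y)
    x[pos] ^= 1                                # flip the located bit
    return x

x = [1, 0, 1, 1, 0, 0, 1]   # source bits (one 7-bit block)
y = [1, 0, 1, 0, 0, 0, 1]   # side information: one "correlation error"
s = syndrome(x)             # encoder sends only these 3 parity bits
print(sw_decode(y, s) == x) # True: x recovered from y plus the syndrome
```

Note that the 3 syndrome bits depend only on x and on the assumed noise statistics (at most one bit flip), not on which y the decoder happens to hold - the same syndrome works for any side information within that correlation level.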
Figure 3.5: Compression of input source X: (a) CLP coding; (b) DSC from the virtual channel perspective; (c) DSC approach to the flexible decoding problem.

3.3 Theoretical Performance

In this section we compare the theoretical performance of CLP and DSC in a flexible decoding scenario. We consider the compression of a discrete i.i.d. scalar source X under the scenario depicted in Figure 3.4. The predictor candidates Y_i, i = 0 to N-1, are discrete i.i.d. scalar sources such that X = Y_i + Z_i, Y_i ⊥ Z_i, where the Z_i are the discrete i.i.d. scalar prediction residues. As previously discussed, in CLP the overhead to address flexible decoding increases with the number of predictors N, while in DSC the overhead depends mainly on the worst-case correlation noise. Specifically, in the CLP approach, all the residues {Z_i, i = 0 to N-1} would have to be sent to the decoder, which theoretically would require an information rate

    R_CLP = Σ_{i=0}^{N-1} H(Z_i).    (3.1)

On the other hand, in the DSC approach, the information rate required to communicate X with Y_i at the decoder is H(X|Y_i), and using X = Y_i + Z_i and Y_i ⊥ Z_i, we have H(X|Y_i) = H(Z_i). Under the scenario of side-information uncertainty, we would need to communicate X at a rate³

    R_DSC = max_i H(X|Y_i) = max_i H(Z_i).    (3.2)

It is clear that the encoder needs to communicate at least max_i H(X|Y_i) bits so that X can be recovered whichever Y_i is available at the decoder. To show that max_i H(X|Y_i) is indeed achievable, we use the source networks approach proposed by Csiszar and Korner [20,21]. A source network is a graphical abstraction of a multiterminal source coding problem, with information sources, encoders and destinations located at its vertices.
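As a quick numerical check of (3.1) versus (3.2), the sketch below compares Σ_i H(Z_i) and max_i H(Z_i) using empirical entropies; the residue samples are hypothetical stand-ins for quantized DCT residue data:

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Empirical entropy H (bits/symbol) of a list of discrete samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Hypothetical quantized residues Z_i = X - Y_i for N = 3 predictor candidates:
# a well-correlated predictor gives residues concentrated around zero.
residues = [
    [0, 0, 0, 1, -1, 0, 0, 1, 0, 0],    # Z_0: strong correlation
    [0, 1, -1, 0, 2, 0, -1, 0, 1, 0],   # Z_1: weaker correlation
    [2, -1, 0, 3, -2, 1, 0, -1, 2, 0],  # Z_2: worst case
]
r_clp = sum(entropy(z) for z in residues)  # (3.1): send every residue
r_dsc = max(entropy(z) for z in residues)  # (3.2): parity for the worst case
assert r_dsc <= r_clp                      # DSC never needs more rate than CLP
```

Adding another predictor candidate always grows r_clp, but grows r_dsc only if the new residue is the noisiest one seen so far - the scaling behavior measured later in the experiments.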
Each encoder operates on the input messages from the sources connected to it, and the resulting codeword is made available to all destinations connected to the encoder through noiseless communication channels. Each destination must be able to reproduce accurately the messages from certain specified sources based on the received codewords. In particular, the source networks in Csiszar and Korner [21] focus on (i) discrete memoryless sources, (ii) graphs in which no edge joins two sources, two encoders or two destinations, and (iii) graphs in which the destinations are required to reproduce the messages of the specified sources with small probability of error (i.e., lossless data compression). Figure 3.6(a) shows an example of a source network representing the Slepian-Wolf problem. For a certain subclass of source networks, Csiszar and Korner [21] derived exponential error bounds which are tight in a neighborhood of the boundary of the achievable rate region. In addition, these bounds were shown to be universally attainable, i.e., they are

³ To be more precise, R_CLP and R_DSC are the best (minimum) achievable rates.
Figures 3.7 and 3.8 depict forsomeempiricaldataH(X)(i.e., intracoding), P i H(Z i )(i.e., R CLP )andmax i H(Z i ) (i.e., R DSC ) in di®erent applications and show how these quantities could vary with the number of predictors. Note that we assume the sources (i.e., quantized DCT coe±cients of images, X, or that of residues, Z i ) consist of independent elements and estimate the coding performances by H(X) and H(Z i ), whereas in practice there could be some cor- relation exist between the elements and that could be exploited to reduce coding rates. Notethatsometheoreticalaspectof°exiblevideodecodingwasalsostudiedbyDraper andMartinian[22]. Inparticular,(3.2)wasindependentlyprovedin[22],usingadi®erent approach based on some extension of the random binning arguments. [22] also discussed the improved achievable error exponents compared to those of conventional Slepian-Wolf problem. 84 X Y ^ ^ X Y source encoder destination (a) X Y 0 ^ ^ X Y 0 X Y 1 ^ ^ X Y 1 X Y N-1 ^ ^ X Y N-1 … (b) Figure 3.6: Source networks: (a) Source network of Slepian-Wolf [21,63]. Csiszar and Korner [20,21] suggest an achievable rate H(XjY) for communicating X is universally attainable. (b) Source networks of °exible decoding with predictor uncertainty. The same universal codes can be used to communicate X in any of these networks at a rate H(XjY i ). View Switching in Multiview Video 0 0.2 0.4 0.6 0.8 1 1.2 1 2 3 4 5 No. of predictors Entropy (bits) H(X), i.e., Intra Sum H(Zi), i.e., CLP Max H(Zi), i.e., DSC Multiview Video (Three predictor candidates) 0 0.2 0.4 0.6 0.8 1 1.2 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 Frame no. Entropy (bits) H(X), i.e., Intra Sum H(Zi), i.e., CLP Max H(Zi), i.e., DSC 3 1 0 2 4 v-2 v-1 v v+1 v+2 View (a) (b) (c) Time C C: current frame DSC CLP Intra Figure3.7: Theoreticalperformancesofintracoding,CLPandDSCina°exibledecoding scenario: Multiview video coding as in Figure 3.1. 
(a) Previously reconstructed frames of neighboring views are used as predictor candidates following the depicted order; (b) entropy of the quantized DCT coefficients (as an estimate of the encoding rate) vs. number of predictor candidates. The results are the average of 30 frames using Akko&Kayo, view 28. (c) Entropy of each frame in the case of three predictor candidates.

Figure 3.8: Theoretical performances of intra coding, CLP and DSC in a flexible decoding scenario: robust video transmission as in Figure 3.3. (a) Past reconstructed frames are used as predictor candidates following the depicted order; (b) entropy of the quantized DCT coefficients (as an estimate of the encoding rate) vs. number of predictor candidates. The results are the average of 30 frames using Coastguard. (c) Entropy of each frame in the case of three predictor candidates.

3.4 Proposed Algorithms

Figure 3.9 depicts the proposed video encoding algorithms to address flexible decoding based on DSC [11]. These are described next.

3.4.1 Motion Estimation and Macroblock Classification

Each macroblock (MB) M in the current frame first undergoes standard motion estimation (and disparity estimation in the case of the multiview video application) w.r.t. each candidate reference frame f_i, and the corresponding motion information (one per reference frame f_i) is included in the bitstream, i.e., the encoder sends N motion vectors to the decoder. Denote by A_i the best motion-compensated predictor for M obtained in f_i. If the difference between M and A_i is sufficiently small, M may be classified as being in skip mode w.r.t.
f_i (Figure 3.9). In that case, since the encoder can skip some prediction residues and does not need to communicate all N residues, the overhead of including multiple prediction residues using CLP could be small. Specifically, in our current implementation, each macroblock M can be in either skip mode or non-skip mode. A macroblock will be encoded using the skip mode when, out of the N residue blocks between M and A_i, 0 ≤ i ≤ N-1, at least one of the residue blocks is a zero block after quantization (i.e., all the quantized transform coefficients in the block are zero). In skip mode, M is encoded using conventional CLP coding (similar to standard H.26X algorithms) w.r.t. the candidate reference frames which do not have skipping. However, the majority of the macroblocks will be classified into the non-skip mode and encoded using DSC, following the steps discussed in the next section.

Note that choosing between CLP and DSC for a given macroblock can be achieved using rate-distortion (RD) based mode selection (as in H.264): the RD costs of CLP and DSC are computed and the one achieving the minimum RD cost is selected. Such an RD-optimized mode decision algorithm can achieve better coding performance, at the expense of higher encoding complexity. In our comparison with H.263 (Section 3.6) we did not use this RD-optimized mode decision. As will be discussed, we implemented our proposed algorithms mainly based on H.263 coding tools.

Figure 3.9: Proposed encoding algorithm to encode a macroblock M of the current frame.

Figure 3.10: Encoding macroblock M using DSC.
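The skip-mode test of Section 3.4.1 can be sketched as follows; the helper names and the flat uniform quantizer are hypothetical (a real encoder would operate on 8×8 quantized transform blocks):

```python
def quantize_block(coeffs, step):
    """Illustrative uniform quantization of a block of transform coefficients."""
    return [int(c / step) for c in coeffs]

def is_skip_mode(mb, candidates, step):
    """Skip mode: at least one of the N residue blocks M - A_i quantizes to an
    all-zero block (every quantized transform coefficient equal to zero)."""
    for a in candidates:
        residue = [m - c for m, c in zip(mb, a)]
        if all(q == 0 for q in quantize_block(residue, step)):
            return True
    return False

mb = [10.0, 4.0, -2.0]    # hypothetical coefficients of macroblock M
close = [9.0, 3.5, -1.0]  # A_0: residue quantizes to all zeros
far = [2.0, 20.0, 6.0]    # A_1: residue survives quantization
print(is_skip_mode(mb, [close, far], step=8))  # True (A_0 yields a zero block)
print(is_skip_mode(mb, [far], step=8))         # False -> non-skip, coded with DSC
```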
3.4.2 Direct Coefficient Coding (DCC)

For those macroblocks M to be encoded with DSC, we first apply a standard 8×8 DCT to the pixel data to obtain the vector of transform coefficients. Denote by X a DCT coefficient. We then quantize X to obtain the quantization index W (Figure 3.10). This is similar to intra-frame coding in standard H.26X algorithms. Denote by Y_i the DCT coefficient in A_i corresponding to W (recall A_i is the best motion-compensated predictor from each f_i). We compress W by exploiting its correlation with the worst-case Y_i, so that it can be recovered with any Y_i that may be present at the decoder. Specifically, based on a correlation model between W and Y_i (to be discussed in Section 3.4.5), the encoder can estimate the coding rate needed to communicate W when Y_i is available at the decoder. The encoder then communicates W by sending an amount of parity information equal to the maximum of these estimated coding rates. Since both W and Y_i are available at the encoder in our problem, the correlation model can be readily estimated.

The quantized values of the K lowest-frequency DCT coefficients (along a zig-zag scan order) are encoded with direct coefficient coding (DCC), and for the rest we use significant coefficient coding (SCC). In DCC, we form the k-th frequency coefficient vector by grouping together the k-th (0 ≤ k ≤ K-1) frequency quantized coefficients, W_k, from all the 8×8 blocks in a frame (except those in skip mode). Each of these vectors is then converted into a raw bit-plane representation, and the bit-planes are passed to a Slepian-Wolf (SW) coder, where inter-frame correlation is exploited to compress the bit-planes losslessly. Note that DCC leads to L_k bit-planes for the k-th frequency coefficient vector, where L_k = ⌈log2(max|W_k| + 1)⌉.

3.4.3 Significant Coefficient Coding (SCC)

The quantized values of the higher-frequency coefficients, k ≥ K, are encoded using SCC.
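Before turning to the details of SCC, the bit-plane formation of DCC and the level count L_k just described can be sketched as follows (magnitude bit-planes only; in this sketch the signs would be conveyed separately):

```python
from math import ceil, log2

def num_bitplanes(wk):
    """L_k = ceil(log2(max|W_k| + 1)): bit-planes needed for frequency vector W_k."""
    m = max(abs(w) for w in wk)
    return ceil(log2(m + 1)) if m > 0 else 0

def extract_bitplanes(wk):
    """Raw bit-planes of the coefficient magnitudes, most significant first."""
    L = num_bitplanes(wk)
    return [[(abs(w) >> l) & 1 for w in wk] for l in range(L - 1, -1, -1)]

wk = [3, 0, -2, 5]        # hypothetical k-th frequency vector W_k for one frame
print(num_bitplanes(wk))  # 3, since max|W_k| = 5 and ceil(log2(6)) = 3
print(extract_bitplanes(wk))  # [[0, 0, 0, 1], [1, 0, 1, 0], [1, 0, 0, 1]]
```

Each of these L_k bit-planes is what the Slepian-Wolf coder of Section 3.4.4 compresses, starting from the most significant one.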
Specifically, we first use a significance bit s to signal whether the quantized value of a coefficient is zero (s = 0) or not (s = 1), so that distributed coding is used to communicate only the values of non-zero coefficients. The significance bits of all the k-th frequency coefficients in the frame (one for each 8×8 block, except those in skip mode) are grouped together to form a significance bit-plane to be compressed by the SW coder. The non-zero coefficients, on the other hand, are grouped together to form coefficient vectors in which all the DCT frequencies are combined, as we found that the correlation statistics of non-zero coefficients are similar at different frequencies.

SCC is introduced as an alternative to DCC to reduce the number of source bits to be handled in SW coding. Specifically, assume DCC leads to L_k bit-planes for the k-th frequency coefficient vector. Then each k-th frequency coefficient contributes L_k source bits in DCC, regardless of whether the coefficient is zero or not. With SCC, on the other hand, a zero coefficient contributes one source bit (the significance bit), while a non-zero coefficient contributes approximately 1 + L_k bits. If p_k is the probability that the k-th frequency coefficient is zero, then the expected number of source bits using SCC is

    1 × p_k + (1 + L_k) × (1 - p_k),    (3.3)

and SCC can lead to rate savings (compared with DCC) if the expected number of bits using SCC, i.e., (3.3), is less than that of DCC, i.e., L_k, or equivalently if

    p_k > 1/L_k    (3.4)

holds. Therefore, SCC can achieve rate savings when coefficients are likely to be zero. In the experiments, we use K = 3 (where SCC starts), determined using (3.4) and some statistics of the video sequences.

3.4.4 Bit-plane Compression

Bit-planes extracted from the K coefficient vectors produced in DCC, along with those produced in SCC, are compressed by a SW coder, starting from the most significant bit-planes. Denote a bit in the bit-plane at the l-th level of significance by a binary r.v.
b(l), where l = 0 corresponds to the least significant level. That is, b(l) is the l-th significant bit of the quantization index W. A binary r.v. b(l) is to be compressed using Y_i and the decoded bits b(l+1), b(l+2), ... as side information. Specifically, this is performed by a low-density parity-check (LDPC) based SW encoder, which computes syndrome bits from the original bit-planes and sends them to the decoder [43].

3.4.5 Model and Conditional Probability Estimation

SW decoding needs the conditional probability p(b(l) | Y_i, b(l+1), b(l+2), ...), estimated from the SI, to aid in recovering b(l). This probability can be estimated as follows. The encoder estimates the conditional p.d.f. f_{X|Y_i}(x|y_i) for each coefficient vector and for each candidate predictor. Assuming a system model X = Y_i + Z_i, and under the assumption of independence of Y_i and Z_i, we have

    f_{X|Y_i}(x|y_i) = f_{Z_i}(x - y_i).    (3.5)

We assume Z_i is Laplacian distributed, i.e., f_{Z_i}(z_i) = (α_i/2) e^{-α_i |z_i|}, estimate the model parameters α_i at the encoder using maximum likelihood estimation (MLE), and send them to the decoder. Note that in the flexible decoding problem, the encoder can access all the candidate SIs. Therefore, the model parameters can be readily estimated. This is not the case in typical DSC applications, where constraints on accessing the side information at the encoder make model estimation a non-trivial problem [15].

Given all the model parameters α_i, the decoder can estimate the conditional probability for whichever Y_i is available at the decoder using the following procedure (Figure 3.11). Denote by W̃ the numerical value of the concatenation of the sequence of decoded bits b(l+1), b(l+2), ..., i.e., W̃ = b(l+1) × 2^0 + b(l+2) × 2^1 + .... Given the decoded bits, the quantization index W can range only from W̃ × 2^{l+1} to W̃ × 2^{l+1} + 2^{l+1} - 1. When
When 92 W 2 [W r ;W s ], b(l) = 0, and when W 2 [W t ;W u ], b(l) = 1, where W r ;W s ;W t ;W u are given by (in the cases when ~ W ¸0): W r = ~ W £2 l+1 ; W s = ~ W £2 l+1 +2 l ¡1; W t = ~ W £2 l+1 +2 l ; W u = ~ W £2 l+1 +2 l+1 ¡1: (3.6) Equations for ~ W < 0 are similar. Therefore, the decoder can estimate the probabilities that b(l) will be zero and one by integrating f XjY i (xjy i ) over the intervals [X r ;X s ] and [X t ;X u ]respectively, where[X r ;X s ]istheinversequantizationmappingof[W r ;W s ], and [X t ;X u ] is that of [W t ;W u ]. Note that each Y i exhibits di®erent levels of correlation with respect to b(l). There- fore, the l-th signi¯cant bitplane comprised of bit b(l) and extracted from the current video frame would require a di®erent number of syndrome bits to be recovered, when a di®erent candidate reference frame f i is available at the decoder. Denote R i this number of syndrome bits. To ensure that the l-th signi¯cant bitplane can be recovered with any of the candidate decoder reference f i , the encoder could send R=max R i syndrome bits to the decoder. By doing so, each bit-plane and hence bit b(l) can be exactly recovered no matter which candidate reference is available at the decoder. Therefore, W can be losslessly recovered and X reconstructed to the same value when any of the Y i is used as predictor. This eliminates drifting in DSC-coded macroblocks. 93 W r W s W t W u W * 2 l+1 ~ X r X s X t X u X W Yi b(l)=0 b(l)=1 p(b(l)=0) p(b(l)=1) p(W | Y i ) f(X | Y i ) Inverse quantization Figure 3.11: Estimate the conditional probability p(b(l)jY i ;b(l+1);b(l+2);:::). 3.5 Application Scenarios While the main objective of the present chapter is to propose coding techniques to facil- itate °exible decoding, we will brie°y discuss in this section some application scenarios that can bene¯t from the coding structures enabled by the proposed tools. 
One of these scenarios could be storage-type applications, where the video data is pre-encoded and the entire bitstream is made available to users through some storage medium, e.g., DVD. In these applications, the proposed coding techniques can lead to new coding structures that require less decoding complexity, i.e., a smaller number of computations and less memory buffering. As an example, consider multiview video applications, where conventionally an individual view could be encoded independently (i.e., simulcast) or multiple views could be compressed jointly; viewpoint switching would require reconstruction of several additional video frames (predictor frames) which the users did not request (Figure 3.12). In other words, the decoder would have to decode some extra frames that would not be displayed. The proposed techniques can, on the other hand, reduce this extra decoding overhead, since they could lead to coding structures where the current frame can be recovered using any one of several predictor candidate frames, and no additional processing would be needed as long as some of these candidates were previously requested by the users and hence are already decoded.

Another application scenario that can benefit from the proposed coding techniques could be client-server video streaming, where the server sends the client (decoder) only the part of the bitstream needed for decoding the video frames requested by users. In this case, the proposed techniques can reduce, in addition to decoding complexity, the amount of information sent to the client. This is possible because some of the predictor candidates may have been previously requested by the users and already communicated to the client. Therefore, the current frame can be decoded without requesting extra information.
Note that in client-server applications, a DSC-based approach to flexible decoding could lead to a smaller total amount of pre-encoded data stored at the server, while a CLP-based approach may result in less data being transmitted to the client, since in this case the client could inform the server about the predictor status and only the prediction residue matching the available predictor would need to be communicated to the client.

3.6 Experimental Results and Discussion

3.6.1 Viewpoint Switching in Multiview Video Coding

This section presents the experimental results. We first discuss our experiments on multiview video coding (MVC). Here we generate compressed multiview bit-streams that allow switching from the adjacent views as in Figure 3.1. Therefore, there are three predictor candidates. We compare the coding performance of the following algorithms for generating the bit-stream: (i) intra coding using H.263 I-frames; (ii) the CLP approach, with each of the three residues encoded using H.263 P-frames; (iii) the proposed DSC-based algorithm, with H.263 half-pel motion estimation and quantization. Since we implement all the schemes using the same (H.263) coding tools (e.g., half-pixel accuracy motion estimation), the comparison is fair.

Figure 3.12: Different coding structures in a multiview video application. Shaded video frames are those that need to be decoded to enable the depicted decoding path. (a) Simulcast; (b) new coding structure enabled by the proposed tools, where "S" denotes video frames that can be decoded using any one of the predictor candidates. Note that in simulcast the decoder needs to decode some extra video frames that users did not request, e.g., the first three frames of the v-th view in the depicted example. In client-server applications, the bitstream corresponding to these extra frames also needs to be sent to the client.
We compare the schemes using the MVC sequences Akko&Kayo and Ballroom, which are 320×240 and encoded at 30 fps and 25 fps respectively. Figures 3.13 and 3.14 show the comparison results. As shown in the figures, the proposed algorithm outperforms CLP and intra coding, with about a 1 dB gain in the medium/high picture quality range (33-36 dB). We also compare the approaches in terms of drifting by simulating a scenario where viewpoint switching from the (v-1)-th view to the v-th view occurs at frame number 2. Figure 3.15 compares the PSNR of the reconstructed frames within the GOP with that of the non-switching case, where the v-th view is played back throughout the GOP. As shown in the figure, while CLP (using P-frames) may cause a considerable amount of drifting, the proposed algorithm is almost drift-free, since the quantized coefficients in DSC-coded macroblocks are identically reconstructed.

We also evaluate how the coding performance of the proposed system scales with the number of predictor candidates. In this experiment, the temporally and spatially adjacent reconstructed frames are used as predictor candidates following the order depicted in Figure 3.16. As shown in Figure 3.17, the bit-rate of the DSC-based solution increases at a much slower rate than that of its CLP counterpart. This is because, with the DSC approach, an additional predictor candidate causes a bit-rate increase (when coding a bit-plane) only if it has the worst correlation among all the predictor candidates (w.r.t. that bit-plane).

Figure 3.13: Simulation results of multiview video coding: Akko&Kayo (320×240, 30 fps, GOP=30). The results are the average of the first 30 frames (at the switching points) from 3 different views (views 27-29) arbitrarily chosen from all the available views.
3.6.2 Forward/Backward Video Playback

We then discuss our experiments on the forward/backward playback application, where there are two predictor candidates (Figure 3.2). We compare our proposed algorithm with a CLP approach where both forward predicted H.263 P-frames and backward predicted H.263 P-frames are included. As discussed, such an approach may incur drifting, since in general the reconstructed forward and backward predicted P-frames are not identical. We compare the schemes using the sequences Coastguard and Stefan, which have considerable amounts of motion and picture detail. As shown in Figures 3.18 and 3.19, the proposed algorithm outperforms CLP and intra coding. We also show the results of "normal" H.263 inter-frame coding (i.e., including only the forward prediction residue) with the same GOP sizes. Note that inter-frame coding cannot support flexible decoding. The results are shown here for reference only.

Figure 3.14: Simulation results of multiview video coding: Ballroom (320×240, 25fps, GOP=25). The results are the average of the first 30 frames (at the switching points) from 3 different views (views 3rd-5th) arbitrarily chosen from all the available views.

We also compare the various approaches in terms of drifting with the following experiment: in forward decoding, a backward predicted frame is used for frame number 1 and as a reference for decoding the following frame. This is similar to what would happen when the decoding direction switches from backward to forward. As shown in the results in Figure 3.20, the proposed algorithm incurs a negligible amount of drifting.

3.7 Conclusions and Future Work

We have proposed a video compression algorithm to support flexible decoding, based on DSC. The proposed algorithm integrates macroblock mode and significance coding to improve coding performance.
Simulation results using MVC and forward/backward video playback demonstrate that the proposed DSC-based algorithm can outperform the CLP approach, while incurring only a negligible amount of drifting. Future work includes investigating improved model estimation methods.

Acknowledgment

The work was supported in part by NASA-JPL. The authors would like to thank Matthew Klimesh of NASA-JPL for pointing out the theoretical results by Csiszar and Korner.

Figure 3.15: Drifting experiment using Akko&Kayo view 28th: (a) CLP; (b) DSC. GOP size is 30 frames. Note that with DSC, the PSNR is almost the same in the switching and non-switching cases.

Figure 3.16: Scaling experiment: in this experiment, the temporally and spatially adjacent reconstructed frames are used as predictor candidates following the depicted order.

Figure 3.17: Scaling experiment using sequence Akko&Kayo view 29th. The PSNR of the different schemes is comparable - Intra: 35.07dB, CLP: 34.78dB, DSC: 34.79dB.

Figure 3.18: Simulation results of forward/backward video playback: Coastguard (CIF, 30fps, GOP=15). Results are reported for the average of the first 30 frames. Note that H.263 inter-frame coding cannot support flexible decoding - the results are shown here for reference only.

Figure 3.19: Simulation results of forward/backward video playback: Stefan (CIF, 30fps, GOP=15).
Results are reported for the average of the first 30 frames. Note that H.263 inter-frame coding cannot support flexible decoding - the results are shown here for reference only.

Figure 3.20: Drifting experiment using the Stefan sequence: (a) CLP; (b) DSC. The figure shows the PSNR of the reconstructed frames in the first GOP. Note that with DSC, the PSNR is almost the same in the switching and non-switching cases.

Chapter 4

Correlation Estimation for Distributed Source Coding Under Rate and Complexity Constraints Using Sampling-based Techniques

4.1 Introduction

4.1.1 Motivation

Distributed source coding (DSC) [19,63,84] studies independent encoding and joint decoding of correlated sources, for which a correlation model is known at the encoder. Central to DSC is the information about the existing correlation between the source and the side-information (SI) available at the decoder. Specifically, correlation information refers to the joint p.d.f. between the source and the SI. This correlation information plays several important roles in practical distributed source coding applications. First, many applications require correlation information at the encoder to determine the encoding rate. Essentially, the encoders use the correlation information to determine the number of cosets for partitioning the input space, so that error-free decoding can be achieved [56,59]. Second, for many practical Slepian-Wolf coding schemes that employ channel coding and iterative decoding, correlation information is required to initialize the decoding algorithms by providing likelihood estimates for the source bits [43]. Third, correlation information can be used at the decoder to determine the optimal reconstruction given the output of the Slepian-Wolf decoder and side information [56].
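To make the first role concrete: given a correlation model in the form of a joint p.m.f., the rate needed for lossless Slepian-Wolf recovery is the conditional entropy of the source given the SI. A minimal sketch (our own illustrative code; the function and the toy joint p.m.f. are not from the thesis):

```python
import math

def conditional_entropy(joint):
    """H(X|Y) in bits, for a joint p.m.f. given as a dict {(x, y): probability}."""
    # Marginalize out X to get p(y).
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    # H(X|Y) = -sum over (x, y) of p(x, y) * log2( p(x, y) / p(y) ).
    return -sum(p * math.log2(p / p_y[y])
                for (x, y), p in joint.items() if p > 0)

# Toy binary model: equiprobable X, symmetric crossover probability 0.1.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
rate = conditional_entropy(joint)  # equals the binary entropy H(0.1)
```

For the symmetric binary model used later in the chapter, this reduces to the binary entropy of the crossover probability, which is exactly the rate in equation (4.1).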
In this chapter we focus on estimating the correlation for the purpose of determining the encoding rate. The results may, however, be useful for the other two cases as well.

4.1.2 Correlation Information Models in Practical DSC Applications

At the heart of practical DSC applications is a lossless Slepian-Wolf (SW) codec, which plays a role similar to that of entropy codecs in conventional image/video compression. More precisely, the Slepian-Wolf encoder compresses a discrete i.i.d. source, which can be losslessly recovered at the decoder with the aid of the correlated side-information (SI), provided that enough compressed bits have been sent. Details on SW coding can be found in [28,58,85]. The problem to be investigated in this chapter is that of determining the amount of information communicated to the decoder, i.e., the encoding rate. From [63], an ideal rate to achieve a vanishing probability of decoding error is the conditional entropy of the input source given the SI.¹ The conditional entropy, in turn, depends on the correlation information between the source and SI. Thus rate allocation in DSC can be performed by solving the associated correlation estimation problem. Various types of correlation models (e.g., binary valued p.m.f. or continuous valued p.d.f.) may be exploited to compress the data, depending on the specific SW coding algorithms and applications. We illustrate some of them below.

¹ Practical SW coders would add a small margin to this ideal rate to account for using finite-length input blocks in SW coding.

• SW coding with binary correlation model. These are cases where some form of binary valued joint p.m.f. is used to relate the source and SI (which are not necessarily binary) for SW coding. As an example, Figure 4.1(a) depicts SW coding with a binary correlation structure similar to that used in [14,47,72], etc. The continuous valued i.i.d. source X is mapped via scalar quantization (or rounding) to a discrete source X̃.
Then X̃ is mapped to a bit-plane representation, and each extracted bit-plane b_X̃(l), l = 0 … L−1, is compressed independently by the SW encoder. Here b_X̃(0) denotes the least significant bit-plane (LSB). The SW decoder recovers b_X̃(l) with the bit-plane b_Ỹ(l), extracted from the quantized version of the correlated source Y, as side information. By exploiting the joint binary p.m.f. p(b_X̃(l), b_Ỹ(l)), each extracted bit-plane can be compressed to a rate as low as H(b_X̃(l) | b_Ỹ(l)). To determine this encoding rate, one can estimate the joint binary p.m.f. p(b_X̃(l), b_Ỹ(l)), and derive the conditional entropy from the estimated p.m.f. Note that independent compression of each bit-plane facilitates efficient rate scalability, which is highly desirable in some imagery applications. For some applications it is also sufficient to achieve satisfactory coding performance by exploiting the correlation between corresponding bit-planes.

• SW coding with continuous correlation model. These are cases where a continuous valued joint p.d.f. is exploited for SW coding. Figure 4.1(b) illustrates an example similar to that proposed in [59]. The continuous i.i.d. source X is first scalar-quantized, and the quantized input X̃ is directly compressed by the SW encoder. The decoder uses Y to recover X̃. By exploiting the joint p.d.f. p(X, Y), the SW encoder can compress X̃ to a rate as low as H(X̃ | Y). The encoder may determine this encoding rate from p(X, Y).

4.1.3 Correlation Estimation in Distributed Source Coding

While both the underlying theory and the recently proposed code construction algorithms [1,5,27,43,44,56] assume correlation information to be available exactly at the encoder, in many practical DSC applications the correlation information may not be available beforehand, and one would need to estimate it during the encoding process [59].² The accuracy of this correlation estimation has a direct impact on the performance of DSC-based systems.
While under-estimating the correlation may result in a penalty in coding efficiency, over-estimation can cause decoding error: in this case candidate decoded values within a given coset would be too close to each other, so that it is no longer possible to guarantee that they can be disambiguated without error by using the SI, leading to degradation in reconstruction quality. Estimating the correlation information at the encoder is a non-trivial problem due to the computational and communication constraints imposed by the target applications; that is, correlation estimation in DSC often has to be performed under rate and complexity constraints. For example, when DSC is applied to compress wireless sensor measurements, it is important to limit the amount of data exchanged between nodes during correlation estimation, in order to minimize the communication cost. Similarly, in applications such as video coding, the source data needed to estimate the correlation is present at the encoder, but it is desirable to limit the computation resources devoted to this estimation [3,59].

² Note that in some cases, lack of an accurate correlation model is acceptable if there exists feedback from decoders to encoders [1], but this leads to an increase in overall delay.

Figure 4.1: (a) An example of Slepian-Wolf coding exploiting binary correlation. Boxes "Q" and "Q⁻¹" denote quantization and inverse quantization respectively. Boxes "B" and "B⁻¹" denote binarization and its inverse respectively. (b) An example of Slepian-Wolf coding exploiting continuous correlation.

4.1.4 Our Contributions and Related Work

In this work we study correlation estimation strategies subject to rate and complexity constraints, and their impact on coding efficiency in a DSC framework.
Our proposed algorithms are based on the observation that for many DSC applications side informa- tion is actually available at the encoder, but the encoder may not make use of this side information because of the associated communication or computational cost. As an ex- ample, in low complexity distributed video coding (DVC) [1,49,59], past frames that will be used as side information are available at the encoder, but the computation cost involved in performing motion estimation may be signi¯cant. Other examples can be found in distributed multiview image/video compression [31,34,88], wireless sensor data compression [55,68,85], etc. Focusing on these applications, we propose sampling-based algorithms to estimate the correlation information. Sampling is a well-established con- ceptinstatisticstoinferthepropertiesofapopulationfromasmallamountofindividual observations [50]. To see how sampling applies to DSC consider these two examples: ² When compressing distributed sensors measurements, X, a node can request sam- ples, Y, from the neighboring node in order to estimate the correlation p(X;Y). The number of samples exchanged should be small, however, to keep the commu- nications overhead low. 108 Current frame X Y Previously reconstructed frame Figure4.2: Applysamplingtodistributedvideocoding: whenencodingthecurrentframe, we randomly sample n macroblocks of the current frame to undergo motion estimation, with n being much smaller than the total number of macroblocks. By doing so, we would obtain n pairs of samples (X;Y), where X is the current macroblock and Y is the corresponding motion-compensated predictor. From these n sample pairs the encoding ratecanbeestimated. Notethatherethesamplingcostassociatedwitheachdatasample is not primarily due to data exchange but the computation in motion search. ² In some DVC applications, the encoding rate depends on the joint p.d.f. 
between blocks in the current frame (X) and the corresponding motion-compensated predictor blocks (Y) from the reference frame [3,11,59]. The encoder can employ a sampling-based algorithm, where only a small portion of the current frame's blocks undergo motion estimation, so that the joint p.d.f. can be estimated from sample pairs (X, Y), formed by a given current block and the corresponding predictor (Figure 4.2). Since a motion search is required to acquire a sample pair (X, Y), each sampling operation incurs some computational cost. Therefore, the proportion of blocks undergoing motion estimation should be small.

Sampling, however, leads to estimation errors and will have an impact on coding efficiency. Analyzing this impact is a key focus of our work. Since DSC applications may exploit various types of correlation models, it is difficult to address all of them. Therefore, we focus on one particular model in this chapter and briefly discuss how to analyze other models. Specifically, focusing on situations where a binary correlation is estimated for SW coding (as discussed in Section 4.1.2) and the correlation is estimated via sampling, this chapter makes the following contributions:

• Rate penalty analysis in compression of a single binary source. We analyze how estimation error in sampling impacts the coding performance of a DSC system when encoding a single binary source. We derive an expression to quantify how the number of samples relates to the increase in the encoding rate due to estimation error, taking into account that over-estimation can lead to significant increases in distortion in DSC applications (due to decoding error).

• Rate penalty analysis and sample allocation in compression of multiple binary sources. We then extend the rate penalty analysis to systems with multiple binary input sources, where each of them is compressed independently using SW coding with its corresponding side-information.
Based on the analysis, we propose an algorithm to determine the sampling rates to assign to each binary source so that the overall penalty in coding performance (due to estimation error) can be minimized, subject to a constraint on the total number of samples.

• Model-based estimation in compression of a continuous source. We then study encoding of a continuous input source. We consider scenarios where bit-planes are extracted from a continuous input source and each bit-plane is compressed via SW coding, e.g., as in [3,14,48,86]. We propose a model-based method where the continuous-valued joint p.d.f. of the source and SI is first estimated via sampling of the continuous valued inputs, and then the bit-plane level (binary) correlation is derived from the estimated model. This is in contrast to a direct approach where the bit-plane correlation is estimated through exchanging binary samples from the extracted bit-planes. We demonstrate that the model-based method can achieve better estimation accuracy than the direct approach, provided that the continuous-valued model is sufficiently accurate.

• Model-based estimation for structured bit-planes. We also describe how model-based estimation can be extended to cases where bit-planes are extracted from continuous input data using more sophisticated methods. For example, in wavelet-based applications, bit-planes are separated into different "sub-bit-planes" depending on the magnitude (significance) of the transform coefficients. A concrete example of this, which we consider in this chapter, is that of bit-planes generated by set-partitioning as in SPIHT [61]. This type of bit-plane generation improves coding efficiency, but complicates the model-based correlation estimation process, as will be shown. Using a practical system as an example, we demonstrate that model-based estimation can lead to an additional advantage of efficient implementation in these types of DSC applications.
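The "direct" bit-plane approach, against which the model-based methods above are compared, can be sketched in a few lines (our own illustrative code; the helper names and the toy data are ours, not from the thesis): extract bit-plane l from the quantized source and SI, count crossovers among the exchanged samples, and read off the rate H(p).

```python
import math

def bitplane(values, l):
    """Bit-plane l (l = 0 is the LSB) of non-negative quantized integers."""
    return [(v >> l) & 1 for v in values]

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def direct_plane_estimate(x_quant, y_quant, l):
    """Crossover probability of bit-plane l and the SW rate H(p) it implies."""
    bx, by = bitplane(x_quant, l), bitplane(y_quant, l)
    p = sum(a != b for a, b in zip(bx, by)) / len(bx)
    return p, binary_entropy(p)

# Toy quantized source and side information (y is a small perturbation of x).
x = [5, 12, 9, 7, 14, 3, 8, 10]
y = [5, 13, 9, 6, 14, 3, 8, 11]
p0, r0 = direct_plane_estimate(x, y, 0)  # LSB plane
p1, r1 = direct_plane_estimate(x, y, 1)  # next plane
```

In the model-based alternative, p would instead be derived from an estimated continuous joint p.d.f. of the unquantized samples rather than from exchanged binary samples.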
While this chapter focuses on cases where a binary correlation is used for SW coding, some of the proposed ideas may be extended to other types of correlation (details in Section 4.7).

Several methods have been proposed for correlation estimation problems in DSC. In DVC, low-complexity schemes to classify macroblocks into different correlation classes have been proposed [59], while other methods use a feedback channel to convey correlation information to the encoder [4]. For robust video transmission, recursive algorithms have been proposed to estimate the correlation between the noise-free and noise-corrupted reconstructions [26,78]. In our prior work, correlation estimation was performed by direct bit-plane comparisons between the source and an approximation of the decoder side-information [67]. This chapter proposes, however, several novel sampling-based correlation estimation algorithms applicable to a range of DSC applications, and presents the corresponding performance analysis.

A general approach to model-based estimation for DSC was first proposed in our work in [15]. That work focused on the simple cases where bit-planes are generated directly from the binary representation of the sources. A recent work [33] has proposed a similar idea to show the advantage of Gray code representations, but does not discuss the exact algorithm to estimate the correlation.

This chapter is organized as follows. In Section 4.2 we present the rate penalty analysis. In Section 4.3 we propose the sample allocation algorithm to minimize the overall rate penalty. In Section 4.4 we propose the model-based estimation, and in Section 4.5 we extend the model-based estimation to cases where bit-planes are extracted based on the significance of the data. Section 4.6 presents experiments with real image compression applications. Section 4.7 discusses how to extend the rate penalty analysis to other correlation models. Finally, Section 4.8 concludes the work.
4.2 Single Binary Source: Rate Penalty Analysis

In this section we analyze how estimation error in sampling affects the compression performance of a DSC system in the case of a single binary source. Specifically, given that n samples are used to estimate the correlation, we derive the corresponding increase in coding rate due to estimation error. As discussed, in most DSC systems there is a communication or computational cost associated with each acquired sample. Therefore, our results represent the trade-off between communication/computational cost and coding efficiency.

4.2.1 Problem Definition

We focus on the cases where a binary correlation is exploited in SW coding. Consider compressing a binary source b_X with another binary SI b_Y available at the decoder. We assume {b_X, b_Y} i.i.d. ∼ p(b_X, b_Y). To simplify the analysis, we assume (i) b_X is equiprobable, i.e., Pr[b_X = 0] = 0.5, and (ii) the correlation is symmetric, i.e., Pr[b_Y = 1 | b_X = 0] = Pr[b_Y = 0 | b_X = 1] = p, where 0 ≤ p ≤ 0.5 is the crossover probability for the sources. With these assumptions, the lower bound on the lossless encoding rate of b_X with b_Y available at the decoder is [19,63]

    H(b_X | b_Y) = H(p).    (4.1)

Therefore, the encoder can estimate the lossless compression rate of b_X by estimating p through n random sample pairs {b_X, b_Y}. Define the estimation error (Δp)(n) = p̂(n) − p, where p̂(n) is some estimate of p.³

³ Note the dependency of p̂ (and other quantities) on n. Precisely, the p.d.f. of p̂ is a function of n, as will be discussed.

When Δp is negative, this could lead to a decoding error. This is because the crossover probability is under-estimated, and so the number of cosets chosen for encoding may be too small. Instead, when Δp ≥ 0, correct decoding
Thisdi®erenceinbehavior(decodingerrorvs.codingpenalty)will leadustoproposeabiasedestimatorsuchthat4p¸0withhighprobability(discussedin the next section). On average, the coding penalty, in bits/sample, is given by (assuming no decoding error): (4H)(n) = H(^ p(n))¡H(p): (4.2) As will be discussed,4H is indeed a random variable, since we randomly choose samples of b X and b Y for estimating H(^ p). Our focus is to derive the probability density of (4H)(n). 4.2.2 Correlation Estimation For encoding b X we need to estimate ^ p by acquiring n random samples of b Y . In di®er- ent DSC applications, encoder may obtain the samples in di®erent ways. For example, in a sensor application, the encoder of b X may request samples of b Y from a spatially- separated sensor node. In distributed video coding, the encoder may perform motion estimation to generate samples of b Y . Common to most of the applications is that com- munication/computational costs will be incurred in acquiring the samples. Therefore, it is desirable to keep n small. By inspecting the n pairs fb X ;b Y g now available at the encoder, an estimate of p can be computed to determine the encoding rate for b X . A naive estimate for p is S(n) n , where S(n) is the number of inspected samples such that b X 6=b Y . Since S is essentially 114 the summation of n i.i.d. Bernoulli random variables with success probability p, S is a binomiallydistributedr.v. withmeannpandvariancenp(1¡p). Therefore,forsu±ciently large n, S(n) n »N(p;¾ 2 ) ; ¾ = p p(1¡p)=n: (4.3) Asaconsequence,ifweuse S n astheestimatorthereis50%probabilityofunder-estimation of p. Therefore, we opt to use a biased estimator given by ^ p(n) = S(n) n +z !=2 ¾: (4.4) That is, we bias S n by a factor proportional to ¾. By choosing the constant z !=2 we can controlpreciselytheprobabilityofunder-estimationof p, e.g., ifz !=2 =1:64,Pr[^ p<p]= !=2=0:05. 
We choose this biased estimator to minimize the risk of decoding failure, at the expense of some encoding rate penalty.

4.2.3 Rate Penalty Analysis

Using (4.4) as the estimator, we analyze how n relates to the p.d.f. of ΔH. From (4.4),

    (Δp)(n) = p̂(n) − p = S(n)/n + z_{ω/2} σ − p ∼ N(z_{ω/2} σ, σ²).    (4.5)

Expanding H(·) at p by a Taylor series and using the definition of ΔH in (4.2),

    (ΔH)(n) = H′(p)Δp + H″(p)(Δp)²/2! + … ≈ H′(p)(Δp)(n).    (4.6)

The approximation holds when Δp is sufficiently small. The p.d.f. of ΔH can then be derived as:

    (ΔH)(n) ∼ N(H′(p) z_{ω/2} σ, (H′(p))² σ²),    (4.7)

where H′(p) = ln(1/p − 1) and σ is given by (4.3). Equation (4.7) relates n to the density of ΔH. Using (4.7), one can readily compute some statistics of ΔH, e.g., E[(ΔH)(n)]. Note that these statistics are functions of n. In practice, since p is unknown, σ is unknown when computing the estimator (4.4). A good rule of thumb is to approximate the estimator using [50]

    p̂(n) ≈ S/n + z_{ω/2} √((S/n)(1 − S/n)/n),    (4.8)

i.e., S/n is used to approximate p in computing the estimator. As a rule of thumb, the approximation is valid when n·(S/n) ≥ 4 and n·(1 − S/n) ≥ 4 [50].

4.2.4 Experiment

In this section we assess the accuracy of the rate penalty model proposed in (4.7). Specifically, we perform sampling experiments, measure ΔH, and compare the empirical distribution of ΔH with (4.7). Video data is used for the experiments. As the binary input source (b_X), we use the raw bit-planes extracted from all (quantized) DCT coefficients of a given frequency in the current frame, while as SI (b_Y) we use the raw bit-planes of the same significance extracted from the corresponding (quantized) DCT coefficients of the motion-compensated predictors in the reference frame.⁴ Therefore, the dimension of the source, M, is equal to the number of DCT blocks in a frame. We then sample n (< M) pairs of {b_X, b_Y} randomly, and an estimate p̂ is computed according to (4.8) from the chosen pairs. With p̂, a single ΔH can then be obtained using (4.2).
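As a quick sanity check on (4.7) and (4.8), the estimator can be exercised on synthetic Bernoulli data with a short Monte Carlo sketch (our own illustrative code, not the thesis's video experiment; entropies are computed in nats here so that H′(p) = ln(1/p − 1) matches the text, and the parameter values are made up):

```python
import math
import random

def entropy_nats(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log(p) - (1 - p) * math.log(1 - p)

def biased_estimate(crossovers, n, z=1.64):
    """Estimator (4.8): S/n plus z times the plug-in standard deviation of S/n."""
    q = crossovers / n
    return q + z * math.sqrt(q * (1 - q) / n)

random.seed(1)
p, n, z, trials = 0.1, 256, 1.64, 2000
under = 0
dh_sum = 0.0
for _ in range(trials):
    s = sum(random.random() < p for _ in range(n))  # number of crossovers S(n)
    p_hat = biased_estimate(s, n, z)
    under += p_hat < p                              # under-estimation event
    dh_sum += entropy_nats(p_hat) - entropy_nats(p)

# Model (4.7) predicts E[dH] ~ H'(p) * z * sigma with sigma = sqrt(p(1-p)/n),
# while the bias keeps Pr[p_hat < p] near omega/2.
sigma = math.sqrt(p * (1 - p) / n)
predicted_mean = math.log(1 / p - 1) * z * sigma
empirical_mean = dh_sum / trials
```

The empirical mean penalty lands close to the first-order prediction of (4.7); the residual gap comes from the higher-order Taylor terms dropped in (4.6).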
The sampling experiment is repeated N_E times, each time with different pairs of {b_X, b_Y} sampled and a different ΔH obtained. We compare the empirical p.d.f. of ΔH (with N_E data points) with the model in (4.7) using Kolmogorov-Smirnov (K-S) tests [50]. Table 4.1 shows the resulting K-S statistics at different sampling rates for some bit-planes extracted from the DC coefficients quantized at QP = 24 (H.263 quantization) in the 2nd frame of Mobile (720×576, 30 fps), with z_{ω/2} = 1.64 and N_E = 100. In particular, for the range of n and p where n·p ≥ 4, the K-S tests accept the hypothesis that the empirical ΔH follows the model in (4.7). This result indicates that our proposed model can adequately characterize the distribution of the rate penalty for practical problems with sufficiently large n·p, e.g., n·p ≥ 4. Note that a sampling size of 128 corresponds to about 2% of the data for a 720×576 video. Additional results using different data lead to similar conclusions.

⁴ Note that it is common for DVC systems to exploit the correlation between the DCT coefficients in the current frame and the corresponding coefficients in the motion-compensated predictors in the reference frame, e.g., [3,11].

Table 4.1: Kolmogorov-Smirnov (K-S) test statistics for ΔH: Mobile DC, QP=24. Numbers in parentheses indicate cases where the K-S statistic is larger than the 1% significance cutoff value, 0.1608, and therefore does not pass the K-S test. Those are the cases where n·p ≥ 4 does not hold. Note that bit position 6 corresponds to a more significant bit-plane.

    Bit Position:      6           5           4           3
    p:                 0.0228      0.0529      0.0961      0.2002
    H(p):              0.1571      0.2987      0.4566      0.7222
    n = 96             (0.1912)    0.1470      0.1310      0.1510
    n = 128            0.1580      0.1401      0.1266      0.1358
    n = 256            0.1458      0.1343      0.1073      0.1080
    n = 512            0.1459      0.1219      0.0943      0.0957

4.3 Multiple Binary Sources: Rate Penalty Analysis and Sample Allocation

In this section we study the compression of multiple binary input sources. The multiple sources scenario can arise in different applications.
One example is the compression of multiple streams of sensor measurements captured at different nodes. Another important example is the compression of a continuous input source, where the source is first mapped to a bit-plane representation and then each bit-plane is compressed using DSC, so that the problem becomes one of compressing multiple binary sources. From (4.7), in the single source case the rate penalty due to estimation error depends both on the sampling, i.e., n, and on the characteristics of the source, i.e., p. In a system with multiple input sources, each with a different p, we now investigate the optimal sample allocation to each source such that the overall rate penalty is minimized, subject to a constraint on the total number of samples allocated to the system.

4.3.1 Problem Definition

Consider the compression of L binary sources b_X(l), l = 0 to L−1. Each binary source is independently encoded using SW coding with its respective SI b_Y(l) available at the decoder. We follow the assumptions in Section 4.2, i.e., {b_X(l), b_Y(l)} i.i.d. ∼ p(b_X(l), b_Y(l)), with Pr[b_X(l) = 0] = 0.5 and crossover probability Pr[b_X(l) ≠ b_Y(l)] = p_l. Let K_l be the number of binary values to be encoded for source b_X(l). We follow the correlation estimation procedure in Section 4.2, where encoding b_X(l) requires observing n_l (≤ K_l) random samples of b_Y(l) in order to compute the biased estimator p̂_l(n_l) according to (4.4) (or (4.8) in practice), so that the encoding rate for b_X(l) can be determined. The encoding of b_X(l) suffers a rate penalty (ΔH_l)(n_l) = H(p̂_l(n_l)) − H(p_l). In particular, following the discussion in Section 4.2 and using (4.7), ΔH_l is normally distributed:

    (ΔH_l)(n_l) ∼ N(H′(p_l) z_{ω/2} σ_l, (H′(p_l))² σ_l²),    (4.9)

where H′(p_l) = ln(1/p_l − 1) and σ_l = √(p_l(1−p_l)/n_l). On average, the coding penalty of the whole system, in bits/sample, is given by:

    ΔH = (1/K_T) Σ_{l=0}^{L−1} K_l ΔH_l,    (4.10)

where K_T = Σ_{l=0}^{L−1} K_l.
Note that in this section ΔH refers to the average coding penalty of the entire system with L sources. Since the samplings are performed independently on each source, the ΔH_l are independent r.v.s, and therefore ΔH is normally distributed with expectation and variance given by:

    E[ΔH] = (1/K_T) Σ_{l=0}^{L−1} K_l H′(p_l) z_{ω/2} σ_l,    (4.11)

    VAR[ΔH] = Σ_{l=0}^{L−1} (K_l/K_T)² (H′(p_l))² σ_l².    (4.12)

The total number of samples is limited to n_T, i.e., Σ_{l=0}^{L−1} n_l = n_T, under the assumption that we would like to have n_T ≪ K_T, because each sample incurs a communication/computational cost. Our main goal is to minimize E[ΔH] subject to a given n_T. Note that E[ΔH] is a function of (i) p = {p_l}, the correlation of the different sources, (ii) n_T, the total number of samples used to estimate the correlation, and (iii) n = {n_l}, the allocation of samples to the different sources. In the following sections:

1. we derive an optimal sample allocation strategy, i.e., given p = {p_l} and n_T, we find the optimal n = {n_l} to minimize the rate penalty E[ΔH];

2. given the optimal sample allocation, we study how E[ΔH] changes with n_T.

As will be discussed in the next section, the optimal sample allocation requires knowledge of {p_l}, which obviously is not known a priori. Several strategies will therefore be described to apply our results in practice.

4.3.2 Optimal Sample Allocation Strategy

In this section we seek to find the optimal numbers of samples to allocate to the different sources, {n*_l}, so as to minimize E[ΔH]. To find {n*_l}, we solve the following constrained optimization problem:

    min_{ {n_l} : Σ_{l=0}^{L−1} n_l = n_T, n_l ≤ K_l } E[ΔH],    (4.13)

where E[ΔH] is given by (4.11).
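The objective in (4.13) is easy to evaluate numerically for any candidate allocation via (4.11)-(4.12), which is also useful for comparing allocation schemes. A small sketch (our own code; the p_l, K_l, and allocation values are made up for illustration, and H′(p) is taken in nats, matching H′(p) = ln(1/p − 1) in the text):

```python
import math

def penalty_stats(p, K, n, z=1.64):
    """E[dH] and VAR[dH] of the system-wide penalty, per (4.11)-(4.12)."""
    K_T = sum(K)
    mean = 0.0
    var = 0.0
    for p_l, K_l, n_l in zip(p, K, n):
        sigma_l = math.sqrt(p_l * (1 - p_l) / n_l)
        dH_l = math.log(1 / p_l - 1)          # H'(p_l)
        mean += (K_l / K_T) * dH_l * z * sigma_l
        var += (K_l / K_T) ** 2 * (dH_l * sigma_l) ** 2
    return mean, var

# Hypothetical system: three bit-planes of equal length, different correlation.
p = [0.02, 0.10, 0.30]
K = [4096, 4096, 4096]
even = [300, 300, 300]                        # even split of n_T = 900 samples
mean_even, var_even = penalty_stats(p, K, even)
```

Evaluating this objective for different {n_l} is exactly the comparison performed in the experiments of Section 4.3.4.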
Applying the Lagrangian optimization method and Kuhn-Tucker conditions to deal with the inequalities constraints, we obtain (details in Appendix B): n ¤ l = 8 > > < > > : °(K l ® l ) 2=3 ; if ° < K l 1=3 ® l 2=3 ; K l ; if °¸ K l 1=3 ® l 2=3 ; (4.14) where ® l is a constant that depends on z !=2 and p l : ® l =ln( 1 p l ¡1)z !=2 p p l (1¡p l ); (4.15) and ° is chosen so that P L¡1 l=0 n ¤ l = n T . This result gives rise to a sample allocation scheme analogous to the \water-¯lling" results in information theory [19]. We allocate equal weighted number of samples to each source, until for some sources the number of samples is equal to the number of source inputs. The weighting factor (K l ® l ) 2=3 is a constant that depends only on the speci¯c characteristics of the lth source (length and crossover probability). In situations where n l < K l can be guaranteed for all sources (e.g., when n T is small enough such that n T < K l is true for all l), (4.14) can be simpli¯ed to: n ¤ l = 121 0 0.1 0.2 0.3 0.4 0.5 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 p i H(p i ) p l p l +Δp l H p l ’ p l ’ +Δp l H Same Δp l Figure4.3: EncodingratefunctionH(:). Thesame4p l wouldhavealargerimpacttothe estimated rate if the true p l is small. The optimal sample allocation takes into account this characteristic. n T (K l ® l ) 2 3 P L¡1 l=0 (K l ® l ) 2 3 : In the important case when raw bit-planes are extracted from a continu- ous valued source and compressed independently by SW coding, similar to the example in Figure 4.1(a), K l would be the same, andfn ¤ l g reduces to n ¤ l =n T ® 2 3 l P L¡1 l=0 ® 2 3 l : (4.16) The intuition behind the optimal sample allocation can be understood by inspecting the rate function H(:). For the same estimation error (4p l )(n l ) = ^ p l (n l )¡p l , the impact to the encoding rate will tend to be larger when the true p l is small (Figure 4.3). 
Since different sources have different p_l, we should allocate the samples accordingly, using more samples for the sources with small p_l, so as to minimize the overall rate penalty. This is reflected in (4.14) and (4.15).

Since we have chosen to use the same z_{ω/2} for all sources, the probability of over-estimation is the same for all sources. In some applications it may be more appropriate to choose different target failure probabilities for different sources; e.g., when the sources are bit-planes extracted from a continuous-valued source, MSB bit-planes are more important and should have smaller failure probabilities. This can be incorporated in the sample allocation by using a different z_{ω/2} for each source in (4.11).

Note that the optimal sample allocation depends on the crossover probabilities {p_l}. However, {p_l} is obviously not known initially. In practical applications, the results of optimal sample allocation can be applied in the following ways:

• In many applications, we may have some a priori knowledge of {p_l}. For example, in hyperspectral image compression [14], where bit-planes of spectral bands are extracted and compressed, the {p_l} of neighboring spectral bands are usually similar. Therefore, we can use approximations of {p_l} in the optimal sample allocation equations to determine the sample assignment. We will present experimental results in Section 4.6 to demonstrate that this can be a viable method. Note that using the a priori knowledge directly to select the encoding rate may cause decoding errors if it leads to over-estimating the correlation. Instead, by using the a priori knowledge to determine the sample allocation and (4.8) as the estimator to select the coding rate, we are guaranteed that p̂_l is larger than p_l with probability (1 − ω/2), and we can bound the decoding error systematically. This is also more robust in cases where the a priori knowledge is not a good approximation of the true {p_l}.

• We can also use an iterative approach similar to [41].
Essentially, we allocate samples in batches of the same size. For the first batch, we allocate the same number of samples to all bit-planes and obtain initial estimates of {p_l}. For the subsequent batches, we allocate the samples according to the current estimates and the optimal sample allocation strategy. In this approach, we can also use any available a priori knowledge to initialize the scheme.

4.3.3 Rate Penalty Analysis - Multiple Sources

Having an expression for E[ΔH] as a function of n_T allows the encoder to select appropriate values of n_T, given that increasing n_T leads to additional overhead but also reduces the rate increase due to inaccurate estimation. We focus on the case where n_l < K_l for all sources, where closed-form equations can be derived. The equation relating E[ΔH] to n_T can be obtained from (4.11) and (4.14):

E[\Delta H] = \frac{\beta}{\sqrt{n_T}}, \qquad (4.17)

where \beta = \frac{1}{K_T} \left[ \sum_{l=0}^{L-1} (K_l \alpha_l)^{2/3} \right]^{3/2}. Note that β is a constant for a given system. Therefore, the average rate penalty is inversely proportional to the square root of the amount of sampling overhead.

4.3.4 Experiments

In this section we assess the benefits of using the optimal sample allocation when compressing i.i.d. sources. We randomly generate L pairs of correlated binary sources {b_X(l), b_Y(l)}, each with crossover probability p_l and dimension K_l. Then n_T samples are used to estimate the correlation. The number of samples allocated to each source is determined according to the following schemes:
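The closed forms above can be exercised numerically. The sketch below is our own illustration: σ_l is taken to be √(p_l(1−p_l)/n_l), the standard deviation of the sample-mean crossover estimator, and H'(p_l) = log₂((1−p_l)/p_l); it evaluates (4.11) under the even and the proportional allocations and checks the 1/√n_T behavior of (4.17).

```python
import math

def penalty(p, n, K, z=1.645):
    # E[dH] per (4.11); sigma_l assumed = sqrt(p_l(1-p_l)/n_l)
    K_T = sum(K)
    return sum(Kl * math.log2((1 - pl) / pl) * z * math.sqrt(pl * (1 - pl) / nl)
               for pl, nl, Kl in zip(p, n, K)) / K_T

def compare(p, K, n_T, z=1.645):
    # Even allocation vs. the proportional rule (4.16) (assuming no source saturates)
    w = [(Kl * math.log(1.0 / pl - 1.0) * z * math.sqrt(pl * (1 - pl))) ** (2.0 / 3.0)
         for pl, Kl in zip(p, K)]
    opt = [n_T * wl / sum(w) for wl in w]
    even = [n_T / len(p)] * len(p)
    return penalty(p, even, K, z), penalty(p, opt, K, z)
```

Since the proportional rule minimizes \sum_l c_l / \sqrt{n_l} subject to \sum_l n_l = n_T, the optimal penalty is never larger than the even one; and quadrupling n_T halves the optimal penalty, consistent with (4.17).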
• Optimal allocation. We use (4.14) to determine n_l, the number of samples allocated to the l-th source.

• Even allocation. We allocate the same number of samples to each source, i.e., n_l = n_T / L.

Note that in this section we use the true crossover probabilities to determine the optimal allocation; a similar comparison using a priori information to determine the optimal allocation in practical scenarios will be discussed in Section 4.6. The schemes are compared based on the average (overall) rate penalty after N_E sampling experiments. We compute the reduction in the average rate penalty (in bits) as E[ΔH]_even − E[ΔH]_opt, where E[ΔH]_opt is the average rate penalty using the optimal sample allocation, and E[ΔH]_even is that of the even sample allocation. Since there are many possible combinations of {p_l}, as an example we choose {p_l} of the form {p + kδ_p}, k = ±1, ±2, ..., with p = 0.25; a larger δ_p therefore corresponds to more substantial variation (standard deviation) in {p_l}. Figure 4.4 depicts the comparison results and shows that over 0.07 bits of reduction in rate penalty can be achieved in the case of considerable variation in {p_l}, with this extra number of bits representing an 11.7% rate increase. In practical applications diverse crossover probabilities are indeed common (e.g., see [14] or the data in Section 4.2.4).

Figure 4.4: Reduction in rate penalty (bits) using the optimal sample allocation, with L = 4, K_l = 16384, n_T = 128 or 1024 (i.e., 0.20% or 1.56% of total, respectively), N_E = 100, and p = 0.25.

4.4 Continuous Input Source: Model-based Estimation

In this section we investigate correlation estimation methods in the particular but important case where bit-planes are extracted from a continuous input source and each bit-plane is compressed via SW coding. This situation arises in several proposed distributed image and video coding algorithms (e.g., [3,14,48,86]). Often, in these applications, some a priori statistical model knowledge of the continuous-valued input source is available. For example, wavelet and DCT transform coefficients are typically considered to be well modeled by Laplacian distributions [65]. In the following we propose a model-based estimation method, where the continuous-valued joint p.d.f.
of the source and SI is first estimated via sampling of the continuous-valued inputs⁵, and then the bit-plane-level (binary) correlation is derived from the estimated model. This is in contrast to the direct approach studied in Sections 4.2 and 4.3, where the bit-plane correlation is estimated by exchanging binary samples from the extracted bit-planes. We shall demonstrate that the model-based method can achieve better estimation accuracy than the direct approach, provided that the continuous-valued model is sufficiently accurate.

⁵ In practice, the continuous-valued inputs are rounded so that the samples can be represented with a finite number of bits.

4.4.1 General Approach

We shall focus on the setting of Figure 4.1(a), where binary correlation is exploited for compression of a continuous input X using continuous side-information Y. Assume X and Y are drawn i.i.d. from f_XY(x,y). We assume Y = X + Z, where Z is the correlation noise, independent of X. Our proposed model-based approach starts by estimating the p.d.f.'s f_X(x) and f_Z(z). This can be done by choosing appropriate models for the data samples and then estimating the model parameters using one of the standard parameter estimation techniques, e.g., maximum likelihood estimation (MLE).

Once we have estimated f_X(x) and f_Z(z) we can derive the bit-plane statistics as follows. Suppose we extract raw bit-planes from the binary representations of X and Y, and are interested in estimating p_l, the crossover probability between the bit-planes of X and Y at significance level l. Figure 4.5(a) depicts the events (shaded regions A_i) that lead to the occurrence of a crossover between X and Y at significance level l. For example, consider l = 2 (i.e., the 2nd bit-plane): when X takes values from 8 (= 01000b) to 11 (= 01011b), a crossover occurs when Y takes values from 4 (= 00100b) to 7 (= 00111b) (region A_4 in Figure 4.5(a)), or from 12 (= 01100b) to 15 (= 01111b) (region A_5 in Figure 4.5(a)), ..., etc.
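The crossover events just described can also be illustrated by simulation. The following sketch is ours (with illustrative parameter values): it simply compares the l-th bit of the integer parts of |X| and |Y| over Monte Carlo draws, as a stand-in for the region integrals of (4.19) below.

```python
import math
import random

def laplace(rng, scale):
    # inverse-CDF draw from a zero-mean Laplacian with the given scale (= 1/rate)
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def crossover_prob(l, beta=0.3, alpha=2.5, n=20000, seed=7):
    # Fraction of samples whose l-th magnitude bit differs between X and Y = X + Z,
    # for Laplacian X (rate beta) and Laplacian correlation noise Z (rate alpha).
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = laplace(rng, 1.0 / beta)
        y = x + laplace(rng, 1.0 / alpha)
        if ((int(abs(x)) >> l) ^ (int(abs(y)) >> l)) & 1:
            hits += 1
    return hits / n
```

With these parameters the estimated crossover probability is largest for the least significant bit-planes and decays rapidly with l, matching the behavior the text describes.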
Specifically, A_i is a subset of R² defined as

A_i = \{ (x,y) \mid 2c \cdot 2^l \le |x| < (2c+1) \cdot 2^l,\; (2d+1) \cdot 2^l \le |y| < (2d+2) \cdot 2^l; \text{ or } (2g+1) \cdot 2^l \le |x| < (2g+2) \cdot 2^l,\; 2h \cdot 2^l \le |y| < (2h+1) \cdot 2^l \}, \qquad (4.18)

for some c, d, g, h ∈ Z⁺. Hence we can estimate the crossover probability at bit-plane l by

\hat{p}_l = \sum_i \iint_{A_i} f_{XY}(x,y)\, dx\, dy = \sum_i \iint_{A_i} f_X(x) f_{Y|X}(y|x)\, dx\, dy. \qquad (4.19)

Assuming that Y = X + Z and that X, Z are independent, f_{Y|X}(y|x) can be found to be equal to

f_{Y|X}(y|x) = f_Z(y - x), \qquad (4.20)

and the integral in (4.19) can be readily evaluated for a variety of densities. In practice we only need to sum over the few regions A_i where the integrals are non-zero. Note that when l is small (i.e., for the least significant bit-planes) the crossover probability is close to 0.5, since in such cases the A_i are small and evenly distributed throughout the sample space, and hence for most models (4.19) will give p̂_l close to 0.5.

In Section 4.5, we will discuss how to extend model-based estimation to the cases when bit-planes are extracted using more sophisticated methods, in particular those used in wavelet-based applications.

4.4.2 Experiments

We now compare the accuracy of direct estimation and model-based estimation. We generate i.i.d. Laplacian sources X and Z of dimension M with model parameters β and α respectively, i.e., f_X(x) = (β/2) e^{−β|x|}, f_Z(z) = (α/2) e^{−α|z|}. Then the crossover probability

Figure 4.5: (a) Crossover probability estimation for raw bit-planes: A_i are the events that lead to the occurrence of a crossover between X and Y at significance level l. (b) Refinement bit-plane crossover probability estimation: probability of crossover when X_i is already significant. (c) Sign bit-plane crossover probability estimation: probability of sign crossover when X_i becomes significant.
p_l of X and Y (= X + Z) at significance level l, 0 ≤ l ≤ L−1, is estimated using the following approaches:

• Direct estimation with even sample allocation. This is similar to the estimation method in Section 4.2, where n binary samples of the l-th bit-planes are exchanged. Since there are L bit-planes in total, the total amount of exchanged data is L·n bits.

• Direct estimation with optimal allocation. This is similar to the previous approach, except that the optimal sample allocation (4.14) is used to distribute the L·n binary samples among the bit-planes.

• Model-based estimation. Here n L-bit random samples of Y are sent to the encoder of X, where the model parameters β and α are estimated from the n samples of X and Z (= Y − X) respectively, using MLE [50]. Then the estimate of p_l, 0 ≤ l ≤ L−1, can be derived analytically from β̂ and α̂ using (4.19). Therefore, the model-based approach also incurs L·n bits to estimate the crossover probabilities of all the bit-planes.

Note that in the direct approach we do not include an offset in the estimator, i.e., the estimator is S(n)/n following the notation of Section 4.2; the comparison with (4.19) is therefore fair. Practical applications may choose to bias both the direct and model-based estimators as in (4.8) (in model-based estimation we would replace S_n by the value calculated in (4.19)). The approaches are compared based on the deviation of the estimates from the true (empirical) crossover probability, |p̂_l − p_l| / p_l. The deviation is measured for different bit-planes using different percentages of exchanged samples, n/M, with β = 0.3, α = 2.5, M = 6480. These parameters are similar to those observed in the video data used in the experiments of Section 4.2.4, i.e., X and Y are the quantized DCT coefficients in a current frame and the corresponding quantized coefficients in the motion-compensated predictors in the reference frame, respectively. Figure 4.6 depicts the comparison results, where the deviations are obtained by averaging over N_E = 1000 experiments.
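The MLE step used by the model-based approach is simple for zero-mean Laplacian models: the maximum-likelihood estimate of the rate parameter is the reciprocal of the mean absolute sample value. A small seeded self-check on synthetic data (our own sketch):

```python
import math
import random

def laplace(rng, scale):
    # inverse-CDF draw from a zero-mean Laplacian with the given scale (= 1/rate)
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def mle_rate(samples):
    # MLE of the rate parameter of f(x) = (b/2) exp(-b|x|): b_hat = n / sum(|x_i|)
    return len(samples) / sum(abs(x) for x in samples)
```

Drawing 50,000 samples with β = 0.3 recovers β̂ close to 0.3 (relative error on the order of 1/√n).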
As shown in the figure, model-based estimation can achieve considerable improvements in estimation accuracy, especially when only a small number of samples is used or the crossover probability is small. Note that model-based estimation utilizes the information that the bits to be encoded have been extracted from continuous-valued data following certain distributions, and therefore tends to perform better than direct estimation, which does not use such information. However, model-based estimation is applicable only to bit-planes extracted from continuous sources, and obviously its performance depends on how accurately the continuous-valued data can be modeled. Additional experiments assessing the performance in terms of coding rate and distortion in a real application will be presented in Section 4.6.

4.5 Model-based Estimation on Structured Bit-planes

In this section we discuss how to extend model-based estimation to the cases when bit-planes are extracted using more sophisticated methods. For example, when X and Y are wavelet transform coefficients⁶, bit-planes are usually partitioned depending on the magnitude of the transform coefficients to improve coding efficiency, as

⁶ A concrete application scenario is X and Y being collocated wavelet transform coefficients of two correlated images.

Figure 4.6: Comparing estimation accuracy between the direct approaches (even and optimal allocation) and the model-based approach, for (a) the 1st bit-plane (crossover prob. = 0.1700), (b) the 2nd bit-plane (crossover prob. = 0.0610), and (c) the 3rd bit-plane (crossover prob. = 0.0141).
in the set-partitioning algorithm used in SPIHT [61]. Specifically, in these "significance coding" techniques, the encoder first signals the significance of each of the components at a given bit-plane. After a component becomes significant, sign information is conveyed and then further refinement bits are transmitted. While different wavelet systems may use different techniques to encode the bits (e.g., context coding can be used as in JPEG2000, or alternatively zerotree coding can be used as in SPIHT to represent significance maps), the definitions of the (uncoded) sign/refinement bits and significance maps are largely the same, so the techniques we propose in the context of SPIHT in this section are also applicable to other wavelet-based compression schemes that use bit-plane encoding. We shall consider systems where SW coding is applied to compress the sign/refinement bit-planes⁷, and propose an extension of the model-based approach to estimate the corresponding crossover probabilities, so that the encoding rate can be determined. The extended approach can also be applied to facilitate adaptive combinations of SW/entropy coding to improve coding performance [8].

In what follows we will first discuss how to extend the model-based approach to estimate crossover probabilities of sign/refinement bit-planes. Since significance coding is usually used in wavelet-based applications (e.g., [61,71]), we will also discuss how to address some of the issues that arise when applying model-based estimation in wavelet-based DSC applications.

⁷ Since the significance map carries structural information, a single decoding error in the significance map would cause decoding failure of all the subsequent bits, and therefore SW coding may not be suitable for compressing the significance map.
4.5.1 Model-based Estimation for Refinement/Sign Bit-planes

Given an input source X_i to be compressed using side information Y_i (= X_i + Z_i), and following the same assumptions as in Section 4.4.1, our goal is to estimate the crossover probabilities of the refinement and sign bit-planes at significance level l, denoted p_ref(l,i) and p_sgn(l,i) respectively⁸.

⁸ We introduce the subscript i in this section to facilitate the discussion of wavelet-based applications in the next section. Specifically, in Section 4.5.2, X_i will denote the wavelet transform coefficients in the i-th subband. We use separate models for different subbands in order to take into account their different statistics (e.g., variances tend to decrease when going from high-level subbands to low-level subbands).

Following from the definition of refinement bits, the refinement bit-plane of significance level l includes only coefficients that are already significant [61], i.e., |X_i| ≥ 2^{l+1}. Therefore, the crossover probability of the l-th refinement bit-plane for source X_i is

p_{ref}(l,i) = \frac{\Pr(R \cap |X_i| \ge 2^{l+1})}{\Pr(|X_i| \ge 2^{l+1})}, \qquad (4.21)

where R denotes the event of a crossover in the magnitude bits, i.e., R = \bigcup A_i, with A_i defined in (4.18). Following the discussion in Section 4.4.1, we can calculate Pr(R ∩ |X_i| ≥ 2^{l+1}) by integrating the joint p.d.f. of X_i and Y_i, f_{X_i Y_i}, over the shaded regions in Figure 4.5(b), similar to (4.19). Moreover, f_{X_i Y_i} can be factorized as in (4.20). Hence p_ref(l,i) can be readily calculated after estimating models for f_{X_i}(x) and f_{Z_i}(z). In practice, we only need to integrate over the few regions where f_{X_i Y_i} is non-zero.

The crossover probability of sign bit-planes can be derived in a similar fashion. The difference is that we need to integrate over different regions in the sample space of X_i and Y_i. The l-th sign bit-plane includes only the sign bits of the coefficients that become significant at significance level l [61], i.e., 2^{l+1} > |X_i| ≥ 2^l.
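Both quantities can be checked numerically. The sketch below is our own (illustrative parameters, not the thesis's region integration): it estimates the refinement crossover (4.21) by conditional Monte Carlo under Laplacian models, and checks the sign crossover (defined by (4.22) just below) against a simple closed form — for a coefficient of magnitude t, the signs of X and Y = X + Z disagree with probability ½e^{−αt} when Z is Laplacian with rate α.

```python
import math
import random

def laplace(rng, scale):
    # inverse-CDF draw from a zero-mean Laplacian with the given scale (= 1/rate)
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def p_refinement(l, beta=0.3, alpha=2.5, n=200000, seed=1):
    # Monte Carlo version of (4.21): among samples already significant at level l
    # (|X| >= 2^(l+1)), count crossovers in the l-th magnitude bit of X and Y.
    rng = random.Random(seed)
    sig = hits = 0
    for _ in range(n):
        x = laplace(rng, 1.0 / beta)
        if abs(x) < 2 ** (l + 1):
            continue
        sig += 1
        y = x + laplace(rng, 1.0 / alpha)
        if ((int(abs(x)) >> l) ^ (int(abs(y)) >> l)) & 1:
            hits += 1
    return hits / sig if sig else float('nan')

def p_sign_mc(l, beta=0.3, alpha=2.5, n=100000, seed=3):
    # Monte Carlo sign crossover: among samples that become significant at level l
    # (2^l <= |X| < 2^(l+1)), count sign disagreements between X and Y.
    rng = random.Random(seed)
    sig = hits = 0
    for _ in range(n):
        x = laplace(rng, 1.0 / beta)
        if not (2 ** l <= abs(x) < 2 ** (l + 1)):
            continue
        sig += 1
        y = x + laplace(rng, 1.0 / alpha)
        if (x > 0) != (y > 0):
            hits += 1
    return hits / sig if sig else float('nan')

def p_sign_exact(l, beta=0.3, alpha=2.5):
    # Closed form under the same models: average 0.5*exp(-alpha*t) over the
    # folded-Laplacian density beta*exp(-beta*t) restricted to [2^l, 2^(l+1)).
    a, b = 2.0 ** l, 2.0 ** (l + 1)
    num = 0.5 * beta / (alpha + beta) * (math.exp(-(alpha + beta) * a)
                                         - math.exp(-(alpha + beta) * b))
    den = math.exp(-beta * a) - math.exp(-beta * b)
    return num / den
```

The Monte Carlo and closed-form sign-crossover values agree closely, which is a useful sanity check before relying on the analytical integration in practice.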
Hence the crossover probability of the l-th sign bit-plane of the source X_i is

p_{sgn}(l,i) = \frac{\Pr(S \cap 2^{l+1} > |X_i| \ge 2^l)}{\Pr(2^{l+1} > |X_i| \ge 2^l)}, \qquad (4.22)

where S denotes the event of a crossover in the sign bits, i.e.,

S = \{ (x_i, y_i) \mid x_i > 0,\; y_i < 0 \} \cup \{ (x_i, y_i) \mid x_i < 0,\; y_i > 0 \}. \qquad (4.23)

Pr(S ∩ 2^{l+1} > |X_i| ≥ 2^l) can be calculated by integrating the joint p.d.f. of X_i and Y_i over the shaded regions in Figure 4.5(c), similar to (4.19), and factoring the p.d.f. as in (4.20). However, estimation of f_{X_i}(x) and f_{Z_i}(z) is usually not necessary at this point, since it has already been done for the refinement crossover estimation.

4.5.2 Model-based Estimation for Wavelet-based Applications

Since significance coding is used mostly in wavelet-based compression, in this section we discuss the particular scenario of applying model-based estimation in wavelet-based DSC applications. We denote by X the wavelet transform coefficients of the input data, with the i-th subband denoted X_i, 0 ≤ i ≤ N_B − 1, where N_B is the total number of subbands. A main issue in extending the model-based approach to wavelet-based applications is that some subbands may not have enough coefficients to obtain reliable estimates of the model parameters (e.g., high-level subbands in the case of dyadic decomposition). We discuss how to address this issue in the following.

4.5.2.1 Estimation with Adequate Samples

Following the discussion in Section 4.5.1, to estimate the crossover probabilities of sign or refinement bit-planes, we need to estimate the models f_{X_i}(x) and f_{Z_i}(z). Experimental results on real image data suggest that a single model f_Z(z) can be used for all Z_i without any noticeable impact on coding performance. Estimation of f_Z(z) (at the encoder of X) involves samples of Y and is therefore subject to communication/computational constraints, so only a small number of samples of Z (= Y − X) should be used to estimate f_Z(z) (as illustrated in the experiments of Section 4.6).
On the other hand, estimation of f_{X_i}(x) involves samples of X_i, and the results are affected by the number of coefficients in the i-th subband, N_i. Since some subbands may not have enough coefficients to obtain a reliable estimate of f_{X_i}(x), we partition the set of subbands {i | 0 ≤ i ≤ N_B − 1} into L and H, where L denotes the subset of subbands (low-level subbands) which have enough coefficients for reliable estimation of the models, and H denotes the set of remaining subbands (high-level subbands). The partition of the subbands into L and H is determined by N_i. In particular, if the Laplacian model f_{X_i}(x) = (β_i/2) e^{−β_i|x|} is chosen for X_i, and MLE is used to estimate β_i, then the MLE estimator β̂_i has a percentage deviation D = (β̂_i − β_i)/β_i. It can be shown that D ~ N(0, 1/N_i), i.e., it depends only on N_i. Therefore, we can select a threshold on N_i to classify a subband into L or H according to a desired distribution of D.

Estimation of p_ref(l,i) and p_sgn(l,i), where i ∈ L, can be performed using the algorithms discussed in Section 4.5.1, with the models f_{X_i}(x) and f_Z(z) estimated from transform coefficient samples using standard methods, e.g., MLE. Alternatively, the correlation noise model f_Z(z) can be estimated from statistics in the raw data domain (e.g., pixel data) calculated using raw-domain samples, which can provide an implementation advantage since transformation of the side-information is no longer required. For example, if the Laplacian model f_Z(z) = (α/2) e^{−α|z|} is chosen for Z, then we can estimate α by calculating the standard deviation of Z, σ_Z, using raw-domain samples, and using the relationship between the standard deviation and the model parameter of the Laplacian distribution, α = √2/σ_Z. This is viable since the variance of the correlation noise is the same in the raw and transform domains if orthogonal filters are used. For some bi-orthogonal filters, e.g.,
9/7, the variance of the correlation noise in the raw data domain is also very close to that in the transform domain [74], and we can estimate α using similar procedures. For other bi-orthogonal filters, e.g., 5/3, the raw-domain variance would need to be properly normalized, following the discussion in [74], so that α can be accurately estimated.

4.5.2.2 Estimation without Adequate Samples

Subbands in H do not have enough coefficients to estimate f_{X_i}(x) reliably. Instead, we use the empirical p.m.f. Pr(X_i = x) of the subbands in H, along with the correlation noise model f_Z(z), to estimate the sign/refinement crossover probabilities. Specifically, we derive the average crossover probability for the refinement bits consisting of i-th subband coefficients, i ∈ H, by

p_{ref}(l,i) = \sum_x \Pr(U(l,x)) \Pr(X_i = x), \qquad (4.24)

where U(l,x) denotes the event of an l-th refinement-bit crossover when X_i = x, and the summation is taken over all values of X_i for which Pr(X_i = x) is non-zero. We can determine Pr(X_i = x) empirically during encoding by binning the coefficients of the i-th subband. To determine Pr(U(l,x)), we assume Y_i = x + Z (note that here x is a constant rather than a random variable); U(l,x) is then a subset of the sample space of Z and can be found to be equal to

U(l,x) = \begin{cases} \{ z \mid -x + 2k \cdot 2^l \le z \le -x + (2k+1) \cdot 2^l, \text{ or } -x - (2k+1) \cdot 2^l \le z \le -x - 2k \cdot 2^l \}, & \text{if } \lfloor |x| / 2^l \rfloor \text{ is odd}; \\ \{ z \mid -x + (2k+1) \cdot 2^l \le z \le -x + (2k+2) \cdot 2^l, \text{ or } -x - (2k+2) \cdot 2^l \le z \le -x - (2k+1) \cdot 2^l \}, & \text{if } \lfloor |x| / 2^l \rfloor \text{ is even}, \end{cases} \qquad (4.25)

where k ∈ Z⁺. Therefore, Pr(U(l,x)) can be derived by summing the integrals of f_Z(z) over the shaded regions depicted in Figure 4.7. In practice we only need to sum over the few regions where the integrals are non-zero (e.g., around Z = 0 if Z is Laplacian distributed). Note that computing Pr(X_i = x) by binning the coefficients does not incur much complexity, as the subbands in H have only a small number of coefficients.
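The empirical-p.m.f. variant (4.24) can be sketched directly: bin the (few) coefficients of a high-level subband, and for each bin value x estimate Pr(U(l,x)) — here by Monte Carlo over Z instead of integrating f_Z over the intervals in (4.25). This is our own illustration with a hypothetical subband and parameters.

```python
import math
import random
from collections import Counter

def laplace(rng, scale):
    # inverse-CDF draw from a zero-mean Laplacian with the given scale (= 1/rate)
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def p_ref_empirical(coeffs, l, alpha=2.5, n_z=5000, seed=11):
    # Sketch of (4.24): weight the per-value crossover probability Pr(U(l,x))
    # by the empirical p.m.f. of the subband coefficients; Pr(U(l,x)) is
    # estimated by sampling Z rather than by exact integration.
    rng = random.Random(seed)
    pmf = Counter(coeffs)
    total = sum(pmf.values())
    p = 0.0
    for x, cnt in pmf.items():
        hits = 0
        for _ in range(n_z):
            y = x + laplace(rng, 1.0 / alpha)
            if ((int(abs(x)) >> l) ^ (int(abs(y)) >> l)) & 1:
                hits += 1
        p += (hits / n_z) * (cnt / total)
    return p
```

Because the subbands in H contain few coefficients, the empirical p.m.f. has only a handful of bins, so the outer loop stays cheap, mirroring the complexity remark above.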
Similarly, we can derive p_sgn(l,i), i ∈ H, by

p_{sgn}(l,i) = \sum_x \Pr(V(l,x)) \Pr(X_i = x), \qquad (4.26)

where V(l,x) denotes the event of an l-th sign-bit crossover when X_i = x. It can be shown that \Pr(V(l,x)) = \int_{-\infty}^{-|x|} f_Z(z)\, dz when f_Z(z) is symmetric. Note that we use (4.21) and (4.22) to estimate the crossover probabilities when there are enough samples in a subband to allow reliable estimation of f_{X_i}(x), and (4.24) and (4.26) when there are insufficient samples in a subband and the empirical p.m.f. Pr(X_i = x) is used to characterize the data.

Figure 4.7: Pr(U(l,x)) when (a) ⌊|x|/2^l⌋ is odd; (b) ⌊|x|/2^l⌋ is even.

4.6 Hyperspectral Image Compression Experiments

In this section we describe several additional experiments on the proposed algorithms using real image compression applications. In particular, the DSC-based hyperspectral image compression proposed in Chapter 2 is used as the test-bed⁹ to assess the performance of the sample allocation strategy proposed in Section 4.3 and the model-based estimation proposed in Sections 4.4 and 4.5. We briefly review the system first, and present the experiment details and results in the following sections.

4.6.1 DSC-based Hyperspectral Image Compression

Figure 4.8 depicts the encoding algorithm of the DSC-based hyperspectral image compression proposed in Chapter 2 [8,67]. To compress the current spectral band B_i, the sign and magnitude bits of the wavelet transform coefficients are extracted using an algorithm similar to standard SPIHT [61], with modifications such that at some significance levels the magnitude bits are extracted as raw bit-planes (instead of separating them into significance and refinement bit-planes) in order to improve coding performance.
Details of the bit-plane extraction strategy can be found in [8]; here our focus is to investigate efficient algorithms to estimate the correlation between the extracted bit-planes and their corresponding SI.

⁹ We choose the hyperspectral image application for the experiments mainly because of the availability of the system.

Slepian-Wolf coding is employed to compress the sign/refinement/raw bit-planes, using as side information the sign/refinement/raw bit-planes of the same significance extracted from a·B̂_{i−1} + b, where B̂_{i−1} is the previous adjacent reconstructed band, available only at the decoder, and a and b are linear prediction coefficients. Significance maps of B_i are intra-coded by zerotree coding. To determine the coding rate, the original previous band B_{i−1} is used to approximate B̂_{i−1} at the encoder; this is viable since these applications focus on high fidelity. In particular, sign/refinement/raw bit-planes are explicitly extracted from the wavelet transform coefficients of a·B_{i−1} + b, and the crossover probabilities are estimated by exchanging small subsets of bits and using the direct estimation approach discussed in Section 4.2. The amount of information exchanged needs to be kept small so that the algorithm can be used in parallel encoding scenarios, where each band is assigned to a different processor and the processors are connected by low-bandwidth data buses. In order to ensure that the source and SI bit-planes are formed from the wavelet coefficients at the same locations, we need to apply the significance tree of B_i when extracting bit-planes from B_{i−1} [67]. Note that the bit-planes extracted from B_{i−1} are used solely for correlation estimation.

4.6.2 Sample Allocation Experiments

Given that n_T binary samples can be used to estimate the crossover probabilities when compressing B_i, we compare two strategies for allocating the samples to the different bit-planes:

• Adaptive sample allocation.
We use (4.14), i.e., the optimal sample allocation, to decide the numbers of samples allocated to different bit-planes. However, since the crossover probabilities of B_i are unknown, we use as a priori information the crossover probabilities of B_{i−1} in (4.14) (which have been estimated during the compression of B_{i−1}). When compressing the first DSC-coded band, a priori information is not available, so we allocate the same number of samples to each bit-plane.

• Even sample allocation. We allocate the same number of samples to each bit-plane for all the bands.

Figure 4.8: The DSC-based hyperspectral image compression with direct correlation estimation (encoder block diagram and timing diagram for processors k−1 and k).

The NASA AVIRIS image data-sets [38] are used in the experiment. The original image consists of 224 spectral bands, and each spectral band consists of 614×512 16-bit pixels. In the experiment, we compress 512×512 pixels in each band. Figures 4.10(a) and 4.10(b) depict the RD performance of the system under the different sample allocation strategies.

Figure 4.9: DSC-based hyperspectral image compression with model-based estimation (encoder block diagram and timing diagram for processors k−1 and k).

Here MPSNR = 10 log₁₀(65535²/MSE), where MSE is the mean squared error between all the original and reconstructed bands.
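The quality metric used throughout these experiments is straightforward to compute; a small helper (our own sketch) for 16-bit data:

```python
import math

def mpsnr(orig, recon):
    # MPSNR = 10 * log10(65535^2 / MSE), with MSE averaged over all pixels
    # of all bands (here the bands are flattened into one sequence).
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    return 10.0 * math.log10(65535.0 ** 2 / mse)
```

For example, a uniform error of 1 gray level on 16-bit data gives an MPSNR of about 96.3 dB, which is why the curves in Figures 4.10 and 4.11 sit in the 60-100 dB range.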
Also shown in the figures is the RD performance when the exact crossover probabilities are used to determine the coding rate. As shown in the figures, in situations with a small number of samples exchanged (e.g., 0.25% of the total), the adaptive sample allocation can reduce the rate penalty by about 1 dB compared to the even sample allocation. Note that the adaptive sample allocation requires negligible overhead: it simply uses (4.14) to determine a more efficient sample allocation across bit-planes based on any available a priori information.

4.6.3 Model-based Estimation Experiments

Model-based estimation can be applied to the hyperspectral image system following the algorithms outlined in Sections 4.4 and 4.5, with X and Y being the transform coefficients of B_i and a·B_{i−1} + b, respectively. As discussed, continuous-valued source samples are used to estimate the models in model-based estimation, so bit-plane extraction from the SI is no longer necessary. In addition, the correlation noise model can be estimated in the pixel domain in this case, following the discussion in Section 4.5.2. Therefore, model-based estimation can result in a more efficient system, as depicted in Figure 4.9.

Figure 4.10: Sample allocation experiments - (a) Lunar (reflectance data, Bands 41-48), (b) Moffet (radiance data, Bands 73-80), using 0.25% of the total samples. An adaptive sample allocation scheme using the proposed optimal sample allocation strategy of Section 4.3, with a priori information from the previously encoded band, is compared with the even sample allocation. Here "Exact" denotes the cases when the exact correlation information is used to determine the coding rate, i.e., no rate penalty.
Here the core compression module is the same as that in the original system (with direct estimation) in Figure 4.8, while the correlation estimation algorithm is modified following the model-based approach, leading to the following implementation advantages in this application:

• First, the model-based system requires less computation. This is evident when comparing Figures 4.8 and 4.9: the wavelet transform of, and bit-plane extraction from, the SI, as in the original system¹⁰, are no longer required, while model estimation (using MLE) and calculation of the crossover probability estimates (using analytical equations) require only small amounts of computation in the model-based system.

• Moreover, in parallel encoding scenarios, the model-based system requires less data traffic between processors. This is also evident when comparing Figures 4.8 and 4.9: in the model-based system, the encoder of B_i only needs to request pixel-domain samples from the encoder of B_{i−1} at the beginning of processing, to compute the linear prediction coefficients and the correlation noise model, whereas in the original system additional traffic is incurred for exchanging the significance tree and the SI bit-planes.

• Furthermore, the model-based system can achieve better parallel encoding, as there is only a small amount of dependency between the encoders of different bands at the beginning of processing, and the processors can proceed without further synchronization (see the timing diagrams in Figures 4.8 and 4.9).

¹⁰ It may be possible to avoid the wavelet transform of the SI by storing and re-using the coefficients during the compression of B_{i−1}, but this would significantly increase the memory requirements. And bit-plane extraction from the SI is always required if SI bit-planes are generated explicitly, since the SI sign/refinement bits need to be extracted based on the significance tree of B_i.
The performance of model-based estimation is assessed by comparing with the original system, where SI bit-planes are explicitly extracted and exact crossover probabilities are used to estimate the encoding rate, i.e., there is no rate penalty. In the model-based approach, samples of B_{i-1} are obtained by downsampling the image by factors of four and eight horizontally and vertically, respectively. Therefore, 3.125% of the image data are used to estimate the correlation noise model. To prevent decoding errors due to under-estimating the crossover probability, we allow a larger margin when determining the encoding rate, at the expense of coding efficiency. Specifically, a 0.15-bit margin is added to the estimated Slepian-Wolf bound, so that no decoding error occurs in the testing data-sets. Figure 4.11 depicts the RD performance. As shown in the figures, model-based estimation incurs only a small degradation in coding efficiency. In most cases, the difference is less than 0.5 dB when compared to direct estimation with exact crossover probabilities used to determine the coding rate.

[Figure 4.11: Coding efficiency comparison: (a) Cuprite (radiance data, scene SC01, bands 131-138); (b) Lunar (reflectance data, scene SC02, bands 41-48). The model-based system is compared with the original system, which uses all samples in direct estimation and exact crossover probabilities to determine the encoding rate, i.e., no rate penalty. Coding performances of several other wavelet systems (SPIHT, latest 3D ICER (Apr 06), FY04 3D ICER) are also presented for reference.]

4.7 Extensions to Other Correlation Models

While in the previous sections we focused on binary correlation, in this section we give examples of how some of the proposed ideas can be extended to several other correlation
models, where non-equiprobable inputs are considered, or previously decoded (higher-significance) bit-planes are used as side information for decoding.

4.7.1 Rate Penalty Model for Non-equiprobable Input

In Section 4.2 we presented a model for the rate penalty (ΔH)(n) for equiprobable inputs, and the experimental results in Section 4.2.4 suggest the model is sufficiently accurate for real-world data, which are in general non-equiprobable. Nevertheless, more accurate (but more complicated) penalty models can be obtained by relaxing the equiprobable assumption. We follow the notation of Section 4.2, and assume (i) Pr[b_X = 0] = θ, and (ii) the correlation is symmetric, i.e., Pr[b_Y = 1 | b_X = 0] = Pr[b_Y = 0 | b_X = 1] = p. It can be shown that Pr[b_X ≠ b_Y] = p, and the lower bound on the lossless encoding rate of b_X is a function of p and θ:

H(b_X | b_Y) = g(p, θ) = π H( (1−p)θ/π ) + (1−π) H( pθ/(1−π) ),   (4.27)

where

π = Pr[b_Y = 0] = θ + p − 2θp.   (4.28)

We follow the approach of Section 4.2, where the p.d.f. of the estimation error is first derived, and then the rate penalty is determined from the relationship between estimation error and rate penalty. In particular, following the estimation procedures in Section 4.2.2, the estimation error (Δp)(n) is given by (4.5). Note that only a subset of the b_Y's are available when encoding b_X, and thus Pr[b_X ≠ b_Y] cannot be found exactly. On the other hand, since all the b_X's are available at the encoder, there is no estimation error for θ. Therefore, the rate penalty can be approximated by

(ΔH)(n) ≈ (∂g(p,θ)/∂p) · (Δp)(n) ~ N( (∂g(p,θ)/∂p) z_{ω/2} σ, (∂g(p,θ)/∂p)² σ² ),   (4.29)

where

∂g(p,θ)/∂p = (1−2θ) ( H( (1−p)θ/π ) − H( pθ/(1−π) ) ) + (θ(θ−1)/π) ln( p(1−θ) / ((1−p)θ) ) + (θ(1−θ)/(1−π)) ln( (1−p)(1−θ) / (pθ) ),   (4.30)

and n is the number of samples of b_Y used in the estimation. One can verify that (4.29) simplifies to (4.7) when θ = 0.5. Note that one can use (4.29) to derive the penalty model for multiple non-equiprobable binary sources and determine the corresponding optimum bit allocation following the discussion in Section 4.3.
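A small numerical check of (4.27)-(4.28) can be sketched as follows (`H` and `g` are illustrative names, not code from the thesis):

```python
import math

def H(q):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def g(p, theta):
    """Slepian-Wolf bound H(b_X | b_Y) of (4.27), where
    theta = Pr[b_X = 0] and p is the symmetric crossover probability."""
    pi = theta + p - 2.0 * theta * p                      # (4.28): Pr[b_Y = 0]
    return pi * H((1.0 - p) * theta / pi) + \
           (1.0 - pi) * H(p * theta / (1.0 - pi))

# For an equiprobable input (theta = 0.5) the bound reduces to the usual H(p):
print(abs(g(0.2, 0.5) - H(0.2)) < 1e-12)   # True
# A skewed input (theta != 0.5) needs a lower rate than the equiprobable bound:
print(g(0.2, 0.1) < H(0.2))                # True
```

The first check mirrors the statement in the text that (4.29) simplifies to the equiprobable model when θ = 0.5; the second illustrates why relaxing the equiprobable assumption can tighten the rate estimate for skewed bit-planes.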
4.7.2 Model-based Estimation with Previously Decoded Bit-planes as Side-information

We now discuss how to extend model-based estimation to the cases when the previously decoded bit-planes are used as SI. Specifically, we consider the cases when bit-planes are extracted from a continuous input source, and each bit-plane b_X(l) is compressed with both the previously decoded bit-planes of the same source, b_X(l+1), …, b_X(l+m), and those of the correlated source, b_Y(l), b_Y(l+1), …, b_Y(l+m), as SI (Figure 4.12(a)). We assume bit-planes are communicated to the decoder starting from the MSB; the reverse order can be addressed similarly. For encoding b_X(l) we need to estimate the coding rate

H( b_X(l) | b_X(l+1), …, b_X(l+m), b_Y(l), b_Y(l+1), …, b_Y(l+m) ).   (4.31)

We follow the same general approach as discussed in Section 4.4. To determine (4.31), we need the joint p.d.f. between the input and all the SI:

p( b_X(l), b_X(l+1), …, b_X(l+m), b_Y(l), b_Y(l+1), …, b_Y(l+m) ),   (4.32)

which has 2^(2m+2) − 1 free parameters. It may seem complicated to estimate this joint p.d.f. However, it turns out that the model-based estimation for (4.32) exhibits a regular structure, which greatly simplifies the estimation process. In addition, some video data suggest that further improvement could be negligible with m > 1 in practical applications [10,11] (Figure 4.13). We denote by γ_{i,j} the joint probability that the binary representation of i is b_X(l+m)…b_X(l+1)b_X(l) and that of j is b_Y(l+m)…b_Y(l+1)b_Y(l) (Figure 4.12(b)), i.e.,

γ_{i,j} = p( ⟨b_X(l+m)…b_X(l+1)b_X(l)⟩ = i, ⟨b_Y(l+m)…b_Y(l+1)b_Y(l)⟩ = j ),

where ⟨b(l+m)…b(l+1)b(l)⟩ denotes the numerical value of the concatenation of the sequence of bits b(l+m), …, b(l+1), b(l), i.e., Σ_{i=0}^{m} b(l+i)·2^i.
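For integer-valued coefficient magnitudes, the concatenated value ⟨b(l+m)…b(l+1)b(l)⟩ is simply a bit-field extract. A tiny illustrative sketch (not from the thesis):

```python
def bitplane_value(x, l, m):
    """<b(l+m) ... b(l+1) b(l)> for a non-negative integer x, i.e.
    sum_{i=0}^{m} b(l+i) * 2**i where b(k) is bit k of x: shift out
    the l less-significant bit-planes, then keep m+1 bit-planes."""
    return (x >> l) & ((1 << (m + 1)) - 1)

x = 0b101101                      # bits b5..b0 = 1,0,1,1,0,1
print(bitplane_value(x, 2, 1))    # bits b3,b2 -> 0b11 = 3
print(bitplane_value(x, 0, 5))    # all six bit-planes -> 45
```

Tabulating this value over a set of (X, Y) sample pairs gives an empirical version of the joint probabilities γ_{i,j} defined above.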
It can be shown, by tracing the binary representations of X and Y, that the events leading to the occurrence of ⟨b_X(l+m)…b_X(l+1)b_X(l)⟩ = i and ⟨b_Y(l+m)…b_Y(l+1)b_Y(l)⟩ = j correspond to the region A_{i,j} in the sample space of X and Y in Figure 4.14. Therefore,

γ_{i,j} = ∬_{A_{i,j}} f_{XY}(x, y) dx dy,   (4.33)

where

A_{i,j} = { (x, y) : c·2^{l+m+1} + i·2^l ≤ |x| ≤ c·2^{l+m+1} + i·2^l + 2^l − 1, d·2^{l+m+1} + j·2^l ≤ |y| ≤ d·2^{l+m+1} + j·2^l + 2^l − 1, c, d ∈ Z^+ }.   (4.34)

(4.33) can be readily computed by factorizing f_{XY} and estimating f_X and f_Z, as discussed in Section 4.4. In practice, we only need to sum over the few regions where the integrals of f_{XY} are practically non-zero. Note that we can extend this to estimate the encoding rate for structured bit-planes (i.e., sign/refinement bit-planes) following the discussion in Section 4.5.

[Figure 4.12: (a) Encoding of bit-plane b_X(l) with both the previously decoded bit-planes b_X(l+1), …, b_X(l+m) and those of the correlated source b_Y(l), b_Y(l+1), …, b_Y(l+m) as SI. (b) Joint p.d.f. between the input and all the SI.]

4.7.3 Model-based Estimation with Continuous Side Information in Joint Decoding

In this section we consider situations where each bit-plane b_X(l) is compressed with both the previously decoded bit-planes of the same source, b_X(l+1), …, b_X(l+m), and the
continuous SI Y available to be used in joint decoding (see Footnote 11). For encoding b_X(l) we need to estimate the coding rate

H( b_X(l) | b_X(l+1), …, b_X(l+m), Y ).   (4.35)

To determine (4.35), we need the joint p.d.f.

p( b_X(l), b_X(l+1), …, b_X(l+m), Y ).   (4.36)

We denote p( ⟨b_X(l+m)…b_X(l+1)b_X(l)⟩ = i, Y = y ) by γ_i(y). Following the discussion in Section 4.7.2, it can be shown that

γ_i(y) = ∫_{A_i} f_{X,Y}(x, y) dx,   (4.37)

where the A_i are subsets of the sample space of X (Figure 4.15):

A_i = { x : c·2^{l+m+1} + i·2^l ≤ |x| ≤ c·2^{l+m+1} + i·2^l + 2^l − 1, c ∈ Z^+ }.   (4.38)

[Figure 4.13: Entropy/conditional entropy of the video bit-plane data used in the experiment in Section 4.2.4, i.e., X and Y are the quantized DCT coefficients in a current frame and the corresponding quantized coefficients in the motion-compensated predictors in the reference frame, respectively, using the 2nd AC coefficient of Mobile (720×576, 30 fps) at QP = 12: (a) without SI, i.e., intra coding ("No SI"); (b) using only the corresponding bit-plane as SI, i.e., m = 0 ("SI: Corr. bit-plane"); (c), (d), (e) using the corresponding and one, two or three previously decoded bit-planes as SI, i.e., m = 1, 2 or 3, respectively. The results suggest further improvements with m > 1 could be negligible.]

[Figure 4.14: Extension of model-based estimation: the events leading to the occurrence of ⟨b_X(l+m)…b_X(l+1)b_X(l)⟩ = i and ⟨b_Y(l+m)…b_Y(l+1)b_Y(l)⟩ = j correspond to the region A_{i,j} in the sample space of X and Y.]

4.8 Conclusions

In this paper, we have investigated correlation estimation for distributed image and video applications under rate and complexity constraints.
Footnote 11: Note that while this model is commonly used in distributed video applications (e.g., [3]), many lossy image compression applications may not communicate the LSB bit-planes, and therefore a full-resolution version of Y would not be available to be used for joint decoding.

[Figure 4.15: The events leading to the occurrence of ⟨b_X(l+m)…b_X(l+1)b_X(l)⟩ = i correspond to the regions A_i, subsets of the sample space of X.]

Focusing on the situations when sampling techniques are employed to estimate the correlation, we first analyzed how the number of samples relates to the p.d.f. of the rate penalty when compressing a binary input source. The rate penalty was found to be normally distributed, with parameters depending on the number of samples and the crossover probability between the source and the SI. We then extended the analysis to the cases when multiple binary input sources are to be compressed, and proposed a strategy to allocate samples to the sources such that the overall rate penalty can be minimized. Furthermore, we proposed a model-based estimation for the particular but important situations when bit-planes are extracted from a continuous-valued input source, and discussed extensions to the cases when bit-planes are extracted based on the significance of the data, for wavelet-based DSC applications. Experimental results, including real image compression, demonstrated that the model-based approach can achieve accurate estimation. In addition, the model-based estimation might lead to some implementation advantages.

Chapter 5

Conclusions and Future Work

We have proposed in this thesis new video and image coding algorithms based on DSC to address some of the issues in the conventional compression framework. First, we discuss a wavelet-based hyperspectral image compression algorithm that combines set partitioning with our proposed DSC techniques to achieve competitive coding performance.
DSC tools allow encoding to proceed in "open loop", and this facilitates parallel compression of spectral bands in multi-processor architectures. We also discuss a coding strategy that adaptively applies DSC or intra coding to bit-planes according to the statistics of the data, to maximize the coding gain. Experimental results suggest our scheme is comparable, in terms of compression performance, to a simple 3-D wavelet codec developed by NASA-JPL.

Moreover, we propose to address flexible video decoding using a DSC approach. With DSC, the overhead to support flexible decoding depends on the worst-case correlation noise rather than on the number of possible decoding paths. As a result, DSC can lead to a solution that compares favorably to conventional approaches. Experimental results using multiview video coding and forward/backward video playback demonstrate the improvement.

Furthermore, we study the correlation estimation problem in DSC. We propose a model to characterize the coding rate penalty due to estimation error, and based on the model we propose a sample allocation algorithm to minimize the overall rate penalty in multiple-source scenarios. We also propose a model-based estimation for distributed coding of continuous-valued input sources. We demonstrate the effectiveness of the proposed algorithms by experimental results, including some based on real image data.

Some related future research topics could be:

• Peer-to-peer multicast streaming based on DSC. Conventional compression tools may lack the robustness to address peer-to-peer (P2P) multicast video streaming. In P2P networks, peer nodes serving the video data may disconnect at any time. Moreover, individual peer nodes may support different and time-varying upstream data rates. Conventional compression algorithms fail to cope with these operating conditions since they are vulnerable to data loss and delay variation.
In Chapter 3, we demonstrate that with our proposed algorithms it is possible to generate a single compressed bitstream that can be decoded in several different ways under different operating conditions. This enhanced adaptability can, in principle, greatly facilitate P2P multicast streaming.

• Application-specific hyperspectral image compression. Existing hyperspectral image compression algorithms are mostly optimized for rate-distortion. However, one of the main applications of hyperspectral image data is to identify the ground objects. Therefore, it is important to investigate new hyperspectral image compression techniques that optimally preserve the spectral signatures at a given rate, so as to achieve the best classification performance.

Appendix A

Derivation of Bounds on the Estimation Error

Here we justify the bounds on the estimation error in Section 2.4.1.2. We let the crossover probability estimator be the upper bound of the (1−ω)×100% confidence interval for a population proportion, i.e.,

p̂_i = s_i/n_i + z_{ω/2} √( p_i(1−p_i)/n_i ) ≈ s_i/n_i + z_{ω/2} √( (s_i/n_i)(1 − s_i/n_i)/n_i ).

Let m = z_{ω/2} √( p_i(1−p_i)/n_i ). By the definition of the confidence interval, we have

Pr( s_i/n_i − m ≤ p_i ≤ s_i/n_i + m ) = 1 − ω.

Equivalently,

Pr( p_i − m ≤ s_i/n_i ≤ p_i + m ) = 1 − ω.

By this, and the fact that s_i/n_i can be approximated by a Normal density with mean p_i and variance p_i(1−p_i)/n_i, we have

Pr( s_i/n_i < p_i − m ) = ω/2,
Pr( s_i/n_i + m − p_i < 0 ) = ω/2,
Pr( p̂_i − p_i < 0 ) = ω/2,

and

Pr( s_i/n_i > p_i + m ) = ω/2,
Pr( s_i/n_i + m − p_i > 2m ) = ω/2,
Pr( p̂_i − p_i > 2m ) = ω/2.

From these equations, the probability of decoding error and the probability of a large encoding rate penalty can be estimated.

Appendix B

Derivation of Optimal Sample Allocation

Here we give the details of the derivation of the optimal sample allocation given in (4.14). We consider

min_{ {n_l} : Σ_{l=0}^{L−1} n_l = n_T, n_l ≤ K_l }  E[ΔH],   (B.1)

where E[ΔH] = (1/K_T) Σ_{l=0}^{L−1} K_l α_l n_l^{−1/2} and α_l = ln(1/p_l − 1) z_{ω/2} √( p_l(1−p_l) ).
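The closed-form allocation n_l ∝ (K_l α_l)^{2/3} from (4.14), together with the Kuhn-Tucker clamping at n_l = K_l, can be sanity-checked numerically. The sketch below is illustrative: the bit-plane sizes K_l and crossover probabilities p_l are hypothetical values, not taken from the experiments.

```python
import math

def optimal_allocation(K, p, n_total, z=1.645):
    """Allocate n_total samples across L bit-planes to minimize
    E[dH] = (1/K_T) * sum_l K_l * a_l * n_l**-0.5, subject to
    sum_l n_l = n_total and n_l <= K_l (the problem (B.1)).

    Unconstrained optimum: n_l proportional to (K_l * a_l)**(2/3).
    Sources whose unconstrained share exceeds K_l are clamped to K_l
    (Kuhn-Tucker conditions) and the remaining budget re-distributed.
    """
    a = [math.log(1.0 / pl - 1.0) * z * math.sqrt(pl * (1.0 - pl)) for pl in p]
    w = [(Kl * al) ** (2.0 / 3.0) for Kl, al in zip(K, a)]
    n = [0.0] * len(K)
    free, budget = set(range(len(K))), float(n_total)
    while True:
        s = sum(w[l] for l in free)
        for l in free:
            n[l] = budget * w[l] / s
        over = [l for l in free if n[l] > K[l]]
        if not over:
            return n
        for l in over:                 # clamp and re-distribute
            n[l] = float(K[l])
            budget -= K[l]
            free.remove(l)

# Bit-planes with larger K_l * a_l receive proportionally more samples.
n = optimal_allocation(K=[4000, 4000, 4000], p=[0.05, 0.2, 0.45], n_total=300)
print(round(sum(n)))  # 300
```

This mirrors the derivation that follows: the power 2/3 comes from setting the Lagrangian derivative to zero, and the clamping loop implements the boundary case n_l* = K_l of the Kuhn-Tucker conditions.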
Using Lagrange multipliers, we construct

J(n) = (1/K_T) Σ_{l=0}^{L−1} K_l α_l n_l^{−1/2} + λ Σ_{l=0}^{L−1} n_l.   (B.2)

Differentiating (B.2) and setting the derivative to zero,

∂J/∂n_l = −(1/(2K_T)) K_l α_l n_l^{−3/2} + λ = 0,   (B.3)

or

n_l* = (K_l α_l)^{2/3} (1/(2λK_T))^{2/3} = (K_l α_l)^{2/3} γ.   (B.4)

This holds as long as n_l* < K_l. Otherwise, we need to use the Kuhn-Tucker conditions to find n_l*:

∂J/∂n_l = 0, if n_l* < K_l;   ∂J/∂n_l ≤ 0, if n_l* = K_l.   (B.5)

Solving (B.5) we obtain the results in (4.14).

Bibliography

[1] A. Aaron and B. Girod. Compression with side information using turbo codes. In Proc. Data Compression Conference (DCC), 2002.
[2] A. Aaron, P. Ramanathan, and B. Girod. Wyner-Ziv coding of light fields for random access. In Proc. Workshop on Multimedia Signal Processing (MMSP), 2004.
[3] A. Aaron, S. Rane, and B. Girod. Wyner-Ziv video coding with hash-based motion compensation at the receiver. In Proc. Int'l Conf. Image Processing (ICIP), 2004.
[4] A. Aaron, R. Zhang, and B. Girod. Wyner-Ziv coding of motion video. In Proc. Asilomar Conf. Signals, Systems, and Computers, Nov. 2002.
[5] J. Bajcsy and P. Mitran. Coding for the Slepian-Wolf problem with turbo codes. In Proc. Global Telecommunications Conference (GLOBECOM), 2001.
[6] M. Barni, D. Papini, A. Abrardo, and E. Magli. Distributed source coding of hyperspectral images. In Proc. Int'l Symp. Geoscience and Remote Sensing (IGARSS), 2005.
[7] A. R. Calderbank, I. Daubechies, W. Sweldens, and D. Yeo. Wavelet transforms that map integers to integers. Applied Comp. Harmonic Analy., 5:332-369, 1998.
[8] N.-M. Cheung and A. Ortega. An efficient and highly parallel hyperspectral imagery compression scheme based on distributed source coding. In Proc. Asilomar Conf. Signals, Systems, and Computers, 2006.
[9] N.-M. Cheung and A. Ortega. A model-based approach to correlation estimation in wavelet-based distributed source coding with application to hyperspectral imagery. In Proc. Int'l Conf. Image Processing (ICIP), 2006.
[10] N.-M. Cheung and A. Ortega.
Distributed source coding application to low-delay free viewpoint switching in multiview video compression. In Proc. Picture Coding Symposium (PCS), 2007.
[11] N.-M. Cheung and A. Ortega. Flexible video decoding: A distributed source coding approach. In Proc. Workshop on Multimedia Signal Processing (MMSP), 2007.
[12] N.-M. Cheung and A. Ortega. Compression algorithms for flexible video decoding. In Proc. Visual Communications and Image Processing (VCIP), 2008.
[13] N.-M. Cheung and A. Ortega. Flexible video decoding: A distributed source coding approach. IEEE Transactions on Image Processing. In preparation, 2008.
[14] N.-M. Cheung, C. Tang, A. Ortega, and C. S. Raghavendra. Efficient wavelet-based predictive Slepian-Wolf coding for hyperspectral imagery. EURASIP Journal on Signal Processing - Special Issue on Distributed Source Coding, Nov. 2006.
[15] N.-M. Cheung, H. Wang, and A. Ortega. Correlation estimation for distributed source coding under information exchange constraints. In Proc. Int'l Conf. Image Processing (ICIP), 2005.
[16] N.-M. Cheung, H. Wang, and A. Ortega. Video compression with flexible playback order based on distributed source coding. In Proc. Visual Communications and Image Processing (VCIP), 2006.
[17] N.-M. Cheung, H. Wang, and A. Ortega. Sampling-based correlation estimation for distributed source coding under rate and complexity constraints. IEEE Transactions on Image Processing. Under review, 2007.
[18] R. N. Clark, G. Swayze, J. Boardman, and F. Kruse. Comparison of three methods for materials identification and mapping with imaging spectroscopy. In JPL AVIRIS Airborne Geoscience Workshop, 1993.
[19] T. M. Cover and J. A. Thomas. Elements of Information Theory. New York: Wiley, 1991.
[20] I. Csiszar. Linear codes for sources and source networks: Error exponents, universal coding. IEEE Trans. Information Theory, 28(4):585-592, July 1982.
[21] I. Csiszar and J. Korner. Towards a general theory of source networks. IEEE Trans.
Information Theory, 26(2):155-165, Mar. 1980.
[22] S. Draper and E. Martinian. Compound conditional source coding, Slepian-Wolf list decoding, and applications to media coding. In Proc. Int'l Symp. Information Theory (ISIT), 2007.
[23] J. E. Fowler. An open source software library for quantization, compression and coding. In Proc. SPIE Applications of Digital Image Processing XXIII, pages 294-301, Aug 2000.
[24] J. E. Fowler. QccPack: Quantization, Compression, Coding Library and Utilities. http://qccpack.sourceforge.net, 2006.
[25] C.-H. Fu, Y.-L. Chan, and W.-C. Siu. Efficient reverse-play algorithms for MPEG video with VCR support. IEEE Trans. Circuits and Systems for Video Technology, 16(1), Jan. 2006.
[26] M. Fumagalli, M. Tagliasacchi, and S. Tubaro. Improved bit allocation in an error-resilient scheme based on distributed source coding. In Proc. Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP), 2006.
[27] J. Garcia-Frias and Y. Zhao. Compression of correlated binary sources using turbo codes. IEEE Communications Letters, pages 417-419, Oct. 2001.
[28] B. Girod, A. Aaron, S. Rane, and D. Rebollo-Monedero. Distributed video coding. Proceedings of the IEEE, Special Issue on Advances in Video Coding and Delivery, 93(1):71-83, Jan. 2005.
[29] M. Grangetto, E. Magli, and G. Olmo. Distributed arithmetic coding. IEEE Communications Letters, 11(11), Nov. 2007.
[30] C. Guillemot, F. Pereira, L. Torres, T. Ebrahimi, R. Leonardi, and J. Ostermann. Distributed monoview and multiview video coding. IEEE Signal Processing Magazine, 24, Sept. 2007.
[31] X. Guo, Y. Lu, F. Wu, W. Gao, and S. Li. Distributed multi-view video coding. In Proc. Visual Communications and Image Processing (VCIP), 2006.
[32] X. Guo, Y. Lu, F. Wu, W. Gao, and S. Li. Free viewpoint switching in multi-view video streaming using Wyner-Ziv video coding. In Proc. Visual Communications and Image Processing (VCIP), 2006.
[33] Z. He, L. Cao, and H. Cheng.
Correlation estimation and performance optimization for distributed image compression. In Proc. Visual Communications and Image Processing (VCIP), 2006.
[34] Z. He and X. Chen. Correlation structure analysis for distributed video compression over wireless video sensor networks. In Proc. Visual Communications and Image Processing (VCIP), 2006.
[35] ISO/IEC JTC1/SC29/WG11. Introduction to multiview video coding. Technical Report N7328, MPEG document, 2005.
[36] A. Jagmohan, A. Sehgal, and N. Ahuja. Compression of light field rendered images using coset codes. In Proc. Asilomar Conf. Signals, Systems, and Computers, 2003.
[37] M. Karczewicz and R. Kurceren. The SP- and SI-frames design for H.264/AVC. IEEE Trans. Circuits and Systems for Video Technology, 13, July 2003.
[38] A. Kiely, M. Klimesh, H. Xie, and N. Aranki. ICER-3D: A progressive wavelet-based compressor for hyperspectral images. Technical report, NASA-JPL IPN Progress Report, 2006. http://ipnpr.jpl.nasa.gov/progress report/42-164/164A.pdf.
[39] M. Klimesh, A. Kiely, H. Xie, and N. Aranki. Spectral ringing artifacts in hyperspectral image data compression. Technical report, NASA-JPL IPN Progress Report, 2005. http://ipnpr.jpl.nasa.gov/progress report/42-160/160C.pdf.
[40] D. Landgrebe. Hyperspectral image data analysis. IEEE Signal Processing Magazine, 19, Jan 2002.
[41] P. Lassila, J. Karvo, and J. Virtamo. Efficient importance sampling for Monte Carlo simulation of multicast networks. In Proc. Conf. Computer Communications (INFOCOM), 2001.
[42] C. W. Lin, J. Zhou, J. Youn, and M. T. Sun. MPEG video streaming with VCR functionality. IEEE Trans. Circuits and Systems for Video Technology, 11(3), Mar. 2001.
[43] A. Liveris, Z. Xiong, and C. Georghiades. Compression of binary sources with side information at the decoder using LDPC codes. IEEE Communications Letters, 6, Oct. 2002.
[44] A. Liveris, Z. Xiong, and C. Georghiades. A distributed source coding technique for correlated images using turbo-codes.
IEEE Communications Letters, 2002.
[45] A. D. Liveris, C.-F. Lan, K. Narayanan, Z. Xiong, and C. N. Georghiades. Slepian-Wolf coding of three binary sources using LDPC codes. In Proc. Intl. Symp. Turbo Codes and Related Topics, Sept. 2003.
[46] D. J. C. MacKay. Good error-correcting codes based on very sparse matrices. IEEE Trans. Information Theory, pages 399-431, Mar. 1999.
[47] E. Magli, M. Barni, A. Abrardo, and M. Grangetto. Distributed source coding techniques for lossless compression of hyperspectral images. EURASIP Journal on Applied Signal Processing, 2007, 2007.
[48] A. Majumdar, J. Chou, and K. Ramchandran. Robust distributed video compression based on multilevel coset codes. In Proc. Asilomar Conf. Signals, Systems, and Computers, 2003.
[49] A. Majumdar, R. Puri, P. Ishwar, and K. Ramchandran. Complexity/performance trade-offs for robust distributed video coding. In Proc. Int'l Conf. Image Processing (ICIP), 2005.
[50] W. Mendenhall and T. Sincich. Statistics for Engineering and the Sciences. Prentice-Hall, 1995.
[51] A. C. Miguel, A. R. Askew, A. Chang, S. Hauck, and R. E. Ladner. Reduced complexity wavelet-based predictive coding of hyperspectral images for FPGA implementation. In Proc. Data Compression Conference (DCC), 2004.
[52] A. Naman and D. Taubman. A novel paradigm for optimized scalable video transmission based on JPEG2000 with motion. In Proc. Int'l Conf. Image Processing (ICIP), 2007.
[53] A. Ortega. Video coding: Predictions are hard to make, especially about the future. In Proc. Image Media Processing Symposium, 2007.
[54] M. Pedram and E. J. M. Rabaey. Power Aware Design Methodologies. Kluwer Academic Publishers, 2002.
[55] S. Pradhan, J. Kusuma, and K. Ramchandran. Distributed compression in a dense microsensor network. IEEE Signal Processing Magazine, pages 51-60, Mar. 2002.
[56] S. Pradhan and K. Ramchandran. Distributed source coding using syndromes (DISCUS): Design and construction. In Proc. Data Compression Conference (DCC), 1999.
[57] S.
Pradhan and K. Ramchandran. Distributed source coding: symmetric rates and applications to sensor networks. In Proc. Data Compression Conference (DCC), 2000.
[58] S. Pradhan and K. Ramchandran. Distributed source coding using syndromes (DISCUS). IEEE Trans. Information Theory, 49(3):626-643, Mar. 2003.
[59] R. Puri and K. Ramchandran. PRISM: a new robust video coding architecture based on distributed compression principles. In Proc. Allerton Conf. Communications, Control, and Computing, Oct. 2002.
[60] S. Rane and B. Girod. Systematic lossy error protection based on H.264/AVC redundant slices. In Proc. Visual Communications and Image Processing (VCIP), Jan. 2006.
[61] A. Said and W. A. Pearlman. A new, fast, and efficient image codec using set partitioning in hierarchical trees. IEEE Trans. Circuits and Systems for Video Technology, pages 243-250, June 1996.
[62] A. Sehgal, A. Jagmohan, and N. Ahuja. Wyner-Ziv coding of video: an error-resilient compression framework. IEEE Trans. Multimedia, 6(2):249-258, Apr. 2004.
[63] D. Slepian and J. Wolf. Noiseless coding of correlated information sources. IEEE Trans. Information Theory, 19:471-480, July 1973.
[64] N. T. Slingerland and A. J. Smith. Cache performance for multimedia applications. In Proc. Int'l Conf. Supercomputing, 2001.
[65] S. R. Smoot and L. A. Rowe. Study of DCT coefficient distributions. In Proc. SPIE, Jan. 1996.
[66] E. Steinbach, N. Farber, and B. Girod. Standard compatible extension of H.263 for robust video transmission in mobile environments. IEEE Trans. Circuits and Systems for Video Technology, 7, Dec. 1997.
[67] C. Tang, N.-M. Cheung, A. Ortega, and C. S. Raghavendra. Efficient inter-band prediction and wavelet based compression for hyperspectral imagery: A distributed source coding approach. In Proc. Data Compression Conference (DCC), 2005.
[68] C. Tang and C. S. Raghavendra. Bitplane coding for correlations exploitation in wireless sensor networks. In Proc. Int'l Conf. Communications (ICC), Seoul, May 2005.
[69] X.
Tang, W. A. Pearlman, and J. W. Modestino. Hyperspectral image compression using three-dimensional wavelet coding: A lossy-to-lossless solution. Submitted to IEEE Transactions on Geoscience and Remote Sensing.
[70] M. Tanimoto. FTV (free viewpoint television) creating ray-based image engineering. In Proc. Int'l Conf. Image Processing (ICIP), 2005.
[71] D. Taubman and M. Marcellin. JPEG2000: Image compression fundamentals, standards and practice. Kluwer, 2002.
[72] V. Thirumalai, I. Tosic, and P. Frossard. Distributed coding of multiresolution omnidirectional images. In Proc. Int'l Conf. Image Processing (ICIP), 2007.
[73] J. Tian. Software for MATLAB SPIHT codes. Available at MATLAB Central, http://www.mathworks.com/matlabcentral/.
[74] B. Usevitch. Optimal bit allocation for biorthogonal wavelet coding. In Proc. Data Compression Conference (DCC), 1996.
[75] M. van der Schaar and P. Chou. Multimedia over IP and Wireless Networks: Compression, Networking, and Systems. Academic Press, 2007.
[76] H. Wang, N.-M. Cheung, and A. Ortega. WZS: Wyner-Ziv scalable predictive video coding. In Proc. Picture Coding Symposium (PCS), Dec. 2004.
[77] H. Wang, N.-M. Cheung, and A. Ortega. A framework for adaptive scalable video coding using Wyner-Ziv techniques. EURASIP Journal on Applied Signal Processing, 2006.
[78] J. Wang, A. Majumdar, K. Ramchandran, and H. Garudadri. Robust video transmission over a lossy network using a distributed source coded auxiliary channel. In Proc. Picture Coding Symposium (PCS), 2004.
[79] J. Wang, V. Prabhakaran, and K. Ramchandran. Syndrome-based robust video transmission over networks with bursty losses. In Proc. Int'l Conf. Image Processing (ICIP), 2006.
[80] Y. Wang, J. Ostermann, and Y. Zhang. Video Processing and Communication. Prentice Hall, New Jersey, 2002.
[81] S. J. Wee and B. Vasudev. Compressed-domain reverse play of MPEG video streams. In Proc. Multimedia Systems and Applications, 1999.
[82] T. Wiegand.
Joint final committee draft for joint video specification H.264. Technical Report JVT-D157, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, 2002.
[83] A. Wyner. Recent results in the Shannon theory. IEEE Trans. Information Theory, 20(1):2-10, 1974.
[84] A. Wyner and J. Ziv. The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Information Theory, 22:1-10, Jan. 1976.
[85] Z. Xiong, A. Liveris, and S. Cheng. Distributed source coding for sensor networks. IEEE Signal Processing Magazine, 21:80-94, Sept. 2004.
[86] Q. Xu and Z. Xiong. Layered Wyner-Ziv video coding. In Proc. Visual Communications and Image Processing (VCIP), Jan. 2004.
[87] X. Zhou, W.-Y. Kung, and C.-C. J. Kuo. A robust H.264 video streaming scheme for portable devices. In Proc. Int'l Symp. Circuits and Systems, 2005.
[88] X. Zhu, A. Aaron, and B. Girod. Distributed compression for large camera arrays. In Proc. Workshop on Statistical Signal Processing, 2003.
Abstract
Many video compression schemes (e.g., the recent H.264/AVC standard) and volumetric image coding algorithms are based on a closed-loop prediction (CLP) framework. While CLP-based schemes can achieve state-of-the-art coding efficiency, they are inadequate in addressing some important emerging applications such as wireless video and multiview video, which have new requirements including low-complexity encoding, robustness to transmission error, and flexible decoding, among others. In this research we investigate new video and image compression algorithms based on distributed source coding (DSC), and we demonstrate that the proposed algorithms can overcome some of the deficiencies of CLP-based systems while achieving competitive coding performance.
Asset Metadata

Creator: Cheung, Ngai-Man (author)
Core Title: Distributed source coding for image and video applications
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Publication Date: 06/19/2008
Defense Date: 02/05/2008
Publisher: University of Southern California
Tags: correlation estimation, distributed source coding, flexible video decoding, hyperspectral imagery, multiview video, robust video transmission, Slepian-Wolf, Wyner-Ziv
Language: English
Advisor: Ortega, Antonio (committee chair); Kuo, C.-C. Jay (committee member); Nakano, Aiichiro (committee member)
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m1276
Document Type: Dissertation
Repository: Libraries, University of Southern California, Los Angeles, California (cisadmin@lib.usc.edu)