ERROR RESILIENT TECHNIQUES FOR ROBUST VIDEO TRANSMISSION

by

Wei-Ying Kung

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

August 2004

Copyright 2004 Wei-Ying Kung

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

UMI Number: 3145223

INFORMATION TO USERS

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleed-through, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

UMI Microform 3145223
Copyright 2004 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.

ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346

Dedication

To my beloved father, mother, sister and Shih-Chieh.

Acknowledgements

First, I would like to express my immense gratitude to my advisor, Professor C.-C. Jay Kuo, for his generous support and intelligent guidance. His patience and encouragement gave me the confidence to keep striving for solutions, and his insights and advice helped me stay focused on the key points where we could make contributions. I would like to greatly thank my mentor, Dr. Chang-Su Kim, for his constant help even after he left our group.
I learned valuable lessons and discovered my blind spots every time I discussed my work with him. His brilliant ideas and professional expertise sped up my research progress and enhanced the quality of my work. I also would like to express my deep appreciation to Professor Antonio Ortega, Professor Alexander A. Sawchuk, Professor Shrikanth S. Narayanan and Professor Roger Zimmermann for serving on my qualifying exam committee and providing useful feedback. I especially thank Professor Antonio Ortega and Professor Roger Zimmermann for giving their precious time to further serve on my defense committee. Finally, I would like to thank my group members and other friends for their friendship and assistance.

Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract

1 Introduction
  1.1 Significance of the Research
  1.2 Description of the Research
  1.3 Contributions of the Research
  1.4 Outline of the Proposal

2 Review of Previous Work
  2.1 Introduction
  2.2 Error Protection
    2.2.1 Layered Coding and Unequal Error Protection
    2.2.2 Multiple Description
    2.2.3 Joint Source/Channel Resource Allocation
  2.3 Error Confinement
    2.3.1 Resynchronization
    2.3.2 Reversible Variable Length Coding
    2.3.3 Independent Segment Prediction (ISP)
    2.3.4 Intra-Block or Frame Refreshment
  2.4 Error Concealment
    2.4.1 Motion Compensated Temporal Prediction
    2.4.2 Spatial Interpolation
  2.5 Error Control
    2.5.1 Reference Picture Selection (RPS)

3 Wireless Video Transmission with Unequal Error Protection
  3.1 Multiple Video Transmission over CDMA Systems with Unequal Error Protection
    3.1.1 Modified H.263 Source Coder
    3.1.2 Channel Model
    3.1.3 Channel Coder
    3.1.4 Channel Code and Multicode Assignment
    3.1.5 Simulations
  3.2 Packet Video Transmission with Adaptive Channel Rate Allocation
    3.2.1 Video Coder
    3.2.2 Static Channel Code Rate Allocation
    3.2.3 Video Source Model
    3.2.4 Channel Distortion
    3.2.5 Adaptation
    3.2.6 Simulation Results
  3.3 Conclusions

4 Spatial and Temporal Error Concealment Techniques
  4.1 Introduction
  4.2 Previous Work on Error Concealment
    4.2.1 I-Frame Concealment
    4.2.2 P-Frame Concealment
  4.3 Directional Interpolation for I-Frame Concealment
    4.3.1 Edge Recovery
    4.3.2 Selective Directional Interpolation
  4.4 MMSE Decoding for P-Frame Concealment
    4.4.1 Error Tracking Model
    4.4.2 MMSE Decoding with Two Concealment Modes
  4.5 P-Frame Concealment Modes
    4.5.1 Temporal Linear Interpolation
    4.5.2 Motion Vector Recovery with Block Matching
  4.6 Summary of Decoder Implementation
  4.7 Simulation Results
    4.7.1 Error Concealment of I-Frames
    4.7.2 Error Concealment of P-Frames
    4.7.3 Error Concealment of I- and P-Frames
  4.8 Conclusion

5 Analysis of Multi-Hypothesis Motion Compensated Prediction (MHMCP) for Robust Visual Communication
  5.1 Introduction
  5.2 Error Propagation Model for MHMCP Coder
  5.3 Effect of Hypothesis Number
    5.3.1 Impact on Propagation Error
    5.3.2 Impact on Bit Rates
    5.3.3 Rate-Distortion Analysis
    5.3.4 Comparison with Random Intra Refreshing
  5.4 Effect of Hypothesis Coefficients
    5.4.1 Impact on Propagation Error
    5.4.2 Impact on Bit Rates
    5.4.3 Rate-Distortion Analysis
    5.4.4 Comparison with Random Intra Refreshing
  5.5 Conclusion

6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work

References

Bibliography

List of Tables

2.1 Comparison of error resilience strategies.
3.1 Results for the AWGN channel with a fixed weight ratio.
3.2 Results for the frequency selective Rayleigh fading channel with a fixed weight ratio.
3.3 Results for the AWGN channel with a fixed SNR ratio.
3.4 Results for the frequency selective Rayleigh fading channel with a fixed SNR ratio.
3.5 Search results, estimated and simulated BERs for the proposed scheme in the AWGN channel (SR = source rate, CR = channel rate, MCs = no. of multicodes).
3.6 Estimated and simulated BERs for the reference system in the AWGN channel.
3.7 Search results, estimated and simulated BERs for the proposed scheme in the Rayleigh fading channel.
3.8 Estimated and simulated BERs for the reference system in the Rayleigh fading channel.
3.9 An example of the fast channel code rate assignment.
3.10 Comparison of different search results.
3.11 Parameters for the rate-quantization model.
4.1 Parameters for MMSE decoding.
The concealment error variances are normalized with respect to the intra concealment error variance 255.

List of Figures

2.1 Reference picture selection by ACK/NACK.
3.1 The block diagram of the H.263 video encoder.
3.2 Illustration of the proposed data partitioning scheme.
3.3 The frequency-selective Rayleigh fading simulator.
3.4 RCPC with K = 3, rate = 1/2.
3.5 The block diagram of a wireless CDMA communication system.
3.6 BER vs. no. of users in the AWGN channel.
3.7 BER vs. no. of users in the frequency-selective slow Rayleigh fading channel.
3.8 The PSNR performance for two test sequences.
3.9 MB grouping for a QCIF frame.
3.10 Error concealment by linear interpolation.
3.11 Loss of the 3rd base packet: (a) the erroneous region is copied from the previous frame; (b) the erroneous region is concealed by the proposed algorithm.
3.12 The rate-quantization curves for (a) ‘Foreman,’ (b) ‘Claire’ and (c) ‘Salesman’ sequences.
3.13 The bit rate estimation for (a) ‘Foreman,’ (b) ‘Claire’ and (c) ‘Salesman’ sequences.
3.14 The distortion-Q curves for (a) ‘Foreman,’ (b) ‘Claire’ and (c) ‘Salesman’ sequences.
3.15 The estimation of the quantization distortion Dsource for each packet: (a) ‘Foreman,’ (b) ‘Claire’ and (c) ‘Salesman’ sequences.
3.16 The estimated distortion due to each packet loss: (a) ‘Foreman,’ (b) ‘Claire’ and (c) ‘Salesman’ sequences.
3.17 Flow chart of the adaptation procedure.
3.18 PSNR comparison for ‘Claire’ 1st-50th frames, which have slow motion characteristics: (a) BER = 0.01 and (b) BER = 0.05.
3.19 PSNR comparison for ‘Foreman’ 100th-150th frames, which have fast motion characteristics: (a) BER = 0.01 and (b) BER = 0.05.
3.20 PSNR comparison for ‘Foreman’ 1st-50th frames, which have moderate motion activities: (a) BER = 0.01 and (b) BER = 0.05.
3.21 Several frames of the ‘Foreman’ sequence, which are reconstructed with (a) equal error protection and direct copying (‘EEP w/o ER’) and (b) unequal error protection and error concealment (‘UEP w/ ER’).
3.22 The PSNR comparison for the ‘Claire’ sequence at (a) BER = 10^-1 ~ 10^-11 and (b) BER = 10^-1 ~ 10^-3.
3.23 The PSNR comparison for the ‘Salesman’ sequence at (a) BER = 10^-1 ~ 10^-11 and (b) BER = 10^-1 ~ 10^-3.
3.24 The PSNR comparison for the ‘Foreman’ sequence at (a) BER = 10^-1 ~ 10^-11 and (b) BER = 10^-1 ~ 10^-3.
3.25 Several frames of the ‘Foreman’ sequence, which are reconstructed by (a) the non-adaptive system and (b) the proposed adaptive system.
4.1 The edge recovery process.
4.2 Selective directional interpolation.
4.3 Interpolation schemes for half-pixel motion compensation.
4.4 Motion vector recovery with block matching.
4.5 MMSE decoding of an MB in a P-frame.
4.6 Three MBs are lost in the left image and concealed by the proposed directional interpolation in the right image.
4.7 I-frame concealment results when 20 MBs are lost, where the left image is obtained by spatial linear interpolation and the right image is obtained by the proposed directional interpolation scheme.
4.8 I-frame concealment results when the second GOB is lost, where the left image is obtained by spatial linear interpolation while the right image is obtained by the proposed directional interpolation scheme.
4.9 Comparison of different temporal concealment techniques, when the previous frame is correctly reconstructed.
4.10 The PSNR performance comparison of several temporal concealment techniques for the “Foreman” sequence, where the GOB packetization is used and the packet loss rate is 0.1.
4.11 The PSNR performance comparison of temporal concealment techniques for the “Foreman” sequence, where the interleaving packetization is used and the packet loss rate is 0.1.
4.12 The PSNR value versus the packet loss rate (GOB packetization), where ‘SI’ denotes the spatial linear interpolation scheme and ‘DI’ denotes the proposed directional interpolation scheme.
4.13 The PSNR value versus the packet loss rate for the interleaving packetization, where ‘SI’ denotes the spatial linear interpolation scheme and ‘DI’ denotes the proposed directional interpolation scheme.
5.1 Illustration of the multi-hypothesis motion compensated prediction (MHMCP) scheme.
5.2 Comparison of theoretical and experimental NMSE values after a burst error occurs in the 10th frame of the “Foreman” sequence.
5.3 The averaged NMSE as a function of the hypothesis number for the “Foreman” test sequence.
5.4 The variance of the joint prediction error as a function of the correlation coefficient between two individual prediction errors for the double-hypothesis MCP scheme.
5.5 Comparison of theoretical and experimental bit rates as a function of the hypothesis number for the “Foreman” sequence.
5.6 Theoretical and experimental rate-distortion curves for the “Foreman” sequence caused by a burst error.
5.7 Comparison of the rate-PSNR performances of SHMCP (n = 1), MHMCP (n = 2, 3) and adaptive MHMCP for (a) “Foreman” and (b) “Carphone” sequences.
5.8 Comparison of the rate-PSNR performance between MHMCP and IR after a burst error: (a) the coding of the next frame and (b) the average performance of encoding the 100 frames after the corrupted one.
5.9 Comparison of the rate-PSNR performance between MHMCP and IR in the GOB loss environment: (a) the “Foreman” and (b) the “Carphone” sequences.
5.10 Comparison of theoretical and experimental MSEs after a burst error in the triple-hypothesis MCP for the “Foreman” sequence with weighting coefficients w = (0.1, 0.45, 0.45).
5.11 The propagation error of the triple-hypothesis MCP plotted as a function of hypothesis coefficients.
5.12 The impact of hypothesis coefficients on the bit rates for the “Foreman” sequence in the triple-hypothesis MCP.
5.13 The rate-distortion performance of the triple-hypothesis MCP after a burst error.
5.14 Performance comparison of the adaptive and the fixed hypothesis coefficient schemes with w = (0.33, 0.33, 0.34) for (a) the “Foreman” and (b) the “Carphone” sequences.
5.15 The rate-PSNR performances of the triple-hypothesis MCP and IR for the “Foreman” sequence due to a burst error: (a) the next frame and (b) the average of the following 100 frames after the corrupted one.
5.16 The rate-PSNR performances of the triple-hypothesis MCP and IR for the “Carphone” sequence due to a burst error: (a) the next frame and (b) the average of the following 100 frames after the corrupted one.

Abstract

With the rapid growth of multimedia services, increasing demands on video transmission have driven numerous researchers to develop error resilient techniques. The perceptual video quality is influenced not only by compression artifacts but also by channel errors. Hence, a desirable video codec has to accommodate contradictory requirements: coding efficiency, robustness to data loss, and other limitations such as memory, bandwidth and complexity. In this work, we first present unequal error protection schemes in multi-user and end-to-end systems. Fast search algorithms have been proposed to significantly speed up the system optimization. Additionally, we have developed a simple rate-distortion model in terms of the quantization parameter, so that the rate and the distortion can be estimated without an extensive encoding procedure. Secondly, two novel error concealment methods have been proposed for I-frames and P-frames, respectively. The I-frame error concealment method employs edge detection and directional interpolation. It can efficiently recover both smooth and edge areas, while demanding a low computational complexity. The P-frame error concealment method uses error tracking and dynamic mode weighting. It is able not only to better compensate the corrupted macroblocks, but also to suppress the propagation errors.
Thirdly, we have investigated the error propagation effect in the multi-hypothesis motion compensated prediction (MHMCP) coder and analyzed the rate-distortion performance in terms of the hypothesis number and hypothesis coefficients. Adaptive MHMCP codecs have been designed to further enhance the rate-distortion performance. Additionally, we have compared the performance of MHMCP with other error resilient tools, and shown that MHMCP suppresses the short-term effect of error propagation more effectively than the intra-refreshing scheme. Finally, several design principles for the MHMCP coder are derived based on the analytical and experimental results.

Chapter 1

Introduction

1.1 Significance of the Research

With the increasing demand for multimedia services and the stimulus of 4th-generation wireless systems and the Internet, the design of suitable methods for transmitting multimedia signals over lossy channels has drawn much attention from industry and academia. Neither wireless channels nor the Internet are reliable enough to guarantee error-free transmission. Wireless channels suffer from path loss, long-term fading and short-term fading effects, which result in fast fluctuation and unreliability. Also, packet loss and delay are inevitable in the Internet. On the other hand, a raw video file tends to consume a large amount of memory for storage and demand a large bandwidth for delivery. Video signals have to be compressed so that they can be transmitted or stored more effectively. However, a higher compression ratio results in more vulnerability to errors. In variable length coding (VLC), synchronization between the encoder and the decoder is required for correct decoding.
Even a single bit error may cause the loss of synchronization, so that the remaining bit stream cannot be decoded properly. The motion compensated prediction scheme is also vulnerable, since transmission errors in a frame tend to propagate to subsequent frames. Therefore, it is necessary to design a robust transmission scheme to protect the quality of multimedia signals against transmission errors.

In this research, we focus on techniques for robust video transmission over lossy channels. Our objective is to design, analyze and evaluate error resilient techniques for various applications with different requirements. For instance, when transmission channels are hostile, channel coding schemes are often used to protect the underlying video bit streams, and unequal error protection schemes are designed to achieve an efficient bit allocation between the source and the channel. When both the bandwidth and the delay are critical, efficient post-processing techniques are developed to compensate for the loss and reduce the annoyance to human eyes. Finally, when a small amount of additional bandwidth is allowed, an error resilience enhanced encoder is proposed to effectively suppress the propagation errors.

1.2 Description of the Research

Our research has provided a variety of solutions to achieve error resilience for robust video transmission. They are briefly summarized below.

1. Unequal error protection for wireless video transmission

To achieve robust video streaming, we insert the rate-compatible punctured convolutional (RCPC) code to protect video signals in fading channels. For a multi-user CDMA system, we propose a scheme to adjust channel coding rates based on the importance of the source data as well as the effect of the resulting interference. Then, we iteratively search for the proper channel coding rates for separate layers with unequal importance to minimize the defined cost functions. Finally, the protected bitstreams are spread by using an appropriate number of multicodes. Both AWGN and frequency selective Rayleigh fading channels are examined. In addition, we develop a simple rate-distortion model for general video coders using DCT and motion compensation, so that the rate and the distortion can be estimated without an extensive encoding procedure. Simulation results show that the proposed algorithms provide acceptable image quality even in high bit error rate environments.

2. Error concealment to compensate for lost information

At the decoder side, two novel error concealment techniques are proposed for video transmission over noisy channels. First, we present a spatial error concealment method to compensate for a lost macroblock in intra-coded frames, in which no useful temporal information is available. Based on selective directional interpolation, our method can recover both smooth and edge areas at a low computational complexity. Second, we examine a dynamic mode-weighted error concealment method for replenishing missing pixels in a lost macroblock of inter-coded frames. Our method adopts a decoder-based error tracking model and combines several concealment modes adaptively to minimize the mean square error of each pixel. The method is capable of concealing lost packets as well as reducing the error propagation effect. Extensive simulations have been performed to demonstrate the performance of the proposed methods in error-prone environments.

3.
Multi-hypothesis motion compensated prediction (MHMCP)

An MHMCP scheme, which predicts a block from a weighted superposition of more than one reference block in the frame buffer, is proposed and analyzed for error resilient visual communication in this research. By combining these reference blocks effectively, MHMCP can enhance the error resilience capability of compressed video as well as achieve a coding gain. In particular, we investigate the error propagation effect in the MHMCP coder and analyze the rate-distortion performance in terms of the hypothesis number and hypothesis coefficients. It is shown that MHMCP suppresses the short-term effect of error propagation more effectively than the intra refreshing scheme. Simulation results are given to confirm the analysis. Finally, several design principles for the MHMCP coder are derived based on the analytical and experimental results.

1.3 Contributions of the Research

The main contributions of this research can be stated as follows.

1. For the multiuser system, we have set up a framework for the joint design of channel coding and spreading, and provide a solution to the trade-off between the channel coding rate and the assigned number of multicodes to achieve efficient transmission. Two cost functions are defined. One is designed to calculate the sum of the weighted residual bit error rates after applying channel codes with a typical rate assignment. The other reflects the distance of the estimated bit error rates from the desired bit error rates for the two layers. Then, the corresponding channel coding rates as well as the number of multicodes are selected to minimize the cost functions. Furthermore, an iterative search algorithm is proposed to speed up the determination of the optimum solutions for those two cost functions.

2.
For pre-compressed video signals, we have developed a fast channel rate assignment algorithm to realize unequal error protection, which reduces the exponential search space to a linear one.

3. We have analyzed the effects of video characteristics and quantization parameters on the source and channel distortions. Based on this investigation, we have proposed low-complexity methods to estimate the packet loss rate and its corresponding source and channel distortions, and applied them to real-time adaptive video transmission based on the estimated rate-distortion data.

4. We have devised a novel error concealment method for I-frames, which is able to restore edge components as well as low frequency information by employing edge detection and directional interpolation with a low complexity.

5. We have developed a dynamic error compensation scheme based on a relative distortion indicator. This method attempts not only to conceal erroneous blocks but also to suppress the error propagation phenomenon.

6. We have investigated the error propagation effect in MHMCP. The rate-distortion performance in terms of the hypothesis number and hypothesis coefficients has been analyzed thoroughly. Simulation results are given to confirm the analysis.

7. Several design principles for the MHMCP coder are derived based on the analytical and experimental results. MHMCP can alleviate the effect of error propagation by combining several predictions for motion compensation, but requires a higher bit rate to represent the additional motion information. It is shown that a hypothesis number no larger than three is suitable at low bit rates. Also, in the triple-hypothesis MCP, the optimum R-D points are achieved when the second coefficient w2 is set to 1/3.

8.
By comparing MHMCP with the intra refreshing (IR) scheme, we showed that MHMCP suppresses short-term error propagation more effectively than IR. The performance of MHMCP can be further improved if the hypothesis number and the hypothesis coefficients are adapted at the MB level, at the expense of a higher encoder complexity.

1.4 Outline of the Proposal

In this introductory chapter, research motivations, scopes, and contributions are stated. The remainder of this thesis is organized as follows. In Chapter 2, we briefly review existing error resilient techniques. Then, we describe our three research tasks in the following three chapters, respectively. First, unequal error protection schemes are presented in Chapter 3. Second, decoder-based error concealment techniques are proposed in Chapter 4. The encoder-based multi-hypothesis motion compensated prediction scheme is described in Chapter 5. Finally, conclusion and future work are given in Chapter 6.

Chapter 2

Review of Previous Work

2.1 Introduction

Video compression technologies have been extensively studied in recent years. The basic concept of video compression is to reduce the amount of bits for video representation by exploiting spatial and temporal correlations in image sequences. In general, the discrete cosine transform (DCT) is employed to transform time domain signals to frequency domain coefficients so that signal energies are concentrated in low frequency regions. Then, those frequency components can be effectively encoded with quantization and variable length coding (VLC) due to energy compaction and long consecutive zeros. Moreover, the compression performance can be further enhanced by employing motion-compensated prediction, which predicts each frame blockwise from the previous frame.
The prediction error can be more effectively compressed than the original frame data.

Table 2.1: Comparison of error resilience strategies.

    Strategy                     | Error Protection | Error Confinement | Error Concealment | Error Control
    Coding Efficiency            | Poor             | Fair              | Excellent         | Good
    Extra Delay                  | No               | No                | No                | Yes
    Error Resilience Capability  | Excellent        | Good              | Fair              | Excellent

Unfortunately, most channels such as wireless channels and the Internet are not reliable enough to guarantee error-free transmission. Wireless channels suffer from path loss, long-term fading and short-term fading effects, which result in fast fluctuation and unreliability. Also, packet loss and delay are inevitable in the Internet. Compressed video signals are very sensitive to transmission errors. In VLC, synchronization between the encoder and the decoder is required for correct decoding. Even a single bit error may cause the loss of synchronization so that the remaining bit stream cannot be decoded properly. The motion compensated prediction scheme is also vulnerable, since transmission errors in a frame tend to propagate to subsequent frames.

Table 2.1 compares four strategies to enhance the error resilience of video bitstreams against transmission errors. They are: (1) error protection, (2) error confinement, (3) error concealment and (4) error control. The first two strategies either add redundant bits into the compressed bit stream or reduce the use of prediction. Both methods are effective, especially in a high bit error rate (or a high packet loss rate) environment, while sacrificing coding efficiency. No matter how well error protection and error confinement are performed, they cannot guarantee an error-free transmission.
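Returning to the compression pipeline described at the start of this chapter, the energy-compaction property of the DCT is easy to demonstrate numerically. The sketch below (illustrative Python/NumPy; the quantization step of 16 is an arbitrary example value, not one defined by any standard) builds the orthonormal 8x8 DCT, transforms a smooth block, and quantizes it; almost all nonzero levels land in the low-frequency corner, producing the long zero runs that VLC exploits.

```python
import numpy as np

def dct_matrix(N=8):
    """Orthonormal DCT-II basis matrix used for the 8x8 block DCT."""
    C = np.zeros((N, N))
    for k in range(N):
        for n in range(N):
            C[k, n] = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C

def block_dct(block, C):
    """2-D DCT of one block: C B C^T."""
    return C @ block @ C.T

def quantize(coeffs, step=16):
    """Uniform quantization; high-frequency coefficients of a smooth
    block round to zero, yielding long zero runs for the VLC stage."""
    return np.round(coeffs / step).astype(int)

C = dct_matrix()
# A smooth horizontal-gradient block: its energy compacts into a
# handful of low-frequency coefficients.
block = np.tile(np.arange(0, 80, 10, dtype=np.float64), (8, 1))
q = quantize(block_dct(block, C))
```

For this gradient block, every quantized coefficient outside the first row is zero, since the block has no vertical variation.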
Therefore, compensation for erroneous blocks becomes necessary to alleviate decoded visual distortion, which requires a post-processing procedure called error concealment. This method neither introduces extra delay, nor lowers the compression ratio. However, its robustness to errors is limited. The last error control mechanism requires a feedback channel to adjust the coding parameters according to channel conditions or acknowledgement signals from the receiver. This method accomplishes error resilience in the most efficient way, but requires extra delay and may not be suitable in multicast and broadcast applications.

To summarize, no single strategy always performs better than the others. To select a suitable error resilient method, we should consider the application requirements, such as coding efficiency, delay tolerance and implementation complexity. The four strategies are introduced in detail in the following sections.

2.2 Error Protection

Two methods are adopted to achieve error protection. One is to separate the design of the source and channel coders, but jointly optimize the performance. The other is to jointly design a source/channel coder. Layered coding plus unequal error protection is the most popular scheme for the first method and will be described in Section 2.2.1. Multiple description coding is a typical scheme belonging to the second method and will be presented in Section 2.2.2. Finally, joint source/channel resource allocation, which is an active research area, is described in Section 2.2.3.

2.2.1 Layered Coding and Unequal Error Protection

A compressed video stream can be separated into one base layer and several enhancement layers. Moreover, the base layer may be more strongly protected than the enhancement layers since it contains essential information. Layered coding has been adopted in standards such as MPEG-2 and H.263+ [1].
It can be implemented by temporal scalability, signal-to-noise ratio (SNR) scalability and spatial scalability. Scalability allows for the decoding of a sequence at more than one quality level. Temporal scalability is achieved using B-pictures, which allow prediction from previous and subsequent reference pictures. By inserting B-pictures, decoded video may play at a higher frame rate. But if some B-pictures are lost due to limited bandwidth or channel errors, the subsequent pictures are not affected since B-pictures are not used as references for prediction. Spatial and SNR scalability are equivalent except for the use of interpolation. With SNR scalability, coding errors due to lossy compression can be encoded and sent to the decoder, producing an enhancement to the decoded pictures. With spatial scalability, a reference picture is interpolated into a larger one for prediction of a spatially enhanced picture. Moreover, it is possible to have more than one SNR or spatial enhancement layer in conjunction with the base layer. In addition, it is also possible to insert B-pictures between base layer pictures and enhancement layer pictures. Thus, a multilayer scalable bitstream can be accomplished by combining different scalability methods.

Once a compressed bit stream is layer coded, a network that can provide different service levels (DiffServ) is desirable to transmit each layer at its corresponding priority. Unfortunately, today's wireless and Internet systems are not intelligent enough to support DiffServ. Thus, channel coding has to be employed to provide unequal error protection [2-8]. One popular way is to add forward error correction (FEC) codes. The BCH block code is an option defined in the H.263+ document due to its simplicity. The ability of a block code to correct errors is related to the code distance.
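The relation between code distance and correction capability can be made concrete: a t-error-correcting block code (with distance d >= 2t + 1) fails only when more than t of its n bits are flipped, so for a given channel BER the residual block error rate is a binomial tail, and the highest-rate code meeting a target can be picked from a table. The code list below is a hypothetical illustration using standard length-63 BCH parameters, not the exact option set of H.263+.

```python
import math

# Hypothetical table of BCH codes (n, k, t), ordered from highest
# rate to lowest; a t-error-correcting code has distance d >= 2t + 1.
BCH_CODES = [(63, 57, 1), (63, 51, 2), (63, 45, 3), (63, 39, 4), (63, 36, 5)]

def block_error_prob(n, t, p):
    """Probability that more than t of the n bits flip (binomial tail),
    i.e. that the code fails to correct the received block."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(t + 1, n + 1))

def pick_bch(p, target):
    """Highest-rate listed code whose residual block error rate meets
    the target at channel bit error rate p."""
    for n, k, t in BCH_CODES:
        if block_error_prob(n, t, p) <= target:
            return (n, k, t)
    return BCH_CODES[-1]          # fall back to the strongest code
```

For example, at a channel BER of 1e-3 with a 1e-3 residual target, the single-error-correcting (63, 57) code is not strong enough and the (63, 51) code is selected.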
Thus, the code rate can be easily designed by a simple equation according to the channel condition. However, for time-varying channels, both the encoder and the decoder have to maintain N BCH coders to support N different channel rates. In contrast, one single rate-compatible punctured convolutional (RCPC) coder is able to provide variable channel rates by selecting a different puncturing table. The performance analysis of RCPC is much more complicated, so that no precise control mechanism is available for assigning RCPC coding rates. Another scheme to provide DiffServ is to use modulation techniques [9,10]. A twin-class 16-QAM system was used to transport a compressed video bit stream in [10].

2.2.2 Multiple Description

Multiple description coding (MDC) is designed for transmission through several channels with equal reliability. As done in layered coding, MDC also encodes a bit stream into several substreams, known as descriptions. The resulting descriptions are correlated with each other and have similar importance. In this way, the redundancy information is inserted into the source directly rather than separating compression and protection. Any single description provides an acceptable quality because the other descriptions can be estimated from it. As more descriptions are correctly received, the decoded video quality is enhanced. Many MDC approaches have been proposed to achieve correlated decomposition, such as the motion-compensated temporal prediction [11], the correlated filter bank [12], the lattice vector quantization [13], the pair-wise correlating transform [14] and the parity-check motion vector [15].
2.2.3 Joint Source/Channel Resource Allocation

Due to the limited bandwidth and the delay sensitivity of real-time video transmission, the optimal trade-off [16,17] between source and channel coding under the rate constraint [18-21], the delay constraint [22] or the power constraint [23,24] has been studied extensively [25]. Basically, the problem can be solved in three directions. The first one is to propose a rate-distortion model [26-34] for the video of interest such that a closed form of the optimum can be obtained. However, it is difficult to have a precise model to predict the video bit rate or distortion because they are content dependent. The second one is to twist the problem so that standard optimization tools such as the Lagrangian multiplier [35] or dynamic programming can be applied directly to the modified problem. The last one is to calculate all instant rate-distortion operating points to find the best combination with a reduced complexity. A typical joint source/channel coder [18-21] is designed to choose the quantizer of the entropy coder and the channel rate of the FEC channel coder. When the channel becomes very noisy, a coarser quantizer and a lower rate FEC will be selected to combat channel errors. On the contrary, when the channel condition becomes good, a finer quantizer and a higher rate FEC can be selected to generate high quality video.

2.3 Error Confinement

A video bit stream can be compressed efficiently by employing variable length coding (VLC) and temporal/spatial prediction. However, such VLC and prediction are the main reasons for error sensitivity. Once an error happens somewhere in the bit stream, the decoded bit stream tends to lose synchronization with the encoder. This makes the current VLC codeword and the following bit stream undecodable.
One simple and effective approach to solve this problem is to insert resynchronization markers periodically. Another, enhanced approach is to use reversible variable length coding (RVLC) so that the decoder not only decodes bits after the resynchronization marker, but also decodes bits backward from the next resynchronization marker. Nevertheless, any reconstruction error results in error propagation, even if the following bit stream is received correctly. "Independent segment prediction (ISP)" and "intra-block or frame refreshment" are two widely used methods to suppress and stop error propagation.

2.3.1 Resynchronization

The resynchronization marker [36] is designed such that it can be easily distinguished from other codewords. In the H.263 standard, a 17-bit start code is added before every group of blocks (GOB), which is an example of the resynchronization marker. We can add resynchronization markers more frequently to enable the decoder to regain synchronization more quickly. However, this lowers the coding efficiency. Hence, only standard resynchronization markers are used in practical video coding systems.

2.3.2 Reversible Variable Length Coding

Once an error occurs, the decoder in general discards all bits until it reaches the next resynchronization marker. With RVLC, the decoder can also decode in the reverse direction from the next resynchronization marker. Thus, fewer correctly received bits are dropped and errors are localized in a smaller area. Recent research has shown that RVLC can be designed with near-perfect entropy coding efficiency. Both MPEG-4 and H.263 adopt the RVLC technique as an error resilient tool.
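The bidirectional decodability of RVLC rests on using codewords that remain prefix-free when the bitstream is read backwards. The toy Python sketch below uses a hypothetical three-symbol palindromic codebook (not a table from MPEG-4 or H.263) to show forward decoding up to a corrupted position and backward decoding from the following resynchronization point.

```python
# Hypothetical reversible VLC table: each codeword is a palindrome, so
# the set stays prefix-free when read in reverse as well.
RVLC = {"0": "a", "11": "b", "101": "c"}
RVLC_REV = {cw[::-1]: sym for cw, sym in RVLC.items()}
MAX_LEN = max(len(cw) for cw in RVLC)

def decode_prefix(bits, table):
    """Greedy VLC decoding that stops at the first undecodable position.
    Returns (decoded_symbols, number_of_bits_consumed)."""
    symbols, start = [], 0
    while start < len(bits):
        for end in range(start + 1, min(start + MAX_LEN, len(bits)) + 1):
            if bits[start:end] in table:
                symbols.append(table[bits[start:end]])
                start = end
                break
        else:
            break                      # corrupted or incomplete codeword
    return symbols, start

# "abca" encodes to 0|11|101|0.  Flip one bit inside the 'c' codeword:
clean = "0111010"
corrupt = clean[:4] + ("1" if clean[4] == "0" else "0") + clean[5:]
fwd, _ = decode_prefix(corrupt, RVLC)            # symbols before the error
bwd, _ = decode_prefix(corrupt[::-1], RVLC_REV)  # read back from the next marker
```

In principle the backward pass recovers the symbols between the error and the next marker; with ordinary VLC, all of them would have to be discarded.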
2.3.3 Independent Segment Prediction (ISP)

ISP is a way to limit error propagation by separating uncompressed video into several segments and performing temporal/spatial prediction only within the same segment. In this way, errors in one segment will not affect other segments. ISP has been adopted as an option in H.263 [1]. A segment defined in H.263 can be a GOB or a slice. This approach significantly reduces the efficiency of motion compensation. To reduce the loss of coding efficiency, ISP is often used with the unrestricted motion vector mode, which allows motion vectors pointing outside the segment. In MPEG-4 [37,38], arbitrarily shaped video object planes (VOP's) can be encoded independently and superimposed at the decoder. Thus, errors are confined within the erroneous VOP.

2.3.4 Intra-Block or Frame Refreshment

One powerful scheme to stop error propagation is to insert intra-coded blocks or frames. The intra-coding mode encodes the current image frame or block without the use of motion-compensated prediction. Obviously, this approach reduces the coding efficiency, especially when intra-frames are inserted. In a channel with a limited bandwidth, an intra-frame is only inserted when a scene cut occurs. In contrast, intra-blocks are more often used for error resilience since they are more flexible to control and introduce a smaller amount of overhead than intra-frame refreshment. The intra-block refreshment rate can be optimally determined based on channel conditions. Furthermore, by using an error tracking mechanism or acknowledgements from the feedback channel, specific erroneous blocks can be chosen to be intra-coded to stop error propagation even more effectively.

2.4 Error Concealment

Decoder error concealment is used to compensate for the residual errors after applying the other, active error resilient methods.
This strategy does not require modification of the syntax; thus it is favorable for both standard and proprietary products. The basic idea for estimating the lost information is to exploit information from spatial neighbors or temporally corresponding parts. When temporal information is available and the object motion is slow, temporal prediction is used to estimate the lost information. Otherwise, spatial interpolation is used [39,40].

2.4.1 Motion Compensated Temporal Prediction

One simple way to exploit the temporal correlation in video signals is to replace a damaged macroblock with the one at the same location in the previous frame. This method introduces visual artifacts for fast-moving objects. The decoded visual quality can be significantly improved when the motion vector of the lost macroblock is available. This method has been widely used due to its simplicity. The MPEG-2 and H.263 standards allow the encoder to send motion vectors for intra-coded macroblocks.

2.4.2 Spatial Interpolation

Spatial interpolation mainly targets still images and intra-coded blocks in video. It exploits spatial correlation to make the damaged macroblock smooth within the entire picture. In [41], a missing pixel of a lost MB is linearly interpolated from pixels on the nearest boundaries. The weighting coefficients for interpolation depend on the distance between the missing pixel and its boundary pixels.

2.5 Error Control

For video communications without feedback control, both error protection and error confinement should be designed for the worst case. This practice results in a prohibitive amount of redundancy. Feedback-based error control techniques are more effective in suppressing errors, although they introduce additional delay. According to the encoder requirement, a feedback channel can carry the estimated channel information or acknowledgements (ACK) from the receiver.
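The distance-weighted interpolation of Section 2.4.2 above can be sketched directly. The fragment below follows the spirit of the scheme in [41], with weights inversely proportional to the distance from the missing pixel to each of the four boundary lines; the exact weighting in [41] may differ, and the 16x16 size is simply the usual MB size.

```python
import numpy as np

def conceal_mb(top, bottom, left, right, size=16):
    """Conceal a lost macroblock: each missing pixel is a weighted
    average of the four adjacent boundary pixel lines, with weights
    inversely proportional to the pixel-to-boundary distance."""
    mb = np.zeros((size, size))
    for i in range(size):
        for j in range(size):
            # distances to the top, bottom, left and right boundaries
            d = np.array([i + 1.0, size - i, j + 1.0, size - j])
            w = 1.0 / d
            vals = np.array([top[j], bottom[j], left[i], right[i]])
            mb[i, j] = np.dot(w, vals) / w.sum()
    return mb

# In a decoder, the four boundary lines would come from the correctly
# decoded (or already concealed) neighboring macroblocks.
```

Because the weights sum to one after normalization, a flat neighborhood is reproduced exactly, and smooth gradients across the lost block are approximated.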
The channel information could be the bit error rate or the packet loss rate, which is required for joint source/channel resource allocation. Although it may be lost during transmission, the sender simply ignores the loss and uses the previous value. In this way, the delay introduced by sending back the channel information can be viewed as a constant. In fact, this is usually included in the design of joint source/channel coding schemes.

In this section, we focus on error control with feedback channels that indicate which parts of the bit stream were received intact. Depending on the desired error behavior, negative acknowledgement (NACK) or positive acknowledgement (ACK) messages can be sent. In general, NACKs require a lower bit rate than ACKs because NACKs are sent only when errors occur while ACKs are sent regularly. The feedback message is usually transmitted in the link layer or the transport layer with other control information. For H.263, the ITU standard H.245 allows reporting of the temporal and spatial locations of MB's that could not be decoded successfully. Upon receiving ACKs or NACKs, the sender retransmits packets that were damaged at the decoder. Thus, an error-free reception is guaranteed by the retransmission protocol. However, this reliable transmission mechanism introduces additional delay, which is intolerable in most real time applications. An alternative approach is to sacrifice reliability to reduce delay. Both approaches are practically used in today's video transmission systems.

2.5.1 Reference Picture Selection (RPS)

Both H.263+ and H.264 include reference picture selection (RPS) as an option. This approach requires both the encoder and the decoder to maintain a large buffer to store several recent reconstructed frames as references. Basically, the
reference frame is set to be the most recent reconstructed frame without errors. The feedback channel is used to inform the encoder to select the same reference frame as the decoder. By keeping synchronization between the encoder and the decoder, error propagation can be stopped effectively. RPS can be operated [42] in two different modes, as shown in Fig. 2.1. When the encoder receives only NACKs, the most recent reconstructed frame is used as the reference. When an error happens, the decoder notifies the encoder of the error location. The error propagates during the NACK transmission. Once the encoder receives the NACK, it sets the previous undamaged frame as a new reference to encode the next frame, which should be decoded without distortion if no additional errors occur during its transmission time. When the encoder receives only ACKs, the encoder references an older, acknowledged reconstructed frame instead, which may lose some coding efficiency. However, the error stops more quickly, as shown in Fig. 2.1.

Figure 2.1: Reference picture selection by ACK/NACK.

Chapter 3

Wireless Video Transmission with Unequal Error Protection

3.1 Multiple Video Transmission over CDMA Systems with Unequal Error Protection

A CDMA system accommodates multiple users by allocating different spreading codes to different users. The bit-rate of one spreading code is often not sufficient to transmit high bit-rate video signals. Thus, it is likely to assign more than one code to one user to meet the bandwidth requirement. This is known as multi-code spreading.
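Multi-code spreading is easy to sketch: a user's bitstream is split over several orthogonal Walsh-Hadamard codes transmitted in parallel, and each sub-stream is recovered by correlating with its own code. The fragment below is illustrative Python/NumPy only; the serial-to-parallel split and the chosen code indices are arbitrary example choices.

```python
import numpy as np

def walsh_hadamard(n):
    """2^n x 2^n Walsh-Hadamard matrix (Sylvester construction); its
    rows are mutually orthogonal spreading codes."""
    H = np.array([[1]])
    for _ in range(n):
        H = np.block([[H, H], [H, -H]])
    return H

def multicode_spread(bits, codes):
    """Send K bits at a time, one per assigned code; the transmitted
    chips are the sum of the individually spread sub-streams."""
    K = len(codes)
    assert len(bits) % K == 0
    chips = [sum(b * c for b, c in zip(bits[i:i + K], codes))
             for i in range(0, len(bits), K)]
    return np.concatenate(chips)

def despread(chips, code):
    """Correlate with one code to recover that sub-stream's bits."""
    L = len(code)
    return [int(np.sign(np.dot(chips[i:i + L], code)))
            for i in range(0, len(chips), L)]

H = walsh_hadamard(3)                    # eight length-8 codes
codes = [H[1], H[2]]                     # two multi-codes for one user
tx = multicode_spread([1, -1, -1, 1], codes)
```

Within one synchronous user, the rows are exactly orthogonal (H Hᵀ = 8I), so each sub-stream separates cleanly; the scrambling stage mentioned later is what keeps cross-correlations manageable between asynchronous users.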
The available spreading codes are limited resources in the CDMA system, and they may interfere with each other and degrade the quality of the transmitted signals if the system is overloaded. This is called the multiple access interference (MAI). On one hand, channel codes, such as RCPC (rate-compatible punctured convolutional) codes, are used to protect the video source against transmission errors. On the other hand, better protection demands more channel coding bits, which may in turn overload the system and result in MAI. It is therefore an important yet challenging problem to efficiently protect compressed bitstreams while minimizing MAI.

Chang and Lin [43] proposed a dynamic spreading code assignment scheme for Object Composition Petri Net based multimedia services. Based on the bit error rate (BER) requirement of each video signal and the signal-to-noise ratio (SNR) of the channel, their algorithm determines the maximum number of Walsh-Hadamard (WH) codes that can be assigned to users in a certain time interval. In addition, there is an upper bound on the number of codes assigned to a single user. Then, depending on whether these two constraints are violated, one can decide whether a data packet should be dropped or not. Finally, the code assignment task is performed according to the requested source rate of each user after packet dropping. Their scheme assumes a fixed channel coding rate and provides the same protection to all bits. This scheme is certainly not efficient if the input data are not equally important. Deep and Feng [44] considered an adaptive channel coding scheme based on the video frame type. A lower rate channel code was assigned to I-frames while a higher rate channel code was assigned to P-frames. However, the MAI increase was not taken into account in their work.
In this chapter, we propose a scheme to adjust channel coding rates based on the importance of the source data as well as the effect of the resulting interference. At the encoder end, we first compress a video signal by employing the H.263 codec and split the resulting bitstream into two layers. The base layer contains the essential data such as the header and the motion vector information, while the enhancement layer contains less important data composed of DCT coefficients. Then, we iteratively search for the proper channel coding rates for these two layers. At this stage, we define a cost function that reflects the distance of the estimated bit error rates from the desired bit error rates for the two layers, and the corresponding channel coding rates are selected to minimize the cost function. Both AWGN and frequency selective Rayleigh fading channels are examined. Third, the two layered source bitstreams are passed to the RCPC coder to achieve their target channel coding rates. Finally, these protected bitstreams are spread by using an appropriate number of multi-codes. At the decoder end, a robust decoder is employed to detect and conceal errors introduced by channel noise and MAI.

3.1.1 Modified H.263 Source Coder

H.263 is a low bit-rate video coding standard developed by ITU-T (International Telecommunication Union - Telecommunication Standardization Sector) [1]. It is employed in several video communication system standards, including H.324 for communications over PSTN, H.323 for communications over the Internet, and
H.324/M for communications through wireless channels.

Figure 3.1: The block diagram of the H.263 video encoder (T: transform; Q: quantizer; P: picture memory with motion-compensated variable delay; CC: coding control; p: flag for INTRA/INTER; t: flag for transmitted or not; qz: quantizer indication; q: quantizing index for transform coefficients; v: motion vector).

The block diagram of the H.263 video encoder is depicted in Figure 3.1. It consists of several main components, including the DCT transform, the quantizer and the motion-compensated prediction unit. In general, video signals are highly correlated in both the time and the spatial domains. Spatial correlation (or redundancy) can be removed by using two components, i.e., the DCT transform and quantization. The DCT transform is a mapping of a signal from the 2-D space domain to the 2-D discrete cosine domain. A good transform should be able to compact the energy of a signal as much as possible. The 8x8 block DCT has been widely adopted in many standards such as JPEG, MPEG-1/2/4 and H.263, since it is efficient in energy compaction and allows fast implementation. Quantization is a way to represent a signal by a finite number of levels. It enables a higher compression ratio at the cost of degradation (or information loss) of the original source data. Motion-compensated prediction is used to exploit the high temporal correlation. Each macroblock is predicted with one or more motion vectors and the residual errors are encoded by using DCT and quantization. In our approach, the encoded bitstream is split into two layers as shown in Figure 3.2.

Figure 3.2: Illustration of the proposed data partitioning scheme (base layer: MB header and MV data per GOB; enhancement layer: DCT coefficients).
The base layer contains the header and the motion vector information, whose loss can severely degrade the picture quality. The enhancement layer contains less important information, i.e., DCT coefficients. Note that the GOB header is duplicated in both the base and enhancement layers to detect transmission errors. When a mismatch occurs during the decoding process of GOB data in the base layer or the enhancement layer, the whole GOB data are assumed to be corrupted, since the point where a VLC error is detected often appears later than the actual location in which the error occurred. If a GOB of the base layer is declared to be corrupted, then pixels in the GOB are replaced by pixels at the same locations in the previous frame to conceal transmission errors. If a GOB in the enhancement layer is corrupted, the DCT coefficients of the residual GOB are set to zero. Consequently, the corresponding macroblocks are copied directly from the previous frame by using the error-free motion vectors. Generally speaking, data loss in the enhancement layer results in less severe degradation than that in the base layer.

3.1.2 Channel Model

There are three fading factors to be considered in wireless communication [45]. The first one is path loss, which depends on the distance between the sender and the receiver. It is deterministic and can be easily computed. The second one is the long-term fading effect, which is also referred to as shadowing. It is often modelled as a log-normal distribution, which means the power in dB is a Gaussian random variable with its mean determined by the path loss model and its variance determined by channel noise. The third one is the short-term fading effect, which is the most complicated factor. The short-term fading effect can be classified into four types depending on the relation between signal parameters and channel
characteristics. To characterize the delay spread resulting from multi-path transmission, either flat or frequency selective fading can be used. The flat fading model is applied when the delay spread is less than one symbol period, while the frequency selective fading model is adopted when the delay spread is larger than one symbol period. A similar situation occurs in the frequency domain when the mobile user is moving at a certain speed. For such a case, fast fading and slow fading should be considered based on the Doppler spread. The fast fading model is used for a fast-moving mobile, in which the radio channel fluctuates more frequently, while the radio channel can be modelled as slow fading for a slow-moving mobile.

In the system of our current investigation, it is assumed that the power is controlled perfectly at the receiver so that there is no near-far problem. The Doppler effect is also assumed to be negligible. Thus, we consider the following three sources of noise: the additive white Gaussian noise (AWGN), interference from other users, which is also known as the multiple access interference (MAI), as well as the frequency-selective slow Rayleigh fading channel. To implement the short-term fading model, we assume that there is no direct line-of-sight path, and that delayed paths with different delay units are Rayleigh faded and added together, where each delay unit is several chips long, as shown in Figure 3.3. Usually, a RAKE receiver is used to collect time-shifted versions of the original signal by providing a separate correlator for each of the multipath signals.

Figure 3.3: The frequency-selective Rayleigh fading simulator.
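The tapped-delay-line model of Figure 3.3 can be sketched as follows: each tap is an independent complex Gaussian gain (hence a Rayleigh-distributed magnitude) whose average power follows the exponential power delay profile used in the channel model. The tap spacing and delay spread values below are arbitrary example numbers (Python/NumPy).

```python
import numpy as np

def rayleigh_taps(L, Tc=1.0, tau_rms=2.0, rng=None):
    """Draw L complex tap gains: Rayleigh-distributed magnitudes with an
    exponential power delay profile E{a_n^2} ~ (Tc/tau_rms)*exp(-n*Tc/tau_rms).
    Tc (tap spacing) and tau_rms (RMS delay spread) are example values."""
    rng = rng or np.random.default_rng()
    power = (Tc / tau_rms) * np.exp(-np.arange(L) * Tc / tau_rms)
    g = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2.0)
    return g * np.sqrt(power)

def faded_signal(chips, taps):
    """Tapped delay line: delayed copies of the chip stream, each scaled
    by its (fixed-for-this-block) tap gain, are summed at the receiver."""
    out = np.zeros(len(chips) + len(taps) - 1, dtype=complex)
    for d, a in enumerate(taps):
        out[d:d + len(chips)] += a * np.asarray(chips)
    return out

taps = rayleigh_taps(4, rng=np.random.default_rng(7))
rx = faded_signal(np.ones(8), taps)       # 8 chips through a 4-tap channel
```

A RAKE receiver would then run one correlator per tap delay and combine the branches, as described above.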
Error Analysis in CDMA

At the receiver, the decision statistic for the l-th user is given by Z_l = I_l + η + ζ, where I_l is the power of the desired signal, η is the output of the thermal noise passing through the matched filter, which can be modeled as a zero-mean Gaussian random variable, and ζ is the MAI, which can also be approximated as a zero-mean Gaussian random variable, since there are many users in the CDMA system. Consequently, Z_l is a Gaussian random variable with mean m contributed by I_l and variance v contributed by both η and ζ. The variance of Z_l is simply the sum of the variances of the random variables η and ζ, since they are independent. Once the mean m and the variance v of Z_l are obtained, the bit error rate (BER) of the received signal can be expressed as Q(m/√v). However, this result can be applied only to the AWGN channel. If we consider the Rayleigh fading, η and ζ both depend on the fading coefficients. In such a case, the received signal's BER can be expressed as E{Q(m/√v)}, where E(·) is the average over the distribution of the fading coefficients. In the above discussion, we consider only the fading effects on the source bits. Another important factor that influences the BER is the channel coding scheme, which will be discussed in more detail in the next section.

3.1.3 Channel Coder RCPC

A wireless channel suffers from fading effects. To combat the undesired noise and interference, error correction codes are needed to improve the performance. The convolutional code is often used in a CDMA system. Since our goal is to vary the channel coding rate appropriately to achieve different QoS requirements, the rate-compatible punctured convolutional (RCPC) code [46] is adopted due to its flexibility in adjusting the rate.
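The BER expression Q(m/√v) above evaluates with nothing more than the complementary error function; a small sketch follows (the mean and variance values in the example are arbitrary):

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def awgn_ber(m, v_thermal, v_mai):
    """BER of the decision statistic Z ~ N(m, v): Q(m / sqrt(v)).
    Thermal noise and MAI are independent, so their variances add."""
    return q_function(m / math.sqrt(v_thermal + v_mai))

# Example: as the MAI variance grows (more active codes in the cell),
# the BER degrades for a fixed desired-signal mean.
ber_light = awgn_ber(1.0, 0.04, 0.01)
ber_heavy = awgn_ber(1.0, 0.04, 0.20)
```

For the Rayleigh case, E{Q(m/√v)} would be estimated by averaging this expression over draws of the fading coefficients, e.g. from the tap simulator of Figure 3.3.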
A convolutional code with a coding rate of k/n can be represented by n linear algebraic function generators and implemented by passing the input sequence through a finite state machine with k bits per stage. As shown in Figure 3.4, an RCPC code is obtained by deleting certain output bits according to a puncturing table after convolutional encoding, in order to increase the channel coding rate. On the decoder side, '0's are inserted according to the same puncturing table before Viterbi decoding.

Figure 3.4: RCPC with K = 3, rate = 1/2.

There is no systematic way to determine the generator functions and the puncturing tables of an RCPC code family. Fortunately, some generator functions and tables have been found by computer search and can easily be found in related textbooks and papers [46].

BER for Soft Decision Decoding

The Viterbi algorithm with soft decision is used at the decoder end for maximum likelihood detection. For RCPC, '0's should be inserted according to the puncturing table before the sequence passes through the Viterbi decoder. Even though it is difficult to obtain the exact BER of a convolutional code, a fairly tight upper bound is given in [47]. More specifically, the bound on the bit error probability is

P_b ≤ (1/M) Σ_{d=d_free}^{∞} c_d D^d,   (3.1)

where M is the puncturing period, c_d is the total number of nonzero information bits on all weight-d paths, d_free stands for the free distance, E_s is the symbol energy right after channel coding at the transmitter, and D is a constant depending on the channel model.
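Once the distance spectrum of the code is known, the bound in (3.1) is a one-line computation. The spectrum below is purely hypothetical (it is not the spectrum of any code used in this chapter), and D is evaluated for the AWGN case using D = e^(−γ_b·R_c) as given in the discussion that follows:

```python
import math

def rcpc_ber_bound(c_d, d_free, M, D):
    """Upper bound (3.1): P_b <= (1/M) * sum_{d >= d_free} c_d * D**d.

    c_d maps a path weight d to the total number of nonzero information
    bits on all weight-d paths (in practice a truncated spectrum).
    """
    return sum(c * D ** d for d, c in c_d.items() if d >= d_free) / M

# Hypothetical distance spectrum, for illustration only.
spectrum = {5: 1, 6: 4, 7: 12, 8: 32}
R_c = 1.0 / 2.0                    # channel coding rate
gamma_b = 4.0                      # SINR per information bit (assumed)
D = math.exp(-gamma_b * R_c)       # AWGN channel, soft decision
bound = rcpc_ber_bound(spectrum, d_free=5, M=8, D=D)
```

Since D < 1, the sum is dominated by the d_free term, which is why a truncated spectrum already gives a useful estimate.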
In a CDMA system, we can replace E_s/N_0 in (3.1) by R_c γ_b, where R_c is the channel coding rate and γ_b is the signal to interference-plus-noise ratio per information bit. Thus, D = e^{−γ_b R_c}, where

γ_b = [ (1/(3η)) Σ_{k=1, k≠l}^{K} E_k/E_l + N_0/(2E_l) ]^{−1}   (3.2)

for the AWGN channel, and

D = Π_{q=0}^{L−1} 1/(1 + γ_q),   (3.3)

with

γ_q = E{α_q²} / [ (2/(3η)) Σ_{k=1}^{K} (E_k/E_l) Σ_{i=0, i≠q}^{L−1} E{α_i²} + N_0/(2E_l) ],

for the frequency-selective slow Rayleigh fading channel [48]. In the above, E_k is the information bit energy of the k-th code, η is the spreading factor, K is the total number of used codes, L is the number of taps in the RAKE receiver, and E{α_n²} is the variance of the tap gain α_n. The value E{α_n²} can be approximated by

E{α_n²} ≈ (T_c/τ_rms) e^{−n T_c/τ_rms},   n = 0, ..., L−1,

where T_c is the tap spacing and τ_rms is the root mean square delay spread of the channel.

3.1.4 Channel Code and Multicode Assignment

The block diagram of the wireless CDMA communication system is shown in Figure 3.5. First, n video signals are compressed using the modified H.263 encoder. The source encoder generates a base layer and an enhancement layer bitstream for each video signal. Based on the bit rates of the base and enhancement layers and on the channel conditions, the channel code and multicode assignment is performed by searching possible combinations of the channel code rate and the number of allocated multicodes. The compressed bitstreams are then protected by the selected RCPC codes. The output sequences of the RCPC coder are spread by multicodes, scrambled by a random sequence and modulated by BPSK. Here, we use the Walsh-Hadamard (WH) codes, a kind of orthogonal code, as the spreading codes. The scrambling code is used to decrease the cross-correlation of the WH codes.
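The spreading and scrambling step can be sketched as follows. The Walsh-Hadamard construction via the Sylvester recursion is standard, while the data bits, the code index and the scrambling sequence below are arbitrary choices for illustration:

```python
import random

def walsh_hadamard(n):
    """Walsh-Hadamard codes of length n (a power of two), built by the
    Sylvester recursion H_{2n} = [[H, H], [H, -H]]; rows are mutually
    orthogonal +/-1 sequences."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def spread_and_scramble(bits, wh_code, scrambler):
    """Spread each data bit (+/-1) by the WH code, then multiply
    chip-wise by a +/-1 scrambling sequence, which lowers the
    cross-correlation between code channels."""
    chips = [b * c for b in bits for c in wh_code]
    return [ch * s for ch, s in zip(chips, scrambler)]

random.seed(7)
sf = 16                                    # spreading factor
codes = walsh_hadamard(sf)
scrambler = [random.choice((-1, 1)) for _ in range(4 * sf)]
tx = spread_and_scramble([1, -1, -1, 1], codes[3], scrambler)
```

Descrambling and correlating the first 16 chips against the same WH code recovers the first data bit, mirroring the receiver chain described next.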
At the receiver side, signals are demodulated, descrambled, despread, channel-decoded with the Viterbi algorithm, and finally decompressed using the modified H.263 decoder.

Figure 3.5: The block diagram of the wireless CDMA communication system.

Cost Function 1

Using the formulas given in the previous section, we can calculate the BER of each incoming bit sequence. To optimize the overall performance of the whole system, we minimize the cost function

C = Σ_{i=1}^{N} w_i P_i,

where N is the total number of codes used in the system, w_i is the weighting coefficient of the i-th code, and P_i is the BER of the corresponding sequence, which is a function of the channel coding rate. To obtain a lower BER, we may apply stronger channel protection. However, stronger channel protection demands more spreading codes, and the additional codes lead to higher MAI, which in turn increases the BER. Thus, we attempt to find the channel coding rates that minimize the cost function.

To better understand this framework, let us consider the following scenario. There are two users, and each of them produces two bitstreams, i.e., a base layer and an enhancement layer. Thus, there are four bitstreams to be transmitted in the system.
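Minimizing cost function 1 over the discrete set of channel code rates can be sketched as an exhaustive search. The BER model below is a toy stand-in for the expressions of Section 3.1.2 — its form and constants are assumptions, chosen only so that stronger coding lowers the BER while a heavier code load raises the MAI:

```python
from itertools import product

def best_assignment(weights, ber_of_rate, rates):
    """Exhaustively search the rate combination minimizing
    C = sum_i w_i * P_i, where P_i is the BER of sequence i under the
    chosen rate and the total code load (MAI proxy)."""
    best, best_cost = None, float("inf")
    for combo in product(rates, repeat=len(weights)):
        load = sum(1.0 / r for r in combo)   # proxy for spreading-code usage
        cost = sum(w * ber_of_rate(r, load) for w, r in zip(weights, combo))
        if cost < best_cost:
            best, best_cost = combo, cost
    return best, best_cost

def toy_ber(rate, load):
    """Illustrative BER model only: lower rate -> fewer errors, but a
    larger load -> more MAI and hence a higher error floor."""
    return min(0.5, 1e-4 * (rate / 0.25) ** 3 * (1.0 + 0.2 * load))

rates = (8/9, 1/2, 1/3, 1/4)
weights = (100, 1, 100, 1)
combo, cost = best_assignment(weights, toy_ber, rates)
```

With four sequences and four rates this checks 4^4 = 256 combinations; the exponential growth of this search motivates the iterative method introduced later.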
The weight ratio assigned to these four bitstreams is 100 : 1 : 100 : 1, and the source rate ratio is 1 : 3 : 2 : 6. Let us also assume that there are only four choices of channel coding rate: 8/9, 1/2, 1/3 and 1/4. Tables 3.1 and 3.2 give the optimum code allocation results for different spreading factors and SNR levels. From these tables, it is observed that more codes are assigned as the spreading factor increases. However, even if the spreading factor is as high as 64, in which case the system affords any combination of channel coding rates, the lowest channel coding rates are not always selected, because more codes introduce more MAI. These two tables also show the influence of SNR. As the SNR difference between the base layer and the enhancement layer increases, the channel coding rate assigned to the base layer becomes higher. This implies that the base layer gets less channel protection; however, it has a higher power and introduces more MAI than the enhancement layer.

Table 3.1: Results for the AWGN channel with a fixed weight ratio.

SNR_base : SNR_enh   SF = 16                                              SF = 64
1:1     Rate=[1/2, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[1/4, 1/3, 1/3, 1/2], No. of Codes=[4 9 6 12]
3:1     Rate=[8/9, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[1/2, 1/4, 1/2, 1/3], No. of Codes=[2 12 4 18]
6:1     Rate=[8/9, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[8/9, 1/4, 8/9, 1/3], No. of Codes=[2 12 3 18]
10:1    Rate=[8/9, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[8/9, 1/4, 8/9, 1/4], No. of Codes=[2 12 3 24]

Table 3.2: Results for the frequency-selective Rayleigh fading channel with a fixed weight ratio.

SNR_base : SNR_enh   SF = 16                                              SF = 64
1:1     Rate=[1/2, 8/9, 8/9, 8/9], No. of Codes=[3 9 6 12]   Rate=[1/4, 1/4, 1/4, 1/4], No. of Codes=[4 12 8 24]
3:1     Rate=[8/9, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[1/3, 1/4, 1/3, 1/4], No. of Codes=[3 12 6 24]
6:1     Rate=[8/9, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[1/3, 1/4, 1/2, 1/4], No. of Codes=[3 12 4 24]
10:1    Rate=[8/9, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[1/2, 1/4, 1/2, 1/4], No. of Codes=[2 12 4 24]

Next, let us fix the SNR ratio at 1:1 and vary the weight coefficients to observe how the optimum solution changes. This is important since there is a strong relationship between the weight coefficients and the user requirements. In Tables 3.3 and 3.4, we have tried four weighting ratios. First, we give equal importance to all four sequences. Then, the coding rate is related to the source rate; that is, the sequence with a lower source rate gets higher protection. Second, we assign a much higher weight to the base layer than to the enhancement layer to differentiate their importance. Thus, the base layer gets more protection. Third, we assume that user no. 1 is more important than user no. 2. Obviously, the overall protection for user no. 1 is better than that for user no. 2. In the last case, user no. 2 is more important than user no. 1. User no. 2 does not get a lower coding rate than user no. 1, since user no. 2 has a higher source rate.

Table 3.3: Results for the AWGN channel with a fixed SNR ratio.

w_b1 : w_e1 : w_b2 : w_e2   SF = 16                                              SF = 64
1:1:1:1           Rate=[1/2, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[1/4, 1/4, 1/4, 1/3], No. of Codes=[4 12 8 18]
1000:1:1000:1     Rate=[1/2, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[1/4, 1/3, 1/3, 1/2], No. of Codes=[4 9 6 12]
10000:10:1000:1   Rate=[1/2, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[1/4, 1/3, 1/3, 1/2], No. of Codes=[4 9 6 12]
1000:1:10000:10   Rate=[1/2, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[1/4, 1/3, 1/4, 1/3], No. of Codes=[4 9 8 18]
Note that, compared with the first rows of Tables 3.1 and 3.2, user no. 2 gets more protection in the current case.

Table 3.4: Results for the frequency-selective Rayleigh fading channel with a fixed SNR ratio.

w_b1 : w_e1 : w_b2 : w_e2   SF = 16                                              SF = 64
1:1:1:1           Rate=[1/2, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[1/4, 1/4, 1/4, 1/4], No. of Codes=[4 12 8 24]
1000:1:1000:1     Rate=[1/2, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[1/4, 1/4, 1/4, 1/3], No. of Codes=[4 12 8 18]
10000:10:1000:1   Rate=[1/2, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[1/4, 1/4, 1/4, 1/3], No. of Codes=[4 12 8 18]
1000:1:10000:10   Rate=[1/2, 8/9, 8/9, 8/9], No. of Codes=[2 4 3 7]    Rate=[1/4, 1/4, 1/4, 1/4], No. of Codes=[4 12 8 24]

Cost Function 2

In the previous section, the weighting coefficients had to be set manually before cost evaluation. If we have an explicit requirement on each video stream, we can instead design the alternative cost function

C = max_i ( BER_i^e / BER_i^r ),

where BER_i^e is the estimated bit error rate of the i-th sequence obtained by (3.1), and BER_i^r is the BER requirement of the corresponding sequence, which is input to the system by the user. This cost function indicates the worst satisfaction level among all resulting BERs, and we attempt to treat all user data fairly by minimizing this worst satisfaction level. With our video codec, a base layer stream can tolerate a bit error rate below 10^-4, while an enhancement layer stream can tolerate a bit error rate below 10^-3. Thus, these two values are used as defaults when no requirements are given. By examining the equations in the previous section, we see that it is complicated to solve this optimization problem explicitly.
Fortunately, the number of channel coding rates is limited, so a straightforward way to achieve the optimization is exhaustive search. However, the complexity of this method is O(K^{2U}), where K is the number of channel coding rate choices and U is the number of users. Since the search time increases exponentially with the number of users, exhaustive search is not practical in many cases.

To speed up the search, we propose the following iterative method. Assume that we are given an initial solution. First, we select the best channel rate for one sequence while fixing the rates of the other sequences. Then, we update the solution, find the best rate for the next sequence, and so on. This is repeated until the channel rates converge. Apparently, the initialization is an important factor in determining whether the convergent result coincides with the globally optimal solution; it is also related to the number of iterations required for convergence. From experimental results, we observed that the best channel rate for each sequence depends strongly on its SNR and its source rate. Therefore, we initialize the channel rates by considering these two factors. For example, suppose that there are four sequences (two users) in the system with source rates SC_i, i = 0, ..., 3. First, we calculate the average channel rate as

Ave = ( Σ_i SC_i ) / ( SF × BaseRate ),

where SF is the spreading factor and BaseRate is the transmission rate of one WH code. Thus, SF × BaseRate is the maximum transmission rate that the CDMA system can support. Note that, when the overall source rate is lower, RCPC provides better protection on average.
Second, we assign an initial channel rate to each sequence by modifying the average channel rate as

Init_chnl_rate_for_base_layer = Ave − Step,
Init_chnl_rate_for_enhancement_layer = Ave + Step,

with Step = log 2. Experimental results confirm that the power level is a more dominant factor for signal protection than the channel coding rate, and that it is advantageous to decrease the channel rate of a higher power sequence in order to alleviate the MAI. Therefore, we decrease the channel rate for the base layer, which has the higher power level. With this initialization, extensive simulation results have verified that a convergent result is obtained within five iterations in most cases. The complexity of the iterative method is only O(K · 2U), which is much lower than that of the exhaustive search.

3.1.5 Simulations

Comparison of BER Bounds and Simulated BERs

Figure 3.6: BER vs. number of users in the AWGN channel (analytic bounds and simulated BERs for channel rates 8/8, 8/16, 8/24 and 8/32).

Figure 3.7: BER vs. number of users in the frequency-selective slow Rayleigh fading channel (analytic bounds and simulated BERs for channel rates 8/8, 8/16, 8/24 and 8/32).

In this section, we perform numerical experiments to verify the analytical results derived in the previous sections.
First, the channel coding performance for the AWGN channel and the frequency-selective Rayleigh fading channel is shown in Figures 3.6 and 3.7, respectively. In this simulation, the channel SNR is set to 3 dB, and the RCPC (23,35,27,33) code with the four rates 8/8, 8/16, 8/24 and 8/32 is employed. For channelization codes, we use the WH codes with spreading factor 16. Thus, the system can accommodate at most 16 users. To reflect a real CDMA system, asynchronous transmission is simulated. For the frequency-selective slow Rayleigh fading channel, 4 multipaths are assumed, and a RAKE receiver with 4 fingers is used to exploit the multipath signals. To further improve the performance, an interleaver is used to randomize burst noise.

It can be seen that the BER decreases as the channel rate becomes lower. Note that for channel rate 8/8 (without channel coding), the simulated BERs are almost the same as the estimated ones. However, for the other cases, the simulated BERs are lower than the estimated values. This is because a closed-form BER expression is available only for the case without channel coding. As mentioned earlier, the exact BER performance of convolutional codes is difficult to derive; thus, for channel coding rates lower than 1, we use the BER upper bound in (3.1) as the estimated BER. The difference between the bound and the simulated BER is within a factor of 10 for the AWGN channel and even smaller for the Rayleigh fading channel.

We can also observe from these figures that the BER increases if the system uses more multicodes. However, the increase is relatively small. This indicates that it is preferable to use a lower channel rate as long as the system can accommodate the total transmission rate (= source data rate / channel rate).
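The relation between the source rate, the channel rate and the number of multicodes is simple arithmetic; the 10 kbps per-code rate below matches the per-code transmission rate stated in the next simulation, and the small epsilon is only a floating-point guard:

```python
import math

def multicodes_needed(source_kbps, channel_rate, code_kbps=10.0):
    """Number of spreading codes needed to carry one sequence: the
    transmission rate is source rate / channel rate, split across codes
    of code_kbps each. The epsilon absorbs floating-point error so that
    exact multiples do not round up."""
    needed = source_kbps / channel_rate / code_kbps
    return math.ceil(needed - 1e-9)

# Example: a 20 kbps base layer coded at rate 1/3 needs a 60 kbps
# channel, i.e. 6 codes of 10 kbps each.
```

The same arithmetic reproduces the multicode counts that appear in the code assignment tables of the next subsection.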
However, if several users simultaneously transmit their data over the same channel, the channel codes should be carefully assigned so that the overall satisfaction is maximized.

Channel Code Assignment

In the next simulation, we examine the performance of the proposed code assignment scheme. It is assumed that two users exist in the system: one transmits the Foreman QCIF sequence while the other sends the Claire QCIF sequence. Furthermore, suppose that the two users are of equal importance and that the bit error rate requirements are 10^-4, 10^-3, 10^-4 and 10^-3 for Foreman's base layer, Foreman's enhancement layer, Claire's base layer and Claire's enhancement layer, respectively. The channel SNR is set to 5 dB for both the AWGN and the frequency-selective slow Rayleigh fading channels, and the transmission rate of one spreading code is 10 kbps. As in the previous simulation, the spreading factor is set to 16, and the RCPC (23,35,27,33) code with the four rates 8/8, 1/2, 1/3 and 1/4 is used.

Table 3.5 shows the search result of the proposed algorithm, the estimated and simulated BERs, and the resulting cost. For comparison, we implement a reference system with exactly the same parameters but with an intuitively assigned channel coding rate. In Table 3.6, the lowest channel rate 1/4 is assigned to the base layers and the remaining code resources are given to the enhancement layers. In this case, the enhancement layer gets too little protection: its BER is 0.0725, which is much higher than the requirement of 10^-3. By comparing Tables 3.5 and 3.6, we see that the proposed algorithm satisfies the requirements of all users more evenly than the reference system.

Table 3.5: Search results, estimated and simulated BERs for the proposed scheme in the AWGN channel (SR = source rate, CR = channel rate, MCs = no. of multicodes).

        SR (kbps)   CR    MCs   est BER    sim BER
fm.0       20       1/3    6    0.0010     1.5547 × 10^-5
fm.1       20       1/2    4    0.0069     0.00163246
cl.0       10       1/3    3    0.0010     0.0002488
cl.1       10       1/2    2    0.0069     0.0087687
cost = 10

Table 3.6: Estimated and simulated BERs for the reference system in the AWGN channel.

        SR (kbps)   CR    MCs   est BER        sim BER
fm.0       20       1/4    8    8.25 × 10^-4   1.55 × 10^-5
fm.1       20       1/1    2    0.0725         0.0611785
cl.0       10       1/4    4    8.25 × 10^-4   0.0002798
cl.1       10       1/2    2    0.0109         0.0159515
cost = 72.5

The same comparison is also performed for the frequency-selective Rayleigh fading channel; Tables 3.7 and 3.8 summarize the results. Note that the proposed algorithm provides a much better performance than the reference system. It can also be seen that the Rayleigh fading case yields lower BERs than the AWGN case, since the RAKE receiver takes advantage of the multipath effect.

Table 3.7: Search results, estimated and simulated BERs for the proposed scheme in the Rayleigh fading channel.

        SR (kbps)   CR    MCs   est BER        sim BER
fm.0       20       1/3    6    2.34 × 10^-5   0
fm.1       20       1/2    4    0.00112627     0.000139925
cl.0       10       1/3    3    2.34 × 10^-5   0
cl.1       10       1/2    2    0.00112627     0.0007774
cost = 1.12627

Table 3.8: Estimated and simulated BERs for the reference system in the Rayleigh fading channel.

        SR (kbps)   CR    MCs   est BER        sim BER
fm.0       20       1/4    8    4.46 × 10^-5   0
fm.1       20       1/1    2    0.0921         0.0610074
cl.0       10       1/4    4    4.46 × 10^-5   0
cl.1       10       1/2    2    0.0016         0.0009639
cost = 92.1

Finally, the received bitstreams are passed through the video decoder. Figure 3.8 shows the PSNR performance of the proposed algorithm for the 'Claire' and 'Foreman' sequences.
In both the AWGN and the Rayleigh fading cases, the proposed algorithm yields almost the same PSNR as the error-free bitstream. On the other hand, in the reference system, the enhancement layer of the received bitstreams contains too many errors, so the decoder cannot reconstruct the whole sequence successfully. This result indicates that the proposed algorithm can effectively assign the channel rate to each user, thus improving the quality of the reconstructed frames.

Figure 3.8: The PSNR performance for two test sequences.

3.2 Packet Video Transmission with Adaptive Channel Rate Allocation

From the information-theoretic viewpoint, optimum reliable transmission can be achieved by separating source coding from channel coding [49]. However, in real applications under finite delay and complexity constraints, we can improve the quality of reconstructed video by jointly designing the source and channel coders. Video is a distortion-tolerant medium, and human eyes are not equally sensitive to all bits in a video bitstream. It is thus advantageous to partition compressed video data into several layers so that they can be treated in different ways according to their importance levels. In other words, a network capable of providing different service levels (DiffServ) can transmit each layer at its corresponding priority. However, many wireless and Internet systems are not intelligent enough to support DiffServ [50].
In the second scenario, we jointly allocate the source and the channel rates to each packet in real-time video transmission over a wireless channel, whose bit error rate is fluctuating. We develop a simple rate-distortion model in terms of quantization parameter, so that the rate and the distortion can be estimated without an extensive encoding procedure. Simulation results demonstrate that the proposed algorithms for both scenarios provide acceptable image quality even in high bit error rate environments. 3.2.1 V ideo Coder We employ a video coder, which is modified from the standard H.263 coder [1]. In the encoder, we partition compressed video data into base and enhancement layers and packetize them in an interleaved way to enhance error resilience. In the decoder, we use a motion-compensated error concealment scheme to recover corrupted video regions faithfully. 48 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Layered Coding and Packetization As Section 3.1.1, we first partition compressed video data into two layers as shown in Figure 3.2. The base layer contains macroblock (MB) headers and motion vectors, whose loss severely degrades the received video quality. The enhancement layer contains less important information, i.e., the residual DCT coefficients after motion compensation. For an intra MB, DCT coefficients are put into the base layer. We then perform the packetization at the base and the enhancement layers, respectively, for packet-based video communications. Due to the packetization, the effect of errors can be localized within a packet. As the packet size becomes smaller, the error localization becomes more effective. However, a smaller packet size introduces more overhead bits to distinguish separated data parts. Further more, a larger amount of computations are required to evaluate the importance of each packet so that packets can be treated differently according to their impor tance. 
Thus, the bit rate overhead, the computational complexity as well as the error localization capability should be taken into account to determine the packet size. In H.263, a synchronization code can be inserted at the beginning of a group of blocks (GOB) to localize the errors within a GOB. A GOB usually contains a slice of MBs. For a a QCIF (176 x 144) frame, 9 GOBs are generated. Similarly, we can partition the layer-coded video stream into GOBs. W ith layered coding 49 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 2 4 5 6 7 8 9 1 2 3 4 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 6 7 8 9 I 2 3 4 5 6 7 8 9 Figure 3.9: MB grouping for a QCIF frame and packetization, a QCIF frame can be partitioned to 9 base packets and 9 enhancement packets. However, if a packet is corrupted, only the upper and lower MBs are available to conceal an erroneous MB. In this chapter, we introduce an alternative way to reorganize MBs so that each packet consists of sparsely distributed MBs. Thus, the decoder can conceal the erroneous MB more effectively by using the information of more neighboring MBs. Figure 3.9 illustrates the proposed packetization scheme. A packet for the QCIF format video is formed with 11 MBs chosen from every nine consecutive MBs. Specifically, 1st, 10th, 19th and 28th MBs are grouped into one packet, 2nd, 11th, 20th and 29th MBs are grouped into another packet, and so on. Therefore, as in the GOB packetization, the proposed scheme also generates 9 base packets and 9 enhancement packets for each frame. But, when a packet is missing, the erroneous MBs can be concealed by using the information of the upper, lower, left and right MBs. At the end of each packet, a 16-bit cyclic redundancy code (CRC) is added to enable the error detection at the decoder side. 50 Reproduced with permission of the copyright owner. 
Further reproduction prohibited without permission. Error Concealm ent Error protection schemes can reduce bit error rate, but they cannot guarantee to correct all bit errors. Since compressed video data consist of variable length codewords, they are vulnerable to even a single bit error. Using the CRC code at the end of each packet, the decoder first performs the error detection. If a packet is detected as erroneous, the decoder discards all the data within the packet and conceals the loss. If an enhancement packet is lost but the corresponding base packet is intact, we replace the missing DCT coefficients with zeros and copy each MB from the previous frame using the motion vectors in the base packet. This approach pro vides a good image quality, since we can exploit high temporal correlation in image sequences with correct motion vectors. On the other hand, if a base packet is lost, the corresponding enhancement packet is useless. A simple approach is to directly copy the missing MBs from the previous frame with zero motion vector. This approach gives an acceptable performance when a sequence contains only slow motions. But, in a fast moving sequence, the direct copying results in obvious discontinuities and artifacts. In the proposed algorithm, the loss of a base packet is concealed in the following way. As shown in Figure 3.10, for each pixel p in a missing MB, four reference pixels are obtained from the previous frame using the motion vectors of the upper, lower, left and right MBs. They are denoted by pU ppen Piowen Pieft and pnght ■ To 51 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Upper : Lower = (17-y): y L eft: Right = (17-x): x Figure 3.10: Error concealment by linear interpolation. conceal the pixel p, the four reference pixel values are averaged using weighting coefficients, which are inversely proportional to the distances between p and the adjacent MBs. 
Specifically, assume that p is the (x,y)th pixel in the missing MB, where 1 < x, y < 16. Then, it is concealed via ~ P u p p e r * ( 1 7 y ) " b S lo w e r ’ V d- P \e ft ' (17 x) f f i b r i g h t ' p _ __ . If a neighboring motion vector is not available due to the packet loss, intra coding mode or boundary effect, only those available motion vectors are used for the concealment. If all motion vectors are not available, the erroneous MB is copied from the previous frame with zero motion vector. Figure 3.11 compares the reconstructed frames of the direct copying method and the proposed algorithm, when 3rd base packet for 2nd frame of ‘Foreman’ 52 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Figure 3.11: Loss of 3rd base packet: (a) The erroneous region is copied from the previous frame, (b) The erroneous region is concealed by the proposed algorithm. sequence is lost. We see clearly that the copying algorithm yields obvious dis continuities in the background building, while the proposed algorithm recovers the motion direction smoothly and provides very faithful image reconstruction without noticeable artifacts. 3.2.2 Static Channel C ode R ate A llocation Channel D istortion Error correction codes are needed to combat undesired noises and interferences in wireless environments. Since our goal is to vary the channel coding rate ac cording to different QoS requirements, we adopt the rate-compatible punctured convolutional (RCPC) code [46] due to its flexibility in adjusting the rate. It is necessary to assign the channel code rate to each packet according to its importance so that the limited bandwidth can be used efficiently. An ideal way to measure the packet importance is to associate each packet with the effect of its 53 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. loss to human perception. 
sequence is lost. We see clearly that the copying algorithm yields obvious discontinuities in the background building, while the proposed algorithm recovers the motion direction smoothly and provides a very faithful image reconstruction without noticeable artifacts.

3.2.2 Static Channel Code Rate Allocation

Channel Distortion

Error correction codes are needed to combat undesired noise and interference in wireless environments. Since our goal is to vary the channel coding rate according to different QoS requirements, we adopt the rate-compatible punctured convolutional (RCPC) code [46] due to its flexibility in adjusting the rate.

It is necessary to assign the channel code rate to each packet according to its importance, so that the limited bandwidth can be used efficiently. An ideal way to measure the packet importance is to associate each packet with the effect of its loss on human perception. However, human eyes are very complicated organs, and it is difficult to obtain the exact relation between a video packet and its perceptual quality. Thus, for simplicity, we use the mean square error (MSE) to measure the packet loss effect. The MSE measurement of each packet loss is straightforward. We emulate the dropping of each packet, perform its error concealment and then compute the MSE between the concealed data and the error-free data.

Let δ_i² denote the MSE value due to the loss of packet i. Then the expected channel distortion for a frame can be written as

D_channel = Σ_{i=1}^{n} δ_i² · P_i,    (3.4)

where n is the number of packets in a frame and P_i is the probability that packet i is lost. The packet loss rate P_i is given by

P_i = 1 − (1 − ε)^{N_i},    (3.5)

where ε is the bit error rate when a certain RCPC channel code rate is applied, and N_i is the size of packet i in bits. We assume that even a single bit error ruins the whole packet; in other words, an uncorrupted packet contains no bit error after the RCPC decoding.

The objective is to determine the RCPC channel code rate for each packet that minimizes the distortion in Eq. (3.4) subject to a certain rate constraint. Therefore, we need to find the relation between the channel code rate and the bit error rate ε in Eq. (3.5). Since it is difficult to find a closed-form relation, we use several training sequences to measure the bit error rate when a certain channel code rate is used. In this way, we can find an experimental mapping between the channel code rate and the bit error rate. Then, we can solve the problem by the following exhaustive search.

1. Get the MSE value δ_i² for each packet in a frame.

2. Try every combination of channel code rates, and select the set that minimizes the expected distortion in Eq. (3.4) subject to the constraint on the overall transmission rate.

3.
Apply RCPC to each packet with the selected channel code rate.

Fast Search Algorithm

The rate assignment method described above is computationally too expensive. In our implementation, there are four channel code rate choices (8/8, 8/16, 8/24 and 8/32), and we process a frame as a unit for the assignment. For example, a QCIF frame has 18 packets, and the exhaustive search would check 4^18 candidates to find the best channel rate combination. Such a high complexity is unacceptable, and a fast search algorithm is necessary in real-time applications.

The Lagrangian multiplier method [25] is often employed to solve the rate-distortion optimization problem, though it has the limitation that it can only reach points on the convex hull of the achievable R-D region. Unfortunately, in our case, the convex hull is not dense enough, and the solution obtained from the Lagrangian multiplier method may not provide a reasonably good performance. Although the dynamic programming approach can overcome this problem, its complexity increases exponentially with both the number of packets and the number of rate choices. We therefore develop a different fast search method.

As given in the cost function in Eq. (3.4), two factors affect the assignment result. One is the MSE value of each packet loss, and the other is the packet loss rate, which is a function of the packet size. If we assume that all packets have the same size, the assignment depends only on the MSE value. Specifically, a packet with a higher MSE value requires stronger protection. Thus, if we arrange the packets in descending order of their MSE values, the assigned code rates for those packets should be in ascending order. The assigned channel code rates also should satisfy the overall transmission rate constraint.
But, if the total rate is much lower than the rate constraint, we can lower the channel code rates of some packets to provide stronger protection. Thus, the best channel rate combination occurs only when the total transmission rate is close to the rate constraint. The following procedure is developed based on these observations.

1. List the packets in the descending order of their MSE values.

2. Find the set S of channel code rate vectors that satisfy the ascending rule. Specifically, S = {r_k = (r_{k,1}, r_{k,2}, ..., r_{k,n}) : r_{k,i} ≤ r_{k,j} for i < j}, where r_{k,i} denotes the channel code rate for packet i.

3. Find the subset S' of S that contains only the vectors satisfying the overall transmission rate constraint.

4. Let r_l = (r_{l,1}, r_{l,2}, ..., r_{l,n}) and r_m = (r_{m,1}, r_{m,2}, ..., r_{m,n}) be two elements of S'. Note that the protection capability of r_l is inferior to that of r_m if r_{l,i} ≥ r_{m,i} for all i. Thus, we remove such elements from S' by comparing every pair of elements in S'. Let S'' denote the resulting subset of S'.

5. Select the suboptimal code rate vector from S'' that minimizes the cost function in Eq. (3.4).

In the above procedure, we reduce the candidate set from S to S''. In general, the size of S'' is much smaller than that of S, reducing the search time significantly. For example, let us assume that there are 6 packets and each packet can take one of the four channel rates 8/8, 8/16, 8/24 and 8/32. When the overall transmission rate is constrained to be smaller than twice the original source rate, the above procedure yields an S'' consisting of only the 7 candidate vectors in Table 3.9. On the contrary, if the full search method is employed, we need to calculate the cost function 4^6 = 4096 times.

Although the same-packet-size assumption was made in developing the fast search algorithm, packet sizes actually vary in our system.
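Steps 1-5 above can be sketched compactly. The per-rate residual BER values below are assumed placeholders for illustration only; the thesis obtains them by measurement on training sequences.

```python
from itertools import combinations_with_replacement

# Assumed post-Viterbi residual BER for each RCPC rate -- placeholder
# values, not measurements from the thesis.
RATES = [8/32, 8/24, 8/16, 8/8]   # strongest ... weakest protection
BER = {8/32: 1e-5, 8/24: 1e-4, 8/16: 1e-3, 8/8: 1e-2}

def loss_prob(bits, rate):
    # Eq. (3.5): a packet survives only if every one of its bits is correct.
    return 1.0 - (1.0 - BER[rate]) ** bits

def fast_assign(mse, size, budget):
    """mse[i], size[i]: loss MSE and payload bits of packet i;
    budget: total channel bits available for the frame."""
    # Step 1: sort packets by descending MSE.
    order = sorted(range(len(mse)), key=lambda i: -mse[i])
    # Steps 2-3: non-decreasing rate vectors (the ascending rule) that
    # also satisfy the overall transmission-rate constraint.
    cands = [rk for rk in combinations_with_replacement(RATES, len(mse))
             if sum(size[i] / r for i, r in zip(order, rk)) <= budget]
    # Step 4: discard vectors dominated element-wise by another candidate.
    s2 = [a for a in cands
          if not any(b != a and all(bi <= ai for ai, bi in zip(a, b))
                     for b in cands)]
    # Step 5: minimize the expected channel distortion of Eq. (3.4).
    cost = lambda rk: sum(mse[i] * loss_prob(size[i], r)
                          for i, r in zip(order, rk))
    best = min(s2, key=cost)
    return dict(zip(order, best)), cost(best)
```

Because `combinations_with_replacement` over an ordered alphabet yields exactly the non-decreasing tuples, the ascending rule comes for free; the dominance pruning of Step 4 then leaves only a handful of candidates to score.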
It is however worthwhile to point out that packet sizes do not vary significantly, since we employ a rate control scheme at the video encoder. Experimental results confirm that the fast search algorithm provides an acceptable performance with a low complexity. For a QCIF frame, the fast search algorithm usually takes around 40 time units to find the solution, instead of the 4^18 time units of the full search. Table 3.10 compares the search results of the Lagrangian multiplier method and the fast search algorithm. We see that the fast search algorithm tends to achieve a significantly lower cost than the Lagrangian multiplier method.

Table 3.9: An example of the fast channel code rate assignment.

      Packet 1  Packet 2  Packet 3  Packet 4  Packet 5  Packet 6
  1   8/16      8/16      8/16      8/16      8/16      8/16
  2   8/24      8/16      8/16      8/16      8/16      8/8
  3   8/24      8/24      8/16      8/16      8/8       8/8
  4   8/24      8/24      8/24      8/8       8/8       8/8
  5   8/32      8/16      8/16      8/16      8/8       8/8
  6   8/32      8/24      8/16      8/8       8/8       8/8
  7   8/32      8/32      8/8       8/8       8/8       8/8

Table 3.10: Comparison of different search results.

           Lagrangian multiplier method    Fast search algorithm
  frame 1  cost = 18.699,  rate = 1518     cost = 6.27394, rate = 1478
  frame 2  cost = 40.6726, rate = 1741     cost = 7.03531, rate = 1741
  frame 3  cost = 31.4817, rate = 1652     cost = 6.7466,  rate = 1927

3.2.3 Video Source Model

In the previous section, we developed a channel rate allocation scheme for the case when the source bit rate and the channel condition are fixed. However, the channel condition varies dynamically in wireless video communications. In such cases, it is advantageous to jointly allocate the source and the channel rates according to the fluctuating channel condition.

A typical video encoder performs motion estimation, DCT, quantization, and variable length coding (VLC).
These components affect one another. For example, if the motion estimation module finds a well-matched block in the previous frame, or if the quantization is performed with a large step size, the VLC requires only a small number of bits to encode the residual DCT coefficients. Also, the distribution of the residual coefficients depends highly on the characteristics of the input image sequence. It is hence not easy to develop a statistical model that precisely predicts bit rates and distortions. However, in this chapter, we attempt to develop a simple model that estimates the bit rate and the distortion of a video packet in terms of the quantization parameter Q.

Bit Rate Model

The bit rate can be approximated as the entropy of the quantized coefficients [51]. However, the empirical rate is usually lower than the 1st-order entropy, which is computed by assuming that the coefficients are independent of one another. This is because the coefficients are encoded by run-length coding, which exploits runs of consecutive zero coefficients. The discrepancy between the entropy and the empirical rate becomes larger as the image sequence is encoded at a lower bit rate with a larger Q. Since we focus on low bit rate coding for wireless applications, the entropy method is not suitable in our approach. Recently, another method called ρ-domain R-D analysis was proposed in [32], where the relationship between the bit rate and the percentage ρ of zeros among the quantized DCT coefficients was analyzed. However, to achieve an accurate rate estimation, the method proposed in [32] requires a computationally expensive process to obtain about 10 model parameters.

Table 3.11: Parameters for the rate-quantization model R_enh = A/Q² + B.

              A        B
  ‘Foreman’   28778    5
  ‘Claire’    5226.4   -0.9
  ‘Salesman’  9713.5   -3.3
To maintain a low computational complexity, we attempt to find a bit rate function directly in terms of Q rather than through several intermediate parameters. Figure 3.12 shows the average bit rate for the three sequences ‘Foreman,’ ‘Claire,’ and ‘Salesman.’ Let us analyze the bit rates for base packets and enhancement packets separately.

Enhancement packets contain residual DCT coefficients, and their bit rate decreases hyperbolically as Q increases. In Figure 3.12, we also plot the polynomial function

R_enh = A/Q² + B,

which approximates the enhancement bit rate. Parameters A and B are obtained by minimizing the mean square approximation error. It is observed that the polynomial functions approximate the empirical bit rates very well. Table 3.11 summarizes the parameters A and B for the three sequences. Note that B is negligible as compared to A. Thus, we can approximate the bit rate further with

R_enh = A/Q².    (3.6)

[Figure 3.12: The rate-quantization curves for (a) ‘Foreman,’ (b) ‘Claire’ and (c) ‘Salesman’ sequences, comparing the simulated and estimated bit rates of the base and enhancement packets.]

Parameter A depends on the source characteristics. The ‘Foreman’ sequence contains the fastest motion; thus its DCT coefficients are more widely distributed and require a higher bit rate than those of the other two sequences.
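With the fitted parameters of Table 3.11, the model can be evaluated directly. The sketch below also illustrates how little the negligible offset B changes the estimate (function names are illustrative):

```python
# Rate-quantization model R_enh(Q) = A / Q^2 + B, with the fitted
# (A, B) pairs of Table 3.11, in bits per packet.
PARAMS = {'Foreman': (28778.0, 5.0),
          'Claire': (5226.4, -0.9),
          'Salesman': (9713.5, -3.3)}

def r_enh(seq, q, drop_b=False):
    """Predicted enhancement-layer bit rate at quantizer q.
    drop_b=True uses the simplified model A / Q^2 without the offset."""
    A, B = PARAMS[seq]
    return A / q**2 if drop_b else A / q**2 + B
```

Evaluating at a moderate quantizer such as Q = 10 shows the ordering fast motion > moderate > slow discussed in the text, and that dropping B perturbs the estimate by only a few percent.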
Thus, the ‘Foreman’ sequence has the largest A in Table 3.11. Parameter A is estimated in this chapter as follows. The input sequence is encoded with a sample quantization parameter Q_s, and we obtain the resulting bit rate R_enh,s for the enhancement layer. Then, A is estimated as A = R_enh,s × Q_s².

On the other hand, base packets contain MB headers and motion vectors, so their bit rate does not vary significantly as Q changes. It is observed that the average bit rate for base packets decreases linearly as Q increases, where the slope is empirically found to be around 1. Thus, the bit rate for base packets can be approximated by the function

R_base = R_base,s + (Q_s − Q),    (3.7)

where R_base,s is the corresponding base layer bit rate when Q_s is selected as the sample quantization parameter. Figure 3.13 compares the empirical bit rates and the estimated bit rates. It can be seen that the estimated bit rates for both base and enhancement packets are very close to the empirical ones.

Quantization Distortion Model

In a typical video coder, only the quantization of the DCT coefficients results in the source distortion D_source, whereas the other components process video signals losslessly. We adopt the mean square error as the measure of the quantization distortion, which is given by

D_source = ∫_{a_l}^{a_u} (f − f̂)² p(f) df,

where f is a DCT coefficient ranging from a_l to a_u, f̂ is the quantized output, and p(·) is the probability density function of f. The distribution of DCT coefficients
is often modelled as the Laplacian distribution [52], given by

p(x) = (μ/2) · e^{−μ|x|}.    (3.8)

[Figure 3.13: The bit rate estimation for (a) ‘Foreman,’ (b) ‘Claire’ and (c) ‘Salesman’ sequences.]

Then, we have the quantization distortion

D_source = Σ_{i=−∞}^{∞} ∫_{q(i−1/2)}^{q(i+1/2)} (x − qi)² p(x) dx,

where q denotes the quantization step size and i denotes the quantization index. Note that the quantization parameter Q is half the step size in H.263 (i.e., q = 2Q). After some derivation, we can approximate the quantization distortion as

D_source = 2/μ² − (4Q/μ) · e^{−μQ} / (1 − e^{−2μQ}).    (3.9)

Figure 3.14 plots the quantization distortion in terms of the quantization parameter Q. At the same Q, the ‘Foreman’ sequence has the highest distortion, since it contains fast movements.
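The closed form for the Laplacian quantization MSE can be checked numerically against the defining sum of integrals. The expression below, 2/μ² − (4Q/μ)·e^{−μQ}/(1 − e^{−2μQ}), is a standard result for a uniform quantizer with step q = 2Q and reconstruction levels at the bin centers; this sketch simply verifies it by direct integration (midpoint rule, truncated tails):

```python
import math

def d_source_closed(mu, Q):
    # Closed-form Laplacian quantization MSE, uniform quantizer with
    # step q = 2Q and reconstruction at the bin centers qi.
    return (2.0 / mu**2
            - (4.0 * Q / mu) * math.exp(-mu * Q) / (1.0 - math.exp(-2.0 * mu * Q)))

def d_source_numeric(mu, Q, n_bins=60, steps=400):
    # Midpoint-rule evaluation of sum_i int_{q(i-1/2)}^{q(i+1/2)} (x - qi)^2 p(x) dx
    # with p(x) = (mu/2) exp(-mu |x|), truncated to |i| <= n_bins.
    q = 2.0 * Q
    total = 0.0
    for i in range(-n_bins, n_bins + 1):
        a = q * (i - 0.5)
        h = q / steps
        for k in range(steps):
            x = a + (k + 0.5) * h
            total += (x - q * i) ** 2 * 0.5 * mu * math.exp(-mu * abs(x)) * h
    return total
```

The two evaluations agree to within the integration tolerance over a range of (μ, Q) pairs, which also makes the formula easy to sanity-check before using it inside a rate-distortion loop.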
The distortions of ‘Claire’ and ‘Salesman’ reach a saturation point when Q is greater than 10, i.e., when most DCT coefficients are quantized to zeros. The proposed model tends to underestimate D_source when a small value of Q is chosen. This is because there is a large discrepancy between the actual distribution and the Laplacian distribution at a small Q. Nevertheless, the proposed model provides a much more accurate estimation of D_source than the traditional model D_source = q²/12, which assumes a uniform distribution of the coefficients.

[Figure 3.14: The distortion-Q curves for (a) ‘Foreman,’ (b) ‘Claire’ and (c) ‘Salesman’ sequences.]

For each packet, after the DCT but before the quantization, the Laplacian parameter μ can be obtained from the variance σ² of the DCT coefficients by μ = √(2/σ²). Then, D_source can be estimated from the quantization parameter Q via Eq. (3.9). Figure 3.15 shows that the proposed model effectively estimates the quantization distortion for each packet, even though the distribution of DCT coefficients varies dynamically according to the motion and the texture of the video contents.

[Figure 3.15: The estimation of the quantization distortion D_source for each packet: (a) ‘Foreman,’ (b) ‘Claire’ and (c) ‘Salesman’ sequences.]

3.2.4 Channel Distortion

In the proposed adaptive source and channel rate allocation, a GOB is the basic adaptation unit. A GOB consists of a base packet and an enhancement packet. Similar to Eq. (3.4), the channel distortion due to possible losses of the base and the enhancement packets can be written as

D_channel = δ²_base · P_base + δ²_enh · P_enh.    (3.10)

The MSE values δ²_base and δ²_enh are computed by emulating the dropping and the concealment of the base and the enhancement packets, respectively. Also, the packet loss rates P_base and P_enh can be computed from the packet sizes via Eq. (3.5).

The quantization parameter Q also affects δ²_base and δ²_enh, and its influence on the channel distortion should be investigated. An enhancement packet consists only of residual DCT coefficients. When it is lost, all the coefficients are set to zeros. If we assume that there is no quantization distortion, δ²_enh is equal to the variance σ² of the residual coefficients. In general, it was found experimentally that δ²_enh decreases with the quantization distortion D_source and can be approximated as

δ²_enh ≈ σ² − D_source.    (3.11)

Also, the motion-compensated error concealment is applied to a missing base packet. By definition, δ²_base is the mean square difference between the concealed reconstruction and the error-free reconstruction.
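Given the emulated MSE values and the packet sizes, the per-GOB channel distortion can be evaluated directly. The sketch below assumes the additive form of Eq. (3.10), with the loss rates computed from Eq. (3.5); the numeric inputs are illustrative:

```python
def gob_channel_distortion(d2_base, d2_enh, n_base, n_enh, ber):
    """d2_base, d2_enh: emulated concealment MSEs of the two packets;
    n_base, n_enh: packet sizes in bits; ber: residual bit error rate
    seen after channel decoding."""
    p_base = 1.0 - (1.0 - ber) ** n_base   # Eq. (3.5)
    p_enh = 1.0 - (1.0 - ber) ** n_enh
    return d2_base * p_base + d2_enh * p_enh   # Eq. (3.10)

# A stronger channel (lower residual BER) should never increase the estimate:
low = gob_channel_distortion(100.0, 40.0, 200, 800, 1e-4)
high = gob_channel_distortion(100.0, 40.0, 200, 800, 1e-2)
```

Note that the base packet, though smaller, typically carries a larger δ² because its loss forces concealment of the whole GOB, which is what drives the unequal protection in the next subsection.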
The concealed reconstruction is based on the previously reconstructed frame and the neighboring motion vectors, so it is independent of the current Q. But the error-free reconstruction does depend on Q. We can calculate the mean square difference, δ²(Q_s, Q), between the reconstructed frames obtained with the quantization parameters Q_s and Q. Then, δ²_base is approximated by

δ²_base ≈ δ²_base,s − δ²(Q_s, Q),    (3.12)

where δ²_base,s is the mean square error due to the loss of the base packet when the sample quantization parameter Q_s is employed. The above formula indicates that δ²_base decreases as Q increases, which is consistent with empirical results. Figure 3.16 shows that the estimated distortions are close to the empirical data for the ‘Foreman,’ ‘Claire’ and ‘Salesman’ sequences.

3.2.5 Adaptation

The overall distortion D is the sum of the source and the channel distortions. From Eqs. (3.9)-(3.12), we have

D = D_source + D_channel
  = [δ²_base,s − δ²(Q_s, Q)] · P_base + σ² · P_enh + [2/μ² − (4Q/μ) · e^{−μQ}/(1 − e^{−2μQ})] · (1 − P_enh),    (3.13)

where σ² is the variance of the DCT coefficients and μ = √(2/σ²). Before the RCPC coding, the bit rates of the base and the enhancement packets are given by Eqs. (3.7) and (3.6), respectively. Then, the overall transmission rate can be written as

R = R_base/C_base + R_enh/C_enh = (R_base,s + Q_s − Q)/C_base + (A/Q²)/C_enh,    (3.14)

where C_base and C_enh are the RCPC channel code rates for the base and the enhancement packets.
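The adaptation of Section 3.2.5 reduces to a small grid search over (Q, C_base, C_enh). The sketch below assumes the model forms discussed in this section (R_enh = A/Q², the linear base-rate model, and the Laplacian D_source term); the loss model, parameter values and quantizer range are illustrative assumptions, not the thesis implementation.

```python
import math

RATES = [8/32, 8/24, 8/16, 8/8]   # RCPC channel code rate choices
Q_CHOICES = range(2, 31)           # assumed H.263 quantizer range

def overall_distortion(Q, p_base, p_enh, d2_base_s, d2_qs_q, sigma2, mu):
    # Source term: Laplacian quantization MSE at step 2Q.
    d_src = 2.0/mu**2 - (4.0*Q/mu) * math.exp(-mu*Q) / (1.0 - math.exp(-2.0*mu*Q))
    d2_base = max(d2_base_s - d2_qs_q(Q), 0.0)   # guard the model term
    return d2_base * p_base + sigma2 * p_enh + d_src * (1.0 - p_enh)

def adapt(r_base_s, r_enh_s, q_s, sigma2, d2_base_s, d2_qs_q, loss, budget):
    """Grid search for (Q, C_base, C_enh) minimizing the distortion model
    subject to the rate budget; loss(bits, rate) -> packet loss probability,
    d2_qs_q(Q) -> mean square difference between the Q_s and Q encodes."""
    A = r_enh_s * q_s**2
    mu = math.sqrt(2.0 / sigma2)
    best, best_d = None, float("inf")
    for Q in Q_CHOICES:
        r_base = r_base_s + (q_s - Q)      # base-layer bits
        r_enh = A / Q**2                    # enhancement-layer bits
        for cb in RATES:
            for ce in RATES:
                if r_base / cb + r_enh / ce > budget:
                    continue                # rate constraint violated
                d = overall_distortion(Q, loss(r_base, cb), loss(r_enh, ce),
                                       d2_base_s, d2_qs_q, sigma2, mu)
                if d < best_d:
                    best, best_d = (Q, cb, ce), d
    return best, best_d
```

With four rates for each layer and a few dozen quantizer values, the search evaluates only a few hundred combinations per GOB, which matches the low-complexity goal of the procedure.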
m 250 V o g 200 © £ 3 150 C T ( A © 100 2 50 em perical M SE of en h a n c e d p acket e stim ated M SE of e n h a n c e d p acket em perical M SE of b a s e p acket e stim ated M SE of b a s e p acket i » A " < - ' O ' / V * ' 10 20 30 40 50 60 70 80 90 P ack e t num ber C laire.qctf em perical M SE of e n h a n c e d p ac k et e stim a te d M SE of e n h a n c e d p a c k e t e m p e rical M SE of b a s e p a c k e t e stim a te d M SE of b a s e p a c k e t ) t i V \ V V '- 'W ' V ; ' 1* 4 0 50 P a c k e t n u m b er ■ 8 S 50 f c b 4) © 30 10 S a le sm a n .q c if e m p erical MSE of e n h a n c e d p a c k e t e stim a te d M SE of e n h a n c e d p a c k e t e m p erical MSE of b a s e p ac k et e stim a te d M SE of b a s e p ac k et W k 40 50 60 P a c k e t n u m b er I * ti I ... . ( , (b ) (c) Figure 3.16: The estimated distortion due to each packet loss: (a) ‘Foreman,’ (b) ‘Claire’ and (c) ‘Salesman’ sequences. 71 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Reject the choice Violate constraint Encode a frame (Qs) A = RCSQS2 Obtain Re, Rb & D S (Q ) Assign Cratee n h and Crateb a se Check rate constraint If any other Otherwise choice Obtain DC (Q) If any other Q No other Q nor channel rate choice Dc(min)=Dc(Q) Yes If Dc(Q) < Dc(min) No Figure 3.17: Flow chart of adaptation chart where C 'b a se and C 'c n h are the RCPC channel code rates for the base and the enhancement packets. rates for each pair of base and enhancement packets, which minimize the overall distortion D in Eq. (3.13) subject to the constraint that the overall rate R in Eq. (3.14) is lower than a certain bit rate. The optimization is performed as shown in Figure 3.17. The detailed procedure is described as below: 1. Encode the current frame by a sample quantization parameter Qs. Count the bit rates generated from the base packets (7?base,s) and the enhancement packets (jRenh,s)- Determine coefficient A via A = Renh,s x Ql- 2. 
For each enhancement packet, calculate the variance a2 of the residual DCT Our goal is to optimize the quantization parameter and the channel code coefficients and the corresponding Laplacian parameter fj, = 72 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 3. For each combination of (Q, Chase, Cenh), compute the total bit rate via Eq. (3.14). If it exceeds the given bit budget, reject the choice. If it is within the bit budget, compute the overall distortion via Eq. (3.13). 4. Repeat Step 3 for all valid combinations and find out the best combination that minimizes D. 5. Quantize the DCT coefficients by Q* and apply RCPC with rates C£ase and C*nh to the base and the enhancement packets, respectively. 6 . Repeat Steps 2-5 to process all the packets in the frame. 3.2.6 Sim ulation R esults Channel R ate A llocation for Pre-C om pressed V ideo B itstream s We investigate the performance of the channel rate allocation system in Section 3 in high bit error rate environment. A binary symmetric channel with bit error rate (BER) 0.01 or 0.05 is simulated. For channel coding, RCPC with four rates, which are 8/8,8/16,8/24 and 8/32, is employed. Also, the Viterbi decoder is assumed to be capable of tracing back at most 80 symbols. We use three test sequences. They are ‘Claire’ lst-50th frames, ‘Foreman’ lst-50th frames and ‘Foreman’ 100th-150th frames, which have slow, moderate and fast motion characteristics, respectively. Figures 3.18, 3.19 and 3.20 show the PSNR performances of the proposed algorithm (UEP) and the equal error 73 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. < K Z w a 5“ EEP w/o ER • t - UEP w/o ER • • • EEP w/ ER ■ UEP w/ ER ''V 10 IS 20 25 30 35 40 45 50 (a) (b) Figure 3.18: PSNR comparison for ‘Claire’ lst-50th frames, which have slow mo tion characteristics: (a) BER = 0.01 and (b) BER = 0.05. 
protection (EEP) scheme, which uses the same channel code rate (= 8/16) for all base and enhancement packets. For error recovery, ‘w/ ER’ indicates that the error concealment method in Section 2.2 is used, while ‘w/o ER’ denotes the direct copying algorithm. Since the locations of errors affect the quality of the reconstructed video significantly, each curve is obtained by averaging PSNRs over 100 different error patterns.

[Figure 3.19: PSNR comparison for ‘Foreman’ 100th-150th frames, which have fast motion characteristics: (a) BER = 0.01 and (b) BER = 0.05.]

[Figure 3.20: PSNR comparison for ‘Foreman’ 1st-50th frames, which have moderate motion activities: (a) BER = 0.01 and (b) BER = 0.05.]

It is clear that the method ‘UEP w/ ER’ provides a significant performance improvement as compared to the other three methods. For the slow motion sequence, the gap between UEP and EEP is larger than that between ‘w/ ER’ and ‘w/o ER’, as shown in Figure 3.18. This indicates that UEP is more powerful than the motion-compensated error concealment, since the simple copying algorithm is sufficient to conceal the loss of slowly moving objects. However, even in the slow motion sequence, the importance of each
On the contrary, for the fast motion sequence, the error concealment is more powerful than UEP when BER = 0.01, as shown in Figure 3.19(a). The simple copying algorithm causes severe discontinuities and artifacts if the sequence contains fast motion. Thus, the proposed motion-compensated concealment provides a much better performance than the simple copying algorithm. However, as BER increases, there are more packet losses and the error concealment performance becomes poorer. Thus, when BER = 0.05, UEP plays a more important role than the error concealment. The best performance improvement is obtained in the moderate motion se quence as shown in Figure 3.20. Both the error concealment and UEP provide a significant PSNR improvement. As shown in Figure 3.20(b), UEP gives about 3 dB improvement, and the error concealment also gives about 3 dB improvement on the average. However, the total improvement is not additive. It is about 4 dB. That is because the error concealment tends to reduce the cost of the base packet loss. In other words, a good error concealment decreases the cost gap between the base packet and the enhancement packet, which in turn decreases the gain of UEP over EEP. Figures 3.21 compare several frames of the moderate motion sequence, which are reconstructed by the ‘UEP w / ER’ and ‘EEP w/o ER’ methods. It can be observed that the ‘UEP w / ER’ method provides much better image quality than ‘EEP w/o ER’. 76 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. (b) Figure 3.21: Several frames of ‘Foreman’ sequence, which are reconstructed with (a) equal error protection and direct copying (‘EEP w/o ER’) and (b) unequal error protection and error concealment (‘UEP w / ER’). A daptive Source and Channel R ate A llocation for R eal T im e V ideo Com m unications Next, we investigate the performance of the proposed adaptive video transmission system in Section 4. 
In the following experiments, a binary symmetric channel with an average BER of 0.004 or 0.02 is simulated. The instantaneous BER varies from 10^{-1} to 10^{-11} for the average BER of 0.004, and from 10^{-1} to 10^{-3} for the average BER of 0.02. RCPC with the four channel code rates 8/8, 8/16, 8/24 and 8/32 is employed, and the Viterbi decoder traces back up to 80 symbols.

Figures 3.22, 3.23 and 3.24 show the performances of the proposed adaptive system on the ‘Claire,’ ‘Salesman,’ and ‘Foreman’ sequences, which are examples of slow, moderate, and fast motion pictures, respectively. For comparison, we also show the performance of the non-adaptive system under the same channel condition, where Q and the channel code rate are fixed to 20 and 8/16, respectively. Due to the bit rate control and the proper channel code rate assignment, the proposed algorithm reduces the packet loss rate significantly. It can be seen that the adaptive system provides at least 2 dB and up to 6 dB improvement as compared to the non-adaptive system.

For the slow motion sequence ‘Claire,’ the motion-compensated error concealment algorithm conceals corrupted regions faithfully. However, the non-adaptive system does not fully utilize the bandwidth, and its PSNR performance is limited by the quantization error. In contrast, the proposed algorithm assigns a smaller Q and improves the quality of the received video. For the moderate motion sequence ‘Salesman,’ the PSNR value decreases sharply around the 10th frame, where the sequence contains fast motion. Nevertheless, the proposed adaptive system still performs better than the non-adaptive system. For the fast motion sequence ‘Foreman,’ the average assigned Q value is around 20. Thus, the average source bit rate in the adaptive system is close to that in the non-adaptive system.
However, the adaptive system allocates different channel rates to the base packets and the enhancement packets to minimize the overall distortion. Due to the fast motion activities of ‘Foreman’ sequence, the base packets are more important, thus being protected with stronger channel codes. Therefore, the adaptive sys tem provides much better PSNR performance than the non-adaptive system as 78 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 1 no adaptation I — adaptation 1 no adaptation 1 | — adaptation | 1 37 36 i . < ; 35 - 1 34 ce z ''\ V ■ . / V v " ' - ' ” 0. 33 V. V 32 31 4, N , i / L i 'x 7 i ' \ / \ ^ 30 \ / X. x - - - . / x , , , 20 30 40 frame number (a) (b) -l Figure 3.22: The PSNR comparison for ‘Claire’ sequence at (a) BER = 10 10“ n and (b) BER = K T 1 ~ 1CT3. shown in Figure 3.24(a). As shown in Figure 3.24(b), when the channel condition becomes worse, the adaptive system has to lower Q and the channel code rate to satisfy the overall bit rate constraint, and the PSNR improvement becomes smaller. Figure 3.25 shows examples of reconstructed frames of ‘Foreman’ sequence when BER varies from 10_1 to 10-11. In the non-adaptive system, the recon structed frames contain severe distortions and blurring. On the contrary, the pro posed algorithm reconstructs the frames with an acceptable image quality. These simulation results indicate that the proposed algorithm is an effective method for robust video transmission. 79 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. adaptation 32 ' ^ 31 \ 30 V \ I \ 29 i \ \ I £ i \ * 2 8 \ 0 . \ 27 \ 26 ’ 25 24 no adaptation adaptation | (a) (b) Figure 3.23: The PSNR comparison for ‘Salesman’ sequence at (a) BER = 10 -l 10" 11 and (b) BER - 10" 1 io - 34 adaptation 32 30 28 £ 8 0. 
Figure 3.24: The PSNR comparison for 'Foreman' sequence at (a) BER = 10^-1 ~ 10^-11 and (b) BER = 10^-1 ~ 10^-3.

Figure 3.25: Several frames of 'Foreman' sequence, which are reconstructed by (a) the non-adaptive system and (b) the proposed adaptive system.

3.3 Conclusions

In this chapter, we first proposed a scheme to deal with the trade-off between channel coding rates and the number of assigned multi-codes to achieve efficient transmission. Then, we defined two cost functions in terms of BER, and performed a fast search to determine the best combination of channel coding rates and the corresponding number of multi-codes. It is worth noting that the code assignment scheme can be applied to a dynamic system with little modification of the initialization. For example, when a call is initiated or terminated, the current channel coding rates can be used to provide an initial solution for the new cost function, reducing the search complexity significantly.

Secondly, we developed a video coder based on layered coding and interleaved packetization. In addition, we proposed joint source/channel coding schemes for two typical video communication scenarios. The first was designed for pre-compressed video bitstreams, such that the expected mean square error is minimized subject to a constraint on the overall bit rate. The second was designed for real-time video transmission over wireless channels, where the source and channel rates of each packet are jointly optimized to maximize the quality of the reconstructed video. Simulation results demonstrated that the proposed algorithms for both scenarios provide acceptable image quality even in high bit error rate environments.
Chapter 4

Spatial and Temporal Error Concealment Techniques

4.1 Introduction

Error concealment techniques at the decoder attempt to conceal erroneous blocks using the correctly decoded information without modifying source and channel coding schemes [53,54]. They are hence suitable for a wide range of applications. Depending on the available information, different error concealment methods can be developed to exploit the information effectively. Typical video codecs, such as MPEG-4, H.263 and H.264, classify video frames into three types: the intra (I), the predictive (P) and the bidirectional (B) frames. Erroneous B-frames can simply be dropped, since they are not referenced by subsequent frames. In contrast, erroneous I- or P-frames may result in error propagation to subsequent frames and have to be concealed in some way.

In this chapter, we propose novel spatial and temporal error concealment algorithms for I- and P-frames. The algorithm for I-frame concealment can restore edge components as well as low frequency information by employing edge detection and directional interpolation. The algorithm for P-frame concealment adaptively fills in erroneous blocks with the information in previous frames based on a dynamic error tracking model. It is demonstrated by simulation results that the proposed algorithms can suppress error propagation as well as conceal erroneous blocks effectively.

The rest of this chapter is organized as follows. Previous work on error concealment is reviewed in Section 4.2. An error concealment algorithm for the I-frame is presented in Section 4.3, while another error concealment algorithm for the P-frame is discussed in Sections 4.4 and 4.5.
A few implementation issues are examined in Section 4.6, and experimental results are presented in Section 4.7. Finally, concluding remarks are given in Section 4.8.

4.2 Previous Work on Error Concealment

4.2.1 I-Frame Concealment

In many low bitrate applications, the I-frame mode is used only for the frames at the beginning of a sequence or at a scene cut, for which no temporal information can be exploited to reduce the bitrate. Various algorithms have been proposed for the concealment of errors in I-frames based on the spatial information. A typical method is to interpolate each pixel p in a lost macroblock (MB) from intact pixels in adjacent MBs [55,56]. Let p_i (i = 1, 2, 3, 4) denote the closest pixel to p in the upper, lower, left, and right MBs, respectively. Then, the reconstruction value p̂ of p is given by

    p̂ = [ Σ_{i=1}^{4} (W − d_i) p_i ] / [ Σ_{i=1}^{4} (W − d_i) ],    (4.1)

where W is the horizontal or vertical size of an MB, and d_i is the distance between p_i and p. This linear interpolation scheme is a simple yet effective method for smooth images. Note that the weighting coefficient (W − d_i) is selected to be inversely proportional to the distance d_i. In [57], a more advanced technique was proposed to perform the interpolation adaptively to achieve the maximum smoothness. Generally speaking, these methods attempt to reconstruct a lost MB as a smooth interpolated surface from its neighbors. However, blur occurs if the lost MB contains high frequency components such as object edges.

The fuzzy logic reasoning approach [58,59] uses a vague similarity relationship between a lost MB and its neighbors to recover high as well as low frequency information. It first recovers the low frequency information with surface fitting.
Then, it uses fuzzy logic reasoning to coarsely interpret high frequency information such as complicated textures and edges. Finally, a sliding window iteration is performed to integrate the results of the previous two steps to get the optimal output in terms of surface continuity and a set of inference rules. In [60], another iterative error concealment algorithm was proposed. It uses a block classifier to determine edge directions based on the gradient data. Then, instead of imposing a smoothness constraint only, an iterative procedure called "projections onto convex sets (POCS)" is adopted to restore lost MBs with an additional directional constraint. This approach provides satisfactory results when the missing MB is characterized by a single dominant edge direction. In [61], the coarse-to-fine block replenishment (CFBR) algorithm was proposed, which first recovers a smooth large-scale pattern, then a large-scale structure, and finally local edges in a lost MB. The fuzzy logic, POCS and CFBR approaches are, however, computationally expensive for real-time applications because of the use of iterative procedures.

In [62], a computationally efficient algorithm was proposed based on directional interpolation. First, it infers the geometric structure of a lost MB from the surrounding intact pixels. Specifically, the surrounding pixels are converted into a binary pattern and one or more edges are retrieved by connecting transition points within the binary pattern. Then, the lost MB is directionally interpolated along edge directions so that it is smoothly connected to its neighbors with consistent edges. However, the transition points are selected heuristically and connected using only the angle information. Thus, the retrieved edges may not be faithful to the original ones.
In Section 4.3, we will propose a low complexity algorithm for I-frame concealment, which employs a more robust edge detection scheme.

4.2.2 P-Frame Concealment

For the error concealment of P-frames, temporal as well as spatial information is available. In fact, temporal correlation is much higher than spatial correlation in real world image sequences, so P-frames can be concealed more effectively than I-frames. In P-frames, the compressed data for an MB consist of one or more motion vectors and residual DCT coefficients. If only the DCT coefficients are lost, a motion-compensated MB still provides acceptable visual quality. However, if both the motion vector and the DCT coefficients are lost, the motion vector is recovered using the information in adjacent MBs, and the lost MB is motion-compensated using the recovered motion vector. There are several approaches to recover lost motion vectors:

1. Set the lost motion vector to zero. This approach replaces a lost MB by the MB at the same spatial location in the previous frame.

2. Use the motion vector of one of the spatially or temporally adjacent MBs.

3. Use the average or median of the motion vectors of adjacent MBs.

4. Choose the motion vector based on the side matching criterion [63,64]. Among the set of candidate motion vectors, this approach selects the vector minimizing the side matching distortion so that the concealed MB is smoothly connected to the surrounding pixels.

5. Estimate the motion vector with block matching [65-67]. This approach estimates the motion vector for the set of the surrounding pixels, and applies that vector to the lost MB.

It was shown that the error concealment performance can be improved by employing advanced motion compensation techniques, such as the overlapped block motion compensation [64] and the affine motion compensation [68], after motion vector recovery.
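As one concrete illustration of approach 4, a side-matching selector might be sketched as follows. This is a minimal sketch under our own conventions, not the exact method of [63,64]: `prev` is the previous frame as a 2D array, the lost block's intact top row and left column in the current frame serve as the "sides," and the sum of absolute differences is used as the distortion.

```python
import numpy as np

def side_match_mv(prev, top_row, left_col, x, y, candidates, B=16):
    """Select, among candidate motion vectors, the one whose
    motion-compensated block in the previous frame connects most
    smoothly to the intact boundary pixels (side matching criterion).
    (x, y) is the top-left corner of the lost B x B block."""
    best_mv, best_cost = (0, 0), float("inf")
    for dx, dy in candidates:
        blk = prev[y + dy : y + dy + B, x + dx : x + dx + B].astype(float)
        # Distortion along the top and left sides of the candidate block.
        cost = (np.abs(blk[0, :] - top_row).sum()
                + np.abs(blk[:, 0] - left_col).sum())
        if cost < best_cost:
            best_mv, best_cost = (dx, dy), cost
    return best_mv
```

Candidate vectors would typically be drawn from approaches 1-3 (the zero vector and the neighbors' vectors), so the side match acts as a tie-breaker among cheap guesses.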
Another method for P-frame concealment [69] interpolates damaged regions adaptively to achieve the maximum smoothness in the spatial, temporal and frequency domains. Statistical methods [70-72] model image pixels or motion fields as Markov random fields, and then estimate the lost content using maximum a posteriori (MAP) estimators. Alternatively, a model-based method [73] builds a model for the region of interest (e.g., the face) during the decoding of image sequences and recovers the corrupted data by projecting it onto the model.

All the above methods focus on the concealment of erroneous blocks only. Furthermore, the concealment effect is not complete, and concealment errors tend to propagate to subsequent frames because of motion compensated prediction. In Sections 4.4 and 4.5, we will propose a novel P-frame error concealment method, which attempts not only to conceal erroneous blocks but also to suppress the error propagation phenomenon.

4.3 Directional Interpolation for I-frame Concealment

In this section, we propose a low complexity algorithm for I-frame concealment, which can restore edge components as well as low frequency information. The proposed algorithm first detects edge components in neighboring boundary pixels, and connects broken edges in the lost MB via linear approximation. Then, the lost MB is partitioned into segments based on the recovered edge information. Finally, each pixel in a segment is directionally interpolated from the boundary pixels that are adjacent to the segment.

4.3.1 Edge Recovery

Edges, which mean sharp changes or discontinuities in luminance values, play an important role in human perception of images. Generally, an image with blurred
In this chapter, edges in missing MBs are recovered by the scheme illustrated in Figure 4.1. Suppose that a missing MB is surrounded by four correctly decoded MBs. First, edges are detected by calculating the gradient field on the boundary pixels in neighboring MBs. The gradient at pixel (x, y), denoted by (G>(x, y), Gc(x, y)), can be computed by the convolution of the image P(x, y) with row and column impulse arrays as Gr(x,y) = P (x,y)*H r(x,y), (4.2) Gc(x,y) = P(x,y)* Hc(x,y). (4.3) The following Sobel operator is adopted in this chapter: 1 0 - 1 r— l 1 1 - 2 1 ---------- t -H 1 H l-^H I I 2 0 - 2 £ I I i —1 0 0 0 1 0 1 h — 1 1 1 2 1 Note that if the Sobel operators are directly applied to boundary pixels, the gradient calculation involves corrupted pixel values, which leads to inaccurate edge detection. Instead, we apply the Sobel operators to the second boundary lines from the to p , b o tto m , left an d rig h t of th e co rru p te d MB. The am p litu d e 90 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. (a) Edge detection on boundary pixels (c) Obtain representative edge points (b) Detected edge points (d) Edge matching and linking Figure 4.1: The edge recovery process, and angle of the gradient are then defined as A(x,y) = y/G^{x,y) + G2 c(x,y), (4.5) 0(x,y) = arctan ^ . (4.6) Gc{z, y) If the amplitude A(x, y) is larger than a pre-specified threshold, pixel (x, y) is said to lie on an edge. The threshold is set to the variance of pixel values here. Several consecutive pixels are often detected as edge points as shown in Figure 4.1(b). Among them, only one pixel with the largest gradient amplitude is selected as the true edge point as shown in Figure 4.1(c). It is assumed that there are two cases when an edge enters a lost MB through an edge point. The first case is that the edge exits the MB via another edge point. 91 Reproduced with permission of the copyright owner. 
The second case is that the edge meets another edge within the MB and, as a result, does not exit the MB. Based on this assumption, we should compare the edge points to find the matched pairs. The attribute vector of an edge point at (x, y) is defined as

    a(x, y) = (G_r(x, y), G_c(x, y), θ(x, y), P(x, y)).    (4.7)

It is assumed that each element in a(x, y) gives a similar contribution for an edge point. So, by setting the normalization factors to be 1, a simple attribute distance between two edge points can be calculated via

    d(a(x_1, y_1), a(x_2, y_2)) = |G_r(x_1, y_1) − G_r(x_2, y_2)| + |G_c(x_1, y_1) − G_c(x_2, y_2)|
                                  + |θ(x_1, y_1) − θ_{1,2}| + |θ(x_2, y_2) − θ_{1,2}|
                                  + |P(x_1, y_1) − P(x_2, y_2)|,    (4.8)

where θ_{1,2} is the slant angle of the line connecting (x_1, y_1) and (x_2, y_2). A pair of edge points is deemed to be a match if their attribute distance is the smallest among all pairs. Thus, we label them and treat the remaining edge points as a new group. The same matching process is performed iteratively until all points are matched or the attribute distance between two edge points is still above a certain threshold. Finally, each matched pair is linked together to recover a broken edge. After edge linking of all pairs, if there is still some unmatched edge point, it is extended into the lost MB along its gradient until it reaches an edge line.
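The edge detection and matching steps of Eqs. (4.2)-(4.8) can be sketched as follows. This is a minimal illustration with our own helper names; the gradient is evaluated by cross-correlation over a 3x3 window (the amplitude is unaffected by the convolution/correlation sign convention), and the angle follows Eq. (4.6).

```python
import math
import numpy as np

# Sobel kernels of Eq. (4.4) for the row and column gradients.
H_R = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], float)
H_C = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], float)

def gradient(P, x, y):
    """Gradient (G_r, G_c), amplitude and angle at (x, y), Eqs. (4.2)-(4.6)."""
    w = P[y - 1 : y + 2, x - 1 : x + 2]
    gr, gc = float((w * H_R).sum()), float((w * H_C).sum())
    return gr, gc, math.hypot(gr, gc), math.atan2(gr, gc)

def attr_distance(e1, e2):
    """Attribute distance of Eq. (4.8) between two edge points, each
    given as (x, y, G_r, G_c, theta, P); normalization factors are 1."""
    (x1, y1, gr1, gc1, t1, p1), (x2, y2, gr2, gc2, t2, p2) = e1, e2
    t12 = math.atan2(y2 - y1, x2 - x1)  # slant angle of the joining line
    return (abs(gr1 - gr2) + abs(gc1 - gc2)
            + abs(t1 - t12) + abs(t2 - t12) + abs(p1 - p2))
```

In a full implementation, `attr_distance` would be evaluated over all candidate pairs, the smallest distance taken as a match, and the process repeated on the remaining points, as described above.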
As shown in Figure 4.2, pixel p in the missing MB is interpolated using only boundary pixels in the same region to smoothly recover the lost information in that region. Let us assume that there are n edges in a missing MB. Each edge can be rep resen ted by a line eq u atio n y - Pi-rrii(x - xi) = 0, 1 < i < n, (4.9) 93 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. where m* is the edge slope and (xj,yj) is the coordinate of an edge point of the *th edge. If this edge is recovered by a matching pair of edge points (xt, yt) and (Xj,yj), rrii — - J '~ .. Otherwise, m* = That is, it is determined by the gradient of the unmatched edge point. For each lost pixel p, we find its reference pixels to be used in the interpolation process. Along each edge direction, the reference pixels in neighboring MBs are obtained as shown in Figure 4.2(b). Note that only those reference pixels within the same region as p are reliable due to discontinuities caused by edges. Thus, sign tests are performed for the line equation of each edge to eliminate unreliable reference pixels. Specifically, let (px,Py) denote the coordinate of the lost pixel p, and (rx,ry) the coordinate of a reference pixel. The reference pixel is within the same region as p, if and only if \py — y % — rn*(px — x*)] and [ry — yt — rrii(rx — x,)} have the same sign for each i. After eliminating unreliable reference pixels, the missing pixel p can be direc- tionally interpolated via P i e P = = ^ ± , (4-10) 2-/k dk where pk is the /cth reliable reference pixel, and dk is the distance between pk and p. Figure 4.2(c) shows an example when two reference pixels are available. If a lost pixel is enclosed by edges, then no reference pixel is available. In such a case, p is interpolated from the nearest pixels along the those edges. 94 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
4.4 MMSE Decoding for P-frame Concealment

In this section, we propose a novel error concealment algorithm based on the minimum mean square error (MMSE) criterion by improving the original scheme presented in [74]. This algorithm attempts to conceal erroneous blocks as well as to suppress the error propagation effect. To be more specific, the decoder adopts an error propagation model to estimate and track the mean square error (MSE) of each reconstructed pixel value. Several modes are developed to conceal erroneous MBs, where each mode has its strengths and weaknesses. The decoder combines these modes adaptively to minimize the MSE of each concealed pixel based on the error propagation model.

4.4.1 Error Tracking Model

The error tracking model and the general MMSE decoding procedure are reviewed in this section. For more details, readers are referred to [74]. Two specific concealment modes for P-frames will be described in the next section.

In packet video transmission, erroneous packets are detected and discarded by the channel receiver, and only correctly received packets are passed to the video decoder. Consequently, the decoder knows the error locations but has no information about the error magnitudes. Let us define a pixel error as the difference between its decoded value and its error-free reconstruction. It is natural to treat each pixel error as a zero mean random variable with a certain variance. Here, we would like to estimate and track the variance of each pixel error. To achieve this goal, we maintain an extra frame buffer called the error variance map. Each element in the error variance map records the error variance σ_p^2 of the corresponding pixel p in the reconstructed video frame P_k.

Suppose that the decoder reconstructs pixel p in P_k by motion-compensating it from a pixel in the previous frame P_{k−1}. Then, the pixel error of p is affected only by the propagation error. On the other hand, suppose that the value of p is lost, so that p is replaced by a pixel q in P_{k−1} using a temporal concealment method. Then, the pixel error of p is given by the sum of the concealment error and the propagation error [74]. The concealment error is caused by the loss of the motion vector and the DCT-encoded residual, and it is defined as the pixel error when the referenced pixel q is not corrupted. The propagation error is caused by the corruption of the referenced pixel q. It is assumed that the concealment error and the propagation error are independent of each other. Thus, we have

    σ_p^2 = σ_conc^2 + σ_prop^2,    (4.11)

where σ_conc^2 and σ_prop^2 denote the variances of the concealment error and the propagation error, respectively. Note that σ_conc^2 = 0 when the data for p are not lost, and σ_prop^2 = 0 when the referenced pixel q is not corrupted. The σ_conc^2 can be obtained from training sequences using various error patterns.
Each ele ment in the error variance map records the error variance c rp of the corresponding pixel p in the reconstructed video frame Pk- Suppose that the decoder reconstructs pixel p in Pk by motion-compensating it from a pixel in the previous frame Pk-i- Then, the pixel error of p is affected only by the propagation error. On the other hand, suppose that the value of p is lost so that p is replaced by a pixel q in Pk-i using a temporal concealment method. Then, the pixel error of p is given by the sum of the concealment error and the propagation error [74], The concealment error is caused by the loss of the motion vector and the DCT-encoded residual, and it is defined as the pixel error when the referenced pixel q is not corrupted. The propagation error is caused by the corruption of the referenced pixel q. It is assumed that the concealment error and the propagation error are independent of each other. Thus, we have < *l = (^conc + ^ p r o p , C4 ' 1 1 ) where a ( ? onc and c T p rop denote the variances of the concealment error and the prop agation error, respectively. Note that o^onc = 0 when the data for p are not lost, and c T p lop = 0 when the referenced pixel q is not corrupted. The ofonc can be obtained from training sequences using various error patterns. 96 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. # Integer-pixel position + or x Half-pixel position + x + + <7+= ( # i+<72 )/2 q* = (^ ,+ ^ 2 + ^ 3 + ^ 4 )/4 Figure 4.3: Interpolation schemes for half-pixel motion compensation. The propagation error variance (jprop in (4.11) is calculated based on the accu racy of the motion vector of p. Figure 4.3 illustrates the interpolation scheme for the half-pixel motion compensation in H.263 or MPEG-4, where ordinary pixels or ‘x .’ Let us consider three cases according to the accuracy of motion vector v = (vx,vy) as discussed below. • Both vx and vy are of integer-pixel accuracy. 
The current pixel p is predicted from an ordinary pixel q,, specified by motion vector v. Then, the error in q. propagates to p without attenuation, and the propagation error variance is given by • vx is of half-pixel accuracy while vy is of integer-pixel accuracy (and vice versa). are depicted by black circles and virtual interpolated pixels are depicted by ‘+ ’ (4.12) 97 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. The motion vector v specifies a virtual pixel. For instance, suppose that the current pixel p is predicted from the virtual pixel q+ = (q\ + q2) / 2 in Figure 4.3. Let e\ and e2 denote errors in q\ and q2, respectively. Then, p is corrupted by (ei + e2) / 2 . Consequently, we have 2 2 2 _ 2 _ 7 ° ’ qi a Q2 ( A ^prop 2 ’ (4-13) where ( + = h i + 2 4 ^ 4 ) 2 + ^ 2 is called a leaky factor with its value in [ 0 , 1]. • Both vx and vy are of half pixel accuracy. The current pixel p is predicted from virtual pixel qx = (qi + q2 + q$ + q±)/4 as shown in Figure 4.3. Let e * denotes the error of qi for 2 = 1,2, 3,4. Then, we have ^ = a l = l x - < + < + < + < , (4 ,4 ) where / / i I 0 S i = i X q = i+ i ^ { e*ei K '* “ 4(1+2 > is another leaky factor with its value in [ 0 , 1]. 98 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Note that due to half-pixel motion compensation, errors attenuate as they prop agate. The leaky factors l+ and Z x are obtained from training sequences. Typical values of l+ and lx are 0.8 and 0.65, respectively. To summarize, the propagation error variance ( 7prop can be calculated from (4.12)-(4.14) according to motion vector accuracy. After obtaining crprop, the error variance ap of pixel p is updated by < r ^ onc + crprop, where ofonc depends on the concealment method for p. As mentioned previously, if the value for p is not lost, we have cr^onc — 0. 
In this way, the decoder can estimate and track the error variance of each pixel recursively.

4.4.2 MMSE Decoding with Two Concealment Modes

Let us consider multiple concealment methods for a lost pixel simultaneously, where each concealment method is called a mode. Based on the error tracking model, we can conceal pixel values by combining several concealment modes. The combination rule is dynamically determined to minimize the MSE of each pixel. Let us describe and analyze the MMSE decoding mechanism in more detail.

Suppose a lost pixel with unknown value p can be concealed by two modes. The first mode replaces the pixel with value p̂_1, and the second mode replaces it with value p̂_2. Then, instead of using one of the two modes directly, p can be concealed by a weighted sum of p̂_1 and p̂_2, given by

    p̂ = α p̂_1 + (1 − α) p̂_2,    (4.15)

where α is a weighting coefficient. Let σ_1^2 and σ_2^2 denote the error variances of p̂_1 and p̂_2, respectively. Then, the error variance σ^2 of p̂ can be written as

    σ^2 = E{(p̂ − p)^2}
        = E{(α(p̂_1 − p) + (1 − α)(p̂_2 − p))^2}
        = α^2 σ_1^2 + (1 − α)^2 σ_2^2 + 2α(1 − α) ρ_{1,2} σ_1 σ_2,

where ρ_{1,2} denotes the correlation coefficient between (p − p̂_1) and (p − p̂_2). The optimal α that minimizes σ^2 is given by

    α = ( σ_2^2 − ρ_{1,2} σ_1 σ_2 ) / ( σ_1^2 + σ_2^2 − 2 ρ_{1,2} σ_1 σ_2 ),    (4.16)

and the minimum value of σ^2 is given by

    σ_min^2 = σ_1^2 σ_2^2 (1 − ρ_{1,2}^2) / ( σ_1^2 + σ_2^2 − 2 ρ_{1,2} σ_1 σ_2 ).    (4.17)
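The combination rule of Eqs. (4.15)-(4.17) can be sketched as follows; a minimal illustration with our own function name, which also clips α to [0, 1] as the restriction (4.18) requires.

```python
import math

def mmse_combine(p1, p2, s1, s2, rho):
    """Combine two concealment modes with values p1, p2, error
    variances s1, s2 and correlation rho, per Eqs. (4.15)-(4.17).
    Returns the concealed value and its error variance; alpha is
    clipped to [0, 1] per Eq. (4.18)."""
    c = rho * math.sqrt(s1 * s2)
    alpha = (s2 - c) / (s1 + s2 - 2 * c)
    if alpha >= 1.0:
        return p1, s1          # fall back to mode 1 alone
    if alpha <= 0.0:
        return p2, s2          # fall back to mode 2 alone
    var = s1 * s2 * (1 - rho * rho) / (s1 + s2 - 2 * c)
    return alpha * p1 + (1 - alpha) * p2, var

# Uncorrelated modes of equal variance: alpha = 1/2, variance halves.
print(mmse_combine(10.0, 20.0, 1.0, 1.0, 0.0))  # -> (15.0, 0.5)
```

The example shows the best case discussed below: with ρ = 0 and σ_1 = σ_2, the combined variance is half that of either mode alone.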
The weighting coefficient a obtained with inaccurate statistical measurements may result in a huge amount of distortion, especially when a < 0 or a > 1. Therefore, to be conservative, we impose the following restriction 0 < a < 1, (4.18) such that the absolute error of p is limited by \p - p \< max {\pi -p\, \p2 -p\}. By substituting (4.16) into (4.18), we have the following condition P i ,2 < m in { — , — (4 .1 9 ) < J\ < 7 2 When this condition is satisfied, oWn in (4.17) is an increasing function of pi,2 101 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. since its derivative is nonnegative: d^min = 2 g ? g |( f f i p i , 2 ~ cr2) ( a 2p h2 - e g ) > dpi,2 (o'? + 02 - 2 p i,2 0 1 0 2)2 ~ This suggests that the smaller the correlation coefficient is, the lower the error variance will be. Note that pi 2 = — 1 achieves the minimum value of cr2 lin while the maximum occurs when the equality holds in (4.19). However, pij2 is higher than zero in most cases, since any concealment method exploits similar spatial and temporal information. For example, adjacent MBs and previous reconstructed frames are commonly used to conceal the lost MB, even though specific methods may be different. One simple way to lower the correlation coefficient pii2 is to select different reference frames in the two concealment modes. Let us examine the following variance ratio This can be interpreted as the gain of the weighted MMSE decoding method, min {of, erf} (4.20) min compared with the decoding method that chooses the better one between the two concealment modes. By substituting (4.17) into (4.20), we have M M S E (4.21) 102 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. where r . r& 2 cn-, o — min{ — , — }. G\ V2 It is clear that 8 ranges from 0 to 1. Let assume the the two concealment modes are selected such that the correlation coefficient p1 2 is close to 0. 
Then, the gain in (4.21) is maximized when < 5 = 1, that is, when = cr2. This indicates that the error variances of the two concealment modes should be as close as possible to get the best benefit of MMSE decoding. The MMSE decoding method can be summarized as follows. First, we choose two concealment modes based on the following two criteria: • They should have a small correlation coefficient pi,2. • They should provide similar concealment capabilities, i.e., a\ & cr2. The parameters a\, cr 2 and pij2 are obtained by training in advance. In the decoder, each pixel is reconstructed via (4.15) and (4.16). Then, the corresponding element in the error variance map is updated by (4.17). During the reconstruction and the map updating, if a > 1 , a is set to 1 and the error variance is updated to a\ to satisfy the constraint in (4.18). Similarly, if a < 0, a is set to 0 and the error variance is updated to o\. 103 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 4.5 P -F ram e C oncealm en t M od es Based on the above discussion, the proposed algorithm employs two temporal concealment modes in the decoder: 1 ) temporal linear interpolation and 2 ) motion vector recovery with block matching. Let us describe these two modes in detail below. 4.5.1 Tem poral Linear Interpolation Linear interpolation is often used for error concealment. As in (4.1), four pixel val ues in spatially adjacent MBs can be linearly interpolated to conceal an erroneous pixel. On the other hand, in this chapter, four pixel values in the previous frame is linearly interpolated to conceal a pixel temporally. We employ the temporal interpolation rather than the spatial interpolation, since temporal correlation is much higher than spatial correlation in general. The detail of the algorithm is described in Section 3.2.1. 
4.5.2 Motion Vector Recovery with Block Matching

As mentioned in Section 4.2.2, there are several approaches to recover the motion vector of an erroneous MB. We adopt the block matching approach [65-67], which finds the motion vector for the set of surrounding pixels and uses that vector for the erroneous block. Figure 4.4 illustrates the idea of motion vector recovery with block matching. First, the decoder estimates the motion vector for the error-free surrounding pixels, which are adjacent to the erroneous block. In this chapter, the motion vector is searched in the previous frame P_{k−1} or the earlier frame P_{k−2}, and the sum of square differences (SSD) is used as the block matching criterion. Then, the erroneous block is temporally replaced using the retrieved motion vector.

Figure 4.4: Motion vector recovery with block matching: (1) block matching; (2) temporal replacement.

Since MBs are decoded in a raster scan order, when reconstructing an MB, its right and lower adjacent MBs are not decoded yet. Thus, to simplify the decoding procedure, the matching of four sides can be reduced to that of two sides, which include only the upper and left surrounding pixels. If one side of the surrounding pixels is not error-free, it is ignored when calculating the SSD. If all surrounding pixels are erroneous, the motion vector is simply set to the zero vector.

To reduce the computational complexity of the block matching, the search area for the motion vector is reduced by exploiting the spatio-temporal correlation
Then, the search area from the previous frame Pk-i is restricted to min{a,} < a < max{aj}, min{6,} < b < max{&j}, where (a, b) denotes the motion vector of the erroneous MB. Also, the search area from the previous previous frame Pk- 2 is restricted to min{aj} + c < a < maxfc^} + c, min{6j} + d < b < max{6j} + d, where (c, d) is the motion vector of the MB in Pk-1, which is at the same spatial location as the current erroneous MB. In this way, the decoder can reduce the computations for block matching significantly at the cost of slight performance degradation. 4.6 Sum m ary o f D ecod er Im p lem en ta tio n To reconstruct or conceal frame Pk, the proposed algorithm uses the information from frames Pk- 1 and Pk-2 - Thus, the decoder should maintain three video frame buffers. A lso, th e decoder requires ad d itio n al th re e fram e buffers to reco rd th e corresponding error variance maps. Therefore, a six frame buffer is in use. 106 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Let us first consider the decoding of I-frames. If an MB is error-free, it is reconstructed and the corresponding variances in the error variance map are set to 0. On the other hand, if an MB is erroneous, it is concealed by the directional interpolation in Section III and the error variances are set to the highest value 255. Next, let us consider the decoding of P-frames. The MMSE weighting method is applied to conceal erroneous MBs using the two concealment modes, • Mode 1) Temporal linear interpolation from frame Pk-1, • Mode 2) Motion vector recovery with block matching from frame Pk-i- The MMSE weighting method is also used to reconstruct error-free MBs using the following two modes. 
• Mode 3) Conventional reconstruction using frame P_{k-1};
• Mode 4) Motion vector recovery with block matching from frame P_{k-2}.

Note that, in a P-frame, even though the pixel values of an MB are received correctly, the MB can still be severely corrupted by the error propagated from frame P_{k-1}. In such a case, mode 4 may provide a better reconstruction by concealing the MB using the information in P_{k-2}.

Figure 4.5 shows the decoding flowchart for an MB in a P-frame. The proposed algorithm conceals an erroneous MB or reconstructs an error-free MB by combining two modes via (4.15) and (4.16), and then updates the error variance map via (4.17).

Figure 4.5: MMSE decoding of an MB in a P-frame.

Table 4.1 summarizes the parameters for P-frame decoding, in which the concealment error variances are normalized with respect to the intra concealment error variance 255.

Table 4.1: Parameters for MMSE decoding. The concealment error variances are normalized with respect to the intra concealment error variance 255.
    Erroneous MBs:
        σ²_conc,1           concealment error variance for mode 1
        σ²_conc,2 = 45      concealment error variance for mode 2
        ρ_{1,2} = 0.47      correlation coefficient between modes 1 and 2
    Error-free MBs:
        σ²_conc,3 = 0       no concealment error in mode 3
        σ²_conc,4           concealment error variance for mode 4
        ρ_{3,4} = 0.4       correlation coefficient between modes 3 and 4

It is worth pointing out that the two concealment modes for erroneous MBs are designed to satisfy the criteria in Section IV.B: they have similar error variances, and their correlation coefficient is relatively small.
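The exact weighting rules (4.15)-(4.16) are not reproduced in this excerpt. The sketch below shows a standard MMSE combination of two candidate pixel values given their error variances and correlation coefficient, which is the kind of weighting that the parameters in Table 4.1 feed; the function names are illustrative, not code from the thesis.

```python
import math

def mmse_weights(var1, var2, rho):
    """Weights (w1, w2) with w1 + w2 = 1 that minimize the variance of
    the combined estimate w1*p1 + w2*p2, when the two candidate errors
    have variances var1, var2 and correlation coefficient rho."""
    cov = rho * math.sqrt(var1 * var2)
    denom = var1 + var2 - 2.0 * cov
    if denom == 0.0:                # identical, fully correlated candidates
        return 0.5, 0.5
    w1 = (var2 - cov) / denom
    return w1, 1.0 - w1

def combine_pixel(p1, p2, var1, var2, rho):
    """Conceal (or reconstruct) a pixel as the MMSE-weighted sum of the
    outputs of two decoding modes."""
    w1, w2 = mmse_weights(var1, var2, rho)
    return w1 * p1 + w2 * p2
```

With equal variances, as for modes 1 and 2 in Table 4.1, the weights reduce to 1/2 each; when one mode is much more reliable, its candidate dominates the sum.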
4.7 Simulation Results

The performance of the proposed algorithm is evaluated using the standard H.263 coder [1]. In H.263, a group of blocks (GOB) is defined as a number of MB rows. For example, a GOB consists of a single MB row at the QCIF (176 x 144) resolution. In many cases, each GOB is packetized into one packet. However, in the GOB packetization, if a packet is lost, the information in the left and right MBs cannot be used for the concealment of an MB. To improve the concealment performance, we also implement the interleaving packetization by modifying the syntax of H.263. As shown in Figure 3.9, an interleaving packet for a QCIF frame is formed with 11 MBs chosen from every nine consecutive MBs. For instance, the first packet consists of the (9i + 1)th MBs, where 0 <= i < 11. Therefore, as in the GOB packetization, the interleaving packetization also generates 9 packets for each frame. But, when one packet is missing, an erroneous MB can be concealed more effectively by using the information in the upper, lower, left and right MBs.

In this chapter, a 16-bit cyclic redundancy check (CRC) [75] is appended to each packet for error detection. Although the CRC requires a small overhead (2 bytes), it can detect most errors and can be easily implemented. In addition, the 2-byte overhead may be absorbed when video packets are transmitted using the user datagram protocol (UDP) and the checksum in the UDP header is enabled. The packets that are declared to be corrupted by the CRC decoder are not used in the video decoder.

Figure 4.6: Three MBs are lost in the left image and concealed by the proposed directional interpolation in the right image.
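The interleaving rule described above can be sketched as follows; the helper is hypothetical, and MB numbering follows the text, starting from 1.

```python
def interleave_packets(num_mbs=99, num_packets=9):
    """Stride-9 interleaving of the 99 QCIF macroblocks: packet j
    (0-based) holds MBs j+1, j+10, j+19, ..., so the first packet
    contains the (9i + 1)th MBs for 0 <= i < 11, as in the text."""
    packets = [[] for _ in range(num_packets)]
    for mb in range(1, num_mbs + 1):             # MBs numbered from 1
        packets[(mb - 1) % num_packets].append(mb)
    return packets
```

Because consecutive MBs always fall into different packets, losing one packet leaves the four spatial neighbors of every missing MB available for concealment, which is exactly the advantage over GOB packetization noted above.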
Figure 4.7: I-frame concealment results when 20 MBs are lost, where the left image is obtained by spatial linear interpolation and the right image is obtained by the proposed directional interpolation scheme.

4.7.1 Error Concealment of I-Frames

Figure 4.6 shows the performance of the proposed directional interpolation on three typical MBs. The test image is selected from the "Foreman" QCIF sequence. The lowest MB contains a single dominant edge. In the concealed image, the edge is successfully detected and bridged together. The uppermost MB contains three parallel edges. Representative edge points are also precisely determined and correctly matched. The middle one contains intersecting edges. No blur or significant discontinuities are observed in the concealed image, which indicates that the proposed algorithm eliminates outlying edge points and recovers missing pixels effectively.

Figure 4.7 shows the concealed "Foreman" images, when 20 isolated MBs are lost during the transmission. In this test, each lost MB is surrounded by four correctly decoded MBs. The left image is obtained by the spatial linear interpolation method, and the right one by the proposed algorithm. It is observed that the linear interpolation generates severe blurring artifacts. In contrast, the proposed algorithm provides a faithful concealed image. Note that the complex MBs, which contain the eye, mouth and collar, are also well recovered with small distortion.

Figure 4.8: I-frame concealment results when the second GOB is lost, where the left image is obtained by spatial linear interpolation while the right image is obtained by the proposed directional interpolation scheme.
If consecutive MBs are lost, the decoder defines the boundary pixels surrounding the missing part and then performs the edge recovery and the selective directional interpolation in the same way as in the concealment of an isolated MB. In Figure 4.8, the second GOB is lost. The proposed algorithm reconstructs most broken edges faithfully and gives better visual quality than linear interpolation.

4.7.2 Error Concealment of P-Frames

Next, we evaluate the performance of the MMSE decoding for P-frame concealment. The test sequence is "Foreman," whose frame rate is 8.33 frames/s. The first frame is encoded in the I-frame mode, and the other frames in the P-frame mode. To focus on the P-frame error concealment, the first I-frame is assumed to be error-free, and errors are inserted only in the P-frames.

Figure 4.9 shows the reconstructed "Foreman" images, which are obtained by different temporal concealment methods: (1) 'direct copying' sets a lost motion vector to the zero vector; (2) 'temporal linear interpolation' replaces a lost pixel with a weighted sum of the four pixels specified by the neighboring motion vectors; (3) 'motion vector recovery' retrieves a motion vector with block matching; (4) 'static combining' estimates a lost pixel by averaging two values p_1 and p_2 obtained by 'temporal linear interpolation' and 'motion vector recovery'; and (5) 'MMSE combining' adaptively combines p_1 and p_2 to minimize the MSE of the concealed pixel. It can be seen that both temporal linear interpolation and motion vector recovery provide significantly better performance than the direct copying method. Furthermore, by combining these methods, the proposed MMSE decoding achieves the highest PSNR, especially when propagation errors accumulate.
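The 'motion vector recovery' candidate above is the boundary block matching of Section 4.5.2. A minimal sketch is given below; the frame and boundary representations and the function name are illustrative only, and the search window stands for the restricted ranges derived from the neighboring MVs.

```python
def recover_mv(ref, boundary, a_range, b_range):
    """Boundary block matching: find the displacement (a, b) inside the
    restricted window that minimizes the SSD between the error-free
    surrounding pixels of a lost MB and the correspondingly displaced
    pixels in a reference frame.

    ref      : reference frame as a list of rows (e.g. P_{k-1})
    boundary : list of (row, col, value) error-free surrounding pixels
    a_range, b_range : (min, max) vertical/horizontal search bounds,
                       e.g. (min{a_i}, max{a_i}) from the neighbor MVs
    """
    h, w = len(ref), len(ref[0])
    best_mv, best_ssd = (0, 0), float("inf")
    for a in range(a_range[0], a_range[1] + 1):
        for b in range(b_range[0], b_range[1] + 1):
            ssd = 0.0
            for r, c, v in boundary:
                rr, cc = r + a, c + b
                if 0 <= rr < h and 0 <= cc < w:
                    ssd += (ref[rr][cc] - v) ** 2
            if ssd < best_ssd:
                best_ssd, best_mv = ssd, (a, b)
    return best_mv
```

The lost block is then temporally replaced using the returned vector; in the full decoder, boundary sides that are themselves erroneous are simply omitted from the SSD.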
Figure 4.9: Comparison of different temporal concealment techniques, when the previous frame is correctly reconstructed: (a) error pattern (10.56 dB), (b) direct copying (26.28 dB), (c) temporal linear interpolation (34.66 dB), (d) motion vector recovery (33.07 dB), (e) static combining (34.56 dB), (f) MMSE combining (34.71 dB).

Figure 4.10: The PSNR performance comparison of several temporal concealment techniques for the "Foreman" sequence, where the GOB packetization is used and the packet loss rate is 0.1.

Figures 4.10 and 4.11 show the PSNR degradation as a function of the frame number, when the packet loss rate is 0.1. The former figure is obtained with the GOB packetization and the latter with the interleaving packetization. Both temporal linear interpolation and motion vector recovery outperform the direct copying. The static combining method is better than the former two, since it lowers the error variance of each pixel by averaging two random variables. Based on the error tracking model, the proposed MMSE decoding provides the highest PSNR performance. Note that the MMSE decoding provides about 5 dB performance gain over direct copying in both the GOB and interleaving packetization schemes.
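All of the comparisons above are reported in PSNR. For reference, a standard PSNR computation for 8-bit frames can be written as follows (a generic sketch, not code from the thesis):

```python
import math

def psnr(orig, recon, peak=255.0):
    """PSNR in dB between two equally sized frames (lists of rows),
    i.e. 10*log10(peak^2 / MSE); identical frames give infinity."""
    total, count = 0.0, 0
    for row_o, row_r in zip(orig, recon):
        for x, y in zip(row_o, row_r):
            total += (x - y) ** 2
            count += 1
    if total == 0.0:
        return float("inf")
    mse = total / count
    return 10.0 * math.log10(peak * peak / mse)
```

A maximally wrong 8-bit frame (every pixel off by 255) gives 0 dB, and smaller errors give proportionally higher values, matching the scale of the figures above.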
Figure 4.11: The PSNR performance comparison of temporal concealment techniques for the "Foreman" sequence, where the interleaving packetization is used and the packet loss rate is 0.1.

Figure 4.12: The PSNR value versus the packet loss rate (GOB packetization), where 'SI' denotes the spatial linear interpolation scheme and 'DI' denotes the proposed directional interpolation scheme.

4.7.3 Error Concealment of I and P-Frames

In this test, random packet loss occurs in both I and P frames. Figure 4.12 shows the PSNR performance according to the packet loss rate in the case of the GOB packetization, and Figure 4.13 in the case of the interleaving packetization. For each packet loss rate, twenty error patterns are simulated, and the obtained PSNRs are averaged over all patterns and all frames. The curves denoted by 'DI & MMSE combining' are obtained by the proposed algorithm, which uses the directional interpolation and the MMSE decoding to conceal I-frames and P-frames, respectively. It is observed that the proposed algorithm achieves the best
Figure 4.13: The PSNR value versus the packet loss rate for the interleaving packetization, where 'SI' denotes the spatial linear interpolation scheme and 'DI' denotes the proposed directional interpolation scheme.

performance among all methods tested. Also, the proposed algorithm benefits more from the interleaving packetization than from the GOB packetization. This is because more information can be used for concealment in the interleaving packetization.

These simulation results indicate that the proposed algorithm is a promising technique for robust video transmission. Moreover, the proposed algorithm requires neither a feedback channel nor extra delay. Since the proposed algorithm applies only to the decoder, it can be easily modified to be compatible with any video coding standard.

4.8 Conclusion

In this chapter, we proposed novel I-frame and P-frame error concealment methods. The I-frame error concealment method employs edge detection and directional interpolation. It can efficiently recover both smooth and edge areas, while demanding a low computational complexity. The P-frame error concealment method uses error tracking and dynamic mode weighting. It conceals a pixel as a weighted sum of candidate pixels that are reconstructed using different concealment modes. The weighting coefficients are dynamically determined to reduce the propagation error and the concealment error. It was shown with simulation results that the proposed methods provide significantly better performance in error-prone environments than most well-known concealment methods.
Chapter 5

Analysis of Multi-Hypothesis Motion Compensated Prediction (MHMCP) for Robust Visual Communication

5.1 Introduction

Modern video coding standards, such as H.261, H.263, H.264, MPEG-1, MPEG-2 and MPEG-4, are all based on the hybrid scheme of motion compensated prediction and transform coding. Raw video data are processed frame by frame, each of which is partitioned into macroblocks (MBs). Then, a motion estimator matches each MB to a MB in the reference frame, which has been previously encoded, to yield the smallest distance measure. Afterwards, the residual MB, obtained by subtracting the matched MB from the original MB, is further partitioned into smaller blocks of size 8 x 8, which are then transformed via the discrete cosine transform (DCT) for the purpose of energy compaction. Finally, quantization is applied to the DCT coefficients, followed by entropy coding to generate the desired bit stream.

Apparently, motion estimation (ME) plays an important role in hybrid video coding. The basic idea of ME is to find the best MB in the reference frame to minimize the prediction error, thus saving the bit rate. Several advanced ME techniques [76] have been proposed to improve the performance. One approach is to employ a more precise motion model with an increased number of parameters. For instance, the affine motion model [77] is often used for global motion compensation. This is effective, especially when video contains complex motions such as camera rotation and zooming. Another way is to enlarge the buffer size to store more reference frames. This is called the long-term memory motion compensation (LMMC) scheme [78].

LMMC was originally proposed to stop error propagation [42].
If an error occurs in a frame, the encoder can choose an earlier but intact frame for prediction. By allowing a feedback channel from the receiver to the sender, the encoder can use the same reference frame as the decoder does, which leads to an error-free reconstruction. For this purpose, LMMC has been adopted by H.263 [1] and MPEG-4 [79] as an optional mode. It was observed in recent research [76] that LMMC also achieves bit rate savings in cases such as scene cuts, uncovered background and texture with aliasing. Therefore, LMMC has been adopted by the new video coding standard H.264 [80] for high compression efficiency.

The concept of LMMC can be further extended to obtain the multi-hypothesis motion compensated prediction (MHMCP) scheme [81], [82], [83]. MHMCP predicts a block from a weighted superposition of multiple reference blocks stored in the frame buffer. Unlike LMMC, which generates only one motion vector (MV) for each MB, MHMCP uses more than one MV (see Figure 5.1). The current MB is estimated by linearly combining these hypotheses, each of which is specified by one MV. Sullivan [81] first proposed this idea in 1993 and claimed that the coding bit rate can be reduced by MHMCP. Later, Sullivan and Orchard [84] analyzed the prediction accuracy of overlapped block motion compensation (OBMC), which is a special case of MHMCP. Girod [82] explained the efficiency of the B-picture mode theoretically, which is another example of MHMCP. He extended the analysis and developed a general theory on how MHMCP achieves high coding efficiency in [83].

MHMCP has been studied extensively [85, 86] for practical applications in recent years. An iterative hypothesis selection algorithm was proposed in [87]. Moreover, Flierl et al.
[88] showed that the hypothesis coefficients converge to 1/n regardless of their initial values, where n denotes the number of hypotheses. They further observed in [89] that two jointly estimated hypotheses provide a major portion of the achievable gain.

Although MHMCP was originally proposed to achieve high coding efficiency, it has also been observed to enhance the error resilience of compressed video. Kim et al. [90] proposed the double vector motion compensation, where each MB is predicted from two reference MBs. Even if one reference MB is corrupted, its error propagation can be effectively alleviated by using the other reference MB for motion compensation. However, there is an encoder-decoder mismatch problem if only one reference MB is used in the decoder. This can be overcome by encoding mismatch signals [91], and further improvement can be achieved based on a decoder distortion model [92].

The multi-hypothesis (MH) concept can also be adopted for error concealment at the decoder. Al-Mualla et al. [93] proposed an MH error concealment scheme, which uses the weighted average of several concealed signals to replace corrupted MBs. In [94], we developed an MH error concealment scheme that adaptively assigns weighting coefficients to lower concealment errors and suppress propagating errors. An MH error concealment method with iterative hypothesis search was proposed in [95]. It is worthwhile to point out that these three methods work well even without MHMCP in the encoder. Actually, none of them addressed the error resilience of MHMCP at the encoder side. Lin and Wang [96] discussed the error resilience property of double-hypothesis motion compensated prediction. However, a thorough analysis of the error resilience property of MHMCP is still lacking up to now.

In this chapter, we investigate the error propagation effect in the MHMCP coder.
First, the problem is formulated in Section 5.2. In Section 5.3, we discuss how the number of hypotheses influences error propagation and coding efficiency. In Section 5.4, we discuss the same issue in association with hypothesis coefficients. Concluding remarks are given in Section 5.5.

5.2 Error Propagation Model for MHMCP Coder

As shown in Figure 5.1, a frame f_k at time index k in MHMCP can be expressed as

f_k = Σ_{i=1}^{n} w_i h_{k,i} + r_k,    (5.1)

where n is the number of hypotheses, h_{k,i} is the ith hypothesis image, w_i is a weighting (or hypothesis) coefficient, and r_k denotes the residual error after MHMCP. The sum of the hypothesis coefficients is equal to 1 (i.e., Σ_{i=1}^{n} w_i = 1) so that no bias error occurs in r_k.

Figure 5.1: Illustration of the multi-hypothesis motion compensated prediction (MHMCP) scheme.

The hypothesis images h_{k,i}'s are predicted blockwise from the reference frames using motion vectors. Ideally, the motion vectors should be jointly optimized to minimize the rate-distortion cost function. An iterative procedure [88] was proposed to select a locally optimal combination of motion vectors. In this case, the selected vectors are not necessarily the best ones for the conventional single hypothesis motion compensated prediction (SHMCP) scheme, but achieve better performance by joint prediction.

In general, several hypotheses can be obtained from the same reference frame and overlap with one another in many cases. However, error resilience can be improved if each hypothesis image is obtained from a different reference frame. Then, even though a hypothesis is corrupted, the current frame can be reconstructed with an acceptable quality using the information in the other hypotheses, as done in [90]. Therefore, we constrain the ith hypothesis image h_{k,i} to be predicted from f_{k-i} in this chapter.
Then, we can rewrite (5.1) as

f_k = Σ_{i=1}^{n} w_i P_{k,i}(f_{k-i}) + r_k,

where P_{k,i} denotes the motion compensation operator from f_{k-i} to f_k, which is specified by the motion vectors.

Let us consider the effect of transmission errors in MHMCP. Suppose that the 0th frame is corrupted during transmission and its error is denoted by e_0. The error propagates to subsequent frames, and the propagation error e_k in the kth frame can be written as

e_k = Σ_{i=1}^{n} w_i P_{k,i}(e_{k-i}),    k > 0,    (5.2)

with initial conditions e_0 and e_l = 0 for l < 0. As the error propagates, it tends to spread spatially due to non-zero motion vectors. Also, note that e_k is given by the weighted sum of the error components P_{k,i}(e_{k-i})'s. The weighted summation is equivalent to a lowpass filtering operation, so the error attenuates as it propagates. Generally speaking, the error e_k is attenuated more effectively when the error components P_{k,i}(e_{k-i})'s are less correlated.

If all motion vectors are zero, each motion compensation operator P_{k,i} in (5.2) is an identity function. Then, we have

e_k = Σ_{i=1}^{n} w_i e_{k-i} = φ(k, n, w) e_0,    (5.3)

where φ(k, n, w) is a scalar, called the MH attenuation factor, which depends on the number of hypotheses n and the set of hypothesis coefficients w = (w_1, w_2, ..., w_n).

Let us consider an example with n = 2 and w = (1/2, 1/2). Since e_1 = (1/2)e_0, φ(1, n, w) = 1/2. Also, e_2 = (1/2)e_1 + (1/2)e_0 = (3/4)e_0 and φ(2, n, w) = 3/4. φ(k, n, w) for a larger k can be computed in a similar way. From (5.3), we see that each e_k is a scalar multiple of e_0, and the correlation coefficient between two corresponding pixels in e_k and e_l is 1. Therefore, the error attenuation is the weakest when all motion vectors are zero.
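The zero-motion recursion (5.2)-(5.3) and the two-hypothesis example can be checked with a short scalar simulation; the function name is illustrative, and the errors are reduced to scalars since, as noted above, each e_k is a scalar multiple of e_0.

```python
def propagate(weights, e0=1.0, kmax=10):
    """Scalar version of Eq. (5.2) with all motion vectors zero:
    e_k = sum_i w_i * e_{k-i}, with e_0 given and e_l = 0 for l < 0.
    Returns [e_0, e_1, ..., e_kmax]; e_k / e_0 equals phi(k, n, w)."""
    e = [e0] + [0.0] * kmax
    for k in range(1, kmax + 1):
        e[k] = sum(w * e[k - i] for i, w in enumerate(weights, start=1)
                   if k - i >= 0)
    return e
```

For n = 2 and w = (1/2, 1/2) this reproduces e_1 = e_0/2 and e_2 = 3e_0/4, i.e. φ(1, n, w) = 1/2 and φ(2, n, w) = 3/4 as in the worked example.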
Let D_{k←0} denote the mean square error (MSE) of pixels in the kth frame due to the propagation of an initial error in the 0th frame. When all motion vectors are zero, we can obtain the following relationship from (5.3):

D_{k←0} = [φ(k, n, w)]² D_{0←0},    (5.4)

where D_{0←0} is the mean square value of the initial pixel errors in the 0th frame. However, motion vectors are not fixed to zero in practice, and the initial error experiences a stronger attenuation. Even if MHMCP is not employed, the error is attenuated by the spatial filtering process in a typical video coder, such as sub-pixel accuracy motion compensation, overlapped block motion compensation and deblocking filtering. In [97], the attenuation due to the spatial filtering was analyzed and approximated by a decay factor parameterized by γ, which describes the strength of the spatial filter. By incorporating this decay factor into (5.4), we obtain the overall attenuation model (5.5). This is our model for error attenuation, which results from both MHMCP and spatial filtering. In this chapter, γ is trained from test sequences and set to 0.012.

Let us compute the MH attenuation factor φ(k, n, w) and discuss its effect on the error resilient property of the MHMCP coder in the following two sections.

5.3 Effect of Hypothesis Number

In an error-prone environment, it is advantageous to employ MHMCP rather than SHMCP to reduce the error propagation. However, MHMCP needs more bits to represent the additional motion information. In this section, we discuss how the hypothesis number affects error resilience and coding efficiency. In particular, we compare the performance of MHMCP with that of the random intra refreshing scheme.

5.3.1 Impact on Propagation Error

Let us analyze the relationship between the hypothesis number n and the propagation error. It is assumed that the hypothesis coefficients are all the same.
In other words,

w_i = 1/n for 1 <= i <= n.    (5.6)

Then, the solution to (5.3) can be obtained as

φ(k, n, w) = (1/n)(1 + 1/n)^{k-1} for 1 <= k <= n,  with  φ(k, n, w) → 2/(n+1) as k → ∞.    (5.7)

When k = 1, the φ(k, n, w) value achieves the lowest value 1/n. In other words, the propagation error drops significantly right after the corrupted frame containing an initial error. But the propagation error gently increases in the subsequent frames until the nth frame, in which the attenuation factor achieves the highest value (1/n)(1 + 1/n)^{n-1}. Then, the attenuation factor drops to the second lowest value φ(n+1, n, w) = (1/n)[(1 + 1/n)^n - 1] and increases slowly until the (2n)th frame, and so on. The oscillation becomes smoother and smoother, and the attenuation factor eventually converges to the non-zero value 2/(n+1). This indicates that the propagation error cannot be totally eliminated by MHMCP alone, and that the propagation error is more effectively suppressed when more hypotheses are used.

Figure 5.2: Comparison of theoretical and experimental NMSE values after a burst error occurs in the 10th frame of the "Foreman" sequence.

The H.264 reference codec of version JM6.1e has been modified to provide the MHMCP functionality. The "Foreman" QCIF (176 x 144) sequence at 25 frames/s is used in this test. A burst error causes three consecutive groups of blocks (GOBs) to be lost in the 10th frame, and no more loss occurs in other frames.
Figure 5.2 shows the theoretical and experimental NMSE (normalized MSE) values of each frame for three hypothesis numbers, n = 2, 5 and 10. The theoretical NMSE is computed with the factor in (5.5), while the experimental NMSE is computed as the ratio of the actual MSE to the initial MSE of the 10th frame. We see that the theoretical and experimental NMSEs have an excellent match.

Figure 5.3: The averaged NMSE as a function of the hypothesis number for the "Foreman" test sequence.

Under the same test condition, we also plot the NMSE performance with respect to the hypothesis number n in Figure 5.3. Each NMSE value is averaged over 100 frames (from the 10th to the 109th). As shown in the figure, when we employ a larger number of hypotheses, the averaged propagation error becomes smaller. However, a large number of hypotheses uses more bits to send additional motion vectors. The relationship between the hypothesis number and the bit rate is studied in the next section.

5.3.2 Impact on Bit Rates

The hypothesis number n is related to the bit rate through two factors. On one hand, as more hypotheses are used, MHMCP yields smaller prediction errors, thus saving bits to encode residual DCT coefficients. On the other hand, more hypotheses demand more bits to encode a larger number of motion vectors. Thus, a balance has to be found between these two factors.
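This trade-off is quantified in Eq. (5.11) below. A small sketch of that rate model is given here, under the Case (2) variance of Eq. (5.9) and using the values c = 16 bits per extra motion vector and ρ = 0.98 adopted later in the chapter; the function name is hypothetical.

```python
import math

def delta_rate(n, rho=0.98, c=16, mb_pixels=256):
    """Bit-rate change of n-hypothesis MCP relative to single-hypothesis
    prediction: the residual-coding term 0.5*log2(sigma_MH^2/sigma_SH^2),
    with sigma_MH^2 = (1 + rho*(n-1))/n * sigma_SH^2, plus the overhead
    of n-1 extra motion vectors of c bits per 16x16 macroblock."""
    saving = 0.5 * math.log2((1.0 + rho * (n - 1)) / n)  # negative: bits saved
    overhead = (n - 1) * c / mb_pixels                   # positive: extra MVs
    return saving + overhead
```

With highly correlated hypotheses the residual saving is small, so the motion-vector overhead dominates for large n, which is why a modest hypothesis number is preferred for coding efficiency.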
From (5.1), we see that MHMCP obtains the prediction p̂ of each pixel p in the kth frame from n hypothesis pixels p̂_i, 1 <= i <= n, via p̂ = Σ_{i=1}^{n} w_i p̂_i. Thus, the variance of the prediction error can be written as

σ²_MH = E{(p - p̂)²} = Σ_{i=1}^{n} Σ_{j=1}^{n} w_i w_j ρ_{i,j} σ_i σ_j,    (5.8)

where σ_i² = E{(p - p̂_i)²} and ρ_{i,j} is the correlation coefficient between (p - p̂_i) and (p - p̂_j). It is difficult to derive a closed-form solution to (5.8) in general. However, we can investigate the following three special cases to gain some insights into MHMCP.

• Case (1) ρ_{i,j} = 0 for all i ≠ j. In this case, the individual prediction errors (p - p̂_i)'s are assumed to be uncorrelated. Then, the joint prediction error has the minimum variance

σ²_MH = 1 / (Σ_{i=1}^{n} 1/σ_i²),

when

w_i = (1/σ_i²) / (Σ_{j=1}^{n} 1/σ_j²).

Thus, each hypothesis coefficient should be inversely proportional to its corresponding error variance. Also, note that σ²_MH can be made arbitrarily small if we can choose as many hypotheses as possible, each of which yields a prediction error variance smaller than a certain constant. However, this is not realistic in practice, since individual prediction errors tend to be correlated as the number of hypotheses becomes large. Actually, the correlation coefficients are often larger than 0.9.

• Case (2) σ_i² = σ²_SH and ρ_{i,j} = ρ for all i ≠ j. Assume that the variance of each individual prediction error is equal to the prediction error variance σ²_SH of SHMCP, and the correlation coefficients are fixed to ρ. Then, we have the minimum variance of the joint prediction error

σ²_MH = (1 + ρ(n - 1)) σ²_SH / n,    (5.9)

when

w_i = 1/n.

In this case, σ²_MH is bounded below by ρσ²_SH, so σ²_MH does not vanish even if we use as many hypotheses as possible. Also, note that hypotheses with lower correlations yield a smaller joint error variance.

• Case (3) n = 2.
Figure 5.4: The variance of the joint prediction error as a function of the correlation coefficient between two individual prediction errors for the double-hypothesis MCP scheme.

If two hypotheses are used, the variance is minimized to

σ²_MH = σ_1² σ_2² (1 - ρ_{1,2}²) / (σ_1² + σ_2² - 2ρ_{1,2}σ_1σ_2),    (5.10)

when

w_1 = (σ_2² - ρ_{1,2}σ_1σ_2) / (σ_1² + σ_2² - 2ρ_{1,2}σ_1σ_2),
w_2 = (σ_1² - ρ_{1,2}σ_1σ_2) / (σ_1² + σ_2² - 2ρ_{1,2}σ_1σ_2).

In Figure 5.4, we plot σ²_MH as a function of ρ_{1,2}, when the ratio of σ_2² to σ_1² (or vice versa) is equal to 1, 2, 5 and 10. The value of σ²_MH is normalized such that its maximum is 1 for each curve. When the ratio of σ_2² to σ_1² is large, a high ρ_{1,2} value leads to a small σ²_MH value. However, in general, ρ_{1,2} is small when the ratio is large, and σ²_MH stays in the plateau region of the operational curve. Moreover, the two error variances are close to each other in most cases. Under these circumstances, we can conclude that the lower ρ_{1,2} is, the smaller σ²_MH is.

To facilitate the following analysis, we assume that the prediction error is Gaussian-distributed. From the rate-distortion function for a Gaussian random variable [49], it can be shown that, as compared to SHMCP with prediction error variance σ²_SH, the bit rate change for residual DCT coding in MHMCP can be written as

ΔR_residual = (1/2) log2(σ²_MH / σ²_SH).

To compute ΔR_residual, we adopt Eq. (5.9) in Case (2) for σ²_MH among the above three cases, since it is close to the real-world situation. The overall change in the bit rate, denoted by ΔR, can be expressed as the sum of the reduced bit rate for residual coding and the increased bit rate for the additional motion vectors. Thus, we have

ΔR = (1/2) log2[(1 + ρ(n - 1)) / n] + (n - 1)c / 256    (bits/pixel),    (5.11)

where c is the average number of bits to represent an additional motion vector and 256 = 16² is the number of pixels per MB. Parameter c depends on the coding method of the motion information. In this chapter, we predictively encode the
Thus, we have

    ΔR = (1/2) log₂ [(1 + ρ(n − 1)) / n] + (n − 1)c / 256    (bits/pixel),    (5.11)

where c is the average number of bits to represent an additional motion vector and 256 = 16² is the number of pixels per MB. Parameter c depends on the coding method of the motion information. In this chapter, we predictively encode the motion vectors to exploit the correlation between motion vectors, and c is set to 16. Figure 5.5 compares the averaged bit rates for the 10th to 110th frames of the "Foreman" sequence, when the hypothesis number n varies from 1 to 10. It is observed that the theoretical and experimental bit rates match closely.

Figure 5.5: Comparison of theoretical and experimental bit rates as a function of the hypothesis number for the "Foreman" sequence.

5.3.3 Rate-Distortion Analysis

It was shown in [89] that a small number of hypotheses n is preferred for high coding efficiency. The optimal n depends on the correlation coefficients between hypotheses. In typical video sequences, hypotheses are highly correlated, and two hypotheses are sufficient to provide good rate-distortion performance in an error-free environment. However, a larger value of n is beneficial to propagation error suppression. Generally speaking, the hypothesis number should be selected by considering both coding efficiency and error resilience. By combining the distortion data in Figure 5.3 and the rate data in Figure 5.5, we plot the theoretical and experimental rate-distortion (R-D) curves in Figure 5.6. Each R-D point corresponds to one hypothesis number n, which takes values from 1 to 10 from the upper-left position to the lower-right position.
Recall that these data were obtained by inserting a burst error at the 10th frame of the "Foreman" sequence. We see that the distortion decreases faster as the hypothesis number increases from 1 to 3. When n ≥ 4, an increase in the hypothesis number gives only a marginal gain in image quality, but demands a much higher bit rate. Thus, for this error type, it is reasonable to confine the choice to n < 4, i.e., n = 1, 2, 3.

Figure 5.6: Theoretical and experimental rate-distortion curves for the "Foreman" sequence caused by a burst error.

Let us assume that the decoder can inform the encoder of the locations of erroneous packets through a feedback channel. In such a case, we can use the Lagrangian method to find the optimal number of hypotheses to minimize the distortion subject to the constraint on the bit rate. The distortion is the sum of two components: the source distortion Ds and the channel distortion Dc. The source distortion Ds is given by σ²_MH in (5.9). The correlation coefficient ρ is trained from test sequences and set to 0.98. Also, if the current frame is the kth frame, its channel distortion is caused by the accumulated propagation errors from the reference frames, which can be expressed as

    Dc = Σ_{l=k−n}^{k−1} D_{k,l},

where D_{k,l} is defined in a way similar to that in (5.5). Then, the Lagrangian cost function can be written as

    J = Ds + Dc + λ(R + ΔR),    (5.12)

where R is the bit rate for SHMCP and ΔR is the increased bit rate due to MHMCP, as given in (5.11). The rate-distortion optimization using the Lagrangian method has been extensively studied for SHMCP. In this chapter, we adopt the
approach in [98], where the Lagrangian multiplier is determined by the quantization step size Q via λ = 0.85Q². In the following test, we consider the case where GOBs are lost according to a two-state Markov model with a loss rate of 10% and a burst length of 2. The error locations affect the resulting PSNR performance significantly. Thus, 50 different error patterns are simulated, and each PSNR point is obtained by averaging over all the patterns and the frames. The rate-PSNR performance of an adaptive MHMCP scheme, which chooses the optimum hypothesis number for each MB to minimize the cost function in (5.12), is plotted in Figures 5.7 (a) and (b) for the 1st to the 100th frames of the "Foreman" and "Carphone" sequences, respectively. The performances of SHMCP/MHMCP schemes with a fixed hypothesis number n = 1, 2 or 3 are also shown in Figures 5.7 (a) and (b) for comparison. We see that MHMCP with hypothesis number 2 or 3 outperforms SHMCP. The double-hypothesis MCP is better than the triple-hypothesis MCP at low bit rates, whereas the triple-hypothesis MCP is much more efficient at higher bit rates. Moreover, the adaptive scheme has the best performance over the whole range of bit rates and provides up to 5.5 dB better performance than SHMCP.

5.3.4 Comparison with Random Intra Refreshing

The random intra refreshing (IR) scheme is a well-known error resilient tool used in the video encoder to stop error propagation. In this section, we compare its
Figure 5.7: Comparison of the rate-PSNR performances of SHMCP (n = 1), MHMCP (n = 2, 3) and adaptive MHMCP for (a) "Foreman" and (b) "Carphone" sequences.

error resilient capability with MHMCP. Our discussion will lead to some useful guidelines for the MHMCP design. In IR, an MB is intra-encoded with probability pr, which is called the refreshing rate. If the 0th frame is corrupted, an MB in the 1st frame is affected by the propagation error with probability (1 − pr). In general, the probability that an MB in the kth frame is affected by the initial error in the 0th frame can be approximated by (1 − pr)^k. Thus, as k approaches infinity, we see that IR can stop error propagation completely, while MHMCP cannot, as shown in Eq. (5.7). In Figure 5.8, we compare the rate-PSNR performance of MHMCP and IR with the "Foreman" test sequence. The performance of MHMCP is obtained by setting the hypothesis number n to 1, 2, 5 or 10. For IR, the refreshing rate is chosen to be 2.02%, 10.1%, 50.51% or 90.91%. To achieve a fair comparison, the other encoding parameters, such as the quantization step size, are set to be the same in both cases. In this test, a single burst error, which causes the loss of three consecutive GOBs, is introduced into the coded bit stream. Figure 5.8(a) shows the rate-PSNR performance of the first frame after the corrupted one. We see that MHMCP outperforms IR in suppressing error propagation in the short term. On the other hand, Figure 5.8(b) shows the average rate-PSNR performance over the 100 frames after the corrupted one.
As shown in the figure, IR is better than MHMCP in the long run, since it can stop the propagation error completely. However, if I-frames (intra-frames) are inserted periodically, it is likely that MHMCP can achieve better performance than IR in both the short and the long terms.

Furthermore, let us compare the rate-PSNR performances of MHMCP and IR for the "Foreman" and the "Carphone" sequences in the GOB loss environment with a loss rate equal to 10% and a burst length of 2 in Figure 5.9. Only the first frames are encoded in the I-frame mode, and no intermediate I-frames are inserted. Each PSNR is obtained by averaging over 100 frames and 50 different error patterns. We see that IR outperforms MHMCP at higher bit rates. However, even without periodic I-frame insertion, MHMCP is still preferred with a small value of n at low bit rates.

5.4 Effect of Hypothesis Coefficients

In this section, we study how hypothesis coefficients w_i's are related to error propagation. As shown in the previous section, it is sufficient to use two or three hypotheses, especially at low bit rates. In the following analysis, we focus on the case n = 3. The case of n = 2 can be easily analyzed in a parallel fashion.

Figure 5.8: Comparison of the rate-PSNR performance between MHMCP and IR after a burst error: (a) the coding of the next frame and (b) the average performance of encoding the 100 frames after the corrupted one.
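The GOB-loss experiments in this chapter draw loss patterns from a two-state Markov model with a loss rate of 10% and an average burst length of 2. A minimal sketch of such a pattern generator follows; the function and parameter names are ours (the thesis does not give an implementation), and the transition probabilities are derived from the stationary loss rate and the mean burst length in the usual Gilbert-model way.

```python
import random

def gilbert_loss_pattern(n, loss_rate=0.10, mean_burst=2.0, seed=0):
    """Draw a GOB loss pattern from a two-state (Gilbert) Markov model.

    State 1 = lost.  The bad->good escape probability is 1/mean_burst; the
    good->bad entry probability is set so that the stationary probability
    of the bad state equals loss_rate.
    """
    rng = random.Random(seed)
    p_bg = 1.0 / mean_burst                       # bad -> good
    p_gb = p_bg * loss_rate / (1.0 - loss_rate)   # good -> bad
    state, pattern = 0, []
    for _ in range(n):
        if state == 0 and rng.random() < p_gb:
            state = 1
        elif state == 1 and rng.random() < p_bg:
            state = 0
        pattern.append(state)
    return pattern

pattern = gilbert_loss_pattern(200000)
print(sum(pattern) / len(pattern))   # close to the nominal loss rate 0.10
```

Over a long run, the empirical loss fraction converges to 10% and the lost GOBs cluster into bursts whose mean length is 2, matching the test condition used for Figures 5.7 and 5.9.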
Figure 5.9: Comparison of the rate-PSNR performance between MHMCP and IR in the GOB loss environment: (a) the "Foreman" and (b) the "Carphone" sequences.

5.4.1 Impact on Propagation Error

When the number of hypotheses n is three and the hypothesis coefficients are given by the vector w = (w₁, w₂, w₃), the solution to (5.3) can be derived as

    φ(k, 3, (w₁, w₂, w₃)) = [1 + (1 − w₁ + w₃) Σ_{j=0}^{k−1} α^{k−j} β^j + w₁ Σ_{j=1}^{k} α^{k−j} β^{j−1}] / (2 − w₁ + w₃),    (5.13)

where

    α = [(w₁ − 1) + √((1 − w₁)² − 4w₃)] / 2,
    β = [(w₁ − 1) − √((1 − w₁)² − 4w₃)] / 2.

By inserting (5.13) into (5.5), we can obtain the MSE of pixels in the kth frame after the corrupted frame. We have tested the effect of a burst error with several combinations of hypothesis coefficients and confirmed that theoretical MSEs match experimental MSEs well in every combination. Figure 5.10 demonstrates one example of the error propagation effect on the "Foreman" sequence, when w = (0.1, 0.45, 0.45). It is observed that the propagation error oscillates with a decreasing amplitude. The average MSE is plotted as a 2-dimensional surface in terms of hypothesis coefficients w₁ and w₂ in Figure 5.11. Note that, since w₁ + w₂ + w₃ = 1, there exist only two independent weighting coefficients. The test condition is given below.
Figure 5.10: Comparison of theoretical and experimental MSEs after a burst error in the triple-hypothesis MCP for the "Foreman" sequence with weighting coefficients w = (0.1, 0.45, 0.45).

A burst error is introduced in the 3rd frame of the "Foreman" sequence, and the MSEs are averaged over 100 frames (from the 3rd frame to the 102nd frame) and then normalized with respect to the initial MSE of the 3rd frame. The star points represent the theoretical MSE values, while the lines represent the experimental MSE values. The largest MSE is observed when w = (1, 0, 0), which corresponds to the conventional SHMCP. Also, the other two extreme cases, w = (0, 1, 0) and (0, 0, 1), provide relatively large MSEs. This indicates that MHMCP alleviates propagation errors by combining multiple prediction signals. The minimum MSE is achieved around w = (0, 0.18, 0.82). However, the hypothesis coefficients also affect the prediction performance and, hence, the bit rate. The point w = (0, 0.18, 0.82) results in a high bit rate due to less efficient motion prediction with a zero value of w₁. The effect of hypothesis coefficients on the bit rate is discussed in the next section.

Figure 5.11: The propagation error of the triple-hypothesis MCP is plotted as a function of hypothesis coefficients.

5.4.2 Impact on Bit Rates

The variance of prediction errors in the triple-hypothesis MCP can be expressed by replacing n with 3 in Eq.
(5.8):

    σ²_MH = Σ_{i=1}^{3} w_i² σ_i² + Σ_{i=1}^{3} Σ_{j=1, j≠i}^{3} w_i w_j ρ_{i,j} σ_i σ_j.    (5.14)

We found experimentally that the individual prediction error variances and the correlation coefficients can be approximated as

    σ₂² = 1.18 σ₁²,    σ₃² = 1.385 σ₁²,    ρ₁,₂ = ρ₁,₃ = ρ₂,₃ = 0.95.

Similar to (5.11), the overall change of the bit rate is the sum of the reduced bit rate due to smaller prediction errors and the increased bit rate due to two additional motion vectors,

    ΔR = (1/2) log₂ (σ²_MH / σ₁²) + 2c/256    (bits/pixel),    (5.15)

as compared to the conventional SHMCP with the prediction error variance σ₁². We compare the experimental bit rates with the theoretical bit rates for the "Foreman" sequence in Figure 5.12. The star points represent the theoretical bit rates, while the lines represent the experimental ones. The highest bit rate is observed at w = (0, 0, 1), which provides the poorest motion compensation performance. On the other hand, the lowest bit rate is achieved around w = (0.8, 0.2, 0), which yields smaller prediction errors by interpolating two hypotheses. This is consistent with the result in [89] that two hypotheses provide a major portion of the achievable coding gain in MHMCP.

Figure 5.12: The impact of hypothesis coefficients on the bit rates for the "Foreman" sequence in the triple-hypothesis MCP.

5.4.3 Rate-Distortion Analysis

Using the MSE and the bit rate models, we plot theoretical R-D data points in Figure 5.13 for the triple-hypothesis MCP after a burst error, where the x-axis denotes the differential rate, which is required in addition to the SHMCP rate, and the y-axis is the normalized MSE, which is averaged over 100 frames after the burst error.
Each point corresponds to a certain value of w = (w₁, w₂, w₃). Three spikes are observed in the three extreme cases: w = (1, 0, 0), (0, 1, 0) and (0, 0, 1). We see that the choice of hypothesis coefficients affects the R-D performance considerably. From these data, it is found that the optimum R-D points on the convex hull have similar w₂ values, which are concentrated around 1/3. The red curve corresponds to various possible combinations of (w₁, w₃) when w₂ is fixed to 1/3.

Figure 5.13: The rate-distortion performance of the triple-hypothesis MCP after a burst error.

This indicates that, for a video codec with triple-hypothesis MCP, we can fix w₂ to 1/3 and change only w₁ and w₃ to meet the overall rate or distortion requirement. For a given w₁, w₃ can be computed as 2/3 − w₁. The adaptation of hypothesis coefficients can be made at the sequence or the frame level. However, due to the variation of video contents and channel conditions, the performance of MHMCP can be further improved if the hypothesis coefficients are adapted at the MB level. After plugging (5.14) and (5.15) into the cost function in (5.12), we can select the optimum set of coefficients w = (w₁, 1/3, 2/3 − w₁) that minimizes the cost function. In our implementation, w₁ is varied from 0 to 1 with a step size of 0.2. Figure 5.14 shows the rate-PSNR performance of the adaptive hypothesis coefficient scheme for the "Foreman" and "Carphone" sequences at the MB level. GOBs are lost according to the Markov model with a loss rate of 10% and a burst length of 2. We see that the adaptive scheme provides up to 2.5 dB performance improvement over the fixed scheme with w = (0.33, 0.33, 0.34).
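The MB-level coefficient adaptation just described can be sketched compactly. The variance ratios (σ₂² = 1.18σ₁², σ₃² = 1.385σ₁²), ρ = 0.95, c = 16 bits per motion vector and λ = 0.85Q² come from the text; the function names, the fixed stand-in for the channel distortion Dc (which in the thesis depends on w through (5.13)), and the restriction of the w₁ grid so that w₃ stays non-negative are our assumptions, so this is an illustrative sketch rather than the thesis implementation.

```python
import math

RHO = 0.95                       # trained correlation coefficient
VAR_RATIOS = (1.0, 1.18, 1.385)  # sigma_i^2 / sigma_1^2, measured in Section 5.4.2

def joint_variance(w, var1):
    """Eq. (5.14): joint prediction error variance for n = 3 hypotheses."""
    var = [r * var1 for r in VAR_RATIOS]
    sig = [math.sqrt(v) for v in var]
    total = sum(w[i] ** 2 * var[i] for i in range(3))
    total += sum(w[i] * w[j] * RHO * sig[i] * sig[j]
                 for i in range(3) for j in range(3) if i != j)
    return total

def delta_rate(w, var1, c=16):
    """Eq. (5.15): bit-rate change vs. SHMCP, plus two extra motion vectors."""
    return 0.5 * math.log2(joint_variance(w, var1) / var1) + 2 * c / 256.0

def best_coefficients(var1, dc, rate_shmcp, q):
    """Minimize J = Ds + Dc + lambda*(R + dR) over w = (w1, 1/3, 2/3 - w1).

    Dc is treated as a fixed stand-in here; lambda = 0.85*Q^2 as in (5.12).
    """
    lam = 0.85 * q * q
    candidates = [(w1, 1.0 / 3.0, 2.0 / 3.0 - w1) for w1 in (0.0, 0.2, 0.4, 0.6)]
    return min(candidates,
               key=lambda w: joint_variance(w, var1) + dc
                             + lam * (rate_shmcp + delta_rate(w, var1)))
```

For w = (1, 0, 0) the scheme degenerates to SHMCP: joint_variance returns σ₁², and delta_rate reduces to the pure motion-vector overhead 2c/256 = 0.125 bit/pixel.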
5.4.4 Comparison with Random Intra Refreshing

As done in Section 5.3.4, we would like to compare the error resilient capability of the triple-hypothesis MCP with that of the random intra refreshing (IR) scheme by varying the hypothesis coefficients. We compare the rate-PSNR performance of the triple-hypothesis MCP and IR for the "Foreman" and "Carphone" sequences in Figures 5.15 and 5.16, respectively. For the triple-hypothesis MCP, we fix w₂ to 1/3 and vary w₁ from 0.1 to 0.6. For IR, the refreshing rates are varied to obtain a range of different bit rates. The H.264 standard adopts intra prediction, which improves the coding gain but deteriorates error resilience. IR with and without intra prediction are both tested and depicted in Figures 5.15 and 5.16. Three consecutive GOBs are dropped at the 3rd frame. Figure 5.15 (a) and Figure 5.16 (a) show the PSNR performance of the first frame after the corrupted one, while Figure 5.15 (b) and Figure 5.16 (b) show the average PSNR performance over the 100 frames after the corrupted one. MHMCP significantly outperforms the IR schemes for both test sequences in the short term. MHMCP still performs better than IR with intra prediction in the long term, but achieves lower PSNR values than IR without intra prediction. Therefore, MHMCP serves as a good

Figure 5.14: Performance comparison of the adaptive and the fixed hypothesis coefficient schemes with w = (0.33, 0.33, 0.34) for (a) the "Foreman" and (b) the "Carphone" sequences.
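For reference in this comparison, the IR attenuation model of Section 5.3.4 — an MB remains affected by an initial error after k frames with probability (1 − pr)^k — can be tabulated directly for the four refreshing rates used in the Figure 5.8 experiment. This quick computation is ours, added for illustration.

```python
# P(k) = (1 - pr)^k: probability that an MB is still affected by an
# initial error k frames later, under random intra refreshing (rate pr).
refresh_rates = [0.0202, 0.101, 0.5051, 0.9091]
for pr in refresh_rates:
    survive = [(1.0 - pr) ** k for k in (1, 10, 100)]
    print(f"pr={pr:.4f}  k=1: {survive[0]:.3f}  "
          f"k=10: {survive[1]:.3f}  k=100: {survive[2]:.3f}")
```

Even the lowest refreshing rate (2.02%) drives the affected fraction to roughly 13% after 100 frames, whereas MHMCP's attenuation in (5.7) never reaches zero; this is the long-term advantage of IR discussed above.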
Figure 5.15: The rate-PSNR performances of the triple-hypothesis MCP and IR for the "Foreman" sequence due to a burst error: (a) the next frame and (b) the average of the following 100 frames after the corrupted one.

Figure 5.16: The rate-PSNR performances of the triple-hypothesis MCP and IR for the "Carphone" sequence due to a burst error: (a) the next frame and (b) the average of the following 100 frames after the corrupted one.
5.5 C onclusion The error propagation effect in MHMCP was examined and the relationship be tween the rate-distortion performance and the hypothesis number and hypothesis coefficients was thoroughly analyzed in this research. MHMCP can alleviate the effect of error propagation by combining several predictions for motion compensa tion, but requires a higher bit rate to represent the additional motion information. It was shown that a hypothesis number no larger than three is suitable at low bit rates. Also, in the triple-hypothesis MCP, the optimum R-D points are achieved when the second coefficient w2 is set to 1/3. By comparing MHMCP with the intra refreshing (IR) scheme, we showed that MHMCP suppresses the short-term error propagation more effectively than IR. The performance of MHMCP can be improved furthermore, if the hypothesis number and the hypothesis coefficients become adaptive at the MB level at the expense of a higher encoder complexity. 154 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Chapter 6 Conclusion and Future work 6.1 C onclu sion In this work, various error resilient techniques for robust video transmission over noisy channels have been proposed. We described our research objectives and main contributions in Chapter 1. Then, we reviewed related previous work in Chapter 2. Unequal error protection schemes for both multi-user and end-to-end systems were presented in Chapter 3. Fast searching algorithms were proposed to significantly speed up the system optimization. Additionally, we developed a simple rate-distortion model in terms of quantization parameter, so that the rate and the distortion could be estimated without an extensive encoding procedure. In C h a p te r 4, tw o novel erro r concealm ent m eth o d s were p ro p o sed for I-fram es and P-frames, respectively. In Chapter 5, we investigated the error propagation 155 Reproduced with permission of the copyright owner. 
effect in the MHMCP coder and analyzed the rate-distortion performance in terms of the hypothesis number and hypothesis coefficients.

6.2 Future Work

As a new standard with distinguished coding efficiency, H.264 draws much attention from both academia and industry. It adopts numerous smart coding options to enhance video performance in the rate-distortion sense. However, as signals are compressed to a smaller size, they become more sensitive to errors. Thus, H.264-coded video streams are more vulnerable than streams coded by other standards. It would be beneficial to redesign or modify the existing error resilient tools so that they can be employed effectively in H.264 without sacrificing much coding efficiency. One example is to utilize the SP/SI frame concept to stop or reduce error propagation. Specifically, for each encoded macroblock, we encode additional predicted versions of it using different reference frames (or different prediction methods) and save them as SP/SI macroblocks. During transmission, these SP/SI macroblocks are used to replace the original macroblocks in the output video stream if the originals are corrupted. The way these SP/SI macroblocks are encoded ensures that such a replacement will not cause any mismatch at the decoder side. This scheme introduces a small bit-rate overhead only when there are transmission errors, and no overhead when no error occurs.

References

[1] Video coding for low bitrate communication, ITU-T Recommendation H.263, 1998.

[2] T. T. Cheung, M. S. So, Roger S. K. Cheng, and K. B. Letaief, "Adaptive unequal error protection and VLC reshuffling for image transmission over wireless channels," in Vehicular Technology Conference Proceedings, VTC 2000-Spring Tokyo, May 2000, vol. 2, pp. 800-804.

[3] M. Gallant and F.
Kossentini, "Rate-distortion optimized layered coding with unequal error protection for robust internet video," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 3, pp. 357-372, Mar. 2001.

[4] H. Li and C. W. Chen, "Robust image transmission with bidirectional synchronization and hierarchical error correction," IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 1183-1187, Nov. 2001.

[5] H. Zheng and K. J. R. Liu, "Robust image and video transmission over spectrally shaped channels using multicarrier modulation," IEEE Trans. Multimedia, vol. 1, no. 1, pp. 88-103, Mar. 1999.

[6] H. Gharavi and S. M. Alamouti, "Multipriority video transmission for third-generation wireless communication systems," Proc. IEEE, vol. 87, no. 10, pp. 1751-1763, Oct. 1999.

[7] W. Zhu, Q. Zhang, and Y.-Q. Zhang, "Network-adaptive rate control with unequal loss protection for scalable video over internet," in Proc. ISCAS, May 2001, vol. 5, pp. 109-112.

[8] W. R. Heinzelman, M. Budagavi, and R. Talluri, "Unequal error protection of MPEG-4 compressed video," in Proc. ICIP, Oct. 1999, vol. 2, pp. 530-534.

[9] L.-F. Wei, "Coded modulation with unequal error protection," IEEE Trans. Commun., vol. 41, no. 10, pp. 1439-1449, Oct. 1993.

[10] H. Gharavi, "Pilot-assisted 16-level QAM for wireless video," IEEE Trans. Circuits Syst. Video Technol., vol. 12, pp. 77-89, Feb. 2002.

[11] A. R. Reibman, H. Jafarkhani, Y. Wang, M. T. Orchard, and R. Puri, "Multiple-description video coding using motion-compensated temporal prediction," IEEE Trans. Circuits Syst. Video Technol., vol. 12, pp. 193-204, Mar. 2002.

[12] B. Shankar and M. R. A. Makur, "Allpass delay chain-based IIR PR filterbank and its application to multiple description subband coding," IEEE Trans. Image Processing, vol. 50, pp. 814-823, Apr. 2002.

[13] V. K. Goyal, J. A. Kelner, and J.
Kovacevic, "Multiple description vector quantization with a coarse lattice," IEEE Trans. Information Theory, vol. 48, pp. 781-788, Mar. 2002.

[14] Y. Wang, M. T. Orchard, V. Vaishampayan, and A. R. Reibman, "Multiple description coding using pairwise correlating transforms," IEEE Trans. Image Processing, vol. 10, pp. 351-366, Mar. 2001.

[15] C.-S. Kim, R.-C. Kim, and S.-U. Lee, "Robust transmission of video sequence over noisy channel using parity-check motion vector," IEEE Trans. Circuits Syst. Video Technol., vol. 9, pp. 1063-1074, Oct. 1999.

[16] K. Stuhlmuller, N. Farber, and B. Girod, "Trade-off between source and channel coding for video transmission," in Proc. ICIP, 2000, vol. 1, pp. 399-402.

[17] Q. Zhang, W. Zhu, and Y.-Q. Zhang, "Resource allocation for multimedia streaming over the internet," IEEE Trans. Multimedia, vol. 3, pp. 339-355, Sept. 2001.

[18] G. Cheung and A. Zakhor, "Bit allocation for joint source/channel coding of scalable video," IEEE Trans. Image Processing, vol. 9, no. 3, pp. 340-356, Mar. 2000.

[19] L. P. Kondi and A. K. Katsaggelos, "Joint source-channel coding for scalable video using models of rate-distortion functions," in Proc. ICASSP, 2001, vol. 3, pp. 1377-1380.

[20] B. Hochwald and K. Zeger, "Tradeoff between source and channel coding," IEEE Trans. Information Theory, vol. 43, no. 5, pp. 1412-1424, Sept. 1997.

[21] L. P. Kondi, S. N. Batalama, D. A. Pados, and A. K. Katsaggelos, "Joint source-channel coding for scalable video over DS-CDMA multipath fading channels," in Proc. ICIP, 2001, vol. 1, pp. 994-997.

[22] C.-Y. Hsu, A. Ortega, and M. Khansari, "Rate control for robust video transmission over burst-error wireless channels," IEEE J. Select. Areas in Commun., vol. 17, pp. 1-18, May 1999.

[23] Q. Zhang, W. Zhu, Z. Ji, and Y.-Q.
Zhang, "A power-optimized joint source channel coding for scalable video streaming over wireless channel," in Proc. ISCAS, May 2001, vol. 5, pp. 137-140.

[24] Z. Ji, Q. Zhang, W. Zhu, and Y.-Q. Zhang, "End-to-end power-optimized video communication over wireless channels," in Proc. IEEE Workshop on Multimedia Signal Processing, Oct. 2001, pp. 447-452.

[25] A. Ortega and K. Ramchandran, "Rate-distortion methods for image and video compression," IEEE Signal Processing Mag., vol. 15, no. 6, pp. 23-50, Nov. 1998.

[26] K. Stuhlmuller, N. Farber, M. Link, and B. Girod, "Analysis of video transmission over lossy channels," IEEE J. Select. Areas in Commun., vol. 18, pp. 1012-1032, June 2000.

[27] K. H. Yang, A. Jacquin, and N. S. Jayant, "Rate-distortion optimized video coding considering frameskip," in Proc. ICIP, 2001, pp. 534-537.

[28] K. H. Yang, A. Jacquin, and N. S. Jayant, "A normalized rate-distortion model for H.263-compatible codecs and its application to quantizer selection," in Proc. ICIP, 1997, pp. 41-44.

[29] Z. He and S. K. Mitra, "Novel rate-distortion analysis framework for bit rate and picture quality control in DCT visual coding," IEE Proc. Vision, Image and Signal Processing, pp. 398-406, Dec. 2001.

[30] G. J. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Processing Mag., vol. 15, pp. 74-90, Nov. 1998.

[31] T. Chiang and Y.-Q. Zhang, "A new rate control scheme using quadratic rate distortion model," IEEE Trans. Circuits Syst. Video Technol., vol. 7, pp. 246-250, Feb. 1997.

[32] M. Gallant and F. Kossentini, "Low-delay rate control for DCT video coding via ρ-domain source modeling," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 8, pp. 928-940, Aug. 2001.

[33] M. Bystrom and T.
Stockhammer, "Modeling of operational distortion-rate characteristics for joint source-channel coding of video," in Proc. ICIP, Sept. 2000, pp. 359-362.

[34] M. Gallant and F. Kossentini, "Source model for transform video coder and its application—Part I: fundamental theory," IEEE Trans. Circuits Syst. Video Technol., vol. 7, pp. 287-298, Apr. 1997.

[35] Y. Shoham and A. Gersho, "Efficient bit allocation for an arbitrary set of quantizers," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 36, no. 9, pp. 1445-1453, Sept. 1988.

[36] H. S. Jung, R.-C. Kim, and S.-U. Lee, "A hierarchical synchronization technique based on the EREC for robust transmission of H.263 bit stream," IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 3, pp. 433-438, Apr. 2000.

[37] ISO/IEC, "Overview of the MPEG-4 standard," JTC1/SC29/WG11 N4030, Mar. 2001.

[38] Q. Zhang, W. Zhu, and Y.-Q. Zhang, "Network-adaptive rate control with TCP-friendly protocol for multiple video objects," in Proceedings of International Conference on Multimedia and Expo, 2000.

[39] Y. Wang and Q.-F. Zhu, "Error control and concealment for video communication: a review," Proc. IEEE, May 2000.

[40] Y. Wang, S. Wenger, J. Wen, and A. K. Katsaggelos, "Error resilient video coding techniques," IEEE Signal Processing Mag., vol. 17, no. 4, pp. 61-82, July 2000.

[41] S. Aign and K. Fazel, "Temporal and spatial error concealment techniques for hierarchical MPEG-2 video codec," in Proc. ICC, Jan. 1995, pp. 1778-1783.

[42] B. Girod and N. Farber, "Feedback-based error control for mobile video transmission," Proc. IEEE, vol. 87, no. 10, pp. 1707-1723, Oct. 1999.

[43] P.-R. Chang and C.-F. Lin, "Design of spread spectrum multicode CDMA transport architecture for multimedia services," IEEE J. Select. Areas in Commun., vol. 18, no. 1, pp. 99-111, Jan. 2000.

[44] B. Deep and W.-C.
Feng, “Adaptive code allocation in multicode-CDMA for transmitting H.263 video,” in Wireless Communications and Networking Conference, Sept. 1999, vol. 2, pp. 1003-1007. [45] T. S. Rappaport, Wireless Communications Principles & Practices, Prentice Hall, 1996. [46] J. Hagenauer, “Rate-compatible punctured convolutional codes (RCPC codes) and their applications,” IEEE Trans. Commun., vol. 36, no. 4, pp. 389-400, Apr. 1988. [47] A. J. Viterbi, “Convolutional codes and their performance in communication systems,” IEEE Trans. Commun., vol. 19, no. 5, pp. 751-772, Oct. 1971. [48] S. H. Choi and S. W. Kim, “Optimum bandwidth expansion for DS/SSMA communication over a multipath Rayleigh fading channel,” in IEEE International Conference on Communications, 1995, vol. 3, pp. 1705-1709. [49] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley and Sons, 1991. [50] J. Shin, J. W. Kim, and C.-C. J. Kuo, “Quality-of-service mapping mechanism for packet video in differentiated services network,” IEEE Trans. Multimedia, vol. 3, no. 2, pp. 219-231, June 2001. [51] J. Ribas-Corbera and S. Lei, “Rate control in DCT video coding for low-delay communications,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 1, pp. 172-185, Feb. 1999. [52] E. Y. Lam and J. W. Goodman, “A mathematical analysis of the DCT coefficient distributions for images,” IEEE Trans. Image Processing, vol. 9, no. 10, pp. 1661-1666, Oct. 2000. [53] B. W. Wah, X. Su, and D. Lin, “A survey of error-concealment schemes for real-time audio and video transmissions over the internet,” in Proc. Int. Symp. Multimedia Software Engineering, Dec. 2000, pp. 17-24. [54] P. Cuenca, L. Orozco-Barbosa, A. Garrido, F. Quiles, and T. Olivares, “A survey of error concealment schemes for MPEG-2 video communications over ATM networks,” in Proc. IEEE 1997 Canadian Conf.
Electrical and Computer Engineering, May 1997, vol. 1, pp. 25-28. [55] A. Raman and M. Babu, “A low complexity error concealment scheme for MPEG-4 coded video sequences,” in Proc. Tenth Annual Symp. Multimedia Communications and Signal Processing, Bangalore, India, Nov. 2001. [56] P. Salama, N. B. Shroff, and E. J. Delp, “Deterministic spatial approach,” in Signal Recovery Techniques for Image and Video Compression and Transmission, A. Katsaggelos and N. Galatsanos, Eds., pp. 212-216. Kluwer Academic Publishers, June 1998. [57] Y. Wang and Q.-F. Zhu, “Signal loss recovery in DCT-based image and video codecs,” in Proc. SPIE VCIP-91, Boston, MA, Nov. 1991, pp. 667-678. [58] X. Lee, Y.-Q. Zhang, and A. Leon-Garcia, “Image and video reconstruction using fuzzy logic,” in Proc. IEEE Global Telecommunications Conf., Dec. 1993, vol. 2, pp. 975-979. [59] X. Lee, Y.-Q. Zhang, and A. Leon-Garcia, “Information loss recovery for block-based image coding techniques - a fuzzy logic approach,” IEEE Trans. Image Processing, vol. 4, no. 3, pp. 259-273, Mar. 1995. [60] H. Sun and W. Kwok, “Concealment of damaged block transform coded image using projections onto convex sets,” IEEE Trans. Image Processing, vol. 4, no. 4, pp. 470-477, Apr. 1995. [61] S. Belfiore, L. Crisa, M. Grangetto, E. Magli, and G. Olmo, “Robust and edge-preserving video error concealment by coarse-to-fine block replenishment,” in Proc. ICASSP, May 2002, vol. 4, pp. 3281-3284. [62] W. Zeng and B. Liu, “Geometric-structure-based error concealment with novel applications in block-based low-bit-rate coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 4, pp. 648-665, June 1999. [63] W. M. Lam, A. R. Reibman, and B. Liu, “Recovery of lost or erroneously received motion vectors,” in Proc. ICASSP, Apr. 1993, vol. 5, pp. 417-420. [64] M.-J. Chen, L.-G. Chen, and R.-M.
Weng, “Error concealment of lost motion vectors with overlapped motion compensation,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 3, pp. 560-563, June 1997. [65] J. Zhang, J. F. Arnold, and M. R. Frater, “A cell-loss concealment technique for MPEG-2 coded video,” IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 4, pp. 659-665, June 2000. [66] S. Tsekeridou and I. Pitas, “MPEG-2 error concealment based on block-matching principles,” IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 4, pp. 646-658, June 2000. [67] C. Li, J. Lu, J. Gu, and M. L. Liou, “Error resilience schemes for digital terrestrial TV broadcasting system,” in Proc. IEEE Workshop on Signal Processing Systems, Sept. 2001, pp. 247-258. [68] S.-H. Lee, D.-H. Choi, and C.-S. Hwang, “Error concealment using affine transform for H.263 coded video transmissions,” Elec. Letters, vol. 37, no. 4, pp. 218-220, Feb. 2001. [69] Q.-F. Zhu, Y. Wang, and L. Shaw, “Coding and cell-loss recovery in DCT-based packet video,” IEEE Trans. Circuits Syst. Video Technol., vol. 3, no. 3, pp. 248-258, June 1993. [70] P. Salama, N. B. Shroff, and E. J. Delp, “Statistical spatial approach: MAP estimation,” in Signal Recovery Techniques for Image and Video Compression and Transmission, A. Katsaggelos and N. Galatsanos, Eds., pp. 217-219. Kluwer Academic Publishers, June 1998. [71] S. Shirani, F. Kossentini, and R. Ward, “A concealment method for video communications in an error-prone environment,” IEEE J. Select. Areas in Commun., vol. 18, no. 6, pp. 1122-1128, June 2000. [72] P. Salama, N. B. Shroff, and E. J. Delp, “Error concealment in MPEG video streams over ATM networks,” IEEE J. Select. Areas in Commun., vol. 18, no. 6, pp. 1129-1144, June 2000. [73] D. S. Turaga and T. Chen, “Model-based error concealment for wireless video,” IEEE Trans. Circuits Syst. Video Technol., vol.
12, no. 6, pp. 483-495, June 2002. [74] C.-S. Kim, J.-W. Kim, I. Katsavounidis, and C.-C. J. Kuo, “Robust MMSE video decoding: theory and practical implementations,” submitted to IEEE Trans. Circuits Syst. Video Technol., May 2002 (in revision). [75] S. B. Wicker, Error Control Systems for Digital Communication and Storage, Prentice Hall, 1995. [76] T. Wiegand and B. Girod, Multi-frame Motion-Compensated Prediction for Video Transmission, Kluwer Academic Publishers, 2001. [77] M.-C. Lee, W.-G. Chen, C. B. Lin, C. Gu, T. Markoc, S. I. Zabinsky, and R. Szeliski, “A layered video object coding system using sprite and affine motion model,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, pp. 130-145, Feb. 1997. [78] T. Wiegand, X. Zhang, and B. Girod, “Long-term memory motion-compensated prediction,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 1, pp. 70-84, Feb. 1999. [79] Information technology - Coding of audio-visual objects - Part 2: Visual, 2000, ISO/IEC JTC 1/SC 29/WG11 N3056. [80] Study of Final Committee Draft of Joint Video Specification, 2003, ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC. [81] G. J. Sullivan, “Multi-hypothesis motion compensation for low bit-rate video coding,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr. 1993, vol. 5, pp. 437-440. [82] B. Girod, “Why B-pictures work: a theory of multi-hypothesis motion-compensated prediction,” in IEEE International Conference on Image Processing, Oct. 1998, vol. 2, pp. 213-217. [83] B. Girod, “Efficiency analysis of multihypothesis motion-compensated prediction for video coding,” IEEE Trans. Image Processing, vol. 9, no. 2, pp. 173-183, Feb. 2000. [84] M. T. Orchard and G. J. Sullivan, “Overlapped block motion compensation: an estimation-theoretic approach,” IEEE Trans. Image Processing, vol. 3, no. 5, pp. 693-699, Sept. 1994. [85] M. Flierl, T.
Wiegand, and B. Girod, “Rate-constrained multi-hypothesis motion-compensated prediction for video coding,” in IEEE International Conference on Image Processing, Sept. 2000, vol. 3, pp. 150-153. [86] M. Flierl and B. Girod, “Multihypothesis motion estimation for video coding,” in Data Compression Conference, Mar. 2001, pp. 341-350. [87] T. Wiegand, M. Flierl, and B. Girod, “Entropy-constrained linear vector prediction for motion-compensated video coding,” in IEEE International Symposium on Information Theory, Aug. 1998, vol. 3, p. 409. [88] M. Flierl, T. Wiegand, and B. Girod, “A locally optimal design algorithm for block-based multi-hypothesis motion-compensated prediction,” in Data Compression Conference, Mar.-Apr. 1998, pp. 239-248. [89] M. Flierl, T. Wiegand, and B. Girod, “Rate-constrained multihypothesis prediction for motion-compensated video compression,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 11, pp. 957-969, Nov. 2002. [90] C.-S. Kim, R.-C. Kim, and S.-U. Lee, “Robust transmission of video sequence using double-vector motion compensation,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 9, pp. 1011-1021, Sept. 2001. [91] Y. Wang and S. Lin, “Error-resilient video coding using multiple description motion compensation,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 6, pp. 438-452, June 2002. [92] S. Lin and Y. Wang, “Analysis and improvement of multiple description motion compensation video coding for lossy packet networks,” in IEEE International Conference on Image Processing, Sept. 2002, vol. 2, pp. 185-188. [93] M. E. Al-Mualla, C. N. Canagarajah, and D. R. Bull, “Multiple-reference temporal error concealment,” in IEEE International Symposium on Circuits and Systems, May 2001, vol. 5, pp. 149-152. [94] W.-Y. Kung, C.-S. Kim, and C.-C.
Jay Kuo, “A dynamic error concealment for video transmission over noisy channels,” in IEEE GlobeCom, Nov. 2002, vol. 2, pp. 1769-1773. [95] Y. O. Park, C.-S. Kim, and S.-U. Lee, “Multi-hypothesis error concealment algorithm for H.261 video,” in IEEE International Conference on Image Processing, Sept. 2003, vol. 3, pp. 465-468. [96] S. Lin and Y. Wang, “Error resilience property of multihypothesis motion-compensated prediction,” in IEEE International Conference on Image Processing, June 2002, pp. 545-548. [97] N. Farber, K. Stuhlmuller, and B. Girod, “Analysis of error propagation in hybrid video coding with application to error resilience,” in IEEE International Conference on Image Processing, Oct. 1999, pp. 550-554. [98] G. J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74-90, Nov. 1998. [99] W.-Y. Kung, C.-S. Kim, and C.-C. Jay Kuo, “Spatial and temporal error concealment techniques for video transmission over noisy channels,” submitted to IEEE Trans. Circuits Syst. Video Technol., 2003. [100] W.-Y. Kung, C.-S. Kim, and C.-C. Jay Kuo, “Packet video transmission over wireless channels with adaptive channel rate allocation,” submitted to Journal of Visual Communication and Image Representation Special Issue on Visual Communication in the Ubiquitous Era, 2004. [101] W.-Y. Kung, C.-S. Kim, and C.-C. Jay Kuo, “Analysis of multi-hypothesis motion compensated prediction (MHMCP) for robust visual communication,” submitted to IEEE CSVT Special Issue on Analysis and Understanding for Video Adaptation Conferences, 2004. [102] X. Zhou, W.-Y. Kung, and C.-C. Jay Kuo, “A high-performance error resilient video encoding scheme for H.264,” submitted to International Packet Video Workshop, Irvine, CA, Dec. 2004. [103] W.-Y. Kung, C.-S. Kim, and C.-C. J.
Kuo, “Multi-hypothesis motion compensated prediction (MHMCP) for error-resilient visual communication,” invited paper, in 2004 IEEE International Symposium on Intelligent Multimedia, Video and Speech Processing, The Hong Kong Polytechnic University, Hong Kong, Oct. 2004. [104] W.-Y. Kung, C.-S. Kim, and C.-C. J. Kuo, “Error resilience analysis of multi-hypothesis motion compensated prediction for video coding,” in IEEE International Conference on Image Processing, Singapore, Oct. 2004. [105] W.-Y. Kung, C.-S. Kim, and C.-C. J. Kuo, “Design and analysis of multi-hypothesis motion compensated prediction (MHMCP) codec for error resilient visual communications,” in Proc. of ITCom, Philadelphia, PA, Oct. 2004. [106] W.-Y. Kung, H.-S. Kong, A. Vetro, and H. Sun, “Error resilient methods for real-time MPEG-4 video streaming,” in IEEE International Symposium on Circuits and Systems, Vancouver, Canada, May 2004. [107] W.-Y. Kung, C.-S. Kim, and C.-C. J. Kuo, “Error resilient video transmission with multi-hypothesis motion compensated prediction,” in IEEE International Symposium on Circuits and Systems, Vancouver, Canada, May 2004. [108] W.-Y. Kung, C.-S. Kim, and C.-C. J. Kuo, “Analysis of multi-hypothesis motion compensated prediction for error resilient video transmission,” in Proc. of VCIP, San Jose, CA, Jan. 2004. [109] W.-Y. Kung, C.-S. Kim, and C.-C. J. Kuo, “Error resilient video transmission with multi-hypothesis motion compensated prediction,” in Proc. of ITCom, Orlando, Florida, Sept. 2003. [110] W.-Y. Kung, C.-S. Kim, and C.-C. J. Kuo, “Edge recovery and selective directional interpolation for spatial error concealment,” in IEEE ICASSP 2003, Hong Kong (IEEE ICME, Baltimore, MD), Apr. 2003. [111] W.-Y. Kung, C.-S. Kim, and C.-C. Jay Kuo, “A dynamic error concealment for video transmission over noisy channels,” in Proceedings of ITCom, Aug. 2002. [112] W.-Y. Kung, C.-S. Kim, and C.-C. J.
Kuo, “Adaptive channel coding for robust video transmission,” in IEEE International Symposium on Circuits and Systems, May 2002. [113] W.-Y. Kung, C.-S. Kim, and C.-C. Jay Kuo, “Robust video transmission using adaptive bit allocation,” in Proceedings of Visual Communications and Image Processing Conference 2002, Jan. 2002. [114] W.-Y. Kung, C.-S. Kim, and C.-C. J. Kuo, “Unequal error protection for packet video transmission over wireless CDMA channel,” in Proceedings of ITCom, Aug. 2001. [115] W.-Y. Kung, C.-S. Kim, and C.-C. J. Kuo, “Video transmission over CDMA systems with unequal error protection,” in Proceedings of International Conference on Third Generation Wireless and Beyond (3Gwireless), May 2001. [116] W.-Y. Kung, C.-S. Kim, and C.-C. J. Kuo, “An efficient channel code assignment scheme for multiple users in CDMA systems,” in Proceedings of SPIE, Nov. 2000. Bibliography Video coding for low bitrate communication. ITU-T Recommendation H.263, 1998. Information technology - Coding of audio-visual objects - Part 2: Visual, 2000. ISO/IEC JTC 1/SC 29/WG11 N3056. Study of Final Committee Draft of Joint Video Specification. ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC, 2000. Aign, S., and Fazel, K. Temporal and spatial error concealment techniques for hierarchical MPEG-2 video codec. In Proc. ICC (Jan. 1995), pp. 1778-1783. Al-Mualla, M. E., Canagarajah, C. N., and Bull, D. R. Multiple-reference temporal error concealment. In IEEE International Symposium on Circuits and Systems (May 2001), vol. 5, pp. 149-152. Belfiore, S., Crisa, L., Grangetto, M., Magli, E., and Olmo, G. Robust and edge-preserving video error concealment by coarse-to-fine block replenishment. In Proc. ICASSP (May 2002), vol. 4, pp. 3281-3284. Bystrom, M., and Stockhammer, T.
Modeling of operational distortion-rate characteristics for joint source-channel coding of video. In Proc. ICIP (Sept. 2000), pp. 359-362. Chang, P.-R., and Lin, C.-F. Design of spread spectrum multicode CDMA transport architecture for multimedia services. IEEE J. Select. Areas in Commun. 18, 1 (Jan. 2000), 99-111. Chen, M.-J., Chen, L.-G., and Weng, R.-M. Error concealment of lost motion vectors with overlapped motion compensation. IEEE Trans. Circuits Syst. Video Technol. 7, 3 (June 1997), 560-563. Cheung, G., and Zakhor, A. Bit allocation for joint source/channel coding of scalable video. IEEE Trans. Image Processing 9, 3 (Mar. 2000), 340-356. Cheung, T. T., So, M., Cheng, R. S., and Letaief, K. Adaptive unequal error protection and VLC reshuffling for image transmission over wireless channels. In Vehicular Technology Conference Proceedings, 2000. VTC 2000-Spring Tokyo. 2000 IEEE 51st (May 2000), vol. 2, pp. 800-804. Chiang, T., and Zhang, Y.-Q. A new rate control scheme using quadratic rate distortion model. IEEE Trans. Circuits Syst. Video Technol. 7 (Feb. 1997), 246-250. Choi, S. H., and Kim, S. W. Optimum bandwidth expansion for DS/SSMA communication over a multipath Rayleigh fading channel. In IEEE International Conference on Communications (1995), vol. 3, pp. 1705-1709. Cover, T. M., and Thomas, J. A. Elements of Information Theory. John Wiley and Sons, 1991. Cuenca, P., Orozco-Barbosa, L., Garrido, A., Quiles, F., and Olivares, T. A survey of error concealment schemes for MPEG-2 video communications over ATM networks. In Proc. IEEE 1997 Canadian Conf. Electrical and Computer Engineering (May 1997), vol. 1, pp. 25-28. Deep, B., and Feng, W.-C. Adaptive code allocation in multicode-CDMA for transmitting H.263 video. In Wireless Communications and Networking Conference (Sept. 1999), vol. 2, pp. 1003-1007.
Farber, N., Stuhlmuller, K., and Girod, B. Analysis of error propagation in hybrid video coding with application to error resilience. In IEEE International Conference on Image Processing (Oct. 1999), pp. 550-554. Flierl, M., and Girod, B. Multihypothesis motion estimation for video coding. In Data Compression Conference (Mar. 2001), pp. 341-350. Flierl, M., Wiegand, T., and Girod, B. A locally optimal design algorithm for block-based multi-hypothesis motion-compensated prediction. In Data Compression Conference (Mar.-Apr. 1998), pp. 239-248. Flierl, M., Wiegand, T., and Girod, B. Rate-constrained multi-hypothesis motion-compensated prediction for video coding. In IEEE International Conference on Image Processing (Sept. 2000), vol. 3, pp. 150-153. Flierl, M., Wiegand, T., and Girod, B. Rate-constrained multi-hypothesis prediction for motion-compensated video compression. IEEE Trans. Circuits Syst. Video Technol. 12, 11 (Nov. 2002), 957-969. Gallant, M., and Kossentini, F. Source model for transform video coder and its application - Part I: fundamental theory. IEEE Trans. Circuits Syst. Video Technol. 7 (April 1997), 287-298. Gallant, M., and Kossentini, F. Low-delay rate control for DCT video coding via ρ-domain source modeling. IEEE Trans. Circuits Syst. Video Technol. 11, 8 (Aug. 2001), 928-940. Gallant, M., and Kossentini, F. Rate-distortion optimized layered coding with unequal error protection for robust internet video. IEEE Trans. Circuits Syst. Video Technol. 11, 3 (Mar. 2001), 357-372. Gharavi, H. Pilot-assisted 16-level QAM for wireless video. IEEE Trans. Circuits Syst. Video Technol. 12 (Feb. 2002), 77-89. Gharavi, H., and Alamouti, S. M. Multipriority video transmission for third-generation wireless communication systems. Proc. IEEE 87, 10 (Oct. 1999), 1751-1763. Girod, B.
Why B-pictures work: a theory of multi-hypothesis motion-compensated prediction. In IEEE International Conference on Image Processing (Oct. 1998), vol. 2, pp. 213-217. Girod, B. Efficiency analysis of multihypothesis motion-compensated prediction for video coding. IEEE Trans. Image Processing 9, 2 (Feb. 2000), 173-183. Girod, B., and Farber, N. Feedback-based error control for mobile video transmission. Proc. IEEE 87, 10 (Oct. 1999), 1707-1723. Goyal, V. K., Kelner, J. A., and Kovacevic, J. Multiple description vector quantization with a coarse lattice. IEEE Trans. Information Theory 48 (March 2002), 781-788. Hagenauer, J. Rate-compatible punctured convolutional codes (RCPC codes) and their applications. IEEE Trans. Commun. 36, 4 (Apr. 1988), 389-400. He, Z., and Mitra, S. K. Novel rate-distortion analysis framework for bit rate and picture quality control in DCT visual coding. IEE Proc. Vision, Image and Signal Processing (Dec. 2001), 398-406. Heinzelman, W. R., Budagavi, M., and Talluri, R. Unequal error protection of MPEG-4 compressed video. In Proc. ICIP (Oct. 1999), vol. 2, pp. 530-534. Hochwald, B., and Zeger, K. Tradeoff between source and channel coding. IEEE Trans. Information Theory 43, 5 (Sept. 1997), 1412-1424. Hsu, C.-Y., Ortega, A., and Khansari, M. Rate control for robust video transmission over burst-error wireless channels. IEEE J. Select. Areas in Commun. 17 (May 1999), 1-18. ISO/IEC. Overview of the MPEG-4 standard. JTC1/SC29/WG11 N4030 (Mar. 2001). Ji, Z., Zhang, Q., Zhu, W., and Zhang, Y.-Q. End-to-end power-optimized video communication over wireless channels. In Proc. IEEE Workshop on Multimedia Signal Processing (Oct. 2001), pp. 447-452. Jung, H. S., Kim, R.-C., and Lee, S.-U. A hierarchical synchronization technique based on the EREC for robust transmission of H.263 bit stream. IEEE Trans.
Circuits Syst. Video Technol. 10, 3 (Apr. 2000), 433-438. Kim, C.-S., Kim, J.-W., Katsavounidis, I., and Kuo, C.-C. J. Robust MMSE video decoding: theory and practical implementations. Submitted to IEEE Trans. Circuits Syst. Video Technol., May 2002 (in revision). Kim, C.-S., Kim, R.-C., and Lee, S.-U. Robust transmission of video sequence over noisy channel using parity-check motion vector. IEEE Trans. Circuits Syst. Video Technol. 9 (Oct. 1999), 1063-1074. Kim, C.-S., Kim, R.-C., and Lee, S.-U. Robust transmission of video sequence using double-vector motion compensation. IEEE Trans. Circuits Syst. Video Technol. 11, 9 (Sept. 2001), 1011-1021. Kondi, L. P., Batalama, S. N., Pados, D. A., and Katsaggelos, A. K. Joint source-channel coding for scalable video over DS-CDMA multipath fading channels. In Proc. ICIP (2001), vol. 1, pp. 994-997. Kondi, L. P., and Katsaggelos, A. K. Joint source-channel coding for scalable video using models of rate-distortion functions. In Proc. ICASSP (2001), vol. 3, pp. 1377-1380. Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. Analysis of multi-hypothesis motion compensated prediction (MHMCP) for robust visual communication. Submitted to IEEE Trans. Circuits Syst. Video Technol., 2003. Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. Packet video transmission over wireless channels with adaptive channel rate allocation. Submitted to Journal of Visual Communication and Image Representation Special Issue on Visual Communication in the Ubiquitous Era, 2004. Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. Spatial and temporal error concealment techniques for video transmission over noisy channels. Submitted to IEEE CSVT Special Issue on Analysis and Understanding for Video Adaptation Conferences, 2004. Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. An efficient channel code assignment scheme for multiple users in CDMA systems.
In Proceedings of SPIE (Nov. 2000). Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. Unequal error protection for packet video transmission over wireless CDMA channel. In Proceedings of ITCom (Aug. 2001). Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. Video transmission over CDMA systems with unequal error protection. In Proceedings of International Conference on Third Generation Wireless and Beyond (3Gwireless) (May 2001). Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. Adaptive channel coding for robust video transmission. In IEEE International Symposium on Circuits and Systems (May 2002). Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. A dynamic error concealment for video transmission over noisy channels. In IEEE GlobeCom (Nov. 2002), vol. 2, pp. 1769-1773. Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. A dynamic error concealment for video transmission over noisy channels. In Proceedings of ITCom (Aug. 2002). Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. Robust video transmission using adaptive bit allocation. In Proceedings of Visual Communications and Image Processing Conference 2002 (Jan. 2002). Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. Edge recovery and selective directional interpolation for spatial error concealment. In IEEE ICASSP 2003, Hong Kong (IEEE ICME, Baltimore, MD) (Apr. 2003). Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. Error resilient video transmission with multi-hypothesis motion compensated prediction. In Proc. of ITCom, Orlando, Florida (Sept. 2003). Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. Analysis of multi-hypothesis motion compensated prediction for error resilient video transmission. In Proc. of VCIP, San Jose, CA (Jan. 2004).
Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. Design and analysis of multi-hypothesis motion compensated prediction (MHMCP) codec for error resilient visual communications. In Proc. of ITCom, Philadelphia, PA (Oct. 2004). Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. Error resilience analysis of multi-hypothesis motion compensated prediction for video coding. In IEEE International Conference on Image Processing, Singapore (Oct. 2004). Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. Error resilient video transmission with multi-hypothesis motion compensated prediction. In IEEE International Symposium on Circuits and Systems, Vancouver, Canada (May 2004). Kung, W.-Y., Kim, C.-S., and Kuo, C.-C. J. Multi-hypothesis motion compensated prediction (MHMCP) for error-resilient visual communication. Invited paper, in 2004 IEEE International Symposium on Intelligent Multimedia, Video and Speech Processing, The Hong Kong Polytechnic University, Hong Kong (Oct. 2004). Kung, W.-Y., Kong, H.-S., Vetro, A., and Sun, H. Error resilient methods for real-time MPEG-4 video streaming. In IEEE International Symposium on Circuits and Systems, Vancouver, Canada (May 2004). Lam, E. Y., and Goodman, J. W. A mathematical analysis of the DCT coefficient distributions for images. IEEE Trans. Image Processing 9, 10 (Oct. 2000), 1661-1666. Lam, W. M., Reibman, A. R., and Liu, B. Recovery of lost or erroneously received motion vectors. In Proc. ICASSP (Apr. 1993), vol. 5, pp. 417-420. Lee, M.-C., Chen, W.-G., Lin, C. B., Gu, C., Markoc, T., Zabinsky, S. I., and Szeliski, R. A layered video object coding system using sprite and affine motion model. IEEE Trans. Circuits Syst. Video Technol. 7, 1 (Feb. 1997), 130-145. Lee, S.-H., Choi, D.-H., and Hwang, C.-S. Error concealment using affine transform for H.263 coded video transmissions. Elec. Letters 37, 4 (Feb. 2001), 218-220. Lee, X., Zhang, Y.-Q., and Leon-Garcia, A. Image and video reconstruction using fuzzy logic. In Proc. IEEE Global Telecommunications Conf. (Dec. 1993), vol. 2, pp. 975-979. Lee, X., Zhang, Y.-Q., and Leon-Garcia, A. Information loss recovery for block-based image coding techniques - a fuzzy logic approach. IEEE Trans.
Image Processing 4, 3 (Mar. 1995), 259-273. Li, C., Lu, J., Gu, J., and Liou, M. L. Error resilience schemes for digital terrestrial TV broadcasting system. In Proc. IEEE Workshop on Signal Processing Systems (Sept. 2001), pp. 247-258. Li, H., and Chen, C. W. Robust image transmission with bidirectional synchronization and hierarchical error correction. IEEE Trans. Circuits Syst. Video Technol. 11 (Nov. 2001), 1183-1187. Lin, S., and Wang, Y. Analysis and improvement of multiple description motion compensation video coding for lossy packet networks. In IEEE International Conference on Image Processing (Sept. 2002), vol. 2, pp. 185-188. Lin, S., and Wang, Y. Error resilience property of multihypothesis motion-compensated prediction. In IEEE International Conference on Image Processing (June 2002), pp. 545-548. Orchard, M. T., and Sullivan, G. J. Overlapped block motion compensation: an estimation-theoretic approach. IEEE Trans. Image Processing 3, 5 (Sept. 1994), 693-699. Ortega, A., and Ramchandran, K. Rate-distortion methods for image and video compression. IEEE Signal Processing Mag. 15, 6 (Nov. 1998), 23-50. Park, Y. O., Kim, C.-S., and Lee, S.-U. Multi-hypothesis error concealment algorithm for H.261 video. In IEEE International Conference on Image Processing (Sept. 2003), vol. 3, pp. 465-468. Raman, A., and Babu, M. A low complexity error concealment scheme for MPEG-4 coded video sequences. In Proc. Tenth Annual Symp. Multimedia Communications and Signal Processing, Bangalore, India (Nov. 2001). Rappaport, T. S. Wireless Communications Principles and Practices. Prentice Hall, 1996. Reibman, A. R., Jafarkhani, H., Wang, Y., Orchard, M. T., and Puri, R. Multiple-description video coding using motion-compensated temporal prediction. IEEE Trans. Circuits Syst. Video Technol. 12 (Mar. 2002), 193-204. Ribas-Corbera, J., and Lei, S.
Rate control in DCT video coding for low-delay communications. IEEE Trans. Circuits Syst. Video Technol. 9, 1 (Feb. 1999), 172-185. Salama, P., Shroff, N. B., and Delp, E. J. Deterministic spatial approach. In Signal Recovery Techniques for Image and Video Compression and Transmission, A. Katsaggelos and N. Galatsanos, Eds. Kluwer Academic Publishers, June 1998, pp. 212-216. Salama, P., Shroff, N. B., and Delp, E. J. Statistical spatial approach: MAP estimation. In Signal Recovery Techniques for Image and Video Compression and Transmission, A. Katsaggelos and N. Galatsanos, Eds. Kluwer Academic Publishers, June 1998, pp. 217-219. Salama, P., Shroff, N. B., and Delp, E. J. Error concealment in MPEG video streams over ATM networks. IEEE J. Select. Areas in Commun. 18, 6 (June 2000), 1129-1144. Shankar, B. M. R., and Makur, A. Allpass delay chain-based IIR PR filterbank and its application to multiple description subband coding. IEEE Trans. Signal Processing 50 (Apr. 2002), 814-823. Shin, J., Kim, J. W., and Kuo, C.-C. J. Quality-of-service mapping mechanism for packet video in differentiated services network. IEEE Trans. Multimedia 3, 2 (June 2001), 219-231. Shirani, S., Kossentini, F., and Ward, R. A concealment method for video communications in an error-prone environment. IEEE J. Select. Areas in Commun. 18, 6 (June 2000), 1122-1128. Shoham, Y., and Gersho, A. Efficient bit allocation for an arbitrary set of quantizers. IEEE Transactions on Acoustics, Speech and Signal Processing 36, 9 (Sept. 1988), 1445-1453. Stuhlmuller, K., Farber, N., and Girod, B. Trade-off between source and channel coding for video transmission. In Proc. ICIP (2000), vol. 1, pp. 399-402. Stuhlmuller, K., Farber, N., Link, M., and Girod, B. Analysis of video transmission over lossy channels. IEEE J. Select. Areas in Commun. 18 (June 2000), 1012-1032. Sullivan, G. J.
Multi-hypothesis motion compensation for low bit-rate video coding. In IEEE International Conference on Acoustics, Speech, and Signal Processing (Apr. 1993), vol. 5, pp. 437-440. Sullivan, G. J., and Wiegand, T. Rate-distortion optimization for video compression. IEEE Signal Processing Mag. 15 (Nov. 1998), 74-90. Sullivan, G. J., and Wiegand, T. Rate-distortion optimization for video compression. IEEE Signal Processing Magazine 15, 6 (Nov. 1998), 74-90. Sun, H., and Kwok, W. Concealment of damaged block transform coded image using projections onto convex sets. IEEE Trans. Image Processing 4, 4 (Apr. 1995), 470-477. Tsekeridou, S., and Pitas, I. MPEG-2 error concealment based on block-matching principles. IEEE Trans. Circuits Syst. Video Technol. 10, 4 (June 2000), 646-658. Turaga, D. S., and Chen, T. Model-based error concealment for wireless video. IEEE Trans. Circuits Syst. Video Technol. 12, 6 (June 2002), 483-495. Viterbi, A. J. Convolutional codes and their performance in communication systems. IEEE Trans. Commun. 19, 5 (Oct. 1971), 751-772. Wah, B. W., Su, X., and Lin, D. A survey of error-concealment schemes for real-time audio and video transmissions over the internet. In Proc. Int. Symp. Multimedia Software Engineering (Dec. 2000), pp. 17-24. Wang, Y., and Lin, S. Error-resilient video coding using multiple description motion compensation. IEEE Trans. Circuits Syst. Video Technol. 12, 6 (June 2002), 438-452. Wang, Y., Orchard, M. T., Vaishampayan, V., and Reibman, A. R. Multiple description coding using pairwise correlating transforms. IEEE Trans. Image Processing 10 (March 2001), 351-366. Wang, Y., Wenger, S., Wen, J., and Katsaggelos, A. K. Error resilient video coding techniques. IEEE Signal Processing Mag. 17, 4 (July 2000), 61-82. Wang, Y., and Zhu, Q.-F.
Signal loss recovery in DCT-based image and video codecs. In Proc. SPIE VCIP-91, Boston, MA (Nov. 1991), pp. 667-678.

Wang, Y., and Zhu, Q.-F. Error control and concealment for video communication: a review. Proc. IEEE (May 2000).

Wei, L.-F. Coded modulation with unequal error protection. IEEE Trans. Commun. 41, 10 (Oct. 1993), 1439-1449.

Wicker, S. B. Error Control Systems for Digital Communication and Storage. Prentice Hall, 1995.

Wiegand, T., Flierl, M., and Girod, B. Entropy-constrained linear vector prediction for motion-compensated video coding. In IEEE International Symposium on Information Theory (Aug. 1998), vol. 3, p. 409.

Wiegand, T., and Girod, B. Multi-frame Motion-compensated Prediction for Video Transmission. Kluwer Academic Publishers, 2001.

Wiegand, T., Zhang, X., and Girod, B. Long-term memory motion-compensated prediction. IEEE Trans. Circuits Syst. Video Technol. 9, 1 (Feb. 1999), 70-84.

Yang, K. H., Jacquin, A., and Jayant, N. S. A normalized rate-distortion model for H.263-compatible codecs and its application to quantizer selection. In Proc. ICIP (1997), pp. 41-44.

Yang, K. H., Jacquin, A., and Jayant, N. S. Rate-distortion optimized video coding considering frameskip. In Proc. ICIP (2001), pp. 534-537.

Zeng, W., and Liu, B. Geometric-structure-based error concealment with novel applications in block-based low-bit-rate coding. IEEE Trans. Circuits Syst. Video Technol. 9, 4 (June 1999), 648-665.

Zhang, J., Arnold, J. F., and Frater, M. R. A cell-loss concealment technique for MPEG-2 coded video. IEEE Trans. Circuits Syst. Video Technol. 10, 4 (June 2000), 659-665.

Zhang, Q., Zhu, W., Ji, Z., and Zhang, Y.-Q. A power-optimized joint source channel coding for scalable video streaming over wireless channel. In Proc. IEEE International Symposium on Circuits and Systems (May 2001), vol.
5, pp. 137-140.

Zhang, Q., Zhu, W., and Zhang, Y.-Q. Network-adaptive rate control with TCP-friendly protocol for multiple video objects. In Proc. International Conference on Multimedia and Expo (2000).

Zhang, Q., Zhu, W., and Zhang, Y.-Q. Resource allocation for multimedia streaming over the Internet. IEEE Trans. Multimedia 3 (Sept. 2001), 339-355.

Zheng, H., and Liu, K. J. R. Robust image and video transmission over spectrally shaped channels using multicarrier modulation. IEEE Trans. Multimedia 1, 1 (Mar. 1999), 88-103.

Zhou, X., Kung, W.-Y., and Kuo, C.-C. J. A high-performance error resilient video encoding scheme for H.264. Submitted to International Packet Video Workshop, Irvine, CA (Dec. 2004).

Zhu, Q.-F., Wang, Y., and Shaw, L. Coding and cell-loss recovery in DCT-based packet video. IEEE Trans. Circuits Syst. Video Technol. 3, 3 (June 1993), 248-258.

Zhu, W., Zhang, Q., and Zhang, Y.-Q. Network-adaptive rate control with unequal loss protection for scalable video over Internet. In Proc. ISCAS (May 2001), vol. 5, pp. 109-112.
Asset Metadata
Creator: Kung, Wei-Ying (author)
Core Title: Error resilient techniques for robust video transmission
School: Graduate School
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: engineering, electronics and electrical, OAI-PMH Harvest
Language: English
Contributor: Digitized by ProQuest (provenance)
Advisor: Kuo, C.-C. Jay (committee chair), Ortega, Antonio (committee member), Zimmermann, Roger (committee member)
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c16-410275
Unique identifier: UC11335863
Identifier: 3145223.pdf (filename), usctheses-c16-410275 (legacy record id)
Legacy Identifier: 3145223.pdf
Dmrecord: 410275
Document Type: Dissertation
Rights: Kung, Wei-Ying
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA