Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
H.264/AVC decoder complexity modeling and its applications
(USC Thesis Other)
H.264/AVC decoder complexity modeling and its applications
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
H.264/AVC DECODER COMPLEXITY MODELING AND ITS APPLICATIONS by Szu-Wei Lee A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING) August 2008 Copyright 2008 Szu-Wei Lee Dedication This dissertation is dedicated to my beloved family. To my mother, Hui-Fang Chen, you give me hope and make me have no fear anytime. To my grandmother and grandfather, you support me all my life. I will always be indebted and grateful to you. ii Acknowledgments I would like to thank Prof. C.-C. Jay Kuo for giving me the chance to study in USC and constructive suggestions when I was struggling with difficult research problems. iii Table of Contents Dedication ii Acknowledgments iii List Of Tables vii List Of Figures ix Abstract xiii Chapter 1: Introduction 1 1.1 Significance of the Research . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Research Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2.1 Introduction to H.264/AVC Video Coding Standard . . . . . . . . 6 1.2.2 Mode Decision with Rate-Distortion Optimization (RDO) . . . . . 8 1.3 Contribution of the Research . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.4 Organization of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . 12 Chapter 2: Complexity Modeling of Spatial and Temporal Compensations in H.264/AVC Decoders and Its Applications 13 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2 OverviewofH.264InterandIntraPredictionsandModeDecisionAlgorithms 16 2.2.1 H.264 Inter- and Intra-Predictions . . . . . . . . . . . . . . . . . . 16 2.2.2 Categories of Mode Decision Algorithms . . . . . . . . . . . . . . . 19 2.3 Complexity Model for Motion Compensation Process (MCP) . . . . . . . 24 2.3.1 Linear Complexity Model . . . . . . . . . . . . . . . . . . . . . . . 24 2.3.2 Computation of Cache Misses . . . . . . . . . . . . . . . . . . . . . 28 2.3.3 Computation of Other Quantities . . . . . . . . . . . . . . . . . . . 31 2.3.4 Decoding Complexity versus Frame Size . . . . . . . . . . . . . . . 31 2.3.5 Multiple Reference Frames . . . . . . . . . . . . . . . . . . . . . . 33 2.3.6 Determination of Model Parameters . . . . . . . . . . . . . . . . . 37 2.4 Complexity Model for Spatial Compensation Process (SCP) . . . . . . . . 40 2.5 Applications of MCP and SCP Models . . . . . . . . . . . . . . . . . . . . 43 2.5.1 RDO Process with Decoding Complexity Model. . . . . . . . . . . 43 2.5.2 Decoding Complexity Control . . . . . . . . . . . . . . . . . . . . . 46 iv 2.6 Experimental Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 Chapter 3: CABAC Decoding Complexity Modeling and Its Applications 65 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 3.2 Proposed CABAC Decoding Complexity Model . . . . . . . . . . . . . . . 68 3.2.1 Background Review of CABAC Encoding and Decoding Processes 69 3.2.2 CABAC Decoding Complexity Model . . . . . . . . . . . . . . . . 72 3.3 Decoder-friendly H.264 System Design . . . . . . . . . . . . . . . . . . . . 75 3.3.1 Rate-Distortion and Decoding Complexity (RDC) Optimization. . 75 3.3.2 Relationship Between Bit Rate and Decoding Complexity . . . . . 77 3.3.3 CABAC RDC Optimization and Decoding Complexity Control . . 81 3.4 Experimental Results. . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . 86 3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 Chapter 4: CAVLC/UVLC Decoder Complexity Modeling and Its Appli- cations 91 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 4.2 Decoding Complexity Models for CAVLC and UVLC . . . . . . . . . . . . 93 4.2.1 CAVLC Encoding and Decoding . . . . . . . . . . . . . . . . . . . 93 4.2.2 CAVLC Decoding Complexity Model . . . . . . . . . . . . . . . . . 95 4.2.3 UVLC Encoding and Decoding . . . . . . . . . . . . . . . . . . . . 97 4.2.4 UVLC Decoding Complexity Model . . . . . . . . . . . . . . . . . 98 4.3 Decoder-Friendly H.264/AVC System Design . . . . . . . . . . . . . . . . 100 4.3.1 Rate Distortion and Decoding Complexity (RDC) Optimization Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 4.3.2 Relationship between Bit Rate and Decoding Complexity . . . . . 101 4.3.3 RDC Optimization and Decoding Complexity Control . . . . . . . 106 4.3.4 Decoder-Friendly H.264/AVC Encoding . . . . . . . . . . . . . . . 108 4.4 Experimental Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 Chapter 5: Decoding Complexity Modeling of Deblocking Filters and Its Applications 115 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 5.2 Overview of H.264/AVC Deblocking Filter . . . . . . . . . . . . . . . . . . 117 5.3 H.264/AVC DBF Complexity Model . . . . . . . . . . . . . . . . . . . . . 120 5.3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 5.3.2 DBF Decoding Complexity Model with Fixed Frame Size . . . . . 121 5.3.3 DBF Decoding Complexity Model with Variable Frame Sizes . . . 122 5.4 Decoder-friendly H.264/AVC System Design . . . . . . . . . . . . . . . . . 124 5.4.1 Framework for Rate, Distortion and Decoding Complexity (RDC) Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 5.4.2 Decoding Complexity Control and Its Challenges . . . . . . . . . . 126 5.4.3 Average Complexity Control Method . . . . . . . . . . . . . . . . . 127 5.4.4 Hierarchical Complexity Control Method . . . . . . . . . . . . . . 129 v 5.4.4.1 Slice-layer and BU-layer Complexity Control . . . . . . . 129 5.4.4.2 GOS-based Complexity Control . . . . . . . . . . . . . . 129 5.4.4.3 Lagrangian Multiplier Selection . . . . . . . . . . . . . . 130 5.5 Detailed Implementation of Hierarchical Complexity Control Method . . . 130 5.5.1 Frame-based DBF Process . . . . . . . . . . . . . . . . . . . . . . . 130 5.5.2 Proposed Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 132 5.6 Experimental Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 5.6.1 Model Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 5.6.2 Complexity Control with the Average Method . . . . . . . . . . . 143 5.6.3 Complexity Control with the Hierarchical Method . . . . . . . . . 143 5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 Chapter 6: Conclusion and Future Work 154 6.1 Summary of the Research . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 6.2 Future Research Direction . . . . . . . . . . . . . . . . . . . . . . . . . . . 
158 References 164 vi List Of Tables 2.1 Numbers of memory accesses, memory access sizes and the beginning ad- dress of the first memory access for an MxN block. . . . . . . . . . . . . . 29 2.2 Numbers of the x-direction and the y-direction interpolation filters for an MxN block. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.3 MCP complexities (decoding time in milli-seconds) for bit streams with different distributions of selected reference frames (RFs), where D1 Susie (S) and Football (F) sequences are coded under different bit rate constraints. 35 2.4 Weight training bit streams for the MCP complexity model. . . . . . . . . 39 2.5 SCP decoding complexities per MB for Foreman, Football and Blue sky bit streams. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 2.6 Weights of four prediction directions used by the I16MB intra type and MB chrominance component. . . . . . . . . . . . . . . . . . . . . . . . . . 50 2.7 Weights of nine prediction directions used by the I8MB and I4MB intra types. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 2.8 ComparisonsofactualandestimatedSCPdecodingcomplexities(decoding time in milli-seconds) for Blue sky, Toy and calendar, Sunflower and Rush hour sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 2.9 Comparison of actual and estimated decoding complexities (decoding time inmilli-seconds)forAkiyo,Container,FlowerandForemanCIFsequences, where five reference frames were used in the ME process . . . . . . . . . . 53 2.10 Comparison of actual and estimated decoding complexities (decoding time in milli-seconds) for Mobile, Silent, Stefan and Tempete CIF sequences, where five reference frames were used in the ME process . . . . . . . . . . 54 vii 2.11 Comparison of actual and estimated decoding complexities (decoding time in milli-seconds) for D1 sequences, where five reference frames were used in the ME process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 2.12 Comparisonsofactualandestimateddecodingcomplexities(decodingtime in milli-seconds) for HD sequences, where five reference frames were used in the ME process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 3.1 Estimationerrors(%)betweentheactualandestimatedCABACdecoding complexities for the source and header data for various bit streams, where the top row indicates the bit rate in the unit of Megabits per second (Mbps). 87 4.1 Estimation errors (%) between the actual and estimated CAVLC(source data)andUVLC(headerdata)decodingcomplexitiesforvariousbitstreams.111 5.1 Theboundarystrength(BS)ofanedgedependsoncodingmodes,reference frames, motion vectors and quantized residual data. . . . . . . . . . . . . 119 5.2 Selected training bit streams for CIF sequences, where each bit stream is with {IPPP...PP} picture structure . . . . . . . . . . . . . . . . . . . . . . 138 5.3 Selected training bit streams for D1 sequences, where each bit stream is with {IPPP...PP} picture structure . . . . . . . . . . . . . . . . . . . . . . 139 5.4 Selected training bit streams for HD sequences, where each bit stream is with {IPPP...PP} picture structure . . . . . . . . . . . . . . . . . . . . . . 140 viii List Of Figures 1.1 The concept of decoder-friendly H.264/AVC encoding. . . . . . . . . . . . 4 1.2 The block diagram of H.264/AVC video encoding. . . . . . . . . . . . . . 7 2.1 Intra prediction directions . . . . . . . . . . . . . . . . . . . . . . . . . . . 
17 2.2 Inter prediction MB types . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.3 Interpolation filter for sub-pel accuracy motion compensation . . . . . . . 19 2.4 Implementationofthey-directioninterpolationfilterusingSIMD2instruc- tions.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.5 Inter-prediction of two consecutive blocks, where the two reference blocks are closer to each other in Case I but far away in Case II. . . . . . . . . . 26 2.6 RDO process and RDO process with integrated decoding complexity model 44 2.7 Weights for the number of cache misses for CIF, D1 and HD sequences, where the x-axis is the entropy of the distributions of selected reference frames. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 2.8 DecodingcomplexitycontrolforCIFbitstreamswithfivereferenceframes: Flower (row 1), Foreman (row 2), Stefen (row 3) and Tempete (row 4), where the x-axis is the decoding time and the y-axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3). . . . . . . . . . . . . . . . . . . . . . . . . 59 2.9 Decoding complexity control for D1 bit streams with five reference frames: Football (row 1), Susie (row 2), Ship (row 3) and Mobile (row 4), where the x-axis is the decoding time and the y-axis is the deviation in complex- ity control (column 1), the complexity saving (column 2) and the coding performance (column 3). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 ix 2.10 DecodingcomplexitycontrolforHDbitstreamswithfivereferenceframes: Blue sky (row 1), Toy and calendar (row 2), and Subflower (row 3), where the x-axis is the decoding time and the y-axis is the deviation in complex- ity control (column 1), the complexity saving (column 2) and the coding performance (column 3). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 2.11 Decoding complexity control for CIF bit streams with one reference frame: Flower (row 1), Foreman (row 2), Stefen (row 3) and Tempete (row 4), where the x-axis is the decoding time and the y-axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3). . . . . . . . . . . . . . . . . . . . . . . . . 62 2.12 Decoding complexity control for D1 bit streams with one reference frame: Football (row 1), Susie (row 2), and Ship (row 3), where the x-axis is the decodingtimeandthey-axisisthedeviationincomplexitycontrol(column 1), the complexity saving (column 2) and the coding performance (column 3). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 2.13 Decoding complexity control for HD bit streams with one reference frame: Blue sky (row 1), and Toy and calendar (row 2), where the x-axis is the decodingtimeandthey-axisisthedeviationincomplexitycontrol(column 1), the complexity saving (column 2) and the coding performance (column 3). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 3.1 CABAC encoding and decoding processes . . . . . . . . . . . . . . . . . . 69 3.2 CABAC encoding process for quantized transformed coefficients . . . . . . 71 3.3 The relationship between the bit rate and CABAC decoding complexity for various high bit rate bit streams . . . . . . . . . . . . . . . . . . . . . 80 3.4 Rate control and the proposed decoding complexity control algorithms in H.264 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
83 3.5 The CABAC decoding complexity control for various video sequences. . . 89 4.1 The CAVLC decoding process. . . . . . . . . . . . . . . . . . . . . . . . . 95 4.2 The implementation of the CAVLC decoding process by loops. . . . . . . 96 4.3 The relationship between the source bit rate and CAVLC decoding com- plexity for four test sequences (Blue sky, Toy and calendar, Sunflower, and Rush hour) coded by high bit rates. . . . . . . . . . . . . . . . . . . . . . 103 x 4.4 The relationships between header bit rates and the UVLC decoding com- plexity for four high rate video sequences (Blue sky, Toy and calendar, Sunflower, and Rush hour). . . . . . . . . . . . . . . . . . . . . . . . . . . 105 4.5 The proposed decoding complexity control algorithms in H.264 . . . . . . 108 4.6 CAVLC/UVLC decoding complexity control for four test sequences: Blue sky (row 1), Toy and calendar (row 2), Sunflower (row 3), and Rush hour (row 4), where the x-axis is the decoding time and the y-axis is the devi- ation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3). . . . . . . . . . . . . . . . . . . . 113 5.1 The deblocking filter process for one 4x4 block. . . . . . . . . . . . . . . . 118 5.2 Pixels near a block boundary. . . . . . . . . . . . . . . . . . . . . . . . . . 120 5.3 The deblocking filter operation for one frame. . . . . . . . . . . . . . . . . 131 5.4 The deblocking filter operations integrated into the rate-distortion opti- mization (RDO) processs. . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 5.5 The proposed decoding complexity control method for a group of slices (GOS).. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 5.6 The rate, distortion and decoding complexity characteristics for Foreman and Mobile 256KBps sequences at the 10th frame, where the y-axis is RD(λ c ) and the x-axis is C(λ c ), respectively. . . . . . . . . . . . . . . . . 137 5.7 Model verification for variant CIF bit streams . . . . . . . . . . . . . . . . 141 5.8 Model verification for variant D1 bit streams . . . . . . . . . . . . . . . . 142 5.9 Model verification for variant HD bit streams . . . . . . . . . . . . . . . . 142 5.10 DecodingcomplexitycontrolbytheAveragemethodforAkiyo, Container, Mobile, and Silent CIF bit streams . . . . . . . . . . . . . . . . . . . . . . 144 5.11 Decoding complexity control by the Average method for Flower, Foreman, Stefan, and Tempete CIF bit streams . . . . . . . . . . . . . . . . . . . . . 145 5.12 Decoding complexity control by the Average method for Football, Ship, Susie, and Mobile D1 bit streams . . . . . . . . . . . . . . . . . . . . . . . 146 5.13 DecodingcomplexitycontrolbytheAveragemethodforBluesky, Toyand calendar, Sunflower, and Rush hour, HD bit streams . . . . . . . . . . . . 147 xi 5.14 Decoding complexity control by the Hierarchical method for Akiyo, Con- tainer, Mobile, and Silent CIF bit streams . . . . . . . . . . . . . . . . . . 149 5.15 Decoding complexity control by the Hierarchical method for Flower, Fore- man, Stefan, and Tempete CIF bit streams . . . . . . . . . . . . . . . . . 150 5.16 DecodingcomplexitycontrolbytheHierarchicalmethodforFootball,Ship, Susie, and Mobile D1 bit streams . . . . . . . . . . . . . . . . . . . . . . . 151 5.17 Decoding complexity control by the Hierarchical method for Blue sky, Toy and calendar, Sunflower, and Rush hour, HD bit streams . . . . . . . . . . 
152 xii Abstract The problem of H.264/AVC decoder complexity modeling and its applications to control the decoding complexity are studied in this research. The encoder integrated with the decodingcomplexitymodelandtheassociateddecodingcomplexitycontrolalgorithmcan generate decoder-friendly bit streams in the sense that compressed bit streams can be easily decoded on a particular decoding platform at a lower complexity and/or to meet various decoding complexity constraints. First, a decoding complexity model for H.264/AVC motion compensation process (MCP)andspatialcompensationprocess(SCP)isproposedandappliedtotheH.264/AVC decoding complexity reduction. The proposed complexity model considers a rich set of interandintrapredictionmodesofH.264/AVCaswellastherelationshipbetweenmotion vectors (MVs), frame sizes and the distribution of selected reference frames, which turn outtobehighlyrelatedtocachemanagementefficiency. AnH.264/AVCencoderequipped withthecomplexitymodelcanestimatethedecodingcomplexityandthenchoosethebest inter- or intra-prediction mode to meet the decoding complexity constraint of a target decodingplatform. AdecodingcomplexitycontrolschemeforH.264/AVCMCPandSCP isalsopresented. Theperformanceoftheproposedcomplexitymodelandthecomplexity control scheme in video decoding complexity reduction is demonstrated experimentally. xiii Second,twodecodingcomplexitymodelsforH.264/AVCentropycodingareproposed. It has been observed that entropy decoding demands a higher computational complexity for high rate video streams (e.g., high definition contents) due to a larger number of non- zero quantized transformed coefficients (QTCs) and MVs, which motivates our study on this topic. There are two entropy coding modes in H.264/AVC: the context-based adaptive binary arithmetic coding (CABAC) and the variable length coding (VLC). Fur- thermore, the latter mode consists of two tools: universal variable length coding (UVLC) and content-based adaptive variable length coding (CAVLC). In H.264/AVC, CABAC is used to encode all syntax elements while CAVLC and UVLC are used to encode QTCs and header data, respectively. The proposed entropy decoding complexity models consist of two parts. Its first part is designed for the source data (i.e., QTCs) while its second part aims at effective coding of the header data. Both parts are verified experimentally. Complexity control for H.264/AVC entropy decoding is also examined. Finally,thedecodingcomplexitymodeloftheH.264deblockingfilter(DBF)isstudied. The DBF process consists of three main modules: boundary strength computation, edge detection, and low-pass filtering. Complexities of all three of them are considered in our proposed model. DBF-based decoding complexity control is also investigated. It is shown experimentally that the proposed complexity model provides good complexity estimates. Besides, the H.264 encoder equipped with the decoding complexity model and complexity control algorithms can generate bit streams to save a significant amount of decoding complexity while offering quality similar to those generated by a typical H.264 encoder. xiv Chapter 1 Introduction 1.1 Significance of the Research Video compression is a technique to reduce the amount of bits to represent video data without significant visual quality loss [16]. It plays an important role in video trans- mission and storage systems. Although the transmission rate and the storage size have greatly improved in the past decade, video compression is still very much needed for two reasons. 
First, today’s storage size and transmission rate are insufficient for raw (i.e., uncompressed) video data. For example, less than 90 sec uncompressed high defi- nition (1920x1080) video data can be stored in one fifteenth of the Giga-byte HD-DVD disc. Second, new consumer electronics that demand better and better visual quality are emerging. As a result, new video coding standards and storage/transmission systems are developed to meet this need. For example, the ISO MPEG-1 video coding standard [13] was adopted by the video compact disc (VCD) standard to generate 352x288 resolution bit stream in one compact disk (CD), whose size is about 640 Mega-byte, in 1993. Later 1 on, the ISO MPEG-2 video coding standard [14] that achieves better compression perfor- mance than MPEG-1 was selected in the digital video disk (DVD) standard to generate a bit stream of resolution up to 720x480 pixels per frame and 30 frames per second in one 4.7 Giga-byte DVD. Typically, video compression algorithms consist of four modules: inter- or intra- prediction, spatial domain transform, quantization, and entropy coding. First, a video frameispartitionedintoseveralblocksofasmallerregion,i.e.,macro-blocks(MBs). After that, the inter-prediction is used to remove the temporal redundancy among consecutive frames while the intra-prediction is employed to remove spatial redundancy within neigh- boringblocks. Then,theresidualdataresultingfromtheinter-orintra-predictionarefed into the spatial domain transform and the quantization modules to generate quantized transformed coefficients (QTCs). Finally, the entropy coding module is utilized to reduce the redundancy among QTCs to generate the final bit stream. Videocodingstandardsonlydefinethebitstreamsyntax. Theencodermustgenerate bitstreamsthatconformwiththesyntaxsothattheresultingbitstreamscanbedecoded by a compatible decoder. However, video coding standards do not demand encoders to generate the same bit stream with respect to the same input video. In other words, the encoder has the freedom to generate different bit streams for different applications. For example, the conventional encoder can generate a bit stream to achieve the best quality whilesatisfyingabitrateconstraint(sothatthegeneratedbitstreamcanbetransmitted through a communication channel of a certain rate). A low complexity encoder can generate bit streams in real time while a more sophisticated encoder can generate bit streams to give better visual quality in an off-line mode. 2 H.264/AVC is the latest video coding standard, which has been selected as a video codingformatforHD-DVDandBlue-rayDVD[53]. H.264/AVCprovidesalargenumber of inter- and intra-prediction modes to improve the coding gain at the cost of higher encoding and decoding complexity. Even though there has been a lot of work focusing on fast algorithms for the encoder optimization, there is little work dedicated to the decoder optimization. In many H.264/AVC applications, encoding is performed off-line whiledecodingisalwaysperformedon-line. Itwasreportin[8]thatH.264/AVCdecoding complexity is about 2.1 to 2.9 times more than H.263. Thus, the reduction in decoding complexity so as to save power is an important issue. This is particularly true, when we implement H.264/AVC decoder on a portable device, where low complexity decoding algorithms can save the power and prolong the usage time. Onesolutiontoreducethedecodingcomplexityisthattheencodergeneratesdecoder- friendly bit streams, which are easier to decode on the decoding platform. 
For example, the encoder can find the best coding mode that minimizes the distortion under the rate and the decoding complexity constraints. In such a scenario, the encoder must have a targetdecodingplatforminmindsoastogeneratedecoder-friendlybitstreams. Thisidea motivates us to study decoding complexity models and associated decoding complexity control schemes. The concept of decoder-friendly H.264/AVC encoding is illustrated in Fig. 1.1, where we show an H.264/AVC encoder that has several decoding complexity models and decod- ingcomplexitycontrolschemesinitsbitstreamencodingprocess. Baseonthesedecoding complexity models and complexity control schemes, the encoder can generate decoder- friendly bit streams for a particular decoding platform. In this work, several decoding 3 Figure 1.1: The concept of decoder-friendly H.264/AVC encoding. complexity models and complexity control schemes for H.264/AVC will be investigated. Theyinclude: themotioncompensationprocess(MCP),thespatialcompensationprocess (SCP), the entropy decoding process, and the deblocking filter (DBF) process. There are a few challenging research problems to be addressed along this direction. • Decoding complexity model should be generic and accurate On one hand, to generate the decoder-friendly bit streams for a wide range of decoding platforms, the decoding complexity model should be generic. On the otherhand,agenericdecodingcomplexitymodelmaynotprovideagoodestimation of decoding complexities. Thus, it is challenging to balance the tradeoff between generality and accuracy. To make this point clear, we consider two examples below. First, the macro-block (MB) decoding complexity is modeled as the number of boundary MBs and the number of non-boundary MBs in the MPEG-4 VCV model [15]. However,ithasbeenobservedthattheMBdecodingcomplexitiesaredifferent among those MBs coded by various modes. As a result, the MPEG-4 VCV model 4 is inaccurate although this model is generic. Second, the H.264/AVC decoding complexity model reported in [37, 8] considers the numbers of clock cycles spent in different sub-functions. That is, the numbers of clock cycles for all sub-functions in the H.264/AVC decoder are first counted individually for a particular decoding platform. Then, the decoding complexity is the sum of the individual number of clock cycles multiplied by the usage rate of each sub-function. Although this model can provide an accurate decoding complexity estimation, it is too specific with respect to a particular decoding platform and, therefore, may not be suitable for other platforms. • MCP Decoding complexity is content dependent It is well known that the MCP decoding process consumes more CPU power than anyotherprocesses[8]. DuetothehighestCPUpowerdemand, theMCPdecoding complexitymodelhasbeenstudiedin[45,50,46,47]. SincetheMCPoperationcan beseenasamemoryoperation,thememoryusageandcachemanagementefficiency areinfluencedbycodingmodesandtherelationshipbetweenmotionvectors(MVs). TheMCPdecodingcomplexitymodelshouldbeafunctionofthesetwofactors[50]. However, since it is difficult to model the MCP decoding complexity as a function of the relationship between MVs, most existing methods [50, 48, 49] disregard the relationship between MVs, and simply model it as a function of the number of interpolation filters. Consequently, the resultant models cannot provide accurate estimation results. 
• Decoding complexity control is an open problem 5 Being similar to rate control used in the encoder to control bit rates, decoding complexity control is a process to control the decoding complexity. An important task in decoding complexity control is to determine the relationship between the decoding complexity model and coding parameters. To be more specific, decoding complexity control helps the encoder select proper coding parameters based on the decoding complexity constraint and the estimated decoding complexity obtained from the model. However, little theoretical analysis on decoding complexity and coding parameters is available, and most existing work [50, 48, 49, 9] relies on simulations to find their relationship. In this research, we propose a novel decoding complexity control scheme that enables the encoder to generate bit streams that meet the decoding complexity constraint accurately. 1.2 Research Background 1.2.1 Introduction to H.264/AVC Video Coding Standard The block diagram of an H.264/AVC encoder is shown in Fig. 1.2 and explained below. Being similar to other video coding standards, each frame in H.264/AVC is first parti- tionedinto16x16macro-blocks(MBs),andtheneachMBcanbecodedbytheinter-orthe intra-prediction modes. One 16x16 MB can further be partitioned into one 16x16 block, two 16x8 or 8x16 blocks or four 8x8 blocks, where each 8x8 blocks can be further parti- tioned into two 8x4 or 4x8 blocks or four 4x4 blocks, depending on the inter-prediction mode. On the other hand, one MB can be partitioned into one 16x16 block, four 8x8 blocks or 16 4x4 blocks if the intra-prediction mode is selected. After that, each block is 6 Figure 1.2: The block diagram of H.264/AVC video encoding. subtracted from the predicted block to get the residual block, where the predicted block can be either estimated by the motion estimation (ME) process from the neighboring frames and MV if an inter-prediction mode is selected, or obtained from the neighboring block within the same frame if an intra-prediction mode is selected. H.264/AVC provides fractional samples of ME. That is, the MV can be of half- or quarter-pel resolution. Afterinter/intra-prediction,theresidualblockisfedintothespatialdomaintransform andthequantizationprocessestogetquantizedtransformedcoefficients(QTCs). Finally, the entropy coding process encodes QTCs to generate bit streams. In H.264/AVC, the so-calledP SkipmodecanbeusedtocodeanMBinalargeareawithslowmotion. When one MB is coded as the P Skip mode, neither MV nor residual block are transmitted. In other words, the spatial domain transform, the quantization process, and the entropy coding process are skipped if MB is coded by the P skip mode. The MV of the P Skip mode MB is decided by those MVs of its neighboring MBs. 7 TherearetwoentropycodingmodesinH.264/AVC:thecontext-basedadaptivebinary arithmetic coding (CABAC) and the variable length coding (VLC). CABAC is used to encode all syntax elements. VLC consists of two entropy coding tools: the context- based adaptive variable length coding (CAVLC) and the universal variable length coding (UVLC), which are used to encode source data and header data, respectively. Since H.264/AVC adopts the block-based coding scheme, blocking artifacts may be introducedbycoarsequantizationoftransformedcoefficientsorthediscontinuitybetween reference blocks during motion compensation. 
To mitigate this problem, an in-loop de- blockingfilter(DBF)isusedinH.264/AVC,whichistobeperformedinboththeencoding and decoding loops so that reconstructed frames can be processed by the DBF process and then used as reference frames for future encoding (or decoding). The DBF process consists of three main modules: boundary strength (BS) computation, edge detection and the low-pass filtering. In H.264/AVC, the BS value of each edge in one 4x4 block is first computed. After that, edges with non-zero values of BS are analyzed in the edge detection module to distinguish true edges and those due to block artifacts. Finally, two low-pass filtering operations are used to remove block artifacts, where the choice of low-pass filtering depends on the BS value of an edge. 1.2.2 Mode Decision with Rate-Distortion Optimization (RDO) The rate-distortion optimization (RDO) process is often used to decide the best coding mode among various coding modes in H.264/AVC so as to minimize the distortion under a certain rate constraint. Since the coded MB is partitioned into different number of blocks according to the inter- or intra-prediction mode, the RDO process first finds the 8 bestMVorthebestintra-predictiondirectionforaspecificinter-predictionmodeorintra type, respectively. After that, the RDO process either actually performs the encoding task, which includes the spatial domain transform, quantization and entropy encoding or uses a model to determine the associated coding rate and distortion. Finally, the RDO process finds the best inter- or intra-prediction mode that yields the minimal rate- distortion (RD) cost for the MB. Typically, the RDO process performs several times of encoding processes to decide the best coding mode that minimizes the distortion and meets the rate constraint simultaneously. 1.3 Contribution of the Research H.264/AVC decoding complexity models and their applications are examined in depth in this research. Main contributions of the research are summarized below. • In Chapter 2, we propose novel decoding complexity models for H.264/AVC mo- tion compensation process (MCP) and spatial compensation process (SCP), which are the decoding process of inter- and intra-prediction, respectively. These models considernotonlyarichsetofinter-andintra-predictionmodesusedinH.264/AVC but also the relationship between MVs, frame sizes, and the distribution of selected referenceframes, whichaffectthecacheperformanceandthereforetheMCPdecod- ing complexity. Our experimental results show that the proposed MCP and SCP decoding complexity models provide good estimation results. Errors between the estimated and the actual decoding complexities measured by Intel VTurn are less 9 than 10%. Furthermore, the MCP and SCP decoding complexity models are incor- porated in the H.264/AVC encoder for decoding complexity control. The proposed complexity control scheme helps the encoder to generate bit streams to meet differ- ent decoding complexity constraints. The resultant bit streams can be decoded at a much lower complexityat the cost ofsmall PSNR loss. This is particular interest- ing in a mobile broadcasting environment, where multiple mobile devices download broadcast/streaming video in real time. • It has been observed that the entropy decoding complexity becomes significant in the high rate video due to more non-zero QTCs and MVs. There are two entropy coding modes in H.264/AVC. 
One entropy coding mode is CABAC while another oneisCAVLC/UVLC.CAVLCandUVLCareusedtocodesourcedata,i.e., QTCS, and header data, respectively. On the other hand, CABAC can be used to code all syntaxelements. Complexitymodelsforentropydecodingandtheirapplicationsare studiedinChapters3and4. PleasealsonotethatCAVLCandUVLCareimportant in the baseline profile of the H.264/AVC encoder since this profile targets at low complexityandlowcostapplicationssothatonlyCAVLCandUVLCareallowedto be the entropy coding tools. The baseline profile is particular suitable for portable devices due to their limited computing power. – A novel CABAC decoding complexity model is presented in Chapter 3. This model consists of two parts. One is for the source data (i.e., QTCs) while the other is for the header data such as MVs and coding modes. The proposed 10 CABAC decoding complexity model is verified experimentally. Then, the pro- posed CABAC decoding complexity model is incorporated in the H.264/AVC encoder to generate decoder-friendly bit streams. The associated decoding complexity control scheme is studied. We investigate the relationship between the CABAC decoding complexity and the bit rate, which helps the selection of proper coding parameters in complexity control. – A new CAVLC/UVLC decoding complexity model is described in Chapter 4. The proposed CAVAC/UVLC decoding complexity model is verified experi- mentally. AdecodingcomplexitycontrolschemeforH.264/AVCCAVLC/UVLC decoding is also studied. The relationship between the bit rate and decoding complexityisinvestigated,whichisthenexploitedbytheencoderingenerating decoder-friendly bit streams. • A decoding complexity model for the deblocking filter (DBF) and its applications are presented in Chapter 5. Although the DBF decoding complexity model was studied by Yu et al. [10] before, their complexity model is not accurate enough due to the lack of some model parameters. We have improved the model in [10] by introducing new model parameters. Furthermore, the relationship between the DBF decoding complexity and the frame size is considered in Chapter 5, since the frame size affects cache efficiency, which plays an important role in the DBF implementation [23]. The proposed DBF decoding complexity model is verified ex- perimentally. The estimation errors are all within 10% for a wide range of frame 11 sizes with different video contents. The application of this model to DBF decod- ing complexity reduction in H.264/AVC decoding is presented. Our experimental results show that the H.264/AVC encoder equipped with the proposed decoding complexity model and complexity control schemes can generate decoder-friendly bit streams at much lower computational power while these bit streams can of- fer good coding performance similar to those generated by a typical H.264/AVC encoder. 1.4 Organization of the Dissertation The rest of this dissertation is organized as follows. Decoding complexity models for H.264/AVC MCP and SCP and their associated decoding complexity control scheme are proposed in Chapter 2. A CABAC decoding complexity model and its associated decod- ing complexity control scheme are presented in Chapter 3. A CAVLC/UVLC decoding complexity model and its associated decoding complexity control scheme are examined in Chapter 4. A new DBF decoding complexity model and its corresponding decoding complexity control schemes are presented in Chapter 5. Finally, conclusion remarks and future work are given in Chapter 6. 
12 Chapter 2 Complexity Modeling of Spatial and Temporal Compensations in H.264/AVC Decoders and Its Applications 2.1 Introduction H.264/AVC[53],[35]isthelatestvideocodingstandardproposedbyITU-TandISO/IEC. IthasbeenselectedasthevideocodingtoolinHD-DVDandBlue-rayDVDspecifications. SinceH.264providesvariousinter-andintra-predictionmodestoimprovethecodinggain, itsencodermaysearchallpossiblemodesandchoosethebestonethatminimizestherate- distortion (RD) cost. Due to the use of a larger set of inter- and intra-prediction modes, the H.264 decoding complexity is about 2.1 to 2.9 times more than the H.263 decoder [8]. For some applications, the coded video bit stream will be decoded in portable consumer electronicsdevices. Undersuchascenario, reductionindecodingcomplexitysoastosave the power is very desirable. TosavepowerinanH.264/AVCdecoder,onepossiblesolutionisthattheH.264/AVC encoder can generate a decoder-friendly bit stream that is easy to decode in a generic 13 hardware platform. This idea motivates us to study the H.264/AVC decoding complexity model. Since an H.264/AVC decoder contains many decoding components, it is not easy to use a single complexity model to cover all components. Instead, we consider to develop different models for different decoding components. In this chapter, we will focus on the motion compensation process (MCP) and the spatial compensation process (SCP). Please note that MCP and SCP correspond to the decoding processes for inter- and intra-predictions, respectively, and they consume the largest amount of power in H.264/AVC decoding in a typical setting [23]. The derived complexity models can be used by an H.264/AVC encoder to estimate the decoding complexity for various inter- and intra-prediction modes conveniently. As a result, the coding mode that has the best tradeoffamongtherate,distortionandthedecodingcomplexityforaparticularhardware platform can be selected. Video decoding complexity models have been studied by researchers in various con- texts. A MPEG-4 video complexity verifier (VCV) model was described in [15]. The numbers of boundary macro-blocks (MBs) and non-boundary MBs decoded per second are first estimated and, then, the decoding complexity is modeled as a function of these two numbers in the VCV model. However, since the MB decoding complexities coded by different inter-prediction modes in MPEG-4 may vary, the VCV model is not very accurate. To address this problem, Valentim et al. [45, 46] proposed an enhanced MCP decoding complexity model for MPEG-4 video. By considering the fact that MBs coded by different inter-prediction modes have different decoding complexities, they used the maximal decoding time to measure decoding complexities of MBs coded by different modes individually. Then, the total decoding complexity of a bit stream is the sum of 14 each individual MB’s complexity. More recently, the MCP decoding complexity model was simplified by van der Schaar and Andreopoulos in [47, 1], where the decoding com- plexity of an MB is proportional to the number of its motion vectors (MVs). Thus, an MBwithmoreMVshashigherdecodingcomplexitythanthatwithfewerMVs. AnMCP decodingcomplexitymodelforH.264/AVCwasproposedbyWangandChangin[50,48]. The MCP decoding complexity is modeled as a function of the number of interpolation filters while the number of interpolation filters depends on the selected inter-prediction mode and MV accuracy. An MB with fewer interpolation filters should have lower de- coding complexity. 
However, decoding complexity models mentioned above may not provide accurate estimation results for H.264/AVC video decoding for two reasons. First, H.264/AVC allows a very rich set of inter- and intra-prediction modes, which are not yet considered extensivelybytheseexistingmodels. Second,therelationshipbetweenMVsofconsecutive blocks, the effect of frame sizes and the distribution of selected reference frames have not beentakenintoaccount. Itwillbeshowninthisworkthattheseparametersarerelatedto efficiency of cache management and consequently play an important role in the decoding complexity. An MCP decoding complexity model for H.264/AVC with cache management consid- eration was proposed by the authors in [25, 26]. As compared with work in [25, 26], there are two new contributions in this chapter. First, we propose an enhanced MCP com- plexity model that considers the distribution of selected reference frames and investigate another complexity model for SCP so that both temporal and spatial compensations can be well modeled. The accuracy of proposed MCP and SCP complexity models is verified 15 experimentally. It is shown that estimation errors of these models are less than 10% of actual decoding complexities. Next, we show the application of derived temporal and spatial compensation models to decoder-friendly H.264/AVC video encoding. That is, an H.264/AVC encoder equipped with these complexity models can choose the best inter- or intra-prediction mode and the best MV or intra-prediction direction while meeting the decoding complexity constraint of the target receiver platform conveniently. The rest of this chapter is organized as follows. Background material on H.264/AVC inter- and intra-predictions is reviewed in Sec. 2.2. H.264/AVC decoding complexity models for MCP and SCP are derived in Sec. 2.3 and Sec. 2.4, respectively. The appli- cation of proposed decoding complexity models to decoder-friendly H.264/AVC encoding is discussed in Sec. 2.5. The accuracy of MCP and SCP complexity models is verified experimentally, and their application is demonstrated in Sec. 2.6. Finally, concluding remarks are given in Sec. 2.7. 2.2 OverviewofH.264InterandIntraPredictionsandMode Decision Algorithms 2.2.1 H.264 Inter- and Intra-Predictions H.264 provides various block sizes for inter- and intra-predictions to improve the coding gain. When intra-prediction is used to code one 16x16 MB, one of the I4MB, I8MB and I16MB intra types can be used to code the MB luminance component. If the I4MB (or I8MB) intra type is selected, the MB is partitioned into 16 4x4 blocks (or four 8x8 blocks),andeachblockcanbepredictedbyoneoftheninepredictiondirectionsasshown 16 Figure 2.1: Intra prediction directions in Fig. 2.1(a). For the MB coded by the I16MB intra type, four prediction directions are allowed to code this MB as shown in Fig. 2.1(b). The MB chrominance components are partitioned into eight 4x4 blocks. Each of the Cb and Cr chrominance components has exactly four 4x4 blocks. Since MB chrominance component is usually smooth over a large area, one of the four prediction directions allowed by the I16MB intra type is used to code each 4x4 block. On the other hand, one 16x16 MB can be partitioned into one 16x16 block, two 16x8 or 8x16 blocks or four 8x8 blocks when inter-prediction is used. Each 8x8 blocks can be further partitioned into two 8x4 or 4x8 blocks or four 4x4 blocks, as shown in Fig. 2.2. As a result, there are totally 259 modes to encode one 16x16 MB. 
Moreover, each block whose size is larger than 8x8 can be predicted using different reference frames. In addition, H.264 provides fractional samples of motion estimation (ME). The MV can 17 Figure 2.2: Inter prediction MB types be of half- or quarter-pel resolution. The half-pel value is obtained by applying a one- dimension 6-tap FIR interpolation filter horizontally (the x-direction) or vertically (the y-direction). The quarter-pel value is obtained by the average of two nearest half-pel values. For example, the half-pel value of the fractional sample b in Fig. 2.3 is obtained by applying 6-tap FIR interpolation filter to those pixels E, F, G, H, I and J as follows: b=((E−5F +20G+20H −5I +J)+16)>>5 (2.1) Then, the quarter-pel value of the fractional sample a in Fig. 2.3 is given by: a=(b+G+1)>>1 (2.2) Furthermore, H.264providestheso-calledP SkipmodetocodeanMBinalargearea with slow motion. When one MB is coded as the P Skip mode, neither MV nor residual error data are transmitted. The MV of the P Skip mode MB is decided by those MVs of its neighboring MBs. This flexibility which allows the use of various inter- and intra- prediction modes and sub-pel MV accuracies makes the decoding complexity modeling more challenging as compared with that in [45, 46, 50, 48, 49, 25, 26, 47, 1]. 18 Figure 2.3: Interpolation filter for sub-pel accuracy motion compensation 2.2.2 Categories of Mode Decision Algorithms The objective of an encoder is to find the best coding mode which can minimize the distortion and also satisfy the bit rate constraint. Mathematically, this problem can be formulated as min X i D(S i |QP,m i ) subject to X i R(S i |QP,m i )≤R c (2.3) where D(S i |QP,m i ) and R(S i |QP,m i ) are the distortion and bit rate of coding unit S i for a given coding modem i and quantization parameter (QP), respectively, andR c is the bit rate constraint. The coding unit S i can be a group of pictures, a group of MBs or just one MB. In H.264, the coding unit is per MB. The constrained optimization problem 19 (2.3)isusuallysolvedbytheLagrangianmultiplier(LM)methodwhichintroducesanon- negative multiplier and converts the original problem into an unconstrained optimization problem [52, 42, 34, 51]. That is, min X i D(S i |QP,m i )+λ m X i R(S i |QP,m i ) (2.4) The optimal solution of (2.4), i.e., m ∗ i and λ ∗ m , is equivalent to the optimal solution of theoriginalconstrainedoptimizationproblem(2.3). NotethattheLagrangianmultiplier, λ m , in (2.4) depends on the bit rate constraint. Thus, it can be expressed as a function of QP. If the coding units are independent to each other, the optimization problem (2.4) can be further simplified [52] as X i min(D(S i |QP,m i )+λ m ·R(S i |QP,m i )) (2.5) In other words, the unconstrained optimized problem (2.4) is separated into several sub- problems, and these sub-problems can be solved individually since all of them are inde- pendenttoeachother. ForaspecificcodingunitS i , encodersmaytryallpossiblecoding modes to find the best mode m ∗ i which minimizes the rate distortion (RD) cost function (2.6) D(S i |QP,m i )+λ ∗ m ·R(S i |QP,m i ) (2.6) 20 once the optimal Lagrangian multiplier λ ∗ m is given. The set of the best coding mode {m ∗ 1 ,m ∗ 2 ,m ∗ 3 ,...}achievestheglobaloptimumoftheoriginalproblem(2.4). Thisapproach is the so-called full mode decision and is adopted in H.264 standard. There are two problems when LM method is used by the encoders to find the optimal codingmode. 
First,sincetheaboveunconstrainedoptimizationproblem(2.4)isadiscrete optimization problem, the optimal mode may not be found by LM method if the optimal mode is not on the convex hull of RD curve [34, 2] (i.e., there exists the duality gap). Thisusuallyhappensifthenumberofcodingmodesissmall, andthecorrespondingrate- distortion points, i.e., (R(S i |QP,m i ),D(S i |QP,m i )), are sparse in the RD plane. The secondproblemisthatthechooseofoptimalLagrangianmultiplierλ ∗ m ,whichisrelatedto the bit rate constraintR c , is difficult. Once the optimal Lagrangian multiplier is decided, the encoders can evaluate the RD cost function (2.6) for all possible modes and then find the optimal coding mode that minimizes the RD cost function (2.6). To find the optimal Lagrangian multiplier λ ∗ m , the relationship between λ ∗ m and quantization step size (QS) were studied in [42, 51]. From experimental simulations and curve fitting, the optimal Lagrangian multiplierλ ∗ m can be expressed as a function of QS, which is therefore related to the bit rate constraint. The result can be used in H.264, too. Since the relationship between QS and QP in H.264 is that QS =2 (QP−4)/6 , the optimal Lagrangian multiplier used for mode decision in H.264 can be written as a function of QP as follows [52]: λ ∗ m =0.85·2 (QP−12)/3 (2.7) 21 There are several works further addressing the mode decision problems which are all extended from the original RD problem (2.3) and can be classified into three categories : • Fast inter/intra mode decision algorithms • Mode decision algorithms with encoding complexity constraint • Mode decision algorithms with decoding complexity constraint As mentioned before, the so-called full mode decision algorithm, which searches all possible coding modes to find the best one that minimizes the RD cost function (2.6), is adopted in H.264 standard. Since H.264 provides various inter- and intra-prediction modes to improve the coding gain, the full mode decision algorithm increases the com- putational complexity of the H.264 encoder tremendously. Therefore, several fast inter and intra model decision algorithms have been presented so as to reduce the encoding complexity [43, 44, 36, 6, 54, 19, 18, 38]. Those algorithms can be further divided into two types. The authors [43, 44] proposed fast predictors to estimate bit rate and distor- tion rather than perform the encoding and decoding tasks to obtain the actual bit rate and distortion while another type of algorithms search a smaller set of inter- and intra- prediction modes by early skipping some modes with low possibility [36, 6, 54, 19, 18]. These algorithms were proposed so as to reduce computational complexity of the full mode decision algorithm. Another category of the mode decision algorithms is that the best mode is selected to minimizetheRDcostfunctionbutalsosatisfytheencodingcomplexityconstraint[21,9]. Onedifferencebetweenthefirstandsecondcategoriesofmodedecisionalgorithmsisthat the second category of algorithms search a smaller set of coding modes whose encoding 22 complexities satisfy the encoding complexity constraint for a given video sequence. As a result, the encoder can select the best coding mode to balance the trade off between RD cost and encoding complexity. Since encoding complexity is considered into the encoder duringmodedecision, thisoptimizationproblemcanbesolvedbyLMmethod, too. That is, another Lagrangian multiplier λ c is introduced to deal with the encoding complexity constraint. 
Then, the encoders can decide the best coding mode which can minimize the RD and encoding complexity (RDC) cost function: D(S i |QP,m i )+λ ∗ m ·R(S i |QP,m i )+λ ∗ c ·C(S i |QP,m i ) (2.8) whereC(S i |QP,m i ) is the encoding complexity of coding unitS i for a given coding mode m i andQP.Thiscategoryofmodedecisionalgorithmsisparticularlysuitableforrealtime applications while the encoders have to generate bit streams according to the computing power of encoding platforms. Be similar to the Lagrangian multiplier λ ∗ m for a given bit rate constraint, the Lagrangian multiplier λ ∗ c is related to the encoding complexity constraint. The concept of the third category of mode decision algorithms is similar to that of the second category while the decoding complexity rather than the encoding complexity is considered [47, 1, 25, 26, 50, 48, 49]. This type of mode decision algorithms consid- ers the best coding mode which not only minimizes RD cost function but also satisfies decoding complexity constraint for a particular decoding platform. Therefore, the gener- ated bit streams are easy to decode in that decoding platform. Under such scenario, the decoding complexity models must be included in the encoders so that the encoders can 23 estimate possible decoding complexity and select the best mode to balance the RD cost anddecodingcomplexityconstrainttradeoff. Similarly, theRDanddecodingcomplexity optimization for the third category of mode decision algorithms can be solved by the LM method, too [50, 48, 49]. That is, the RD and decoding complexity cost function (2.8) is evaluated during mode decision, where C(S i |QP,m i ) in (2.8) is the decoding complexity (rather than the encoding complexity) of coding unit S i for a given coding mode m i and QP, and the Lagrangian multiplier λ ∗ c depends on the decoding complexity constraint. In this chapter, we propose a novel decoding complexity model for H.264 MCP and SCP as well as its application to mode decision. The H.264 encoder equipped with the proposed decoding complexity mode and mode decision scheme can generate decoder- friendly bit streams for a particular decoding platform. 2.3 Complexity Model for Motion Compensation Process (MCP) 2.3.1 Linear Complexity Model Inthiswork, theMCPdecodingcomplexityismodeledasalinearfunctionofthenumber ofcachemisses(N c ),thenumberofy-directioninterpolationfilters(N y ),thenumberofx- directioninterpolationfilters(N x )andthenumberofMVsperMB(N v ). Mathematically, the MCP decoding complexity is written as C mcp =α·N c +β·N y +γ·N x +μ·N v , (2.9) 24 where α, β, γ and μ are the weights of these four terms, respectively. The model in Eq. (2.9) includes the number of MVs per MB, N v , as proposed in [47, 1] and the numbers of interpolation filters, (N x and N y ) as suggested in [50, 48]. The numbers of the x-direction and the y-direction interpolation filters are used to model the decoding complexity of interpolation when the MV is of half- or quarter-pel resolution. Since the decoder may have different numbers of interpolation filters along the x- and the y-directions, two different terms are used in Eq. (2.9). Specifically,weusetheexampleofanSIMD(single-instruction-multiple-data)instruc- tion to explain the difference in implementing the two interpolation filters even though they may be of the same sub-pel resolution. Consider the use of Intel SIMD2 instructions [12] to implement the 6-tap FIR interpolation filter. 
One x-direction interpolation filter can be realized by one PMADDWD and two PADDD instructions, where six consecutive pixels are multiplied by coefficients of the 6-tap FIR filter and then added together. On the other hand, eight y-direction interpolation filters can be done by SIMD2 instructions in parallel as shown in Fig. 2.4. That is, six 128-bit registers are used to store six rows of data, where each row has eight consecutive pixels. Then, these six 128- bit registers are multiplied by their corresponding filter coefficients (which are 1, -5, 20, 20, 5, 1, respectively) and added together to produce eight half-pel values. As a result, we need two individual terms to model decoding complexities of horizontal and vertical interpolation filters. The most unique feature of the proposed MCP complexity model in Eq. (2.9) is its inclusion of the number of cache misses. As shown in Fig. 2.5, two consecutive MVs in thecurrentimageframepointtotworeferenceblockswithadifferentspatialrelationship. 25 Figure 2.4: Implementation of the y-direction interpolation filter using SIMD2 instruc- tions. Figure 2.5: Inter-prediction of two consecutive blocks, where the two reference blocks are closer to each other in Case I but far away in Case II. The two reference blocks are closer to each other in Case I but far away in Case II. The decoding complexities of these two cases are different since most data required for the prediction of block B in Case I could be obtained in the cache or internal registers of the CPU after the decoding of block A is done. Intuitively, the number of cache misses for Case I is fewer than that of Case II and, consequently, the decoding complexity for block B in Case I should be lower than that of Case II. Experiments were conducted on a mobile platform with Windows XP and Pentium CPUof1.7GHztoverifytheaboveconjecture. TheIntelVTuneperformanceanalyzer8.0 26 [11]wasusedtomeasuredecodingtime(whichisproportionaltodecodingcomplexity)for allpre-codedbitstreamsthroughtoutthepaper. WemodifiedtheH.264/AVCencoderso that it uses a specific inter-prediction mode to encode MBs. For the first test, two inter- prediction modes, i.e., P16x8 andP8x16, were selectedto encode Foremansequence with integer-pixel (int-pel) MV resolution and the single reference frame motion estimation (SRF-ME) was used in the encoder. Two bit streams are generated accordingly, where each bit stream contains 270 frames. It has been observed that the MCP decoding complexitiesofthesetwobitstreamsmeasuredbyIntelVTunearequitedifferentalthough these two bit streams have the same number of MVs. The MCP decoding time of the bit stream encoded by P16x8 takes about 31.43 milli-seconds (ms) while that encoded by P8x16 takes about 52.21 ms. Clearly, the number of MVs is not the only factor to determine the MCP decoding complexity. For the second test, the P4x4 mode was selected to encode Foreman and Container sequences with int-pel MV resolution and with one reference frame. According to mea- surements from Intel VTune, their MCP decoding complexities are quite different, too. The measured decoding time of Foreman P4x4 is 196.55 ms while that of Container P4x4 is 157.34 ms. The reason of demanding less MCP decoding time for Container is that it contains a large smoother area and its MV relationship is more similar to that in Case I whileForemanhasalotofmotionanditsMVrelationshipismoresimilartothatinCase II. 
These observations demonstrate that the MV relationship plays an important role in the decoding complexity of MCP and has to be considered carefully.

For simplicity, the decoding complexities of MBs with half-pel and quarter-pel (i.e., sub-pel) MV resolution are assumed to be the same in our complexity model. This can be justified by the fact that, compared with the half-pel case, the additional computation required by a quarter-pel MV is only the average of the two nearest half-pel values. The cost of this additional computation is low and negligible on most platforms.

The weights α, β, γ and μ in Eq. (2.9) can be obtained as follows. First, the Intel VTune performance analyzer is used to measure the number of clock ticks spent by the MCP operation for several pre-coded training video bit streams. Then, the parameters N_c, N_x, N_y and N_v are counted for each individual pre-coded bit stream. Finally, we use the method of least squares to determine the best weights. More details about the training of the weights for the MCP complexity model will be given in Sec. 2.3.6.

2.3.2 Computation of Cache Misses

As mentioned in Sec. 2.3.1, the relationship between the MVs of consecutive blocks can result in different decoding complexities. To differentiate the two cases in Fig. 2.5, a simple cache model is proposed to compute the number of cache misses. Consider a cache that has T entries, each of L bytes. If the data required by the MCP operation, starting at address addr_i with size s, cannot be found in the cache, the number of cache misses is increased by one and the cache entries are updated.

The MCP operation consists of a luminance part and a chrominance part. Since the luminance MCP operation is similar to the chrominance one, the decoding complexity of the whole MCP operation is proportional to that of the luminance part. Thus, only the luminance MCP is considered in the cache model. The starting address and the size of each memory access of the luminance MCP are examined to compute the number of cache misses.

The luminance MCP operation for an MxN block with integer-pel (int-pel) MV resolution can be viewed as an access to a 2D array with N rows and M columns. It takes N memory accesses, each of M bytes, since each luminance component of a pixel is usually represented by an 8-bit (1-byte) integer. However, if the y-direction MV points to a non-integer pixel position, two additional rows above the MxN block and three additional rows below it are needed by the y-direction interpolation filter. Similarly, if the x-direction MV is at a non-integer pixel position, two additional columns to the left and three additional columns to the right of the block are needed by the x-direction interpolation filter. Table 2.1 lists the number of memory accesses and their sizes for the different MV resolutions.

Table 2.1: Numbers of memory accesses, memory access sizes and the beginning address of the first memory access for an MxN block.

    Position of MV=(x,y)          Number of          Size per          Beginning address of
                                  memory accesses    memory access     the 1st memory access
    (integer, integer)            N                  M                 addr
    (non-integer, integer)        N                  M+5               addr − 2
    (integer, non-integer)        N+5                M                 addr − 2·width
    (non-integer, non-integer)    N+5                M+5               addr − 2·width − 2

For a block with int-pel MV resolution, the starting address of the i-th memory access can easily be computed from the MV, the reference frame index and the decoded frame resolution.
Given MV (x, y) and the reference frame index, denoted by ref_index, the starting address of the first memory access, denoted by addr_1, can be computed as addr_1 = addr with

    addr = ref_index · width · height + y · width + x,     (2.10)

where width and height are the decoded frame resolution. Generally speaking, the starting address of the i-th memory access can be obtained as

    addr_i = addr_1 + (i − 1) · width,   i = 1, 2, · · ·     (2.11)

The starting address of the first memory access is slightly different if the MV points to a non-integer pixel position. If the y-direction MV points to a non-integer position, the starting address of the first memory access becomes

    addr_1 = addr − 2 · width,     (2.12)

because two additional rows above the block are needed by the y-direction interpolation filter. Similarly, the starting address of the first memory access becomes

    addr_1 = addr − 2,     (2.13)

if the x-direction MV points to a non-integer position. The starting address of the first memory access for the different MV resolutions is summarized in Table 2.1. A memory access whose size is greater than the size of a cache entry is treated as multiple memory accesses, each with a size equal to that of a cache entry. If the required data cannot be found in the cache, the number of cache misses is increased by one and the cache entries are updated. Since the memory usage of the MCP operation may differ among inter-prediction modes, it is desirable to use different cache models to compute the number of cache misses. For P4x8 and P4x4 blocks, the number of cache entries is 64 and the size of each cache entry is 8 bytes. The number of cache misses for an MxN block other than P4x8 and P8x4 is simply the number of memory accesses, which is N, or N+5 if the y-direction MV is of sub-pel accuracy. This setting yields good estimation results for the pre-encoded training bit streams as well as for others. Experimental results will be shown in Sec. 2.6.

2.3.3 Computation of Other Quantities

Similar to [47, 1], the proposed decoding complexity model contains a term that is proportional to the number of MVs. This quantity can be obtained from the inter-prediction mode. For example, the P16x16 and P_Skip modes have only one MV, while P16x8 and P8x16 have two MVs. The maximal number of MVs per MB is 16, which is the case when one 16x16 MB is partitioned into sixteen 4x4 blocks. The number of x-direction (or y-direction) interpolation filters can be computed as follows. For an MxN block, the number of x-direction (or y-direction) interpolation filters is M·N if only the MV component along the x-direction (or y-direction) is of sub-pel accuracy (i.e., of half- or quarter-pel accuracy). However, if both the x- and y-direction MV components are of sub-pel accuracy, the numbers of x-direction and y-direction interpolation filters are determined by the exact accuracy of the MV. Table 2.2 summarizes the numbers of x-direction and y-direction interpolation filters for the different MV accuracies.

Table 2.2: Numbers of the x-direction and the y-direction interpolation filters for an MxN block.

    MV (x, y) accuracy            Number of x-direction    Number of y-direction
                                  int. filters             int. filters
    (Int-pel, Int-pel)            0                        0
    (Sub-pel, Int-pel)            M·N                      0
    (Int-pel, Sub-pel)            0                        M·N
    (Half-pel, Sub-pel)           M·(N+5)                  M·N
    (Sub-pel, Half-pel)           M·N                      (M+5)·N
    (Quarter-pel, Quarter-pel)    M·N                      M·N
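Putting Secs. 2.3.2 and 2.3.3 together, the following sketch illustrates how the per-block quantities N_c, N_x and N_y of Eq. (2.9) might be counted on the encoder side. The address arithmetic follows Eqs. (2.10)–(2.13) and Table 2.1, and the filter counts follow Table 2.2. The cache is the 64-entry, 8-bytes-per-entry model mentioned above, managed here with a simple FIFO replacement policy; that policy, and applying the cache uniformly to all partitions (the text uses it only for P4x8 and P4x4 blocks and takes the raw access count otherwise), are assumptions of this sketch.

```c
#include <stdbool.h>

#define CACHE_ENTRIES 64   /* cache model of Sec. 2.3.2 (P4x8/P4x4 blocks) */
#define ENTRY_BYTES    8   /* size of one cache entry in bytes             */

/* Minimal cache: remembers the last CACHE_ENTRIES line addresses (FIFO).  */
typedef struct {
    long lines[CACHE_ENTRIES];
    int  next;
} SimpleCache;

static void cache_init(SimpleCache *c)
{
    for (int i = 0; i < CACHE_ENTRIES; i++) c->lines[i] = -1;
    c->next = 0;
}

/* Touch the byte range [addr, addr+size); return the misses incurred.     */
static int cache_access(SimpleCache *c, long addr, int size)
{
    int misses = 0;
    for (long line = addr / ENTRY_BYTES;
         line <= (addr + size - 1) / ENTRY_BYTES; line++) {
        bool hit = false;
        for (int i = 0; i < CACHE_ENTRIES; i++)
            if (c->lines[i] == line) { hit = true; break; }
        if (!hit) {                     /* miss: count it and install line  */
            misses++;
            c->lines[c->next] = line;
            c->next = (c->next + 1) % CACHE_ENTRIES;
        }
    }
    return misses;
}

/* N_c for the luminance MCP of an MxN block.  (x, y) is the integer sample
 * position addressed by the MV in the reference frame; frac_x / frac_y say
 * whether the MV is of sub-pel accuracy in that direction (Table 2.1).    */
static int mcp_cache_misses(SimpleCache *c, int M, int N, int x, int y,
                            bool frac_x, bool frac_y,
                            int ref_index, int width, int height)
{
    long addr = (long)ref_index * width * height + (long)y * width + x; /* (2.10) */
    if (frac_y) addr -= 2L * width;                                     /* (2.12) */
    if (frac_x) addr -= 2;                                              /* (2.13) */
    int rows = frac_y ? N + 5 : N;      /* number of memory accesses   */
    int size = frac_x ? M + 5 : M;      /* size per memory access      */
    int misses = 0;
    for (int i = 0; i < rows; i++)                                      /* (2.11) */
        misses += cache_access(c, addr + (long)i * width, size);
    return misses;
}

/* N_x and N_y for an MxN block, following Table 2.2 top to bottom.
 * half_x / half_y distinguish half-pel from quarter-pel accuracy.         */
static void filter_counts(int M, int N, bool frac_x, bool frac_y,
                          bool half_x, bool half_y, int *n_x, int *n_y)
{
    *n_x = 0; *n_y = 0;
    if (frac_x && !frac_y)       *n_x = M * N;
    else if (!frac_x && frac_y)  *n_y = M * N;
    else if (frac_x && frac_y) {
        if (half_x)      { *n_x = M * (N + 5); *n_y = M * N; }       /* (Half, Sub) */
        else if (half_y) { *n_x = M * N;       *n_y = (M + 5) * N; } /* (Sub, Half) */
        else             { *n_x = M * N;       *n_y = M * N; }       /* (Qtr, Qtr)  */
    }
}
```

A per-MB loop would call cache_init once per frame (or keep the cache state across MBs), invoke mcp_cache_misses and filter_counts for every partition, accumulate the counts together with the number of MVs, and plug them into Eq. (2.9).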
2.3.4 Decoding Complexity versus Frame Size

The relationship between the decoding complexity and the frame size is examined in this subsection, where the decoding platform is assumed to have two-level cache management. Usually, the size of the first-level (L1) cache is so small that only a small portion of the reference frame can be cached, but the second-level (L2) cache is larger and therefore might be able to store the whole reference frame. Consider the scenario that the frame size is small enough for the reference frame to be stored in the L2 cache. Under such a scenario, the L1 cache miss penalty, which can be viewed as the weight for the number of cache misses during the MCP operation, is equal to the access time of the L2 cache. In contrast, the L1 cache miss penalty is equal to the L2 cache miss penalty if the frame size is so large that the data required by the MCP operation cannot be obtained from the L2 cache. Therefore, the L1 cache miss penalty highly depends on the L2 cache hit rate [7], and the L2 cache hit rate is related to the frame size. In other words, the frame size plays an important role in the L1 cache miss penalty and therefore in the MCP decoding complexity.

Two experiments were conducted on the Pentium 1.7 GHz mobile CPU platform to verify this conjecture, where the SRF-ME mode was used in the encoder to generate all bit streams. First, we selected the Foreman and Mobile CIF (352x288) sequences as training sequences and pre-encoded 70 bit streams to train the weights of the four decoding complexity terms. More details about the training of the weights will be given in Sec. 2.3.6. Then, the weights were used to estimate the decoding complexities of the Mobile D1 (720x480) 2560 Kbps and Tempete CIF 512 Kbps bit streams. The actual MCP decoding complexities of the Mobile D1 2560 Kbps and Tempete CIF 512 Kbps bit streams are 442.85 ms and 147.32 ms, while the estimated MCP decoding complexities are 375.03 ms and 142.07 ms, respectively. The weights trained on CIF bit streams thus provide good estimation results for CIF bit streams but not for D1 bit streams.

Second, we re-trained the weight coefficients on 14 D1 bit streams generated from the Football and Susie D1 sequences. It is observed that the weight for the number of cache misses trained on CIF bit streams differs from that trained on D1 bit streams, while the other three weights are quite similar. This clearly demonstrates that the frame size plays an important role in the L1 cache miss penalty and therefore in the weight for the number of cache misses in our model. When SRF-ME is used to generate video bit streams, the weight for the number of cache misses varies with the frame size, and therefore it must be trained separately for each frame size. In the next subsection, the H.264 encoder uses MRF-ME to predict an MB, and the relationship between the MCP complexity and the distribution of selected reference frames, which also plays an important role in cache management efficiency, is studied.

2.3.5 Multiple Reference Frames

In this subsection, we consider an enhanced MCP complexity model that allows MRF-ME in the H.264/AVC encoder. As assumed in the last subsection, the decoding platform has two-level cache management.

We argue that the distribution of selected reference frames should play a role in the MCP complexity model, too. Typically, the first reference frame is frequently selected as the best reference frame during MRF-ME while the others are rarely used. Under such a scenario, the number of L2 cache misses could be small since the first reference frame can be stored in the L2 cache.
However, if all reference frames are uniformly selected as the best reference frame, the number of L2 cache misses becomes larger. Thus, the distribution of selected reference frames can affect the L2 cache hit rate. As mentioned before, the L1 cache miss penalty (or α in our model) highly depends on the L2 cache hit rate [7]. This implies that the distribution of selected reference frames affects the L1 cache miss penalty (or α) and, therefore, the MCP complexity. To verify the above conjecture, two experiments were conducted on the Pentium 1.7 GHz mobile CPU platform. We modified the H.264/AVC encoder such that only one inter-prediction (or intra-prediction) mode was selected to code whole MBs. All bit streamsarewith{IPPP..P}picturestructureandcodedbyint-pelMRF-MEmode,where the maximal number of reference frames is five. In addition, Intel VTune performance analyzer 8.0 was used to measure the MCP complexity for all bit streams. In the first experiment, Susie and Football sequences of the D1 format (720x480) were selected as test sequences, and the modified H.264/AVC encoder encoded whole MBs of P pictures with the P16x16 mode. Different bit streams were generated under different bit rate constraints so that the distribution of selected reference frames may vary. MCPcomplexities(intermsofdecodingtimeinmilli-seconds)forthesebitstreams are shown in Table 2.3. We see that MCP complexities of the Football sequence for 64Kbps and 9.6Mbps are about the same. The distributions of the five referece frames 34 are also quite similar. In contrast, MCP complexities of the Susie sequence vary from 43.43ms to 64.16ms as its bit rate moves from 64Kbps to 9.6Mbps. The distribution of the five reference frames also changes. It is interesting to compare the MCP complexity oftheSusiebitstreamof2.56MbpsandthatofFootballbitstreams. Theyaresimilarto each other (about 50.7-52.7ms) while distributions of selected reference frames are very similar, too. Since the bit streams are all coded by the P16x16 mode with integer-pel MV resolu- tion, the two decoding complexity terms (i.e., N x andN y ) in Eq. (2.9) are zero. We may infer that the distribution of selected reference frames affects either the weight for the number of cache misses (i.e., α) or that for the number of MVs (i.e., μ). This is further clarified with Susie D1 as the test sequence. Table 2.3: MCP complexities (decoding time in milli-seconds) for bit streams with dif- ferent distributions of selected reference frames (RFs), where D1 Susie (S) and Football (F) sequences are coded under different bit rate constraints. Bit stream Time (ms) 1st RF 2nd RF 3nd RF 4th RF 5th RF S 9.6M 64.16 60.14% 16.22% 10.56% 6.45% 6.63% S 2.56M 50.68 76.84% 10.96% 6.52% 2.89% 2.80% S 256k 47.07 83.57% 8.50% 5.02% 1.43% 1.48% S 64k 43.43 89.14% 5.81% 3.91% 0.53% 0.61% F 9.6M 51.92 77.47% 9.79% 5.50% 3.56% 3.68% F 64k 52.67 77.67% 9.37% 6.63% 2.76% 3.57% In the second experiment, we consider two groups of bit streams, where distributions of selected reference frames are similar within the same group but different between two groups. Bitstreamswithsimilardistributionsofselectedreferenceframesweregenerated by the modified H.264/AVC encoder with the same quantization parameter (QP). The modified H.264/AVC encoder generated 7 bit streams encoded by the P16x16, P16x8, 35 P8x16, P8x8, P8x4, P4x8 and P4x4 inter-prediction modes, respectively. 
Since these 7 bit streams were generated from the same video sequence and QP, it is likely that the same reference frame is selected by the MRF-ME process to predict an MB and, as a result, they have similar distributions of selected reference frames. Each group of bit streams was used to train the weight for the number of cache misses and that for the number of MVs individually. We observe that these two groups have different weights for the number of cache misses but similar weights for the number of MVs. The above two experiments clearly demonstrate that the distribution of selected reference frames affects the L1 cache miss penalty and, thus, the weight α in the MCP complexity model. The weight α should be determined by taking the frame size and the distribution of selected reference frames into account. To estimate the MCP complexity of an MB for a specific frame size of bit streams, the desired weight α for the number of cache misses is selected from a set of the trained weights α according to the distribution of selected reference frames (the training process oftheMCPcomplexitymodelweightswillbeaddressedinthenextsubsection). First,the distribution of selected reference frames in coded video for recent 128 MBs is collected. Then, we select the weight α whose distribution of selected reference frames is most similar to that of the recent 128 MBs of underlying video as the desired weight. Since the distribution of selected reference frames is a monotonically decreasing function as given in Table 2.3, we use the distance of two entropy functions to measure the similarity between two distributions of selected reference frames. In other words, there are several sets of the trained weights α corresponding to different frame sizes of bit streams. The desired weight is selected from a set of the trained weights such that its entropy of the 36 distribution of selected reference frames is most similar to that of the distribution of selected reference frames for recent 128 MBs of underlying video. 2.3.6 Determination of Model Parameters The determination of parameters of the MCP complexity model is discussed in this sub- section. Weights α, β, γ and μ in Eq. (2.9) can be trained by the following steps. First, the H.264 encoder is used to encoder several bit streams as the training bit streams. Then, theIntelVTuneperformanceanalyzerisusedtomeasurethenumberofclockticks spent by the MCP operation for several pre-coded training video bit streams, and the MCP decoding complexity coefficients, N c , N x , N y and N v are counted for each individ- ual pre-coded bit stream. Finally, we use the method of least squares to determine the best weights. Here, Foreman and Mobile CIF sequences were first selected as training sequences to train the weight for the number of MVs (i.e., μ) and the weights for the numbers of x- and y-direction interpolation filters (i.e., β and γ). The seven inter-prediction modes, i.e.,P16x16,P16x8,P8x16,P8x8,P8x4,P4x8andP4x4,andthreedifferentquantization parameters(QP),i.e.,QP=5,15and25,wereusedbythemodifiedH.264encoder. There are21generatedbitstreamsforonevideosequenceaccordingly,wheretheSRF-MEmode was used in the encoder and all MVs are of int-pel accuracy. These 42 bit streams were used to determine the weights for the number of cache misses, i.e., α, and that for the number of MVs, i.e., μ,. Please note that the weight for the number of cache misses trained by the above steps is only suitable for the CIF bit streams with one reference frame only. 
After that, the modified H.264/AVC encoder used the seven inter-prediction 37 modes and two different QPs, i.e., QP=5 and 15, to generate 14 bit streams accordingly for one video sequence, where the SRF-ME mode was used in the encoder yet MVs are allowed to be of sub-pel accuracy. These 28 bit streams and the two weights determined by above steps, i.e., α and μ, are used to decide the weights for the numbers of x- and y-direction interpolation filters (i.e., β and γ). Since the weight for the number of cache misses changes among frame sizes and the distributions of selected reference frames, it must be trained with respect to frame size and the distribution of selected reference frames. The weights for the number of cache misses are trained by the following steps. First, the modified H.264 encoder is used to generate several bit streams coded by different QPs which are from small to large (i.e., from high to low bit rates), where the P16x16 mode and int-pel MRF-ME mode are used to code whole MBs. Since MVs with larger reference indices require more bits, it is likely that the first reference frame is selected as the best reference frame in low bit rate by the MRF-ME process. Therefore, the generated bit streams coded by different QPs can have variant distributions of selected reference frames. The minimal weight for the number of cache misses can be determined from the bit stream with one reference frame. Since only one reference frame is used in the ME process, it is similar to the case that the MRF-ME mode is used in the encoder to generate a bit stream in very low bit rate such that the firstreferenceframeisfrequentlyselectedasthebestonewhiletheotherreferenceframes are rarely used. Since the weight for the number of MVs has been decided, the weights forthenumberofcachemissesforthesebitstreamscanbeobtainedeasilyoncetheMCP complexities of these bit streams are measured by Intel VTune. 38 Table 2.4: Weight training bit streams for the MCP complexity model. Bit Coding Determined stream modes weights Step 1: 42 Foreman and Mobile int-pel MV, SRF-ME μ CIF bit streams Step 2: 28 Foreman and Mobile sub-pel MV, SRF-ME β, γ CIF bit streams Step 3: Foreman CIF bit streams int-pel MV, MRF-ME CIF α (QP=10, 12, ..., 34) Step 4: Susie D1 bit streams int-pel MV, MRF-ME D1 α (QP=10, 12, ..., 34) Step 5: Blue sky HD bit streams int-pel MV, MRF-ME HD α (QP=10, 12, ..., 34) Step 6: Foreman CIF bit stream int-pel MV, SRF-ME minimal CIF α (QP=10) Step 7: Susie D1 bit stream int-pel MV, SRF-ME minimal D1 α (QP=10) Step 8: Blue sky HD bit stream int-pel MV, SRF-ME minimal HD α (QP=10) In our work, Foreman CIF, Susie D1, and Blue sky HD sequences were selected to train three sets of the weights for the number of cache misses. Each CIF, D1 and HD bit streams contain 270, 68 and 14 frames, respectively, and were encoded by the modified H.264/AVC encoder with different QPs, (i.e., QP=10, 12, ..., 34), where five reference framesandtheP16x16modewereusedbytheMEprocess. Thesethreesetsofbitstreams were used to train the weights for the number of cache misses with respect to different distributions of selected reference frames for CIF, D1 and HD bit streams, respectively. Inaddition, theSRF-MEmodewasusedinthemodifiedH.264/AVCencodertogenerate one CIF, D1 and HD bit streams coded by the P16x16 mode with QP=10 so as to determine the minimal weights for the number of cache misses. Table 2.4 summaries the training process and the training bit streams for the MCP complexity model weights. 
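As described in Sec. 2.3.5, the weight α is then selected at run time from the trained set by comparing the entropy of the reference-frame usage distribution of the most recent 128 MBs with the entropies of the distributions observed during training. A minimal sketch of that selection follows; the function names and the trained (entropy, α) pairs are hypothetical and would be filled in from a table such as the one plotted in Fig. 2.7.

```c
#include <math.h>
#include <stddef.h>

/* One trained operating point: entropy of the reference-frame usage
 * distribution observed during training, and the alpha fitted for it.     */
typedef struct { double entropy; double alpha; } TrainedAlpha;

/* Shannon entropy (in bits) of a reference-frame usage histogram.         */
static double rf_entropy(const int *count, int num_ref)
{
    int total = 0;
    for (int i = 0; i < num_ref; i++) total += count[i];
    if (total == 0) return 0.0;
    double h = 0.0;
    for (int i = 0; i < num_ref; i++) {
        if (count[i] == 0) continue;
        double p = (double)count[i] / total;
        h -= p * log2(p);
    }
    return h;
}

/* Pick the trained alpha whose entropy is closest to that of the recent
 * 128-MB reference-frame distribution (Sec. 2.3.5).                       */
static double select_alpha(const int *recent_count, int num_ref,
                           const TrainedAlpha *table, size_t table_size)
{
    double h = rf_entropy(recent_count, num_ref);
    double best_alpha = table[0].alpha;
    double best_dist  = fabs(h - table[0].entropy);
    for (size_t i = 1; i < table_size; i++) {
        double d = fabs(h - table[i].entropy);
        if (d < best_dist) { best_dist = d; best_alpha = table[i].alpha; }
    }
    return best_alpha;
}
```

With a table of, say, CIF operating points, select_alpha(recent_counts, 5, cif_table, n) would return the α whose training distribution is entropy-wise closest to the current one.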
2.4 Complexity Model for Spatial Compensation Process (SCP)

To model the SCP decoding complexity (i.e., the complexity of the decoding process for intra-prediction), several experiments were conducted on the Windows XP Pentium 1.7 GHz mobile CPU platform. The modified H.264/AVC encoder was used to encode the Foreman CIF, Football D1 and Blue sky HD (1920x1080) sequences. In the first experiment, all MBs of these three video sequences were encoded with the I8MB DC prediction direction and the chroma DC prediction direction. Then, Intel VTune 8.0 was used to measure the decoding complexities of the three test bit streams. Table 2.5 shows the decoding complexities of these three bit streams per MB in milli-seconds. It is observed that the decoding complexities of these bit streams of three different frame sizes are quite similar. These results suggest that cache misses have little impact on the SCP decoding complexity, for two reasons. First, the SCP operation can be treated as a 2D memory access from the neighboring blocks of the currently decoded MB to that MB. Intuitively, it is very likely that the neighboring blocks are already in the CPU cache, so cache misses rarely happen when the SCP operation is performed. Second, the SCP decoding complexities of bit streams that are coded with the same intra type and prediction direction but have different frame sizes are quite similar. Since the weight for the number of cache misses varies with the frame size, as mentioned in the previous subsection, this implies that the number of cache misses must be very small when the SCP operation is executed. More experiments with different intra types and prediction directions were also conducted to further verify this conjecture, and the results are shown in Table 2.5.

Table 2.5: SCP decoding complexities per MB for the Foreman, Football and Blue sky bit streams.

    Bit stream    Chroma DC complexity (ms)       I8MB DC complexity (ms)
    Foreman       0.0000797878                    0.0004128101
    Football      0.0000739376                    0.0004171629
    Blue sky      0.0000777426                    0.0004096115

    Bit stream    Chroma DC complexity (ms)       I4MB DC complexity (ms)
    Foreman       0.0000787090                    0.0014232346
    Football      0.0000821991                    0.0014221589
    Blue sky      0.0000839501                    0.0014511104

    Bit stream    Chroma Plane complexity (ms)    I8MB Mode 3 complexity (ms)
    Foreman       0.0000777122                    0.0004111089
    Football      0.0000780858                    0.0004102481
    Blue sky      0.0000784071                    0.0004014792

Since cache misses have little impact on the SCP decoding complexity, the decoding complexities of bit streams coded with the same intra type and prediction direction are similar. In our work, the SCP decoding complexity is therefore modeled simply as a function of the number of prediction directions used for each intra type. Since decoders may have different implementations of intra-prediction, it is desirable to model the decoding complexity of each prediction direction of each intra type individually. Mathematically, the decoding complexity of intra-prediction is expressed as

    C_scp = Σ_{i=0}^{3} N_16,i · ω_16,i + Σ_{i=0}^{8} N_8,i · ω_8,i + Σ_{i=0}^{8} N_4,i · ω_4,i + Σ_{i=0}^{3} N_c,i · ω_c,i,     (2.14)

where C_scp is the SCP decoding complexity; N_16,i, N_8,i, N_4,i and N_c,i are the numbers of prediction directions used by the I16MB, I8MB and I4MB intra types and by the MB chrominance component, respectively; and ω_16,i, ω_8,i, ω_4,i and ω_c,i are the corresponding weights, i.e., the decoding complexities per MB of the corresponding prediction directions.

To measure the weights of the prediction directions, the modified H.264 encoder is used to generate bit streams coded with at most two prediction directions of the same intra type.
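(The measurement of these per-direction weights is described next. Evaluating Eq. (2.14) once they are available is straightforward; the following sketch illustrates it, treating the weight arrays as inputs measured on the target platform, in the spirit of the values reported later in Tables 2.6 and 2.7.)

```c
/* Per-direction weights (decoding time per MB) for each intra type; in
 * practice these are measured on the target platform.                     */
typedef struct {
    double w16[4];   /* I16MB: 4 luma prediction directions   */
    double w8[9];    /* I8MB : 9 luma prediction directions   */
    double w4[9];    /* I4MB : 9 luma prediction directions   */
    double wc[4];    /* chroma: 4 prediction directions       */
} ScpWeights;

/* C_scp of Eq. (2.14): weighted sum of the prediction-direction counts.   */
static double scp_complexity(const ScpWeights *w,
                             const int n16[4], const int n8[9],
                             const int n4[9],  const int nc[4])
{
    double c = 0.0;
    for (int i = 0; i < 4; i++) c += n16[i] * w->w16[i];
    for (int i = 0; i < 9; i++) c += n8[i]  * w->w8[i];
    for (int i = 0; i < 9; i++) c += n4[i]  * w->w4[i];
    for (int i = 0; i < 4; i++) c += nc[i]  * w->wc[i];
    return c;
}
```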
Since the DC prediction direction is the only direction which is allowed to code wholeMBsintheframe,theDCpredictiondirectionisselectedtocodethoseMBswithin frame boundary. For example, since the vertical prediction direction cannot be used to code those MBs in the upper frame boundary since those MBs are not available, the modified H.264/AVC encoder chooses the DC prediction direction to code those MBs within upper frame boundary. Therefore, the generated bit stream is coded by the ver- tical and DC prediction directions only. The modified H.264/AVC encoder generates bit streams for all prediction directions and intra types individually. Please note that only the bit stream coded by the DC prediction direction contains only one prediction direc- tion while the other bit streams contain two prediction directions (i.e. one of them is the DC prediction direction). Intel VTune performance analyzer 8.0 is used to measure the decoding complexities for all bit streams. The weight of the DC prediction direction is first obtained via dividing the measured decoding complexity of the bit stream with DC prediction direction only by the number of MBs. As a result, once the weight of the DC prediction direction is determined, the weights of other prediction directions can be obtained easily, too. The weights of chroma prediction directions can be determined by the similar approach above. 42 2.5 Applications of MCP and SCP Models The application of the proposed complexity model is briefly discussed in this section. Consider the scenario that an H.264 encoder generates a single bit stream for different decoding platforms without the use of any decoding complexity model. In contrast, the encoder may generate several bit streams for different decoding platforms separately according to their computational powers so that the resultant bit stream is easy to be decoded for a particular platform. For the latter case, the decoding complexity models havetobeintegratedintotheH.264encodersothattheencodercanestimatethepossible decoding complexity and then generate decoder-friendly bit streams. 2.5.1 RDO Process with Decoding Complexity Model In the conventional H.264 encoder, there are two rate-distortion optimization (RDO) processes. The first RDO process decides the optimal inter-prediction mode among the P8x8, P8x4, P4x8 and P4x4 modes for one 8x8 block while the second RDO process determines the optimal inter-prediction mode or intra type for one 16x16 MB among the P Skip, P16x16, P16x8, P8x16 modes I16MB, I8MB, I4MB and four 8x8 blocks whose optimal modes have been decided by the first RDO process. Both of the two RDO processesconsistofthreestepsasshowninFig. 2.6(a). Inthefirststep,eitherthemotion estimation (ME) process or the process which is used to determine the best prediction direction (PD) can be performed. If the current MB is considered to be coded by inter- prediction,theMEprocessisperformedtofindthebestMVforaspecificinter-prediction mode because different inter-prediction modes have a different number of MVs. On the 43 Figure 2.6: RDO process and RDO process with integrated decoding complexity model other hand, the RDO process needs to search the best PD for a particular intra type if current MB is considered to be coded by one of the I16MB, I8MB and I4MB intra types. Second, the RDO process performs the encoding (e.g., the spatial domain transform, quantizationandentropyencoding)andthenthedecodingprocessestogetreconstructed video frame so that the distortion and the bit rate can be obtained. 
Finally, the RDO process decides the best inter-prediction mode or intra type that yields the minimal RD cost. Since the proposed MCP decoding complexity model requires MV only for a specific inter-prediction mode, the estimated complexity can be computed if their weights are given. The proposed complexity model can be integrated in the ME stage as shown in Fig. 2.6(b). For a specific inter-prediction mode, the ME process in the conventional H.264 encoder generates integer- and sub-pel MVs for all reference frames and then decides the best MV that minimizes the RD cost. The proposed decoding complexity model estimates decoding complexities from a set of MVs of different pixel resolutions 44 and reference frames, and then eliminates those MVs whose decoding complexities are higher than that can allowed by the target decoding platform. Similarly, the proposed SCP decoding complexity model can be integrated in the stage which determines the best PD for a specific intra type. The proposed decoding complexity model estimates the decoding complexities of PDs for a given intra type, and then those prediction directions whosedecodingcomplexitiesarehigherthanthedecodingcomplexityconstraintoftarget decoding platform are eliminated. As a result, the RDO process shown in Fig. 2.6(b) searches the best MV and mode (or the best PD and intra type) among a smaller set of MVs and inter-prediction modes (or PDs and intra types) whose decoding complexities meetthedecodingcomplexityconstraint. Asaresult,theRDOprocesswiththedecoding complexity model can select either the best MV and inter-prediction mode or the best PD and intra type to minimize the RD cost and meet the decoding complexity constraint for the target platform. SincetheMVoftheP SkipmodeMBisdecidedbyitsneighboringMBs,theproposed model can also estimate the required decoding complexity of the P Skip mode. Another advantage for the RDO process shown in Fig. 2.6(b) is that the encoding complexity can be saved since the encoder does not have to perform the encoding and the decoding processes for all inter-prediction modes and intra types. Please note that the weights are different among different platforms, frame sizes, and distributions of selected reference frames. Therefore, they have to be measured for the target decoding platform first so that the RDO process with the decoding complexity model can estimate the decoding complexity for a given inter-prediction mode and MV as well as intra type and prediction direction. 45 2.5.2 Decoding Complexity Control The proposed algorithm to allocate decoding complexities for frame, MB and block is presented in this sub-section. As mentioned in the Section II, the choose of the optimal Lagrangian multiplier is difficult when the LM method is used to solve the optimization problemformodedecision. Sincethemodedecisionproblemwithdecoding(orencoding) complexity constraint is similar to the original mode decision problem, this difficulty (i.e. the selection of the optimal Lagrangian multiplier λ ∗ c for a given complexity constraint) also happens while the LM method is applied to find the best coding mode for the RD and encoding (or decoding) complexity optimization problems. Since there is very little knowledge regarding the statistical model of H.264 decoding (or encoding) complexity, the authors [50, 48, 49, 9] resort to experimental simulations to find the relationship between target decoding (or encoding) complexity constraint and the optimal Lagrangian multiplier λ ∗ c in (2.8). 
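For concreteness, mode decision under the Lagrangian RDC cost of Eq. (2.8) amounts to scoring every candidate mode with D + λ_m·R + λ_c·C and keeping the cheapest one, as the sketch below shows. The struct layout and the way the per-mode distortion, rate and complexity values are obtained are assumptions of this illustration, not part of the reference encoder.

```c
#include <float.h>

/* Statistics gathered for one candidate coding mode of a coding unit S_i. */
typedef struct {
    int    mode;          /* mode identifier (e.g., P16x16, I4MB, ...)     */
    double distortion;    /* D(S_i | QP, m_i)                              */
    double rate;          /* R(S_i | QP, m_i) in bits                      */
    double complexity;    /* C(S_i | QP, m_i): decoding (or encoding) cost */
} ModeCandidate;

/* Return the mode minimizing J = D + lambda_m*R + lambda_c*C  (Eq. 2.8).  */
static int best_mode_rdc(const ModeCandidate *cand, int num_cand,
                         double lambda_m, double lambda_c)
{
    int best = -1;
    double best_cost = DBL_MAX;
    for (int i = 0; i < num_cand; i++) {
        double j = cand[i].distortion
                 + lambda_m * cand[i].rate
                 + lambda_c * cand[i].complexity;
        if (j < best_cost) { best_cost = j; best = cand[i].mode; }
    }
    return best;
}
```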
In our work, a different approach is used. The target MB (or block) decoding complexity is decided first, and then the optimal coding mode is determined from the set of modes whose decoding complexities are less than or equal to the target MB (or block) decoding complexity.

In order to obtain smooth playback quality, each frame has to be decoded within the same time duration. The time needed to complete the decoding of a frame can be viewed as the frame decoding complexity divided by the computational power of the target decoding platform. Therefore, it is desirable that the same decoding complexity is assigned to each frame. In our algorithm, the target decoding complexity for the k-th MB is given by

    C_mb,k = C_mb,average · k − Σ_{i=1}^{k−1} C'_mb,i,     (2.15)

where C_mb,average is the average decoding complexity per MB, which is equal to the frame decoding complexity divided by the number of MBs, and C'_mb,i is the decoding complexity allocated to the i-th MB. Since the actual MB decoding complexity cannot exceed the target MB decoding complexity given in (2.15), the decoding complexity left over from previous MBs can be used by the current MB, and the sum of the decoding complexities of all MBs equals the frame decoding complexity. The target decoding complexities of the blocks within the same MB are assigned by the same approach. That is, the target decoding complexity of the k-th block is given by

    C_block,k = C_block,average · k − Σ_{i=1}^{k−1} C'_block,i,     (2.16)

where C_block,average is the average decoding complexity per block, and C'_block,i is the decoding complexity allocated to the i-th block.

As a result, once the target MB and block decoding complexities are determined, the 8x8 block and 16x16 MB RDO processes with the integrated decoding complexity model can estimate the required decoding complexity for a given MV and inter-prediction mode as well as for a given prediction direction and intra type. Then, those MVs and prediction directions whose decoding complexities are higher than the target MB (or block) decoding complexities are eliminated by the 16x16 MB (or 8x8 block) RDO process. Finally, the two RDO processes decide the optimal MV and inter-prediction mode (or prediction direction and intra type) that minimizes the RD cost function and also satisfies the given decoding complexity constraint. Please note that the MV and inter-prediction mode (or prediction direction and intra type) leading to the least decoding complexity is selected if all MVs and modes (or prediction directions and intra types) have MB (or block) decoding complexities higher than the constraints. Experimental results for the H.264 encoder with the integrated decoding complexity model and complexity control scheme will be shown in the next section.

2.6 Experimental Results

We conducted experiments on the PC platform to verify the proposed MCP and SCP decoding complexity models given in Eqs. (2.9) and (2.14) and the decoding complexity control scheme given by Eqs. (2.15) and (2.16). The CPU was a Pentium mobile 1.7 GHz CPU with 512 MB of RAM and the operating system was Windows XP. The reference JM 9.4 decoder [17] was optimized with the Intel MMX technology. The trained weights for the number of MVs and for the numbers of x- and y-direction interpolation filters in the proposed MCP complexity model are

    β = 3.16·10^−6,  γ = 2.74·10^−6,  μ = 1.03·10^−4.     (2.17)

Figure 2.7: Weights for the number of cache misses for CIF, D1 and HD sequences, where the x-axis is the entropy of the distributions of selected reference frames.

Fig.
2.7 shows the weights for the number of cache misses for CIF, D1 and HD sequences, wherethex-axisistheentropyofthedistributionsofselectedreferenceframes and y-axis is the weight for the number of cache misses. To measure the weights of the SCP decoding complexity model, we selected Foreman and Mobile sequences as the training sequences. Since there are four, nine and nine prediction directions for the I16MB, I8MB and I4MB intra types, respectively, the total number of prediction directions which can be used to code the MB luminance component 49 Table 2.6: Weights of four prediction directions used by the I16MB intra type and MB chrominance component. Intra mode I16MB Chroma Mode 0 (Vertical) 0.0001051712 0.0000741226 Mode 1 (Horizontal) 0.0001183389 0.0000991201 Mode 2 (DC) 0.0000891024 0.0000782764 Mode 3 (Plane) 0.0008194890 0.0000732053 is 22. The modified H.264/AVC encoder generated 22 bit streams for all prediction di- rections while the MB chrominance component was coded by the DC prediction direction only. Besides, themodifiedH.264/AVCencodergeneratedsixbitstreamsfortheremain- ing three chrominance prediction directions, i.e., horizonal, vertical and plane directions, where three of these six bit streams whose MB luminance components were coded by the I8MB DC prediction direction while the MB luminance components of the remaining three bit streams were coded by I4MB DC prediction direction. Therefore, there were 28 bit streams used to train the weights of the SCP complexity model for one training sequence. Please note that not all prediction directions can be used to code whole MBs. The modified H.264 encoder chooses the DC prediction direction to code those MBs in the frame boundary. Therefore, the bit streams were coded by at most two prediction directions with the same intra type. Tables 2.6 and 2.7 show weights of all prediction directions. Then, the weights were integrated into our SCP decoding complexity model and were used to estimate the SCP decoding complexities for several HD bit streams which were coded by intra-prediction modes only. The results are shown in Table 2.8. The errors are all less than 6%. 50 Table 2.7: Weights of nine prediction directions used by the I8MB and I4MB intra types. Intra mode I8MB I4MB Mode 0 0.0004036980 0.0012946847 Mode 1 0.0004022834 0.0015487100 Mode 2 0.0004121618 0.0014295389 Mode 3 0.0003996801 0.0016167432 Mode 4 0.0003983204 0.0015020752 Mode 5 0.0003856215 0.0016063213 Mode 6 0.0005677700 0.0016262611 Mode 7 0.0006584645 0.0016195389 Mode 8 0.0006843826 0.0016260349 Next,theseweightsoftheMCPandSCPdecodingcomplexitymodelswereintegrated into our complexity model to estimate the decoding complexities of the following test bit streams: eight CIF video sequences, four D1 video sequences and three HD sequences, wherethesebitstreamsarewith{IPPP..P}GOPstructureandwithratecontrolenabled. Each CIF, D1 and HD bit streams contain 250, 200 and 30 frames, respectively, and the number of reference frames used by the MRF-ME process is five. Tables 2.9, 2.10, 2.11, and 2.12 show the comparison between the estimated decoding complexity based on the proposed complexity model and the actual decoding complexity measured by the Intel VTune performance analyzer. We see that the proposed complexity model provides good estimation results for these test bit streams. The errors are within 10%. 
TheH.264/AVCencoderequippedwiththeproposeddecodingcomplexitymodeland decoding complexity control scheme can generate bit streams to meet different decoding complexity constraints. To validate this claim, experimental results are shown in Figures 2.8, 2.9, and 2.10, where all test bit streams were coded by the MRF-ME process with five reference frames. Each row in Figures 2.8, 2.9, and 2.10 shows the results for one test 51 Table 2.8: Comparisons of actual and estimated SCP decoding complexities (decoding timeinmilli-seconds)forBluesky, Toyandcalendar, SunflowerandRushhoursequences Blue Actual Estimated Estimation (Bit rate) complexity (ms) complexity (ms) error(%) 7.68M 62.46 60.60 2.98% 6.40M 62.74 60.80 3.10% 5.12M 64.48 62.06 3.75% 3.84M 64.15 62.40 2.72% 3.20M 66.25 62.79 5.23% 2.56M 65.67 62.79 4.38% Toy Actual Estimated Estimation (Bit rate) complexity (ms) complexity (ms) error(%) 7.68M 74.49 72.99 2.02% 6.40M 72.72 72.99 0.37% 5.12M 73.88 72.99 1.20% 3.84M 74.96 74.50 0.61% 3.20M 75.09 74.50 0.79% 2.56M 76.47 74.50 2.58% Sunflower Actual Estimated Estimation (Bit rate) complexity (ms) complexity (ms) error(%) 7.68M 64.00 62.85 1.80% 6.40M 66.79 63.85 4.40% 5.12M 66.37 65.42 1.43% 3.84M 67.49 65.50 2.94% 3.20M 68.42 65.50 4.26% 2.56M 66.93 66.10 1.24% Rush Actual Estimated Estimation (Bit rate) complexity (ms) complexity (ms) error(%) 7.68M 61.02 61.84 1.35% 6.40M 60.20 61.33 1.88% 5.12M 60.42 60.58 0.26% 3.84M 59.36 59.68 0.55% 3.20M 58.69 58.65 0.06% 2.56M 58.13 58.54 0.70% 52 Table 2.9: Comparison of actual and estimated decoding complexities (decoding time in milli-seconds) for Akiyo, Container, Flower and Foreman CIF sequences, where five reference frames were used in the ME process Bit Actual Estimated Estimation rate complexity (ms) complexity (ms) error(%) Akiyo 2048k 81.35 80.17 1.45% 1536k 79.46 76.38 3.88% 1024k 70.01 67.91 3.00% 512k 57.01 53.69 5.83% 256k 48.86 46.57 4.70% 128k 44.36 41.79 5.80% 64k 39.85 37.11 6.88% Container 2048k 98.48 95.85 2.67% 1536k 91.00 89.35 1.81% 1024k 79.59 82.11 3.16% 512k 67.56 70.81 4.80% 256k 54.07 57.42 6.18% 128k 44.18 46.63 5.55% 64k 36.75 36.82 0.19% Flower 2048k 145.81 137.66 5.58% 1536k 142.11 135.43 4.70% 1024k 136.27 130.75 4.05% 512k 120.20 118.88 1.10% 256k 101.54 103.50 1.94% 128k 76.26 79.14 3.77% 64k 51.99 55.07 5.92% Foreman 2048k 195.27 187.18 4.14% 1536k 183.10 179.31 2.07% 1024k 173.17 167.12 3.49% 512k 153.68 150.88 1.82% 256k 135.71 136.12 0.30% 128k 112.80 116.77 3.52% 64k 64.50 69.06 7.06% 53 Table 2.10: Comparison of actual and estimated decoding complexities (decoding time in milli-seconds) for Mobile, Silent, Stefan and Tempete CIF sequences, where five reference frames were used in the ME process Bit Actual Estimated Estimation rate complexity (ms) complexity (ms) error(%) Mobile 2048k 190.35 200.02 5.08% 1536k 187.98 196.59 4.58% 1024k 177.64 189.59 6.73% 512k 162.70 166.03 2.04% 256k 141.87 144.92 2.15% 128k 105.50 110.18 4.44% 64k 78.53 80.62 2.65% Silent 2048k 109.34 100.61 7.98% 1536k 102.61 93.76 8.63% 1024k 94.69 86.93 8.20% 512k 80.77 76.52 5.26% 256k 71.19 67.34 5.41% 128k 58.66 57.60 1.82% 64k 42.25 43.59 3.18% Stefan 2048k 165.56 163.49 1.25% 1536k 167.20 159.69 4.49% 1024k 158.17 154.53 2.30% 512k 147.05 143.89 2.15% 256k 129.49 130.74 0.97% 128k 106.57 108.35 1.67% 64k 72.62 76.96 5.98% Tempete 2048k 206.85 218.37 5.57% 1536k 205.75 209.44 1.79% 1024k 184.06 192.19 4.41% 512k 160.58 162.60 1.25% 256k 137.23 141.92 3.42% 128k 98.68 103.98 5.38% 64k 39.33 39.71 0.97% 54 Table 2.11: Comparison of actual 
and estimated decoding complexities (decoding time in milli-seconds) for D1 sequences, where five reference frames were used in the ME process Bit Actual Estimated Estimation rate complexity (ms) complexity (ms) error(%) Football 9.6M 416.01 399.66 3.93% 7.68M 392.59 385.59 1.79% 5.12M 366.89 363.28 0.99% 2.56M 324.37 325.21 0.26% 1.024M 240.55 245.80 2.18% 512k 185.63 193.25 4.10% 256k 155.72 155.11 0.39% Ship 9.6M 505.12 477.34 5.50% 7.68M 417.22 397.39 4.75% 5.12M 344.36 325.53 5.47% 2.56M 254.42 240.96 5.29% 1.024M 171.55 158.66 7.52% 512k 143.70 131.81 8.27% 256k 123.76 116.15 6.14% Susie 9.6M 756.70 715.88 5.39% 7.68M 703.99 667.34 5.21% 5.12M 585.09 553.97 5.32% 2.56M 460.04 438.08 4.77% 1.024M 369.80 367.24 0.69% 512k 324.89 332.71 2.41% 256k 277.23 286.91 3.49% Mobile 9.6M 652.32 621.07 4.79% 7.68M 624.81 605.17 3.14% 5.12M 606.39 581.11 4.17% 2.56M 553.70 538.45 2.75% 1.024M 464.98 473.46 1.82% 512k 365.76 377.88 3.31% 256k 323.28 325.76 0.77% 55 Table2.12: Comparisonsofactualandestimateddecodingcomplexities(decodingtimein milli-seconds) for HD sequences, where five reference frames were used in the ME process Bit Actual Estimated Estimation rate complexity (ms) complexity (ms) error(%) Blue 29.40M 433.15 434.65 0.35% 20.48M 382.45 386.59 1.08% 15.36M 375.16 363.38 3.14% 10.24M 344.81 332.80 3.48% 5.12M 318.81 314.60 1.32% 2.56M 275.60 278.09 0.90% 1M 246.85 253.63 2.75% Toy 29.40M 625.27 608.06 2.75% 20.48M 575.50 569.29 1.08% 15.36M 542.62 537.45 0.95% 10.24M 482.30 487.25 1.03% 5.12M 420.02 415.97 0.97% 2.56M 354.32 369.51 4.29% 1M 294.12 303.47 3.18% Sunflower 29.40M 615.72 612.96 0.45% 20.48M 577.23 573.61 0.63% 15.36M 549.49 542.85 1.21% 10.24M 487.28 500.94 2.80% 5.12M 416.33 434.14 4.28% 2.56M 377.44 399.62 5.88% 1M 274.64 289.36 5.36% 56 bit stream. The x-axis is the decoding time and the y-axis is the deviation in complexity control(column1),thecomplexitysaving(column2)andthecodingperformance(column 3). The first point in the x-axis corresponds to the case without decoding complexity control. As compared with sequences without decoding complexity control, the CIF Foreman bit stream with the target decoding complexity at 130 ms loses 0.43 dB in PSNR but saves30.04%indecodingcomplexity. TheD1Foorballbitstreamwiththetargetdecoding complexity at 220 ms loses 0.22 dB in PSNR but saves 29.35% in decoding complexity. Finally, theHDSubflowerbitstreamwiththetargetdecodingcomplexityat290msloses 0.2 dB in PSNR but saves 26.33% in decoding complexity. TheH.264/AVCencodercangeneratedecoder-friendlybitstreamsusingonereference frame to meet different decoding complexity constraints, too. To support this claim, experimental results are shown in Figures 2.11, 2.12, and 2.13. Since all resultant bit streams use a single reference frame, these bit streams are particularly interesting in mobile devices, where the decoding platforms have limited memory resource and power constraints. The results for CIF bit streams are shown in Figure 2.11 while the results for D1 bit streams are shown in Figure 2.12, respectively. The results for HD Blue sky, and Toy and calendar sequences are shown in Figure 2.13. Each row corresponds to the experimental results of one test bit stream. The x-axis is the decoding time and the y- axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3). The first point in the x-axis corresponds to the casewithoutdecodingcomplexitycontrol. 
Ascomparedwithsequenceswithoutdecoding complexity control, the CIF Tempete bit stream with the target decoding complexity at 57 80 ms loses 0.54 dB in PSNR but saves 37.95% in decoding complexity. The D1 Ship bit stream with the target decoding complexity at 150 ms loses 0.08 dB in PSNR but saves 27.31% in decoding complexity. Finally, the HD Toy and calendar bit stream with the target decoding complexity at 300 ms loses 0.33 dB in PSNR but saves 28.48% in decoding complexity. Based on the above experimental results, we see clearly that the H.264/AVC en- coderwiththe proposeddecoding complexitymodel andthe decodingcomplexitycontrol scheme can generate bit streams to meet different decoding complexity constraints. The errors between actual decoding complexities and target decoding complexity constraints are all less than 10%. Thus, the H.264/AVC encoder with the decoding complexity con- trolschemecangeneratebitstreamstomeettargetdecodingcomplexityconstraintswell. Besides, the resultant bit streams can save a significant amount of decoding complexity at the cost of some PSNR loss. This is particular interesting in a mobile broadcasting environment, where multiple mobile devices will get broadcast/streaming video in real time. 2.7 Conclusion An improved motion compensation decoding complexity model and its application to H.264/AVC encoding were presented in this work. This model helps the encoder select propermotionvectorsandinter-predictionmodesorpredictiondirectionsandintratypes, and then generates a video bit stream that is most suitable for a receiver platform with some hardware constraint. As a result, the coded bit stream can balance the tradeoff 58 Figure 2.8: Decoding complexity control for CIF bit streams with five reference frames: Flower (row 1), Foreman (row 2), Stefen (row 3) and Tempete (row 4), where the x-axis is the decoding time and the y-axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3). 59 Figure 2.9: Decoding complexity control for D1 bit streams with five reference frames: Football (row 1), Susie (row 2), Ship (row 3) and Mobile (row 4), where the x-axis is the decoding time and the y-axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3). 60 Figure 2.10: Decoding complexity control for HD bit streams with five reference frames: Blue sky (row 1), Toy and calendar (row 2), and Subflower (row 3), where the x-axis is the decoding time and the y-axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3). 61 Figure 2.11: Decoding complexity control for CIF bit streams with one reference frame: Flower (row 1), Foreman (row 2), Stefen (row 3) and Tempete (row 4), where the x-axis is the decoding time and the y-axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3). 62 Figure 2.12: Decoding complexity control for D1 bit streams with one reference frame: Football (row 1), Susie (row 2), and Ship (row 3), where the x-axis is the decoding time and the y-axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3). 
63 Figure 2.13: Decoding complexity control for HD bit streams with one reference frame: Blue sky (row 1), and Toy and calendar (row 2), where the x-axis is the decoding time and the y-axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3). between the RD requirement as well as the computational power of the decoding plat- form. The proposed decoding complexity model was verified experimentally. We showed that the model provides fairly good estimation results for various test bit streams. The performance of the H.264/AVC codec with the proposed decoding complexity model and the proposed decoding complexity control scheme was presented. It was shown that the H.264/AVC encoder can generate bit streams according to different decoding complex- ity constraints accurately. The resultant bit streams can be decoded at a much lower complexity at the cost of small PSNR loss. 64 Chapter 3 CABAC Decoding Complexity Modeling and Its Applications 3.1 Introduction H.264/AVC[53,35]isthelatestvideocodingstandardproposedbyITU-TandISO/IEC. It has been selected as the video coding tool in HD-DVD and Blue-ray specifications. Since H.264 provides various coding modes to improve the coding gain, its encoder may searchallpossiblemodestodecidethebestmodethatminimizestherate-distortion(RD) cost. Due to the use of a larger set of coding modes, the H.264 decoding complexity is about 2.1 to 2.9 times more than the H.263 decoder [8]. For some applications, the coded video bit stream will be decoded in portable consumer electronics devices. Under such scenario, the reduction in decoding complexity so as to save the power becomes a critical issue. On possible to achieve power saving purpose of the H.264 decoder is that the H.264 encoder can generate decoder-friendly bit streams. That is, the H.264 encoder has a tar- get decoding platform in mind and can generate a bit stream that is easy to be decoded 65 in that platform. This motivates us to study the decoding complexity model and its applications. Once the decoding complexity model is available, this model can be used by the H.264 encoder to estimate the decoding complexity for various coding modes and then select the best mode to balance the rate-distortion (RD) and decoding complexity tradeoff. Since a general decoding complexity model is too broad to cover, it is desirable that the most computational expensive process is considered first. This work mainly con- sidersthedecodingcomplexitymodelofcontext-basedadaptive binaryarithmeticcoding (CABAC),whichisanentropycodingtoolinH.264,forthefollowingreasons. Fist,ithas been observed that entropy decoding process demands a higher computational complex- ity for high bit rate video streams (e.g. high definition contents) due to more non-zero quantized transformed coefficients (QTCs) and motion vectors (MVs). Second, entropy decoding process becomes the highest computationally expensive process while the com- putation of other decoding processes is moved to graphic processor unit (GPU)[41]. For example, Nvidia GeForce7 and ATI Radeon X1300 series and above GPUs support H.264 hardware decoding for motion compensation and de-blocking filter in their GPUs. Un- der such scenario, the complexity of entropy decoding becomes the main bottleneck, and thereforethecomplexityreductionofentropydecodingshouldbeaddressedfirst. Finally, H.264providestwoentropycodingtools. 
Oneistheso-calledcontext-basedadaptivevari- able length coding (CAVLC) while another entropy coding tool adopted in H.264 is the so-called context-based adaptive binary arithmetic coding (CABAC). Although CABAC can achieve better RD performance than CAVLC, the CABAC decoding complexity is higher, too [35]. For those reasons, the CABAC decoding complexity model in H.264 is mainly considered in our work. 66 The decoding complexity models have been studied in the past and can be classified into two categories: system-specific complexity models [15, 45, 46, 50, 48, 49, 25, 26] and generic complexity models [47, 1]. The MPEG-4 system-specific video complexity verifier (VCV) model was first described in [15], where the numbers of boundary macro- blocks(MBs)andnon-boundaryMBsdecodedpersecondareestimatedandthedecoding complexity can be modeled as a function of these two numbers. However, the decoding complexities of the MBs coded by different coding modes in MPEG-4 can be different so that the VCV model is not very accurate. To address this shortcoming, Valentim et al. [45, 46] proposed an enhanced decoding complexity model for MPEG-4 video. By considering the fact that MBs encoded with different coding modes have different decoding complexities, they use the maximal decoding time to measure the decoding complexities of MBs encoded by different coding modes individually. Then, the total decoding complexity of this bit stream is the sum of each individual MB’s complexity. The H.264 system-specific complexity models were presented in [50, 48, 49] and [25, 26]. The decoding complexity model for H.264 motion compensation process (MCP) was first described in [50, 48, 49], and the MCP decoding complexity is simply modeled as a functionofnumberofinterpolationfilters. Lateron,theH.264MCPdecodingcomplexity model was improved by [25, 26], which considers not only the number of interpolation filters but also the relationship between MVs, which has the impact to the efficiency of cachemanagementandthereforethedecodingcomplexity. Thegenericcomplexitymodels forvariablelengthdecoding(VLD)werereportedin[47,1], wherethesumofmagnitudes of the non-zero QTCs and the sum of the run lengths of zero QTCs are estimated and then the entropy decoding complexity can be modeled as these two parameters. 67 The above complexity models are however not suitable for H.264 entropy decoding for several reasons. First, the existing system-specific models are used to estimate the MCP decoding complexity, and therefore the entropy decoding complexity cannot be well estimated by these models. Second, although the generic complexity models can estimate the entropy decoding complexity for those decoders using VLD schemes, the genericcomplexitymodelsareinaccuratefortheH.264decoder. SincetheH.264CABAC decodingschemeismorecomplicatedthanVLDschemes, itsdecodingcomplexitycannot be well modeled by the generic complexity models. Furthermore, CABAC can be used to encode whole syntax elements in H.264 including QTCs, MVs, MB types and other flags. This flexibility makes the decoding complexity modeling more challenging as compared with that in [47, 1]. A new complexity model for H.264 CABAC decoding scheme will be proposed in our work. Therestofthischapterisorganizedasfollows. AH.264CABACdecodingcomplexity model is proposed in Section 3.2. The applications of the proposed decoding complex- ity model and its integration with an H.264 encoder are discussed in Section 3.3. 
The proposed decoding complexity model and its application are verified experimentally in Section 3.4. Finally, concluding remarks are given in Section 3.5. 3.2 Proposed CABAC Decoding Complexity Model Our CABAC decoding complexity model is inspired from the CABAC decoding process. Therefore,theCABACencodinganddecodingprocessesandtheprocesstocodeQTCsin 68 Figure 3.1: CABAC encoding and decoding processes H.264arefirstreviewedinthissection. Then,thedecodingcomplexitymodelisdeveloped accordingly. 3.2.1 BackgroundReviewofCABACEncodingandDecodingProcesses InH.264,CABACcanbeusedtocodeallsyntaxelements,suchasQTCs,MVs,reference frame indices and other flags [33]. Fig. 3.1 shows the CABAC encoding and decoding processes. The CABAC encoding process consists of at most three stages, which are binarization, context modeling and binary arithmetic coding (BAC). First, an non-binary syntax element is mapped into a binary sequence in the bina- rization stage, where the each symbol in the binary sequence is referred to as bin, while a binary syntax element bypasses the binarization stage and directly fed into the context modeling stage. There are four basic binarization schemes in H.264, which are (1) unary (2) truncated unary (TU) (3) the k-th order Exp-Golomb code (EGk) (4) fixed-length code. Inadditiontothese fourbasicbinarizationschemes, the firstandthirdbinarization schemes with a cut-off value S (UEGk) can be combined together to code an non-binary 69 syntax element, such as the motion vector differences (MVDs, i.e. the differences be- tween MVs and predicted MVs) and the absolute values minus one of QTCs. To be more specific, the binary sequence generated by the UEGk binarization scheme consists of the prefix and suffix parts if the value of syntax element C is larger than the cut-off value S. The prefix part of the binary sequence is generated by the unary binarization scheme to represent the value S while the suffix part is generated by the EGk binarization scheme to represent the value (C-S). However, if the value of syntax element C is less than or equal to S, the resultant binary sequence only includes the prefix part which is generated by the unary binarization scheme. After the binarization stage, each bin is fed into the context-modeling stage to decide the probability model which will be used by the BAC to code this bin. Finally, the BAC generates bit streams and also updates the context model. In H.264, there is a so-called bypass BAC, which is used to speed up the encoding speed when the distribution of coded bins is approximately uniform. For example, the sign bits of QTCs and the suffix part of the binary sequence generated by the UEGk binarization scheme are coded by the bypass BAC. The CABAC decoding process is shown in Fig. 3.1(b). Bit streams are fed into either the regular binary arithmetic decoding (BAD) or the bypass BAD, depending on the distribution of coded bins. Then, the inverse binarization process is utilized to reconstruct the non-binary syntax element. Finally, the context model is updated. The process to code QTCs in one 4x4 block consists of three syntax stages as shown in Fig. 3.2. First, one bit variable code block flag is used to indicate whether there exists non-zeroQTCsinthisblock. IntheSignificant map stage,significant coeff flag isutilized toindicatethepositionofthenon-zeroQTCaftermappingthe2DQTCarrayintoan1D 70 Figure 3.2: CABAC encoding process for quantized transformed coefficients array with zig-zag scan order. 
If there exists a non-zero QTC in the 1D array, a further one-bit variable, last_significant_coeff_flag, is coded to indicate whether the current QTC is the last non-zero QTC in this block. Finally, the absolute value minus one of this QTC and its sign bit are coded in the level information stage. Please note that the binarization stage is skipped if a syntax element is binary. As a result, the one-bit variables code_block_flag, significant_coeff_flag and last_significant_coeff_flag are fed directly into the context-modeling stage and then coded by the regular BAC, while the sign bits of QTCs are coded by the bypass BAC.

In addition to QTCs, CABAC can also be used to code other syntax elements. The one-bit variables mb_skip_flag and transform_size_flag are used to indicate whether the current MB is skipped and the size of the spatial domain transform (i.e., 4x4 or 8x8 transforms), respectively. For the MB type, a three-bit variable is used to indicate whether the current MB is coded by the P16x16, P16x8, P8x16 or P8x8 inter-prediction modes, while a two-bit variable is needed to identify whether the current MB is coded by the I4MB or I16MB intra-prediction modes. MVs are first estimated by the motion vector predictors, and then the MVD is fed into the UEGk binarization scheme. The prefix and suffix parts of the generated binary sequence are coded by the regular and bypass BACs, respectively. Similarly, the intra-prediction type is first estimated by the intra-prediction predictor. Then a one-bit variable is used to indicate whether the actual and estimated intra-prediction types are equal. If they are not equal, their difference is fed into a three-bit fixed-length binarization scheme and coded by the regular BAC. The reference frame index is binarized by the unary binarization scheme and then coded by the regular BAC. More detailed information on the CABAC encoding process for other syntax elements can be found in [33].

3.2.2 CABAC Decoding Complexity Model

The proposed CABAC decoding complexity model consists of two parts. One part models the decoding complexity for source data, i.e., QTCs, while the other is for header data, such as MVDs, reference frame indices, MB types and intra-prediction types. It can be seen that the execution time of the CABAC decoding process in Fig. 3.1(b) depends on the number of loops, i.e., the number of executions of the BAD. Therefore, it is desirable that the number of BAD executions be included in the CABAC decoding complexity model. Since the complexity of the bypass BAD is cheap, our model only considers the number of executions of the regular BAD. The number of regular BAD executions is an important parameter in our decoding complexity model.

The CABAC decoding complexity for source data, C_src, is modeled as a function of the number of regular BAD executions (N_bad,1), the number of non-zero QTCs (N_nz), the position of the last non-zero QTC (P_nz), and the number of non-skipped MBs (N_mb). Mathematically, it can be written as

C_{src} = \omega_{bad,1} \cdot N_{bad,1} + \omega_{nz} \cdot N_{nz} + \omega_{p} \cdot P_{nz} + \omega_{mb} \cdot N_{mb},    (3.1)

where \omega_{bad,1}, \omega_{nz}, \omega_{p} and \omega_{mb} are weights for these four decoding complexity variables, respectively. The number of regular BAD executions is used to model the decoding complexity in the level information stage, while the remaining three terms are used to model the decoding complexity for code_block_flag and that in the significant map stage.
The CABAC decoding complexity model for the header data, C_hdr, is modeled as a function of the number of regular BAD executions (N_bad,2), the number of MVs (N_mv), and the number of reference frames (N_ref). Mathematically, it can be expressed as

C_{hdr} = \omega_{bad,2} \cdot N_{bad,2} + \omega_{mv} \cdot N_{mv} + \omega_{ref} \cdot N_{ref},    (3.2)

where \omega_{bad,2}, \omega_{mv} and \omega_{ref} are weights for these three decoding complexity variables, respectively. Similar to the CABAC decoding complexity model for the source data, the number of BAD executions is included in the model for the header data, too. The number of regular BAD executions is used to model the decoding complexities of syntax elements such as the MB type, the MB subtype, the transform flag, the intra-prediction type, and MVDs. The number of MVs and the number of reference frames are used to model the complexities of decoding MVs and reference frame indices, respectively.

The number of regular BAD executions can be calculated easily in our model. As mentioned before, a non-binary syntax element is fed into the binarization process to generate a binary sequence. The length of the binary sequence determines the number of BAD executions. Since the binarization process is usually implemented by table lookup or some additive and shift operations [5], the number of regular BAD executions can be easily obtained once a non-binary syntax element is given. For example, the UEG0 binarization scheme with a cut-off value S equal to 14 is used to generate binary sequences for the absolute values minus one of QTCs in the level information stage. The prefix part of the binary sequence is coded by the regular BAC while its suffix part is coded by the bypass BAC. Thus, the number of regular BAD executions is equal to the length of the prefix part of the binary sequence. In other words, the number of regular BAD executions is the minimum of the cut-off value and the value of the non-binary syntax element when the UEGk binarization scheme is used to generate the binary sequence.

The weight coefficients in our model can be obtained by the following steps. First, several pre-encoded bit streams are selected and the Intel VTune performance analyzer is used to measure the number of clock ticks spent by CABAC decoding. Second, the decoding complexity variables are counted individually for those pre-encoded bit streams. Finally, the constrained least-squares method is used to find the best fit of the weight coefficients. The proposed CABAC decoding complexity model will be verified experimentally in Section 3.4.
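To make the bookkeeping above concrete, the following minimal sketch evaluates the source and header models of Eqs. (3.1) and (3.2) from counted quantities, using the min(cut-off, value) rule to count regular BAD executions for UEGk-binarized levels. The weight values and the per-block statistics structure are illustrative placeholders, not part of the standard; actual weights come from the constrained least-squares fit described above.

```python
# Minimal sketch of the CABAC complexity model in Eqs. (3.1)-(3.2).
# Weight defaults are placeholders; real weights are fitted against VTune
# measurements for a particular target platform.

CUTOFF_UEG0 = 14  # cut-off value S used for QTC levels in H.264 CABAC

def regular_bad_count(value, cutoff):
    """Regular BAD executions for a UEGk-binarized value: min(value, cutoff)."""
    return min(value, cutoff)

def c_src(levels_per_block, last_positions, n_nonskip_mb,
          w_bad1=1.6e-5, w_nz=1.8e-5, w_p=1.0e-5, w_mb=2.9e-5):
    """Source-data complexity, Eq. (3.1).
    levels_per_block: list of lists of (|QTC| - 1) values per coded 4x4 block.
    last_positions:   zig-zag index of the last non-zero QTC in each block."""
    n_bad1 = sum(regular_bad_count(v, CUTOFF_UEG0)
                 for blk in levels_per_block for v in blk)
    n_nz = sum(len(blk) for blk in levels_per_block)
    p_nz = sum(last_positions)
    return w_bad1 * n_bad1 + w_nz * n_nz + w_p * p_nz + w_mb * n_nonskip_mb

def c_hdr(n_bad2, n_mv, n_ref,
          w_bad2=2.8e-5, w_mv=1.2e-4, w_ref=1.9e-4):
    """Header-data complexity, Eq. (3.2)."""
    return w_bad2 * n_bad2 + w_mv * n_mv + w_ref * n_ref
```

In this sketch the counted variables are assumed to be collected per frame or per basic unit during encoding, so the encoder can evaluate the model without actually running the decoder.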
3.3 Decoder-friendly H.264 System Design

The application of the proposed decoding complexity model is briefly discussed in this section. Consider the scenario in which an H.264 encoder generates a single bit stream for different decoding platforms without the use of any decoding complexity model. In contrast, the encoder may generate several bit streams for different decoding platforms separately according to their computational power, so that the resultant bit stream is easy to decode on a particular platform. For the latter case, the decoding complexity models have to be integrated into the H.264 encoder so that the encoder can estimate the expected decoding complexity and then generate decoder-friendly bit streams.

3.3.1 Rate-Distortion and Decoding Complexity (RDC) Optimization

In the conventional H.264/AVC encoder, there are two rate-distortion optimization (RDO) processes. The first RDO process decides the optimal inter-prediction mode among the P8x8, P8x4, P4x8 and P4x4 modes for one 8x8 block, while the second RDO process determines the optimal inter- or intra-prediction mode for one 16x16 MB among the P_Skip, P16x16, P16x8, P8x16, I16MB, I8MB and I4MB modes and the four 8x8 blocks whose optimal inter-prediction modes have been decided by the first RDO process.

Both RDO processes consist of the following steps. First, since different inter-prediction modes have a different number of MVs, the RDO process performs motion estimation (ME) to find the best MV if the current MB is to be coded by an inter-prediction mode. On the other hand, the RDO process finds the best intra-prediction type if the MB is to be coded by an intra-prediction mode. Second, the RDO process performs the encoding (e.g., the spatial domain transform, quantization and entropy encoding) and then the decoding processes to obtain the reconstructed video frame, so that the distortion and the bit rate can be measured. Then, the RDO process evaluates the following RD cost function:

J_{rd}(blk_i | QP, m) = D(blk_i | QP, m) + \lambda_m \cdot R(blk_i | QP, m),    (3.3)

where D(blk_i | QP, m) and R(blk_i | QP, m) are the distortion and the bit rate of block blk_i for a given coding mode m and quantization parameter (QP), respectively. Finally, the RDO process finds the best mode that yields the minimal RD cost. The minimization of the RD cost function in (3.3) implies that the RDO process determines the best mode that minimizes the distortion D(blk_i | QP, m) while meeting the rate constraint R_st,i (i.e., R(blk_i | QP, m) <= R_st,i). Note that the Lagrangian multiplier \lambda_m in (3.3) depends on the rate constraint and, therefore, can be expressed as a function of QP.

The original RD optimization problem can be generalized by considering the joint rate-distortion and decoding complexity (RDC) optimization. We can introduce the decoding complexity cost into the original RD cost function (3.3) via

J_{rdc}(blk_i | QP, m) = D(blk_i | QP, m) + \lambda_m \cdot R(blk_i | QP, m) + \lambda_c \cdot C(blk_i | QP, m),    (3.4)

where C(blk_i | QP, m) is the decoding complexity of block blk_i for a given coding mode m and QP. Similar to \lambda_m for the rate constraint, the Lagrangian multiplier \lambda_c depends on the decoding complexity constraint. The algorithm that selects a proper \lambda_c for a given decoding complexity constraint C_st,i (i.e., C(blk_i | QP, m) <= C_st,i) will be discussed in the following subsections.
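As a small illustration of how Eq. (3.4) drives mode selection, the sketch below loops over candidate modes and keeps the one with the lowest RDC cost. The evaluate() callback is a hypothetical stand-in for the encoder's actual encode-and-reconstruct trial; it is assumed to return the distortion, the bit count, and the estimated decoding complexity for a block, mode and QP.

```python
# Sketch of an RDC-based mode decision using the cost of Eq. (3.4).

def rdc_mode_decision(block, qp, modes, evaluate, lambda_m, lambda_c):
    """Return the mode minimizing D + lambda_m*R + lambda_c*C over candidates."""
    best_mode, best_cost = None, float("inf")
    for m in modes:
        d, r, c = evaluate(block, qp, m)        # trial encode for mode m
        cost = d + lambda_m * r + lambda_c * c  # RDC cost, Eq. (3.4)
        if cost < best_cost:
            best_mode, best_cost = m, cost
    return best_mode, best_cost
```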
3.3.2 Relationship Between Bit Rate and Decoding Complexity

Since the relationship between the bit rate and the decoding complexity helps select the proper Lagrangian multiplier \lambda_c for a given decoding complexity constraint, this relationship is investigated first in this subsection.

As mentioned earlier, the CABAC encoding process consists of three stages: binarization, context modeling and binary arithmetic coding. The length of the binary sequence generated in the binarization stage determines the bit count of a non-binary syntax element. The bit count of a non-binary syntax element c can be expressed as R = L \cdot h_2(c), where L and h_2(c) are the length and the average bit rate of the binary sequence for c, respectively. Now, consider the number of regular BAD executions, which is an important parameter in our decoding complexity model. In H.264/AVC, the UEGk binarization scheme with a cut-off value S is used to generate the binary sequences of non-binary syntax elements, such as MVDs and the absolute values minus one of QTCs. The number of regular BAD executions is equal to the length of the generated binary sequence if the value of a non-binary syntax element is less than the cut-off value. Since the value of a non-binary syntax element is rarely larger than the cut-off value, the number of regular BAD executions N_bad is proportional to the bit count of a non-binary syntax element, i.e., R = L \cdot h_2(c) = N_{bad} \cdot h_2(c).

The relationship between existing rate models and our CABAC decoding complexity model is also studied below. Consider the H.264/AVC rate models in [20], which consist of a source bit part and a header bit part. The relationships of these two rate models to our CABAC complexity models are discussed individually. The rate model for source bits is a function of the quantization step (QS) and is expressed as

R_{src} = \alpha \cdot \frac{SATC}{QS},    (3.5)

where SATC is the sum of the absolute values of the transformed coefficients for one 4x4 block, and \alpha is the source rate model parameter. Since SATC/QS can be seen as the sum of the absolute values of the QTCs in one 4x4 block, the rate model for source bits can be further written as R_{src} = \alpha \cdot \sum_i |QTC_i|, where QTC_i denotes the i-th QTC in one 4x4 block. Consider the proposed CABAC decoding complexity model for the source data in Eq. (3.1). In the high bit rate case, the number of regular BAD executions, which models the decoding complexity in the level information stage, dominates the total decoding complexity, while the remaining three terms, which model the decoding complexities of the one-bit variables in the significant map stage, are less significant. Since H.264/AVC adopts the UEGk binarization scheme with the cut-off value S = 14 to generate the binary sequences for QTCs, the number of regular BAD executions is equal to \sum_i \min(|QTC_i|, 14), or \sum_i |QTC_i| in most cases. Thus, the CABAC decoding complexity for the source data is proportional to the source rate.

The header bit count is modeled in [20] as a function of the number of MVs (N_mv), the number of non-zero MV elements (N_nzMVe) and the number of intra MBs (N_intra), which is written as

R_{hdr} = \gamma \cdot (N_{nzMVe} + \omega \cdot N_{mv}) + N_{intra} \cdot b_{intra},    (3.6)

where \gamma and \omega are the header rate model parameters, and b_{intra} is the average header bit count of an intra MB. Consider the proposed CABAC decoding complexity model for the header data in Eq. (3.2). The proposed complexity model for the header data includes the same term, i.e., the number of MVs, as the rate model for header bits. In addition, the number of regular BAD executions is used to model the decoding complexities of MVDs and intra-prediction types for inter and intra MBs, respectively. As mentioned before, the number of regular BAD executions is proportional to the bit count of a non-binary syntax element (i.e., MVDs and intra-prediction types in this case). Thus, the proposed CABAC complexity model for the header data should be proportional to the header bits, too.

Fig. 3.3 shows the relationship between the actual source (or header) bit rate and the CABAC decoding complexity of the source (or header) data for various high bit rate bit streams. The experimental results clearly demonstrate that the CABAC decoding complexity is proportional to the bit rate. This linear relationship between the bit rate and the CABAC decoding complexity will be used in our decoding complexity control scheme in the following subsection.

Figure 3.3: The relationship between the bit rate and CABAC decoding complexity for various high bit rate bit streams.
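To quantify the linear trend in Fig. 3.3, the proportionality constants relating decoding complexity to bit rate can be estimated by least squares from per-basic-unit samples. The sketch below is illustrative only; the variable names and sample values are hypothetical, and the fitted constants reappear as the RC mapping coefficients in the next subsection.

```python
# Fit a zero-intercept linear model R ≈ alpha * C by least squares, following
# the linear trend observed in Fig. 3.3. "samples" is an assumed list of
# (complexity, bits) pairs gathered per basic unit during encoding.

def fit_linear_rc(samples):
    """Return alpha minimizing sum((R - alpha*C)^2) over (C, R) samples."""
    num = sum(c * r for c, r in samples)
    den = sum(c * c for c, r in samples)
    return num / den if den > 0 else 0.0

# Example with made-up samples: separate coefficients for source and header data.
alpha_src = fit_linear_rc([(0.10, 1200), (0.21, 2500), (0.33, 3900)])
alpha_hdr = fit_linear_rc([(0.02, 150), (0.05, 380), (0.08, 610)])
```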
3.3.3 CABAC RDC Optimization and Decoding Complexity Control

Similar to rate control, decoding complexity control is a process that determines control parameters so that an encoder equipped with the decoding complexity control scheme can generate bit streams that meet a target decoding complexity constraint. On the other hand, a key problem of RDC optimization is to decide the Lagrangian multiplier \lambda_c in Eq. (3.4) so that the RDO process can select the best coding mode that minimizes the distortion and meets the rate and decoding complexity constraints simultaneously. In this subsection, the selection of a proper \lambda_c for the RDC optimization is addressed first. Then, the CABAC decoding complexity control algorithm is developed accordingly.

Since the CABAC decoding complexity is proportional to the bit rate, the bit rate can be expressed as a function of the CABAC decoding complexity. Let R_c(\cdot) be the rate-complexity (RC) mapping function, which can be used to estimate the bit rate for a given decoding complexity. The RDC problem in (3.4) can be either reduced to a rate-distortion (RD) problem if R_st,i <= R_c(C_st,i), or reduced to a complexity-distortion (CD) problem if R_c(C_st,i) < R_st,i, since in that case the complexity constraint is tighter than the rate constraint. Mathematically, the CD optimization problem can be written as

\min D(blk_i | QP, m) \;\; \text{s.t.} \;\; C(blk_i | QP, m) \leq C_{st,i}
\;\equiv\; \min \{ D(blk_i | QP, m) + \lambda_c \cdot C(blk_i | QP, m) \},    (3.7)

or, equivalently,

\min D(blk_i | QP, m) \;\; \text{s.t.} \;\; R_c(C(blk_i | QP, m)) \leq R_c(C_{st,i})
\;\equiv\; \min \{ D(blk_i | QP, m) + \lambda'_m \cdot R_c(C(blk_i | QP, m)) \}.    (3.8)

We prefer to solve the CD optimization problem in the form (3.8) rather than (3.7), since the corresponding Lagrangian multiplier \lambda'_m in (3.8) is easier to obtain. The Lagrangian multiplier \lambda'_m can be determined by the following steps. First, a QS is computed from the linear rate model [20] once the estimated rate R_c(C(B_i | m, QP)) for a given CABAC decoding complexity is obtained. Then, the QS is used to determine QP in H.264. Finally, the Lagrangian multiplier \lambda'_m can be obtained from the formula \lambda'_m = 0.85 \cdot 2^{(QP-12)/3} suggested by [52]. The above steps imply that the CD optimization problem can be converted into an RD optimization problem, and the new RD optimization problem has a tighter rate constraint than the original one. In other words, a lower rate is desirable so as to reduce the CABAC decoding complexity when the decoding complexity constraint is tighter than the rate constraint.
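The sketch below walks through the \lambda'_m selection steps just described under stated assumptions: the complexity budget is mapped to an estimated rate with the linear RC mapping, the linear rate model R = \alpha \cdot SATC/QS is inverted to obtain a quantization step, the QS is converted to a QP using the H.264 convention that the quantization step doubles every six QP values (QS = 0.625 at QP = 0), and the formula from [52] is applied. The inputs alpha_rc, alpha_rate and satc are assumed to come from the encoder's own statistics.

```python
import math

# Illustrative conversion of a decoding complexity budget into (QP, lambda'_m).

def lambda_from_complexity(c_budget, alpha_rc, alpha_rate, satc):
    rate_est = alpha_rc * c_budget                 # R_c(C): linear RC mapping
    qs = alpha_rate * satc / max(rate_est, 1e-9)   # invert R = alpha*SATC/QS
    qp = int(round(6.0 * math.log2(max(qs, 0.625) / 0.625)))
    qp = min(max(qp, 0), 51)                       # clip to the valid QP range
    lam = 0.85 * 2.0 ** ((qp - 12) / 3.0)          # formula suggested by [52]
    return qp, lam
```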
The proposed CABAC decoding complexity control scheme is incorporated into the H.264 rate control algorithm [30] and is described as follows.

The flowchart of the rate control algorithm adopted by the H.264/AVC reference code [17] is shown in Fig. 3.4(a). It consists of the following stages. First, the frame-layer rate control decides the frame bit budget T, which depends on the picture type (i.e., P or B) and the buffer status of the hypothetical reference decoder, which is used to avoid buffer underflow or overflow in actual decoders. Next, the bit budget of the current basic unit B_i is determined by B_i = T \cdot MAD_i^2 / \sum_k MAD_k^2, where MAD_i is the predicted mean absolute difference of the current basic unit. In other words, more bits are assigned to a basic unit if its distortion is higher. Then, the source bit budget of the current basic unit B_src is obtained by subtracting the estimated header bit count B_hdr from the basic unit bit budget B_i. After that, the linear rate model is used to determine the QS from the source bit budget B_src, and the QS is then used to decide QP. The resultant QP of the basic unit is clipped by QP = min(QP_previous frame + 3, QP) for the purpose of quality smoothness, and is used for mode decision and the subsequent encoding process. Finally, the statistical information in the rate and distortion models is updated according to the encoding results of the current frame and basic unit. Please note that rate control is not performed for the first I, P and B frames because the rate and distortion models have no statistical information yet. The basic unit can be one MB or several MBs, which is one of the encoding options in the H.264 encoder.

Figure 3.4: Rate control and the proposed decoding complexity control algorithms in H.264

Fig. 3.4(b) shows the flowchart of the proposed decoding complexity control algorithm incorporated into the H.264/AVC rate control algorithm. Two rate-complexity (RC) mapping functions are used in this process. That is, the source (header) rate is modeled as a function of the decoding complexity of the source (header) data, respectively. Mathematically, the two RC mapping functions can be written as

R_{c,src}(C_{src}(BU_i | QP, m)) = \alpha_{src} \cdot C_{src}(BU_i | QP, m),
R_{c,hdr}(C_{hdr}(BU_i | QP, m)) = \alpha_{hdr} \cdot C_{hdr}(BU_i | QP, m),    (3.9)

where C_src(BU_i | QP, m) and C_hdr(BU_i | QP, m) are the CABAC decoding complexities of basic unit BU_i for the source data and the header data for a given coding mode m and QP, respectively, and \alpha_{src} and \alpha_{hdr} are the RC function coefficients. These RC function coefficients can be trained in the RDO process. The RDO process performs the encoding process to obtain the source and header rates, and also estimates the CABAC decoding complexities of the source and header data with the proposed complexity model. Then, the bit rates and decoding complexities of the most recent 100 MBs are used to train the RC mapping function coefficients by the least-squares method.

The proposed complexity control algorithm consists of several steps. First, the estimated header rate B_hdr is used to estimate the CABAC decoding complexity of the header data, C_hdr, by dividing B_hdr by \alpha_{hdr}. Then, the CABAC decoding complexity of the source data, C_src, is obtained by subtracting C_hdr from the CABAC decoding complexity budget of the current basic unit, C_b, which is calculated by

C_b = \frac{C_{st} - C_{sum}}{N_{nb}},    (3.10)

where C_st, C_sum and N_nb are the CABAC decoding complexity constraint, the sum of the already allocated CABAC decoding complexity, and the number of not-yet-coded basic units, respectively. After that, the source rate B_src,c for the given CABAC decoding complexity constraint is determined by the source RC mapping function and the CABAC decoding complexity of the source data C_src. Finally, the minimum of B_src,c and B_src is used to decide the QS and QP for the subsequent encoding process. The performance of the proposed decoding complexity control algorithm will be shown in Section 3.4.
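A minimal sketch of this per-basic-unit control step, assuming the rate control loop supplies the source and header bit budgets and the trained RC coefficients, is given below.

```python
# Per-basic-unit CABAC complexity control step, Eqs. (3.9)-(3.10).
# b_src, b_hdr: source/header bit budgets from rate control;
# alpha_src, alpha_hdr: trained RC mapping coefficients.

def complexity_controlled_source_bits(c_st, c_sum, n_nb,
                                      b_src, b_hdr,
                                      alpha_src, alpha_hdr):
    c_b = (c_st - c_sum) / max(n_nb, 1)      # Eq. (3.10): remaining budget
    c_hdr = b_hdr / alpha_hdr                # header complexity from header bits
    c_src = max(c_b - c_hdr, 0.0)            # complexity left for source data
    b_src_c = alpha_src * c_src              # Eq. (3.9): map complexity to bits
    return min(b_src, b_src_c)               # tighter of rate/complexity budgets
```

The returned value replaces B_src in the QS/QP selection whenever the complexity constraint is the tighter of the two.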
3.4 Experimental Results

We conducted experiments to verify the proposed CABAC decoding complexity models for the source and header data parts and their decoding complexity control scheme on a PC platform. The CPU was a Pentium Mobile 1.7 GHz with 512 MB RAM, and the operating system was Windows XP. The reference JM9.4 decoder [17] was optimized with Intel MMX technology. To train the weights of the proposed decoding complexity models for the source and the header data parts, the Foreman and Mobile CIF sequences were selected as training sequences, each of which contains 270 frames.

Twenty training bit streams encoded with QPs from 2, 4, 6, ..., 40 were used to train the weights of the CABAC decoding complexity model for the source data. The Intel VTune performance analyzer 8.0 was used to measure the CABAC decoding complexities of the source data for all pre-encoded bit streams. The numbers of clock ticks measured by Intel VTune were divided by 1.7·10^7 to obtain the decoding time of the source data in milliseconds. Then, the proposed CABAC decoding complexity model counted the decoding complexity variables for those pre-encoded bit streams. Finally, this information was used to train the weights of the CABAC decoding complexity model for the source data, i.e., \omega_{bad,1}, \omega_{nz}, \omega_{p} and \omega_{mb}, by the constrained least-squares method. The weights of the CABAC decoding complexity model for the source data are

\omega_{bad,1} = 1.63 \times 10^{-5}, \; \omega_{nz} = 1.83 \times 10^{-5}, \; \omega_{p} = 1.01 \times 10^{-5}, \; \omega_{mb} = 2.872 \times 10^{-5}.    (3.11)

To train the weights of the CABAC decoding complexity model for the header data, two sets of bit streams were used. The first set contains twenty files encoded with QPs from 2, 4, 6, ..., 40, where only one reference frame was used in the motion estimation (ME) process. The second set contains ten files encoded with QPs from 2, 4, 6, ..., 20, where multiple reference frames were allowed in the ME process and the number of reference frames is five. The first set of bit streams was used to train \omega_{bad,2} and \omega_{mv}. Then, these training results and the second set of bit streams were used to determine \omega_{ref}. The weights of the CABAC decoding complexity model for the header data are

\omega_{bad,2} = 2.794 \times 10^{-5}, \; \omega_{mv} = 1.1729 \times 10^{-4}, \; \omega_{ref} = 1.9435 \times 10^{-4}.    (3.12)

The above weights were adopted by our decoding complexity model to estimate the decoding complexities of the following four HD (1920x1080) bit streams: Blue sky, Toy and calendar, Sunflower, and Rush hour.

Table 3.1: Estimation errors (%) between the actual and estimated CABAC decoding complexities for the source and header data for various bit streams, where the top row indicates the bit rate in megabits per second (Mbps).

HD Sequence          29.40 Mbps  25.60 Mbps  20.48 Mbps  15.36 Mbps  10.24 Mbps
Blue (source)        4.65%       3.78%       3.46%       0.36%       4.60%
Blue (header)        1.11%       1.88%       2.72%       5.04%       4.46%
Toy (source)         3.06%       4.77%       1.34%       4.04%       4.64%
Toy (header)         0.86%       3.18%       3.13%       4.86%       2.89%
Sunflower (source)   1.66%       0.64%       0.23%       2.26%       2.62%
Sunflower (header)   0.43%       0.24%       0.32%       2.61%       2.03%
Rush (source)        6.59%       5.87%       6.14%       2.84%       0.69%
Rush (header)        1.87%       4.42%       0.01%       0.93%       2.39%

Table 3.1 shows the comparison between the estimated decoding complexity based on the proposed complexity model and the actual decoding complexity measured by Intel VTune. We see that the proposed complexity model provides good estimation results for these test bit streams; the errors are within 7%.

Fig. 3.5 shows that an H.264/AVC encoder with the proposed CABAC decoding complexity model and the decoding complexity control scheme generates bit streams that meet different decoding complexity constraints. As compared with sequences without decoding complexity control, the Toy and calendar sequence at 25.60 Mbps with the target CABAC complexity of 600 ms loses 0.59 dB in PSNR but saves 27.05% in decoding complexity. The Sunflower 20.48 Mbps sequence with the target CABAC complexity of 490 ms loses 0.49 dB in PSNR but saves 25.32% in decoding complexity.
Finally, the Rush hour 15.36 Mbps sequence with the target CABAC complexity of 340 ms loses 0.66 dB in PSNR but saves 27.09% in decoding complexity. The errors between the actual decoding complexities and the target ones are all less than 7%. Thus, the H.264 encoder with the decoding complexity control scheme can generate bit streams that meet target decoding complexity constraints well. Besides, the resultant bit streams can save a significant amount of decoding complexity at the cost of some PSNR loss. This is particularly interesting in a mobile broadcasting environment, where multiple mobile devices receive broadcast/streaming video in real time.

Figure 3.5: The CABAC decoding complexity control for various video sequences.

3.5 Conclusion

The CABAC decoding complexity model and its application to H.264/AVC encoding were presented in this work. The encoder integrated with the proposed complexity model and the proposed complexity control scheme can generate a bit stream that is most suitable for a receiver platform with a given hardware constraint. In other words, the coded bit stream can balance the tradeoff between the RD requirement and the computational power of the decoding platform. The proposed decoding complexity model consists of two parts: the first part is for source data while the second part is for header data. The proposed decoding complexity model was verified experimentally. We showed that the model provides fairly good estimation results for various test bit streams. The performance of the H.264 codec with the proposed decoding complexity model and the proposed decoding complexity control scheme was presented as well. It was shown that the H.264 encoder can generate bit streams that meet different decoding complexity constraints accurately. The resultant bit streams can be decoded at a much lower complexity at the cost of a small PSNR loss.

Chapter 4
CAVLC/UVLC Decoder Complexity Modeling and Its Applications

4.1 Introduction

In H.264/AVC, there are two entropy coding modes. One is the context-based adaptive binary arithmetic coding (CABAC), which can be used to code all syntax elements in H.264/AVC. The other mode consists of two entropy coding tools: context-based adaptive variable length coding (CAVLC) and universal variable length coding (UVLC). Unlike CABAC, CAVLC is only used to encode the quantized transformed coefficients (QTCs), while UVLC is adopted to encode header data, such as motion vectors (MVs), macroblock (MB) types, intra-prediction types and other flags. In the last chapter, we focused on the decoder complexity modeling of CABAC. In this chapter, we consider the decoder complexity modeling of CAVLC/UVLC and its applications.

This is an important problem for a couple of reasons. It has been observed that entropy decoding demands a higher computational complexity for high rate video streams (e.g., high definition content) due to more non-zero QTCs and MVs. Entropy decoding actually becomes the most computationally expensive process when other decoding processes, such as motion compensation and the deblocking filter, are implemented in the graphics processing unit (GPU) [41]. Under such a scenario, entropy decoding becomes the main bottleneck and its complexity reduction is critical. Furthermore, although CABAC can provide better RD performance, the H.264/AVC baseline profile is limited to CAVLC/UVLC only since it targets low complexity and low delay applications. In other words, this profile is particularly suitable for portable devices.
When the video bit stream is decoded on portable devices, it is desirable to reduce the decoding complexity for the purpose of power saving.

To estimate the complexity of variable length decoding (VLD), some models were reported in [47, 1]. The sum of the magnitudes of the non-zero QTCs and the sum of the run lengths of zero QTCs are first estimated. Then, the entropy decoding complexity is modeled as a function of these two parameters. These models are, however, not suitable for CAVLC/UVLC decoding in H.264/AVC for two reasons. First, CAVLC is more complicated than conventional variable length coding (VLC) schemes and, thus, cannot be well captured by existing complexity models. Second, H.264/AVC adopts CAVLC and UVLC to encode QTCs and header data, respectively. This flexibility makes the decoding complexity modeling more sophisticated as compared with that in [47, 1]. To address these issues, a novel complexity model for the CAVLC/UVLC entropy decoding of H.264/AVC is proposed in this chapter.

The rest of this chapter is organized as follows. A CAVLC/UVLC decoding complexity model is proposed in Sec. 4.2. The integration of the proposed decoding complexity model with an H.264/AVC encoder for practical applications is discussed in Sec. 4.3. Experimental results are shown in Sec. 4.4. Finally, concluding remarks are given in Sec. 4.5.

4.2 Decoding Complexity Models for CAVLC and UVLC

The CAVLC and UVLC decoding complexity models are derived by examining their decoding processes carefully. The CAVLC and UVLC coding processes in H.264/AVC are first reviewed in Secs. 4.2.1 and 4.2.3, respectively. Then, their decoding complexity models are proposed in Secs. 4.2.2 and 4.2.4.

4.2.1 CAVLC Encoding and Decoding

The CAVLC encoding process is based on several properties of the QTCs in one block [4, 39]. First, the QTCs in one block are usually sparse (i.e., the 2-D array contains many zeros), so run length coding is used to encode these repeating zeros. Second, the non-zero QTCs of the current block are correlated with those of neighboring blocks, and the magnitudes of non-zero QTCs tend to decrease from low to high frequency. These two properties of non-zero QTCs are used in the VLC table selection, and the coding of each non-zero QTC is implemented by table look-up. In other words, the selection of the VLC table and its codewords is context adaptive. Third, after being converted into a 1-D array using the zig-zag scan order, the high frequency non-zero QTCs are usually equal to ±1 with equal probabilities. This property is exploited by CAVLC, where trailing ones are treated as a different case from normal non-zero QTCs so as to further reduce the number of bits needed to code non-zero QTCs.

The CAVLC encoding process is described below. The QTCs in one block are first mapped into a 1-D array with the zig-zag scan order. Then, the number of non-zero QTCs and the number of trailing ones (i.e., high frequency QTCs with magnitude equal to 1) are encoded in the bit stream, where the number of trailing ones is at most three. If the number of trailing ones is larger than three, the remaining QTCs with value equal to ±1 are treated as normal non-zero QTCs. Then, the sign bits of the trailing ones are stored in the bit stream in the reverse zig-zag scan order, which is the order from the high to the low frequencies. After that, the value of each non-zero QTC, which is called the level, is encoded by VLC table look-up, where the selection of the VLC table depends on neighboring blocks and the value of the non-zero QTC. This implies that the VLC table selection is context adaptive.
Finally, the total number of zeros before the last non-zero QTC and the number of zeros before each non-zero QTC (called the "run before") are encoded. In CAVLC, both the level and the run before are encoded in the reverse zig-zag scan order (i.e., from the high frequency to the low frequency). Based on the above discussion, the CAVLC encoding process includes the following syntax elements: the number of non-zero QTCs, the number of trailing ones, the sign bits of the trailing ones, the values of the non-zero QTCs (called the levels), and the total number of zeros and the run before values.

The CAVLC decoding process is shown in Fig. 4.1. The number of non-zero QTCs and the number of trailing ones are first decoded. Then, the non-zero QTCs with value equal to ±1 are reconstructed with their corresponding sign bits. After that, the remaining non-zero QTCs and the zeros before each non-zero QTC are decoded. Finally, the non-zero QTCs are placed into a 1-D array in the zig-zag scan order.

Figure 4.1: The CAVLC decoding process.

4.2.2 CAVLC Decoding Complexity Model

As shown in Fig. 4.1, the CAVLC decoding process consists of four modules. The first and the second modules are used to decode the sign bits of the trailing ones and the levels, respectively. Then, the third and the fourth modules are used to decode the runs and to reconstruct the QTCs in the zig-zag scan order, respectively. In terms of program implementation, the four modules are implemented as four loops. The loops corresponding to these four modules are shown in Fig. 4.2.

Figure 4.2: The implementation of the CAVLC decoding process by loops.

In this work, the CAVLC decoding complexity is modeled as a function of the number of CAVLC executions (N_cavlc), the number of trailing ones (N_one), the number of remaining non-zero QTCs (N_qtc), and the number of run executions (N_run). Mathematically, the CAVLC decoding complexity, C_cavlc, is expressed as

C_{cavlc} = \omega_{cavlc} \cdot N_{cavlc} + \omega_{one} \cdot N_{one} + \omega_{qtc} \cdot N_{qtc} + \omega_{run} \cdot N_{run},    (4.1)

where \omega_{cavlc}, \omega_{one}, \omega_{qtc} and \omega_{run} are weights for these four decoding complexity variables. The number of trailing ones, the number of remaining non-zero QTCs, and the number of run executions are used to model the decoding complexities of these loops, while the number of CAVLC executions is used to model the complexity of decoding the number of non-zero QTCs, the number of trailing ones and the total number of zeros in Fig. 4.1. Note that the number of trailing ones is limited to three in CAVLC.

4.2.3 UVLC Encoding and Decoding

UVLC is used to code header data, such as MVs, the MB type, inter- and intra-prediction modes, and so on. The UVLC coding process uses the Exp-Golomb (EG) code. The EG code for an unsigned integer syntax element of value C has the following bit fields:

[M zeros][1][INFO],    (4.2)

where M = floor(log_2(C+1)), INFO = C + 1 - 2^M, and the separation bit between the M zeros and INFO is one. In other words, the EG code is a variable length code. A signed integer syntax element of value SC is first mapped into an unsigned integer value C via

C = 2 \cdot |SC|, \; \text{if } SC \leq 0; \qquad C = 2 \cdot SC - 1, \; \text{otherwise}.    (4.3)

Then, the unsigned integer value C is coded by the EG code. The EG code decoding process is simple. First, M zeros are read until a one is reached, and then the M-bit INFO is read from the bit stream. Finally, the unsigned integer syntax element of value C is constructed by C = INFO + 2^M - 1.
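A minimal sketch of the Exp-Golomb mapping described above follows. Bit strings are used purely for illustration; a real decoder reads bits directly from the bit stream buffer rather than from Python strings.

```python
import math

# Zero-order Exp-Golomb (UVLC) encoding/decoding following Eqs. (4.2)-(4.3).

def eg_encode_unsigned(c):
    m = int(math.floor(math.log2(c + 1)))
    info = c + 1 - (1 << m)
    return "0" * m + "1" + format(info, "0{}b".format(m)) if m > 0 else "1"

def eg_decode_unsigned(bits):
    m = bits.index("1")                          # count leading zeros
    info = int(bits[m + 1:2 * m + 1] or "0", 2)  # read the M-bit INFO field
    return info + (1 << m) - 1                   # C = INFO + 2^M - 1

def signed_to_unsigned(sc):
    """Signed-to-unsigned mapping of Eq. (4.3)."""
    return 2 * abs(sc) if sc <= 0 else 2 * sc - 1

assert eg_decode_unsigned(eg_encode_unsigned(7)) == 7
```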
4.2.4 UVLC Decoding Complexity Model

To model the UVLC decoding complexity, we examine the complexity of the EG decoding process, which consists of reading the M zeros and the M-bit INFO in the EG code bit field. However, the implementation of the EG code bit field reading might differ among decoders. For example, a simple decoder can read one bit at a time and, therefore, 2M+1 bit-reading operations are required to decode a syntax element. On the other hand, a more sophisticated decoder can read eight bits (i.e., one byte) at a time and then examine whether the one-byte value is zero or not. If the one-byte value is non-zero, it implies that the separation bit between the M zeros and INFO has been found. If the one-byte value is zero, the same operation is repeated until the separation bit is identified. The execution count of these operations is 2·⌈M/8⌉. As a result, a decoding complexity model that considers the length of the EG code bit field is not suitable for all decoders.

In our work, the UVLC decoding complexity is modeled as a function of the number of non-skipped MBs (N_mb), the number of intra blocks (N_intra), the number of MVs (N_mv), and the number of reference frames (N_ref). These four quantities are used to model the complexities of decoding MB types and subtypes, intra-prediction types, MVs, and reference frame indices, respectively. Mathematically, the UVLC decoding complexity, C_uvlc, can be written as

C_{uvlc} = \omega_{mb} \cdot N_{mb} + \omega_{intra} \cdot N_{intra} + \omega_{mv} \cdot N_{mv} + \omega_{ref} \cdot N_{ref},    (4.4)

where \omega_{mb}, \omega_{intra}, \omega_{mv}, and \omega_{ref} are weights of these decoding complexity variables. The above decoding complexity model is not built upon a specific decoder implementation, and it is therefore suitable for various H.264/AVC decoders. Note that the number of intra blocks depends on the intra-prediction mode. For example, there are 16, 4, and 1 intra blocks for I4MB, I8MB and I16MB, respectively. The number of intra blocks is used to estimate the complexity of decoding intra-prediction directions. The number of reference frames depends on the inter-prediction mode. For example, there are 1, 2, 2 and 4 reference frames for an MB coded by the P16x16, P16x8, P8x16 and P8x8 inter-prediction modes, respectively. The number of reference frames in one MB is at most four because the blocks within one 8x8 block share the same reference frame index.

The approach used to train the weights in the CABAC complexity model can also be used to train the weights in the CAVLC/UVLC complexity model. First, several pre-encoded bit streams are selected, and the Intel VTune performance analyzer is used to measure the CAVLC and UVLC decoding complexities (i.e., C_cavlc and C_uvlc) in time, individually. Second, the decoding complexity variables are counted individually for those pre-encoded bit streams. Finally, the constrained least-squares method is used to find their weights. Since the weights in our model depend on the target decoding platform, they must be trained for a particular decoding platform. More details about weight training, together with the experimental verification of the proposed CAVLC/UVLC decoding complexity model, are given in Sec. 4.4.
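The sketch below evaluates the CAVLC and UVLC models of Eqs. (4.1) and (4.4) from counted statistics. The default weights and the example counts are placeholders chosen only for illustration; actual values come from the constrained least-squares fit on the target platform.

```python
# Illustrative evaluation of the CAVLC (Eq. (4.1)) and UVLC (Eq. (4.4)) models.

def c_cavlc(n_cavlc, n_one, n_qtc, n_run,
            w_cavlc=7.9e-5, w_one=1.9e-5, w_qtc=3.3e-5, w_run=2.0e-5):
    return w_cavlc * n_cavlc + w_one * n_one + w_qtc * n_qtc + w_run * n_run

def c_uvlc(n_mb, n_intra, n_mv, n_ref,
           w_mb=1.5e-4, w_intra=2.4e-5, w_mv=1.6e-4, w_ref=1.3e-4):
    return w_mb * n_mb + w_intra * n_intra + w_mv * n_mv + w_ref * n_ref

# Example: one non-skipped P8x8 inter MB (4 MVs, 4 reference indices) plus one
# coded 4x4 block with 3 trailing ones, 2 further levels and 4 run decodes.
total_ms = c_cavlc(n_cavlc=1, n_one=3, n_qtc=2, n_run=4) + \
           c_uvlc(n_mb=1, n_intra=0, n_mv=4, n_ref=4)
```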
4.3 Decoder-Friendly H.264/AVC System Design

4.3.1 Rate-Distortion and Decoding Complexity (RDC) Optimization Framework

In H.264/AVC, there are two RDO processes used to find the best coding mode: one for 8x8 blocks and the other for 16x16 MBs. Each RDO process consists of three stages. The first stage performs motion estimation (ME) to find the best MV, or searches for the best intra-prediction direction, depending on whether the current block/MB is to be coded in an inter- or intra-prediction mode. Then, the RDO process evaluates the RD cost and finds the best mode that minimizes the RD cost function, which is given by

J_{rd}(blk_i | QP, m) = D(blk_i | QP, m) + \lambda_m \cdot R(blk_i | QP, m),    (4.5)

where D(blk_i | QP, m) and R(blk_i | QP, m) are the distortion and the bit rate of block blk_i for a given coding mode m and quantization parameter (QP), respectively. The RD cost minimization implies that the RDO process finds the best mode that minimizes the distortion D(blk_i | QP, m) while meeting the rate constraint (i.e., R(blk_i | QP, m) <= R_st,i). Please note that \lambda_m in (4.5) depends on the rate constraint and, therefore, is a function of QP.

Since the decoding complexity constraint is our main focus, the original RD cost in Eq. (4.5) can be extended by adding another term to account for the decoding complexity as

J_{rdc}(blk_i | QP, m) = D(blk_i | QP, m) + \lambda_m \cdot R(blk_i | QP, m) + \lambda_c \cdot C(blk_i | QP, m),    (4.6)

where C(blk_i | QP, m) is the decoding complexity of block blk_i for a given coding mode m and QP, and \lambda_c is the associated Lagrangian multiplier. The minimization of the cost function in (4.6) implies that the RDO process finds the best mode that minimizes the distortion while meeting the rate and decoding complexity constraints; namely, R(blk_i | QP, m) <= R_st,i and C(blk_i | QP, m) <= C_st,i. Similar to \lambda_m for a given bit rate constraint, the Lagrangian multiplier \lambda_c depends on the decoding complexity constraint. The relationship between the bit rate and the CAVLC/UVLC decoding complexity is investigated first. Then, the selection of a proper Lagrangian multiplier \lambda_c for mode decision and the decoding complexity control algorithm are studied accordingly.

4.3.2 Relationship between Bit Rate and Decoding Complexity

The relationship between rate and decoding complexity is investigated in this section for CAVLC/UVLC.

A. CAVLC

First, we consider the CAVLC decoding complexity model for source data in (4.1). In the high bit rate case, since N_mb and N_cavlc are approximately equal to the number of MBs and the number of 4x4 blocks, respectively, these two terms can be viewed as constants. N_run becomes less important in the decoding complexity model since most QTCs are non-zero in the high bit rate case. Besides, CAVLC only allows at most three trailing ones (i.e., N_trailing <= 3). As a result, the number of non-zero QTCs, N_qtc, is the most significant term that dominates the CAVLC decoding complexity in the high bit rate case. Since DCT coefficients are approximately Laplacian-distributed [22], the number of non-zero QTCs N_qtc can be expressed as a function of the quantization step size (QS):

N_{qtc} = 2N \int_{QS}^{\infty} p(x)\,dx = N \mu \int_{QS}^{\infty} e^{-\mu |x|}\,dx = N \cdot e^{-\mu \cdot QS},    (4.7)

where N is the total number of QTCs in one 4x4 block, and p(x) is the Laplacian probability density function.

Since the CAVLC decoding complexity is primarily dominated by the number of non-zero QTCs, N_qtc, in the high bit rate case, the CAVLC decoding complexity can also be expressed as a function of QS. That is,

C_{cavlc} = \omega_{qtc} \cdot N \cdot e^{-\mu \cdot QS} = c \cdot e^{-\mu \cdot QS},    (4.8)

where c = \omega_{qtc} \cdot N is a constant. In H.264/AVC, the source bit rate R_src can be modeled as a linear function of QS [20], i.e., R_src = \alpha / QS, where \alpha is a model parameter. This implies that the CAVLC decoding complexity can be further expressed as a function of the source bit rate, i.e., C_{cavlc} = c \cdot e^{-\mu\alpha / R_{src}}, or \ln(c / C_{cavlc}) = \mu\alpha / R_{src}. Conversely, the source bit rate can be expressed as a function of the CAVLC decoding complexity: R_{src} = \mu\alpha \cdot [\ln(c / C_{cavlc})]^{-1}.
By using the Taylor expansion of the logarithm, the source bit rate can be written as

R_{src} \approx \mu\alpha \cdot \left[ \left(\frac{c}{C_{cavlc}} - 1\right) - \frac{1}{2}\left(\frac{c}{C_{cavlc}} - 1\right)^2 \right]^{-1} = \beta_1 \cdot C_{cavlc} + \beta_2 \cdot C_{cavlc}^2,    (4.9)

which is a quadratic function of the CAVLC decoding complexity.

Figure 4.3: The relationship between the source bit rate and CAVLC decoding complexity for four test sequences (Blue sky, Toy and calendar, Sunflower, and Rush hour) coded at high bit rates.

The quadratic relationship between the source bit rate and the CAVLC decoding complexity in Eq. (4.9) will be used in decoding complexity control and in the selection of \lambda_c for mode decision in Eq. (4.6), which will be discussed later. The relationship between the source bit rate and the CAVLC decoding complexity for the Blue sky, Toy and calendar, Sunflower, and Rush hour HD (1920x1080) bit streams is shown in Fig. 4.3. We see that the quadratic approximation derived in Eq. (4.9) is well supported by these experimental data.

B. UVLC

The relationship between the header bit rate and the UVLC decoding complexity is studied in this subsection. The header bit rate is modeled in [20] as a function of the number of MVs (N_mv), the number of non-zero MV elements (N_nzMVe) and the number of intra MBs (N_intra). Mathematically, it is of the form

R_{hdr} = \gamma \cdot (N_{nzMVe} + \omega \cdot N_{mv}) + N_{intra} \cdot b_{intra},    (4.10)

where \gamma and \omega are model parameters, and b_{intra} is the average header bit count of an intra MB. The UVLC decoding complexity model in (4.4) includes the number of MVs, which is also included in the header bit rate model. Furthermore, the number of intra blocks should be able to estimate the bit count of intra-prediction directions for intra MBs, and the number of reference frames can be used to model the bit count of reference frame indices. Thus, it is our conjecture that the UVLC decoding complexity may be proportional to the header bit rate.

To verify the above conjecture, we collect the header bit rates and the UVLC decoding complexities of basic units for several test video sequences and plot them in Fig. 4.4, where each basic unit contains 15 MBs and the four high-rate test video sequences are Blue sky, Toy and calendar, Sunflower, and Rush hour. The linear relationship between the header bit rate and the UVLC decoding complexity is clearly confirmed by these data. The relationships between rates and decoding complexities discussed in this section will be used in decoding complexity control and mode decision in the next section.

Figure 4.4: The relationships between header bit rates and the UVLC decoding complexity for four high rate video sequences (Blue sky, Toy and calendar, Sunflower, and Rush hour).
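The coefficients \beta_1 and \beta_2 of the quadratic relationship in Eq. (4.9) can be estimated from per-basic-unit (complexity, bits) samples by least squares, in the same spirit as the linear fit used for CABAC. The sketch below is illustrative only; the sample values are made up and the fit is unconstrained, whereas an encoder would refresh the coefficients from recently coded data.

```python
import numpy as np

# Zero-intercept quadratic fit R ≈ b1*C + b2*C^2, matching Eq. (4.9) and the
# trend of Fig. 4.3. c_src / r_src are assumed per-basic-unit measurements.

def fit_quadratic_rc(c, r):
    """Return (b1, b2) minimizing ||r - (b1*c + b2*c^2)||^2."""
    A = np.column_stack([c, c ** 2])
    coef, *_ = np.linalg.lstsq(A, r, rcond=None)
    return float(coef[0]), float(coef[1])

c_src = np.array([0.05, 0.12, 0.20, 0.31])      # hypothetical complexities (ms)
r_src = np.array([900.0, 2300.0, 4200.0, 7100.0])  # hypothetical source bits
b1, b2 = fit_quadratic_rc(c_src, r_src)
```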
4.3.3 RDC Optimization and Decoding Complexity Control

Similar to rate control, decoding complexity control is a process that determines control parameters so that an encoder equipped with the decoding complexity control scheme can generate bit streams that meet a target decoding complexity constraint. On the other hand, a key problem of RDC optimization is to decide the Lagrangian multiplier \lambda_c in Eq. (4.6) so that the RDO process can select the best coding mode that minimizes the distortion and meets the rate and decoding complexity constraints simultaneously. In this subsection, the selection of a proper \lambda_c for the RDC optimization is addressed first. Then, the CAVLC decoding complexity control algorithm is developed accordingly.

Following the CABAC RDC optimization strategy, we use the rate-complexity (RC) mapping function R_c(\cdot) to estimate the bit rate for a given decoding complexity. Then, the RDC problem in (4.6) can be either reduced to a rate-distortion (RD) problem if R_st,i <= R_c(C_st,i) (which means that the rate constraint is tighter than the complexity constraint), or to a complexity-distortion (CD) problem if R_c(C_st,i) < R_st,i (which means that the complexity constraint is tighter than the rate constraint). Mathematically, the RD optimization problem can be written as

\min D(blk_i | QP, m) \;\; \text{s.t.} \;\; R_c(C(blk_i | QP, m)) \leq R_c(C_{st,i}),    (4.11)

or

\min \{ D(blk_i | QP, m) + \lambda'_m \cdot R_c(C(blk_i | QP, m)) \}.    (4.12)

On the other hand, the CD optimization problem can be written as

\min D(blk_i | QP, m) \;\; \text{s.t.} \;\; C(blk_i | QP, m) \leq C_{st,i},    (4.13)

or

\min \{ D(blk_i | QP, m) + \lambda_c \cdot C(blk_i | QP, m) \}.    (4.14)

We would like to solve the RD optimization problem in (4.12) rather than the CD optimization problem in (4.14), since the Lagrangian multiplier \lambda'_m in (4.12) is easier to obtain. The Lagrangian multiplier \lambda'_m can be determined by the following steps. First, the QS is computed from the linear bit rate model [20] once the estimated rate R_c(C(blk_i | QP, m)) for a given CAVLC decoding complexity is obtained. Then, the QS is used to determine QP in H.264/AVC. Finally, \lambda'_m can be obtained from the formula \lambda'_m = 0.85 \cdot 2^{(QP-12)/3} as suggested in [52].

The above process means that the CD optimization problem can be converted into an RD optimization problem via rate-complexity (RC) mapping. The resultant RD optimization problem has a tighter rate constraint than the original one due to the tighter complexity requirement. In other words, it is desirable to lower the rate so as to reduce the CAVLC/UVLC decoding complexity when the decoding complexity constraint is tighter than the rate constraint.

4.3.4 Decoder-Friendly H.264/AVC Encoding

We can integrate the decoding complexity control module studied in the last subsection with the H.264/AVC rate control module proposed in [30] into one system, as shown in Fig. 4.5. It has two rate-complexity (RC) mapping functions. First, the source bit rate is modeled as a quadratic function of the CAVLC decoding complexity. Second, the header bit rate is modeled as a linear function of the UVLC decoding complexity. Mathematically, these two RC mapping functions can be expressed as

R_{c,src}(C_{cavlc}(BU_i | QP, m)) = \beta_1 \cdot C_{cavlc}(BU_i | QP, m) + \beta_2 \cdot C_{cavlc}^2(BU_i | QP, m),
R_{c,hdr}(C_{uvlc}(BU_i | QP, m)) = \alpha_{hdr} \cdot C_{uvlc}(BU_i | QP, m),    (4.15)

where C_cavlc(BU_i | QP, m) and C_uvlc(BU_i | QP, m) are the CAVLC and UVLC decoding complexities of basic unit BU_i for a given QP and coding mode m, respectively.

Figure 4.5: The proposed decoding complexity control algorithms in H.264

The parameters \beta_1, \beta_2 and \alpha_{hdr} in the RC mapping functions can be trained in the RDO process. The RDO process performs the encoding task to obtain the source and header rates, and estimates the CAVLC decoding complexity (C_cavlc) and the UVLC decoding complexity (C_uvlc). Then, the bit rates and decoding complexities of the last 100 MBs are used to train the coefficients of the RC mapping functions with the method of least squares.

The complexity control algorithm consists of several steps. First, the estimated header bit rate B_hdr is used to estimate the UVLC decoding complexity C_uvlc by dividing B_hdr by \alpha_{hdr}.
Then, the CAVLC decoding complexity C_cavlc is obtained by subtracting C_uvlc from the decoding complexity budget of the current basic unit, which is equal to

C_b = \frac{C_{st} - C_{sum}}{N_{nb}},    (4.16)

where C_st, C_sum and N_nb are the decoding complexity constraint, the sum of the already allocated decoding complexity, and the number of not-yet-coded basic units, respectively. After that, the source bit rate B_src,c for the given decoding complexity constraint is determined by the source RC mapping function and the CAVLC decoding complexity C_cavlc. Finally, the minimum of B_src,c and B_src is used to decide the QS and QP for further encoding.
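The sketch below summarizes this per-basic-unit control step. It mirrors the CABAC version from Chapter 3, except that the source mapping is the quadratic function of Eq. (4.15); the bit budgets and trained coefficients are assumed to be provided by the surrounding rate control loop.

```python
# Per-basic-unit CAVLC/UVLC complexity control step, Eqs. (4.15)-(4.16).
# b_src, b_hdr: source/header bit budgets from rate control;
# alpha_hdr, beta1, beta2: RC mapping coefficients refreshed over recent MBs.

def controlled_source_bits(c_st, c_sum, n_nb, b_src, b_hdr,
                           alpha_hdr, beta1, beta2):
    c_b = (c_st - c_sum) / max(n_nb, 1)               # Eq. (4.16): remaining budget
    c_uvlc = b_hdr / alpha_hdr                         # UVLC complexity from header bits
    c_cavlc = max(c_b - c_uvlc, 0.0)                   # complexity left for CAVLC
    b_src_c = beta1 * c_cavlc + beta2 * c_cavlc ** 2   # Eq. (4.15), source mapping
    return min(b_src, b_src_c)                         # tighter of the two budgets
```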
4.4 Experimental Results

We conducted experiments to verify the proposed CAVLC and UVLC decoding complexity models and their decoding complexity control scheme on a PC platform. The CPU was a Pentium Mobile 1.7 GHz with 512 MB RAM, and the operating system was Windows XP. The reference JM9.4 decoder [17] was optimized with Intel MMX technology. To train the weights of the proposed decoding complexity models for the source and the header data, the Foreman and Mobile CIF sequences were selected as training sequences, each of which contains 270 frames.

Twenty bit streams generated with QPs from 2, 4, ..., 40 were used to decide the weights of the CAVLC decoding complexity model, i.e., \omega_{cavlc}, \omega_{qtc}, \omega_{one} and \omega_{run}. The Intel VTune Performance Analyzer 8.0 was used to measure the CAVLC decoding complexity. Then, the proposed complexity model counted the four decoding complexity variables. Finally, the weights were determined by the constrained least-squares method. The weights of the CAVLC decoding complexity model are

\omega_{cavlc} = 7.8592 \times 10^{-5}, \; \omega_{qtc} = 3.26814 \times 10^{-5}, \; \omega_{one} = 1.8861 \times 10^{-5}, \; \omega_{run} = 1.9508 \times 10^{-5}.    (4.17)

To train the weights of the UVLC decoding complexity model, two sets of bit streams were selected. The first set was coded with a single reference frame ME (SRF-ME) process, while the second set was coded with a multiple reference frame (MRF-ME) process using five reference frames. The first set contains twenty files encoded with QPs from 2, 4, ..., 40. The second set contains ten files encoded with QPs from 2, 4, ..., 20. The first set was used to train the weights for the number of non-skipped MBs, the number of MVs, and the number of intra blocks, i.e., \omega_{mb}, \omega_{mv} and \omega_{intra}. Once these three weights were determined, the second set of bit streams and these three trained weights were used to decide the weight for the number of reference frames. The trained weights of the UVLC decoding complexity model are

\omega_{mb} = 1.46 \times 10^{-4}, \; \omega_{mv} = 1.6488 \times 10^{-4}, \; \omega_{intra} = 2.4391 \times 10^{-5}, \; \omega_{ref} = 1.281403 \times 10^{-4}.    (4.18)

Next, these weights were used in the decoding complexity model to estimate the decoding complexities of the following four HD (1920x1080) bit streams: Blue sky, Toy and calendar, Sunflower and Rush hour. Table 4.1 gives the comparison between the estimated decoding complexity based on the proposed complexity model and the actual decoding complexity measured by Intel VTune. We see that the proposed complexity model provides good estimation results for these test bit streams; the errors are within 8%.

Table 4.1: Estimation errors (%) between the actual and estimated CAVLC (source data) and UVLC (header data) decoding complexities for various bit streams.

HD Sequence          29.40 Mbps  25.60 Mbps  20.48 Mbps  15.36 Mbps  10.24 Mbps
Blue (source)        1.20%       1.43%       0.44%       2.25%       1.03%
Blue (header)        3.73%       0.58%       5.46%       0.43%       4.70%
Toy (source)         3.62%       4.27%       7.34%       3.11%       0.40%
Toy (header)         1.23%       2.33%       2.16%       0.19%       5.31%
Sunflower (source)   5.22%       5.32%       4.17%       0.25%       0.14%
Sunflower (header)   2.80%       1.94%       2.06%       2.09%       5.32%
Rush (source)        5.03%       3.45%       3.64%       3.65%       3.07%
Rush (header)        5.01%       5.60%       4.82%       2.48%       2.15%

Experimental results using the H.264/AVC encoder with the CAVLC/UVLC decoding complexity models and the decoding complexity control scheme are shown in Fig. 4.6, where results for Blue sky, Toy and calendar, Sunflower, and Rush hour are shown in rows 1, 2, 3 and 4, respectively. The x-axis is the decoding time and the y-axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3). The first point on the x-axis corresponds to the case without decoding complexity control. As compared with sequences without decoding complexity control, the Toy and calendar 25.60 Mbps sequence with the target CAVLC/UVLC complexity of 300 ms loses 0.76 dB in PSNR but saves 35.97% in decoding complexity. The Sunflower 20.48 Mbps sequence with the target CAVLC/UVLC complexity of 270 ms loses 0.64 dB in PSNR but saves 30.40% in decoding complexity. Finally, the Rush hour 15.36 Mbps sequence with the target CAVLC/UVLC complexity of 200 ms loses 0.8 dB in PSNR but saves 32.04% in decoding complexity.

These experimental results clearly demonstrate that the H.264/AVC encoder with the proposed CAVLC/UVLC decoding complexity models and the decoding complexity control scheme can generate bit streams that meet different decoding complexity constraints. The errors between the actual decoding complexities and the target decoding complexity constraints are all less than 7%. Besides, the resultant bit streams can save a significant amount of decoding complexity at the cost of some PSNR loss. This is particularly interesting in a mobile broadcasting environment, where multiple mobile devices receive broadcast/streaming video in real time.

Figure 4.6: CAVLC/UVLC decoding complexity control for four test sequences: Blue sky (row 1), Toy and calendar (row 2), Sunflower (row 3), and Rush hour (row 4), where the x-axis is the decoding time and the y-axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3).

4.5 Conclusion

The CAVLC and UVLC decoding complexity models and their associated decoding complexity control scheme were presented in this chapter. Since CAVLC and UVLC are used to encode source and header data, respectively, the proposed decoding complexity model consists of source and header data parts. Both decoding complexity models were verified experimentally. We also studied the relationship between the bit rate and the obtained decoding complexity model. This relationship helps us develop a simple decoding complexity control scheme and decide the proper Lagrangian multiplier for mode decision. The performance of the H.264/AVC codec with the entropy decoding complexity models and the proposed decoding complexity control scheme was presented. It was shown that the H.264/AVC encoder can generate bit streams that meet different decoding complexity constraints accurately. The resultant bit streams can be decoded at a much lower complexity at the cost of a small PSNR loss. When decoders have a tight complexity (or energy) constraint, such as those in portable devices, this tradeoff may be worthwhile.
Chapter 5
Decoding Complexity Modeling of Deblocking Filters and Its Applications

5.1 Introduction

H.264/AVC [53][35] is the latest video coding standard proposed by ITU-T and ISO/IEC. It has been selected as a video coding tool in the HD-DVD and Blu-ray specifications. Similar to other video coding standards, H.264/AVC adopts a block-based coding scheme. Blocking artifacts may be introduced due to coarse quantization of transformed coefficients or discontinuity between reference blocks in motion compensation. These artifacts can be mitigated by a post-processing deblocking filter (DBF) or an in-loop DBF, which is performed in both the encoding and the decoding loops. For the latter, reconstructed frames are processed by in-loop DBF operations and used as reference frames for future encoding (or decoding). Since the in-loop DBF typically provides better subjective and objective quality than the post-processing DBF, it has been adopted in the H.264/AVC video coding standard [32]. However, since the in-loop DBF operations must be performed in both the encoding and the decoding loops to ensure the same reconstructed reference frame in motion estimation and compensation, the H.264/AVC decoding complexity increases significantly. According to the decoding complexity profiling study reported in [8], the DBF decoding complexity is one of the main bottlenecks in the whole decoding process. For mobile multimedia applications, video bit streams have to be decoded in portable consumer electronics devices. Reducing the DBF decoding complexity so as to save power is highly desirable in such a scenario.

To reduce the power consumption of the H.264/AVC decoder, one solution is to allow the H.264/AVC encoder to generate decoder-friendly bit streams in the sense that the compressed bit stream is easy to decode on a selected platform. This motivates us to study the DBF complexity model for the H.264/AVC decoder. Once the model is available, it can be used by the H.264/AVC encoder to estimate the DBF decoding complexity and select the best coding mode to balance the tradeoff between the rate-distortion (RD) cost and the decoding complexity. In this chapter, we primarily focus on the DBF decoding complexity model and its application to the reduction of H.264/AVC decoding complexity.

Complexity models of various H.264/AVC decoding modules have been studied by researchers in the past [50, 48, 49, 25, 26, 29, 28, 24, 27, 10]. The motion compensation process (MCP) in H.264/AVC decoding was first modeled as a function of the number of interpolation filters in [50, 48, 49]. This model was further improved in [25, 26, 29]. Besides the number of interpolation filters, the relationship between motion vectors (MVs), frame sizes and the distribution of selected reference frames was taken into account in the improved model, since this relationship has an impact on the cache management efficiency and, thus, the MCP decoding complexity. Complexity models of the H.264/AVC entropy decoding module were investigated in [24, 27, 28]. Since the H.264/AVC entropy coding tools are used to encode the source data (i.e., quantized transformed coefficients) and the header data (e.g., motion vectors and coding modes), their complexity models consist of decoding complexity models for the source and header data parts, respectively, as reported in [24, 27, 28]. As compared with complexity models for the motion compensation and the entropy decoding processes, the complexity model of H.264/AVC DBF decoding has not received much attention in the past.
The decoding complexity of the DBF is modeled as a function of the number of low-pass filtering operations in [10]. However, this model cannot provide a very accurate result. A new DBF decoding complexity model will be proposed in this chapter to enhance the model given in [10].

The rest of this chapter is organized as follows. The H.264/AVC DBF process is reviewed in Section 5.2. Then, a new H.264/AVC DBF decoding complexity model is proposed in Section 5.3. The application of the proposed decoding complexity model and its integration with an H.264/AVC encoder are discussed in Section 5.4. The detailed implementation of the DBF decoding complexity control is presented in Section 5.5. Experimental results are reported in Section 5.6. Finally, concluding remarks are given in Section 5.7.

5.2 Overview of H.264/AVC Deblocking Filter

The H.264/AVC deblocking filter (DBF) process is briefly reviewed in this section. The DBF process is applied to all horizontal and vertical edges of each 4x4 block in one MB. It consists of boundary strength (BS) computation, edge detection and low-pass filtering modules as shown in Fig. 5.1.

Figure 5.1: The deblocking filter process for one 4x4 block.

First, an integer value, called the boundary strength (BS), is computed in the BS computation stage for each edge in one 4x4 block. The BS value is used to determine the low-pass filtering operations in the later stage. Table 5.1 shows that the BS value depends on coding modes, reference frames, motion vectors and quantized residual data. When the BS value is equal to four, a strong low-pass filtering operation should be applied to this edge. When the BS value is non-zero but less than four, a common low-pass filtering operation might be performed on this edge. Finally, if the BS value is equal to zero, the edge is within a smooth area and, as a result, there is no need to apply edge detection and low-pass filtering operations afterwards.

Table 5.1: The boundary strength (BS) of an edge depends on coding modes, reference frames, motion vectors and quantized residual data.

  Block modes and conditions                                  BS
  One of the blocks is intra and the edge is a macro-block edge   4
  One of the blocks is intra                                      3
  One of the blocks has coded residuals                           2
  Difference of block motion >= 1 (luma sample distance)          1
  Motion compensation from different reference frames             1
  Else                                                            0

After that, each edge with a non-zero BS value is analyzed in the edge detection stage so as to distinguish true edges from those due to blocking artifacts. It is desirable that the low-pass filter operations are applied to remove edges caused by blocking artifacts rather than true edges. Edge detection is performed on the pixels near a 4x4 block boundary by examining the following two conditions,

  |p_0 - q_0| < THD_A,   |p_1 - p_0| < THD_B,                            (5.1)

where p_0, p_1 and q_0 are pixels near a 4x4 block boundary as shown in Fig. 5.2, and both THD_A and THD_B depend on quantization parameters (QPs). To be more specific, THD_A and THD_B increase as QP becomes larger, which implies that it is more likely that pixels near a 4x4 block boundary are processed by the low-pass filtering operations in the low bit rate case. Only when the above two conditions hold are the low-pass filter operations executed in the following stage.
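To make the two stages above concrete, the following is a minimal sketch (in Python, not the optimized reference decoder) of the per-edge BS rule of Table 5.1 and the detection test of Eq. (5.1). The BlockInfo fields and the way the QP-dependent thresholds are passed in are illustrative assumptions, not part of the standard text.

# Sketch of BS computation (Table 5.1) and edge detection (Eq. (5.1)) for one 4x4 edge.
# BlockInfo and the threshold arguments are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class BlockInfo:
    is_intra: bool
    has_coded_residual: bool
    ref_frame: int
    mv: tuple  # (mv_x, mv_y) in quarter-pel units

def boundary_strength(p: BlockInfo, q: BlockInfo, edge_on_mb_boundary: bool) -> int:
    """Return the BS of the edge between blocks p and q, following Table 5.1."""
    if (p.is_intra or q.is_intra) and edge_on_mb_boundary:
        return 4
    if p.is_intra or q.is_intra:
        return 3
    if p.has_coded_residual or q.has_coded_residual:
        return 2
    # motion difference of at least one luma sample (4 quarter-pel units)
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1
    if p.ref_frame != q.ref_frame:
        return 1
    return 0

def edge_is_filtered(p0: int, p1: int, q0: int, thd_a: int, thd_b: int) -> bool:
    """Edge detection test of Eq. (5.1); thd_a and thd_b come from QP-dependent tables."""
    return abs(p0 - q0) < thd_a and abs(p1 - p0) < thd_b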
As mentioned before, two low-pass filters can be used to remove blocking artifacts depending on the value of BS. A strong low-pass filter is applied if the BS value is equal to four, while a common low-pass filter is used if the BS value is equal to 1, 2 or 3. Furthermore, each of these two low-pass filters can have two operational modes.

Figure 5.2: Pixels near a block boundary.

That is, once the two edge detection conditions given in (5.1) hold, we check another condition as given below:

  |p_2 - p_0| < THD_B.                                                    (5.2)

If the condition in (5.2) holds, pixels p_0, p_1 and p_2 are processed by the low-pass filtering operations (either the strong or the common low-pass filtering operation). Otherwise, low-pass filtering operations are applied to pixels p_0 and p_1 only. Pixels q_0, q_1 and q_2 are processed in the same manner with p substituted by q in Eqs. (5.1) and (5.2). For more details of the H.264/AVC DBF process, we refer to [32].

5.3 H.264/AVC DBF Complexity Model

5.3.1 Motivation

As mentioned before, the H.264/AVC DBF process consists of boundary strength (BS) computation, edge detection and low-pass filtering stages. The complexities of all three stages should be considered in the decoding complexity model. As compared to the DBF decoding complexity model in [10], which models the DBF decoding complexity as a function of the number of low-pass filtering operations only, our proposed model is more accurate.

We conducted an experiment on the Pentium mobile 1.7GHz platform to verify our conjecture that the number of low-pass filtering operations is not sufficient for DBF decoding complexity modeling. We selected the Foreman and Mobile CIF (352x288) sequences to generate two bit streams with the quantization parameter (QP) set to 10, where each bit stream contains 270 frames with the group of pictures (GOP) structure {IPPP..P}. It was measured by the Intel VTune Performance Analyzer 8.0 that the DBF decoding complexities of the Foreman and Mobile CIF bit streams are equal to 163.53 milli-seconds (ms) and 165.50 ms, respectively. Since a fine QP value is used by the H.264/AVC encoder to generate these two bit streams, each edge is identified as a true edge in the edge detection stage, so that no low-pass filtering operation is applied to these two bit streams. However, their DBF decoding complexities were still measured by the Intel VTune. This experiment indicates that the number of low-pass filtering operations is not the only factor in DBF decoding complexity modeling. Instead, the complexities of BS computation and edge detection also play a role in the DBF decoding complexity.

5.3.2 DBF Decoding Complexity Model with Fixed Frame Size

In this research, the H.264/AVC DBF decoding complexity is modeled as a function of four decoding complexity variables: the number of MBs (N_mb), the number of edges with non-zero values of BS (N_e,nz), the number of common low-pass filtering operations (N_common), and the number of strong low-pass filtering operations (N_strong). Mathematically, the DBF complexity (C_DBF) is expressed as

  C_DBF = ω_mb · N_mb + ω_edge · N_e,nz + ω_common · N_common + ω_strong · N_strong,   (5.3)

where ω_mb, ω_edge, ω_common, and ω_strong are the weights of these four variables. Being similar to [10], the numbers of low-pass filtering operations are considered in Eq. (5.3). Since H.264/AVC adopts two kinds of low-pass filtering operations (i.e., the common and strong low-pass filters), which have different numbers of filter coefficients, two different terms are used in the proposed model. In addition, the number of MBs and the number of edges with non-zero values of BS are used to model the complexities of BS computation and edge detection, respectively.
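As a concrete illustration, the short sketch below accumulates the four counters into Eq. (5.3). The weight values shown are the CIF-trained values reported later in Section 5.6; the counter values in the usage example are only illustrative placeholders.

# Sketch of the linear DBF complexity model of Eq. (5.3).
# The CIF weights are those trained in Section 5.6; the counters in the example are placeholders.

def estimate_dbf_complexity(n_mb, n_edge_nonzero_bs, n_common, n_strong, weights):
    """Evaluate C_DBF = w_mb*N_mb + w_edge*N_e,nz + w_common*N_common + w_strong*N_strong."""
    return (weights["mb"] * n_mb
            + weights["edge"] * n_edge_nonzero_bs
            + weights["common"] * n_common
            + weights["strong"] * n_strong)

cif_weights = {"mb": 2.8439e-4, "edge": 7.6983e-6, "common": 4.9979e-6, "strong": 2.3023e-6}

# Example: counters gathered while coding one CIF frame (396 MBs; edge counts illustrative).
c_dbf_ms = estimate_dbf_complexity(n_mb=396, n_edge_nonzero_bs=4000,
                                   n_common=1500, n_strong=300, weights=cif_weights)
print(f"Estimated DBF decoding time: {c_dbf_ms:.3f} ms")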
5.3.3 DBF Decoding Complexity Model with Variable Frame Sizes

The relationship between the DBF decoding complexity and the frame size is studied below. It is assumed that the decoding platform has a two-level cache hierarchy: the first level (L1) and the second level (L2). It was reported in [23] that cache management is critical to the DBF implementation in H.264/AVC decoding. Reducing the number of cache misses so as to lower the DBF decoding complexity is desirable. Since the size of the L1 cache is small, only a few blocks of the decoded frame can be cached when the decoded frame is processed by the DBF. In contrast, the L2 cache is larger and may store the whole decoded frame. Consider the scenario where the frame size is small enough for the frame to be stored in the L2 cache during decoding. Under such a scenario, the L1 cache miss penalty in the DBF process is equal to the access time of the L2 cache. On the other hand, if the frame size is larger and the data required by the DBF cannot be obtained from the L2 cache, the L1 cache miss penalty can be as large as the L2 cache miss penalty. To conclude, the L1 cache miss penalty highly depends on the L2 cache hit rate [7], and the L2 cache hit rate decreases as the frame size becomes large. Thus, the frame size can affect the L1 cache miss penalty and the DBF decoding complexity.

To verify the above claim, several experiments were conducted on the Pentium mobile 1.7GHz platform. First, we selected the Foreman and Mobile CIF sequences as training sequences and pre-encoded 40 bit streams to train the weights in our complexity model. More details of the training process will be given in Section 5.6. Then, these weights were used to estimate the DBF decoding complexities of the Akiyo CIF 256KBps, Football D1 (720x480) 1024KBps, and Blue sky HD (1920x1080) 10.24MBps bit streams. The actual DBF decoding complexities of the Akiyo 256KBps, Football 1024KBps and Blue sky 10.24MBps bit streams measured by the Intel VTune are 53.6 ms, 254.85 ms, and 497.85 ms, while the estimated DBF decoding complexities are 51.14 ms, 206.40 ms, and 339.92 ms, respectively. The weights trained on CIF bit streams provide good estimation results for CIF bit streams but not for D1 and HD bit streams.

The above experiments confirm that these weights vary with the frame size, which is similar to the behavior of the weight for the number of cache misses in our MCP decoding complexity model [26]. Furthermore, we re-trained these weights using 42 D1 bit streams generated from the Football and Mobile D1 sequences with different QPs and frame numbers as shown in Table 5.3. It was observed that the weights trained on D1 bit streams are different from those trained on CIF bit streams. This clearly demonstrates that the frame size plays an important role in the L1 cache miss penalty and, therefore, in the weights of our DBF decoding complexity model. These weights must be trained according to the frame size of the bit streams.

5.4 Decoder-friendly H.264/AVC System Design

The application of the proposed DBF decoding complexity model is briefly discussed in this section. Consider the following two scenarios. First, an H.264/AVC encoder generates a unique bit stream for different decoding platforms. Second, the H.264/AVC encoder generates multiple bit streams targeting different decoding platforms so that the resultant bit stream is easy to decode with respect to a particular platform.
For the latter, the decoding complexity model and the corresponding decoding complexity control scheme should be implemented in the H.264/AVC encoder so that the encoder can estimate the decoding complexity and generate decoder-friendly bit streams.

5.4.1 Framework for Rate, Distortion and Decoding Complexity (RDC) Optimization

Since H.264/AVC adopts many coding modes to improve its coding gain, the selection of the best coding mode becomes a challenging task for the H.264 encoder. The rate-distortion optimization (RDO) process is often used in the encoder to find the best coding mode. Specifically, the RDO process either estimates or actually computes the rate and the distortion of a coded frame for a specific coding mode. Then, the RDO process evaluates the rate-distortion (RD) cost function and selects the best mode that minimizes the following RD cost:

  J_rd(blk_i | QP, m) = D(blk_i | QP, m) + λ_m · R(blk_i | QP, m),                      (5.4)

where D(blk_i | QP, m) and R(blk_i | QP, m) are the distortion and the bit rate of block blk_i for a given coding mode m and quantization parameter (QP), respectively. The minimization of the RD cost function in (5.4) implies that the RDO process finds the best mode that minimizes the distortion D(blk_i | QP, m) while meeting the rate constraint (i.e., R(blk_i | QP, m) ≤ R_st,i). Note that the Lagrangian multiplier, λ_m, in (5.4) depends on the bit rate and, therefore, it is a function of QP.

Along this line of thought, joint optimization of rate, distortion, and decoding complexity can be considered in the RDO process for the selection of the best coding mode. The RD cost function in (5.4) can be generalized by introducing another term to account for the decoding complexity. The resultant RDC cost function can be expressed as

  J_rdc(blk_i | QP, m) = D(blk_i | QP, m) + λ_m · R(blk_i | QP, m) + λ_c · C(blk_i | QP, m),   (5.5)

where C(blk_i | QP, m) is the decoding complexity of block blk_i for a given coding mode m and QP, and λ_c is the associated Lagrangian multiplier. The minimization of the RDC cost function in (5.5) implies that the RDO process finds the best mode that minimizes the distortion while meeting the rate and decoding complexity constraints (namely, R(blk_i | QP, m) ≤ R_st,i and C(blk_i | QP, m) ≤ C_st,i). Being similar to λ_m, which depends on the bit rate, the Lagrangian multiplier λ_c is determined by the decoding complexity constraint. The selection of a suitable λ_c for a given decoding complexity constraint is a challenge to the encoder.
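A minimal sketch of RDC-based mode decision with Eq. (5.5) is given below. The mode list and the evaluate_mode routine, which is assumed to return the distortion, the rate and the estimated decoding complexity of a trial mode, are hypothetical stand-ins for the corresponding encoder routines.

# Sketch of mode decision with the RDC cost of Eq. (5.5).
# evaluate_mode() is a hypothetical stand-in for trial coding of one mode;
# it is assumed to return (distortion, bits, estimated_decoding_complexity).

def rdc_mode_decision(block, modes, qp, lambda_m, lambda_c, evaluate_mode):
    best_mode, best_cost = None, float("inf")
    for m in modes:
        d, r, c = evaluate_mode(block, m, qp)
        cost = d + lambda_m * r + lambda_c * c   # J_rdc of Eq. (5.5)
        if cost < best_cost:
            best_mode, best_cost = m, cost
    return best_mode, best_cost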
5.4.2 Decoding Complexity Control and Its Challenges

Being similar to the concept of rate control, decoding complexity control is a process that determines certain parameters so that the resultant bit stream can achieve the best RD performance while meeting the decoding complexity constraint. Since different coding modes can result in different DBF decoding complexities, a DBF decoding complexity control scheme based on mode decision was proposed in [10]. To be more specific, the DBF decoding complexity control scheme is incorporated in the RDO process, which decides the best mode so that the generated bit stream achieves the best RD performance while satisfying the decoding complexity constraint. This is achieved by minimizing the RDC cost function in Eq. (5.5). However, there are two challenges not well addressed in [10].

First, as mentioned before, the Lagrangian multiplier, λ_c, in Eq. (5.5) depends on the decoding complexity constraint. The decoding complexity control method via mode decision can be viewed as a procedure to determine a suitable value of λ_c for a given decoding complexity constraint, where the RDO process decides the best coding mode that yields the best RD performance and meets the given decoding complexity constraint. However, the selection of λ_c for a given decoding complexity constraint was not addressed in [10]. A fixed Lagrangian multiplier, λ_c, was used in the RDO process instead and, as a result, the generated decoder-friendly bit stream satisfies only one decoding complexity constraint even though it may be decoded on different decoding platforms. Thus, the H.264/AVC encoder equipped with the decoding complexity control method in [10] may not generate decoder-friendly bit streams for different decoding platforms.

Second, consider the DBF decoding complexity of the Akiyo CIF 64KBps bit stream. Its DBF decoding complexity measured by the Intel VTune is about 37.38 milli-seconds (ms), while the decoding complexity of BS computation takes about 28.1 ms. Since the bit stream contains 99,000 macro-blocks (MBs), the decoding complexity of BS computation is given by the MB number multiplied by the weight ω_mb, which is 2.8439·10^-4 for CIF bit streams. (More details of the training process and the trained weights of the proposed complexity model will be given in Section 5.6.) The complexity of BS computation is thus 28.1 ms in the proposed DBF complexity model. In other words, about 75% of the DBF decoding complexity is contributed by BS computation. In this case, a decoding complexity control scheme via mode decision does not help the encoder much in generating decoder-friendly bit streams, since mode decision can select coding modes with fewer low-pass filtering operations but cannot lower the BS computational complexity. The latter is basically a constant in the MB DBF decoding complexity [8]. Two decoding complexity control methods will be proposed to address this difficulty in the following two subsections.

5.4.3 Average Complexity Control Method

To address the difficulty that the complexity of BS computation is a constant term in the MB DBF decoding complexity and cannot be controlled by mode decision, a slice-layer decoding complexity control method, called the Average method, is proposed in this subsection. Consider the H.264/AVC bit stream flag, disable_deblocking_filter_idc, which is used to enable or disable the DBF operations in one slice. The proposed DBF decoding complexity control method first calculates the average of the allocated slice DBF decoding complexity (denoted by C_dbf,avg) and the target slice decoding complexity constraint (denoted by C_slice,target), which are expressed as

  C_dbf,avg = C_allocated / N_c,
  C_slice,target = (C_const - C_allocated) / (N - N_c),                                  (5.6)

where C_allocated and C_const are the allocated DBF decoding complexity and the DBF decoding complexity constraint, respectively, and N and N_c are the slice number and the number of coded slices, respectively.

Then, the proposed slice-layer decoding complexity control method compares C_dbf,avg with C_slice,target and decides whether the DBF operations of the current slice should be enabled or disabled. Once the average allocated slice DBF decoding complexity is less than the target slice decoding complexity constraint (i.e., C_dbf,avg ≤ C_slice,target), the DBF operations of the current slice are enabled. Otherwise, the DBF operations are disabled by setting disable_deblocking_filter_idc equal to one.
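The Average method can be summarized by the short sketch below. The bookkeeping mirrors Eq. (5.6); the routine that estimates a slice's DBF complexity from the model of Eq. (5.3) is assumed to be available in the encoder, and the handling of the very first slice (no coded slices yet) is an assumption.

# Sketch of the slice-layer Average method built on Eq. (5.6).
# estimate_slice_dbf_complexity() stands in for evaluating the model of Eq. (5.3) on one slice.

def average_method(slices, c_const, estimate_slice_dbf_complexity):
    """Decide, slice by slice, whether the DBF is enabled; return the list of decisions."""
    decisions, c_allocated, n_coded = [], 0.0, 0
    n_slices = len(slices)
    for s in slices:
        # Assumption: before any slice is coded the average is taken as zero, so DBF is enabled.
        c_dbf_avg = c_allocated / n_coded if n_coded else 0.0
        c_slice_target = (c_const - c_allocated) / (n_slices - n_coded)
        enable = c_dbf_avg <= c_slice_target            # comparison of Eq. (5.6) quantities
        decisions.append(enable)                        # maps to disable_deblocking_filter_idc
        if enable:
            c_allocated += estimate_slice_dbf_complexity(s)
        n_coded += 1
    return decisions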
Since the decoding complexities of BS computation, edge detection and low-pass filtering can all be controlled by the proposed slice-layer decoding complexity control method, it helps the encoder generate decoder-friendly bit streams to meet different decoding complexity constraints. Experimental results of the proposed DBF decoding complexity control method will be shown in Section 5.6.

5.4.4 Hierarchical Complexity Control Method

Features of the proposed hierarchical complexity control method are summarized in the following subsections. A more detailed implementation of the proposed method will be presented in Sec. 5.5.

5.4.4.1 Slice-layer and BU-layer Complexity Control

A basic-unit-layer (BU-layer) complexity control method is used to control the DBF decoding complexity, where a BU can be one MB or several MBs. Being similar to the Average method, a slice-layer decoding complexity control method that determines the use of DBF operations for the current slice is utilized so that the encoder can generate decoder-friendly bit streams according to different decoding complexity constraints. The slice-layer complexity control method offers a binary decision for one slice (i.e., DBF operations are either enabled or disabled). The BU-layer decoding complexity control method offers a finer resolution of decoding complexity control.

5.4.4.2 GOS-based Complexity Control

Second, we consider group-of-slices (GOS)-based complexity control, which is inspired by the group of pictures (GOP) rate control algorithm. Several encoding slices are grouped together. The DBF operations of the first slice are always enabled, while those of the other slices in the same group are determined by the slice-layer and BU-layer complexity control methods.

5.4.4.3 Lagrangian Multiplier Selection

Finally, the BU-layer complexity control method can be embedded in the RDO process. In other words, the BU DBF decoding complexity is controlled by mode decision. As mentioned before, the selection of a suitable Lagrangian multiplier, λ_c, in Eq. (5.5) so as to control the decoding complexity is a challenging task. A suitable value of λ_c can be determined in the proposed BU-layer complexity control method based on the information obtained from the previously coded BU.

5.5 Detailed Implementation of the Hierarchical Complexity Control Method

5.5.1 Frame-based DBF Process

The DBF operations are typically performed after a frame is completely encoded, as shown on the left of Fig. 5.3. The frame encoding process consists of the macro-block (MB) encoding stage and the MB writing stage, which generates the coded bit stream. The MB encoding stage is the most complicated part of the H.264/AVC encoder. Since H.264/AVC provides many coding modes, the RDO process in the MB encoding stage performs encoding and decoding tasks to obtain the bit rate and distortion for a specific coding mode, and then the best coding mode that minimizes the RD cost function in (5.4) is selected to encode the MB, as shown on the right of Fig. 5.3.

Figure 5.3: The deblocking filter operation for one frame.

Since both the RD performance and the DBF decoding complexity are jointly considered by the BU-layer complexity control method in the RDO process, DBF operations must be executed in the RDO process, i.e., in the mode decision loop, so that the DBF decoding complexity can be evaluated and the best coding mode that minimizes the RDC cost function in (5.5) can be selected.
The DBF process is integrated into the RDO process as shown in Fig. 5.4. Since the DBF process may modify pixels around the MB boundary, pixels of the left and upper MBs around the target MB must be saved before the DBF process is performed so that they can be used by the DBF process for the next trial coding mode. Similarly, the results of the DBF process, i.e., pixels of the current MB and those of the left and upper MBs around it, must be saved. Once the trial coding mode is selected as the best coding mode, its results are stored in the reference frame buffer to be used by the motion estimation process.

Figure 5.4: The deblocking filter operations integrated into the rate-distortion optimization (RDO) process.

Once a suitable Lagrangian multiplier, λ_c, is determined, the RDO process can evaluate the RDC cost function in Eq. (5.5) for all coding modes, and the best one is selected to minimize the distortion while meeting the bit rate and DBF decoding complexity constraints simultaneously.

5.5.2 Proposed Algorithm

As mentioned before, the DBF process of the first slice is always enabled, while those of the other slices in the GOS are decided by the BU-layer and slice-layer complexity control methods. The flow diagram of the proposed decoding complexity control method for the remaining slices is shown in Fig. 5.5.

Figure 5.5: The proposed decoding complexity control method for a group of slices (GOS).

First, the slice-layer decoding complexity control is used to decide whether the DBF process should be enabled or disabled for the current slice by setting the bit stream flag disable_deblocking_filter_idc. Here, a linear prediction method is used to predict the DBF decoding complexity of the current slice, denoted by C_slice,pred, before it is encoded. If the predicted DBF decoding complexity is less than the target slice decoding complexity constraint, denoted by C_slice,target, the DBF process of the current slice is enabled. This condition is written as

  C1:  C_slice,pred ≤ C_slice,target.                                                   (5.7)

If C_slice,pred is greater than C_slice,target but their difference is below a particular threshold, as expressed by

  C2:  |C_slice,pred - C_slice,target| ≤ THD,                                           (5.8)

the DBF process of the current slice is still enabled, while the BU-layer complexity control method is used to control the DBF decoding complexity by mode decision so that the allocated DBF decoding complexity of the current slice can be less than or equal to the target slice decoding complexity constraint C_slice,target. As mentioned before, complexity control by mode decision can only reduce a small amount of the DBF decoding complexity since the complexity of BS computation is required for all MBs. Thus, the BU-layer complexity control is used to lower the DBF decoding complexity only when the difference between C_slice,pred and C_slice,target is below a particular threshold. If their difference is large, the DBF process of the current slice should be disabled so as to save the decoding complexity. Note that the encoder needs to perform the DBF for all slices so that the DBF decoding complexity of the current slice can be estimated by the proposed complexity model, and this information can then be used by the linear prediction method to estimate the slice decoding complexity of future slices. The proposed slice-layer decoding complexity control only sets the bit stream flag, disable_deblocking_filter_idc, equal to one to disable the DBF operations used by decoders.
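The slice-layer decision just described can be sketched as follows. The form of the linear predictor over previously coded slices and the threshold THD are assumptions standing in for the actual encoder settings; the BU-layer control invoked under condition C2 is described next.

# Sketch of the slice-layer decision of Section 5.5.2 (conditions C1 and C2).
# The linear predictor over previously coded slices and the threshold are assumptions.

def predict_slice_complexity(history):
    """Simple linear prediction: extrapolate from the last two coded slices (an assumption)."""
    if len(history) >= 2:
        return 2 * history[-1] - history[-2]
    return history[-1] if history else 0.0

def slice_layer_decision(history, c_slice_target, thd):
    """Return 'enable', 'enable_with_bu_control', or 'disable' for the current slice."""
    c_slice_pred = predict_slice_complexity(history)
    if c_slice_pred <= c_slice_target:                    # condition C1, Eq. (5.7)
        return "enable"
    if abs(c_slice_pred - c_slice_target) <= thd:         # condition C2, Eq. (5.8)
        return "enable_with_bu_control"
    return "disable"                                      # set disable_deblocking_filter_idc = 1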
When disable_deblocking_filter_idc is equal to one, the DBF operations are still performed by the encoder, but the result is not written to the frame buffer, so that the same reference frame is used by both the encoder and the decoder.

As mentioned before, the selection of a proper Lagrangian multiplier, λ_c, in Eq. (5.5) to control the decoding complexity is a challenging task. Inspired by the work in [31], we propose a method to determine a proper value of λ_c for complexity control. Since the RD characteristics of consecutive coding units should be similar, in [31] the RD characteristics of the current coding unit are first collected by trying several operating modes (e.g., quantization parameters), and this information is then used to decide the coding parameters that control the bit rate of the next coding unit.

Following the same concept, the rate-distortion and decoding complexity (RDC) characteristics of the current BU are collected in the RDO process during mode decision, and this information helps decide a proper λ_c for complexity control. To be more specific, since each λ_c corresponds to an optimal solution that minimizes the RD cost function while meeting a specific decoding complexity constraint, different λ_c's are tried in the RDO process and their RDC characteristics (i.e., RD(λ_c) and C(λ_c)) are collected during mode decision, where RD(λ_c) and C(λ_c) are given by

  RD(λ_c) = D(blk_i | QP, m*) + λ_m · R(blk_i | QP, m*),
  C(λ_c) = C(blk_i | QP, m*),                                                           (5.9)

where m* is the best coding mode chosen by the RDO process for a given λ_c. That is,

  m* = arg min_m { D(blk_i | QP, m) + λ_m · R(blk_i | QP, m) + λ_c · C(blk_i | QP, m) }.   (5.10)

Then, a proper Lagrangian multiplier λ_c*, which is the smaller of λ_1 and λ_2 (i.e., λ_c* = min(λ_1, λ_2)), is selected to control the DBF decoding complexity of the next BU, where λ_1 and λ_2 satisfy the following two conditions:

  C(λ_1) ≤ C_BU,target,
  RD(λ_2) ≤ (1 + γ) · RD(0),                                                            (5.11)

where C_BU,target is the target BU decoding complexity constraint given by

  C_BU,target = (C_slice,target - C_BU,allocated) / (N_BU - N_coded,BU),                 (5.12)

where C_slice,target is the target slice decoding complexity constraint and C_BU,allocated is the allocated BU DBF decoding complexity in this slice. N_BU and N_coded,BU are the total number of BUs and the number of coded BUs in this slice, respectively. RD(0) is the RD cost without any decoding complexity control, which is also the minimum of RD(λ_c). We set γ = 0.1 in our experiments.

The selection of the smaller Lagrangian multiplier between λ_1 and λ_2 means that the chosen multiplier either meets the decoding complexity constraint (i.e., C(λ_1) ≤ C_BU,target) or lowers the decoding complexity with a tolerable RD performance degradation (i.e., RD(λ_2) ≤ (1 + γ) · RD(0)). It is then used to control the DBF decoding complexity of the next BU. In other words, if λ_2 is selected to control the DBF complexity of the next BU, it implies that λ_1 causes a larger RD performance degradation even though λ_1 can meet the target decoding complexity constraint. In that case, we should choose the smaller Lagrangian multiplier to lower the DBF decoding complexity while still maintaining a certain degree of RD performance.

The RDC characteristics of the Foreman and Mobile 256KBps sequences at the 10th frame are shown in Fig. 5.6, where the y-axis is RD(λ_c) and the x-axis is C(λ_c). Experimental results of the hierarchical decoding complexity control will be shown in Section 5.6.
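A compact sketch of this λ_c selection rule is given below. The candidate λ_c values, the fallback behavior when a condition cannot be met, and the collect_rdc routine that runs the RDO loop of Eq. (5.10) for one BU are assumptions, since the text does not fix a particular candidate set.

# Sketch of the Lagrangian multiplier selection of Eqs. (5.9)-(5.11).
# collect_rdc(lam) is assumed to run the RDO loop of Eq. (5.10) for one BU and
# return (RD(lam), C(lam)) of the winning mode; the candidate set is illustrative.

def select_lambda_c(collect_rdc, c_bu_target, candidates=(0.0, 0.5, 1.0, 2.0, 4.0), gamma=0.1):
    rd0, _ = collect_rdc(0.0)                        # RD(0): cost without complexity control
    lambda_1 = lambda_2 = None
    for lam in sorted(candidates):
        rd, c = collect_rdc(lam)
        if lambda_1 is None and c <= c_bu_target:    # smallest lambda with C(lambda_1) <= C_BU,target
            lambda_1 = lam
        if rd <= (1.0 + gamma) * rd0:                # keep the largest lambda with tolerable RD loss
            lambda_2 = lam
    if lambda_1 is None and lambda_2 is None:
        return 0.0                                   # assumption: fall back to pure RD optimization
    if lambda_1 is None:
        return lambda_2
    if lambda_2 is None:
        return lambda_1
    return min(lambda_1, lambda_2)                   # lambda_c* = min(lambda_1, lambda_2)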
Figure 5.6: The rate, distortion and decoding complexity characteristics for the Foreman and Mobile 256KBps sequences at the 10th frame, where the y-axis is RD(λ_c) and the x-axis is C(λ_c).

Table 5.2: Selected training bit streams for CIF sequences, where each bit stream has the {IPPP...PP} picture structure.

  Sequence   Frame number   QP
  Foreman    200            QP = 10, 14, 18, ..., 46
  Foreman    270            QP = 12, 16, 20, ..., 48
  Mobile     150            QP = 10, 14, 18, ..., 46
  Mobile     270            QP = 12, 16, 20, ..., 48

5.6 Experimental Results

5.6.1 Model Verification

We conducted experiments to verify the proposed DBF decoding complexity model and the decoding complexity control methods on a PC platform. The CPU was a Pentium mobile 1.7GHz with 512MB RAM, and the operating system was Windows XP. The reference JM9.4 [17] decoder was optimized with the Intel MMX technology. We selected the Foreman and Mobile CIF sequences as training sequences and pre-encoded 40 training bit streams. These 40 bit streams contain different frame numbers and were encoded with different quantization parameters (QPs). Each bit stream has the {IPPP...PP} picture structure. Table 5.2 shows the coding configurations of these CIF training bit streams. After that, the Intel VTune Performance Analyzer 8.0 was used to measure the DBF decoding complexities of all pre-encoded bit streams. The numbers of clock-ticks measured by the Intel VTune were divided by 1.7·10^7 to get the DBF decoding time in milli-seconds. Then, the proposed DBF decoding complexity model counted all of the decoding complexity variables, i.e., N_mb, N_e,nz, N_common, and N_strong, for those pre-encoded bit streams.

Table 5.3: Selected training bit streams for D1 sequences, where each bit stream has the {IPPP...PP} picture structure.

  Sequence   Frame number   QP
  Football   70             QP = 10, 16, 22, ..., 46
  Football   60             QP = 12, 18, 24, ..., 48
  Football   50             QP = 14, 20, 26, ..., 44
  Mobile     70             QP = 10, 16, 22, ..., 46
  Mobile     60             QP = 12, 18, 24, ..., 48
  Mobile     50             QP = 14, 20, 26, ..., 44

The above information was used to train the weights of the DBF decoding complexity model for CIF sequences by the constrained least squares method. The weights of the DBF decoding complexity model for CIF sequences are

  ω_mb = 2.8439·10^-4,      ω_strong = 2.3023·10^-6,
  ω_common = 4.9979·10^-6,  ω_edge = 7.6983·10^-6.                                       (5.13)

Similarly, we selected the Football and Mobile D1 sequences to generate 42 training bit streams, and the Blue sky and Toy and calendar HD sequences to encode 40 training bit streams, so as to train the weights for D1 and HD sequences, respectively. Tables 5.3 and 5.4 summarize the encoding configurations of these training D1 and HD bit streams.
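The weight-fitting step itself can be sketched as follows. The text above states only that a constrained least squares method was used; the sketch assumes the constraint is non-negativity of the weights and uses SciPy's non-negative least squares solver as a stand-in, with the per-stream counters and measured decoding times as inputs.

# Sketch of weight training for the model of Eq. (5.3).
# Assumption: the constraint is non-negativity of the weights, solved here with NNLS.

import numpy as np
from scipy.optimize import nnls

def train_dbf_weights(counts, measured_ms):
    """counts: (num_streams, 4) array of [N_mb, N_e,nz, N_common, N_strong] per bit stream.
    measured_ms: (num_streams,) DBF decoding times measured with VTune.
    Returns the fitted weights in the same column order as counts."""
    A = np.asarray(counts, dtype=float)
    b = np.asarray(measured_ms, dtype=float)
    weights, residual = nnls(A, b)   # least squares fit with weights >= 0
    return weights

# Usage (illustrative): weights = train_dbf_weights(counts_from_40_cif_streams, vtune_times_ms)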
The weights of the DBF decoding complexity model for D1 sequences are

  ω_mb = 2.71834·10^-4,     ω_strong = 8.16169·10^-6,
  ω_common = 4.53922·10^-6, ω_edge = 8.7538·10^-6.                                       (5.14)

Table 5.4: Selected training bit streams for HD sequences, where each bit stream has the {IPPP...PP} picture structure.

  Sequence   Frame number   QP
  Blue       15             QP = 10, 20, 30, 40
  Blue       14             QP = 12, 22, 32, 42
  Blue       13             QP = 14, 24, 34, 44
  Blue       12             QP = 16, 26, 36, 46
  Blue       11             QP = 18, 28, 38, 48
  Toy        15             QP = 10, 20, 30, 40
  Toy        14             QP = 12, 22, 32, 42
  Toy        13             QP = 14, 24, 34, 44
  Toy        12             QP = 16, 26, 36, 46
  Toy        11             QP = 18, 28, 38, 48

The weights of the DBF decoding complexity model for HD sequences are

  ω_mb = 3.96514·10^-4,      ω_strong = 8.85689·10^-6,
  ω_common = 5.105741·10^-6, ω_edge = 9.548236·10^-6.                                    (5.15)

Next, these weights were integrated into the proposed decoding complexity model to estimate the decoding complexities of CIF bit streams generated from the Akiyo, Container, Mobile, Silent, Flower, Foreman, Stefan and Tempete sequences, D1 bit streams generated from the Football, Susie, Ship and Mobile sequences, and HD bit streams generated from Blue sky, Toy and calendar, Sunflower and Rush hour. Figs. 5.7, 5.8, and 5.9 show the experimental results of model verification. We see that the proposed DBF complexity model provides good estimation results, where the errors are all less than 10%.

Figure 5.7: Model verification for various CIF bit streams
Figure 5.8: Model verification for various D1 bit streams
Figure 5.9: Model verification for various HD bit streams

5.6.2 Complexity Control with the Average Method

Experimental results of CIF, D1 and HD sequences for the H.264/AVC encoder equipped with the DBF decoding complexity model and the proposed Average decoding complexity control method are shown in Figs. 5.10, 5.11, 5.12, and 5.13, respectively, where the results in each row correspond to one bit stream. The x-axis is the decoding time and the y-axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the RD performance (column 3). The first point on the x-axis corresponds to the case without decoding complexity control. As compared with the sequences without decoding complexity control, the Foreman CIF sequence with the target DBF decoding complexity at 30 ms loses 0.1 dB in PSNR but saves about 64.06% of the decoding complexity. The D1 Ship sequence with the target DBF decoding complexity at 30 ms offers RD performance similar to the bit stream without complexity control, yet up to 63.60% of the DBF decoding complexity can be saved. Finally, the Blue sky HD sequence with the target DBF decoding complexity at 200 ms provides the same RD performance as the bit stream without complexity control while saving 57.30% of the DBF decoding complexity in time.

5.6.3 Complexity Control with the Hierarchical Method

Experimental results of CIF, D1 and HD sequences encoded by the H.264/AVC coder equipped with the DBF decoding complexity models and the proposed Hierarchical decoding complexity control method are shown in Figs. 5.14, 5.15, 5.16, and 5.17, where each BU contains 15, 15 and 6 MBs for CIF, D1 and HD bit streams, respectively. Similarly, the results in each row correspond to one bit stream.
The x-axis is the decoding time and the y-axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3). The first point on the x-axis corresponds to the case without decoding complexity control. As compared with the sequences without decoding complexity control, the Flower CIF sequence with the target DBF decoding complexity at 30 ms can save 47.18% of the DBF decoding complexity, and the Susie D1 sequence with the target DBF decoding complexity at 40 ms can save 57.27% of the DBF decoding complexity, while these two bit streams offer RD performance similar to those without decoding complexity control. Finally, the Rush hour HD sequence with the target DBF decoding complexity at 250 ms loses 0.1 dB in PSNR but saves 56.93% of the decoding complexity in time.

Figure 5.10: Decoding complexity control by the Average method for Akiyo, Container, Mobile, and Silent CIF bit streams
Figure 5.11: Decoding complexity control by the Average method for Flower, Foreman, Stefan, and Tempete CIF bit streams
Figure 5.12: Decoding complexity control by the Average method for Football, Ship, Susie, and Mobile D1 bit streams
Figure 5.13: Decoding complexity control by the Average method for Blue sky, Toy and calendar, Sunflower, and Rush hour HD bit streams
Figure 5.14: Decoding complexity control by the Hierarchical method for Akiyo, Container, Mobile, and Silent CIF bit streams
Figure 5.15: Decoding complexity control by the Hierarchical method for Flower, Foreman, Stefan, and Tempete CIF bit streams
Figure 5.16: Decoding complexity control by the Hierarchical method for Football, Ship, Susie, and Mobile D1 bit streams
Figure 5.17: Decoding complexity control by the Hierarchical method for Blue sky, Toy and calendar, Sunflower, and Rush hour HD bit streams

These experimental results clearly demonstrate that the H.264/AVC encoder with the proposed DBF decoding complexity models and decoding complexity control methods can generate bit streams to meet different decoding complexity constraints. The errors between actual decoding complexities and target decoding complexity constraints are all less than 10%. Besides, the resultant bit streams can save a significant amount of decoding complexity while the generated decoder-friendly bit streams have RD performance similar to those without decoding complexity control. This is particularly interesting in a mobile broadcasting environment, where multiple mobile devices receive broadcast/streaming video in real time.

Finally, although the Average method offers performance similar to the Hierarchical method, the Hierarchical method has two advantages. First, with the Hierarchical decoding complexity control algorithm, the DBF decoding complexities can vary among different groups of slices so as to meet different decoding complexity constraints of a target decoding platform. Second, the Hierarchical complexity control can be easily integrated with the GOP rate control algorithm. Since the first picture of a GOP is usually an I-frame, it is desirable to enable the DBF operations for the I-frame to avoid quantization error propagation. The DBF operations of the first frame are always enabled by the Hierarchical method but may not be enabled by the Average method.

5.7 Conclusion

The H.264/AVC deblocking filter (DBF) decoding complexity model and its application to H.264/AVC encoding were presented in this chapter.
The encoder integrated with the proposed complexity model and complexity control methods can generate a bit stream that is tailored to a receiver platform with certain hardware constraints. The coded bit stream can balance the tradeoff between the RD requirement and the computational power of the decoding platform. The decoding complexity model was verified experimentally. The performance of the H.264/AVC codec with the proposed decoding complexity model and the proposed decoding complexity control methods was also presented. The resultant bit streams can be decoded at a much lower complexity while the RD performance of these decoder-friendly bit streams is quite similar to that of bit streams without decoding complexity control.

Chapter 6

Conclusion and Future Work

6.1 Summary of the Research

In this thesis, we considered several decoding complexity models and decoding complexity control schemes for the H.264/AVC video coding standard. The H.264/AVC codec equipped with these decoding complexity models can generate bit streams to meet different decoding complexity constraints. Specifically, we proposed decoding complexity models for the following H.264/AVC building modules: MCP, SCP, CABAC, UVLC, CAVLC, and the deblocking filter (DBF).

Decoding complexity models for MCP and SCP in H.264/AVC were studied in Chapter 2. Since MCP can be viewed as a memory operation, its decoding complexity depends on the efficiency of cache management. Traditional MCP decoding complexity models consider only the number of motion vectors (MVs) and the number of interpolation filters without paying attention to the relationship between MVs, which plays an important role in cache management efficiency and, therefore, in the MCP decoding complexity. To address this issue, the number of cache misses was incorporated in the proposed decoding complexity model. We also examined the effects of frame sizes and the distribution of selected reference frames. It was shown by experimental results that frame sizes and distributions of selected reference frames affect the weight for the number of cache misses (which can be seen as the cache miss penalty); therefore, this weight must be re-trained according to the distributions of selected reference frames and the frame sizes. Since the number of cache misses is less important for the SCP decoding complexity, the SCP decoding complexity can simply be modeled as a function of the number of intra-prediction directions for a particular intra type. The proposed MCP/SCP decoding complexity models provide good estimation results, with estimation errors all less than 10% when compared to the actual MCP/SCP decoding complexities measured by the Intel VTune.

The MCP/SCP decoding complexity control scheme was also proposed in Chapter 2. It estimates the decoding complexity for a given inter-prediction mode and MV (or an intra type and prediction direction), and then eliminates those inter-prediction modes and MVs (or intra types and prediction directions) whose decoding complexities are higher than the decoding complexity constraint. The H.264 encoder equipped with the proposed decoding complexity models and complexity control scheme can generate decoder-friendly bit streams to meet different decoding complexity constraints accurately.
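As a reminder of the form of that model, a minimal sketch is given below; the exact set of terms and their trained weights are defined in Chapter 2, so the variable names here are only illustrative.

# Illustrative sketch of a linear MCP decoding complexity model of the kind summarized above;
# the actual terms and trained weights are defined in Chapter 2.

def estimate_mcp_complexity(n_mv, n_interp_filters, n_cache_misses, w_mv, w_filter, w_cache):
    """Linear model: weighted sum of MV count, interpolation filter count and cache misses."""
    return w_mv * n_mv + w_filter * n_interp_filters + w_cache * n_cache_misses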
A decoding complexity model and its associated decoding complexity control scheme for H.264/AVC context-based adaptive binary arithmetic coding (CABAC) decoding were investigated in Chapter 3. It was observed that the CABAC decoding complexity is proportional to the number of binary arithmetic decoding (BAD) executions, where this number is equal to the length of the binary sequence generated by the binarization process for a non-binary syntax element. Therefore, this number is included in our decoding complexity model. The proposed CABAC decoding complexity model consists of two parts: one for the source data, i.e., the quantized transformed coefficients (QTCs), and the other for the header data. The proposed decoding complexity model provides good estimation results, where the errors between the estimated and actual decoding complexities are less than 7%. The linear relationship between coding bit rates and decoding complexity was observed, theoretically argued and experimentally verified; it can be used to develop a simple yet effective decoding complexity control scheme and to select a proper Lagrangian multiplier to achieve the rate-distortion and decoding complexity (RDC) optimization. The decoding complexity control scheme can help the encoder generate decoder-friendly bit streams that meet different decoding complexity constraints accurately at the cost of little quality loss.

A decoding complexity model and its associated complexity control scheme for CAVLC and UVLC in H.264/AVC were proposed in Chapter 4. CAVLC and UVLC are entropy coding tools used to code source data and header data, respectively. The proposed decoding complexity model is accurate, where the errors between the estimated and actual decoding complexities are less than 8%. To develop the decoding complexity control scheme, the relationship between the bit rate and the decoding complexity was studied. It was shown by theoretical analysis as well as experimental results that the relationship between the source bit rate and the CAVLC decoding complexity is quadratic, while the relationship between the header bit rate and the UVLC decoding complexity is linear. The decoding complexity control scheme was then developed, and a proper Lagrangian multiplier to achieve the RDC optimization can be determined accordingly. The encoder equipped with the proposed decoding complexity model and complexity control scheme can generate decoder-friendly bit streams.

The DBF decoding complexity model and its associated decoding complexity control scheme were studied in Chapter 5. The DBF process in H.264/AVC consists of three modules: boundary strength (BS) computation, edge detection, and low-pass filtering. Since the decoding complexity model in [10] only considers the low-pass filtering operations, it is not accurate. In our research, the complexities of all three modules are taken into account in the DBF decoding complexity model. The relationship between the frame size and the DBF decoding complexity was also studied, since the frame size can affect the parameters of the DBF decoding complexity model. Thus, the model parameters should be re-trained for different frame sizes of the same video content. The proposed DBF decoding complexity model was verified experimentally. Errors between the estimated and actual DBF decoding complexities measured by the Intel VTune are all less than 10% for a wide range of bit rates and video contents. Two DBF decoding complexity control schemes were proposed. In the first decoding complexity control scheme, the bit stream flag that determines the DBF operations of a coding slice is used to control the DBF decoding complexity.
The second decoding complexity control scheme provides more flexibility so that each group of slices (GOS) can have a different decoding complexity constraint. These two decoding complexity control schemes were verified experimentally. It was shown by experimental results that the generated decoder-friendly bit streams can save a large amount of computational complexity while the resultant bit streams have RD performance similar to those generated by a typical H.264 encoder.

6.2 Future Research Directions

To make our current work more complete, there are several research problems that deserve further investigation.

• Decoding complexity allocation

Decoding complexity allocation, analogous to bit allocation, is a process that assigns a proper complexity budget to basic decoding units, which can be one macro-block (MB) or several MBs. The objective is to minimize the distortion while meeting the rate and decoding complexity constraints. As shown in earlier chapters, the proposed complexity control schemes can help an encoder generate bit streams to meet different decoding complexity constraints accurately. However, the complexity control schemes focus on the total complexity rather than on the complexity of basic units. Consider the scenario where a video clip contains two basic units. The first one has higher decoding complexity (e.g., with more complicated texture and motion), while the second one has lower decoding complexity (e.g., in a smooth and still region). Our current complexity control scheme assigns the average decoding complexity to both basic units. It is desirable to develop a more sophisticated complexity control scheme that assigns higher decoding complexity to the first unit and lower decoding complexity to the second one while the total decoding complexity constraint is still satisfied.

• Application of the Lagrangian multiplier method to MCP/SCP complexity control

Since it is difficult to find the optimal Lagrangian multiplier for complexity control, we proposed a two-step decoding complexity control scheme in Chapter 2. That is, the MB (or block) decoding complexity constraint is first determined, and the optimal coding mode is then decided from a smaller set of inter- or intra-prediction modes whose decoding complexities are less than the MB (or block) decoding complexity constraint. Although this scheme works to a certain degree, it is desirable to use the Lagrangian multiplier method to control the decoding complexity for the following reasons. First, the Lagrangian multiplier method is suitable for complexity control in basic units, while the proposed scheme is limited to complexity control for MBs only. Second, as mentioned in Chapter 2, the optimal mode may not be found by the Lagrangian multiplier method if the optimal mode is not on the convex hull of the RD curve [34] (i.e., there exists a duality gap). However, it can be shown that the duality gap can be very small when the number of MBs in one basic unit is large. In other words, the solution obtained by the Lagrangian multiplier method can be close to the optimal solution [3]. Thus, it is worthwhile to apply the Lagrangian multiplier method to decoding complexity control.

• Joint rate, distortion and decoding complexity optimization in entropy decoding

There are two entropy coding modes in H.264/AVC: CABAC and CAVLC/UVLC. As mentioned in Chapter 3, CABAC can achieve better RD performance than CAVLC/UVLC, while the decoding complexity of CABAC is much higher than that of CAVLC/UVLC.
Since CABAC always has better RD performance, entropy coding mode selection (i.e., the selection between CABAC and CAVLC/UVLC) is unnecessary when only the RD optimization problem is considered. In the current reference code, the H.264/AVC encoder adopts CAVLC/UVLC in the baseline profile only. However, due to its higher decoding complexity, CABAC may not be the best entropy coding mode when the joint optimization problem of rate (R), distortion (D) and decoding complexity (C) is considered. It would be interesting to seek the optimal entropy coding mode that balances RD performance and decoding complexity in the near future.

• Decoding complexity model of H.264 scalable video coding (SVC)

Recently, the Joint Video Team (JVT) has developed the scalable extension of H.264/AVC [40], which is commonly known as H.264/SVC (scalable video coding). The objective of this new standard is to provide a bit stream that adapts to a wide range of decoding platforms with different channel bandwidths, display resolutions and frame rates. To achieve this goal, the generated bit stream consists of one base layer and several enhancement layers. The base layer can be decoded by a conventional H.264/AVC decoder, while the other layers, which provide quality enhancement, can only be decoded by an H.264/SVC decoder.

Three types of scalability are provided in H.264/SVC, namely, temporal scalability, spatial scalability, and quality scalability. For temporal scalability, enhancement layers provide the information to increase the frame rate of the base layer. For spatial scalability, enhancement layers offer extra information to increase the picture resolution of the base layer. Finally, for quality scalability, enhancement layers can increase the PSNR while keeping the picture resolution and the frame rate the same.

H.264/SVC can generate bit streams with several enhancement layers under different decoding complexity constraints. That is, the base layer demands the least computing power, while the enhancement layers offer better PSNR but have to be decoded at higher complexity. However, we have not yet seen any research on a decoding complexity model for H.264/SVC. It should be interesting to investigate an H.264/SVC decoding complexity model to obtain an optimal tradeoff between complexity and scalability.

• Application of the decoding complexity model to the H.264/AVC decoder

Since H.264/AVC provides a large number of coding modes, its decoding complexity is usually about 2.1 to 2.9 times higher than that of previous video coding standards. For some applications where bit streams are decoded on mobile devices with limited computing power, an H.264/AVC decoder may skip some frames (or MBs) in the bit stream so as to save power and prolong the usage time. Under such a scenario, decoding complexity models can be embedded in the H.264/AVC decoder as well. Decoding complexity models then help the decoder estimate the decoding complexity of a frame (or an MB) before performing the decoding tasks, so that it may skip those frames (or MBs) whose decoding complexities are higher than the constraint.

The decoding complexity model can also be used in the H.264/SVC decoder. That is, the decoder can estimate the possible decoding complexity of an enhancement layer prior to its actual decoding. The decoder can skip an enhancement layer of high decoding complexity so as to save computing power and prolong the usage time. It is an interesting topic to study the application of the decoding complexity model to H.264/AVC and H.264/SVC decoders.
• H.264 decoding complexity allocation with multiple modules

In this thesis, we proposed the MCP/SCP decoding complexity model, the entropy decoding complexity model, and the deblocking filter decoding complexity model, and addressed their applications to H.264 decoding complexity reduction individually. However, since these three decoding modules are executed in sequence on a decoding platform, their decoding complexity reduction should be considered jointly (rather than one by one separately) in the encoder. In other words, a desired encoder should generate decoder-friendly bit streams that achieve the best RD performance while the sum of the decoding complexities of these three decoding modules meets the target decoding complexity constraint. This problem can be viewed as one that allocates the total decoding complexity among the three decoding modules. A suitable decoding complexity constraint may be assigned to each decoding module first, and then each decoding module performs its associated decoding complexity control method such that the RD performance is optimal while the sum of the decoding complexities of all decoding modules (i.e., the total decoding complexity) is less than or equal to the target decoding complexity constraint. It is worthwhile to study H.264 decoding complexity allocation with multiple modules.

References

[1] Y. Andreopoulos and M. van der Schaar, "Complexity-constrained video bitstream shaping," IEEE Trans. on Signal Processing, vol. 55, pp. 1967–1974, May 2007.
[2] D. P. Bertsekas, "Nonlinear Programming," Athena Scientific, 2nd edition, 2003, pp. 493–495.
[3] ——, "Nonlinear Programming," Athena Scientific, 2nd edition, 2003, pp. 502–503.
[4] G. Bjontegaard and K. Lillevold, "Context-adaptive VLC coding of coefficients," JVT document JVT-C028, May 2002.
[5] J. L. Chen, Y. K. Lin, and T. S. Chang, "A low cost context adaptive arithmetic coder for H.264/MPEG-4 AVC video coding," in IEEE Int. Conf. on Acoustics, Speech and Signal Processing, May 2007.
[6] F. Fu, X. Lin, and L. Xu, "Fast intra prediction algorithm in H.264/AVC," in IEEE Int. Conf. Signal Processing (ICSP2004), Dec. 2004, pp. 1191–1194.
[7] J. Hennessy and D. A. Patterson, "Computer Architecture: A Quantitative Approach," 2nd edition, Morgan Kaufmann, 1996, p. 417.
[8] M. Horowitz, A. Joch, F. Kossentini, and A. Hallapuro, "H.264/AVC baseline profile decoder complexity analysis," IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, pp. 704–716, July 2003.
[9] Y. Hu, Q. Li, S. Ma, and C.-C. J. Kuo, "Joint rate-distortion-complexity optimization for H.264 motion search," in IEEE Int. Conf. Multimedia and Expo (ICME2006), July 2006, pp. 1949–1952.
[10] Y. Hu, Q. Li, S. Ma, and C.-C. J. Kuo, "Decoder-Friendly Adaptive Deblocking Filter (DF-ADF) Mode Decision in H.264/AVC," in IEEE Int. Symposium on Circuits and Systems (ISCAS2007), May 2007.
[11] Intel, "Intel VTune Performance Analyzer," [Online] Available: http://www.intel.com/cd/software/products/asmo-na/eng/vtune/vpa/219898.htm.
[12] ——, "Intel 64 and IA-32 Architectures Software Developer's Manual," [Online] Available: www.intel.com/design/processor/manuals/253665.pdf.
[13] ITU-T and ISO/IEC standard, "Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s – Part 2: Video," ISO/IEC JTC1 11172-2 (MPEG-1), 1993.
[14] ——, "Generic coding of moving pictures and associated audio information – Part 2: Video," ITU-T and ISO/IEC JTC1 13818-2 (MPEG-2), 1994.
[15] ——, "Information technology – coding of audiovisual objects – Part 2: Visual," ISO/IEC 14496-2 (MPEG-4), Dec. 1999.
[16] N. S. Jayant and P. Noll, "Digital Coding of Waveforms – Principles and Applications to Speech and Video," Prentice Hall, 1984, pp. 16–17.
[17] Joint Video Team, "H.264 JM94 reference code," [Online] Available: http://iphome.hhi.de/suehring/tml/download/old jm/jm94.zip.
[18] C. S. Kannangara, I. E. G. Richardson, M. Bystrom, J. R. Solera, Y. Zhao, A. MacLennan, and R. Conney, "Low-complexity skip prediction for H.264 through Lagrangian cost estimation," IEEE Trans. on Circuits and Systems for Video Technology, vol. 16, pp. 202–208, Feb. 2006.
[19] C. Kim and C. C. J. Kuo, "Feature-based intra/inter coding mode selection for H.264/AVC," IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, pp. 441–453, April 2007.
[20] D. K. Kwon, M. Y. Shen, and C. C. J. Kuo, "Rate control for H.264 video with enhanced rate and distortion models," IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, pp. 517–529, May 2007.
[21] D. N. Kwon, P. F. Driessen, A. Basso, and P. Agathoklis, "Performance and computational complexity optimization in configurable hybrid video coding system," IEEE Trans. on Circuits and Systems for Video Technology, vol. 16, pp. 31–42, Jan. 2006.
[22] E. Y. Lam and J. W. Goodman, "A mathematical analysis of the DCT coefficients distribution for images," IEEE Trans. on Image Processing, vol. 10, pp. 1661–1666, Oct. 2000.
[23] V. Lappalainen, A. Hallapuro, and T. D. Hamalainen, "Complexity of optimized H.26L video decoder implementation," IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, pp. 717–725, July 2003.
[24] S. W. Lee and C.-C. J. Kuo, "Complexity modeling for context-based adaptive binary arithmetic coding (CABAC) in H.264/AVC decoder," in Conference on Applications of Digital Image Processing, SPIE Optics and Photonics (SPIE 2007), Aug. 2007.
[25] ——, "Complexity modeling for motion compensation in H.264/AVC decoder," in IEEE Int. Conf. on Image Processing (ICIP2007), Sep. 2007.
[26] ——, "Motion compensation complexity model for decoder-friendly H.264 system design," in IEEE Int. Workshop on Multimedia Signal Processing (MMSP2007), Oct. 2007.
[27] ——, "Complexity modeling of H.264/AVC CAVLC/UVLC entropy decoders," in IEEE International Symposium on Circuits and Systems (ISCAS2008), May 2008.
[28] S. W. Lee and C. C. J. Kuo, "H.264/AVC Decoder Complexity Modeling and Applications (II): Entropy Decoding," submitted to IEEE Trans. on Circuits and Systems for Video Technology, 2008.
[29] ——, "H.264/AVC Decoder Complexity Modeling and Applications (I): Spatial and Temporal Compensations," submitted to IEEE Trans. on Circuits and Systems for Video Technology, 2008.
[30] Z. G. Li, F. Pan, K. P. Lim, X. Lin, and S. Rahardja, "Adaptive rate control for H.264," in IEEE Int. Conf. on Image Processing (ICIP2004), Oct. 2004, pp. 745–748.
[31] L. J. Lin and A. Ortega, "Bit-rate control using piecewise approximated rate-distortion characteristics," IEEE Trans. on Circuits and Systems for Video Technology, vol. 8, pp. 446–459, 1998.
[32] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive deblocking filter," IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, pp. 614–619, 2003.
[33] D. Marpe, H. Schwarz, and T. Wiegand, "Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard," IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, pp. 620–636, July 2003.
620–636, July 2003. [34] A. Ortega and K. Ramchandran, “Rate-distortion methods for image and video compression,” IEEE Signal Processing Magazine, vol. 15, pp. 23–50, Nov. 1998. [35] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stock- hammer, and T. Wedi, “Video coding with H.264/AVC: Tools, performance, and complexity,” IEEE Circuits and Systems Magazine, vol. 4, pp. 7–28, 2004. [36] F. Pan, X. Lin, S. Rahardja, K. P. Lim, Z. G. Li, D. Wu, and S. Wu, “Fast mode decision algorithm for intraprediction in H.264/AVC video coding,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, pp. 813–822, July 2005. [37] S. Perschau and M. J. Horowitz, “Complexity analysis: H.264 vs. computational efficient H.264,” in Video Coding Experts Group (VCEG) SG16, Aug. 2005. [38] I. Richardson and Y. Zhao, “Video encoder complexity reduction by estimating skip mode distortion,” in IEEE Int. Conf. on Image Processing (ICIP2004), Oct. 2004, pp. 103–106. [39] I. E. G. Richardson, “H.264 and MPEG-4 video compression - video coding for next generation multimedia,” in John Wily and Sons, 2003, pp. 198–207. [40] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, pp. 1103–1120, 2007. 166 [41] G.Shen, G.P.Gao, S.Li, H.Y.Shum, andY.Q.Zhang, “Acceleratevideodecoding with generic GPU,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 5, pp. 685–693, May 2005. [42] G.J.SullivanandT.Wiegand,“Rate-distortionoptimizationforvideocompression,” IEEE Signal Processing Magazine, vol. 15, pp. 74–90, Nov 1998. [43] C. H. Tseng, H. M. Wang, and J. F. Yang, “Improved and fast algorithms for in- tra 4x4 mode decision in H.264/AVC,” in IEEE Int. Symp. Circuits and Systems (ISCAS2005), May 2005, pp. 2128–2131. [44] ——, “Enhanced Intra-4x4 mode decision for H.264/AVC coders,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 16, pp. 1027–1032, Aug. 2006. [45] J. Valentim, P. Nunes, and F. Pereia, “An alternative complexity model for the MPEG-4 video verifier mechanism,” in IEEE Int. Conf. on Image Processing (ICIP2001), Oct. 2001, pp. 461–464. [46] ——, “Evaluating MPEG-4 video decoding complexity for an alternative video com- plexity verifier model,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 12, pp. 1034–1044, Nov. 2002. [47] M. van der Schaar and Y. Andreopoulos, “Rate-distortion-complexity modeling for network and receiver aware adaptation,” IEEE Trans. on Multimedia, vol. 7, pp. 471–479, June 2005. [48] Y. Wang, “Low-complexity H.264 decoder: motion estimation and mode decision,” in [Online] Available: http://www.ee.columbia.edu/ ywang/Research/camed.html. [49] ——,“Resourceconstrainedvideocoding/adaptation,”inPhDthesisgraduateschool of arts and sciences, Columbia Unviversity. [50] Y. Wang and S. F. Chang, “Complexity adaptive H.264 encoding for light weight stream,”inIEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP),May 2006, pp. II25–28. [51] T. Wiegand and B. Girod, “Lagrange multiplier selection in hybrid video coder control,” in IEEE Int. Conf. on Image Processing (ICIP2001), Oct. 2001, pp. 542– 545. [52] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, “Rate- constrained coder control and comparison of video coding standards,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, pp. 688–703, July 2003. [53] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. 
Luthra, “Overview of the H.264/AVCcodingstandard,”IEEE Trans. on Circuits and Systems for Video Tech- nology, vol. 7, pp. 560–576, July 2003. [54] Z.ZhouandM.T.Sun,“Fastmacroblockintermodedecisionandmotionestimation forH.264/MPEG-4AVC,”inIEEE Int. Conf. on Image Processing (ICIP2004),Oct. 2004, pp. 789–792. 167
Abstract
The problem of H.264/AVC decoder complexity modeling and its applications to decoding complexity control are studied in this research. An encoder integrated with the decoding complexity model and the associated decoding complexity control algorithm can generate decoder-friendly bit streams, in the sense that the compressed bit streams can be decoded on a particular decoding platform at a lower complexity and/or so as to meet various decoding complexity constraints.