Dependent R-D modeling for H.264/SVC bit allocation
(USC Thesis Other)
DEPENDENT R-D MODELING FOR H.264/SVC BIT ALLOCATION

by Yongjin Cho

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER ENGINEERING)

May 2010

Copyright 2010 Yongjin Cho

Dedication

This dissertation is dedicated to my family with sincere appreciation. Thank you for everything.

Acknowledgements

I would like to express my sincere appreciation to my advisor, Dr. C.-C. Jay Kuo. I am really grateful that he willingly gave me the opportunity to work with him in his research group, and thanks to his invaluable comments and advice, I was able to get a glimpse of what it is to be a researcher. I would also like to thank my other committee members, Dr. Antonio Ortega and Dr. Aiichiro Nakano, for their insightful comments on my research.

Upon the completion of the thesis, I realized that I have fallen into great debt to my friends and family. I wish I could pay it off one day.

My colleagues from the Media Communications Lab and friends from USC contributed to the completion of this thesis. I appreciate Dr. JaeHoon Kim, Dr. Seung-Hwan Kim, Dr. Byung-Ho Cha, JeWon Kang and Dong-Woo Kang for their help and encouragement. I am grateful to Jiaying Liu at Peking University, who has been a great co-worker sharing a lot of discussions with me all the time. Especially, I cannot appreciate enough Dr. Do-Kyoung Kwon, who has been a friend and a mentor during my academic years. Without his endless encouragement and help, this research would have never been possible.

I have to thank my sincere friends from Yonsei University, Dr. Minsik Ahn, Dr. Yang-Won Jung and SunHyoung Han. They strongly encouraged me to come to USC for my study and I could successfully finish my course with their endless support and encouragement. Also, I am very grateful to my fellow Yonseians, Ju-So Song, Jongeun Jun, Dr. Chulmin Lee, HyungSeok Kim, Woojin Choi, Dr. Kyu-Jeong Han and Samuel Kim.
They lent me a hand without any hesitation whenever I was in trouble. I am very grateful to Su-Bok Kim, Jong-Woo Lim, Jae-Sang Cho and Hawoon Kim for sharing the warmth of a family with me. I appreciate Dr. SangHyun Chang for his great inspiration, support and help. Also, I am very grateful to YoungWoo Ko and his family for their endless support, care and concern.

I would also like to thank my friends who have helped me to stay in good shape. JunSoo Kim, YoungHoon Kang, Dr. SangHyo Kim, Dr. TaeHoon Shin, Chang-Yang Lee, Dr. Eun-Jin Lee and Hyung-Ah Kim, I appreciate their great support, help and concern. Especially, I would like to express my sincere appreciation to Christopher and Michelle Lee for their great care and concern.

I must not forget to mention my friends from my childhood. Chul-Hee, Eung-Zu, Jung-Jae, Seung-Hyoung and Yong, they have always been a great driving force to me even though we have been far away from each other.

My final appreciation goes to my family. I will never forget the sacrifice, faith, support, concern, care and help from my family, my father, mother, and brother. I would never be able to achieve this without them. Thank you.

Table of Contents

Dedication
Acknowledgements
List Of Tables
List Of Figures
Abstract
Chapter 1: Introduction
1.1 Significance of the Research
1.2 Review of Previous Work
1.3 Contributions of the Research
1.4 Organization of the Dissertation
Chapter 2: Research Background and Motivation
2.1 H.264/SVC Overview
2.1.1 Temporal Scalability
2.1.2 Spatial Scalability
2.1.3 Quality/SNR Scalability
2.1.4 Combined Scalability
2.2 Dependency in Scalable Video Coding
2.2.1 Temporal Dependency
2.2.2 Spatial Dependency
2.3 Bit Allocation Algorithms
2.3.1 Dependent Bit Allocation Algorithms
2.3.2 Independent Bit Allocation Algorithms
2.4 Conclusion
Chapter 3: Temporal Layer Dependent R-D Modeling and Its Application to Temporal Layer Bit Allocation
3.1 Introduction
3.2 Temporal Layer R-D Characteristics
3.2.1 Hierarchical Temporal Layers of H.264/SVC
3.2.2 Rate Characteristics
3.2.3 Distortion Characteristics
3.3 Temporal Layer Dependent Distortion Model
3.3.1 T Layer Distortion Modeling
3.3.2 GOP Distortion Modeling
3.3.3 Modeling Result
3.4 Temporal Layer Bit Allocation
3.4.1 Problem Formulation
3.4.2 Solution to the Lagrangian Equation
3.4.3 Proposed Bit Allocation Algorithm
3.4.4 Experimental Result
3.5 Conclusion
Chapter 4: Quality Layer Dependent R-D Modeling and Joint Q-T Bit Allocation
4.1 Introduction
4.2 Quality Layer R-D Characteristics
4.2.1 Q Layer Rate Characteristics
4.2.2 Q Layer Distortion Characteristics
4.3 Quality Layer Dependent Rate Model
4.3.1 Q Layer Rate Modeling
4.3.2 GOP R-D Models with Combined Q-T Scalability
4.3.3 Verification of Modeling Accuracy
4.4 Joint Q-T Bit Allocation
4.4.1 Problem Formulation
4.4.2 Solution to Lagrangian
4.4.3 Proposed Bit Allocation Algorithm
4.4.4 Experimental Results
4.5 Conclusion
Chapter 5: S-domain Analysis of Dependent R-D Characteristics
5.1 Introduction
5.2 Four Types of Dependent Rate/Distortion Characteristics
5.3 Joint T-Q Rate/Distortion Models
5.3.1 Derivation of Joint T-Q Models
5.3.2 Model Verification
5.3.3 Analysis of Model Parameters
5.4 Conclusion
Chapter 6: Simplified Temporal Layer Bit Allocation for Hierarchical B-pictures
6.1 Introduction
6.2 Problem Statement
6.3 One-Pass Bit Allocation Algorithm
6.3.1 GOP Rate Modeling
6.3.2 T Layer QP Decision
6.3.3 Proposed Rate Control Algorithm
6.4 Experimental Results
6.5 Conclusion
Chapter 7: A Cross-layer Design to Wireless/Mobile Video Streaming
7.1 Introduction
7.2 Background
7.2.1 Challenges in Wireless/Mobile Video Streaming System Design
7.2.1.1 Wireless/Mobile Radio Channel
7.2.1.2 Application Characteristics
7.2.2 Cross-Layer Design
7.2.2.1 Concepts and Definitions
7.2.2.2 Classifications of Cross-layer Approaches
7.3 Cross-layer Wireless/Mobile Video Streaming
7.3.1 Requirements of Cross-Layer Video Streaming Systems
7.3.1.1 Preliminary Issues
7.3.1.2 System Requirements
7.3.2 Cross-layer Wireless/Mobile Video Streaming System
7.3.2.1 Network-centric Approach
7.3.2.2 Application-centric Approach
7.4 Toward Practical Cross-layer Video Streaming System
7.4.1 H.264/SVC for Wireless/Mobile Video Streaming
7.4.2 Operational Application Characteristics
7.4.3 Proposed Algorithm
7.4.4 Simulation
7.4.4.1 Simulation Setup
7.4.4.2 Simulation Results
7.5 Conclusion
Chapter 8: Conclusion and Future Work
8.1 Summary of the Research
8.2 Future Research Directions
Bibliography

List Of Tables

3.1 GOP Distortion Modeling Result
3.2 Experimental result: Bit stream coding efficiency with 10 GOPs
3.3 Temporal layer rates and Y-PSNR, 10 GOP average
4.1 Q Layer R-D modeling results
4.2 Experimental result: Global bit stream coding efficiency, 10 GOPs
4.3 Quality layer rates and Y-PSNR, 10 GOP average
5.1 Modeling Accuracy (%) of Each Coding Block
5.2 Modeling Accuracy (%) of Quality Layers
5.3 T layer distortion model parameters
5.4 Q Layer Rate Model Parameters
6.1 Coefficients for the QP decision
6.2 Experimental results for GOP-4
6.3 Experimental results for GOP-8
7.1 Packet Mapping in a MANE
7.2 Test Sequences

List Of Figures

2.1 Illustration of the hierarchical prediction structure of H.264/SVC for the T scalability, where the top and bottom numbers below each frame indicate the encoding order and the corresponding T layer [36], respectively.
2.2 Illustration of an H.264/SVC encoder providing two spatial layers.
2.3 Illustration of three inter-layer prediction methods.
2.4 Illustration of the trade-off between coding efficiency and error control in three different methods.
2.5 Illustration of the key picture concept in the hierarchical prediction structure.
2.6 Illustration of the combined scalability of H.264/SVC.
2.7 Illustration of the temporal dependency, where the rates and the distortion values of the P-frame are (b) R=2,472 bits, MSE=92.63 and (d) R=2,536 bits, MSE=37.66.
2.8 Illustration of the R-D characteristics of an inter-coded picture with different reference fidelity.
2.9 Illustration of the R-D characteristics of spatially dependent enhancement Q and S layers with different base Q and S layer fidelity.
2.10 Illustration of the spatial dependency, where the rates and the distortions of the EL are (b) R=130.91 kbits, MSE=1.33 and (d) R=330.12 kbits, MSE=1.22.
2.11 Illustration of a trellis-tree constructed by R-D data generation with all possible quantization choices for each coding unit.
3.1 Illustration of a GOP of H.264/SVC, which consists of five temporal layers, where TL-0 consists of two key frames while TL-1, ···, TL-4 are formed by hierarchical B-pictures.
3.2 Illustration of rate dependency, where the x-axis is the bit rate of layer TL-0 (reference layer) and the y-axis is the bit rate of layer TL-1 (dependent layer).
3.3 Illustration of TL-1 distortion dependency, where the x-axis is the distortion of layer TL-0 and the y-axis is the distortion of layer TL-1.
3.4 Illustration of TL-2 distortion dependency, where the x-axis is the distortion of layer TL-0 and the y-axis is the distortion of layer TL-2.
3.5 Illustration of TL-1 distortion modeling procedure.
3.6 Illustration of TL-2 distortion modeling procedure.
3.7 Illustration of the 16 frame (5-TL) GOP distortion modeling result, where the x-axis is the true GOP MSE and the y-axis is the estimated GOP MSE by the proposed GOP distortion model, QCIF test sequences.
3.8 Illustration of the 16 frame (5-TL) GOP distortion modeling result, where the x-axis is the true GOP MSE and the y-axis is the estimated GOP MSE by the proposed GOP distortion model, CIF test sequences.
3.9 The proposed T layer bit allocation algorithm.
3.10 Illustration of the coding gain by the proposed temporal layer bit allocation scheme.
3.11 Illustration of the frame-by-frame Y-PSNR variation.
4.1 Illustration of H.264/SVC video with combined Q-T scalability, where three Q layers and four T layers are shown.
4.2 Illustration of signal decomposition which explains the rate dependence of the enhancement layer on the QP value of the base Q layer.
4.3 Illustration of rate dependency, where the x-axis is the rate of layer QL-0 and the y-axis is the rate of layer QL-1.
4.4 Illustration of rate dependency, where the x-axis is the rate of layer QL-0 and the y-axis is the rate of layer QL-2.
4.5 Illustration of distortion dependency, where the x-axis is the distortion of layer QL-0 (the reference layer) and the y-axis is the distortion of layers QL-1 and QL-2.
4.6 Illustration of QL-1 rate modeling.
4.7 Illustration of QL-2 rate modeling.
4.8 Illustration of the Q layer rate modeling results, where the x-axis is the true QL rate and the y-axis is the estimated rate by the proposed Q layer rate model, (a)-(j) QCIF test sequences and (k)-(t) CIF test sequences.
4.9 Illustration of the Q layer distortion modeling results, where the x-axis is the actual QL MSE and the y-axis is the estimated MSE by the proposed Q layer distortion model, (a)-(j) QCIF test sequences and (k)-(t) CIF test sequences.
4.10 The block diagram of the proposed Q layer bit allocation algorithm.
4.11 Comparison of coding efficiency (Y-PSNR vs. Rate) for four QCIF test sequences: (a) Carphone, (b) City, (c) Foreman and (d) News.
4.12 Comparison of coding efficiency (Y-PSNR vs. Rate) for four CIF test sequences: (a) Crew, (b) Hall, (c) Soccer and (d) Tempete.
5.1 The rate characteristics of a dependent T layer as a function of the rate of the base T layer: (a) Football and (b) Foreman.
5.2 The distortion characteristics of a dependent T layer as a function of the distortion of the base T layer: (a) Football and (b) Foreman.
5.3 The rate characteristics of a dependent Q layer as a function of the rate of the base Q layer: (a) Soccer and (b) City.
5.4 The distortion characteristics of a dependent Q layer as a function of the distortion of the base Q layer: (a) QCIF and (b) CIF.
6.1 Illustration of a GOP of H.264/SVC, which consists of five temporal layers, where the TL-0 key frame is an I- or P-type frame while TL-1, ···, TL-4 are formed by hierarchical B-pictures.
6.2 The GOP rate characteristics with respect to the quantization step size of the key frame for four QCIF test sequences: (a) City (=1.3), (b) Football (=0.8), (c) Foreman (=1.0), and (d) News (=1.0).
6.3 The GOP rate characteristics with respect to the quantization step size of the key frame for four CIF test sequences: (a) City (=1.3), (b) Football (=0.8), (c) Foreman (=0.9), and (d) News (=1.25).
6.4 Illustration of the relationship between ∆QP and the number of skipped MBs: (a) City, QCIF, (b) Tempete, QCIF, (c) City, CIF, and (d) Tempete, CIF.
6.5 Illustration of the MSE ratios of several test sequences: (a) Hall, QCIF, (b) Mobile, QCIF, (c) Hall, CIF and (d) Mobile, CIF.
6.6 Illustration of the frame-by-frame PSNR results of selected test sequences: (a) Soccer, QCIF, (b) Hall, QCIF, (c) Foreman, CIF and (d) Bridge, CIF.
7.1 Functional diagram of a wireless video streaming system.
7.2 Illustration of a QoS support architecture for video applications.
7.3 Illustration of communication channels for error control.
7.4 NAL unit header extension of H.264/SVC.
7.5 Illustration of operational application characteristics.
7.6 Overview of the video streaming system under consideration.
7.7 Illustration of the mapping of video data from NALUs to RLPs.
7.8 The proposed radio link buffer control algorithm: (a) the GOP-based prioritized packet transmission and (b) congestion control.
7.9 The wireless channel simulation model.
7.10 Channel fluctuation for the simulation duration.
7.11 Simulation result.

Abstract

In this research, we investigate model-based bit allocation algorithms for H.264/SVC, which is newly standardized as a scalable extension of H.264/AVC. Despite its importance in video coding, inter-dependency between coding units is often only indirectly addressed in conventional single layer bit allocation algorithms. This simplified treatment is adopted due to the complexity involved in its explicit consideration. In H.264/SVC, inter-dependency between coding units becomes even more involved and the development of an optimal bit allocation algorithm poses an even greater challenge.

To address the bit allocation problem for H.264/SVC, we study dependent rate and distortion (R-D) models for temporal and quality scalabilities of H.264/SVC. Traditional R-D models represent rate and distortion characteristics as functions of the quantization parameter (QP), known as the Q-domain analysis, or the percentage of zero transform coefficients, known as the ρ-domain analysis. Unlike previous models, we examine dependent R-D characteristics based on the self-domain (S-domain) analysis (namely, R- and D-domain analysis for the rate and the distortion characteristics, respectively), where the R-D characteristics of a dependent coding unit are expressed as functions of the R-D characteristics of its reference and/or base layers. With the proposed R-D models, the complex dependent R-D characteristics can be simplified to a linear sum of functions of a single parameter. As a result, the bit allocation problem can be solved elegantly.

The temporal dependency of the video signal is first studied.
According to the self D-domain study, the distortion of a coding unit is subject to temporal dependency. In contrast, its rate is relatively independent of its references. This leads to a linear dependent distortion model for the temporal scalability. Then, the research is extended to the quality scalability. Contrary to the temporal-layer case, the self R-domain analysis reveals inter-dependency of the rate characteristics. On the other hand, the distortion of a quality layer is independent of that of the preceding layers. For this reason, the quality-layer dependent rate is modeled as the linear sum of base layer rate functions. The performance of the proposed dependent R-D models is verified by comparing their R-D estimation results with actual R-D data. After the initial study on the dependent R-D models, we analyze the proposed R-D models to understand the physical meaning of the model parameters. The analysis shows that the model parameters convey important information about the inter-layer dependence in the T-Q scalability.

We conduct studies on two bit allocation algorithms, which are formulated as Lagrangian optimization problems. First, we investigate a temporal layer bit allocation problem based on its dependent distortion model. Then, we examine the joint quality-temporal layer bit allocation problem by combining temporal and quality layer R-D models. One important advantage of the proposed R-D models is that they allow an analytical solution to the Lagrange equation. With the proposed algorithms, both bit allocation problems are numerically solved at significantly reduced complexity. Experimental results show that the proposed algorithms produce more efficient scalable bit streams than the H.264/SVC reference software codec (JSVM) at various bit rates with different types of test sequences. Moreover, the coding gain of each scalable layer, i.e., T or Q layer, exceeds that obtained by the JSVM benchmark.
The purpose of the R-D characteristics modeling is to develop an efficient and effective rate control algorithm for a video coder. Even though the bit allocation algorithms introduced in the first part of the thesis achieve better performance than the JSVM benchmark, their usage is limited to off-line encoding scenarios. To address this issue, we also study the simplification of the bit allocation algorithms for real-time encoding scenarios. The performance of the proposed simplified algorithm is verified by experimental results in comparison with the JSVM benchmark.

Finally, we also examine the rate control of compressed scalable video under network based video application scenarios. We employ a cross-layer approach to the design of a wireless video streaming algorithm. Beginning with a thorough review of the cross-layer design principles, we identify major challenges and issues with the cross-layer design approach in realistic application scenarios and show that they can be well addressed by the proposed algorithm. Computer simulation results verify the performance of the proposed algorithm in comparison with single layer video streaming.

Chapter 1: Introduction

1.1 Significance of the Research

The scalability of coded video has drawn a lot of attention in the literature due to its potential advantages in offering a flexible video transmission solution in a heterogeneous networking environment. In the setting of modern mobile communications, it becomes even more attractive due to the wide variety of wireless channel conditions, access bandwidths and display devices among end-users. Scalable video is expected to provide powerful coding means that adapt efficiently to the error-prone radio channel of fluctuating bandwidth. Scalable video coding has been one of the active research topics in the field of video compression for more than two decades.
Despite its long history and potential advantages, it has not yet been widely employed in practical video systems due to two concerns, i.e., coding efficiency and complexity [29,36]. With the availability of more powerful computing devices and improved coding efficiency, there is an emerging interest in scalable video transmission nowadays.

Video scalability was not considered in early video coding standards such as ISO/IEC MPEG-1 and ITU-T H.261 [2,4] since the target applications of these standards were video storage or conversational services. The first scalable video coding concept was introduced to the MPEG-2 standard by ISO/IEC [3], where video scalability was implemented with a layered coding approach. Although scalable video coding in MPEG-2 can provide three dimensions of scalability, i.e., temporal (T), spatial (S) and quality (Q) scalabilities, it has an inherent limitation. That is, it only allows coarse-scale scalabilities, which raised a flexibility concern. The scalable amendment of the MPEG-4 standard addresses the flexibility issue by proposing the fine granular SNR scalability (FGS) [1] at the expense of the coding gain. Since then, the trade-off between coding efficiency and bit stream scalability has arisen as an important issue [43].

H.264/SVC has recently been standardized as a scalable extension of the H.264/AVC video coding standard [6]. It aims at overcoming the disadvantages of previous scalable video coding to realize a highly efficient scalable video codec at low complexity. It provides flexible coding formats of T, S and Q scalabilities with the following design objectives:
• Support of video scalability;
• Support of simple video stream adaptation means;
• Low encoding/decoding complexity increase as compared to the single layer H.264/AVC video;
• Reasonable coding efficiency as compared to single layer coding and simulcast.

H.264/SVC inherits most of the features of H.264/AVC, which enables the production of a highly efficient video stream.
Moreover, it is equipped with added features to support video scalability as well as high coding efficiency by making the most of lower layer signals in higher layer encoding, so that almost no redundant signal is included in the coded bit stream.

Video coding standards define only the bit stream syntax and the decoding process, which means that the encoder realization can be entirely arbitrary as long as the output bit stream of an encoder conforms to the syntax defined by the standard. Moreover, because the encoder design is not subject to standardization, the R-D performance of compressed bit streams may vary depending on how encoders are designed and implemented. Nowadays, most standard-compatible encoders are equipped with their own encoder optimization algorithms to produce bit streams with high coding efficiency.

A rate control algorithm is essential to any video encoder because a coded bit stream has to be regulated to satisfy a number of constraints such as the transmission channel rate, the encoder/decoder buffer size and the end-to-end delay requirement. Moreover, it is highly desirable that a rate control algorithm produce an R-D optimized bit stream to provide the highest possible video quality at a given bit rate. R-D optimization algorithms are related to most of the encoder functional blocks including motion estimation, the macroblock (MB) mode decision and the quantization processes. Bit allocation algorithms are expected to embrace all the processes which constitute a rate control algorithm such that R-D optimized bit streams can be produced at designated bit rates by an encoder. Among the many coding parameters in the encoding process, the quantization parameter (QP) plays a key role in an optimal bit allocation scheme. As a result, the bit allocation problem is often formulated as a QP decision problem among participating coding units.
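The QP decision formulation above can be made concrete with a small sketch. It is not the algorithm developed in this dissertation: it assumes that coding units are independent (the conventional simplification), that each unit comes with a short table of hypothetical operating points (QP, rate, distortion), and it uses the textbook approach of minimizing D + λR per unit while a bisection on λ enforces the bit budget. The function names and all numbers are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# (qp, rate_in_bits, distortion_mse) for one candidate encoding of a coding unit.
# All numbers used with this sketch are hypothetical, not measured data.
OperatingPoint = Tuple[int, float, float]


def best_point(points: Sequence[OperatingPoint], lam: float) -> OperatingPoint:
    """Pick the point minimizing the Lagrangian cost D + lam * R for one unit."""
    return min(points, key=lambda p: p[2] + lam * p[1])


def allocate_bits(
    units: Sequence[Sequence[OperatingPoint]],
    budget: float,
    lam_max: float = 1e6,
    iterations: int = 60,
) -> List[OperatingPoint]:
    """Bisection on the multiplier: find the smallest lam that meets the budget.

    Assumes the units are independent, so each unit is minimized on its own.
    """
    choice = [best_point(u, lam_max) for u in units]
    if sum(p[1] for p in choice) > budget:
        raise ValueError("budget cannot be met even at the largest multiplier")
    lo, hi = 0.0, lam_max
    for _ in range(iterations):
        lam = 0.5 * (lo + hi)
        trial = [best_point(u, lam) for u in units]
        if sum(p[1] for p in trial) > budget:
            lo = lam  # over budget: weight the rate term more heavily
        else:
            choice, hi = trial, lam  # feasible: try a smaller multiplier
    return choice


if __name__ == "__main__":
    unit_a = [(20, 100.0, 1.0), (30, 50.0, 4.0), (40, 20.0, 10.0)]
    unit_b = [(20, 80.0, 2.0), (30, 40.0, 5.0), (40, 10.0, 20.0)]
    picked = allocate_bits([unit_a, unit_b], budget=150.0)
    print([p[0] for p in picked])  # chosen QP per unit
```

Because each unit is minimized separately, the sketch only reaches operating points on the convex hull of each unit's R-D table, and it ignores the fact that the R-D table of a dependent unit changes with the QP chosen for its reference.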
With the new scalable video coding standard, the development of an efficient and effective bit allocation algorithm is essential. This is the main objective of this research. As an extension of the H.264/AVC standard, H.264/SVC shares the same issues to be addressed for the development of a low complexity bit allocation algorithm. They are stated below.
• Requirement for accurate source rate and distortion models.
The source rate and distortion (R-D) models are essential to the reduction of the complexity of the rate control process. Due to the new features of H.264/AVC, it is highly desirable to have new R-D models which are customized for H.264/AVC video. For example, Kwon et al. [24] reported the importance of header bits for the R-D modeling of H.264/AVC video and applied it to the source rate control.
• The "chicken and egg" dilemma between the R-D optimized motion estimation and MB mode decision (RDO) and the rate control processes.
Although the RDO process has been successful in the enhancement of coding efficiency [42], it introduces some difficulty in the model-based H.264/AVC bit allocation process. The RDO process precedes the rate control stage, yet the RDO requires a QP value for its operation while the QP is the output of the rate control unit. Since a QP value is required for the R-D optimization of a coding unit before the rate control decision is made, there is a potential discrepancy between the QP used by the RDO process and the QP determined by the rate control process.
• Bit allocation among coding units.
Conventionally, model-based H.264/AVC rate control is performed based on the independence assumption among coding units. As a result, bit allocation for different coding units is considered separately, although it has a huge influence on the overall coding efficiency.
In the previous work, bit allocation is performed based on frame types and the energy of the residual signal represented by the mean absolute difference (MAD) value [23,34].

Besides the issues described above, the requirement for scalability support makes the development of a bit allocation algorithm for H.264/SVC more challenging than that for H.264/AVC for the following reasons.
• Bit allocation among coding units.
With multiple dimensions of scalability, the number of coding units is much larger than that of a single layer video. Moreover, inter-dependency among coding units becomes more involved than that of single layer video coding because of the employment of more complex prediction structures. As a result, bit allocation among coding units has to be performed based on accurate knowledge of the interaction among coding units.
• Requirement for simple dependent R-D models.
The development of a low complexity bit allocation algorithm requires analytical R-D models of participating coding units. However, conventional independent R-D models cannot properly represent the dependent R-D characteristics of a video signal due to their independence assumption. Although Lin et al. [25] proposed dependent R-D models for MPEG-2 video, their computational complexity grows exponentially with the number of layers, which is prohibitive for the development of a simple bit allocation algorithm. Thus, simple analytical R-D models are desired so that dependent signal R-D characteristics can be represented by closed form functions with a small number of model parameters.

Other than the rate control algorithm for video encoders, we also need another type of rate control algorithm that controls the transmission of compressed video streams in network based video applications. Scalable video is developed to address the inflexibility of non-scalable video and its inability to adapt to a time-varying environment such as wireless video streaming.
H.264/SVC is equipped with various tools to provide flexibility and adaptability to video applications, and an important issue is how to make good use of these tools under realistic application scenarios with a given underlying network infrastructure.
Major challenges for the development of an efficient scalable video streaming algorithm are described below.
• Random radio channels
The radio channel provides the transmission medium for wireless video delivery. The quality of the radio channel is often measured by the signal-to-noise ratio (SNR) or the signal-to-interference-and-noise ratio (SINR). Since the radio channel is governed by the physical environment, it is unstable with random fluctuation. For this reason, it is important that the transmission rate of application data be regulated accordingly.
• Network heterogeneity
In the modern video service environment, it is common that a service network is a combination of wired and wireless ones. Under such an environment, one major technical barrier to network adaptive service provision is the heterogeneity of network characteristics. For example, packet loss may occur due to congestion in a wired network environment or bit errors in a wireless network. Error control is a difficult problem in the presence of network heterogeneity.
• Insufficient network resource
In many network based applications, it is often desirable to provide real-time delivery of a large amount of video data. However, since network resources are shared by different types of applications, video applications may not be assigned a sufficient amount of resources for delivery.
• Vulnerability to errors
Modern video codecs employ the motion compensated prediction (MCP) to achieve high coding efficiency. Then, the decoded video quality is highly dependent on correct decoding of the transmitted bit stream. For this reason, even a single bit error could result in serious degradation of decoded video.
This problem is even worse when we consider the random and unstable nature of the transmission medium of wireless channels.
• Unequal importance of packetized video data
The MCP process leads to dependency within a coded video bit stream. Then, the effect of bit errors could be different depending on which part of the bit stream is corrupted. For example, errors in a reference frame may result in more serious quality degradation than those in a non-reference frame. Hence, it is important to provide prioritized services to different types of packetized video data.
To address the issues mentioned above, we may impose a set of requirements on an efficient video streaming algorithm as listed below.
• Cross-layer approach to video streaming
Video applications should be operated adaptively according to the network conditions and data characteristics. It is essential to adopt a cross-layer approach.
• Cross-layer signaling
The conventional network environment is based on a rigid layered architecture, where inter-layer signaling is relatively simple and well defined. Signaling between different layers is more complicated in a cross-layer system due to the lack of a clear layer structure.
• Flexible video data representation
H.264/SVC provides a set of tools to represent coded video streams flexibly. Then, how to manipulate the H.264/SVC bit stream to achieve its full potential is an interesting issue.
1.2 Review of Previous Work
Several rate control algorithms have been proposed for H.264/SVC, e.g., [28,30,46,47], most of which are based on the algorithms for previous video coding standards.
In [30], Pranantha et al. proposed a dependent bit allocation algorithm for H.264/SVC, which is formulated as a Lagrange optimization problem. Since the proposed algorithm directly follows what appeared in [32] for MPEG-2 video, its computational complexity is exponential in the number of layers. Thus, it is not applicable to the encoding of a multi-dimensional scalable bit stream in practice.
The solution proposed in [47] was based on the independent quadratic R-D models and the linear prediction model in [11] and [33], respectively. Based on the single layer rate control algorithm, each scalable layer was encoded independently of other layers. For this reason, inter-dependency of video units cannot be properly addressed.
Liu et al. [28] employed heuristic T layer weighting factors, which were identical for all video sequences without analyzing their temporal characteristics. Although their algorithm demands a lower complexity than the JSVM FixedQPEncoder [5], its coding efficiency is about the same.
T layer differentiation was studied by Xu et al. [46], where the weight of a T layer was computed based on the frame type and the number of connected samples to the T layer. Then, the scaling factor for each T layer, which determines the QP distance from its preceding T layer, was derived from the weight and the frame complexity. However, the scaling factor for each T layer was not affected by the temporal characteristics since the number of connected samples was determined by the frame types in the proposed algorithm. Moreover, although a high coding gain was reported by the authors, the proposed algorithm was tested only for single layer coding with the hierarchical prediction structure, so its utility for the H.264/SVC bit allocation is still not clear.
Generally speaking, the rate control algorithms mentioned above were designed with ad hoc ideas rather than arguments from rate and distortion models and R-D optimization due to the difficulty in modeling dependent R-D behavior. It should be noted that, except for the highly complex trellis-based solution in [30], the inter-layer dependency has not been properly addressed therein because of the complexity involved in the problem formulation.
We provide a detailed overview of bit allocation algorithms in the literature in Sec.
2.3, where bit allocation algorithms are classified based on how the signal dependency issue is addressed. The advantages and the shortcomings of two types of bit allocation algorithms, i.e., dependent and independent bit allocation methods, are described so as to provide the fundamental motivation of this research.
1.3 Contributions of the Research
The main objective of this research is to develop a low complexity bit allocation algorithm for H.264/SVC, which explicitly addresses the dependent R-D characteristics in scalable video coding. With the conventional single layer coding, the temporal dependency has been regarded as the major issue to be addressed. However, we point out in Chapter 2 that the spatial dependency has to be carefully considered in the bit allocation problem for H.264/SVC as well. To the best of our knowledge, even for the single layer bit allocation, there are few bit allocation algorithms that address the dependency issues effectively. That is, all dependent bit allocation algorithms require a prohibitive amount of complexity whereas most low-complexity algorithms tackle the signal dependency issue heuristically without any theoretical and/or analytical justification.
Major contributions of this research are summarized below.
• Employment of a self-domain analysis.
First, we propose dependent R-D models to develop a low-complexity dependent bit allocation algorithm for H.264/SVC. Dependent R-D modeling is challenging since parameters of dependent R-D characteristics are not yet well identified. Even with conventionally accepted R-D parameters (e.g., quantization parameter (q) or the percentage of zero transform coefficients (ρ)), R-D characterization could not be done successfully in some cases. In our dependent R-D modeling, we employ R-D characteristics of a reference and a base layer as the analysis domains for dependent R-D characteristics in T and Q scalabilities, respectively, where they are named as the D- and R-domain analysis.
• T layer dependent distortion modeling.
Based on the D-domain analysis, which employs the distortion of the lowest T layer as the domain, the dependent distortion characteristics of T layers can be represented by a simple closed-form expression. Moreover, the T layer dependent model is successfully combined to construct a GOP distortion model, which clearly identifies the influence of the individual quantization choice for each T layer on the GOP distortion.
• Q layer dependent rate modeling.
We also propose a dependent rate model in the Q scalability based on the R-domain analysis. Similarly to the T layer dependent distortion model, the proposed dependent rate model successfully identifies the influence of the Q layer bit allocation on the following enhancement Q layers.
• Analysis of R-D characteristics in the joint T-Q scalability.
As a by-product of the study of dependent R-D characteristics, we gain an understanding of the dependent R-D characteristics of H.264/SVC video in the T-Q scalability. Important findings include
1. S-domain linearity
2. S-domain parallelism
3. Duality of dependent R-D characteristics in the T-Q scalability
4. R-D dependence orthogonality.
Moreover, we conduct a thorough analysis of the proposed R-D models and clarify the implications of the model parameters.
• Development of dependent bit allocation algorithms for H.264/SVC based on numerical solution to the Lagrange equation.
Based on dependent R-D models, we develop bit allocation algorithms for H.264/SVC. We first examine the T layer bit allocation problem in Chapter 3 based on the T layer R-D models, and then investigate a joint Q-T bit allocation problem in Chapter 4 with the combined T and Q layer R-D models. We formulate the bit allocation problems using the Lagrange optimization method, for which an analytical solution could be obtained with the proposed R-D models.
• Development of a simplified rate control algorithm for hierarchical B-pictures
As an extension of the dependent bit allocation algorithm in Chapter 3, we study a simplified rate control algorithm for hierarchical B-pictures of H.264/SVC. The major challenge lies in adaptive T layer QP decision to exploit the inter-layer dependency among participating T layers. We employ two features (i.e., the number of skipped macroblocks and the MAD ratio) to measure the inter-layer dependency and determine the optimal QP in the proposed simplified rate control algorithm.
• Cross-layer wireless video streaming of H.264/SVC video
Finally, we consider a cross-layer design approach to efficient video streaming. We first provide a thorough review of the cross-layer design principle and then discuss its application to the video streaming problem. Moreover, we study the development of a cross-layer video streaming algorithm in a realistic service scenario with scalable video. The efficiency of the proposed solution is verified by computer simulation.
1.4 Organization of the Dissertation
The rest of the dissertation is organized as follows. In Chapter 2, we provide related background knowledge for this research. The dependent T layer distortion model is proposed and its application to the T layer bit allocation is considered in Chapter 3. Studies on Q layer dependent R-D modeling and the joint Q-T bit allocation problem based on the T and Q layer R-D models are conducted in Chapter 4. As an extension of Chapters 3 and 4, we provide an analysis of model parameters and develop a simplified rate control algorithm for hierarchical B-pictures in Chapters 5 and 6, respectively. In Chapter 7, we examine the problem of wireless video streaming as an application of H.264/SVC video. Finally, we give some concluding remarks and future research directions in Chapter 8.
Chapter 2 Research Background and Motivation
The background knowledge related to the research is reviewed in this chapter.
An overview of the H.264/SVC standard is provided with its design principles in Sec. 2.1, followed by the signal dependency issue in scalable video coding in Sec. 2.2. Sec. 2.3 briefly reviews bit allocation algorithms. Finally, we conclude this chapter with our research motivation in Sec. 2.4.
2.1 H.264/SVC Overview
We provide a brief overview of H.264/SVC in this section. As an extension of the H.264/AVC standard, H.264/SVC inherits most of its features, which enable highly efficient video coding. This section describes the new features of H.264/SVC that support video scalability.
2.1.1 Temporal Scalability
The temporal (T) scalability refers to the capability of providing coded video streams at reduced frame rates. In H.264/SVC, the T scalability is implemented by a hierarchical prediction structure, by which only the lower T layer pictures can be used as references for higher T layer pictures. Fig. 2.1 demonstrates GOPs with the hierarchical prediction structure.
Figure 2.1: Illustration of the hierarchical prediction structure of H.264/SVC for the T scalability, where the top and bottom numbers below each frame indicate the encoding order and the corresponding T layer [36], respectively. Panels: (a) hierarchical B pictures; (b) hierarchical P pictures with no structural delay.
Depending on encoding constraints, a GOP can be structured with hierarchically aligned B- or P-pictures as depicted in Figs. 2.1(a) and 2.1(b). Although the hierarchical structure was originally developed for the provision of the T scalability, it is also capable of producing highly efficient video streams when B-pictures are used. In fact, the T scalability is not a totally new feature.
Even with the conventional GOP structures of IBBP..., the support of T scalability is feasible by skipping non-referenced B frames. As a result, the T scalability of H.264/SVC should be fully compatible with H.264/AVC.
2.1.2 Spatial Scalability
The spatial (S) scalability of H.264/SVC directly follows the conventional layered coding approach. An encoding process of a spatially scalable video is demonstrated in Fig. 2.2. The original input is fed into the enhancement layer (EL) encoder whereas the spatially decimated input signal is passed to the base layer (BL) encoder, which produces a video bit-stream fully compatible with the H.264/AVC standard.
Figure 2.2: Illustration of an H.264/SVC encoder providing two spatial layers.
In conventional scalable video, the BL reconstruction (or the average of the BL reconstruction) and the temporal predictor are used as the inter-layer prediction signal. H.264/SVC introduces the following new features with its S scalability design to further improve coding efficiency:
• BlSkip macroblock (MB) mode,
• Inter-layer motion prediction, and
• Inter-layer residual prediction,
where the BlSkip mode is introduced to signal the inter-layer motion and intra prediction.
Figure 2.3: Illustration of three inter-layer prediction methods. Panels: (a) inter-layer motion prediction; (b) inter-layer residual prediction; (c) inter-layer intra prediction.
Experimental results on the new inter-layer predictions were provided in [35].
Fig. 2.3 demonstrates the three inter-layer prediction methods in the S scalability of H.264/SVC, where each sub-figure shows how the inter-layer prediction signal is prepared for an MB of the EL. They are detailed as follows.
• Inter-layer motion prediction
This prediction mode is used when the EL MB mode is BlSkip and the co-located 8x8 sub-MB of its base layer is inter-coded. In this case, the EL MB is also inter-coded and all data needed for inter-prediction, including the motion vector (MV), the block partitioning type and the reference indices, are passed from the co-located BL sub-block. Although the BL motion information is utilized with the inter-layer motion prediction, BL residual signals still remain unused for the EL encoding with inter-layer motion prediction, which motivates the inter-layer residual prediction as described below.
• Inter-layer residual prediction
This prediction mode is employed for all inter-coded EL MBs regardless of their MB modes. With the inter-layer residual prediction, the BL residual signal of the corresponding 8x8 sub-MB is up-sampled in a block-wise manner and the up-sampled residual signal is used as the prediction signal for an EL MB. As a consequence, only the difference signal is coded in the EL encoder.
• Inter-layer intra prediction
Combined with the BlSkip mode, this prediction mode is used when the co-located BL 8x8 sub-MB is intra-coded. When this mode is enabled, the reconstruction of the BL intra signal is up-sampled and used as the prediction signal for the corresponding EL intra MB.
In the S scalability design of H.264/SVC, two extended spatial scalabilities were also introduced; namely, 1) non-dyadic S scalability by cropping pictures to an arbitrary resolution, and 2) spatial scalable coding of interlaced sources. The three basic inter-layer predictions are still valid with these two extended spatial scalabilities.
More details about the S scalability of H.264/SVC can be found in [37]. Note that 'D' was used as an identifier of an S layer in the S scalability design of H.264/SVC. However, in this proposal, we use 'S' instead of 'D' to avoid a possible confusion with Distortion.
2.1.3 Quality/SNR Scalability
Currently, H.264/SVC defines two types of Q scalability: the coarse-grain SNR scalable coding (CGS) and the medium-grain SNR scalable coding (MGS). Although the fine-grain SNR scalable (FGS) coding was initially proposed to address some shortcomings of the CGS coding, it is not included in the standard. The inter-layer prediction schemes described in Sec. 2.1.2 are all employed in the Q scalability as well.
The CGS design directly follows the layered coding concept and, therefore, it can be viewed as a special type of S scalability without any spatial resolution change between the BL and the EL pictures. For a multi-layered coding scenario, the CGS coding raises some flexibility concerns. That is, a scalable video stream with CGS Q layers can only provide a few selected bit rates at a coarse level. Moreover, with the CGS coded scalable bit stream, bit stream switching between quality layers is limited to pre-defined points only.
An important design issue for the Q scalability development is the trade-off between coding efficiency and error-drift. Fig. 2.4 demonstrates three methods that provide different trade-offs as explained in the following [36].
Figure 2.4: Illustration of the trade-off between coding efficiency and error control in three different methods. Panels: (a) BL only control; (b) EL only control; (c) two loop control.
• BL only control (see Fig. 2.4(a))
Only the BL reconstruction is used in the motion-compensated prediction (MCP) process. For this reason, any loss or modification of EL slices does not cause any error drift. However, because of the low fidelity of references, the coding efficiency is significantly lower as compared to the single-layer coding.
• EL only control (see Fig. 2.4(b))
EL slices with the highest fidelity are used as references. Although it provides significant coding gain compared to the BL only control, scalable bit streams become very vulnerable to any loss or error in EL slices.
• Two-loop control (see Fig. 2.4(c))
The trade-off is managed by two MCP loops, where references are found in the same Q layer of the current slice. Although an error in the EL stream does not have any influence on the BL quality, the EL is still subject to the error drift.
In H.264/SVC, the trade-off is addressed by the key picture concept combined with a hierarchical prediction structure as illustrated in Fig. 2.5. A frame at T layer 0 (indicated by T0) can be defined as a key picture or a non-key picture. As illustrated in the figure, the reference of a key picture has to be the BL reconstruction of the previous T0 frame while non-key pictures are allowed to have references at their highest fidelity. With this prediction mechanism, any error in the EL is successfully confined to a GOP without propagating to subsequent GOPs. Moreover, by having higher fidelity references, the coding gain could be effectively improved.
Figure 2.5: Illustration of the key picture concept in the hierarchical prediction structure.
The conceptual simplicity of the key picture is directly connected to the low complexity codec design. The key picture concept is employed in the MGS of H.264/SVC for an efficient trade-off between coding efficiency and error-drift. The MGS is designed to provide finer rate levels and more flexibility than the CGS. Combined with the key picture concept, an MGS coded scalable stream is expected to achieve the design goal of the MGS by providing more flexibility with Q scalable bit streams.
2.1.4 Combined Scalability
Different types of scalability are described individually in previous sections. However, in many application examples, a scalable global bit stream is expected to contain all types of scalability. Fig.
2.6 shows a typical example of the combined scalability in the S-T-Q space, where each coding block is specified by the 3-D coordinates, (S,T,Q).
Figure 2.6: Illustration of the combined scalability of H.264/SVC. Axes: S, T and Q; each coding block is labeled with its (S,T,Q) coordinates, e.g., (0,0,0), (1,3,1) and (2,3,2).
With the combined scalability, an H.264/SVC video is expected to provide great flexibility in terms of the display resolution, the quality level and the frame rate. One important issue is how to generate R-D efficient scalable bit streams with the complex structure of the 3-D scalability. That is, the bit stream generation has to be performed carefully for the R-D efficiency of the stream with the following considerations.
• The internal structure of a global scalable bit stream in the S-T-Q space.
If all sub-blocks are required for the decoding of a block, (i,j,k), some sub-blocks may be potentially redundant for the representation of block (i,j,k). For this reason, the internal connection between coding blocks should be designed carefully.
• Bit allocation to each coding block.
The R-D characteristics of a coding block are temporally and spatially dependent on the fidelity of its preceding blocks. Thus, bit allocation among coding blocks has to be determined carefully.
2.2 Dependency in Scalable Video Coding
The redundancy removal in video coding is performed by the difference operation between an original signal and its predictor, which results in the residual signal. Mathematically, we have
$$ r_i(x,y) = |f_i(x,y) - \hat{f}_i(x,y)| = |f_i(x,y) - f_{i-j}(d_i(x,y))|, \quad \forall i,j \in \{0,1,2,\ldots\}, \tag{2.1} $$
where $i$ and $j$ are frame indices, $f_i$ is the original signal, $\hat{f}_i$ is its predictor and $d_i(x,y)$ is the displacement vector of the pixel at $(x,y)$. Note that the original signal is intra-coded if $i=j$, and inter-coded if $i>j$ or $i<j$. It is obvious from Eq.
(2.1) that the energy of the residual,
$$ E = \sum_{x=1}^{W} \sum_{y=1}^{H} r(x,y)^2, \tag{2.2} $$
where $W$ and $H$ are the width and the height of the input video frame, is highly dependent on the accuracy of the predictor. Because the residual signal is the input to the quantizer, the residual energy ($E$) becomes the major factor that determines the number of bits required to encode the corresponding frame.
Generally speaking, the R-D characteristics of an image are determined by the residual energy as expressed in Eqs. (2.1) and (2.2) combined with the number of bits required to represent the displacement vector, $d_i(x,y)$, and other syntax elements. The high coding efficiency of modern video coding standards is achieved by the successful reduction of psycho-visual, statistical, temporal and spatial redundancies. Among them, the temporal and spatial redundancy reductions are the core elements that result in significant bit reduction for the representation of motion pictures. In this section, we examine dependency in temporal and spatial domains.
2.2.1 Temporal Dependency
The temporal dependency refers to the dependence of a coding unit (e.g., a macroblock (MB), a frame, etc.) on its temporal predictor. All inter-coded pictures (P- and B-pictures) are subject to the temporal dependency on their reference pictures. Generally speaking, references with higher fidelity lead to higher R-D efficiency of inter-coded pictures, which is also known as the monotonicity property of the video signal [32].
Figure 2.7: Illustration of the temporal dependency, where the rates and the distortion values of the P-frame are (b) R=2,472 bits, MSE=92.63 and (d) R=2,536 bits, MSE=37.66. Panels: (a) Reference, QP_R = 45; (b) P-frame, (QP_R, QP_P) = (45, 45); (c) Reference, QP_R = 15; (d) P-frame, (QP_R, QP_P) = (15, 45).
A typical example of the temporal dependency is demonstrated in Fig. 2.7, where a P-frame in Figs. 2.7(b) and 2.7(d) is coded with different reference fidelity as demonstrated in Figs. 2.7(a) and 2.7(c).
As shown in these figures, the coding efficiency of an inter-coded P-frame heavily depends on the fidelity of its reference. When the numbers of coding bits are close, the distortion is significantly reduced by having a higher fidelity reference. In the PSNR measure, the quality difference between the two coding instances is about 4 dB (28.46 dB versus 32.37 dB). In Fig. 2.8, we show the R-D characteristics of an inter-coded picture (P-frame) with different reference fidelity, where the fidelity of the reference picture is represented by its QP values.
Figure 2.8: Illustration of the R-D characteristics of an inter-coded picture with different reference fidelity. Axes: rate (kbits) and MSE; curves: reference QP = 15, 30 and 45.
The temporal dependency demonstrated in Figs. 2.7 and 2.8 has an important implication for the development of bit allocation algorithms in video coding. Since higher reference fidelity implies a larger number of bits for the reference picture, the trade-off between the R-D performance of pictures bound in the dependency has to be carefully managed for optimal bit allocation. The trade-off becomes even more important for H.264/SVC with a hierarchical prediction structure since a GOP is composed mostly of inter-coded pictures as illustrated in Fig. 2.1.
2.2.2 Spatial Dependency
The spatial dependency is different from the temporal dependency only in the location of the predictor signal. That is, the dependence of an input video signal is determined based on whether the predictor is temporal or spatial, i.e., $i \neq j$ or $i = j$ in Eq. (2.1). The spatial dependency is considered to be less important than the temporal dependency for the following reasons.
• The temporal redundancy reduction realizes far more efficient compression capability of video signals.
• The spatial dependency is considered mostly with relatively smaller coding units (e.g., MB) and, as a result, it matters mainly for the MB-layer bit allocation whereas most bit allocation algorithms in the literature target the frame- or the GOP-layer.
However, the spatial dependency has to be investigated carefully in scalable video coding since more spatial predictors are employed by the inter-layer prediction as introduced in Secs. 2.1.2 and 2.1.3.
Figure 2.9: Illustration of the R-D characteristics of spatially dependent enhancement Q and S layers with different base Q and S layer fidelity. Panels: (a) Q layer spatial dependency (curves: BL QP = 33, 39 and 45); (b) S layer spatial dependency (curves: BL QP = 15, 30 and 45). Axes: rate (kbits) and MSE.
Fig. 2.9 demonstrates the R-D characteristics of a frame with different spatial predictors, where the fidelity of a spatial predictor is determined by the BL QP value. An example with the Q layer coding is given in Fig. 2.10, where the enhancement Q layer in Figs. 2.10(b) and 2.10(d) is coded with different base Q layer fidelity as shown in Figs. 2.10(a) and 2.10(c). It is clear that the EL coding performance could be greatly enhanced with a base Q layer of higher fidelity. That is, the EL encoding demands 130.91 and 330.12 kbits with higher and lower quality BL, respectively, to achieve about the same picture quality.
Figure 2.10: Illustration of the spatial dependency, where the rates and the distortions of the EL are (b) R=130.91 kbits, MSE=1.33 and (d) R=330.12 kbits, MSE=1.22. Panels: (a) BL, QP_B = 21; (b) EL, (QP_B, QP_E) = (21, 15); (c) BL, QP_B = 45; (d) EL, (QP_B, QP_E) = (45, 15).
We see from Figs. 2.9 and 2.10 that the spatial dependency has a great influence on the R-D performance of a layer in scalable video coding.
For this reason, it is essential that a bit allocation algorithm for H.264/SVC take the spatial dependency into account to produce R-D efficient scalable video streams.
2.3 Bit Allocation Algorithms
Bit allocation schemes can be classified into dependent and independent bit allocation algorithms depending on how the inter-dependency of video signals is addressed. Generally speaking, the effect of signal dependency is explicitly considered with dependent bit allocation algorithms [25,27,30,32,38]. On the other hand, independent bit allocation algorithms assume the independence of coding units or heuristically address the signal dependency issue [11,14–16,19,23,24,33,34,46,47]. They are detailed below.
2.3.1 Dependent Bit Allocation Algorithms
As introduced in Sec. 2.2, the signal dependency issue has to be studied carefully in order to manage the trade-off between references and dependent layers properly. With dependent bit allocation algorithms, the most important issue is to understand the dependent R-D characteristics of coding units, which makes a data generation pass essential to all dependent bit allocation algorithms. The overall complexity of a dependent bit allocation algorithm is governed by the data generation pass. Due to the R-D data requirement, operational algorithms are prevalent in solving the dependent bit allocation problem except for the model-based solution in [25].
Ramchandran et al. [32] proposed an operational bit allocation algorithm that demands the R-D knowledge of all possible combinations of quantization choices for participating coding units. R-D data are generated from the actual encoding passes and a trellis-tree is constructed accordingly. An example of a trellis-tree of R-D data is given in Fig. 2.11, which describes the encoding instance of five frames ($f_0, \cdots, f_4$) with four different quantization choices of $q_0, \cdots, q_3$.
Although the authors proposed a fast tree search algorithm based on the monotonicity property, the complexity ($C$) of the algorithm is still exponential in the depth of the dependency tree (the number of frames or layers); namely,
$$ C = |Q|^N, $$
where $Q$ is the set of all admissible quantization choices and $N$ is the number of coding units, e.g., frames or layers.
Figure 2.11: Illustration of a trellis-tree constructed by R-D data generation with all possible quantization choices for each coding unit. Node labels: quantization choices $q_0$ to $q_3$ for frames $f_0$ to $f_4$.
Given the trellis-tree of R-D data, Liu and Kuo [27] proposed two efficient iteration methods, called the simple iteration algorithm (SIA) and the greedy iteration algorithm (GIA). Moreover, they introduced a new skip node to the dependency tree for joint temporal/spatial bit allocation with frame skipping. The proposed search algorithms are more efficient than that in [32], yet the complexity of R-D data gathering remains an issue.
To address the complexity of R-D data generation, the steepest descent algorithm was employed for the dependent bit allocation by Sermadevi et al. [38]. By tracking the solution path on the R-D surface formed by different quantization choices for coding units, the required number of R-D samples could be successfully reduced to polynomial complexity; namely,
$$ C = |Q| N M, $$
where $Q$ is the set of all admissible quantization choices, $N$ is the number of coding units and $M$ is the maximum number of dependent coding units affected by a bit allocation change of a coding unit.
An analytical model-based approach was proposed by Lin and Ortega [25] for dependent bit allocation in MPEG-2 video. The R-D models were constructed by interpolating a relatively small constant number of R-D samples, called the control points, of each coding unit. Moreover, they introduced a special distortion property of inter-coded pictures in dependent distortion modeling.
Although the required number of R-D samples is significantly reduced as compared with those in [27,32], the complexity ($C$) of the algorithm still grows exponentially; namely,

$$C = |S|^{N},$$

where $|S|$ and $N$ are the numbers of control points and coding units, respectively.

2.3.2 Independent Bit Allocation Algorithms

Although dependent bit allocation schemes provide optimal or near-optimal R-D performance for video coding, their complexity is prohibitively high for practical deployment. Independent bit allocation algorithms reduce the complexity by assuming independence among coding units and concentrate on bit allocation to residual signals. As described in Sec. 2.2, the prediction signal, $\hat{f}_i(x,y)$, in Eq. 2.1 is subject to signal dependency while the residual signal, $r_i(x,y)$, can be viewed as a relatively independent unit, which meets the independence assumption among coding units required by independent bit allocation algorithms.

The independence assumption makes analytical R-D modeling more feasible than modeling with the consideration of dependency. For this reason, independent bit allocation algorithms are prevalent in the model-based approach. With a low-complexity solution as the objective, analytical R-D models are expected to be simple yet accurate and robust. As a consequence, one of the most important issues in R-D modeling is to find the most suitable feature, which determines the R-D characteristics of coding units. The quantization effect is often considered as the major factor for the R-D variations of coding units in the q-domain analysis, where the R-D models are defined as functions of the quantization parameter or step size [11,19,23,24,33]. In [14–16], the R-D modeling approach follows the behavior of the entropy coder. That is, the rate of a coding unit is modeled as a linear function of a new variable, $\rho$, which is defined as the percentage of zero transform coefficients.
For independent bit allocation algorithms, signal dependency still remains important. This is evidenced by several efforts that attempt to exploit it [23,34]. The monotonicity property provides the fundamental principle for inter-dependency exploitation. Ribas-Corbera et al. [34] pointed out that inter-dependency can be well addressed by a heuristic of keeping the quality of P-frames about 1 dB higher than that of B-frames. Kwon et al. [23] proposed a GOP-based simplified rate control algorithm for H.264/AVC, where frame QPs in a GOP are determined by the frame type such that

$$QP_I \le QP_P \le QP_B.$$

Independent bit allocation algorithms based on analytical R-D models have been quite successful. For example, the quadratic R-D model in [11] is employed for the implementation of the rate control algorithms in the software reference codecs of H.264/AVC and H.264/SVC [7,8]. However, because of the inherent limitation of the independence assumption, the rate control algorithm for H.264/SVC is applicable only to the base layer of scalable video, which is in fact the same as single layer H.264/AVC video.

2.4 Conclusion

In this chapter, we provided a review of background knowledge relevant to the research presented in the following chapters. The fundamentals of H.264/SVC were described in Sec. 2.1. Then, in Sec. 2.2, we elaborated on the fact that signal dependency is even more important in scalable video coding than in single layer video coding. The review of bit allocation algorithms was conducted in Sec. 2.3.

In this research, our objective is to develop efficient bit allocation algorithms for H.264/SVC, which consider signal dependency yet demand low complexity. We will develop dependent R-D models and bit allocation algorithms for H.264/SVC in Chapters 3 and 4.
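To make the complexity gap among the dependent schemes reviewed in Sec. 2.3.1 concrete, the following minimal sketch evaluates the two sample-count expressions, $|Q|^N$ for exhaustive trellis generation [32] and $|Q| \cdot N \cdot M$ for the steepest descent search [38], for the five-frame, four-choice example of Fig. 2.11. The function names and the value $M = 2$ are assumptions made purely for illustration; they do not come from the cited works.

```python
def exhaustive_samples(num_choices: int, num_units: int) -> int:
    """R-D samples needed to build the full trellis-tree: |Q|^N."""
    return num_choices ** num_units


def steepest_descent_samples(num_choices: int, num_units: int,
                             max_dependents: int) -> int:
    """R-D samples needed by the steepest descent search: |Q| * N * M."""
    return num_choices * num_units * max_dependents


# Example of Fig. 2.11: five frames, four quantization choices.
# max_dependents = 2 is a hypothetical value chosen for illustration.
full = exhaustive_samples(4, 5)
reduced = steepest_descent_samples(4, 5, 2)
print(full, reduced)
```

Even for this small example the exhaustive count is more than an order of magnitude larger, and the gap widens rapidly as the number of coding units grows.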
Chapter 3

Temporal Layer Dependent R-D Modeling and Its Application to Temporal Layer Bit Allocation

3.1 Introduction

In this chapter, we propose a GOP-based temporal (T) layer dependent distortion model and investigate a temporal layer bit allocation problem as its application. A T layer, which is composed of a number of frames, is considered as the basic coding unit in this work. Conventional R-D models such as the Q- or $\rho$-domain models in [11,14] assume independence among coding units and, therefore, cannot represent the dependent R-D behaviors properly despite the importance of signal dependency among coding units.

In this work, we employ the rate and the distortion of a reference layer as intermediate domains (the R-domain and the D-domain, respectively) for the observation of the dependent R-D behaviors of inter-coded dependent T layers. That is, the R-D characteristics of a dependent layer are observed against the rate and the distortion of its reference layers. According to our observation, the rate of a dependent T layer is relatively independent of the rate of its reference layers, while its distortion depends strongly on the distortion of its reference layers. For this reason, we mainly concentrate on dependent distortion modeling in this chapter.

The D-domain observation enables the distortion function of a dependent layer to be expressed by that of a reference layer evaluated at the quantization step sizes of the target layer and its reference layers. Moreover, with the independent reference layer as the observation domain, the dependent distortion function can be expressed as a linear sum of independent distortion functions. The linear decomposition allows the analysis of the influence of individual quantization choices for participating layers and, as a consequence, the optimal bit allocation problem can be solved by numerical methods.

The rest of this chapter is organized as follows.
First, we report our observation on the temporal layer R-D characteristics in Sec. 3.2, where the dependent distortion is characterized by a small number of constants. In Sec. 3.3, the modeling procedure and the performance of the proposed model are discussed. As an application of the proposed distortion model, we investigate the temporal layer bit allocation problem in Sec. 3.4, where it is formulated by the Lagrange optimization method. Finally, concluding remarks are given in Sec. 3.5.

3.2 Temporal Layer R-D Characteristics

3.2.1 Hierarchical Temporal Layers of H.264/SVC

We consider the hierarchical B-pictures as the target GOP structure of investigation in this section. Because it exhibits the most complex inter-dependency among all GOP structures, the study of its dependent R-D characteristics should be applicable to other relatively simpler GOP structures. The hierarchical B-pictures consist of a key frame (an I- or P-picture) and hierarchically aligned B-pictures, which are adopted to achieve temporal scalability with good coding performance. Under this setting, we investigate a GOP-based optimal bit allocation scheme among temporal layers formed by hierarchical B-pictures as shown in Fig. 3.1.

Figure 3.1: Illustration of a GOP of H.264/SVC, which consists of five temporal layers, where TL-0 consists of two key frames (I/P frames) while TL-1, $\cdots$, TL-4 are formed by hierarchical B-pictures.

Conventionally, the R-D characteristics of a dependent layer are expressed as a function of quantization step sizes (or, equivalently, quantization parameters) to reflect the quantization effect on input video in the encoding process. Mathematically, they are multi-variable functions of the following form:

$$R_i(q_0, q_1, \cdots, q_i) \quad \text{and} \quad D_i(q_0, q_1, \cdots, q_i), \tag{3.1}$$

where $R_i$, $D_i$ and $q_i$ denote the rate function, the distortion function and the quantization step size of the $i$-th temporal layer, respectively.
Usually, the behavior of these multi-variable functions is very complex. For an optimal bit allocation scheme, it is desirable to understand the influence of individual variables on the rate and the distortion of the target layer. This is feasible if the multi-variable R-D functions in Eq. (3.1) are expressed by simpler single-variable functions. In the hierarchical B-pictures of H.264/SVC video, inter-layer dependency among B-frames is formed in a recursive manner. Besides the explicit dependence on its reference frames, each B-frame is subject to an implicit dependence on its preceding frames because of the dependence of the reference frames on their own references. The complex inter-layer dependency is studied and simplified in this section.

3.2.2 Rate Characteristics

We show the bit rate of a dependent layer (TL-1) as a function of the bit rate of its reference layer (TL-0), parameterized by five QP values for TL-1, in Fig. 3.2. Because of the monotonicity property [32], we consider only the case where the QP of the dependent layer is greater than or equal to that of the reference layer. We see clearly that the rate function of the target layer is primarily determined by its own QP or $q$ (due to the one-to-one correspondence between QP and $q$). It is not sensitive to the bit rate (or the QP value) of the reference layer. Thus, we can approximate the rate function of a dependent layer as

$$R_i(q_0, q_1, \cdots, q_{i-1}, q_i) \approx R_i(q_i), \tag{3.2}$$

where $q_i$ is the quantization step size of the target layer.

Figure 3.2: Illustration of rate dependency, where the x-axis is the bit rate of layer TL-0 (reference layer, in kbits) and the y-axis is the bit rate of layer TL-1 (dependent layer, in kbits), with one curve for each TL-1 QP of 27, 30, 33, 36 and 39. Panels: (a) Football, QCIF; (b) Football, CIF; (c) Foreman, QCIF; (d) Foreman, CIF.

3.2.3 Distortion Characteristics

The characterization of the distortion function of T layers is more involved than that of the rate function. As shown in Eq. (3.1), the distortion function of the $i$-th T layer can be expressed as

$$D_i(q_0, q_1, \cdots, q_i).$$

We will explore the relationship between $D_0$ and $D_i$, $i = 0, 1, \cdots$, which is called the D-domain analysis. It will become clear that the D-domain analysis models the dependent distortion characteristics more easily than the traditional Q-domain or $\rho$-domain analysis [11,14]. We begin with a simple example, i.e., TL-1's dependency on TL-0, and then study the distortion model with arbitrary dependency in this section.
TL-1 Distortion Characteristics

Figure 3.3: Illustration of TL-1 distortion dependency, where the x-axis is the distortion of layer TL-0 (MSE) and the y-axis is the distortion of layer TL-1 (MSE). Curves: QP0 = QP1 (solid) and QP0 <= QP1 with fixed QP1 (dashed). Panels: (a) Football, QCIF, TL-1; (b) Foreman, CIF, TL-1.

TL-0 is an independent layer and its distortion is determined by a single factor, i.e., its own $q$:

$$D_0 = D_0(q_0).$$

For layer TL-1, we plot its distortion, $D_1(q_0, q_1)$, as a function of $D_0(q_0)$ in Fig. 3.3 with the following two settings:

• $q_0 = q_1$ and
• $q_0 \le q_1$ for fixed $q_1$.

The second setting reflects the monotonicity property in [32]. In Fig. 3.3, we have two types of curves corresponding to the two q-settings above. The solid curve on the diagonal is the TL-1 distortion under the first q-setting, i.e., $(D_0(q_0), D_1(q_0, q_0))$ for $q_0 \in Q$. The dashed branches from the diagonal illustrate $(D_0(q_0), D_1(q_0, q_1))$ for $q_0, q_1 \in Q$ under the second q-setting, where $q_1$ remains fixed along one branch and varies over different branches.

Careful observations with these controlled q-settings suggest the following two properties about the distortion characteristics of dependent temporal layers.

1. The distortion of a dependent temporal layer is linearly proportional to the distortion of its reference layers.
2. The slopes remain approximately constant for curves under the same settings of $q$'s. For example, the distortion branches under the second q-setting are approximately parallel.

By the two approximated properties, the curves in Fig. 3.3 can be characterized by the two constants

$$m_0 \approx \frac{D_1(q_0, q_1) - D_1(q'_0, q_1)}{D_0(q_0) - D_0(q'_0)} \quad \text{and} \quad m_1 \approx \frac{D_1(q, q)}{D_0(q)}, \tag{3.3}$$

where $q$ and $q'$ denote different $q$ values, and $m_1$ and $m_0$ represent the characteristics of the first and the second type of curves, respectively.
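As a minimal sketch of how the two constants in Eq. (3.3) could be obtained from measured data, the function below takes four MSE values: the TL-0 and TL-1 distortions at a diagonal point ($q_0 = q_1$) and at a second point on the same branch ($q'_0 < q_1$, same $q_1$). The function name, argument names and numerical values are hypothetical and serve illustration only.

```python
def estimate_tl1_slopes(d0_diag, d1_diag, d0_branch, d1_branch):
    """Estimate (m0, m1) of Eq. (3.3) from two D-domain points.

    d0_diag, d1_diag     : D0(q1) and D1(q1, q1), a point on the diagonal
    d0_branch, d1_branch : D0(q0') and D1(q0', q1), same q1 with q0' < q1
    """
    if d0_diag == d0_branch:
        raise ValueError("the two points must differ in TL-0 distortion")
    m1 = d1_diag / d0_diag                              # slope of the diagonal
    m0 = (d1_diag - d1_branch) / (d0_diag - d0_branch)  # slope of a branch
    return m0, m1


# Hypothetical MSE values, not measured data.
m0, m1 = estimate_tl1_slopes(d0_diag=100.0, d1_diag=120.0,
                             d0_branch=40.0, d1_branch=90.0)
print(m0, m1)
```

With more than two samples per curve, a least-squares line fit would be a natural alternative to this two-point estimate.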
TL-2 Distortion Characteristics

We investigate the characteristics of the TL-2 distortion, which is influenced by the two reference layers TL-0 and TL-1. It is expressed as a three-variable function, $D_2(q_0, q_1, q_2)$, and Fig. 3.4 plots $D_2(q_0, q_1, q_2)$ as a function of $D_0(q_0)$ with the following three q-settings:

• $q_0 = q_1 = q_2$;
• $q_2$ is fixed, and $q_0$ and $q_1$ vary with $q_0 = q_1 \le q_2$;
• $q_1$ and $q_2$ are fixed, and $q_0$ varies with $q_0 \le q_1$.

Figure 3.4: Illustration of TL-2 distortion dependency, where the x-axis is the distortion of layer TL-0 (MSE) and the y-axis is the distortion of layer TL-2 (MSE). Curves: QP0 = QP1 = QP2 (solid); QP0 = QP1 <= QP2 with fixed QP2 (dashed); QP0 <= QP1 <= QP2 with fixed QP1 and QP2 (dotted). Panels: (a) Foreman, QCIF, TL-2; (b) Football, CIF, TL-2.

Similarly to the TL-1 distortion characteristics, the solid curve on the diagonal, the dashed branches from the diagonal and the dotted branches from the dashed branches are the TL-2 distortions under the above q-settings in the same order. The linear and the parallel assumptions are applied in the same manner as in the TL-1 dependence case; namely,

$$m_0 \approx \frac{D_2(q_0, q_1, q_2) - D_2(q'_0, q_1, q_2)}{D_0(q_0) - D_0(q'_0)}, \quad m_1 \approx \frac{D_2(q_1, q_1, q_2) - D_2(q'_1, q'_1, q_2)}{D_0(q_1) - D_0(q'_1)} \quad \text{and} \quad m_2 \approx \frac{D_2(q, q, q)}{D_0(q)}, \tag{3.4}$$

where the different types of curves in Fig. 3.4 are simply characterized by the three constants $m_0$, $m_1$ and $m_2$.

Finally, the observations can be generalized to model the dependent distortion of a T layer with an arbitrary number of reference layers. That is, the distortion of TL-$i$ with $i$ reference layers can be characterized by $i+1$ slope values, $m_0$, $m_1$, $\cdots$, and $m_i$. The modeling procedure using the linearized distortion characteristics is explained in detail in the next section.
3.3 Temporal Layer Dependent Distortion Model

The proposed distortion model estimates the GOP distortion in two steps. First, the distortion of each dependent layer in a GOP is estimated by the D-domain approximations described in Sec. 3.2.3. The D-domain approximations enable the decomposition of the distortion of a dependent layer into a weighted sum of the distortion of an independent layer evaluated at the $q$'s of participating layers. That is, we have

$$D_i(q_0, \cdots, q_i) = \gamma_0 \cdot D_0(q_0) + \cdots + \gamma_i \cdot D_0(q_i), \tag{3.5}$$

where the $\gamma_i$'s are model parameters. Second, the GOP distortion is estimated simply by computing the sum of the distortions of all participating layers. Mathematically, we have

$$\begin{aligned} D_{GOP}(q_0, q_1, \cdots, q_{N-1}) &= D_0(q_0) + D_1(q_0, q_1) + \cdots + D_{N-1}(q_0, \cdots, q_{N-1}) \\ &= \omega_0 \cdot D_0(q_0) + \omega_1 \cdot D_0(q_1) + \cdots + \omega_{N-1} \cdot D_0(q_{N-1}), \end{aligned} \tag{3.6}$$

where the number of temporal layers is $N$, and the $\omega_i$'s are model parameters. It is worthwhile to point out that only the distortion characteristics of an independent T layer need to be known in order to model the distortions of dependent T layers in a GOP as given in Eqs. (3.5) and (3.6). In the following, the distortion modeling procedure is described and the distortion modeling results are provided for model verification.

3.3.1 T Layer Distortion Modeling

TL-1 Distortion Modeling

Figure 3.5: Illustration of TL-1 distortion modeling procedure. (Horizontal axis: $D_0(q)$; vertical axis: $D_1(q, q')$. Point A at $(D_0(q_1), D_1(q_1, q_1))$ lies on the line of slope $m_1$; point B at $(D_0(q_0), D_1(q_0, q_1))$ is reached from A along a line of slope $m_0$.)

Fig. 3.5 illustrates idealized TL-1 distortion characteristics in the D-domain. Based on the linear and parallel assumptions, the curves in Fig. 3.3 are represented by straight and parallel lines, which characterize the two types of curves in Fig. 3.3 with the two slope values $m_0$ and $m_1$. The TL-1 distortion at an arbitrary point, i.e., $D_1(q_0, q_1)$, can be specified by tracing the points A and B in Fig. 3.5 as follows.

1. A: $D_1(q_1, q_1) = m_1 \cdot D_0(q_1)$,
2. B: $D_1(q_0, q_1) = D_1(q_1, q_1) + m_0 \cdot (D_0(q_0) - D_0(q_1)) = m_1 \cdot D_0(q_1) + m_0 \cdot (D_0(q_0) - D_0(q_1)) = m_0 \cdot D_0(q_0) + (m_1 - m_0) \cdot D_0(q_1)$.

By substituting the coefficients $m_0$ and $(m_1 - m_0)$ with $\gamma_0$ and $\gamma_1$, we are led to a model for the TL-1 distortion. Mathematically, we have

$$D_1(q_0, q_1) = \gamma_0 \cdot D_0(q_0) + \gamma_1 \cdot D_0(q_1), \tag{3.7}$$

where the $\gamma_i$'s are model parameters. In Eq. (3.7), the two-variable distortion function is decomposed into a weighted sum of a single-variable independent distortion function, i.e., $D_0(q)$ evaluated at $q_0$ and $q_1$. As a consequence, it becomes feasible to analyze the contributions of individual quantization choices to the dependent layer's distortion independently.

TL-2 Distortion Modeling

Figure 3.6: Illustration of TL-2 distortion modeling procedure. (Horizontal axis: $D_0(q)$; vertical axis: $D_2(q, q', q'')$. Point A at $(D_0(q_2), D_2(q_2, q_2, q_2))$ lies on the line of slope $m_2$; point B at $(D_0(q_1), D_2(q_1, q_1, q_2))$ is reached from A along a line of slope $m_1$; point C at $(D_0(q_0), D_2(q_0, q_1, q_2))$ is reached from B along a line of slope $m_0$.)

Fig. 3.6 illustrates an idealized D-domain plot of the TL-2 distortion for its modeling. In the figure, we have three types of lines characterized by the three slope values $m_0$, $m_1$ and $m_2$. The TL-2 distortion model is constructed by tracing the points A, B and C. To be more specific,

1. A: $D_2(q_2, q_2, q_2) = m_2 \cdot D_0(q_2)$;
2. B: $D_2(q_1, q_1, q_2) = D_2(q_2, q_2, q_2) + m_1 \cdot (D_0(q_1) - D_0(q_2)) = m_1 \cdot D_0(q_1) + (m_2 - m_1) \cdot D_0(q_2)$;
3. C: $D_2(q_0, q_1, q_2) = D_2(q_1, q_1, q_2) + m_0 \cdot (D_0(q_0) - D_0(q_1)) = m_0 \cdot D_0(q_0) + (m_1 - m_0) \cdot D_0(q_1) + (m_2 - m_1) \cdot D_0(q_2)$.

Then, by substituting $m_0$, $(m_1 - m_0)$ and $(m_2 - m_1)$ with $\gamma_0$, $\gamma_1$ and $\gamma_2$, we can decompose the TL-2 distortion into

$$D_2(q_0, q_1, q_2) = \gamma_0 \cdot D_0(q_0) + \gamma_1 \cdot D_0(q_1) + \gamma_2 \cdot D_0(q_2), \tag{3.8}$$

where the $\gamma_i$'s are model parameters. To conclude, the single-variable function in the decompositions of Eqs. (3.7) and (3.8) is the independent 0-th layer distortion function evaluated at the $q_i$'s of all participating T layers.
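The substitution of slope differences by model parameters in Eqs. (3.7) and (3.8) can be sketched as follows. This is only an illustrative sketch: the function names and the numbers are hypothetical, and the independent-layer distortions $D_0(q_k)$ are assumed to be supplied by the caller (from measurements or from any TL-0 distortion model).

```python
def slopes_to_weights(slopes):
    """Map slopes [m0, m1, ..., mi] to the weights of the decomposition:
    the first weight is m0 and each later one is m_k - m_(k-1)."""
    weights = [slopes[0]]
    for previous, current in zip(slopes, slopes[1:]):
        weights.append(current - previous)
    return weights


def layer_distortion(slopes, d0_at_q):
    """Evaluate the dependent layer distortion as a weighted sum of the
    independent distortion D0, where d0_at_q[k] holds D0(q_k)."""
    weights = slopes_to_weights(slopes)
    return sum(w * d for w, d in zip(weights, d0_at_q))


# Hypothetical TL-1 example: m0 = 0.5, m1 = 1.2.
off_diagonal = layer_distortion([0.5, 1.2], [40.0, 100.0])  # D1(q0, q1)
diagonal = layer_distortion([0.5, 1.2], [100.0, 100.0])     # D1(q1, q1)
print(off_diagonal, diagonal)
```

On the diagonal the weights sum to $m_1$, so the sketch reproduces point A of the tracing procedure, while the off-diagonal call corresponds to point B.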
3.3.2 GOP Distortion Modeling

Next, we extend the basic decomposition idea to construct the distortion model for TL-$N$. Inferring from the observations on the TL-1 and TL-2 distortion modeling, we assume $N+1$ types of lines and corresponding slopes $(m_0, m_1, \cdots, m_N)$ for the TL-$N$ distortion modeling. Then, the TL-$N$ distortion can be modeled by the following procedure.

1. $D_N(q_N, q_N, \cdots, q_N) = m_N \cdot D_0(q_N)$,
2. $D_N(q_{N-1}, q_{N-1}, \cdots, q_{N-1}, q_N) = m_{N-1} \cdot D_0(q_{N-1}) + (m_N - m_{N-1}) \cdot D_0(q_N)$,
3. $D_N(q_{N-2}, \cdots, q_{N-2}, q_{N-1}, q_N) = m_{N-2} \cdot D_0(q_{N-2}) + (m_{N-1} - m_{N-2}) \cdot D_0(q_{N-1}) + (m_N - m_{N-1}) \cdot D_0(q_N)$,
   $\cdots$
4. $D_N(q_0, q_1, \ldots, q_{N-1}, q_N) = m_0 \cdot D_0(q_0) + (m_1 - m_0) \cdot D_0(q_1) + \cdots + (m_{N-1} - m_{N-2}) \cdot D_0(q_{N-1}) + (m_N - m_{N-1}) \cdot D_0(q_N)$.

A simple substitution of the coefficients with $\gamma_i$'s yields

$$D_N(q_0, \cdots, q_N) = \gamma_0 \cdot D_0(q_0) + \cdots + \gamma_N \cdot D_0(q_N), \tag{3.9}$$

where $\gamma_0 = m_0$ and $\gamma_i = m_i - m_{i-1}$ for $i = 1, \cdots, N$, which conforms to the layer distortion model introduced in Eq. (3.5).

Finally, we simply add the distortions of all participating layers in a GOP to obtain the GOP distortion model. That is,

$$\begin{aligned} D_{GOP}(q_0, q_1, \cdots, q_N) &= D_0(q_0) + \gamma_{1,0} \cdot D_0(q_0) + \gamma_{1,1} \cdot D_0(q_1) + \cdots + \gamma_{N,0} \cdot D_0(q_0) + \cdots + \gamma_{N,N} \cdot D_0(q_N) \\ &= \left( 1 + \sum_{i=1}^{N} \gamma_{i,0} \right) \cdot D_0(q_0) + \sum_{i=1}^{N} \gamma_{i,1} \cdot D_0(q_1) + \cdots + \gamma_{N,N} \cdot D_0(q_N) \\ &= \omega_0 \cdot D_0(q_0) + \omega_1 \cdot D_0(q_1) + \cdots + \omega_N \cdot D_0(q_N), \end{aligned} \tag{3.10}$$

where we have added one more subscript to the model parameters to distinguish different temporal layers, and the $\omega_i$'s are the GOP distortion model parameters computed from the distortion model parameters ($\gamma_{i,j}$) of dependent T layers. It is worthwhile to point out that the model parameters $\gamma_{i,j}$ and $\omega_i$ represent the individual contributions of the corresponding layers to the T layer distortion and the GOP distortion, respectively.

3.3.3 Modeling Result

With the proposed distortion model, the GOP distortions of 9 test sequences are estimated for model verification.
In this experiment, a GOP is composed of 16 frames with 5 temporal layers. The model parameters $\gamma_{i,j}$ in Eq. (3.5) are determined by a straightforward approach. That is, we encode a target GOP at the pre-determined QPs corresponding to $q_m$ and $q_M$, and the target layer's distortions corresponding to $D_0(q_m)$ and $D_0(q_M)$ then become pivots for the computation of slope values. For this reason, a GOP with $N$ layers has to be encoded $N$ times, once for each of $N$ q-settings, to produce the actual distortions at the pivots. For example, a 2-layer GOP requires 2 encoding passes with the q-settings $(q_m, q_M)$ and $(q_M, q_M)$ and, as a consequence, the slope values $m_0$ and $m_1$ are computed from the D-domain coordinates $(D_0(q_M), D_1(q_M, q_M))$ and $(D_0(q_m), D_1(q_m, q_M))$. In Fig. 3.5, $q_0$ and $q_1$ are used instead of $q_m$ and $q_M$ for the model derivation purpose.

Table 3.1: GOP Distortion Modeling Result

Format  Sequence      Accuracy (%)    Format  Sequence  Accuracy (%)
QCIF    City          85.53           CIF     City      86.19
        Crew          77.53                   Crew      86.20
        Football      93.80                   Football  97.46
        Foreman       90.61                   Foreman   96.73
        News          89.82                   News      95.74
        Soccer        94.99                   Soccer    92.42
        Carphone      84.26                   Hall      88.68
        Salesman      92.79                   Harbour   76.24
        Table Tennis  90.64                   Tempete   79.59
        Average       88.89                   Average   88.80

For each GOP from the QCIF and CIF test sequences, we generated 2,730 and 810 distortion samples, respectively. Figs. 3.7 and 3.8 illustrate the modeling result, with the red dotted line indicating the identity, i.e., the $y = x$ line. The numerical results are summarized in Table 3.1, where the estimation accuracy is computed from the average relative error with respect to the true distortion values; namely,

$$\text{accuracy} = \frac{1}{N_s} \sum_{i=1}^{N_s} \left( 1 - \frac{|s_i - \hat{s}_i|}{s_i} \right) \times 100,$$

where $N_s$ is the number of samples, and $s_i$ and $\hat{s}_i$ are the true and estimated MSE values, respectively. The proposed model achieves an estimation accuracy of about 89% on average. Thus, the accuracy and robustness of the proposed distortion model are confirmed by experimental results.
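The accumulation of layer parameters into GOP parameters in Eq. (3.10) and the accuracy measure used for Table 3.1 can be sketched as below. The function names and all numerical values are hypothetical and chosen for illustration; they are not the experimental data.

```python
def gop_weights(layer_weights):
    """Accumulate per-layer weights into GOP weights as in Eq. (3.10).

    layer_weights[i - 1] lists the i + 1 weights of TL-i (i = 1..N),
    ordered by the layer whose q they multiply. Returns N + 1 GOP weights.
    """
    omega = [0.0] * (len(layer_weights) + 1)
    omega[0] = 1.0  # TL-0 contributes its own distortion D0(q0)
    for weights in layer_weights:
        for j, weight in enumerate(weights):
            omega[j] += weight
    return omega


def estimation_accuracy(true_mse, estimated_mse):
    """Average of (1 - relative error) over all samples, in percent."""
    terms = [1.0 - abs(s - s_hat) / s
             for s, s_hat in zip(true_mse, estimated_mse)]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical 3-layer GOP: weights of TL-1 and TL-2.
omega = gop_weights([[0.5, 0.7], [0.4, 0.3, 0.6]])
score = estimation_accuracy([100.0, 200.0], [90.0, 220.0])
print(omega, score)
```

Note that an estimate with a relative error above 100% contributes a negative term under this definition, so the measure is meaningful mainly when the estimates are reasonably close to the true values.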
Figure 3.7: Illustration of the 16-frame (5-TL) GOP distortion modeling result for QCIF test sequences, where the x-axis is the true GOP MSE and the y-axis is the GOP MSE estimated by the proposed GOP distortion model. Panels: (a) Carphone; (b) City; (c) Crew; (d) Football; (e) Foreman; (f) News; (g) Salesman; (h) Soccer; (i) Table Tennis.

Figure 3.8: Illustration of the 16-frame (5-TL) GOP distortion modeling result for CIF test sequences, where the x-axis is the true GOP MSE and the y-axis is the GOP MSE estimated by the proposed GOP distortion model. Panels: (a) City; (b) Crew; (c) Football; (d) Foreman; (e) Hall; (f) Harbour; (g) News; (h) Soccer; (i) Tempete.

In the next section, we propose an optimal T layer bit allocation algorithm as an application of the proposed distortion model.

3.4 Temporal Layer Bit Allocation

We examine a T layer bit allocation problem for H.264/SVC based on the distortion model presented in earlier sections. The major advantage of the proposed distortion model comes from the successful decoupling of multiple variables in dependent distortion functions. As a consequence, the GOP distortion is decomposed into a weighted sum of the single-variable distortion function of an independent T layer evaluated at the $q$'s of participating layers. Moreover, the linear GOP distortion model enables the development of a simple analytical solution to a dependent bit allocation problem.

3.4.1 Problem Formulation

For this problem, we consider a T layer and a GOP as the basic and the target optimization coding units, respectively. That is, given a bit budget for a GOP, the proposed algorithm performs a T layer bit allocation that minimizes the GOP distortion. A bit allocation problem is actually a QP decision problem because QP is the actual encoder control parameter.
By the one-to-one correspondence between QP and the quantization step size ($q$), the T layer bit allocation problem is formulated as a q-decision problem. Mathematically, we have

$$\mathbf{q}^* = \arg\min_{\mathbf{q} \in Q^{N_T}} D_{GOP}(\mathbf{q}) \quad \text{s.t.} \quad R_{GOP}(\mathbf{q}) \le R_T, \tag{3.11}$$

where $N_T$ is the number of temporal layers, $\mathbf{q} = [q_0, q_1, \cdots, q_{N_T - 1}]$ is the vector of the $q_i$'s corresponding to participating T layers, $Q$ is the space of all admissible quantization step sizes, and $R_T$ is the target bit budget constraint.

The constrained problem given in Eq. (3.11) can be expressed by its unconstrained dual as

$$J(\mathbf{q}^*, \lambda^*) = \min_{\mathbf{q} \in Q^{N_T}} J(\mathbf{q}, \lambda), \quad \text{where} \quad J(\mathbf{q}, \lambda) = D_{GOP}(\mathbf{q}) + \lambda \cdot R_{GOP}(\mathbf{q}), \tag{3.12}$$

where $\mathbf{q}^*$ is the optimal solution and $\lambda^*$ is the Lagrange multiplier at optimality.

Although a Lagrangian of the form in Eq. (3.12) can in principle be solved by computing partial derivatives, an analytical solution to dependent bit allocation problems has not been available in the literature yet. This is mainly because the influence of individual variables on the R-D values is not clearly identified. Since the proposed distortion model, combined with the rate approximation in Sec. 3.2.2, enables the successful decoupling of the vector variable, it becomes feasible to partially differentiate Eq. (3.12) to achieve the solution analytically. With the proposed distortion model, the problem in Eq. (3.12) can be rewritten as

$$J(\mathbf{q}, \lambda) = D_{GOP}(\mathbf{q}) + \lambda \cdot R_{GOP}(\mathbf{q}) = \sum_{i=0}^{N_T - 1} \omega_i \cdot D_0(q_i) + \lambda \cdot \sum_{i=0}^{N_T - 1} R_i(q_i), \tag{3.13}$$

where the R-D functions of a complex vector variable are decomposed into simple single-variable functions. For the final closed-form expression of the R-D functions, we employ the frame layer R-D functions proposed by Kamaci et al. in [19]. Based on the Cauchy distribution assumption of transform coefficients, the R-D functions are expressed by

$$R(q) = a \cdot q^{-\alpha} \quad \text{and} \quad D(q) = b \cdot q^{\beta}, \tag{3.14}$$

where $a$, $b$, $\alpha$ and $\beta$ are model parameters. By substituting the R-D functions in Eq. (3.14) into Eq. (3.13), we finalize the problem formulation; namely,

$$J(\mathbf{q}^*, \lambda^*) = \min_{\mathbf{q} \in Q^{N_T}} J(\mathbf{q}, \lambda) = \min_{\mathbf{q} \in Q^{N_T}} \left( \sum_{i=0}^{N_T - 1} \omega_i \cdot b_0 \cdot q_i^{\beta_0} + \lambda \cdot \sum_{i=0}^{N_T - 1} a_i \cdot q_i^{-\alpha_i} \right). \tag{3.15}$$

3.4.2 Solution to the Lagrangian Equation

The Lagrangian equation in Eq. (3.15) can be solved by partially differentiating the equation with respect to the $q_i$'s and setting the partial derivatives to zero. Mathematically, the partial differentiation with respect to $q_i$ yields

$$\lambda^* = \frac{\omega_i \cdot b_0 \cdot \beta_0 \cdot q_i^{\beta_0 - 1}}{a_i \cdot \alpha_i \cdot q_i^{-\alpha_i - 1}}, \quad \forall i \in \{0, \cdots, N_T - 1\}. \tag{3.16}$$

Eq. (3.16) can be rewritten with the rate function of TL-$i$ using the $R$-$q$ relation in Eq. (3.14), i.e., $q_i = (R_i(q_i)/a_i)^{-1/\alpha_i}$; namely,

$$\lambda^* = \frac{\omega_i \cdot b_0 \cdot \beta_0 \cdot q_i^{\beta_0 - 1}}{a_i \cdot \alpha_i \cdot q_i^{-\alpha_i - 1}} = \frac{\omega_i \cdot b_0 \cdot \beta_0 \cdot a_i^{\beta_0 / \alpha_i}}{\alpha_i} \cdot R_i(q_i)^{-\frac{\alpha_i + \beta_0}{\alpha_i}}. \tag{3.17}$$

Now, the partial derivatives corresponding to $q_0$ and $q_i$ can be equated because they have $\lambda^*$ in common as shown in Eq. (3.17). Mathematically, we have

$$\lambda^* = \frac{\omega_0 \cdot b_0 \cdot \beta_0 \cdot a_0^{\beta_0 / \alpha_0}}{\alpha_0} \cdot R_0(q_0)^{-\frac{\alpha_0 + \beta_0}{\alpha_0}} = \frac{\omega_i \cdot b_0 \cdot \beta_0 \cdot a_i^{\beta_0 / \alpha_i}}{\alpha_i} \cdot R_i(q_i)^{-\frac{\alpha_i + \beta_0}{\alpha_i}}. \tag{3.18}$$

As a consequence, the TL-$i$ rate function can be expressed by the TL-0 rate function as

$$R_i(q_i) = \eta_i \cdot R_0(q_0)^{\theta_i}, \tag{3.19}$$

where

$$\theta_i = \frac{\alpha_0 + \beta_0}{\alpha_0} \cdot \frac{\alpha_i}{\beta_0 + \alpha_i} \quad \text{and} \quad \eta_i = \left( \frac{\omega_0 \cdot \alpha_i \cdot a_0^{\beta_0 / \alpha_0}}{\omega_i \cdot \alpha_0 \cdot a_i^{\beta_0 / \alpha_i}} \right)^{-\frac{\alpha_i}{\beta_0 + \alpha_i}}.$$

Finally, we can obtain the solution to the Lagrangian equation using the bit rate constraint

$$R_T = \sum_{i=0}^{N_T - 1} \eta_i \cdot R_0(q_0)^{\theta_i}. \tag{3.20}$$

In the proposed T layer bit allocation scheme, Eq. (3.20) is numerically solved to yield $R_0(q_0)$, and the rates of all remaining T layers ($R_i(q_i)$, $i = 1, \cdots, N_T - 1$) are computed by Eq. (3.19), which yields the $q_i$'s of all participating T layers based on the $R$-$q$ relation in Eq. (3.14). Finally, the T layer QPs are determined from the $q$-QP relation

$$QP = 8.6547 \cdot \log q + 4.0465, \tag{3.21}$$

where $\log$ denotes the natural logarithm and QPs are assumed to be continuous values rather than discrete integer values in $\{0, \cdots, 51\}$. In practice, we can take the nearest integer value for QP.

3.4.3 Proposed Bit Allocation Algorithm

The proposed algorithm consists of three stages as illustrated in Fig. 3.9.
The pre-encoding stage determines the model parameters and coefficients for the R-D modeling of a target GOP. In the second stage, the temporal layer bit allocation is performed, followed by the QP decision based on the solution to the Lagrangian equation. Finally, at the third stage, the target GOP is encoded with the QP vector determined in the second stage to produce an encoded bit stream at the target bit rate.

Figure 3.9: The proposed T layer bit allocation algorithm. (Block diagram: target GOP, pre-encoding, rate allocation and QP decision, encoding at $R_T$, encoded GOP stream. The pre-encoding stage passes the model coefficients and parameters to the second stage, which passes the coding parameter $QP = [QP_0, \ldots, QP_N]$ to the encoding stage.)

The encoding procedure of the proposed algorithm is given below.

1. Initial bit allocation to a GOP
   Based on frame rate $F$ and channel rate $C$, allocate the target bit budget $R_T$ to a GOP according to
   $$R_T = N_{F,GOP} \cdot \frac{C}{F} + R_0,$$
   where $R_0$ compensates for the errors in the previous GOP bit allocation and $N_{F,GOP}$ is the number of frames in a GOP.
2. Pre-encoding for model parameter decision
   The target GOP is encoded to produce the true MSE values at the pivots (points A, B and C in Figs. 3.5 and 3.6). The required number of pre-encoding passes is $N_T$ if the target GOP has $N_T$ T layers.
3. Temporal layer rate allocation and QP decision
   The T layer bit allocation is performed by solving Eq. (3.20) numerically and applying Eq. (3.19). The $R$-$q$ relation in Eq. (3.14) maps the allocated bits to the $q$ value of each layer. Finally, QP is determined directly from $q$ by Eq. (3.21).
4. Encoding at the target bit rate
   The current GOP is encoded to produce the final bit stream at the target bit rate.
5. Iteration
   Steps 1 to 4 are applied to the following GOP until the end of the input video.

3.4.4 Experimental Result

The proposed T layer bit allocation algorithm is evaluated with QCIF and CIF test sequences. The target bit rates range from 32 kbps to 768 kbps depending on the characteristics and formats of the sequences. In the experiment, a GOP is composed of 16 frames with 5 temporal layers ranging from 1.875 fps to 30 fps, and 10 GOPs are encoded for each test sequence.
In this experiment, all TL-0 frames are encoded as P-frames except for the first I-frame. The performance of the proposed temporal layer bit allocation scheme is compared with that of the rate control algorithm implemented in the JSVM software reference codec [7]. First, Fig. 3.10 shows the coding gain at various target bit rates. Then, the frame-by-frame Y-PSNR comparison is depicted in Fig. 3.11. A significant coding gain can be observed from these two figures. Another advantage of the proposed bit allocation scheme is the precise rate control as depicted by the mismatch ($\Delta R$) in Table 3.2, where the maximum difference from the target bit rate is as low as 3.26% for the Hall test sequence in CIF format encoded at 384 kbps.

Figure 3.10: Illustration of the coding gain by the proposed temporal layer bit allocation scheme, where the x-axis is the rate (kbps) and the y-axis is the Y-PSNR (dB), with one curve each for the proposed scheme and JSVM 9.12. Panels: (a) News (QCIF), $R_T$ = 32 kbps to 256 kbps; (b) Football (QCIF), $R_T$ = 64 kbps to 512 kbps; (c) Crew (CIF), $R_T$ = 96 kbps to 768 kbps; (d) Hall (CIF), $R_T$ = 48 kbps to 384 kbps.

The coding gain of the proposed bit allocation scheme mainly comes from the explicit consideration and the successful modeling of the T layer dependence.
The JSVM rate control algorithm is the one implemented in the JM reference software codec for H.264/AVC, which performs a frame-layer bit allocation with heuristic and implicit consideration of the inter-layer dependency. To be more specific, the JSVM implementation is a hybrid of modes 1 and 2 of the JM rate control algorithm, where mode 1 is used when pictures of an input video are coded into the same type of pictures while mode 2 takes the hierarchical coding into account. That is, the QP of a B-frame is determined by the combination of its previous QP and the hierarchical level it belongs to [8]; namely,

$$QP_{B_{order}} = \begin{cases} QP_{prev}, & \text{if } B_{order} = 0, \\ QP_{prev} + H_{levels} - L(B_{order}), & \text{otherwise}, \end{cases}$$

where $B_{order}$ is the B-frame encoding order in a GOP and $L(n)$ is the hierarchy level for a B-frame with encoding order $n$. Because of this simple decision method, the QP difference of a B-frame at a certain hierarchy level from that of its reference pictures becomes constant regardless of their dependent distortion characteristics. As a consequence, the dependence of a B-frame cannot be properly addressed.

Figure 3.11: Illustration of the frame-by-frame Y-PSNR variation, where the x-axis is the frame number and the y-axis is the Y-PSNR (dB), with one curve each for the proposed scheme and JSVM 9.12. Panels: (a) Football, QCIF @ 64kbps; (b) Salesman, QCIF @ 128kbps; (c) Hall, CIF @ 48kbps; (d) News, CIF @ 96kbps.

Table 3.2: Experimental result: Bit stream coding efficiency with 10 GOPs ($\Delta R = R_{prop} - R_T$)

                            JSVM 9.12                   Proposed
Format  Sequence  R_T    Rate (kbps)  Y-PSNR (dB)    Rate (kbps)  Y-PSNR (dB)  ∆R
QCIF    City      32     32.4         32.16          32.22        32.33        +0.22
                  64     71.82        35.29          63.57        36.12        -0.43
                  128    133.32       39.42          128.80       39.59        +0.80
                  256    263.81       42.20          255.51       42.36        +0.51
        News      32     35.82        34.19          32.15        35.31        +0.15
                  64     70.52        36.80          63.96        39.23        -0.04
                  128    140.63       42.66          128.07       43.49        +0.07
                  256    321.08       47.16          256.11       47.65        +0.11
        Salesman  32     35.84        34.78          32.20        35.56        +0.20
                  64     52.41        35.79          63.89        39.67        -0.11
                  128    142.96       42.40          126.00       43.86        -0.2
                  256    299.13       45.85          251.15       46.61        -3.85
        Football  64     62.74        25.67          63.51        27.00        -0.49
                  128    205.61       30.27          127.78       29.81        -0.22
                  256    363.41       33.35          255.67       33.17        -0.33
                  512    957.44       40.37          511.41       37.26        -0.59
CIF     Crew      96     118.85       30.30          95.99        28.66        -0.01
                  192    247.55       32.80          191.22       32.18        -0.78
                  384    522.56       35.01          383.20       34.83        -0.80
                  768    1080.08      38.93          768.64       37.68        +0.64
        Foreman   48     51.10        30.76          48.12        30.29        +0.12
                  96     106.34       34.64          96.05        34.29        +0.5
                  192    239.01       38.40          191.43       38.19        -0.57
                  384    276.67       38.78          381.79       42.24        -2.21
        Hall      48     41.51        34.56          47.26        35.51        -0.74
                  96     98.03        35.98          92.33        37.26        -3.67
                  192    221.11       37.04          183.62       38.65        +0.62
                  384    257.20       37.13          371.49       39.73        -12.51
        Soccer    96     118.57       29.15          96.92        28.32        +0.92
                  192    211.07       31.89          192.48       31.43        +0.48
                  384    533.72       35.64          382.70       34.92        -1.30
                  768    824.88       39.06          768.50       38.72        +0.50

On the other hand, with the proposed bit allocation scheme, the QP of a B-frame is determined based on the GOP-based dependent distortion model. That is, the model coefficients $\omega_i$ in Eq. (3.6) determine the QP values so that the dependency of B-frames is properly addressed to maximize the GOP quality. Roughly speaking, at R-D optimality the QP difference between highly dependent layers is supposed to be larger than that between layers with low dependency.

The T layer R-D performance comparison given in Table 3.3 implies the importance of the dependency consideration in the bit allocation for hierarchical B-pictures. In Table 3.3, the coding gain ranges from 0.36 to 2.18 dB with an average ranging from 0.94 to 1.76 dB at each T layer. The T layer coding gain can be explained as follows.
• The QP of each T layer is determined based on the contribution of each T layer to the GOP distortion, which is signaled by the model coefficient $\omega_i$ in Eq. (3.6).
• The QP decision by the proposed algorithm results in higher fidelity of the lower T layers than that by the JSVM RC algorithm.
• Lower T layers with higher fidelity improve the R-D efficiency of the following T layers, by which the R-D performance at each T layer becomes superior to that of the JSVM rate control algorithm.

3.5 Conclusion

In this chapter, we proposed a dependent distortion model and examined a temporal layer bit allocation as its application. With the proposed distortion model, the complex interaction among coding units, i.e., inter-layer dependency, can be successfully modeled at a significantly reduced complexity. To the best of our knowledge, all existing solutions to the dependent bit allocation problem (e.g., [32] and [25]) require exponential complexity. In contrast, we obtained a solution of linear complexity with the proposed distortion model. The accuracy and the robustness of the proposed distortion model were verified by experiments on its application to a temporal layer bit allocation for the hierarchical B-pictures of H.264/SVC. The proposed bit allocation scheme greatly enhanced the coding efficiency while providing precise rate control at the given target bit rates.
Table 3.3: Temporal layer rates and Y-PSNR, 10 GOP average ($\Delta R = R_{prop} - R_{JSVM}$, $\Delta PSNR = PSNR_{prop} - PSNR_{JSVM}$). Rates are in kbps and PSNR values are in dB.

| Format | Sequence | R_T | TL | JSVM 9.12 Rate | JSVM 9.12 PSNR | Proposed Rate | Proposed PSNR | ∆R | ∆PSNR |
|---|---|---|---|---|---|---|---|---|---|
| QCIF | Football | 64 | 0 | 11.84 | 27.64 | 11.36 | 28.02 | -0.48 | +0.36 |
| QCIF | Football | 64 | 1 | 19.82 | 27.00 | 19.03 | 27.79 | -0.79 | +0.79 |
| QCIF | Football | 64 | 2 | 31.52 | 26.38 | 31.04 | 27.42 | -0.48 | +1.02 |
| QCIF | Football | 64 | 3 | 46.25 | 25.91 | 45.84 | 27.12 | -0.41 | +1.21 |
| QCIF | Football | 64 | 4 | 62.74 | 25.67 | 63.51 | 27.00 | +0.77 | +1.33 |
| QCIF | Football | 64 | Average | | | | | -0.28 | +0.94 |
| QCIF | Salesman | 128 | 0 | 56.87 | 44.46 | 51.16 | 46.34 | -5.71 | +1.88 |
| QCIF | Salesman | 128 | 1 | 72.93 | 43.66 | 76.10 | 45.84 | +3.07 | +2.18 |
| QCIF | Salesman | 128 | 2 | 94.93 | 43.15 | 99.92 | 45.24 | +4.99 | +2.09 |
| QCIF | Salesman | 128 | 3 | 118.81 | 42.85 | 117.68 | 44.67 | -0.13 | +1.18 |
| QCIF | Salesman | 128 | 4 | 142.96 | 42.40 | 126.00 | 43.86 | -12.96 | +1.46 |
| QCIF | Salesman | 128 | Average | | | | | -2.15 | +1.76 |
| CIF | Hall | 48 | 0 | 11.41 | 35.19 | 14.59 | 36.14 | +3.18 | +0.95 |
| CIF | Hall | 48 | 1 | 16.42 | 34.93 | 20.44 | 35.92 | +4.02 | +0.99 |
| CIF | Hall | 48 | 2 | 23.28 | 34.75 | 27.53 | 35.70 | +4.25 | +0.95 |
| CIF | Hall | 48 | 3 | 31.82 | 34.63 | 36.47 | 35.56 | +4.65 | +0.93 |
| CIF | Hall | 48 | 4 | 41.51 | 34.56 | 47.26 | 35.51 | +5.75 | +0.95 |
| CIF | Hall | 48 | Average | | | | | +4.37 | +0.95 |
| CIF | News | 96 | 0 | 37.45 | 36.88 | 33.20 | 38.06 | -4.25 | +1.38 |
| CIF | News | 96 | 1 | 50.07 | 36.51 | 47.64 | 37.86 | -2.43 | +1.35 |
| CIF | News | 96 | 2 | 66.34 | 36.26 | 64.75 | 37.58 | -1.59 | +1.32 |
| CIF | News | 96 | 3 | 84.70 | 36.07 | 82.20 | 37.33 | -2.50 | +1.26 |
| CIF | News | 96 | 4 | 101.25 | 35.98 | 94.70 | 37.15 | -6.55 | +1.17 |
| CIF | News | 96 | Average | | | | | -3.46 | +1.30 |

The performance of the rate control algorithm is attributed mainly to the successful D-domain modeling of the dependent distortion characteristics.

Although the complexity reduction is significant as compared with previous solutions, the current temporal layer bit allocation scheme still requires multiple pre-encoding passes for model parameter decision. For this reason, an efficient method for model parameter decision is essential to practical encoder design with the proposed distortion model. We believe that the pseudo-stationarity of video signals can be exploited for model parameter prediction, as in other model-based solutions. This will be explored in the future.

In the next chapter, we study the quality layer R-D characteristics and their modeling to develop a bit allocation method for scalable video with joint quality-temporal (Q-T) scalability.
As in the temporal layer R-D modeling, the dependent R-D characteristics of the quality scalable layers are carefully investigated to model their dependent behavior.

Chapter 4

Quality Layer Dependent R-D Modeling and Joint Q-T Bit Allocation

4.1 Introduction

In this chapter, we study the quality (Q) layer R-D characteristics and their modeling as a sequel to the T layer R-D modeling in Chapter 3. Since a scalable video bit stream contains multiple layers with different scalabilities, i.e., temporal, quality and spatial scalability, the objective of studying the Q layer R-D characteristics is to develop an optimal bit allocation scheme for scalable video with combined Q-T scalability. We propose Q layer R-D models and then derive GOP-based R-D models for combined Q-T scalability from the T and Q layer R-D models.

In Chapter 3, the T layer R-D characteristics were investigated with intermediate R- and D-domains. We employ the same methodology for the Q layer R-D characteristics study. That is, the base Q layer R and D functions serve as the domain for the observation of the R-D behavior of enhancement Q layers. Recall that we have the following two major conclusions from the last chapter:

1. The rate of a dependent T layer is relatively independent of its reference layers.
2. The dependent distortion can be decomposed into a linear sum of an independent base T layer distortion function evaluated at the q's of participating layers.

In this chapter, we would like to draw similar conclusions about the Q layer R-D characteristics. However, contrary to the T layer R-D characteristics, the Q layer distortion is relatively independent of its preceding Q layers. Furthermore, the Q layer rate is dependent on the rates of its preceding Q layers.
As a result, the distortion behavior of an enhancement Q layer can be expressed by a single-variable function, while its rate characteristics are represented by the base Q layer rate function evaluated at the quantization step sizes of the target layer and its preceding Q layers.

Another important observation is the orthogonality of the R-D dependence of the T and Q layer R-D characteristics. That is, the rate and the distortion characteristics exhibit independence in the temporal and the quality scalability, respectively. As a consequence, the Q layer R-D modeling can be performed independently of the T layer R-D behaviors. Moreover, the R-D models for GOPs with combined Q-T scalability can be derived simply by combining the T and Q layer R-D models.

Finally, an optimal bit allocation for joint Q-T scalability is developed as the application of the GOP-based R-D models, where bit allocation is formulated as a Lagrangian optimization problem. Similar to the T layer bit allocation problem in Chapter 3, the joint bit allocation problem can be expressed in closed form using the proposed GOP R-D models and solved numerically.

The rest of this chapter is organized as follows. First, we study the Q layer R-D characteristics in Sec. 4.2 and present the Q layer and GOP R-D models in Sec. 4.3. The joint bit allocation problem is formulated and solved numerically in Sec. 4.4. Finally, we give concluding remarks and future research directions in Sec. 4.5.

4.2 Quality Layer R-D Characteristics

A Q-T plane that shows the combined Q-T scalability is illustrated in Fig. 4.1. There are four T layers (denoted by 0, 1, 2, 3) and three Q layers (denoted by 0, 1, 2) in this example. Each block of index $(i, j)$ indicates a video clip that contains quality layers $0, \cdots, i$ and temporal layers $0, \cdots, j$. The interaction among coding units in the Q-T plane greatly influences the overall coding efficiency of a scalable bit stream.
Figure 4.1: Illustration of H.264/SVC video with combined Q-T scalability, where three Q layers and four T layers are shown. (Blocks $(i, j)$ with $i = 0, 1, 2$ and $j = 0, \cdots, 3$ over TL-0 to TL-3; the T layer direction carries the distortion dependency and the Q layer direction carries the rate dependency.)

To represent such an interaction, the R-D characteristics of a coding unit $(i, j)$ can be expressed by the following multi-variable functions:

$$R_{i,j}(q_{0,0}, \cdots, q_{i,j}) \quad \text{and} \quad D_{i,j}(q_{0,0}, \cdots, q_{i,j}),$$

where $q_{i,j}$ is the quantization step size at the $i$th Q layer and the $j$th T layer. Since the rate of a T layer is relatively independent of its reference layers as reported in Chapter 3, the investigation of the rate characteristics of a Q layer can be performed within a T layer independently of other T layers. As in the T layer distortion modeling, the R-D characteristics of enhancement Q layers can be analyzed with respect to the base Q layer R-D characteristics. In this section, we describe the Q layer R-D characteristics of H.264/SVC video as a sequel to the T layer R-D modeling in Chapter 3.

4.2.1 Q Layer Rate Characteristics

As discussed in the last chapter, the rate of a T layer is independent of its reference layers. However, for a fixed T layer, the rate characteristic of a Q layer has a strong dependence, which demands careful investigation.

Intuitively, when a frame is encoded into a number of quality layers, the information in the frame is distributed into the quality layers according to the quantization for each layer. Fig. 4.2 demonstrates the signal decomposition determined by the base Q layer QP decision. That is, the energy of the residual signal, which is to be coded as enhancement Q layers, is determined by the base layer reconstruction fidelity.
Generally speaking, since the input to a quality enhancement layer encoder is the differential signal between the original frame and the reconstructed frame from lower layers, the rate of an enhancement layer depends on the bit allocation decision of lower layers.

Figure 4.2: Illustration of signal decomposition, which explains the rate dependence of the enhancement layer on the QP value of the base Q layer. (The frame is split into reconstruction and residual parts for base layer QP=45 and QP=30.)

QL-1 Rate Characteristics

We show the rate of Q layer 1 (QL-1) as a function of that of the base Q layer (QL-0) in Fig. 4.3, where the $q_i$'s are controlled to generate these rate points. The independent QL-0 rate function, which is determined only by its own $q$, is employed as the domain of the QL-1 rate characteristics observation. For the rate of QL-1 in Fig. 4.3, we have two types of curves representing the following two settings:

1. Solid line: $QP_0 - QP_1 = \Delta$, and
2. Dashed line: fixed $QP_1$ with $QP_0 \geq QP_1 + \Delta$.

As shown by the dashed curve in Fig. 4.3, the QL-1 rate decreases as the QL-0 rate increases. This is because higher bit rates of the base layer reduce the information to be coded in the enhancement layer. As a result, the enhancement layer rate is reduced when $QP_1$ is fixed.

Figure 4.3: Illustration of rate dependency, where the x-axis is the rate of layer QL-0 (kbits) and the y-axis is the rate of layer QL-1 (kbits), with curves for fixed $QP_0 - QP_1$ and fixed $QP_1$. Panels: (a) Foreman, QCIF; (b) Soccer, CIF.

Similar to the T layer distortion dependence relationship, the QL-1 rate with respect to the QL-0 rate has the following behavior.

1. The rate of a quality enhancement layer can be expressed as an affine function of the rate of the quality base layer (QL-0).
2. The slope remains approximately constant for curves under the same settings of q's. That is, the branches by the second q-setting are approximately parallel to each other.

These two properties enable the approximation of the two types of curves by the following two constants:

$$m_0 \approx \frac{R_1(q_0, q_0 - \Delta) - R_1(q'_0, q'_0 - \Delta)}{R_0(q_0) - R_0(q'_0)} \quad \text{and} \quad m_1 \approx \frac{R_1(q_0, q_1) - R_1(q'_0, q_1)}{R_0(q_0) - R_0(q'_0)}, \tag{4.1}$$

where $q$ and $q'$ denote different $q$ values and $m_0$ and $m_1$ represent the slopes of the approximating lines of the solid and the dashed curves, respectively.

QL-2 Rate Characteristics

Figure 4.4: Illustration of rate dependency, where the x-axis is the rate of layer QL-0 (kbits) and the y-axis is the rate of layer QL-2 (kbits). Panels: (a) City, QCIF, fixed $QP_2$; (b) Football, CIF, fixed $QP_2$; (c) City, QCIF, fixed $QP_1$ and $QP_2$; (d) Football, CIF, fixed $QP_1$ and $QP_2$.

The R-domain characteristics of the QL-2 rate are depicted in Fig. 4.4, where different types of curves reflect the following three q-settings:

1. Solid line: $QP_0 - QP_1 = QP_1 - QP_2 = \Delta$,
2. Dashed line: fixed $QP_2$ with $QP_0 - QP_1 = \Delta$, and
3. Dotted line: fixed $QP_1$ and $QP_2$ with $QP_0 > QP_1 > QP_2$.

This figure suggests that linear and parallel lines can be used for the approximation, as for the QL-1 rate.
Thus, the QL-2 rate can be characterized by the following three constants:

$$m_0 \approx \frac{R_2(q_0, q_0 - \Delta, q_0 - 2\Delta) - R_2(q'_0, q'_0 - \Delta, q'_0 - 2\Delta)}{R_0(q_0) - R_0(q'_0)},$$
$$m_1 \approx \frac{R_2(q_0, q_0 - \Delta, q_2) - R_2(q'_0, q'_0 - \Delta, q_2)}{R_0(q_0) - R_0(q'_0)}, \quad \text{and}$$
$$m_2 \approx \frac{R_2(q_0, q_1, q_2) - R_2(q'_0, q_1, q_2)}{R_0(q_0) - R_0(q'_0)}. \tag{4.2}$$

The above observation can be generalized to model the dependent QL rate with more than 2 quality layers. That is, the rate of Q layer $i$ can be characterized by the $i+1$ slopes $m_0, m_1, \cdots, m_i$.

4.2.2 Q Layer Distortion Characteristics

The distortion characteristics of Q layers are depicted in Fig. 4.5, where the distortion of the enhancement Q layers (i.e., QL-1 and QL-2) is plotted as a function of that of the base Q layer (QL-0). A linear relation between the quality base and enhancement layers is observed, which can be expressed as

$$D_i(q_0, \cdots, q_i) = \gamma_i \cdot D_0(q_i), \tag{4.3}$$

where $i$ is a Q layer index, $\gamma_i$ is the model parameter and $D_0(q)$ is the distortion function of the base Q layer. This simple distortion relation can also be explained based on signal decomposition. Since the quantization step size of the enhancement layer has to be finer than that of the base layer, the enhancement layer distortion with respect to the original frame becomes relatively independent of the base layer quantization.

Figure 4.5: Illustration of distortion dependency, where the x-axis is the distortion (MSE) of layer QL-0 (the reference layer) and the y-axis is the distortion (MSE) of layers QL-1 and QL-2. Panels: (a) QL-1 vs. QL-0, QCIF (City, Foreman, Soccer); (b) QL-2 vs. QL-0, QCIF (City, Foreman, Soccer); (c) QL-1 vs. QL-0, CIF (Foreman, News, Soccer); (d) QL-2 vs. QL-0, CIF (Foreman, News, Soccer).
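As a rough illustration of how the single model parameter in Eq. (4.3) could be obtained, the sketch below fits it by least squares through the origin from measured pairs of base layer and enhancement layer MSE. The fitting method, the function name `fit_distortion_coefficient` and the sample values are assumptions made for illustration; the text does not state how the parameter is estimated.

```python
def fit_distortion_coefficient(base_mse, enh_mse):
    """Least-squares slope through the origin for D_i ~= coeff * D_0(q_i).

    base_mse: measured D_0(q_i) values (base Q layer MSE at the
              enhancement layer's quantization step size).
    enh_mse:  measured D_i values for the same operating points.
    """
    if len(base_mse) != len(enh_mse) or not base_mse:
        raise ValueError("need two non-empty sequences of equal length")
    numerator = sum(d0 * di for d0, di in zip(base_mse, enh_mse))
    denominator = sum(d0 * d0 for d0 in base_mse)
    return numerator / denominator


# Hypothetical measurements, not taken from the thesis.
base = [10.0, 20.0, 40.0]
enh = [9.0, 18.0, 36.0]
coeff = fit_distortion_coefficient(base, enh)
print(coeff)
```

A fit through the origin is used here because Eq. (4.3) has no offset term; a model with an intercept would need an ordinary linear regression instead.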
The above R-D properties of Q layers are general and robust. They have been verified on a large number of test video sequences. Based on these observations, we will develop R-D models for Q layers and GOPs with combined Q-T scalability in the next section.

4.3 Quality Layer Dependent Rate Model

The distortion model of Q layers can be well characterized by Eq. (4.3), which is simple. In contrast, the rate model of Q layers is more involved. The rate model of a Q layer $i$ is constructed as a weighted linear sum of the base layer rate function evaluated at parameters $q_0, \cdots, q_i$. Mathematically, we have

$$R_i(q_0, \cdots, q_i) = \eta_{i,0} \cdot R_0(q_0) + \cdots + \eta_{i,i} \cdot R_0(q_i) + \nu_i, \tag{4.4}$$

where $\eta_{i,j}$ and $\nu_i$ are model parameters which represent the influence of the rate of QL-$j$ on that of QL-$i$. In this section, we present the Q layer rate modeling procedure in Sec. 4.3.1, investigate the R-D modeling of GOPs with the combined Q-T scalability in Sec. 4.3.2 and show the GOP R-D modeling results in Sec. 4.3.3.

4.3.1 Q Layer Rate Modeling

QL-1 Rate Modeling

An idealized QL-1 rate characteristic and its modeling procedure are illustrated in Fig. 4.6. Given the values of the rate points A, B and C, the QL-1 rate with arbitrary quantization choices for QL-0 and QL-1 (i.e., $q_0$ and $q_1$) can be specified by the following procedure.

1. By A and B: $R_1(q_1 + \Delta, q_1) = m_0 \cdot R_0(q_1 + \Delta) + n_1$, and
2. By B and C: $R_1(q_0, q_1) = (m_0 - m_1) \cdot R_0(q_1 + \Delta) + m_1 \cdot R_0(q_0) + n_1$.

By substituting the coefficients with $\eta_{1,j}$ and $\nu_1$, we have the QL-1 rate model conforming to Eq. (4.4):

$$R_1(q_0, q_1) = \eta_{1,0} \cdot R_0(q_0) + \eta_{1,1} \cdot R_0(q_1 + \Delta) + \nu_1, \tag{4.5}$$

where $\eta_{1,0} = m_1$, $\eta_{1,1} = m_0 - m_1$ and $\nu_1 = n_1$ are model parameters.

Figure 4.6: Illustration of QL-1 rate modeling. (x-axis: $R_0(q)$; y-axis: $R_1(q, q')$; the line $R_1(q, q - \Delta) = m_0 R_0(q) + n_1$ with slope $m_0$, the fixed-$q_1$ line with slope $m_1$, and the pivot points A, B and C.)
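The two-step construction of the QL-1 rate model can be sketched as follows. This is a minimal illustration under stated assumptions: each pivot is given as a measured pair (QL-0 rate, QL-1 rate), pivots A and B are taken to lie on the fixed-difference line, and pivots B and C on the fixed-QP1 line. The function names and the numeric pivots are hypothetical.

```python
def ql1_rate_model(pivot_a, pivot_b, pivot_c):
    """Derive QL-1 rate model parameters from three (R0, R1) pivots.

    pivot_a, pivot_b: points on the line with fixed QP0 - QP1 (slope m0).
    pivot_b, pivot_c: points on the line with fixed QP1 (slope m1).
    Returns (weight on R0(q0), weight on R0(q1 + delta), offset).
    """
    m0 = (pivot_a[1] - pivot_b[1]) / (pivot_a[0] - pivot_b[0])
    n1 = pivot_b[1] - m0 * pivot_b[0]          # intercept of the m0 line
    m1 = (pivot_c[1] - pivot_b[1]) / (pivot_c[0] - pivot_b[0])
    return m1, m0 - m1, n1


def predict_ql1_rate(params, r0_at_q0, r0_at_q1_plus_delta):
    """Evaluate the QL-1 rate as a weighted sum of base layer rates."""
    w0, w1, offset = params
    return w0 * r0_at_q0 + w1 * r0_at_q1_plus_delta + offset


# Hypothetical pivots in kbits: (QL-0 rate, QL-1 rate).
params = ql1_rate_model((40.0, 50.0), (20.0, 30.0), (40.0, 20.0))
print(params)
print(predict_ql1_rate(params, 40.0, 20.0))  # reproduces pivot C
```

Evaluating the model at the pivots themselves is a quick sanity check: the prediction at B and at C should return the measured QL-1 rates used to build the model.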
QL-2 Rate Modeling

For QL-2 rate modeling, we have three types of lines to reflect the three parameters $q_0$, $q_1$ and $q_2$, which determine $R_2$. Fig. 4.7 illustrates an idealized QL-2 rate ($R_2$) behavior with respect to $R_0$. The QL-2 rate with arbitrary quantization choices can be specified by the rate points A, B, C and D as depicted in Fig. 4.7. That is,

1. By A and B: $R_2(q_2 + 2\Delta, q_2 + \Delta, q_2) = m_0 \cdot R_0(q_2 + 2\Delta) + n_2$,
2. By B and C: $R_2(q_1 + \Delta, q_1, q_2) = (m_0 - m_1) \cdot R_0(q_2 + 2\Delta) + m_1 \cdot R_0(q_1 + \Delta) + n_2$, and
3. By C and D: $R_2(q_0, q_1, q_2) = (m_0 - m_1) \cdot R_0(q_2 + 2\Delta) + (m_1 - m_2) \cdot R_0(q_1 + \Delta) + m_2 \cdot R_0(q_0) + n_2$.

Finally, by substituting the coefficients with $\eta_{2,j}$ and $\nu_2$, we obtain the following QL-2 rate model:

$$R_2(q_0, q_1, q_2) = \eta_{2,0} \cdot R_0(q_0) + \eta_{2,1} \cdot R_0(q_1 + \Delta) + \eta_{2,2} \cdot R_0(q_2 + 2\Delta) + \nu_2, \tag{4.6}$$

where $\eta_{2,0} = m_2$, $\eta_{2,1} = m_1 - m_2$, $\eta_{2,2} = m_0 - m_1$ and $\nu_2 = n_2$ are model parameters. Eqs. (4.5) and (4.6) both conform to the Q layer rate model given in Eq. (4.4). Moreover, the complex multi-variable rate functions are decomposed into a linear sum of weighted single-variable functions.

Figure 4.7: Illustration of QL-2 rate modeling. (x-axis: $R_0(q)$; y-axis: $R_2(q, q', q'')$; the line $R_2(q, q - \Delta, q - 2\Delta) = m_0 R_0(q) + n_2$ with slope $m_0$, the lines with slopes $m_1$ and $m_2$, and the pivot points A, B, C and D.)

The above rate modeling process can be generalized to that of QL-$i$ ($i > 2$) given the $i+1$ slopes $m_0, \cdots, m_i$. That is, the model parameters of the QL-$i$ rate model in Eq. (4.4) are determined by

$$\eta_{i,0} = m_i; \quad \eta_{i,j} = m_{i-j} - m_{i-j+1}, \ \forall j \in \{1, \cdots, i\}; \quad \text{and} \quad \nu_i = n_i. \tag{4.7}$$

There is a one-to-one correspondence between the quantization step size ($q$) and the quantization parameter (QP). We use the same notation $\Delta$ above for simplicity. Rigorously speaking, it should be written as $\Delta_{QP}$ or $\Delta_q$, depending on the context.
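The mapping from slopes to model parameters in Eq. (4.7) can be sketched for an arbitrary Q layer as below. The function names are hypothetical and the slope values are made up for illustration; the base layer rates passed to `predict_rate` are assumed to be already evaluated at the shifted step sizes used in Eqs. (4.5) and (4.6).

```python
def rate_model_weights(slopes, intercept):
    """Map slopes [m_0, ..., m_i] and intercept n_i to QL-i model parameters.

    Returns (weights, offset), where weights[0] = m_i and
    weights[j] = m_{i-j} - m_{i-j+1} for j = 1, ..., i.
    """
    i = len(slopes) - 1
    weights = [slopes[i]]
    for j in range(1, i + 1):
        weights.append(slopes[i - j] - slopes[i - j + 1])
    return weights, intercept


def predict_rate(weights, offset, base_rates):
    """Weighted sum of base layer rates plus the offset.

    base_rates[k] is the base layer rate evaluated for Q layer k.
    """
    return sum(w * r for w, r in zip(weights, base_rates)) + offset


# Hypothetical slopes m_0, m_1, m_2 and intercept n_2 for QL-2.
weights, offset = rate_model_weights([2.0, 1.0, -0.5], 3.0)
print(weights, offset)
```

One property worth noting: the weights telescope, so they always sum to $m_0$. When all base layer rate arguments coincide (the fixed-difference setting), the model therefore collapses to the single line with slope $m_0$, which is a convenient consistency check on fitted slopes.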
4.3.2 GOP R-D Models with Combined Q-T Scalability

In this section, we investigate the R-D modeling of a GOP with combined Q-T scalability. The linear decomposition of dependent R-D characteristics allows the construction of the GOP-based R-D models, where the effect of the individual quantization choice for each coding unit can be explicitly considered. That is, we have

$$R_{GOP}(Q) = \sum_{i=0}^{N_Q - 1} \sum_{j=0}^{N_T - 1} R_{i,j} \quad \text{and} \quad D_{GOP}(Q) = \sum_{i=0}^{N_Q - 1} \sum_{j=0}^{N_T - 1} D_{i,j}, \tag{4.8}$$

where $R_{i,j}$ and $D_{i,j}$ are the rate and distortion of the $i$th QL and the $j$th TL of a GOP specified by a block $(i, j)$ in Fig. 4.1, $Q$ is the q-matrix formed by $q_{0,0}, \cdots, q_{i,j}$, and $N_Q$ and $N_T$ are the numbers of the Q and T layers, respectively. The R-D functions of a layer with index $(i, j)$ can be expressed by

$$R_{i,j}(Q_{i,j}) = R_{i,j}(\mathbf{q}_j) = \eta^j_{i,0} \cdot R_{0,j}(q_{0,j}) + \cdots + \eta^j_{i,i} \cdot R_{0,j}(q_{i,j}) + \nu^j_i = \sum_{k=0}^{i} \eta^j_{i,k} \cdot R_{0,j}(q_{k,j}) + \nu^j_i$$

and

$$D_{i,j}(Q_{i,j}) = D_{i,j}(\mathbf{q}^t_i) = \gamma^j_i \cdot D_{0,j}(q_{i,0}, \cdots, q_{i,j}) = \gamma_{i,j} \cdot \left( \tau_{j,0} \cdot D_{0,0}(q_{i,0}) + \cdots + \tau_{j,j} \cdot D_{0,0}(q_{i,j}) \right) = \gamma_{i,j} \sum_{k=0}^{j} \tau_{j,k} \cdot D_{0,0}(q_{i,k}), \tag{4.9}$$

where $i$ and $j$ refer to the quality and the temporal layers that the coding block belongs to, $Q_{i,j}$ is the matrix formed by $q_{0,0}, \cdots, q_{i,j}$, $\mathbf{q}_j$ and $\mathbf{q}^t_i$ are the $j$th column and the $i$th row of $Q_{i,j}$, and $\eta$, $\nu$, $\gamma$ and $\tau$ are model parameters. It is worthwhile to point out that the orthogonality of the R-D dependency of layer $(i, j)$ significantly reduces the number of variables in the first step of Eq. (4.9). This is possible since the rate and distortion dependencies are considered along the Q and the T layer directions, respectively.

By Eqs. (4.8) and (4.9), we have the final GOP R-D models expressed by

$$R_{GOP}(Q) = \sum_{j=0}^{N_T - 1} \sum_{i=0}^{N_Q - 1} \left( \sum_{k=0}^{i} \eta^j_{i,k} \cdot R_{0,j}(q_{k,j}) + \nu^j_i \right)$$

and

$$D_{GOP}(Q) = \sum_{i=0}^{N_Q - 1} \sum_{j=0}^{N_T - 1} \sum_{k=0}^{j} \gamma_{i,j} \cdot \tau_{j,k} \cdot D_{0,0}(q_{i,k}) = \sum_{i=0}^{N_Q - 1} \sum_{j=0}^{N_T - 1} \omega_{i,j} \cdot D_{0,0}(q_{i,j}), \tag{4.10}$$

where $\omega_{i,j} = \sum_{k=j}^{N_T - 1} \gamma_{i,k} \cdot \tau_{k,j}$ is the model parameter for the GOP distortion, which collects the contribution of $q_{i,j}$ to the distortion of every T layer $k \geq j$.
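The collapse of the triple sum in the GOP distortion model into one weight per coding unit can be checked numerically. The sketch below is an illustration only: it assumes the Q layer coefficients are stored as `q_coeff[i][j]` for unit (i, j), the T layer coefficients as `t_coeff[j][k]` for k <= j, and it gathers, for each unit (i, j), the contribution from every T layer k >= j that depends on it. All names and numbers are hypothetical.

```python
def gop_distortion_weights(q_coeff, t_coeff):
    """Collapse Q and T layer coefficients into one weight per unit (i, j).

    weight[i][j] = sum over k = j .. N_T-1 of q_coeff[i][k] * t_coeff[k][j]
    """
    n_q = len(q_coeff)
    n_t = len(t_coeff)
    return [
        [sum(q_coeff[i][k] * t_coeff[k][j] for k in range(j, n_t))
         for j in range(n_t)]
        for i in range(n_q)
    ]


def gop_distortion_nested(q_coeff, t_coeff, d00):
    """Triple-sum form; d00[i][k] is the base distortion at step size q_{i,k}."""
    return sum(
        q_coeff[i][j] * t_coeff[j][k] * d00[i][k]
        for i in range(len(q_coeff))
        for j in range(len(t_coeff))
        for k in range(j + 1)
    )


def gop_distortion_collapsed(weights, d00):
    """Double-sum form using the precomputed per-unit weights."""
    return sum(
        weights[i][j] * d00[i][j]
        for i in range(len(weights))
        for j in range(len(weights[i]))
    )


# Hypothetical 2 Q layers x 2 T layers example.
q_coeff = [[1, 2], [3, 4]]
t_coeff = [[1, 0], [2, 3]]     # t_coeff[j][k] is used only for k <= j
d00 = [[5, 6], [7, 8]]
weights = gop_distortion_weights(q_coeff, t_coeff)
print(weights)
print(gop_distortion_nested(q_coeff, t_coeff, d00),
      gop_distortion_collapsed(weights, d00))
```

Both forms give the same GOP distortion, so the per-unit weights can be computed once per GOP and reused while the quantization step sizes are being optimized.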
Please note that the superscript $j$ in the model coefficients is not an exponent but a T layer index.

4.3.3 Verification of Modeling Accuracy

The performance of the Q layer R-D models is verified against measured R-D values. We generated 105 QL-1 and 560 QL-2 R-D samples from 16 different test sequences of the QCIF and CIF formats. Figs. 4.8 and 4.9 show the modeling results, where the R-D values estimated with the models are plotted against the actual R-D values along the horizontal axis. Furthermore, numerical results are provided in Table 4.1, where the proposed rate model achieves an average accuracy of roughly 88% to 90% and the average MSE accuracy is greater than 90%.

Table 4.1: Q Layer R-D modeling results. All entries are accuracy values in %.

| Sequence | QCIF Rate QL-0 | QCIF Rate QL-1 | QCIF Distortion QL-0 | QCIF Distortion QL-1 | CIF Rate QL-0 | CIF Rate QL-1 | CIF Distortion QL-0 | CIF Distortion QL-1 |
|---|---|---|---|---|---|---|---|---|
| Bus | 90.19 | 87.40 | 97.33 | 95.39 | 90.31 | 88.54 | 97.17 | 95.71 |
| City | 89.68 | 87.22 | 97.30 | 95.90 | 88.56 | 87.03 | 96.41 | 94.18 |
| Coast | 87.12 | 84.69 | 96.90 | 94.64 | 88.40 | 86.05 | 90.65 | 91.01 |
| Crew | 86.36 | 84.72 | 96.80 | 96.14 | 86.00 | 87.24 | 90.99 | 89.44 |
| Football | 90.87 | 87.72 | 94.54 | 94.98 | 90.78 | 88.26 | 89.92 | 87.18 |
| Foreman | 87.44 | 85.67 | 97.07 | 96.19 | 85.53 | 84.69 | 92.89 | 85.68 |
| Hall | 91.90 | 91.15 | 95.01 | 95.00 | 90.02 | 88.94 | 87.59 | 91.96 |
| Harbour | 90.96 | 88.64 | 97.12 | 95.90 | 90.77 | 89.17 | 93.33 | 91.09 |
| Ice | 91.15 | 88.81 | 93.87 | 92.12 | 90.63 | 88.00 | 87.25 | 82.65 |
| Mobile | 90.13 | 88.28 | 95.70 | 94.57 | 92.78 | 89.67 | 96.80 | 95.30 |
| News | 90.75 | 89.86 | 94.69 | 96.43 | 92.41 | 91.26 | 86.38 | 82.72 |
| Paris | 91.70 | 88.34 | 95.00 | 96.78 | 92.50 | 90.02 | 96.48 | 91.72 |
| Silent | 89.23 | 86.57 | 97.58 | 96.70 | 86.37 | 82.93 | 94.89 | 94.28 |
| Soccer | 88.14 | 83.20 | 97.20 | 95.33 | 86.43 | 83.70 | 92.18 | 90.85 |
| Stefan | 90.65 | 89.51 | 95.34 | 95.48 | 92.18 | 91.27 | 90.99 | 88.99 |
| Tempete | 91.58 | 88.56 | 95.47 | 93.38 | 91.87 | 90.07 | 94.42 | 96.67 |
| Average | 89.87 | 87.52 | 96.06 | 95.31 | 89.66 | 87.93 | 92.40 | 90.59 |
The accuracy is computed from the average relative error with respect to the true R-D values; namely,

$$\text{accuracy} = \frac{1}{N_s} \sum_{i=1}^{N_s} \left( 1 - \frac{|s_i - \hat{s}_i|}{s_i} \right) \times 100,$$

where $N_s$ is the number of samples and $s_i$ and $\hat{s}_i$ are the actual and estimated values, respectively.

Figure 4.8: Illustration of the Q layer rate modeling results, where the x-axis is the true QL rate and the y-axis is the rate estimated by the proposed Q layer rate model. QCIF test sequences: (a) Bus, QL-1; (b) City, QL-1; (c) Coast, QL-1; (d) Crew, QL-1; (e) Football, QL-1; (f) Foreman, QL-2; (g) Hall, QL-2; (h) Harbour, QL-2; (i) Ice, QL-2; (j) Mobile, QL-2. CIF test sequences: (k) Harbour, QL-1; (l) Mobile, QL-1; (m) Paris, QL-1; (n) Soccer, QL-1; (o) Tempete, QL-1; (p) Hall, QL-2; (q) Ice, QL-2; (r) News, QL-2; (s) Silent, QL-2; (t) Stefan, QL-2.

Figure 4.9: Illustration of the Q layer distortion modeling results, where the x-axis is the actual QL MSE and the y-axis is the MSE estimated by the proposed Q layer distortion model. QCIF test sequences: (a) Crew, QL-1; (b) Foreman, QL-1; (c) Ice, QL-1; (d) Silent, QL-1; (e) Tempete, QL-1; (f) Football, QL-2; (g) Hall, QL-2; (h) Mobile, QL-2; (i) News, QL-2; (j) Stefan, QL-2. CIF test sequences: (k) Bus, QL-1; (l) Football, QL-1; (m) Harbour, QL-1; (n) Mobile, QL-1; (o) Soccer, QL-1; (p) City, QL-2; (q) Coast, QL-2; (r) Paris, QL-2; (s) Silent, QL-2; (t) Tempete, QL-2.

4.4 Joint Q-T Bit Allocation

4.4.1 Problem Formulation

The joint Q-T bit allocation problem is formulated as an optimal QP (equivalently, $q$) decision problem, which minimizes the GOP distortion under a target bit rate for each Q layer of a GOP. Each block in the Q-T plane (Fig. 4.1) is defined as a basic coding unit for the bit allocation. Mathematically, we have

$$Q^* = \arg\min_{Q \in \mathcal{Q}^{N_Q} \times \mathcal{Q}^{N_T}} D_{GOP}(Q) \quad \text{subject to} \quad R_0(\mathbf{q}_0) \leq R_{T,0},\ R_1(\mathbf{q}_1) \leq R_{T,1},\ \cdots,\ \text{and}\ R_{N_Q - 1}(\mathbf{q}_{N_Q - 1}) \leq R_{T,N_Q - 1}, \tag{4.11}$$

where $R_i(\mathbf{q}_i)$ is the rate of QL-$i$, $Q$ and $\mathbf{q}_i$ are the $N_Q \times N_T$ matrix and the $N_T \times 1$ vector whose elements are the quantization step sizes (i.e., $q$ values) of the participating coding units, $\mathcal{Q}$ is the space of all admissible quantization step sizes and $R_{T,i}$ is the target bit budget for QL-$i$. The Lagrangian formulation of the constrained problem in Eq. (4.11) leads to the following unconstrained optimization problem:

$$(Q^*, \Lambda^*) = \arg\min_{Q \in \mathcal{Q}^{N_Q} \times \mathcal{Q}^{N_T},\ \Lambda \in \mathbb{R}^{N_Q}} J(Q, \Lambda),$$
$$J(Q, \Lambda) = \sum_{i=0}^{N_Q - 1} \sum_{j=0}^{N_T - 1} D_{i,j} + \lambda_0 \left( \sum_{j=0}^{N_T - 1} R_{0,j} - R_{T,0} \right) + \cdots + \lambda_{N_Q - 1} \left( \sum_{j=0}^{N_T - 1} R_{N_Q - 1,j} - R_{T,N_Q - 1} \right), \tag{4.12}$$

where the $\lambda_i$'s are the Lagrange multipliers. With the proposed R-D models, the Lagrange cost function in Eq. (4.12) can be rewritten as

$$J(Q, \Lambda) = \sum_{i=0}^{N_Q - 1} \sum_{j=0}^{N_T - 1} \omega_{i,j} \cdot D_{0,0}(q_{i,j}) + \lambda_0 \cdot \left( \sum_{j=0}^{N_T - 1} R_{0,j}(q_{0,j}) - R_{T,0} \right) + \cdots + \lambda_{N_Q - 1} \cdot \left( \sum_{j=0}^{N_T - 1} \left( \sum_{k=0}^{N_Q - 1} \eta^j_{N_Q - 1,k} \cdot R_{0,j}(q_{k,j}) + \nu^j_{N_Q - 1} \right) - R_{T,N_Q - 1} \right). \tag{4.13}$$

Eq. (4.13) can be written as a closed-form expression using the frame-based R-D models given in [19]:

$$R(q) = a \cdot q^{-\alpha} \quad \text{and} \quad D(q) = b \cdot q^{\beta}, \tag{4.14}$$

where $a$, $b$, $\alpha$ and $\beta$ are model parameters. Finally, the optimization problem becomes:

$$J(Q, \Lambda) = \sum_{i=0}^{N_Q - 1} \sum_{j=0}^{N_T - 1} \omega_{i,j} \cdot b \cdot q_{i,j}^{\beta} + \lambda_0 \cdot \left( \sum_{j=0}^{N_T - 1} a_j \cdot q_{0,j}^{-\alpha_j} - R_{T,0} \right) + \lambda_1 \cdot \left( \sum_{j=0}^{N_T - 1} \left( \sum_{k=0}^{1} \eta^j_{1,k} \cdot a_j \cdot q_{k,j}^{-\alpha_j} + \nu^j_1 \right) - R_{T,1} \right) + \cdots + \lambda_{N_Q - 1} \cdot \left( \sum_{j=0}^{N_T - 1} \left( \sum_{k=0}^{N_Q - 1} \eta^j_{N_Q - 1,k} \cdot a_j \cdot q_{k,j}^{-\alpha_j} + \nu^j_{N_Q - 1} \right) - R_{T,N_Q - 1} \right), \tag{4.15}$$

where $a_j$ and $\alpha_j$ are the rate model parameters corresponding to TL-$j$. The indices $i$, $j$ and $k$ refer to the corresponding Q and T layers; they are not used for any mathematical operation such as an exponent.

4.4.2 Solution to Lagrangian

The complex GOP R-D functions are decomposed into a linear sum of simple single-variable functions by the proposed GOP R-D models. Hence, the optimization problem in Eq. (4.15) can be solved by computing the partial derivatives with respect to each variable $q_{i,j}$ and each multiplier $\lambda_i$, which results in a system of non-linear equations. Mathematically, we have

$$\frac{\partial J(Q, \Lambda)}{\partial q_{i,j}} = \omega_{i,j} \cdot b \cdot \beta \cdot q_{i,j}^{\beta - 1} - \sum_{k=i}^{N_Q - 1} \lambda_k \cdot \eta^j_{k,i} \cdot a_j \cdot \alpha_j \cdot q_{i,j}^{-\alpha_j - 1} = 0, \quad \text{and}$$
$$\frac{\partial J(Q, \Lambda)}{\partial \lambda_i} = \sum_{j=0}^{N_T - 1} \left( \sum_{k=0}^{i} \eta^j_{i,k} \cdot a_j \cdot q_{k,j}^{-\alpha_j} + \nu^j_i \right) - R_{T,i} = 0. \tag{4.16}$$

Since the numbers of variables and equations are both equal to $N_Q \times N_T + N_Q$, the system of non-linear equations in Eq. (4.16) is solvable and it is solved by the gradient method.

4.4.3 Proposed Bit Allocation Algorithm

The proposed bit allocation algorithm consists of three stages as shown in Fig. 4.10. In the first stage, we determine the model parameters and coefficients for the R-D modeling of a target GOP. In the second stage, the joint Q-T bit allocation is performed by solving the Lagrangian equation in Eq. (4.12) for QP values. Finally, in the third stage, the target GOP is encoded with the QP matrix determined in the second stage to produce an encoded bit stream at the target bit rate for each Q layer of the target GOP. Note that the bit allocation for each coding block in Fig. 4.1 is performed simultaneously based on the solution to the Lagrange equation in Eq. (4.12).

Figure 4.10: The block diagram of the proposed Q layer bit allocation algorithm. (Target GOP, then pre-encoding, which outputs coefficients and parameters; then rate allocation and QP decision, which outputs the coding parameter matrix QP; then encoding at $R_{T,i}$, which outputs the encoded GOP stream.)

The encoding procedure of the proposed algorithm is summarized below.

1. Initial bit allocation to Q layers of a GOP
Based on frame rate $F$ and channel rate $C_i$, we allocate the target bit budget $R_{T,i}$ to the $i$th Q layer of a GOP according to
$$R_{T,i} = N_{F,GOP} \cdot \frac{C_i}{F} + R_{0,i},$$
where $R_{0,i}$ compensates for errors in the previous GOP bit allocation and $N_{F,GOP}$ is the number of frames in a GOP.

2. Pre-encoding for model parameter decision
The target GOP is encoded to provide actual R-D values at pivots (points A, B, C and D in Figs. 4.6 and 4.7). If the numbers of Q and T layers of the target GOP are $N_Q$ and $N_T$, respectively, the required number of pre-encoding passes is $N_T + N_Q + 1$.

3. Joint Q-T layer rate allocation and QP decision
By numerically solving Eq. (4.16) with the gradient method, we can perform the joint Q-T bit allocation. The $R$-$q$ relation in Eq. (4.14) maps the number of allocated bits to the $q$ of each Q and T layer, which provides the encoder parameter $Q$ via the one-to-one correspondence between $q$ and QP.

4. Encoding at the target bit rate
The current GOP is encoded to produce the final bit stream at the target bit rates for each Q layer.

5. Iteration
Steps 1 through 4 are repeated for each following GOP until the end of the input video.

4.4.4 Experimental Results

We use 8 QCIF and CIF test sequences of different characteristics to study the performance of the proposed joint Q-T bit allocation scheme. The Fixed QP Encoder tool implemented in the JSVM reference software codec is used as the performance benchmark [5]. The target bit rates of global bit streams range from 72 to 540 kbps for QCIF test sequences and from 144 to 1,248 kbps for CIF test sequences, depending on sequence characteristics. Each GOP is structured to provide 3 QLs and 3 TLs, and every TL-0 is coded as a P-frame except for the 0th I-frame.

The Y-PSNR performance as a function of bit rate in full Q-T resolution is provided in Figs. 4.11 and 4.12, where the coding efficiency is averaged over 10 GOPs. Clearly, the proposed bit allocation scheme outperforms JSVM by a substantial margin in all cases. The successful modeling of R-D characteristics in the Q and T layers results in higher coding efficiency of the proposed bit allocation scheme over the JSVM. In the JSVM implementation, each Q layer is encoded one after another with the Fixed QP Encoder, where the interaction between Q layers is not considered at all.
Moreover, the QP of the T layer is determined by the hierarchy level of the target T layer using the QP cascading method [36]; namely,

$$QP_t = QP_{base} - 6.0 \cdot \log_2(SF_i) \quad \text{and} \quad QP_i = \min(51, \max(0, \text{round}(QP_t))),$$

where $i$ is the T layer index corresponding to the hierarchy level, $QP_{base}$ is the T layer 0 QP and $SF_i$ is the scaling factor for T layer $i$. With the QP cascading method, the QP difference between adjacent T layers is always set to 1 or 2. That is, the characteristics of input pictures are not considered in the T layer QP decision process, which results in relatively poor R-D performance.

Figure 4.11: Comparison of coding efficiency (Y-PSNR in dB vs. rate in kbps, Proposed vs. JSVM 9.12) for four QCIF test sequences: (a) Carphone, 72kbps - 240kbps; (b) City, 72kbps - 240kbps; (c) Foreman, 72kbps - 240kbps; (d) News, 72kbps - 240kbps.

Figure 4.12: Comparison of coding efficiency (Y-PSNR in dB vs. rate in kbps, Proposed vs. JSVM 9.12) for four CIF test sequences: (a) Crew, 240kbps - 1,248kbps; (b) Hall, 144kbps - 648kbps; (c) Soccer, 240kbps - 1,248kbps; (d) Tempete, 240kbps - 1,248kbps.
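The QP cascading rule quoted above is simple enough to express directly. The sketch below is a plain transcription of the two formulas, not the JSVM source code; the function name and the scaling factor values used in the example are hypothetical.

```python
import math


def cascaded_qp(qp_base, scaling_factor):
    """QP for a T layer from the base QP and that layer's scaling factor.

    Implements QP_t = QP_base - 6.0 * log2(SF) followed by rounding and
    clipping to the valid H.264 QP range [0, 51].
    Note: Python's round() rounds exact halves to the nearest even integer,
    which may differ from an encoder's rounding at exact .5 values.
    """
    qp_t = qp_base - 6.0 * math.log2(scaling_factor)
    return min(51, max(0, round(qp_t)))


# Hypothetical scaling factors: 1.0 leaves the QP unchanged and halving
# the scaling factor raises the QP by 6.
for sf in (1.0, 0.5, 0.25):
    print(sf, cascaded_qp(30, sf))
```

Because the result depends only on the base QP and a per-layer scaling factor, the rule cannot react to how strongly a particular sequence's T layers depend on each other, which is the limitation discussed above.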
The comparison of coding efficiency at each QL is provided in Table 4.3. We see that the proposed joint Q-T bit allocation scheme produces more R-D efficient bit streams at each Q layer than the JSVM 9.12 Fixed QP Encoder. The achieved coding gains in terms of the Y-PSNR value are about 0.80 dB and 0.50 dB for QCIF and CIF test sequences, respectively.

4.5 Conclusion

In this chapter, we studied the Q layer R-D characteristics and their modeling. We observed the following properties:
1. the orthogonality of R-D characteristics in the Q-T plane;
2. the independence of the Q layer distortion characteristics from its preceding Q layers;
3. the independence of the Q layer rate characteristics from the base Q layer.

The R-D orthogonality in the Q-T plane was used to simplify the joint T-Q bit allocation process. The Q layer rate characteristics were used to derive a Q layer dependent rate model. Following the Q layer R-D modeling, we derived GOP-based R-D models for Q-T scalable H.264/SVC videos.
Table 4.2: Experimental result: Global bit stream coding efficiency, 10 GOPs ($\Delta R = R_{prop} - R_{JSVM}$, $\Delta PSNR = PSNR_{prop} - PSNR_{JSVM}$)

| Format | Sequence | $R_T$ (kbps) | JSVM 9.12 Rate (kbps) | JSVM 9.12 Y-PSNR (dB) | Proposed Rate (kbps) | Proposed Y-PSNR (dB) | $\Delta R$ | $\Delta PSNR$ |
|---|---|---|---|---|---|---|---|---|
| QCIF | Hall | 72 | 70.89 | 36.43 | 66.17 | 38.38 | -4.72 | +1.95 |
| | | 96 | 98.37 | 38.15 | 88.29 | 39.54 | -10.08 | +1.39 |
| | | 144 | 145.89 | 40.09 | 136.18 | 40.89 | -9.71 | +0.80 |
| | | 240 | 265.57 | 41.96 | 238.16 | 42.14 | -27.41 | +0.18 |
| | Harbour | 180 | 188.65 | 30.71 | 185.32 | 30.97 | -3.33 | +0.26 |
| | | 240 | 237.25 | 32.03 | 245.04 | 32.25 | +7.79 | +0.22 |
| | | 360 | 374.47 | 33.87 | 368.93 | 34.19 | -5.54 | +0.32 |
| | | 540 | 556.96 | 36.32 | 548.18 | 36.35 | -8.78 | +0.03 |
| | Paris | 90 | 88.47 | 30.04 | 89.45 | 31.70 | +1.02 | +1.66 |
| | | 150 | 153.13 | 33.47 | 145.86 | 34.72 | -7.27 | +1.25 |
| | | 270 | 272.51 | 37.49 | 265.19 | 37.44 | -7.32 | -0.05 |
| | | 450 | 454.07 | 41.14 | 449.26 | 42.03 | -4.81 | +0.89 |
| | Soccer | 180 | 167.95 | 33.90 | 177.51 | 34.47 | +9.56 | +0.57 |
| | | 240 | 236.17 | 35.44 | 236.58 | 36.09 | +0.41 | +0.65 |
| | | 360 | 351.49 | 37.92 | 356.40 | 38.71 | +4.91 | +0.79 |
| | | 540 | 536.66 | 41.35 | 528.54 | 41.33 | -8.12 | -0.02 |
| CIF | City | 144 | 143.65 | 29.24 | 144.30 | 29.58 | +0.65 | +0.34 |
| | | 216 | 216.23 | 31.14 | 212.82 | 31.53 | -3.41 | +0.39 |
| | | 360 | 360.60 | 33.70 | 345.00 | 34.03 | -15.60 | +0.33 |
| | | 648 | 656.14 | 36.68 | 620.75 | 37.09 | -35.39 | +0.41 |
| | Foreman | 144 | 146.13 | 32.39 | 142.41 | 32.54 | -3.72 | +0.15 |
| | | 216 | 221.07 | 34.92 | 209.79 | 35.15 | -11.32 | +0.23 |
| | | 360 | 372.88 | 38.14 | 346.31 | 38.24 | -26.57 | +0.10 |
| | | 648 | 645.20 | 41.19 | 629.37 | 41.75 | -15.83 | +0.56 |
| | News | 144 | 134.36 | 35.39 | 138.65 | 35.60 | +4.29 | +0.21 |
| | | 216 | 215.93 | 37.52 | 204.39 | 37.74 | -11.54 | +0.22 |
| | | 360 | 367.83 | 40.24 | 338.35 | 40.40 | -29.48 | +0.16 |
| | | 648 | 639.70 | 43.13 | 628.68 | 43.41 | -11.02 | +0.28 |
| | Paris | 144 | 150.17 | 27.42 | 141.69 | 27.29 | -8.48 | -0.13 |
| | | 216 | 217.64 | 29.37 | 212.70 | 29.52 | -4.94 | +0.15 |
| | | 360 | 372.70 | 32.23 | 352.33 | 32.37 | -20.37 | +0.14 |
| | | 648 | 648.25 | 35.34 | 627.68 | 35.66 | -20.57 | +0.32 |

Table 4.3: Quality layer rates and Y-PSNR, 10 GOP average ($\Delta R = R_{prop} - R_{JSVM}$, $\Delta PSNR = PSNR_{prop} - PSNR_{JSVM}$)

| Format | Sequence | QL | $R_T$ (kbps) | JSVM 9.12 Rate (kbps) | JSVM 9.12 PSNR (dB) | Proposed Rate (kbps) | Proposed PSNR (dB) | $\Delta R$ | $\Delta PSNR$ |
|---|---|---|---|---|---|---|---|---|---|
| QCIF | News | 0 | 48 | 46.57 | 37.25 | 47.35 | 37.34 | +0.78 | +0.09 |
| | | 1 | 96 | 103.00 | 39.34 | 94.48 | 41.48 | -6.52 | +2.14 |
| | | 2 | 144 | 150.53 | 41.63 | 141.70 | 42.94 | -8.83 | +1.31 |
| | | Average | | · | 39.05 | · | 39.91 | · | +0.86 |
| | Soccer | 0 | 120 | 121.53 | 35.63 | 119.45 | 36.44 | -2.08 | +0.81 |
| | | 1 | 240 | 244.27 | 38.59 | 238.08 | 39.73 | -6.19 | +1.14 |
| | | 2 | 360 | 351.49 | 41.39 | 356.40 | 41.55 | +4.91 | +0.16 |
| | | Average | | · | 37.92 | · | 38.71 | · | +0.79 |
| CIF | Crew | 0 | 416 | 426.24 | 37.39 | 422.06 | 37.95 | -4.18 | +0.56 |
| | | 1 | 932 | 841.42 | 39.96 | 833.06 | 40.13 | -8.37 | +0.17 |
| | | 2 | 1248 | 1269.23 | 41.56 | 1236.49 | 41.31 | -32.84 | -0.28 |
| | | Average | | · | 39.23 | · | 39.57 | · | +0.34 |
| | Tempete | 0 | 224 | 206.34 | 30.80 | 230.42 | 31.63 | +24.06 | +0.82 |
| | | 1 | 448 | 425.78 | 32.62 | 457.88 | 33.13 | +22.10 | +0.51 |
| | | 2 | 672 | 652.97 | 33.91 | 678.73 | 34.06 | +25.76 | +0.15 |
| | | Average | | · | 32.26 | · | 32.82 | · | +0.56 |

With GOP-based R-D models, we examined the joint Q-T bit allocation problem with the objective of minimizing the GOP distortion under a target bit rate. The constrained optimization problem was converted to a Lagrange optimization problem, where a closed form expression can be derived using the proposed R-D models. The Lagrange optimization problem was solved numerically. Experimental results showed that the proposed R-D models yield a highly efficient bit stream, which outperforms the JSVM benchmark by a significant margin. Our contribution lies in the decomposition of multi-variable dependent R-D functions into a linear sum of independent single-variable R-D functions. As a result, the dependent bit allocation problem can be solved at lower complexity.

Similar to the T layer bit allocation algorithm in Chapter 3, the proposed bit allocation scheme demands a number of pre-encoding passes that is linearly proportional to the number of Q and T layers. Although the complexity is linear, it is still a substantial overhead for a practical encoder. For this reason, we need to develop an efficient model parameter prediction method. We expect that the pseudo-stationarity of video signals can be used in parameter prediction for the joint Q-T layer dependent R-D models.

In Chapters 3 and 4, we developed dependent R-D models to deal with the temporal and the quality scalability of H.264/SVC video.
Dependent R-D modeling for spatially scalable H.264/SVC video was treated in [26], where the spatial (S) layer R-D characteristics were investigated with the signal decomposition principle, as done in our Q layer R-D characteristics study. We expect that the combination of T, Q and S layer R-D models can be successfully employed for efficient bit allocation of H.264/SVC video with 3-D scalability.

Chapter 5
S-domain Analysis of Dependent R-D Characteristics

5.1 Introduction

We have demonstrated the self-domain (S-domain) analysis of dependent R-D characteristics and their modeling under T-Q scalability in Chapters 3 and 4. In this chapter, we provide an in-depth analysis of the R-D characteristics and the model parameters to build a solid understanding of the complex dependent R-D characteristics.

The S-domain analysis employs the R-D characteristics of the reference layer as the observation domain of those of dependent layers. The advantage of the S-domain analysis is that it enables the decomposition of dependent R-D functions into a linear sum of base layer functions evaluated at the quantization choices of participating layers. Then, the influence of the individual quantization choices for each layer can be clearly identified.

The S-domain R-D analysis results in greatly simplified dependent R-D models for scalable video. The benefit of having simple dependent R-D models was already verified in Chapters 3 and 4 with the model-based dependent bit allocation algorithms, which run with linear complexity whereas the conventional dependent bit allocations are of exponential complexity [32], [25].

The rest of this chapter is organized as follows. We first introduce four types of R-D characteristics in Sec. 5.2. The linear dependent R-D models based on these four types of R-D characteristics are introduced in Sec. 5.3 along with numerical verification and model parameter analysis. Finally, we give some concluding remarks in Sec. 5.4.
5.2 Four Types of Dependent Rate/Distortion Characteristics

The difficulty of modeling dependent rate and distortion characteristics comes from the fact that the influence of reference layers on the target coding unit has not yet been clearly understood. Generally, the rate and distortion functions of a dependent coding unit can be expressed in multi-variable functional form as $R_{i,j}(Q_{i,j})$ and $D_{i,j}(Q_{i,j})$ with

$$Q_{i,j} = \begin{bmatrix} q_{0,0} & q_{0,1} & \cdots & q_{0,j} \\ q_{1,0} & q_{1,1} & \cdots & q_{1,j} \\ \cdots & \cdots & \cdots & \cdots \\ q_{i,0} & q_{i,1} & \cdots & q_{i,j} \end{bmatrix}, \tag{5.1}$$

where $R_{i,j}$ and $D_{i,j}$ are the rate and the distortion of a coding unit at the $i$th temporal (T) layer and the $j$th quality (Q) layer, respectively, and $Q_{i,j}$ is the matrix of parameters (i.e., quantization step sizes).

We propose an S-domain analysis to understand the effect of quantization parameters on the rate and the distortion functions. The advantage of the S-domain analysis is that it enables the dependent rate or distortion function to be expressed by the rate or distortion function of the base layer. As a result, the number of variables in the dependent rate or distortion function as shown in Eq. (5.1) can be reduced to that of the base layer. By employing the rate or distortion function of the base layer as the observation domain, a multi-variable rate or distortion function can be simplified to a linear combination of single-variable rate or distortion functions.

Figure 5.1: The rate characteristics of a dependent T layer as a function of the rate of the base T layer: (a) Football and (b) Foreman. [Panels: (a) Football, QCIF; (b) Foreman, CIF. Each panel plots TL-1 rate (kbits) against TL-0 rate (kbits), with one curve for each of QP 27, 30, 33, 36 and 39.]

We classify the rate/distortion characteristics of dependent T/Q layers into the following four types.
• Type I: Rate characteristics of a dependent T layer
The typical rate characteristics of a dependent T layer are plotted as a function of those of the base layer in Fig. 5.1. We see clearly that the rate of the dependent T layer is actually independent of that of the base T layer. It can be written mathematically as

$$R_{i,j}(Q_{i,j}) \approx R_{i,j}(q_i^t), \tag{5.2}$$

where $q_i^t$ denotes the transposition of the $i$th row of $Q_{i,j}$. The number of variables in Eq. (5.1) is reduced from a matrix of parameters to a vector.

Figure 5.2: The distortion characteristics of a dependent T layer as a function of the distortion of the base T layer: (a) Football and (b) Foreman. [Panels: (a) Football, CIF, TL-2; (b) Foreman, QCIF, TL-2. Each panel plots TL-2 MSE against TL-0 MSE for three settings: QP0=QP1=QP2; QP0=QP1<=QP2 with fixed QP2; QP0<=QP1<=QP2 with fixed QP1 and QP2.]

• Type II: Distortion characteristics of a dependent T layer
The typical distortion characteristics of a dependent T layer are plotted as a function of those of the base T layer in Fig. 5.2. The dependency is more complicated, and its approximation and simplification are discussed in Sec. 3.2.

• Type III: Rate characteristics of a dependent Q layer
The typical rate characteristics of a dependent Q layer are plotted as a function of those of the base Q layer in Fig. 5.3. Their approximation and simplification are discussed in Sec. 4.2.

Figure 5.3: The rate characteristics of a dependent Q layer as a function of the rate of the base Q layer: (a) Soccer and (b) City. [Panels: (a) Soccer, CIF, QL-1 rate (kbits) against QL-0 rate (kbits), for fixed QP0-QP1 and fixed QP1; (b) City, QCIF, QL-2 rate (kbits) against QL-0 rate (kbits), for fixed QP1-QP2 and QP0-QP1, fixed QP2 and QP0-QP1, and fixed QP1.]
• Type IV: Distortion characteristics of a dependent Q layer
The typical distortion characteristics of a dependent Q layer are plotted as a function of those of the base Q layer in Fig. 5.4. Similarly to the type I T layer rate characteristics, the Q layer distortion is independent of that of the base layer, as demonstrated in Figs. 5.4(a) and 5.4(b), which plot $D_1(q_0, q_1)$ vs. $D_0(q_0)$. Then we have the first simplification of the Q layer distortion by

$$D_{i,j}(Q_{i,j}) \approx D_{i,j}(q_j), \tag{5.3}$$

where $q_j$ is the $j$th column of $Q_{i,j}$ that corresponds to the quantization choices of coding units of the $j$th Q layer. Next, we plot the QL-2 distortion ($D_2(q)$) with respect to the base layer (QL-0) distortion ($D_0(q)$) in Figs. 5.4(c) and 5.4(d) to further characterize the Q layer distortion behavior. We observe the following linear relationship:

$$D_{i,j}(q_j) \approx \sigma_i \cdot D_{i,0}(q_j), \tag{5.4}$$

where $\sigma_i$ is the model parameter. Finally, the type IV quality layer distortion can be characterized by

$$D_{i,j}(Q_{i,j}) \approx D_{i,j}(q_j) \approx \sigma_{i,j} \cdot D_{i,0}(q_j), \tag{5.5}$$

where $q_j$ is the $j$th column of $Q_{i,j}$ and $\sigma_{i,j}$ is a parameter representing the slopes of the lines in Figs. 5.4(c) and 5.4(d).

Figure 5.4: The distortion characteristics of a dependent Q layer as a function of the distortion of the base Q layer: (a) QCIF and (b) CIF. [Panels: (a) City, QCIF and (b) Soccer, CIF plot QL-1 MSE against QL-0 MSE, with one curve for each of QP 17, 21, 25, 29, 33 and 37; (c) QL-2 vs. QL-0, QCIF plots QL-2 MSE against QL-0 MSE for City, Foreman and Soccer; (d) QL-2 vs. QL-0, CIF plots the same for Foreman, News and Soccer.]

We could observe the duality between the R-D characteristics in the temporal and the quality scalability.
That is, the T layer rate (type I) and the Q layer distortion (type IV) are similar, while the T layer distortion (type II) and the Q layer rate (type III) are similar. The following section describes dependent R-D models based on the observations in this section.

5.3 Joint T-Q Rate/Distortion Models

5.3.1 Derivation of Joint T-Q Models

Based on the discussion in the last section, the dependent rate and distortion functions can be expressed as

$$f_i(q_0, \cdots, q_i) = c_0 \cdot f_0(q_0) + \cdots + c_i \cdot f_0(q_i) + c_{i+1}, \tag{5.6}$$

where the $c_i$'s are model coefficients, $f_i(\cdot)$ is the rate or the distortion function of a dependent layer $i$ and $f_0(\cdot)$ is the corresponding function of the base layer. The result in Eq. (5.6) is significant since we can convert a multi-variable rate or distortion function to a linear combination of single-variable rate or distortion functions of the base layer evaluated at parameters $q_0, \cdots, q_i$ separately. Clearly, the multi-variable function on the left-hand side of Eq. (5.6) is much more difficult to model than the single-variable function on the right-hand side. As shown in Eqs. (3.3), (3.4), (4.1) and (4.2), the model coefficients $c_i$ can be determined by running experiments at several selected values of $q_0, \cdots, q_i$.

Type II Distortion and Type III Rate Characteristics

We have one important observation that can be used to simplify R-D models of joint T-Q scalability. That is, the T layer rate (type I) and the Q layer distortion (type IV) are independent of the temporal and the spatial predictors, respectively. For this reason, the influences of prediction signals along the corresponding scalability directions can be safely ignored, as reflected in Eqs. (5.2) and (5.5).
Hence, the distortion characteristics of Type II and the rate characteristics of Type III can be written, respectively, as

$$\text{Type II:} \quad D_{i,j}(q_j^t) = \sum_{k=0}^{i} \theta_{i,k}^{j} \cdot D_{0,j}(q_{k,j}) = \sigma_i^j \sum_{k=0}^{i} \theta_{i,k}^{j} \cdot D_{0,0}(q_{k,j}), \tag{5.7}$$

$$\text{Type III:} \quad R_{i,j}(q_i) = \sum_{k=0}^{j} \phi_i^{j,k} \cdot R_{i,0}(q_{i,k}) + \gamma_i^j, \tag{5.8}$$

where $\theta_{i,0} = m_i$, $\theta_{i,k} = m_{i-k} - m_{i-k+1}$, $\phi_0 = m_j$ and $\phi_k = m_{j-k} - m_{j-k+1}$ for $k = 1, \cdots, i$, and $\gamma_i^j$ is the y-intercept of the line having $m_0$ as its slope, e.g., the line determined by the pivots A and B in Figs. 4.6 and 4.7.

GOP-based R/D Models of Q Layers

Next, we extend the T layer models in Eqs. (5.7) and (5.8) to derive a GOP-based distortion model of a Q layer that consists of a number of T layers in a GOP. This can be computed simply by adding the distortion of all participating T layers in a Q layer. Thus, we have

$$D_j(q_{0,j}, \cdots, q_{N_T-1,j}) = \sum_{i=0}^{N_T-1} \left( \sigma_i^j \sum_{k=0}^{i} \theta_{i,k}^{j} \cdot D_{0,0}(q_{k,j}) \right) = \sum_{i=0}^{N_T-1} \omega_{i,j} \cdot D_{0,0}(q_{i,j}), \tag{5.9}$$

where $N_T$ is the number of T layers and $\omega_{i,j} = \sigma_i^j \sum_{k=i}^{N_T-1} \theta_{k,i}^{j}$ is the model parameter of the Q layer distortion. Unlike the distortion, the rate of a Q layer is the sum of the rates of the current and its previous Q layers. Therefore, the rate of the $j$th Q layer can be computed by

$$R_j(q_{0,0}, \cdots, q_{N_T-1,j}) = \sum_{k=0}^{j} \sum_{i=0}^{N_T-1} R_{i,k}(q_i) = \sum_{k=0}^{j} \sum_{i=0}^{N_T-1} \left( \sum_{l=0}^{k} \phi_i^{k,l} \cdot R_{i,0}(q_{i,l}) + \gamma_i^k \right). \tag{5.10}$$

The verification of the joint T-Q layer R-D models is given in the following section.

5.3.2 Model Verification

In Secs. 3.3 and 4.3, the modeling results are provided for the T layer distortion (Type II) and the Q layer rate and distortion (Types III and IV) separately. In this section, we demonstrate the performance of joint T-Q layer R-D models that combine the four types of R-D models. Without loss of generality, we examine combined T-Q scalability with three Q layers and three T layers below.

Table 5.1: Modeling Accuracy (%) of Each Coding Block.
| Sequence | Format | Measure | T1Q0 | T2Q0 | T0Q1 | T1Q1 | T2Q1 | T0Q2 | T1Q2 | T2Q2 |
|---|---|---|---|---|---|---|---|---|---|---|
| City | QCIF | Rate | • | • | 85.18 | 90.94 | 83.67 | 81.26 | 85.72 | 83.76 |
| | | MSE | 97.14 | 96.32 | 95.33 | 96.56 | 93.40 | 86.07 | 86.30 | 85.60 |
| | CIF | Rate | • | • | 82.38 | 89.56 | 90.88 | 78.59 | 82.64 | 85.43 |
| | | MSE | 98.63 | 97.66 | 93.92 | 95.07 | 96.57 | 85.10 | 85.60 | 85.16 |
| Foreman | QCIF | Rate | • | • | 86.32 | 91.12 | 86.94 | 83.47 | 90.96 | 85.05 |
| | | MSE | 95.79 | 96.87 | 96.96 | 96.74 | 93.02 | 94.60 | 90.31 | 91.99 |
| | CIF | Rate | • | • | 80.63 | 89.61 | 87.24 | 81.97 | 89.96 | 92.73 |
| | | MSE | 96.11 | 97.73 | 97.03 | 92.19 | 96.60 | 93.40 | 92.74 | 93.56 |
| News | QCIF | Rate | • | • | 82.24 | 87.90 | 86.31 | 90.09 | 92.08 | 91.87 |
| | | MSE | 96.91 | 97.98 | 97.06 | 96.20 | 96.54 | 92.20 | 94.31 | 94.90 |
| | CIF | Rate | • | • | 80.21 | 78.34 | 90.44 | 92.27 | 72.91 | 80.94 |
| | | MSE | 98.34 | 99.23 | 97.84 | 97.41 | 98.17 | 95.34 | 93.94 | 93.64 |
| Soccer | QCIF | Rate | • | • | 87.50 | 85.21 | 83.88 | 82.55 | 89.82 | 89.95 |
| | | MSE | 96.65 | 97.22 | 94.86 | 95.28 | 92.67 | 90.19 | 94.59 | 90.79 |
| | CIF | Rate | • | • | 89.31 | 86.88 | 92.90 | 80.48 | 89.46 | 90.18 |
| | | MSE | 95.52 | 96.71 | 94.52 | 97.05 | 97.70 | 78.06 | 87.28 | 89.09 |
| Tempete | QCIF | Rate | • | • | 76.00 | 89.54 | 89.77 | 65.04 | 88.62 | 89.45 |
| | | MSE | 97.40 | 97.50 | 92.76 | 93.25 | 93.02 | 83.32 | 80.81 | 76.13 |
| | CIF | Rate | • | • | 76.96 | 86.82 | 87.32 | 61.02 | 82.93 | 88.83 |
| | | MSE | 95.78 | 95.85 | 95.38 | 88.92 | 87.39 | 84.50 | 75.23 | 65.50 |

First, the averaged modeling accuracy of all coding blocks is shown in Table 5.1, where TiQj indicates the $i$th T and the $j$th Q layer. The accuracy is computed by

$$\text{accuracy} = \frac{1}{N_s} \sum_{i=1}^{N_s} \left( 1 - \frac{|s_i - \hat{s}_i|}{s_i} \right) \times 100,$$

where $s_i$ and $\hat{s}_i$ are the original and the estimated R or D values, respectively. As shown in Table 5.1, the estimated dependent R and D values are quite accurate (mostly above 80%) for various test sequences of different types and formats. Next, we show the GOP-based Q layer R and D modeling accuracy in Table 5.2. The effectiveness and robustness of the proposed R and D models are well verified by Tables 5.1 and 5.2.

Table 5.2: Modeling Accuracy (%) of Quality Layers.
| Sequence | Format | QL-0 Rate | QL-0 MSE | QL-1 Rate | QL-1 MSE | QL-2 Rate | QL-2 MSE |
|---|---|---|---|---|---|---|---|
| City | QCIF | 94.44 | 97.56 | 91.16 | 95.71 | 92.42 | 88.41 |
| | CIF | 96.64 | 98.57 | 92.79 | 95.45 | 91.54 | 87.75 |
| Foreman | QCIF | 98.23 | 96.57 | 93.97 | 93.75 | 91.02 | 88.93 |
| | CIF | 97.16 | 97.76 | 89.95 | 96.57 | 88.65 | 94.55 |
| News | QCIF | 98.57 | 96.13 | 95.51 | 96.63 | 95.62 | 95.08 |
| | CIF | 97.87 | 99.30 | 95.98 | 98.05 | 96.60 | 95.32 |
| Soccer | QCIF | 99.12 | 96.28 | 92.42 | 91.39 | 90.24 | 92.10 |
| | CIF | 97.40 | 96.53 | 91.96 | 97.56 | 93.22 | 89.27 |
| Tempete | QCIF | 99.06 | 98.89 | 84.71 | 94.51 | 89.86 | 79.62 |
| | CIF | 99.00 | 97.30 | 86.30 | 89.13 | 77.58 | 71.44 |

5.3.3 Analysis of Model Parameters

Recall that we employ the R and D functions of the prediction signal as the basis functions for the analysis of the R-D characteristics of dependent layers, and decompose the dependent R-D functions into a weighted linear sum of the basis functions evaluated at the quantization step sizes of participating layers. We attempt to explain the physical meanings of model parameters for Types II and III in this subsection.

Generally speaking, weights in a linear decomposition reflect the contribution of the participating function to the decomposed entity. Therefore, the model parameters represent the influence of the quantization choice of each layer in the proposed R and D models.

T Layer Distortion (Type II) Model Parameters

Ramchandran et al. [32] observed a monotonicity property in video coding, which says that "higher fidelity references lead to higher coding efficiency of inter-coded frames". The monotonicity is widely accepted as a good heuristic for the QP decision of dependent frames. However, there is still no clear answer to the following question: "how good does the reference quality have to be given a target bit rate for a number of frames?". To answer this question, we may examine the model parameters of the proposed dependent distortion model more carefully.

The dependent distortion model is a linear decomposition as given in Eqs. (5.7) and (5.9). As a result, the model parameters ($\theta$'s and $\omega$'s) directly quantify the influence of the distortion of participating layers on that of the ultimate coded frames. Moreover, since the contributing function in each term is evaluated at the quantization step size of the corresponding T layer, the influence of the quantization choice is also taken into account by the distortion model in Eqs. (5.7) and (5.9).

We list the distortion model parameters of four test sequences in Table 5.3. Parameters $\theta_i$ ($i = 0, 1, 2, 3$) and $\omega$ indicate how the distortions of a T layer and a GOP are constituted by the quantization choices of participating T layers, respectively.

Table 5.3: T layer distortion model parameters.

| Sequence | TL | QCIF $\theta_0$ | QCIF $\theta_1$ | QCIF $\theta_2$ | QCIF $\theta_3$ | QCIF $\sum$ | CIF $\theta_0$ | CIF $\theta_1$ | CIF $\theta_2$ | CIF $\theta_3$ | CIF $\sum$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| City | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| | 1 | 0.836 | 0.150 | 0 | 0 | 0.986 | 0.822 | 0.153 | 0 | 0 | 0.978 |
| | 2 | 1.744 | 0.031 | 0.245 | 0 | 2.020 | 1.677 | 0.030 | 0.257 | 0 | 1.964 |
| | 3 | 3.418 | 0.090 | 0.160 | 0.341 | 4.009 | 3.403 | 0.117 | 0.057 | 0.449 | 4.026 |
| | $\omega$ | 6.998 | 0.271 | 0.405 | 0.341 | 8.015 | 6.902 | 0.300 | 0.314 | 0.449 | 7.965 |
| Football | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| | 1 | 0.350 | 0.827 | 0 | 0 | 1.177 | 0.411 | 0.754 | 0 | 0 | 1.165 |
| | 2 | 0.956 | 0.180 | 1.098 | 0 | 2.234 | 0.972 | 0.224 | 1.078 | 0 | 2.274 |
| | 3 | 2.012 | 0.626 | 0.319 | 1.341 | 4.298 | 2.106 | 0.553 | 0.327 | 1.378 | 4.364 |
| | $\omega$ | 4.318 | 1.633 | 1.417 | 1.341 | 8.709 | 4.489 | 1.531 | 1.405 | 1.378 | 8.803 |
| Hall | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| | 1 | 0.985 | 0.015 | 0 | 0 | 1.000 | 0.924 | 0.082 | 0 | 0 | 1.006 |
| | 2 | 1.979 | 0.003 | 0.032 | 0 | 2.014 | 1.837 | 0.007 | 0.163 | 0 | 2.007 |
| | 3 | 3.948 | 0.010 | 0.010 | 0.047 | 4.015 | 3.679 | 0.033 | 0.027 | 0.291 | 4.030 |
| | $\omega$ | 7.912 | 0.028 | 0.042 | 0.047 | 8.029 | 7.440 | 0.122 | 0.190 | 0.291 | 8.043 |
| Soccer | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| | 1 | 0.495 | 0.547 | 0 | 0 | 1.042 | 0.642 | 0.466 | 0 | 0 | 1.108 |
| | 2 | 1.109 | 0.175 | 0.808 | 0 | 2.092 | 1.314 | 0.163 | 0.654 | 0 | 2.131 |
| | 3 | 2.322 | 0.398 | 0.366 | 0.899 | 3.985 | 2.664 | 0.377 | 0.186 | 0.976 | 4.203 |
| | $\omega$ | 4.926 | 1.120 | 1.174 | 0.899 | 8.119 | 5.620 | 1.006 | 0.840 | 0.976 | 8.442 |

We have the following observations from the data in this table.
• If the temporal correlation of a sequence is higher (i.e., video of slow motion), the contribution of higher T layers becomes less important.
In the four test video sequences, City and Hall have a higher temporal correlation (slower motion) while Football and Soccer have a lower temporal correlation (faster motion). The above trend is clearly visible in Table 5.3.
• The sum of T layer model parameters corresponds to the number of the T layer frames. Similarly, the sum of the GOP model parameters is approximately equal to the number of frames in a GOP. This observation indicates that the model parameters are properly normalized weights to the distortion values of participating T layers.

Q Layer Rate (Type III) Model Parameters

Since the rate of a T layer with multiple Q layers is the sum of the rates of all participating Q layers, the rate of a Q layer can be computed by subtracting the rates of all other Q layers from the T layer rate. Consider the case of two Q layers. The rate of the $i$th T layer can be written as

$$R_i(q_{i,0}, q_{i,1}) = R_{i,0}(q_{i,0}) + R_{i,1}(q_{i,0}, q_{i,1}) = (1 + \phi_i^{1,0}) \cdot R_{i,0}(q_{i,0}) + \phi_i^{1,1} \cdot R_{i,0}(q_{i,1} + \Delta) + \gamma_i^1, \tag{5.11}$$

where the second equality is derived by incorporating the QL-1 rate model. We can derive from Eq. (5.11) the overhead of having two Q layers over a single Q layer coded with the same quantization choice $q_{i,1}$. That is, the difference between $R_i(q_{i,0}, q_{i,1})$ and $R_{i,0}(q_{i,1})$ is the rate overhead. The above statement holds under the assumption that both coding configurations have about the same distortion. Based on Eq. (5.11), we can derive the rate of the $i$th T layer with multiple Q layers as

$$R_i(q_{i,0}, \cdots, q_{i,L}) = \sum_{j=0}^{L} \left( \sum_{k=j}^{L} \phi_i^{k,j} \cdot R_{i,0}(q_{i,j} + j \cdot \Delta) + \gamma_i^j \right). \tag{5.12}$$

Model parameters of intra- and inter-coded frames of five test sequences are shown in Table 5.4. We have the following observations.
• For intra-coded frames, model parameters $\phi_i$ are similar regardless of the characteristics and formats of test sequences.
• For inter-coded frames, we may cluster model parameters into two groups depending on the underlying motion characteristics of video sequences.
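The value $\phi_j$ fluctuates more for faster motion sequences.
• $R_{i,j}$ is mainly contributed by layers $j$ and $j-1$. The quantization step size gets finer along with the index of quality enhancement layers, which explains the dependence on layer $j$. Besides, the residual to be coded at QL-$j$ is what remains after the input frame is coded at $q_{i,j-1}$.

Table 5.4: Q Layer Rate Model Parameters.

Intra-coded Frames

| Sequence | QL | QCIF $\phi_0$ | QCIF $\phi_1$ | QCIF $\phi_2$ | QCIF $\gamma$ | CIF $\phi_0$ | CIF $\phi_1$ | CIF $\phi_2$ | CIF $\gamma$ |
|---|---|---|---|---|---|---|---|---|---|
| City | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| | 1 | -0.96 | 1.16 | 0 | 1455.90 | -0.96 | 1.15 | 0 | 4709.13 |
| | 2 | 0.01 | -1.09 | 1.32 | 2115.37 | 0.00 | -1.09 | 1.32 | 6749.59 |
| Football | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| | 1 | -0.95 | 1.12 | 0 | 2116.38 | -0.98 | 1.15 | 0 | 5688.02 |
| | 2 | -0.02 | -1.06 | 1.28 | 2483.79 | -0.03 | -1.07 | 1.32 | 7016.81 |
| Foreman | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| | 1 | -0.98 | 1.19 | 0 | 897.04 | -1.01 | 1.24 | 0 | 1712.63 |
| | 2 | 0.01 | -1.10 | 1.37 | 1421.92 | 0.03 | -1.14 | 1.45 | 2061.73 |
| Hall | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| | 1 | -0.99 | 1.22 | 0 | 1085.71 | -0.97 | 1.26 | 0 | 2205.69 |
| | 2 | 0.02 | -1.15 | 1.45 | 1283.27 | 0.02 | -1.15 | 1.53 | 2436.19 |
| Mobile | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| | 1 | -0.97 | 1.10 | 0 | 3636.93 | -0.98 | 1.11 | 0 | 12012.30 |
| | 2 | -0.02 | -1.06 | 1.22 | 4100.82 | -0.02 | -1.08 | 1.25 | 13301.50 |

Inter-coded Frames

| Sequence | QL | QCIF $\phi_0$ | QCIF $\phi_1$ | QCIF $\phi_2$ | QCIF $\gamma$ | CIF $\phi_0$ | CIF $\phi_1$ | CIF $\phi_2$ | CIF $\gamma$ |
|---|---|---|---|---|---|---|---|---|---|
| City | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| | 1 | -0.71 | 1.18 | 0 | 421.97 | -0.74 | 1.21 | 0 | 899.32 |
| | 2 | 0.06 | -0.84 | 1.45 | 432.98 | 0.05 | -0.89 | 1.47 | 1369.86 |
| Football | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| | 1 | -0.91 | 1.14 | 0 | 2066.66 | -0.92 | 1.17 | 0 | 3585.22 |
| | 2 | -0.02 | -0.99 | 1.28 | 2110.55 | 0.00 | -1.01 | 1.35 | 4672.5 |
| Foreman | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| | 1 | -0.85 | 1.21 | 0 | 388.53 | -0.90 | 1.25 | 0 | 551.35 |
| | 2 | 0.08 | -0.97 | 1.42 | 522.92 | 0.09 | -1.03 | 1.51 | 442.26 |
| Hall | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| | 1 | -0.48 | 1.42 | 0 | 46.87 | -0.63 | 1.38 | 0 | -89.91 |
| | 2 | 0.08 | -0.68 | 1.99 | -85.38 | 0.08 | -0.84 | 1.78 | -526.67 |
| Mobile | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| | 1 | -0.81 | 1.17 | 0 | 1119.42 | -0.83 | 1.17 | 0 | 1976.70 |
| | 2 | 0.02 | -0.93 | 1.38 | 1043.56 | 0.01 | -0.92 | 1.35 | 3383.22 |

5.4 Conclusion

In this chapter, we investigated the dependent R and D characteristics with combined T-Q scalability based on the S-domain analysis and verified the models by numerical experiments. Besides, we tried to link these model parameters with the characteristics of the underlying video sequences.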
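As a small numerical illustration of the link between T layer parameters and GOP parameters, the sketch below recomputes the row sums and the GOP-level weights (the last row of each block in Table 5.3) from the City (QCIF) entries. The weight values are copied from Table 5.3; the function names are hypothetical, and the sketch assumes the base Q layer, i.e., no additional Q layer scaling of the distortion.

```python
# T layer distortion weights for City (QCIF), copied from Table 5.3.
# Row i holds the weights that TL-i places on the base-layer distortion
# evaluated at the quantization step sizes of TL-0 ... TL-3.
CITY_QCIF_WEIGHTS = [
    [1.000, 0.000, 0.000, 0.000],
    [0.836, 0.150, 0.000, 0.000],
    [1.744, 0.031, 0.245, 0.000],
    [3.418, 0.090, 0.160, 0.341],
]


def layer_sums(weights):
    """Sum of the weights of each T layer (one value per table row)."""
    return [sum(row) for row in weights]


def gop_weights(weights):
    """GOP-level weight of each quantization choice (column sums of the table)."""
    n = len(weights)
    return [sum(weights[i][k] for i in range(n)) for k in range(n)]


def gop_distortion(weights, base_distortion, q_steps):
    """GOP distortion as a weighted sum of base-layer distortions, as in Eq. (5.9).

    base_distortion is any callable D_0(q); it is an assumption of this sketch.
    """
    omega = gop_weights(weights)
    return sum(w * base_distortion(q) for w, q in zip(omega, q_steps))
```

The row sums come out close to 1, 1, 2 and 4, i.e., the number of frames per T layer in a GOP of 8 frames, and the column sums reproduce the GOP-level row of the table, whose total is close to 8.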
In the next chapter, we will propose a simplified rate control algorithm for hierarchical B-pictures based on the results given in this chapter.

Chapter 6
Simplified Temporal Layer Bit Allocation for Hierarchical B-pictures

6.1 Introduction

In this chapter, we consider a simplified rate control algorithm for hierarchical B-pictures of H.264/SVC. As shown in Fig. 6.1, a GOP is composed of hierarchically aligned B-frames with the I or the P frames at the top of the hierarchy. With the hierarchical GOP structure, bit allocation among T layers is critical to the efficiency of video coding. We already examined the T layer bit allocation problem in Chapter 3, where we used the dependent distortion model to handle inter-layer dependency among T layers. However, the complexity of the bit allocation algorithm is high since the determination of the model parameters demands the coding of the same GOP several times with different quantization step sizes.

A simplified rate control algorithm is proposed in this chapter to reduce the complexity. Specifically, we would like to have a better understanding of the model parameters and study their sensitivity with respect to different video sequences. The proposed rate control algorithm consists of two stages. First, the QP of TL-0 key frames is determined by the GOP-based rate model. Then, the QPs of the remaining dependent T layers are adaptively determined by considering inter-layer dependency within a GOP.

Figure 6.1: Illustration of a GOP of H.264/SVC, which consists of five temporal layers, where the TL-0 key frame is an I- or P-type frame while TL-1, ···, TL-4 are formed by hierarchical B-pictures.

The rest of the chapter is organized as follows. We first formulate the T layer bit allocation problem and discuss the solution framework in Sec. 6.2. Then, we present methods to determine the QP values of the key frame and the dependent T layers in Secs. 6.3.1 and 6.3.2, respectively.
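Experimental results are shown in Sec. 6.4 and concluding remarks are given in Sec. 6.5.

6.2 Problem Statement

The T layer bit allocation algorithm was already studied in Chapter 3, where the problem is formulated as the GOP distortion minimization problem by

$$J(q^*, \lambda^*) = \arg\min_{q \in Q^{N_T}} J(q, \lambda) = \arg\min_{q \in Q^{N_T}} \left( \sum_{i=0}^{N_T-1} \omega_i \cdot b_0 \cdot q_i^{\beta_0} + \lambda \cdot \left( \sum_{i=0}^{N_T-1} a_i \cdot q_i^{-\alpha_i} - R_T \right) \right), \tag{6.1}$$

where $a_i$, $\alpha_i$, $b_0$, $\beta_0$, and $\omega_i$ are model parameters, $R_T$ is the target bit rate, $q^*$ is the optimal solution and $\lambda^*$ is the Lagrange multiplier at the optimality. The solution to Eq. (6.1) can be acquired by differentiating the cost function with respect to the variables $q_i$ and $\lambda$ and setting the derivatives to zero. Then, we have $N_T + 1$ equations:

$$\omega_i \cdot b_0 \cdot \beta_0 \cdot q_i^{\beta_0 - 1} - \lambda \cdot a_i \cdot \alpha_i \cdot q_i^{-\alpha_i - 1} = 0, \quad i = 0, \cdots, N_T - 1, \tag{6.2}$$

$$a_0 \cdot q_0^{-\alpha_0} + \cdots + a_{N_T-1} \cdot q_{N_T-1}^{-\alpha_{N_T-1}} - R_T = 0. \tag{6.3}$$

We would like to solve the above system of equations for the QPs (or $q_i$'s) so as to minimize the distortion function. We can compute the relationship between $q_0$ and $q_i$ with Eq. (6.2); namely,

$$q_i = \left[ \frac{\omega_0}{\omega_i} \cdot \frac{a_i \cdot \alpha_i}{a_0 \cdot \alpha_0} \right]^{\frac{1}{\beta_0 + \alpha_i}} \cdot q_0^{\frac{\beta_0 + \alpha_0}{\beta_0 + \alpha_i}}, \quad \text{where } i = 1, \cdots, N_T - 1. \tag{6.4}$$

Based on the approximated q-QP relation

$$q = 0.6267 \cdot e^{0.1155 \cdot QP},$$

we can rewrite Eq. (6.4) in terms of QP as

$$QP_i = C_{i,0} \cdot QP_0 + C_{i,1}, \tag{6.5}$$

where

$$C_{i,0} = \frac{\beta_0 + \alpha_0}{\beta_0 + \alpha_i}, \qquad C_{i,1} = \frac{2000}{231} \cdot \left[ \frac{1}{\beta_0 + \alpha_i} \cdot \ln\left( \frac{\omega_0}{\omega_i} \cdot \frac{a_i \cdot \alpha_i}{a_0 \cdot \alpha_0} \right) + \frac{\alpha_0 - \alpha_i}{\beta_0 + \alpha_i} \cdot \ln 0.6267 \right].$$

Here the factor $2000/231 = 1/0.1155$ comes from inverting the q-QP relation. The average values of the coefficients in Eq. (6.5) for several test sequences with different GOP structures are shown in Table 6.1, where GOP-4 and GOP-8 refer to a GOP consisting of 4 and 8 frames, respectively.

Table 6.1: Coefficients for the QP decision

| Sequence | Coefficient | GOP-4 TL-0 | GOP-4 TL-1 | GOP-8 TL-0 | GOP-8 TL-1 | GOP-8 TL-2 |
|---|---|---|---|---|---|---|
| Foreman | C0 | 1.01 | 0.97 | 0.99 | 0.98 | 0.95 |
| | C1 | 1.10 | 3.99 | 0.80 | 3.61 | 6.28 |
| Hall | C0 | 0.98 | 0.96 | 0.97 | 0.94 | 0.92 |
| | C1 | 2.58 | 6.38 | 2.49 | 5.76 | 9.47 |
| Soccer | C0 | 1.00 | 0.99 | 1.00 | 0.99 | 0.97 |
| | C1 | 0.37 | 2.81 | -0.40 | 2.28 | 4.60 |
| Tempete | C0 | 0.97 | 0.98 | 0.98 | 0.94 | 0.95 |
| | C1 | 2.10 | 4.26 | 3.12 | 3.47 | 4.48 |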
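The step from Eq. (6.4) to the linear QP relation in Eq. (6.5) can be checked numerically. The sketch below computes the two coefficients from a set of model parameters and compares the resulting QP with the one obtained by evaluating Eq. (6.4) directly and converting back to QP. The constant term is divided by 0.1155, as follows from inverting the q-QP relation. All parameter values are hypothetical and chosen only to exercise the formulas; the function names are assumptions of this sketch.

```python
import math

# Approximated q-QP relation from the text: q = 0.6267 * exp(0.1155 * QP).
Q_SCALE = 0.6267
Q_RATE = 0.1155


def qp_to_q(qp):
    """Quantization step size for a given QP."""
    return Q_SCALE * math.exp(Q_RATE * qp)


def q_to_qp(q):
    """Inverse of the q-QP relation."""
    return math.log(q / Q_SCALE) / Q_RATE


def layer_q_from_base(q0, omega0, omega_i, a0, alpha0, a_i, alpha_i, beta0):
    """Eq. (6.4): step size of T layer i implied by the base-layer step size q0."""
    denom = beta0 + alpha_i
    ratio = (omega0 / omega_i) * (a_i * alpha_i) / (a0 * alpha0)
    return ratio ** (1.0 / denom) * q0 ** ((beta0 + alpha0) / denom)


def qp_coefficients(omega0, omega_i, a0, alpha0, a_i, alpha_i, beta0):
    """Coefficients (C_i0, C_i1) of QP_i = C_i0 * QP_0 + C_i1, as in Eq. (6.5)."""
    denom = beta0 + alpha_i
    ratio = (omega0 / omega_i) * (a_i * alpha_i) / (a0 * alpha0)
    c0 = (beta0 + alpha0) / denom
    c1 = (math.log(ratio) / denom
          + (alpha0 - alpha_i) / denom * math.log(Q_SCALE)) / Q_RATE
    return c0, c1


# Hypothetical model parameters (not taken from the experiments).
PARAMS = dict(omega0=7.0, omega_i=0.3, a0=5000.0, alpha0=1.0,
              a_i=3000.0, alpha_i=1.2, beta0=1.5)

c0, c1 = qp_coefficients(**PARAMS)
qp0 = 30.0
qp_linear = c0 * qp0 + c1
qp_direct = q_to_qp(layer_q_from_base(qp_to_q(qp0), **PARAMS))
```

Under these assumptions `qp_linear` and `qp_direct` agree up to floating-point error, and the first coefficient equals one whenever the rate exponents of the two layers coincide.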
We observe from Table 6.1 that the value of $C_{i,0}$ is quite stable (i.e., close to one) while $C_{i,1}$ varies with the characteristics of the underlying video sequences. Generally speaking, $C_{i,1}$ is larger for simple sequences such as Hall and Tempete, which have stronger inter-layer dependency. Actually, by approximating $C_{i,0}$ with one in Eq. (6.5), we can interpret $C_{i,1}$ as the QP difference between $QP_i$ and $QP_0$. Although we have a closed form expression for $QP_i$ in Eq. (6.5), it is a function of model parameters. It is not convenient to re-compute model parameters for every GOP, which demands the coding of the same GOP several times using different quantization step sizes.

6.3 One-Pass Bit Allocation Algorithm

Generally speaking, a good rate control algorithm should satisfy two requirements: 1) accurate rate control to meet the target bit rate and 2) R-D optimal bit allocation among coding units. We address these two issues separately in the following two subsections.

6.3.1 GOP Rate Modeling

The rate of a video coding unit is often expressed as a function of the quantization step size. One common approach in rate modeling is to examine the statistical characteristics of transform coefficients. That is, we can study the histogram of transform coefficients of macroblocks (MBs) and fit it with a certain probability distribution. For example, the quadratic rate model is a direct consequence of the Laplacian distribution of source statistics [11]. More recently, Kamaci et al. [19] proposed another frame-based R-D model

$$R(q) = a \cdot q^{-\alpha}, \tag{6.6}$$

where $a$ and $\alpha$ are model parameters, under the Cauchy distribution assumption of transform coefficients.

Although the rate model in Eq. (6.6) was derived for a single frame, it can be easily extended to a GOP. That is, the GOP rate model should be consistent with the frame rate model since a GOP consists of multiple frames whose MB transform coefficients should follow the same statistics.
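A rate model of the form in Eq. (6.6) becomes a straight line in the log-log domain, so its two parameters can be estimated from a few (step size, rate) observations by ordinary least squares. The sketch below shows one way to do this and to invert the model for a target rate. It is a minimal illustration, not the procedure used in the experiments; the function names and the sample values in the usage note are assumptions.

```python
import math


def fit_rate_model(q_steps, rates):
    """Least-squares fit of R(q) = a * q**(-alpha) in the log-log domain.

    Returns (a, alpha). Requires at least two distinct step sizes and
    strictly positive rates.
    """
    if len(q_steps) != len(rates) or len(q_steps) < 2:
        raise ValueError("need at least two (q, rate) pairs")
    xs = [math.log(q) for q in q_steps]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("step sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # log R = log a - alpha * log q
    alpha = -slope
    a = math.exp(mean_y - slope * mean_x)
    return a, alpha


def predict_rate(a, alpha, q):
    """Rate predicted by the model at step size q."""
    return a * q ** (-alpha)


def q_for_target_rate(a, alpha, target_rate):
    """Step size at which the model meets the target rate."""
    return (a / target_rate) ** (1.0 / alpha)
```

For instance, fitting observations generated from assumed parameters a = 400 and alpha = 1.3 recovers those values, and `q_for_target_rate` then gives the step size that the fitted model associates with a given rate budget.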
The GOP rate was modeled as an inverse relation with the GOP average quantization step size based on the same idea in [23]. In Figs. 6.2 and 6.3, we show the GOP rate characteristics with respect to the quantization step size of the TL-0 key frame, where GOP 0n denotes the nth GOP. Our conjecture that the GOP rate characteristics can be modeled by the same statistical source as the frame is supported by these figures. Moreover, we observe that the rate characteristics of different GOPs from the same sequence are very close to each other. Thus, the QP of the TL-0 key frame in a target GOP can be predicted from the GOP rate model constructed from the rate statistics of previous GOPs.

6.3.2 T Layer QP Decision

The analysis in Sec. 6.2 provides a guideline for the T layer QP decision to achieve high coding efficiency. Based on the discussion in Sec. 6.2, we set $C_{i,0}$ to one and view $C_{i,1}$ as the QP difference ($\Delta QP_i$) between TL-i and TL-0. In this section, we present a method that determines $\Delta QP_i$ adaptively according to the characteristics of the input video.

First, we use the number of skipped MBs as a measure of inter-layer dependency. Fig. 6.4 shows $\Delta QP_i$ as a function of the number of skipped MBs, where $\Delta QP_i$ is acquired during the bit allocation process as proposed in Chapter 3 and the number of skipped MBs is normalized by the total number of MBs in a frame.
We see that $\Delta QP_i$ is roughly proportional to the number of skipped MBs, which can be written as

$$\Delta QP = C \cdot \frac{N_{skip}}{N_{MB}}, \tag{6.7}$$

where $N_{skip}$ and $N_{MB}$ are the number of skipped MBs and the number of MBs in a frame, respectively.

Figure 6.2: The GOP rate characteristics with respect to the quantization step size of the key frame for four QCIF test sequences: (a) City ($\alpha = 1.3$), (b) Football ($\alpha = 0.8$), (c) Foreman ($\alpha = 1.0$), and (d) News ($\alpha = 1.0$). (Axes: GOP rate in kbits versus $1/Q_{step}^{\alpha}$; curves for GOP 01, GOP 03 and GOP 05.)

Figure 6.3: The GOP rate characteristics with respect to the quantization step size of the key frame for four CIF test sequences: (a) City ($\alpha = 1.3$), (b) Football ($\alpha = 0.8$), (c) Foreman ($\alpha = 0.9$), and (d) News ($\alpha = 1.25$). (Axes: GOP rate in kbits versus $1/Q_{step}^{\alpha}$; curves for GOP 01, GOP 03 and GOP 05.)

The linear approximation in Fig. 6.4 is somewhat rough. Thus, we consider the ratio of MSEs between TL-0 and TL-i, i.e., $MSE_i / MSE_0$, as another tool for bit allocation. The ratios can be computed using the experimental data from the T layer bit allocation in Chapter 3. In Fig. 6.5, we plot the MSE ratios at different T layers and GOPs. It is observed that
the MSE ratios of a T layer are stable and do not vary much across GOPs. Based on this observation, we can impose the following condition:

$$R_L \le \frac{MSE_i}{MSE_0} \le R_H, \tag{6.8}$$

where $R_L$ and $R_H$ are the lower and the upper threshold ratio values.

Figure 6.4: Illustration of the relationship between $\Delta QP$ and the number of skipped MBs: (a) City, QCIF, (b) Tempete, QCIF, (c) City, CIF, and (d) Tempete, CIF. (Axes: dQP versus the normalized number of skipped MBs.)

Because the MSE ratio can be computed only after encoding the target GOP, we employ the ratio of the mean absolute difference (MAD) instead of the MSE ratio in the proposed algorithm.

Figure 6.5: Illustration of the MSE ratios of several test sequences: (a) Hall, QCIF, (b) Mobile, QCIF, (c) Hall, CIF and (d) Mobile, CIF. (Axes: MSE ratio versus GOP number; curves for TL-1 and TL-2.)

However, we still need to address the chicken-and-egg dilemma of the H.264 video encoder [11]. That is, the rate-distortion optimization (RDO) process requires a QP value ($QP_{RDO}$), but the QP has to be determined by the number of skipped MBs and the MAD ratio, which are available only after the RDO process. To address the dilemma, we consider the QPs for the RDO and the quantization process separately. They are determined by the following procedure.
• In the RDO process, the number of skipped MBs ($\hat{N}_{skip}$) is predicted from that of the collocated frame in the previous GOP, and $QP_{RDO}$ is determined by $\hat{N}_{skip}$.
• The QP for the quantization ($QP_{Quant}$) is determined by the updated number of skipped MBs ($N_{skip}$) after the RDO.
• If the MAD ratio is outside the threshold interval, the $QP_{Quant}$ value is changed accordingly.

Kwon et al. [24] have justified the separate consideration of $QP_{RDO}$ and $QP_{Quant}$. That is, the R-D efficiency remains about the same if the difference between the two QPs ($|QP_{RDO} - QP_{Quant}|$) is small.

6.3.3 Proposed Rate Control Algorithm

We propose a GOP-based rate control algorithm in this subsection. As described in Secs. 6.3.1 and 6.3.2, the QP of each frame in a GOP is determined based on the inter-layer dependency of the frame. That is, if an input picture is an independent TL-0 frame, its QP is determined by the GOP rate model. The QPs of dependent B-frames in a GOP are determined based on the number of skipped blocks and the MAD ratio. The encoding procedure of the proposed rate control algorithm is given below.

1. Initial bit allocation for a GOP
   Based on the frame rate $F$ and the channel rate $C$, the bit budget $R_T$ is allocated to a GOP via

   $$R_T = N_{F,GOP} \cdot \frac{C}{F} + R_0,$$

   where $N_{F,GOP}$ is the number of frames in a GOP and $R_0$ compensates for the difference between the original target bit rate and the current rate.

2. Encoding frames in the target GOP
   Frames in the current GOP are encoded with a QP that is determined, depending on the position of the frame, using the following mechanism.
   (a) If an input frame is a key frame (I/P-picture), the QP is determined by the GOP rate model as described in Sec. 6.3.1.
   (b) If an input frame is not a key frame (B-picture), the QP is first determined by the number of skipped MBs. However, if the MAD ratio is greater than the upper threshold (or less than the lower threshold), the QP is increased (or decreased) by $\Delta$.
   (c) Iterate until the last frame of the current GOP is reached.

3. Proceed to the next GOP until the end of the input video.

6.4 Experimental Results

In this section, we compare the performance of the proposed rate control algorithm with that of the JSVM reference software encoder [7]. In the experiment, we considered two GOP structures, GOP-4 and GOP-8, which consist of 4 and 8 frames per GOP, respectively. We encoded 161 frames at various bit rates depending on the input video characteristics. The coding performance at various target bit rates is shown in Tables 6.2 and 6.3. We see that, on average, the proposed rate control algorithm achieves about 0.20 dB improvement in coding efficiency.

Table 6.2: Experimental results for GOP-4.

Format  Sequence  R_T (kbps)  JSVM 9.16 Rate  JSVM 9.16 PSNR  Proposed Rate  Proposed PSNR  ΔPSNR (dB)
QCIF    Foreman       32           33.74           31.22          34.05          31.39        +0.17
QCIF    Hall          64           66.00           39.04          62.40          39.22        +0.18
QCIF    News         128          133.76           41.50         133.08          41.71        +0.21
QCIF    Soccer       256          267.55           38.65         268.18          38.76        +0.11
CIF     Hall          64           65.99           35.06          66.60          35.39        +0.33
CIF     City         128          134.41           31.70         135.41          31.86        +0.16
CIF     Bridge       256          263.83           38.62         266.36          38.78        +0.16
CIF     Soccer       512          533.93           36.15         536.67          36.33        +0.18

Table 6.3: Experimental results for GOP-8.

Format  Sequence  R_T (kbps)  JSVM 9.16 Rate  JSVM 9.16 PSNR  Proposed Rate  Proposed PSNR  ΔPSNR (dB)
QCIF    Hall          32           33.25           36.25          33.39          36.51        +0.26
QCIF    News          64           69.06           37.99          68.55          38.19        +0.20
QCIF    Tempete      128          137.77           34.53         139.09          34.67        +0.14
QCIF    Soccer       256          274.04           38.71         273.60          38.97        +0.26
CIF     Bridge        64           65.73           37.52          64.63          37.63        +0.11
CIF     Hall         128          133.09           37.11         132.62          37.28        +0.17
CIF     News         512          550.86           43.08         534.26          43.25        +0.17
CIF     City        1024         1067.15           40.09        1052.93          40.47        +0.38

In Fig.
6.6, we provide the frame-by-frame PSNR values of the proposed algorithm and the JSVM. We see that the PSNR value using the JSVM rate control algorithm drops significantly at the end of the input sequence while the proposed algorithm provides relatively smooth PSNR variation. Generally speaking, the proposed algorithm provides less quality fluctuation than the JSVM benchmark.

Figure 6.6: Illustration of the frame-by-frame PSNR results of selected test sequences: (a) Soccer, QCIF, (b) Hall, QCIF, (c) Foreman, CIF and (d) Bridge, CIF. (Panel titles: (a) Soccer@128kbps, QCIF, GOP04; (b) Hall@16kbps, QCIF, GOP08; (c) Foreman@256kbps, CIF, GOP04; (d) Bridge-far@64kbps, CIF, GOP08. Axes: PSNR in dB versus frame number; curves for PROP and JSVM 9.16.)

6.5 Conclusion

In this chapter, we proposed a simplified rate control algorithm for hierarchical B-pictures of H.264/SVC by decomposing the T layer bit allocation problem into two sub-problems. First, the QP of the TL-0 key frame is determined by the GOP rate model. Then, the QPs of the other T layers are determined based on the number of skipped blocks, which is used as a measure of inter-layer dependency. Moreover, we utilized the MAD ratio as a reference for R-D optimality to compensate for possible errors introduced by the skipped-block measure. It was shown by experimental results that the proposed rate control algorithm achieves better coding efficiency than the JSVM rate control algorithm at various target bit rates for different test sequences.
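The QP decision summarized in this chapter can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the function names, the constant `c`, the thresholds `r_low` and `r_high`, the adjustment step, and the rounding of $\Delta QP$ to an integer are hypothetical choices, not values from the thesis. The direction of the MAD-ratio adjustment follows the description of step 2(b) in Sec. 6.3.3, and the clipping range 0 to 51 is the QP range of H.264.

```python
def gop_bit_budget(n_frames, channel_rate, frame_rate, carry_over=0.0):
    """Bit budget of a GOP: R_T = N_{F,GOP} * C / F + R_0 (step 1 of Sec. 6.3.3)."""
    return n_frames * channel_rate / frame_rate + carry_over


def delta_qp_from_skip(n_skip, n_mb, c):
    """Eq. (6.7): delta QP = C * N_skip / N_MB, with c a sequence-dependent constant."""
    return c * n_skip / n_mb


def b_frame_qp(qp_key, n_skip, n_mb, mad_ratio,
               c=8.0, r_low=0.8, r_high=1.2, step=1):
    """QP of a non-key frame (hypothetical parameter values, for illustration only)."""
    # Start from the key-frame QP plus the skip-based offset, rounded to an integer.
    qp = qp_key + round(delta_qp_from_skip(n_skip, n_mb, c))
    # Adjust when the MAD ratio leaves the interval [r_low, r_high], as in step 2(b).
    if mad_ratio > r_high:
        qp += step
    elif mad_ratio < r_low:
        qp -= step
    # Keep the result inside the H.264 QP range.
    return max(0, min(51, qp))


# Example: a 4-frame GOP at 64 kbps and 30 frames/s, QCIF frames with 99 MBs.
budget_bits = gop_bit_budget(n_frames=4, channel_rate=64000, frame_rate=30)
qp_b = b_frame_qp(qp_key=30, n_skip=50, n_mb=99, mad_ratio=1.0)
```

The key-frame QP (`qp_key`) is taken as given here; in the proposed algorithm it would come from the GOP rate model of Sec. 6.3.1.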
Chapter 7

A Cross-layer Design to Wireless/Mobile Video Streaming

7.1 Introduction

Access to digitized video content over wireless/mobile networks has drawn a lot of attention from academia and industry due to its potential as a killer application, as witnessed in the wired Internet environment. Although wireless Internet technologies, such as the IEEE 802.11 and 802.16 standards, have provided a means of access, they are restricted to a static environment with limited receiver mobility. Wireless video streaming with mobile receivers is still under development due to several technical barriers. That is, video services over wireless networks are characterized by burst noise, dependence in the compressed bit stream, and the time constraint in data delivery. Besides, video applications consume a huge amount of radio resources, and a careful system design is needed to avoid unnecessary waste of the limited radio resources.

In a conventional wired network, the potential performance gain from inter-layer interaction is traded for the conceptual simplicity and convenience of layer-independent design and optimization. In fact, the success of the Internet is attributed to the well-structured definition of each network layer, where inter-layer communication is strictly disallowed. However, the layered approach to mobile wireless data communication system design does not offer satisfactory performance. For this reason, researchers have revisited the cross-layer approach, which is the major concern of this chapter. In principle, the cross-layer design paradigm provides a large degree of freedom and, as a consequence, we have observed numerous cross-layer proposals addressing various aspects of the system design to deliver better performance. Although the benefit of the cross-layer approach has been evidenced by rich research activities in academia and industry, it is important to address various aspects of the cross-layer design in terms of its advantages and disadvantages.

The rest of this chapter is organized as follows. In Sec.
7.2, we discuss major challenges of a wireless/mobile video streaming system to motivate the cross-layer approach. In Sec. 7.3, we examine system requirements and describe several representative examples of cross-layer video streaming systems. In Sec. 7.4, we consider practical issues with the cross-layer design based on the characteristics of video streaming applications. Finally, concluding remarks are given in Sec. 7.5.

7.2 Background

This section provides the motivation of the cross-layer design approach to wireless/mobile video streaming system design. We first discuss the major challenges of the wireless/mobile communication environment by considering each functional unit of the system. Then, we describe the potential benefits of adopting the cross-layer design to overcome these challenges.

Figure 7.1: Functional diagram of a wireless video streaming system. (Blocks: a source coding unit at the application, transport and network layers, comprising the video encoder, encoder buffer/storage, streaming server with packetization, and a local decoder with local video output; a wired packet-switching network; a transmission unit at the network, data link and physical layers, comprising the communication Tx system with channel coding and modulation, the mobile/wireless radio channel, and the communication Rx system with channel decoding and demodulation; and the streaming client with depacketization, decoder buffer, video decoder and video output.)

7.2.1 Challenges in Wireless/Mobile Video Streaming System Design

We show the functional blocks of a source coding unit and a transmission unit along with their corresponding network layers in a wireless video streaming system in Fig. 7.1. In this figure, a compressed bit stream travels through a heterogeneous network to arrive at its destination. There are three sub-systems involved: 1) a video codec, 2) a packet-switching backbone network, and 3) a wireless/mobile communication system. For the conventional layered network architecture, the distinction of these sub-systems provides great abstraction and simplicity in system design.
However, their clear partition may not be practical for the development of an efficient wireless streaming system as explained below.

7.2.1.1 Wireless/Mobile Radio Channel

The wireless/mobile radio channel is the physical path in digital communication with the radio frequency (RF) microwave as the transmission medium. Two major issues arise due to the characteristics of the radio channel.

First, the condition of a wireless/mobile channel is determined by the physical environment between the transmitter and the receiver, where the quality of the received signal is determined mainly by the propagation path and receiver mobility. The wireless/mobile channel is often characterized by a multi-path propagation model with multiple reflectors, where each propagation path is associated with its own attenuation factor, phase shift and propagation delay. Combined with the multi-path fading, receiver mobility further complicates the characteristics of received signals, which is known as Doppler fading. As a result of independent multi-path and Doppler fading effects, we have four fading channel models (i.e., frequency selective/non-selective or fast/slow fading). The frequency-selective fast fading channel is the most challenging for reliable digital communication. While the bit error rate (BER) drops exponentially with the SNR in an additive white Gaussian noise (AWGN) radio channel without fading, Cavers reported that the average BER decreases only inversely with the increase of the SNR as a consequence of the fading effect [10]. This implies that increasing the transmission power may not be very effective in reducing the average BER for a fading radio channel. Moreover, Cavers reported another important observation about the fading channel: errors occur in bursts, and the performance of data communication systems can be significantly degraded accordingly [10].

Another problem can be found in a multi-user scenario.
Since the radio channel is a shared medium, participating users have to compete for limited radio resources using various multi-access schemes. For this reason, radio resource management functions such as call admission control (CAC), scheduling, and power control have to be carefully designed in a multi-user communication environment.

7.2.1.2 Application Characteristics

Digital video is composed of a sequence of highly correlated digital images. The high coding efficiency of modern hybrid video coders is attributed to the successful removal of spatial and temporal redundancy in video signals. Generally speaking, two types of frames are defined in video compression standards. Intra frames are obtained using the spatial predictor within an image so that they are independent of other images in a video clip. Inter frames are encoded by exploiting the temporal correlation in a sequence of images with temporal predictors. The frames that provide the temporal predictor are called reference frames. There are two types of inter-coded frames, called P- and B-frames, depending on the position of the predictor. Generally, inter frames have better coding efficiency than intra frames.

The motion compensated prediction (MCP) process for inter frames is the major contributor to the high coding gain of the modern video codec. When digital video is compressed by the MCP process, a sequence of input images forms a dependence chain, whose length is determined by the period of independent intra frames. Typically, the dependence chain is determined by the structure of a group-of-pictures (GOP), which is defined by one intra frame followed by a number of inter frames. For example, IPPP and IBBP are two common GOP structures. Despite the success of the MCP process, the dependence chain poses a problem in network-based video applications.
Since inter frames can be correctly decoded only in the presence of their reference frames, failure to deliver reference frames results in error propagation in subsequent inter frames. The error in a video frame is often measured by the pixel-wise mean squared error (MSE) in comparison with the original raw video input. Girod et al. proposed an error propagation model in which the propagated error is approximately inversely proportional to the elapsed time ($t$) after the occurrence of a transmission error [13]. The model can be expressed mathematically by

$$MSE_d(t) = MSE_e + \frac{MSE_0}{1 + \gamma \cdot t}, \tag{7.1}$$

where $\gamma$ is a parameter that represents the effectiveness of the loop filter for error removal, $MSE_d$ and $MSE_e$ are the distortion values at the decoder and the encoder, respectively, and $MSE_0$ is the initial distortion caused by the transmission error at $t = 0$. The error propagation model implies the natural prioritization of video data in a dependence chain. Intuitively, video frames at the head of a dependence chain are more important than those located at the tail.

In video streaming applications, a stream of compressed video data is packetized and delivered through a data service network with a time constraint to guarantee continuous and smooth play-back. Considering the hostility of the transmission environment, the time constraint and the massive amount of video data, it is often the case that only a portion of all video packets can be delivered on time in a realistic video streaming scenario. Then, even at the same packet loss rate, Eq. (7.1) suggests that the video quality at the receiver end could be different depending on which video packet is lost. For this reason, service differentiation among video packets of different priority has to be considered for the performance enhancement of the video streaming system.

7.2.2 Cross-Layer Design

Discussions in Sec.
7.2.1 suggest that the dynamic service environment caused by the random time-varying radio channel and the unequal importance of video data impose the main challenges on the development of an efficient wireless/mobile video streaming system. To address these challenges, we may consider an adaptive network that is responsive to the dynamic environment. Since the rigid layered architecture of conventional network systems does not offer a good solution, a recent trend in network design is to consider the cooperation among different network layers, called a cross-layer design approach. This will be detailed in this subsection.

7.2.2.1 Concepts and Definitions

The cross-layer design allows the interaction among layers of a network system. Although the term "cross-layer" only appeared recently, its basic concept can be found in the context of Internet-based Quality-of-Service (QoS) architectures, where QoS classes are defined by the time sensitivity of certain applications [9]. The QoS architectures attempt to overcome the limitation of the best-effort service and offer a primitive form of cross-layer design. In the beginning, cross-layer design research concentrated on the modification and extension of the conventional network architecture to overcome the shortcomings of the Internet. For example, the addition of Explicit Congestion Notification (ECN) to TCP/IP addressed the issue of wireless TCP/IP networks that misinterpret packet errors as network congestion [31]. More recently, Shakkottai et al. introduced the cross-layer design as an overall network system design [39]. Srivastava and Motani viewed the cross-layer design as a protocol design by the violation of a reference layered communication architecture [41]. A more comprehensive definition of cross-layer design was given by Jurdak in [18], which embraces a wider range of system design aspects. Regardless of the definition, inter-layer interaction that enables the adaptation of network layers is at the core of the cross-layer design principle.
It is not a coincidence that cross-layer design proposals are more prevalent in wireless networks than in wired ones. The requirement for channel adaptation to maximize capacity in wireless networks has naturally led to information passing between the physical layer and upper layers. For this reason, many early cross-layer proposals have concentrated on PHY-MAC interactions, which resulted in Adaptive Modulation and Coding (AMC), Hybrid Automatic Repeat reQuest (H-ARQ), channel adaptive scheduling, etc.

7.2.2.2 Classifications of Cross-layer Approaches

Cross-layer approaches are classified to provide further insights and a structured view on the cross-layer design principles.

First, we consider the classification by Zhang et al. below [48]:

• End-system centric approach
  The system parameters are controlled by the end-system application. That is, the end-system is responsible for congestion control, error control and power control to maximize the application layer performance (e.g., the quality of streamed videos at the receiver). This approach has two advantages. First, the cost involved in the construction of a cross-layer system could be low since an extensive modification of the network infrastructure can be avoided. Second, end-user quality maximization can be easily achieved since this approach operates at the application layer. However, it may suffer from a relatively longer system response time because the round trip time (RTT) between a sender and a receiver is longer than that between lower-layer communications. Moreover, the overall system efficiency may degrade significantly because of granularity mismatch in the processing unit (called the protocol data unit or PDU) under a highly dynamic wireless video streaming environment. For example, a video frame may not be decoded properly because of a single bit error at the physical layer.
• Network-centric approach
  Cross-layer optimization is performed in lower layers of the transmission system. As compared with the end-system centric approach, its advantages lie in faster response time, proper data granularity and easier access to transmission parameters such as the channel coding rate, power level control, etc. However, the discrepancy between quality metrics at different layers poses major issues for this approach. For example, the video application quality metric, the MSE or the peak-signal-to-noise-ratio (PSNR), requires a proper interpretation in terms of lower layer quality metrics such as data throughput, the bit error rate and delay requirements.

Another classification for the cross-layer design was suggested by Van der Schaar et al. as follows [12].

• Application-centric approach
  The optimization is performed by application layer protocols. Because of the slower time-scale and the coarser data granularity of the application layer, this approach does not always provide efficient solutions.

• MAC-centric approach
  The MAC layer determines the optimal transmission strategy based on the information from the application and the physical layers. Since this approach does not involve source coding, it can be disadvantageous for adaptive source channel coding in time-varying channel conditions.

• Integrated approach
  This approach considers strategies of participating layers jointly. However, it may not be practical due to the computational complexity if all possible combinations of each layer's strategies and parameters have to be examined for the optimal solution.

Van der Schaar et al. also suggested another classification based on the direction of information flow.

• Top-down approach
  Higher layer protocols serve as the controller of lower layer parameters. For example, the MAC layer parameters and strategies are controlled by the application layer protocols, and the MAC layer protocols determine the physical layer parameters such as channel coding rate, modulation, etc.
  This approach is the most prevalent among cross-layer designs since QoS-enabled communication systems can employ this approach and determine lower layer parameters by the QoS class of application data.

• Bottom-up approach
  Lower layers are insulated from higher layers in losses and bandwidth variations. However, this approach tends to result in delay and unnecessary throughput reduction.

The above categorization schemes may not be mutually exclusive. For example, an application-centric approach may take either a top-down or a bottom-up approach.

7.3 Cross-layer Wireless/Mobile Video Streaming

Due to its capability to deliver better QoS, the cross-layer approach has gained popularity in the design of wireless service networks, as evidenced by a massive number of proposals in the literature. In this section, we begin with the requirements of a cross-layer system and provide a review of representative wireless/mobile video streaming systems based on the cross-layer design.

7.3.1 Requirements of Cross-Layer Video Streaming Systems

7.3.1.1 Preliminary Issues

Although the cross-layer approach seems to offer the right choice for the design of wireless/mobile video streaming systems, its success still depends on careful design considerations. Kawadia and Kumar pointed out the following cautionary aspects of a cross-layer design in [21].

• Architecture for technology proliferation
  The Internet is a set of interconnected computer networks operating on the standardized common protocol, TCP/IP. Although the World Wide Web (WWW) service has facilitated its commercial success, its technical success is fundamentally rooted in well-defined protocols and network architectures. For this reason, it is important for the cross-layer design to have a well-defined structure and architecture.

• Tractability of cross-layer design
  The divide-and-conquer approach offers one of the most effective approaches in deriving solutions to various problems.
  The component-wise development of a system is advantageous in its maintenance and extension since only the corresponding components have to be considered. In contrast, the cross-layer design demands simultaneous consideration of system parameters. For this reason, system maintenance and extension could be problematic with recklessly designed cross-layer systems.

• Establishment of stability
  Due to its adaptive characteristics, a cross-layer system is necessarily a closed-loop system, which always brings up the system stability issue. To develop a successful cross-layer system, every consequence of cross-layer interaction must be well understood to avoid system instability.

7.3.1.2 System Requirements

Due to the concerns in the previous subsection, we discuss requirements for the development of a well-structured cross-layer wireless/mobile video streaming system. Major technical obstacles in the practical deployment of cross-layer systems include: 1) discrepancy between layer parameters, 2) means for the cross-layer information exchange and 3) flexibility of application data. They are detailed below.

Abstraction of System Parameters

Inter-layer interaction involves the passing of numerous system parameters. To realize well-structured interaction, it is important to extract representative parameters from each layer and define their proper abstractions. For example, the discussion in Sec. 7.2.1.2 suggests the prioritization of a video packet based on its contribution to the end quality according to the error propagation model. Thus, we can define the priority index of a video packet based on its MSE profile according to Eq. (7.1) to enable more efficient information exchange than using the actual MSE value of the video packet directly.

Interface for Information Exchange

Cross-layer Signaling

Inter-layer information exchange is not common in the conventional layered architecture. However, it is one of the fundamental elements of a cross-layer network architecture.
Protocols may be defined or extended from the conventional protocols. Srivastava and Motani considered three cross-layer information exchanging methods [41]: 1) direct communication between layers, 2) a shared database across layers, and 3) completely new abstraction. To realize efficient information exchange, a new network element that is capable of understanding the cross-layer information is essential while maintaining the conventional network architecture for backward compatibility. The media aware network element (MANE) proposed by Wenger et al. may serve this purpose well since it is acknowledged by the H.264 video payload in RTP packets [44].

Given the information exchange path, the simplicity of cross-layer operations is also an important issue. The cross-layer design may demand complicated overhead to control the network operation. For example, complex cross-layer algorithms may fail to provide a prompt response to the time-stringent video streaming application. Moreover, complex cross-layer operations may hurt the system stability due to their intractability and unintended consequences.

Flexibility of Compressed Videos

To be adaptive to a dynamic communication environment, the flexibility of compressed video data is essential. However, most video coding standards do not provide a sufficient amount of flexibility for network-based video applications and, as a consequence, compressed video streams still suffer from error vulnerability due to transmission errors. Various techniques are employed in modern video coding standards to address error vulnerability.
For example, H.264/AVC, which is the state-of-the-art video coding standard, introduces the following tools to combat transmission errors at the encoder side:

• Slice based video coding
• Flexible macroblock ordering (FMO)
• Arbitrary slice ordering (ASO)
• Data partitioning
• Redundant slices
• Switching (SI/SP) pictures

However, they are not sufficient to overcome all difficulties encountered in wireless/mobile video delivery. Recently, the video coding community has made efforts to increase the flexibility of compressed video data by adding scalability and/or multiple descriptions of compressed video. Over the last two decades, practical deployment of scalable video has not happened in the industry due to the reduction of the coding gain and the additional complexity overhead. More recently, H.264/SVC has been standardized as a scalable extension of the H.264/AVC standard. With its great flexibility and excellent coding performance inherited from H.264/AVC, H.264/SVC is expected to provide a practical network-based video solution in the near future.

7.3.2 Cross-layer Wireless/Mobile Video Streaming System

The goal of designing a good video streaming system is to maximize the quality of streamed video at the receiver end. Adaptation is at the core of a cross-layer design to overcome the hostile wireless/mobile radio channel. Thus, it is important to understand how channel adaptation is performed to overcome the physical obstacle. In this section, we provide a review of cross-layer solutions from a practical point of view by considering the system requirements stated in Sec. 7.3.1.2. We first discuss the cross-layer architecture for service differentiation, and then adaptive error control algorithms follow.

7.3.2.1 Network-centric approach

MAC-centric QoS Control

The network-centric approach is the choice of today's industry, which has been developed based on a well-established conventional QoS architecture.
The QoS control unit is often implemented at the MAC layer of a digital communication system. As the middle layer between the application and the physical layers, the MAC layer has a positional advantage in the cross-layer design due to its proximity to the fast varying radio channel.

Fig. 7.2 demonstrates a general QoS support architecture for video applications, where the QoS support is realized by an adaptive QoS mapping unit, a radio channel estimator and a prioritized transmission unit. First, the adaptive QoS mapping unit classifies the application data into a number of MAC-layer QoS classes based on the estimated available channel rate. Then, the prioritized transmission unit realizes differentiated services to MAC-layer PDUs of different QoS classes. Although the operation is conceptually as simple as this description of the QoS control procedure, the implementation of each component has a number of technical issues.

Figure 7.2: Illustration of a QoS support architecture for video applications.

The fundamental technical issue arises from the requirement of translating different layers' characteristics to the MAC-layer parameters. QoS metrics in the MAC layer include the data rate, the packet loss rate (PLR), the delay, the delay violation probability, etc. On the other hand, the quality of a streamed video is measured by distortion and uninterrupted play-back at the receive end. Since the radio channel condition represented by SNR or SINR does not have direct correspondence to QoS metrics in the MAC layer, the development of efficient cross-layer information exchange is one of the major challenges with the network-centric cross-layer design.

End-to-End Cross-layer QoS Architecture for Wireless Video Delivery

Kumwilaisak et al. proposed an end-to-end cross-layer QoS architecture for wireless video delivery [22].
They concentrated mainly on the wireless last hop, such as the wireless connection between a base station (BS) and a mobile station (MS). The proposed architecture embraces system components for end-to-end video delivery, which include the video codec module, the cross-layer QoS mapping and adaptation module, the link layer packet transmission module, the time varying wireless channel and the adaptive wireless channel modeling module. Main considerations in the design of each module are described in the following.

• Video Encoder/Decoder
Video streams were prepared using the MPEG-4 PFGS (Progressive Fine Granular Scalable) video codec to provide flexible compressed video.

• Wireless Channel Model
Since the QoS control unit was implemented at the link layer, a link-layer wireless channel model that assumes a fading, time-variant and non-stationary wireless radio channel was adopted. To reflect the randomness of the radio channel, they employed a discrete-time Markov model, where each state in a Markov chain was associated with the available transmission rate under the current channel condition. In the proposed system, the wireless channel condition was modeled in the adaptive channel modeling module based on the feedback from the physical layer radio channel.

• Link-layer Transmission Control Module
Service differentiation among application data of various QoS classes was implemented by the transmission control module, using a class-based buffering and scheduling mechanism that employs strict priority scheduling. Based on the effective capacity (EC) theory by Wu and Negi [45], they computed the rate constraint of each QoS class, which specified the maximum data rate to be transmitted reliably with a certain statistical QoS guarantee. The video stream classification and the transmission bandwidth allocation of each QoS class were determined by the rate constraints.
• Cross-layer QoS mapping and adaptation module
This QoS control module was the key component of the proposed system. It performs the optimal application-to-link layer QoS mapping adaptively to the estimated channel condition. At the application layer, each video packet is labeled by its loss and delay properties, and the GOP based optimal QoS mapping is performed based on the rate constraint of each QoS class to maximize video quality at the receive end.

Now, we recall the system requirements in Sec. 7.3.1.2 to assess the practical feasibility of the proposed cross-layer QoS architecture. The employment of the MPEG-4 PFGS video codec for video stream preparation resolves the issue of application data flexibility. However, the system parameter abstraction and the cross-layer information exchange are only partially addressed by the proposed QoS architecture. That is, the delivery of the application layer information to the adaptive QoS mapping module is not clearly presented, although various application layer information such as the loss and the delay characteristics, the packet size and the GOP number is used for the video packet classification and in the solution framework.

QoS Support of Mobile WiMax

It is important to note that a QoS support architecture must not concentrate on a specific type of application such as video streaming. Since different types of application traffic should coexist in realistic application scenarios, fairness among mixed traffic has to be established. To examine this issue, we present the QoS support architecture of the IEEE 802.16e standard, known as Mobile WiMax, as an example.

Mobile WiMax implements service-flow-based QoS support that combines a unidirectional packet flow with particular QoS parameters. The QoS parameters determine the transmission order and scheduling on the air interface such that the connection-oriented QoS support can provide accurate control over the air interface.
Specifically, the radio channel state information (CSI) is passed to the scheduler through a Channel Quality Information Channel (CQICH) from user terminals. Moreover, dynamic resource allocation is realized by having the resource allocation information at the beginning of each MAC-layer frame, which is delivered by the Media Access Protocol (MAP). The QoS support in Mobile WiMax is performed by two system components, the classifier and the scheduler, which are responsible for the application packet-to-QoS class mapping and prioritized transmission, respectively.

Mobile WiMax defines five QoS classes based on different application types and their requirements. They are summarized below [17].

• Unsolicited Grant Service (UGS)
This service is designed to support real-time applications that generate constant bit-rate data packets periodically. To reduce the latency, this class of services is granted the radio channel periodically. The QoS of this class is specified by the maximum sustained rate, maximum latency tolerance and jitter tolerance.

• Real-Time Polling Service (rtPS)
This service is for real-time service flows that generate variable-size data packets periodically. Although this service incurs more request overhead and latency than UGS, it can support variable-bit-rate (VBR) traffic such as video streaming applications. The QoS of this class is specified by the minimum reserved rate, the maximum sustained rate, maximum latency tolerance, and traffic priority.

• Extended Real-Time Polling Service (ertPS)
The ertPS supports real-time VBR applications. It is different from rtPS in that it is given unicast grants by the BS, as in UGS, to reduce the latency caused by the bandwidth request. VoIP is the target application of this service, and the QoS is specified by the minimum reserved rate, the maximum sustained rate, maximum latency tolerance, jitter tolerance and traffic priority.
• Non-Real-Time Polling Service (nrtPS)
The nrtPS is designed to support delay-tolerant applications such as FTP and HTTP. The QoS specification parameters of this service include the minimum reserved rate, the maximum sustained rate and traffic priority.

• Best Effort Service (BE)
The BE service supports applications with no minimum service requirements and, thus, the service of applications of this class is subject to resource availability. Data transfer applications such as the email service belong to this class, and the QoS is specified by the maximum sustained rate and traffic priority.

Although the cross-layer information can be effectively delivered in a well structured manner with the QoS support architecture of Mobile WiMax, the generic issues with the network-centric cross-layer design still remain. For example, because Mobile WiMax defines the MAC and the PHY layers only, the resource allocation conveyed in the MAP messages is still subject to the higher layer abstraction of the application traffic that determines the video quality at the receive end.

7.3.2.2 Application-centric Approach

Adaptive Error Control

The unequal importance of video packets has naturally brought up many adaptive error control proposals termed unequal error protection (UEP). The basic idea of UEP algorithms is to provide more protection to video packets with higher priority to maximize the quality at the receive end. Forward Error Correction (FEC) and Automatic Repeat reQuest (ARQ) are common techniques to combat transmission errors in the digital communication system, and the choice between them depends on the application data characteristics. Generally, ARQ demonstrates better performance in terms of error protection at the cost of delay. For this reason, the employment of ARQ has to be carefully determined for real-time applications such as video streaming.
Modern digital communication systems such as Mobile WiMax and cdma2000 1xEV-DO employ the two techniques in a combined manner, which is termed Hybrid-ARQ (H-ARQ).

Communication Channels for Adaptive Error Control

Fig. 7.3 shows the communication channels where error control techniques may take place. The physical layer is divided into a number of controllable sub-layers, including channel-coding, modulation and power-control sub-layers. Similarly to the network-centric approach, the feedback information (CSI) from the radio channel plays a major role in the adaptive error control. In general, FEC may be applied to PDUs of any layer from the APP layer to the channel coding sub-layer, where the communication delay decreases following the same order. ARQ algorithms are often implemented at the link layer based on the link layer feedback message.

Figure 7.3: Illustration of communication channels for error control.

In digital communications, the transmission power is a resource regulated by the Federal Communications Commission (FCC). For the efficient use of the limited resource, Adaptive Modulation and Coding (AMC) techniques are common in modern communication systems. Their objective is to achieve a pre-specified target bit error rate (BER) at the radio channel adaptively to the channel condition. Moreover, in advanced system design proposals, power control is also considered to provide unequal protection to prioritized data.

Joint Source Coding and Optimal Energy Allocation

Katsaggelos et al. presented an overview of a number of energy efficient wireless video techniques considering the energy limited environment of mobile communication [20].
They formulated a distortion optimization problem with the source coding (S) and the channel parameters (C) as the variables:

\arg\min_{\{S, C\}} D_{tot}(S, C) \quad \text{subject to} \quad E_{tot}(S, C) \le E_0 \ \text{and} \ T_{tot}(S, C) \le T_0, \qquad (7.2)

where D_tot, E_tot and T_tot are the distortion at the receiver, the consumed energy and the end-to-end delay, respectively, and E_0 and T_0 are the maximum allowable energy consumption and the end-to-end delay constraint imposed by the application.

Based on the problem formulation in Eq. (7.2), they introduced three optimization problems (i.e., joint source coding and power adaptation, joint source-channel coding and power adaptation, and joint source coding and data rate adaptation) and used the power level and the transmission rate as part of a UEP mechanism. Besides, they assumed flexible source coding parameters for video preparation and modeled the wireless channel by a Markov channel model, where each state was associated with a packet loss probability determined by the transmission power, the available channel information and the transmission rate assigned to a packet.

This proposal, together with the references therein, has demonstrated great potential and presented important theoretical fundamentals of the joint source coding and channel optimization (JSCC) techniques. However, these techniques are still subject to several practical issues. First, the cross-layer information exchange is implicitly assumed in the solution framework, although it is one of the most important issues in the practical employment of cross-layer solutions. Second, although the wireless channel is modeled as a packet loss network, the connection between the packet loss probability and the physical layer parameters such as the fading effect is not clearly identified.

Systematic Cross-layer Approach to Wireless Video Streaming

Shan has proposed a systematic approach to the wireless video streaming application [40].
The cross-layer wireless video streaming system was implemented by three schemes: 1) application layer packetization, 2) class-based UEP and 3) priority-based ARQ (P-ARQ). The UDP-Lite employed in the proposed system overcomes the inefficiency caused by the TCP based service implementation. Moreover, the proposed system implements the application-to-link layer retransmission request by the property of UDP-Lite that allows the delivery of corrupted packets to the application layer. The system description is detailed below.

• Application layer packetization
The application layer packetization decomposes each application layer packet into an integer number of equal-sized RLP packets. The identification of corrupted RLP packets can be easily performed at the application layer of the receiver because each RLP payload block is combined with a corresponding 4-bit sequence number. Then, the receiver side application layer determines the retransmission request based on the knowledge of the FEC level for that class of application data. Another important system feature achieved by this application layer packetization is the operational estimation of the wireless channel condition based on the loss statistics of RLP packets. The channel estimation is then used to determine the optimal FEC level for each class of application data at the sender side.

• Class-based UEP
The MPEG-4 video coding standard is employed for the application data preparation in the proposed work, where different parts of a compressed bit stream are classified into 4 classes according to their importance and dependency of frames:
– Class 0: Header information;
– Class 1: I and P frames with scene change;
– Class 2: Shape and motion information of P frames;
– Class 3: Texture information of P frames.
Then, a block-based Reed-Solomon (RS) code is applied for the error protection of different classes, where the protection level is adaptively determined by the application data class and the estimated channel condition.

• P-ARQ
With the P-ARQ scheme, application data of different classes are assigned different retransmission request limits. That is, once an application packet is identified to be erroneous, the retransmission request for the corresponding RLP packet is determined by the expected decoding time of the application data and the remaining number of retransmission requests. The expected decoding time is determined by the round trip time (RTT) and the current decoder buffer status in terms of play-back time, and its consideration minimizes the play-back interrupt given the system parameters such as the RTT.

The major advantage of the proposed system comes from its realistic design that provides compatibility with conventional data communication systems. Because only the application layer is subject to the complex implementation, the burden of re-designing the underlying data communication network could be kept at a reasonably low level. In the proposed system, the cross-layer information exchange is implicitly enabled by the application packetization scheme. Moreover, they developed a simple cross-layer operation, i.e., the prioritized application-to-link layer retransmission request based on the expected decoding time, which does not incur huge complexity overhead in the system operation. Although the video format in the proposal is not flexible, the basic idea of video packet prioritization should be applicable to other flexible video formats with slight modification. Another important feature is channel adaptation of the proposed system.
That is, channel estimation is performed at the receiver end to adaptively determine the proper FEC rate for different classes of video packets, which tends to be incapable of catching fast channel fluctuations. Besides, the application layer channel estimation is employed without proper justification, and its performance and validity were simply demonstrated by simulation results. We will show in the next section that the application channel estimation may suffice for wireless video streaming applications, which should relieve video streaming systems from tracing a wireless radio channel of fast fluctuation.

7.4 Toward Practical Cross-layer Video Streaming System

In this section, we consider the development of a practical cross-layer video streaming system for wireless/mobile networks. We employ H.264/SVC as the video codec and assume a MANE at the proximity of the BS for cross-layer information exchange. By analyzing the video streaming characteristics and its sensitivity to channel fluctuation, we propose an efficient and effective cross-layer video streaming system.

7.4.1 H.264/SVC for Wireless/Mobile Video Streaming

For the practical employment of a cross-layer video streaming system, it is essential to have a standardized infrastructure that enables the cross-layer system design. In this section, we describe efforts of the standardization bodies to establish such an infrastructure. Specifically, we consider H.264/SVC for the video stream preparation and the RTP/UDP/IP protocol stack for video delivery in a packet-based network environment.

H.264/SVC is a recently standardized scalable video codec based on the state-of-the-art H.264/AVC standard. It provides great flexibility of compressed video by its temporal (T), spatial (D), and quality/SNR (Q) scalability.
A scalable bit stream is generated in such a way that a global bit stream carries embedded bit streams of combined scalable layers (T, D and Q), and the scaling operation is performed simply by discarding higher scalable layer bit streams from the global bit stream to produce a sub-bit stream composed of lower scalable layer bit streams. In a global bit stream, lower scalable layer bit streams are used as references for the encoding of higher scalable layers and, thus, the dependence chain is naturally formed following the scalability structure. The scaling operation always corresponds to the temporal/spatial down sampling or quality degradation of a global scalable bit stream, and it is important to consider the inter-dependency of each scalable layer when discarding higher scalable layer bit streams.

The H.264/AVC standard assumes network-based video applications in its design. It defines a Network Abstraction Layer (NAL) unit as its basic processing unit. There are two types of NAL units: 1) Video Coding Layer (VCL) NAL units and 2) non-VCL NAL units. Compressed video data are stored in VCL NAL units, and non-VCL NAL units contain supplemental enhancement information (SEI), parameter sets, picture delimiters, or filler data. Each NAL unit is combined with a 1-byte NAL unit header that contains the abstraction of the NAL unit payload such as the NAL unit type and the non-reference indicator. An extension of the NALU header is defined for H.264/SVC video, which contains various identifier flags to signal the priority and the scalable layers (D, T and Q) of the corresponding NALU payload as shown in Fig. 7.4. Among different identifiers, the 6-bit PID does not have any influence on the decoding of H.264/SVC video. Instead, it is defined to store the operator/user defined priority of the corresponding NAL unit.

Figure 7.4: NAL unit header extension of H.264/SVC.
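The identifiers carried in the NAL unit header extension can be extracted with a few bit operations. The following is a minimal sketch, not the parser used in this work. It assumes the 3-byte extension layout given in RFC 6190 and H.264 Annex G (R, I, PRID, N, DID, QID, TID, U, D, O, RR), which is not spelled out in the text above, and the names SvcNalExtension and parse_svc_nal_extension are hypothetical.

```python
from typing import NamedTuple


class SvcNalExtension(NamedTuple):
    """Fields of the 3-byte SVC NAL unit header extension (assumed layout)."""
    idr_flag: int                  # I, 1 bit
    priority_id: int               # PRID (the PID of the text), 6 bits
    no_inter_layer_pred_flag: int  # N, 1 bit
    dependency_id: int             # DID, 3 bits
    quality_id: int                # QID, 4 bits
    temporal_id: int               # TID, 3 bits
    use_ref_base_pic_flag: int     # U, 1 bit
    discardable_flag: int          # D, 1 bit
    output_flag: int               # O, 1 bit


def parse_svc_nal_extension(ext: bytes) -> SvcNalExtension:
    """Parse the three bytes that follow the 1-byte NAL unit header."""
    if len(ext) != 3:
        raise ValueError("the SVC header extension is exactly 3 bytes long")
    b0, b1, b2 = ext
    return SvcNalExtension(
        idr_flag=(b0 >> 6) & 0x1,       # bit 6 of byte 0 (bit 7 is reserved)
        priority_id=b0 & 0x3F,          # low 6 bits of byte 0
        no_inter_layer_pred_flag=(b1 >> 7) & 0x1,
        dependency_id=(b1 >> 4) & 0x7,  # bits 6..4 of byte 1
        quality_id=b1 & 0xF,            # low 4 bits of byte 1
        temporal_id=(b2 >> 5) & 0x7,    # top 3 bits of byte 2
        use_ref_base_pic_flag=(b2 >> 4) & 0x1,
        discardable_flag=(b2 >> 3) & 0x1,
        output_flag=(b2 >> 2) & 0x1,    # the 2 lowest bits are reserved
    )


if __name__ == "__main__":
    # PRID=5, DID=2, QID=3, TID=4, output flag set.
    print(parse_svc_nal_extension(bytes([0x85, 0x23, 0x87])))
```

A network element only needs these three bytes to decide which scalable layer a packet belongs to, so the video payload itself never has to be decoded.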
Given the flexibility and abstraction of compressed video, the next step is to establish the information interface between the application layer and the underlying network layers. For packet based wireless video streaming, the RTP/UDP/IP protocol stack is a common choice as an alternative to the conventional TCP/IP protocol stack. RFC 3984 defines the RTP payload format for H.264 video, which allows packetization of one or more NAL units in each RTP payload. RFC 3984 also introduces an important network interface element called the media aware network element (MANE), which is capable of parsing the NAL unit header information of H.264/SVC video. With the existence of the MANE, the abstraction of application layer data (H.264/SVC video) can be successfully passed to the lower network layer protocol stacks to enable the cross-layer operation for system performance enhancement.

7.4.2 Operational Application Characteristics

In this section, we examine the operational characteristics of scalable video streaming in a wireless/mobile environment. We assume that video streams are pre-encoded so that adaptive source encoding is not considered in the following discussion.

First, we discuss two idealistic video streaming application scenarios based on a non-scalable video and a scalable video, where the channel rates are fully known. Fig. 7.5 shows the application scenarios characterized by a number of characteristic curves. In Fig. 7.5(a), s(t), r(t) and p(t) are the cumulative source rate, the channel rate, and the play-back characteristic curves, respectively, and the subscripts in Fig. 7.5(b) indicate scalable layers.

Figure 7.5: Illustration of operational application characteristics: (a) non-scalable video streaming and (b) scalable video streaming.
We determine the initial delay (d_init) that guarantees uninterrupted play-back of a streamed video given the source and the channel rate. That is, in Fig. 7.5(a), uninterrupted play-back can be guaranteed if the play-back curve, p(t), is always less than or equal to the channel rate curve, r(t). Mathematically, we have

p(t) = s(t - d_{init}) \le r(t), \quad \text{for} \ d_{init} \le t \le d_{init} + T_{src}, \qquad (7.3)

where T_src is the length of the video source. There are two issues with this scenario in a practical video streaming system. First, it is not realistic to assume the full channel knowledge, so that d_init cannot be determined based on the channel condition. Second, the video streaming application is not sensitive to fast radio channel fluctuation. For example, the piece-wise linear channel rate characterized by r'(t) in Fig. 7.5(a) should lead to the same decision of d_init. The second observation has an important implication for the design of a wireless/mobile video streaming system: the only channel information required for video streaming is the average rate of an interval and, as a result, communication systems can be relieved from the burden of tracking the fast fluctuation of radio channels. This also justifies the validity of the application layer channel estimation introduced in Sec. 7.3.2.2, in that the coarse time granularity of the application channel estimation does not degrade the performance of the video streaming system.

Now, we consider a more realistic scenario, where the play-back begins without an assumption of the full channel knowledge. Then, the condition in Eq. (7.3) may be violated, causing intermediate play-back interruptions with non-scalable video streams. Under this setting, the scalability of compressed video data can be beneficial. Fig. 7.5(b) illustrates such a scenario with a scalable video of 3 layers represented by s_0(t), s_1(t) and s_2(t), respectively. In the figure, the play-back begins at d_init without considering the future channel condition.
Then, the layer switch occurs to avoid the play-back interrupt (low channel rate) or to provide higher video quality (high channel rate). The layer switching operation can be formulated mathematically by

p(t) = \arg\max_{i = 1, \cdots, N_L - 1} s_i(t) \quad \text{subject to} \quad p(t) \le r(t), \quad \text{for} \ d_{init} \le t \le d_{init} + T_{src}, \qquad (7.4)

where N_L is the number of layers in a scalable video stream.

The streaming scenario in Fig. 7.5(b) is still idealistic in that the layer switch occurs exactly at the intersections between r(t) and s(t - d_init). However, it is clear from the discussion that an efficient layer switching algorithm should realize a good video streaming system in a more realistic environment.

Figure 7.6: Overview of the video streaming system under consideration.

Our research objective is to devise an efficient layer switching algorithm in a more practical environment. Consider the video streaming scenario as depicted in Fig. 7.6. First, H.264/SVC data are encapsulated into RTP packets and delivered to a MANE. Then, the MANE assigns RTP packets to radio link protocol (RLP) packets while keeping the information necessary for the RLP packet differentiation. The mapping among data units at different network layers is demonstrated in Fig. 7.7. Under this setting, we would like to find an optimal subset of scalable video, which guarantees a smooth play-back at the highest possible end-user quality under an average channel condition and the application time constraint.

Figure 7.7: Illustration of the mapping of video data from NALUs to RLPs.

7.4.3 Proposed Algorithm

The proposed algorithm consists of two modules to address the following two issues.
First, the GOP-based priority information of each video packet has to be conveyed to radio link layer packets so that differentiated packet transmission is possible at the radio link layer. Second, we need a radio link buffer control algorithm, which can handle random channel fluctuation effectively.

Cross-layer priority signaling

The first module can be implemented by maintaining a mapping table between SVC NALUs and RLP packets in the MANE as given in Table 7.1. For simplicity, we assume that NALUs and RTP packets have a one-to-one mapping. The RTP to RLP mapping provides three types of RLP packets, single (SNGL, one-to-one), fragment (FRAG, one-to-many) and aggregate (AGGR, many-to-one), depending on the RTP packet size. The TxStat field indicates the transmission status with two values of TX and NACK.

Table 7.1: Packet Mapping in a MANE

NALU / RTP                       RLP
Symbol          Entry            Symbol    Entry
PN              Packet Number    PN        Packet Number
TS              Time Stamp       TS        Time Stamp
PID             GOP ID (GID)     PID       Priority ID
DID, QID, TID   Layer ID         Type      Packet type
                                 TxStat    Tx Status

Finally, we define the priority of an RLP packet by the combination of PID, DID, QID and TID in the NALU header extension (see Fig. 7.4) to reflect the inter-dependency among coding units, where the PID of the NALU header contains the GOP number (GID) of the corresponding packet. Mathematically, we have

PID_{RLP} = GID \cdot 2^{10} + DID \cdot 2^{7} + QID \cdot 2^{3} + TID, \qquad (7.5)

where the multiplications by powers of 2 can be simply implemented by the shift operator.

Radio link buffer control

The radio link buffer control algorithm is shown in Fig. 7.8, which consists of two mechanisms: 1) the GOP-based prioritized packet transmission and 2) congestion control. The priority transmission mechanism as shown in Fig. 7.8(a) exploits the layered video structure in the streaming application. That is, with respect to each GOP, it makes a decision on the layer switch at a layer boundary packet based on the decoder buffer status.
A head-of-line (HOL) packet is transmitted only when the decoder buffer contains enough data to play for a pre-specified duration of time. If the decoder buffer level is lower than the threshold, all enhancement layer packets of the current GOP are dropped. Similarly, the congestion control mechanism in Fig. 7.8(b) is triggered by the decoder buffer underflow, and a GOP-based packet drop is conducted so that all packets belonging to the current GOP are discarded in case of wireless congestion. In the two mechanisms, the decoder buffer status is inferred from the highest packet number with NACK by assuming continuous retransmission of failed packets.

Figure 7.8: The proposed radio link buffer control algorithm: (a) the GOP-based prioritized packet transmission and (b) congestion control.

7.4.4 Simulation

7.4.4.1 Simulation Setup

We employed the two-state Gilbert-Elliott wireless channel model in the computer simulation as shown in Fig. 7.9. The packet loss rates for the good and the bad states were set to 0.1 and 0.8, respectively, and the duration of each channel state was assumed to be geometrically distributed with averages equal to 2 and 4 seconds, respectively. The RLP packet size was fixed to 200 bytes, and each packet was transmitted every 5 ms with the round trip time (RTT) equal to 20 ms.

Figure 7.9: The wireless channel simulation model.

We performed the simulation with two test sequences of different lengths (257 and 985 frames, respectively). The frame rate is 30 fps for both sequences so that their durations are 8.6 and 32.5 seconds, respectively.
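The channel model of this setup can be reproduced with a short simulation. The following is a minimal sketch under stated assumptions, not the simulator used in this work: the state is updated once per 5 ms packet slot, the geometric state duration is realized by leaving a state with probability slot/mean in each slot, and the chain starts in the good state. The function names and the seed are hypothetical; the loss rates, mean durations and slot length are the values quoted above.

```python
import random


def simulate_gilbert_elliott(num_packets, slot_s=0.005,
                             loss_good=0.1, loss_bad=0.8,
                             mean_good_s=2.0, mean_bad_s=4.0, seed=0):
    """Return a list of booleans, True when the packet of that slot is delivered."""
    rng = random.Random(seed)
    # Per-slot probability of leaving a state, which gives a geometric
    # sojourn time with the requested mean duration.
    leave_good = slot_s / mean_good_s
    leave_bad = slot_s / mean_bad_s
    in_good = True  # assumed initial state
    delivered = []
    for _ in range(num_packets):
        loss = loss_good if in_good else loss_bad
        delivered.append(rng.random() >= loss)
        if rng.random() < (leave_good if in_good else leave_bad):
            in_good = not in_good
    return delivered


def stationary_loss_rate(loss_good=0.1, loss_bad=0.8,
                         mean_good_s=2.0, mean_bad_s=4.0):
    """Long-run packet loss rate implied by the two-state model."""
    p_bad = mean_bad_s / (mean_good_s + mean_bad_s)
    return (1.0 - p_bad) * loss_good + p_bad * loss_bad


if __name__ == "__main__":
    trace = simulate_gilbert_elliott(400_000)
    print("empirical loss rate:", 1.0 - sum(trace) / len(trace))
    print("stationary loss rate:", stationary_loss_rate())
```

Summing the delivered packets over windows of 100 slots (0.5 second) gives a channel rate trace of the kind used to drive a buffer control algorithm.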
The two test video sequences are given in Table 7.2.

Table 7.2: Test Sequences

Sequence        Codec   Layer   Rate (kbps)   PSNR (dB)
Foreman         SVC     QL-0     74.94        36.49
                        QL-1    110.45        36.82
                        QL-2    152.49        37.29
                AVC     -       145.65        39.11
City, Harbour,  SVC     QL-0     79.30        33.31
News, Soccer            QL-1    110.40        34.09
                        QL-2    137.20        34.59
                AVC     -       132.67        36.05

7.4.4.2 Simulation Results

Fig. 7.10 shows the radio channel rate fluctuation as a function of time, measured as the amount of successfully transmitted packets in every 0.5-second interval.

Figure 7.10: Channel fluctuation for the simulation duration: (a) channel condition, short streaming and (b) channel condition, long streaming.

Fig. 7.11 shows the simulation results, where the performance of the proposed adaptive video streaming scheme using H.264/SVC is compared against the non-adaptive single layer video streaming based on H.264/AVC. Since the conventional single layer video scheme does not provide a flexible video representation, even a single packet loss may significantly degrade the overall video quality at the receive end. For this reason, we use the intermediate buffering interrupt (or play-back hiccups) as one performance metric. In Fig. 7.11, we show the initial delay, video play-back quality and buffer interrupts for the two test video sequences. As shown in this figure, the single-layer video format (H.264/AVC) has 4 interrupts and 1 interrupt for the long and short video sequences, respectively, while the layered video format (H.264/SVC) has no interrupt. Besides, the video quality by the proposed algorithm should be better than that by the benchmark if we interpret the interrupts as frame losses, which should cause serious error propagation.

Figure 7.11: Simulation result: (a) short streaming and (b) long streaming.

7.5 Conclusion

In this chapter, we examined a cross-layer approach to wireless/mobile video streaming system design.
We identified the time-varying random radio channel, the unequal importance of video data and the time constraint of streaming applications as the major challenges to overcome. We also introduced the cautionary perspective of cross-layer design along with the system requirements. Representative proposals were reviewed along with comments on their advantages and possible enhancements. Finally, we examined the design of a realistic video streaming system based on a cross-layer approach.

The cross-layer design of a video streaming system is a challenging task since every aspect of the system components has to be carefully considered. It is a huge topic that embraces three major research fields; namely, video coding, computer networks, and digital communications. Hence, the discussion in this chapter may be neither sufficient nor complete. However, the important aspects of the cross-layer approach to the design of wireless/mobile video streaming systems have been mostly covered in this chapter.

Chapter 8
Conclusion and Future Work

8.1 Summary of the Research

In this research, we studied dependent R-D modeling in H.264/SVC and its application to efficient bit allocation. This research was motivated by the highly complicated inter-dependency among coding units in the scalable video coding standard, H.264/SVC. Since signal dependency has become more important, this issue has to be studied carefully in order to develop efficient bit allocation algorithms. Furthermore, the requirement of a reasonable complexity for bit allocation algorithms has led us to develop dependent R-D models.

In Chapter 3, we investigated the temporal dependency and the T layer bit allocation problem for the hierarchical B-pictures of H.264/SVC. We observed critical T layer R-D characteristics:
• independence of rate in the T layer; and
• dependence of distortion in the T layer.
To address the R-D characteristics in a highly complex prediction structure, we developed the dependent distortion model, by which the distortion of each T layer is represented as a linear sum of the 0th T layer distortion function. The validity of the distortion model was justified by comparing the GOP distortion estimation results with the actual GOP distortion values of various test sequences with different temporal characteristics. The proposed distortion model achieved a GOP distortion estimation accuracy of about 89% with 16 test sequences in QCIF and CIF formats.

Based on the distortion model, we developed a simple T layer bit allocation algorithm, where the bit allocation process was performed with linear complexity by numerically solving the Lagrangian optimization problem. The R-D performance of the proposed bit allocation algorithm was compared with that of the rate control algorithm implemented in JSVM 8.9. It was shown that a significant coding gain was achieved by the proposed bit allocation algorithm in the global scalable bit stream as well as in each T layer.

We studied a simplified rate control algorithm in Chapter 6 as an extension of the T layer bit allocation algorithm. The use of the T layer bit allocation algorithm in Chapter 3 could be limited in time-constrained applications due to its high complexity. For this reason, we analyzed the problem and the solution of the T layer bit allocation algorithm and developed a single pass rate control algorithm. The performance of the simplified algorithm was compared with that of the JSVM 9.16 rate control algorithm. We observed a clear coding gain improvement with precise rate control at the target bit rate.

The Q layer R-D modeling was studied in Chapter 4. Several important conclusions were drawn on the Q layer R-D characteristics:
• independence of distortion in the Q layer;
• dependence of rate in the Q layer;
• orthogonality of the R-D dependency in T and Q layers.
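The T layer and Q layer observations listed in this summary can be restated schematically as follows. This is only one reading of the bullet points, written in assumed notation: q_k stands for the quantization setting of layer k, and R and D for the rate and distortion of a layer; none of these symbols, nor the budget R_budget, is taken from the models of Chapters 3 and 4. The last two lines show the generic form of a Lagrangian T layer bit allocation problem under that reading, not the specific cost function solved in this work.

```latex
% Schematic restatement in assumed notation (q_k: quantization setting of layer k).
\begin{align}
  \text{T layer:}\quad & R^{T}_{k} = R^{T}_{k}(q_k), &
      D^{T}_{k} &= D^{T}_{k}(q_0, \dots, q_k), \\
  \text{Q layer:}\quad & R^{Q}_{m} = R^{Q}_{m}(q_0, \dots, q_m), &
      D^{Q}_{m} &= D^{Q}_{m}(q_m).
\end{align}
% Generic constrained T layer bit allocation and its Lagrangian cost.
\begin{align}
  &\min_{q_0, \dots, q_{K-1}} \; \sum_{k=0}^{K-1} D^{T}_{k}(q_0, \dots, q_k)
   \quad \text{subject to} \quad \sum_{k=0}^{K-1} R^{T}_{k}(q_k) \le R_{\text{budget}}, \\
  &J(\lambda) = \sum_{k=0}^{K-1} D^{T}_{k}(q_0, \dots, q_k)
   + \lambda \sum_{k=0}^{K-1} R^{T}_{k}(q_k).
\end{align}
```

Because the rate of each T layer depends only on its own setting in this reading, the coupling between layers enters the cost only through the distortion terms.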
We developed Q layer dependent R-D models, where the rate of a Q layer is modeled as a linear combination of its base Q layer rate functions and the distortion of a Q layer is represented by a simple multiple of the base Q layer distortion. We obtained actual Q layer R-D data from 16 test sequences in QCIF and CIF formats for modeling performance evaluation. An accuracy of about 89% and 93% was achieved for the rate and the distortion estimation, respectively.

Based on the T and Q layer R-D models, we examined the joint Q-T bit allocation problem. The orthogonality of the R-D dependency allows a simple linear combination of the T and the Q layer R-D models, by which a very simple joint Q-T bit allocation algorithm could be developed. The proposed joint bit allocation algorithm outperforms the JSVM benchmark by about 0.5 dB on average for 8 test sequences in QCIF and CIF formats. Moreover, the efficiency enhancement of each Q layer was about 0.6 dB on average.

In Chapter 5, we summarized the dependent R-D modeling and provided further analysis of the R-D models proposed in Chapters 3 and 4. We derived the joint T-Q layer R-D models based on the dependent R-D models. The accuracy of the joint R-D model was verified with real R-D data. While conducting the analysis, we could identify important dependent R-D characteristics in the T-Q scalability and the physical implication of the model parameters.

Finally, in Chapter 7, we considered the streaming of H.264/SVC video in the wireless environment. We employed the cross-layer network design. We clearly identified major challenges and important issues in the development of a cross-layer video streaming algorithm and addressed them carefully in the proposed video streaming algorithm. Moreover, this scheme adopts standardized tools to achieve cross-layer video streaming in a realistic application scenario.
It was verified by computer simulation that the integration of a media aware network element (MANE) and the NALU header provides a powerful tool for high quality video delivery over wireless networks.

8.2 Future Research Directions

As introduced in Chapter 2, the complexity increases with the encoding and the decoding of scalable video, which potentially prevents scalable video from being employed in practice [36]. For this reason, algorithms adopted by the scalable video codec have to be optimized such that their complexity requirement could be confined to a reasonable level. According to the complexity profiling of the single layer encoder and decoder, the computational complexity of the encoder mainly comes from the motion estimation and the mode decision processes, and that of the decoder from the motion compensation process (more than 50% of the entire processing time). Thus, fast motion search, fast mode decision and motion compensation algorithms for scalable video demand further investigation.

Bibliography

[1] “Coding of audiovisual objects, Part 2: Visual,” ISO/IEC 14496-2.
[2] “Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbits/s, Part 2: Video,” ISO/IEC 11172-2.
[3] “Generic coding of moving pictures and associated audio information, Part 2: Video,” ISO/IEC 13818-2.
[4] “Video codec for audiovisual services at p×64 kbits/s,” ITU-T, Recommendation H.261.
[5] “Joint Draft ITU-T Rec. H.264 | ISO/IEC 14496-10 Amd.3 Scalable video coding,” Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, Doc. JVT-X201, July 2007.
[6] “Joint Scalable Video Model,” Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, Doc. JVT-X202, July 2007.
[7] “Rate control for the joint scalable video model (JSVM),” Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, Doc. JVT-W043, April 2007.
[8] “Rate control reorganization in the Joint Model (JM) reference software,” Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, Doc. JVT-W042, April 2007.
[9] R. Braden, D. Clark, and S. Shenker, “Integrated services in the internet architecture: An overview,” RFC-1633, Internet Engineering Task Force (IETF), June 1994.
[10] J. K. Cavers, Mobile Channel Characteristics. Hingham, MA, USA: Kluwer Academic Publishers, 2000.
[11] T. Chiang and Y.-Q. Zhang, “A new rate control scheme using quadratic rate distortion model,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 7, no. 1, pp. 246–250, February 1997.
[12] M. van der Schaar and N. S. Shanker, “Cross-layer wireless multimedia transmission: Challenges, principles and new paradigm,” IEEE Wireless Communications Magazine, vol. 12, no. 4, pp. 50–58, August 2005.
[13] B. Girod and N. Färber, Compressed Video over Networks. New York, NY, USA: Marcel Dekker, 2001, ch. Wireless Video, pp. 465–511.
[14] Z. He and S. K. Mitra, “A unified rate distortion analysis framework for transform coding,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 11, no. 12, pp. 1221–1236, December 2001.
[15] Z. He and S. K. Mitra, “A linear source model and a unified rate control algorithm for DCT video coding,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 12, no. 11, pp. 511–523, June 2002.
[16] Z. He and S. K. Mitra, “Optimum bit allocation and accurate rate control for video coding via ρ-domain source modeling,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 12, no. 10, pp. 840–849, October 2002.
[17] H.-C. Huang, W.-H. Peng, T. Chiang, and H.-M. Hang, “Advances in the scalable amendment of H.264/AVC,” IEEE Communications Magazine, vol. 45, no. 1, pp. 68–76, January 2007.
[18] R. Jurdak, Wireless Ad Hoc Sensor Networks: A Cross-layer Design Perspective. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2007.
[19] N. Kamaci, Y. Altunbasak, and R. M. Mersereau, “Frame bit allocation for H.264/AVC video coder via Cauchy-density-based rate and distortion models,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 15, no. 8, pp. 994–1006, August 2005.
[20] A. K. Katsaggelos, F. Zhai, Y. Eisenberg, and R. Berry, “Energy efficient wireless video coding and delivery,” IEEE Wireless Communications Magazine, vol. 12, no. 4, pp. 24–30, August 2005.
[21] V. Kawadia and P. R. Kumar, “A cautionary perspective of cross-layer design,” IEEE Wireless Communications Magazine, vol. 12, no. 1, pp. 3–11, February 2005.
[22] W. Kumwilaisak, Y. T. Hou, Q. Zhang, W. Zhu, C. C. J. Kuo, and Y. Q. Zhang, “A cross-layer quality-of-service mapping architecture for video delivery in wireless networks,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 10, pp. 1685–1698, December 2003.
[23] D.-K. Kwon, Y. Cho, and C.-C. J. Kuo, “A simplified rate control scheme for non-conversational H.264 video,” in IEEE Int’l Workshop on Multimedia Signal Processing, October 2007, pp. 284–287.
[24] D.-K. Kwon, M.-Y. Shen, and C.-C. J. Kuo, “Rate control for H.264 video with enhanced rate and distortion models,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 17, no. 5, pp. 517–529, May 2007.
[25] L.-J. Lin and A. Ortega, “Bit-rate control using piecewise approximated rate-distortion characteristics,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 8, no. 4, pp. 446–459, August 1998.
[26] J. Liu, Y. Cho, Z. Guo, and C.-C. J. Kuo, “Bit allocation for spatial scalability in H.264/SVC,” in IEEE Int’l Workshop on Multimedia Signal Processing, October 2008, pp. 278–283.
[27] S. Liu and C.-C. J. Kuo, “Joint temporal-spatial bit rate control for video coding with dependency,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 15, no. 1, pp. 15–26, January 2005.
[28] Y. Liu, Z. G. Li, and Y. C. Soh, “Rate control of H.264/AVC scalable extension,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 18, no. 1, pp. 116–121, January 2008.
[29] J.-R. Ohm, “Advances in scalable video coding,” Proceedings of the IEEE, vol. 93, no. 1, pp. 42–56, January 2005.
[30] D. Pranantha, M. Kim, S. Hahm, B. Kim, K. Lee, and K. Park, “Dependent quantization for scalable video coding,” in Proc. IEEE Int’l Conf. Advanced Communication Technology, February 2007, pp. 222–227.
[31] K. Ramakrishnan, S. Floyd, and D. Black, “The addition of explicit congestion notification (ECN) to IP,” RFC-3168, Internet Engineering Task Force (IETF), September 2001.
[32] K. Ramchandran, A. Ortega, and M. Vetterli, “Bit allocation for dependent quantization with applications to multiresolution and MPEG video coders,” IEEE Trans. on Image Processing, vol. 3, no. 5, pp. 533–545, September 1994.
[33] J. Ribas-Corbera and S. Lei, “Rate control in DCT video coding for low complexity communications,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 9, no. 1, pp. 172–185, February 1999.
[34] J. Ribas-Corbera and S. Lei, “A frame-layer bit allocation for H.263+,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 10, no. 7, pp. 1154–1158, October 2000.
[35] H. Schwarz, D. Marpe, and T. Wiegand, “SVC core experiment 2.1: Inter-layer prediction of motion and residual data,” ISO/IEC JTC 1/WG11, doc. M11043, Redmond, WA, USA, July 2004.
[36] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 17, pp. 1103–1120, September 2007.
[37] C. A. Segall and G. J. Sullivan, “Spatial scalability with the H.264/AVC scalable video coding extension,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 17, no. 9, pp. 1121–1135, September 2007.
[38] Y. Sermadevi and S. S. Hemami, “Efficient bit allocation for dependent video coding,” in Proc. IEEE Int’l Conf. on Data Compression, March 2004, pp. 232–241.
[39] S. Shakkottai, T. S. Rappaport, and P. S. Karlsson, “Cross-layer design for wireless networks,” vol. 41, no. 10, pp. 74–80, October 2003.
[40] Y. Shan, “Cross-layer techniques for adaptive video streaming over wireless networks,” EURASIP Journal on Applied Signal Processing, vol. 2005, pp. 220–228, January 2005.
[41] V. Srivastava and M. Motani, “Cross-layer design: A survey and the road ahead,” vol. 43, no. 12, pp. 112–119, December 2005.
[42] G. J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74–90, November 1998.
[43] Q. Wang, Z. Xiong, F. Wu, and S. Li, “Optimal bit allocation for progressive fine granularity scalable video coding,” IEEE Signal Processing Letters, vol. 9, no. 2, pp. 33–39, February 2002.
[44] S. Wenger, M. Stockhammer, T. Sesterlund, and D. Singer, “RTP payload format for H.264 video,” RFC-3984, Internet Engineering Task Force (IETF), February 2004.
[45] D. Wu and R. Negi, “Effective capacity: A wireless link model for support of quality of service,” IEEE Transactions on Wireless Communications, vol. 2, no. 4, pp. 630–643, July 2003.
[46] L. Xu, W. Gao, X. Ji, and D. Zhao, “Rate control for hierarchical B-picture coding with scaling-factors,” in IEEE Int’l Symposium on Circuits and Systems, May 2007, pp. 49–52.
[47] L. Xu, S. Ma, D. Zhao, and W. Gao, “Rate control for scalable video model,” in Proc. of SPIE on Visual Communications and Image Processing, July 2005, pp. 525–534.
[48] Q. Zhang, W. Zhu, and Y. Zhang, “End-to-end QoS for video delivery over wireless internet,” Proceedings of the IEEE, vol. 93, no. 1, pp. 123–134, January 2005.
Abstract
In this research, we investigate model-based bit allocation algorithms for H.264/SVC, which was recently standardized as a scalable extension of H.264/AVC. Despite its importance in video coding, the inter-dependency between coding units is often addressed only indirectly in conventional single layer bit allocation algorithms. This simplified treatment is adopted because of the complexity involved in considering the dependency explicitly. In H.264/SVC, the inter-dependency between coding units becomes even more involved, and the development of an optimal bit allocation algorithm poses an even greater challenge.
Asset Metadata
Creator
Cho, Yongjin (author)
Core Title
Dependent R-D modeling for H.264/SVC bit allocation
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Engineering
Publication Date
02/19/2010
Defense Date
12/10/2009
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
bit allocation,H.264/SVC,OAI-PMH Harvest,rate control,R-D modeling,scalable video
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Kuo, C.-C. Jay (committee chair), Nakano, Aiichiro (committee member), Ortega, Antonio (committee member)
Creator Email
choyongjin@gmail.com,yongjinc@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m2816
Unique identifier
UC1163899
Identifier
etd-Cho-3436 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-289439 (legacy record id),usctheses-m2816 (legacy record id)
Legacy Identifier
etd-Cho-3436.pdf
Dmrecord
289439
Document Type
Dissertation
Rights
Cho, Yongjin
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu