ADVANCED VIDEO CODING TECHNIQUES FOR INTERNET STREAMING AND DVB APPLICATIONS

by

Lifeng Zhao

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

December 2002

Copyright 2002 Lifeng Zhao

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

UMI Number: 3093973
UMI Microform 3093973
Copyright 2003 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.
ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90089-1695

This dissertation, written by Lifeng Zhao under the direction of his dissertation committee, and approved by all its members, has been presented to and accepted by the Director of Graduate and Professional Programs, in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY.

Director

Date: December 18, 2002

Dissertation Committee Chair

Dedication

To my advisor Dr. C.-C. Jay Kuo, for helping me get to where I am today. To my parents and my wife Chun Lin, for pushing me to finish this thesis.

Acknowledgements

I would like to first acknowledge the continuous guidance and support from my thesis advisor, Professor C.-C. Jay Kuo.
With his insightful advice and ideas, Professor Kuo made this research experience both enjoyable and invaluable to my development as a researcher. I would also like to thank Professor Antonio Ortega, Professor Tu-Nan Chang, Professor Chris Kyriakakis and Dr. Chang-Su Kim for serving on my committee. A special thanks to my mentor Dr. Jongwon Kim for his great support of and guidance for this research during his stay at USC. I am also grateful to Dr. Ioannis Katsavounidis for his great help with this research after I joined InterVideo as a full-time employee. I am also indebted to the whole MPEG-4 team at InterVideo. I would also like to express my gratitude to all members of our group for creating such a pleasant environment. The good times we spent together are unforgettable. Finally, I gratefully acknowledge the generous support of my family. I owe my deepest gratitude to my father and mother for their dedication to my education. I am also very indebted to my wife Chun Lin for her endless understanding and love during the past three years. My appreciation to her cannot adequately be expressed in words.

Contents

Dedication  ii
Acknowledgements  iii
List Of Tables  x
List Of Figures  xi
Abstract  xv

1 Introduction  1
1.1 Significance of the Research  1
1.1.1 Problems in Today's Internet Video Streaming  1
1.1.2 Encoder Design for Digital Video Broadcasting  6
1.2 Issues in Internet Video Streaming System  7
1.2.1 Source Encoding  7
1.2.2 Rate Control  9
1.2.3 QoS Provision for Internet Video  10
1.3 Research on Codec Design for DVB  12
1.3.1 R-D Optimization for Video Encoding  12
1.3.2 Fast Motion Search  13
1.4 Contributions of this Research  14
1.5 Outline of the Dissertation  18

2 Overview of Previous Work and Proposed Streaming System  19
2.1 Introduction  19
2.2 Scalable Source Coding  19
2.3 Fast Algorithms for Block-Based Motion Estimation  21
2.4 Rate Control for Video Applications  23
2.4.1 Real-Time Visual Communications  24
2.4.2 Stored Video Streaming  26
2.4.3 Rate Control with R-D Optimization  27
2.4.4 Rate Control for MPEG-4 FGS Bitstream  29
2.5 Quality of Service Provision for Video Applications  30
2.6 Proposed Video Streaming System  32
2.7 Conclusion  38

3 Constant Quality Video Rate Control  39
3.1 Introduction  39
3.2 Motivation  39
3.3 Constant Quality Rate Control for MPEG-4 Video  42
3.3.1 Video Analysis  42
3.3.2 Generate Coding Complexity Information  43
3.3.3 Three Layer Constant Quality Rate Control  44
3.3.3.1 Complexity-Guided GOP Level Bit Allocation  44
3.3.3.2 Frame Level Bit Allocation  46
3.3.3.3 MB Level Rate Control  47
3.4 Constant Quality Rate Control for MPEG-4 FGS Codec  48
3.5 MPEG-4 FGS Video Streaming with CQVRC  52
3.6 Simulation Results  53
3.6.1 Constant Quality Rate Control for MPEG-4 Video  53
3.6.1.1 Impact of Pre-loading Buffer on Visual Quality  55
3.6.2 Constant Quality Rate Control for MPEG-4 FGS Video  55
3.6.2.1 CQVRC for CBR Channels  57
3.6.2.2 CQVRC for Time-Varying CBR Channel  59
3.6.3 CQVRC for BL and EL  60
3.7 Conclusion  60

4 Content Aware Rate Control  74
4.1 Introduction  74
4.2 Motivation  75
4.3 Pre-processing and Encoding  77
4.3.1 Motion-based Adaptive Frame Grouping  78
4.3.2 Frame Type Decision and Relative Quality Contribution Association  79
4.3.3 Highly Scalable Differential JPEG2000 Video Codec  80
4.3.4 Spatial/Temporal Quality Contribution Association  83
4.3.5 Rate/Distortion Information Embedding  84
4.4 Content Aware Rate Control (CARC)  84
4.4.1 Across-GOP Frame-rate Control  86
4.4.1.1 Heuristic Frame Rate Control  86
4.4.1.2 Frame Rate Control with Constrained R-D Optimal Bit Allocation  88
4.4.2 Inside-GOP Frame Rate Control  92
4.4.3 GOP-bitplane Rate Control  93
4.5 Layered RPI Generation and Content Aware Filtering  94
4.5.1 Layered Packetization and Priority Assignment  95
4.5.2 Content-Aware Packet Filtering  96
4.6 Simulation Results  97
4.6.1 Adaptive Frame Grouping and Quality Contribution Association  97
4.6.2 Evaluation of CARC  99
4.7 Conclusion  103

5 Fine Granular Video Streaming over DiffServ Networks  106
5.1 Introduction  106
5.2 Motivation  107
5.3 Overview of the Proposed System  111
5.4 Fixed-length Packetization and Priority Assignment  114
5.5 Differentiated Forwarding of FGS Video  117
5.6 Experimental Results  120
5.7 Conclusion  127

6 Buffer-Constrained R-D Optimal Rate Control  128
6.1 Introduction  128
6.2 Motivation  129
6.2.1 Review of R-D Optimized Video Encoding  129
6.2.2 Proposed R-D Optimized Coding for Progressive and Interlaced Video  132
6.2.2.1 R-D Optimization for Different Encoding Options  133
6.2.2.2 CASE I: Independent Rate and Independent Distortion  135
6.2.2.3 CASE II: Dependent Rate and Independent Distortion  136
6.2.2.4 CASE III: Dependent Rate and Dependent Distortion  138
6.2.3 Quality Assured R-D Optimal Codec Control  139
6.2.3.1 Quality Assured Frame Layer Bit Allocation  140
6.2.3.2 R-D Optimized MB Level Rate Control  143
6.2.3.3 Simplified MB-Level Rate Control  149
6.3 Experimental Results  157
6.3.1 R-D Optimized MPEG-2 Encoder  157
6.3.2 R-D Optimized MPEG-4 Encoder  159
6.3.3 Conclusion  163

7 Predictive Fast Integer and Half-Pel Motion Search for Interlaced Video  166
7.1 Introduction  166
7.2 Motivation  167
7.3 Predictive Fast Integer-pel Search  168
7.3.1 Overview of MVFAST Algorithm  168
7.3.2 Predictive Fast Integer-pel Search for Interlaced Video  170
7.4 Predictive Fast Half-pel Search  175
7.4.1 Pre-elimination of Half-pel ME  175
7.4.2 Predictive Half-pel Motion Estimation  177
7.5 Simulation Results  178
7.5.1 Predictive Half-Pel Search  179
7.5.2 Fast Predictive Integer-pel (FPIP) Search  180
7.6 Conclusion  182

8 Conclusion and Future Work  183
8.1 Conclusion  183
8.2 Future Work  186
8.2.1 Efficient Scalable Video Coding  186
8.2.2 Low Complexity Algorithms  187
8.2.3 Effective QoS Mapping and Refined DiffServ Model  188
8.2.4 Joint Rate Adaptation and Error Control for Scalable Video Streaming over DiffServ Networks  189
8.2.5 Fast Motion Search for Advanced Motion Compensated Prediction  189

Reference List  191

List Of Tables

3.1 Content characteristics of cascaded "Akiyo" and "Foreman" CIF sequences.  54
4.1 Comparison of PSNR and its variation for full frame-rate, CARC, and BFS rate control under 300 kbps.  103
4.2 Comparison of PSNR and its variation for full frame-rate, CARC, and BFS rate control under 100 kbps.  103
5.1 Parameters for differentiated forwarding of BL packets of MPEG-4 FGS.  122
5.2 Priority distribution of BL and EL packets under different encoding and packetization parameters.  127
7.1 Histogram of ratio between integer-pel SAD and minimal half-pel SAD.  175
7.2 Computation-distortion performance of the proposed half-pel search scheme.  180
7.3 Computation-distortion performance of the proposed integer-pel search scheme.  181
7.4 Computation-distortion performance of the combined integer-pel and half-pel search scheme.  182

List Of Figures

1.1 Components of a video streaming system and related research issues.  7
1.2 The motion-compensated hybrid transform video coder.  8
2.1 Layered options in the non-scalable video codec.  22
2.2 The encoding structure of the MPEG-4 FGS codec.  23
2.3 Temporal variation of a sequence: (a) temporal variation and GOP decision, and (b) temporal variation within a GOP and its dependency structure.  34
2.4 The content-aware scalable Internet video streaming system.  36
3.1 Relationship between (a) frame quality, (b) bit allocation of each frame, and (c) the corresponding buffer status.  62
3.2 The impact of different rate allocations of EL on the PSNR value: (left) BL (base layer) only, (center) fixed allocation of EL and (right) allocation of EL to achieve constant quality.  63
3.3 (a) The scalability structure of MPEG-4 FGS and (b) the comparison of interpolated and real (i.e. empirical) distortion values.  63
3.4 System overview of MPEG-4 FGS video streaming with CQVRC.  64
3.5 Frame dropping by MPEG-4 VM rate control at different pre-loading delays: (a) 0.25s; (b) 1s; (c) 2s; and (d) 2.5s.  65
3.6 Comparison of the proposed CQVRC with TM5 and MPEG-4 VM for the "Foreman" sequence at 10fps, 128kbps, under different pre-loading cases. The average PSNR of CQVRC-0.5s is 30.86dB, CQVRC-1.0s is 30.92dB and CQVRC-2.0s is 30.99dB, respectively. The average PSNR of MPEG4-VM and TM5 is 30.43dB and 30.34dB, respectively.  66
3.7 Comparison of the proposed CQVRC, TM5, and MPEG-4 VM for the cascaded "Akiyo" and "Foreman" sequence at 10fps, 128kbps.  67
3.8 Quality variation in MPEG4-FGS encoded video with an even bit distribution in EL.  67
3.9 The PSNR comparison of the proposed CQVRC and FGSRC for EL under 160 kbps. The maximal PSNR difference of CQVRC is 0.38dB for the first scene, and 2.2dB for the second scene. The maximal PSNR difference of FGSRC is 2.8dB for the first scene, and 6.4dB for the second scene.  68
3.10 The PSNR comparison of the proposed CQVRC and FGSRC for EL under 320 kbps. The maximal PSNR difference of CQVRC is 0.4dB for the first scene, and 0.7dB for the second scene. The maximal PSNR difference of FGSRC is 2.5dB for the first scene, and 7.2dB for the second scene.  68
3.11 The PSNR comparison of the proposed CQVRC and FGSRC for EL under 480 kbps. The maximal PSNR difference of CQVRC is 0.3dB for the first scene, and 0.9dB for the second scene. The maximal PSNR difference of FGSRC is 2.0dB for the first scene, and 8.1dB for the second scene.  69
3.12 The PSNR comparison of the proposed CQVRC and FGSRC for EL under 600 kbps. The maximal PSNR difference of CQVRC is 0.3dB for the first scene, and 0.6dB for the second scene. The maximal PSNR difference of FGSRC is 2.2dB for the first scene, and 9.0dB for the second scene.  69
3.13 (a) The trace of a variable bit rate channel with bandwidth change period of 1 second and (b) the PSNR comparison of CQVRC and FGSRC.  70
3.14 (a) The trace of a variable bit rate channel with bandwidth change period of 2 seconds and (b) the PSNR comparison of CQVRC and FGSRC.  71
3.15 (a) The trace of a variable bit rate channel with bandwidth change period of 5 seconds and (b) the PSNR comparison of CQVRC and FGSRC.  72
3.16 Quality variation in MPEG4-FGS coded video with constant quality BL, even bit distribution for EL and no adaptive frame grouping.  73
3.17 The PSNR comparison of different rate control options of BL and EL.  73
4.1 Example of MSL calculation.  80
4.2 Four level hierarchical rate control for CARC.  86
4.3 The temporal variation of a sequence: (a) across GOP and (b) within one GOP.  98
4.4 The spatial complexity of one sequence.  98
4.5 SQC/TQC for each GOP.  99
4.6 The frame-rate determined by CARC under 300 kbps and 100 kbps.  101
4.7 PSNR comparison for three schemes: (a) under 300 kbps with full frame-rate, CARC and BFS and (b) under 100 kbps with CARC and BFS.  102
4.8 Reconstructed image quality comparison between full frame-rate and CARC (upper for full frame-rate and lower for CARC).  105
5.1 The overview of the proposed scalable video streaming system with network QoS provisioning.  112
5.2 The error concealment scheme for lost MB recovery in BL.  118
5.3 The DiffServ node with multiple queues.  119
5.4 The diagram of the proposed simulation setup.  121
5.5 The PSNR comparison of EEP/UEP for BL packets: (a) EEP and 3-level DS-UEP of Case 1 and (b) EEP and 3-level DS-UEP of Case 2.  123
5.6 The RLI distribution for packets in the Foreman sequence: (a) BL packets under Case 1 and (b) EL packets under 384kbps.  123
5.7 The PSNR comparison of rate adaptation, UEP, and EEP for the Foreman sequence under different EL rates: (a) EL at 512kbps, (b) EL at 384kbps, (c) EL at 256kbps and (d) EL at 160kbps, where the PSNR value is 34.99dB, 34.19dB, 32.97dB and 32.07dB, respectively, under the no loss condition.  126
6.1 The three trellis diagrams for I, B and P frames, respectively.  149
6.2 The average number of bits per non-zero coefficient in the I frame for the "Football" sequence.  152
6.3 The average number of bits per non-zero coefficient in the I frame for the "Tempete" sequence.  153
6.4 The average number of bits per non-zero coefficient in the interframe of sequence "Tempete".  154
6.5 The PSNR comparison of the TM5 reference encoder with the proposed R-D optimized encoder for the football sequence.  158
6.6 The PSNR comparison of the TM5 reference encoder with the proposed R-D optimized encoder for the mobile and calendar sequence.  159
6.7 The PSNR comparison of the simplified R-D optimized encoder with the R-D optimized encoder for the football sequence.  160
6.8 The PSNR comparison of the simplified R-D optimized encoder with the R-D optimized encoder for the mobile and calendar sequence.  160
6.9 The PSNR performance comparison of the proposed R-D optimized encoder versus the TM5 reference codec for the 15fps Foreman sequence at 192 kbps, where PSNR_TM5-avg = 31.15dB and PSNR_RD-avg = 32.22dB.  161
6.10 The PSNR performance comparison of the proposed R-D optimized encoder versus the TM5 reference codec for the 15fps Tempete sequence at 256 kbps, where PSNR_TM5-avg = 27.3dB and PSNR_RD-avg = 28.1dB.  162
6.11 The PSNR comparison of the simplified R-D rate control method with the R-D optimized rate control method for the 15fps Foreman sequence at 192kbps, where PSNR_smp-rd-avg = 32.21dB and PSNR_RD-avg = 32.22dB.  162
6.12 The PSNR comparison of the simplified R-D rate control scheme with the R-D optimized rate control method for the 15fps Tempete sequence at 256kbps, where PSNR_smp-rd-avg = 28.1dB and PSNR_RD-avg = 28.1dB.  163
6.13 The PSNR comparison of the 2-pass R-D optimized encoder with the TM5 reference encoder for the 15fps Foreman sequence at 192kbps, where PSNR_2p-rd-avg = 32.3dB and PSNR_TM5-avg = 31.15dB.  164
6.14 The PSNR comparison of the 2-pass R-D optimized encoder with the TM5 reference encoder for the 15fps Tempete sequence at 256kbps, where PSNR_2p-rd-avg = 28.12dB and PSNR_TM5-avg = 27.3dB.  165
7.1 Overall diagram of the proposed predictive fast integer-pel and half-pel search scheme.  169
7.2 (a) Large diamond search pattern and (b) small diamond search pattern.  170
7.3 The search points in half-pel ME.  178

Abstract

Video streaming applications over IP networks have attracted a lot of attention recently as a key enabling technology for future video distribution. Streaming offers a significant advantage over the download-and-play approach for on- or off-line media distribution from a media server. In the first part of this research (i.e. Chapters 3-5), we study technologies for the challenging IP media streaming problem from both the application and the network viewpoints. To provide QoS control in the application layer, rate control schemes should be employed to maximize end-to-end visual quality. By matching the rate of the underlying video stream to the available network bandwidth, rate control reduces the possibility of network congestion. Rate control is also used to preserve smooth video quality when there is a significant amount of variation in the video source, the available bandwidth, or both. From the network infrastructure viewpoint, the trend is to promote more QoS support in network nodes (i.e. boundary or internal routers). We seamlessly integrate video preprocessing, error-resilient scalable source coding, constant quality rate adaptation, prioritized packetization, and DiffServ-based QoS networks into one system in this research.
A constant quality video rate control (CQVRC) scheme is proposed in Chapter 3 for non-scalable MPEG-4 video to meet both the constant bit-rate (CBR) channel and the receiver buffer constraints. Rate control schemes such as TM5, TMN 5/8/9/10 and MPEG4-VM all target visual communications over a low-delay CBR channel. The requirement for rate control to satisfy the CBR constraint is significantly relaxed by the relatively large buffer adopted in most streaming applications. On the other hand, significant quality degradation appears in regions with large object motion or scene changes, which is annoying to human perception. CQVRC is a variable bit-rate (VBR) rate control scheme, in which the average of instantaneous bit rates is adjusted to match the available bandwidth without violating the buffer constraint. CQVRC exploits a large decoder buffer, future frame information and temporal scene segmentation to achieve much smoother video quality than previous rate control schemes such as TM5 and MPEG-4 VM designed for low-delay visual communications. The impact of the buffer size on the delivered visual quality is also demonstrated. Rate control for scalable video is then realized by bitstream truncation. An optimal truncation strategy is proposed for the MPEG-4 FGS codec. CQVRC is realized by embedding minimal R-D (rate-distortion) information and relying on a piecewise linear R-D model within an enhancement layer (EL). The R-D information (e.g. R-D sample points generated during the encoding process) is embedded in each bitplane of the MPEG-4 FGS enhancement layer. By linearly interpolating the embedded R-D information, adaptive bit allocation can be performed to achieve constant quality rate adaptation. Efficient rate control schemes are studied in Chapter 4, where both the spatial quality and the temporal frame rate are jointly manipulated for the best performance.
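The constant-quality truncation idea can be sketched as follows. This is a minimal illustration, not the dissertation's actual algorithm: it assumes each frame's enhancement layer carries a few (rate, distortion) sample points forming a piecewise-linear R-D curve, and bisects a common distortion level whose per-frame rates fit the bit budget. All function names and sample data here are hypothetical.

```python
def rate_at_distortion(samples, d):
    """Invert a piecewise-linear R-D curve.

    samples: (rate, distortion) pairs with rate increasing and
    distortion non-increasing, e.g. one pair per FGS bitplane."""
    rates = [r for r, _ in samples]
    dists = [x for _, x in samples]
    if d >= dists[0]:
        return rates[0]          # zero extra bits already meet distortion d
    if d <= dists[-1]:
        return rates[-1]         # curve cannot reach a lower distortion
    for (r0, d0), (r1, d1) in zip(samples, samples[1:]):
        if d1 <= d <= d0:        # linear interpolation inside this segment
            t = (d0 - d) / (d0 - d1)
            return r0 + t * (r1 - r0)
    return rates[-1]

def constant_quality_allocation(frames, budget, iters=60):
    """Bisect a common distortion level so that the per-frame
    enhancement-layer rates sum to the bit budget."""
    lo = min(d for f in frames for _, d in f)    # best reachable quality
    hi = max(d for f in frames for _, d in f)    # worst quality
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(rate_at_distortion(f, mid) for f in frames) > budget:
            lo = mid             # over budget: tolerate more distortion
        else:
            hi = mid
    return [rate_at_distortion(f, hi) for f in frames]
```

At the common distortion level, frames with steeper R-D curves (harder content) automatically receive more bits, which is the mechanism behind the constant-quality behavior described above.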
By considering the spatial-temporal quality tradeoff, a sophisticated content-aware rate control (CARC) scheme is formulated. A scalable video streaming system is presented that includes video preprocessing, scalable source encoding, and content-aware rate control. By combining encoding preprocessing (i.e., adaptive frame grouping and associated measures of the spatial/temporal quality contribution per frame) with a temporal-adaptive scalable video codec followed by CARC, the proposed streaming solution addresses the problem of keeping video quality smooth even when there is a significant change in the video source or the connection bandwidth. The delivery of the MPEG-4 FGS bitstream over the DiffServ network is investigated in Chapter 5. The bitstream is first segmented into fixed-size packets, and the priority of each packet is then calculated based on its loss impact on the end-to-end visual quality. Prioritized packets are subject to differentiated dropping and forwarding in the DiffServ network. The impact of the source coding parameters, the packetization scheme and the DiffServ network on visual quality is carefully investigated. It is shown that, although the prioritized stream benefits from the prioritized network, its gain is heavily dependent on how well the video source and network priorities match each other. The second part of this research (i.e., Chapters 6 and 7) is focused on techniques to improve video quality for digital video broadcasting (DVB) applications. The R-D optimization technique is investigated in Chapter 6 to handle both progressive and interlaced video. At the frame level, a better R-Q model is developed based on the number of non-zero coefficients to give a better frame-level bit allocation scheme. Furthermore, a quality feedback scheme is proposed to generate a VBV (Video Buffer Verifier) compliant bitstream with assured video quality.
At the macroblock (MB) level, both quantization parameters and coding modes are optimized in all picture types, including I, B and P pictures. To reduce the complexity of the R-D optimization procedure, several heuristics are developed to achieve results that are close to the optimal one at a much lower computational cost. A fast algorithm to perform integer- and half-pel motion search for interlaced video is presented in Chapter 7. By exploiting the correlation between the frame- and the field-type integer-pel search, the proposed scheme can skip the sub-optimal integer-pel search type. Furthermore, based on the matching cost obtained from the integer-pel search, the proposed scheme is able to determine whether to perform the half-pel search. When the half-pel search is deemed necessary, some sub-optimal half-pel search points can still be skipped to reduce the computational cost further. Compared with the baseline scheme using the MVFAST algorithm, our new scheme can reduce the number of search points by up to 70% with around 0.1 to 0.2 dB quality degradation.

Chapter 1
Introduction

1.1 Significance of the Research

1.1.1 Problems in Today's Internet Video Streaming

Video streaming applications over IP networks have attracted a lot of attention recently as a key enabling technology for future video distribution. Streaming offers a significant advantage over the download-and-play approach for on-line or off-line media distribution from a media server. Users may have different preferences, processing capabilities, and diverse network access to the streaming server. For streaming video, user and network heterogeneity demands highly scalable video coding methods and flexible delivery techniques to overcome the challenges imposed by the best-effort Internet.
The transmission of real-time video typically has bandwidth, delay and loss requirements. However, the current best-effort Internet does not offer any quality of service (QoS) guarantees to streaming video in these three aspects. Furthermore, for video multicast, it is difficult to achieve both efficiency and flexibility. Consequently, Internet video streaming presents numerous challenges. To provide application-layer QoS control, both rate control and error control should be employed to maximize the end-to-end visual quality [67, 26]. By matching the rate of the video stream to the available network bandwidth, rate control techniques reduce the possibility of network congestion. The rate control scheme (also called rate adaptation when it works together with network congestion control) plays a crucial role in smooth video transmission when there is a significant change in the video source or the connection bandwidth. Video rate control for H.263+ and MPEG-4 video was originally designed for low-latency visual communications such as video conferencing applications. In this scenario, short delay is the primary target of the rate control module, which implies a small buffer size. Hence, the assigned bit budget for each coded frame should not deviate too much from the instantaneous bandwidth. Otherwise, either buffer overflow or underflow will occur. A dynamic video source often presents varying characteristics and thus demands different coding requirements. As a result, the encoder tends to introduce a significant degree of quality variation when encoding frames with the objective of producing a fixed bit-rate stream under a strict buffer constraint.
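The buffer dynamics behind this constraint can be illustrated with a small leaky-bucket simulation. This is only a sketch, not part of the thesis: the function name and the constant-rate drain model are our own simplifications (frame skipping, bit stuffing and start-up delay are ignored).

```python
def simulate_buffer(frame_bits, channel_rate, frame_rate, buffer_size):
    """Track encoder buffer fullness and flag overflow/underflow events.

    frame_bits:   bits produced for each coded frame
    channel_rate: channel bandwidth in bits per second
    frame_rate:   frames per second
    buffer_size:  encoder buffer capacity in bits
    """
    drain_per_frame = channel_rate / frame_rate  # bits the channel removes per frame interval
    fullness, events = 0.0, []
    for n, bits in enumerate(frame_bits):
        fullness += bits                 # the coded frame enters the buffer
        if fullness > buffer_size:
            events.append((n, "overflow"))
            fullness = buffer_size       # clamp; a real encoder would skip/requantize
        fullness -= drain_per_frame      # the channel drains at a constant rate
        if fullness < 0:
            events.append((n, "underflow"))
            fullness = 0.0               # clamp; a real encoder would stuff bits
    return events
```

With a hypothetical 1 Mbit/s channel at 10 frames/s and a 200 kbit buffer, the per-frame drain is 100 kbits, so any frame much larger than that risks overflow. This is why a small, low-delay buffer forces near-constant per-frame bit budgets, while the larger buffers of streaming applications tolerate the fluctuation that constant-quality coding needs.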
For example, when video segments involving scene changes, scene transitions or special effects are encoded, either multiple frame skipping or dramatic quality degradation will appear due to the insufficient instantaneous bandwidth and the small coding buffer. For streaming video applications, the strict delay requirement can be substantially relaxed, and a relatively larger buffer is allowed. With such a large buffer, it is sufficient to match the average bit rate with the required bandwidth, allowing the number of bits per frame to fluctuate more as long as it stays within the bound imposed by the available buffer space. This leads to the design of variable bit-rate (VBR) rate control schemes that achieve more constant video quality while meeting a constant bit rate in the average sense. Constant quality is valuable to the visual experience. However, it solves only part of the problem. When the bit budget is not sufficient to satisfy both the full frame rate and reasonable spatial quality, it is necessary to trade off the temporal and the spatial quality factors to achieve better visual quality [55]. The tradeoff between these two quality factors is tightly tied to the video content. This observation leads us to propose a content-aware rate control (CARC) scheme, which provides more constant video quality than what an individual temporal or spatial rate control scheme can offer. Moreover, unpredictable channel variation requires finer granularity than what can be provided by the layering options of conventional MPEG-2 and H.263+ video. Also, the complex dependency geared for coding efficiency poses another bottleneck since it hurts video robustness in erroneous environments. The
fine grain scalability (FGS) of MPEG-4 [35] is one big step towards a scalable video solution. However, it is a compromised, hybrid scheme, in which a non-scalable base layer and a scalable enhancement layer are mixed. This SNR-scalability-focused scheme trades compression efficiency for the robust temporal separation of the enhancement layer. Recently, hybrid temporal-SNR FGS scalability [48] was proposed to address this limitation. Both SNR and temporal scalability can be achieved through a single enhancement layer by fine-granular coding of the B-frames, where the achieved frame rate can vary from the frame rate of the base layer alone to that of the base and enhancement layers combined. However, even with this fixed temporal scalability structure, the scheme lacks awareness of the inherent scene variation, as pointed out in [48], and consequently its adjustment capability is still limited. Finally, with the help of this scalable stream, the video streaming task is much simplified since all the transcoding overhead required by a non-scalable codec is bypassed. However, scalable coding solves only part of the problem, and packet loss is very common under unpredictable channel conditions. To address this problem fully, both an efficient scalable coding scheme and flexible delivery techniques are needed. The application-oriented approach starts from the current best-effort network model and seeks more innovative streaming schemes to mitigate the effect of this unpredictable packet loss [67]. That is, application-layer quality of service (QoS) is provided to the end user through rate adaptation and error control. The network infrastructure-oriented approach, on the contrary, promotes more QoS support in network nodes.
Two representative approaches in the Internet Engineering Task Force (IETF) are integrated services (IntServ) with the resource reservation protocol (RSVP) and differentiated services (DiffServ or DS) [56, 7]. These IP-QoS methods are better suited to accommodating the various QoS requirements of different applications than the best-effort model. It appears that the efficient video streaming problem can be better solved from both the application and the network aspects. In the first part of this research (i.e., Chapters 3-5), we tackle this challenging problem from both the application and the network viewpoints by seamlessly integrating video preprocessing, error-resilient scalable source coding, constant-quality rate adaptation, packet prioritization, and the DiffServ-based network into one system. Our primary goal is to preserve smooth video quality even when there is a significant change in the source, the bandwidth or both. Furthermore, we prioritize the video stream with packets as the basic unit for prioritized dropping and forwarding, where rate adaptation is dynamically performed to meet the time-varying available bandwidth. Our second goal is to show how such a prioritized stream can benefit from a QoS network such as IP differentiated services (DiffServ) when the source and network priorities match each other well.

1.1.2 Encoder Design for Digital Video Broadcasting

Due to the different application environments, the main coding challenges encountered by a digital video broadcasting (DVB) system are somewhat different from those in streaming applications. Since most TV receivers support interlaced display only, most video content for DVB applications is in the interlaced format. Besides, since the DVB system serves entertainment purposes, high video quality is one of the key requirements.
Unlike video streaming, where a pre-loading delay of up to several seconds may be allowed, the DVB system has a strict delay requirement to handle live broadcasting and/or channel hopping by users. This implies a tighter buffer constraint. Another main difference between DVB and Internet streaming applications is that the DVB channel usually has better QoS support than the best-effort Internet. Hence, video distribution over broadcast channels (e.g., cable and satellite systems) is more reliable than Internet video streaming. The two major challenges for DVB are: (1) how to efficiently allocate the constrained bit budget to coding units to achieve the best possible quality, and (2) how to decode and play back at the original frame rate with proper management of the decoder's buffer. In the second part of this research (i.e., Chapters 6 and 7), we propose methods for the design of an R-D optimized codec system. Based on the given bandwidth and buffer constraints, we show how to perform efficient frame-level bit allocation and adjust the coding mode and the quantization parameter of each MB to achieve the best video quality. Besides, several heuristics are developed to reduce the computational complexity. For example, a fast algorithm is proposed to perform integer- and half-pel motion search for interlaced video.

Figure 1.1: Components of a video streaming system (video capture, preprocessing and encoding, media serving, transmission, and post-processing) and related research issues.

1.2 Issues in Internet Video Streaming System

In Fig.
1.1, we decompose the whole video streaming system into five components, where related research issues are also highlighted. Some of them are described in detail as follows.

1.2.1 Source Encoding

In order to achieve high coding efficiency, both spatial and temporal redundancies are exploited in state-of-the-art video compression schemes. Spatial redundancy can be exploited with transform coding such as the DCT and the wavelet transform, while temporal redundancy is exploited with frame prediction. A typical encoding flowchart is illustrated in Fig. 1.2.

Figure 1.2: The motion-compensated hybrid transform video coder (transform, quantization and entropy coding of the motion-compensated prediction residual, with an embedded decoder loop that reconstructs the prior coded frame for motion estimation and compensation).

Recently, there have been quite a few standardization efforts, including the ITU H.263, H.263+, H.263++ and H.26L standards and the ISO MPEG-1/2/4 standards. All of them are based on a hybrid approach. They are, however, different from each other in some details such as the transform, quantization, motion estimation and compensation, and entropy coding modules. Moreover, in all these standards, only the decoder and the syntax of the bitstream are defined, so that the encoder retains flexibility in several components such as preprocessing, rate control, motion search, postprocessing, and error concealment. This flexibility makes coding optimization under the standard umbrella possible, and it also differentiates each specific implementation. Although several scalable options may be described, these standards mainly focus on non-scalable video coding.
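To make the closed-loop structure of Fig. 1.2 concrete, the following is a drastically simplified sketch of our own, not the standard's algorithm: the transform, entropy coding and motion search are omitted, leaving only zero-motion prediction from the reconstructed previous frame and uniform scalar quantization of the residual. The point it demonstrates is that the encoder contains an embedded decoder and predicts from the *reconstructed* frame, so encoder and decoder never drift apart.

```python
def encode_sequence(frames, q):
    """Closed-loop hybrid coding sketch: predict each frame from the
    reconstructed previous frame (exactly as the decoder will), quantize
    the residual, and reconstruct it for the next prediction."""
    recon_prev = [0] * len(frames[0])            # decoder starts from a known state
    bitstream = []
    for frame in frames:
        residual = [x - p for x, p in zip(frame, recon_prev)]
        levels = [round(r / q) for r in residual]  # uniform scalar quantization
        bitstream.append(levels)
        # embedded decoder: dequantize and add the prediction back
        recon_prev = [p + lv * q for p, lv in zip(recon_prev, levels)]
    return bitstream

def decode_sequence(bitstream, q, width):
    """Mirror of the embedded decoder loop inside the encoder."""
    recon, out = [0] * width, []
    for levels in bitstream:
        recon = [p + lv * q for p, lv in zip(recon, levels)]
        out.append(recon)
    return out
```

Because the encoder's `recon_prev` and the decoder's `recon` follow identical update rules, the reconstruction error never exceeds the quantization error of a single step, which is the essential property the prediction loop of Fig. 1.2 is built to guarantee.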
Compared with its non-scalable counterpart, a scalable codec can easily resolve the network heterogeneity issue in the multicast environment and the time-varying bandwidth issue in unicast applications. However, the coding efficiency of a scalable codec is lower.

1.2.2 Rate Control

The major purpose of rate control is to match the encoded bitstream to the available channel bandwidth. Rate control can be performed for both constant bit-rate (CBR) and variable bit-rate (VBR) video. VBR can be further classified into shaped VBR, constrained VBR and feedback VBR for the ATM network [30]. The CBR transmission mode makes the network management task easier because of its predictable traffic pattern. However, the CBR mode does not match the inherent VBR characteristics of coded video well. Thus, quality degradation is inevitable in the CBR mode with a strict buffer constraint. Generally speaking, Internet video streaming can be better modelled by feedback VBR, where rate control should adjust the transmitted video according to the fed-back available bandwidth (ABW). During each feedback interval, the channel can be modelled as CBR; as a result, it is actually time-varying CBR. By incorporating knowledge of the channel into the encoder and/or the media server, we can make a more appropriate selection of the frame rate, the frame resolution and the coded PSNR quality. This observation motivates most network-aware rate control schemes and the research work described in this proposal. Rate control, when working with congestion control, is often called rate adaptation. Based on the location (i.e.
at the server outbound, at active networking routers, or at the end user) where it is performed, rate control schemes can be classified into server-oriented, receiver-driven and hybrid approaches. The server-oriented rate control approach can be greatly simplified via scalable video coding by simply truncating the embedded bitstream according to the available bandwidth; the embedded bitstream offers fine granularity if the source is highly embedded. For non-scalable encoders, rate control has to be tightly coupled with the encoding process. The receiver-driven approach relies on the receiver's decisions and is much coarser than the server-oriented approach. In the hybrid approach, both the server and the client play roles in rate control. As more active networks are deployed, the network can also perform rate control by filtering packets based on packet priority at the active routing component. Regardless of its complexity, we believe the hybrid approach is the most probable solution in the future since it provides rate control at both coarse and fine granularities.

1.2.3 QoS Provision for Internet Video

Internet video applications have delay and loss requirements different from those of conventional data transmission, which cannot be guaranteed by the current best-effort Internet. Thus, it is a challenging problem to design an efficient video delivery system that achieves the maximal perceptual quality for the end user as well as high network utilization. The application-oriented approach starts from the current best-effort network model and seeks more innovative streaming schemes to mitigate the effect of this unpredictable packet loss [67]. That is, by rate adaptation and error
control, application-layer quality of service (QoS) is provided to the end user. This includes FEC, ARQ and hybrid FEC/ARQ schemes. The network infrastructure-oriented approach, on the contrary, promotes more QoS support in network nodes. Two representative approaches in the Internet Engineering Task Force (IETF) are integrated services (IntServ) with the resource reservation protocol (RSVP) and differentiated services (DiffServ or DS) [56, 7]. These IP-QoS methods are better suited to accommodating the various QoS requirements of different applications than the best-effort model. Between these two main IP-QoS approaches, DiffServ provides the less complicated and more scalable solution, since IntServ requires maintaining per-flow state across the whole path for resource reservation. In the DiffServ model, resources are allocated differently to various aggregated traffic flows based on a set of bits (i.e., the DS byte). Consequently, the DiffServ approach allows different QoS grades for different classes of aggregated traffic flows. Two services are supported: the premium service, which supports low loss and delay/jitter, and the assured service (AS), which provides QoS better than best effort but without guarantees. When video is delivered over a DiffServ network, packets assigned different priorities experience different delay and packet loss behavior [53]. Thus, network unequal error protection (NUEP) is realized and video quality is improved compared with the case without service differentiation.

1.3 Research on Codec Design for DVB

High video quality and low implementational complexity are vital for a successful DVB codec system. The following two issues are discussed in this thesis research.
1.3.1 R-D Optimization for Video Encoding

In any video compression standard, the bitstream syntax has to be unambiguously defined to allow for interoperability over a wide range of applications and systems. Thus, the decoding process is well defined by the standard without much room for manipulation (except for error handling and post-processing). In contrast, only the bitstream-generation aspect of the encoding process is covered by the normative part of the standards; many coding parameters are left open to allow a flexible encoder implementation. In order to achieve a good rate-distortion (R-D) tradeoff, an R-D optimization process is proposed for the DVB application. Generally speaking, every encoding option can be optimized in the R-D sense if the associated complexity is acceptable. For a given image sequence, the frame type (such as I, P or B) of a particular frame and the corresponding bit allocation to its macroblocks can be determined to give the maximal quality under a certain rate constraint [47], [33]. Once the frame type and bit allocation for each frame are determined, the DCT transform type, the coding mode and the quantization parameter for each MB should be selected to yield the minimal distortion under the specified frame-level bit allocation. At the block level (of size 8x8), the 64 coefficients can be adaptively quantized so that isolated high-frequency coefficients of small magnitude can be skipped to avoid long run-length codes without causing much distortion [9]. In principle, we can evaluate any coding option permitted by the syntax with the R-D optimization criterion. The major challenge is how to design an efficient R-D optimal codec with acceptable complexity.

1.3.2 Fast Motion Search

Motion estimation is the most computationally intensive module in a typical video encoder.
To reduce the computational complexity, several fast algorithms [19, 61, 59, 3, 79] have been developed to avoid examining all points in a search window, as done by the full search scheme. Recently, two fast motion estimation algorithms, MVFAST (the Motion Vector Field Adaptive Search Technique) and PMVFAST (the Predictive Motion Vector Field Adaptive Search Technique), were adopted in MPEG-4 Part 7 as the optimization model. These two algorithms are based on the following ideas:

1. initial MV predictors are selected from spatially and temporally adjacent blocks to start the diamond search (DS);

2. both small and large diamond patterns are utilized and adaptively selected based on the local motion activity; and

3. threshold parameters are adaptively calculated to assist in the early termination of the search.

These two algorithms provide a significant improvement over the diamond search (DS) algorithm given in [59] in terms of visual quality as well as speed-up. They can be readily applied to the integer-pel motion search for progressive video content. For interlaced video in DVB, two motion estimation (ME) modes, i.e., frame and field ME, are employed to improve the motion prediction efficiency. However, a straightforward implementation of these two algorithms for interlaced video is about three times more expensive than for progressive content. An enhanced motion search algorithm is therefore desirable for the coding of interlaced video.

1.4 Contributions of this Research

In the first part of this thesis, we examine the challenging IP media streaming problem from both the application and the network viewpoints. Source coding with finer granularity, constant-quality and content-aware rate control and adaptation, and content delivery over prioritized networks are investigated.
In the second part of this thesis, an R-D optimized scheme is formulated and developed to achieve quality improvement for both progressive and interlaced content. Several fast schemes are developed to reduce the complexity of the R-D analysis and of the search for the optimal solution. To reduce the complexity of encoding interlaced content, a fast motion estimation scheme is presented for the integer- and half-pel motion search. Contributions of this research are summarized below.

• In the first part of Chapter 3, constant-quality video rate control (CQVRC) is proposed for non-scalable MPEG-4 video to meet both the bandwidth and the receiver buffer constraints. The requirement that rate control meet the CBR constraint is significantly relaxed due to the relatively large buffer adopted in streaming applications. Thus, VBR rate control can be performed, in which we attempt to match the average of instantaneous bit rates to the available bandwidth without violating the buffer constraint. CQVRC exploits a larger decoder buffer, future frame information and temporal scene segmentation to achieve much smoother video quality than previous rate control schemes, such as TM5 and MPEG-4 VM, that target low-delay visual communication applications. The impact of the buffer size on the final visual quality is also studied.

• In the second part of Chapter 3, rate control for scalable video is implemented by bitstream truncation. An optimal truncation strategy is presented for the MPEG-4 FGS codec. Constant-quality rate adaptation is realized by embedding a small amount of rate-distortion (R-D) information in each bitplane of the MPEG-4 FGS enhancement layer. A piecewise linear R-D model can
be reconstructed by interpolating the embedded R-D data points for the enhancement layer (EL). Then, adaptive bit allocation can be performed to achieve constant-quality rate adaptation [73, 77, 78, 72].

• By considering the spatial-temporal quality tradeoff, a sophisticated content-aware rate control (CARC) scheme is formulated in Chapter 4. The scalable video streaming system incorporates video preprocessing, scalable source encoding, and content-aware rate control. The proposed streaming solution addresses the problem of how to keep video quality smooth even when there is a significant change in the video source, the connection bandwidth, or both [76, 78, 75].

• In Chapter 5, the delivery of the MPEG-4 FGS bitstream over the DiffServ network is investigated. The bitstream is first segmented into fixed-size packets, and the priority of each packet is then calculated based on its loss impact on the end-to-end visual quality. The prioritized packets are subject to differentiated dropping and forwarding under DiffServ. The impact of the source coding parameters, the packetization scheme and differentiated network dropping/forwarding on the final end-to-end visual quality is carefully investigated. It is shown that, although the prioritized stream benefits from the prioritized network, its gain is heavily dependent on how well the video source and network priorities match each other.

• In Chapter 6, the R-D optimization technique is investigated to handle both progressive and interlaced video. Both the quantization parameter and the coding mode of each MB are optimized in all picture types, including I, B and P. At the frame level, a better R-Q model is developed based on the number of non-zero coefficients, which leads to a better frame-level bit allocation result.
Furthermore, a quality feedback scheme is developed to generate a VBV (Video Buffer Verifier) compliant bitstream with assured video quality. To reduce the complexity of the R-D optimization, several fast heuristics are developed to obtain results that are close to the optimal one at a much lower computational complexity.

• In Chapter 7, a fast motion estimation scheme is proposed to perform integer- and half-pel motion search for interlaced video. The proposed integer-pel motion search is based on the MVFAST scheme. By exploiting the correlation between the frame and the field ME methods, we can skip some field ME search types. Furthermore, based on the matching results at integer positions obtained by the integer-pel search, our scheme adaptively decides whether the half-pel search is needed. If the half-pel search is deemed necessary, the proposed scheme can skip sub-optimal search points to further reduce the computational complexity.

1.5 Outline of the Dissertation

In this introductory chapter, we have described various issues associated with video streaming applications over the Internet and efficient codec design. The remainder of this dissertation is organized as follows. Previous work on fast motion search, rate control, error control and QoS networks is reviewed in Chapter 2. Chapter 3 explains in detail the constant-quality video rate control (CQVRC) scheme for both scalable and non-scalable video codecs; simulation results are given to demonstrate the efficiency of the proposed scheme. We investigate the content-aware rate control (CARC) scheme to achieve the optimal tradeoff between temporal and spatial quality in Chapter 4. Chapter 5 presents an MPEG-4 FGS video streaming solution over the DiffServ network and demonstrates its superior performance.
Chapter 6 presents an R-D optimized encoder design that jointly adjusts the MB quantization and coding mode with several fast heuristics. The proposed scheme considers both progressive and interlaced content for all I, B and P picture types. In Chapter 7, we present a fast motion estimation scheme to perform integer- and half-pel motion search for interlaced ME. Finally, in Chapter 8, concluding remarks are given and possible extensions of this work are discussed.

Chapter 2
Overview of Previous Work and Proposed Streaming System

2.1 Introduction

For streaming video, user and network heterogeneity requires both adaptive video coding and flexible delivery techniques to overcome the challenges imposed by the best-effort Internet. It is widely believed that a good synergy of networking and coding will lead to a good solution for media delivery [67].

2.2 Scalable Source Coding

Video coding schemes can be classified into two types: scalable and non-scalable. A non-scalable video coder generates one layer. In contrast, a scalable video encoder compresses an image sequence into multiple layers. The base layer, which can be independently decoded, provides the coarsest visual quality. The other layers are enhancement layers, which can only be decoded together with the base layer to provide better visual quality. There are mainly three types of scalability, i.e., SNR, spatial and temporal scalability, aiming at quality, spatial resolution and temporal frame-rate enhancement, respectively. Several layered options are provided in previous standards such as MPEG-2 [20], H.263/H.263+ [22], and MPEG-4 [21].
By adopting different quantization parameters for different layers, SNR scalability can be realized such that the base layer provides the coarsest quality; hence there exist two parallel coding procedures. Temporal scalability is realized by encoding bi-directionally predicted B frames, which can be arbitrarily dropped if necessary. Spatial scalability is realized by down/up-sampling the spatial resolution of the different layers. These three types of scalability are illustrated in Fig. 2.1. Such coarse layered-coding options are not sufficient to handle unpredictable channel variation. Moreover, the complex dependency geared toward coding efficiency poses another bottleneck, since it hurts video robustness in an erroneous environment. The fine granularity scalability (FGS) of MPEG-4 [35] is one big step towards a scalable solution, where the base layer is targeted to provide the basic visual quality that meets the minimal user bandwidth. The scalable enhancement layer can be arbitrarily truncated to meet heterogeneous network conditions. The prediction structure is illustrated in Fig. 2.2. Besides all these standard coding options, scalable wavelet video coding schemes are also
Recently, two fast motion estimation algorithms, i.e. MVFAST (the Motion Vector Field Adaptive Search Technique) [79] and PMV- FAST (the Predictive Motion Vector Field Adaptive Search Technique) [61], were adopted by the MPEG-4 part-7 as an optimization model. The basic ideas of these two algorithms include the following: 1. selection of initial MV predictors from spatially and temporally adjacent blocks to perform diamond search (DS); 2. both small and large diamonds being utilized and adaptively selected based on the local motion activity; and 21 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. I B P B (a) Enhancement layer EP EP Base layer (b) Enhancement layer Base layer (c) Figure 2.1: Layered options in the non-scalable video codec. 22 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. E nhancem ent Bitstream IDCT Frame Memory M otion E stim atio n Figure 2.2: The encoding structure of the MPEG-4 FGS codec. 3. threshold parameters being adaptively calculated to assist in the early termi nation of the search. These two algorithms provide a significant improvement over traditional fast search algorithms in terms of visual quality as well as complexity reduction. The obtained visual quality from MYFAST and PMVFAST is very similar to that ob tained by the full search scheme. 2.4 R ate Control for Video Applications The rate control problem can be formulated as follows. Given the bit budget for an image sequence of concern, how can we encode the sequence to achieve the highest 23 Frame Find Meinoiy ' Maximum FGS Enharici'ntcrit Er nd n Input Video DCT Bit-plane VLC Q M otion C o m p e n sa tio n VLC B ase Layer Bitstream Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. quality while meeting the channel and delay constraints as well? 
The encoder decisions include the frame type (I, P and B frames) [32, 33], the coding mode (intra and inter mode) [10, 49, 66, 57], the motion vector and the quantization step-size. Generally speaking, any encoder decision process involving rate allocation can be classified as rate control. Traditionally, video applications are categorized into two groups: VBR and CBR applications. For VBR applications, rate control attempts to achieve the optimal quality for a given rate, while for CBR and real-time applications, a rate control scheme must satisfy the low-latency and buffer constraints. Based on the latency (i.e., effective buffer) requirement, we divide typical video applications into two categories: (1) real-time visual communications and (2) stored video streaming.

2.4.1 Real-Time Visual Communications

Rate control schemes in current standard test models (i.e., MPEG-2 TM5 [20], H.263 TMN5/8/9/10 [22] and MPEG-4 VM [21]) are usually designed for real-time visual communications. Delay is an important issue in real-time communication, where the whole process of capturing, encoding, transmitting and decoding video has to be done within the delay constraint. The delay constraint can be proportionally translated into effective encoder/decoder buffer constraints [14]. Thus, low delay means that only a small effective encoder/decoder buffer is allowed. The objective of rate control is to select the quantization parameters so that the encoder produces bits at the rate of the channel while the overall distortion is minimized. Generally, rate control schemes can be divided into two layers, i.e. the frame layer and the macroblock layer. The frame layer assigns a target number of bits to each video frame and, within a given frame, the macroblock layer selects the quantization parameters to meet the frame target accurately.
Mathematical modelling is a common approach used in rate control, where rate and distortion models are established to decide the quantization step-size based on the current buffer fullness, the available bandwidth and the frame distortion [6], [4], [31]. Current test models of video standards adopt mathematical modelling due to its simplicity. However, like any modelling approach, video quality will not be good when the model cannot characterize the input video well. A typical rate control procedure in current test models can be summarized as follows [31].

• Step 0: Initialization. (1) Initialize the buffer size according to the latency requirement; (2) exclude the bit count of the first I frame from the total budget; and (3) initialize the buffer fullness to the middle level.
• Step 1: The pre-encoding stage. (1) Estimate the target bit rate; (2) further adjust the target bit rate based on the buffer status; and (3) calculate the quantization parameter.
• Step 2: The encoding stage. (1) Encode the video frame and record the actual bit count; and (2) activate the MB-layer rate control if necessary to meet the frame target.
• Step 3: The post-encoding stage. (1) Update the R-D model; and (2) perform frame skipping if necessary to prevent potential buffer overflow and/or underflow.

2.4.2 Stored Video Streaming

For non-real-time video applications such as video-on-demand, video broadcasting, and digital libraries, the constraints are different. Since the image sequence is already available, encoded and stored on the server, the encoding delay is not an issue. Before playback, part of the bitstream is pre-fetched to the decoder buffer to ensure that every frame can be decoded at its scheduled time. The pre-loading time depends on the decoder buffer size and the waiting time that a viewer is willing to accept.
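The test-model procedure summarized in Sec. 2.4.1 (Steps 0-3) can be sketched as follows. This is a toy illustration, not any standard's actual model: the linear rate model bits = X * MAD / QP, the synthetic "encoder", the function name rate_control and all constants are assumptions made only for the sketch.

```python
# Hedged sketch of a test-model style frame-layer rate control loop
# (Steps 0-3 above). The rate model bits = X * MAD / QP and the synthetic
# "encoder" stand in for a real codec; constants are illustrative.

def rate_control(mads, channel_rate, frame_rate, buf_size, x_model=400.0):
    buf = buf_size / 2.0                  # Step 0: buffer starts at mid-level
    per_frame = channel_rate / frame_rate
    log = []
    for mad in mads:
        # Step 1: pre-encoding -- frame target, nudged by the buffer status
        target = per_frame * (1.0 + (buf_size / 2.0 - buf) / buf_size)
        target = max(target, per_frame * 0.25)
        qp = min(31, max(2, round(x_model * mad / target)))  # invert the model
        # Step 2: encoding (simulated with the same rate model)
        bits = x_model * mad / qp
        # Step 3: post-encoding -- buffer update, frame skipping on overflow
        buf += bits - per_frame
        skipped = 0
        while buf > 0.9 * buf_size:       # drain the buffer by skipping
            buf -= per_frame
            skipped += 1
        log.append((qp, bits, skipped))
    return log
```

A real controller would also refit the model from encoded frames (the R-D model update of Step 3); here X is held fixed for brevity.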
The pre-loading time in off-line streaming applications can be significantly larger than in real-time applications. Compared with low-delay CBR rate control, the buffer constraint here is largely relaxed, and thus a well-designed VBR algorithm can be adopted. It can be considered as long-term CBR coding that allows more fluctuation of the instantaneous bit rate while the average bit rate is kept constant. This is known as VBR rate control with bandwidth and buffer constraints. In [44], Pao et al. considered constraints on the pre-loading time that video viewers have to wait and on the physical buffer size at the receiver (decoder) side. They proposed a sliding-window rate control scheme that uses the statistical information of future video frames as a guide to achieve better video quality for video streaming over constant bit rate channels. In [70], Yu et al. presented MPEG-2 VBR coding for a fixed-size storage application, where both long-term CBR coding and the finite size of the decoder buffer were taken into account. Both approaches use two passes, where the first pass is dedicated to the collection of statistical information while the second pass performs the actual encoding task.

2.4.3 Rate Control with R-D Optimization

One fundamental problem in encoder design is how to select coding parameters to maximize visual quality under constraints imposed by the computational complexity, delay, bandwidth and/or loss factors. For buffer-constrained CBR coding, the optimal encoder bit allocation problem was studied in [40] with a forward dynamic programming technique known as the Viterbi Algorithm (VA) over a discrete set of quantizers. The proposed solution generates a trellis to represent all viable allocations at each instance (i.e. frame or MB).
Each branch corresponds to a quantizer choice for an MB or frame and has an associated cost. The trellis path with the minimal total distortion can be found via the VA, and it provides the optimal solution to this buffer-constrained bit allocation problem. A tree-based dependency diagram was proposed in [47] for an optimal implementation of the MPEG standard. In this case, all possible combinations are generated by successive quantizer choices in a sequence of frames. The number of combinations grows exponentially with the number of levels of dependency, which makes the optimal solution too complex to be practical. Although heuristics such as the monotonicity assumption can be used to reduce the number of search paths, it is still too complex for implementation in real-time applications. Instead of optimizing bit allocation among frames, bit allocation can also be optimized among macroblocks by selecting different quantization steps and/or coding modes for P frame coding in the H.263 standard [66, 57, 39, 49]. Wiegand et al. [66] proposed a method to select one of four possible modes, i.e. uncoded, intra, inter and inter-4V (inter MB with four motion vectors), for the coding of MBs in a P frame to optimize the R-D tradeoff. A joint coding-mode and quantization-step selection method was considered in [39, 49] to encode the P frame with R-D optimization. To reduce the computational complexity, Mukherjee et al. [39] proposed the M-best search scheme, in which the M least-cost paths are retained as survivors at each state of a trellis and carried over to the next step. Schuster et al. [49] restricted the range of quantization parameters to between 8 and 12 for a speed-up.
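The trellis search described above can be sketched as a small dynamic program: each state records the bits consumed so far, each branch is a quantizer choice with a (rate, distortion) cost, and only the least-cost survivor per state is kept. The per-frame R-D points below are made-up illustrative numbers, and the helper name viterbi_allocate is ours, not from the cited work.

```python
# Sketch of Viterbi-style rate-constrained bit allocation: keep one least-cost
# survivor per (frame, bits-used) trellis state. The per-frame (bits,
# distortion) options are made-up illustrative numbers.

def viterbi_allocate(rd_points, budget):
    """rd_points[i]: list of (bits, distortion) quantizer options for frame i
    (bits assumed integer). Returns (min total distortion, option index per
    frame) subject to sum(bits) <= budget."""
    survivors = {0: (0.0, [])}            # bits_used -> (distortion, path)
    for options in rd_points:
        nxt = {}
        for used, (dist, path) in survivors.items():
            for k, (bits, d) in enumerate(options):
                u = used + bits
                if u > budget:
                    continue              # branch violates the rate constraint
                cand = (dist + d, path + [k])
                if u not in nxt or cand[0] < nxt[u][0]:
                    nxt[u] = cand         # prune: one survivor per state
        survivors = nxt
        if not survivors:
            raise ValueError("budget infeasible")
    return min(survivors.values())
```

The per-state pruning is what keeps the search linear in the number of frames rather than exponential; a full buffer model would simply enrich the state with the buffer fullness.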
The R-D optimization goal can also be achieved via adaptive quantization [29, 44, 9]. That is, the decision on whether a coefficient in a DCT block is encoded or not is determined by evaluating the rate and the distortion associated with that specific coefficient.

2.4.4 Rate Control for the MPEG-4 FGS Bitstream

Rate control for the fine granular MPEG-4 FGS bitstream is much simpler, since the bitstream can be arbitrarily truncated. One default approach is that, for a VBR channel, the EL bitstream is truncated according to the instantaneous available bandwidth, while, for a CBR channel, the bit budget remaining after the BL is evenly distributed among EL frames [45]. Although simple, this approach has two disadvantages: rate control that fully follows the bandwidth variation may produce significant quality variations, and the trivial even bit distribution may not be optimal for the final PSNR results. In [64], Wang et al. examined optimal rate control for an improved MPEG-4 FGS codec called the progressive FGS (PFGS) scheme. Different from the MPEG-4 FGS scheme, which totally disconnects the prediction link within the EL of frames, PFGS allows a limited prediction loop within one EL group (i.e., 2 or 3 frames per group) to improve coding efficiency. By assuming an exponential model between frames within one group, they performed bit allocation among the EL frames in the rate-distortion sense. However, the gain from this R-D optimized bit allocation is rather limited, as described in [64].

2.5 Quality of Service Provision for Video Applications

QoS can be provided for video applications in two layers, namely, the application layer and the transport layer.
The application-oriented community starts from the current best-effort network model and seeks more innovative streaming schemes to mitigate the effect of unpredictable packet loss [67]. That is, through rate adaptation and error control, application-layer Quality of Service (QoS) is provided to the end user. Traditionally, rate adaptation and error control are investigated separately. However, they affect each other; hence, several recent works attempt to integrate these aspects and consider them together. In [71], rate adaptation is performed to smoothly adjust the sending rate of MPEG-4 FGS encoded video based on the estimated network bandwidth, and then each packet is protected unequally. However, this sender-oriented rate adaptation is mainly for unicast rather than multicast video. Moreover, the FEC level for unequal error protection is decided in a heuristic manner (i.e. without optimization for end-to-end quality). In [5], a fully receiver-driven approach for joint rate adaptation and error control is proposed with pseudo-ARQ layers. The sender injects multiple source/channel layers into the network, where the delayed channel layers (relative to the corresponding source layers) serve a packet recovery role. Each receiver performs rate adaptation and error control by subscribing to a selected number of source and channel layers according to the receiver's available bandwidth and channel condition. However, as shown in [11], the entirely receiver-driven approach is subject to several drawbacks, including persistent instability in video quality, arbitrary unfairness with other sessions, and difficult receiver synchronization. On the contrary, the network infrastructure-oriented approach is to promote more Quality of Service (QoS) support in the network nodes.
Two representative approaches in the Internet Engineering Task Force (IETF) are the integrated services (IntServ) with the resource reservation protocol (RSVP) and the differentiated services (DiffServ or DS) [56, 7]. These IP-QoS methods are more suitable for accommodating the various QoS requirements of different applications than the best-effort model. Between these two main IP-QoS approaches, the DiffServ scheme provides a less complicated and more scalable solution, since IntServ requires maintaining per-flow state across the whole path for resource reservation. In the DiffServ model, resources are allocated differently for various aggregated traffic flows based on a set of bits (i.e., the DS byte). Consequently, the DiffServ approach allows different QoS grades for different classes of aggregated traffic flows. Two services are supported: the premium service, which supports low loss and delay/jitter, and the assured service (AS), which provides QoS better than best effort but without guarantees. Several previous works address non-scalable (i.e. coarse granular) video streaming over QoS provision networks [52, 53]. In [52], Shin et al. provide a QoS mapping mechanism for prioritized video packets of non-scalable video. It is shown that significant gains can be achieved compared with the case without service differentiation.

2.6 Proposed Video Streaming System

For Internet video transmission including streaming, both rate control and error control are required to handle the dynamic variation of available network bandwidth and packet loss/delay. In this proposal, the first part mainly focuses on rate control aspects, while the second part covers the error control issue with DiffServ-based differentiated forwarding.
For Internet video with non-scalable video coding such as the MPEG-x and H.26x standards, source rate control at the encoder has to be tightly tied to the transport rate control (i.e., rate adaptation that reacts to the network dynamics via congestion control and consists of schemes ranging from packet scheduling to rate smoothing) to tackle network variation while keeping latency low [26]. This tight integration limits the applicability of non-scalable video coding to unicast Internet video, although layered coding options and longer latency can relax it. To provide the required isolation, rate control schemes working on the pre-compressed stream or packets, often called rate shaping, can help. Approaches range from inter-dependence based frame dropping [13] to coefficient-level dynamic rate shaping (DRS) [23], including transcoding schemes. These rate shaping schemes for non-scalable video codecs are still too coarse and inflexible. However, with a fine-grain scalable (embedded) video codec in place, the isolation is easily achieved and a scalable rate controller (shaper) can be designed systematically. As a result of this isolation, the scalable video codec can be applied to multicast video [36] as well as unicast video. Generally speaking, different types of video sequences, and even different portions of a sequence, may exhibit distinct degrees of motion (i.e., temporal variation) and texture (i.e., spatial complexity). For example, some frames contain fast motion with coarse texture, while others may have slow motion but delicate texture. When the bandwidth cannot guarantee both a high frame rate for motion smoothness and sufficient spatial quality, the tradeoff selection needs to be made at some point between the sender and the clients.
For H.263+, a variable encoding frame rate is proposed in [55], where the frame rate acts as the control knob for the spatial/temporal quality tradeoff. Although it may achieve real-time frame-rate adjustment by using a sliding window, it still lacks a global perspective over a larger number of frames and is constrained to unicast video. Hence, we have pursued a generalization of this variable frame-rate control (called adaptive temporal scalability), which considers the global scene variation. To enable content-awareness in the scalable video codec, scene content analysis from both preprocessing and encoding has to be organized and conveyed to the scalable rate controller in our approach. Thus, an integrated quality metric that jointly considers the spatial/temporal quality is designed as follows.

[Figure 2.3: Temporal variation of a sequence: (a) temporal variation and GOP decision, and (b) temporal variation within a GOP and its dependency structure.]

The proposed content analysis, which may be performed in both the pre-processing and encoding stages, generates the spatial and temporal quality contributions (SQC/TQC) for each temporal segment (i.e., GOP with a variable number of frames). These metrics indicate how much the GOP in question contributes to the overall spatial/temporal quality of the video sequence to be encoded. At the client side, a portion of the quality contribution is actually delivered, depending on the rate allocation for each GOP. In fact, these measures can be acquired from features in either the spatial or the compressed domain, and the specific way of assigning these values may vary. Fig. 2.3(a) illustrates the typical temporal variation of a sequence.
For example, any global or local frame difference measure, such as the mean absolute difference (MAD), the histogram of difference (HoD) or the mean motion vector magnitude (MMV), can be used to represent the temporal variation (and eventually the TQC measure). There are several peaks that indicate abrupt scene changes, and when we zoom into one specific segment (as shown in Fig. 2.3(b)), there are changes on another time scale corresponding to the temporal variation of each frame within a segment. To efficiently represent this variation within a GOP, another measure called the relative quality contribution (RQC) is defined for every frame with respect to its member GOP. Then, the spatial complexity variation of a sequence has to be considered, which indicates how hard it is to encode a frame to a specified distortion level. Even though the spatial complexity may change significantly across different scenes, it is rather smooth within a GOP (if the adaptive GOP grouping is working well). Thus, we currently use one representative measure per GOP for the SQC; for example, one can use the initial I-frame bit rate. Finally, all these measures, SQC/TQC for each GOP and RQC for each frame, are stored for subsequent rate control. Fig. 2.4 illustrates the proposed system, including video preprocessing, source coding, rate adaptation and prioritized transmission, as a seamlessly integrated framework. An input video sequence is analyzed in the pre-processing stage, where the adaptive grouping is decided by a motion-based rule. The sequence is then encoded to the maximal temporal, spatial and SNR quality (i.e., over-coded). The generated stream is stored with the associated quality contribution measures for later transmission. Thus, the content-aware scalable video rate control (CARC) is
implemented in the streaming server.

[Figure 2.4: The content-aware scalable Internet video streaming system.]

Based on the estimated available bandwidth, the user buffer status and user preference, the CARC first calculates a suitable frame rate and quality layer for each transmitted frame; it then parses the stream header, and the portion of the stream corresponding to the desired spatio-temporal combination is selected and passed to the packetization module. Since the proposed rate control can trade off quality flexibly, we can achieve much smoother reconstructed video. In addition, being independent of the encoder, the rate controller can serve multiple users simultaneously. Prioritized packetization is highly desirable if the network can transmit packets with different quality of service (QoS) levels. One example is the differentiated service network, where each packet can be assigned a different priority level and, consequently, packets will have different delay and packet loss behavior [53]. In this proposal, prioritized packetization is discussed for the embedded EL bitstream as well as the non-scalable BL bitstream. The packet priority is determined by the unit distortion (UD) of a packet. By the unit distortion of a packet, we mean the ratio of the distortion caused by its loss to the size of the considered packet; the packets with the largest UD should be assigned the highest priority.
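The unit-distortion idea can be sketched as follows; the rank-based mapping into a fixed number of RPI levels, the function name assign_rpi and all packet numbers are illustrative assumptions rather than the exact mapping used in the cited work.

```python
# Sketch of unit-distortion (UD) based packet prioritization: UD = distortion
# caused by losing the packet divided by its size; packets are rank-mapped to
# a small set of relative priority index (RPI) levels (0 = highest priority).

def assign_rpi(packets, levels=4):
    """packets: list of (distortion_if_lost, size_in_bytes)."""
    ud = [d / s for d, s in packets]
    order = sorted(range(len(packets)), key=lambda i: ud[i], reverse=True)
    per_level = -(-len(packets) // levels)       # ceiling division
    rpi = [0] * len(packets)
    for rank, idx in enumerate(order):
        rpi[idx] = rank // per_level
    return rpi
```

A rank-based mapping like this keeps the level populations balanced regardless of how the UD values are distributed, which suits the proportional-differentiation model of the DiffServ discussion above.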
In this manner, the relative priority index (RPI) is generated by mapping from the calculated UD. The RPI label enables an efficient extension of CARC, content-aware filtering/forwarding, at the transmission end of the streaming server, at networking nodes, or at the clients. For example, at the transmission end of the streaming server, it allows adaptive dropping of packets with lower RPI values based on the available network bandwidth. It can also be utilized within an active networking entity (i.e., a video filter) residing at any intermediate location between server and end-user. The difference is that the CARC at the streaming server can also incorporate stream-level rate control, while an intermediate node has only the priority labels on packets. The prioritized video packets are mapped to the different QoS levels provided by the proportional DiffServ network; by differentiated forwarding of those packets, network-level unequal error protection is realized. With careful design of the source coding, packetization and network parameters, the proposed system shows superior performance.

2.7 Conclusion

In this chapter, we reviewed recent progress in source coding, fast motion search, R-D optimized rate control, and QoS provision schemes for Internet video coding and delivery. Although scalable video coding is most desirable for a time-varying channel, its efficiency is inferior to that of non-scalable coding. Therefore, almost all current video standards are based on non-scalable coding schemes. Most rate control schemes are accordingly tailored for non-scalable coding, especially for low-latency real-time video communications. The new rate control problem for streaming applications was examined as well. This is an application area that is different from the classic one.
To be more specific, it concerns offline VBR rate control with a large buffer and bandwidth constraints, in contrast with the strict CBR rate control of the classical setting. Rate control based on the R-D criterion has recently been employed to select among different encoding options to achieve better visual quality, at a higher computational complexity than the default scheme recommended by the standard. To reduce the encoding complexity, several fast search schemes have been developed; with recent improvements, they can obtain visual quality similar to that of the full search scheme. Finally, we described our proposed video streaming framework. Each component of this framework is described in more detail in later chapters.

Chapter 3
Constant Quality Video Rate Control

3.1 Introduction

Not only the quality of an individual frame but also the quality variation across frames determines the final visual quality of compressed video. It is desirable for the end user to have a reconstructed sequence with a high average PSNR value per frame but a low variance. In this chapter, we investigate two constant quality rate control approaches: one for the non-scalable MPEG-4 main-profile codec and the other for the scalable MPEG-4 streaming-profile codec, also known as the MPEG-4 FGS codec.

3.2 Motivation

Although scalable codecs have gained popularity recently due to their flexibility in a heterogeneous delivery environment, non-scalable coding schemes still dominate coding standards and commercial video products. There are three popular streaming video systems in the market: the Microsoft Windows Media [41], the RealNetworks RealVideo/Audio [42] and the Apple QuickTime [43].
Instead of generating one scalable bitstream to deal with heterogeneous network access conditions, all three systems produce multiple bitstreams of different quality to serve different access bandwidths such as telephone modem, ISDN, DSL, T1, etc. The server switches among those streams during the streaming period based on the detected bandwidth. Therefore, rate control for CBR channels is still important even in video streaming applications. Rate control schemes proposed in existing standards mainly focus on low-latency real-time CBR video applications. Thus, only a small amount of bit variation is allowed from one frame to the next due to the tight buffer bound imposed by the strict delay requirement. However, a video source may have varying characteristics, so that intervals with frequent scene changes, frame transitions and camera motion are interleaved with intervals of slow motion and static background. Consequently, different rate demands arise from different segmented intervals of a video source. With low-delay CBR rate control, the allocated bit rates are more than enough for intervals with small motion, but not sufficient for frames with large motion and/or scene changes. This leads to serious quality degradation (e.g. consecutive frame skipping or unacceptable spatial quality of individual frames) due to the insufficient bit budget. The conceptual relationship between quality, bit rate and buffer status for CBR channels is illustrated in Fig. 3.1. Generally speaking, free VBR is required to achieve constant visual quality, where neither the bandwidth nor the buffer is under any constraint. If buffer and bandwidth constraints are imposed, quality variation seems unavoidable. However, for the off-line streaming scenario, we can relax the buffer latency to some extent.
Besides, if the characteristics of future frames are available, they can also be exploited. Given these conditions, which differ from conventional low-delay real-time video applications, it is possible to achieve more constant video quality than existing rate control schemes. MPEG-4 FGS video has been adopted as a standard coding tool for video streaming applications [35]. A straightforward approach to rate adaptation is to evenly distribute the available bandwidth remaining after BL coding to the EL. This trivial allocation, however, does not provide the best quality for the following reason. The BL is normally generated as CBR (constant bit rate) and exhibits a significant amount of frame-by-frame quality variation, as shown in Fig. 3.2 (left). When a fixed amount of bits is allocated to the EL of each frame, the degree of quality variation remains about the same, as shown in Fig. 3.2 (center). It is desirable to have the constant visual quality shown in Fig. 3.2 (right). Thus, unequal amounts of bits should be assigned to different frames. In the following, we describe our approach to achieving constant quality with rate adaptation. In addition, we examine the effect of a constant quality BL on the overall video quality when the EL is added.

3.3 Constant Quality Rate Control for MPEG-4 Video

To enhance received video quality, not only the statistical information of the previous and current frames but also that of future frames needs to be considered. As in [70], [44], [62], our approach also exploits two-pass coding, where the first pass focuses on analyzing the underlying video and collecting the necessary information for the second pass. The second pass then performs the actual encoding task.
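A minimal sketch of such a first-pass analysis, using the MAD-based scene-change detection detailed in the next subsection; frames are represented as flat pixel lists, and the peak threshold is an illustrative assumption.

```python
# Sketch of first-pass MAD-based temporal segmentation: compute MAD(n) between
# consecutive frames and start a new GOP at each strong local maximum of the
# MAD curve (a scene change). The threshold value is illustrative.

def mad(f0, f1):
    return sum(abs(a - b) for a, b in zip(f0, f1)) / len(f0)

def segment_scenes(frames, peak_thresh):
    mads = [mad(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
    cuts = []
    for i in range(1, len(mads) - 1):
        # local maximum of MAD(n) above the threshold -> scene change
        if mads[i] >= mads[i - 1] and mads[i] >= mads[i + 1] \
                and mads[i] > peak_thresh:
            cuts.append(i + 1)            # new GOP starts at frame i + 1
    return cuts, mads
```

The accumulated-difference check for slow scene transitions mentioned in the text is omitted here for brevity.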
3.3.1 Video Analysis

We analyze the underlying image sequence so that it can be temporally segmented into relatively homogeneous intervals (i.e. scenes). There are several approaches to extract content features for temporal scene segmentation; both pixel- and compressed-domain processing can be adopted. Here, we adopt the motion-based approach. That is, the sequence is first divided into variable-size segments (i.e., GOP's) by using the mean of absolute differences (MAD) feature. Scene changes and global motion activity can be captured from the MAD curve between two consecutive frames. Scene changes are located at local maxima of MAD(n). For each frame i, the second-order differential can be calculated as

\frac{d^2(MAD(i))}{di^2} = MAD(i+1) + MAD(i-1) - 2 \cdot MAD(i). \quad (3.1)

The scene change can be located by detecting zero-crossing points, and a new GOP is started at every scene change. Besides, to handle slow and steady scene transitions, the accumulated difference with respect to the initial frame is also checked against a predefined threshold. After these steps, a sequence can be efficiently segmented into temporally homogeneous GOP's. The proposed rate control aims at providing uniform quality within each temporally segmented GOP so that quality variation occurs only at GOP boundaries, since minimizing the quality variation (in the PSNR value) among different scenes has little meaning to human perception.

3.3.2 Generating Coding Complexity Information

The relative frame complexity can be represented by the coding cost of each frame i in GOP g. As shown in [76, 77, 62], it can be described by the proportion of bits spent on encoding a frame relative to the average bits used in the whole sequence as

c_{g,i} = R_{g,i}/\bar{R}, \quad (3.2)
where the bit usage R_{g,i} is generated by encoding frame i with a fixed quantization parameter (QP) in the first coding pass.

3.3.3 Three-Layer Constant Quality Rate Control

Our constant quality rate control works in a three-level hierarchy, i.e. the GOP level, the frame level and the MB level.

3.3.3.1 Complexity-Guided GOP Level Bit Allocation

To assign bits to each GOP, we first define the complexity measure for each GOP by calculating the average spatial complexity as C_g = \sum_i c_{g,i} / N_g. Then, the following GOP level recursive bit allocation is applied.

• Step 1: Initialization. We set \lambda = 0, bit budget B_r = B, the initial buffer fullness \beta_1 = T_d \cdot R, and start from the GOP of index 1.

• Step 2: Assign bits to the GOP of index g by using the formula

B_t(g) = \lambda \cdot (R/F) \cdot N(g) + (1 - \lambda) \cdot \frac{C_g}{\sum_{k \ge g} C_k} \cdot B_r. \quad (3.3)

• Step 3: Check the buffer status with the tentatively assigned bit budget B_t(g). If

(\beta_{g-1} + B_t(g)) - (R/F) \cdot N(g) \le 0.8 \cdot \beta_{max},

we accept it and go to Step 4. Otherwise, we adjust the value of \lambda via \lambda = \lambda + 0.1 and go back to Step 2.

• Step 4: Update the buffer status via

\beta_g = (\beta_{g-1} + B_t(g)) - (R/F) \cdot N(g),

and set the remaining bit budget B_r to B_r - B_t(g). Then, go to Step 2 for the next GOP of index g+1.

In the above, R is the channel rate, F is the selected frame rate, N(g) represents the number of frames in the GOP of index g, and \lambda represents the weighting factor between the buffer variation and complexity demands. The case of \lambda = 0 represents a bit allocation scheme that follows the frame complexity completely. If the buffer constraint can be met, this is the most desirable scenario. On the other hand, \lambda = 1.0 represents the case where the bit budget is evenly distributed without considering the frame complexity. In this case, minimal pre-loading and a minimal decoder buffer size are required, such that only the first frame has to be pre-fetched.
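The recursive GOP-level allocation above can be sketched in a few lines of code. This is a minimal Python sketch, not the standard's reference implementation; the function name, argument layout, and the choice to normalize the complexity share over the remaining GOPs are illustrative assumptions.

```python
def allocate_gop_bits(C, N, B_total, R, F, beta_max, T_d, lam_step=0.1):
    """Complexity-guided GOP-level bit allocation (sketch of Steps 1-4).

    C[g]    : average complexity of GOP g
    N[g]    : number of frames in GOP g
    B_total : total bit budget; R: channel rate (bps); F: frame rate (fps)
    """
    B_r = B_total              # Step 1: remaining budget
    beta = T_d * R             # initial buffer fullness from pre-loading
    budgets = []
    for g in range(len(C)):
        lam = 0.0
        while True:
            # Step 2: blend even allocation with a complexity-proportional
            # share of the remaining budget (assumed normalization).
            share = C[g] / sum(C[g:])
            B_t = lam * (R / F) * N[g] + (1.0 - lam) * share * B_r
            # Step 3: accept if the tentative budget keeps the buffer
            # within the 0.8 safety margin of the maximal buffer.
            if (beta + B_t) - (R / F) * N[g] <= 0.8 * beta_max or lam >= 1.0:
                break
            lam = min(1.0, lam + lam_step)   # shift toward even allocation
        # Step 4: update buffer fullness and the remaining budget.
        beta = (beta + B_t) - (R / F) * N[g]
        B_r -= B_t
        budgets.append(B_t)
    return budgets
```

With a generous buffer the loop stays at \lambda = 0, so each GOP receives bits in proportion to its complexity, which is the most desirable scenario described above.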
The case with 0 < \lambda < 1.0 represents a bit-allocation tradeoff between the buffer and the quality constraints.

3.3.3.2 Frame Level Bit Allocation

The GOP level bit allocation solves the problem of allocating the bit budget to each GOP while meeting both the buffer and the quality constraints. However, the problem of allocating the bit budget to each frame within one GOP remains. To obtain constant quality within each GOP, it is preferable to allocate the bit budget according to the frame complexity while still meeting the buffer constraint. The detailed algorithm is similar to the one for the GOP level bit allocation and is stated below.

• Step 1: Initialization. We set \lambda = 0 and bit budget B_r = B_t(g). The initial buffer fullness is \beta_{g,0} = T_d \cdot R if it is the first GOP; otherwise, it is set to \beta_{g,0} = \beta_{g-1}. We start from frame 1 in the current GOP.

• Step 2: Assign bits to frame i of the GOP with index g by

B_t(i) = \lambda \cdot (R/F) + (1 - \lambda) \cdot \frac{c_{g,i}}{\sum_{k \ge i} c_{g,k}} \cdot B_r. \quad (3.4)

• Step 3: Check the buffer status with the tentatively assigned bit budget. If

(\beta_{g,i-1} + B_t(g,i)) - (R/F) \le 0.8 \cdot \beta_{max},

we accept it and go to Step 4. Otherwise, we adjust the value of \lambda via \lambda = \lambda + 0.1 and go back to Step 2.
For the GOP and the frame level bit allocation schemes, we preserve a safe margin (0.8 of the maximal buffer) for buffer regulation. If a reasonably large buffer is allowed, e.g. 5-second pre-loading time for a 100kbps channel, then the buffer safe margin is equal to 50k bytes. Then, we can allow for 25k bytes bit variation of the estimated bit usage from the real bit usage. Compared to the normal size of I and P frames of the CIF format, which is less than 10k bytes, it is sufficient for all possible frame level budget deviations. Therefore, one simple mode is to quantize all MBs with the same quantization param eter (QP) in our proposed scheme. The QP value is determined in the frame level rate control with the following trial and error strategy. 1* I f B aciu aZ * 1.15 * Bt(i), then QPi+ 1 = QPt + 1. 47 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 2. If B actuak < 1.15 * Bt(i), then we check the condition B actuak < 0.85 * Bt(i). If it is true, then QPl+\ = QPi — 1, else QPl+i = QPi. 3. We set QPi+i — m&x(QPi+i, 1) and QPi+i = min(QPi+i, 31). However, if the allowed buffer is relatively small, then one MB level rate control similar to the approach in the MPEG4 VM [31] can be adopted. It consists of the following steps. • Calculate the target bit rate r. , : for each MB i according to its MAD value. • Solve for the QP value based on the assigned and the existing R-D model. • U pdate the R-D model according to the actual bit usage. For more details, we refer to [31]. 3.4 Constant Quality Rate Control for M PEG4 FGS Codec As illustrated in Fig. 3.3(a), MPEG-4 FGS video consists of one BL th at can be coded with a non-scalable MPEG-4 compliant coder. Each EL is coded progres sively (i.e. bit-plane coding) with the embedded R-D parameters. 
Compared with the non-scalable video codec, whose rate is usually adapted by transcoding, the MPEG-4 FGS codec simplifies rate adaptation since the EL stream can be arbitrarily truncated.

In order to achieve constant quality with rate adaptation, an unequal amount of bits should be assigned to different frames. By embedding a small number of R-D samples into the bit stream, we can build an R-D model via piecewise-linear interpolation at the decoder end. To avoid the modeling inaccuracy of existing closed-form models at low bit rates [31], Lin et al. utilized a set of R-D samples to approximate the complete R-D relationship via cubic spline interpolation [37]. Besides higher accuracy, the sample-based R-D approximation demands only a low complexity and a small overhead.

To be more specific, in our approach R-D samples are embedded for the MPEG-4 FGS EL, which is coded bitplane by bitplane. The R-D characteristic is assumed to be uniform within each bitplane since the distortion reduction is roughly determined by the quantization parameter. Consequently, only R-D samples at the beginning of each bitplane have to be calculated and embedded. Typically, there are only a few bitplanes, i.e. 6 ~ 7 bitplanes, corresponding to a wide range of QP (e.g. from 1 to 26 or 27 times QP). Moreover, since the DCT is a unitary transform, the distortion associated with R-D samples in each bitplane can be directly calculated in the coefficient domain. The R-D samples of each bitplane introduce a negligible amount of overhead and demand a very low computational complexity. The generated R-D samples can be stored either in the user-defined data of each VOP or in a separate file. In Fig. 3.3(b), the piecewise-linear R-D curve obtained via interpolation is compared to the empirical curve for the 1st, 15th, and 30th frames of the "Foreman" CIF sequence.
It is verified that the piecewise-linear R-D model approximates the empirical R-D curve quite well.

With this interpolated R-D model, we adopt a sliding-window approach to perform rate adaptation. Suppose that window W_i includes M_{W_i} frames. Then, the rate allocation optimization to achieve constant quality can be performed within each window independently. In mathematical terms, the problem can be written as

\min \sum_{j \in W_i} |D_j - D_{W_i}|, \quad (3.5)

subject to

\sum_{j \in W_i} B_j \le \frac{R_{W_i} \cdot M_{W_i}}{F} - B_{BL},

where R_{W_i} represents the available bit rate (bits/sec) at the start time of window i and B_{BL} is the total bit budget for BL within this window. To reach the optimal solution, a bisection search is utilized to find the best D_{W_i}, which minimizes the distortion variation among frames. That is,

• Step 1: Take the minimal frame distortion of all BL frames within one sliding window as the initial value of D_{W_i}.

• Step 2: Calculate B_j for each frame based on the given D_{W_i} by using the piecewise-linear R-D model.

• Step 3: If \sum_j B_j < (\frac{R_{W_i} M_{W_i}}{F} - B_{BL}) - \Delta B - \delta, then set D_{W_i} = \frac{D_{W_i} + D_{low}}{2}, D_{high} = D_{W_i} and return to Step 2. Else, if \sum_j B_j > (\frac{R_{W_i} M_{W_i}}{F} - B_{BL}) - \Delta B + \delta, then set D_{W_i} = \frac{D_{W_i} + D_{high}}{2}, D_{low} = D_{W_i} and return to Step 2. Note that \delta is a small factor that controls the rate adaptation accuracy. Otherwise, continue with Step 4.

• Step 4: Move to the next window and update \Delta B as the difference between the actual bit usage and the allocated budget of the current window. If we are not at the end of the sequence, go back to Step 1.

Only a few iterations are needed in the above bisection search of R_{W_i} and D_{W_i}. Since only a simple interpolation scheme is needed to calculate these values, its complexity is low enough to be performed in real time. This approach works best for slowly varying channel conditions.
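The bisection search above can be sketched as follows. This is a minimal Python sketch under simplifying assumptions: the per-frame R-D model is abstracted into a function `bits_for_D(j, D)` (monotonically decreasing in D, standing in for the piecewise-linear interpolation of the embedded R-D samples), the ΔB carry-over term is folded into the budget, and the bracket [D_low, D_high] is supplied by the caller.

```python
def constant_quality_window(bits_for_D, frames, budget, D_low, D_high,
                            delta=1e3, max_iter=50):
    """Bisection search for the common target distortion D of one window.

    bits_for_D(j, D): EL bits needed to bring frame j down to distortion D
                      (assumed monotonically decreasing in D).
    budget:           EL bit budget of the window, R_w * M_w / F - B_BL.
    delta:            tolerance controlling rate adaptation accuracy.
    """
    D = 0.5 * (D_low + D_high)
    for _ in range(max_iter):
        D = 0.5 * (D_low + D_high)
        total = sum(bits_for_D(j, D) for j in frames)
        if total > budget + delta:
            # Too expensive: allow more distortion (search upper half).
            D_low = D
        elif total < budget - delta:
            # Budget left over: demand higher quality (search lower half).
            D_high = D
        else:
            break
    return D
```

Because each iteration halves the search interval, only a few iterations are needed, which matches the low-complexity claim above.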
Even for a fast varying channel, as shown in Section 5.5, the proposed approach is still acceptable when differentiated forwarding can mitigate packet losses due to inaccurate rate adaptation.

Another interesting point is the choice of a proper window size M_{W_i}. By increasing M_{W_i}, visual quality can be smoothed further. However, this requires more buffering and incurs more delay. The optimization window should also not span multiple scenes, since minimizing the distortion variation among frames of different scenes has little meaning to visual quality. Generally speaking, constant quality rate adaptation can be performed with joint spatio-temporal consideration. For example, the source can be segmented into several temporally homogeneous regions [76], and both the temporal frame rate and the spatial SNR quality can be adjusted to enhance visual quality.

3.5 MPEG4 FGS Video Streaming With CQVRC

An overview of the MPEG-4 FGS video streaming system with the proposed CQVRC is shown in Fig. 3.4. As described above, a two-pass coding procedure is exploited. First, an input video sequence is analyzed by a pre-processing module to temporally segment it into a set of groups of pictures (GOPs) (i.e. a variable number of homogeneous frames). Then, to obtain the frame complexity (i.e. the spatial quality requirement), it is further analyzed by a pre-encoding module. The gathered statistics (e.g. GOP segmentation, frame complexity), along with the target minimum buffer size and the minimum bandwidth, guide CQVRC for BL. The MPEG-4 FGS codec encodes BL into a smooth-quality stream. Residuals are then coded into EL streams while R-D samples are generated for each bitplane and embedded as the VOP user data. For non-real-time applications, the coded bitstream is pre-stored in the streaming server.
Upon a streaming request, the rate adaptation module scales the EL stream based on the available bandwidth feedback to preserve constant quality by referring to the embedded R-D samples. These rate-adapted BL and EL streams are then packetized into fixed-length packets and prioritized by evaluating the anticipated loss impact of each packet on end-to-end video quality. Thus, based on the assigned priority, network adaptation including error control can be performed.

3.6 Simulation Results

In this section, extensive simulations are performed to evaluate the proposed algorithm in comparison with the TM5 and MPEG4 VM rate control schemes. Then, we analyze the impact of the pre-loading buffer on the final end-to-end visual quality. Following that, the constant quality rate control scheme for the MPEG4-FGS codec is evaluated and compared with the FGS rate control scheme under three different options, i.e. constant quality only in BL, constant quality only in EL, and both.

3.6.1 Constant Quality Rate Control for MPEG4 Video

In the current MPEG4-VM rate control method without MB level rate control, unpredictable frame skipping is very common during scene changes or large motion areas. In those regions, the characteristics of frames after the scene change are significantly different from those before it. Thus, the actual bit usage is much larger than the estimated bit budget, which tends to keep the virtual buffer at the full level. Consecutive frames after this particular frame will cause potential buffer overflow, and be skipped accordingly. We cascade 150 frames of "Akiyo" and 150 frames of "Foreman" CIF sequences to create a scene change. Furthermore, there is a large amount of camera motion in the "Foreman" sequence. The characteristics of the cascaded sequence are given in Table 3.1. In Fig.
3.5 (a), (b), (c), and (d), we show the positions of skipped frames under pre-loading delays of 250ms, 500ms, 1s, and 2s, respectively. The encoded frame rate is 10fps, and the bit rate is 128 kbps.

Table 3.1: Content characteristics of the cascaded "Akiyo" and "Foreman" CIF sequence.

    frames 0-149:   "Akiyo", slow motion
    frame 150:      scene change point from "Akiyo" to "Foreman"
    frames 220-270: large camera motion
    frames 271-300: slow camera panning with detailed texture

It is found that several consecutive frames are dropped between frame no. 150 ~ 170 and no. 220 ~ 270 for small pre-loading delays such as 250ms and 500ms. However, for longer pre-loading delays such as 1s and 2s, few frames get dropped. When the pre-loading reaches 3s, there is no frame skipping anymore. In all the above cases, a significant amount of quality variation is present. In Fig. 3.7, we plot the PSNR value of the sequence decoded with MPEG4 VM rate control under a pre-loading delay of 3s (i.e. without any buffer-induced frame skipping). On the other hand, instead of dropping frames, TM5 rate control attempts to match the frame bit allocation with a buffer-constrained MB level rate control that can adjust the QP for each MB. Although the virtual buffer level can be kept constant, significant quality variation is also found in Fig. 3.7. In the same figure, we plot the PSNR value of the sequence coded with the proposed CQVRC under a 3-second pre-loading delay. It is observed from this figure that much smaller quality variation appears in the proposed CQVRC. For both the MPEG4 VM and TM5 techniques, the PSNR value for "Akiyo" is more than 42db while the PSNR value for "Foreman" is as low as 26db. Since quality
with a PSNR above 40db is imperceptible to human eyes, both the TM5 and MPEG4 VM rate control schemes offer unnecessary PSNR improvement for "Akiyo" yet provide unacceptable video quality for "Foreman". In contrast, the proposed CQVRC provides much smoother video quality for both "Akiyo" and "Foreman".

3.6.1.1 Impact of the Pre-loading Buffer on Visual Quality

We examine the impact of the pre-loading buffer with the proposed CQVRC scheme in Fig. 3.6. As expected, a larger buffer provides a higher average PSNR and much smoother video quality. The results are obtained for the "Foreman" sequence encoded at a frame rate of 10fps and a bit rate of 130kbps. Compared with TM5 and MPEG4-VM rate control, our proposed scheme achieves a 0.5 ~ 0.6db improvement and a much smaller PSNR variation.

3.6.2 Constant Quality Rate Control for MPEG-4 FGS Video

In Fig. 3.8, significant quality variation is observed in the case of even bit allocation in EL. In this subsection, we evaluate the proposed CQVRC scheme for MPEG4 FGS video in both CBR and time-varying CBR channels. To facilitate the evaluation of overall sequence quality, the sequence distortion is defined as the arithmetic mean of the frame distortion, i.e.

D_e = \frac{1}{X \cdot Y \cdot T} \sum_{t} \sum_{x} \sum_{y} (f(x,y,t) - \hat{f}(x,y,t))^2, \quad (3.6)

for a frame size of X \cdot Y pixels and T encoded frames. Then, the sequence PSNR can be defined as

PSNR_e = 10 \log_{10}\left(\frac{255^2}{D_e}\right). \quad (3.7)

We feel that the sequence PSNR value defined above is more meaningful in characterizing the sequence quality than the commonly used average PSNR value. The average PSNR is related to the geometric mean of the distortions of the individual frames, while the sequence PSNR is related to their arithmetic mean. Intuitively, the arithmetic mean of frame distortions is more related to visual quality than the geometric mean of frame distortions.
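The distinction between the sequence PSNR and the average PSNR can be made concrete with a small numerical example. This is an illustrative Python sketch; the two per-frame MSE values are hypothetical, chosen only to show how one badly degraded frame affects each metric.

```python
import math

def sequence_psnr(frame_mses):
    """PSNR of the arithmetic mean of per-frame MSEs (as in Eqs. 3.6-3.7)."""
    d_e = sum(frame_mses) / len(frame_mses)
    return 10.0 * math.log10(255.0 ** 2 / d_e)

def average_psnr(frame_mses):
    """Commonly used metric: the mean of per-frame PSNR values."""
    return sum(10.0 * math.log10(255.0 ** 2 / m) for m in frame_mses) / len(frame_mses)

# Two frames: one clean (MSE 5), one badly degraded (MSE 500).
mses = [5.0, 500.0]
# The sequence PSNR is dominated by the bad frame (arithmetic mean of MSE),
# while the average PSNR, related to the geometric mean of distortions,
# partially hides it.
print(sequence_psnr(mses))   # noticeably lower
print(average_psnr(mses))    # noticeably higher
```

The gap between the two numbers illustrates why the sequence PSNR is the more honest indicator of visible quality when a few frames are much worse than the rest.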
There are several parameters to be determined in CQVRC for the MPEG4-FGS codec, including the size of the optimization window, the bit rate of EL and the virtual decoder buffer size. The optimization window is a very important factor that determines the impact of quality smoothing. For example, for a window size equal to 1, we perform rate control based on the instantaneous bit rate without any quality smoothing. A larger window size means that we can perform CQVRC to achieve constant quality over a larger number of frames. Since large quality variation is only allowed at window boundaries in CQVRC, if the window fully covers one temporally segmented scene, there will be no visible quality variation within the scene. It is found that humans can easily perceive quality change within one scene, while quality change across two scenes is less obvious. Thus, it is desirable to have the optimization window adapt to the underlying source. For the time-varying CBR channel, such as in the video streaming scenario, a fixed optimization window will not produce realistic results since the window may cover several channel bit rates. For the CBR channel, it is desirable to perform CQVRC within one optimization window covering the same video scene. In the following, we compare CQVRC results for CBR and time-varying CBR channels, respectively.

3.6.2.1 CQVRC for CBR Channels

First, we perform CQVRC for the 10-second "Foreman" sequence whose BL is encoded at a frame rate of 10fps and a bit rate of 128kbps with TM5 rate control. Supposing that the total available bandwidths for CBR channels are 288kbps, 448kbps, 608kbps and 728kbps, respectively, we have 160kbps, 320kbps, 480kbps and 600kbps for EL correspondingly. Those 100 frames can be temporally segmented into two scenes such that the first scene is mainly the "worker" scene (i.e.
frames 1 ~ 50) while the second scene is about the "building" (i.e. frames 70 ~ 100). Frames 51 ~ 70 are the transition frames from scene 1 to scene 2. Thus, the optimization window adopted is 50 frames long, and the two scenes lie in two separate optimization windows.

In Fig. 3.9, we show the PSNR comparison of the proposed CQVRC and the MPEG4-FGS rate control (FGSRC) at 160kbps. We see that the proposed CQVRC obtains an almost constant PSNR value for every frame in the first scene. The maximal frame PSNR difference is less than 0.4db for CQVRC. For comparison, the maximal PSNR difference for FGSRC is around 2.2db. For the second scene, a more significant PSNR variation is observed for FGSRC. For example, the maximal PSNR difference is around 6.4db for FGSRC compared to 2.8db for the proposed CQVRC. Even with CQVRC, there is still some quality variation between frames 51 ~ 70. That is due to the fact that the quality of frames 71 ~ 100 is much lower than that of frames 51 ~ 70, and almost all the available EL bandwidth is utilized to compensate the quality of frames 71 ~ 100. Hence, the quality of frames 51 ~ 70 is similar to that of the BL frames and shows visible quality variation. Therefore, smoother visual quality can be achieved by combining CQVRC EL with CQVRC BL. In Fig. 3.10, we show the PSNR comparison of the proposed CQVRC and FGSRC at 320kbps. As expected, much smoother video quality is obtained from CQVRC, such that the maximal PSNR difference is around 0.4db for the first scene and 0.7db for the second scene. For FGSRC, the maximal PSNR difference is around 2.5db for the first scene and 7.2db for the second scene. A similar phenomenon is observed in Figs. 3.11 and 3.12 for 480 and 600 kbps EL. Moreover, CQVRC can be performed for either BL or EL, or even both layers. In Fig.
3.16, we perform constant quality rate control for the base layer, and then FGSRC is employed to evenly distribute bits among EL. It is observed that there is still significant quality variation. This is partially due to the significant quality variation in CQVRC of BL with a constant QP. It further proves the necessity of the proposed CQVRC for the EL of MPEG4-FGS even when CQVRC is applied to BL.

3.6.2.2 CQVRC for Time-Varying CBR Channels

A channel with a bandwidth estimator or bandwidth feedback can be better modelled as T-CBR, since the channel bandwidth is assumed to be constant between two estimation points. Here, we assume a simple bandwidth trace with binomial values, and the feedback periods are 1, 2 and 5 seconds, respectively. The total simulation duration is 10 seconds, using the 300-frame "Foreman" CIF sequence, whose base layer is encoded at a frame rate of 30fps and a bit rate of 240kbps. In Figs. 3.13 (a), 3.14 (a) and 3.15 (a), the utilized T-CBR bandwidth traces are shown, and the periods of bandwidth change are 1, 2 and 5 seconds, respectively. The optimization window sizes in CQVRC are adapted to the period of the bandwidth change (i.e. 30, 60 and 150 frames, respectively). The bit rates of the two bandwidth levels are 960kbps and 600kbps, respectively. The comparison of the PSNR values and their variation for CQVRC and FGSRC is given in Figs. 3.13 (b), 3.14 (b) and 3.15 (b). In Fig. 3.13, the optimization window is 30 frames long for CQVRC. Compared with FGSRC, much smaller quality variation is obtained in the proposed CQVRC, such that almost constant PSNR values are obtained within the optimization window. Visible quality variation only takes place at the boundary of the optimization window. Similar phenomena are also observed in Figs. 3.14 and 3.15. For CQVRC, when the bandwidth change is less frequent, as in Figs.
3.14 and 3.15, even smoother video quality can be obtained than with CQVRC in Fig. 3.13. Therefore, to achieve more constant quality, it is desirable to enlarge the optimization window in CQVRC. In particular, when the network has a large buffer to absorb the instantaneous bandwidth variation, a large optimization window can be readily utilized in CQVRC. Another important application of CQVRC is in QoS-enabled networks such as IntServ, where bandwidth renegotiation [28] is supported and CQVRC can be performed between two renegotiation points.

3.6.3 CQVRC for BL and EL

Finally, we apply CQVRC to both BL and EL. In Fig. 3.17, we compare CQVRC with the default rate control scheme in MPEG-4, which utilizes TM5 for BL and the uniform bit allocation scheme for EL. We see that the proposed CQVRC has a significant improvement in terms of the average PSNR value as well as the PSNR variation.

3.7 Conclusion

In this chapter, two constrained VBR rate control techniques were proposed for the non-scalable MPEG4 and the scalable MPEG4-FGS codecs. Extensive simulations were performed to demonstrate that the proposed CQVRC can provide much smoother quality than the rate control schemes suggested in today's standards when there is a significant amount of variation in the source and the channel. Given the low complexity of the proposed schemes (especially CQVRC for the MPEG4-FGS codec), they are ready for implementation in a real-world video delivery system. This provides a promising technique to enhance the current MPEG4 standard.
Figure 3.1: Relationship between (a) frame quality, (b) bit allocation of each frame, and (c) the corresponding buffer status.

Figure 3.2: The impact of different rate allocations of EL on the PSNR value: (left) BL (base layer) only, (center) fixed allocation of EL and (right) allocation of EL to achieve constant quality.

Figure 3.3: (a) The scalability structure of MPEG-4 FGS and (b) the comparison of interpolated and real (i.e. empirical) distortion values.

Figure 3.4: System overview of MPEG-4 FGS video streaming with CQVRC.
Figure 3.5: Frame dropping by MPEG4 VM rate control at different pre-loading delays: (a) 0.25s; (b) 1s; (c) 2s; and (d) 2.5s.

Figure 3.6: Comparison of the proposed CQVRC with TM5 and MPEG4 VM for the "Foreman" sequence at 10fps, 128kbps, under different pre-loading cases. The average PSNR of CQVRC-0.5s is 30.86db, of CQVRC-1.0s is 30.92db and of CQVRC-2.0s is 30.99db, respectively. The average PSNR of MPEG4-VM and TM5 is 30.43db and 30.34db, respectively.

Figure 3.7: Comparison of the proposed CQVRC, TM5, and MPEG4 VM for the cascaded "Akiyo" and "Foreman" sequence at 10fps, 128kbps.

Figure 3.8: Quality variation in MPEG4-FGS encoded video with an even bit distribution in EL.

Figure 3.9: The PSNR comparison of the proposed CQVRC and FGSRC for EL at 160 kbps. The maximal PSNR difference of CQVRC is 0.38db for the first scene, and 2.2db for the second scene. The maximal PSNR difference of FGSRC is 2.8db for the first scene, and 6.4db for the second scene.

Figure 3.10: The PSNR comparison of the proposed CQVRC and FGSRC for EL at 320 kbps. The maximal PSNR difference of CQVRC is 0.4db for the first scene, and 0.7db for the second scene.
The maximal PSNR difference of FGSRC is 2.5db for the first scene, and 7.2db for the second scene.

Figure 3.11: The PSNR comparison of the proposed CQVRC and FGSRC for EL at 480 kbps. The maximal PSNR difference of CQVRC is 0.3db for the first scene, and 0.9db for the second scene. The maximal PSNR difference of FGSRC is 2.0db for the first scene, and 8.1db for the second scene.

Figure 3.12: The PSNR comparison of the proposed CQVRC and FGSRC for EL at 600 kbps. The maximal PSNR difference of CQVRC is 0.3db for the first scene, and 0.6db for the second scene. The maximal PSNR difference of FGSRC is 2.2db for the first scene, and 9.0db for the second scene.

Figure 3.13: (a) The trace of a variable bit rate channel with a bandwidth change period of 1 second and (b) the PSNR comparison of CQVRC and FGSRC.

Figure 3.14: (a) The trace of a variable bit rate channel with a bandwidth change period of 2 seconds and (b) the PSNR comparison of CQVRC and FGSRC.

Figure 3.15: (a) The trace of a variable bit rate channel with a bandwidth change period of 5 seconds and (b) the PSNR comparison of CQVRC and FGSRC.
Figure 3.16: Quality variation in MPEG4-FGS coded video with constant quality BL, even bit distribution for EL and no adaptive frame grouping (curves: BL128, BL128+EL128, BL128+EL256).

Figure 3.17: The PSNR comparison of different rate control options of BL and EL (options: TM5+FGSRC EL, TM5+CQVRC EL, CQVRC BL+CQVRC EL).

Chapter 4

Content Aware Rate Control

4.1 Introduction

In this chapter, we combine encoding pre-processing (i.e., adaptive frame grouping and the associated per-frame measures of spatial/temporal quality contribution) with a temporal-adaptive scalable video codec and a content-aware rate control. The proposed streaming solution tackles the problem of keeping the video quality smooth even when there is a significant change in the video source or the connection bandwidth. Moreover, when streaming pre-compressed scalable video, prior knowledge of future frames, a longer pre-loading time and a larger user buffer are allowed, which gives us more flexibility for rate control. The proposed video codec provides adaptive temporal scalability as well as SNR/spatial scalability. Utilizing these scalabilities and the content awareness, the content-aware rate controller, which is entirely isolated from the encoder, can easily meet the diverse demands of heterogeneous network connections and individual users.
Meanwhile, the maximal visual quality is preserved by exploiting the spatio-temporal quality tradeoff, which is based on the spatial and temporal (i.e., motion-based) activity analysis performed during video pre-processing and encoding. Finally, we show that, when combined with a packet filtering approach, the proposed content-aware rate control can be implemented more efficiently by carefully assigning a relative priority index (RPI) to each packet. Simulations conducted to verify the proposed solution demonstrate its flexibility and efficiency.

4.2 Motivation

With the proposed scalable video coder, the encoded stream can be arbitrarily truncated according to network bandwidth and receiver buffer constraints. Consequently, the video coding and video transmission processes can be totally isolated. This isolation is important, since the transmission bandwidth, the individual user preference and the decoder buffer status are not known at encoding time. However, at transmission time, the quality of service (QoS) constraints and user preferences can be easily met at any intermediate transmission node. Thus, the rate control problem for scalable video reduces to determining how to optimally truncate the over-coded video to achieve the highest visual quality for the given transmission bandwidth, QoS constraints and user requirements. Basically, it involves decisions on the frame rate and the SNR quality layer. When the bit budget is not sufficient for both
full frame rate and reasonable spatial quality, it is desirable to trade off the temporal and spatial quality to achieve the best visual quality. This can be achieved by defining one generalized distortion measure that covers both the spatial and temporal distortions; the conventional rate-distortion optimization problem can then be formulated with this generalized distortion measure and solved by using the discrete Lagrange multiplier.

Based on the location where it is performed (i.e., at the outbound of the server, at the active networking routers, or at the end user), the rate adaptation scheme can be classified into the server-oriented, the receiver-driven and the hybrid approaches. The server-oriented rate control approach is much simplified with the help of scalable video coding, by simply truncating the embedded bitstream according to the available bandwidth. The receiver-driven approach relies entirely on the receiver's decisions and provides rate control of coarser granularity than the server-oriented approach. By the hybrid approach, we mean that both the server and the client play a role in rate control. As more active networks are deployed, the network can also perform rate control by filtering packets at the active routing component based on the defined packet priority. Regardless of its complexity, we believe that the hybrid approach is the most likely to succeed since it provides both coarse and fine granularity rate control.

The proposed content-aware framework can work with instantaneous rate control. However, a higher gain can be achieved by allowing a reasonable amount of pre-loading time and an equivalent receiver buffer size. With the help of the receiver buffer, short-term bursts can be absorbed. In particular, for the off-line streaming scenario, a user may wish to trade several seconds of pre-loading delay for better streaming quality. This implies that a much larger buffer than in real-time video communication applications can be utilized. This feature has dramatic impacts on rate control schemes that were tailored for real-time video communication applications with a tight delay constraint (hence, a very small receiver buffer). By relaxing this constraint, rate control can be more flexible, and scene-level rate control across several GOPs is thus allowed.
This approach tends to provide a more global view of the underlying transmitted video source than the real-time communication scenario, where only one or a few frames are taken into account for the rate control decision.

4.3 Pre-processing and Encoding

As described in Chapter 2, Section 6, the video should first be temporally segmented into relatively uniform regions; then, the frames should be prioritized based on features obtained from pre-processing, encoding, or both. The individual features adopted should be determined by the application. For example, if on-line encoding and transmission is required, only a small look-ahead is allowed to determine scene changes, and measures such as the SQC and TQC associated with each GOP will not be available beforehand. The proposed scheme benefits more for streaming pre-compressed video, where all the features from content analysis are available and can be utilized. Our work assumes this scenario. In the motion-based pre-processing, the sequence is first divided into variable-size segments (i.e., GOPs) using the MAD feature. Then, an RQC is associated with each frame in a GOP via the HoD feature. Encoding with the differential JPEG-2000 coder is performed and the TQC/SQC is associated with each GOP utilizing both the MAD and encoding statistics.

4.3.1 Motion-based Adaptive Frame Grouping

Scene changes and global motion activity can be captured from the MAD (mean absolute difference) curve between two consecutive frames. With MAD(i) = MAD(f_i, f_{i-1}), scene changes are located at local maxima of MAD(i). For each frame i, the second-order differential is calculated as

MAD''(i) = d^2(MAD(i))/(di)^2 = [MAD(i+1) + MAD(i-1) - 2 * MAD(i)] / (di)^2.   (4.1)

By thresholding this differential, scene changes are located and new GOPs are started.
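As an illustrative sketch (not the dissertation's actual implementation), the second-order differential test of (4.1) with unit frame spacing can be coded as follows; the threshold value and function names are hypothetical.

```python
def detect_scene_changes(mad, thresh=2.0):
    """Locate scene changes at local maxima of the MAD curve using the
    second-order differential MAD''(i) = MAD(i+1) + MAD(i-1) - 2*MAD(i)
    (unit frame spacing assumed). mad[i] is MAD between frames i and i-1."""
    cuts = []
    for i in range(1, len(mad) - 1):
        d2 = mad[i + 1] + mad[i - 1] - 2.0 * mad[i]
        # a sharp local maximum of MAD shows a strongly negative second differential
        if d2 < -thresh and mad[i] > mad[i - 1] and mad[i] > mad[i + 1]:
            cuts.append(i)  # start a new GOP at frame i
    return cuts
```

A production version would also track the accumulated difference against the initial frame of the current GOP, as the text describes for slow scene transitions.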
Besides, to handle slow and steady scene transitions, the accumulated difference with respect to the initial frame is checked against a predefined threshold. Hence, the sequence is efficiently segmented into temporally uniform GOPs (in a global sense), and the motion characteristic of each GOP is utilized for the TQC later. Note that, although the size of each GOP is a design parameter affected by the underlying source variation, we typically use 30 frames as the average GOP size for a 30 fps source. Since a scene change occurs every 3 to 9 seconds in typical video programs [15], only one of every three to nine GOPs is affected by scene changes. The other GOPs are kept around 30 frames, and each GOP is further divided according to the following considerations for each individual frame.

4.3.2 Frame Type Decision and Relative Quality Contribution Association

Within each GOP, the HoD (histogram of difference) derived measure D_h(f_n, f_m) = (Sum_{|i|>thresh} HoD(i)) / N_pixel is adopted as the motion measure for each frame. It is very sensitive to local object motion. Similar to scene change detection, the local maxima of D_h can be obtained by calculating the second-order differential. Again, identified local maxima and accumulated-difference overflow points are designated as P frames. Hence, within a GOP, frames are classified into I (intra, for the initial frame), P (prediction), and D (droppable) frames. Thus, a GOP is further divided into several sub-GOPs by the initial I frame and the P frames. With these sub-GOPs, each frame in a GOP can be indexed by (g, s, d): the GOP index, the sub-GOP index, and the droppable frame index. The RQC for the s-th P frame (i.e., d = 0) is RQC^(g,s) = RQC^g - (s - 1), where RQC^g is the predefined base value for P frames in a GOP g.
By subtracting the order s from the base value, the later P frames have lower RQCs since they refer to the previous P frames.

MSL(D1) = HoD(P1, D2)    MSL(D2) = HoD(D1, D3)    MSL(D3) = HoD(D2, P2)

Figure 4.1: Example of MSL calculation.

The remaining frames, the D frames, are individually predicted from the preceding P frames, and their RQCs are determined by the following iteration. Among the D frames between I/P frames, we search for the D frame with the lowest motion smoothness loss first and then move on to the next lowest. The D frames are thus organized in the order of RQC, which enables the gradual dropping of frames while keeping the overall motion smoothness as much as possible. As the example in Fig. 4.1 shows, there are three D frames, namely D1, D2 and D3, between the P1 and P2 frames. We define the motion smoothness loss (MSL) as the motion change caused by dropping a frame, as illustrated in Fig. 4.1. The complete RQC association is then an iteration of searching for the D frame with the lowest MSL, D_m = argmin_d {MSL(D_d)}, and deriving its RQC^(g,s,d) by subtracting the iteration index i.

4.3.3 Highly scalable differential JPEG-2000 video codec

Based on the simplified JPEG-2000 compression core, a scalable and robust wavelet video solution is proposed based on the motion compensation prediction (MCP) approach [75]. By incorporating the wavelet block-based JPEG-2000 core, the spatial/SNR scalability is naturally tied to our scheme. One big problem for SNR scalability in video coding is the drift problem, in which the reference frame at the encoder may differ from the one at the decoder, since the reference frame at the decoder may be only partially decoded. In our scheme, layering motion estimation and compensation is performed in the wavelet domain to avoid the drift problem. The proposed video codec can achieve temporal, SNR and spatial scalability simultaneously.
Also, the codec is quite robust to bit errors and packet loss, since no inter-subband relationship is exploited as in the famous zerotree-style wavelet codecs. The core of JPEG-2000 is based on a scheme called EBCOT (Embedded Block Coding with Optimized Truncation). Compared with the conventional tree-based wavelet approach, EBCOT is a block-based coder that encodes wavelet coefficients in fixed-size blocks independently. After all blocks are encoded, distortion thresholds for all blocks at a specified bit rate are found iteratively. Then, all block bit streams are truncated by imposing the rate-distortion rule. Fine granular scalability is achieved by aggregating a bunch of stream layers at different bit rates to form the desired stream. Finally, the target output rate is met by rate-distortion optimized truncation. However, the entire coding process and the bit stream structure are so complicated that its application may be restricted to still image coding only. The block-based JPEG-2000 core is simplified by modifying several fine-tuning procedures of EBCOT. It adopts block skipping during bit-plane coding of blocks and defines the minimal data unit (MDU) as the bit-plane of a subband. A global header is constructed to indicate the length of each MDU so that each compressed MDU can be accessed without any extra decoding operations [1]. To compress color images, the 4:1:1 down-sampled Y, U and V components are transformed and encoded separately. Then, the rate-distortion (R-D) slopes for the Y/U/V components associated with each MDU are calculated. The final stream is organized based on the calculated R-D slopes, where the MDU streams with larger slopes are transmitted first. There is no need for bit allocation among the Y, U and V components, and the obtained stream is R-D optimized with SNR and spatial scalabilities.
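The R-D slope ordering described above can be sketched as follows. The MDU record fields ('dd' for the MSE reduction an MDU contributes, 'bits' for its coded size) are illustrative names; a real encoder would compute them from the embedded coding statistics.

```python
def order_mdus(mdus):
    """Order minimal data units (MDUs) for transmission by R-D slope,
    i.e. distortion reduction per bit, largest first, so that truncating
    the stream at any point retains the most valuable data.
    mdus: list of dicts with 'dd' (MSE reduction) and 'bits' fields."""
    return sorted(mdus, key=lambda m: m['dd'] / m['bits'], reverse=True)
```

Because Y, U and V MDUs are placed in one pool and sorted jointly, no explicit bit allocation among the color components is needed, which matches the behavior described in the text.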
One major problem of a rate scalable video codec is how to synchronize the reference frame for motion compensation between the encoder and the decoder, regardless of the available bandwidth. The MPEG-4 FGS scheme [35] circumvents this problem by simply locking the encoder and the decoder to a fixed and guaranteed rate (i.e., the base layer only), while temporal redundancy is not exploited for the enhancement layers. In [1] and [75], motion estimation is performed in the wavelet domain by using fixed-size blocks. To keep reference frames consistent at both the encoder and the decoder, strict layer dependency is imposed in the sense that Intra (I), Predicted (P) and Discardable (D) frames within each GOP are encoded into the same bit-plane layer. Different bit-planes are separately compensated to generate a multiple-layer video bit-stream. More precisely, the residual block can be obtained by calculating the bit-by-bit difference between a coefficient in the current block and that in the reference block. Therefore, the layered structure of the prediction residue is kept in the layering motion compensation scheme.

4.3.4 Spatial/Temporal Quality Contribution Association

In the compressed stream, the statistical measures collected from the pre-processing and encoding procedures are also embedded for the rate control procedure. First, the TQC, the motion activity of a GOP, is represented by the ratio between the mean of the absolute HoD values in each GOP and the mean over the whole sequence, defined as

TQC(g) = H_gop(g) / H_avg,   (4.2)

where H_avg represents the average HoD value for the whole sequence and H_gop(g) represents the sum of the total HoD values in GOP g (which also indicates the total motion activity of this GOP).
Then, the SQC, the spatial quality contribution of each GOP, is described by the average coding distortion of the GOP,

SQC(g) = Sum_i (MSE_i) / Len(g),   (4.3)

where Len(g) represents the number of frames in GOP g and MSE_i represents the spatial distortion (in the MSE measure) for frame i, which varies depending on the kept spatial layers. The obtained TQCs and RQCs are stored in the header part of each GOP. However, the SQC is not directly stored. Instead, we introduce the following rate-distortion information embedding.

4.3.5 Rate/Distortion Information Embedding

To facilitate quality-aware rate control, the rate and distortion parameters associated with each bit-plane are also embedded in the bit-stream. For each bit-plane, the bit usage and its incurred distortion reduction (i.e., in the MSE sense) are written to the header of the enhancement layer of that bit-plane. Besides, the frame distortion from the base layer only is also embedded. There are several advantages to this embedded distortion and rate information; most obviously, constant quality can be easily achieved by referring to it. In addition, either the frame rate for each GOP or the quality layer within one GOP can be optimally selected, as described in Section 4.4. The frame rate optimization can then be performed among GOPs with one generalized distortion defined for both the temporal frame rate and the spatial distortion.

4.4 Content Aware Rate Control (CARC)

Temporal quality degradation consists of two major artifacts, i.e., flickering and motion jerkiness [55]. The flickering artifact is caused by spatial quality fluctuation, while motion jerkiness occurs when the frame rate goes below a certain threshold. Hence, to achieve good temporal quality, a rate control scheme should achieve constant
spatial quality as well as a reasonable frame rate. When the available bandwidth is not sufficient for both a high frame rate and high spatial quality, the spatial and temporal qualities should be adjusted to achieve the spatio-temporal tradeoff. The embedded statistical measures, such as the spatial/temporal quality and the rate information, are utilized by CARC. Ideally, the adjustment of the temporal quality (i.e., the frame rate) should adapt to the underlying motion activities, and it has to be coordinated with the spatial quality. By defining one cost function that includes both the spatial quality (i.e., the PSNR value of coded frames) and the frame rate for each GOP, the overall quality of each GOP can be described by this cost function. Note that objective video quality evaluation is a very complicated problem, which involves understanding the human visual model. Currently, in the ITU community, one expert group called VQEG [63] is specifically involved in defining quality evaluation measures for video. Although several models have been proposed and evaluated, there is still no clear winner. When such a measure is available, it can be leveraged in our framework.

A three-level rate control scheme is proposed in our approach. The first level is the across-GOP frame-rate control that determines how many frames should be transmitted for each GOP and its bit allocation. The second level is the within-GOP frame-rate control that determines which frames should be preserved and transmitted. The third level is the frame-level rate control that determines the transmitted quality layer and subband/bitplane when only a partial bitplane can be transmitted. The hierarchical structure of the proposed rate control is illustrated in Fig. 4.2.
Figure 4.2: Four level hierarchical rate control for CARC.

4.4.1 Across-GOP Frame-rate Control

The major purpose of across-GOP frame rate control is to determine the frame rate FR^(g) for the GOP of index g based on the overall temporal/spatial quality contribution.

4.4.1.1 Heuristic Frame Rate Control

However, it is very difficult to determine an optimal frame rate under the channel bandwidth constraint, since one can have multiple choices of spatial/temporal quality levels. To make this problem more tractable, we first separate the decision of the temporal and spatial quality levels into two sequential steps. Before a formal formulation, some empirical rules are considered below. If the spatial or the temporal quality goes below a threshold, either motion jerkiness or spatial artifacts will be clearly visible. By simulation, if the spatial quality is higher than a quality threshold, it is visually more pleasant to increase the frame rate than to increase the spatial quality alone. Based on this, we propose the following twin-threshold frame-rate decision rule. The twin thresholds refer to the base spatial quality threshold TH_SQ-BASE and the full-frame-rate quality threshold TH_TQ-FULL, defined by the quantization parameter (QP) associated with the bit-plane level. The base spatial quality is the lowest spatial quality that should be guaranteed. The full-frame-rate quality threshold means that, if the spatial quality is higher than this threshold, the full frame rate should be preserved. Hence, with these twin thresholds, the complete algorithm for the frame-rate decision can be stated as follows.

• Step 1: We attempt to preserve the full frame rate.
That is, if QP < TH_TQ-FULL, the frame rate is set to the full frame rate (e.g., 30 fps). Otherwise, go to Step 2.

• Step 2: Calculate the frame rate based on the spatial and temporal quality tradeoff described below. If QP < TH_SQ-BASE, take it as the final frame rate. Otherwise, go to Step 3.

• Step 3: Adjust the frame rate to achieve the base spatial quality: reduce FR^(g) by dFR_cur = 0.3 * FR_cur at each adjustment until the transmitted QP is less than TH_SQ-BASE.

Based on the TQC and SQC of each GOP g, the overall quality contribution is defined as the weighted average of TQC and SQC, QC^(g) = a * TQC^(g) + (1 - a) * SQC^(g). Then, the frame rate is set to FR^(g) = min(FR(C_B) * 1/QC^(g) + (TQC^(g) - 1) * FR(C_B)/3, 30), where FR(C_B) is the predefined frame-rate function of the bandwidth C_B. Basically, it reflects the frame rate assigned to a GOP of average complexity under channel bandwidth C_B. It is also affected by the user preference and the encoding frame rate FR_enc. This rule follows the observation that a large temporal complexity requires a higher frame rate, while a high spatial complexity prohibits a higher frame rate. Hence, the frame rate is selected adaptively based on the underlying motion and texture details.

4.4.1.2 Frame Rate Control with Constrained R-D Optimal Bit Allocation

Different from the above heuristic approach, we adopt an R-D optimized frame rate assignment [78]. The cost function considers both temporal and spatial qualities, following the rationale described in [38]. It is defined as
The above generalized function is not specially tailored to ob jective quality evaluation since the human visual model is not seriously considered here. However, it heuristically makes sense since both spatial and temporal quali ties are both taken into account, and it is supported by the underlying perceptual rationale th at “the importance of temporal frequency resolution relative to spatial frequency resolution increases as motion increases in the scene” [38]. W ith this generalized distortion function, the optimal frame rate under a given channel bandwidth is simply to minimize GD(g). T hat is, we have {FR{g)} = {argmin}FR{g){GD{g) = argminFR(g){ ^ ^ ■ SQC(g) + (4-5) subject to the given bandwidth constraint. Obviously, there is a general relationship between the frame rate and SQCg (represented in PSNR values of coded frames). T hat is, PSNR values of coded frames monotonically decrease as the frame rate increases. In Fig. ??, we show the average PSNR values of coded salesman sequence versus the frame rate from 5fps to 30 fps under a bandwidth of 12kbps, 24kbps and 48kbps, respectively. Then, under a given bandwidth, the minimal GD (s) can be found by exhaustive search of points in the corresponding PSNR-FR curve. Note, when the total bit budget is determined for the current GOP, the corresponding PSNR-FR curve is easily built based on the embedded rate distortion information in 89 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. our approach. T hat is, if the total bit budget of GOP is B(g), its length is and the frame rate is F R (s), the assigned bit rate for each inter-frame can be approximated by (B(g) — B (g)(I))/(FR(g) * ). Then, the PSNR value for each coded frames can be interpolated by the embedded rate distortion information. 
Moreover, unlike interactive video applications, streaming video is mostly delay-tolerant, and start-up latencies on the order of seconds are allowed in exchange for better quality. More importantly, for stored video streaming, a priori knowledge of the frame sizes of the stream can be utilized to reduce the high bandwidth requirement through ahead-of-time transmission of large frames to the client buffer. In the literature, most bandwidth smoothing schemes [50] take advantage of this prior knowledge of video frames. Those schemes use a client-side buffer along with the prior knowledge to schedule video delivery so as to minimize metrics such as the peak bandwidth requirement. However, bandwidth smoothing schemes cannot be directly applied to video delivery over the Internet, since the generated transmission plan cannot be guaranteed under a best-effort network. Nevertheless, the notion of bandwidth smoothing can be extended to best-effort networks. That is, our rate control protocol utilizes not only network feedback but also the prior knowledge available from stored video to adjust the temporal/spatial quality. One advantage of this treatment can be illustrated by a simple example, where the video segment includes several easy scenes followed by difficult scenes. Without knowing the existence of these difficult scenes, the difficult scenes will have a lower
Therefore, for streaming video, we can treat one GOP as the basic control unit and formulate the rate control problem across GOPs as the independent optimization problem as where the Te and Ts are the start and the end times of the sliding window, re spectively, and betdi is the accumulated buffer amount from the last window, which works as the smoothing factor. As an example, when the sliding window passes from an easy scene to a diffi cult scene, betdi is positive since the bit budget is shifted from the easy scene to the difficult one. The value E {B C W } is the estimated average bandwidth for the current window. Practically, this optimization can be performed within one sliding window, and the total bit budget for this window is determined by the available bandwidth (ABW) and the duration of the window. However, ABW is time varying for the Internet. Hence, even instantaneous ABW is determined, the total bit bud get available for this sliding window is still not yet determined without bandwidth drgminRi{ ^ 2 GD m in (Ri)h (4.6) subject to Ri < E {B C W } ■ (Te — Ts) + /% , (4.7) 91 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. prediction. There are several bandwidth prediction schemes proposed [26]. Here, we take a simple first-order prediction model, where the channel bandwidth is predicted as where a is the slope of the approximating line to minimize the mean square error of the instantaneous bandwidth, and TW in is the duration of the current sliding window. This predicted bandwidth will be used to calculate the rate constraint given in (4.6). The Lagrange multiplier method can be formulated to solve the above indepen dent optimization problem, i.e. optimizing the objective function This problem can be solved by a bisection search of A as described in ??. Based on the calculated A j, the bit budget R% assigned to each GO Pi can be determined. 
At this stage, both the frame rate and the bit budget for the GOP of concern are known; rate control within one GOP then takes place as follows.

4.4.2 Inside-GOP frame rate control

The purpose of inside-GOP frame rate control is to determine which frames should be transmitted in the specified GOP. After determining the frame rate, the number of transmitted frames in each GOP is known as F_t^(g) = FR^(g) * L^(g) / FR_enc. However, which frames should be transmitted is still a question. In the MPEG-x and H.26x standards, the I and P frames must be transmitted anyway, so the selection could simply take B frames at even intervals. However, this simple selection does not take the individual frame characteristics into consideration. In this work, based on the RQC measure of motion smoothness, the selection transmits the frames with higher RQC first. That is, the set of transmitted frames is {F_i} = {F_i | RQC(F_i) >= RQC_F(N_g)}, where RQC_F(N_g) denotes the RQC of the N_g-th highest-RQC frame and N_g is the number of frames to transmit.

4.4.3 GOP-bitplane rate control

When the bit budget and the frame rate of the concerned GOP are known, the simple rate control proposed in MPEG-4 FGS works by distributing the available enhancement-layer bit budget evenly; each frame is then truncated based on this assigned bit budget. However, this blind bit allocation among frames may cause large quality variations due to 1) the large quality variation in the base layer and 2) the different bit demands among the enhancement layers. With the help of the embedded PSNR values, we instead attempt to minimize the quality variation within each GOP. That is, for each GOP, the bit-plane layers of the transmitted frames should be chosen by

{L_i} = argmin_{L_i} Sum_{i=1}^{N} |D_i(L_i) - D_td(L_td)|,   (4.10)

where L_i is the bit-plane layer number to which frame i is decoded and D_td(L_td) is the common target distortion. Instead of seeking the optimal solution to this problem, a sub-optimal solution is proposed as follows.
Instead of seeking the optimal solution for this, a sub-optimal solution is proposed as follows. 93 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. For the transm itted frames, the same number of bit-plane layers is taken as much as possible. When the bit budget is not sufficient to decode all the frames at the same bit-plane layer, the next layer, subband-bitplane (i.e., MDU) layer, should take control. MDU can take the whole GOP or the sub-GOP as the control unit. The transm itted bit-stream should be in the subband order. T hat is, the lowest sub band (LL) for all frames should be transm itted first. If there is still available bandwidth, the second lowest subband (HL, LH and HH) should be transm itted next. For example, the remaining bandwidth is Cr , then the main problem is determine the subband S by {S} — {51 ]G/=i LfJ2s=i ^ ( / i t s) — Cr < i i LfJ2s=i 5 ( /, l, s). where the N is the number of transm itted frame determined by frame rate control, L is fully transm itted bitplane level determined by GOP-bitplane rate control, and R (f,L i,s ) is the bitrate for frame f transm itted to bitplane L and subband s. 4.5 Layered RPI Generation and Content Aware Filtering The video stream prepared by the rate control module is then packetized into a single packet by packing one or several subband bitplane data. 94 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 4.5.1 Layered Packetization and Priority A ssignm ent For transmission efficiency, the packet size cannot be too small. However, if packet size is too large, error recovery due to packet loss becomes more difficult. We limit our packet size to maximal 256 bytes in this chapter. Besides, during packetization, we obey the following two rules. (1) The bitstream of different frames will not be packetized into one packet. 
(2) Subbands with different orientations (LL, HL, LH and HH) will not be packetized into the same packet.

Rule (1) guarantees that a single packet loss will not spread across two frames. Rule (2) seems to conflict with the conventional approach, which packetizes the bitstream layer by layer. Under the conventional approach, if a lower (i.e., more significant) layer gets lost, all higher layers must be discarded. With the proposed orientation-first packetization, however, the dependency between packets is largely reduced: except for the LL subbands, the HL, LH and HH subbands are independent of each other. Thus, a packet loss in any of the HL, LH or HH subbands will not affect packets of other subbands. As in the rate control part, since the subband-bitplane is the basic data unit, the packetization granularity is also in units of subband-bitplanes, so that no single subband-bitplane belongs to two packets. Moreover, packet dependency is present not only between packets of the same frame but also between packets of different frames, due to the adopted temporal prediction structure. In [5], frame dependency is represented as a directed acyclic dependency graph. A packet loss then affects only its descendants, not its siblings.

Subsequently, by evaluating the anticipated loss impact of each packet on the end-to-end video quality (to cover the loss impact of the packet itself and of depending packets), we assign a varying priority per packet within the priority range of each layer. The resulting priority is called the layered relative priority index (layered-RPI). Note that the priority assignment for one packet should be adaptive to the media adaptation module, since not only the packet itself but also its child packets affect the assigned priority.
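The two packetization rules, together with the 256-byte cap, can be sketched as a simple grouping pass over subband-bitplane data units. The data-unit representation and sizes below are hypothetical; only the rules themselves come from the text.

```python
# Sketch: group subband-bitplane data units into packets so that no packet
# mixes frames or subband orientations (rules 1 and 2) and no packet exceeds
# 256 bytes. Each unit is (frame, orientation, bitplane, size_in_bytes).

MAX_PACKET = 256

def packetize(units):
    packets = []
    cur, cur_key, cur_size = [], None, 0
    for frame, orient, plane, size in units:
        key = (frame, orient)  # same frame AND same orientation required
        if key != cur_key or cur_size + size > MAX_PACKET:
            if cur:
                packets.append(cur)
            cur, cur_key, cur_size = [], key, 0
        cur.append((frame, orient, plane))
        cur_size += size
    if cur:
        packets.append(cur)
    return packets

# Hypothetical data units for two frames.
units = [(0, "LL", 0, 180), (0, "LL", 1, 120), (0, "HL", 0, 60),
         (1, "LL", 0, 200)]
for p in packetize(units):
    print(p)
```

Note that the 256-byte cap forces the two LL bit-planes of frame 0 into separate packets, and the orientation rule keeps the HL unit out of any LL packet.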
The number of child packets and their distortions vary with the output of the media adaptation module. Our priority decision works as follows. First, the distortion reduction ΔD_i contributed by each packet is calculated from the embedded R-D information with linear interpolation. Then, the unit distortion reduction is calculated as UD_i = ΔD_i / ΔR_i, where ΔR_i is the rate of the packet. Finally, we define the packet priority as

P_i = UD_i + Σ_{s∈SD_i} UD_s,  (4.11)

where SD_i is the set of descendants of packet i, which can be obtained with the help of the dependency graph.

4.5.2 Content-Aware Packet Filtering

During a congestion period, packets with a lower priority get dropped by the intermediate routers first. In addition, edge routers can actively filter packets when they detect potential congestion. If edge routers understand the semantics of the transmitted video, such as I, P and S frames, content-aware rate filtering can be performed at the packet level. Packets can be selected for dropping to trade off spatial and temporal quality, as done in the server-oriented rate control approach. On the other hand, if only the packet priority is available, packet filtering can still be performed in priority order, but without the content-aware aspect.

4.6 Simulation Results

To evaluate the performance of the proposed video streaming system, the following simulation setup is arranged. Six CIF MPEG-4 video clips (Akiyo, Foreman, Stefan, Coastguard, Mother-Daughter and News) of 150 frames each are cascaded to form a single test sequence. Another commercial TV newscast sequence is also utilized.

4.6.1 Adaptive Frame Grouping and Quality Contribution Association

To verify the adaptive frame grouping and quality contribution association of the proposed pre-processing scheme, the temporal and spatial quality variations of the two test sequences are depicted in Figs. 4.3 and 4.4, respectively. In Fig.
4.3(a), the temporal variation of the two test sequences, 'commercial' and 'combined', is illustrated in terms of the MAD between consecutive frames. To handle these variations, scene cut detection is performed and variable-length GOPs are formed. In Fig. 4.3(b), the 150-frame clip from the Stefan CIF sequence is segmented into 5 GOPs by dashed lines. It is clearly observed that even inside one GOP, the MAD value changes significantly. This indicates a variable amount of motion inside each frame, which can be represented by the RQC.

Figure 4.3: The temporal variation of a sequence: (a) across GOPs and (b) within one GOP.

Figure 4.4: The spatial complexity of one sequence.

In Fig. 4.4, the variation of the spatial complexity of each GOP is illustrated by intra-coding all frames with DJP2K with the quantizer step size set to 16. It is easily seen that even within each video clip there is significant change in spatial complexity (e.g., for the Foreman sequence, the beginning part is easily encoded while the end part is relatively difficult to code). After temporal segmentation, however, these frames will be encoded into different GOPs. Hence, the spatial complexity within
one GOP is kept similar, which allows us to apply a single factor to characterize the spatial complexity of the whole GOP. The resulting spatial and temporal quality contributions (SQC/TQC) of the GOPs are depicted in Fig. 4.5.

Figure 4.5: SQC/TQC for each GOP.

4.6.2 Evaluation of CARC

To evaluate whether the proposed rate control strategy gives a good tradeoff between motion smoothness and individual image quality, the PSNR performance of the decoded sequence at different bit rates has been analyzed under the following scenarios. In every simulation, we use the overall PSNR value over the three color components (Y, U and V) as the quality measure. It is defined as

PSNR = 20 · log10( 255 / sqrt( (4 · MSE_Y + MSE_U + MSE_V) / 6 ) ).

A fair evaluation of variable frame-rate control is not trivial. Basically, rate control techniques that skip more frames will have a higher average PSNR per coded frame. Therefore, a straightforward average of the PSNR values of transmitted frames is not an accurate measure, since the distortions of skipped frames are not taken into account. In MPEG-4 rate control evaluation, every frame, including skipped frames, is included in the PSNR calculation. Unfortunately, skipped frames tend to decrease the PSNR greatly and to overwhelm any PSNR increase gained from frame skipping. This implies that, with the PSNR measure alone, it may appear favorable to keep the full frame rate regardless of the current bandwidth in most cases. This is however not appropriate, since it does not match the perceived quality. Hence, we use a blind frame skipping (BFS) scheme as the reference for comparison. By BFS, we mean a fixed-interval frame skipping scheme that does not consider the underlying scene content.
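The overall YUV PSNR defined above can be computed directly; a minimal sketch with hypothetical MSE values:

```python
import math

def combined_psnr(mse_y, mse_u, mse_v):
    """Overall PSNR over Y, U and V, with Y weighted by 4 (4:2:0 sampling)."""
    mse = (4 * mse_y + mse_u + mse_v) / 6
    return 20 * math.log10(255 / math.sqrt(mse))

# Hypothetical per-component MSE values.
print(round(combined_psnr(20.0, 8.0, 8.0), 2))  # -> 36.09
```

The factor of 4 on the luma MSE reflects that, in 4:2:0 video, the Y plane carries four times as many samples as each chroma plane.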
By keeping the number of transmitted frames the same, BFS and CARC can be compared in terms of the average PSNR and its standard deviation. Two constant channel bandwidths (300 kbps and 100 kbps) are used in the following simulations. For 300 kbps, we have three scenarios: (A) keeping the full frame rate, which is the base case for temporal smoothness, (B) BFS and (C) the proposed CARC scheme. For the 100 kbps channel, only scenarios B and C are simulated. Also, TH_TQ-full and TH_SQ-base are set to 20 and 40 in CARC, respectively.

In Fig. 4.6, the frame rate of each GOP is shown for both 300 kbps and 100 kbps.

Figure 4.6: The frame rate determined by CARC under 300 kbps and 100 kbps.

Under the 300 kbps scenario, since all the GOPs in the "MAD" and "Akiyo" sequences can be transmitted to one more layer beyond a QP of 20, the full frame rate is preserved for those sequences. In contrast, if the full frame rate were preserved, the corresponding QP of the GOPs of "Stefan", "Foreman", "Coastguard" and "News" would be larger than TH_SQ-base. Therefore, the QP of those sequences is determined by the proposed tradeoff scheme. Note that even at the frame rate selected by the proposed overall complexity measure, the corresponding QP of the transmitted bitplane for "Stefan" is still higher than the threshold th_b. Hence, the frame rate is reduced further. In our simulation, the frame rate is reduced by ΔF_cur = 0.3 · F_cur at each adjustment until the QP of the transmitted bitplane falls below th_b.

Fig. 4.7 shows the PSNR results for the whole combined sequence under the three scenarios. In Table 4.1, the mean and standard deviation of the PSNR are shown.
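The frame-rate adjustment loop described above (reduce by 30% per step until the transmitted QP falls below the threshold th_b) can be sketched as follows; the qp_for_rate() model is a hypothetical stand-in for the encoder's actual rate-QP behavior.

```python
# Sketch of the iterative frame-rate reduction: at each step, the current
# frame rate is cut by delta F = 0.3 * F_cur until the QP of the transmitted
# bitplane drops below the threshold th_b (40 in the text).

TH_B = 40

def qp_for_rate(frame_rate, kbps):
    """Hypothetical stand-in: QP grows as more frames compete for the budget."""
    return frame_rate * 60.0 / kbps * 10

def adjust_frame_rate(f_cur, kbps):
    while qp_for_rate(f_cur, kbps) > TH_B:
        f_cur -= 0.3 * f_cur  # delta F_cur = 0.3 * F_cur
    return f_cur

print(round(adjust_frame_rate(30, 100), 2))  # -> 5.04
```

The geometric reduction converges quickly, so only a handful of adjustments are needed even when the starting frame rate is far too high for the channel.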
We see that the PSNR variation within each GOP is quite small in all three cases, which means the CARC scheme keeps the distortion variation small even without the spatial and temporal quality tradeoff.

Figure 4.7: PSNR comparison for three schemes: (a) under 300 kbps with full frame rate, CARC and BFS and (b) under 100 kbps with CARC and BFS.

More importantly, although the number of transmitted frames in CARC (625 frames) is slightly larger than that of BFS (600 frames) for the 300 kbps channel, we observe that the overall PSNR of CARC is more than 1.3 dB higher than that of BFS. The quality variation of CARC is smaller than that of Scenario A and that of BFS. The PSNR values for the 100 kbps channel for CARC and BFS are shown in Fig. 4.7(b). The mean and the standard deviation of the PSNR are given in Table 4.2. CARC keeps 235 frames in total while BFS keeps only 225 frames (4-to-1 skipping). The overall PSNR of CARC is more than 1.8 dB higher than that of BFS, and the quality variation of CARC is also smaller.

Since there are high motion and detailed textures inside the sequence, 300 kbps is actually not sufficient for transmitting full frame-rate video for sequences like "Stefan", "Foreman" and "Coastguard". As a result, a considerable amount of spatial artifacts remains within those frames. As an example, the first frame of each clip is shown in Figure 4.8 to indicate the particular spatial quality for each
Table 4.1: Comparison of PSNR and its variation for full frame-rate, CARC and BFS rate control under 300 kbps.

  Sequence     Full-frame         CARC               BFS
               PSNR    σ_PSNR    PSNR    σ_PSNR    PSNR    σ_PSNR
  Akiyo        41.41   0.3259    41.41   0.3259    43.47   0.5181
  Foreman      31.59   1.1378    33.53   0.9340    32.87   1.0410
  Stefan       25.52   0.3767    27.81   0.4383    26.03   0.3778
  Coastguard   29.54   0.2399    32.23   0.3402    31.17   0.2653
  MAD          39.00   0.1054    39.00   0.1054    41.06   0.0985
  News         34.51   0.2105    36.09   0.2872    36.07   0.2879
  Overall      33.59   5.46      36.49   4.23      35.11   5.93

Table 4.2: Comparison of PSNR and its variation for CARC and BFS rate control under 100 kbps.

  Sequence     CARC               BFS
               PSNR    σ_PSNR    PSNR    σ_PSNR
  Akiyo        37.70   0.1774    38.58   0.1663
  Foreman      32.33   1.1415    31.60   1.1594
  Stefan       26.63   0.4175    25.09   0.3952
  Coastguard   31.00   0.2896    29.44   0.2340
  MAD          37.43   0.1136    37.88   0.1071
  News         32.51   0.1542    32.50   0.1520
  Overall      34.32   3.6932    32.52   4.7091

clip for both the full frame-rate and CARC cases. It is clearly observed that the proposed rate control can intelligently select the spatial and temporal quality levels and, as a result, the overall quality is improved.

4.7 Conclusion

In this chapter, a complete video streaming system has been presented based on the highly scalable differential JPEG2000 video codec and the Content-Aware scalable video Rate Controller (CARC). By jointly considering the underlying spatial and temporal complexity, the proposed CARC maintains visually smoother video quality. The bitstream encoded by the proposed DJ2K video codec is very fine granular in terms of both spatial and temporal layers. Since the priority is well defined, it is well suited to video transmission over prioritized networks.
Figure 4.8: Reconstructed image quality comparison between full frame-rate and CARC (upper: full frame-rate; lower: CARC).

Chapter 5

Fine Granular Video Streaming over DiffServ Networks

5.1 Introduction

Streaming video over IP is addressed in this chapter by integrating an error-resilient scalable source codec with prioritized network transmission. The scalable codec is based on the MPEG-4 fine granular scalable (FGS) coding scheme under standardization, which allows arbitrary truncation according to a given rate budget. The video stream is then prioritized for prioritized dropping, where rate adaptation is dynamically performed to meet the time-varying available bandwidth. This kind of prioritized stream can benefit from QoS networks such as the IP differentiated services (DiffServ) architecture. All key components, including FGS encoding, rate adaptation and packetization, error-resilient decoding and differentiated forwarding, are seamlessly integrated into one system. By focusing on the end-to-end quality, we set both source and network parameters properly to achieve superior performance of FGS video streaming.

5.2 Motivation

The fine grain scalability (FGS) of MPEG-4 [35] is one big step towards a scalable video solution, where the base layer aims at providing the basic visual quality to meet the minimal user bandwidth while the scalable enhancement layer can be arbitrarily truncated to meet heterogeneous network conditions. With the help of this scalable stream, video streaming is much simplified, since all the transcoding overhead required by non-scalable codecs can be bypassed. However, scalable coding only solves one part of the problem, since packet loss is common under unpredictable channel conditions.
To address this problem fully, both an efficient scalable coding scheme and a flexible delivery technique are needed.

Researchers have started with the current best-effort network model to find innovative streaming solutions that mitigate the effect of unpredictable packet loss [67, 26]. That is, application-layer QoS (Quality of Service) is provided to end users via rate adaptation and error control. Although rate adaptation and error control have traditionally been investigated independently, they actually affect each other, and recent work attempts to integrate them. In [71], rate adaptation was performed to smoothly adjust the sending rate of MPEG-4 FGS coded video based on the estimated network bandwidth, and each packet was then protected unequally. However, this sender-oriented rate adaptation is mainly suitable for unicast rather than multicast video. Moreover, the FEC level for unequal error protection is decided in a heuristic manner (i.e., without optimization for end-to-end quality). A fully receiver-driven approach for joint rate adaptation and error control was proposed in [5] with pseudo-ARQ (automatic repeat request) layers. The sender injects multiple source/channel layers into the network, where delayed channel layers (relative to the corresponding source layers) serve the packet recovery role. Each receiver performs rate adaptation and error control by subscribing to a selected number of source/channel layers according to the receiver's available bandwidth (ABW) and channel condition. However, as discussed in [11], the receiver-driven approach is subject to several drawbacks, including persistent instability in video quality, arbitrary unfairness with other sessions, and difficult receiver synchronization.

At the other end, from the network infrastructure viewpoint, the trend is to promote more QoS support in network nodes (i.e.
boundary or internal routers). Two representative approaches in the Internet Engineering Task Force (IETF) are the integrated services (IntServ) with the resource reservation protocol (RSVP) and the differentiated services (DiffServ or DS) [56, 7]. They are more suitable for accommodating the QoS requirements of different applications than the best-effort IP network. Between these two approaches, the DiffServ scheme provides a scalable solution with less complexity, since IntServ has to maintain per-flow state across the whole path for resource reservation. In the DiffServ model, resources are allocated differently to various aggregated traffic flows based on a set of priority bits (i.e., the DS byte). Thus, the DiffServ approach allows different QoS levels for different classes of aggregated traffic flows. Some work has been performed on non-scalable (or coarse granular) video streaming over QoS-provisioning networks, and a significant gain from unequal error protection (UEP) was usually claimed. However, the gain is often overstated, and this is especially true when the source and network parameters do not match well.

By applying error-resilient scalable source coding, constant-quality rate adaptation and packet prioritization to the DiffServ-based network, we tackle this problem from both the application and the network viewpoints. The scalable codec is based on the standardized MPEG-4 FGS coding, whose scalable stream can be truncated arbitrarily according to the rate budget. The main contribution of this work is the proposal of a coding system that integrates error-resilient MPEG-4 FGS coding and constant-quality rate adaptation into the DiffServ network through R-D optimized fine granular packets. The basic idea of our proposed system is stated below.
First, R-D sample points are generated and embedded in each bitplane of the MPEG-4 FGS enhancement layer (EL) in the encoding process. These data are interpolated to serve as a piecewise-linear R-D model of the EL. With this R-D model, rate adaptation can easily be performed to control the distortion. The MPEG-4 FGS base layer (BL) is known to be less flexible and more sensitive to errors than the EL [71]. Error-resilient coding, source-/channel-level UEP and optimal packetization [65, 68] can be employed to protect the BL. In this work, we follow the MPEG-4 packetization principle by generating fixed-size packets for both BL and EL. The loss impact of BL packets on the end-to-end video quality is measured, and priorities are assigned accordingly. Similarly, for EL packets, the loss impact (i.e., the increased distortion) is calculated from the piecewise-linear R-D model. With prioritized fine granular packets, more graceful quality degradation can be achieved than in [71], where packets are not differentiated.

The proposed system also takes full advantage of applying the fine granular scalable video coding technique to the QoS-enabled DiffServ network. After a careful examination from both source and network viewpoints, an appropriate DiffServ model is chosen to efficiently handle the MPEG-4 FGS stream. To avoid a performance bias in favor of UEP over equal error protection (EEP), we make a fair comparison between UEP and EEP for both BL and EL by evaluating several deployment scenarios. The differentiated forwarding of FGS video shows good efficiency and sufficient flexibility to overcome short-term network variation and packet loss in the low to middle ranges. Thus, rate adaptation at the sender is required to match longer-term network variation only. This implies that rate adaptation can be done less frequently.
5.3 Overview of the Proposed System

Different QoS provisioning for applications of different characteristics is required to enhance the overall resource utilization while maximizing user satisfaction. The emerging DiffServ network provides different levels of service in a scalable manner. Current research efforts along this direction can be classified into two types: the absolute [56] and the relative [7, 8] service differentiation schemes. The absolute service differentiation scheme has a higher complexity due to the overhead of QoS provisioning; it trades flexibility for stronger guarantees. Dovrolis et al. [7] promoted the concept of relative differentiation, which provides a proportional service gap with their own proprietary scheduling. That is, a higher DS level¹ provides a better (or at least not worse) performance in terms of queuing delay and packet loss. Two services are supported by the IETF [2], i.e., the premium service (PS) that supports low loss and delay/jitter, and the assured service (AS) that provides QoS better than best effort yet without guarantees. For streaming video applications, where encoding/decoding is more resilient to packet loss and delay fluctuations, AS seems a better match, and we focus on the AS of relative service differentiation for streaming video applications. Note that MPEG-4 FGS originally assumes guaranteed delivery (e.g., DiffServ PS) of the base layer (BL), leaving the enhancement layer (EL) to the mercy of the best-effort Internet. Here, we relax this requirement, partly to allow a broader range of applicability and partly to avoid the high cost

¹The 'DS level' can be interpreted as the grade of quality provided to a group of packets with an identical DS codepoint in the IP header.
There exist several ways to realize DiffServ AS, especially the proportional DiffServ. For a more detailed description, we refer to [8, 52], This research adopts the proportional DiffServ model described in [8] for network simulation. In terms of packet loss, the proportional DiffServ model demands that loss rates of different DS levels are spaced as Y = ~ , 1 < i , j < N , (5.1) h aj where ^ is the average loss rate for DS level i, and ix,, z = 1,..., N are loss differen tiation param eters ordered as a\ > a 2 > ■ ■ ■ > < T jy > 0. Network Node Transmission Encoding server End User RD Embedding BL Prioritization Video Preprocessing MPEG4 FGS Codec Rate Adaptation Bandwidth Estimator EL Packetization Prioritization DiffServ Model Figure 5.1: The overview of the proposed scalable video stream ing system with network QoS provisioning The overview of the proposed scalable video streaming system with network QoS (e.g. DiffServ) provisioning is shown in Fig. 5.1. The system consists of four key components: (1) scalable source encoding, (2) constant quality rate adaptation, (3) prioritized packetization and (4) differential forwarding [77, 78, 72]. They are briefly described below. 112 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. • Scalable coding and rate adaptation The source is first encoded by the MPEG-4 FGS codec, where the estimated minimal bandwidth gives the bandwidth constraint for BL. In the coding pro cess, R-D samples are generated for each bitplane and embedded as the VOP user data. For non-real time applicatioins, the over-coded bitstream is pre- . stored in the streaming server. Upon the streaming request, the rate adap tation module takes place to scale the EL stream based on the feedback of the available bandwidth to preserve constant quality by referring to embedded R-D samples. • Prioritized packetization The rate-adapted stream that contains both BL and EL is then packetized. 
The fixed-length packetization is adopted for both the EL and BL streams, as recommended by MPEG-4 [35]. By evaluating the anticipated loss impact of each packet on the end-to-end video quality (i.e., considering the loss impact on itself and on depending packets), we assign a priority index, called the relative priority index (RPI), to each packet within the priority range of each layer. The priority assignment of a packet is dynamically determined in the media adaptation module, since not only the packet of concern but also its children packets (which vary dynamically) affect the assignment.

• Differential forwarding
With the assigned priorities, these packets are sent to the DiffServ network to
The packet size is related to efficiency and error- resilency, since a smaller packet size demands a higher overhead but is more resilient to errors. Recently, to improve error resiliency, a discrete optimization problem 114 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. to minimize the distortion was formulated for the packetization of an embedded stream [68]. In this work, we follow the fixed-length packetization (FLP) due to the following considerations. First, FLP can avoid the inefficiency of GOB’ s of a very small size . More im portant, compared with GOB-based packetization, FLP automatically separates a relative large motion region into several packets and group several relatively static regions into one single packet. Split packets for a large motion region spread the loss impact to multiple packets, thus increasing error resiliency. Each packet is assigned with certain priority according to its impact to end- to-end visual quality. For different service preferences in term s of loss and delay, the priority can be further divided into the relative loss index (RLI) and the relative delay index (RDI) as given in [52], If the assigned priority reflects the impact of each packet to the end-to-end quality well, graceful quality degradation can be achieved by dropping packets with respect to the priority indices. To determine the packet priority with a low complexity is an active research area today. Several features such as the initial error strength (i.e. in MSE by assuming the packet loss concealed), its propagation via motion vectors and the spatial filtering effect were used to develop a corruption model in [27]. This approach leads to a good priority assignment result at the cost of a higher computational complexity. We adopt the following three rules in packetization: 1. the fixed-length packetization is used; 115 Reproduced with permission of the copyright owner. 
2. no stream from two frames can be packetized into the same packet, for both BL and EL;

3. for the EL, no stream from different bitplanes can be packetized into the same packet.

For BL packets, we adopt the accurate priority instead of an approximated one. That is, we empirically measure the overall MSE when the packet of concern gets lost. Although this approach is somewhat complex, the priority of each packet reflects its actual impact on video quality. By doing so, we can clearly analyze the gain from differentiated forwarding. Also, since the BL is normally determined by the minimal bandwidth and is typically fixed, the priorities of BL packets can be calculated off-line.

For EL packets, the priority assignment is simplified due to the strict separation of frames along the temporal direction. A packet loss within the EL affects only a single frame and does not propagate. The incurred distortion of each EL packet can therefore be accurately calculated within each frame. The packet priority can be calculated as

p_i = ΔD_i / ΔR_i,  (5.2)

where ΔD_i represents the incurred distortion due to the specified loss, and ΔR_i is the rate of the packet of concern. In addition, the packet dependency has to be taken into consideration: if a packet in a more significant bitplane gets lost, the less significant bitplanes must be discarded anyway. Hence, the final packet priority should be calculated as

RLI_i^(EL) = Σ_{s∈SD_i} p_s + p_i,  (5.3)

where SD_i is the descendant set of packet i. By using the piecewise-linear R-D model of each bitplane, the priorities of EL packets can easily be calculated on-line during the packetization procedure.

5.5 Differentiated Forwarding of FGS Video

In our system, the priority is associated with each individual packet instead of with layers (e.g., BL and EL) as done in previous work.
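Reading (5.2) and (5.3) as above, the EL priority assignment can be sketched as follows. The packet data are hypothetical, and the descendant set SD_i is taken to be the packets of the same frame in less significant bitplanes.

```python
# Sketch: RLI for EL packets per Eqs. (5.2)-(5.3). Each packet carries
# (frame, bitplane, delta_D, delta_R); the descendants of a packet are the
# same-frame packets in less significant bitplanes (hypothetical data).

def rli(packets):
    p = {i: pkt[2] / pkt[3] for i, pkt in enumerate(packets)}  # Eq. (5.2)
    out = {}
    for i, (frame, plane, _, _) in enumerate(packets):
        desc = [j for j, (f, pl, _, _) in enumerate(packets)
                if f == frame and pl > plane]                  # SD_i
        out[i] = p[i] + sum(p[j] for j in desc)                # Eq. (5.3)
    return out

packets = [(0, 0, 400.0, 200.0),  # frame 0, most significant bitplane
           (0, 1, 120.0, 200.0),
           (0, 2, 30.0, 200.0)]
print(rli(packets))  # the MSB packet accumulates its descendants' priorities
```

As expected, the most significant bitplane packet receives the highest RLI, since losing it also invalidates every less significant bitplane of the same frame.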
With each packet assigned a priority, differentiated forwarding can be employed accordingly. In this section, we first describe the error-resilient coding for both BL and EL in MPEG-4 FGS, and then examine a DiffServ model that performs the differentiated forwarding mechanism for error-resilient bitstreams.

The BL of MPEG-4 FGS is obtained with non-scalable MPEG-4 coding. Several error-resilient tools have been standardized by MPEG-4, including the video packet, data partitioning (DP), reversible VLC (RVLC), cyclic/adaptive intra refresh (CIR/AIR) and NewPred. DP and RVLC are mainly for partial decoding of video packets with bit errors, and are less relevant to the Internet environment, where packet loss is dominant. CIR and AIR are encoding choices, where AIR enhances CIR by more intelligent refresh that considers the motion area. Here, both AIR and CIR are applied to the BL of MPEG-4 FGS, and their performance is compared for UEP as well as EEP scenarios.

The adopted error concealment (EC) scheme is illustrated in Fig. 5.2. For lost MBs in a P-frame, their motion vectors are interpolated from those of the upper and lower MBs. The following rules handle some special cases: (1) if the missing MB is at the upper boundary of a frame, vectors v1 and v2 are set to zero; (2) if a lower MB is corrupted or the missing MB is at the lower boundary of a frame, vectors v3 and v4 are set equal to v1 and v2. As mentioned before, the fixed-length packetization is applied to both BL and EL streams.

Figure 5.2: The error concealment scheme for lost MB recovery in BL.

Differentiated forwarding of the non-scalable H.263+ stream was discussed in [52], where proportional DS levels were used in prioritized video streaming. For scalable MPEG-4 FGS, it is usually assumed that the available bandwidth is sufficient to cover the BL stream due to the relatively small BL size. Compared to EL
For scalable MPEG-4 FGS, it is usually assumed that the available bandwidth is sufficient to cover the BL stream due to the relatively small BL size. Compared to EL packets, these BL packets are very critical and should be protected faithfully. Thus, the BL stream should be mapped into higher priority DS levels to secure its reliable transmission. Even if the available bandwidth incidentally goes below the rate demanded by BL, we may assume that some minimal bandwidth (which could be smaller than the BL rate) is still sustained. By prioritizing the BL stream and protecting it accordingly, we can preserve a minimal quality with the delivery of the highest priority portion of BL packets. Thus, to serve as a simple and general model, we consider three BL categories for relative proportional DiffServ. For EL packets, another lower priority class queue (i.e. DS level) with two different drop preferences is assigned. (If we include the best-effort service as the worst one, then there are three drop preferences for EL in total.) This multiple-queue DiffServ node, as illustrated in Fig. 5.3, performs the differentiated forwarding policy.

Figure 5.3: The DiffServ node with multiple queues.

In addition to priority dropping, rate adaptation can significantly reduce network congestion when an accurate estimate of the available bandwidth is available. Rate adaptation can be performed at either the server side or the edge router in the DiffServ network. Packets are dropped by rate adaptation in a strict priority order.
By marking IN on EL packets within the available bandwidth and OUT on EL packets above the available bandwidth, rate adaptation is performed implicitly (i.e. OUT is assigned to the best-effort service). We can sometimes drop the OUT portion at the sender side to avoid network congestion in advance. This approach may result in worse network link utilization when the estimate of the available bandwidth is not accurate (e.g. too conservative). Compared to priority dropping in differentiated forwarding, rate adaptation provides more graceful quality degradation when the available bandwidth becomes small. However, we have to pay some price to achieve this goal, e.g. the complexity required to estimate the available bandwidth and the inevitable time delay between the bandwidth measurement and rate adaptation. In our proposed system, a compromise solution is provided. That is, when there is a big change in the available bandwidth, rate adaptation is performed. Otherwise, only DiffServ forwarding is employed.

5.6 Experimental Results

We study the performance of the proposed system through extensive simulations, where the simulation setup is given in Fig. 5.4. Several typical scenarios in video streaming applications are identified and evaluated. The gain of prioritized transmission over non-prioritized transmission is compared. The proportional DiffServ is adopted in the network simulation.

Figure 5.4: The diagram of the proposed simulation setup.
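The sender-side IN/OUT marking used for rate adaptation in this setup can be sketched as follows. This is an illustrative sketch: the packet representation and the per-interval bandwidth accounting are our assumptions, and packets are assumed already sorted by priority (RLI), highest first.

```python
def mark_el_packets(packets, available_bw, drop_out=False):
    """Mark EL packets IN while their cumulative rate fits the
    estimated available bandwidth; the remainder is marked OUT,
    i.e. relegated to the best-effort service. With drop_out=True
    the OUT portion is dropped at the sender in advance.
    packets: list of (rate, payload) pairs in priority order.
    """
    marked, used = [], 0
    for rate, payload in packets:
        used += rate
        tag = "IN" if used <= available_bw else "OUT"
        if tag == "OUT" and drop_out:
            continue  # sender-side rate adaptation
        marked.append((tag, payload))
    return marked
```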
Scenario 1: Differentiated Forwarding of BL Packets

As discussed, the MPEG-4 FGS codec is built on the assumption that the available bandwidth is sufficient to transmit BL packets. However, in the case of severe network congestion, it is desirable to degrade the quality of BL gracefully. Here, we first compare the performance of prioritized transmission of BL packets with that of non-prioritized transmission. The prioritized transmission adopts the proportional DiffServ model with three DS levels for proportional loss rates. One interesting issue here is how to map the continuously prioritized packets to three discrete DS levels. One possible solution is the mapping strategy proposed in [52]. That is, packets are first classified (i.e. uniformly or non-uniformly quantized) into different categories. Packets of different categories are then mapped to a limited number of DS fields, with the goal of minimizing quality degradation under a pricing mechanism. However, for simplicity, we adopt a simpler QoS mapping policy in the simulation with a direct mapping from RLI to DS levels, in which all packets are clustered into three groups, each of which has a similar number of packets, and each group of packets is mapped to one DS level.

For the simulation setup in Fig. 5.4, we have to set various parameters in the encoding and packetization modes. Several parameters are shown in Table 5.1. The BL packets are encoded by using the MPEG-4 FGS codec with TM5 rate control at 128 kbps. The CIF sequence is encoded at 10 fps with a leading I-frame followed by all P-frames. Since packet loss from the initial I-frame is too catastrophic, we limit the packet loss only to P-frames. Both AIR and CIR simulations are performed and compared, and the performance gains from these two modes are shown in Figs. 5.5(a) and (b).
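The direct RLI-to-DS-level mapping described above (three groups with a similar number of packets each) can be sketched as follows; the function name and the ceiling-division grouping are our illustrative choices.

```python
def map_rli_to_ds_levels(rli_values, num_levels=3):
    """Direct QoS mapping used in the simulation: sort packets by RLI
    and split them into `num_levels` groups of (nearly) equal size;
    the highest-RLI group gets DS level 0 (best treatment).
    Returns a DS level per packet, in the original packet order.
    """
    order = sorted(range(len(rli_values)), key=lambda i: -rli_values[i])
    levels = [0] * len(rli_values)
    group = max(1, -(-len(order) // num_levels))  # ceiling division
    for rank, idx in enumerate(order):
        levels[idx] = min(rank // group, num_levels - 1)
    return levels
```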
We see from these figures that the DiffServ transmission has a clear gain in terms of PSNR under the same bit budget and the same overall packet loss ratio. It is interesting to note that the gains vary significantly across the four cases shown in Table 5.1. The smallest gain is obtained in Case 1, while Case 4 provides the largest gain. The RLI distribution of BL packets under Case 1 is illustrated in Fig. 5.6(a).

Table 5.1: Parameters for differentiated forwarding of BL packets of MPEG-4 FGS.

Case | Bitrate  | Frame rate | Rate control | GOP mode | ER options | Packet size
1    | 128 kbps | 10 fps     | TM5          | IPPPP... | VP, AIR    | 400 bytes
2    | 128 kbps | 10 fps     | TM5          | IPPPP... | VP, CIR    | 400 bytes
3    | 128 kbps | 10 fps     | TM5          | IPPPP... | VP, AIR    | GOB based
4    | 128 kbps | 10 fps     | TM5          | IPPPP... | VP, CIR    | GOB based

Scenario 2: Differentiated Forwarding of EL Packets

Figure 5.5: The PSNR comparison of EEP/UEP for BL packets: (a) EEP and 3-level DS-UEP of Case 1 and (b) EEP and 3-level DS-UEP of Case 2.

Figure 5.6: The RLI distribution for packets in the Foreman sequence: (a) BL packets under Case 1 and (b) EL packets under 384 kbps.

BL and EL packets are usually protected by two error protection levels. In [71], different bitplanes within EL are unequally protected. However, the same bitplanes in different frames may have different contributions to the end-to-end visual quality, so they should be protected unequally. For example, an EL packet of a low-quality BL frame typically has a higher impact than that of a high-quality BL frame. Thus, we propose to apply differentiated forwarding to EL packets.
By utilizing the R-D sample derived priority, we prioritize each EL packet and perform differentiated forwarding accordingly. The RLI distribution of EL packets at a rate of 384 kbps is given in Fig. 5.6(b).

We show the performance advantage of priority dropping in DiffServ over uniform dropping at 160 kbps, 256 kbps, 384 kbps, and 512 kbps of EL packets in Figs. 5.7(a)-(d), respectively. As shown in these figures, prioritized transmission has a clear gain in PSNR under the same bit budget and the same packet loss ratio. We also present the ideal performance of rate adaptation when an accurate estimate of the bandwidth is available. More graceful quality degradation can be achieved since rate adaptation is efficient in dropping packets in a strict order of priority. Another observation is that the advantage of rate adaptation over UEP, and the advantage of UEP over EEP, vary under different packet loss ratios and different EL bit rates. Under a high packet loss ratio (i.e. 25 percent or more), rate adaptation has a significant gain over UEP. On the other hand, the gain of rate adaptation over UEP becomes smaller under a low packet loss. If there is a substantial amount of bandwidth variation (i.e. corresponding to a high packet loss ratio), a higher gain can be achieved from rate adaptation at the cost of the additional complexity to estimate the available bandwidth. Under a small or medium range of bandwidth fluctuation, UEP without rate adaptation works well without losing much efficiency. Finally, different EL bit rates also affect the gain of UEP over EEP: the smaller the EL bit rate, the smaller the gain of UEP over EEP.

Impact of Packet Priority Distribution on the UEP Gain

Coding and packetization parameters have an impact on the gain of UEP over EEP for both BL and EL packets.
The AIR coding choice and the fixed-length packetization narrow down the performance gap between UEP and EEP (relative to CIR and GOB-based packetization). This phenomenon can be explained by the packet priority distribution shown in Table 5.2. As an extreme case, when all packet priorities are set the same, the gain of UEP over EEP is zero. By modifying the coding and packetization modes, we modify the priority distribution as well as the performance gap. Most work on UEP attempts to spread the priority distribution over a wide region so that the gain from UEP is highlighted. In conclusion, for the UEP approach, a more widely spread priority index provides more graceful quality degradation. Thus, the performance gap between UEP and EEP is largest when the priorities are spread across a wide range. On the other hand, when packet priorities are clustered into a small region, the performance gap narrows.

Figure 5.7: The PSNR comparison of rate adaptation, UEP, and EEP for the Foreman sequence under different EL rates: (a) EL at 512 kbps, (b) EL at 384 kbps, (c) EL at 256 kbps and (d) EL at 160 kbps, where the PSNR value is 34.99 dB, 34.19 dB, 32.97 dB and 32.07 dB, respectively, under the no-loss condition.
Table 5.2: Priority distribution of BL and EL packets under different encoding and packetization parameters.

Case      | P_avg | sigma_P | Gain
BL Case 1 | 3.04  | 2.45    | 0.97 dB
BL Case 2 | 3.35  | 3.09    | 2.13 dB
160k EL   | 0.023 | 0.026   | 0.15 dB
256k EL   | 0.024 | 0.031   | 0.57 dB
384k EL   | 0.026 | 0.035   | 0.8 dB
512k EL   | 0.027 | 0.0038  | 1.1 dB

5.7 Conclusion

A video streaming solution with the MPEG-4 FGS stream delivered over prioritized networks was investigated. First, a constant-quality rate adaptation was proposed by embedding R-D samples with an interpolated piecewise linear R-D model. Then, the error-resilient video stream was prioritized for differentiated dropping and forwarding, where rate adaptation was dynamically performed to meet the time-varying available bandwidth. Through extensive experiments, the benefits of prioritized transmission over the DiffServ network were illustrated. We also investigated the impact of the priority distribution on the performance gap of UEP over EEP.

Chapter 6

Buffer-Constrained R-D Optimal Rate Control

6.1 Introduction

Several factors should be considered in the design of a video codec system. First, there is the channel bandwidth constraint. Through rate control, we can regulate the coding rate to meet the channel bandwidth constraint and allocate bits to coding units efficiently to achieve the best possible quality (or, equivalently, the minimal distortion). Second, there is a constraint on the end-to-end delay, e.g. in interactive video applications. Third, we should consider the buffer constraint in many applications. That is, the video decoder should be able to decode and play back the reconstructed image sequence without buffer overflow or underflow (i.e. buffer starvation).
A typical coding system usually consists of three layers: the group of pictures (GOP) layer, the frame layer and the macroblock layer. One may add another layer, called the slice layer in H.263 or the video packet layer in MPEG-4, between the frame and the macroblock layers. The higher layers (i.e. the GOP and frame layers) perform bit allocation, while the lower layers (i.e. the slice and MB layers) control the coding mode and the quantization scale to achieve the target bit allocation. By the coding mode, we mean the prediction type (Intra/Inter/Not coded), the variable block size (16x16, 8x8 or other block sizes allowed by the syntax), the DCT type (frame/field DCT) and the motion compensation (MC) type (frame/field predicted MC).

In this chapter, we propose several approaches to the design of an R-D optimized codec system. The video input under consideration includes both progressive and interlaced video. The problem to be addressed is: given the bandwidth and buffer constraints, how to adjust the coding mode and quantization parameters to achieve the highest video quality (or, equivalently, the minimal distortion).

6.2 Motivation

6.2.1 Review of R-D Optimized Video Encoding

In international video compression standards such as the MPEG and H.26x families, the bitstream syntax and semantics are unambiguously defined to allow for interoperability of the compressed bitstream over a wide range of applications and systems. Thus, the decoder's action on any standard-compliant bitstream is well defined by the standard itself, without leaving much flexibility for modification (except for possible error concealment and post-processing operations intended to improve the visual quality of displayed frames).
On the other hand, the encoding process is only defined in the normative part of the standard documents for bitstream generation. The encoder has the flexibility to decide encoding modes for frame and MB coding and coding parameters (e.g. the quantization parameters and the frame skip number) as long as the resulting syntax can be understood by the decoder. These coding decisions have been intentionally left out of the standard to allow flexible encoder implementations to achieve further performance improvement. One fundamental problem in encoder design is to optimize the selection of coding modes and parameters to achieve the maximal video quality under constraints on the computational complexity, delay, buffer, bandwidth and the packet (or bit) loss rate.

A buffer-constrained CBR encoding scheme was examined in [40], where an optimal encoder bit allocation scheme was determined using a forward dynamic programming technique known as the Viterbi algorithm (VA) with a discrete set of quantizers. It essentially generates a trellis to represent all viable allocations at each instance (i.e. frame or MB) under the buffer constraints. Each branch corresponds to a quantizer choice for an MB/frame unit and has its associated cost. The trellis path with the minimal total distortion can be found via VA. It provides the optimal solution to this buffer-constrained bit allocation problem.

A tree-based dependency graph was formulated and applied to MPEG in [47], where all possible combinations were generated by successive quantizer choices for frames. The number of possible combinations grows exponentially with the number of levels of dependency, which makes the exact solution too complex. Although some heuristics, such as the monotonicity assumption, can reduce the number of search paths, it is still too complex to implement in practical codecs.
Instead of optimizing bit allocation among frames, bit allocation was optimized among macroblocks in [66, 57, 39, 49] by selecting different quantization steps and coding modes for P-frame coding in the H.263 standard. Wiegand et al. [66] showed the selection among four possible modes (uncoded, intra, inter and inter-4V, i.e. inter MB with 4 motion vectors) to encode MBs in P-frames to give the best R-D performance. In [39] and [49], joint coding mode and quantization step selection was performed to encode the P-frame to optimize the R-D performance. To reduce the computational complexity, Mukherjee et al. [39] proposed an M-best search scheme. That is, at each state in the trellis, only the M paths with the least cost were retained as survivors and carried over to the next step. Schuster et al. [49] restricted the range of quantization parameters to 8~12. The R-D optimization methodology was further adopted in the design of adaptive quantizers in [29], [44]. That is, for each DCT block, whether a coefficient is encoded or not is determined by evaluating the R-D tradeoff associated with that specific coefficient.

6.2.2 Proposed R-D Optimized Coding for Progressive and Interlaced Video

The video source can be either progressive or interlaced in this chapter. Since the limited buffer size at the decoder is crucial in most video broadcasting systems, a strict virtual buffer constraint is imposed. Moreover, the computational complexity should not be too high for DVB applications, since real-time video encoding is often required. Compared with previous work, our research has the following contributions.

1. Our R-D optimized video codec can handle both progressive and interlaced video, while previous work mainly focused on progressive video.

2.
In contrast with previous work that focused on P-frame optimization [39, 49, 57, 66], all I, P and B pictures are optimized in our work.

3. We extend the possible coding modes of an MB by including the zero MV and the zero texture bits as two candidate modes for every MB, to avoid an irregularly large MV or one single expensive coefficient.

4. We develop a quality feedback scheme to generate VBV (Video Buffering Verifier) compliant bitstreams with assured video quality.

5. A better R-D model is developed for frame-level bit allocation.

6. Fast heuristics are developed to achieve good coding results that are close to the optimal one at a much lower computational cost.

In the following, we first formulate the R-D optimized coding problem with coding options at different encoding stages. Then, with MPEG-2 and MPEG-4 as examples, we show how to implement the R-D technique in the real-time coding scenario with fast heuristics. Finally, experiments are given to demonstrate the superior performance of the proposed algorithm.

6.2.2.1 R-D optimization for different encoding options

In principle, we can optimize every coding option to give the best R-D performance. That is, given an input sequence, the frame type (I, P or B) and the corresponding bit allocation scheme for each frame should be determined to give the best quality under constraints [47], [33]. Once the frame type and its bit allocation are determined, the DCT transform type, coding modes and the quantization parameter for each MB should be determined to give the minimal distortion under the frame-level bit allocation. Going down to the 8x8 block level, the 64 coefficients can be adaptively quantized such that a small single high-frequency coefficient can be skipped to avoid a long run-length code without incurring significant distortion [9].
Generally speaking, any coding options allowed by the bitstream syntax can be evaluated against each other in terms of R-D optimization. Therefore, a truly R-D optimized video encoder should be able to

• identify all possible coding options; and

• specify a specific procedure to select the best coding options under certain constraints.

The coding options can be represented as a vector x that consists of the frame type (FP), the bit allocation (B), the MB quantization parameter (QP), the MB coding type (M), and the DCT transform type (T), i.e.

$$x = (FP, B, QP, M, T). \qquad (6.1)$$

Let us use $\Theta$ to denote the set of all possible coding options. Then, the R-D optimization can be written as

$$\min_{x \in \Theta} \sum_i D_i(x), \qquad (6.2)$$

subject to

$$\sum_i R_i(x) \le R. \qquad (6.3)$$

To solve the above minimization problem, we introduce the Lagrangian cost function of the following form

$$J_{\lambda,i}(x) = D_i(x) + \lambda \cdot R_i(x), \qquad (6.4)$$

where $\lambda$ is the Lagrangian multiplier. It was shown in [54] that if x is the optimal solution to (6.4) satisfying the rate requirement $\sum_i R_i(x) \le R$, then x is also the optimal solution to (6.2). Note that x can be either one 5-D vector or a sequence of 5-D vectors if dependency is considered.

It is generally complicated to find the optimal solution to (6.2). Moreover, the rate and/or the distortion of coding unit i (which can be a frame or an MB) may rely on the options of its neighboring units (i.e. units i-1, i-2, ...). This dependency makes the complexity grow exponentially. We can classify R-D optimization problems according to the rate/distortion dependency. In the following, we review different approaches to solve these R-D optimization problems.

6.2.2.2 CASE I: Independent Rate and Independent Distortion

This is the classical R-D optimization problem.
One example is to minimize the average distortion D of a collection of coding units subject to a total rate constraint R_budget. The problem for N units can be expressed as

$$\arg\min_{x_i \in \Theta} \sum_{i=1}^{N} D_i(x_i), \qquad (6.5)$$

subject to

$$\sum_{i=1}^{N} R_i(x_i) \le R_{budget}. \qquad (6.6)$$

Under the independent rate and distortion assumption, the optimization of the Lagrangian cost of each unit, $J_{\lambda,i} = D_i + \lambda \cdot R_i$, can be done independently once the common multiplier $\lambda$ for all units is found by a fast convex search. The complexity of this R-D optimization procedure is low. Suppose that each unit has M coding options and there are N units in total. Then, there are M x N R-D pairs to be evaluated and compared. This optimization formulation can be applied to intraframe coding. Examples include motion JPEG coding [40, 46], R-D optimized progressive JPEG [16], and R-D optimized MPEG-4 FGS rate control [78].

6.2.2.3 CASE II: Dependent Rate and Independent Distortion

To improve coding efficiency, DPCM is widely used among MBs in the MPEG-2 and MPEG-4 standards. For example, to efficiently encode the motion vector of each MB, the difference of MVs is encoded instead of the MV itself. Similarly, when quantization parameters (QPs) vary among MBs, only the QP difference is encoded instead of the QP itself. This type of problem shares one common feature. That is, the distortion term of coding unit i relies only on its own coding option, while its rate term depends on the coding options of both itself and its neighbors. Mathematically, we can write the Lagrangian cost function as

$$J_\lambda(x_i | x_j) = R_i(x_i | x_j) + \lambda \cdot D_i(x_i). \qquad (6.7)$$

This class of problems is more difficult to solve than those in the independent rate and distortion class.
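As an illustration of the CASE I procedure in (6.5)-(6.6), the following minimal sketch performs the per-unit Lagrangian minimization with a bisection search for the common multiplier. The bisection bounds and option lists are illustrative, and convex R-D option sets are assumed so that the total rate is non-increasing in lambda.

```python
def allocate_independent(units, r_budget, iters=40):
    """Independent-unit Lagrangian bit allocation (CASE I).

    units: list of per-unit option lists, each option a (rate,
    distortion) pair. For a fixed lambda each unit independently
    minimizes D + lambda * R; lambda is found by bisection so that
    the total rate meets the budget.
    Returns the chosen (rate, distortion) pair per unit.
    """
    def pick(lam):
        return [min(opts, key=lambda rd: rd[1] + lam * rd[0]) for opts in units]

    lo, hi = 0.0, 1e6
    for _ in range(iters):
        lam = (lo + hi) / 2
        if sum(r for r, _ in pick(lam)) > r_budget:
            lo = lam          # too many bits: penalize rate more
        else:
            hi = lam          # feasible: try a smaller penalty
    return pick(hi)
```

The complexity matches the text: only M x N R-D pairs are evaluated per lambda probe.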
This is especially true when the rate term relies on multiple neighbors of the target unit i. If the rate $R_i$ of unit i depends only on its immediate neighbor (i.e. $R_i(x_i|x_{i-1})$), this problem can be efficiently solved using the Viterbi algorithm (VA), or trellis coding. In trellis coding, the immediate dependency can be represented by a path in the trellis. The minimization of the total Lagrangian cost $\sum_i J_\lambda(x_i|x_{i-1})$ then corresponds to finding a minimal-distance path through the trellis. This optimization technique can be efficiently used in the determination of MB parameters such as the mode selection, the quantization selection, and the selection of the motion compensation and/or DCT transform types.

The complexity of R-D data generation for this type of problem is determined by the rate term $R_i(x_i|x_j)$. If each unit has M coding options, there are $M^2$ possible rates and M possible distortions to evaluate per unit. In total, there are $M^2 \times N$ rates and $M \times N$ distortions for N units. However, the rate term may be decomposable as

$$R_i(x_i | x_j) = R_i(x_i) + M(x_i, x_{i-1}), \qquad (6.8)$$

where $M(x_i, x_{i-1})$ is a predefined matrix that depends on $x_i$ and $x_{i-1}$ only. Then, only M rate terms $R_i(x_i)$ have to be generated for unit i. Thus, when the rate term can be decomposed as in (6.8), the complexity of R-D data generation for N units is equal to $M \times N$, which is the same as in the independent rate and distortion case. Our work falls in this category. We consider the joint selection of the coding mode and the quantization parameter of each MB to minimize the overall distortion in one frame.

6.2.2.4 CASE III: Dependent Rate and Dependent Distortion

In video coding, temporal dependency is introduced among frames through motion-compensated prediction (MCP). With MCP, both the rate allocation and the resulting distortion of frames are dependent on each other.
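The CASE II trellis search with the decomposed rate term of (6.8) can be sketched as follows. This is an illustrative sketch: the cost arrays are hypothetical inputs, with the self Lagrangian cost of each option precomputed per unit and the neighbor-dependent part supplied as the transition matrix.

```python
def trellis_min_cost(unit_costs, transition):
    """Viterbi search for CASE II (dependent rate, independent
    distortion). unit_costs[i][j] is the self cost D + lambda*R_i(x_i)
    of option j at unit i; transition[p][j] is the extra rate cost
    M(x_i, x_{i-1}) of option j following option p (cf. Eq. 6.8).
    Returns (total cost, best option index per unit).
    """
    n, m = len(unit_costs), len(unit_costs[0])
    cost = list(unit_costs[0])          # best cost ending in each state
    back = []                           # backpointers per stage
    for i in range(1, n):
        step, new_cost = [], []
        for j in range(m):
            best_p = min(range(m), key=lambda p: cost[p] + transition[p][j])
            new_cost.append(cost[best_p] + transition[best_p][j] + unit_costs[i][j])
            step.append(best_p)
        cost = new_cost
        back.append(step)
    j = min(range(m), key=lambda k: cost[k])
    path = [j]
    for step in reversed(back):         # trace the minimal path back
        j = step[j]
        path.append(j)
    return min(cost), path[::-1]
```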
The problem with N-frame dependency can be formulated as

$$J_i(x_i | x_{i-1}, \ldots, x_1) = R_i(x_i | x_{i-1}, \ldots, x_1) + \lambda \cdot D_i(x_i | x_{i-1}, \ldots, x_1), \qquad (6.9)$$

for $i = 1, 2, \ldots, N$. Our goal is to solve the unconstrained problem with a proper value of $\lambda$ that matches the given bit budget R_budget for coding options $x_1, x_2, \ldots, x_N$, i.e.

$$\min_{x_1, \ldots, x_N} \sum_{i=1}^{N} J_i(x_i | x_{i-1}, \ldots, x_1). \qquad (6.10)$$

It is obvious that the complexity of (6.10) grows exponentially with the depth N of the dependency tree. The monotonicity property was exploited in [47] to reduce the computational complexity of the temporal dependency problem (6.10) by pruning suboptimal operating points.

6.2.3 Quality Assured R-D Optimal Codec Control

As mentioned in the previous section, the rate control problem can be decomposed into that of three layers: the group of pictures (GOP) layer, the frame layer and the macroblock layer. The GOP layer rate control method specifies the bit allocation for the entire GOP. Suppose that the display time of a specific GOP is T_GOP, C is the average bit rate of the video, and the VBV buffer status before encoding this GOP is B_t. Then, the allowed target bit allocation R_GOP satisfies

$$0 \le R_{GOP} + B_t - T_{GOP} \cdot C \le B_{max}. \qquad (6.11)$$

Thus, we get

$$T_{GOP} \cdot C - B_t \le R_{GOP} \le B_{max} + T_{GOP} \cdot C - B_t. \qquad (6.12)$$

It is easy to see that the adjustable range for the GOP-level bit allocation is B_max. The larger B_max is, the more room the rate controller has when the encoder encodes a difficult GOP. As a tradeoff, a larger buffer tends to result in a longer pre-loading and/or random access delay. In our work, we keep the desired decoder buffer level high enough to handle difficult scenes better. In the implementation, we take the desired buffer level after the coding of the current GOP to be 7/8 * B_max.
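The GOP-level constraint (6.11)-(6.12) together with the desired-buffer policy described above can be sketched as follows; the helper name and arguments are our illustrative choices.

```python
def gop_bit_target(t_gop, c, b_t, b_max):
    """GOP-level bit target under the VBV constraint (Eqs. 6.11-6.12):
        T_GOP*C - B_t <= R_GOP <= B_max + T_GOP*C - B_t.
    The initial target aims at a desired buffer level of 7/8 * B_max
    after the GOP, then is clipped to the feasible range.
    """
    lo = t_gop * c - b_t
    hi = b_max + t_gop * c - b_t
    target = 7 / 8 * b_max + t_gop * c - b_t  # desired-level target
    return max(lo, min(hi, target))
```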
Hence, our initial target for the current GOP equals 7/8 * B_max + T_GOP * C - B_t.

6.2.3.1 Quality assured frame layer bit allocation

Depending on the application, different frame layer bit allocation schemes can be applied. For example, for off-line coding, the encoder can do look-ahead processing and perform the bit allocation more efficiently by utilizing both the previous and the future frames. Moreover, if a multiple-pass encoding process is allowed, the encoder can encode each coding unit several times to match the target bit rate with a constant quantization scale (i.e. the constant-Q scheme). Furthermore, if the application does not have a buffer constraint (e.g. DVD encoding), there will be no hard limit on the bit variability from one frame to another. That is, the encoder can shift bits from easy scenes to difficult scenes relatively freely. Our rate control in the previous section works for the off-line coding scenario, where two-pass encoding is used to intelligently allocate bits according to the frame complexity. In this section, we propose another frame layer bit allocation scheme that targets real-time CBR encoding with a strict buffer constraint. By real-time CBR encoding, we mean that we cannot look ahead to get information about future frames. With a strict buffer constraint, we allow the bit variation of neighboring frames only within a certain bound imposed by the VBV buffer.

The proposed buffer-constrained real-time CBR video rate control scheme is stated below.

1. Initial frame rate determination. Based on the given GOP level bit allocation, we first choose a target frame bit allocation similar to TM5, based on the frame complexity defined as $X_i = R_i \times QP_i$ (i.e. if the frame complexity is assumed to be QP invariant, this is actually a first-order R-Q model).
Furthermore, let us assume that the target frame has the same complexity as the preceding frames of the same frame type. Based on these assumptions, it is straightforward to obtain a bit allocation formula that is the same as that in TM5.

2. Frame rate modification via a buffer validity check. The bit allocation obtained in the first step may or may not be allowed by the VBV buffer. We can check the validity of this bit allocation with (6.11). If it is allowed, we use it as the initial target and perform rate control at the macroblock level. If not, we choose the value closest to the target that is allowed by the VBV buffer.

3. Trellis-based macroblock (MB) rate control. With the frame level bit allocation obtained in Step 2, the Viterbi algorithm is performed with respect to the R-D trellis to find the best mode and the best QP for each macroblock.

4. Quality feedback. As a by-product of the proposed R-D based MB rate control, we obtain the average QP and the average distortion of the whole frame. These are good indicators of the frame quality. Following the idea of constant-Q rate control, we use QP as the quality indicator. We keep one sliding window to measure the average quality, with which we compare the quality of the current frame k. Specifically, it works as follows.

(a) Suppose the QPs of the previous W frames are $QP_{k-W}, QP_{k-W+1}, \ldots, QP_{k-1}$, their mean is $m_{QP}$ and their variance is $\sigma_{QP}$.

(b) Based on $m_{QP}$ and $\sigma_{QP}$, two thresholds are defined. For example, we may choose the lower bound $QP_{dn} = m_{QP} - 2\sigma_{QP}$ and the upper bound $QP_{up} = m_{QP} + 2\sigma_{QP}$. To avoid a too small difference between $QP_{dn}$ and $QP_{up}$, in the implementation we actually use $QP_{up} = m_{QP} + \max(2, 2\sigma_{QP})$ and $QP_{dn} = m_{QP} - \max(2, 2\sigma_{QP})$.

(c) If $QP_k \in [QP_{dn}, QP_{up}]$, the task is done and we can proceed to the next frame.
Otherwise, the current bit allocation is either more than sufficient (with QP_k < QP_dn) or below the minimal rate (with QP_k > QP_up). In these two scenarios, the bit allocation should be adjusted subject to the allowed VBV buffer as given in the next step.

(d) To get the new bit allocation, we adjust QP_k to fall in the range [QP_dn, QP_up] based on the current buffer status B_{k-1}. Hence, QP_k is a function of QP_dn, QP_up and B_{k-1}. To determine QP_k, we may solve the following minimization problem

min_{QP_k ∈ [QP_dn, QP_up]} ((B_des - B_{k-1}) + λ (QP_k - m_QP)),  (6.13)

where λ is the weighting factor between the buffer discrepancy and the quality smoothness. It is set to B_des/15 in our implementation.

(e) After getting QP_k, since all R-D data are available, we can easily calculate the R_k corresponding to QP_k. Then, we use R_k as the bit budget for the macroblock level rate control and determine the best QP and mode for each MB as described below.

6.2.3.2 R-D Optimized MB Level Rate Control

In this section, we focus on two choices in MB level rate control: (1) the selection of the quantization parameter and (2) the selection of the optimal coding mode. Since both are closely tied to the standard syntax, they must be handled separately for different standards. In the following, we take MPEG-2 and MPEG-4 as examples; the ideas presented can be readily applied to the H.263 and H.26L standards.

MB Rate Control in MPEG-4

As mentioned in the previous section, R-D data generation, which is computationally expensive, depends heavily on the dependency among MBs. Thus, we first identify the dependencies among MBs in the MPEG-4 standard as follows.
• QP can only vary within the range (-2, 2) relative to the QP of the previous MB, since this is constrained by the standard syntax.

• Only an INTER-1MV MB or an INTRA MB can have a QP different from that of the previous MB, while other MBs, such as INTER-4MV MBs or skipped MBs, must have zero Dquant (the QP difference between MBs) by the syntax.

• AC and DC coefficients in an INTRA MB are predictively encoded if their top or left neighboring MBs are intra MBs.

• A motion vector is predictively encoded based on the motion vectors of its left, top and top-right neighbors.

• If the OBMC technique is adopted, both the rate and the distortion of a MB depend on its left, top and top-right neighbors.

As a result of these constraints, the rate and distortion of the target MB i are determined by its own properties as well as the coding options (χ_{i-left}, χ_{i-top} and χ_{i-top-right}) of its three neighboring MBs (i.e. the left, top and top-right MBs). If the unconstrained Lagrangian approach is adopted, the resulting cost function is a function of several variables:

J_i(χ_i, λ | χ_{i-left}, χ_{i-top}, χ_{i-top-right}) = R_i(χ_i | χ_{i-left}, χ_{i-top}, χ_{i-top-right}) + λ · D_i(χ_i | χ_{i-left}, χ_{i-top}, χ_{i-top-right}),  (6.14)

where χ_i is the 2-D feature vector that consists of the quantization parameter QP and the coding mode M such that χ = QP × M. Without decomposing the rate and distortion terms, it is extremely difficult to find the exact solution to J_i due to the coupled dependency among MBs, even with the Viterbi algorithm (VA). To make this problem tractable, we simplify the dependency using the following assumptions.

• For the AC prediction in the I frame, we suppose that the left and top MBs have the same QP as the current MB.

• In the P and B frames, there is only a rare chance for the current MB and its left, top and top-right neighbors to be all intra-coded. Hence, the AC prediction can be disabled in the P and B frames.
• In the P and B frames, we introduce the concept of "causal optimality", by which we take the optimal mode determined by the current Lagrange parameter to handle the MV prediction that depends on the coding modes of the left, top and top-right MBs.

With the above assumptions, we restrict the dependency of the current MB i to its immediately preceding MB only, which can be handled efficiently with VA. Hence, (6.14) can be rewritten as

J_i(χ_i, λ | χ_{i-1}) = R_i(χ_i | χ_{i-1}) + λ D_i(χ_i | χ_{i-1}),  (6.15)

where χ = QP × M, QP ∈ [1, ..., 31] by definition, and the mode set M of the MB depends on the frame type. In MPEG-4, the possible coding modes of I, B and P frames are stated below.

• For the I frame, only the intra-coded MB is possible.

• For the P frame, a MB can be coded as skipped, intra-coded, inter-coded with 1 MV, inter-coded with 4 MVs, inter-coded with 2 MVs (field prediction), or the special case where the MV is zero.

• For the B frame, a MB can be encoded in the direct, backward-1MV, forward-1MV, interpolated-2MV, backward-2MV, forward-2MV, or interpolated-4MV mode.

Moreover, an inter-coded MB may consist of motion vectors and DCT data. The encoder has the flexibility to decide whether to encode the DCT coefficients even when there are non-zero coefficients. By skipping high-cost non-zero coefficients at a small increase in distortion, we may obtain a better R-D performance than by directly encoding these expensive coefficients. Hence, for the inter-MB mode in B and P frames, both options (with or without DCT coefficients) are examined. Moreover, the inclusion of these modes does not incur additional complexity in the R-D data generation.

MB Rate Control in MPEG-2

The existing dependencies among MBs in MPEG-2 can be summarized as follows. 1.
MPEG-2 has no constraints on the QP variation, so QP can vary from 1 to 31 from one MB to another.

2. In MPEG-2, the DC scaler is fixed to 8. Thus, the DC prediction in the I frame can be handled accurately, since all MBs in the I frame are intra-coded and independent of whatever QP the previous MB takes. However, for intra-coded MBs in the P and B frames, the causal optimality rule is still needed to determine the MB type.

3. The motion vector prediction is constrained within one slice. Hence, the MV of the target MB depends only on its left neighbor.

If we can decompose the rate term as given in (6.8), the complexity of the R-D data generation increases linearly instead of exponentially with the dependency depth. By restricting the dependency of the target MB i to its immediately preceding MB, we do obtain an expression of the form (6.8). Generally speaking, the rate of MB i includes the rates for the MV and the texture:

R_{i,tot}(χ_i | χ_{i-1}) = R_{i,MV}(χ_i | χ_{i-1}) + R_{i,TXT}(χ_i | χ_{i-1}),  (6.16)

where R_{i,MV}(χ_i | χ_{i-1}) relies on the difference between the motion vectors of MBs i and i-1 and is invariant to the MB prediction type and the quantization parameter. Hence, only one fixed lookup table is needed to generate the rate of the MV. The rate of the MB texture can be decomposed as

R_{i,TXT}(χ_i | χ_{i-1}) = R_{i,startcode} + R_{i,MBtype} + R_{i,CBPC} + R_{i,DC} + R_{i,AC}.  (6.17)

In the above, the terms R_{i,startcode} and R_{i,AC} are either constant or determined by the MB's own coding option, while the term R_{i,MBtype} depends on the QP difference of MBs i and i-1. Hence, they can be implemented with a lookup table. The term R_{i,CBPC}, which can be decomposed as in (6.8), depends on the MB's own coding mode as well as the prediction mode of MB i-1. The same applies to R_{i,DC}. These two terms can also be implemented with lookup tables.
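With the dependency restricted to the immediately preceding MB, the best QP sequence can be found by dynamic programming over a trellis whose states are QP values. The following is a minimal sketch (not the thesis implementation): `viterbi_qp` and the `rate(i, q, q_prev)` / `dist(i, q)` callables stand in for the precomputed R-D lookup tables, and coding modes are omitted for brevity (including them would simply enlarge the state set to QP × mode pairs). A syntax constraint such as MPEG-4's limit on the per-MB QP change can be imposed by having `rate` return infinity for disallowed transitions.

```python
def viterbi_qp(num_mbs, qps, rate, dist, lam):
    """Find the per-MB QP path minimizing sum of rate + lam * dist
    along a chain of MBs, where the rate of MB i depends on its own
    QP and on the QP of MB i-1 (first-order dependency)."""
    # cost[q] = best accumulated Lagrangian cost with MB 0..i ending at QP q
    cost = {q: rate(0, q, None) + lam * dist(0, q) for q in qps}
    back = []  # back[i-1][q] = best predecessor QP for MB i in state q
    for i in range(1, num_mbs):
        new_cost, ptr = {}, {}
        for q in qps:
            # best surviving path into state q (the Viterbi recursion)
            best_prev = min(qps, key=lambda p: cost[p] + rate(i, q, p))
            new_cost[q] = cost[best_prev] + rate(i, q, best_prev) + lam * dist(i, q)
            ptr[q] = best_prev
        cost = new_cost
        back.append(ptr)
    # backtrack from the cheapest terminal state
    q = min(cost, key=cost.get)
    path = [q]
    for ptr in reversed(back):
        q = ptr[q]
        path.append(q)
    return path[::-1]
```

With toy cost functions (rate falling in QP, distortion rising in QP, and a small bit penalty for changing QP between MBs), the recursion settles on the QP that balances the two terms for the chosen λ.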
To summarize, the rate of the MB texture can be decomposed into one term determined by the MB's own coding option χ_i and several lookup tables with predefined entries. Hence, the R-D data generation can be performed independently, while the MB dependency is addressed with the Viterbi coding procedure. This reduces the complexity of the R-D data generation significantly; that is, the complexity grows linearly instead of exponentially with the number of MBs. With χ = QP × M as illustrated above, we can represent MBs in I, B and P frames with three trellis diagrams, as shown in Fig. 6.1.

Figure 6.1: The three trellis diagrams for I, B and P frames, respectively.

6.2.3.3 Simplified MB-Level Rate Control

The complexity of the R-D optimized rate control comes from two parts: (1) the R-D data generation and (2) the Viterbi algorithm. To reduce the complexity, instead of generating the R-D data for every QP and every mode, we may generate the R-D data only for the most probable M modes and N QPs. On the other hand, we should preserve the coding efficiency of the fully optimized scheme as much as possible.

Estimation of Frame-level Bit Usage

If the rate control scheme can estimate the target QP accurately based on the given bit target, the rate control task becomes easy. However, most previous work that directly models the relationship between the rate and QP does not work well. As a result, to achieve more accurate R-D sample points, the cubic spline model was used to interpolate R-D data points based on real R-Q points from the QP set consisting of values 2, 3, 5, 8, 11, 19 and 30 [37]. This still demands a high computational complexity. It was observed by He et al.
[12] that there is a strong correlation between the rate and the percentage of non-zero coefficients. They claimed that the number of bits spent on a coding unit is proportional to the number of non-zero coefficients in that unit. In other words, the average number of bits spent per non-zero (NZ) coefficient is constant, i.e.

R = C × N_NZ.  (6.18)

Consequently, they suggested performing a histogram analysis of the non-zero coefficients at each QP. They also assumed that the ratio C is the same as in the previous frame. Hence, given the frame bit target, only one table lookup operation is needed to obtain the target QP. By examining this problem in detail, we make the following two observations.

• Since DCT coefficients are run-length coded, a smaller number of non-zero coefficients gives better chances of longer runs. For example, when QP changes from 1 to 31, the run value of each non-zero coefficient potentially becomes larger while the average level becomes smaller. These two factors have opposite effects on the final bit rate spent on the texture coding part. Therefore, one natural question is whether these two factors cancel each other. If they do, we should get the exact R-NZ relationship given by (6.18). Otherwise, the bit cost of each NZ coefficient should vary with QP.

• Equation (6.18) only estimates the bits used by the texture part (more specifically, the bits spent on coding the DCT coefficients). However, the I frame consists of bits for the syntax header and the texture, and the P and B frames also contain bits for motion vectors. To estimate the bit rate of the entire frame at different QPs, the syntax headers of the I, P and B frames and the motion vectors of the P and B frames should also be taken into account. Clearly, the number of syntax bits, such as the MB-coded flag, changes with QP.
Although the bits for motion vectors do not vary with QP, the MB type may change for a MB with the zero motion vector, since it becomes a skipped MB when all its coefficients become zero.

As a result of the above two observations, instead of modeling the relationship between the texture bits and the number of non-zero coefficients, we attempt in this section to model the relationship between the bits used for the syntax header plus the texture part and the number of NZ coefficients. Thus, the derived model can be used to estimate the bits required by the I frame directly, while the required bits for the P and B frames can be estimated by adding the pre-calculated motion vector bits for inter frames.

R-NZ Relationship for Intra-frames

By performing extensive simulations on different contents, we observe that the average cost per non-zero coefficient decreases linearly as QP increases from 1 to 31. This fact is shown in Figs. 6.2 and 6.3 for two test sequences. It is especially true for QP larger than 5. It is also observed that the average cost per non-zero coefficient does not differ significantly across video contents; it is almost always between 4 and 8. The average cost C per non-zero coefficient can be written as

C = a × QP + b,  (6.19)

where a and b are two constants and a is negative. Thus, unlike the work in [37], where the rate was obtained at the sampled QP values 2, 3, 5, 8, 11, 19 and 30, only two sampling points are needed to achieve an accurate R-Q relationship in the proposed R-NZ model.

Figure 6.2: The average number of bits per non-zero coefficient in the I frame for the "Football" sequence.
Figure 6.3: The average number of bits per non-zero coefficient in the I frame for the "Tempete" sequence.

R-NZ Relationship for Inter-frames

Next, we investigate the relationship between the bits used for coding the texture plus the syntax and the number of non-zero coefficients. Again, it is observed that the average cost C per non-zero coefficient follows the linear relation in (6.19) when the number of non-zero coefficients is reasonably large. Extensive experiments support that, if the number of non-zero coefficients is larger than twice the number of MBs in one frame, the linear relationship between C and QP is almost certain. If the number of non-zero coefficients is too small, the bits spent on the skipped-frame and MB header parts become dominant, and the linear relationship between C and QP becomes less obvious. Unlike the I-frame case, C increases linearly as QP increases from 1 to 31 in the inter-frame. Thus, parameter a in (6.19) is positive. We show the cost per non-zero coefficient for one inter-frame of the "Dropping Leaf" sequence from the VQEG source in Fig. ??, which is a typical case for most video contents.

Figure 6.4: The average number of bits per non-zero coefficient in the inter-frame of sequence "Tempete".

Hence, with an accurate R-NZ model for an I frame, we can estimate the average frame QP under a given bit budget. By using the default mode decision rules specified in the standard, we can calculate the average frame QP for the inter frame.
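The estimation just described, fitting the linear per-coefficient cost model of (6.19) through two pseudo-coded sampling points such as C(5) and C(10), then searching the non-zero-coefficient histogram for the QP whose predicted bits best match the budget, can be sketched as follows. The helper names `fit_cost_model` and `target_qp` and the toy numbers in the test are illustrative assumptions, not the thesis code.

```python
def fit_cost_model(qp1, c1, qp2, c2):
    """Fit C(QP) = a*QP + b through two measured sampling points,
    e.g. C(5) and C(10) obtained by pseudo coding one frame."""
    a = (c2 - c1) / (qp2 - qp1)  # negative for intra, positive for inter
    b = c1 - a * qp1
    return lambda qp: a * qp + b

def target_qp(hist, cost, b_tgt):
    """Pick the QP whose predicted frame bits C(QP) * hist[QP] are
    closest to the bit target; hist maps each QP to the number of
    non-zero coefficients that survive quantization at that QP."""
    return min(hist, key=lambda qp: abs(b_tgt - cost(qp) * hist[qp]))
```

Only two real encoding passes (the two sampling points) are needed here, versus the seven sampled QPs of the cubic-spline approach discussed earlier.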
Based on the estimated QP_avg, we can determine the QP range for both the R-D data generation and the subsequent Viterbi coding as

QP ∈ [⌊QP_avg + 0.5⌋ - 2, ⌊QP_avg + 0.5⌋ + 2].  (6.20)

Only 5 R-D data points are needed in the above range. If the complexity of the R-D data generation and Viterbi coding grows linearly with the QP range, this represents a complexity reduction factor of 31/5 × 31/5 ≈ 38.4 for the R-D data generation and Viterbi coding.

Selection of the K Most Probable Modes

We can further reduce the complexity of the Viterbi coding part by evaluating only the most probable K modes for each MB. Suppose that we evaluate only the 3 most probable modes for each MB; then the complexity of the Viterbi coding for a MB in the P frame is reduced by a factor of 7/3. If the Viterbi coding part accounts for half of the combined complexity of the R-D calculation and Viterbi coding, then we can achieve a total complexity reduction by a factor of

(31/5) × (31/5) × (1/2 + 1/2 × 7/3) ≈ 64.

To select the K most probable coding modes for the current MB, we calculate

J_{M_i} = R_{M_i} + λ_prev × D_{M_i},

and the modes M_i with the K smallest J_{M_i} are selected for that MB. Here, λ_prev is the Lagrangian parameter selected for the previous frame. Indeed, this can be further simplified so that only the modes with the K smallest distortions (in the SAD or MSE sense) are preserved and their rates and distortions calculated. Then, both the R-D data generation and the subsequent Viterbi coding are simplified by a factor of N/K.

Summary of the Algorithm

For the I frame:

1. After the DCT transform, calculate the histogram hist(QP) of NZ coefficients at each QP.

2.
Obtain C(5) and C(10) as two sampling points via pseudo coding and find the target QP based on the given bit target B_tgt:

QP_tgt = arg min_QP |B_tgt - C(QP) × hist(QP)|.  (6.21)

3. Perform the R-D data generation from QP_tgt - 2 to QP_tgt + 2.

4. Perform Viterbi coding to select the best QP for each MB under the given bit target B_tgt.

For the P and B frames:

1. After the DCT transform, calculate the histogram hist(QP) of NZ coefficients at each QP for the default mode (the mode selected by the standard mode decision rules).

2. Obtain C(5) and C(10) as two sampling points via pseudo coding and find the target QP based on the given bit target B_tgt:

QP_tgt = arg min_QP |B_tgt - C(QP) × hist(QP)|.  (6.22)

3. Perform the R-D data generation from QP_tgt - 2 to QP_tgt + 2 for every mode.

4. Based on λ_prev, prune the possible coding modes to K (e.g. 3) modes.

5. Perform Viterbi coding in the limited QP range for these K modes.

6.3 Experimental Results

In this section, we examine the performance of the R-D optimized MPEG-2 and MPEG-4 encoders against the MPEG-2 and MPEG-4 standard reference software. The reference MPEG-2 codec is obtained from the MPEG Software Simulation Group (http://www.mpeg.org/MPEG/MSSG/) and the MPEG-4 software is from [18]. The test sequences in use are sequences adopted for MPEG-2 and MPEG-4 standard verification, such as "Football", "Mobile and Calendar" and "Tempete", as well as some sequences down-sampled from HD video sequences. Both interlaced and progressive formats are used in the simulation.

6.3.1 R-D Optimized MPEG-2 Encoder

To make the comparison fair, we let our MPEG-2 codec adopt the frame level bit allocation scheme used in the reference MPEG-2 software. Thus, we can evaluate the performance improvement coming purely from the proposed macro-block level rate control scheme. In Fig.
6.5, we show the PSNR comparison of the proposed R-D optimized MPEG-2 encoder with the reference codec for the sequence "Football" at 4 Mbps.

Figure 6.5: The PSNR comparison of the TM5 reference encoder with the proposed R-D optimized encoder for the football sequence.

We see from the figure that our optimized MPEG-2 encoder is around 1.6 dB better than the reference MPEG-2 encoder. In Fig. 6.6, we show the PSNR comparison of our encoder with the reference encoder for the sequence "Mobile and Calendar" at 4 Mbps. Our R-D optimized macro-block level rate control scheme has a coding gain of about 1.9 dB.

Figure 6.6: The PSNR comparison of the TM5 reference encoder with the proposed R-D optimized encoder for the mobile and calendar sequence.

The simplified MPEG-2 encoder uses the information on the number of non-zero coefficients to limit the possible QP range to 5 values and then selects the 3 best modes for R-D data generation and Viterbi coding. The performance gap between the fully R-D optimized version and the simplified version is very small. The PSNR result of the simplified R-D optimized MPEG-2 encoder is compared with that of the fully optimized scheme for the football sequence at 4 Mbps in Fig. 6.7. The PSNR comparison for the mobile and calendar sequence is shown in Fig. 6.8. We see that the average PSNR degradation is about 0.15 dB and 0.04 dB for the mobile and calendar sequence and the football sequence, respectively. This indicates that the simplified R-D optimized MPEG-2 encoder can achieve results very close to those of the fully R-D optimized one.
6.3.2 R-D Optimized MPEG-4 Encoder

In this subsection, we will first demonstrate the performance of the proposed R-D optimized MB level rate control. Then, we will show the performance of the simplified R-D optimization scheme. Finally, we will demonstrate the superior performance obtained by integrating our two-pass frame level bit allocation described in Chapter 3 with the R-D optimized MB level rate control. As done for MPEG-2, we let our MPEG-4 encoder follow the frame level bit allocation of the reference MPEG-4 encoder. Since MPEG-4 is mostly applied in low-delay environments where B frames are rarely used due to the reordering delay, only the IPPP GOP structure is evaluated here.

Figure 6.7: The PSNR comparison of the simplified R-D optimized encoder with the R-D optimized encoder for the football sequence.

Figure 6.8: The PSNR comparison of the simplified R-D optimized encoder with the R-D optimized encoder for the mobile and calendar sequence.

Both the TM5 and VM7 rate control schemes are compared with the proposed R-D scheme. When the comparison is performed, all parameters other than the rate control scheme (including the adjustment of the MB coding mode and QP), such as the search range, are kept the same. In Fig. 6.9, the PSNR comparison of reference TM5 and the proposed R-D optimized encoder is presented for the 15 fps Foreman CIF sequence at 192 kbps. It is found that a gain of more than 1 dB is achieved with the proposed R-D optimized MB level rate control. In Fig.
6.10, the PSNR comparison of reference TM5 and the proposed R-D optimized encoder is presented for the 15 fps Tempete CIF sequence at 256 kbps. Similarly, the R-D optimized encoder outperforms reference TM5 by around 0.8 dB in PSNR.

Figure 6.9: The PSNR performance comparison of the proposed R-D optimized encoder versus the TM5 reference codec for the 15 fps Foreman sequence at 192 kbps, where PSNR_TM5-avg = 31.15 dB and PSNR_RD-avg = 32.22 dB.

In Figs. 6.11 and 6.12, we present the comparison of the simplified R-D optimization versus the full R-D optimization for the 15 fps Foreman CIF sequence at 192 kbps.

Figure 6.10: The PSNR performance comparison of the proposed R-D optimized encoder versus the TM5 reference codec for the 15 fps Tempete sequence at 256 kbps, where PSNR_TM5-avg = 27.3 dB and PSNR_RD-avg = 28.1 dB.

That is, we use the information on the number of non-zero coefficients to limit the number of possible QPs to 5 and then select the 3 best modes for R-D data generation and Viterbi coding. As expected, the simplified scheme performs very closely to the R-D optimal one in both cases (less than 0.05 dB difference).

Figure 6.11: The PSNR comparison of the simplified R-D rate control method with the R-D optimized rate control method for the 15 fps Foreman sequence at 192 kbps, where PSNR_smp-rd-avg = 32.21 dB and PSNR_rd-avg = 32.22 dB.
Figure 6.12: The PSNR comparison of the simplified R-D rate control scheme with the R-D optimized rate control method for the 15 fps Tempete sequence at 256 kbps, where PSNR_smp-rd-avg = 28.1 dB and PSNR_rd-avg = 28.1 dB.

Last, we present the results obtained by integrating our two-pass frame level bit allocation described in Chapter 3 with the proposed R-D optimized MB level rate control. In Figs. 6.13 and 6.14, we show the comparison of the 2-pass R-D optimized encoder versus the TM5 reference encoder for the 15 fps Foreman CIF sequence at 192 kbps. Although the average PSNR of the sequence does not improve significantly compared with TM5-type frame level bit allocation, the two-pass frame level bit allocation does provide much smoother video quality.

Figure 6.13: The PSNR comparison of the 2-pass R-D optimized encoder with the TM5 reference encoder for the 15 fps Foreman sequence at 192 kbps, where PSNR_2p-rd-avg = 32.3 dB and PSNR_TM5-avg = 31.15 dB.

6.3.3 Conclusion

We presented the formulation of the R-D optimized rate control for several encoding options at different encoding stages. Then, taking MPEG-2 and MPEG-4 as examples, we showed how to incorporate them in standard-compliant implementations with several fast heuristics. Finally, a comparison of the proposed new schemes with the test models of those standards was given to demonstrate the superior performance of the proposed algorithm.
Figure 6.14: The PSNR comparison of the 2-pass R-D optimized encoder with the TM5 reference encoder for the 15 fps Tempete sequence at 256 kbps, where PSNR_2p-rd-avg = 28.12 dB and PSNR_TM5-avg = 27.3 dB.

Chapter 7

Predictive Fast Integer and Half-Pel Motion Search for Interlaced Video

7.1 Introduction

A fast algorithm to perform integer- and half-pel motion search for interlaced video is proposed in this work. By exploiting the correlation between the frame- and field-type integer-pel searches, the proposed scheme can skip the sub-optimal integer-pel search type. Furthermore, based on the matching cost obtained from the integer-pel search, the scheme determines whether to perform the half-pel search. If the half-pel search is deemed necessary, the scheme is able to skip sub-optimal half-pel search points to further reduce the computational cost.

7.2 Motivation

Motion estimation is the most computationally intensive module in a typical video encoder. To reduce the computational complexity, several fast algorithms, such as 2-D logarithmic search [24], three-step search [34], hierarchical search [3] and diamond search [59], have been developed to avoid examining all points in a search window as done by the full search scheme. Recently, two fast motion estimation algorithms, i.e. MVFAST (the Motion Vector Field Adaptive Search Technique) [79, 19] and PMVFAST (the Predictive Motion Vector Field Adaptive Search Technique) [61, 19], were adopted in MPEG-4 Part 7 as the optimization model.
The basic ideas of these two algorithms include the following: (1) initial MV predictors are selected from spatially and temporally adjacent blocks to perform the diamond search (DS); (2) both small and large diamond patterns are utilized and adaptively selected based on the local motion activity; and (3) threshold parameters are adaptively calculated to assist in the early termination of the search. These two algorithms represent a significant improvement over the diamond search (DS) algorithm [59] in terms of both visual quality and speed-up. They can be readily applied to integer-pel motion search in progressive video contents. Furthermore, interlaced video such as DVB (Digital Video Broadcast) can take advantage of two types of motion estimation (ME) modes, i.e. frame and field ME, to improve the efficiency of motion prediction. However, a baseline implementation of these two algorithms for interlaced contents is three times more computationally expensive than the one used for progressive contents. Therefore, an enhanced search algorithm is highly desirable for interlaced video coding.

In this work, we present a fast motion estimation scheme to perform integer- and half-pel motion search for interlaced video. The proposed integer-pel motion search is based on the MVFAST scheme. By exploiting the correlation between the frame- and field-type searches, we can adaptively skip the sub-optimal search type. Furthermore, based on the matching cost obtained from the integer-pel search, our scheme adaptively determines whether to perform the half-pel search. If the half-pel search is deemed necessary, our scheme adaptively skips the sub-optimal search points to further reduce the computational complexity. The overall diagram is illustrated in Figure 7.1.

The rest of this chapter is organized as follows.
A review of the MVFAST scheme and the technique for fast integer-pel search of interlaced video are presented in Section 2. The fast half-pel search is presented in Section 3. Simulation results are shown in Section 4. Concluding remarks are given in Section 5.

7.3 Predictive Fast Integer-pel Search

7.3.1 Overview of the MVFAST Algorithm

The accuracy of the initial predicted MV (PMV) has a significant impact on the performance of fast ME algorithms. In MVFAST, both the PMV and the search pattern are determined by the local motion activity of the macroblock concerned.

Figure 7.1: Overall diagram of the proposed predictive fast integer-pel and half-pel search scheme.

The local motion activity of a macroblock (MB) is determined by the MVs of its spatially adjacent MBs. Typically, the MBs to the left, above and above-right of the current MB are selected. Let the set V = {V_0, V_1, V_2, V_3} contain the (0,0) default vector and the MVs of the three adjacent MBs. Let us also adopt the 1-norm L_{V_i} = |x_i| + |y_i| for the vector V_i = (x_i, y_i) as the distance measure, and let L = max(L_{V_i}) over all V_i in V. The motion activity of the current MB is defined as

Motion Activity = Low, if L ≤ L_1; Medium, if L_1 < L ≤ L_2; High, if L > L_2,  (7.1)

where L_1 and L_2 are thresholds. Typically, L_1 = 1 and L_2 = 2.

Figure 7.2: (a) Large diamond search pattern and (b) small diamond search pattern.

Based on the obtained motion activity, MVFAST adaptively selects the PMV as well as the search pattern. If the motion activity is low, the small diamond search pattern (Fig. 7.2b) is used with (0,0) as the PMV.
When the motion activity is medium, the PMV is still (0,0) but the large diamond search pattern (Fig. 7.2(a)) is used instead. Otherwise, if the motion activity is deemed high, both the (0,0) MV and the three spatial predictors are examined, and the best predictor is taken as the PMV, followed by the small diamond search pattern. An optional phase called early elimination of search can also be incorporated such that, if the SAD value obtained at (0,0) is less than a threshold T, the MV of the current MB is assigned as (0,0) without checking any other locations.

7.3.2 Predictive Fast Integer-pel Search for Interlaced Video

Motion estimation (ME) for interlaced video can be categorized into two types: frame and field ME. Field ME considers the two fields separately; in contrast, frame ME combines the two fields into one frame and performs ME accordingly. The following analysis will first treat the two fields as a frame. If an MB X of size N by N located at (x, y) in current frame i is compared with an MB Y at a displacement of (dx, dy) relative to X in previous frame i-1, the SAD of frame ME can be written as

SAD(dx, dy) = Σ_{m,n=0}^{N-1} |f_i(x+m, y+n) - f_{i-1}(x+m+dx, y+n+dy)|   (7.2)

Frame MB X in frame i can be separated into two field MBs, represented as f_i(x + 2m', y + 2n') (top field) and f_i(x + 2m' + 1, y + 2n' + 1) (bottom field), respectively, where 0 <= m', n' <= (N-1)/2. Subsequently, the SADs of field ME can be written as

SAD_top(d'x, d'y) = Σ_{m',n'=0}^{(N-1)/2} |f_i(x+2m', y+2n') - f_{i-1}(x+2m'+d'x, y+2n'+d'y)|   (7.3)

and

SAD_bot(d'x, d'y) = Σ_{m',n'=0}^{(N-1)/2} |f_i(x+2m'+1, y+2n'+1) - f_{i-1}(x+2m'+1+d'x, y+2n'+1+d'y)|   (7.4)

Depending on the parity of x + 2m' + d'x and x + 2m'
+ 1 + d'x, this leads to four possible ways of matching the top and bottom fields of current MB X with the top and bottom fields of a candidate MB of the previous frame. That is: (a) the top field block predicted from the top field of the reference frame; (b) the top field block predicted from the bottom field of the reference frame; (c) the bottom field block predicted from the top field of the reference frame; (d) the bottom field block predicted from the bottom field of the reference frame.

For each of these four types of field prediction, MVFAST-based integer-pel motion estimation is applied to obtain the best matching integer pixel position. Then half-pel motion search is performed on its eight surrounding half-pel positions, and the best match is found for each type. Thus, four minimal SADs are found for the field predictions. The sum of the minimal SADs of the top field block and the bottom field block is selected as the minimal SAD of "field"-type MC. Finally, it is compared with the frame-type minimal SAD and the final MC type is decided.

The above baseline implementation of integer-pel search for interlaced video is approximately twice as complex as integer-pel search for progressive content, while the half-pel search for interlaced video is exactly four times as complex as that for progressive content. Therefore, the overall complexity of interlaced ME is approximately three times that of progressive ME.

The above baseline process completely isolates field MC from frame MC. However, it is easy to see that, for a certain displacement (dx, dy), Eq. (7.2) can be calculated directly from Eqs. (7.3) and (7.4). Therefore, in the full-search algorithm, one can calculate the SAD values for the four types of field MC during frame MC as well, since every possible position will be examined.
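This row-parity decomposition can be checked with a short Python sketch (illustrative code, not from the thesis; all function and variable names are invented):

```python
# Illustrative sketch: the N x N frame SAD at displacement (dx, dy) equals
# the sum of the SADs computed over the even (top-field) rows and the odd
# (bottom-field) rows, so a frame-MC evaluation yields field SADs for free.

def sad_frame(cur, ref, dx, dy):
    """Frame SAD of block `cur` against `ref` shifted by (dx, dy)."""
    n = len(cur)
    return sum(abs(cur[r][c] - ref[r + dy][c + dx])
               for r in range(n) for c in range(n))

def sad_field(cur, ref, dx, dy, parity):
    """SAD restricted to one field: parity 0 = top (even) rows, 1 = bottom."""
    n = len(cur)
    return sum(abs(cur[r][c] - ref[r + dy][c + dx])
               for r in range(parity, n, 2) for c in range(n))

if __name__ == "__main__":
    import random
    random.seed(1)
    N = 8
    cur = [[random.randrange(256) for _ in range(N)] for _ in range(N)]
    ref = [[random.randrange(256) for _ in range(2 * N)] for _ in range(2 * N)]
    dx, dy = 3, 2   # dy even: each field matches a same-parity reference field
    assert sad_frame(cur, ref, dx, dy) == (
        sad_field(cur, ref, dx, dy, 0) + sad_field(cur, ref, dx, dy, 1))
```

With dy odd the same arithmetic split still holds, but the even rows of the current MB then land on odd reference rows, i.e., the cross-parity field predictions (b) and (c) above.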
Hence, for the full-search scheme, integer-pel search for interlaced video is essentially the same as integer-pel search for progressive content. However, when a fast search algorithm such as MVFAST is employed, not every point in the search range is examined. The search path is determined by the search pattern, such as the SDSP or LDSP, and the search paths of frame MC and field MC no longer overlap. Therefore, the minimal SAD of field MC obtained from frame MC may not be the same as the SAD obtained via independent field MC. Indeed, from our simulation, we found that the search path of field MC often deviates from the search path of frame MC. Consequently, direct reuse of the SAD from frame MC causes significant quality degradation.

On the other hand, the minimal SAD values of field MC obtained from frame MC are valuable information in that they provide good, if not optimal, initial PMVs that can greatly accelerate the search. More importantly, the obtained SAD values also indicate which of the two reference fields of the previous frame is the better reference field. Instead of performing ME on both reference fields, ME for each of the top and bottom fields will adaptively select one reference field and ignore the other. The overall algorithm can be summarized as follows:

1. Perform frame-type integer-pel search with the MVFAST scheme. The minimal SADs, i.e., SAD_frm, SAD_top-top, SAD_top-bottom, SAD_bottom-top and SAD_bottom-bottom, and their corresponding MVs for both frame MC and field MC are stored.

2. By definition,

SAD_frm >= min(SAD_top-top, SAD_top-bottom) + min(SAD_bottom-top, SAD_bottom-bottom)   (7.5)

If the two sides of Eq.
7.5 are equal, frame MC is good enough for the current MB and field MC will not be performed. Otherwise, go to step 3.

3. Based on SAD_top-top and SAD_top-bottom, the reference field for field MC is selected for the top field of the current MB. Similarly, the reference field for the bottom field is selected accordingly.

4. Perform field MC for the top and bottom fields only on the selected reference fields using MVFAST. The initial PMV set includes the (0,0) MV, the three spatially predicted MVs, plus the MV corresponding to the minimal SAD of field MC obtained from frame MC.

With the above algorithm, it is easy to see that we cut the number of integer-pel and half-pel search points for field MC in half. We present simulation results in Section 7.5 to show the reduction of search points as well as the PSNR results of this scheme.

7.4 Predictive Fast Half-pel Search

Based on the SAD obtained from the integer-pel search, our scheme adaptively determines whether to perform the half-pel search. If the half-pel search is deemed necessary, our scheme adaptively skips the sub-optimal search points to further reduce the computational complexity.

7.4.1 Pre-elimination of Half-pel ME

Since the pixel value at a half-pel position is bi-linearly interpolated from its integer-pel neighbors, there is a clear correlation between the matching SADs of half- and integer-pel search [51][17]. Typically, the SAD at some half-pel position will be lower than the SAD at the integer-pel position, which results in the compression gain of half-pel ME. However, the minimal SAD at half-pel accuracy is not significantly different from its integer-pel counterpart. In Table 7.1, the histogram of the ratio between the integer-pel SAD and the minimal half-pel SAD is presented.
Where 7 = SA D intjm in{SAD haif), the test sequences are MPEG-2 CCIR test sequences such as Mobile and Calendar, Football, Carphone and Tempete. Table 7.1: Histogram of ratio between integer-pel SAD and minimal hal: src Mobile Football Carphone Tempete Average 7 > 1-5 4.2 1 3.1 3.6 3 7 > 1-4 8.4 2 . 1 5.7 7 5.8 7 > 1-3 15.9 4.4 1 1 1 2 . 6 1 1 7 > 1-2 27.8 1 0 2 2 . 6 2 1 . 6 20.5 -pel SAD 175 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Prom the above table, we can see it is very rare th at the m in (S A D haif)< S A D int* 2/3, therefore, when integer-pel ME is finished, we can approximately have the lower SAD bound for half-pel SAD as S A D int * 2/3. Actually, this lower bound is very conservative. In our simulation, we even use lower threshold like SA D int * 5/6 which does not hurt the performance either. W ith this lower SAD bound, we can pre-eliminate the unnecessary half-pel ME without performance degradation. Our algorithm can be summarized as: • After integer-pel search, calculate the SAD for integer-pel frame MC and integer-pel field MC. • If S A D int-.fieid > SA D int - f rame, half-pel ME for field ME will not be per formed. • If S A D int- f rame > S A D int-field * 6/5, half-pel ME for frame type ME will not be performed. The reason we use lower bound for field MC is th at since field MC will use two motion vectors while frame MC only use 1 motion vector, hence it is better to bias to frame MC a little bit. The simulation results is presented in section 7.5 to demonstrate the reduction of search points as well as the PSNR results with this scheme. 176 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 7.4.2 Predictive half-pel m otion estim ation Even though we decide to perform the half pixel ME for one specific integer position, it is not necessary to perform motion search for all those 8 points. 
As shown in Figure 7.3, the integer-pel ME stops only when the small diamond search is used and the center pixel (i.e., E in Figure 7.3) has the minimal SAD. Hence, the SADs corresponding to the four surrounding pixels (i.e., A, B, C and D) are already known. Based on these four integer-pel SADs, we can form a hypothesis about which half-pel positions (i.e., the points marked X in the figure) most probably contain the minimal SAD. Then we only test those most probable positions.

Through simulation, it is observed that, if the SAD values obtained at search points A, B, C and D are all very close to that obtained at center point E, the optimal position is difficult to predict. In this case, we choose the 5 most probable positions to examine. Currently, we take the maximal SAD value among A, B, C and D and discard the half-pel positions neighboring the point with the maximal SAD; for example, if D is the maximal-SAD point, half-pel points (3, 5, 8) will not be examined. However, if the SAD values at A, B, C and D are not very similar, for example, if C and D have the two largest SADs among these four points and min(SAD_C, SAD_D) > SAD_E * 5/4, then we can be fairly sure that the optimal half-pel point will be close to A and B; that is, we only need to check points (1, 2 and 4). Our half-pel ME is thus performed as follows:

Figure 7.3: The search points in half-pel ME.

• Find the maximal and the second maximal SAD values among A, B, C and D.

• Compare the second maximal SAD with the SAD at position E (the minimal one).

• If SAD_secmax > SAD_E * 5/4, we perform fewer-point ME (2-3 points, namely the joint neighbors of the two minimal-SAD positions). Otherwise, more-point ME is performed, i.e., the 5 points that exclude the neighbors of the maximal-SAD position are examined.

With the above simplification, we can further reduce the number of half-pel ME points for both frame and field MC.
7.5 Simulation Results

Extensive simulation results are presented to demonstrate the performance of the proposed fast motion search algorithm. The proposed scheme is implemented in an MPEG-2 encoder. Four interlaced CCIR test sequences are used. They range from relatively slow motion (Carphone) to fast motion (Football) and complicated global motion (Tempete and Mobile). All of them are of size 720 x 480 and length 260 frames.

7.5.1 Predictive Half-pel Search

The pre-elimination of half-pel ME (PEHP) and the predictive half-pel ME (PHP) tools described in Section 7.4 can be enabled independently. Therefore, we have the following three scenarios:

• Baseline: without PEHP and PHP

• PEHP: PEHP is employed in association with the baseline

• PEHP+PHP: both PEHP and PHP are employed in association with the baseline

In Table 7.2, we compare the proposed schemes with the baseline half-pel search scheme, where %SAD_hf is defined as the percentage of half-pel points examined by PEHP (or PEHP+PHP) relative to the baseline scheme. That is,

%SAD_hf = N_half,PEHP/PEHP+PHP / N_half,baseline * 100%   (7.6)

where N_half,PEHP/PEHP+PHP represents the number of half-pel search points of the proposed PEHP (or PEHP+PHP) scheme while N_half,baseline represents the number of half-pel search points of the baseline scheme. ΔPSNR represents the PSNR degradation relative to the baseline half-pel search in units of dB. We see that PEHP saves about 50% of the half-pel search points with virtually no quality degradation, while PEHP+PHP saves 70-80% of the half-pel search points with up to 0.2 dB quality degradation. Since PEHP saves around one half of the half-pel search points without quality degradation, it should always be enabled.
In situations where a 0.2 dB quality degradation is tolerable and the computational complexity is a major concern, PHP can be turned on as well.

Table 7.2: Computation-distortion performance of the proposed half-pel search scheme

                 PEHP                   PEHP+PHP
            %SAD_hf    ΔPSNR       %SAD_hf    ΔPSNR
Mobile      45%        0.026       21%        -0.015
Football    60.5%      -0.014      34.7%      -0.21
Carphone    61.7%      -0.035      30.2%      -0.12
Tempete     52%        -0.047      24.8%      -0.2

7.5.2 Predictive Fast Integer-pel (PFIP) Search

Next, we illustrate the advantage of the PFIP technique presented in Section 7.3 by comparing the numbers of search points and the PSNR values of the baseline scheme and the proposed PFIP technique in Table 7.3. Note that N_avg-BL and N_avg-PF represent the average numbers of integer-pel search points per MB with the baseline and the PFIP search scheme, respectively. We see that the proposed PFIP saves about 60% of the integer-pel search cost with around 0.1 dB quality degradation.

Table 7.3: Computation-distortion performance of the proposed integer-pel search scheme

Src         PSNR_BL    N_avg-BL    PSNR_PF    N_avg-PF
Mobile      28.28      86          28.21      32
Football    33.56      128         33.43      59
Carphone    41.18      59          41.05      25
Tempete     32.42      84          32.37      34

In Table 7.4, we show the overall percentage of search points, including both integer- and half-pel motion search points, of the PFIP+PEHP and PFIP+PEHP+PHP techniques compared with the baseline scheme. ΔPSNR represents the PSNR degradation relative to the baseline scheme in units of dB, and %SAD represents the ratio of the number of search points (both integer-pel and half-pel) of the proposed PFIP+PEHP (or PFIP+PEHP+PHP) scheme to that of the baseline. It is defined mathematically as

%SAD = N_total,PFIP+PEHP/PFIP+PEHP+PHP / N_total,baseline * 100%   (7.7)

where N_total,PFIP+PEHP/PFIP+PEHP+PHP represents the total number of search points using the PFIP+PEHP (or PFIP+PEHP+PHP) technique, while N_total,baseline represents the total number of search points of the baseline scheme. We see that the proposed scheme saves about 60-70% of the computational cost at the cost of a 0.1-0.3 dB loss.

Table 7.4: Computation-distortion performance of the combined integer-pel and half-pel search scheme

            PFIP+PEHP              PFIP+PEHP+PHP
            %SAD      ΔPSNR        %SAD      ΔPSNR
Mobile      34.8%     -0.08        29.5%     -0.13
Football    45.4%     -0.15        40.7%     -0.28
Carphone    41.8%     -0.17        35.7%     -0.24
Tempete     40.4%     -0.1         34.5%     -0.3

7.6 Conclusion

Three fast search techniques were proposed in this work for integer-pel and half-pel motion search for interlaced video. Simulations were performed to demonstrate that the proposed scheme provides significantly lower complexity than the baseline scheme recommended by today's standards. Given its superior performance, i.e., significant complexity saving with virtually no or little quality degradation, it provides a promising technique to optimize current MPEG standard implementations such as MPEG-2, MPEG-4 and H.26L encoders.

Chapter 8

Conclusion and Future Work

8.1 Conclusion

In the first part of this thesis (i.e., Chapters 3-5), we proposed several novel video streaming solutions over the best-effort Internet and the DiffServ network.
Video preprocessing, error-resilient scalable source coding, constant quality rate adaptation, prioritized packetization, and the DiffServ QoS network are seamlessly integrated into one system to provide QoS to the end user. In the second part of this thesis (i.e., Chapters 6 and 7), the R-D optimization scheme was adopted to achieve coding quality improvement for both progressive and interlaced video content. Several fast schemes were also developed to reduce the complexity of the R-D tradeoff analysis and of the search for the optimal solution. To reduce the complexity of interlaced motion estimation, a fast scheme was presented to perform integer- and half-pel motion search for interlaced video.

With previous rate control schemes such as TM5, TMN5/8/9/10 and the MPEG4-VM, significant quality degradation is present in areas with large object motion or scene changes, which is very annoying to human perception. Constant quality video rate control (CQVRC) was proposed for non-scalable MPEG-4 video that can meet both the CBR bandwidth and receiver buffer constraints. The constant-bit-rate requirement on rate control is significantly weakened by the relatively large buffer allowed for streaming applications. Thus, a VBR rate control is allowed in which only the average of the instantaneous bit rate has to match the available bandwidth without violating the buffer constraints. The proposed CQVRC exploits the large allowed decoder buffer, future frame information and temporal scene segmentation to achieve much smoother overall video quality than that obtained by previous rate control schemes such as TM5 and the MPEG4 VM. The impact of the buffer size on the final visual quality was also demonstrated.

Rate control for scalable video was also performed via bitstream truncation. An optimal truncation strategy was proposed for the MPEG-4 FGS codec.
Constant quality rate adaptation was realized by embedding the minimal R-D (rate-distortion) information and relying on a piecewise-linear R-D model for the enhancement layer (EL). The R-D information (e.g., R-D sample points generated during the encoding process) is embedded at each bitplane of the MPEG-4 FGS enhancement layer. By linearly interpolating the embedded R-D information, adaptive bit allocation can be performed to achieve constant quality rate adaptation.

Besides, both the spatial quality and the temporal frame rate can be jointly manipulated for efficient rate control. By considering the spatial-temporal quality tradeoff, a sophisticated content-aware rate control (CARC) scheme was formulated. The scalable video streaming architecture incorporates video preprocessing, scalable source encoding, and content-aware rate control. By combining the encoding pre-processing (i.e., adaptive frame grouping and associated measures of the spatial/temporal quality contribution per frame) with the temporal-adaptive scalable video codec followed by content-aware rate control, the proposed streaming solution tackles the problem of keeping the video quality smooth even when there is a significant change in the video source or connection bandwidth.

Delivering MPEG-4 FGS bitstreams over the DiffServ network was also investigated. The bitstream is first segmented into fixed-size packets; then the packet priority is calculated based on its loss impact on the end-to-end visual quality. Prioritized packets are subject to differentiated dropping and forwarding. The impact of the source coding parameters, the packetization scheme and the differentiated treatment of the network on the final visual quality was carefully studied.
It is shown that, although the prioritized stream benefits from the prioritized network, its gain depends heavily on how well the video source priority and the network priority match each other.

The R-D optimization technique was adopted to encode progressive and interlaced video content. Both the quantization parameter and the coding mode of each MB were optimized in all types of pictures (including I, B and P frames). At the frame level, a better R-Q model was developed based on the number of non-zero coefficients. The resulting method provided a better frame-level bit allocation scheme. Furthermore, a quality feedback scheme was developed to generate a VBV-compliant bitstream with assured video quality. To reduce the complexity of the R-D optimization, several fast heuristics were developed to achieve results that are close to the optimal one with a much smaller computational complexity.

Finally, a fast motion estimation scheme was proposed to conduct integer- and half-pel motion search for interlaced video. The proposed integer-pel motion search was based on the MVFAST scheme. By exploiting the correlation between frame- and field-type motion search, we skipped some search types for speed-up. Furthermore, based on the matching error at an integer-pel location, the proposed scheme adaptively determines whether to perform the half-pel search at this integer-pel location. If the half-pel search is needed, our scheme adaptively skips some sub-optimal search locations to reduce the computational complexity further.

8.2 Future Work

8.2.1 Efficient Scalable Video Coding

Although the scalable codec gains popularity due to its flexibility for heterogeneous connections, its coding efficiency degrades. In this work, we examined
a differential JPEG2000 (DJ2K) based video codec to achieve the temporal, SNR and spatial scalabilities simultaneously. However, the coding gain of DJ2K is more than 2 dB lower than that of the state-of-the-art non-scalable codec. This is probably due to the following two reasons. First, wavelet-domain motion estimation (ME) is less efficient than pixel-domain ME. Second, the wavelet codec applied to residual signals is not as efficient as that applied to natural images. More efficient motion estimation and advanced residue coding schemes are of great value.

8.2.2 Low Complexity Algorithms

A two-pass encoding process with constant quality video rate control (CQVRC) was proposed in this research. To reduce the encoding complexity, it is desirable to come up with an efficient one-pass algorithm. Since the frame complexity information can be roughly calculated in the pixel domain, a simpler CQVRC scheme is possible without encoding the sequence with a fixed QP in the first pass. For on-line video broadcasting, only a small lookahead window into future frames is possible. How to design an efficient rate control scheme that reduces possible quality variation is worthy of further investigation.

Another interesting extension of the proposed MB-based R-D optimization framework is to incorporate the R-D frame-level bit allocation scheme into our current framework. The frame dependency introduced by MCP makes the complexity grow exponentially. How to efficiently decouple the frame dependency at a low complexity is an open problem. Moreover, instead of generating the real R-D data for each frame, a better R-D model based on the number of non-zero coefficients is worth further investigation.
8.2.3 Effective QoS Mapping and a Refined DiffServ Model

A couple of issues should be elaborated regarding effective QoS mapping and a refined DiffServ network model. Only heuristic mapping of BL and EL packets to DS levels was studied in this work. In fact, both the packet priority distribution and the pricing mechanism associated with each DS level should play a role in optimizing the QoS mapping. How to optimally perform QoS mapping for both BL and EL packets is still an open question. Furthermore, when multiple bitstreams are multiplexed in the DiffServ router, how should the QoS mapping of packets from different streams be performed to exploit the maximal gain? Some absolute priority concept should be defined and compared among the different streams. Finally, the DiffServ model used in our current simulation can be refined to cover rate adaptation, packet filtering, and differentiated forwarding in a more realistic fashion.

8.2.4 Joint Rate Adaptation and Error Control for Scalable Video Streaming over DiffServ Networks

The server-oriented rate adaptation approach was presented in this work. Rate adaptation can also be performed at the packet level by prioritized packet dropping at DiffServ edge routers, or by explicit traffic policing at intermediate video gateways/filters (e.g., an active-network extension of DiffServ routers or other special policing devices) with packet filtering. However, efficient packet filtering techniques are still unknown. A more realistic network model should be investigated for this purpose. Unequal error protection (UEP) can be realized at the transport level by differentiated forwarding. Ideally, this transport-level UEP should be coordinated with the application-level UEP (AUEP) based on FEC and ARQ. Therefore, rate adaptation and error control with AUEP and TUEP can be jointly performed based on the packet priority.
How to integrate those components into one working system will be investigated.

8.2.5 Fast Motion Search for Advanced Motion Compensated Prediction

Blocks of a variable size can better capture the motion vector field to improve the coding performance. However, when blocks of multiple sizes exist, motion estimation becomes even more complex. In principle, motion search for integer- and fractional-pel positions should be performed with respect to each block size. For example, in the current H.26L standard (also called MPEG-4 Part 10), the block size can be 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 or 4x4, and ME/MC is performed for these blocks separately [25]. Compared with ME using a fixed-size block, the complexity increases dramatically in the H.26L codec. An algorithm that exploits the correlation between the search results with respect to different block sizes should save a significant amount of computational cost.

Modern coding standards attempt to utilize more accurate motion vectors to improve the coding efficiency. For example, motion search of sub-pel accuracy is utilized in the current H.26L standard. Although providing a better coding gain, it demands a significant amount of computational overhead. How to reduce the complexity while preserving the high coding gain due to a more accurate motion vector is still an open problem.

Reference List

[1] Y. Bao and C.-C. Jay Kuo, "Low complexity binary description codec", in IEEE Proc. MMSP 99, Sept. 2001.

[2] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss, "An architecture for differentiated services," RFC 2475, IETF, Dec. 1998.

[3] J. Chalidabhongse and C.-C. Jay Kuo, "Fast motion vector estimation using multiresolution-spatio-temporal correlations", in IEEE Trans.
Circuits and Systems for Video Technology, vol. 7, no. 3, pp. 477-488, 1997.

[4] T. Chiang and Y.-Q. Zhang, "A new rate control scheme using quadratic rate distortion model", in IEEE Trans. Circuits and Systems for Video Technology, vol. 7, no. 1, pp. 246-250, Feb. 1997.

[5] P. A. Chou, A. E. Mohr, A. Wang, and S. Mehrotra, "Error control for receiver-driven layered multicast of audio and video," IEEE Trans. Multimedia, vol. 3, no. 1, pp. 108-122, March 2001.

[6] J. Ribas-Corbera and S. Lei, "Rate control in DCT video coding for low-delay communications", in IEEE Trans. Circuits and Systems for Video Technology, vol. 9, no. 1, pp. 172-185, June 1997.

[7] C. Dovrolis, D. Stiliadis, and P. Ramanathan, "Proportional differentiated services: Delay differentiation and packet scheduling," in Proc. SIGCOMM, Boston, MA, Sept. 1999.

[8] C. Dovrolis and P. Ramanathan, "Proportional differentiated services - Part II: Loss rate differentiation and packet dropping," in Proc. International Workshop on Quality of Service, Pittsburgh, PA, June 2000.

[9] D. Farin, M. Ksemann, P. H. N. de With, W. Effelsberg, "Rate-Distortion Optimal Adaptive Quantization and Coefficient Thresholding for MPEG Coding", in Proc. of the 23rd Symposium on Information Theory, Belgium, 2002.

[10] M. Gallant and F. Kossentini, "Rate-distortion optimized layered coding with unequal error protection for robust internet video", in IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 357-372, Mar. 2001.

[11] R. Gopalakrishnan and J. Griffioen, "A simple loss differentiation approach to layered multicast", in Proc. InfoComm 2000, Apr. 2000.

[12] Z. He and S.K. Mitra, "A unified rate-distortion analysis framework for transform coding", in IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 12, pp. 1221-1236, Dec. 2001.

[13] M. Hemy, U. Hengartner, P. Steenkiste, and T.
Gross, "MPEG system streams in best effort networks", in Proc. IEEE Packet Video '99, New York, April 1999.

[14] C.-Y. Hsu, A. Ortega and A. Reibman, "Joint selection of source and channel rate for VBR video transmission under ATM policing constraints", in IEEE Journal on Selected Areas in Comm., vol. 15, pp. 1016-1028, Aug. 1997.

[15] C.L. Huang and B.-Y. Liao, "A robust scene-change detection method for video segmentation", in IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 1281-1288, Dec. 2001.

[16] J. In, S. Shirani, and F. Kossentini, "On RD optimized progressive image coding using JPEG", in IEEE Trans. Image Proc., vol. 8, no. 11, pp. 1630-1638, Nov. 1999.

[17] I. Ismaeil, A. Docef, F. Kossentini and R. K. Ward, "A computation-distortion optimized framework for efficient DCT-based video coding", in IEEE Trans. Multimedia, Sept. 2001.

[18] ISO/IEC JTC1/SC29/WG11, 14496-5:2001 (FDIS reference software).

[19] ISO/IEC JTC1/SC29/WG11 N4554, "Optimised reference software for coding of audio-visual objects", Ver. 4.0, Dec. 2001.

[20] ISO/IEC JTC1/SC29/WG11 13818-2, "Coding of the video and the associated audio information", 1996.

[21] ISO/IEC JTC1/SC29/WG11, 14496-2:2001, "Information technology - coding of audio-visual objects - part 2: visual", Sydney, July 2001.

[22] ITU-T, "Video coding for low bitrate communication", ITU-T recommendation H.263; ver. 1, Nov. 1995; ver. 2, Jan. 1998.

[23] S. Jacobs and A. Eleftheriadis, "Streaming video using dynamic rate shaping and TCP congestion control", in Journal of Visual Comm. and Image Representation, vol. 9, no. 3, pp. 211-222, Sept. 1998.

[24] J. Jain and A. Jain, "Displacement measurement and its application in interframe image coding", in IEEE Trans. Commun., vol. COMM-29, pp. 1799-1808, Dec. 1981.

[25] Joint final committee draft (JFCD) of joint video specification, "ITU-T Rec.
H.264—ISO /IEC 14496-10 AVC” , July, 2002. [26] J. Kim, Y.-G. Kim, H. Song, T.-Y. Kuo, Y. J. Chung, and C.-C. J. Kuo, “TCP- friendly Internet video streaming employing variable frame-rate encoding and interpolation” , IEEE Trans. Circuits and Systems for Video Technology, vol. 10, no. 7, pp. 1164-1177, Oct. 2000. [27] J.-G. Kim, J. Kim, J. Shin, and C.-C. J. Kuo, “Coordinated packet level pro tection employing corruption model for robust video transmission,” in Proc. of VCIP, San Jose, CA, Jan. 2001. [28] T.-Y. Kim, B.-H. Roh, and J.-K. Kim, “Bandwidth renegotiation with traffic smoothing and joint rate control for VBR M PEG video over ATM” , in IEEE Trans. Circuits and Systems for Video Technology, vol. 10, No. 5, pp. 693-703, Aug. 2000. 193 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [29] L.P. Kondi and A.K. Katsaggelos, “An operational rate-distortion optimal single-pass SNR scalable video coder” , in IEEE Trans. Image Proc., vol. 10, pp.1613-1620, Nov. 2001. [30] T. V. Lakshman, A. Ortega and A. R. Reibman, “VBR video: tradeoffs and potentials” , in Proc. of the IEEE, vol. 86, No. 5, May 1998. [31] H. Lee, T. Chiang, and Y.-Q. Zhang, “Scalable rate control for M PEG4 video” , in IEEE Trans. Circuits and Systems for Video Technology, vol. 10, no. 6, pp. 878-894, Sept. 2000. [32] J. Lee and B.W.Dickinson, “Temporally adaptive motion interpolation exploit ing tem poral masking in visual perception” , in IEEE Trans. Image proc., vol.3, pp.513-526, Sept. 1994. [33] J. Lee and B.W.Dickinson, “Rate distortion optimized frame-type selection for M PEG coding” , in IEEE Cir. System for Video Tech. vol.7, pp.501-510, June 1997. [34] R. Li, B. Zeng, and M.L.Liou, “A new three-step search algorithm for block motion estim ation”, in IEEE Trans. Circuits and Systems for Video Technol ogy, vol. 4, pp. 438-442, Aug. 1994. [35] W. Li, “Overview of fine granularity scalability in MPEG4 video standard,” IEEE Trans. 
Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 301-317, Mar. 2001. [36] X. Li, M.H. Ammar, and S. Paul, “Video multicast over the Internet” , in IEEE Network, vol. 13, No. 2, pp.46-60, March-April 1999. [37] J. Lin, A. Ortega, “Bit-rate control using piecewise approximated rate- distortion characteristics” , in IEEE Trans. Circuits and Systems for Video Tech nology, vol. 8, no. 4, pp. 446-459, Aug. 1998. [38] F.C.M. Martins, W. Ding, E. Feig, “Joint control of spatial quantization and tem poral sampling for very low bit rate video” , in Proc. of ICASSP-96, vol.4, 194 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. pp.2072-2075, 1996. [39] D. Mukherjee, and S.K. Mitra, “Combined mode selection and macroblock quantization step adaptation for the H.263 video encoder” , in Proc. of Interna tional Conf. Image Proc., pp. 37-40, Sept. 1997, Santa Barbara, USA. [40] A. Ortega, K. Ramchandran, and M. Vetterli, “Optimal trellis-based buffered compression and fast approximation”, in IEEE Trans. Image Proc., vol.3, pp.26- 40, Jan. 1994. [41] Online materials in http://uiww.microsoft.com/windows/windowsmedia/ 9series/encoder/default, asp. [42] Online materials in http://service.real.com/help/library/encoder.html. [43] Online materials in http://www.apple.com /quicktime/. [44] I. -M. Pao and M. -T. Sun, “Encoding stored video for streaming applications” , in IEEE Trans. Circuits Syst. Video Technoi, pp. 199 -209, Feb. 2001. [45] H. Radha, Y. Chen, K. Parthasarathy and R. Cohen, “Scalable Internet video using MPEG-4” , in Signal Proc..-Image Proc., vol. 15, No.1-2, pp. 95-126, 1999. [46] K. Ramchandran and M. Vetterli, “Rate-distortion optimal fast thresholding with complete JPEG /M PE G decoder compatibility” , in IEEE Trans. Image Proc., vol.3, pp.700-704, Sept. 1994. [47] K. Ramchandran, A. Ortega and M. 
Vetterli, “Bit allocation for dependent quantization with applications to multiresolution and M PEG video coders” , in IEEE Trans. On Image Proc., vol. 3, pp. 533-545, Sept. 1994. [48] M. van der Schaar and H. Radha, “A hybrid temporal-SNR fine-granular scal ability for Internet video”, in IEEE Trans. Circuits and Systems for Video Technology, vol. 11, No. 3, pp.318 -331, Mar. 2001. 195 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [49] G.M.Schuster and A.K. Katsaggelos, “A theory for the optimal bit allocation between displacement vector field and displaced frame difference” , in IEEE Journal on Selected Areas in Comm., vol. 15, No. 9, pp. 1739-1751, Dec. 1997. [50] S. Sen, J.L. Rexford, J.K. Dey, J.F. Kurose and D.F. Towsley “Online smooth ing of variable-bit-rate streaming video” , in IEEE Trans. Multimedia, vol. 2, No. 1, pp. 37-48, March 2000. [51] Y. Senda, “Approximate criteria for the MPEG-2 motion estimation” , in IEEE Trans. Cir. System for Video Tech., Vol. 10^ No. 3, April, 2000. [52] J. Shin, J. Kim, and C.-C. J. Kuo, “Quality-of-Service mapping mechanism for packet video in differentiated services network,” in IE E E Trans. Multimedia, vol. 3, no. 2, pp. 219-231, June 2001. [53] J. Shin, J. Kim, and C.-C. J. Kuo, “Content-based packet video forwarding mechanism in differentiated service networks” , in IEEE Packet Video Work shop, Sardinia, Italy, May 1-2, 2000. [54] Y. Shoham and A. Gersho, “Efficient bit allocation for an arbitrary set of quan tizers” , in IEEE Trans. Acoustics, Speech and Signal Proc., vol.36, No.9, pp. 1445-1453, Sept. 1988. [55] H. Song, J. Kim and C.-C. Jay Kuo, “Real-time encoding frame rate control for H.263+ video over the Internet” , in Image Communications, vol. 15, no. 1-2, pp. 127-148, 1999. [56] I. Stoica, S. Shenker, and H. Zhang, “Core-stateless fair queuing: Achieving approximately fair bandwidth allocations in high speed networks,” in Proc. 
SIGCOMM, Vancouver, BC, Canada, Sept. 1998. [57] H. Sun, W. Kwok, M. Chien and C. H. John Ju, “MPEG coding performance improvement by jointly optimizing coding mode decisions and rate control” , in IE E E Trans. Circuits and Systems for Video Technology, vol. 7, No. 3, pp. 449-458, June 1997. 196 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [58] W. Tan and A. Zakhor, “Video multicast using layered FEC and scalable com pression” , in IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 373-386, Mar. 2001. [59] J.Y.Tham, S. Ranganath, M. Ranganath and A. A. Kassim, “A novel unre stricted center-biased diamond search algorithm for block motion estimation” , in IEEE Cir. System for Video Tech. vol.8, pp.369-377, Aug. 1998. [60] J. Y. Tham, S. Ranganath and A.A. Kassim, “Highly scalable wavelet-based video codec for very low bit-rate environment” , in IEEE Journal On Selected Areas in Comm., vol. 16, pp. 12-27, Jan. 1998. [61] A. M. Tourapis, O. C. Au and M. L. Liou, “Predictive motion vector field adap tive search technique (PMVFAST)-enhancing block based motion estimation” , in SPIE Proc. Visual Comm. Image Proc., San Jose, USA, Jan. 2001. [62] V. Varsa and M. Karczewicz, “Long Window Rate Control for Video Stream ing” , in Proc. of Packet Video 2001, May, 2001, Korea. [63] Video qulity expert group, on line http://www.vqeg.org/. [64] Q. Wang, Z. Xiong, F. Wu, and Shipeng Li, “Optimal rate allocation for progres sive fine granularity scalable video coding” , in IEEE Signal Processing Letters, vol.9, No.2, pp. 33-39, Feb. 2002. [65] Y. Wang and Q. Zhu, “Error control and concealment for video communication: a review” , in Proc. of IEEE, pp. 974-997, May, 1998. [66] T. Wiegand, M. Lightstone, D. Mukherjee, T. G. Campbell and S.K.Mitra, “Rate-distortion optimized mode selection for very low bit rate video coding and the emerging H.263 standard” , in IEEE Trans. 
Circuits and Systems for Video Technology, vol. 6, No. 2, pp. 182-190, Apr. 1996. [67] D. Wu, Y. T. Hou, W. Zhu, Y.-Q. Zhang, and J.M. Peha, “Streaming video over the Internet: approaches and directions” , IEEE Trans. Circuits and Sys tems for Video Technology, vol. 11, no. 1, pp. 1-20, Feb. 2001. 197 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [68] X. Wu, S. Cheng, and Z. Xiong, “On packetization of embedded multimedia bitstream s,” IEEE Trans. Multimedia, vol. 3, no. 1, pp. 132 -140, March 2001. [69] J. Xu, Z. Xiong, S. Li and Y-Q Zhang, “Memory-constrained 3D wavelet trans form for video coding without boundary effects” , in IEEE Trans. Circuits and Systems for Video Technology, vol. 12, pp. 812-818, Sept. 2002. [70] Y. Yue, J. Zhou, Y. Wang, and C. W. Chen, “A novel two-pass VBR coding al gorithm for fixed-size storage application” , in IEEE Trans. Circuits Syst. Video Technol, pp.345-356, Mar. 2001. [71] Q. Zhang, G. Wang, W. Zhu, and Y.-Q. Zhang, “Robust scalable video stream ing over Internet with network-adaptive congestion control and unequal loss protection,” Proc. Packet Video, May, 2000. [72] L. Zhao, J. Shin, J. Kim and C. -C. Jay Kuo, “MPEG-4 FGS Video Streaming with Constant-Quality Rate Adaptation, Prioritized Packetization and Differ entiated Forwarding” , in SPIE Information Technology Conf. Aug., 2001. [73] L. Zhao, J. Kim, and C.-C. Jay Kuo, “Constant quality rate control for stream ing MPEG-4 FGS video,” in Proc. IEEE International Symposium on Circuits and Systems (ISCAS) 2002, Scottsdale, AZ, May 2002. [74] L. Zhao and C.-C. Jay Kuo, “Fast Predictive Integer- and Half-Pel Motion Search For Interlaced Video Coding” , subm itted for presentation in Proc. IEEE International Symposium on Circuits and Systems (ISCAS) 2003, Bangkok, Thailand, May 2003. [75] L. Zhao, J. Kim, Y. Bao, and C.-C. 
Jay Kuo, “Highly scalable differential JPEG-2000 wavelet video codec for Internet video streaming” , in SPIE Proc. of Applications of Digital Image Processing XX III, San Diego, CA, July, 2000. [76] L. Zhao, J. Kim, and C.-C. Jay Kuo, “Scalable Internet video streaming with differential JPEG-2000 video codec and content-aware rate control” , in Proc. SP IE Visual Comm, and Image Proc. 2001, San Jose, CA, Jan. 2001. 198 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [77] L. Zhao, J. Kim, and C.-C. Jay Kuo, “MPEG-4 FGS video streaming with constant-quality rate control and differentiated forwarding” , in Proc. SPIE Vi sual Comm, and Image Proc. 2002, San Jose, CA, Jan. 2002. [78] L. Zhao, J. Kim, and C.-C. Jay Kuo, “Streaming MPEG-4 FGS Video with Constant-Quality Rate Control and Differentiated Forwarding” , in preparation to submit to IEEE Trans. Circuits Syst. Video Technol., Sept. 2002. [79] S. Zhu and K-K Ma, “A new diamond search algorithm for fast block-matching motion estimation” , in IEEE Trans. Circuits and Systems for Video Technol ogy, vol. 9, no. 2, pp. 287-290, Feb. 2000. 199 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.