Contributions to coding techniques for wireless multimedia communication
(USC Thesis Other)
NOTE TO USERS: This reproduction is the best copy available. UMI
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
CONTRIBUTIONS TO CODING TECHNIQUES FOR WIRELESS MULTIMEDIA COMMUNICATION
by
Chih-Hung Kuo
A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)
December 2003
Copyright 2003 Chih-Hung Kuo
UMI Number: 3133299
INFORMATION TO USERS: The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleed-through, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.
UMI Microform 3133299. Copyright 2004 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code. ProQuest Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346.
UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90089-1695
This dissertation, written by Chih-Hung Kuo under the direction of his dissertation committee, and approved by all its members, has been presented to and accepted by the Director of Graduate and Professional Programs, in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY.
Date: December 17, 2003
Dedication
Dedicated with love to my parents and my wife.
Acknowledgements
I would like to first thank my thesis advisor, Professor C.-C. Jay Kuo, for guiding me through all stages of completing this thesis. I learned a lot from his methodology and enthusiasm in doing research and in many other activities. I am grateful to Prof. Antonio Ortega and Prof. Tu-nan Chang for taking their valuable time and effort to serve on both my qualifying examination and dissertation committees. Their constructive advice significantly improved the quality of the thesis. Also, I would like to thank Prof. Shrikanth Narayanan and Prof. P. Vijay Kumar for serving on my qualifying examination committee and for their supportive comments and suggestions. I would also like to express my gratitude to all members of our group for their valuable discussions and sharing. Finally, I gratefully acknowledge the generous support of my family. I owe my deepest thanks to my father and mother for their dedication to my education. I am also deeply indebted to my wife Mandy Sun for her endless endurance and love during these years. My appreciation to her cannot adequately be expressed in words.
Table of Contents
Dedication
Acknowledgements
Abstract
1 Introduction
  1.1 Significance of the Research
    1.1.1 Wireless Multimedia Broadcast for Multiple-Antenna Systems
    1.1.2 Efficient Video Source Coding for Wireless Transmission
  1.2 Contributions of the Research
  1.3 Outline of the Thesis
2 Review of Research Background
  2.1 Multimedia Broadcast System
  2.2 Space-Time Coding
  2.3 Differential STC
  2.4 H.264 Video Coding Standard
  2.5 Fast Motion Search Algorithms
3 Embedded Space-Time Coding for Wireless Broadcast
  3.1 Motivation
  3.2 System Description
    3.2.1 Channel Model
    3.2.2 Transmitter Design
    3.2.3 Receiver Design
  3.3 Analysis of Error Probability
  3.4 Experimental Results
  3.5 Conclusion
4 Differential Design for Embedded Space-Time Coding
  4.1 Motivation
  4.2 Differential Design
    4.2.1 An Example: Broadcast with Four Antennas
  4.3 Differential Scheme with Kalman Filtering
  4.4 Experimental Results
    4.4.1 Differential Detection
    4.4.2 Kalman Filtering Tracking
  4.5 Conclusion
5 Fast Variable Block Size Motion Search for H.264 Encoding
  5.1 Introduction
  5.2 Overview of H.264 Motion Estimation
    5.2.1 Tree-structured Motion Estimation
    5.2.2 Motion Estimation Accuracy and Multiple Reference Frames
    5.2.3 Fast Full Search in H.264
  5.3 The Proposed Algorithm
    5.3.1 Notations
    5.3.2 Estimation of Bits for Residual Coding
    5.3.3 Multiresolution Motion Search
    5.3.4 Prediction of Motion Vectors
    5.3.5 Prediction of Residual Variance
    5.3.6 Search with Fast Mode Detection
    5.3.7 Thresholds Setting
    5.3.8 Summary of the Algorithm
  5.4 Experimental Results
  5.5 Conclusion
6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
    6.2.1 Integration of Space-time and Multimedia Source Coding
    6.2.2 Further Enhancement of H.264 Encoder
Reference List
List of Tables
5.1 Comparison of the proposed algorithm and other algorithms for H.264 encoding.
List of Figures
2.1 An example of broadcast channels: (a) broadcasting with two channels of capacities $C_1$ and $C_2$ and (b) the achievable broadcast bit rates for two receivers.
2.2 An example of multi-resolution modulation with 16-PSK.
2.3 Illustration of the space-time communication system.
2.4 The diagram of the H.264 encoder.
3.1 The channel model of a broadcast system with multiple antennas.
3.2 A 4-antenna transmitter with the 2-layer embedded space-time encoder.
3.3 Performance of joint detection receiver.
3.4 The receiver with one antenna.
3.5 The receiver with two antennas.
3.6 The performance of bit-error rates in a system with the one-antenna receiver.
3.7 The bit-error rate of the first layer in the two-antenna receiver.
3.8 The bit-error rate of the second layer in the two-antenna receiver.
3.9 The bit-error rate performance when there is no noise.
4.1 A 4-antenna transmitter with the 2-layer embedded differential space-time encoder.
4.2 The receiver with one antenna.
4.3 The receiver with two antennas for differential detection.
4.4 The two-antenna receiver with Kalman filtering.
4.5 The bit-error rate of the first layer for the one-antenna receiver with differential detection.
4.6 The bit-error rates of (a) the first layer and (b) the second layer for the two-antenna receiver with differential detection.
4.7 The bit-error rate of the first layer for the one-antenna receiver with differential detection and Kalman filtering.
4.8 The bit-error rates of (a) the first layer and (b) the second layer for the two-antenna receiver with differential detection and Kalman filtering.
5.1 Partitioning of a MB with tree-structured decomposition.
5.2 The block diagram of the proposed motion search algorithm.
5.3 The first level partition of an MB.
5.4 The second level partitioning of an 8x8 sub-MB.
5.5 Bits/pixel vs MAD with quantization steps 8, 16, 32 and 128.
5.6 The approximate bit number vs 7 = e~a®.
5.7 Multi-resolution motion search.
5.8 The flowchart of the 2-level mode decision.
5.9 The flowchart of the 3-level mode decision.
5.10 The performance comparison of the 2-level and the 3-level mode decision.
5.11 Percentile of increased bits and split MBs for test sequence “Foreman” (CIF).
5.12 The performance of test sequences with a frame size of 720 x 480.
5.13 The performance of test sequences of the CIF size.
5.14 The performance of test sequences of the QCIF size.
Abstract
Two coding techniques for efficient wireless multimedia communication are investigated in this dissertation: (i) an embedded space-time coding (STC) technique for scalable video broadcast with a multiple-antenna system; and (ii) a fast motion mode selection and motion vector search technique for effective implementation of the emerging video coding standard H.264.
In the first part, an embedded STC method is proposed for the wireless broadcast application. We investigate embedded space-time codes for layered media broadcast. In such a system, a transmitter sends out multi-layer source signals by encoding different layers with different space-time codes. Then, the receiver can retrieve a different amount of information depending on the number of antennas it has as well as the level of the receiving power. A receiving terminal with one antenna can decode only the base layer information with a low-complexity decoder, while a more advanced terminal with more antennas can retrieve more layers of information. We further consider the case where the channel state information (CSI) is unknown. We explore the embedded design with differential STC to enable layered media transmission. Issues of differential design are investigated. Both analytic and experimental results are reported.
In the second part of this dissertation, we propose a fast motion search algorithm for the H.264 encoder. A major part of the coding gain of H.264 comes from a very rich syntax for motion-compensated prediction, at the expense of a higher computational complexity. To be more specific, seven modes of different block sizes and shapes (i.e., 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4) are supported. A full search over all modes requires an extremely large amount of computation. We propose a fast search algorithm for variable block size motion estimation. The proposed algorithm includes multi-resolution motion search and a rate-distortion measure to set early-termination rules. By avoiding a search through all block sizes, the amount of computation involved in the motion search can be substantially reduced. The proposed algorithm can achieve a speed-up factor of up to 120 when compared to the fastest full-search algorithm.
Chapter 1 Introduction
1.1 Significance of the Research
A lot of progress has been achieved in the development of wireless voice/data communication systems in the past decade. For wireless mobile communication, the technology has evolved from the first generation (1G) analog cellular systems to the second generation (2G) digital systems such as GSM, cdmaOne (IS-95), and IS-136. Recently, the third generation (3G) systems have been under development to support transmission rates from 64 Kbps to 384 Kbps in high mobility and 2 Mbps for indoor applications. In about 10 years or more, the fourth generation (4G) system is expected to provide a transmission rate of up to 10 Mbps. Furthermore, for home or small enterprise users, fixed wireless networks and wireless local area networks (LAN) have gained much attention as new methods for Internet access. Several services of this type have been deployed, including MMDS, LMDS, ISM, etc. Protocols such as DOCSIS+ and IEEE 802.16 are being developed to support broadband wireless communication.
Mobile and fixed broadband wireless systems will enable multimedia applications, including image and video services, in addition to the existing voice service. Although the bandwidth per application can be broadened using modern wireless communication technology, it is still important for the source coder to compress the media content as much as possible to fully utilize the available bandwidth. Furthermore, new functionalities, such as scalability and error resilience, should be implemented to tackle the fading and error-prone characteristics of wireless channels. Standards such as JPEG 2000 and MPEG-4 have been developed to provide good compression performance for image and video data, respectively. The emerging H.264 standard is viewed as the new generation video codec that is suitable for numerous applications, including wireless video communication. However, the high computational cost of the H.264 codec (especially the encoder) makes its implementation on portable devices a very challenging task. A reduction in the computational complexity of H.264 will be of great value.
The research in this dissertation contains two main parts. In the first part, better utilization of wireless broadcast channels is investigated, and a new coding scheme called the embedded space-time code is proposed for multiple-antenna systems. In the second part, a fast motion search algorithm to enhance the computational speed of the H.264 video encoder is presented.
1.1.1 Wireless Multimedia Broadcast for Multiple-Antenna Systems
The main problem in wireless communication is that fading and interference can corrupt received signals. Wireless channels are susceptible to environmental noise, and can be degraded due to the motion of the mobile station.
Diversity, a technique that provides the receiver with several replicas of the signal over independent fading channels [34], has been developed to overcome these channel impairments. Diversity has been exploited in different ways, such as frequency diversity, time diversity, angle-of-arrival diversity and polarization diversity. Spectrum spreading (including frequency hopping and DS-CDMA), the RAKE receiver and error-correction coding also exploit the concept of diversity. Antenna diversity, which uses more than one antenna at the transmitter and/or the receiver, is also capable of providing diversity for wireless transmission.
Recently, space-time coding (STC) has been extensively studied to exploit the transmission antenna diversity. It integrates antenna diversity with coding techniques to achieve a higher capacity and reduce co-channel interference in multiple access. Tarokh et al. [39] derived an analytical bound for the symbol error rate and presented the criteria for STC that can achieve the maximum diversity. Similar to conventional channel coding schemes, STC can be divided into two categories: trellis and block codes. The trellis or convolutional STC can be decoded by the well-known Viterbi algorithm. However, the complexity of decoding the space-time block codes (STBC) is much lower than that of the trellis codes.
On the other hand, the progress in signal processing and communication technologies has enabled digital wireless broadcast applications, leading to the commercialization of high-definition television (HDTV), digital video broadcast (DVB), and digital audio broadcast (DAB). A broadcast system differs from point-to-point transmission in that different receivers can have different receiving capabilities.
In [7], the performance of a broadcast channel was analyzed in an information-theoretic framework. In [35], a multi-resolution modulation technique was proposed to provide different receiving quality according to the distance between the transmitter and the receiver. The source is divided into several layers so that the receiver with a lower signal power can still reconstruct the base-layer information while the receiver with a higher signal power can get all the information.
The layered or scalable source coding technique has matured over the last few years. For example, JPEG 2000 and MPEG-4 fine granularity scalability (FGS) [23] have been adopted as standards for the progressive encoding of still images and video, respectively. The encoded bitstream has the property that the receiver can reconstruct the source with a rate-distortion tradeoff, even if only parts of the data are received. The scalable coding technique has inspired the exploration of many Internet applications, since the server can perform multicast and/or broadcast with less storage space. Researchers have also attempted to apply STC to multimedia transmission. For example, Zheng and Liu [45] proposed a layered multimedia transmission scheme over an STC system. Their focus was on the power allocation over antennas, loaded with different layers of multimedia data content, instead of exploiting the structure of STC. In this research, we study a special form of STC for multimedia transmission that utilizes the properties of space-time codes and layered coding.
1.1.2 Efficient Video Source Coding for Wireless Transmission
In addition to enhancing the wireless communication capacity and performance, the compression of source contents also plays a key role in the provision of broadband wireless services.
The source coder should compress the target signal as much as possible while meeting a certain quality criterion for better utilization of the given channel bandwidth. It is also desirable for the source coder to have certain functionalities, such as error resilience, to combat the error-prone wireless channel.
A large amount of effort has been devoted to video compression over the last two decades. H.264 is an emerging video coding standard jointly developed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC MPEG [37]. It is regarded as the state-of-the-art video coding standard since it outperforms most existing standards in compression efficiency by a substantial margin at the same visual quality. It also provides the tools necessary to deal with packet loss and bit errors caused by error-prone and congested wireless networks. Thus, H.264 is a good candidate to deliver video contents over wireless channels. However, the computational load of the H.264 encoder is extremely high. The effort to reduce the complexity is important yet challenging. The motion estimation and compensation module is one of the most critical components to be optimized in the H.264 encoder. The power consumption of the motion estimation module ranges from 50% to 90% of that of the whole encoder. In the second half of this thesis, we will develop methods to reduce the complexity of the H.264 coder, especially in the area of fast motion search.
1.2 Contributions of the Research
The contributions of this dissertation include the following.
• A new type of space-time coding, called embedded STC, is proposed in this work. It provides differentiation among receivers. A receiver can recover the full amount of information if it has a sufficient number of antennas and adequate computing power. On the other hand, a receiver can still reconstruct a low-resolution version of the transmitted media (e.g., the base layer of video) with a lower computational complexity when it has only a partial set of receiving antennas, due to physical space limitations or the failure of some antennas.
• The differential coding design is applied to the case when the channel states are unknown. In practical systems, the receiver may not be able to estimate the channel states accurately. Furthermore, an accurate estimation of channel state information (CSI) may demand a lot of additional bandwidth for pilot signals. We present a differential design of embedded space-time codes that does not require the extra tracking mechanism.
• Various receiver designs for embedded space-time systems are developed in this work. They include the following.
  1. Non-differential decoding is derived based on the antenna number with known CSI. A smaller number of antennas can be used by a simpler receiver. Higher layer information can be decoded only if the receiver has more antennas.
  2. Differential decoding detects symbols from previous signals and does not demand knowledge of CSI for any layer of information. Minimum mean-square estimation (MMSE) can be incorporated to further suppress interference from other layers, as in the non-differential system.
  3. Kalman filter tracking can be integrated with differential decoding to improve the performance at the expense of a higher computational complexity for the high-end terminal. On the other hand, signals can still be decoded by the conventional differential detection method in a low-end terminal without the assistance of Kalman filtering.
• A new algorithm for variable block size motion estimation is proposed for the H.264 encoder. The algorithm detects the best mode among various block sizes based on the residual signal’s variance and motion vectors predicted before the actual motion search process is performed.
• A multiresolution motion search method is proposed to predict initial motion vectors and the residual signal variance for each macroblock. A simple relation of variances between coarse and full resolution levels is derived and can be updated adaptively according to image contents.
• An adaptive model at the macroblock level is proposed to estimate the number of coding bits. This model adopts a third-order polynomial whose coefficients are updated after the coding of each macroblock using the least mean square (LMS) algorithm. This provides robust and stable estimation over a wide range of bit rates.
1.3 Outline of the Thesis
The rest of this thesis is organized as follows. In Chapter 2, we review some related background information, including multimedia broadcasting technologies, space-time coding, the H.264 video coding standard, and fast motion search algorithms. In Chapter 3, embedded space-time codes are proposed and discussed. The differential design of embedded STC is investigated in Chapter 4. Then, a fast motion search algorithm for the H.264 video encoder is proposed and studied in Chapter 5. Finally, the work in this thesis is summarized, and possible future research topics are presented in Chapter 6.
Chapter 2 Review of Research Background
In this chapter, we briefly review previous related work. The discussion of multimedia broadcasting is given in Section 2.1. The space-time coding technique is presented in Section 2.2. Differential schemes of space-time coding are introduced in Section 2.3. Features of the H.264 standard are introduced in Section 2.4.
Fast motion search algorithms are reviewed in Section 2.5.
2.1 Multimedia Broadcast System
A broadcast system is designed to simultaneously transmit information from one source to multiple receivers. It differs from point-to-point transmission in that paths toward different receivers may have different channel capacities. To transmit signals from a single source through those channels, one may have to limit the transmission rate to below the worst capacity, or perform some type of time or frequency multiplexing between channels, which can be shown theoretically to achieve better performance [7].
For example, let us consider a broadcast system with two receivers, which have binary symmetric channels (BSC) of capacities $C_1$ and $C_2$. Let the information transmitted from the station to the two receivers be encoded at rates $(R_1, R_2)$, respectively. The first intuitive solution is to transmit information at $C_{\min} = \min\{C_1, C_2\}$ to both receivers. The second solution is to use time sharing. That is, the transmitter uses a proportion of time $0 < \lambda < 1$ to send at rate $C_{\min}$ and the remaining $1 - \lambda$ to send at rate $C_{\max} = \max\{C_1, C_2\}$. It can be easily verified that the second scheme can achieve a better rate for both receivers than the first scheme. Cover [7] suggested that an even better joint rate than that of time sharing can be achieved by superimposing high-rate and low-rate information together.
Fig. 2.1(a) illustrates another broadcast environment, where channel 1 has a higher capacity (i.e., $C_1 > C_2$). The source would like to send information $\{r, s_1\}$ to user 1 and $\{r, s_2\}$ to user 2, where $r$ is the common message to both receivers. The coding scheme will be efficient if the set $s_1$ of information contains the set $s_2$ of information. In other words, receiver 1 should be able to decode information $s_2$ in addition to information $\{r, s_1\}$.
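The three transmission strategies for the two-receiver BSC example can be compared numerically. The following is a minimal sketch, not taken from the thesis: the function names are hypothetical, rates are reported as a pair (common rate decodable by both receivers, extra rate decodable only by the stronger receiver), and the superposition formula is the standard textbook result for a degraded binary symmetric broadcast channel, used here as an assumption.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(p):
    """Capacity (bits per channel use) of a BSC with crossover probability p."""
    return 1.0 - binary_entropy(p)


def minimax_rates(c_strong, c_weak):
    """Send everything at the worse capacity; the strong receiver gets no extra rate."""
    return min(c_strong, c_weak), 0.0


def time_sharing_rates(c_strong, c_weak, lam):
    """Fraction lam of the time at C_min (decodable by both receivers),
    the remaining 1 - lam at C_max (decodable by the strong receiver only)."""
    c_min, c_max = min(c_strong, c_weak), max(c_strong, c_weak)
    return lam * c_min, (1.0 - lam) * c_max


def superposition_rates(p_strong, p_weak, beta):
    """Textbook superposition-coding rates for a degraded BSC broadcast channel
    (an assumption of this sketch; beta in [0, 0.5] sets the layer split)."""
    def conv(a, b):
        # Binary convolution a * b = a(1 - b) + (1 - a)b
        return a * (1.0 - b) + (1.0 - a) * b

    common = 1.0 - binary_entropy(conv(beta, p_weak))
    extra = binary_entropy(conv(beta, p_strong)) - binary_entropy(p_strong)
    return common, extra


if __name__ == "__main__":
    p_strong, p_weak = 0.05, 0.20  # hypothetical crossover probabilities
    c_strong, c_weak = bsc_capacity(p_strong), bsc_capacity(p_weak)
    print("minimax      :", minimax_rates(c_strong, c_weak))
    common, extra = superposition_rates(p_strong, p_weak, beta=0.1)
    lam = common / c_weak  # time sharing with the same common rate
    print("time sharing :", time_sharing_rates(c_strong, c_weak, lam))
    print("superposition:", (common, extra))
```

For the same common rate, the superposition point in this example leaves more extra rate for the stronger receiver than time sharing does, which is the qualitative behaviour attributed to [7].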
It can be shown by information theory that the information superimposing scheme outperforms the time sharing scheme. Fig. 2.1(b) plots potentially deliverable bit rates for both receivers. The proof in [7] was based on an asymptotic analysis, and a practical construction method was not provided.
[Figure 2.1: An example of broadcast channels: (a) broadcasting with two channels of capacities $C_1$ and $C_2$ and (b) the achievable broadcast bit rates for two receivers. Panel (a) shows one encoder feeding two channels, each followed by a decoder; panel (b) compares the minimax, time sharing and superimposing information schemes.]
One practical way to realize a broadcast system with information superposition is to use a special channel coding technique, called the embedded error control code (ECC). That is, each codeword is divided into two groups with different protection levels. A two-level embedded code of length $n$ can be expressed as an $(n, k_1, k_2, t_1, t_2)$ code, where $k_i$ is the information length and $t_i$ is the maximum number of correctable errors at the $i$th level. One may combine two codes $(n_1, k_1, t_1)$ and $(n_2, k_2, t_2)$ to obtain an $(n_1 + n_2, k_1, k_2, t_1, t_2)$ code. However, it may be outperformed by a well-designed embedded $(n, k_1, k_2, t_1, t_2)$ code. Lin et al. [24] tabulated all possible embedded error control codes of odd lengths up to 65 by using computer search. Up to now, embedded ECC theory is still under development, and the complexity of embedded ECC is still too high.
Ramchandran et al. [35] proposed an embedded modulation scheme for multi-resolution broadcast. The basic idea is to divide the constellation into “clouds” that represent coarse information. Then, mini-constellations inside each cloud carry the detailed information.
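The cloud idea can be sketched for a phase-shift-keyed constellation. The code below is only an illustration under stated assumptions: four clouds of four points each, one cloud per quadrant, a natural (non-Gray) bit labelling, and a hypothetical parameter `delta` for the angular spacing inside a cloud. None of these choices is taken from [35] or from the thesis. With `delta` equal to pi/8 the layout reduces to uniform 16-PSK; a smaller `delta` pulls the points of a cloud together and protects the coarse bits more strongly.

```python
import cmath
import math

NUM_CLOUDS = 4        # coarse layer X: 2 bits select a cloud (quadrant)
POINTS_PER_CLOUD = 4  # fine layer Y: 2 bits select a point inside the cloud


def mr_16psk_point(x, y, delta=math.pi / 16):
    """Map coarse index x and fine index y (both 0..3) to a unit-circle point."""
    center = math.pi / 4 + x * (math.pi / 2)  # cloud centre in quadrant x
    offset = (y - 1.5) * delta                # symmetric offsets inside the cloud
    return cmath.exp(1j * (center + offset))


def detect_coarse(r):
    """Low-resolution receiver: decide only which quadrant (cloud) was sent."""
    angle = cmath.phase(r) % (2 * math.pi)
    return int(angle // (math.pi / 2)) % NUM_CLOUDS


def detect_fine(r, delta=math.pi / 16):
    """High-resolution receiver: nearest constellation point, giving (x, y)."""
    candidates = [(x, y) for x in range(NUM_CLOUDS)
                  for y in range(POINTS_PER_CLOUD)]
    return min(candidates,
               key=lambda xy: abs(r - mr_16psk_point(xy[0], xy[1], delta)))


if __name__ == "__main__":
    sent = mr_16psk_point(0, 0)
    received = sent * cmath.exp(0.15j)  # a phase error larger than delta / 2
    print("coarse decision:", detect_coarse(received))
    print("fine decision  :", detect_fine(received))
```

In the usage example, the phase error is large enough to move the received sample closer to a neighbouring point of the same cloud, so the fine decision is wrong while the coarse decision is still correct.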
An example of multi-resolution (MR) modulation, a 16-PSK, is illustrated in Fig. 2.2. The encoder maps 2 bits of $X$ to a coarse cloud and then maps 2 bits of $Y$ information to the corresponding point inside the cloud. If the receiving SNR is high enough, then both layers of information $X$ and $Y$ can be successfully detected. On the other hand, $X$ can still be extracted by the weaker receiver.

Figure 2.2: An example of multi-resolution modulation with 16-PSK.

One parameter to trade off between the coverage area and the quality of reception is the distance between those constellation points. The coverage of high-quality information including both $X$ and $Y$ is smaller than the area covering the lower-quality broadcast of only $X$. Various channel codes can be incorporated to fine-tune the performance. In [35], several schemes were considered to design the multi-resolution joint source-channel coding system, including embedded error correction codes (ECC), unequal error protection codes (UEP) and trellis coded modulation (TCM). Simulation results verified the concept that an embedded transmission scheme is superior to the independent transmission of multi-resolution sources. Their work has inspired us to develop a system to broadcast multimedia information. In this research, instead of examining different coverage areas, we consider receivers with different numbers of antennas and, hence, different receiving capacities.

As to the source coding type, our target is to broadcast scalable multimedia. Traditional encoders can efficiently compress video or audio signals at a fixed rate.
However, there is a so-called "digital cutoff" phenomenon. That is, once the channel bit rate is lower than the coding rate, the quality abruptly becomes poor. On the other hand, even if the transmission rate is much higher than the coding rate, the quality cannot exceed a predefined level. The objective of scalable coding is to optimize the visual quality over a range of bit rates, instead of at a single bit rate. Scalable decoders should be able to recover the source from a partial bitstream, with a quality determined by the amount of data received. The received quality degrades gracefully as the channel bit rate decreases, and the theoretical rate-distortion curve can be approached more closely.

Usually, a scalable bitstream consists of layers of information, and quantized signals are encoded by bit-planes. As a result, encoded bits can be arranged according to their importance. Several scalability properties have been investigated, including SNR scalability, temporal scalability, and spatial scalability. Two well-known scalable standards are the JPEG 2000 [14] image compression standard and the MPEG-4 fine granularity scalability (FGS) [6][23] format. JPEG 2000 is built upon the wavelet transformation and the EBCOT (Embedded Block Coding with Optimized Truncation) algorithm to achieve several scalable properties. In MPEG-4 FGS, two layers are encoded. The base layer is non-scalable and provides the guaranteed quality, while the enhancement layer is bit-plane coded and can be truncated at any point. Many scalable techniques have been intensively studied, and plenty of applications have been proposed and even commercialized in the IT industry.
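The bit-plane idea described above can be illustrated with a minimal sketch. It is not the JPEG 2000 or MPEG-4 FGS coder: there is no transform or entropy coding, the coefficient values are hypothetical, and the function names are chosen here only for illustration. It shows only that a prefix of a bit-plane-ordered stream already yields a coarse reconstruction that improves as more planes arrive.

```python
def to_bit_planes(values, num_planes):
    """Split non-negative integers into bit-planes, most significant plane first."""
    return [[(v >> p) & 1 for v in values]
            for p in range(num_planes - 1, -1, -1)]


def from_bit_planes(planes, num_planes):
    """Rebuild integers from the planes received; missing low planes count as zero."""
    values = [0] * len(planes[0]) if planes else []
    for index, plane in enumerate(planes):
        shift = num_planes - 1 - index
        values = [v | (bit << shift) for v, bit in zip(values, plane)]
    return values


def mean_abs_error(a, b):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical quantized magnitudes (8-bit), used only for illustration.
coefficients = [200, 37, 129, 5, 64, 255, 18, 90]
planes = to_bit_planes(coefficients, num_planes=8)

# Decode from progressively longer prefixes of the plane-ordered stream.
errors = [mean_abs_error(coefficients, from_bit_planes(planes[:k], 8))
          for k in range(1, 9)]
```

In this sketch the error list never increases as more planes are decoded and reaches zero once all eight planes are available, which is the graceful degradation that a truncatable enhancement layer relies on.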
In the Internet environment, it is especially desirable for the server to keep only one copy of the scalable source and stream the content to several receivers, which saves storage space and transmission bandwidth. In this research, we attempt to develop a wireless broadcast system to transmit scalable media information to heterogeneous receivers.

2.2 Space-Time Coding

The fading channel is one of the main obstacles in the design of effective wireless communication systems, since multipath interference degrades the received signal. However, through the use of multiple antennas, a system can dramatically increase its capacity. It was shown in [8] that, in a Rayleigh flat-fading environment, the capacity of multiple-antenna links increases linearly with the minimum of the numbers of antennas at the transmitter and the receiver. This is possible since scattering makes the combined channel coefficients of different paths statistically uncorrelated in such a fading environment. The independence between channels may be achieved by physically separating antennas by a distance of several wavelengths.

Figure 2.3: Illustration of the space-time communication system.

In a single-antenna system, a path in deep fade may severely deteriorate the received signal. In a multiple-antenna system, we may compensate for it with another received signal of higher quality. Thus, increasing the number of antennas raises the diversity of the communication system. The space-time coding technique aims at designing the best arrangement of symbols to fully exploit the transmitter or the receiver diversity.
Unlike conventional channel coding, which maps information symbols onto one-dimensional vectors, space-time coding maps information symbols onto two-dimensional matrices for multiple-antenna transmission.

Let us introduce the basic space-time model below. Without loss of generality, we consider a system where the transmitter is equipped with $M$ antennas and the receiver is equipped with $N$ antennas, as shown in Fig. 2.3. Information data are encoded into $M$ bitstreams. At time $k$, the baseband signal $s_{m,k}$ from the $m$th bitstream is modulated and sent through antenna $m$, where $1 \le m \le M$. Note that these $M$ symbols are transmitted simultaneously. At the receiver end, the signal $r_{n,k}$ taken from the $n$th antenna at time $k$ is given by

$$r_{n,k} = \sqrt{W_a} \sum_{m=1}^{M} h_{m,n}\, s_{m,k} + \xi_{n,k},$$

where $\sqrt{W_a}$ is the normalization factor chosen so that the average energy of the transmitted signal is unity, and $\xi_{n,k}$ is a sample of the AWGN signal. The coefficient $h_{m,n}$ is the gain of the path from transmit antenna $m$ to receiving antenna $n$. Flat fading is usually assumed and, therefore, these coefficients are constant during a time frame of interest.

Several design criteria have been proposed in [39] to optimize the performance by considering the diversity gain and the coding gain. Different coding designs were investigated based on theoretical study. Those codes can be grouped into two major categories: trellis codes and block codes. Trellis codes can be viewed as the parallel concatenation of $M$ convolutional codes. They can achieve a high coding diversity if the code is carefully designed and the frame length is adequate. Decoding can be performed by using the well-known Viterbi algorithm. However, the complexity of trellis STC is high when the coding frame is long, and hence it is not practical to implement.
On the other hand, block codes have a lower decoding complexity. In a block STC, the information symbol is mapped to a matrix of fixed size $M \times K$, where $K$ is the frame length of a codeword. Hence, the matrix $R$ of receiving signals can be expressed as

$$R = \sqrt{W_a}\, H S + N_r,$$

where matrix $H$ of size $M \times N$ has channel coefficients $h_{m,n}$ as elements, and matrix $S$ of size $M \times K$ contains elements $s_{m,k}$. The AWGN signals are contained in $N_r$.

Alamouti [2] proposed a block STC scheme, which is widely used in systems with two transmit antennas. The STC encoder first maps an information symbol to two symbols $c_1$ and $c_2$, and then constructs a code matrix of size $2 \times 2$ according to the following structure:

$$\begin{pmatrix} c_1 & -c_2^* \\ c_2 & c_1^* \end{pmatrix}, \qquad (2.1)$$

where the star superscript denotes the complex conjugate. Symbols $c_1$ and $-c_2^*$ in the first row of the matrix are sent by the first antenna, while $c_2$ and $c_1^*$ in the second row are sent by the second antenna. This code satisfies the orthogonal design condition [30]. Thus, it can reduce the complexity of the maximum likelihood decoder as well as achieve the maximum diversity gain. This can be seen from the decision variables for the system with a single receiver antenna, which reduce to the simple linear combinations

$$\tilde{c}_1 = h_1^* r_1 + h_2 r_2^*, \qquad \tilde{c}_2 = h_2^* r_1 - h_1 r_2^*,$$

where $r_1$ and $r_2$ are the signals received from the antenna during two consecutive symbol durations, and $h_1$ and $h_2$ are channel fading coefficients [2]. If there are $2^b$ constellation points in the signaling space of $c_1$ and $c_2$, the number of metric computations can be reduced from $2^{2b}$ to $2 \times 2^b$ by using this linear combination form. We will adopt this block code as the basic design in the base layer encoding.

Generally speaking, block codes with multiple antennas can be designed with a low decoding complexity as long as the matrix is orthogonal.
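The linear combining step above can be checked numerically with a short sketch. It assumes a noise-free channel by default, a QPSK constellation, and arbitrary example fading coefficients; the function names and values are chosen here for illustration and are not taken from [2].

```python
import cmath

# QPSK constellation (an assumption of this sketch; any PSK set would do).
QPSK = [cmath.exp(1j * cmath.pi * (2 * k + 1) / 4) for k in range(4)]


def alamouti_encode(c1, c2):
    """Code matrix of Eq. (2.1): rows are transmit antennas, columns are symbol periods."""
    return [[c1, -c2.conjugate()],
            [c2, c1.conjugate()]]


def receive(code, h1, h2, noise=(0j, 0j)):
    """Single receive antenna: each period observes the faded sum of both antennas."""
    r1 = h1 * code[0][0] + h2 * code[1][0] + noise[0]
    r2 = h1 * code[0][1] + h2 * code[1][1] + noise[1]
    return r1, r2


def alamouti_combine(r1, r2, h1, h2):
    """Decision variables; each equals (|h1|^2 + |h2|^2) times one symbol plus noise."""
    c1_tilde = h1.conjugate() * r1 + h2 * r2.conjugate()
    c2_tilde = h2.conjugate() * r1 - h1 * r2.conjugate()
    return c1_tilde, c2_tilde


def detect(statistic, gain, constellation=QPSK):
    """Symbol-by-symbol nearest-point decision on one decision variable."""
    return min(constellation, key=lambda s: abs(statistic - gain * s))


# Hypothetical fading coefficients and data symbols.
h1, h2 = 0.8 + 0.3j, -0.2 + 0.9j
c1, c2 = QPSK[0], QPSK[3]
r1, r2 = receive(alamouti_encode(c1, c2), h1, h2)
t1, t2 = alamouti_combine(r1, r2, h1, h2)
gain = abs(h1) ** 2 + abs(h2) ** 2
decided = (detect(t1, gain), detect(t2, gain))
```

Because the two decision variables decouple, each symbol is detected by a separate search over the constellation, which is where the reduction from $2^{2b}$ to $2 \times 2^b$ metric computations comes from.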
However, note that there is a limitation in extending complex code matrices to systems with more antennas. It has been proved that the elements in an orthogonal matrix cannot be complex when the size of the matrix is greater than $2 \times 2$ [40]. This limits the signals of STC codes with greater sizes to real numbers. Combinations of STC with FEC and other modulation schemes, such as orthogonal frequency-division multiplexing (OFDM), are widely studied so that the wireless spectrum can be utilized more efficiently.

2.3 Differential STC

In our discussion of STC construction in the previous section, it was assumed that the channel state information (CSI) can be perfectly estimated. In the case of one transmit antenna, the differential coding technique can be adopted so that the receiver can perform detection without knowledge of the CSI. At time $t$, the receiver can extract information from some combination of signals received at times $t$ and $t - 1$. The generalization to multiple-antenna transmission has also been studied. Hochwald and Marzetta [11] proposed unitary space-time codes for use in differential design. In this scheme, the transmitted code matrix $S(t)$ at time $t$ should be unitary:

$$S(t) S(t)^H = M I_M.$$

Before encoding, the information data have to be mapped to a matrix $G(t)$ satisfying the condition

$$G(t) G(t)^H = I_M.$$

Then, the differential encoding is performed by

$$S(t) = S(t-1)\, G(t).$$

The maximum likelihood decoding rule to detect $G(t)$ can be derived as

$$\hat{G}(t) = \arg\max_{G} \Re\{\mathrm{Tr}(R(t-1)^H R(t)\, G^H)\}, \qquad (2.2)$$

where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix and $\Re\{\cdot\}$ represents the real part of a complex number.

Based on a similar idea, Hughes [13] proposed a group construction for differential design. Later, Shokrollahi et al.
[36] used fixed-point-free groups to design high-rate constellations with full diversity in differential modulation. They tried to find codes that maximize the diversity gain and at the same time reduce the decoding complexity. However, a differential detection scheme with a complexity order comparable to non-differential STC was proposed by Tarokh and Jafarkhani [41].

The description in [41] is somewhat tedious. Here, we briefly summarize the results as follows. Basically, the transmit code matrix is still of the form given in Eq. (2.1), i.e.

$$S(t) = \begin{pmatrix} c_1(t) & -c_2(t)^* \\ c_2(t) & c_1(t)^* \end{pmatrix}.$$

The mapping from information data to symbols $c_1(t)$ and $c_2(t)$ is divided into two steps. First, the data are mapped into vectors $[d_1(t), d_2(t)]^T$. Then, the encoding is performed by

$$\begin{pmatrix} c_1(t) \\ c_2(t) \end{pmatrix} = \begin{pmatrix} c_1(t-1) & -c_2(t-1)^* \\ c_2(t-1) & c_1(t-1)^* \end{pmatrix} \begin{pmatrix} d_1(t) \\ d_2(t) \end{pmatrix}.$$

Suppose that signals $r_1(t)$ and $r_2(t)$ are received at the receiver end. The decoding rule can be shown to be

$$\hat{d}_1(t) = \kappa\,[\,r_1(t-1)^*\, r_1(t) + r_2(t-1)\, r_2(t)^*\,],$$
$$\hat{d}_2(t) = \kappa\,[\,r_2(t-1)^*\, r_1(t) - r_1(t-1)\, r_2(t)^*\,],$$

where $\kappa = 1/(|r_1(t-1)|^2 + |r_2(t-1)|^2)$ is a normalization factor. In the above detection process, channel coefficients are no longer used, and the complexity is much lower than that of a general implementation of Eq. (2.2).

Jafarkhani and Tarokh [15] also generalized the above work to systems with more antennas. The encoding and decoding processes involve some re-arrangement of elements in the code matrix, and the construction has some extra limitations compared with the simple two-antenna system. We will adopt this code as the base layer for embedded differential STC, and more details will be discussed in Chapter 4.
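The two-antenna differential encoding and detection rules summarized above can be sanity-checked with the following minimal sketch. It assumes a noise-free, single-receive-antenna channel that stays constant over consecutive blocks, and it uses hypothetical unit-norm data vectors rather than the specific bit-to-vector mapping of [41]; all names and numerical values are illustrative.

```python
def encode_step(prev, d):
    """Differential encoding: new symbol pair from the previous pair and data vector d."""
    c1, c2 = prev
    d1, d2 = d
    return (c1 * d1 - c2.conjugate() * d2,
            c2 * d1 + c1.conjugate() * d2)


def receive(c, h1, h2):
    """Noise-free reception of the code matrix of Eq. (2.1) over two symbol periods."""
    c1, c2 = c
    return (h1 * c1 + h2 * c2,
            -h1 * c2.conjugate() + h2 * c1.conjugate())


def decode_step(prev_r, r):
    """Differential detection from the previous and current received pairs only."""
    p1, p2 = prev_r
    r1, r2 = r
    kappa = 1.0 / (abs(p1) ** 2 + abs(p2) ** 2)
    d1 = kappa * (p1.conjugate() * r1 + p2 * r2.conjugate())
    d2 = kappa * (p2.conjugate() * r1 - p1 * r2.conjugate())
    return d1, d2


# Hypothetical channel (unknown to the decoder) and unit-norm data vectors.
h1, h2 = 0.7 - 0.4j, -0.3 + 0.9j
data = [(1 + 0j, 0j), (0j, 1 + 0j), (0.6 + 0j, 0.8j), (-1 + 0j, 0j)]

symbols = (complex(2 ** -0.5), complex(2 ** -0.5))   # reference block c(0)
previous_rx = receive(symbols, h1, h2)
recovered = []
for d in data:
    symbols = encode_step(symbols, d)
    current_rx = receive(symbols, h1, h2)
    recovered.append(decode_step(previous_rx, current_rx))
    previous_rx = current_rx
```

In the noise-free case the recovered vectors equal the transmitted ones even though `decode_step` never sees `h1` or `h2`, which is the property that makes the scheme attractive when CSI is unavailable.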
2.4 H.264 Video Coding Standard

The H.264 standard was initiated by the ITU-T Video Coding Experts Group (VCEG) in 1997. At that time, it was named H.26L to denote a high-performance video coding standard aiming at more fundamental changes in the syntax structure of the codec with a longer-term development effort. At the end of 2001, ISO/IEC MPEG joined ITU-T VCEG to form a Joint Video Team (JVT) for this effort. The objective of JVT is to develop a unified video standard for both Part 10 of the MPEG-4 standard and the ITU-T Recommendation H.264. The development activity is expected to be officially finalized at the end of 2003.

Figure 2.4: The diagram of the H.264 Encoder.

Some unique features of H.264 are summarized below.

1. The main objective of H.264 is to enhance visual quality. The target is to save up to 50% of the bit rate as compared to the existing H.263+ or MPEG-4 Simple Profile while keeping about the same visual quality.

2. It is desired that H.264 can provide good quality for all kinds of applications over a wide range of bit rates, from very low rates (e.g. video telephony) to very high rates (e.g. digital cinema).

3. For robust communication, H.264 offers tools for the decoder to deal with packet loss or bit errors, so that error resilience is included in the standard.

4. H.264 is flexible in meeting a wide range of delay constraints. For example, it can operate in the low-delay mode for real-time video conference applications as well as the high-delay mode for non-time-critical applications such as video storage or non-real-time streaming applications.

5.
H.264 facilitates easier packetization by adopting a conceptual separation between a Video Coding Layer (VCL) and a Network Adaptation Layer (NAL). The former provides a high-compression representation of the content, while the latter packages the data for delivery over a particular type of network. This enables a friendly integration over a diverse networking environment, including the wireless network.

The diagram of the H.264 encoder is shown in Figure 2.4. One can find that the structure of H.264 shares many common compression concepts with previous standards. They include the following.

1. Each video frame is divided into macro-blocks, which serve as the basic unit of the coding process.

2. Spatial redundancies that exist within the video frame are exploited by intra-frame prediction as well as coding through the transform, quantization and entropy coding.

3. Temporal correlations that exist between neighboring frames are exploited so that only differences need to be encoded. The technique of motion search and compensation is exploited for this inter-frame prediction.

4. Redundancies of residual signals after inter-frame prediction are exploited again using the transform, quantization, and entropy coding.

Although the framework of H.264 is similar to that of previous standards, further improvements in these components allow H.264 to achieve substantially better coding gain, at the cost of higher complexity. Some major improvements are listed below.

1. Intra-prediction. H.264 offers nine modes for the intra-prediction of 4x4 luminance blocks, and four modes for 16x16 intra coding.

2. Inter-prediction. H.264 adopts various block sizes and shapes, high-precision sub-pel motion
vectors and multiple reference frames for inter-prediction. Our work focuses on this part, and some new results will be given in Chapter 5.

3. Transform. H.264 employs a pure integer transform as an approximation of the 4 x 4 DCT. The small block helps reduce blocking and ringing artifacts, while the precise integer specification eliminates any mismatch in the inverse transform.

4. Entropy coding. H.264 has adopted two methods for entropy coding. The first one is Context-based Adaptive Variable Length Coding (CAVLC) and the second one is Context-based Adaptive Binary Arithmetic Coding (CABAC). In both methods, the probability models are adaptively updated according to the context.

5. Deblocking filter. H.264 specifies the use of an adaptive deblocking filter that operates within the prediction loop in order to remove artifacts caused by block prediction errors.

2.5 Fast Motion Search Algorithms

In the block-based motion-compensated predictive video encoder, motion estimation is the most computationally intensive component. The process of inter prediction is usually designed to find the best match between blocks so that their sum of absolute differences (SAD) is minimized. Full search checks every displacement inside the designated search window, so it is the most straightforward way to find the optimal motion vector. However, the total computational load for SAD calculation with full search is extremely high.

Many sub-optimal algorithms have been proposed to reduce the number of block matching operations in the search process. Some fast search algorithms were developed in the past under the assumption of a unimodal error surface. They include
They include: the three-step search (3SS) [18], the new three-step search (N3SS) [21], the four-step search (4SS) [33], the block-based gradient descent search (BBGS) [25] and the di amond search [42], In these algorithms, the search process is divided into several steps, where several possible displacements were checked at each step and the one with the minimum SAD will be picked as the center for the search at the next step. Although the search speed can be improved, they may result in significant quality degradation in comparison with the full search scheme at the same bit rate. A better strategy to achieve fast search is to predict the good initial displacement of the motion vector, and perform early termination with reliable stopping criteria to avoid unnecessary block matching. Chalidabhongse and Kuo [4] first investigated a fast search algorithm that predicts initial motion vectors from neighboring MBs in both multiresolution and spatial-temporal dimensions. Although the scanning order of macroblocks may be non-causal, their proposed algorithm can speed-up by a factor of 100 to 300 with little quality degradation due to bettor prediction of the initial motion vector. Recently, two fast motion estimation algorithms, i.e. 28 R eproduced with perm ission of the copyright owner. Further reproduction prohibited without perm ission. MVFAST (the Motion Vector Field Adaptive Search Technique) [43] and PMVFAST (the Predictive Motion Vector Field Adaptive Search Technique) [46] were adopted by MPEG-4 Part-7 as an optimization model [31]. They both attem pted to exploit more correlations from spatial-temporal neighboring macroblocks. The basic ideas of these two algorithms are stated below. 1. Select initial MV predictors from spatially and temporally adjacent blocks to perform the diamond search (DS). 2. Adaptively choose small or large diamonds as the search pattern based on the local motion activity. 3. 
Apply the early termination principle to avoid inefficient SAD matching operations.

These two algorithms provide a significant improvement over traditional fast search algorithms in terms of visual quality and complexity reduction. The visual quality obtained from MVFAST and PMVFAST is very similar to that obtained by the full search scheme. However, they consider motion search with a fixed-size block. Since H.264 allows motion estimation and compensation with variable block sizes and shapes, some research effort has to be made to enhance the efficiency of the variable block-size motion search.

Chapter 3 Embedded Space-Time Coding for Wireless Broadcast

In this chapter, we propose a new type of space-time coding (STC), called embedded STC, that provides differentiation among receivers. This chapter is organized as follows. The overall system is described in Section 3.2. The bit error rate performance is analyzed in Section 3.3. Then, simulation results are presented in Section 3.4. Finally, concluding remarks are given in Section 3.5.

3.1 Motivation

In this work, we investigate a wireless broadcast system with multiple transmit antennas that supports the transmission of a scalable media source. Designers of conventional communication systems consider each information bit and each user equally important. Thus, they focus on maximizing the channel capacity or minimizing the bit error rate. However, in the context of multimedia communication, the importance of each bit may vary according to the content it carries. For example, in a video bitstream, the header is more important than the content part, since a little corruption in the header may lead to totally wrong results.
In the layered coding technique, some layers are more important than others. The base layer is usually critical to the visual quality, while enhancement layers can be discarded with much less impact on the perception of end users. Also, different users may have different requirements. Some users may wish to gain higher quality at a higher cost, while others may only demand the minimum level of quality of service (QoS). These differences among users and QoS requirements are important in a commercial system. That is our motivation to find a new structure of space-time codes that is applicable to systems with different QoS requirements.

We are interested in addressing the following problems.

• How can layered coded media contents be broadcast with multiple antennas more efficiently? Conventional STC systems do not address the integration of STC with layered source coding components.

• How can a unified code be designed that applies to different receivers? Specifically, let us consider the case where receiving terminals may have different numbers of antennas. Some receivers may have more antennas to receive higher quality signals, while some may not have enough antennas due to cost considerations or physical size constraints (for example, the mobile handset or the wireless PDA), so that they may communicate at the lowest quality level.

A conventional communication system seldom considers coding for heterogeneous receivers. Only a single copy of the source bitstream can be transmitted and received. The objective of our research is to design an effective broadcasting system aimed at an environment with heterogeneous receivers. The receiver can retrieve a different amount of information from the broadcast signals according to its number of antennas.
On one hand, a receiver can recover the full amount of information if it has a sufficient number of antennas and adequate computing power. On the other hand, a receiver can still reconstruct the low-resolution information (e.g. the base layer of video) with a low computational complexity when it has only one antenna, due to a limitation in physical size or the failure of some antennas. The proposed embedded STC system can be employed in wireless broadcast applications. It is well suited to wireless broadcast with heterogeneous receivers, ranging from a fixed wireless access system with the full receiving capability to a handheld device with one simple antenna.

3.2 System Description

3.2.1 Channel Model

The channel model for a wireless broadcast system with two different receivers is depicted in Figure 3.1. The broadcast station encodes the media source by space-time block codes (STBC) and sends them out with multiple antennas. Receivers are allowed to have different numbers of antennas. Each receiving antenna receives signals from the transmit antennas through different paths. It is assumed that these paths are statistically independent, and that signals experience slow fading such that channel coefficients are unchanged during the transmission of a single space-time block code. The fading coefficients are assumed to be perfectly estimated in this chapter. This last assumption will be relaxed in the next chapter.

The purpose of this work is to design a space-time block code that can transmit the multi-layered source bitstream, so that receivers with different numbers of antennas can recover different amounts of information. This means that the receiving diversity is used to differentiate the service type.
For example, if the source is encoded with two layers, then terminal no. 1 with one antenna can only decode the first layer, while terminal no. 2 with two antennas can decode both layers successfully, since more diversity can be exploited with more receiving antennas.

Figure 3.1: The channel model of a broadcast system with multiple antennas.

3.2.2 Transmitter Design

Suppose the bitstream is modulated by phase shift keying (PSK). For the $i$th layer, the bitstream is mapped to a sequence of symbols by an $M$-ary constellation, and then encoded into matrix $C_i$. We transmit the sum

$$C = C_1 + C_2 + \cdots + C_L \qquad (3.1)$$

with multiple antennas. The receiver is expected to retrieve a different amount of data, depending on its number of antennas. More specifically, for the receiver with a single antenna, only $C_1$, hence the first layer, can be successfully decoded. In the $n$-antenna case ($1 \le n \le L$), the first $n$ layers are decoded. Let $P_i = \mathrm{Tr}(C_i C_i^*)$ denote the power of each layer code, where $\mathrm{Tr}(\cdot)$ is the trace operator. It is often the case that

$$P_1 > P_2 > \cdots > P_L,$$

since the first layer is the most important one and requires the highest transmission power.

Figure 3.2: A 4-antenna transmitter with the 2-layer embedded space-time encoder.

Figure 3.2 shows an example of a two-layer embedded STC system with four transmit antennas. We use an extension of the Alamouti code [2] as the first layer code.
In a system with two transmit antennas, the Alamouti code encodes two baseband symbols $c_1$, $c_2$ into

$$\begin{pmatrix} c_1 & -c_2^* \\ c_2 & c_1^* \end{pmatrix}.$$

Symbols $c_1$ and $-c_2^*$ in the first row of the matrix are sent by the first antenna, and $c_2$ and $c_1^*$ in the second row by the second antenna. This code satisfies the orthogonal design condition. Thus, it can reduce the complexity of the maximum likelihood decoder, as well as achieve the maximum diversity gain. Let $r_1$ and $r_2$ be the symbols received at times $t$ and $t + 1$ in a receiver with a single antenna. Let $h_1$ and $h_2$ denote the channel coefficients from the two transmit antennas to the one receiving antenna, respectively. Then, the decision variables for the receiver antenna reduce to the simple linear combinations

$$\tilde{c}_1 = h_1^* r_1 + h_2 r_2^*, \qquad \tilde{c}_2 = h_2^* r_1 - h_1 r_2^*.$$

If there are $2^b$ constellation points in the signaling space of $c_1$ and $c_2$, the number of metric computations can be reduced from $2^{2b}$ to $2 \times 2^b$ by using this linear combination form [2]. Therefore, this code can be decoded even if the receiver has only one antenna, while the performance can be improved with more antennas by exploiting the diversity gain.

Naguib et al. [30] proposed a method that transmits $K$ Alamouti codes simultaneously. In this case, $2K$ interfering signals arrive at the receiver. They showed that $K$ receiving antennas are required to perfectly suppress the interference effect, and developed a minimum mean-square error (MMSE) interference suppression technique. We adopt this method as the second layer space-time code, which needs at least two antennas to perform the decoding.

Therefore, in the embedded STC system with four transmit antennas and two receiving antennas, we transmit the signal in the following way.
Suppose the first layer modulates $x_1$ and $x_2$ in the duration of one block code, and the second layer $y_1$, $y_2$, $y_3$ and $y_4$. Then, the first layer space-time code $C_1$ is

$$C_1 = \begin{pmatrix} x_1 & x_2 \\ x_1 & x_2 \\ x_2^* & -x_1^* \\ x_2^* & -x_1^* \end{pmatrix}$$

and the second layer code $C_2$ is

$$C_2 = \begin{pmatrix} y_1 & y_2 \\ y_2^* & -y_1^* \\ y_3 & y_4 \\ y_4^* & -y_3^* \end{pmatrix}.$$

The transmitter then sends $C = C_1 + C_2$ through the air interface with four antennas. Since the first layer is more important, we assign more power to the first layer symbols. Because of PSK modulation, the elements in $C_1$ and $C_2$ have fixed amplitudes. The amplitude ratio $\rho$ is defined as the ratio of the second layer amplitude to the first layer amplitude, i.e. $\rho = |y_i| / |x_j|$ for all possible $i$'s and $j$'s.

Figure 3.3: Performance of joint detection receiver. (Achievable performance at SNR = 10 dB for embedded STC and time-sharing; the horizontal axis is $P_{c,\mathrm{layer}1}$ and the curves are labeled by amplitude ratios from $\rho = 0$ to $\rho = 1.0$.)

3.2.3 Receiver Design

In the decoding of two layers, joint detection can provide optimal solutions for the embedded STC. This can be performed using the maximum-likelihood criterion with

$$\{\hat{C}_1, \hat{C}_2, \ldots, \hat{C}_L\} = \arg\min_{\{C_1, C_2, \ldots, C_L\}} \mathrm{Tr}\big((R - HC)(R - HC)^*\big), \qquad (3.2)$$

where $C$ is composed of layer codes from (3.1).

Figure 3.3 shows simulation results for an example of two-layer coding at SNR = 10 dB. Note that the probability of correctly decoded bits, instead of the bit error probability, is shown in this figure. The solid line depicts the probabilities obtained when both layers are transmitted simultaneously, for all possible amplitude ratios $\rho$ of the second layer signal to the first layer signal. For comparison, the dashed line indicates a time-sharing scheme, in which the signals of the two layers are multiplexed in different time slots.
From the figure, we can see that the embedded STC scheme outperforms the naive time-sharing system. If we fix the probability of the first layer, $P_{c,\mathrm{layer1}}$, the probability $P_{c,\mathrm{layer2}}$ of the second layer using embedded STC is always greater than that of the time-sharing scheme. This confirms the superiority of embedded STC and agrees with the theoretical result derived by Cover [7].

Although optimal detection can be achieved by jointly decoding two layers, it requires a high computational complexity. We develop a fast sub-optimal decoding algorithm for embedded STC with two layers based on the interference cancellation technique.

Figure 3.4: The receiver with one antenna. (Blocks: Demod, Combiner, ML Detector; output: Layer 1.)

Figure 3.4 shows the structure of the receiver with one antenna. Let $h_1, h_2, h_3, h_4$ be the channel coefficients of the paths from the four transmit antennas. In the first layer, each symbol is transmitted by two antennas. Thus, we can combine the corresponding two coefficients together, and reconstruct the symbols with the Alamouti decoder [2], regarding the second layer symbols as noise. More specifically, symbols $x_1$ and $x_2$ can be estimated as

\[ \hat{x}_1 = \frac{1}{|h_1 + h_2|^2 + |h_3 + h_4|^2} \big[ (h_1 + h_2)^* r_1 + (h_3 + h_4) r_2^* \big], \tag{3.3} \]

\[ \hat{x}_2 = \frac{1}{|h_1 + h_2|^2 + |h_3 + h_4|^2} \big[ (h_3 + h_4)^* r_1 - (h_1 + h_2) r_2^* \big]. \tag{3.4} \]

The estimated symbols are then detected by the maximum likelihood (ML) method to find the most suitable decisions.

Figure 3.5 shows the structure of the receiver with two antennas. There are eight paths in total from the transmitter to the receiver. Let $h_1, h_2, h_3, h_4$ denote the channel coefficients at the first receiving antenna, and $h_5, h_6, h_7, h_8$ denote those at the second receiving antenna.
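The one-antenna combining in (3.3) and (3.4) can be sketched as follows. The sketch assumes a noise-free signal model in which antennas 1 and 2 carry $x_1$ followed by $-x_2^*$, antennas 3 and 4 carry $x_2$ followed by $x_1^*$, and the second layer is superimposed as two Alamouti blocks of amplitude $\rho$; the function names and symbol values are illustrative.

```python
import random

def receive_one_antenna(h, x, y):
    # Noise-free received samples over the two symbol periods.
    x1, x2 = x
    y1, y2, y3, y4 = y
    r1 = h[0] * (x1 + y1) + h[1] * (x1 + y2) + h[2] * (x2 + y3) + h[3] * (x2 + y4)
    r2 = (h[0] * (-x2 - y2).conjugate() + h[1] * (-x2 + y1).conjugate()
          + h[2] * (x1 - y4).conjugate() + h[3] * (x1 + y3).conjugate())
    return r1, r2

def first_layer_estimate(r1, r2, h):
    # Equations (3.3) and (3.4): Alamouti combining on the summed coefficients.
    a = h[0] + h[1]
    b = h[2] + h[3]
    g = abs(a) ** 2 + abs(b) ** 2
    x1_hat = (a.conjugate() * r1 + b * r2.conjugate()) / g
    x2_hat = (b.conjugate() * r1 - a * r2.conjugate()) / g
    return x1_hat, x2_hat

def estimation_error(h, x, rho):
    # Magnitude of the second-layer interference left in the estimate of x1.
    y = [rho * s for s in (1j, -1 + 0j, 1 + 0j, -1j)]
    r1, r2 = receive_one_antenna(h, x, y)
    x1_hat, _ = first_layer_estimate(r1, r2, h)
    return abs(x1_hat - x[0])

rng = random.Random(11)
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
x = (1 + 0j, 1j)

err_none = estimation_error(h, x, 0.0)   # no second layer: exact recovery
err_small = estimation_error(h, x, 0.2)
err_large = estimation_error(h, x, 0.4)  # interference scales linearly with rho
```

Because the residual interference is linear in the second-layer symbols, doubling $\rho$ doubles the estimation error in this noise-free setting.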
We first estimate the first layer symbols by the following combination:

\[ \hat{x}_1 = \frac{(h_1 + h_2)^* r_1 + (h_3 + h_4) r_2^* + (h_5 + h_6)^* r_3 + (h_7 + h_8) r_4^*}{|h_1 + h_2|^2 + |h_3 + h_4|^2 + |h_5 + h_6|^2 + |h_7 + h_8|^2}, \]

\[ \hat{x}_2 = \frac{(h_3 + h_4)^* r_1 - (h_1 + h_2) r_2^* + (h_7 + h_8)^* r_3 - (h_5 + h_6) r_4^*}{|h_1 + h_2|^2 + |h_3 + h_4|^2 + |h_5 + h_6|^2 + |h_7 + h_8|^2}. \tag{3.5} \]

These signals are reconstructed with fading coefficients, and subtracted from the original input signals by

\[ \begin{aligned} \Delta r_1 &= r_1 - (h_1 + h_2)\hat{x}_1 - (h_3 + h_4)\hat{x}_2, \\ \Delta r_2 &= r_2 + (h_1 + h_2)\hat{x}_2^* - (h_3 + h_4)\hat{x}_1^*, \\ \Delta r_3 &= r_3 - (h_5 + h_6)\hat{x}_1 - (h_7 + h_8)\hat{x}_2, \\ \Delta r_4 &= r_4 + (h_5 + h_6)\hat{x}_2^* - (h_7 + h_8)\hat{x}_1^*. \end{aligned} \tag{3.6} \]

Figure 3.5: The receiver with two antennas. (Blocks: Demod, Combiner, ML Detector, Reconstructor, MMSE Estimator, ML Detector; outputs: Layer 1, Layer 2.)

In this way, the interference from the first layer can be eliminated if its estimation is correct. To estimate the second layer signals $y_i$, the MMSE method is adopted to find a linear combination $\hat{y}_i = w_i^* \Delta r$, $i = 1, 2, 3, 4$, that minimizes the mean square error of the estimate. It is given by

\[ \hat{y}_i = w_i^* \Delta r, \qquad w_i = \arg\min_{w} E\big[ |y_i - w^* \Delta r|^2 \big], \tag{3.7} \]

where vector $\Delta r$ is

\[ \Delta r = [\Delta r_1, \Delta r_2^*, \Delta r_3, \Delta r_4^*]^T. \]

If the first layer is perfectly detected, then vector $\Delta r$ satisfies

\[ \Delta r = H y + n, \tag{3.8} \]

where $H$ is the channel coefficient matrix

\[ H = \begin{bmatrix} h_1 & h_2 & h_3 & h_4 \\ h_2^* & -h_1^* & h_4^* & -h_3^* \\ h_5 & h_6 & h_7 & h_8 \\ h_6^* & -h_5^* & h_8^* & -h_7^* \end{bmatrix}, \]

$y = [y_1, y_2, y_3, y_4]^T$, and $n$ is the additive white Gaussian noise (AWGN) vector. The weighting vector $w_i$ in (3.7) can be obtained by applying the Wiener-Hopf equation

\[ w_i = R^{-1} p_i, \tag{3.9} \]

where $R = E[\Delta r \Delta r^*]$ is the correlation matrix and $p_i = E[\Delta r\, y_i^*]$.
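The cancellation step (3.6) and the residual model (3.8) can be checked numerically with a small noise-free sketch. It assumes that the first layer has been detected correctly and that each receive antenna observes the superimposed block in which antennas 1 and 2 carry $x_1$ then $-x_2^*$ and antennas 3 and 4 carry $x_2$ then $x_1^*$; all names and symbol values are illustrative.

```python
import random

rng = random.Random(3)
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]  # h1..h8
x = [1 + 0j, 1j]                                   # first-layer symbols
rho = 0.3
y = [rho * s for s in (1j, -1 + 0j, 1 + 0j, -1j)]  # second-layer symbols

def received(hk):
    # Noise-free samples at one receive antenna over two symbol periods.
    ra = (hk[0] * (x[0] + y[0]) + hk[1] * (x[0] + y[1])
          + hk[2] * (x[1] + y[2]) + hk[3] * (x[1] + y[3]))
    rb = (hk[0] * (-x[1] - y[1]).conjugate() + hk[1] * (-x[1] + y[0]).conjugate()
          + hk[2] * (x[0] - y[3]).conjugate() + hk[3] * (x[0] + y[2]).conjugate())
    return ra, rb

def cancel_first_layer(ra, rb, hk, xh):
    # Equation (3.6) for one receive antenna; xh holds the detected symbols.
    a, b = hk[0] + hk[1], hk[2] + hk[3]
    da = ra - a * xh[0] - b * xh[1]
    db = rb + a * xh[1].conjugate() - b * xh[0].conjugate()
    return da, db

def h_rows(hk):
    # Two rows of the matrix H in (3.8) contributed by one receive antenna.
    c = [v.conjugate() for v in hk]
    return [[hk[0], hk[1], hk[2], hk[3]], [c[1], -c[0], c[3], -c[2]]]

r1, r2 = received(h[:4])
r3, r4 = received(h[4:])
d1, d2 = cancel_first_layer(r1, r2, h[:4], x)
d3, d4 = cancel_first_layer(r3, r4, h[4:], x)
delta_r = [d1, d2.conjugate(), d3, d4.conjugate()]

H = h_rows(h[:4]) + h_rows(h[4:])
Hy = [sum(H[i][j] * y[j] for j in range(4)) for i in range(4)]
max_err = max(abs(u - v) for u, v in zip(delta_r, Hy))
```

With a correct first-layer decision the residual matches $Hy$ to machine precision, so the second layer sees a clean four-by-four linear model to which the Wiener-Hopf weights can be applied.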
By inserting (3.8) into (3.9), we have

\[ w_i = E[HH^* + nn^*]^{-1} E[H y\, y_i^*]. \]

Since it is assumed that the receiver knows the channel state perfectly, $H$ is deterministic and

\[ w_i = \Big[ HH^* + \frac{1}{\gamma} I \Big]^{-1} h_i, \]

where $h_i$ is the $i$th column of $H$ and $\gamma = E_s/N_0$ is the signal-to-noise ratio. Then, we can detect $y_i$ by finding the codeword closest to $w_i^* \Delta r$:

\[ \hat{y}_i = \arg\min_{y} | y - w_i^* \Delta r |. \tag{3.10} \]

Since PSK modulation is adopted, the decoding rule in (3.10) can be simplified and rewritten as

\[ \hat{y}_i = \arg\max_{y} \Re\{ y^* (w_i^* \Delta r) \}, \qquad i = 1, 2, 3, 4, \]

where $\Re\{\cdot\}$ denotes the operation of taking the real part of a complex number.

3.3 Analysis of Error Probability

In the proposed embedded STC system, the received signals for the receiver with one antenna can be written as

\[ \begin{aligned} r_1 &= h_1(x_1 + y_1) + h_2(x_1 + y_2) + h_3(x_2 + y_3) + h_4(x_2 + y_4) + n_1, \\ r_2 &= h_1(-x_2 - y_2)^* + h_2(-x_2 + y_1)^* + h_3(x_1 - y_4)^* + h_4(x_1 + y_3)^* + n_2. \end{aligned} \]

By separating the $x$-terms from the $y$-terms, we can rewrite these equations in the matrix form

\[ r = H_x x + H_y y + n, \tag{3.11} \]

where

\[ r = [r_1, r_2^*]^T, \quad x = [x_1, x_2]^T, \quad y = [y_1, y_2, y_3, y_4]^T, \quad n = [n_1, n_2^*]^T, \]

\[ H_x = \begin{bmatrix} h_1 + h_2 & h_3 + h_4 \\ h_3^* + h_4^* & -h_1^* - h_2^* \end{bmatrix}, \qquad H_y = \begin{bmatrix} h_1 & h_2 & h_3 & h_4 \\ h_2^* & -h_1^* & h_4^* & -h_3^* \end{bmatrix}. \]

Hence, the estimation rules in (3.3) and (3.4) can be rewritten as

\[ \hat{x} = [\hat{x}_1, \hat{x}_2]^T = \frac{H_x^* r}{|h_1 + h_2|^2 + |h_3 + h_4|^2} = x + \frac{H_x^* H_y y}{|h_1 + h_2|^2 + |h_3 + h_4|^2} + \frac{H_x^* n}{|h_1 + h_2|^2 + |h_3 + h_4|^2} = x + \eta + n', \]
where the term

\[ \eta = \frac{H_x^* H_y y}{|h_1 + h_2|^2 + |h_3 + h_4|^2} \]

is the interference from the second layer, and

\[ n' = \frac{H_x^* n}{|h_1 + h_2|^2 + |h_3 + h_4|^2} \]

is an AWGN vector, each element of which has the same variance $\sigma_{n'}^2 = \sigma^2 / (|h_1 + h_2|^2 + |h_3 + h_4|^2)$.

To investigate the effect of the interference on the error probability, we model the interference $\eta$ as a Gaussian random variable. Without loss of generality, we consider only the pairwise error probability of $x_1$, whose interference term $\eta_1$ has mean 0 and variance

\[ \sigma_{\eta_1}^2 = E[\eta_1 \eta_1^*] = \frac{\rho^2 \big( |h_1|^2 + |h_2|^2 + |h_3|^2 + |h_4|^2 \big)}{|h_1 + h_2|^2 + |h_3 + h_4|^2}. \]

The pairwise error probability of $x_1$ is the probability that $x'$ is preferred by the detector when $x$ is transmitted, and it can be expressed as

\[ P(x \to x' \mid h_1, h_2, h_3, h_4) = Q\left( \sqrt{ \frac{|x - x'|^2}{4 (\sigma_{\eta_1}^2 + \sigma_{n'}^2)} } \right). \tag{3.12} \]

By averaging over all possible realizations of the random variables $h_i$, we obtain the average pairwise error probability as

\[ \begin{aligned} P(x \to x') &= E\left[ Q\left( \sqrt{ \frac{|x - x'|^2 \big( |h_1 + h_2|^2 + |h_3 + h_4|^2 \big)}{4 \big[ \rho^2 ( |h_1|^2 + |h_2|^2 + |h_3|^2 + |h_4|^2 ) + \sigma^2 \big]} } \right) \right] \\ &= \int_{h_1} \int_{h_2} \int_{h_3} \int_{h_4} Q\left( \sqrt{ \frac{|x - x'|^2 \big( |h_1 + h_2|^2 + |h_3 + h_4|^2 \big)}{4 \big[ \rho^2 ( |h_1|^2 + |h_2|^2 + |h_3|^2 + |h_4|^2 ) + \sigma^2 \big]} } \right) f(h_1) f(h_2) f(h_3) f(h_4)\, dh_1\, dh_2\, dh_3\, dh_4. \end{aligned} \tag{3.13} \]

The channel coefficients $h_i$ have complex Gaussian distributions, so the integration in (3.13) runs over 8 real dimensions. It is difficult to find a closed form for this integration. Thus, we find the approximate value by a computer program, and the result will be shown in the next section.
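One way to approximate the average in (3.13) numerically is plain Monte Carlo sampling over the channel coefficients. The sketch below is not the program used in the thesis; it assumes independent complex Gaussian coefficients with unit average power, and the function name, sample count, and parameter values are illustrative.

```python
import math
import random

def q_func(z):
    # Gaussian tail probability Q(z).
    return 0.5 * math.erfc(z / math.sqrt(2))

def avg_pairwise_error(rho, sigma2, dist2, trials=20000, seed=1):
    # Monte Carlo estimate of (3.13); dist2 is |x - x'|^2.
    # Assumes rho and sigma2 are not both zero.
    rng = random.Random(seed)
    s = math.sqrt(0.5)  # per-dimension std so that E|h_i|^2 = 1 (assumption)
    total = 0.0
    for _ in range(trials):
        h = [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(4)]
        num = dist2 * (abs(h[0] + h[1]) ** 2 + abs(h[2] + h[3]) ** 2)
        den = 4 * (rho ** 2 * sum(abs(v) ** 2 for v in h) + sigma2)
        total += q_func(math.sqrt(num / den))
    return total / trials

# Nearest QPSK neighbours at unit amplitude are separated by |x - x'|^2 = 2.
p_low = avg_pairwise_error(rho=0.1, sigma2=0.05, dist2=2.0)
p_high = avg_pairwise_error(rho=0.4, sigma2=0.05, dist2=2.0)
p_floor = avg_pairwise_error(rho=0.4, sigma2=0.0, dist2=2.0)  # noise-free limit
```

With $\sigma^2 = 0$ the estimate stays strictly positive whenever $\rho > 0$, which reflects the interference-limited error floor implied by the Gaussian interference model.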
Consequently, the total symbol error probability for $M$-ary modulation is upper-bounded by the following union bound:

\[ P_e \le (M - 1)\, E\left[ Q\left( \sqrt{ \frac{|x - x'|^2 \big( |h_1 + h_2|^2 + |h_3 + h_4|^2 \big)}{4 \big[ \rho^2 ( |h_1|^2 + |h_2|^2 + |h_3|^2 + |h_4|^2 ) + \sigma^2 \big]} } \right) \right]. \]

Now, for the receiver with two antennas, (3.11) still holds if we replace the matrices and the vectors with

\[ r = [r_1, r_2^*, r_3, r_4^*]^T, \quad x = [x_1, x_2]^T, \quad y = [y_1, y_2, y_3, y_4]^T, \quad n = [n_1, n_2^*, n_3, n_4^*]^T, \]

\[ H_x = \begin{bmatrix} h_1 + h_2 & h_3 + h_4 \\ h_3^* + h_4^* & -h_1^* - h_2^* \\ h_5 + h_6 & h_7 + h_8 \\ h_7^* + h_8^* & -h_5^* - h_6^* \end{bmatrix}, \qquad H_y = \begin{bmatrix} h_1 & h_2 & h_3 & h_4 \\ h_2^* & -h_1^* & h_4^* & -h_3^* \\ h_5 & h_6 & h_7 & h_8 \\ h_6^* & -h_5^* & h_8^* & -h_7^* \end{bmatrix}. \]

The error probability of the second layer can be derived in a similar way to that of the first layer. However, the interference term is more complex and demands a more sophisticated approximation technique, which is not included in this chapter.

3.4 Experimental Results

We conducted a set of simulations for the proposed embedded STC system. In the transmitter, two layers of data are transmitted with four antennas. In two consecutive symbol durations, two symbols are sent for the first layer, and four symbols for the second layer. Both layers are modulated using the QPSK constellation. All paths experience Rayleigh fading. The receiver requires two antennas to retrieve both layers, while the one-antenna receiver can reconstruct only the first layer.
Figure 3.6: The performance of bit-error rates in a system with the one-antenna receiver. (BER versus SNR from 2 to 24 dB; one curve per amplitude ratio from 0.1 to 0.9.)

Figure 3.6 shows the bit error rate for the one-antenna receiver. Each curve represents the case of a fixed $\rho$, which is the amplitude ratio of the second layer to the first layer. As expected, the performance degrades as $\rho$ increases. Notice that the bit-error rate in each curve does not decrease without limit as the SNR increases; instead, it saturates. This indicates that the interference from the second layer limits the performance of the system.

The performance of the two-antenna receiver is shown in Figures 3.7 and 3.8 for the first and second layers, respectively. The two-antenna receiver achieves a lower bit error rate for the first layer data than the one-antenna receiver, since the two receiving antennas yield a higher diversity gain. The bit error rate for the second layer decreases as $\rho$ increases from 0.1 to 0.3. This is because increasing the second layer amplitude is equivalent to increasing its SNR. However, when $\rho$ increases above 0.4, the higher signal power of the second layer interferes with the decoding of the first layer. The falsely detected first-layer data then result in severe degradation of the decoding performance of the second layer.
From the simulation results, we see that only a limited range of amplitude ratios provides reasonable performance in the second layer. The ratio range from 0.2 to 0.4 is acceptable, since the corresponding bit error rates fall in the range of $10^{-4}$ to $10^{-2}$ when the signal-to-noise ratio is above 15 dB.

Figure 3.7: The bit-error rate of the first layer in the two-antenna receiver.

Figure 3.8: The bit-error rate of the second layer in the two-antenna receiver. (One curve per amplitude ratio from 0.1 to 0.9.)

Next, we simulate the case when there is no noise. This asymptotic performance can demonstrate the interference effect. Figure 3.9 shows the simulation results as well as the analytic upper bound. This graph clearly shows that the performance of the embedded space-time system is limited by the interference. The discrepancy between the simulation and the upper bound comes from the inaccuracy of the Gaussian model of the interference, since the number of interfering terms is not large enough for the central limit theorem to apply. However, the behaviors of these curves are quite similar to each other.

To further reduce the error probability, error correction codes should be incorporated. Another advantage of error correction codes is that they can be integrated with recursive interference cancellation to improve the performance. This will be addressed in our future research.
3.5 Conclusion

In this work, a novel space-time coding scheme, called embedded STC, was proposed for the broadcast of a multi-layer media source. The transmitter sends the multi-layer source signal by embedding the space-time codes from different layers. The receiver with only one antenna can then decode the base layer information alone with a lower complexity, while the receiver with more antennas can retrieve more layers of information. We also derived the analytic performance of the system. It was shown that the proposed system has the potential for multi-resolution transmission.

Figure 3.9: The bit error rate performance when there is no noise. (BER versus amplitude ratio from 0.1 to 1 as SNR tends to infinity; curves: 1st layer/1 antenna (simulation), 1st layer/2 antennas, 2nd layer/2 antennas, 1st layer/1 antenna (analysis).)

Chapter 4

Differential Design for Embedded Space-Time Coding

In this chapter, we consider the case when the channel state information is unknown. We propose a pure differential design for embedded STC as well as a hybrid scheme that enhances channel tracking with the assistance of Kalman filtering.

4.1 Motivation

In analogy to conventional communication systems, differential encoding is adopted so that the receiver can recover the information even without explicit knowledge of the channel status. This is useful when the transmitter does not have enough bandwidth to send pilot signals, from which the receiver would otherwise acquire the channel coefficients. Even in a system with a channel estimation mechanism, the channel state information (CSI) may not be perfectly estimated by all receivers because of the noisy environment.
Tarokh and Jafarkhani [41] proposed a differential detection scheme for a space-time system with two transmit antennas, which imposes a lower complexity on the receiver. Later, they generalized the construction scheme to systems with more transmit antennas in [15]. We will extend their work to the context of embedded space-time coding.

The main idea is that we apply the conventional differential coding scheme to the base layer encoding. Since the base layer has higher power than the signals in other layers, the receiver can detect it by treating the signals from other layers as interference. For the other layers, the information is encoded in relation to the first layer signal. This encoding method allows a receiver with a sufficient number of antennas to decode higher layer signals in accordance with the decoded information for the base layer.

4.2 Differential Design

The differential coding for embedded STC is briefly described as follows. At time $t$, the base-layer information is first mapped to an $N$-component vector $d(t)$, where $N$ is the number of transmit antennas. The transmitted matrix $X(t)$ of size $N \times T$, where $T$ is the time duration of the code block, has all its elements taken from the components of the $N$-component vector $x(t)$. The relationship between $x(t)$ and $d(t)$ can be expressed as

\[ x(t) = X(t-1)\, d(t). \]

Figure 4.1: A 4-antenna transmitter with the 2-layer embedded differential space-time encoder. (Blocks: Source, Coder, Layer 1 Mapper producing $d(t)$ and $X(t)$, Layer 2 Mapper producing $E(t)$.)

Matrix $X(t)$ has to satisfy the unitary condition

\[ X(t) X(t)^H = N I_N, \tag{4.1} \]

while the vector length of $x(t)$ is fixed, i.e.

\[ x(t)^H x(t) = N. \tag{4.2} \]

Under these conditions, the receiver can detect the signal by combining previously received signals.
This can be seen more clearly from an example described below.

As to the $i$th ($i \ge 2$) layer information, there is more freedom in choosing the encoding method. We choose an encoding method that depends on the first layer. The $i$th layer information is first encoded into matrix $E_i(t)$, which has a form that only receivers with a sufficient number of antennas can decode successfully, as described in the previous chapter. The code $Y_i(t)$ representing the $i$th layer information is computed from

\[ Y_i(t) = X(t) E_i(t). \]

Finally, the resulting code to be sent through the antennas is

\[ Z(t) = X(t) + Y_2(t) + Y_3(t) + \ldots + Y_L(t). \]

Figure 4.2: The receiver with one antenna. (Blocks: Demod, Delay, Weighting Coefficient Computing, Combiner, ML Detector; output: Layer 1.)

Since the importance decreases as $i$ increases, the following power constraint should be satisfied:

\[ P_2 > P_3 > \ldots > P_L, \]

where $P_i = \mathrm{Tr}(Y_i(t) Y_i(t)^H)$ represents the power of the $i$th layer signal.

At the receiver, the first layer $X(t)$ can be easily retrieved, even if the signals in other layers are not detected. For receivers with more antennas, the higher layer information $Y_i(t)$ can be decoded from the signals received by different antennas.

4.2.1 An Example: Broadcast with Four Antennas

Figure 4.1 shows an example of a broadcast system using 4 transmit antennas. As described above, the encoding process can be viewed as a mapping to the vector space spanned by the previous code matrix $X(t-1)$. The receiver can then retrieve the information by a properly arranged matrix computation, and this code can be decoded with only one receiving antenna. The matrix of rank two can serve as the space-time code for the base layer.
However, when we attempt to construct second layer signals that have to be decoded by at least two receiving antennas, a matrix of rank four is needed. To achieve this, the code block has to be transmitted over 4 symbol durations in the broadcast station with four transmit antennas.

To encode the first layer signal with $N = 4$ transmit antennas, we first map the input 4 bits to a signal vector denoted by

\[ d(t) = [d_1(t), d_2(t), d_3(t), d_4(t)]^T. \]

The codeword $d(t)$ is taken from the set

\[ S = \{ s_i \mid 0 \le i \le 15 \}, \]

which contains $2^4 = 16$ elements. The length of every element $s_i \in S$ should be unity, $s_i^H s_i = 1$, so that the length of $x(t)$ can be kept fixed. There may be several possible mappings to construct set $S$. We choose the set to be an obvious mapping from $[1, 1, 1, 1]^T$ to the column space of any $X(t)$. The elements of the set are listed as follows:

\[ \begin{aligned} s_0 &= [\,1,\ 0,\ 0,\ 0\,]^T, & s_1 &= [\,.5,\ -.5,\ -.5,\ -.5\,]^T, \\ s_2 &= [\,.5,\ .5,\ .5,\ -.5\,]^T, & s_3 &= [\,0,\ 0,\ 0,\ -1\,]^T, \\ s_4 &= [\,.5,\ -.5,\ .5,\ .5\,]^T, & s_5 &= [\,0,\ -1,\ 0,\ 0\,]^T, \\ s_6 &= [\,0,\ 0,\ 1,\ 0\,]^T, & s_7 &= [\,-.5,\ -.5,\ .5,\ -.5\,]^T, \\ s_8 &= [\,.5,\ .5,\ -.5,\ .5\,]^T, & s_9 &= [\,0,\ 0,\ -1,\ 0\,]^T, \\ s_{10} &= [\,0,\ 1,\ 0,\ 0\,]^T, & s_{11} &= [\,-.5,\ .5,\ -.5,\ -.5\,]^T, \\ s_{12} &= [\,0,\ 0,\ 0,\ 1\,]^T, & s_{13} &= [\,-.5,\ -.5,\ -.5,\ .5\,]^T, \\ s_{14} &= [\,-.5,\ .5,\ .5,\ .5\,]^T, & s_{15} &= [\,-1,\ 0,\ 0,\ 0\,]^T. \end{aligned} \]

After mapping a codeword, differential modulation is obtained by

\[ x(t) = X(t-1)\, d(t). \]

Since the vector $d(t) \in S$ has unit length, the length of the vector $x(t)$ is fixed. This can be easily verified by substituting the above relationship into (4.1), i.e.

\[ x(t)^H x(t) = d(t)^H X(t-1)^H X(t-1)\, d(t) = N, \]

which guarantees the fixed length property of $x(t)$ as in (4.2). The construction of the columns of the code matrix $X(t)$ requires some permutation and re-arrangement of the elements in vector $x(t)$.
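The unit-length requirement on the elements of $S$ can be checked directly. The sketch below transcribes the 16 vectors listed above into a Python list (the variable names are illustrative) and also adds a nearest-codeword lookup of the kind a detector would need.

```python
S = [
    (1.0, 0.0, 0.0, 0.0), (0.5, -0.5, -0.5, -0.5),
    (0.5, 0.5, 0.5, -0.5), (0.0, 0.0, 0.0, -1.0),
    (0.5, -0.5, 0.5, 0.5), (0.0, -1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0), (-0.5, -0.5, 0.5, -0.5),
    (0.5, 0.5, -0.5, 0.5), (0.0, 0.0, -1.0, 0.0),
    (0.0, 1.0, 0.0, 0.0), (-0.5, 0.5, -0.5, -0.5),
    (0.0, 0.0, 0.0, 1.0), (-0.5, -0.5, -0.5, 0.5),
    (-0.5, 0.5, 0.5, 0.5), (-1.0, 0.0, 0.0, 0.0),
]

def squared_length(v):
    return sum(c * c for c in v)

def nearest_codeword(d_hat):
    # Index of the element of S closest to a (possibly complex) estimate d_hat.
    return min(range(len(S)),
               key=lambda i: sum(abs(a - b) ** 2 for a, b in zip(S[i], d_hat)))

lengths = [squared_length(s) for s in S]
all_distinct = len(set(S)) == len(S)
# In this listing the codeword for index 15 - i is the negative of index i.
antipodal = all(tuple(-c for c in S[i]) == S[15 - i] for i in range(16))
```

Every squared length equals 1, so $x(t) = X(t-1)d(t)$ keeps the fixed length required by (4.2) whichever codeword is chosen.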
Let us explicitly denote the code vector by

\[ x(t) = [x_1(t), x_2(t), x_3(t), x_4(t)]^T. \]

Furthermore, we define four permutation operations $\pi_i(\cdot)$, $i = 1, 2, 3, 4$, as

\[ \begin{aligned} \pi_1(x(t)) &= [x_1(t),\ x_2(t),\ x_3(t),\ x_4(t)]^T, \\ \pi_2(x(t)) &= [-x_2(t),\ x_1(t),\ -x_4(t),\ x_3(t)]^T, \\ \pi_3(x(t)) &= [-x_3(t),\ x_4(t),\ x_1(t),\ -x_2(t)]^T, \\ \pi_4(x(t)) &= [-x_4(t),\ -x_3(t),\ x_2(t),\ x_1(t)]^T. \end{aligned} \]

Then, code matrix $X(t)$ is composed of these column vectors as

\[ X(t) = [\pi_1(x(t)) \mid \pi_2(x(t)) \mid \pi_3(x(t)) \mid \pi_4(x(t))] = \begin{bmatrix} x_1(t) & -x_2(t) & -x_3(t) & -x_4(t) \\ x_2(t) & x_1(t) & x_4(t) & -x_3(t) \\ x_3(t) & -x_4(t) & x_1(t) & x_2(t) \\ x_4(t) & x_3(t) & -x_2(t) & x_1(t) \end{bmatrix}. \tag{4.3} \]

Note that the elements in the above matrix are restricted to be real, since it was proved in [40] that orthogonal code matrices of a size greater than $2 \times 2$ cannot be complex. This means that only PAM is needed. Since we have chosen $d(t)$ to be real, matrix $X(t)$ is also real with real initialization values.

As for the second layer, we map the codeword into 8 symbols $e_1(t), e_2(t), e_3(t), \ldots, e_8(t)$. Unlike the first layer, these symbols can be complex numbers. The information matrix for the second layer is encoded by using the following equation:

\[ E(t) = \begin{bmatrix} e_1(t) & e_2(t)^* & e_5(t) & e_6(t)^* \\ e_2(t) & -e_1(t)^* & e_6(t) & -e_5(t)^* \\ e_3(t) & e_4(t)^* & e_7(t) & e_8(t)^* \\ e_4(t) & -e_3(t)^* & e_8(t) & -e_7(t)^* \end{bmatrix}. \]

One can easily see that the above construction is a concatenation of two codewords for the second layer signals as described in the non-differential scheme. This construction can only be detected by a receiver with at least two antennas. In this construction, we also fix the amplitude of each element in $E(t)$ to be $\rho$. The actual codeword to be transmitted for the second layer is

\[ Y(t) = X(t) E(t). \]

The sum of the two matrices,

\[ Z(t) = X(t) + Y(t) = X(t) + X(t) E(t), \]

is then transmitted through the 4 antennas.
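The construction in (4.3) and the two-layer sum $Z(t) = X(t) + X(t)E(t)$ can be sketched as follows. The initial vector $[1, 1, 1, 1]^T$, the amplitude $\rho = 0.3$, the particular second-layer symbols, and the helper names are assumptions chosen for illustration.

```python
def code_matrix(x):
    # Equation (4.3): columns are pi_1(x), pi_2(x), pi_3(x), pi_4(x).
    x1, x2, x3, x4 = x
    return [[x1, -x2, -x3, -x4],
            [x2,  x1,  x4, -x3],
            [x3, -x4,  x1,  x2],
            [x4,  x3, -x2,  x1]]

def second_layer_matrix(e):
    # Information matrix E(t) built from eight second-layer symbols.
    c = [v.conjugate() for v in e]
    return [[e[0],  c[1], e[4],  c[5]],
            [e[1], -c[0], e[5], -c[4]],
            [e[2],  c[3], e[6],  c[7]],
            [e[3], -c[2], e[7], -c[6]]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(M):
    return [list(col) for col in zip(*M)]

X_prev = code_matrix([1, 1, 1, 1])   # assumed real initialization
d = [0.5, -0.5, -0.5, -0.5]          # s_1 from the set S
x = [sum(X_prev[i][j] * d[j] for j in range(4)) for i in range(4)]
X = code_matrix(x)
gram = matmul(transpose(X), X)       # expected to equal N * I with N = 4

rho = 0.3
e = [rho * s for s in (1j, -1j, 1j, 1j, -1j, -1j, 1j, -1j)]
E = second_layer_matrix(e)
Y = matmul(X, E)
Z = [[X[i][j] + Y[i][j] for j in range(4)] for i in range(4)]
```

The Gram matrix of $X(t)$ is $N$ times the identity for any real $x(t)$ of squared length $N$, which is the property that the differential receiver relies on.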
Now, let us examine the decoding of the first layer information. Consider that four consecutive symbols $r_1(t)$, $r_2(t)$, $r_3(t)$ and $r_4(t)$ are received by a terminal with one antenna. For any receiving antenna, let $h_i(t)$ denote the channel coefficient from the $i$th transmit antenna ($i = 1, 2, 3, 4$), and form the vector

\[ h(t) = [h_1(t), h_2(t), h_3(t), h_4(t)]^T. \]

Considering the first layer alone, the signal received at the $k$th time slot can be expressed as

\[ r_k(t) = h(t)^T \pi_k(x(t)). \]

Let

\[ r(t) = [r_1(t), r_2(t), r_3(t), r_4(t)]^T. \]

It can be shown that

\[ r(t) = Z(t)^T h(t) + n(t) = X(t)^T h(t) + Y(t)^T h(t) + n(t). \]

By expanding the term $X(t)^T h(t)$, one can easily verify that

\[ r(t) = H(t)\, x(t) + r_Y(t) + n(t) = H(t)\, X(t-1)\, d(t) + r_Y(t) + n(t), \]

where

\[ H(t) = [\pi_1(h(t)) \mid -\pi_2(h(t)) \mid -\pi_3(h(t)) \mid -\pi_4(h(t))]^T = \begin{bmatrix} h_1(t) & h_2(t) & h_3(t) & h_4(t) \\ h_2(t) & -h_1(t) & h_4(t) & -h_3(t) \\ h_3(t) & -h_4(t) & -h_1(t) & h_2(t) \\ h_4(t) & h_3(t) & -h_2(t) & -h_1(t) \end{bmatrix} \]

is the coefficient matrix and the term $r_Y(t)$ denotes the signal generated from the second layer code matrix $Y$. Thus, we have

\[ r(t) = R(t-1)\, d(t) + r_Y(t) + n(t), \tag{4.4} \]

where the received signal matrix $R(t-1)$ is the delayed version of

\[ R(t) = [\pi_1(r(t)) \mid -\pi_2(r(t)) \mid -\pi_3(r(t)) \mid -\pi_4(r(t))]^T = \begin{bmatrix} r_1(t) & r_2(t) & r_3(t) & r_4(t) \\ r_2(t) & -r_1(t) & r_4(t) & -r_3(t) \\ r_3(t) & -r_4(t) & -r_1(t) & r_2(t) \\ r_4(t) & r_3(t) & -r_2(t) & -r_1(t) \end{bmatrix}. \]

Figure 4.3: The receiver with two antennas for differential detection. (Blocks: Demod, Delay, Weighting Coefficient Computing, Combiner, ML Detector, Reconstructor, MMSE Estimator, ML Detector; outputs: Layer 1, Layer 2.)

A simple combining rule can be derived by multiplying the received signal vector with the inverse of the matrix $R(t-1)$ to find the estimate of $d(t)$. That is,

\[ \hat{d}(t) = R(t-1)^{-1} r(t). \]
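The relation (4.4) and the simple inverse combining rule can be checked with a noise-free sketch that omits the second layer. It assumes that the channel stays constant over two consecutive blocks and that the encoder starts from $[1, 1, 1, 1]^T$. The inverse is taken through the identity $R R^T = (\sum_k r_k^2) I$, which holds for this particular sign pattern; that shortcut is an observation of the sketch rather than a step stated in the text.

```python
import random

def code_matrix(x):
    x1, x2, x3, x4 = x
    return [[x1, -x2, -x3, -x4],
            [x2,  x1,  x4, -x3],
            [x3, -x4,  x1,  x2],
            [x4,  x3, -x2,  x1]]

def received_matrix(r):
    # Same sign pattern as H(t), with received samples in place of channel gains.
    r1, r2, r3, r4 = r
    return [[r1,  r2,  r3,  r4],
            [r2, -r1,  r4, -r3],
            [r3, -r4, -r1,  r2],
            [r4,  r3, -r2, -r1]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]

def transpose(M):
    return [list(col) for col in zip(*M)]

rng = random.Random(5)
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]

X_prev = code_matrix([1, 1, 1, 1])      # assumed initial code matrix X(t-1)
d = [0.5, -0.5, 0.5, 0.5]               # s_4 from the set S
X = code_matrix(matvec(X_prev, d))      # x(t) = X(t-1) d(t), then (4.3)

r_prev = matvec(transpose(X_prev), h)   # r(t-1) = X(t-1)^T h
r = matvec(transpose(X), h)             # r(t)   = X(t)^T h
R_prev = received_matrix(r_prev)
predicted = matvec(R_prev, d)           # right-hand side of (4.4) without r_Y, n

scale = sum(v * v for v in r_prev)      # plain squares, no conjugation
d_hat = [v / scale for v in matvec(transpose(R_prev), r)]
```

Here `predicted` reproduces `r` and `d_hat` reproduces `d` without any knowledge of `h`, which is the point of the differential design.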
However, the performance of this approach is not acceptable due to the interference from the second layer signal $r_Y(t)$. Instead, we adopt the MMSE method to suppress the interference. By approximating the second layer interference as additive Gaussian noise, the estimate of the information signal $d$ is computed from

\[ \hat{d}(t) = W^H(t-1)\, r(t), \tag{4.5} \]

where

\[ W(t) = \big( R(t) R(t)^H + \rho^2 I_4 \big)^{-1} R(t), \]

and where $\rho$ is the amplitude of the components in the second layer code matrix.

For a terminal with two receiving antennas, the above procedure is performed separately for each of the two antennas, and the average of the two results is used as the estimate of the base layer signal. Let us use $r^{(i)}(t)$, $H^{(i)}(t)$ and $\hat{d}^{(i)}(t)$ to denote the received signal, the channel coefficient matrix, and the estimated symbols computed from the $i$th receiving antenna, respectively. The first layer symbol is simply estimated as

\[ \hat{d}(t) = \big( \hat{d}^{(1)}(t) + \hat{d}^{(2)}(t) \big) / 2. \]

To decode the first layer, we find the nearest codeword

\[ \tilde{d}(t) = \arg\min_{s \in S} | s - \hat{d}(t) |^2. \]

To further decode the second layer, we first reconstruct the base layer signal and then subtract it from the received signal:

\[ \begin{aligned} \Delta r^{(1)}(t) &= r^{(1)}(t) - R^{(1)}(t-1)\, \tilde{d}(t), \\ \Delta r^{(2)}(t) &= r^{(2)}(t) - R^{(2)}(t-1)\, \tilde{d}(t), \end{aligned} \]

where $R^{(1)}(t)$ and $R^{(2)}(t)$ are the signal matrices constructed from the two receiving antennas, respectively. Let $r_j^{(i)}(t)$ denote the $j$th signal at the $i$th antenna. Similarly, let $\Delta r_j^{(i)}(t)$ denote the residual $j$th signal at the $i$th antenna. It can be shown that, if the first layer is correctly decoded, we have

\[ \Delta R(t) \approx \Theta(t)\, E_d(t) + N(t), \]

where $\Delta R(t)$ is assembled from the two residual vectors $\Delta r^{(1)}(t)$ and $\Delta r^{(2)}(t)$ as

\[ \Delta R(t) = \begin{bmatrix} \Delta r_1^{(1)}(t) & \Delta r_3^{(1)}(t) \\ \Delta r_2^{(1)}(t)^* & \Delta r_4^{(1)}(t)^* \\ \Delta r_1^{(2)}(t) & \Delta r_3^{(2)}(t) \\ \Delta r_2^{(2)}(t)^* & \Delta r_4^{(2)}(t)^* \end{bmatrix}, \]

\[ \Theta(t) = \begin{bmatrix} r_1^{(1)}(t) & r_2^{(1)}(t) & r_3^{(1)}(t) & r_4^{(1)}(t) \\ -r_2^{(1)}(t)^* & r_1^{(1)}(t)^* & -r_4^{(1)}(t)^* & r_3^{(1)}(t)^* \\ r_1^{(2)}(t) & r_2^{(2)}(t) & r_3^{(2)}(t) & r_4^{(2)}(t) \\ -r_2^{(2)}(t)^* & r_1^{(2)}(t)^* & -r_4^{(2)}(t)^* & r_3^{(2)}(t)^* \end{bmatrix}, \qquad E_d(t) = \begin{bmatrix} e_1(t) & e_5(t) \\ e_2(t) & e_6(t) \\ e_3(t) & e_7(t) \\ e_4(t) & e_8(t) \end{bmatrix}, \]

and $N(t)$ is a 4 by 2 matrix whose components are AWGN signals. By applying MMSE again, we can get estimates of $e_i(t)$ so that the nearest signal can be detected via

\[ \hat{E}_d(t) = W_E^H(t-1)\, \Delta R(t), \tag{4.6} \]

where

\[ W_E(t) = \big( \Theta(t) \Theta(t)^H + \Gamma^{-1} I_4 \big)^{-1} \Theta(t), \]

and $\Gamma$ is the SNR value of the first layer signal.

Following Figure 4.1, the differential encoding of embedded STC is summarized below.

Step 1: Map the base layer information into vector $d(t)$.
Step 2: Compute $x(t) = X(t-1)\, d(t)$.
Step 3: Construct $X(t)$ from Eq. (4.3).
Step 4: Map the second layer into matrix $E(t)$.
Step 5: Compute $Y(t) = X(t) E(t)$.
Step 6: Send $Z(t) = X(t) + Y(t)$.
Step 7: Update $t$ to $t + 1$ and go back to Step 1.

Furthermore, following Figure 4.3, the decoding at the receiver with two antennas is summarized as follows.

Step 1: Estimate the information symbols $\hat{d}(t)$ from Eq. (4.5).
Step 2: Find the nearest vector in $S$.
Step 3: Reconstruct the first layer via $R_X(t) = R_X(t-1)\, \tilde{d}(t)$.
Step 4: Find the residual $\Delta R(t)$.
Step 5: Find the MMSE estimate of $E(t)$ from Eq. (4.6).
Step 6: Decode by finding the nearest values for the second layer.
Step 7: Update $t$ to $t + 1$ and go back to Step 1.

Figure 4.4: The two-antenna receiver with Kalman filtering. (Blocks: Demod, Decoder, Reconstructor, Encoder, Kalman Filter, Decoder; outputs: Layer 1, Layer 2.)

4.3 Differential Scheme with Kalman Filtering

The performance of differential STC can degrade considerably in a time-selective fading environment. This is especially true for the second layer signals, since they have less power and need to be estimated more accurately.
In this section, we examine a channel tracking mechanism designed for a terminal with two receiving antennas, while the one-antenna terminal can still decode base-layer signals differentially with a lower complexity as before.

Liu et al. [27] proposed a Kalman filtering scheme for the STC system with 2 transmit antennas. With the assistance of Kalman filtering, the time-varying channel status can be tracked with decision feedback. In this section, we propose a new scheme that combines differential STC with channel tracking. At the encoder, the first layer still adopts differential encoding, which allows the low-end terminal with a single antenna to receive the base layer signal without much computation. The Kalman filtering mechanism is adopted by the high-end terminal with two antennas.

Let us consider a system consisting of a 4-antenna transmitter for embedded space-time codes as before. Assume that the initial channel coefficients can be trained by the pilot signal of the Kalman filter [17] in the start-up phase. The first-order autoregressive (AR) model for the time-selective fading channel is adopted to describe the channel variation. That is, the channel coefficient $h_i^{(k)}(t)$ of the $i$th path received by the $k$th antenna varies according to

\[ h_i^{(k)}(t) = a\, h_i^{(k)}(t-1) + v_i^{(k)}(t), \]

where $v_i^{(k)}(t)$ is a zero-mean complex Gaussian variable with variance $\sigma_v^2$, and $a$ is the correlation factor, which can be derived easily from the expectation over channel realizations

\[ a = E[h_i(t)\, h_i^*(t-1)]. \]

At the encoder, the base-layer code $X(t)$ is obtained with the differential code as described before.
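The first-order AR channel model introduced above can be simulated for a single path as in the sketch below. The sketch assumes a real correlation factor and picks the driving-noise variance as $1 - a^2$ so that the coefficient keeps unit average power; that choice, the seed, and the function names are assumptions rather than values given in the text.

```python
import math
import random

def simulate_ar1(a, steps, seed=0):
    # h(t) = a * h(t-1) + v(t), started from a unit-power complex Gaussian sample.
    rng = random.Random(seed)
    s0 = math.sqrt(0.5)                 # per-dimension std of h(0)
    sv = math.sqrt((1 - a * a) / 2)     # per-dimension std of v(t) (assumption)
    h = complex(rng.gauss(0, s0), rng.gauss(0, s0))
    path = [h]
    for _ in range(steps):
        v = complex(rng.gauss(0, sv), rng.gauss(0, sv))
        h = a * h + v
        path.append(h)
    return path

def estimate_correlation(path):
    # Normalized sample estimate of E[h(t) h*(t-1)] / E|h(t-1)|^2.
    num = sum(path[t] * path[t - 1].conjugate() for t in range(1, len(path)))
    den = sum(abs(path[t - 1]) ** 2 for t in range(1, len(path)))
    return (num / den).real

path = simulate_ar1(a=0.95, steps=20000)
a_hat = estimate_correlation(path)
```

Estimating the correlation factor from simulated or recorded channel realizations in this way mirrors the off-line computation of $a$ that a tracking filter would need before it can predict the next channel state.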
The second layer code $Y(t)$ is then simply encoded by using the following formula:

\[ Y(t) = \begin{bmatrix} e_1(t) & e_2(t)^* & e_5(t) & e_6(t)^* \\ e_2(t) & -e_1(t)^* & e_6(t) & -e_5(t)^* \\ e_3(t) & e_4(t)^* & e_7(t) & e_8(t)^* \\ e_4(t) & -e_3(t)^* & e_8(t) & -e_7(t)^* \end{bmatrix}. \]

The sum of the two layer codes, $Z(t) = X(t) + Y(t)$, is transmitted through the antennas.

For the receiver with one antenna, only the first layer can be decoded, and conventional differential STC can be adopted directly. For the receiver with two antennas, the base layer can also be estimated by the conventional differential coding system. To further decode the second layer, we use Kalman filtering. Let the state vector of the Kalman filter be the collection of channel coefficients, i.e.

\[ \chi(t) = [h_1^{(1)}(t), h_2^{(1)}(t), h_3^{(1)}(t), h_4^{(1)}(t), h_1^{(2)}(t), h_2^{(2)}(t), h_3^{(2)}(t), h_4^{(2)}(t)]^T. \]

The observation vector can be naturally chosen to be the received signals from the antennas:

\[ \zeta(t) = [r_1^{(1)}(t), r_2^{(1)}(t), r_3^{(1)}(t), r_4^{(1)}(t), r_1^{(2)}(t), r_2^{(2)}(t), r_3^{(2)}(t), r_4^{(2)}(t)]^T. \]

We obtain the state equation

\[ \chi(t) = A\, \chi(t-1) + v(t), \tag{4.7} \]

where $A = a I_8$ is an 8 by 8 diagonal matrix, and the measurement equation is

\[ \zeta(t) = S(t)\, \chi(t) + n(t), \tag{4.8} \]

where

\[ S(t) = \begin{bmatrix} Z(t)^T & 0_4 \\ 0_4 & Z(t)^T \end{bmatrix}, \]

and where $Z(t) = X(t) + Y(t)$ is the 4 by 4 transmitted code matrix and $0_4$ is a zero matrix of size 4 by 4. The value of $X(t)$ is taken from the differentially decoded signal of the first layer, which does not require knowledge of the channel coefficients. The second layer code matrix $Y(t)$ can be estimated by the MMSE algorithm from the coarsely predicted channel coefficients

\[ \hat{\chi}_p(t) = A\, \hat{\chi}(t-1). \tag{4.9} \]

With the observation matrix built from $Z(t)$ available, $\hat{\chi}(t)$ can be obtained by using standard Kalman filtering [29].

The updated value $\hat{\chi}(t)$ can be further utilized to refine the decoding. At this pass, since we have an estimate of the second layer information $Y(t)$, it should be
reconstructed and subtracted from the received signal to help decode the first layer more accurately. This residual signal can be expressed as

\[ \Delta\zeta_x(t) = \zeta(t) - \begin{bmatrix} Y(t)^T & 0_4 \\ 0_4 & Y(t)^T \end{bmatrix} \hat{\chi}(t), \tag{4.10} \]

which can be used to decode the first layer and obtain a new $X(t)$ by differential decoding. Again, to update the second layer signal, the first layer has to be removed. That is, the signal

\[ \Delta\zeta_y(t) = \zeta(t) - \begin{bmatrix} X(t)^T & 0_4 \\ 0_4 & X(t)^T \end{bmatrix} \hat{\chi}(t) \tag{4.11} \]

has to be used as the input for the second layer decoding. The filtered channel states $\hat{\chi}(t)$, along with $\Delta\zeta_x(t)$ and $\Delta\zeta_y(t)$, can be used for decoding in this phase. More iterations over these three vectors can be performed to achieve better estimation whenever necessary.

We summarize the encoding and the decoding procedures below.

Encoding:
Step 1: Map information into vector $d(t)$.
Step 2: Compute $x(t) = X(t-1)\, d(t)$ and construct $X(t)$.
Step 3: Map the second layer information into matrix $E(t)$.
Step 4: Send $Z(t) = X(t) + E(t)$.
Step 5: Update $t$ to $t + 1$ and return to Step 1.

Decoding:
Step 0: For $t = 0$, initialize the Kalman filter by the training sequence.
Step 1: Estimate the first layer signal $X(t)$ by differential decoding.
Step 2: Obtain the prediction $\hat{\chi}_p(t)$ from (4.9).
Step 3: Compute the residual signal $\Delta r_y$.
Step 4: Detect the second layer signal $Y(t)$ from $\Delta r_y$.
Step 5: Reconstruct matrix $Z(t) = X(t) + Y(t)$.
Step 6: Perform Kalman filtering to update $h(t)$.
Step 7: Decode $X$ from $\Delta\zeta_x(t)$ in Eq. (4.10).
Step 8: Decode $Y$ from $\Delta\zeta_y(t)$ in Eq. (4.11).
Step 9: Iterate Steps 6-8 more times to improve the tracking performance.
Step 10: Update time $t$ to $t + 1$ and return to Step 1.

4.4 Experimental Results

4.4.1 Differential Detection

In this subsection, we performed computer simulations on the proposed detection scheme for embedded differential STC as described in Section 4.2.
In the transmitter, two layers of data are transmitted with four antennas. For a four-consecutive-symbol duration, four symbols are sent at the first layer, and eight symbols at the second layer. The mapping of the first layer is from 4 bits onto the space S. The second layer modulation is simply BPSK in the quadrature phase with values $\{+i, -i\}$. All paths experience Rayleigh fading. The receiver requires two antennas to retrieve both layers, while the one-antenna receiver can reconstruct only the first layer. Suppose that channel coefficients do not change during one block duration. The second layer has only the imaginary part while the first layer has only the real part.

The channel model has a slow fading coefficient rate set to $2.5 \times 10^{-4}$. The simulation shows that the uncertainty of the channel status and noise can deteriorate the performance severely, since errors tend to accumulate as differential detection is performed. Hence, we insert a re-synchronization symbol every 80 symbols so that the receiver can adjust decoded signals to prevent error propagation.

Figure 4.5 shows simulation results of decoded bit error rates for the base layer in a terminal with only one receiving antenna at different p and SNR values. Compared with those in Figure 3.6 for the non-differential scheme, the performance curves of the differential scheme have a higher 'floor' when the p value is the same, and the saturation SNR point is lower. This means that the interference effect is more severe in the differential case. Figs. 4.6(a) and (b) show the corresponding results of layers 1 and 2, respectively, in the 2-antenna terminal for the differential scheme. As expected, the performance of the first layer in this case is better than that of the one-antenna system since it exploits more receive diversity.
The overall performance is slightly worse than that of the non-differential scheme.

Figure 4.5: The bit-error rate of the first layer for the one-antenna receiver with differential detection. (Panel title: BER of one-antenna terminal; horizontal axis: SNR (dB); one curve per p value.)

4.4.2 Kalman Filtering Tracking

In this subsection, we simulated the proposed differential detection scheme for embedded differential STC with channel tracking via Kalman filtering as described in Section 4.3. The setting of the simulation was the same as that in Section 4.4.1. Again, the channel was modelled as a slow fading one with the coefficient rate set to $2.5 \times 10^{-4}$. Resync words were inserted as before. They also serve as the pilot signal for Kalman filter tracking. We computed the parameter $a$ off-line based on our channel realizations. It was observed in our experiments that more iterations in updating coefficients and decoding did not improve the performance much. Hence, we show results with no extra iterations.

Figure 4.7 shows simulation results of decoded bit error rates for the base layer in a terminal with only one receiving antenna at different p and SNR values. Comparing them with the results of the pure differential scheme, we see that the first layer has almost the same performance for these two one-antenna receiving systems. This is easily understood since the interference level from the second layer is almost identical.

The results for the two-antenna receiver are shown in Figures 4.8 (a) and (b). We see that the BER floor for each p value has been reduced to about one half. A similar result holds for the second layer.
This is due to the tracking effect of the Kalman filter, which reduces the interference between the two layers.

These results indicate that the proposed system with Kalman filtering can improve the differential scheme at the cost of a more complex two-antenna receiver. However, the receiver with only one antenna can still get the same quality without increasing its complexity. This is suitable for low-end receivers such as handsets and wireless PDAs.

4.5 Conclusion

The differential detection scheme has been developed for embedded STC in the situation where the channel state information is unknown. A tracking mechanism using the Kalman filter was integrated into a higher-end terminal to estimate channel states, and a sequence of numerical experiments was conducted to verify the performance of the proposed system.

To reduce the error probability further, some error correction codes should be incorporated. Another advantage of error correction codes is that they can be integrated with recursive interference cancellation to improve the system performance, especially for the differential scheme. Besides, iterative updating in channel tracking can be incorporated to improve the overall performance. These research topics will be studied in the near future.

Figure 4.6: The bit-error rates of (a) the first layer and (b) the second layer for the two-antenna receiver with differential detection. (Panel titles: BER of two-antenna terminal, layer 1 and layer 2; horizontal axis: SNR (dB); one curve per p value.)
Figure 4.7: The bit-error rate of the first layer for the one-antenna receiver with differential detection and Kalman filtering. (Panel title: BER of one-antenna terminal; horizontal axis: SNR (dB); one curve per p value.)

Figure 4.8: The bit-error rates of (a) the first layer and (b) the second layer for the two-antenna receiver with differential detection and Kalman filtering. (Panel titles: BER of two-antenna terminal, layer 1 and layer 2; horizontal axis: SNR (dB); one curve per p value.)

Chapter 5

Fast Variable Block Size Motion Search for H.264 Encoding

5.1 Introduction

Since the resource of wireless channels is scarce for the transmission of multimedia contents, a significant amount of effort has been made to compress video data more efficiently in the last two decades. H.264 is the latest video coding standard jointly developed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC MPEG [37]. It is widely regarded as the state-of-the-art video coding standard since it is able to save up to 50% in the file size (or the bandwidth consumption) while delivering the same visual quality as compared to existing standards. Besides, it provides tools to deal with the packet loss and/or the bit errors caused by error-prone wireless channels and networks. The excellent coding gain of H.264 is achieved at the expense of high encoder complexity. How to reduce the complexity while preserving good coding performance is an interesting research problem.
In this chapter, we will investigate methods to reduce the complexity of the H.264 encoder with special attention to fast variable block-size motion search.

H.264 is a block-based motion-compensated predictive coder. It is similar to its prior standards in the general framework, yet with improvements in some coding modules such as variable block-size motion estimation, intra-frame prediction, the in-loop deblocking filter, and context-adaptive arithmetic coding. The main reason for H.264 to outperform other standards is that it allows the choice among multiple modes in several coding components as long as this freedom can provide a substantial coding gain. However, this freedom also implies more computation since one has to select the best mode among all possible modes to give the best rate-distortion tradeoff.

For inter-frame prediction, H.264 allows blocks of variable sizes and shapes. To be more specific, seven modes of different sizes and shapes, i.e. 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4, are supported in H.264. Smaller blocks, which are intended to better characterize the motion behavior of a region, can reduce the prediction error and provide better visual quality due to the less visible blocking artifact. Besides variable block sizes and shapes, the use of higher precision in the motion vector representation also improves the coding gain. H.264 supports 1/4-pel motion accuracy. In addition, H.264 allows the use of multiple reference frames, which is useful in dealing with periodic motion in the sequence. To achieve the highest coding gain, an optimized H.264 encoder will estimate the best motion vector (MV) by searching among multiple reference frames and multiple block modes with 1/4-pel motion vector precision.
With all these modes in place, the computational complexity of motion estimation increases dramatically as compared with previous standards. This is one major bottleneck for the H.264 encoder.

Let us briefly review motion vector search algorithms. The full search algorithm checks every displacement inside the designated search window. It is the most straightforward way to find the optimal motion vector. Many sub-optimal algorithms have been proposed to reduce the number of block matching operations in the search process. Some fast search algorithms were developed in the past under the assumption of a unimodal error surface. They include the three-step search (3SS) [18], the new three-step search (N3SS) [21], the four-step search (4SS) [33], the block-based gradient descent search (BBGS) [25] and the diamond search [42]. In these algorithms, the search process is divided into several steps, where several possible displacements are checked at each step and the one with the minimum SAD is picked as the center for the search at the next step. Although the search speed can be improved, they may result in significant quality degradation in comparison with the full search scheme at the same bit rate.

A better strategy to achieve fast search is to predict a good initial displacement of the motion vector, and perform early termination with reliable stopping criteria to avoid unnecessary block matching. Chalidabhongse and Kuo [4] first investigated a fast search algorithm that predicts initial motion vectors from neighboring MBs in both multiresolution and spatial-temporal dimensions. Although the scanning order of macroblocks may be non-causal, their proposed algorithm can speed up the search by a factor of 100 to 300 with little quality degradation due to better prediction of the initial motion vector. Recently, two fast motion estimation algorithms, i.e.
MVFAST (the Motion Vector Field Adaptive Search Technique) [43] and PMVFAST (the Predictive Motion Vector Field Adaptive Search Technique) [46], were adopted by MPEG-4 Part-7 as an optimization model [31]. They both attempted to exploit more correlations from spatial-temporal neighboring macroblocks. The basic ideas of these two algorithms are stated below.

1. Select initial MV predictors from spatially and temporally adjacent blocks to perform the diamond search (DS).
2. Adaptively choose small or large diamonds as the search pattern based on the local motion activity.
3. Apply the early termination principle to avoid inefficient SAD matching operations.

These two algorithms provide a significant improvement over traditional fast search algorithms in terms of visual quality and complexity reduction. The visual quality obtained from MVFAST and PMVFAST is very similar to that obtained by the full search scheme. However, they consider motion search for a fixed-size block only.

Since H.264 allows motion estimation and compensation with variable block sizes and shapes, some research effort has to be made to enhance the efficiency of the variable block-size motion search. In this work, we not only provide a fast motion search algorithm but also introduce a fast mode selection technique so that the best motion candidate can be found without searching all modes exhaustively. Our algorithm can achieve a speed-up of up to 120 times over the full search algorithm, given a good implementation.

The rest of this chapter is organized as follows. An overview of H.264 motion estimation is given in Section 5.2. The proposed fast searching algorithm is described in Section 5.3. Experimental results are presented in Section 5.4. Finally, concluding remarks are given in Section 5.5.
5.2 Overview of H.264 Motion Estimation

5.2.1 Tree-structured Motion Estimation

Similar to previous standards, H.264 uses block-based motion-compensated prediction to reduce temporal redundancy between successive frames. The difference is that H.264 supports a set of different block sizes and shapes, i.e. 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4, while most prior standards allow only the block of size 16x16 (called the macroblock) for motion compensation. For better adaptation to motion details, H.264 adopts a tree-based decomposition to partition a macroblock (MB) into smaller sub-blocks of specified sizes. For example, one MB of size 16x16 may be kept as is, decomposed into 2 rectangular blocks of size 8x16 or 16x8, or decomposed into 4 square blocks of size 8x8. If the last case is chosen (i.e. four 8x8 blocks), each of the four 8x8 blocks can be further split to result in more sub-macroblocks. There are 4 choices again, i.e. 8x8, 8x4, 4x8, 4x4. These partitions result in a large number of possible block decompositions for each MB. An example of an MB with tree decomposition is shown in Fig. 5.1.

Figure 5.1: Partitioning of a MB with tree-structured decomposition. (The example shows 8x8, 8x4, 4x8 and 4x4 blocks within one MB.)

If an MB is divided, each sub-macroblock inside the MB requires a separate motion vector. For example, if an MB is coded using Inter-8x8, and each 8x8 sub-macroblock is coded using Inter-4x4, 16 motion vectors will be transmitted for this MB. All motion vectors as well as the partition information should be coded and transmitted.
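As a small illustration of the motion vector count in the example above, the following Python sketch tallies the motion vectors implied by a given tree partition. The function name, the string labels for the modes, and the assumption of exactly one motion vector per block (a single reference frame) are ours and are introduced only for illustration.

```python
# Hypothetical helper: counts motion vectors for one macroblock partition,
# assuming one motion vector per block (single reference frame).
MB_MODES = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4}   # blocks per macroblock
SUB_MODES = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}      # blocks per 8x8 sub-macroblock


def count_motion_vectors(mb_mode, sub_modes=None):
    """Return the number of motion vectors for one macroblock partition.

    mb_mode   -- first-level mode: "16x16", "16x8", "8x16" or "8x8"
    sub_modes -- for mb_mode == "8x8", a list of four second-level modes,
                 one per 8x8 sub-macroblock
    """
    if mb_mode != "8x8":
        return MB_MODES[mb_mode]
    if sub_modes is None or len(sub_modes) != 4:
        raise ValueError("an 8x8 split needs one sub-mode per 8x8 block")
    return sum(SUB_MODES[s] for s in sub_modes)


print(count_motion_vectors("16x16"))                               # 1
print(count_motion_vectors("8x8", ["4x4"] * 4))                    # 16
print(count_motion_vectors("8x8", ["8x8", "8x4", "4x8", "4x4"]))   # 9
```

The second call reproduces the case in the text: an Inter-8x8 macroblock whose four sub-macroblocks are all coded as Inter-4x4 carries 16 motion vectors.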
In general, a partition with larger block sizes requires fewer bits to represent the associated motion vectors and the partition type, but it may need more bits to encode motion compensation residuals if encoded areas contain high motion details. On the contrary, a partition of smaller block sizes may give smaller residuals after motion compensation but requires more bits to represent motion vectors and partition types. Thus, optimal tree-structured motion compensation amounts to finding the combination of partition block sizes that minimizes the final coded bits for each macroblock.

5.2.2 Motion Estimation Accuracy and Multiple Reference Frames

In addition to variable block sizes, the resolution of the motion vector in H.264 is further refined to a quarter of the pixel distance. The quarter-pel accuracy outperforms the integer or the half-pel accuracy at the expense of an increased complexity. The value at a quarter-pel position is generated using bilinear interpolation between neighboring integer- and half-pel positions. The value at a half-pel position is obtained by applying a 1-D 6-tap FIR filter over integer-pel samples.

The H.264 standard also offers the option of having multiple reference frames in inter-prediction coding. More than one previously coded frame can be used as a reference frame for motion compensation.

5.2.3 Fast Full Search in H.264

To obtain the motion vector of each partition or sub-partition, block-matching techniques that minimize a cost function measuring the mismatch between the current block and the candidate block within the search area are commonly adopted due to their simplicity. The most widely used distortion measure is the sum of absolute differences (SAD), which is defined as

\[
\mathrm{SAD}(m_x, m_y) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left| F_k(x+i, y+j) - F_{k-1}(x+i+m_x, y+j+m_y) \right|, \tag{5.1}
\]

where $F_k(x, y)$ is the $(x, y)$-th pixel in the current frame, $F_{k-1}$ is the reconstruction of the previous frame, and $(m_x, m_y)$ is the displacement relative to the block of size $M \times N$. In this work, the mean of absolute differences (MAD), defined as

\[
\mathrm{MAD}(m_x, m_y) = \frac{\mathrm{SAD}(m_x, m_y)}{M \times N}, \tag{5.2}
\]

is adopted as the distortion measure.

In the motion estimation for H.264, there are seven possible block sizes in total to be searched for each MB. Following the naming convention of the reference software, we let modes 1 to 7 denote block sizes of 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4, respectively. The most straightforward way to find the optimal motion vectors of an MB is to use the full search (FS) over all 7 modes. The FS algorithm, which evaluates the MAD at all locations of a given search window, requires a large amount of computation. Therefore, in the reference software of H.264, an alternative implementation of the FS algorithm is provided.

One simple idea to speed up the full search algorithm is to use a bottom-up merge process. First, an FS is performed for all 4x4 partitions. The corresponding MADs and motion vectors are stored for later use. Then, the MAD of a larger partition (e.g. 4x8, 8x4, or larger) can be obtained simply by averaging the MADs of all 4x4 sub-blocks inside this partition. With this approach, block-matching operations for larger partitions can be saved. The search result is the same as exhaustively searching all 7 modes, while the amount of MAD computation can be reduced to 1/7. The major drawback of this method is that the precise minimum MAD for each 4x4 block is needed in calculating the MAD of a larger partition. Therefore, it cannot take advantage of many fast search algorithms developed in the past, which reduce the number of search positions but cannot guarantee the global minimum MAD.
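The bottom-up merge can be illustrated with a minimal Python sketch. It assumes frames stored as lists of rows (indexed `frame[y][x]`), synthetic test frames, and hypothetical function names; it is not the reference software's implementation. For one fixed displacement, the SADs of the sixteen 4x4 blocks are computed once, and the SAD of any larger partition is obtained by summing them, which is equivalent to averaging the MADs because all 4x4 blocks have the same area.

```python
def sad(cur, ref, x, y, mx, my, w, h):
    """SAD of Eq. (5.1) for the w-by-h block at (x, y) and displacement (mx, my)."""
    return sum(
        abs(cur[y + j][x + i] - ref[y + j + my][x + i + mx])
        for j in range(h)
        for i in range(w)
    )


def sad_4x4_grid(cur, ref, x, y, mx, my):
    """SADs of the sixteen 4x4 blocks of the macroblock at (x, y); grid[row][col]."""
    return [
        [sad(cur, ref, x + 4 * c, y + 4 * r, mx, my, 4, 4) for c in range(4)]
        for r in range(4)
    ]


def merged_sad(grid, col, row, cols, rows):
    """SAD of a larger partition spanning cols-by-rows 4x4 blocks from (col, row)."""
    return sum(grid[row + r][col + c] for r in range(rows) for c in range(cols))


# Synthetic 32x32 frames (arbitrary deterministic patterns, for illustration only).
cur = [[(7 * x + 13 * y) % 256 for x in range(32)] for y in range(32)]
ref = [[(5 * x * x + 3 * y) % 251 for x in range(32)] for y in range(32)]

# Macroblock with top-left corner (8, 8), tested at displacement (2, -1).
grid = sad_4x4_grid(cur, ref, 8, 8, 2, -1)

# The merged values agree with direct block matching on the larger partitions.
print(merged_sad(grid, 0, 0, 4, 4) == sad(cur, ref, 8, 8, 2, -1, 16, 16))  # True
print(merged_sad(grid, 0, 0, 4, 2) == sad(cur, ref, 8, 8, 2, -1, 16, 8))   # True

# MAD of the 16x16 partition, Eq. (5.2), from the merged SAD.
mad_16x16 = merged_sad(grid, 0, 0, 4, 4) / (16 * 16)
```

Note that the merge is only valid when every 4x4 block has been evaluated at the same displacement, which is why the approach depends on an exhaustive search of the 4x4 partitions.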
This fast full search algorithm therefore still demands a considerable amount of computation.

Another way to speed up the motion search of H.264 is to apply a fast search algorithm directly to each mode. This approach sacrifices some coding performance since most fast search algorithms only find sub-optimal solutions in exchange for computational savings. However, these fast search algorithms cannot be as effective for H.264 as in prior standards because of the larger number of modes. For example, in prior coding standards, the MVFAST algorithm can achieve a speed-up factor of around 100 as compared to the FS algorithm. However, in H.264, this method can only speed up the search by a factor of about 14. The speed-up factor drops to one seventh of its former value (100/7 is roughly 14) since there are seven times as many locations to be searched by this fast algorithm.

5.3 The Proposed Algorithm

Fig. 5.2 depicts the block diagram of the proposed algorithm. First, the original image frame is low-pass filtered and sub-sampled to get a lower resolution image. Then, motion estimation is performed on the lower resolution image, and the result is used to predict the initial motion vector and the mean of absolute differences (MAD) of the corresponding MB in the original image. With the MAD information, an initial search mode (of a certain block size) is determined using the estimated encoded bits and some predefined threshold. After the initialization, the motion vector of an MB in the original resolution image is refined by performing fast motion search with the initial block size. The number of encoded bits is then estimated again using the MAD of the refined motion vector. The updated encoded bits are used to verify the mode decision and determine the next mode to search. In addition, motion search will be terminated if the number of estimated encoded bits is smaller than an adaptive threshold.
As a result, by reducing the number of modes to be searched, the amount of computation can be reduced remarkably. In the following, we will describe each component of our algorithm in detail.

Figure 5.2: The block diagram of the proposed motion search algorithm. (Blocks on the low-resolution path: lowpass and subsampling, coarse motion estimation, bit number estimation, initial mode decision. Blocks on the full-resolution frame: motion estimation, bit number estimation, mode verification. The predicted MV and predicted mode feed the full-resolution search; MAD and MV are passed between stages.)

5.3.1 Notations

A video clip is composed of a sequence of image frames. Each image frame is divided into macroblocks (MB) of size 16x16 as the basic encoding unit. Let us denote the set of pixel indices inside the $i$th MB as

\[
MB_i = \{(x, y) \mid m_i \le x \le m_i + 15,\ n_i \le y \le n_i + 15\},
\]

where $(m_i, n_i)$ is the coordinate of the top-left point of this MB. Both $m_i$ and $n_i$ must be multiples of 16.

In the inter-frame prediction of H.264, each MB can be further divided into smaller sub-macroblocks. These smaller blocks are used to perform motion search separately. There are two levels of decomposition of an MB. The first level of decomposition, as illustrated in Fig. 5.3, is to divide an MB into sub-MBs of sizes 16x16, 16x8, 8x16, or 8x8. They can be expressed explicitly as

\[
\begin{aligned}
R^1(0) &= MB_i, \\
R^2(0) &= \{(x, y) \mid m_i \le x \le m_i + 15,\ n_i \le y \le n_i + 7\}, \\
R^2(1) &= \{(x, y) \mid m_i \le x \le m_i + 15,\ n_i + 8 \le y \le n_i + 15\}, \\
R^3(0) &= \{(x, y) \mid m_i \le x \le m_i + 7,\ n_i \le y \le n_i + 15\}, \\
R^3(1) &= \{(x, y) \mid m_i + 8 \le x \le m_i + 15,\ n_i \le y \le n_i + 15\}, \\
R^4(0) &= \{(x, y) \mid m_i \le x \le m_i + 7,\ n_i \le y \le n_i + 7\}, \\
R^4(1) &= \{(x, y) \mid m_i + 8 \le x \le m_i + 15,\ n_i \le y \le n_i + 7\}, \\
R^4(2) &= \{(x, y) \mid m_i \le x \le m_i + 7,\ n_i + 8 \le y \le n_i + 15\}, \\
R^4(3) &= \{(x, y) \mid m_i + 8 \le x \le m_i + 15,\ n_i + 8 \le y \le n_i + 15\}.
\end{aligned}
\]

Figure 5.3: The first level partition of an MB. (Panels: 16x16 with $R^1(0)$; 16x8 with $R^2(0)$, $R^2(1)$; 8x16 with $R^3(0)$, $R^3(1)$; 8x8 with $R^4(0)$ to $R^4(3)$.)

The second level decomposition is applied to the sub-MB of size 8x8. Each sub-macroblock $R^4(j)$, $j = 0, 1, 2, 3$, can be partitioned into smaller blocks as illustrated in Fig. 5.4. They can be expressed as

\[
\begin{aligned}
S_j^4(0) &= R^4(j), \\
S_j^5(0) &= \{(x, y) \mid x_j \le x \le x_j + 7,\ y_j \le y \le y_j + 3\}, \\
S_j^5(1) &= \{(x, y) \mid x_j \le x \le x_j + 7,\ y_j + 4 \le y \le y_j + 7\}, \\
S_j^6(0) &= \{(x, y) \mid x_j \le x \le x_j + 3,\ y_j \le y \le y_j + 7\}, \\
S_j^6(1) &= \{(x, y) \mid x_j + 4 \le x \le x_j + 7,\ y_j \le y \le y_j + 7\}, \\
S_j^7(0) &= \{(x, y) \mid x_j \le x \le x_j + 3,\ y_j \le y \le y_j + 3\}, \\
S_j^7(1) &= \{(x, y) \mid x_j + 4 \le x \le x_j + 7,\ y_j \le y \le y_j + 3\}, \\
S_j^7(2) &= \{(x, y) \mid x_j \le x \le x_j + 3,\ y_j + 4 \le y \le y_j + 7\}, \\
S_j^7(3) &= \{(x, y) \mid x_j + 4 \le x \le x_j + 7,\ y_j + 4 \le y \le y_j + 7\},
\end{aligned}
\]

where $(x_j, y_j)$ is the top-left point of the $j$th sub-MB.

Figure 5.4: The second level partitioning of an 8x8 sub-MB. (Panels: 8x8, 8x4, 4x8, and 4x4 with $S_j^7(0)$ to $S_j^7(3)$.)

The set $P$ of rectangular blocks is a partition of the $i$th macroblock $MB_i$ if this MB is composed of these rectangular blocks,

\[
MB_i = \bigcup_{\varphi \in P} \varphi,
\]

and these rectangular blocks are disjoint,

\[
\varphi \cap \psi = \emptyset, \quad \text{if } \varphi \ne \psi \text{ and } \varphi, \psi \in P.
\]

In the inter-prediction process of H.264, the partition pattern of an MB must follow the rule that tree-structured motion estimation can be performed. Therefore, the possible partitions for an MB can be listed as $\{R^1(0)\}$, $\{R^2(0), R^2(1)\}$, $\{R^3(0), R^3(1)\}$, and all combinations from the second level with the following decomposition form:

\[
\Big\{ S_j^m(k) \;\Big|\; \bigcup_{0 \le j \le 3} \bigcup_{m} \bigcup_{k} S_j^m(k) = MB_i,\ m \in \{4, 5, 6, 7\},\ k \in \{0, 1, 2, 3\} \Big\}.
\]

The superset of all these possible partitions is denoted as $\mathcal{P}$.
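To make the size of this set of partitions concrete, the sketch below enumerates every allowed partition of one macroblock and checks that each one tiles the 16x16 area without overlap. It is only an illustration: blocks are represented as hypothetical `(x, y, w, h)` tuples with coordinates relative to the top-left point of the MB, and the mode numbers 1 to 7 follow the block sizes 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4.

```python
from itertools import product


def first_level(mode):
    """Blocks (x, y, w, h) of the first-level modes 1 (16x16), 2 (16x8), 3 (8x16)."""
    w, h = {1: (16, 16), 2: (16, 8), 3: (8, 16)}[mode]
    return [(dx, dy, w, h) for dy in range(0, 16, h) for dx in range(0, 16, w)]


def sub_blocks(x, y, mode):
    """Blocks of one 8x8 sub-MB at (x, y) for modes 4 (8x8), 5 (8x4), 6 (4x8), 7 (4x4)."""
    w, h = {4: (8, 8), 5: (8, 4), 6: (4, 8), 7: (4, 4)}[mode]
    return [(x + dx, y + dy, w, h) for dy in range(0, 8, h) for dx in range(0, 8, w)]


def all_partitions():
    """Every tree-structured partition of one macroblock, as lists of blocks."""
    parts = [first_level(m) for m in (1, 2, 3)]
    corners = [(0, 0), (8, 0), (0, 8), (8, 8)]  # top-left points of the four 8x8 sub-MBs
    for modes in product((4, 5, 6, 7), repeat=4):
        blocks = []
        for (x, y), m in zip(corners, modes):
            blocks += sub_blocks(x, y, m)
        parts.append(blocks)
    return parts


def is_partition(blocks):
    """True if the blocks are pairwise disjoint and cover all 256 pixels of the MB."""
    covered = set()
    for x, y, w, h in blocks:
        for j in range(h):
            for i in range(w):
                pixel = (x + i, y + j)
                if pixel in covered:
                    return False
                covered.add(pixel)
    return len(covered) == 256


parts = all_partitions()
print(len(parts))                           # 259 = 3 + 4**4
print(all(is_partition(b) for b in parts))  # True
```

Three partitions come from the first level and $4^4 = 256$ from independently choosing one of four modes for each 8x8 sub-MB, which gives 259 candidate partitions per macroblock before motion vectors are even considered.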
The objective of the "inter-frame prediction" module for the $i$th MB is to minimize its output bit number $B_i$. Let $E(MB_i; P, V)$ represent the number of encoded bits for the $i$th macroblock with partition $P$, and let $V$ be the set of motion vectors associated with each sub-MB. Then, the encoding process is to find the best partition $P$ and motion vectors $V$ so that

\[
B_i = \min_{P, V} E(MB_i; P, V).
\]

Among the bits used to encode an MB, those representing motion vectors and residual signals are most critical in the motion search component. Hence, the motion search process is equivalent to finding

\[
\min_{P, V} \; E_M(MB_i; P, V) + E_R(MB_i; P, V), \tag{5.3}
\]

where $E_M$ and $E_R$ denote the bit numbers required to encode the motion vectors and residual signals, respectively. Usually, there exists a tradeoff between these two terms. When the MB is partitioned into more and smaller blocks, $E_M$ will increase while $E_R$ will decrease. On the other hand, if the MB is not partitioned, then $E_M$ is the smallest while $E_R$ becomes the largest. The search process is to find the best tradeoff between these two terms to keep the best overall rate-distortion performance.

To find the optimal solution for Eq. (5.3), one should perform the whole encoding process for every combination of all modes and their possible associated motion vectors. This extremely large amount of computation is usually not acceptable in practice. In this work, we approximate the bit number of the residual signal with an adaptive model to avoid the actual DCT computation and entropy coding.

5.3.2 Estimation of Bits for Residual Coding

We use a rate-distortion model to estimate the number of bits $E_R(MB_i; P, V)$ for the residual signal coding based on the pixel variance information in an MB. Several rate-distortion models have been proposed for the frame level, e.g., [10], [19]. In
the current context, we are concerned with the coding at the MB level, and the distortion will be about the same with respect to the same quantization step. The rate-distortion optimization process can be greatly simplified since we only have to minimize the encoding bit rates.

The estimation of encoded bits is derived with the assumption that the residual signal is a zero-mean i.i.d. source with the Laplacian distribution. The p.d.f. of any pixel $x$ at the input of the DCT is

\[
p(x) = \frac{\alpha}{2} e^{-\alpha |x|},
\]

where $\alpha = \sqrt{2}/\sigma$ and $\sigma$ is the standard deviation of pixels in an MB. For a uniform quantizer with quantization step $Q$, the probability of being quantized to level $k$ is

\[
P_k = \int_{(k - 1/2)Q}^{(k + 1/2)Q} p(x)\,dx =
\begin{cases}
1 - \sqrt{\gamma}, & \text{if } k = 0, \\
\frac{1}{2}\,\gamma^{|k| - 1/2}\,(1 - \gamma), & \text{if } k \ne 0,
\end{cases}
\]

where $\gamma = e^{-\alpha Q}$. The entropy can be computed from

\[
B(Q, \sigma) = -\sum_{k = -\infty}^{\infty} P_k \log_2 P_k.
\]

With some algebraic manipulation, the entropy can be obtained explicitly via

\[
B(Q, \sigma) = H_2(\sqrt{\gamma}) + \sqrt{\gamma}\left(1 + \frac{H_2(\gamma)}{1 - \gamma}\right), \tag{5.4}
\]

where $H_2(y) = -y \log_2 y - (1 - y)\log_2(1 - y)$ is the binary entropy function. The entropy $B(Q, \sigma)$ can be viewed as the average number of bits needed to represent a symbol if a perfect entropy encoder is applied. In later sections, we will use it to estimate the encoded bit number $E_R(MB_i; P, V)$ for the MB residual, provided that $\sigma$ and the quantization step $Q$ are known.

In Figure 5.5, examples are given with various quantization steps. The solid curve is the theoretical result computed from Eq. (5.4), while small dots represent actual experimental results obtained by encoding the 300-frame CIF video 'Foreman' with the H.264 JM software.
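The closed form of the entropy can be checked numerically. The following Python sketch evaluates the explicit expression $H_2(\sqrt{\gamma}) + \sqrt{\gamma}\,(1 + H_2(\gamma)/(1-\gamma))$ and compares it with a direct summation of $-\sum_k P_k \log_2 P_k$ over the quantized Laplacian probabilities. It assumes that $\sigma$ is the standard deviation (so that $\alpha = \sqrt{2}/\sigma$ is the usual Laplacian relation); the function names and the sample values of $Q$ and $\sigma$ are chosen only for illustration.

```python
import math


def h2(y):
    """Binary entropy function in bits; defined as 0 at the end points."""
    if y <= 0.0 or y >= 1.0:
        return 0.0
    return -y * math.log2(y) - (1.0 - y) * math.log2(1.0 - y)


def entropy_closed_form(q, sigma):
    """Entropy (bits/sample) of a uniformly quantized Laplacian source, closed form."""
    if q <= 0 or sigma <= 0:
        raise ValueError("q and sigma must be positive")
    alpha = math.sqrt(2.0) / sigma        # assumes sigma is the standard deviation
    gamma = math.exp(-alpha * q)
    s = math.sqrt(gamma)
    return h2(s) + s * (1.0 + h2(gamma) / (1.0 - gamma))


def entropy_numeric(q, sigma, kmax=100000):
    """Same entropy by direct summation of -P_k log2 P_k over the quantizer levels."""
    alpha = math.sqrt(2.0) / sigma
    gamma = math.exp(-alpha * q)
    p0 = 1.0 - math.sqrt(gamma)           # probability of the zero bin
    total = -p0 * math.log2(p0)
    for k in range(1, kmax + 1):
        pk = 0.5 * gamma ** (k - 0.5) * (1.0 - gamma)   # P_k = P_{-k}
        if pk == 0.0:                     # remaining terms underflow to zero
            break
        total -= 2.0 * pk * math.log2(pk)
    return total


for q, sigma in [(8, 5.0), (16, 10.0), (32, 20.0), (128, 15.0)]:
    closed = entropy_closed_form(q, sigma)
    direct = entropy_numeric(q, sigma)
    print(q, sigma, round(closed, 6), abs(closed - direct) < 1e-9)
```

The agreement follows from splitting the source into a zero/non-zero decision (entropy $H_2(\sqrt{\gamma})$), a sign bit, and a geometrically distributed magnitude whose entropy is $H_2(\gamma)/(1-\gamma)$.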
Since a practical real-world encoder may not be an ideal entropy coder and signals inside a single MB may have a wide distribution, it is reasonable that the number of bits to represent an MB may deviate from the theoretical model value. However, these figures indicate that the derived theoretical curves fit the general trend of experimental results well. Since we only use this model for mode selection, it is accurate enough for this purpose. In the practical implementation, the values of $B(Q, \sigma)$ can be computed off-line and stored in a lookup table. The estimated bits can be simply found from the lookup table after finding the nearest index of the value $Q/\sigma$.

Figure 5.5: Bits/pixel vs MAD with quantization steps 8, 16, 32 and 128. (Each panel plots experimental points and the theoretical curve.)

Figure 5.6: The approximate bit number vs $\gamma = e^{-\alpha Q}$. (Curves: theoretical rate, 2nd order approximation, 3rd order approximation.)

Furthermore, we improve the model to make it an adaptive one so that it can adjust to the encoding context. The adaptive model for rate control at the frame level has been studied by researchers. For example, Lee et al. [20] proposed a second order polynomial for the rate model, and its coefficients are updated by linear regression. However, we observe that a polynomial of order 2 is not enough for the MB-level rate approximation. It demands a third order polynomial to approximate Eq. (5.4), i.e.
\[
B(\gamma) = w_0 + w_1 \gamma + w_2 \gamma^2 + w_3 \gamma^3 + o(\gamma^4),
\]

where $o(\gamma^4)$ represents negligible higher-order terms. Figure 5.6 shows the theoretical curve as well as the second-order and the third-order approximations as a function of $\gamma$. We see that when $\gamma$ approaches one, which corresponds to the case of higher MAD values, the second-order curve deviates a lot from the theoretical one. The proposed third-order polynomial provides a better estimate.

To be adaptive to the context of the encoding process, the coefficients of the polynomial are updated after the encoding of each MB. We use the least mean square (LMS) algorithm to update the coefficients after the $n$th MB is encoded:

\[
w_i(n + 1) = w_i(n) + \mu\, \gamma(n)^i\, e(n), \quad i = 0, 1, 2, 3,
\]

where $\mu$ is the weighting parameter and

\[
e(n) = R(n) - \sum_{i=0}^{3} w_i(n)\, \gamma(n)^i
\]

is the prediction error. The LMS algorithm is a well known technique for adaptive filtering. It is simple and robust, and we find it suitable for the MB level adaptation. In the simulation, we use the coefficients of the Taylor series expansion of Eq. (5.4) as initial values, and set the weighting parameter $\mu$ to be as small as 0.001. Experimental results confirm that, compared with the fixed model, the adaptive scheme achieves more stable performance under various quantization steps.

5.3.3 Multiresolution Motion Search

Figure 5.7: Multi-resolution motion search.

Before motion search is applied to the full resolution video, the MAD of an MB is estimated using a multi-resolution approach, as shown in Figure 5.7. This approach can help predict the initial motion vector as a starting point for the search in the full resolution.

First, we represent the original image frame with two resolutions: the original one and the lower resolution one.
The lower resolution image is obtained by averaging 4x4 pixels and performing 4x4 to 1x1 downsampling. Thus, each MB in the original frame becomes a block of size 4x4 in the low resolution frame. A diamond search is then performed to find the motion vector and MAD of each 4x4 block in the low resolution level. The motion vector of an MB in the original level can be predicted by an interpolation technique to be detailed in the next section.

The multiresolution motion search does not cost much computation, since the block size and the search range are scaled down greatly in the lower resolution level. Besides, the search at the lower resolution level only has to be done once for each frame. The motion vectors and MADs can be stored as initial estimates for the motion vectors in the original resolution level.

5.3.4 Prediction of Motion Vectors

A quadratic model is used to predict the motion vector in the original resolution video from the low resolution video. The model was first proposed in [22] for sub-pixel motion vector interpolation from integer-pixel motion vectors. We modified it to predict motion vectors in the full resolution from those in the coarse resolution.

Let $v = (p_x, p_y)$ be the motion vector of some MB in the original resolution video. To predict $v$, we first model the SAD in the low resolution level, denoted as $S(x, y)$, as a second-order polynomial of the motion vector in the original resolution:
\[ S(x, y) = k_1 (x - p_x/4)^2 + k_2 (y - p_y/4)^2 + k_3, \tag{5.5} \]
where $k_1$, $k_2$, and $k_3$ are characteristic parameters of the SAD function. Let $v' = (v_x, v_y)$ be the motion vector of the corresponding 4x4 block in the coarse resolution with minimum SAD denoted as $S_0$.
The minimum SADs of its 4 neighboring motion vectors $(v_x, v_y - 1)$, $(v_x + 1, v_y)$, $(v_x, v_y + 1)$ and $(v_x - 1, v_y)$ in the coarse level are denoted by $S_1$, $S_2$, $S_3$, and $S_4$, respectively. By substituting the 5 known SADs, $S_0, \ldots, S_4$, into Eq. (5.5), the optimum predicted MV $(p_x, p_y)$ can be solved by
\[ p_x = \mathrm{Round}\left[ 4 \times \left( v_x + \frac{S_4 - S_2}{2(S_2 + S_4 - 2S_0)} \right) \right], \]
\[ p_y = \mathrm{Round}\left[ 4 \times \left( v_y + \frac{S_1 - S_3}{2(S_1 + S_3 - 2S_0)} \right) \right]. \]
For boundary MBs, where some of the SADs may not be available, we can simply choose $(4v_x, 4v_y)$ as the prediction in the full-resolution video.

This model is different from that given in [22] since we predict motion vectors from the coarse level to the full-resolution level (rather than from the integer pixel level to the fractional pixel level). The accuracy of our model depends on the sub-sampling operation, while that of [22] depends on the interpolation operation. Thus, our model may not be as accurate as the original scheme in [22] since some details could be lost in the down-sampling process. Note that macroblocks with smoother contents can be predicted more accurately than those with complicated textures.

The motion vector predicted by this model only serves as a rough initial candidate. It should be compared with other candidates obtained using different prediction methods. Only the winner will serve as the starting point in the diamond search process. When early termination conditions are not met, the prediction error will be corrected by the refinement provided by the diamond search.

5.3.5 Prediction of Residual Variance

In the proposed approach, we use the MAD information in the full resolution to estimate the number of bits required to encode an MB. In this subsection, we show that the value $MAD_{full}$ of an MB in the full resolution level can also be predicted from $MAD_{lev2}$ in the coarse level by a simple relation
\[ MAD_{full} = c \times MAD_{lev2}, \tag{5.6} \]
where $c$ is a constant.
It is assumed that the distribution of the predicted residual signals of an MB can be approximated by the Laplacian distribution with zero mean and a separable covariance
\[ R(m, n) = \sigma_f^2 \, r^{|m|} r^{|n|}, \]
where $m$ and $n$ are the horizontal and vertical distances between two pixels, respectively, $\sigma_f^2$ is the variance of these pixel values, and $r$ is the correlation coefficient.

Under the assumption of the Laplacian distribution, the standard deviation of residual signals in the compensated blocks can be approximated by
\[ \sigma_f \approx \sqrt{2} \times MAD_{full}. \]
Similarly, the standard deviation of residual signals in the coarse level can be approximated by
\[ \sigma_{lev2} \approx \sqrt{2} \times MAD_{lev2}. \]
We observe that each pixel in the coarse level is the average of all 4x4 pixels in the original level. Therefore, it is actually the DC value of the 4x4 DCT in the original level. Since we have modelled the image to have correlation coefficient $r$, the variance of the $(u, v)$-th DCT coefficient in the transform domain can be derived as [32]
\[ \sigma_F^2(u, v) = \sigma_f^2 \, [A R A^T]_{u,u} \, [A R A^T]_{v,v}, \]
where $R$ is the correlation matrix with coefficients $r$, $A$ is the 4x4 DCT transformation matrix, and the operation $[\cdot]_{u,v}$ means the $(u, v)$-th component of a matrix. Hence $\sigma_{lev2}^2 = \sigma_F^2(1, 1)$. Let the ratio $\sigma_F^2(u, v)/\sigma_f^2$ be denoted by $W(u, v)$. By using the energy conservation property, we obtain
\[ \sigma_f^2 = \sigma_{lev2}^2 \times \frac{\sum_{u,v} W(u, v)}{W(1, 1)}. \]
Take the correlation factor $r = 0.6$ as an example, which was found in [32] to be a good approximation for most pictures.
By applying the transformation and the quantizer of H.264 to the above equations, the following relation can be obtained:
\[ [\sigma_F^2(u, v)] = \sigma_f^2 \times \begin{bmatrix} 5.6074 & 2.1293 & 1.0609 & 0.6744 \\ 2.1293 & 0.8086 & 0.4028 & 0.2561 \\ 1.0609 & 0.4028 & 0.2007 & 0.1276 \\ 0.6744 & 0.2561 & 0.1276 & 0.0811 \end{bmatrix}. \]
Then, the standard deviation of residual signals can be approximated by
\[ \sigma_f \approx \sqrt{2} \times MAD_{lev2} \times 1.689. \tag{5.7} \]
Therefore, the constant $c$ in Eq. (5.6) can be set to 1.689.

In this model, the matched block in the reference frame at the coarse level is assumed to have the scaled version of the motion vector searched in the full-resolution level, and residual signals in the coarse level are assumed to be approximately the same as the down-sampled versions of residual signals in the full resolution image. In practice, this approximation may not be accurate, since many details are lost in the sub-sampling process. Thus, it is better to update the constant $c$ after the motion search of each macroblock.

In addition, the correlation factor $r$ may change from frame to frame, and consequently, the value of $c$ may vary with the picture. In other words, we can replace the constant $c$ by a function $c(k)$, where $k$ denotes the $k$-th MB. The value of $c(k)$ is updated right after the encoding of an MB by
\[ c(k) = \frac{A_f(k)}{A_2(k)}, \]
where $A_f(k)$ and $A_2(k)$ are averaged MADs from the full and the coarse resolutions. They are estimated, respectively, by
\[ A_f(k) = \beta A_f(k-1) + (1 - \beta) \, MAD_{full}(k), \]
\[ A_2(k) = \beta A_2(k-1) + (1 - \beta) \, MAD_{lev2}(k), \]
where $\beta$ is a forgetting parameter with a value in $[0, 1]$. This approximation can help make the initial mode decision for an MB, and it is used before any motion search in the original level. After mode selection, a fast motion search is applied to the original level with the selected mode to obtain a more reliable $MAD_{new}$, and the estimated standard deviation can be updated to $\sqrt{2} \times MAD_{new}$.
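As a rough numerical check of this subsection, the following Python sketch recomputes the weight matrix $W(u, v)$ and the constant $c$ for $r = 0.6$, and then applies the running update of $c(k)$. It is only an illustration under stated assumptions, not the implementation used in this work: the matrix A is taken to be the H.264 4x4 core transform with rows scaled to unit norm (the text only says "4 x 4 DCT"), and the names transform_gains, variance_weights, mad_scale_constant and MadRatioTracker, as well as the choice to seed the running averages with the first observation, are hypothetical.

```python
import math

SQRT10 = math.sqrt(10.0)
# Rows of the H.264 4x4 core transform scaled to unit norm. Assumption: the
# text only says "4 x 4 DCT"; this matrix reproduces its tabulated numbers.
A = [
    [0.5, 0.5, 0.5, 0.5],
    [2 / SQRT10, 1 / SQRT10, -1 / SQRT10, -2 / SQRT10],
    [0.5, -0.5, -0.5, 0.5],
    [1 / SQRT10, -2 / SQRT10, 2 / SQRT10, -1 / SQRT10],
]


def transform_gains(r):
    """Diagonal of A R A^T, where R[i][j] = r**|i - j|."""
    n = len(A)
    return [
        sum(A[u][i] * A[u][j] * r ** abs(i - j)
            for i in range(n) for j in range(n))
        for u in range(n)
    ]


def variance_weights(r):
    """W(u, v) = [A R A^T]_{u,u} * [A R A^T]_{v,v}."""
    g = transform_gains(r)
    return [[gu * gv for gv in g] for gu in g]


def mad_scale_constant(r):
    """c = sqrt(sum of W(u, v) / W(1, 1)), the factor in Eq. (5.6)."""
    w = variance_weights(r)
    return math.sqrt(sum(map(sum, w)) / w[0][0])


class MadRatioTracker:
    """Running estimate c(k) = A_f(k) / A_2(k) with forgetting factor beta."""

    def __init__(self, c0, beta):
        self.c = c0
        self.beta = beta
        self.a_full = None    # A_f(k): averaged full-resolution MAD
        self.a_coarse = None  # A_2(k): averaged coarse-level MAD

    def update(self, mad_full, mad_coarse):
        if self.a_full is None:
            # Assumption: seed both averages with the first observation.
            self.a_full, self.a_coarse = mad_full, mad_coarse
        else:
            b = self.beta
            self.a_full = b * self.a_full + (1 - b) * mad_full
            self.a_coarse = b * self.a_coarse + (1 - b) * mad_coarse
        if self.a_coarse > 0:  # keep the previous c if the coarse MAD is zero
            self.c = self.a_full / self.a_coarse
        return self.c


if __name__ == "__main__":
    print(round(mad_scale_constant(0.6), 3))
    tracker = MadRatioTracker(c0=mad_scale_constant(0.6), beta=0.9)
    print(tracker.update(mad_full=4.0, mad_coarse=2.0))
```

Under these assumptions, mad_scale_constant(0.6) evaluates to about 1.689 and variance_weights(0.6) matches the tabulated matrix to four decimal places, which suggests the integer transform is the one intended. A value of beta close to one makes c(k) change slowly from MB to MB, while a smaller value lets it follow local changes in picture content more quickly.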
5.3.6 Search with Fast Mode Detection

Since motion search in a low resolution frame is done before the actual encoding of any MB in the full resolution frame, the estimated motion vector and MAD are both available for the initial mode decision of each MB. Furthermore, the predicted bit number $b_i$ of the $i$th MB can be estimated from Eq. (5.4).

Figure 5.8: The flowchart of the 2-level mode decision.

From extensive experimental results, we find some empirical rules for mode selection in H.264. For example, mode 1 (of size 16x16) is the most popular one among all modes. Therefore, this should be the first one to search if the predicted MAD and the estimated encoded bits $b_i$ are both small. Since a large MAD will result in a large $b_i$, we can simply use $b_i$ for mode decision. If the estimated number of coding bits $b_i$ is much greater than $\bar{b}$, which represents the average number of bits to code an MB, it is probable that the MB should be further decomposed into smaller partitions so that they can be searched in different directions to reduce the motion compensation residual and, therefore, the number of bits required to encode the residual. On the other hand, if $b_i$ is small enough to meet a distortion criterion, there is no need to search further. This early termination rule reduces the number of modes to be searched with little performance degradation.

In the reference software of H.264, there is a fixed search pattern to follow for partitions of all modes.
In the proposed algorithm, we start the search with the initial mode predicted by $b_i$. Figure 5.8 illustrates the flow chart of the proposed mode decision process. The initial modes are chosen between two levels of square partitions, i.e. 16x16 and 8x8. For a smaller value of $b_i$, the mode with block 16x16 is used. If $b_i$ is greater than the threshold $\theta_1$, then the MB is split into 4 smaller blocks of size 8x8. After the initial mode is determined, motion search is performed with blocks under this new partition, and MAD and $b_i$ are updated with more accurate values.

If $b_i$ meets a predefined threshold, then the search will stop immediately to save computation. Otherwise, the search should continue with other modes. In the case of using the 16x16 block as an initial mode, the next search mode is the 8x8 block. If the early termination criterion is still not met, we continue to check modes with rectangular shapes of 16x8 and 8x16 blocks. The modes with partitions smaller than 8x8 are not checked since they are less likely to occur when the value of $b_i$ is small.

On the other hand, if the initial mode is the 8x8 block and the threshold is not met, the predicted bit number $b_{ij}$ is estimated for the $j$th 8x8 partition ($j = 0, 1, 2, 3$) with the same method used to predict the residual bits of a macroblock. The $j$th partition will be split only if $b_{ij} > \theta_1/4$. For split 8x8 partitions, their 4x4 sub-blocks will be checked first. If the early termination condition is still not satisfied, other modes such as 4x8 and 8x4 will be checked subsequently.

For motion search under a particular partition, the MVFAST algorithm is adopted with modifications in several aspects. First, the fixed threshold is replaced by adaptive thresholds derived from the MAD in the low resolution image. Second, the initial motion vectors to be checked are also changed.
The predicted vectors in MVFAST are taken from the neighboring MBs' motion vectors. In the proposed algorithm, we predict the motion vector using a merge-and-split process. The merge process means that the predicted vector of a partition is taken from its square sub-partitions. The split process means that the predicted vector of a sub-partition is taken from its parent square partition. These motion vectors are checked first, and the one with the minimum SAD is examined further. If the resulting bit length does not exceed the specified threshold, the search stops immediately. Otherwise, we check the motion vectors in neighboring MBs as specified by MVFAST. The same early termination threshold applies to these candidates, too. If the threshold is not met, the winner among the candidates serves as the new search center and a local search with the diamond pattern is performed. In our implementation, only the small diamond pattern is used since it works well for almost all cases tried.

The above 2-level decision process can be further enhanced by including one more decision level. It is observed in many cases that, when the initially predicted bit number is large, it is better to go directly to search for all 4x4 blocks under the search mode for 8x8 partitions. The flowchart of the 3-level decision process is depicted in Fig. 5.9. Fig. 5.10 shows a comparison of the two proposed schemes for the CIF test sequence 'Foreman'. We see that the 3-level decision scheme is always faster than the 2-level one. The 3-level algorithm is 13% faster than the 2-level one. However, the coding rate may increase slightly since the mode of 8x8 partitions may be omitted. Compared to the 2-level algorithm, the bit rate of the 3-level algorithm increases by 0.8% while the PSNR remains about the same in this example.
One may make a tradeoff between the coding bit rate and the speed-up factor by selecting the 2-level or the 3-level decision process adaptively. In the following simulation results, the 3-level scheme is adopted since it is much faster while the increased bit rate is insignificant.

Figure 5.9: The flowchart of the 3-level mode decision.

Figure 5.10: The performance comparison of the 2-level and the 3-level mode decision (PSNR (dB) and speed-up vs rate (Kbps) for the sequence "foreman", CIF).

5.3.7 Thresholds Setting

Here, we provide more details about the principle in setting the thresholds. Suppose that the value $\sigma_f$ of the motion compensated residual in one MB has been predicted from motion vectors in the coarse resolution by Eq. (5.7). Then, we can determine the average $\bar{\sigma}_f$ of the frame residuals over the $\sigma_f$ values of all MBs. The bit number $l_{mv}$ to represent a single motion vector can also be estimated by averaging the bits representing the motion vectors of all MBs in the previous frame. From Eq. (5.4), the average bit number to encode an MB's residual signal is about
\[ \bar{b} = 256 \, B(Q, \bar{\sigma}_f). \]
Let $\sigma_{f,i}$ be the corresponding value for the $i$-th MB and, consequently, the bit number $b_i$ can be approximated by
\[ b_i \approx 256 \, B(Q, \sigma_{f,i}). \]
The values of $l_{mv}$ and $b_i$ can be viewed as initial estimates of the motion vector rate and the residual rate $R_r(MB_i; P, V)$ in Eq. (5.3), respectively. In other words, the total number of bits to represent an MB can be approximated by $b_i + l_{mv}$, where the header bits are excluded. It is expected that $b_i$ should be close to the average number $\bar{b}$. If $b_i$ is much larger than the average bit number, then it is likely that a further partition into smaller blocks may help reduce the number of coding bits, and the minimum of Eq. (5.3) can be approached.

In the threshold selection for the proposed mode decision process, one may trade computational complexity for coding efficiency. If the threshold is set in favor of coarser modes, it is less likely for a macroblock to be split into partitions to search for individual motion vectors, thus requiring fewer SAD operations and resulting in a lower computational complexity. On the other hand, such a threshold reduces the probability of finding the correct mode in the search process, thus leading to larger residual signals.

Figure 5.11: Percentile of increased bits and split MBs for test sequence "Foreman" (CIF).

Fig. 5.11 gives an example of the trade-off by encoding the CIF 'Foreman' sequence with reference software JM 4.0d. The horizontal axis indicates the ratio $q = (\theta - \bar{b})/l_{mv}$, where $\theta$ is the threshold for the predicted bit number $b_i$ used to determine whether the $i$th MB should be split or not. The solid curve represents the percentage of macroblocks to be split; the split scheme needs more SAD operations than the non-split one.
The bars represent the percentages of increased bits, relative to encoding a macroblock with the exhaustive search algorithm, when some threshold $\theta$ is applied for early termination. For instance, if $q = 3$, the bit number is increased by 8%, while only 20% of the macroblocks are split. In the proposed algorithm, we choose $q = 3$ to set the first threshold, which decides whether a macroblock should be split or not.

Alternatively, we can heuristically set thresholds in the following way. For an MB with four 8x8 blocks, the expected number of bits to encode the MB is roughly $\bar{b} + 4 l_{mv}$. Therefore, if $b_i > \bar{b} + 3 l_{mv}$, the mode with four 8x8 blocks in one MB is more likely to happen and should be checked before the 16x16 MB. Therefore, we set the first threshold to $\theta_1 = \bar{b} + 3 l_{mv}$. In the case where $b_i$ is even larger, we set a more conservative criterion. That is, if $b_i > \bar{b} + 7 l_{mv}$, we go directly to the partition with the smallest block size 4x4. This gives the second threshold $\theta_2 = \bar{b} + 7 l_{mv}$.

After motion search in the full resolution frame, the predicted bit number $b_i$ is replaced with the number estimated from the resulting MAD found in the specified mode. It is compared with thresholds $\theta_1$ and $\theta_2$ for possible early termination. That is, the search stops immediately if $b_i$ is smaller than these threshold values for the 8x8 and 4x4 cases, respectively. Otherwise, the motion search process will continue.

5.3.8 Summary of the Algorithm

The proposed fast motion search algorithm for H.264 encoding can be summarized in steps as follows:
For each frame:

Step 0: The following operations are done once for each frame before searching motion vectors for each MB:
1) Obtain the lower resolution frame by performing lowpass filtering and 4x4 to 1x1 sub-sampling on the original frame.
2) Perform fast motion search between two consecutive low resolution frames and store the motion vectors and MADs.
3) Estimate the expected bit number $b_i$ of the $i$th MB from the MAD information.
4) Calculate the average length $\bar{b}$ from the averaged MAD of this frame.
5) Set two threshold values: $\theta_1 = \bar{b} + 3 l_{mv}$ and $\theta_2 = \bar{b} + 7 l_{mv}$.

For the $i$-th MB:

Step 1: If $b_i > \theta_2$, mode 7 first, goto Step 17.
Step 2: If $b_i > \theta_1$, mode 4 first, goto Step 9.
Step 3: Perform fast motion search for mode 1 (16x16) and update $b_i$.
Step 4: If $b_i < \theta_1$, stop.
Step 5: Perform fast motion search for mode 4 (8x8) and update $b_i$.
Step 6: If $b_i < \theta_1$, stop.
Step 7: Perform fast motion search for modes 2 and 3 (16x8 and 8x16).
Step 8: Choose the best mode and stop.

Mode 4 First: For each 8x8 partition

Step 9: Perform fast motion search for mode 4 (8x8).
Step 10: Find the expected bit number $b_p$ for the $p$-th partition.
Step 11: If $b_p < \theta_1/4$, stop.
Step 12: Perform fast motion search for mode 7 (4x4) and update $b_p$.
Step 13: If $b_p < \theta_1/4$, stop.
Step 14: Perform fast motion search for modes 5 and 6 (8x4 and 4x8).
Step 15: Choose the best mode and if $b_p < \theta_1/4$, stop.
Step 16: Goto Step 24.

Mode 7 First: For each 4x4 partition

Step 17: Perform fast motion search for mode 7 (4x4).
Step 18: Find the expected bit number $b_p$ for the $p$-th partition.
Step 19: If $b_p < \theta_1/4$, stop.
Step 20: Perform fast motion search for mode 4 (8x8) and update $b_p$.
Step 21: If $b_p < \theta_1/4$, stop.
Step 22: Perform fast motion search for modes 5 and 6 (8x4 and 4x8).
Step 23: Choose the best mode and if $b_p < \theta_1/4$, stop.

Partition Merge:

Step 24: Perform fast motion search for modes 2 and 3 (16x8 and 8x16) and update $b_i$.
Step 25: If $b_i < \theta_1$, stop.
Step 26: Perform fast motion search for mode 1 (16x16).
Step 27: Choose the best mode, and stop.

5.4 Experimental Results

We implemented the proposed algorithm and integrated it with the H.264 reference software JM4.0d. The exhaustive search (fast FS) and MVFAST search algorithms were tested for comparison. The exhaustive search was executed by enabling the option of FAST-FULLSEARCH in the reference software, which computed SADs of 4x4 blocks first and merged SADs of larger blocks with a bottom-up merging process as described in Section 5.2.3. The MVFAST algorithm was implemented by following the description in [31]. Only the P-frame prediction with one reference frame was tested as inter-frame prediction. The search range was set to ±16 for all cases.

Table 5.1: Comparison of the proposed algorithm and other algorithms for H.264 encoding

Sequence     Format   BPS Increase (%)         Speed Up
                      New Method   Diamond     New Method   Diamond
susie        D1        6.02        9.43         71.60       23.96
football†    D1        5.20        2.90         47.01       14.71
mobile       D1        2.21        4.39         64.28       14.89
mobile       CIF       2.68        4.85         69.49       17.55
foreman      CIF       3.73        4.93         60.01       17.98
foreman      QCIF      4.77        4.47         65.88       20.71
akiyo        CIF      -0.09        4.01        156.82       50.32
akiyo        QCIF      0.04        3.91        147.85       49.63
salesman     CIF       1.35        2.92         92.15       29.63
salesman     QCIF      0.80        3.01        110.51       36.94
missam       CIF       0.68        6.59        104.24       31.32
missam       QCIF      0.08        3.78        136.62       44.66
coastguard   CIF       2.66        6.11         66.65       16.98
coastguard   QCIF      3.46        6.21         79.15       23.44
hall         CIF       0.88        1.46        121.31       33.69
hall         QCIF      0.75        2.14        145.97       40.18
flower       CIF       2.14        2.19         70.67       20.79

†Interlaced coding mode is enabled.
Several video sequences of different formats were tested in our experiment, and 10 quantization parameters ranging from 18 to 45 were selected for each sequence to cover a wide range of bit rates. Table 5.1 gives the results of the average performance for these sequences.

We used the fast FS algorithm provided in the H.264 reference software as a benchmark. Two fast search algorithms, namely the proposed algorithm and the fast search algorithm with MVFAST directly applied to all 7 modes, were compared to the benchmark algorithm. In Table 5.1, the columns labelled "BPS Increase" give the bit-rate increase as a percentage. As expected, the fast FS algorithm generates the lowest bit rates in most cases. The results also show that the proposed method has higher coding efficiency than the MVFAST algorithm in general. This means that our algorithm can determine the proper mode and the associated motion vector better than MVFAST. It is worth mentioning that the proposed algorithm even outperforms fast FS in coding bit rate for the "Akiyo" CIF sequence. The reason is that, although FS can find the MV candidate with the minimum SAD, the minimum SAD value does not always guarantee the smallest number of coding bits.

The right two columns of the table show the speed-up factors. The factors are calculated by counting the number of SAD operations. In the proposed algorithm, the count includes the subsampling operation and the SAD operations of motion search in both the original and the coarse levels. The values given in the table are ratios of the SAD operation count compared to the benchmark, i.e. the fast FS algorithm. It is clear that the proposed algorithm can provide a speed-up factor in the range of 60 to 150. Note that the speed-up factor may vary for different bit rates even for the same test video.
Compared to MVFAST, our method can still improve the search speed by 3 to 4.3 times while providing better coding efficiency in most test sequences. In general, the proposed algorithm performs better on sequences with smoother motion since fewer motion vector candidates need to be checked.

Other video characteristics may also influence the performance of the proposed algorithm. For example, the test sequence 'football' has a lot of movement and obvious interlacing effects. If we use the frame coding mode as adopted in the other test sequences, the coding performance of the proposed algorithm degrades severely. The resultant bit rate is 10% higher than that of the full search algorithm. In this case, the interlaced coding option has to be turned on, and the performance becomes more reasonable as shown in the table.

The coding performance of the three codecs is compared in Figures 5.12, 5.13, and 5.14. The PSNR and speed-up factors vs bit rates are plotted at various quantization steps for formats D1, CIF, and QCIF, respectively. The proposed algorithm, MVFAST, and FS are marked as 'new', 'diam', and 'full' in the legend. In most cases, the PSNR curves of the proposed algorithm are close to those of the FS algorithm. This means that the proposed fast algorithm has little visual quality degradation as compared with the FS algorithm. Note that the visual quality using MVFAST degrades significantly at low bit rates.

5.5 Conclusion

We have developed a fast motion search algorithm for H.264 in this chapter by introducing a rate model for effective initial mode selection, a method for initial motion vector prediction and early-termination rules to avoid unnecessary computation.
Experimental results were given to demonstrate that our method can speed up the search up to a factor of 120 times with little visual quality degradation. The proposed method outperforms several fast search algorithms and provides the best tradeoff between coding efficiency and the speed. With our approach, a real-time H.264 encoder for high quality video becomes more feasible.

Figure 5.12: The performance of test sequences with a frame size of 720 x 480 (PSNR (dB) and speed-up vs rate (Kbps) for the sequences "mobile", "football" and "susie").

Figure 5.13: The performance of test sequences of the CIF size (PSNR (dB) and speed-up vs rate (Kbps) for the sequences "foreman", "akiyo", "coastguard" and "hall").
Figure 5.13 (Cont.): PSNR (dB) and speed-up vs rate (Kbps) for the sequences "salesman" and "missam" (CIF).

Figure 5.14: The performance of test sequences of the QCIF size (PSNR (dB) and speed-up vs rate (Kbps) for the sequences "foreman", "akiyo", "coastguard", "hall", "salesman" and "missam").

Chapter 6

Conclusion and Future Work

6.1 Conclusion

In the first part of this dissertation (i.e. Chapters 3 and 4), we proposed a special class of space-time codes to enable wireless broadcast for heterogeneous receivers. The layered multimedia source can be nicely integrated with this kind of system for scalable content delivery. A terminal with more antennas can retrieve contents with higher quality.

It is worthwhile to point out that conventional space-time codes utilize the transmitter diversity while our current research exploits the receiver diversity.
Our basic idea is that some space-time codes can only be detected with more than one antenna. Thus, one can trade more receiver antennas and a higher computational complexity for higher quality. Nevertheless, receivers with less diversity can still get the minimum level of guaranteed QoS. Embedded STC can integrate the receiver diversity with the layered source coding technique to provide the wireless broadcast service. Basic design considerations were discussed in Chapter 3, and it was verified via computer simulation that the bit error rate performance depends on the number of receiving antennas. In addition, the performance depends on the signal amplitude ratios among different layers. Optimal parameters can be determined from simulation results.

Differential design was applied to embedded STC in Chapter 4, which finds applications in communication systems without knowledge of the channel state information (CSI). In this scheme, the base layer adopts conventional differential STC, while the higher layer encodes signals according to those of the base layer. Our discussion shows that the receiver diversity can also be exploited in the differential scheme. The performance is slightly inferior to that of the non-differential scheme. An improved receiver using the Kalman filter to track the channel status was proposed. In such a system, the low-end receiving terminal can still retrieve base-layer signals with less complexity. Furthermore, it allows the high-end terminal to receive signals of higher quality.

In the second part of this research (i.e. Chapter 5), we attempted to reduce the computational complexity in the inter-prediction module of the H.264 encoder. A fast motion search algorithm for variable block sizes was proposed using the methodology of fast mode detection and early termination.
The less likely modes can be skipped in advance to avoid unnecessary SAD calculations. This is achieved by a multi-resolution motion search scheme with an adaptive rate-distortion model, which provides a good prediction of the number of bits and of the initial motion vectors to be used in the motion search process. Unlike conventional fast algorithms that focus only on saving SAD computations, our approach provides a deeper understanding of the fundamental goal of the source coder, that is, to minimize the number of bits used to represent a macroblock via block decomposition and inter-frame prediction. In our framework, the threshold of early termination can be determined adaptively using a more accurate and reasonable approach. Compared to the exhaustive search algorithm, the increase in bit rate caused by early termination and the skipping of unlikely modes can be kept very small owing to good prediction. The proposed method outperforms several existing fast search algorithms and provides a good tradeoff between coding efficiency and coding speed.

6.2 Future work

6.2.1 Integration of Space-time and Multimedia Source Coding

Integrating multimedia source coding with the proposed embedded STC system may require additional intermediate coding modules. In particular, it is necessary to incorporate the forward error correction (FEC) technique. The integration with FEC has some advantages:

• It reduces the bit error probability to combat wireless channel fading and noise so that the receiver can decode more accurately.
• FEC can be integrated with interference cancellation and iterative channel tracking to further improve the overall performance.
• Unequal error protection can be used along with better resource allocation for video transmission to achieve better quality.
• Iterative Turbo-like decoding can be applied if appropriate FEC encoders are concatenated.

6.2.2 Further Enhancement of H.264 Encoder

In addition to motion estimation with various block sizes, the use of multiple reference frames also increases the complexity and the memory requirements significantly. Thus, further research is needed to lower these requirements. It appears that, with some modification, the proposed algorithm can be extended to reduce the search space of multiple reference frames. Some detailed procedures in early termination and mode prediction remain to be explored.

The other bottleneck lies in the interaction between inter- and intra-prediction mode selection. H.264 allows blocks of sizes 4 x 4 or 16 x 16 to be intra-predicted or inter-predicted. To perform intra-prediction, these blocks need information from the reconstructed pixels of neighboring blocks. Thus, additional computation is required to perform intra-prediction. It is an interesting yet challenging research problem to find the best mode among all possible inter- and intra-prediction modes.

Reference List

[1] Dakshi Agrawal, Vahid Tarokh, Ayman Naguib, and Nambi Seshadri, “Space-time coded OFDM for high data-rate wireless communication over wideband channels,” Proc. VTC 98, pp. 2232-2236, 1998.
[2] Siavash M. Alamouti, “A simple transmit diversity technique for wireless communication,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451-1458, 1998.
[3] J. A. C. Bingham, “Multicarrier modulation for data transmission: an idea whose time has come,” IEEE Communications Magazine, vol. 28, pp. 5-14, 1990.
[4] Junavit Chalidabhongse and C.-C. Jay Kuo, “Fast motion vector estimation using multiresolution-spatio-temporal correlations,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, no. 3, pp. 477-488, June 1997.
[5] L. J. Cimini, “Analysis and simulation of a digital mobile channel using orthogonal frequency division multiplexing,” IEEE Trans. on Commun., vol. COM-33, pp. 665-675, 1985.
[6] Coding of Audio-Visual Objects, Part-2 Visual, Amendment 4: Streaming Video Profile, ISO/IEC 14496-2/FPDAM4, July 2000.
[7] Thomas M. Cover, “Broadcast channels,” IEEE Trans. on Information Theory, vol. IT-18, pp. 2-14, 1972.
[8] Gerard J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multiple antennas,” Bell Labs Technical Journal, vol. 1, no. 2, pp. 41-59, Autumn 1996.
[9] Michael J. Gormish and John T. Gill, “Computation-rate-distortion in transform coders for image compression,” in Proc. SPIE Annual Meeting, 1993.
[10] Hsueh-Ming Hang and Jiann-Jone Chen, “Source model for transform video coder and its application-part 1: fundamental theory,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, no. 2, pp. 287-298, April 1997.
[11] B. M. Hochwald and T. L. Marzetta, “Unitary space-time modulation for multiple-antenna communications in Rayleigh flat fading,” IEEE Trans. on Information Theory, vol. 46, pp. 543-564, 2000.
[12] A. R. Hammons and Hesham El Gamal, “On the theory of space-time codes for PSK modulation,” IEEE Trans. on Information Theory, vol. 46, pp. 524-542, 2000.
[13] B. L. Hughes, “Differential space-time modulation,” IEEE Trans. on Information Theory, vol. 46, pp. 2567-2578, 2000.
[14] Information Technology - JPEG 2000 Image Coding System - Part 1: Core Coding System, ISO/IEC 15444-1:2000, Dec. 2000.
[15] Hamid Jafarkhani and Vahid Tarokh, “Multiple transmit antenna differential detection from generalized orthogonal designs,” IEEE Trans. on Information Theory, vol. 47, pp. 2626-2631, 2001.
[16] Irving Kalet, “The multitone channel,” IEEE Trans. on Commun., vol. 37, pp. 119-124, 1989.
[17] Rudolph Emil Kalman, “A new approach to linear filtering and prediction problems,” Transactions of the ASME - Journal of Basic Engineering, vol. 82, Series D, pp. 35-45, 1960.
[18] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, “Motion compensated interframe coding for video conferencing,” in IEEE Proc. Nat. Telecommun. Conf., December 1981, vol. 4, pp. G5.3.1-G5.3.5.
[19] Edmund Y. Lam and Joseph W. Goodman, “A mathematical analysis of the DCT coefficient distributions for images,” IEEE Trans. on Image Processing, vol. 9, no. 10, pp. 1661-1666, October 2000.
[20] Hung-Ju Lee, Tihao Chiang, and Ya-Qin Zhang, “Scalable rate control for MPEG-4 video,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 10, no. 6, pp. 878-894, September 2000.
[21] Renxiang Li, Bing Zeng, and Ming L. Liou, “A new three-step search algorithm for block motion estimation,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 4, no. 4, pp. 438-442, August 1994.
[22] Xiaoming Li and Cesar Gonzales, “A locally quadratic model of the motion estimation error criterion function and its application to subpixel interpolations,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, no. 1, pp. 118-122, February 1996.
[23] Weiping Li, “Overview of fine granularity scalability in MPEG-4 video standard,” IEEE Journal on Circuits and Systems for Video Technology, vol. 11, pp. 301-317, 2001.
[24] Mao-Chao Lin, Chi-Chang Lin, and Shu Lin, “Computer search for binary cyclic UEP codes of odd length up to 65,” IEEE Trans. on Information Theory, vol. 36, no. 4, pp. 924-935, July 1990.
[25] Lurng-Kuo Liu and Ephraim Feig, “A block-based gradient descent search algorithm for block motion estimation in video coding,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, no. 4, pp. 419-422, August 1996.
[26] Z. Liu, G. B. Giannakis, and S. Barbarossa, “Block precoding and transmit-antenna diversity for decoding and equalization of unknown multipath channels,” Proc. 33rd Asilomar Conf. on Signals, Systems and Computers, vol. 2, pp. 1557-1561, 1999.
[27] Zhiqiang Liu, Xiaoli Ma, and Georgios B. Giannakis, “Space-time coding and Kalman filtering for time-selective fading channels,” IEEE Trans. on Communications, vol. 50, no. 2, pp. 183-186, February 2002.
[28] Henrique Malvar, Antti Hallapuro, Marta Karczewicz, and Louis Kerofsky, “Low-complexity transform and quantization with 16-bit arithmetic for H.26L,” in Proc. ICIP 2002, 2002.
[29] Jerry M. Mendel, Lessons in Estimation Theory for Signal Processing, Communications, and Control, Prentice Hall, New Jersey, 1995.
[30] Ayman F. Naguib, Nambi Seshadri, and A. R. Calderbank, “Applications of space-time block codes and interference suppression for high capacity and high data rate wireless systems,” Proc. 32nd Asilomar Conf. on Signals, Systems, and Computers, pp. 1803-1810, 1998.
[31] Optimization Model Version 3.0, ISO/IEC JTC1/SC29/WG11 N4344, Sydney, July 2001.
[32] I-Ming Pao and Ming-Ting Sun, “Modeling DCT coefficients for fast video encoding,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 9, no. 4, pp. 608-616, June 1999.
[33] Lai-Man Po and Wing-Chung Ma, “A novel four-step search algorithm for fast block motion estimation,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 313-317, June 1996.
[34] John G. Proakis, Digital Communications, McGraw-Hill, New York, 1989.
[35] K. Ramchandran, A. Ortega, K. M. Uz, and Martin Vetterli, “Multiresolution broadcast for digital HDTV using joint source/channel coding,” IEEE Journal on Selected Areas in Communications, vol. 11, pp. 6-23, 1993.
[36] A. Shokrollahi, B. Hassibi, B. M. Hochwald, and W. Sweldens, “Representation theory for high-rate multiple-antenna code design,” IEEE Trans. on Information Theory, vol. 47, no. 6, pp. 2335-2367, Sept. 2001.
[37] Study of Final Committee Draft of Joint Video Specification, Joint Video Team (JVT) of ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC, February 16, 2003.
[38] Gary J. Sullivan and Thomas Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74-90, November 1998.
[39] Vahid Tarokh, Nambi Seshadri, and A. R. Calderbank, “Space-time codes for high data rate wireless communication: Performance criterion and code construction,” IEEE Trans. on Information Theory, vol. 44, pp. 744-764, 1998.
[40] Vahid Tarokh, Hamid Jafarkhani, and A. R. Calderbank, “Space-time block codes from orthogonal designs,” IEEE Trans. on Information Theory, vol. 45, pp. 1456-1467, 1999.
[41] Vahid Tarokh and Hamid Jafarkhani, “A differential detection scheme for transmit diversity,” IEEE Journal on Selected Areas in Communications, vol. 18, pp. 1169-1174, 2000.
[42] Jo Yew Tham, Surendra Ranganath, Maitreya Ranganath, and Ashraf Ali Kassim, “A novel unrestricted center-biased diamond search algorithm for block motion estimation,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 8, no. 4, pp. 369-377, August 1998.
[43] Alexis M. Tourapis, O. C. Au, and M. L. Liou, “Predictive motion vector field adaptive search technique (PMVFAST)-enhancing block based motion estimation,” in SPIE Proc. Visual Comm. Image Proc., January 2001.
[44] Alexis M. Tourapis, “Enhanced predictive zonal search for single and multiple frame motion estimation,” in SPIE Proc. Visual Comm. Image Proc., 2002.
[45] H. Zheng and K. J. R. Liu, “Space-time diversity for multimedia delivery over wireless channels,” Proc. ISCAS 2000, vol. IV, pp. 285-288, 2000.
[46] Shan Zhu and Kai-Kuang Ma, “A new diamond search algorithm for fast block-matching motion estimation,” IEEE Trans. on Image Processing, vol. 9, no. 2, pp. 287-290, February 2000.
Asset Metadata
Creator
Kuo, Chih-Hung (author)
Core Title
Contributions to coding techniques for wireless multimedia communication
School
Graduate School
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
engineering, electronics and electrical, OAI-PMH Harvest
Language
English
Contributor
Digitized by ProQuest (provenance)
Advisor
Kuo, C.-C. Jay (committee chair), Chang, Tu-Nan (committee member), Ortega, Antonio (committee member)
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c16-487835
Unique identifier
UC11340184
Identifier
3133299.pdf (filename),usctheses-c16-487835 (legacy record id)
Legacy Identifier
3133299.pdf
Dmrecord
487835
Document Type
Dissertation
Rights
Kuo, Chih-Hung
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA