Algorithms for streaming, caching and storage of digital media
INFORMATION TO USERS
This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer. The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion. Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps.
ProQuest Information and Learning
300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
800-521-0600
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

ALGORITHMS FOR STREAMING, CACHING AND STORAGE OF DIGITAL MEDIA
by
Zhourong Miao
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
May 2002
Copyright 2002 Zhourong Miao

UMI Number: 3073814
UMI Microform 3073814
Copyright 2003 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.
ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346
UNIVERSITY OF SOUTHERN CALIFORNIA
The Graduate School
University Park
LOS ANGELES, CALIFORNIA 90089-1695
This dissertation, written by Zhourong Miao under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of
DOCTOR OF PHILOSOPHY
Dean of Graduate Studies
DISSERTATION COMMITTEE

Dedication
To my family,
And Shan Liu

Contents
Dedication
List of Figures
Abstract
1 Introduction and Motivation
  1.1 Digital communication and multimedia data delivery
    1.1.1 Focus of this thesis
  1.2 VBR video characteristics
    1.2.1 VBR video
  1.3 Real-time VBR video delivery and its challenges
    1.3.1 Delivery over best-effort networks
    1.3.2 Joint approaches
    1.3.3 Rate control and error control
    1.3.4 Delivery over QoS networks
  1.4 Other components in streaming video service systems
    1.4.1 Proxy caching for streaming video
    1.4.2 Disk storage strategy for central video server
  1.5 Contributions of this thesis
2 Scalable streaming media delivery
  2.1 Introduction
  2.2 Scalable Media and Streaming System Architecture
    2.2.1 Data packetization
    2.2.2 System architecture
  2.3 The Packet Scheduling Problem
  2.4 Proposed Scheduling Algorithm
    2.4.1 Expected run-time packet distortion
    2.4.2 Expected Run-time Distortion Based Scheduling (ERDBS)
    2.4.3 Discussion
  2.5 Experimental Results
  2.6 Conclusions
3 Scalable Proxy Caching for Streaming Video Applications
  3.1 Introduction
    3.1.1 System architecture
    3.1.2 Related work
  3.2 Basic definitions
  3.3 Video caching in QoS networks
    3.3.1 Problem formulation
    3.3.2 Analysis on bandwidth reduction
    3.3.3 Client buffer analysis
    3.3.4 Proposed SCQ algorithm
  3.4 Video caching in best-effort networks
    3.4.1 Problem formulation
    3.4.2 Analysis on buffer trace after caching
    3.4.3 Proposed SCB algorithm
    3.4.4 Caching table
  3.5 Experimental results
  3.6 Conclusions
4 Video Compression with Rate Control for Video Storage on Disk-Based Video Servers
  4.1 Introduction
  4.2 Problem Formulation
  4.3 Optimization based on Multiple Lagrange Multipliers
  4.4 Experimental Results
  4.5 Conclusions
5 Conclusions
Reference List

List of Figures
1.1 Streaming video system examples
1.2 The relationship between the topics in this thesis and streaming video applications
2.1 (a) MPEG-4 FGS packetization of a single frame; (b) MPEG-4 FGS packetization of frames with inter-frame dependencies
2.2 System architecture
2.3 MPEG-4 FGS frame data rate. The dotted line is the base layer size of each frame; the solid line represents the full size of each frame (base layer plus enhancement layer)
2.4 Comparison of playback quality (PSNR) using the proposed ERDBS and the SS algorithm under various conditions, such as bandwidth, channel packet loss rate, start-up delay and round-trip time (see also Figs. 2.5-2.7). In most cases the playback quality of ERDBS outperforms the regular SS delivery algorithm by around 2 dB
2.5 Similar to Fig. 2.4, parameterized by channel error probability (e)
2.6 Similar to Fig. 2.5, parameterized by end-to-end delay (d)
2.7 Similar to Fig. 2.5, parameterized by round-trip time (RTT)
3.1 System architecture.
The proxies are set close to the clients, and are connected to the video server via either a QoS (a) or best-effort (b) network
3.2 Cumulative rate and slope functions. (a): Cumulative frame/channel rate. Curves (1), (2) and (4) are cumulative channel rate functions, S_C(t); among them, only (1) and (4) represent CBR channels. Curve (3) is the cumulative frame rate, S_F(t). Curves (1) and (2) are feasible channel rates, while (4) is not. (b): The slope function L_F(t) is drawn in curve (3). The minimum CBR channel bandwidth that has to be reserved is C_r = max{L_F(t)}
3.3 Illustration of Proposition 1. The lower figure shows the slope functions before and after caching one frame F(k_1), represented by L_F(t) (solid line) and L_F1(t) (dashed line), respectively. The upper figure shows the corresponding cumulative channel/frame rate functions. After caching an "earlier" frame before t_peak, i.e., k_1 < t_peak, max{L_F(t)} is reduced from c to c_1. Obviously, caching a "later" frame F(k_2), i.e., k_2 > t_peak, cannot reduce max{L_F(t)}, which occurs at t_peak
3.4 Illustration of caching one frame before or after t_max. S_F1(t) and S_F2(t) are the cumulative frame rate functions after caching F(k_1) and F(k_2) (drawn in dashed and dotted lines, respectively). Note that only one frame is selected in each case. If R(k_1) = R(k_2) and k_2 < t_max < k_1 < t_peak, Proposition 1 shows that selecting F(k_1) or F(k_2) leads to the same reduction in bandwidth C_r. However, the corresponding changes in B_max are different: caching F(k_1) reduces B_max from b to b_1 (Δb = b_1 − b < 0), while caching F(k_2) increases B_max from b to b_2 (Δb = b_2 − b > 0). Therefore, caching a frame between t_max and t_peak (e.g., F(k_1)) can reduce B_max while keeping the same reduction in C_r
3.5 Trace of client buffer size in number of frames and bits
3.6 (a): B_0(t), no frame is cached. (b): B_1(t), frame F(t_1) is cached. These figures illustrate that by caching one frame F(t_1), the buffer trace can be "lifted" for t > t_1 when there is no buffer size limitation
3.7 The client buffer size limitation is B_max. (a): B_0(t), no frame is cached. (b): B_1(t), after F(t_1) is cached. (c): B_2(t), after both F(t_1) and F(t_2) are cached. With the buffer size limitation, the buffer trace after caching frames does not follow that in Fig. 3.6. (b) shows that caching frame F(t_1) only increases B_1(t) for t_1 < t < t_max. (c) shows that caching frame F(t_2) can lift the buffer trace for all t > t_2, because there is no maximum point after t_2
3.8 (a): Trace where only I_req is cached and troughs occur at times t_1 and t_4. (b): After SCQ caching. Frames before t_1 are selected first, but t_4 still remains the same, due to the maximum peak at t_2 (drawn in dotted line). Next, frames within [t_2, t_4] are selected to increase robustness at t_4
3.9 (a): The bandwidth that has to be reserved (C_r, see (3.7)) vs. the percentage of the video cached. Both the proposed SCQ and prefix caching reduce C_r similarly as a larger portion of the video is cached. (b): The maximum buffer size, B_max, required at the client to achieve the caching performance in (a)
3.10 Robustness vs. the percentage of the video cached, using the SCB and prefix caching methods. (a): Robustness U defined in (3.17). (b): Robustness U_a defined in (3.18)
3.11 Robustness verification, with 1000 realizations used in each case. (a): T_j/T_v vs. percentage of the video cached. (b): T_j/T_v vs. channel error e, when 20% of the video is cached. (c): T_j/T_v vs. average channel congestion duration d_c
4.1 Service round. During each service round time, T_round, many users can retrieve data from the server for playback. This figure shows an example of round-robin service, where each user is allowed to retrieve a block of data in a particular time slot during each service round
4.2 Disk placement. The video blocks are placed in an "interlaced" fashion so that disk seek time can be reduced when multiple video objects are requested concurrently
4.3 Illustration of accumulated channel rate and frame rate
4.4 Simulation results: accumulated channel rate and frame rate
4.5 PSNR of the video sequence. When the maximum number of concurrent users that can be supported by the server increases, the bandwidth allocated to each user decreases; the video bit rate therefore has to be reduced, which leads to poorer quality. Using rate-control-based compression can improve the overall PSNR (averaged over the frames) by 0.5 to 1.5 dB compared to the scheme without any rate control (a uniform quantizer used for all frames)
4.6 Average waiting time and hiccups without rate control

Abstract
Streaming media applications have become important components of multimedia communications.
Typically, these applications require real-time data delivery in order to provide continuous playback with good visual quality. However, the real-time constraints may not be explicitly guaranteed when the streaming media is delivered over networks exhibiting time-varying behavior. Streaming systems are designed to maximize the playback quality in the presence of various channel conditions. This research includes studies of different components of a streaming system, and proposes several algorithms to improve the streaming performance.
The first part of this thesis considers the transport of scalable streaming media over best-effort networks (e.g., today's Internet) and proposes a scheduling algorithm for packet delivery. The proposed algorithm first determines the importance values of all packets in the transmission buffer, based on the packet contents, channel conditions and client feedback. Then the algorithm guides the media server to transmit more important data packets earlier than less important ones. This leads to improvements in the playback quality.
The second part focuses on video caching, and shows that the streaming performance can be improved even when only part of the video object is cached in the proxy. Two video caching algorithms are proposed to store selected frames of the video in the proxy, under the constraints of cache space and decoder buffer size. The first caching algorithm aims at reducing the cost of channel bandwidth reservation in QoS networks, while the second is designed for best-effort networks, with the goal of improving the robustness of continuous playback against poor channel conditions (e.g., packet delay and loss). The last part of this thesis addresses the video compression problem combined with disk storage strategies for video servers.
Video disk storage algorithms aim at improving the video server throughput by placing the video data blocks in a special order (thereby reducing the disk seek time). We translate the specified disk placement algorithm into rate constraints for video compression, and propose a rate-distortion based compression algorithm to improve the video quality while maximizing the advantage achieved with the disk placement strategies.
streaming data on time. (ii) Using retransmission cannot always recover all the lost or corrupted packets, due to the delay constraints. (iii) The media data packets may have different impacts on the quality of the reconstructed signal, and therefore the loss of some (important) packets may have a more severe impact on quality than that of other packets.
For such streaming media applications, Quality-of-Service (QoS) guarantees in the network are very useful, including, for example, constant delay, sufficient bandwidth and a low data loss rate during playback. Therefore, transporting streaming media over QoS networks can easily achieve better performance. An example of a QoS network considered in this thesis (Chapter 3) is a network based on Asynchronous Transfer Mode (ATM) [19]. The main issues for delivery of streaming media over QoS networks include, for example, admission control, channel utilization, and bandwidth cost.
However, these QoS parameters cannot be easily provided by best-effort networks, e.g., the Internet. Best-effort networks are suitable for delivering bulk data with relatively loose timing requirements (e.g., text, emails, data files), for which channel packet losses can be recovered by retransmissions. But due to its extra delay, retransmission can only recover a limited part of the lost packets in real-time streaming applications.
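The claim that retransmission recovers only part of the losses follows from simple timing arithmetic: a lost packet can be resent only while its playback deadline is still at least one round trip away. A minimal sketch of this back-of-the-envelope calculation (the function, its parameters, and the independent-loss assumption are ours, not the thesis's):

```python
def residual_loss(p_loss, deadline_s, rtt_s):
    """Probability a packet is still missing at its playback deadline,
    assuming independent losses with probability p_loss, one initial
    send, and one retransmission attempt per elapsed round trip.
    """
    retries = max(0, int(deadline_s // rtt_s))   # retries that fit in time
    return p_loss ** (1 + retries)

# A 150 ms deadline with a 250 ms RTT leaves no room to retransmit;
# a 1 s deadline with the same RTT allows four retransmissions.
print(residual_loss(0.05, 0.15, 0.25))   # 0.05 (no retries fit)
print(residual_loss(0.05, 1.00, 0.25))   # 0.05 ** 5, about 3.1e-07
```

Relaxing the deadline shrinks the residual loss geometrically, which is one reason stored-video players buffer a few seconds of data before starting playback.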
Consequently, it becomes a challenging issue in streaming media applications to provide robust transmission, and therefore better playback quality, against delay, congestion and packet loss, especially in best-effort networks.
As the Internet grows and its costs come down, best-effort networks represent the majority of practical networks used today. The delivery of streaming media over the Internet attracts increasing interest, from academic research to industrial development. Though there have already been some successful commercial products for streaming applications over the Internet, challenging problems such as delay, jitter, congestion and packet loss still remain open.
The increase in Internet data traffic has also led to the rapid development of proxy-based caching in recent years. The initial research in this area (e.g., within the Harvest project [7]) has led to the development of commercial products (e.g., [21]) and to continuing research activity (e.g., [86, 95]). Due to the increasing demand for streaming video services and their large volume of data traffic, proxy caching of streaming video (as well as of other multimedia objects such as images and audio clips) is becoming increasingly important. As with end-to-end video delivery methods, caching strategies specific to video are often designed differently from methods for caching "traditional" web objects. Studies in [101, 48, 83] show that the benefits of video caching include not only a reduction in network access cost and delay, but also an improvement in the overall performance of streaming video applications, e.g., more robust packet delivery against poor network conditions.
Examples of video streaming systems are shown in Fig. 1.1. The encoded video objects are stored on the video server (live video is encoded in real-time).
The servers and clients are connected by heterogeneous networks; e.g., these can be either QoS networks or best-effort networks, and may have various bandwidth capacities and packet delays. Some clients can be directly connected to the servers (through the communication networks), while others may be connected to them through proxies, where the proxies are set close to those clients and connect to the servers through the communication networks. Here the proxies are expected to improve the performance of streaming video services.
[Figure 1.1: Streaming video system examples. Servers, proxies and clients connected by heterogeneous (QoS or best-effort) networks.]
1.1.1 Focus of this thesis
This thesis includes studies on three topics, each focusing on a different component of the streaming video system shown in Fig. 1.1, namely: (i) the video transmission problem (Chapter 2); (ii) the video proxy caching problems for QoS and best-effort networks (Chapter 3); and (iii) video compression constrained by the
Feedback » | ciitnt Network channel Video Server Video proxy Video disk storage Video Caching for besteffort networks Video Caching for QoSnetworks Figure 1.2: The relationship between the topics in this thesis and streaming video applications. In the rest of this chapter, we will first describe the characteristics of VBR video, then we will give brief reviews on the related areas of streaming video delivery, caching and disk storage. The contributions of the thesis on each topic are summa rized at the end of this chapter. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 1.2 V B R video characteristics In this section, we review some properties for streaming media objects and their delivery challenges, especially for Variable-Bit-Rate (VBR) video. 1.2.1 V B R video In order to achieve acceptable visual quality, video data is typically large even after compression. For some good quality video stream, e.g., compressed digital movies, the data volume can be huge and can pose significant challenges to system resources, such as server disk storage, disk bandwidth, network channel bandwidth, codec buffer size, etc. For example, a 2 hour movie may have a total size of 4 gigabytes. The compressed video signal usually produces a variable bit rate stream, i.e., some frames (e.g, frames containing “complex” visual objects and/or more motions of the objects) may be coded with more bits than other frames (e.g, with simple objects and less motion). This phenomenon is also referred to as data burstiness of the VBR video. The burstiness introduces difficulties for video transmission over a constant-bit-rate (CBR) channel, and has been studied in some literature as well as in this thesis (see Chapter 3 on video caching). 6 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
1.2.1.1 Rate-distortion model
Most current coding standards for images, video and audio, such as JPEG [61], MPEG [50] and H.263 [32], use lossy compression techniques to achieve high compression gain, in which detail information of the source may be discarded while the reconstructed signal still has reasonable perceptual quality. The amount of output data from lossy compression can be increased or decreased by increasing or decreasing the quality of the reconstructed signal. The fundamental mathematical principle behind this property is "rate-distortion" theory. Classical rate-distortion (R-D) theory has its origins in Shannon's work [85], and is concerned with the task of representing a source with the minimum number of bits possible, subject to a fidelity criterion.
R-D based video compression for different applications has been studied extensively in recent years [81, 89, 55]. The basic idea in most R-D based approaches is to properly allocate bits to different parts of the signal for a given rate budget (e.g., the total number of bits for the compressed signal), so that the quality of the coded signal is maximized. For example, video objects for real-time delivery are usually compressed with a given target rate, based on the available channel bandwidth. Under this rate constraint, the compressed video quality can be maximized by using R-D based compression approaches [28].
Note that because the channel rate may be unknown at the compression stage, due to the heterogeneity of the network, in some cases it is preferable to compress the
In fact scalable coded video also has other advantages over non-scalable systems, which will be discussed in the next section. 1.2.1.2 D ata contents One im portant property of compressed multimedia data is that different parts of the bitstream have different “importance value” for the quality of reconstructed sig nal. There are different ways to classify the compressed data into categories with different “importance” . For example, in transform-based (DCT or wavelet) still im age compression, the low frequency coefficients are treated as more im portant data than high frequency coefficients, because low frequency coefficients can be used to reconstruct a coarse version of the signal, and high frequency coefficients can be decoded to obtain additional fine details based on that coarse version. The different importance among those coefficients can also be found in video compression, where some frames are coded as still images (e.g., the I-frames coded in intra mode). More over, in compressed video data, “data dependencies” also contributes to producing different levels of data importance. For example, I-frames are more im portant than P-frames (coded in predictive mode), because P-frames are coded depending on the d ata in corresponding I-frames (P-frames can not be decoded without the presence of I-frames) [31]. 8 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Thus, in the presence of limited channel bandwidth resources, or when network congestion occurs, the less im portant data can be discarded by the sender or the intermediate nodes in the networks, in order to give more chance to the more impor tant data to be delivered on time. To better utilize this property, one may choose to use a scalable encoding scheme. 
1.2.1.3 Scalable and non-scalable representations
Scalable (or layered) compression techniques are attractive because of the additional flexibility and functionality they provide, although this comes at the cost of reduced compression performance. In a scalable coding scheme, the original signal is coded into several layers, from lowest to highest. The lowest layer (also referred to as the base layer) contains a coarse version of the signal; each of the higher layers (referred to as enhancement layers) carries finer information about the original signal. To reconstruct the signal, one can decode from the base layer up to an arbitrary higher layer. The more layers are decoded, the better the quality of the reconstructed signal. Note that in order to decode a higher layer, all the layers below that particular layer must be decoded first. In other words, a higher layer is useless without the presence of all the corresponding lower layers.
Layered coding for image compression has been used in several systems, e.g., progressive compression in the JPEG standard. The newer JPEG 2000 standard supports still image compression with better embedded scalability (by using wavelet-transform-based compression) [30].
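The layering constraint just described means a decoder consumes layers strictly in order, and quality grows with each consecutive layer received. A toy sketch (modeling layers as additive quality refinements is our simplification, not how any codec actually reconstructs pixels):

```python
def reconstruct(layers, n_received):
    """Decode a layered bitstream: the base layer plus however many
    consecutive enhancement layers arrived. A higher layer is useless
    without all layers beneath it, so decoding stops at the first gap.
    """
    signal = 0.0
    for i, refinement in enumerate(layers):
        if i >= n_received:
            break            # layers above a missing one cannot be used
        signal += refinement # each extra layer adds finer detail
    return signal

layers = [10.0, 4.0, 2.0, 1.0]   # base + three enhancement layers
print(reconstruct(layers, 1))    # coarse: base layer only
print(reconstruct(layers, 4))    # full quality: all layers decoded
```

This is the mechanism that later lets a sender or receiver trade quality for rate packet by packet, simply by truncating the layer stack.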
For example, to access versions of the video with different qualities, several complete bit streams of compressed video have to be stored by using non-scalable coding. In contrast, only a single scalable video bit stream is needed in that case, so that the disk storage can be reduced. For streaming video multicast applications, some end users connected with chan nels having small bandwidth may prefer a low bitrate video stream with poor quality, while some users with a high capacity channel may prefer a higher bitrate video with better quality. This conflict can be easily solved by sending a single scalable video bit stream with full quality, so that the users can decide to receive a number of lay ers that matches their channel capacity. This approach is known as receiver driven multicast [45, 46, 14]. For end-to-end unicast streaming video, scalable video offers better flexibility for different network conditions, e.g., when the network suffers from congestion or delay, the sender can send fewer number of layers to maintain the continuous playback 10 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. at receiver; conversely the number of layers can be increased when the network conditions become better. Chapter 2 of this thesis studies the transmission strategy for end-to-end scalable streaming video. By analyzing and exploring the flexibility of the scalable video, a transmission algorithm is proposed to achieve better receiver playback quality. Chapter 3 and 4 do not explicitly specify the video format, the algorithms proposed in those chapters are applicable to both scalable and non-scalable videos. 1.3 R eal-tim e V B R video delivery and its challenges 1.3.1 D elivery over best-effort networks Delivery of streaming media may requiring some QoS guarantees on channel band width, delay and loss rate that may not be explicitly available in best-effort networks. Delay. 
Real-time streaming applications impose strict delay requirements on packet delivery, since packets that arrive at the client later than their playback time are considered useless. This delay bound also introduces the following requirements on the channel: bandwidth and packet loss. Bandwidth. The channel should have enough bandwidth to deliver the packets on time, especially for video data, which is typically large. Loss. Since retransmission introduces extra delays, it cannot recover all the lost packets in real-time streaming applications. Therefore the channel should have a low loss rate, or extra bandwidth to deliver additional error-resilience data. Usually in best-effort networks these parameters cannot be easily controlled by the end users (sender or receiver), and the network behavior may be difficult to predict. Since these parameters are time varying, a common mechanism uses network feedback to estimate them during the transmission. The lack of these QoS guarantees is a significant challenge in designing an efficient streaming video system. 1.3.2 Joint approaches Approaches to improve the performance of Internet streaming video can be roughly classified into two general categories: rate control and error control methods. Before we give an overview of these approaches, it should be pointed out that most of them are designed in a joint fashion, by considering the characteristics of both the media data contents and the network conditions. The main reason for the better performance of "joint" approaches is that compressed media data has different levels of importance (and data dependencies), as mentioned above. Joint approaches can treat the data packets differently according to their importance, to achieve better overall performance.
For example, from a coding point of view, Joint Source Channel Coding (JSCC) has been proven to have advantages in coding multimedia signals with protection against channel error (for example, unequal protection coding, multiple description coding, etc.). From a network point of view, since traditional protocols (such as TCP/UDP, which treat all data packets in the same way) are not suitable for real-time transmission, newer protocols (such as RTP/RTSP [79]) are designed specifically for streaming media delivery. There are other proposed delivery mechanisms that consider both the content of data packets and channel conditions, for example, priority packet transmission or dropping [62]. Even in other components of the system, such as proxy caching and central server disk storage, joint approaches can lead to attractive performance improvements. In this thesis, the proposed strategies for the different topics (delivery, caching, disk storage) also have the "flavor" of a joint approach, and will be shown to improve performance for streaming applications (Chapters 2-4). 1.3.3 Rate control and error control 1.3.3.1 Rate control The heavy data traffic of streaming video may introduce network congestion (the congestion can also be due to other reasons, such as physical problems at internal network nodes), which in turn causes packet losses and degrades the quality of the playback. "Rate control" methods are commonly applied to address this problem, by considering the channel condition as a critical parameter during the data encoding procedure.
To obtain the channel conditions, one can use a predefined channel model, estimate them from real-time network statistics based on feedback, or combine both, e.g., use an a priori channel model and update it during the data transmission according to the network status. Rate control algorithms can be roughly divided into the following two categories. 1. Encoder rate control. This approach operates directly in the video compression domain: it reduces the output bitrate of the encoder when network congestion occurs, and keeps the output bitrate under the available channel bandwidth when there is no congestion. 2. Network interface rate control (rate shaping/smoothing). Instead of varying the actual bits of the compressed frames, it controls the video data fed into the channel, or statistically multiplexes multiple streams into the channel, to maximally utilize the channel bandwidth. 1.3.3.2 Error control For data (such as text or files) with lax timing requirements, error control is easy to achieve, e.g., retransmission can simply be applied to recover lost data. Due to their delay constraints, streaming applications have to use more carefully designed error control mechanisms to achieve better performance. Some approaches are listed as follows. 1. Forward error correction (FEC). Both compression and error protection are applied during the encoding of the original signal and the transmission of the coded data. Joint Source Channel Coding (JSCC) has been studied extensively ([22, 34]) in the design of FEC approaches. FEC is applied in situations where network feedback is difficult to obtain (to guide retransmission), or where the round-trip time is too long for retransmission. For example, in multicast streaming video, FEC is preferred to avoid feedback explosion and improve channel utilization [14]. 2. Error resilient coding.
This approach has some similarity with FEC. However, error resilient coding operates mostly in the signal compression stage: different compression methods, with different degrees of error resilience, are chosen according to the channel environment. For example, Multiple Description Coding (MDC) is shown to be more robust against channel noise than layered coding for audio or video in certain cases, i.e., when there is no prioritized delivery (which is common for the Internet), or when packet retransmission is not allowed (due to tight delay constraints) or not provided (e.g., lack of a feedback channel) [33, 88]. Another example is "compression mode selection" (intra/inter mode) in video compression [23], where the more robust compression mode (intra) is chosen when the channel noise is high, at the cost of requiring a higher transmission rate than if inter coding modes were used. 3. Retransmission. Though tightly restricted by the delay bound, retransmission can be applied with certain limitations in streaming media applications. The "limitations" are that (i) retransmission is only useful if the packets can still arrive at the client before their time-out, and (ii) when channel bandwidth is limited, i.e., the "overall" transmission rate is bounded, retransmitting one lost packet may reduce the chances for transmission of other packets (including other lost packets). The sender has to decide whether or not to abandon retransmission requests in order to save bandwidth for other packets. There has been work on retransmission-based error control for both unicast [20, 57] and multicast [14, 60, 39, 96].
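A rough sketch of the retransmission trade-off described in (i) and (ii) above is given below. The names and the one-way-delay approximation of RTT/2 are our own illustrative assumptions, not part of any cited scheme.

```python
def retransmission_candidates(lost, now, rtt):
    """Filter and order lost packets for retransmission.

    `lost` maps a packet id to (playback_time, importance).  A packet
    whose retransmission cannot arrive before its playback deadline
    (approximating the forward trip as rtt / 2) is abandoned to save
    bandwidth; the survivors are ordered most-important, then
    most-urgent, first.
    """
    viable = {pid: (t, imp) for pid, (t, imp) in lost.items()
              if now + rtt / 2.0 < t}
    return sorted(viable, key=lambda pid: (-viable[pid][1], viable[pid][0]))
```

A packet with a passed (or imminent) deadline is simply dropped from the candidate list, freeing bandwidth for the remaining packets, which is exactly the decision the sender faces in point (ii).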
The retransmission issue leads to an interesting media delivery problem: how to deliver a given media stream over a lossy channel under a delay constraint, considering (re)transmission of both new and lost packets? In fact, studies have shown that to achieve better performance, not only must the decisions on packet retransmission be made carefully, but the delivery schedule for both new and lost packets is also important for improving the playback quality, e.g., see [62]. 1.3.3.3 The media data delivery problem In Chapter 2 of this thesis, a general framework for the scalable streaming media delivery problem is presented. This study focuses on end-to-end unicast streaming applications using scalable (layered) encoded media. Each data packet may contain data of different layers and frames, and thus has a different importance "value" for the playback quality. The channels in best-effort networks between sender and receiver have a certain loss rate and delay, which can vary over time. The streaming media data packets are transmitted via the forward channel from sender to receiver, while packet losses (as well as other statistics such as RTT) are reported to the server via a feedback channel. Chapter 2 formulates this delivery problem as follows. Given the scalable media bit stream and the channel feedback (such as packet losses and round-trip time), find the best order of packet delivery in the transmission buffer, which contains both new packets and lost packets requested for retransmission, in order to maximize the playback quality at the receiver end. This problem is solved by the proposed scheduling algorithm. A brief overview can be found at the end of this chapter (Section 1.5), and details are presented in Chapter 2.
1.3.4 Delivery over QoS networks Compared to best-effort networks, QoS networks are more desirable for streaming applications, since requirements such as bandwidth, delay and packet loss rate can be easily obtained in QoS networks. As a result, significant research has focused on (i) maximizing the video playback quality given certain QoS resources, and (ii) minimizing the resource usage for given streaming objects, so that the system throughput can be increased, or the cost of the resources reduced. Some of the joint approaches for best-effort networks described in the previous sections can also be applied in QoS networks. For example, JSCC is used in [29] for video transmission over ATM networks with policing constraints to maximize the video playback quality. In [76], a smoothing algorithm is used to reduce the bandwidth that has to be reserved on the channel for VBR video transmission. Chapter 3 (also see the next section) includes an example of a proxy caching strategy for video delivery over QoS networks, which improves channel bandwidth utilization (a video caching problem for best-effort networks is also covered at the same time). 1.4 Other components in streaming video service systems 1.4.1 Proxy caching for streaming video The great majority of recent proxy caching research and development has focused on techniques that can handle generic web objects, i.e., a decision is made about whether an object should be cached based on the type of object, or on metadata provided by the content creator. But among "cacheable" objects no distinction is made between, say, an HTML text file and a JPEG image.
Some recent work has proposed that caching strategies specific to particular types of objects can help improve caching performance. We refer to these approaches as "partial caching" methods, to distinguish them from the "complete caching" of web objects. An example of partial caching for images is soft caching [54, 97, 37], in which images are "recoded" (i.e., compressed with lower quality) instead of simply being removed from the cache when there is not sufficient cache space left. Likewise, caching only part of a video can be useful for streaming video applications. For example, prefix caching [83] caches the beginning frames of the video sequence in order to further smooth out the variance of the VBR video bitrate. Another approach [101] caches part of the larger frames (e.g., a large frame is broken into two parts, one part cached at the proxy, the other remaining at the server), in order to reduce the bandwidth requirement on the channel between server and proxy. Chapter 3 proposes streaming video proxy caching frameworks for two cases, QoS networks and best-effort networks. Both cases focus on improving the overall streaming and caching performance by using "partial" caching strategies, which cache only part of the video at the proxy. In the case of QoS networks, the channels have high bandwidth, low loss rate and small delay, and the caching goal is to maximally reduce the bandwidth reservation needs (so that the network cost can be reduced) by selecting the proper frames of the video to be cached. In the case of best-effort networks, since the channels are vulnerable to congestion, delay and loss, the caching goal is to improve the robustness of continuous playback at the receiver end against bad network conditions.
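The bandwidth benefit of prefix caching can be illustrated with a simple calculation. This is our own sketch under idealized assumptions (one frame per time unit, negligible proxy-client delay): caching the first few frames relieves the server-proxy channel of the tightest early deadlines.

```python
def required_rate(frame_sizes, prefix):
    """Server-proxy channel rate needed for jitter-free playback when
    the first `prefix` frames are cached at the proxy.

    Frame k (1-based) must arrive by time k; cached frames cost no
    channel bits, so the needed rate is the worst-case ratio of
    cumulative uncached bits to the corresponding deadline.
    """
    rate, cum = 0.0, 0.0
    for k, size in enumerate(frame_sizes, start=1):
        if k > prefix:
            cum += size        # this frame must cross the channel
        rate = max(rate, cum / k)
    return rate
```

With a large first frame, e.g. sizes [10, 2, 2], caching just that one frame at the proxy drops the required rate from 10 to 4/3 size units per time step.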
1.4.2 Disk storage strategy for central video servers For the video servers in VOD (video-on-demand) systems, the disk storage strategy (disk data placement) for video objects can be one of the critical factors affecting server performance, i.e., the maximum server throughput, which can be measured by the maximum number of concurrent users that can be supported by the server. The factors that affect server throughput include channel bandwidth, disk bandwidth and disk seek time. VOD servers are usually equipped with high speed disks with large capacities for video storage. Since the bandwidths of the channel and the disk are "hard" constraints, research has aimed at reducing the disk seek time, which is proportional to the distance the disk head has to travel to find the requested video data block. The studies in [26, 3, 5, 6] showed that by carefully designing the disk placement of the video objects, the seek time for the video data blocks can be reduced, such that the server throughput is increased. In those schemes, the data blocks of different video objects are typically placed in a "sequential" fashion rather than being randomly placed. This is because video data is usually requested sequentially for playback, and a sequential placement can reduce the seek time significantly when the video size is very large. For multiple users connected to the same server, a Round Robin Service [26] is applied to allocate disk bandwidth to all users, such that each user receives a block of video data during a fixed interval in each service round. This block of data should be sufficient for the user to display video until the next block arrives. Most disk placement methods segment the video into fixed-size blocks and place them in a certain order so as to reduce the disk seek time for these blocks.
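The interaction between fixed-size blocks and VBR data can be previewed with a toy computation (hypothetical frame sizes; a frame straddling a block boundary is credited to the block in which it ends):

```python
def frames_per_block(frame_sizes, block_size):
    """Pack frames sequentially into fixed-size blocks and report how
    many frames end inside each block."""
    counts, used = [0], 0
    for size in frame_sizes:
        used += size
        while used > block_size:   # frame spills into the next block(s)
            counts.append(0)
            used -= block_size
        counts[-1] += 1
    return counts
```

For VBR frame sizes [2, 2, 2, 6, 6] and a block size of 6, the blocks hold [3, 1, 1] frames, so equal-sized blocks supply unequal playback durations, which is precisely the difficulty discussed next.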
Given that video data is VBR in nature, blocks of the same size may not contain the same number of frames, and hence may not supply the same playback duration. The sequential placement implies restrictions on the access order of video blocks, which has a negative side effect: if a video block contains too few frames for playback during a complete round for a particular user, the server has to (i) either ignore this and continue to serve other users, which will cause playback jitter for that particular user, or (ii) send additional blocks of data to that user, which requires additional disk seek time to retrieve the other blocks, and may disorder the "sequential" data retrieval for all other users. In either case, the benefit of sequential data placement may be lost, as "random" disk seeks have to be performed. Since neither of these two options is desirable, Chapter 4 describes an approach that solves this problem in the compression domain: the video is compressed under the constraints of the particular disk placement, so that each data block is guaranteed to contain sufficient frames for playback during an entire service round, while the video quality is maximized. 1.5 Contributions of this thesis Three topics related to streaming media are presented in this thesis: delivery, caching and disk storage. 1. Chapter 2 studies end-to-end delivery strategies for scalable streaming media over best-effort networks. The starting point of this problem is how to decide whether or not to retransmit a lost packet in the presence of delay constraints and limited channel bandwidth. It turns out that the delivery order of all packets (both new and retransmitted packets) is important for the playback quality.
This chapter proposes a fast scheduling algorithm to deliver the packets in the sender's buffer, based on a packet importance metric that considers the data contents, data dependencies, channel conditions, and packet delivery status. The proposed algorithm can improve streaming video (or audio) playback quality by 2 to 4 dB, with a computational overhead small enough for real-time applications. 2. Chapter 3 focuses on the video caching problem for both QoS networks and best-effort networks. This chapter shows how to cache only part of the video to improve the overall performance when proxy cache space is limited. For QoS networks, the caching goal is to reduce the network bandwidth cost and channel utilization. The proposed method is a frame-based selection algorithm, which uses an iterative method to select the frames to be cached, such that the channel bandwidth is guaranteed to be (maximally) reduced. For best-effort networks, the caching goal is to improve the robustness of continuous playback against poor network conditions. Another frame-based iterative caching algorithm is proposed for this case, where each selected frame maximally improves the robustness. Both caching algorithms take the client buffer size into account, so that the caching performance is maximized without producing overflow in the client buffer. 3. Chapter 4 formulates a rate-distortion optimal video compression problem under the constraints of particular disk storage strategies for video servers. It proposes an R-D based compression algorithm such that the video quality is maximized under the condition that continuous playback is guaranteed even when the server throughput reaches its maximum limit.
This chapter transforms the disk placement constraint into video rate constraints, and applies the Multiple Lagrange Multiplier method during the video compression stage to determine the optimal (in an R-D sense) operating points for the compressed video. Chapter 2 Scalable streaming media delivery 2.1 Introduction In this chapter we focus on the problem of delivery of scalable streaming media over a lossy packet network, e.g., the Internet. More specifically, we consider an end-to-end system, where a video server and client are connected through a channel suffering from packet loss and varying channel bandwidth. Our goal is to find a packet transmission policy that selects the packets to be transmitted (or retransmitted) at any given time during a streaming session, in such a way as to maximize the playback quality. In this chapter we propose a server-driven "packet scheduling" algorithm especially designed for scalable media. Before we introduce the main ideas of this chapter, we first give some examples of media delivery over best-effort networks. First, consider a video sequence encoded in a single layer and without any temporal dependencies (e.g., motion JPEG). Transmitting such a video object is straightforward: we can simply transmit the frames according to their original playback times, i.e., frames to be displayed earlier are transmitted earlier (due to the delay constraint). A second example is video encoded with bi-directional temporal dependencies (e.g., MPEG encoded video). For example, when B-frames are used, it may be necessary to transmit all the reference frames before transmitting any of the B-frames. This is because B-frames are useless unless their reference frames have arrived.
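This reference-before-B ordering can be sketched as a simple reordering pass. The sketch below is a simplified illustration of our own; real MPEG coding order also handles GOP boundaries and multiple reference distances.

```python
def transmission_order(frames):
    """Reorder display-order frames ('I'/'P'/'B') so that every B-frame
    is sent after the reference frame that follows it in display order.
    Returns indices into `frames` in a valid transmission order.
    """
    order, pending_b = [], []
    for idx, ftype in enumerate(frames):
        if ftype == 'B':
            pending_b.append(idx)     # must wait for the next reference
        else:
            order.append(idx)         # I/P frame can be sent now
            order.extend(pending_b)   # its leading B-frames follow it
            pending_b = []
    return order + pending_b
```

A display sequence I B B P is thus transmitted as frames 0, 3, 1, 2, so both references of each B-frame arrive before the B-frame itself.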
The last example is a scalable video encoded with both temporal and SNR (or other) dependencies, where each frame may contain several layers, e.g., the Fine Grain Scalability (FGS) mode of MPEG-4 [31]. The base layer can be decoded by itself at low quality, while a higher layer can be decoded, given the entire base layer, to obtain enhanced visual quality. The data dependencies among both layers and frames introduce a more complex dependency relationship across the different video data packets. Furthermore, when streaming over a best-effort network with packet loss or error, retransmission can be used to recover the lost packets, as long as the delay constraints are met. However, a retransmitted packet typically incurs an extra delay of one or more RTTs (round-trip times), and cannot be guaranteed to arrive at the client on time. In addition, even if there is still time to retransmit a given packet, a decision needs to be made on whether this packet should be given preference over other retransmission requests. In this chapter, we propose a scheduling algorithm to select the most "important" packets to be transmitted at a given time, i.e., the packets that will tend to maximize the playback quality, where the "importance value" of a packet is evaluated by taking into account data dependencies, network conditions and other factors. The work in this chapter is an extension of our previous work [49], where we apply the scheduling algorithm to layered video objects with more complex dependencies than those considered in [49]. Our simulation results show that by using the proposed algorithm, the video playback quality (in PSNR) can be improved by around 2 dB compared to a simple fixed scheduling approach when the packet loss rate is around 20%.
Retransmission with delay constraints has been studied in [58, 40], where decisions are made on whether to retransmit or not based on the time-out factor of the lost packet. While those works consider the problem of retransmission, they do not specifically consider scalable media, which is addressed in our work. In [63] a rate control algorithm for delivering MPEG-4 video over the Internet was proposed with a priority retransmission strategy for recovery of lost packets, which incorporates a constraint to prevent receiver buffer underflow. This is achieved by giving priority to retransmission of the base (lower) layers. However, this work did not address the problem of the delivery order of new packets in the sender's buffer. For example, the enhancement layer is packetized before transmission, and those packets have decoding dependencies on each other as well as dependencies on the base layer. Our proposed framework solves the problem of the delivery order of both new and lost packets, and can complement rate-control based schemes that make decisions on the rate of the video bit stream to be transmitted, e.g., [63]. Optimization of layered streaming media delivery was also addressed by Podolsky et al. [62], who use a Markov chain analysis to find the optimal packet transmission and retransmission policies. However, the algorithm in [62] needs to search over all possible candidate policies, and thus the policy space grows exponentially with the number of layers and frames (in the scheduling window). In practice this algorithm may only be used with a limited number of layers and frames. Chou and Miao also addressed the same problem with a rate-distortion analysis [11, 12].
They show that the policy space can be factored so that packet dependencies are loosely coupled, and the optimal schedule policy can be found through a few iterations over all the packets in the scheduling window. Therefore, that framework can operate on more layers and a larger transmission window. The main contribution of our work is to propose a simplified formulation and cost function so that a decision on which packet to transmit can be found without any iterations, greatly reducing the complexity with respect to [11, 12]. This is achieved by introducing the concept of expected run-time distortion, which summarizes the parameters to be considered in the packet scheduling (such as data dependencies and network conditions) into a single packet importance metric. A similar concept of packet expected distortion has been proposed in [14, 15] and also used in [11, 12]. Those metrics take into account the status of packets whose decoding is needed before the current packet can be decoded (i.e., its parent packets). Our definition of expected run-time distortion extends this metric by explicitly including the "importance" of the children packets (i.e., those that depend on the current one). While the iterative techniques in [11, 12] resolve the data dependencies and take into account the impact of future scheduling of other packets, we achieve this without iterations by using our proposed metric. Therefore, our proposed scheduling scheme can operate, with very low computational overhead, on more complex media streams, such as MPEG-4 FGS scalable video, where a video packet may have hundreds of parent or children packets in the dependency tree (see details below).
Furthermore, our proposed low complexity algorithm is not restricted to operate at fixed transmission times as in [62, 11, 12] (i.e., those scheduling algorithms assume packets are sent at fixed intervals). Our packet selection algorithm can be used at any desired time when a policy is needed to select packet(s) for transmission (or retransmission); thus the data packet size can be either fixed or variable. The chapter is organized as follows. In Section 2.2 we describe the background on scalable media (e.g., MPEG-4 FGS) data, the data packetization, and the streaming system architecture used in this chapter. Section 2.3 formulates our packet scheduling problem. Section 2.4 presents the proposed ERDBS algorithm, and simulation results are shown in Section 2.5. Section 2.6 concludes the chapter. 2.2 Scalable Media and Streaming System Architecture Scalable encoded media has several layers. The lowest layer (or base layer) can be decoded by itself to obtain a coarse reconstructed version of the signal. As increasing numbers of higher layers are decoded, a more refined version of the signal is obtained. The MPEG-4 standard specifies an FGS (Fine Grain Scalability) mode [31], where a video sequence can be encoded into two layers: a base layer and an enhancement layer. The base layer has to be received completely before it can be decoded, while to obtain better reconstruction quality, one can continue to decode the enhancement layer, and can stop decoding at any position of the enhancement layer. 2.2.1 Data packetization In this chapter we use the MPEG-4 FGS video stream as implemented by the reference codec [31]. The base and enhancement layers are packetized separately, see Fig. 2.1(a).
Usually the enhancement layer is large, and in order to generate different layers out of a single FGS layer, the enhancement layer is packetized into several packets, while the base layers of many frames can be packetized into a single packet. For each packet, denoted as l_{i,j} (the j-th layer of the i-th frame), we are interested in the following parameters: (i) the size of the packet, r_{i,j}; (ii) the playback time of the packet, defined as the time when frame i has to be displayed at the client end; (iii) the distortion value of the packet, d_{i,j}; (iv) the set of its parent packets, A_{i,j}; and (v) the set of its children packets, B_{i,j}. Figure 2.1: (a) MPEG-4 FGS packetization of a single frame. (b) MPEG-4 FGS packetization of frames with inter-frame dependencies. The packet distortion value d_{i,j} is defined as the decrease in the distortion of the reconstructed signal when the corresponding packet l_{i,j} is decoded. For example, in Fig. 2.1(a), when packet l_{i,4} is decoded, the distortion is reduced by d_{i,4}. If the base layer is split into several packets, then the distortion value of each base layer packet can be assigned proportionally to its packet size r_{i,j}, with their sum equal to the total distortion value of the base layer. It should be emphasized, however, that this distortion value assignment is for the scheduling algorithm proposed below: decoding any single packet of the base layer cannot by itself decrease the distortion of the reconstructed signal, as the base layer (of a frame) cannot be decoded unless all of its packets are available.
The parent set A_{i,j} of packet l_{i,j} is defined as the set of packets that have to be available (decoded) in order to decode packet l_{i,j}. For example, the parent set of packet l_{i,4} includes packets l_{i,1}, l_{i,2} and l_{i,3}. The child set B_{i,j} of packet l_{i,j} is defined as the set of packets that cannot be decoded without the presence of packet l_{i,j}, i.e., B_{i,j} contains all packets that have packet l_{i,j} in their parent set. For example, the child set of l_{i,4} is B_{i,4}, including packets l_{i,5}, l_{i,6} and l_{i,7}. Fig. 2.1(a) is an example for a single frame with only intra-frame dependencies (e.g., an I frame). For any P or B frame, the inter-frame dependencies must also be taken into account to form the parent and child sets, which may then contain packets across several frames. In Fig. 2.1(b), a line connecting two packets indicates a direct dependency between them, with the arrows directed from parent to child. Note that some parent packets of a packet l_{i,j} may not be shown explicitly, but should be counted as well. For example, packet l_{i+1,2} has a direct parent l_{i+1,1} and other parent packets l_{i,1} and l_{i+3,1} (this is because packets l_{i,1} and l_{i+3,1} are in the parent set of packet l_{i+1,1}). Although the original MPEG-4 FGS stream has two layers, after packetization the enhancement layer can be broken into several sub-layers, where each packet represents a sub-layer. We define the total distortion of a frame, d_i, as the distortion when the frame is completely lost (see Fig. 2.1(a)). A frame reconstructed with all its layers (packets) has the minimum distortion. Define the total distortion of the media stream as the sum of d_i over all the individual frames. The playback distortion is then defined as the total distortion minus the distortion of all layers and frames that are actually used at playback.
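The parent-set condition can be checked recursively; the sketch below (our own illustration, with generic packet ids) decides whether an arrived packet is actually usable:

```python
def usable_for_playback(pkt, arrived, parents):
    """A packet contributes to playback only if it arrived on time and
    every packet in its parent set is itself usable.

    `parents` maps a packet id to the ids in its direct parent set.
    """
    if pkt not in arrived:
        return False
    return all(usable_for_playback(p, arrived, parents)
               for p in parents.get(pkt, ()))
```

For instance, with parents {'l2': ['l1'], 'l3': ['l2']}, packet l3 is unusable when l1 was lost, even though l3 itself arrived on time.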
Define $a_{i,j}$ as an indicator such that $a_{i,j} = 1$ if $l_{i,j}$ is used for playback, and $a_{i,j} = 0$ otherwise. For a media sequence with $N$ frames and $L_i$ layers in frame $i$, the actual playback distortion, $D_V$, can be obtained as

$$D_V = \sum_{i=1}^{N} \sum_{j=1}^{L_i} d_{i,j} \; - \; \sum_{i=1}^{N} \sum_{j=1}^{L_i} a_{i,j} d_{i,j}. \qquad (2.1)$$

Note that the on-time arrival of a packet $l_{i,j}$ does not necessarily mean that it can be used for playback, as decoding a packet is not possible unless all its parent packets have also arrived on time.

2.2.2 System architecture

Figure 2.2: System architecture (the multimedia signal is layered-encoded into the server's transmission buffer, sent over a lossy channel, and the reconstructed signal is played back at the client).

Fig. 2.2 shows the architecture of our proposed streaming system. The server and the client are connected by an unreliable channel link, which has low delay but suffers packet loss/delay (e.g., a UDP channel). We denote by $\epsilon$ the packet loss probability on this server-client data channel. The source media sequence is compressed and packetized into several layers/packets, and then fed into the server's transmission buffer. These packets are referred to as the "new" packets waiting to be scheduled for transmission. There are also packets in the server's buffer waiting for retransmission because they were reported lost or no acknowledgment was received. At each transmission time, the server's scheduling module selects one packet from these buffers and sends it over the channel. At the client end, lost or damaged packets are reported to the server via a feedback channel. The round-trip time ($RTT$) is defined as the interval from the time a packet is sent by the server to the time the server gets feedback on this packet from the client. With a smaller $RTT$, the server gets feedback more promptly and has more time to re-send (if necessary) a packet before its time-out.
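Combining Eq. (2.1) with the parent-arrival requirement gives a direct way to score a realized transmission: a packet counts toward playback ($a_{i,j} = 1$) only if it and its entire parent set arrived on time. A sketch under our own minimal data layout (not code from the dissertation):

```python
from collections import namedtuple

# Minimal stand-in for a packet: distortion d_ij and full parent set A_ij.
Pkt = namedtuple("Pkt", "frame layer distortion parents")

def playback_distortion(packets, arrived_on_time):
    """D_V per Eq. (2.1): total distortion minus the distortion of the packets
    actually usable at playback. `arrived_on_time` is a set of (frame, layer)
    IDs; a packet is usable only if it and all of its parents arrived."""
    total = sum(p.distortion for p in packets)
    used = 0.0
    for p in packets:
        if (p.frame, p.layer) in arrived_on_time and p.parents <= arrived_on_time:
            used += p.distortion
    return total - used
```

For example, a packet that arrives without its parent contributes nothing: with packets $l_{1,1}$ and $l_{1,2}$ (the second depending on the first), receiving only $l_{1,2}$ leaves the playback distortion at its maximum.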
If the feedback channel is also unreliable, the $RTT$ estimate can be used by the sender as the "time-out" threshold in case feedback is lost. Both the channel error rate and the $RTT$ can be estimated from the feedback information, and they can be used by the server to adjust the stream delivery mechanism according to the varying channel conditions. The start-up delay, $\tau$, is defined as the period between the time when the first packet is sent by the server and the time when the first frame starts to be displayed at the client. Usually a larger initial delay can smooth out more variation in channel behavior and allow more time for retransmissions, resulting in better playback quality. Refer to [28, 48, 47] for a more detailed analysis of the trade-off between start-up delay and playback quality. Our proposed scheduling framework can be applied both to pre-recorded media and to real-time encoded media with a small start-up delay.

2.3 The Packet Scheduling Problem

The goal of this work is to minimize the playback distortion $D_V$ for a streaming session in a lossy packet network. Due to the fixed start-up delay constraint, not all lost packets can be recovered by retransmission. However, if the server schedules a packet to be sent much earlier than its playback time, this packet will have more chances to be retransmitted (and received) before it is too late for display. This fact motivates the main ideas in our proposed framework. Given a set of packets, $G$, that are the candidates to be transmitted by the server, we define a schedule $s$ as the transmission order of all those candidate packets, which specifies whether and when a packet should be transmitted. Clearly, the order of delivering the packets has an impact on the actual playback quality, because of the delay constraint (which limits retransmission) and the data dependencies.
Intuitively, it would be useful to transmit (or retransmit) a more "important" packet earlier, e.g., a base layer or I-frame packet, rather than simply transmit all the packets in sequential order (i.e., following the original time-stamp order) without any distinction among them. Due to packet loss, the importance of a packet needs to take into account not only its playback distortion and playback time $t_i$, but also the status of the other packets in both its parent and child sets. These parameters can be captured in an "expected" distortion metric. We denote by $D_G$ the expected playback distortion of the packet set $G$, where $G$ is the set of packets currently available in the transmission buffer. Similar to (2.1), we have

$$D_G(s) = \sum_{l_{i,j} \in G} d_{i,j} \; - \; \sum_{l_{i,j} \in G} \hat{a}_{i,j} d_{i,j}, \qquad (2.2)$$

where $\hat{a}_{i,j}$ is the probability that packet $l_{i,j}$ can be used for decoding,

$$\hat{a}_{i,j} = \prod_{l_{m,n} \in A_{i,j}} \left(1 - p_{m,n}(s)\right). \qquad (2.3)$$

Eq. (2.3) specifies that the probability $\hat{a}_{i,j}$ is the product of the probabilities of successful arrival of all its parent packets. Here $p_{m,n}(s)$ is the probability that packet $l_{m,n}$ (which may or may not belong to $G$) is lost or delayed, for a given schedule $s$. Clearly $p_{m,n}$ depends on both the packet loss rate $\epsilon$ and the schedule $s$. For a given $\epsilon$, if packet $l_{m,n}$ is scheduled to be transmitted earlier, it will have more chances to arrive on time, or to be retransmitted (within the delay constraint) if it is reported lost. We can thus formulate the scheduling problem as finding, over the set $S$ of candidate schedules, the schedule that minimizes the expected playback distortion:

$$s^{*} = \arg\min_{s \in S} D_G(s). \qquad (2.4)$$

To find the optimal solution to (2.4), an exhaustive search over all candidate schedules is possible but not practically useful, since even a small candidate set $G$ leads to a large number of candidate schedules.
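Given per-packet loss/late probabilities $p_{m,n}(s)$ under a candidate schedule, Eqs. (2.2)-(2.3) can be evaluated directly. The dictionary-based layout below is our own; following (2.3), the product runs over the parent set $A_{i,j}$:

```python
def expected_playback_distortion(distortions, parent_sets, loss_prob):
    """D_G(s) per Eqs. (2.2)-(2.3). `distortions[pid]` is d_ij for each
    packet in the window G; `loss_prob[pid]` is p_mn(s) under the schedule
    being evaluated; `parent_sets[pid]` is the parent set A_ij."""
    total = sum(distortions.values())
    saved = 0.0
    for pid, d in distortions.items():
        a_hat = 1.0
        for parent in parent_sets.get(pid, ()):
            a_hat *= 1.0 - loss_prob[parent]   # Eq. (2.3)
        saved += a_hat * d
    return total - saved                        # Eq. (2.2)
```

An exhaustive search would evaluate this for every candidate schedule of $G$ (after mapping each schedule to its per-packet probabilities $p_{m,n}(s)$), which is what makes the greedy metric of Section 2.4 attractive.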
The size of $S$, and therefore the search complexity, increases exponentially with the size of $G$; exhaustive search is only applicable when the window size $W$ is very small. Alternatives to an exhaustive search do exist, such as the search algorithms proposed by Chou and Miao in [11, 12]. In this chapter we propose a fast heuristic approach to solve this problem practically. The performance of the heuristic approach is very close to that of an exhaustive search (see the experimental results below). At the end of this section we discuss the differences between our approach and those in [11, 12].

2.4.1 Expected run-time packet distortion

Recall that only the first packet in a schedule $s$ is actually used for transmission at a given time. Thus, instead of scheduling the transmission order of all the packets in $G$, if one can properly predict the importance of each packet in $G$, then choosing the most "important" packet to be (re-)transmitted provides a greedy solution to the scheduling problem. In order to estimate the "importance" of a packet prior to transmission, we propose the concept of expected run-time distortion, which takes into account (i) the packet distortion $d_{i,j}$; (ii) the packet data dependencies; (iii) the channel conditions (e.g., loss rate, $RTT$); and (iv) the packet playback time (deadline) and its delivery status (e.g., transmitted, received, lost, etc.). We first introduce the concept of run-time distortion, $\tilde{d}_{i,j}$, which enables us to capture the dependencies among packets (layers). We show some examples before giving its definition. Consider transmitting or retransmitting a packet $l_{i,j}$. If it is independent of any other packet, its run-time distortion is equal to its original distortion, $\tilde{d}_{i,j} = d_{i,j}$. If it has a child packet $l_{i,j+1}$, we have the following situations:

1.
If $l_{i,j+1}$ has not been transmitted yet, transmitting $l_{i,j}$ only affects that layer itself.

2. If $l_{i,j+1}$ has been transmitted (but without any ACK or NAK yet), transmitting $l_{i,j}$ becomes more valuable, since $l_{i,j+1}$ might have been received and has to be decoded together with $l_{i,j}$.

3. If $l_{i,j+1}$ has been transmitted and an ACK was received, then $l_{i,j}$ becomes even more important, because the received $l_{i,j+1}$ will be useless without $l_{i,j}$.

4. If $l_{i,j+1}$ has been transmitted and a NAK was received, we have the same situation as in (1).

Similarly, the gain of transmitting (or retransmitting) $l_{i,j+1}$ depends on the status of $l_{i,j}$. Extending the above to multiple layers, we define the run-time distortion of $l_{i,j}$ as

$$\tilde{d}_{i,j} = d_{i,j} \prod_{l_{m,n} \in A_{i,j}} \left(1 - P^{H}(l_{m,n})\right) + \sum_{l_{m,n} \in B_{i,j}} d_{m,n} \left(1 - P^{H}(l_{m,n})\right), \qquad (2.5)$$

where $P^{H}(l_{m,n})$ is the probability of loss/damage of layer $l_{m,n}$, based on its transmission history, and is defined as

$$P^{H}(l) = \begin{cases} 1 & \text{if layer } l \text{ has not been sent,} \\ 1 & \text{if there is a NAK on layer } l, \\ \epsilon^{n} & \text{if layer } l \text{ has been sent } n \text{ times (no feedback yet),} \\ 0 & \text{if layer } l \text{ is ACKed.} \end{cases} \qquad (2.6)$$

The first term, $d_{i,j} \prod_{l_{m,n} \in A_{i,j}} (1 - P^{H}(l_{m,n}))$, shows that the original distortion of a packet is weighted by the probability of receiving all of its parent packets. The second term, $\sum_{l_{m,n} \in B_{i,j}} d_{m,n}(1 - P^{H}(l_{m,n}))$, indicates that the importance of a packet increases if any of its children packets has been received. Before anything is transmitted, the run-time distortion of all layers is zero except for the lowest layer, for which the run-time distortion equals the original distortion, $\tilde{d}_{i,j} = d_{i,j}$. Eq. (2.5) implies that only after all of a packet's parents have been transmitted (at least once) does its run-time distortion become non-zero (except for the base layer), and that the run-time distortion increases once a child packet has arrived (or has been transmitted). Thus this definition
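Eqs. (2.5)-(2.6) translate almost line-by-line into code. In the sketch below the status strings and argument layout are our own; `parent_ph` holds $P^H$ for each parent, and `child_terms` holds $(d_{m,n}, P^H(l_{m,n}))$ pairs for each child:

```python
def hist_loss_prob(status, times_sent, eps):
    """P^H per Eq. (2.6): loss probability of a layer given only its
    transmission history and the channel loss rate eps."""
    if status == "acked":
        return 0.0
    if status in ("unsent", "nak"):
        return 1.0
    return eps ** times_sent       # sent n times, no feedback yet

def run_time_distortion(d, parent_ph, child_terms):
    """d~_ij per Eq. (2.5): own distortion weighted by the probability that
    every parent arrives, plus the distortion of children that may already
    have been received."""
    own = d
    for ph in parent_ph:
        own *= 1.0 - ph
    gain = sum(d_child * (1.0 - ph) for d_child, ph in child_terms)
    return own + gain
```

With an unsent parent ($P^H = 1$) the first term vanishes, matching the remark that only the lowest layer starts with a non-zero run-time distortion; an ACKed child ($P^H = 0$) adds its full distortion, matching case (3) above.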
reflects the "importance" of a layer given the data distortion, the data dependencies, and the current status of the transmission.

Now we consider the loss probability of packet $l_{i,j}$ when we are able to send it at time $t^{x}_{i,j}$. We denote it $P^{X}(l_{i,j})$ to distinguish it from $P^{H}$ in (2.6). Since there can be retransmissions of a packet $l_{i,j}$ before its playback time $t_i$, we approximate $P^{X}(l_{i,j})$ as

$$P^{X}(l_{i,j}) = \epsilon^{\,n_{i,j}+1}, \qquad (2.7)$$

where $n_{i,j}$ is the number of possible retransmissions before packet $l_{i,j}$ passes its playback deadline, and can be obtained as

$$n_{i,j} = \left(t_i - t^{x}_{i,j} - t^{p}\right) / RTT, \qquad (2.8)$$

where $t^{p}$ is the transmission delay for $l_{i,j}$, i.e., $t^{p} = r_{i,j}/C(t)$, and $C(t)$ is the channel bandwidth, which can vary over time. We then define the expected run-time distortion $\hat{d}_{i,j}$ of a packet $l_{i,j}$ as

$$\hat{d}_{i,j}(t^{x}_{i,j}) = P^{X}(l_{i,j}) \times \tilde{d}_{i,j}, \qquad (2.9)$$

where $\hat{d}_{i,j}$ is a function of $t^{x}_{i,j}$ since $P^{X}(l_{i,j})$ depends on it. We will use $\hat{d}_{i,j}$ as the importance metric of packet $l_{i,j}$.

2.4.2 Expected Run-time Distortion Based Scheduling — ERDBS

We now propose a fast scheduling algorithm to select packets for transmission at any given time. Denote by $t_{cur}$ the current time, at which the server wishes to select a packet to transmit. Note that $t_{cur}$ corresponds to the time when the first packet in a schedule $s$ is to be sent. We substitute $t^{x}_{i,j}$ in (2.9) by $t_{cur}$ for all the packets in $G$, and select the packet with the largest value of $\hat{d}_{i,j}(t_{cur})$ to be transmitted at the current time $t_{cur}$. The reason is that the expected run-time distortion evaluated at the current time $t_{cur}$ reflects the importance of the packet if it is to be transmitted at $t_{cur}$. We refer to this approach as "Expected Run-time Distortion Based Scheduling" (ERDBS). The complexity is greatly reduced with this algorithm, since only one pass over all the packets in the transmission buffer is needed.
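Putting (2.7)-(2.9) together gives the ERDBS selection step: evaluate $\hat{d}_{i,j}$ at $t_{cur}$ for every buffered packet and send the maximizer. The sketch below is our own rendering (the flooring of $n_{i,j}$ to an integer and the dictionary layout are assumptions):

```python
def expected_run_time_distortion(d_tilde, t_play, t_send, size, bandwidth,
                                 rtt, eps):
    """d^_ij per Eqs. (2.7)-(2.9): the run-time distortion weighted by the
    probability that the packet is lost in its initial transmission and in
    all n_ij retransmission opportunities before its deadline."""
    t_p = size / bandwidth                                # packet Tx delay
    n_retx = max(0, int((t_play - t_send - t_p) / rtt))   # Eq. (2.8)
    p_x = eps ** (n_retx + 1)                             # Eq. (2.7)
    return p_x * d_tilde                                  # Eq. (2.9)

def erdbs_select(buffer, t_cur, bandwidth, rtt, eps):
    """Greedy ERDBS step: one pass over the transmission buffer, returning
    the packet with the largest importance metric d^_ij(t_cur)."""
    best, best_val = None, float("-inf")
    for pkt in buffer:
        val = expected_run_time_distortion(pkt["d_tilde"], pkt["t_play"],
                                           t_cur, pkt["size"], bandwidth,
                                           rtt, eps)
        if val > best_val:
            best, best_val = pkt, val
    return best
```

Packets whose deadlines leave few retransmission chances get a larger $P^{X}$ and are therefore served first, all else being equal.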
The results in [49] show that the packets selected using this greedy search are almost identical to the ones selected by the optimal schedule in (2.4) using an exhaustive search.

2.4.3 Discussion

From (2.2) and (2.3), we can see that the distortion decrease in $D_G(s)$ achieved by any given single packet is heavily coupled with the transmission schedule of its parent packets. This introduces high complexity into the search for the optimal scheduling solution. As mentioned before, Chou and Miao [11, 12] show that the dependencies among packets can be factored in a way such that the optimal scheduling policy can be found in several iterations instead of a full search. The main differences between the approach in this chapter and those in [11, 12] lie in two aspects. First, for any packet $l_{i,j}$, the status and distortion values of its child packets are explicitly taken into account in our definition of the run-time distortion (see (2.5)). Second, the loss probabilities of all parent and child packets of $l_{i,j}$ are computed based only on the history of the transmission of those packets (see (2.6)), without any assumption about future transmission schedules. In contrast, in [11, 12] these probabilities are calculated based on both the history and the transmission policies scheduled for the future. Therefore, with (2.5) and (2.7) we are able to capture both the data dependencies and the delay constraints in a single importance metric, the expected run-time distortion, and the selection of packets can be done by choosing the packet with the maximum value of that metric. Thus the full search in [62] and the iterative techniques in [11, 12] can be avoided.

2.5 Experimental Results

To evaluate our proposed scheduling algorithm, we also simulate a transmission scheme with no particular scheduling among different packets.
We refer to this as the Sequential Scheduling (SS) scheme. In SS, the data packetization is the same as described in Section 2.2.1. The packets within the same frame are transmitted consecutively, and the frames are delivered according to their original order in the encoded video stream. The SS scheme has the ability to discard a packet (either new or retransmitted) if it detects that this packet has exceeded its playback time, in order to save bandwidth for other packets. The SS scheme selects the proper layers of each frame to be sent to the client such that the average video data rate does not exceed the available channel bandwidth; the remaining higher layers are discarded.

Figure 2.3: MPEG-4 FGS frame data rate. The dotted line is the base layer size of each frame; the solid line represents the full size of each frame (base layer and enhancement layer).

We use an MPEG-4 FGS video stream in our simulation. The video data is packetized with a fixed packet size of 512 bytes. The data used in the simulation is shown in Fig. 2.3 (we use the video sequence Stefan with 300 frames). The average video data rate after compression, with both base and enhancement layers, is 958 KBits/second. The channel packet loss rate is 0.2, i.e., 20% of the packets are lost when transmitted over the server-client data channel, and packet loss is assumed
to be i.i.d. The round-trip time is 200 ms and the streaming session has a start-up delay of 40 ms. The playback quality is measured by the PSNR of the reconstructed video frames at the client end, incorporating all the packets that were received on time. Over 100 channel realizations are averaged in the results we present.

Figure 2.4: Comparison of playback quality (PSNR) versus channel bandwidth using the proposed ERDBS and the SS algorithm (number of frames = 300; avg. full video data rate = 958 KBits/s; channel packet loss rate = 0.2; start-up delay = 200 ms; RTT = 200 ms; packet size = 512 bytes); see also Figs. 2.5-2.7 for other conditions (channel packet loss rate, start-up delay, round-trip time). In most cases the playback quality of ERDBS outperforms the regular SS delivery algorithm by around 2 dB.

Figs. 2.4-2.7 show the simulation results for playback quality using the FGS video stream of Fig. 2.3, with various parameters (bandwidth, channel packet loss rate, start-up delay and RTT). Our simulation shows that by using the
proposed ERDBS algorithm to deliver the video data, the playback quality improves by about 2 dB compared to using the SS delivery algorithm.

Figure 2.5: Similar to Fig. 2.4, while parameterized by the channel error probability ($\epsilon$); channel bandwidth = 670 KBits/s, start-up delay = 200 ms, RTT = 200 ms, packet size = 512 bytes.

2.6 Conclusions

In this chapter, a new framework for the delivery of scalable streaming media data over networks with packet loss was presented. We proposed a new delivery method, ERDBS, within this framework to solve the packet scheduling problem. The proposed algorithm, designed for a sender-driven transmission system, can increase the client playback quality by selecting the proper packet(s) to be transmitted at any given time during the streaming session. The simulation results show that the ERDBS algorithm outperforms packet delivery schemes with a fixed scheduling in the presence of channel packet loss. The low complexity of the search in ERDBS also enables it to be applied in real-time applications.

Figure 2.6: Similar to Fig. 2.5, while parameterized by the end-to-end delay ($d$); channel bandwidth = 670 KBits/s, channel packet loss rate = 0.2, RTT = 200 ms, packet size = 512 bytes.
Figure 2.7: Similar to Fig. 2.5, while parameterized by the round-trip time ($RTT$); channel bandwidth = 670 KBits/s, channel packet loss rate = 0.2, start-up delay = 200 ms, packet size = 512 bytes.

Chapter 3

Scalable Proxy Caching for Streaming Video Applications

3.1 Introduction

Interest in proxy-based caching has increased with the growth in Internet traffic, and the initial research in this area (e.g., within the Harvest project [7]) quickly led to the development of commercial products and continuing research activity (e.g., [95, 86, 87, 73, 42, 93]). The great majority of recent research and development on proxy caching has focused on techniques that can handle generic web objects; among the "cacheable" objects no distinction is made between, say, an HTML text file and a JPEG image. Real-time streaming video is becoming a significant proportion of network traffic and, given the large data volumes involved and its variable-bit-rate (VBR) nature, even a few popular video applications can result in potential network congestion problems. The congestion can cause not only packet loss but also packet delays, which degrade the video playback quality dramatically (since packets that arrive after their playback time are useless). In this chapter we focus on the caching problem specifically for streaming video objects.
Some recent work has proposed that caching strategies specific to particular types of objects can help improve overall performance. In particular, for some objects it is possible to perform "partial caching", where only part of the object is stored on the proxy, as opposed to "complete caching", where objects are stored in their entirety. An example can be found in "soft caching" for images ([54, 97, 37]). Approaches for partial caching of video have also been proposed in [101, 83]. In this chapter, we study a "selective caching" strategy [48], which selects only certain parts of the video to be cached. We will show that the strategy to use depends on the specific network environment; we focus on two representative scenarios, namely networks with Quality-of-Service (QoS) and best-effort networks, and provide a selective caching method for each of them. Our proposed caching methods are frame-wise selection algorithms, i.e., the smallest caching unit we consider is one frame of the video (each frame may contain a different number of bytes). Since there can be frequent changes in the caching parameters (e.g., the popularity of the video objects), it is desirable for the proxy to be scalable, i.e., to be able to easily increase or reduce the portion of video being cached while still maintaining good performance. This scalability is inherent in our proposed frame-wise selective caching methods, by which the proxy can simply add or drop the selected frames as the environment changes.

3.1.1 System architecture

Figure 3.1: System architecture.
The proxies are set close to the clients, and are connected to the video server via either (a) a QoS network or (b) a best-effort network.

The server-proxy-client model used in this chapter is shown in Fig. 3.1. The clients are attached to the proxy and all their requests for videos go through the proxy. The proxy allocates a certain cache space for each video sequence. Upon a client's request, if the frames are cached, the proxy sends them from its cache to the client directly; otherwise, the proxy retrieves the frames from the video server. For a video streaming session there is usually an end-to-end playback delay $d$, which is the interval between the time when the first packet is sent and the time when the first frame is displayed. For continuous playback, the transmission delay of any given frame cannot exceed $d$. We are also interested in other parameters relevant to the design of selective caching, such as the server-proxy and proxy-client channel characteristics, the cache space ($H$) allocated on the proxy for a particular video, and the client buffer size ($B_c$). We consider two network cases in the following, and propose a different selective caching strategy for each of them.

Case 1: Proxy caching in QoS networks (see Fig. 3.1a). The bandwidth on the server-proxy channel can be reserved, and the cost of reservation is proportional to the reserved bandwidth. The goal of caching in this case is to reduce the amount of bandwidth $C$ that has to be reserved on the server-proxy backbone channel (thereby reducing the network cost), while minimizing the client buffer size $B_c$ required to achieve that reduction, given a limited cache space $H$.

Case 2: Proxy caching in best-effort networks without QoS on the server-proxy channel, as for example in the current Internet (see Fig. 3.1b).
The delivery of data over these channels is vulnerable to congestion, delay and loss, which are harmful for real-time streaming video delivery. The caching goal in this case is to provide more robustness¹ for continuous playback against poor network conditions on the server-proxy channel. (¹See Section 3.4 for a detailed definition of robustness.)

To improve the streaming performance given that the proxy-client channel is fast and reliable, one could consider using the proxy as an external buffer for each client, i.e., such that a client with minimal buffer resources can store some of the incoming video data at the proxy. In this case the proxy would need to allocate separate storage resources to each client for each streaming session (even when some
In additional, this approach reduces the amount of data that the proxy has to provide to the clients (since only certain video frames need to be served from the proxy) and also requires substantially less real time storage management (since only when the popularity of an object changes is the storage devoted to it modified.) In both cases, the proxy-client channel delivers the data originated from both server and proxy. In our model we assume th at the main network bottleneck (in terms of both cost and reliability) is the server-proxy channel, and therefore the proxy-client link is assumed to have “left-over” bandwidth even when video trans mission is ongoing. One example of such a scenario would be accessing a relatively low bandwidth video stream through a DSL/cable link. Another example would 52 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. that where proxy and client are co-located within the same local area network [ 1 0 1 ]. The algorithms proposed here are good approximations for the case when there is a substantial amount of “left-over” bandwidth, in which we can assume that the delay in delivering a frame from the proxy to the client is very small (even when transmis sion of other frames from the server is taking place simultaneously.) This will enable us to assume that, since they can delivered in a very short time, frames stored at the proxy consume practically no client buffer memory. Therefore, to simplify the analysis, we exclude them from client buffer consumption in the rest of this chapter. For the scenario when when the “left-over” bandwidth on proxy-client channel is not large enough to ignore the delay between proxy and client, our buffer analysis can still be used as an approximation for the proposed selective caching algorithms. 
In the extreme case where there is no “left-over” bandwidth, and therefore the client cannot receive data simultaneously from the proxy and the server, our proposed will reduce to the prefix caching proposed in [83], which will provide the optimal solution. We will show that the performance of some “partial” video caching strategies may be limited by client buffer size constraints. For example, it is obvious that in QoS networks caching any part of the video can reduce the server-proxy bandwidth C, since less d ata has to be retrieved from the server directly. However, the bandwidth may be reduced at the expense of increasing in the required client buffer size B c, because the frames that are stored at the server (not cached at the proxy) will have to 53 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. be buffered at the client until they are displayed. We will show that the increment in B c varies among different selective caching strategies which lead to the same bandwidth reduction. Similar constraints also exist in the case of video caching for best-effort networks. Therefore, there are trade-offs between the reduction of C (or improvement of U) and the memory size B c to achieve it. Our goal is to find proper selective caching methods such that the best performance is achieved under a constraint on B c. Thus one of the main differences between our work and other proposed caching algorithms (see below) is that we will consider the storage constraints at both client and proxy. In summary, the main assumptions we make in our work, which also explain how it differs from other proposed caching algorithms (see below), are (i) th at memory at the client is constrained, (ii) that cost considerations preclude using the proxy cache to provide dynamic “additional memory” for each client buffer, and (iii) that there is some “left-over” bandwidth in the proxy-client link. 
We will propose two approaches for selective caching in each of the above cases, namely, Selective Caching fo r QoS networks (SCQ) and Selective Caching for Best-effort networks (SCB). 3.1.2 R elated work Proxy caching for video has been explored in [83, 101, 41] under network conditions similar to those in Case 1 (QoS networks in Fig. 3.1a). Prefix caching, proposed by Sen et al. [83], is a special form of selective video caching, which involves caching 54 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. only a group of consecutive frames at the beginning of the video sequence to smooth and reduce the bit-rate of a VBR video. We will show th at our proposed SCQ algorithm, compared to prefix caching, requires less client buffer B c while achieving the same bandwidth reduction (Case 1 ), and that SCB can improve robustness more than prefix caching (Case 2). Note that when B c is large, SCQ/SCB can reduce to prefix caching (see Sections 3.3 and 3.4.2). Wang et al. propose a “video staging” algorithm in [101], which prefetches to the proxy a portion of bits from the video frames whose size is larger than a pre determined “cut-off” rate, to reduce the bandwidth on the server-proxy channel. Therefore some frames are separated into two parts: one is cached on the proxy and the other remains on the server. By contrast, our proposed algorithms are frame-wise caching schemes so that a frame is either not available at the proxy or it is stored in its entirety. One advantage of frame-wise caching is that those frames available in the proxy can actually be played by the client (this would not be possible with partially cached frames) in the event where congestion prevents any data from being delivered from the server for some period of time. 
Another advantage of a frame-wise caching scheme is that the proxy can easily add (or drop) frames when the caching conditions change (such as network status, video object popularity, or available cache space), by using a caching table created off-line (proposed in Section 3.4.4). As an example, with a staging approach, if the popularity of a video increases, the proxy will need to increase the percentage of data cached from all frames (sending a request to the server to achieve this). With frame-wise caching, instead, only a few complete frames need to be requested from the server. Ma et al. [41] also study a frame-wise video caching problem slightly different from that in Case 1, where selective caching is performed but the algorithm attempts to select groups of consecutive frames rather than isolated frames, in order to reduce the complexity of proxy management. In our work, by using a caching table, the proposed SCQ/SCB algorithms can select isolated frames (during iterations) to maximally improve the overall performance without increasing the complexity of the proxy's online operations. Another major difference between our work and the works in [41, 83, 101] is that we provide a caching strategy (SCB) for video delivery over best-effort networks, which are the most widely deployed today but are not explicitly considered in those works. Both proposed SCQ/SCB algorithms consider non-layered coded video streams; caching for layered (scalable) video can be found in [70, 36]. Rejaie et al. [70] propose a video caching algorithm for scalable video, which cooperates with the congestion control mechanism (for best-effort networks in Case 2) proposed in [67].
That work studies the cache replacement mechanism and the cache resource allocation problem according to the popularity of video objects; e.g., more layers of a video with higher popularity will be cached, and vice versa. The overall streaming performance can thereby be improved (e.g., less network congestion, better playback quality). Kangasharju et al. formulate the caching problem for layered video differently, aiming to maximize the overall revenue for the service providers [36]. Tewari et al. [90] and Reisslein et al. [65] study cache replacement of streaming media (in non-layered format) to improve the cache hit ratio and therefore the streaming quality. Our work is complementary to [90, 65], as we focus on the problem of selecting which part of a video should be cached after the cache space for that particular video has been allocated. The rest of this chapter is organized as follows. Section 3.2 gives the background and definitions. Section 3.3 addresses the video caching problem for QoS networks and proposes the SCQ algorithm, while the SCB algorithm is proposed in Section 3.4 to solve the caching problem for best-effort networks. Experimental results are shown in Section 3.5. Finally, Section 3.6 concludes the chapter.

3.2 Basic definitions

Most standard video codecs (e.g., [50, 31]) produce VBR data after compression, which leads to high data burstiness. Usually there is a small start-up playback delay d (defined here as the duration between sending out the first packet and playing the first frame) in most existing streaming video services, to allow the client buffer to store a few beginning frames before playback starts.
This delay is useful (i) to smooth the burstiness of VBR video data [76]; and (ii) to provide robustness against packet delay resulting from poor network conditions (in best-effort networks), since playback from the client buffer remains possible even when frames are being delayed. For this reason, we always cache this beginning portion of video data, referred to as the "required initial buffering segment" (Ireq), so that the client can start playback with a smaller start-up delay. When we can assign more cache space than Ireq for a given video, we can choose between continuing to cache the immediately following frames (as would be done in a prefix caching technique), or instead selecting other intermediate frames to be cached. In this chapter, we consider the latter option, i.e., selective caching rather than prefix caching.

Assume there are N frames in a video V, denoted F(i), i = 1, 2, ..., N; each frame F(i) has a size of R(i) bytes and a constant playback duration of Tp seconds (e.g., Tp = 1/30 second). We discretize the time axis t with intervals of Tp. The total size of the video is R_total = sum_{i=1}^{N} R(i). We define a "caching indicator sequence", A = [a(1), a(2), ..., a(N)], to indicate whether the ith frame F(i) is cached or not:

a(i) = 0 if frame F(i) is not cached, a(i) = 1 if frame F(i) is cached.   (3.1)

A sequence A uniquely defines which part of the video is cached, and can represent a selective caching scheme. We denote by A_phi (or simply phi) the zero sequence, i.e., the sequence where no frames are cached, so that all a(i) = 0. A feasible A must satisfy the cache space constraint

sum_{i=1}^{N} R(i) a(i) <= H, a(i) in A.   (3.2)

We also denote by A^1 the indicator sequence where only Ireq is cached. H, the cache space allocated to the video, is predetermined for each video object.
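As a concrete illustration, the indicator sequence and the feasibility constraint (3.2) can be expressed in a few lines of code. This is a minimal sketch with hypothetical names and made-up frame sizes, not part of the original algorithms:

```python
# Minimal sketch of the caching indicator sequence A of Eqs. (3.1)-(3.2).
# Frame sizes R and budget H are illustrative values only.

def is_feasible(A, R, H):
    """Cache space constraint: total size of cached frames must not exceed H."""
    return sum(r for a, r in zip(A, R) if a == 1) <= H

R = [300, 120, 80, 200, 150]   # frame sizes R(i) in bytes (example values)
H = 500                        # proxy cache space allocated to this video

A_phi = [0, 0, 0, 0, 0]        # zero sequence: no frames cached
A_1   = [1, 1, 0, 0, 0]        # only the initial segment I_req cached

assert is_feasible(A_phi, R, H)        # trivially feasible
assert is_feasible(A_1, R, H)          # 300 + 120 = 420 <= 500
assert not is_feasible([1] * 5, R, H)  # whole video (850 bytes) exceeds H
```

Any selective caching scheme discussed below is simply a choice of such a 0/1 sequence satisfying (3.2).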
3.3 Video caching in QoS networks

3.3.1 Problem formulation

An example of a QoS network is shown in Fig. 3.1a. We assume that a CBR bandwidth is reserved on the server-proxy backbone, and that the cost of reservation is proportional to that bandwidth. Therefore we always reserve only the minimum bandwidth required for video delivery in a particular streaming session. Define Cr as the required bandwidth of a CBR channel to deliver the video V on time for real-time playback without jitter, given a finite start-up delay d. Due to the burstiness of VBR video, Cr is usually higher than the average video data rate, Ravg, to avoid decoder buffer underflow. Feng et al. proposed a general method to find such a Cr for pre-coded video without caching (where Cr is called the critical bandwidth) in [10]. We will show that Cr changes after some frames are cached, and is a function of the caching indicator A. One of our objectives in this case is to choose A to minimize Cr(A) by caching selected frames, so that the bandwidth reservation cost on the server-proxy channel can be reduced. However, while there may exist many possible A that lead to the same maximum reduction in Cr(A), they may require different amounts of client buffer Bc to achieve that reduction, as shown in the analysis below. Considering both bandwidth and buffer size, we formulate the video caching problem for QoS networks as follows.

Problem formulation 1: Given a limited proxy cache space H for a video sequence (H < R_total), a pre-encoded video stream V with N frames and a fixed delay d, among all possible A satisfying (3.2), find the A* that maximally reduces Cr(A) after caching while requiring the minimum Bc to achieve that bandwidth reduction.

3.3.2 Analysis of bandwidth reduction

To calculate Cr(A), we extend the solution in [10] with minor modifications.
We will only quote the results from [10]; readers can refer to [10, 72, 82] for more details. Assume the transmission starts at time t0 = -d and playback starts at t = 0, so the start-up delay is d (d > 0). Frame F(i) is scheduled to be displayed at time t = i. Define S_A(t) as the cumulative frame rate at time t, for a given A:

S_A(t) = sum_{i=1}^{t} (1 - a(i)) R(i).   (3.3)

Define C(t) as the server-proxy channel bandwidth at time t. A feasible channel rate function C(t) that ensures continuous playback must meet the following constraint:

sum_{i=-d}^{t} C(i) >= S_A(t), for all t in [1, N].   (3.4)

S_A(t) and sum_{i=-d}^{t} C(i) can be thought of as the video data consumption curve at the client and the data supply curve from the channel, respectively. Since cached frames do not consume server-proxy bandwidth and can be fetched when needed (right before decoding) from the proxy, the cached frames can be excluded from the client consumption curve in (3.3). Eq. (3.4) states that the supply curve must stay above the consumption curve in order to avoid client buffer underflow (see Fig. 3.2). For a CBR channel, the bandwidth cannot exceed the constant allocated rate, so that

C(t) <= Cr(A), for all t in [1, N].   (3.5)

Define the slope function of a video sequence to be

L_A(t) = S_A(t) / t.   (3.6)

Fig. 3.2 shows an example of S_A(t) and L_A(t). In fact, L_A(t) represents the lower bound of a feasible C(t), and Cr can be obtained as (see [10, 72, 82] for a proof)

Cr(A) = max_t {L_A(t)}, t in [1, N].   (3.7)

Figure 3.2: Cumulative rate and slope functions. (a): Cumulative frame/channel rates. Curves (1), (2) and (4) are cumulative channel rate functions, sum_i C(i).
Among them, only (1) and (4) represent CBR channels. Curve (3) is the cumulative frame rate, S_A(t). Curves (1) and (2) are feasible channel rates, while (4) is not. (b): The slope function L_A(t) is drawn as curve (3). The minimum CBR channel bandwidth that has to be reserved is Cr = max{L_A(t)}.

In other words, the minimum bandwidth Cr(A) that has to be reserved on a CBR channel is the maximum value of the video slope function L_A(t), which is reached at time t_peak, referred to as the consumption peak time:

t_peak = arg max_t {L_A(t)}.   (3.8)

Suppose we want to cache one more frame F(k) after some frames have already been cached. Let A and A_k be the caching indicator sequences before and after caching F(k), respectively. The following proposition shows the change in bandwidth after caching F(k).

Proposition 1: For a given F(k) and A, max{L_{A_k}(t)} = max{L_A(t)} - Dl, where

Dl = R(k)/t_peak if k in [1, t_peak], and Dl = 0 otherwise.   (3.9)

See the Appendix for a proof. An illustration is shown in Fig. 3.3. This means that max{L_{A_k}(t)} can be reduced if and only if k in [1, t_peak]. Therefore we should cache a frame before t_peak to reduce Cr, and the reduction Dl depends only on the size of the cached frame, not on its position, i.e.,

Cr(A_k) = Cr(A) - R(k)/t_peak, if k in [1, t_peak].   (3.10)

This conclusion indicates that caching successive frames from the beginning of the video sequence is one caching scheme that reduces Cr maximally. For example, "prefix caching", which caches the group of beginning frames until the cache space is full, is a good approach to reduce the bandwidth requirement [83].

3.3.3 Client buffer analysis

Recall that we need to select frames to be cached so as to (i) maximally reduce Cr(A), and (ii) require the minimum Bc to achieve that reduction.
Selecting different frames to be cached within the period [1, t_peak] leads to different requirements on Bc. This can be explained by the following buffer analysis. Define a byte-level buffer trace B_A(t), the amount of data accumulated in the client buffer at time t when the CBR channel rate is Cr(A):

B_A(t) = Cr(A) (t + d) - S_A(t),   (3.11)

and let

t_max = arg max_t {B_A(t)}   (3.12)

be the time at which the buffer occupancy reaches its maximum,

B_max(A) = B_A(t_max),   (3.13)

where 1 <= t <= t_peak. Note that after caching F(k), S_{A_k}(t) = S_A(t) for 1 <= t < k, and S_{A_k}(t) = S_A(t) - R(k) for k <= t <= t_peak. Thus we have

B_{A_k}(t) = B_A(t) - R(k) t / t_peak, for 1 <= t < k,
B_{A_k}(t) = B_A(t) - R(k) t / t_peak + R(k), for k <= t <= t_peak.   (3.14)

Eq. (3.14) means that after caching frame F(k), the new buffer trace decreases before k (due to the reduction of Cr) and increases during [k, t_peak] (due to the removal of R(k) from the consumption curve).(2) It also indicates that at any particular time t in [1, t_peak], the amount of increase/decrease of B_{A_k}(t) depends only on R(k). More specifically, if t_max < t_peak, we can reduce B_max(A_k) only if we cache a frame F(k) after t_max, in which case

B_max(A_k) = B_max(A) - Db, if t_max <= k <= t_peak,   (3.15)

where

Db = (t_max / t_peak) R(k).   (3.16)

The change in B_{A_k}(t) for the remaining period t_peak < t <= N may not be easily expressed in closed form. However, based on the results from [10, 72, 82] we find that the maximum buffer occupancy may increase (or at least not decrease) for the period t_peak < t <= N. This is because B_{A_k}(t_peak) = B_A(t_peak) = 0, and the only difference is that a smaller bandwidth (Cr(A_k)) is available to transmit the video data for t > t_peak after caching frame F(k) (see [10] for details).

(2) Eq. (3.14) is obtained assuming the delay of transmitting F(k) over the proxy-client channel is small; this delay is ignored to simplify the analysis. The frames after F(k) are received earlier than they would be if caching were not used.
Proposition 1 and (3.14) assume that t_peak and t_max do not change after F(k) is cached. Those results still hold when t_peak or t_max changes, except that the absolute values of Dl and Db become smaller, which does not affect the conclusions in the next section.

3.3.4 Proposed SCQ algorithm

The results in (3.10) and (3.15) lead to the following conclusion for caching one frame F(k): one should cache the frame at t_peak, F(t_peak), in order to minimize both Cr(A) and B_max(A). The reasons are:

1. From Proposition 1, we must cache a frame before t_peak to reduce Cr.

2. From (3.14) and (3.15), if B_max is reached before t_peak (i.e., t_max <= t_peak), caching the frame F(t_peak) also reduces B_max. More specifically, in this case, caching any frame within the period [t_max, t_peak] has the same effect in terms of reducing both B_max and Cr. Similarly, caching any frame of a given size within the period [1, t_max] has the same effect on reducing Cr while increasing B_max. See Fig. 3.4 for an illustration. Therefore, simply selecting frame F(t_peak) is guaranteed to reduce both B_max and Cr, without requiring the computation of t_max.

3. If B_max is reached after t_peak (i.e., t_max > t_peak), caching frames before t_peak cannot reduce (and may increase) B_max. However, from (3.14), caching frame F(t_peak) has the effect of reducing B_A(t) for the longest duration, [1, t_peak]. Also, as more frames are cached, t_peak tends to increase monotonically. Once t_max <= t_peak, we are back in situation 2 above, and B_max is reduced if we keep caching the frame at t_peak.

The above analysis shows that the changes in Cr(A) and B_max(A) depend only on the cached frame size (i.e., the absolute values of Dl and Db depend only on R(k)), not on the exact position of the cached frame, as long as that frame falls in the range [1, t_peak].
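The greedy rule suggested by this analysis (repeatedly cache the frame at the current consumption peak) can be sketched as follows. This is a simplified illustration, not the thesis code: the helper names are invented, the start-up delay d is ignored in the slope function, and the loop simply stops when the peak frame is already cached or does not fit in the remaining budget:

```python
# Sketch of greedy peak-frame selection (hypothetical names).
# R[t-1] is the size of the frame displayed at time t.

def slope_peak(R, cached):
    """Return (Cr, t_peak): the maximum of L_A(t) = S_A(t)/t and where it occurs."""
    s, best_l, best_t = 0.0, 0.0, 1
    for t in range(1, len(R) + 1):
        if not cached[t - 1]:
            s += R[t - 1]          # S_A(t): cumulative non-cached data, Eq. (3.3)
        l = s / t                  # slope function L_A(t), Eq. (3.6)
        if l > best_l:
            best_l, best_t = l, t
    return best_l, best_t

def greedy_cache(R, H):
    """Iteratively cache F(t_peak) until the cache budget H is exhausted."""
    cached = [False] * len(R)
    while H > 0:
        _, t_peak = slope_peak(R, cached)
        k = t_peak - 1
        if cached[k] or R[k] > H:  # simplification: stop rather than search on
            break
        cached[k] = True
        H -= R[k]
    return cached
```

For instance, with R = [10, 10, 10, 10] and H = 15, the first iteration caches the first frame, lowering the required CBR rate from 10 to 7.5 units per slot.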
For a VBR video, the frames around t_peak may have different sizes, and the search for the optimal selection of frames to be cached can be complex when the cache space H is large. Therefore we propose a heuristic approach, SCQ, for selective video caching in QoS networks, which selects frames iteratively. During each iteration, SCQ computes L_A(t) and locates t_peak using (3.8), then selects the frame at t_peak (i.e., F(t_peak)) to be cached. This process is iterated until the cache space is full. The detailed procedure is summarized in the following steps.

Step 1: Initialization. Set n = 1 (n is the iteration index). Cache the required initial segment Ireq and set A^1 correspondingly (see Section 3.2). Let A^n be the cache indicator after the nth iteration. Set H <= H - Ireq.

Step 2: Find t^n_peak = arg max_t {L_{A^n}(t)} for the nth iteration.

Step 3: Cache frame F(t^n_peak), setting a(t^n_peak) = 1; set H <= H - R(t^n_peak).

Step 4: If there is no cache space left (H <= 0), the procedure ends. Otherwise, set n <= n + 1 and go to Step 2 (start the next iteration).

We obtain the solution A* = A^n at the last iteration. Note that usually t_max > t_peak during the initial iterations. Therefore an increase in B_max cannot be avoided, since we have to cache a frame before t_peak to reduce bandwidth. B_max starts to decrease once t_max <= t_peak after more frames are cached, and SCQ keeps that initial increase as small as possible. See the results in Section 3.5.

3.4 Video caching in best-effort networks

An example of video caching in best-effort networks is shown in Fig. 3.1b. The server-proxy channel bandwidth may vary due to network congestion or other adverse conditions. These variations can cause dramatic degradation of continuous video playback quality, since packets that arrive too late are considered lost.
Thus it is useful for the client to buffer a certain number of frames before and during playback, in order to increase the likelihood that frames are available in the decoder buffer for playback during periods of packet loss (or delay). The more frames are buffered at a given time, the more robustness there is against packet delay. However, the frames of a VBR video have different sizes, which means that the number of buffered frames in the client's buffer may not be constant during playback. Periods during which the number of frames in the decoder buffer becomes low are referred to as risky periods (less robust). Our caching goal here is to improve robustness by increasing the number of frames in the client buffer during risky periods. As we focus on an off-line caching algorithm that only has knowledge of the video sequence, we make no assumptions about the actual bandwidth variations while deriving the caching algorithm. The risky periods of the video sequence can be located before transmission by analyzing the decoder buffer contents during a "virtual" playback, in which a constant server-proxy bandwidth C (close to the average video bit-rate) is applied. Note that network congestion may happen at any time during a real-time session. However, congestion is more likely to cause quality degradation when the client buffer contains fewer frames, i.e., during the risky periods identified in our analysis. In the simulations, we transmit the video (with partial caching) over a server-proxy channel with bandwidth variations to verify the effectiveness of the buffer analysis and the caching algorithm. We define a frame-level buffer trace function, B^f_A(t), which indicates the number of frames available to the client during playback.
A measure of the robustness of a video stream U can be defined as

U = min_t {B^f_A(t)},   (3.17)

i.e., the minimum value (referred to as a trough) of the buffer trace in number of frames. A risky period is a time when a trough occurs, i.e., t_r = arg min_t {B^f_A(t)}. There may be many risky periods/frames in one session. The larger B^f_A(t), the more robust the video stream will be around time t. The robustness metric U defined in (3.17) corresponds to using a MaxMin criterion (as we will try to maximize U by caching frames; see below). Obviously there are alternative ways to define robustness, such as the average number of frames in the buffer (referred to as a MaxAverage criterion):

Ua = (1/N) sum_{t=1}^{N} B^f_A(t).   (3.18)

Each of these two measures of robustness, U and Ua, leads to a different algorithm for selecting which frames should be cached. For most scenarios in this chapter, we use the MaxMin criterion for robustness, and we use the MaxAverage criterion only to break ties among multiple choices that all improve U in the same way (see Section 3.4.3). Note that B^f_A(t) (measured in number of frames) is used to calculate robustness, while B_A(t) (measured in bits, see (3.11)) is used to determine the occupancy of the client's physical buffer during playback. It should be emphasized that both the frames in the client's physical buffer and the cached frames at the proxy count as frames available to the client. In other words, cached frames can increase B^f_A(t) (and therefore the robustness) while not occupying the client's physical buffer, i.e., they do not increase B_A(t).(3)

3.4.1 Problem formulation

In this case the decoder buffer size Bc is the bottleneck in improving U.
For example, a straightforward way to improve U is to cache the "earlier" frames from the beginning of the video sequence. The client can then retrieve the "later" non-cached frames from the server while it is displaying the cached frames retrieved from the proxy. By the time those "later" non-cached frames start to be displayed, many of them are already buffered at the client, and therefore U is improved. However, this method can quickly fill up the client buffer, since the frames retrieved from the server are not played until all the cached frames (scheduled to be displayed earlier) have been displayed. The server may then have to slow down its transmission when the client buffer is full, which wastes bandwidth that could otherwise further increase robustness. Therefore a proper selection scheme should be designed under the constraint of a limited memory buffer Bc. We formulate the caching problem as follows (assuming B_max is known).

Problem formulation 2: Given a limited cache space (H < R_total) on the proxy, a pre-encoded video stream V with N frames and a fixed delay d, among all possible A satisfying (3.2), find the cache indicator sequence A* such that the robustness U = min_t {B^f_{A*}(t)} is maximized, without exceeding the maximum client buffer size B_max.

(3) Again, this is true because the cached frames can be fetched quickly from the proxy right before their playback time, so that they can be excluded from the consumption of the client's physical buffer; see the explanation in Section 3.1.

3.4.2 Analysis of the buffer trace after caching

The byte-level buffer trace function B(t) can be computed from (3.11). The frame-level buffer trace function B^f(t) can be obtained by simulating the transmission frame by frame, assuming that the nominal channel rate C is provided. An example is shown in Fig. 3.5. We first study the case of caching a single frame.
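The frame-by-frame "virtual playback" described above can be sketched as follows. This is a simplified model with hypothetical names: it assumes a nominal CBR rate C, one displayed frame per time slot after a start-up delay of d slots, and it ignores the B_max cap treated in Proposition 2:

```python
# Sketch of the frame-level buffer trace B^f(t) under a nominal CBR rate C.
# cached[i] is True if frame i+1 is held at the proxy (always available).

def frame_buffer_trace(R, cached, C, d):
    """Return B^f(t) for t = 1..N: frames available minus frames displayed."""
    # prefix[i] = bits of non-cached data in frames 1..i (cached frames are
    # fetched from the proxy and consume no server-proxy bandwidth)
    prefix = [0]
    for i, r in enumerate(R):
        prefix.append(prefix[-1] + (0 if cached[i] else r))
    trace, m = [], 0
    for t in range(1, len(R) + 1):
        supply = C * (t + d)               # bits delivered by time t
        while m < len(R) and prefix[m + 1] <= supply:
            m += 1                         # frame m+1 is now fully available
        trace.append(m - (t - 1))          # minus the frames already displayed
    return trace
```

With four 10-unit frames, C = 10 and d = 1, the trace is [2, 2, 2, 1]; caching the last frame "raises" it to [2, 3, 2, 1], illustrating the lift described in (3.19) below the peak-free case.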
For a given A and available channel bandwidth C, if the client buffer size B_max is large enough, i.e., B_max > max_t {B_A(t)}, then based on (3.11) we have

B_A(t) = C (t + d) - S_A(t)
       = C (t + d) - ( sum_{i=1}^{t} R(i) - sum_{i=1}^{t} R(i) a(i) )
       = B_phi(t) + sum_{i=1}^{t} R(i) a(i),   (3.19)

where B_phi(t) is the buffer trace (in bits) when no frame is cached (A = phi). From (3.19) it can be seen that caching one frame F(k) increases the buffer trace B_A(t) for the duration t in [k, N] by the cached frame size R(k). This is because, after being cached, frame F(k) will be fetched from the proxy rather than from the server; thus all the non-cached frames later than F(k) are "shifted" to an earlier transmission time from the server (because frame F(k) is skipped for transmission). Those later frames (after F(k)) stay in the client buffer for a longer period, and thus caching F(k) increases B_A(t) (and the corresponding B^f_A(t)). A simple example of the "raise" in B_A(t) after caching F(k) is shown in Fig. 3.6. However, (3.19) does not hold if there is a tight buffer size limitation, i.e., B_max < max_t {B_A(t)}. If the decoder buffer is full, then the server and proxy have to reduce the transmission speed, which means C(i) is smaller than C; otherwise packets would be discarded due to client buffer overflow.

Proposition 2: If there exists a t_max (defined in (3.12)) such that B_phi(t_max) = B_max (where B_max is the known maximum buffer size), then caching one frame F(t_i) located before t_max (t_i < t_max) can only increase the buffer trace by approximately R(t_i) between time t'_i and t_max, where t'_i is the transmission time of frame F(t_i), and t'_i <= t_i < t_max:

B_A(t) = B_phi(t), if t < t'_i,
B_A(t) = B_phi(t) + R(t_i), if t'_i <= t < t_max,
B_A(t) = B_max, if t = t_max,
B_A(t) = B_phi(t), if t_max < t <= N.   (3.20)

See the Appendix for a proof.
In short, before B_phi(t) reaches B_max (i.e., for t'_i <= t < t'_max), the trace is "lifted" according to (3.19), and as a result B_A(t) reaches B_max at an "earlier" time t'_max (where t'_max <= t_max). The proxy and/or server then have to reduce the transmission speed after time t'_max, when the buffer is full, until it drains enough to accept further data. This reduction cancels out the "extra" frames accumulated in the buffer, so that the remaining buffer trace for t > t_max is the same as if no frames were cached. See Fig. 3.7 for an illustration. Caching multiple frames is similar to the single-frame case. When additional frames are cached, the buffer trace B(t) is "raised" accordingly, and may hit the maximum bound B_max at more points. Therefore, we conclude that each cached frame can only increase the buffer trace between its scheduled transmission time t'_i and the next nearest t_max, if it exists, where B(t_max) = B_max.

3.4.3 Proposed SCB algorithm

Based on the above conclusions, we now propose Selective Caching for Best-effort networks (SCB), which iteratively selects one frame located within the range [t_max, t_r], where t_max is the closest buffer peak time before t_r. An example of SCB is shown in Fig. 3.8, where troughs occur at t_1 and t_4 before caching. According to Proposition 2, caching frames before the buffer peak time t_2 will not lift B^f(t) after t_2. So after caching frames before t_1 to increase B^f(t_1), we should select frames between t_2 and t_4 to increase B^f(t_4) (see Fig. 3.8(b)), and therefore improve U as defined in (3.17). The details of the SCB algorithm are summarized in the following steps.

Step 1: Initialization. Same as Step 1 of the SCQ algorithm; see Section 3.3.4.

Step 2:
In the nth iteration, find the most risky period t^n_r = arg min_t {B^f_{A^n}(t)}, e.g., time t_1 in Fig. 3.8a. If there are multiple t^n_r, choose the first one (the MaxMin criterion is applied first, and MaxAverage is applied to break ties if needed).

Step 3: Find the nearest buffer peak time t^n_max before the chosen t^n_r (e.g., t_2 for risky time t_4 in Fig. 3.8b). If no peak exists before t^n_r, set t^n_max = 0 (e.g., t = 0 for t_1 in Fig. 3.8a). Note that t^n_max is obtained from the byte-level buffer trace B_{A^n}(t).

Step 4: Select one frame F(c^n) right after t^n_max (obtained in Step 3) to be cached, and set a(c^n) = 1. Update the trace B_{A^n}(t) and set H <= H - R(c^n). If H <= 0, there is not enough space left on the proxy and the procedure ends. Otherwise, set n <= n + 1 and go to Step 2.

In each iteration, we first locate the t_r for U; thus increasing U is the same as increasing B^f(t_r). From Proposition 2, we know that in order to increase B^f(t_r), we have to cache a frame after the nearest previous buffer peak time.(4) Since there may be multiple choices, e.g., caching any frame between t_max and t_r can increase B^f(t_r), the MaxAverage criterion for robustness requires selecting the frame furthest away from t_r but after the nearest previous peak t_max. This provides the largest increase in the average robustness Ua.

(4) Our goal is to increase the number of available frames in the buffer during playback, which is measured by B^f(t). The exact value of B^f(t) may have a slightly different shape from B(t), due to the variable frame sizes of VBR video. However, an increase in B(t) should also lead to an increase in B^f(t), since there is more data in the client buffer. Therefore the results in (3.19) and (3.20) can also be applied to approximate the increase in B^f(t) after caching a frame F(k).
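One SCB iteration (Steps 2-4) can be sketched as follows, using invented helper names; Bf and B stand for the frame-level and byte-level traces, and b_max for the client buffer cap. This is an illustration of the selection rule under those assumptions, not the thesis implementation:

```python
# Sketch of the frame selection in one SCB iteration (hypothetical names).

def scb_pick(Bf, B, b_max, cached):
    """Index of the frame SCB would cache next, or None if none qualifies."""
    # Step 2: most risky period = first global minimum of B^f (MaxMin first)
    t_r = min(range(len(Bf)), key=lambda t: (Bf[t], t))
    # Step 3: nearest buffer peak before t_r, i.e., the last time B(t) hits
    # b_max; if no such peak exists, start from t = 0
    t_max = 0
    for t in range(t_r - 1, -1, -1):
        if B[t] >= b_max:
            t_max = t + 1
            break
    # Step 4: cache the first non-cached frame after t_max, i.e., the frame
    # furthest from t_r (this also maximizes the average robustness Ua)
    for k in range(t_max, t_r + 1):
        if not cached[k]:
            return k
    return None
```

For example, with Bf = [3, 1, 4, 2, 1], B = [30, 10, 40, 20, 10] and b_max = 35, the first trough is at t = 1 with no earlier peak, so the frame at index 0 is selected; if instead the trough were at the end and the byte trace hit b_max at index 2, Step 3 would restrict the choice to frames after that peak.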
3.4.4 Caching table

Both SCQ and SCB select additional frames when more cache space is available, while the frames selected earlier remain cached. Therefore we can run the SCQ/SCB algorithms over the complete video sequence to determine the order in which all frames are selected for caching (this can be achieved by setting the cache space equal to the size of the video), and store the result in a cache table. When the available cache space increases or decreases, the proxy can then add or remove the frames to be cached according to the cache table, without re-computing the selection procedure. Caching scalability and low complexity (for the proxy's online operations) are thereby achieved by using such a cache table.

3.5 Experimental results

Experimental results for the SCQ algorithm described in Section 3.3.4 are shown in Fig. 3.9. A video clip (part of the movie Star Wars in MPEG-1 [25]) with 10,000 consecutive frames is used for the simulation. The original video has an average rate (Ravg) of 5,516 KBits/sec (the peak frame rate is 9,812 KBits/sec). The total size of the video clip (R_total) is 280.6 MBytes. Fig. 3.9a shows that the proposed SCQ algorithm reduces the server-proxy bandwidth almost as much as the prefix caching algorithm. Fig. 3.9b shows that, as expected, the SCQ algorithm requires a much smaller client buffer to achieve the same bandwidth reduction as prefix caching. In the experiments with the SCB algorithm, a video clip of 10,000 frames is encoded with MPEG-2 under rate control(5) similar to [53]. The original video data has an average rate (Ravg) of 2,112 KBits/sec, with a peak data rate of 2,400 KBytes/sec. The average frame size is 88 KBits. The average channel bandwidth, C, is also 2,112 KBits/sec. The client buffer size (Bc) is 512 KBytes. Fig.
3.10a shows the robustness U = min_t {B^f(t)} with respect to the percentage of the video being cached. Fig. 3.10b shows the average number of frames in the buffer during playback, Ua. Note that, as defined in Section 3.4.1, selective caching applies the MaxMin robustness criterion first, rather than the MaxAverage criterion, so as to eliminate the worst case first. When only a portion of the video is cached, SCB outperforms prefix caching in both cases (for U and Ua). Fig. 3.11 shows the simulation results used to verify the effectiveness of our definitions of robustness (U and Ua). We simulate the delivery of streaming video when it is partially cached. The cached part of the video is sent to the client from the proxy with no loss. The non-cached frames are retrieved from the server (via the proxy), starting from the beginning of the playback session, and the data is buffered at the client until it is displayed. The client's physical buffer size is 512 KB.

(5) The reason for applying rate control is to avoid a large variance in the video data rate, which could cause network congestion in a best-effort network.

We use a binary erasure channel (BEC) [16] to model the server-proxy channel. A packet (we use a fixed packet size of 512 bytes) is lost with probability e, and arrives at the client correctly with probability 1 - e, where e is the channel packet loss rate. All lost packets are recovered by retransmission. In this simulation all frames have to be played; thus if a frame arrives too late at the client, due to delay or retransmission, the previous frame is "frozen" on the screen until it arrives. We refer to the period during which a frame is "frozen" as the "jitter duration" Tj. The fraction Tj/Ty (where Ty is the total video playback time) is used to measure the continuity of the playback.
Tj/Tv = 0 means that the playback has no jitter; a larger Tj/Tv indicates that more jitter happens during the playback. Thus a smaller Tj/Tv indicates a more robust, continuous playback. The experiment uses 1000 realizations; the packet loss is generated randomly with the given channel error rate e. Fig. 3.11a shows that when a larger proportion of the video is cached, the robustness is increased and the jitter duration is reduced. Fig. 3.11b shows the jitter duration for different packet loss rates. In both cases, the proposed SCB algorithm leads to a smaller jitter duration than prefix caching, since it selects the frames to be cached so as to maximize the robustness.

In the presence of network congestion, the congestion duration can last over some unpredictable time-scale. During the congestion all packets are delayed, and the bandwidth drops to zero. Fig. 3.11c shows the results when both the channel congestion and the congestion duration (denoted by dc) occur randomly; dc follows an exponential distribution with mean mc. We test the robustness of the two caching schemes for different values of mc. Fig. 3.11c shows that in both schemes the jitter duration increases when mc becomes large. This is expected, as jitter is more likely to happen when network congestion becomes severe (lasts longer). As expected, the proposed SCB algorithm outperforms prefix caching for the different values of mc. The results shown in Fig. 3.11 verify the effectiveness of the robustness criteria developed for the SCB algorithm.

3.6 Conclusions

In this chapter, two novel approaches for proxy caching of video are presented, for QoS networks and for best-effort networks (e.g., the Internet).
The video caching performance is measured differently in these network environments: for QoS networks, the metric of interest is the network bandwidth cost, while in best-effort networks the robustness of continuous playback against poor network conditions is more important. Therefore the caching algorithms should be designed accordingly. We also emphasized that some resources, such as the client decoder buffer size and the limited proxy cache space, are critical to the design of video caching algorithms. We proposed two caching algorithms, SCQ and SCB, for QoS and best-effort networks, respectively. SCQ reduces the network cost of bandwidth reservation near-optimally and requires only a small client buffer size to achieve it; SCB increases the playback robustness while not violating the client buffer size budget. Both the SCQ and SCB algorithms provide good scalability for proxy space adjustment and low complexity for the proxy's online operations. The proxy can easily reduce or increase the cache space for a video object while still maintaining the good performance provided by these algorithms.

Both the SCQ and SCB algorithms are designed for caching a single video object with a pre-allocated cache space. When the total cache space is limited, determining how much cache space to allocate to each video so as to maximize the overall performance is an interesting resource allocation problem (e.g., [87, 65, 36, 73, 90]). The caching algorithms proposed in this chapter (SCQ and SCB) are independent of any other cache space allocation mechanisms, and can be used in conjunction with them. The cache table described in Section 3.4.4 can be used to find the trade-offs between the cache space and the caching performance for each individual video sequence.
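As an illustration of the cache table of Section 3.4.4 mentioned above, here is a minimal sketch; the frame sizes and the selection order stand in for the real SCQ/SCB output and are made up:

```python
def build_cache_table(selection_order, frame_sizes):
    """Cache table: (frame index, cumulative cached bytes) in the order
    the frames were selected by SCQ/SCB."""
    table, total = [], 0
    for f in selection_order:
        total += frame_sizes[f]
        table.append((f, total))
    return table

def frames_to_cache(table, cache_space):
    """Largest prefix of the table that fits in cache_space bytes."""
    return [f for f, cum in table if cum <= cache_space]

sizes = {0: 300, 1: 200, 2: 250, 3: 150}   # hypothetical frame sizes
order = [0, 2, 1, 3]                       # hypothetical SCQ/SCB order
table = build_cache_table(order, sizes)
print(frames_to_cache(table, 600))   # -> [0, 2]
print(frames_to_cache(table, 800))   # -> [0, 2, 1]
```

Growing the space from 600 to 800 only appends frame 1, and shrinking removes frames from the tail, which is what makes the scheme scalable without re-running the selection.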
Appendix

Proof of Proposition 1. From (3.3) and (3.6) we get

LA^k(t) = LA(t), 1 <= t < k
LA^k(t) = LA(t) - R(k)/t, k <= t <= N (3.21)

Recall the definition of tpeak in (3.8). If k < tpeak, from the second case in (3.21) we have max{LA^k(t)} <= max{LA(t)}; therefore caching a frame F(k) before tpeak can reduce the required bandwidth Cr. The reduction is Δf = R(k)/tpeak (note that this assumes tpeak remains the same after caching F(k); the Proposition still holds if tpeak changes, except that the absolute value of Δf is smaller than R(k)/tpeak). Obviously, if k > tpeak, caching that frame cannot reduce the maximum of LA(t), which occurs before k.

Proof of Proposition 2. From (3.19), we know that without the physical buffer size constraint, BA(t) would exceed Bmax. Thus, to prevent buffer overflow, the server and/or proxy has to reduce the transmission rate C(i) at (or before) tmax. However, we assume that the reduction of C(i) is as small as possible6, so that the client buffer is always kept full

6 This can be easily achieved by sending feedback from the client to the proxy indicating the buffer fullness during the transmission, so that the proxy/server can adjust C(i) accordingly.
during the period around tmax7. Therefore, at time tmax, BA(t) also reaches the maximum, BA(tmax) = Bmax. Because R(ti) > 0 and tmax > ti, it is easy to see that there must exist t'max < tmax such that BA(t'max) = Bmax. This means that BA(t) reaches Bmax at an earlier time t'max due to the increase of R(ti). For ti < t < t'max, we know from (3.19) that

BA(t) = Bφ(t) + R(ti), (3.22)

For t'max < t < tmax, BA(t) remains full at Bmax (by the above assumption). We also have BA(tmax) = Bφ(tmax) = Bmax. For t > tmax, from (3.11) we know that

BA(tmax + 1) = sum_{i=1}^{tmax+1} C(i) - sum_{i=1}^{tmax+1} R(i)
             = ( sum_{i=1}^{tmax} C(i) - sum_{i=1}^{tmax} R(i) ) + C(tmax + 1) - R(tmax + 1)
             = BA(tmax) + C(tmax + 1) - R(tmax + 1)
             = Bmax + C(tmax + 1) - R(tmax + 1)
             = Bφ(tmax) + C(tmax + 1) - R(tmax + 1)
             = Bφ(tmax + 1)

7 Always keeping the buffer as full as possible maximizes the robustness, by storing more frames in the buffer.

Applying this recursively for all tmax < t <= N, we get

BA(t) = Bφ(t), where tmax < t <= N. (3.23)

Finally, for t < ti (ti is the transmission time of frame F(ti)), there is obviously no change in the buffer trace, BA(t) = Bφ(t). Combining these results we get (3.20).

Figure 3.3: Illustration of Proposition 1. The lower figure shows the slope functions before and after caching one frame F(k1), represented by LA(t) (solid line) and LA1(t) (dashed line), respectively. The upper figure shows the corresponding cumulative channel/frame rate functions. After caching an "earlier" frame before tpeak, i.e., k1 < tpeak, max{LA(t)} reduces from c to c1. Obviously, caching a "later" frame F(k2), i.e., k2 > tpeak, cannot reduce max{LA(t)}, which occurs at tpeak.

Figure 3.4: Illustration of caching one frame before or after tmax.
SA1(t) and SA2(t) are the cumulative frame rate functions after caching F(k1) and F(k2) (drawn in dashed and dotted lines, respectively). Note that only one frame is selected in each case. If R(k1) = R(k2) and k2 < tmax < k1 < tpeak, Proposition 1 shows that selecting F(k1) or F(k2) leads to the same reduction in bandwidth Cr. However, the corresponding changes in Bmax are different: caching F(k1) reduces Bmax from b to b1 (Δb = b1 - b < 0), while caching F(k2) increases Bmax from b to b2 (Δb = b2 - b > 0). Therefore caching a frame between tmax and tpeak (e.g., F(k1)) can reduce Bmax while keeping the same reduction in Cr.

Figure 3.5: Trace of client buffer size in number of frames and bits. (Number of frames = 2000; avg. data rate = 2.498 Mbps; d = 2 sec; channel rate = 2.500 Mbps.)

Figure 3.6: (a): no frame is cached; (b): frame F(ti) is cached. These figures illustrate that, by caching one frame F(ti), the buffer trace can be "lifted" for t > ti when there is no buffer size limitation.
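As a numeric sanity check of Proposition 1: assuming, consistently with (3.21), that the slope function is LA(t) = (1/t) * sum_{k<=t} R(k) (the minimum constant bandwidth that delivers the first t frames within t time units; this reading of LA is our assumption), caching a frame at k <= tpeak lowers the peak of LA while caching one at k > tpeak does not. The frame sizes below are made up:

```python
def slope(rates, cached=frozenset()):
    """L(t) = (1/t) * sum of the sizes of the non-cached frames k <= t."""
    total, out = 0.0, []
    for t, r in enumerate(rates, start=1):   # t = 1, 2, ..., N
        total += 0 if t in cached else r
        out.append(total / t)
    return out

rates = [9, 1, 8, 2, 1, 1]                   # hypothetical R(k), k = 1..6
L = slope(rates)
t_peak = max(range(len(L)), key=L.__getitem__) + 1   # here t_peak = 1

assert max(slope(rates, cached={1})) < max(L)   # k <= t_peak: peak drops
assert max(slope(rates, cached={6})) == max(L)  # k >  t_peak: peak unchanged
```

In this toy run, caching frame 1 (before tpeak) drops the peak from 9 to 3, while caching frame 6 leaves it at 9.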
Figure 3.9: (a) The bandwidth (Cr(Δ), see (3.7)) that has to be reserved vs. the percentage of the video being cached: both the proposed SCQ algorithm and prefix caching reduce Cr(Δ) similarly as a larger portion of the video is cached. (b) The maximum buffer size, Bmax, required at the client to achieve the caching performance in (a).

Figure 3.10: Robustness vs. the percentage of the video being cached, using the SCB and prefix caching methods. (a) Robustness U defined in (3.17). (b) Robustness Ua defined in (3.18).
Figure 3.11: Robustness verification, with 1000 realizations used in each case. (a): Tj/Tv vs. the percentage of the video being cached. (b): Tj/Tv vs. channel error e, when 20% of the video is cached. (c): Tj/Tv vs. the average channel congestion duration dc.

Chapter 4

Video Compression with Rate Control for Video Storage on Disk-Based Video Servers

4.1 Introduction

Video-On-Demand (VOD) services have been studied in the past few years and may soon become popular, as recording, storage and transmission of video data become less costly. The two challenging problems in a VOD system are video data transmission and disk storage. The output bit rate after compression is usually Variable Bit Rate (VBR),1 but common transmission channels are based on the Constant Bit Rate (CBR) mode, in which the transmission bandwidth is fixed. Thus, transmission of VBR video data through a CBR channel may generate hiccups (i.e., frame loss) [6, 44, 28, 53].
1 The variable nature of the bit rate per frame comes from the fact that frames, when compressed to achieve a specific visual quality, require a different number of bits depending on such factors as the number of objects in the frame, the motion, the proportion of textured areas to flat areas, etc.

In this chapter, we first address the disk storage issues and, in particular, we study how video data can be encoded to achieve more efficient transmission. This leads us to rate control algorithms optimized for specific disk placement strategies.

Since the volume of video data is large, large-capacity, high-speed hard disks are commonly used to store it. These modern disks have very high transfer bandwidth, and thus it is possible for a server to provide continuous video display to several users simultaneously. The disk drive can be multiplexed among several displays by providing Round Robin Service [26]. An example is shown in Fig. 4.1. A number of users (W, X, ..., Z) are served by the server. Each user is allocated a "time slot" (Tslot) during one round interval (Tround) to receive a block of data from the server. This block of data should contain a sufficient number of frames for the user to display video until the next block arrives; the required display time is also equal to Tround. If there is not sufficient data in that block, jitter will occur during the playback at the user. For a fair service, each user gets the same amount of compressed data in each service round.

The video data blocks have to be retrieved from the disk before the server sends them to the network channels. When the disk is large (which is typical for the purpose of storing a large volume of video data), the time spent finding a particular video data block on the disk, referred to as the disk seek-time, can be significant [26, 3, 5, 6]. Since
data cannot be transferred from the disk during the seek-time, a smaller seek-time overhead allows more data to be delivered from the disk to the users during one round Tround, and therefore more users can be served by the server concurrently. The disk seek-time also introduces another side effect: if the video blocks are placed in a random order, the seek-time to access different blocks can vary, which makes Tround variable2 and unpredictable. If Tround becomes too long, there is more chance for the client to run out of frames to display (where jitter occurs) before it gets another data block in the next service round.

Studies in [26, 3, 5, 6] present different approaches to reducing the seek-time by placing the video objects on the hard disk in special orders. Many of these approaches partition the disk and place the video data blocks on the disk sequentially, according to the display order, since the video data is likely to be requested sequentially (from the earlier frames to the later frames) for playback. The disk partition creates "regions".

Figure 4.1: Service round. During each service round time, Tround, many users can retrieve data from the server for playback. This figure shows an example of Round Robin Service, where each user is allowed to retrieve a block of data in a particular time slot during each service round.

2 We assume that Tslot is fixed, to transfer a complete data block. Otherwise, if we vary Tslot to make Tround constant, then some data blocks may have to be delivered over more than one time slot, which means that more disk seek-time has to be spent finding the same data block for each additional delivery.
One region contains several video data blocks from different video objects; see Fig. 4.2 for an example. The data blocks from different video objects form an interlaced pattern of placement. The reason for the interlaced fashion is that the disk bandwidth is large enough to transfer one data block, which contains enough frames for playback, during the time in which the disk transfers the other blocks in the same region. For example, after transferring the i-th data block of video object 1, the disk arm continues to move in one direction to retrieve the other blocks in the same region. With careful design, the disk arm takes less than Tround to reach the next block of video object 1 and continue its transfer. Therefore data can be transferred for the different video objects during the disk arm movement shown in Fig. 4.2. It has been shown that this kind of placement can reduce the overall disk seek-time when different video objects are requested simultaneously by several concurrent users. For more details of disk placement strategies, refer to [3, 5, 6]. If blocks are placed in a more restrictive sequence (e.g., zigzag mode [26]), the seek-time can be reduced to close to zero. Another benefit of these disk placement strategies is that the disk seek-time is relatively constant across data blocks, so that the resulting Tround is nearly constant, which reduces the complexity of the server's software design and improves its performance.

Figure 4.2: Disk placement. The video blocks are placed in an "interlaced" fashion so that the disk seek-time can be reduced when multiple video objects are requested concurrently.

These disk placement algorithms target the efficiency3 of VOD servers, but they may affect the servers' ability to provide continuous display.
For example, some data placement algorithms may not allow random access, because the locations of the data blocks have been restricted. Thus it is more difficult than in a pure random placement approach to reduce the duration of hiccups. This is because, once the maximum number of users N is set, the block size Bsize is also fixed (transmitted during each Tround) [26]. Thus, after the disk placement algorithm has been applied, it is not possible for the disk arm to reach a block at an arbitrary location, because the disk arm moves in a single direction (either towards the edge or the inside). If a block cannot be displayed for Tround long, the result is that a user must wait for the next service round to get the next block. The server may have to "pause" service to other users, spending extra time on one particular client to reduce its hiccups.

3 I.e., they try to maximize the number of users that can be served simultaneously.

Studies [6, 44] show different strategies to reduce hiccups, such as restricting the number of concurrent users based on Quality-of-Service parameters. Other approaches involve making a decision on whether to admit a new user based on the probability of hiccups if that user is admitted. Some of these admission algorithms may be too restrictive: while they may prevent hiccups, they may also waste bandwidth during some service rounds (when all the users request small blocks). Conversely, a less demanding admission policy may result in more hiccups. In this chapter, we tackle this problem from the encoder's viewpoint. We encode the video data under the given constraint set (Bsize, Tf, N) to guarantee continuous display for a particular disk placement specification. Admission of a new user is then simple: we just need to check whether the total number of users exceeds the maximum number allowed (N).
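The resulting admission rule is just a head count against N, where N itself follows from the per-round disk budget; a back-of-the-envelope sketch (all numbers, and the simple cost model of one seek plus one block transfer per user per round, are illustrative assumptions):

```python
def max_users(t_round, b_size_bits, b_disk_bps, t_seek):
    """Largest N with N * (block transfer time + one seek) <= t_round."""
    per_user = b_size_bits / b_disk_bps + t_seek
    return int(t_round // per_user)

def admit(current_users, n_max):
    """Admission check: a new user is accepted only while users < N."""
    return current_users < n_max

N = max_users(t_round=1.0, b_size_bits=2e6, b_disk_bps=100e6, t_seek=0.01)
print(N)             # -> 33
print(admit(N, N))   # -> False (the (N+1)-th user is rejected)
```

Halving the seek overhead in this toy model (t_seek = 0.002) raises N to 45, which is the effect the placement strategies above aim for.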
Since N is used to calculate the per-user bandwidth applied in the rate control optimization, we can guarantee that no hiccups will occur as long as the number of users is less than N. This approach could also be combined with the other algorithms mentioned above if we allow limited hiccups during the display. Note that an easy way to achieve continuous display is to compress the video at a strict Constant Bit Rate (CBR), so that each data block of the same size is guaranteed to supply a constant period of display time. However, a CBR video typically has lower quality (measured in Peak Signal-to-Noise Ratio, PSNR) than a Variable Bit Rate (VBR) video for the same rate budget, and VBR video is generally supported by many current standards, such as MPEG-1 and MPEG-2 [50], H.263 [32], and MPEG-4 [31]. In this chapter we code the video in VBR format to achieve better quality, and propose a rate-distortion based algorithm for video compression under the constraints of disk placement strategies, which selects a quantizer for each frame so as to find the optimal bit allocation. The experimental results show that the video quality is improved by 0.5 dB to 1.5 dB compared with compression without rate control.

The organization of this chapter is as follows. Section II presents the formulation of the problem. Section III describes a multiple Lagrange multiplier algorithm to obtain the optimal solution, and Section IV provides the experimental results and conclusions. Our results show that the overall PSNR with rate control can be improved by 0.5 to 1.5 dB as compared to not using rate control under the constraints of continuous display.

4.2 Problem Formulation

User buffer constraints: We assume that the video frame rate is constant. In a real-time video transmission system, the end-to-end delay must be constant, say ΔT seconds. Thus a frame read from disk at time t must arrive at the decoder (user) before t + ΔT. In a VOD system, all the video data is available before transmission, and the size of the user buffer and the initial latency should be small (i.e., we assume the user can only pre-fetch a small number of frames, fp, before playback). Real-time transmission constraints are still applicable in VOD systems, because once
Thus a frame read from disk at time t, must arrive at the decoder (user) before t + A T. As for a VOD system, all the video data is available before transmission, the size of the user buffer and initial latency should be small (i.e., we assume the user can only pre-fetch a smaller number of frames, / p, before playback). Real-time transmission constraints are still applicable in VOD systems, because once 96 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Tf Period during which a frame is displayed, e.g. 1/30second fp Number of pre-fetched frames TP Period during which the pre-fetched frames are displayed, Tp = f p x T f Trmind Turn-around-time for one service round Ft z-th frame Bdisk Disk bandwidth (bits/second) N Maximum number of users Buser The average bandwidth per user £ uaer = B disk/N C acc Accumulated channel rate Race Accumulated frame rate B u fmax User buffer size ck Channel rate at time t = kT f x(i) Quantization step for frame i Rx(i) (i) Number of bits for frame i coded with quantization step x(i) Cx(i) {i) Distortion of frame i coded with quantization step a;(z) Nf Total number of frames Baize Block size in disk placement algorithms Rtf Number of total service rounds N s{i) Number of frames in block i Table 4.1: Notation used in this chapter 97 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. we start displaying frames, we need to continuously transm it frames, and the number of frames that can be stored in the decoder buffer is limited (e.g. the whole sequence cannot be stored in the buffer). E n d -to -e n d D elay: We assume the frame rate is 30 fram es/sec (it can be other number depending on the video object). The pre-fetched f p frames will be displayed for Tp = /p/30 seconds. If a frame is scheduled to be displayed at time t during the playback, it should arrive at the user end before t + Tp. 
Rate constraint: The delay and buffer constraints can be converted into rate constraints, which the encoder has to meet to prevent hiccups. We assume that Tf is the period during which each frame is displayed (typically 1/30 second), which is also the "time unit" we will use in this chapter. Each frame is labeled with an index i, i = 1, 2, 3, ..., and if we start the display at time t = 0, frame Fi is scheduled for display at time t = i x Tf. Ri is the number of bits of frame Fi. The channel rate is the disk bandwidth allocated to each user after time multiplexing. According to Table 4.1, each user gets Buser = Bdisk/N (bits/sec) of bandwidth on average. That is, each user receives bandwidth Bdisk during a fraction Tround/N of each service round. The constraint for no buffer underflow is that any frame Fi must arrive at the user no later than t = i x Tf. This requires that the channel have enough capacity to transmit all the frames (F1, F2, ..., Fi) to
C-'acc (^ ) — ^ C fc ; (4.2) IP W)<c«(i) + Ecw, (4.3) ^ k < Z TlTround "b Tt, round where C(k) = < 0 , otherwise \ 99 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Here R acc is simply the accumulated bits of all the frames that have been sent. C(k) equals Bdisk while the server is transm itting data to that user; it is zero while the server is serving other users. We assume there are no constraints introduced by the network bandwidth here. Fig.-4.3 shows and example of R acc and Cacc■ For a continuous playback, the curve of R acc must below Cacc, otherwise hiccups will occur during the playback at the time when Rac^ is larger than Cacc- Accumulated frame rate Accumulated channel rate Transmit to other users Hiccups occur here Figure 4.3: Illustration of accumulated channel rate and frame rate. If different video sequences are stored on the disk, and different users request different sequences randomly, buffer underflow can be avoided for all clients if all the video sequence is encoded with rate control set by (4.1). This is because that if each sequence is encoded with rate constraints, and each of them will meet the requirement of no buffer underflow. These constraints are still met for each user even if the sequences are requested simultaneously. 100 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. In real-time playback, lost frames will cause visual distortion. For our scenario, hiccups mean the user has to wait for the next frame, while the current frame is frozen on the screen (or other error concealment can be applied). Thus, it may be preferable to encode a frame with fewer bits if th at allows us to avoid hiccups. Although the distortion of this frame would be increased, it is better than displaying nothing in certain cases. In the cost function below, we do not allow any frame loss in our formulation4. 
The quantization steps can be chosen at the encoder before the video streams are stored on the disk. The problem can be formalized as follows: given a set of constraints (as in [53]), how do we choose the quantization step size for each frame so as to minimize the total distortion? We wish to encode Nf frames using a given set Q of M admissible quantizers such that, for each choice of quantizer j = x(i) for a given frame i, we incur a distortion cost Dx(i)(i) while requiring a certain rate Rx(i)(i). The objective is to find the optimal quantizer choices x* in X = Q^Nf, for a given channel rate Ck as in (4.3), such that:

x*(1, ..., Nf) = arg min_x sum_{i=1}^{Nf} Dx(i)(i) (4.4)

subject to the constraint set (4.1) or (4.3). We will solve this problem using multiple Lagrange multipliers in the next section.

4 It is possible to develop other cost functions which take frame losses into account. This is beyond the scope of this chapter.

4.3 Optimization based on Multiple Lagrange Multipliers

Using Lagrangian optimization for rate control under multiple rate constraints was previously studied in [53, 8]. In that approach, the constrained optimization problem above is converted into an equivalent unconstrained problem by introducing a non-negative Lagrange multiplier λi associated with each constraint in (4.3). The optimization formulation then becomes: find the quantizer choices x* such that

x*(1, ..., Nf) = arg min_x { sum_{i=1}^{Nf} Dx(i)(i) + sum_{i=1}^{Nf} λi ( sum_{k=1}^{i} Rx(k)(k) ) } (4.5)

We introduce Nf Lagrange multipliers to replace the Nf constraints in (4.3). Finding the optimal quantizer set x*(1, ..., Nf) is the same as finding the appropriate multipliers {λi} that meet the constraints.
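For a fixed multiplier value, the cost in (4.5) separates across intra-coded frames, so each frame's quantizer can be picked by minimizing D + lambda * R over the admissible set Q; a minimal sketch with made-up rate-distortion points:

```python
def pick_quantizer(rd_points, lam):
    """rd_points: (rate_bits, distortion) per quantizer in Q, fine -> coarse.
    Returns the index minimizing D + lam * R."""
    return min(range(len(rd_points)),
               key=lambda j: rd_points[j][1] + lam * rd_points[j][0])

rd = [(8000, 2.0), (5000, 4.5), (3000, 9.0)]   # hypothetical R-D points
print(pick_quantizer(rd, 0.0))    # -> 0 (rate is free: finest quantizer)
print(pick_quantizer(rd, 0.01))   # -> 2 (rate is expensive: coarsest)
```

Sweeping lam trades rate for distortion, which is how the rate budget in (4.1) is met indirectly.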
From [28], we can introduce another set of multipliers λ'i = sum_{j=i}^{Nf} λj, (i = 1, 2, ..., Nf), to rearrange (4.5) as:

x*(1, ..., Nf) = arg min_x sum_{i=1}^{Nf} ( Dx(i)(i) + λ'i Rx(i)(i) ) (4.6)

Finding the solution of (4.5) is equivalent to finding the appropriate non-negative values of the set {λ'i}. Define Ji(λ'i, x(i)), the cost for frame i, as:

Ji(λ'i, x(i)) = Dx(i)(i) + λ'i Rx(i)(i) (4.7)

If we use the intra-frame mode, the quantizer for each frame can be chosen independently by minimizing the cost Ji(λ'i, x(i)) for each frame:

x*(i) = arg min_{x(i) in Q} Ji(λ'i, x(i)), for all i in {1, 2, ..., Nf} (4.8)

In [53, 28] a similar problem is solved by iteratively increasing lower bounds on the multipliers, defined as {λi}, such that violations of the rate constraints are prevented, and adjusting the values of {λ'i} until an optimal bit allocation, in which none of the constraints is violated, is found. The details of the search for these multiple Lagrange multipliers can be found in [53, 28, 80]. Here we outline the basic procedure.

Step 1: Initially, the quantizer choices x = (x(1), x(2), ..., x(Nf)) are obtained by using a single Lagrange multiplier λ'Nf for all the frames in (4.8), subject to only one constraint: sum_{k=1}^{Nf} Rx(k)(k) <= sum_{k=1}^{Nf+fp} C(k).

Step 2: If x is such that all the rate constraints in (4.1) are met, then x is the optimal solution x* of problem (4.4). Otherwise, assume that frame v is the last frame which violates its rate constraint, that is, v <= Nf and no frame between frame v+1 and frame Nf violates its rate constraint. Find the minimum value λ'v of the Lagrange multiplier for the video stream from frame 1 to frame v which prevents violation of the rate constraint: sum_{i=1}^{v} Rx(i)(i) <= sum_{k=1}^{v+fp} C(k).
Step 3: Find the quantizer choices x = (x(1), x(2), \ldots, x(N_f)) as in Step 1, except that the Lagrange multiplier for the video stream from frame 1 to frame v is lower-bounded by \lambda'_v, i.e., \lambda'_i \leftarrow \max(\lambda'_i, \lambda'_v) for i \le v.

Step 4: Go to Step 2. Repeat until all the rate constraints in (4.1) are met.

4.4 Experimental Results

In order to test our proposed algorithms, we simulated the transmission behavior with and without rate control. We use 5,000 and 10,000 frames from the movie "Mission Impossible" for our simulation. Each frame was encoded in intra-frame mode with 7 different quantization steps using JPEG, thus generating 7 source streams with different rate-distortion performance. Each source stream uses a fixed quantizer. We test with different parameters for B_k, T_round and N.

Fig. 4.4 shows the accumulated channel and frame rates (with and without rate control). The rates are accumulated from the time the video transmission starts. Curve (1) is the accumulated channel rate (C_acc), which is the upper bound on the frame rate. The other curves (2)-(4) are accumulated frame rates (R_acc). Hiccups will occur if R_acc exceeds C_acc. Among curves (2)-(4), curve (2) is the closest to the bound (1); it is based on the rate control processing (Lagrange iteration). Curves (3) and (4) use fixed quantization steps, with a smaller quantizer step size for curve (4) (high frame rate, low distortion) and a larger one for curve (3). Those two quantization steps are the closest two consecutive step sizes among all the available steps.

The channel rate is the disk bandwidth, and the block size is a typical size from certain disk placement algorithms. These two parameters determine the shape of curve (1), the upper frame rate bound. The figure shows that with a smaller fixed quantization step there are more hiccups, while with a larger one the channel capacity is wasted.
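The Step 1-4 multiplier search above can be sketched in code. This is a hedged illustration under stated assumptions, not the thesis implementation: the rate/distortion tables R[i][j] and D[i][j], the cumulative channel budgets C_acc[i] (the right-hand sides of the constraints in (4.3)), the function name, and the use of a finite ascending grid of candidate multipliers (standing in for the exact minimum-multiplier computation of Step 2) are all assumptions of the example.

```python
def rate_controlled_choices(R, D, C_acc, lam_grid):
    """Iterative multi-constraint Lagrangian search (sketch of Steps 1-4).

    R, D: per-frame lists of per-quantizer rates/distortions.
    C_acc[i]: cumulative channel budget for frames 0..i.
    lam_grid: ascending grid of candidate multiplier values.
    """
    Nf = len(R)
    lam = [0.0] * Nf  # per-frame lower bounds on the multipliers
    while True:
        # Per-frame Lagrangian choice with the current lower-bounded multipliers.
        x = [min(range(len(R[i])), key=lambda q: D[i][q] + lam[i] * R[i][q])
             for i in range(Nf)]
        # Step 2: locate the LAST frame whose cumulative rate constraint is violated.
        acc, v = 0.0, -1
        for i in range(Nf):
            acc += R[i][x[i]]
            if acc > C_acc[i]:
                v = i
        if v < 0:
            return x  # all constraints met: optimal under this search
        # Step 3: raise the multiplier bound for frames 0..v to the smallest
        # grid value that removes the violation at frame v, then re-check.
        for cand in lam_grid:
            trial = [min(range(len(R[i])),
                         key=lambda q: D[i][q] + max(lam[i], cand) * R[i][q])
                     for i in range(v + 1)]
            if sum(R[i][trial[i]] for i in range(v + 1)) <= C_acc[v]:
                for i in range(v + 1):
                    lam[i] = max(lam[i], cand)
                break
        else:
            raise ValueError("no feasible multiplier on the grid")
```

Because the multipliers only ever increase and the grid is finite, the loop terminates; each pass either satisfies every cumulative constraint or pushes the choices for frames 0..v toward lower rates.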
[Figure 4.4: Simulation results: accumulated channel rate and frame rate vs. time (sec). Legend: channel bound (1), with rate control (2), no rate control (3), no rate control (4). Parameters: number of users = 36, avg. rate per frame = 5.91-27.72 Kbps, channel rate = 19.20 Mbps, turnaround = 360 sec, block size = 1000.00 Kbytes, prefetched = 240 frames, total frames = 240K.]

After applying the rate control over the whole sequence, a different quantization step can be chosen for each frame, and the accumulated frame rate can be set very close to the channel bound without exceeding it. We compared the PSNR obtained using rate control with that of the frames with fixed quantization (whose rate also does not exceed the channel bound). As there is no common method to measure the distortion of a "lost" frame (whether by PSNR or by some perceptual measurement), the comparison is made under the condition that no hiccups occur for any choice of the quantizers (with or without rate control). Based on our experimental results, we gain about 0.5 to 1.5 dB over the whole frame sequences, as shown in Fig. 4.5. Of course, if fewer quantization steps are available, we can get a larger PSNR gain, and a smaller PSNR gain vice versa (see footnote 5).

[Figure 4.5 plot: PSNR of 'No Rate Control' vs. 'Rate Control' (Lagrange iteration).]
Figure 4.5: PSNR of the video sequence. When the maximum number of concurrent users that can be supported by the server increases, the bandwidth allocated to each user decreases; therefore the video bit rate has to be reduced, which leads to poorer quality. Using rate-control-based compression can improve the overall PSNR (averaged over the frames) by 0.5 to 1.5 dB compared to the scheme without any rate control (a uniform quantizer is used for all frames). (Parameters: channel rate = 19.20 Mbps, turnaround = 360 sec, prefetched = 240 frames, total frames = 240K, avg. rate per frame = 5.91-27.72 Kbps.)

Footnote 5: If the set of available quantization scales is small, then it is more likely that a solution which does not violate the rate constraints is far below the bound.

For the other choices of uniform quantizers, which lead to different quality and bit rate of the compressed video, we show the total hiccups and the average user waiting time during the hiccups (hiccup duration) in Fig. 4.6. The results show that when the designed maximum number of users increases, more hiccups will occur for the video coded with a particular quantizer, and the average waiting time (during which a frame has to be frozen on the screen until the next video block arrives) also
increases. This is the drawback of using a uniform quantizer for compression. Note that with rate control, the hiccups can be guaranteed to be avoided for a given maximum number of concurrent users.

[Figure 4.6: Average waiting time and hiccups without rate control. Top panel: total hiccup times using a fixed quantizer; bottom panel: average waiting time using a fixed quantizer; both vs. number of users, with the quantization step increasing from top to bottom curve. Parameters: time period = 10000 sec, avg. rate per frame = 5.91-27.72 Kbps, total frames = 240K, prefetched = 240 frames, turnaround = 360 sec, channel rate = 19.20 Mbps.]

4.5 Conclusions

In this paper we analyzed the impact of some particular video disk storage strategies on the continuous playback of the video stream. The disk storage strategies are designed to improve the server throughput by reducing the disk seek time (for finding a particular video data block on the disk) so that more users can be served concurrently. We found that certain disk storage strategies imply some restrictions on the bit rate of the VBR video stream to meet their data retrieval patterns.
We translated those restrictions into rate constraints and cost functions for our proposed rate-control-based compression algorithm. Basically, the proposed algorithm selects the proper quantizers for different frames in the video to maximize the overall (averaged over the frames) video quality (measured in PSNR) while the rate constraints are not violated. Our experimental results show that the proposed scheme has a 0.5 to 1.5 dB quality improvement compared to the compression scheme without any rate control, which uses a uniform quantizer for all frames in a video object.

Chapter 5

Conclusions

In this thesis several novel algorithms are presented for streaming media services, namely: the delivery scheduling algorithm (ERDBS) for scalable streaming media over best-effort networks; the two selective caching algorithms for video proxy caching in QoS networks (the SCQ algorithm) and best-effort networks (the SCB algorithm); and a rate-control-based compression technique under the constraints of video server disk placement.

The thesis discussed each algorithm in detail and showed that there is a performance improvement from using these algorithms. For example, the EDBS algorithm can improve the playback quality of streaming media compared to the traditional delivery methods. The SCQ video caching algorithm can maximally reduce the QoS network cost when only a small receiver buffer size is available and the caching space is limited (only part of the video can be cached); the SCB video caching algorithm can increase the robustness of continuous playback of streaming video against poor network conditions in best-effort networks (under the same constraints as for QoS networks, such as limited cache space and receiver buffer size).
Finally, using rate-control-based video compression with a certain video server disk placement strategy can secure the benefit of the particular disk placement strategy and guarantee continuous playback at the client.

[12] P. A. Chou and Z. Miao. Rate-distortion optimized streaming of packetized media. Technical Report MSR-TR-2001-35, Microsoft Research Center, February 2001.
[13] P. A. Chou and Z. Miao. Rate-distortion optimized streaming of packetized media. IEEE Multimedia, February 2001. Submitted.
[14] P. A. Chou, A. E. Mohr, A. Wang, and S. Mehrotra. FEC and pseudo-ARQ for receiver-driven layered multicast of audio and video. In Proc. Data Compression Conf., Snowbird, UT, March 2000. IEEE Computer Society.
[15] P. A. Chou, A. E. Mohr, A. Wang, and S. Mehrotra. Error control for receiver-driven layered multicast of audio and video. IEEE Transactions on Multimedia, 2001.
[16] T. Cover and J. Thomas. Elements of Information Theory. John Wiley & Sons, 1991.
[17] A. Dan and D. Sitaram. Multimedia caching strategies for heterogeneous application and server environments. In Multimedia Tools and Applications, volume 4, pages 279-312, 1997.
[18] A. Dan, D. Sitaram, and P. Shahabuddin. Scheduling policies for an on-demand video server with batching. In Proc. of ACM Multimedia, pages 391-398, Oct. 1994.
[19] M. de Prycker. Asynchronous Transfer Mode: Solution for Broadband ISDN. Ellis Horwood, Chichester, England, 1991.
[20] B. Dempsey, J. Liebeherr, and A. Weaver. On retransmission-based error control for continuous media traffic in packet-switching networks. Computer Networks and ISDN Systems Journal, 28(5):719-736, March 1996.
[21] Cisco Caching Engine, http://www.cisco.com/warp/public/751/cache.
[22] N. Farvardin. A study of vector quantization for noisy channels. IEEE Trans. on Information Theory, 26:799-809, July 1990.
[23] N. Farvardin.
A study of vector quantization for noisy channels. IEEE Trans. on Information Theory, 26:799-809, July 1990.
[24] G. D. Forney. The Viterbi algorithm. Proceedings of the IEEE, 61:268-278, Mar. 1973.
[25] M. W. Garrett. Contributions Toward Real-Time Services on Packet Switched Networks. PhD thesis, Dept. of Electrical Eng., Columbia Univ., 1993.
[26] S. Ghandeharizadeh, S. H. Kim, and C. Shahabi. On configuring a single disk continuous media server. In Proceedings of the ACM SIGMETRICS/PERFORMANCE, May 1995.
[27] S. Gruber, J. Rexford, and A. Basso. Protocol considerations for a prefix-caching proxy for multimedia streams. WWW9 / Computer Networks, 33(1-6):657-668, 2000.
[28] C. Y. Hsu, A. Ortega, and M. Khansari. Rate control for robust video transmission over burst-error wireless channels. IEEE JSAC, Special Issue on Multimedia Network Radios, 17(5):756-773, May 1999.
[29] C.-Y. Hsu, A. Ortega, and A. Reibman. Joint selection of source and channel rate for VBR video transmission under ATM policing constraints. IEEE Journal on Select Areas in Communications, 15:1016-1028, Aug. 1997.
[30] ISO/IEC JTC1/SC29/WG1. JPEG-2000 Image Coding System, (WG1N390 REV).
[31] ISO/IEC JTC1/SC29/WG11. MPEG-4 version 2 visual working draft rev. 3.0, N2202, March 1998.
[32] ITU-T. Video coding for low bitrate communication. ITU-T Recommendation H.263; version 1, Nov. 1995; version 2, Jan. 1998.
[33] W. Jiang and A. Ortega. Multiple description coding via polyphase transform and selective quantization. In Proc. of VCIP, 1999.
[34] K. Ramchandran and M. Vetterli. Multiresolution Joint Source Channel Coding. In Wireless Communications: Signal Processing Perspective. Prentice Hall, 1998.
[35] M. Kamath, K. Ramamritham, and D. Towsley. Continuous media sharing in multimedia database systems.
In 4th International Conference on Database Systems for Advanced Applications, Apr. 1995.
[36] J. Kangasharju, F. Hartanto, M. Reisslein, and K. W. Ross. Distributing layered encoded video through caches. In Proc. of IEEE Infocom, Anchorage, AK, USA, 2001.
[37] J. Kangasharju, Y. Kwon, A. Ortega, X. Yang, and K. Ramchandran. Implementation of optimized cache replenishment algorithms in a soft caching system. In IEEE Signal Processing Society Workshop on Multimedia, CA, Dec. 1998.
[38] R. Koenen. MPEG-4, multimedia for our time. IEEE Spectrum, pages 26-34, 1999.
[39] L. Xue, S. Paul, P. Pancha, and M. Ammar. Layered video multicast with retransmission (LVMR): Evaluation of error recovery schemes. In Proc. NOSSDAV, pages 161-172, St. Louis, MO, May 1997.
[40] M. Lucas, B. Dempsey, and A. Weaver. MESH: distributed error recovery for multimedia streams in wide-area multicast networks. In Proc. IEEE Int. Conf. on Commun., volume 2, pages 1127-32, Montreal, Que., June 1997.
[41] W.-H. Ma and H. C. Du. Reducing bandwidth requirement for delivering video over wide area networks with proxy server. In Proc. International Conference on Multimedia and Expo., volume 2, pages 991-994, 2000.
[42] A. Mahanti, C. Williamson, and D. Eager. Traffic analysis of a web proxy caching hierarchy. IEEE Network, 14(3):16-23, May-June 2000.
[43] A. Mahanti, C. Williamson, and D. Eager. Traffic analysis of a web proxy caching hierarchy. IEEE Network, 14(3):16-23, May-June 2000.
[44] D. Makaroff, G. Neufeld, and N. Hutchinson. An evaluation of VBR disk admission algorithms for continuous media file servers. In ACM Multimedia, Seattle, Washington, 1997.
[45] S. McCanne, V. Jacobson, and M. Vetterli. Receiver-driven layered multicast. In Proc. ACM/SIGCOMM, pages 26-30, Stanford, CA, August 1996.
[46] S. McCanne, M. Vetterli, and V. Jacobson.
Low-complexity video coding for receiver-driven layered multicast. IEEE Journal on Selected Areas in Communications, 16(6):983-1001, August 1997.
[47] Z. Miao and A. Ortega. Rate control algorithms for video storage on disk based video servers. In Proc. of 32nd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, November 1998.
[48] Z. Miao and A. Ortega. Proxy caching for efficient video services over the Internet. In 9th International Packet Video Workshop (PVW'99), New York, April 1999.
[49] Z. Miao and A. Ortega. Optimal scheduling for streaming of scalable media. In Proc. of 34th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, November 2001.
[50] J. Mitchell, W. Pennebaker, C. Fogg, and D. LeGall. MPEG Video Compression Standard. Chapman and Hall, 1996.
[51] RealNetworks, http://www.real.com.
[52] J. Nussbaumer, B. Patel, F. Schaffa, and J. Sterbenz. Networking requirements for interactive video on demand. IEEE Journal on Selected Areas in Communications, 13(5):779-787, 1995.
[53] A. Ortega. Optimal rate allocation under multiple rate constraints. In Data Compression Conference, Snowbird, Utah, Mar. 1996.
[54] A. Ortega, F. Carignano, S. Ayer, and M. Vetterli. Soft caching: Web cache management for images. In IEEE Signal Processing Society Workshop on Multimedia, Princeton, NJ, June 1997.
[55] A. Ortega and K. Ramchandran. Rate-distortion methods for image and video compression. IEEE Signal Processing Magazine, 15(6):23-50, Nov 1998.
[56] Soft Caching Project Page: http://sipi.usc.edu/ortega/softcaching/.
[57] C. Papadopoulos and G. Parulkar. Retransmission-based error control for continuous media applications. In Proc. NOSSDAV, pages 5-12, April 1996.
[58] C. Papadopoulos and G. Parulkar. Retransmission-based error control for continuous media applications. In Proc. NOSSDAV, pages 5-12, April 1996.
[59] V. Paxson. End-to-end Internet packet dynamics. IEEE/ACM Trans. on Networking, 7(3):277-292, June 1999.
[60] S. Pejhan, M. Schwartz, and D. Anastassiou. Error control using retransmission schemes in multicast transport protocols for real-time media. IEEE/ACM Transactions on Networking, 4(3):413-427, June 1996.
[61] W. Pennebaker and J. Mitchell. JPEG Still Image Data Compression Standard. Van Nostrand Reinhold, 1993.
[62] M. Podolsky, S. McCanne, and M. Vetterli. Soft ARQ for layered streaming media. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, Special Issue on Multimedia Signal Processing, Kluwer Academic Publishers, 2001, to appear.
[63] H. Radha, Y. Chen, K. Parthasarathy, and R. Cohen. Scalable Internet video using MPEG-4. Signal Processing: Image Communication, 15, pp. 95-126, 1999.
[64] A. R. Reibman and B. G. Haskell. Constraints on variable bit-rate video for ATM networks. IEEE Trans. on Circ. and Sys., 2:361-372, Dec. 1992.
[65] M. Reisslein, F. Hartanto, and K. W. Ross. Interactive video streaming with proxy servers. In Proc. of First International Workshop on Intelligent Multimedia Computing and Networking (IMMCN), Atlantic City, NJ, Feb. 2000.
[66] M. Reisslein and K. W. Ross. Join-the-shortest-queue prefetching for VBR encoded video on demand. In 1997 International Conference on Networking Protocols, Atlanta, October 1999.
[67] R. Rejaie, M. Handley, and D. Estrin. Quality adaptation for congestion controlled video playback over the Internet. In Proc. of ACM SIGCOMM'99, Cambridge, MA, Sept. 1999.
[68] R. Rejaie, M. Handley, and D. Estrin. RAP: An end-to-end rate-based congestion control mechanism for realtime streams in the Internet. In Proc. IEEE Infocom, March 1999.
[69] R. Rejaie, M. Handley, H. Yu, and D. Estrin.
Proxy caching mechanism for multimedia playback streams in the Internet. In Proc. of 4th Web Cache Workshop, San Diego, CA, Mar. 1999.
[70] R. Rejaie, H. Yu, M. Handley, and D. Estrin. Multimedia proxy caching mechanism for quality adaptive streaming applications in the Internet. In Proc. of IEEE Infocom'2000, Tel-Aviv, Israel, March 2000.
[71] Reza Rejaie. An End-to-End Architecture for Quality Adaptive Streaming Applications in the Internet. PhD thesis, University of Southern California, Dec. 1999. USC Tech Report 99-718.
[72] J. Rexford and D. Towsley. Smoothing variable-bit-rate video in an internetwork. IEEE/ACM Transactions on Networking, April 1999.
[73] L. Rizzo and L. Vicisano. Replacement policies for a proxy cache. IEEE/ACM Trans. on Networking, 8(2):158-170, April 2000.
[74] O. Rose. Statistical properties of MPEG video traffic and their impact on traffic modeling in ATM systems. Technical Report 101, Univ. of Wuerzburg, Institute of Computer Science Research Series, Feb. 1995.
[75] S. Sahu, P. Shenoy, and D. Towsley. Design considerations for integrated proxy servers. In Proc. of IEEE NOSSDAV'99, pages 247-250, Basking Ridge, NJ, June 1999.
[76] J. Salehi, Z. Zhang, J. Kurose, and D. Towsley. Supporting stored video: Reducing rate variability and end-to-end resource requirements through optimal smoothing. In IEEE/ACM Trans. Networking, September 1998.
[77] D. Saparilla, K. W. Ross, and M. Reisslein. Periodic broadcasting with VBR-encoded video. In IEEE INFOCOM, 1999.
[78] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A transport protocol for real-time applications, request for comments rfc2026, Oct. 1999.
[79] H. Schulzrinne, A. Rao, and R. Lanphier. Real Time Streaming Protocol (RTSP), internet rfc 2326, 1998.
[80] G. M. Schuster and A. K. Katsaggelos. Rate-Distortion Based Video Compression.
Kluwer Academic Publishers, 1986.
[81] G. M. Schuster and A. K. Katsaggelos. A video compression scheme with optimal bit allocation among segmentation, motion, and residual error. IEEE Transactions on Image Processing, 6(11):1487-1502, Nov 1997.
[82] S. Sen, J. Dey, J. Kurose, J. Stankovic, and D. Towsley. CBR transmission of VBR stored video. In SPIE Symposium on Voice Video and Data Communications, Nov. 1997.
[83] S. Sen, J. Rexford, and D. Towsley. Proxy prefix caching for multimedia streams. In Proc. IEEE Infocom'99, New York, USA, March 1999.
[84] D. N. Serpanos, G. Karakostas, and W. H. Wolf. Effective caching of web objects using Zipf's law. In Proc. of IEEE International Conference on Multimedia and Expo, ICME, volume 2, pages 727-730, 2000.
[85] C. E. Shannon. A mathematical theory of communication. Bell System Tech. Journal, 27:379-423, 1948.
[86] J. Shim, P. Scheuermann, and R. Vingrale. A case for delay-conscious caching of web documents. In Proc. Intl. WWW Conf., Santa Clara, CA, Apr. 1997.
[87] J. Shim, P. Scheuermann, and R. Vingrale. Proxy cache algorithms: design, implementation, and performance. IEEE Trans. on Knowledge and Data Engineering, 11(4):549-562, July-Aug 1999.
[88] R. Singh and A. Ortega. Erasure recovery in predictive coding environments using multiple description coding. In Proc. of MMSP, 1999.
[89] G. Sullivan and T. Wiegand. Rate-distortion optimization for video compression. IEEE Signal Processing Magazine, pages 74-90, Nov. 1998.
[90] R. Tewari, H. Vin, A. Dan, and D. Sitaram. Resource-based caching for web servers. In Proc. SPIE/ACM Conference on Multimedia Computing and Networking, January 1998.
[91] R. Tewari, H. Vin, A. Dan, and D. Sitaram. Resource based caching for web servers. In SPIE/ACM Conference on Multimedia Computing and Networks, San Jose, CA, USA, 1998.
[92] M. Vishwanath and P. Chou.
An efficient algorithm for hierarchical compression of video. In IEEE International Conference on Image Processing, Nov. 1994.
[93] J. Wang. A survey of web caching schemes for the Internet. In ACM Computer Communication Review, volume 29(5), pages 36-46, Oct. 1999.
[94] Y. Wang, Z. L. Zhang, D. Du, and D. Su. A network conscious approach to end-to-end video delivery over wide area networks using proxy servers. In IEEE Infocom, Apr. 1998.
[95] S. Williams, M. Abrams, G. Abdulla, and S. Patel. Removal policies in network caches for world-wide web documents. In Proc. of ACM SIGCOMM'96, Stanford, CA, Aug. 1996.
[96] X. Xu, A. Myers, H. Zhang, and R. Yavatkar. Resilient multicast support for continuous-media applications. In Proc. NOSSDAV, pages 183-194, St. Louis, MO, May 1997.
[97] X. Yang and K. Ramchandran. An optimal and efficient soft caching algorithm for network image retrieval. In Proc. of ICIP, Chicago, IL, Oct. 1998.
[98] L. Zhang, S. Deering, D. Estrin, S. Shenker, and D. Zappala. RSVP: A new resource ReSerVation protocol. IEEE Network, 7(5):8-18, September 1993.
[99] Z.-L. Zhang, S. Nelakuditi, R. Aggarwa, and R. P. Tsang. Efficient server selective frame discard algorithms for stored video delivery over resource constrained networks. In Proc. IEEE INFOCOM'99, NYC, March 1999.
[100] Z.-L. Zhang, S. Nelakuditi, R. Aggarwa, and R. P. Tsang. Efficient server selective frame discard algorithms for stored video delivery over resource constrained networks. Journal of Real-Time Imaging, 2000.
[101] Z.-L. Zhang, Y. Wang, D. H. C. Du, and D. Su. Video staging: A proxy-server-based approach to end-to-end video delivery over wide-area networks. IEEE Trans. on Networking, 8(4):429-442, Aug. 2000.
[102] G. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.