MODELING AND OPTIMIZATION OF ENERGY-EFFICIENT AND DELAY-CONSTRAINED VIDEO SHARING SERVERS

by

Hang Yuan

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

May 2014

Copyright 2014 Hang Yuan

Dedication

I dedicate my dissertation to my family and friends, especially my loving parents, my uncle Parker and my fiancée Ran.

Acknowledgments

I would like to thank my PhD advisor Professor C.-C. Jay Kuo not only for his guidance but also for always being so considerate throughout my PhD career. I would also like to thank my committee members and my fellow group members, who have encouraged and helped me a great deal.

Contents

Dedication
Acknowledgments
Abstract
Chapter 1: Introduction
  1.1 Significance of Research
  1.2 Review of Previous Work
    1.2.1 Power Mode Selection and Caching
    1.2.2 Data Placement
  1.3 Contribution of Research
  1.4 Outline
Chapter 2: Energy Efficiency in Data Centers and Cloud-Based Multimedia Services: An Overview and Future Directions
  2.1 Introduction
  2.2 Energy Management in Data Centers
    2.2.1 Energy Saving in Server Systems
    2.2.2 Burdened Energy Management
    2.2.3 Power-Performance Tradeoff and Modeling
  2.3 Overview of Multimedia Streaming Servers
  2.4 Future Directions on Energy-Efficient Multimedia Services in Cloud Computing Environments
    2.4.1 New Challenges and Opportunities in Data Center Energy Saving
    2.4.2 Towards Energy-Optimized Multimedia Hosting in Data Centers
    2.4.3 Energy-Efficiency in Future Multimedia Cloud
  2.5 Conclusions
Chapter 3: Design of Energy-Efficient Video Sharing Servers with Delay Constraints
  3.1 Introduction
  3.2 Background
  3.3 System Model
  3.4 Workload Scheduling and Modeling
    3.4.1 Parallel Video Workload Scheduling
    3.4.2 Modeling of Disk Idle Time
  3.5 Energy Optimization with Delay Constraint
    3.5.1 Threshold-Based Energy Management
    3.5.2 Prediction-Based Energy Optimization
  3.6 Experimental Results and Evaluation
    3.6.1 Simulation Environment
    3.6.2 Effects of Scheduling Window
    3.6.3 Energy-Delay Performance
  3.7 Conclusions
  3.8 Appendix: Distribution of Disk Idle Time
Chapter 4: Energy-Aware Cache Management for Video Sharing Servers
  4.1 Introduction
  4.2 Workload Scheduling and Energy Efficiency
    4.2.1 Review of Workload Scheduling
    4.2.2 Scheduling Window and Disk Idle Time
    4.2.3 Workload Schedulability
  4.3 Energy-Delay Optimized Cache Replacement
    4.3.1 Disk Access and Mode Transition Costs
    4.3.2 Combining Access and Transition Costs
  4.4 Prediction-Based Energy-Efficient Prefetching
    4.4.1 Prefetching Decisions
    4.4.2 Prefetching Process
  4.5 Experimental Results and Evaluation
    4.5.1 Schedulability Analysis
    4.5.2 Comparison of Energy-Delay Performance
  4.6 Conclusion
Chapter 5: Learning-Based Placement Optimization in Video Servers
  5.1 Introduction
  5.2 Heuristic Placement
  5.3 Learning-Based Placement Optimization
    5.3.1 Basic Framework
    5.3.2 Feature Extraction and Model Learning
    5.3.3 Optimization and Final Placement
  5.4 Experimental Results and Evaluation
  5.5 Conclusion
Chapter 6: Conclusion and Future Work
Bibliography

List of Tables

2.1 Overview and classification of energy management algorithms
3.1 Important Notations
3.2 Simulation Parameters
3.3 Parameters for Different Speed Modes
4.1 Low power modes and corresponding recovery penalty
4.2 Energy saving of PMD over TMD
4.3 Energy saving under low delay levels
4.4 Parameters and results for a possible future disk model
5.1 An example of class distribution
5.2 An example of class distribution
5.3 Energy saving under low delay levels

List of Figures

2.1 Server power consumption at different utilization levels
2.2 Architecture of Multimedia Server System
2.3 Data Striping and Parallel Accessing
2.4 Energy Optimization for Multimedia Server System
2.5 Distributed Multimedia Service in the Cloud
3.1 Architecture of video sharing services
3.2 The distribution of video popularities for (a) all videos and (b) moderately played and tail content
3.3 CDN caching and inter-arrival time at the back-end data center
3.4 The video server model
3.5 Inter-arrival times for different disks
3.6 Modeling disk wake-up
3.7 Energy consumption for a 4-speed disk
3.8 Average idle time and scheduling window
3.9 Energy-Delay Performance for PMD under different scheduling window sizes
3.10 Comparison of Energy-Delay Performance
4.1 Average idle time and scheduling window under PEDO
4.2 Energy-delay performance for PEDO under different scheduling window sizes
4.3 Average idle time and scheduling window sizes
4.4 Energy-Delay Performance of PEEP under different scheduling window sizes
4.5 Comparison of the energy-delay performance of four algorithms with different inter-arrival times
4.6 Comparison of energy consumption with low delay levels
5.1 Framework for learning the prediction model
5.2 Hit rate distribution for leading blocks for inter-arrival time of (a) 100ms and (b) 16ms
5.3 Hit rate distribution for non-leading blocks for inter-arrival time of (a) 100ms and (b) 16ms
5.4 Learning results for different number of classes
5.5 Relationship between cumulative load and space requirement
5.6 The optimization results for two arrival rates
5.7 Comparison of energy consumption with low delay levels
5.8 Comparison of energy consumption when the system is close to fully utilized

Abstract

With the continually growing popularity of online video sharing, energy consumption in video sharing servers has become a pivotal issue. Energy saving in large-scale video sharing data centers can be achieved by utilizing low power modes in disks, yet this could lead to excessive delay and affect the quality-of-service. In this thesis, we present techniques that jointly optimize energy and delay for video sharing servers. Specifically, we present a general energy-delay optimization framework that can be applied to a variety of issues related to energy management in video-sharing services. Furthermore, the framework is generally applicable to disks with multiple low-power modes, including currently available disks and future ones.

This thesis features a comprehensive survey followed by careful examination of three major problems in energy management for video-sharing services: power mode selection, caching and data placement. For the first topic, we propose a novel model that exploits the unique workload characteristics of video sharing services.
Based on the model, we formulate the power mode decision problem as a constrained optimization task. By solving the optimization problem, the proposed prediction-based mode decision (PMD) algorithm selects the optimal power modes for disks under various delay constraints.

For the second topic, we investigate the effects of caching on energy efficiency and study how cache can be better utilized in the context of energy-delay optimization. We extend the original framework and propose two new techniques along this direction to improve energy efficiency. Firstly, we adopt an energy-delay-optimized caching (EDOC) utility for cache replacement. Then, we propose the prediction-based energy-efficient prefetching (PEEP) algorithm that effectively reduces mode transition overheads for the video storage server. Experiments show that our schemes achieve significantly more energy savings under the same delay level compared to the traditional threshold-based energy management scheme.

Finally, we present a learning-based optimization scheme for the placement of video data. Optimization of data placement has been known to be an NP-hard problem even when the objective function is explicitly given, and it becomes even more difficult in the context of energy efficiency due to the lack of analytical models that can accurately predict energy consumption and service delays. Instead of resorting to heuristic approaches like previous work, we approach the mathematical problem by applying machine learning techniques. The solution we provide can create data-disk allocations that are energy efficient under a wide array of conditions, including different levels of service load, delay requirements and capacity constraints.

Chapter 1: Introduction

1.1 Significance of Research

The expanding scale and density of data centers has made their energy consumption an imperative issue. Data center energy management has become of unprecedented importance not only from an economic perspective but also for environmental conservation. An estimate indicates that data centers in the U.S. would consume 100 billion kWh by 2011 [32]. This can result in an increase in electricity cost and emission of more carbon dioxide. Today, the number of servers in data centers has reached the order of 10,000, and data centers on the order of 150,000 servers are emerging [20]. Power density has grown from 10 kW/rack in 2004 [68] up to 55 kW/rack [79] most recently. In light of these trends, energy management has become a multi-dimensional problem.

Recently, online video-sharing services (VSS), such as YouTube, have come to dominate Internet traffic. In fact, video data are expected to represent 57% of all consumer Internet traffic by 2014 [21]. Similar to other web services, energy-aware design must take performance constraints into account, and the problem evolves into an energy-performance trade-off. In particular, video servers rely on the storage and bandwidth of the parallel disk system, which is among the heaviest energy consumers. It has been reported that such a storage system typically accounts for 27% of the total energy consumption in data centers [39, 110]. As storage capacity continues to grow and disk speed rises, the share of the storage system in data center energy consumption will be further increased [7].

To reduce energy consumption, various dynamic power management (DPM) techniques have been developed, taking advantage of the low-power modes in different devices.
Compared to processors, the implementation of DPM for disk drives has been rather limited [4]. Traditionally, disk drives can be spun down to save energy. However, this is very expensive because spinning the disk back up incurs heavy penalties in terms of time and energy. More recently, disks with multiple power modes have been developed in both home- and server-class systems [73, 86]. Although energy savings in the storage system can be achieved by utilizing these low power modes, the associated delay penalties can severely affect the performance of real-time services, including VSS.

A large-scale VSS has many unique characteristics which make it very different from traditional Internet services and even movie-on-demand (MOD) video services. Firstly, a large-scale VSS delivers a large number of short videos to a huge number of users. Secondly, users are more sensitive to delays and jitter. Furthermore, the video repository is very diverse in terms of bit-rates, sizes and popularity.

To support the huge size and rapid growth in both video data and user demand in VSS, multi-layer and parallel storage architectures have been deployed, which make system modeling and characterization difficult. To the best of our knowledge, there is no previous work that studies the energy issues in a large-scale video-sharing server with parallel storage systems. Also, there appears to be no work in the literature that optimizes energy for video servers with constraints on service delays.

In this thesis, we exploit the unique workload characteristics of video sharing services and introduce a new model that efficiently facilitates joint energy-delay optimization. Based on the model, we propose a prediction-based scheme that makes optimal selections of disk power modes in VSS. Then, we investigate the role cache plays in energy efficiency and present two novel cache management policies. Both of the proposed algorithms are based on mathematical analysis of energy and delay costs for the storage system. Finally, we tackle the optimization of data placement, which is known to be an NP-complete problem, using a learning-based framework. With the help of a set of representative features, accurate prediction models can be built and applied to form a constrained optimization problem, which we then use to create energy-efficient data layouts.

1.2 Review of Previous Work

The first studies of energy management for data center storage systems took place around a decade ago. Colarelli and Grunwald [22] proposed an architecture that uses a small number of disks as cache so that more disks can be spun down. Similar approaches were introduced in [74] and [103]. Basically, when a long idle period is expected, spinning down the disk can save energy. However, these techniques cannot be applied to video services in general, which require real-time guarantees for maintaining continuous playback [51].

The theoretical foundation of multi-speed disks was laid down by Gurumurthi et al. in [39]. Among the work that leveraged DRPM, some researchers assumed that a disk running at lower speed can still serve requests [35, 51], while others require the disk to transition back to full-speed mode [50, 110]. We name the first model full-DRPM and the second partial spin-down. While full-DRPM works well in theory, real products [86, 73] only have partial spin-down functionality.
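To make these transition penalties concrete, the sketch below computes the classic break-even time of an inactive mode: the shortest idle interval for which entering the mode saves any energy at all, given the wake-up cost. This is an illustrative calculation only; the mode names and numbers are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass

@dataclass
class PowerMode:
    name: str
    power_w: float           # power drawn while resting in this mode
    wakeup_s: float          # time needed to return to the active state
    wakeup_energy_j: float   # extra energy spent on the transition

def break_even_time(active_idle_w: float, mode: PowerMode) -> float:
    """Shortest idle interval T for which the mode saves energy.

    Staying active-idle for T seconds costs active_idle_w * T; using the
    mode costs mode.power_w * T + mode.wakeup_energy_j. Equating the two
    and solving for T gives the break-even point.
    """
    return mode.wakeup_energy_j / (active_idle_w - mode.power_w)

# A hypothetical disk with two inactive modes (all numbers are made up).
idle_w = 9.3
for m in (PowerMode("unload-heads", 6.2, 0.5, 10.0),
          PowerMode("spin-down", 2.0, 8.0, 135.0)):
    print(f"{m.name}: pays off only for idle periods > "
          f"{break_even_time(idle_w, m):.1f} s (plus a {m.wakeup_s} s delay hit)")
```

The deeper the mode, the longer the idle period must be before it pays off, which is exactly why delay-sensitive services cannot blindly use the deepest mode.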
Beyond DRPM, other technologies have been developed to conserve energy. For example, Seagate's PowerChoice™ technology [86] combines different techniques and creates a number of low-power modes. In our work, we assume a general disk model with multiple inactive low-power modes. Therefore, our approaches apply both to partial spin-down DRPM and to practical technologies such as PowerChoice™.

1.2.1 Power Mode Selection and Caching

Previous research utilizing low-power modes mainly employed threshold-based techniques for mode transition decisions [50, 110]. The central idea is to switch to low power modes when the disk idle time has exceeded a certain threshold. While this technique has been proven effective, it is a heuristic approach and cannot take advantage of the workload characteristics of VSS.

A number of algorithms focus on extending disk idle periods and applying low power modes. In [76], the authors proposed to concentrate the workload on a subset of the disks so that the rest can use more low power modes. Son et al. [89] used compiler-directed data prefetching to let disks stay longer in low power modes. In [110], Zhu and Zhou designed an efficient cache replacement algorithm so that disk accesses can be reduced. This technique also creates longer idle periods and enables more frequent use of low power modes. The above algorithms make use of low power modes and achieve energy savings. However, they are not suitable for real-time applications, including video sharing, in which time constraints are crucial.

Recently, a few schemes have been put forward for reducing energy consumption in video servers [34, 51]. In [34], the authors studied how data should be ordered and placed onto a disk drive so that the video server can support more concurrent users and consume less energy. In [51], Kim et al. proposed to adjust the round length and disk speed to save energy in MOD servers.

Although [34] and [51] also addressed energy issues for video servers, the problems they looked at are very different from ours. To begin with, we deal with large-scale video-sharing workloads which consist of a diverse repository of videos with different bit-rates and lengths. In [34] and [51], the video databases are rather homogeneous and contain a much smaller number of videos. Second, we deal with a large number of user requests and mainly short videos. This makes delay a crucial factor, which neither [34] nor [51] addressed. Third, our system model assumes parallel storage of the video content, which both works explicitly avoided for the sake of simplicity. Finally, an unrealistic power saving technology (full-DRPM) is adopted in [51], while we focus on disks with multiple inactive low-power modes, which are readily available and thus truly practical.

Related research on storage cache management and prefetching studied how cache can be used to reduce disk energy. In particular, the concept of energy-optimal cache replacement was discussed in [110]. Also, Manzanares et al. proposed to prefetch popular content into cache [63]. Although these algorithms cannot be applied to our VSS storage systems, we follow these general directions in our study and propose energy-efficient cache management algorithms for VSS.

A few recent studies took advantage of file replication and applied workload redirection to increase the length of disk idle intervals [67, 98]. Their approaches are fundamentally orthogonal to our mode decision and caching algorithms, in which each workload is directed to one single disk, and they can work together with our algorithms. These techniques can be used first to decide which disk should be accessed for each workload; our algorithms will then be applied to optimize the mode selection and caching decisions after the target disk is selected.
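The energy-optimal replacement idea of [110] can be illustrated with a toy eviction rule: rather than evicting the least recently used block, evict the block whose future misses are expected to add the least disk energy. The cost model below (a fixed per-access energy and a wake-up probability per miss) is a simplified stand-in for illustration, not the EDOC utility developed later in this thesis.

```python
def eviction_victim(blocks, access_energy_j, wakeup_energy_j):
    """Choose the cached block whose eviction is expected to cost the least.

    blocks maps a block id to (miss_rate, p_disk_asleep): the estimated
    future request rate for the block and the probability that its disk
    must be woken up from a low-power mode to serve a miss.
    """
    def expected_cost(b):
        miss_rate, p_asleep = blocks[b]
        # Each future miss costs one disk access, plus a wake-up penalty
        # whenever the disk happens to be in a low-power mode.
        return miss_rate * (access_energy_j + p_asleep * wakeup_energy_j)
    return min(blocks, key=expected_cost)

# A hot block on a busy disk vs. a cold block on a mostly sleeping disk.
victim = eviction_victim({"v17.blk0": (0.80, 0.1), "v92.blk3": (0.05, 0.7)},
                         access_energy_j=1.2, wakeup_energy_j=135.0)
print(victim)  # -> "v92.blk3": its rare misses cost far less energy overall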
1.2.2 Data Placement

There exists a large amount of academic work on the file assignment problem (FAP). These works generally deal either with physical-level striping [84] or with replication [94, 102], although some seek a general treatment of FAP [27]. In particular, it has been pointed out that the generic FAP is NP-complete. Therefore, almost all previous work focuses on finding heuristic algorithms. Also, most of these previous works tried to improve system throughput or other performance-related metrics, and are not very applicable for the purpose of energy efficiency.

One relevant approach was proposed by Pinheiro et al. [76], in which the authors proposed to concentrate popular data onto a small subset of disks. Essentially a greedy algorithm, it allows more disks to be put into low-power modes, thereby conserving energy. Similarly, Xie studied data placement in RAID-structured storage systems and designed an energy-aware data striping algorithm [105]. In that scheme, disks are separated into two zones according to load, and different power modes can then be applied.

There have been many placement algorithms that update file allocation dynamically [77, 93]. Dynamic algorithms require frequent data migration and may not suit applications such as video delivery, where file sizes are large and migration cost is heavy. There is also a large body of work that studies data replication, mostly for dynamic environments [102, 18]. File replication can improve energy efficiency [67, 98].

In this thesis (Chapter 5), we consider a static placement problem, in which we possess prior knowledge about workload level and video popularity. We look beyond the heuristics proposed in virtually all previous work and attack the mathematical optimization problem directly. To keep the problem tractable, we do not consider file replication. However, dynamic replication schemes can work together with our approach: our algorithm provides the initial static data allocation, while dynamic algorithms, including replication, can be used at run time.
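As a concrete illustration of the popularity-concentration heuristic in [76], the sketch below assigns videos to disks in decreasing order of expected load, always filling the lowest-numbered disk that still has bandwidth and capacity to spare, so that high-numbered disks receive little or no load and can rest in low-power modes. The load and capacity bookkeeping is deliberately simplified.

```python
def concentrate(videos, disk_bw, disk_cap, n_disks):
    """Greedy placement in the spirit of popular-data concentration.

    videos: list of (name, expected_load, size) tuples.
    Returns a dict mapping disk index -> list of video names.
    """
    layout = {d: [] for d in range(n_disks)}
    load = [0.0] * n_disks
    used = [0.0] * n_disks
    for name, lam, size in sorted(videos, key=lambda v: -v[1]):
        for d in range(n_disks):   # first fit, most popular videos first
            if load[d] + lam <= disk_bw and used[d] + size <= disk_cap:
                layout[d].append(name)
                load[d] += lam
                used[d] += size
                break
        else:
            raise RuntimeError("placement infeasible under these constraints")
    return layout

print(concentrate([("a", 0.5, 10), ("b", 0.4, 20), ("c", 0.05, 15)],
                  disk_bw=1.0, disk_cap=40, n_disks=3))
# -> videos a and b share disk 0; disk 2 receives nothing and can sleep
```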
1.3 Contribution of Research

This dissertation studies the energy efficiency of large-scale video sharing servers and proposes schemes that minimize energy consumption under service delay constraints. Specifically:

1. It provides an overview of critical issues of energy management in a large-scale video-sharing server (VSS) with a diverse video database and parallel storage systems. In particular, we study the multi-layer server architecture and the scheduling of parallel workloads.

2. A model of disk idle time for video-sharing servers is proposed by taking into account the unique characteristics of parallel video workloads; that is, most video blocks need not be delivered right away after they are requested. The model is based on the time-varying Poisson process, and it accurately characterizes the disk behaviors in a VSS.

3. The thesis proposes a prediction-based scheme that makes optimal selections of disk power modes in VSSs. Based on a novel Lagrangian formulation of the energy optimization problem (a toy illustration of this formulation appears after this list), the proposed algorithm consumes up to 15.1% less energy than the traditional threshold-based approach under the same delay constraints. The saving percentage can go up to 35% if more power modes are available, for example, through the DRPM technology.

4. Since caching and disk energy management are essentially related, this thesis proposes two algorithms to optimize cache utilization for energy efficiency in conjunction with disk mode selection. More specifically, we look into how to better use the cache to improve scheduling efficiency and reduce mode transition overheads. Compared to traditional caching algorithms, these schemes allow energy consumption to be reduced by up to 29% under very low delay constraints. With DRPM technology, energy savings can go up to 40%.

5. We present an efficient and practical learning-based optimization technique to solve the static data placement problem in video servers. Data placement in the context of a multi-disk environment is an NP-hard problem. Unlike almost all previous works, we do not use heuristics; we solve the optimization with the help of machine learning tools and some simplifications. Experiments show that our algorithm is effective under different service load levels, service delay levels and capacity constraints. Data layouts created by our algorithm save up to 15% energy under low levels of service delay.
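As a toy illustration of the Lagrangian idea in contribution 3: for a single idle period, each candidate mode can be scored by its energy plus a multiplier λ times the delay it inflicts, and sweeping λ traces out an energy-delay tradeoff curve. The mode parameters below are invented, and the actual PMD formulation in Chapter 3 works with predicted idle-time distributions rather than a known idle length.

```python
def best_mode(idle_s, modes, lam):
    """Pick the mode minimizing energy + lam * delay for one idle period.

    modes: list of (name, power_w, wakeup_s, wakeup_energy_j) tuples;
    lam is the Lagrange multiplier (joules charged per second of delay).
    """
    def cost(m):
        _, power_w, wakeup_s, wakeup_j = m
        energy = power_w * idle_s + wakeup_j
        # The wake-up time delays the first request after the idle period.
        return energy + lam * wakeup_s
    return min(modes, key=cost)

modes = [("active-idle", 9.3, 0.0, 0.0),
         ("unload-heads", 6.2, 0.5, 10.0),
         ("spin-down", 2.0, 8.0, 135.0)]
for lam in (0.1, 10.0, 1000.0):
    print(f"lam={lam}: {best_mode(60.0, modes, lam)[0]}")
# Small lam favors deep sleep; a large lam (tight delay budget) keeps the disk active.
```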
1.4 Outline

The rest of the thesis is organized as follows. In Chapter 2, we conduct a comprehensive survey of the techniques and approaches in the fields of energy efficiency for data centers and large-scale multimedia services. Chapter 3 introduces our problem setting and presents the prediction-based energy-delay optimization algorithm, which optimizes the power mode selection for disks. Chapter 4 proposes two algorithms that address cache management to achieve optimized energy efficiency. Chapter 5 presents the learning-based optimization algorithm for data placement. Chapter 6 concludes the thesis and discusses future work.

Chapter 2: Energy Efficiency in Data Centers and Cloud-Based Multimedia Services: An Overview and Future Directions

2.1 Introduction

As large data centers emerge for media-rich Internet services and applications, their energy efficiency has become a central issue of both economic importance and environmental urgency. Cloud computing relies on the compute and storage infrastructure provided by these data centers and is becoming a highly accepted computing paradigm. The need for high availability and scalability in Internet computing has resulted in multi-layer, distributed and virtualized systems, which make energy efficiency a more challenging problem. In particular, multimedia services have grown more and more popular in recent years. How to manage power and energy for multimedia services in data centers and cloud environments is therefore a subject of great practical value.

The energy consumption of data centers has been skyrocketing. An estimate indicates that data centers in the U.S. would consume 100 billion kWh by 2011 [32]. This can result in an increase in electricity cost and emission of more carbon dioxide. In this chapter, we identify some of the challenges in energy management and summarize design techniques for energy-efficient data centers.

With the increasing popularity of online multimedia streaming and sharing, video data are expected to represent 90% of all consumer Internet traffic by 2012 [21]. Similar to other web services, energy-aware design must take performance constraints into account, and the problem evolves into an energy-performance tradeoff. We study Internet multimedia services in the aspects of architecture, data storage, delivery and performance in this chapter, and explore the framework for energy-efficient streaming services in cloud computing environments.

The rest of the chapter is organized as follows: Section 2.2 discusses different dimensions of energy management for data centers. Section 2.3 summarizes the design principles of multimedia servers. Section 2.4 identifies some of the limitations of existing approaches and some future directions of energy-efficient cloud-based multimedia services. Section 2.5 provides some conclusions.

2.2 Energy Management in Data Centers

Over the years we have witnessed the explosive growth of data centers in both size and density. Today, the number of servers in data centers has reached the order of 10,000, and data centers on the order of 150,000 servers are emerging [20]. Power density has grown from 10 kW/rack in 2004 [68] up to 55 kW/rack [79] most recently. In light of these trends, energy management has become a multi-dimensional problem, as summarized and classified in Table 2.1. In addition to energy (or average power) consumption in the server system, one has to deal with the enormous energy cost of the cooling and power delivery infrastructure. At the same time, one needs to maintain quality-of-service (QoS) for demanding Internet applications. In this section, we expound on these three aspects.

[Table 2.1: Overview and classification of energy management algorithms. Device-level techniques (CPU, disk, PSU, other devices) and global techniques are classified by management target (server energy saving vs. burdened energy management) and by enabling technology (low-power active states, inactive power states, cooling and thermal control, power provisioning).]

2.2.1 Energy Saving in Server Systems

Much of the astronomical amount of energy consumed by data centers is wasted on idle or underutilized resources. The gap between peak and average workload has resulted in very low average server utilization in current data centers (mostly between 20% and 30%). As current hardware devices draw about 50% of their peak power even when lightly utilized [4], energy efficiency in the regions of low utilization is very poor, as depicted in Figure 2.1. The goal is to achieve energy proportionality, both at the device and data center levels.

[Figure 2.1: Server power consumption at different utilization levels]
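The efficiency gap shown in Figure 2.1 can be reproduced with a two-parameter linear power model; the numbers below assume, roughly in line with the 50%-of-peak figure cited above, that an idle server draws half of its peak power.

```python
def server_power(util, p_idle=150.0, p_peak=300.0):
    """Linear power model: an idle server already draws p_idle watts."""
    return p_idle + (p_peak - p_idle) * util

for util in (0.1, 0.3, 1.0):
    watts = server_power(util)
    # Energy proportionality would keep watts-per-unit-of-work constant;
    # with a large idle floor it degrades sharply at low utilization.
    print(f"util={util:.0%}: {watts:.0f} W total, {watts / util:.0f} W per unit of work")
```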
Towards this end, adjustable power states as specified in the Advanced Configuration and Power Interface (ACPI) [5] have been widely implemented. In ACPI, two types of power states are defined: active and inactive. In active states, the device is still running but its power consumption level changes with performance; in inactive states, operation of the device is suspended and dissipation greatly decreased.

(1) Low Power Active States: These states are usually realized by dynamic voltage and frequency scaling (DVFS), which reduces the device's operating voltage to cut power consumption. To leverage the dynamic power range enabled by DVFS, various scheduling techniques have been proposed to adapt frequency and voltage in response to variations in utilization levels and application requirements. For example, we can apply DVFS to CPUs during memory-intensive periods of execution. Similarly, the CPU can be switched to low-power states during phases of internal communication in parallel programs [59]. Analogous to DVFS, multi-speed technology has been adopted in disks to realize low-power active states [10, 39].

DVFS scheduling policies can be implemented in different fashions. On the one hand, we can use hardware-based approaches. In [57], Li et al. tracked L2 cache misses and instruction-level parallelism in hardware, and exploited the low-usage periods of the CPU. On the other hand, DVFS policies can easily be ported into the OS kernel, which has the built-in ability to monitor CPU utilization levels. Using OS-based features, we can predict the runtime behaviors of applications and set processor power states accordingly [46]. In brief, low-level methods allow more accurate scheduling, but lack the easy implementation and broad applicability of OS-level approaches.

To take full advantage of DVFS capabilities in server systems, scheduling policies of different devices and levels must be coordinated. For a typical multiprocessor chip, we can perform DVFS separately for each core while imposing chip-level constraints [45]. For a data center, DVFS-based policies often serve as a local component of multi-level management that enforces policies on individual servers [30].

(2) Suspended Inactive States: To remove the energy waste caused by idle devices, inactive states have been implemented to allow partial shutdown of hardware circuitry and components [9]. In order to make use of these low-power states, both local and system-wide approaches can be used, with two main goals: (a) idle more devices or servers; (b) keep them in inactive states as long as possible.

The first goal can be accomplished by load concentration. For subsystems such as disk arrays, we can use cache disks to keep the most frequently requested data, letting the rest of the disks stay idle [109]. Datacenter-level load concentration is usually performed by server consolidation, through which workloads are dispatched to a small set of active servers. In cloud computing environments, server consolidation has been made much easier with virtualization, which separates applications from the underlying hardware and comfortably supports resource sharing. In energy-efficient consolidation, we need to provision and allocate resources according to load variation. For this purpose, workload prediction and virtual machine (VM) migration have proved effective [78, 90, 14].

To let idle devices and servers stay longer in suspended states, request batching can be applied. This technique allows requests to be grouped together and executed in batches, prolonging the idle periods. One simple approach is to wake up the servers only after incoming requests have been accumulating longer than a timeout threshold [23]. More sophisticated request batching algorithms involve the input of application-level knowledge such as service priority [6].
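A minimal sketch of such timeout-based batching follows. The trace and the five-second threshold are arbitrary, and a real policy would also bound per-request delay against application deadlines.

```python
def batch_requests(arrival_times, timeout_s):
    """Wake a sleeping server only once the oldest pending request has
    waited timeout_s seconds; everything pending is then served together.
    Returns (wakeup_times, per_request_delays): longer timeouts lengthen
    hardware idle periods at the price of higher service delay."""
    wakeups, delays, batch = [], [], []
    for t in sorted(arrival_times):
        if batch and t > batch[0] + timeout_s:   # flush the previous batch
            wake = batch[0] + timeout_s
            wakeups.append(wake)
            delays += [wake - a for a in batch]
            batch = []
        batch.append(t)
    if batch:
        wake = batch[0] + timeout_s
        wakeups.append(wake)
        delays += [wake - a for a in batch]
    return wakeups, delays

# Two bursts of arrivals -> two wake-ups instead of five.
print(batch_requests([0.0, 1.2, 3.9, 30.0, 31.5], timeout_s=5.0))
```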
(3) Hybrid and Global Approaches: Different energy saving techniques can be integrated to improve system-level savings. One approach is to put together sleep mechanisms for different subsystems to realize global inactive states [64]. Moreover, the benefits of active and inactive low-power states can be synthesized by combining request batching, DVFS, power capping and server consolidation [96, 78, 31].

2.2.2 Burdened Energy Management

Although servers are the sources of direct energy consumption, cooling and power supply equipment surprisingly draws half of the total energy in data centers [32]. In other words, to deliver and remove every watt of heat, we have to generate another watt as burdened cost [75]. Therefore, energy management for the cooling and power infrastructure is no less important than that for servers. The burdened cost complicates the energy issue in the following respects: first, we need to address the power consumption of the cooling infrastructure; second, we have to control the thermal environment of data centers; third, we are concerned with the efficiency of power supply and provisioning.

1) Cooling: Today, data centers are embracing high-density designs such as the blade architecture [56]. With higher power density, the probability of thermal failover also increases [79]. To prevent it, additional cooling facilities such as server fans and room air conditioners have been deployed. In particular, fan power grows cubically with respect to fan speed [64]. With the common employment of variable-speed fans and temperature sensors in servers, we can set fan speed by referring to server temperatures to ensure both energy saving and thermal safety. Tolia et al. gave a good overview of the fan control problem in [96]. In a typical blade architecture, a number of fans are shared among the blades in the same rack. A simple control method is to adjust all fans together according to the maximum blade temperature in the rack. More sophisticated approaches can have each individual fan controlled according to its own thermal condition. The physical location and blade configuration, as well as hardware utilization information, can all be useful in regulating fan operation [96].

2) Thermal Management: Even with well-designed cooling, overheating and thermal imbalance can strike. This not only causes reliability concerns, but also threatens overall energy efficiency as burdened costs go up. Similar to server energy management, thermal management policies can be imposed at the device or system level. An example of a device-level technique is to throttle processes whenever the workload risks overheating any device. Specifically, CPU performance counters can be used to infer the energy cost of each process, allowing the system to keep CPU temperatures within certain limits [101]. Datacenter-wide techniques allow dynamic workload allocation based on the temperature distribution of the system. Specifically, global thermal management should avoid placing new workload onto racks that are already experiencing thermal emergencies, and should move existing workload from hot spots to cooler regions.

To make dynamic configuration effective, we need to build a thermal control system [87]. Both hardware sensors and software monitors can provide valuable references in feedback control, directing thermal-aware workload allocation. Researchers have investigated dynamic thermal management for various platforms, addressing both the cooling system and server workloads [87, 8]. Through monitoring thermal load and eliminating hot spots, we can keep temperature and air flow balanced across the whole data center.
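A toy version of such temperature-aware allocation is sketched below; the rack names and thresholds are invented for illustration.

```python
def place_job(rack_temps_c, emergency_c=35.0):
    """Refuse racks in thermal emergency and steer new work to the coolest
    remaining rack. rack_temps_c maps rack id -> inlet temperature (deg C)."""
    eligible = {r: t for r, t in rack_temps_c.items() if t < emergency_c}
    if not eligible:
        raise RuntimeError("all racks are over the thermal-emergency threshold")
    return min(eligible, key=eligible.get)

print(place_job({"rack-a": 28.5, "rack-b": 36.1, "rack-c": 24.0}))  # -> rack-c
```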
3) Power Delivery and Provisioning: The efficiency of power delivery for data centers has been low due to AC-to-DC conversion losses along the power distribution paths [56] and over-supply [33]. PSU conversion efficiency varies with electrical load and is poorest at low levels of load. Unfortunately, two factors have pushed PSUs into those inefficient regions [64]. Firstly, data centers require highly configurable and scalable settings, and as a result, operators usually supply more than enough power to the system. Secondly, redundant supplies are necessary to share thermal loads and prevent PSU failure. To push PSU operation out of those regions, we can dynamically vary the number of PSUs to allow more efficient load sharing [56]. Alternatively, we can try to reduce redundancy by re-designing rack-level power supply using fine-grained PSU configuration [64].

Furthermore, power supply loss can be contained by improving the efficiency of power provisioning. We can approach this goal by managing the power budget of the system: we can either squeeze in more devices to increase utilization of the power budget [33], or reduce system peak power [79]. Power budgets and capping can be enforced on all levels simultaneously [78]. Also, the budgets of different devices or groups can be redistributed, for instance between memory and CPU [6].

So far, we have discussed energy management separately for servers and the burdened infrastructure. These two aspects of energy efficiency should be considered together in system design, as done in many papers [96, 78, 64, 33]. On the one hand, energy saving in servers directly translates into reduced power consumption of cooling and power supply units. On the other hand, effective thermal management and green power supply minimize the overheads in server energy management. In a system which employs diverse energy management techniques, information from different layers must be exchanged and coordinated to avoid chaos and achieve system-wide energy efficiency [78].

2.2.3 Power-Performance Tradeoff and Modeling

Energy management involves a tradeoff between power and performance. For virtually all energy management techniques, we encounter performance penalties in one way or another. Switching to low power states reduces hardware frequency or responsiveness; server consolidation leads to internal contention; temperature-aware workload allocation also incurs considerable overheads. We therefore need to evaluate our energy management policies based on their potential impact on both energy and performance. In server consolidation, for example, the simplistic energy saving approach is to go with the smallest server set possible. While higher utilization can be obtained this way, the overloaded servers may slow down the execution of applications and end up consuming more energy in addition to degrading performance. More effective consolidation schemes [90] adopt modeling approaches. With accurate power and performance models, we can predict the potential benefits or penalties of different power states or workload distributions before taking real actions [6]. Then, we can formulate energy management as a constrained optimization problem.

Power modeling is relatively easy provided we understand the power behaviors of our hardware system. Many power models have been proposed based on resource utilization. Some are formed by detailed analysis of a certain architecture [47]. A real-time power model, however, often requires simpler and higher-level approaches because of implementation overheads [81]. Many researchers used solely CPU utilization to estimate system power using linear or other models [78, 33]. Some added OS-reported disk usage [90, 41] or memory cycles [70] into their model. Others incorporated CPU performance counters [29]. Generally speaking, power models tend to be less accurate for workloads that are not CPU-bound [81], as non-CPU components have less powerful OS- or physical-level built-in monitoring facilities. As the CPU's share in power and performance continues to shrink, it becomes increasingly important to develop high-level inferences for memory and disk power behaviors [81].
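A minimal example of such a high-level model is a least-squares fit of total power against CPU utilization. The samples below are fabricated, and, per the caveat above, a production model for non-CPU-bound workloads would add disk and memory terms.

```python
import numpy as np

# Hypothetical (cpu_utilization, measured_watts) pairs from a power meter.
samples = np.array([(0.05, 162.0), (0.20, 188.0), (0.45, 231.0),
                    (0.70, 268.0), (0.95, 301.0)])

# Fit the linear model P(util) = p_idle + slope * util.
slope, p_idle = np.polyfit(samples[:, 0], samples[:, 1], deg=1)
print(f"P(util) ~= {p_idle:.0f} + {slope:.0f} * util  [W]")

# The fitted model can then provide cheap run-time estimates:
print(f"estimated power at 30% utilization: {p_idle + slope * 0.30:.0f} W")
```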
Finally, to realize an on-line energy optimizer, we need to provide the system with workload and performance constraints. In this way, the system can be informed when performance degradation is close to a threshold level and adjust its energy management policies accordingly. Srikantaiah et al. modeled performance levels as functions of disk and CPU utilization, and used them to bound energy minimization [90]. In general, however, workload behavior and performance metrics vary from one application to another, and hence application-specific knowledge is desirable. We will touch on this subject in the context of multimedia services in later sections.

2.3 Overview of Multimedia Streaming Servers

Multimedia streaming imposes stringent timing constraints. In addition, servers have to process data such that the target bit-rate is satisfied. Furthermore, such services have significantly larger storage and bandwidth requirements compared with traditional applications. To satisfy these, we need to design systems with large capacity, good scalability and mechanisms which can guarantee service quality. Recent efforts have focused on server architecture, delivery control, storage and scheduling. This section gives a brief overview of these four aspects.

[Figure 2.2: Architecture of Multimedia Server System]

1) Architecture Design: A typical clustered media server system consists of a large number of server nodes interconnected by a high-speed switching network, as shown in Figure 2.2. Normally, servers can be classified into front-end access or delivery nodes and back-end storage nodes. The former handle requests from clients and transmit streaming data, while the latter store data blocks and forward them to the front-end nodes. If the front- and back-end nodes are physically separated, it is called a two-tier architecture. Alternatively, we have the flat architecture, in which the same server is used for both storage and delivery. Tewari et al. concluded that when the internal switching bandwidth is low, the flat design is preferred because of better server utilization; otherwise, the two-tier configuration performs better due to low delay variance [95]. In addition, there is the direct access architecture, where clients contact both access and storage nodes to avoid chunk data transfer inside the server cluster. A detailed comparison of these design models can be found in [48].

2) Front-End Access and Delivery Control: One major function of the front-end is to carry out admission control. Due to limited resources, new clients should not be admitted if they are likely to make the system unable to maintain the QoS requirements of existing users. To adapt to dynamic workloads and system behaviors, access servers need to predict changes in system condition and performance. After that, hardware resources are reserved or allocated according to the workloads in the next time interval. Typical admission control algorithms can be either deterministic [62] or statistical [99]. The former offers strict QoS guarantees, while the latter has the advantage of better utilization.
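A deterministic test can be as simple as the sketch below: reserve each admitted stream's nominal bit-rate and admit a new stream only while total reservations fit within a safety margin of link capacity. The margin and numbers are illustrative; a statistical scheme would instead admit while the estimated overload probability stays below a target.

```python
def admit(active_kbps, new_kbps, link_capacity_kbps, reserve=0.9):
    """Deterministic admission control on reserved bandwidth."""
    return sum(active_kbps) + new_kbps <= reserve * link_capacity_kbps

streams = [2500, 1800, 4000]                              # already admitted
print(admit(streams, 3500, link_capacity_kbps=12_000))    # False: would breach 90%
print(admit(streams, 1500, link_capacity_kbps=12_000))    # True
```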
Another job of the front-end nodes is QoS control, which addresses network congestion and packet loss. To deal with congestion, access servers generally employ rate control so that the transfer rate suits the network bandwidth. Transcoding, frame-dropping, frequency-domain and re-quantization filters can all effectively reduce the bit-rate in case of a network jam [104]. To tackle packet loss, error control techniques including channel coding, source coding-based FEC and retransmission can be performed.

3) Storage Management: For multimedia storage, challenges come from the demanding requirements for capacity, throughput, load balance and error resilience. To support the large capacity requirement, a storage area network (SAN) [38] or network-attached storage (NAS) is adopted to separate data from the front-end nodes and enable high-speed data transfer [48]. An effective way to increase throughput is to stripe data across disk arrays (e.g., RAID). By putting data on multiple disks and accessing them in parallel, we can overcome the limit on concurrent access [53]. If files are placed and retrieved such that accesses are synchronized (see Figure 2.3(a)), the data transfer rate increases with the array size at a latency penalty. Another way is to interleave data blocks in a staggered manner, with different streams being retrieved in each round simultaneously (Figure 2.3(b)). By doing so, we sacrifice some transfer rate but reduce the buffer requirement and startup delay [36].

[Figure 2.3: Data Striping and Parallel Accessing. (a) Synchronized Data; (b) Interleaved Data]

Besides capacity and throughput, a good data placement policy also results in balanced load and fault tolerance. Load imbalance is caused by both inter- and intra-file skewness. Storage nodes that hold the most popular files, or even just the popular blocks (e.g., the starting part of a video), can be overwhelmed, and system performance will then be degraded [48]. We therefore need to replicate popular data. Data replication can also improve fault tolerance, since multiple copies of the same data blocks can be made available.

4) Real-Time Scheduling: To meet playback deadlines, the timing of both processor and disk activities needs to be carefully planned in a multimedia server. For processors, typical scheduling techniques prioritize processes by their deadlines or request rates. While earliest-deadline-first (EDF) policies give better CPU utilization, highest-rate-first scheduling can ensure that deadlines are met [104]. Real-time disk scheduling is similar to process scheduling, except that for disks we need to consider seek time and rotational latency. For example, if we rely on a simple policy such as EDF, we may incur unnecessary seek time, poor disk utilization and low throughput. More preferable schemes combine EDF with traditional disk scheduling algorithms that are designed to reduce seek time and rotational latency, such as shortest seek time first (SSTF) and SCAN [69, 40].
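The sketch below shows one simple way to blend the two policies, in the spirit of SCAN-EDF: order requests primarily by deadline, but serve requests whose deadlines fall into the same quantum in track order so the head sweeps in one direction. The quantum-based grouping is a simplification (classical SCAN-EDF reorders only requests with identical deadlines); widening the quantum trades deadline strictness for shorter seeks.

```python
def scan_edf_order(requests, deadline_quantum_s=1.0):
    """requests: list of (deadline_s, track). Returns a service order that
    is EDF across deadline quanta and SCAN (by track) within each quantum."""
    return sorted(requests, key=lambda r: (int(r[0] / deadline_quantum_s), r[1]))

reqs = [(1.2, 900), (1.7, 120), (1.9, 480), (3.0, 50)]
print(scan_edf_order(reqs))
# The first three deadlines share a quantum, so they are swept as 120, 480, 900.
```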
To summarize, we have discussed various problems in multimedia server design. While extensive research has been conducted in these areas to optimize system performance, little attention has been paid to the associated power and energy issues. We will introduce some directions towards realizing an energy-conscious multimedia server design in Section 2.4.2.

2.4 Future Directions on Energy-Efficient Multimedia Services in Cloud Computing Environments

Recent developments in data centers and cloud computing have posed new challenges to energy management for multimedia services. Also, the characteristics of multimedia services have not yet been leveraged in energy optimization. Furthermore, future cloud services are expected to embrace a more distributed framework, making energy management a more complicated subject.

In this section, we first present some promising topics in energy-efficient data center design. We then look at the largely uncharted area of energy optimization for multimedia servers. Finally, we examine a few new directions that are becoming noteworthy with the rise of cloud computing.

2.4.1 New Challenges and Opportunities in Data Center Energy Saving

1) Energy-Proportional Hardware Redesign: While DVFS can elegantly reduce CPU power dissipation in low-usage periods, recent trends have limited its impact. To begin with, the CPU no longer dominates energy consumption [4, 64]. And even for processors, DVFS has become less important for two reasons. Firstly, threshold voltage is dropping quickly, and the room for voltage scaling has been squeezed. Secondly, multicore chips often use a shared voltage rail among cores [70], further diminishing the effect of DVFS.

For devices other than the CPU, the implementation of low-power active states has been rather limited. This has resulted in a narrow dynamic power range for those devices. That is why researchers, including Barroso and Hölzle [4], have called for hardware and architecture redesign, especially in the memory and disk subsystems. The newest studies have seen many interesting designs, including shared memory among blade servers and disk caching by flash drives [58]. With the concept of energy proportionality being more widely recognized, we can expect more innovation in this domain.

2) Power Management in Virtualized Environments: While virtualization enables easy implementation of server consolidation, it may be hard in practice to fully idle servers even during low-demand periods, resulting in little energy saving. For one thing, data are distributed among machines, so there is a reliability issue in putting servers into inactive modes [4]. Also, high wake-up penalties can cause severe QoS degradation [4]. An additional limiting factor is the execution of background tasks. To make better use of virtual machine consolidation, we need hardware systems that allow fast transitions between states. The pattern of utilization variation and idle periods must be considered in the system design to satisfy latency requirements [64]. Also, we may need to modify OS kernels so that background tasks will not prevent servers from entering sleep states [44].
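Consolidation itself is essentially a bin-packing problem. The first-fit-decreasing sketch below packs VM loads onto as few hosts as possible so that empty hosts may sleep; it deliberately ignores the memory, affinity and wake-up-latency considerations just discussed.

```python
def consolidate(vm_loads, host_capacity):
    """First-fit decreasing: place each VM (largest load first) on the first
    host with room; open a new host only when none fits. Returns the packing."""
    hosts = []
    for load in sorted(vm_loads, reverse=True):
        for h in hosts:
            if sum(h) + load <= host_capacity:
                h.append(load)
                break
        else:
            hosts.append([load])
    return hosts

print(consolidate([0.6, 0.3, 0.4, 0.2, 0.1], host_capacity=1.0))
# -> [[0.6, 0.4], [0.3, 0.2, 0.1]]: two active hosts instead of five
```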
Additionally, the isolation of applications from the underlying hardware conflicts with traditional energy management techniques that take effect directly on physical resources. Specifically, different VMs can have contradictory views on how to adjust the power states of the hardware device they are sharing. To address this, Nathuji et al. [71, 72] proposed to use "soft" power states, generated by OS scheduling, to emulate the effect of hardware power states. If inconsistency in power state adjustment is detected, soft states can be applied instead of hardware scaling.

The rise of cloud computing has made virtualization a hot research area. Notably for energy management, we need to figure out how to exploit VM-level information and coordinate application-side energy management policies from different VMs. Research on energy-efficient virtualized servers is still in its preliminary stage, and we can foresee more endeavors attacking the issue.

3) Exploiting Platform Heterogeneity: Most large-scale server systems are heterogeneous because of system upgrades, replacement and capacity increases [18]. Although it seems an undesirable property, heterogeneity at different levels can actually be beneficial to energy optimization. At the chip level, it has been shown that asymmetric multicore processors can offer better power efficiency than symmetric ones [66]. At the cluster level, we can have a large number of server classes according to their hardware or configurations, including different hardware platforms, power modes and OS settings [70]. Workloads are then allocated to different server classes according to the estimated number of servers required for each class using a platform model.

Although the issue of heterogeneity has already been inspected in a few papers, no systematic framework with effective power and performance models has been developed. For example, the authors of [70] did not use any power estimation in workload allocation. We believe deeper inquiry is necessary in this area of study.

2.4.2 Towards Energy-Optimized Multimedia Hosting in Data Centers

One direction towards an energy-aware multimedia server system is to consider energy cost in various decision steps. For example, we can impose energy constraints in admission control so that a new request will not be accepted if it can potentially make server energy consumption unreasonably high, or upset the system's thermal stability. Energy cost can likewise serve as one of the determinant factors in rate control and scheduling. With each important step in streaming services optimized against energy concerns, we will naturally have an energy-aware multimedia system.

Another direction is to improve energy management by leveraging the characteristics of multimedia services. A possible framework of energy optimization for multimedia servers is sketched in Figure 2.4. With models and workload information available, we can estimate power and performance levels for any candidate steps of energy management. Here, we clearly see the important roles played by workload prediction and performance modeling. In the following, we introduce these two subjects, and discuss how the corresponding techniques can be incorporated into the design of an energy-efficient data center.

[Figure 2.4: Energy Optimization for Multimedia Server System]

1) Workload Analysis and Prediction: Multimedia workload can be inspected from the user access perspective. Properties such as access time distribution, arrival rate, session length and interactive control can all be conducive to resource provisioning. With some experiments and simple models, we can predict user access behaviors and allocate resources more efficiently [49, 108]. This represents a fundamental step leading to energy-aware system design.

It is also useful to analyze workload from the content retrieval viewpoint. For example, most requests are targeted at a small percentage of files. This kind of file access locality can be exploited by data caching and load concentration techniques to achieve significant energy saving. However, things can become complicated, as the potential load imbalance caused by locality can disturb the thermal system. Another important characteristic of the content access pattern is the impact of new files, which bring in the majority of accesses [16]. Besides, the popularity of new files decreases quickly with predictable patterns. These attributes are central to workload prediction and resource allocation, both of which contribute to energy optimization of the server system.
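Video popularity is commonly modeled with a Zipf-like distribution; under that assumption (ours here for illustration, not a claim of the cited measurement studies), the locality available to caching and load concentration can be quantified directly.

```python
import numpy as np

def top_k_share(n_files=100_000, alpha=0.8, k_frac=0.10):
    """Under Zipf(alpha) popularity, the fraction of all requests that land
    on the most popular k_frac of the files."""
    ranks = np.arange(1, n_files + 1)
    probs = 1.0 / ranks ** alpha
    probs /= probs.sum()
    return probs[: int(n_files * k_frac)].sum()

print(f"top 10% of files draw {top_k_share():.0%} of requests")
```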
2) Performance Modeling for Multimedia Services: Performance degradation is often associated with some bottleneck resource, and this relationship is not hard to find given enough application-specific information. One example of modeling system performance by identifying the bottleneck resource was described in [17]. The authors classified video streaming requests into disk- and CPU-bound ones, and then constructed scaling functions of system capacity accordingly. Using this simple model, system throughput can be obtained given workload inputs.

Such performance models can help reduce over-provisioning and energy consumption. Nevertheless, they do not consider workload dynamics. To take the time-varying request pattern into account, we can apply queueing theory. For instance, the server cluster can be formulated as a single queue [49]. A more comprehensive approach is to model it as a multi-queue system, considering the disks and the paths between front-end and back-end nodes in both directions (receiving and sending) separately [95].
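As a minimal single-queue illustration, the M/M/1 formula below (Poisson arrivals, exponential service times, the whole cluster abstracted as one queue) shows how mean delay inflates non-linearly when energy management lowers the effective service rate.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: W = 1 / (mu - lambda),
    valid only while arrival_rate < service_rate."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable")
    return 1.0 / (service_rate - arrival_rate)

# Fixed load of 60 req/s; slowing servers to save energy shrinks mu.
for mu in (100.0, 80.0, 65.0):
    print(f"mu={mu:>5}: mean response {mm1_response_time(60.0, mu) * 1000:.0f} ms")
```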
To sum up, we should take advantage of our knowledge about multimedia services to assist energy management. We can apply energy-conscious principles in multimedia server design; alternatively, we may exploit application-specific analysis, such as workload and performance characterization, to achieve multimedia-aware energy optimization. With rich-media content swamping the Internet, it may be necessary to pursue these avenues that single out multimedia services to enhance the overall energy efficiency of data centers.

2.4.3 Energy-Efficiency in Future Multimedia Cloud

Previous work on energy management in data centers paved the way for energy-efficient cloud computing. The cloud framework contains a large spectrum of models from high-level SaaS to low-level infrastructure-as-a-service (IaaS), embodying various elements including distributed computing [3, 100]. In this subsection, we look beyond data center design and explore other facets of energy-efficient computing in the context of cloud-based multimedia services. In particular, we discuss future directions in energy-aware networking and distributed cloud services.

Efficiency in Network and Transmission

To fully realize the cloud computing paradigm for multimedia sharing, the networks that link us to the cloud need to be extremely fast and reliable. Weiss [100] even suggested that the network would replace the processing unit as the center of computing. For energy efficiency, the implication is complicated. On the one hand, we need to optimize routing protocols to reduce the number of traversed hops. On the other hand, the power consumption of network nodes has to be taken care of. In addition, node processing throughput and the power of peripheral devices are important factors. With IEEE's recent effort, an energy-efficient Ethernet standard (IEEE 802.3az) is coming out soon, aiming to reduce power consumption during periods of low link utilization [19]. This will provide a foundation for future studies on energy issues in multimedia transmission.

To stream multimedia sequences to a large number of clients, dedicated unicasting is not energy efficient. There have been many proposals for multicast streaming services, making use of user tolerance and buffer arrangement [24, 55]. How to maximize resource sharing while maintaining user experience poses optimization problems on the scheduling of transmission and delivery. Energy constraints can be added into the optimization formulations to derive energy-aware multicast scheduling policies, which is another promising research problem.

With the increasing use of mobile devices for Internet services, energy efficiency on the client side has also caught a lot of attention. Optimization of network transmission can help to slash the power consumption of network interface cards (NICs), taking advantage of server-side information [12] and low-power states of NICs [65]. As mobile clients accessing high-definition video become the norm, energy-aware transmission has grown into a very worthwhile research challenge.

Distributed Multimedia Services in the Cloud

Large-scale data centers incur huge costs on infrastructure and power. From either an economic or an energy viewpoint, it is cost-effective for cloud services to use multiple data centers in geographically diverse locations [20]. Most large-scale multimedia services cater to a global audience. With a distributed design, we can anticipate cost savings in data transfer and reductions in service delay [83].

The distributed architecture provides rich flexibility to multimedia service design. For example, we can use a large data center as the central server, and rent compute and storage resources from other data centers as control [55] and cache servers [31]. A possible scenario of such a design is illustrated in Figure 2.5, where multiple data centers are involved. With modular and containerized server designs altering the way data centers are constructed, we can expect increasing mobility and diversity in cloud resource deployment. The benefits are two-fold. First, we can spare the central system from heavy scheduling and delivery tasks. Second, it enables effective data placement in geo-diverse locations to better serve local users. Files that are popular or have regional affinity can easily be cached in local servers.

Figure 2.5: Distributed Multimedia Service in the Cloud

The distributed architecture is inherently heterogeneous. This attribute can be seen as an extension of intra-site heterogeneity, and similar energy management principles can be applied. For example, Liu et al. [60] proposed a task allocation and scheduling scheme for a multi-datacenter platform. In their power model, they considered not only the intra-site cost, but also the energy cost of inter-site communication and data migration. Furthermore, geo-diversity provides new dimensions for energy optimization, such as electricity prices in different locations [52].

In short, we believe the distributed cloud service paradigm provides a cost-effective solution to large-scale rich-media services, and offers new opportunities for their energy management. Cross-site energy optimization has just begun to draw attention and deserves further investigation.
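As one concrete illustration of the price-diversity idea raised around [52], the sketch below greedily routes a request load to the data centers with the lowest current electricity prices that still have spare capacity. This is not a scheme proposed in the cited work, only a hypothetical sketch; the sites, prices, and capacities are made up.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    price_usd_per_kwh: float   # current electricity price at the site
    capacity_rps: float        # remaining request capacity (requests/s)

def route(load_rps: float, sites: list[Site]) -> dict[str, float]:
    """Greedy cost-aware routing: fill the cheapest sites first."""
    plan: dict[str, float] = {}
    for site in sorted(sites, key=lambda s: s.price_usd_per_kwh):
        if load_rps <= 0:
            break
        share = min(load_rps, site.capacity_rps)
        plan[site.name] = share
        load_rps -= share
    if load_rps > 0:
        raise RuntimeError("aggregate capacity exceeded")
    return plan

# Example with three hypothetical sites and spot prices.
sites = [Site("us-east", 0.11, 4000), Site("eu-west", 0.16, 3000),
         Site("ap-south", 0.08, 2000)]
print(route(5000, sites))  # the cheapest site saturates first
```

A real cross-site optimizer would also have to weigh inter-site transfer energy and data migration, as in the model of [60].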
2.5 Conclusions

In this chapter, we reviewed recent research on energy-related issues in data centers and multimedia services, and suggested future directions for them in cloud computing environments. We first looked at various aspects of energy management, including server energy and burdened energy costs. Then, we gave an overview of multimedia servers. Next, we commented on the limitations of existing techniques, and identified some areas that may potentially contribute to an energy-efficient multimedia server system. Finally, we explored some new dimensions for green multimedia services in the future cloud-based framework.

Chapter 3

Design of Energy-Efficient Video Sharing Servers with Delay Constraints

3.1 Introduction

YouTube and other large-scale video-sharing services (VSSs) have come to dominate Internet traffic [21]. These VSSs mainly deal with user-generated videos, which lead to service patterns very different from those of traditional Internet applications such as movie-on-demand (MOD). Compared with the MOD workload, the numbers of users and videos in VSS are greater, while video lengths are shorter. Also, video files in VSSs are considerably more heterogeneous in size and bit-rate than those in MOD.

An efficient video-sharing server system needs to store a huge number of video files and deliver them to possibly millions of users. To meet the rapid growth in demand for both data and users, multi-layer infrastructures have been deployed, which can result in unique workload characteristics. To provide high throughput, scalability and fault-tolerance, such server systems utilize parallel architectures for storage and retrieval [53, 54]. In parallel video servers, a video file is divided into multiple blocks and spread across multiple storage nodes. By dividing a relatively long and continuous video file into smaller units, more requests can be served concurrently with proper disk scheduling [36]. A further benefit is that both bandwidth and capacity can easily be scaled up by adding servers to the system [53].

The use of massive data centers for large-scale VSS has led to ever-increasing energy costs. In particular, video servers rely on the storage and bandwidth of the parallel disk system, which is among the heaviest energy consumers. It has been reported that such a storage system typically accounts for 27% of the total energy consumption in data centers [39, 110]. A large storage system may consist of hundreds of disk drives, which exert a huge burden on energy resources.

Low-power modes for different devices have been designed and leveraged to reduce energy consumption. Compared to processors, the implementation of such technology for disk drives has been rather limited [4]. One widely applied energy-saving technique is to spin down the disk when a long idle period is expected [74, 103]. However, it incurs heavy penalties in terms of both spin-up time and energy, making it an unviable choice for video services that require real-time guarantees to maintain continuous playback [51]. For short-video workloads in VSS, meeting the time constraint is even trickier, because users are less tolerant of initial delay and jitter.

Recently, multiple speed modes have been introduced in both home- and server-class disks [86, 73]. The basic idea is to reduce the spinning speed to lower the energy consumption.
Gurumurthi et al. [39] laid down the theoretical foundation for Dynamic Rotations-Per-Minute (DRPM). Among the work that leveraged DRPM, some researchers assumed that a disk running at lower speed can still serve requests [35, 51], while others require the disk to transition back to the full-speed mode [50, 110]. We name the first model full-DRPM and the second partial spin-down. While full-DRPM works well in theory, real products [86, 73] only offer partial spin-down functionality. Therefore, we choose to use a partial spin-down disk model in our work. Although we adopt partial spin-down based on DRPM in this chapter, it is important to note that our technique applies to any disk with multiple low-power modes. In Chapter 4, we will use a more conservative disk model following the specification of a real disk and show that our algorithm is still able to achieve significant energy savings. Therefore, our work applies to a wide variety of disks, and it can attain even more significant savings for future disks with more power modes and shorter wake-up penalties.

Previous research utilizing partial spin-down mainly employed threshold-based techniques for mode transition decisions [50, 110]. The central idea is to reduce the disk speed once the disk idle time has exceeded a certain threshold. In this chapter, we propose a new prediction-based approach that exploits the unique characteristics of short-video sharing workloads and achieves energy-delay optimization. To the best of our knowledge, this is the first study of energy issues in parallel video-sharing servers. Furthermore, we propose the first framework that jointly optimizes energy and delay for video servers.

The rest of the chapter is organized as follows. We present some background information in Section 3.2. Section 3.3 introduces the system model. Section 3.4 describes the scheduling of heterogeneous and parallel video workloads, and the modeling of disk idle intervals under such scheduling. Section 3.5 provides the details of the proposed energy-delay optimization algorithm. Section 3.6 evaluates the proposed algorithm through simulation. Section 3.7 wraps up the chapter with concluding remarks.

3.2 Background

Our study focuses on YouTube-like large-scale video sharing services. According to one of the Google technical talks [107], YouTube employs a hierarchical and distributed delivery cloud that contains at least one level of cache servers placed globally close to end users. These cache servers store and deliver the most popular videos, and are typically part of a content distribution network (CDN). The architecture of a VSS with one level of cache servers is shown in Figure 3.1.

Figure 3.1: Architecture of video sharing services

In traditional server workloads including MOD, content popularity often follows a Zipf-like distribution [1]. Video sharing workloads contain mainly user-generated content, which has been shown to exhibit a different popularity distribution [11]. Figure 3.2 shows the relationship between video ranks and normalized popularities in log scale. We can see that while the popular videos suggest a Zipf-like distribution (linear in log scale), the tail part is clearly non-Zipf but rather exponential. In such a media delivery system, the most popular content is served by CDN servers, leaving the rest in data center storage.

Figure 3.2: The distribution of video popularities for (a) all videos and (b) moderately played and tail content.
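As an illustration of this two-regime popularity model, the sketch below draws request targets from a Zipf-like head joined to an exponential tail. The split point and parameters are hypothetical, not fitted values from the datasets cited in this chapter.

```python
import math
import random

def popularity_weights(num_videos: int, head: int = 1000,
                       zipf_s: float = 0.8, tail_decay: float = 5e-4) -> list[float]:
    """Unnormalized popularity by rank: Zipf-like head, exponential tail,
    joined continuously at the split point `head`."""
    join = head ** (-zipf_s)
    return [r ** (-zipf_s) if r <= head
            else join * math.exp(-tail_decay * (r - head))
            for r in range(1, num_videos + 1)]

def sample_requests(num_videos: int, num_requests: int) -> list[int]:
    """Draw request targets (video ranks) from the two-regime model."""
    w = popularity_weights(num_videos)
    return random.choices(range(1, num_videos + 1), weights=w, k=num_requests)
```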
This chapter focuses on the back-end storage system, which serves the non-Zipf portion of the videos. The large number of less popular items corresponds to the "long-tail" effect [2]: a lot of requests will be served, but mostly for different files [107]. As a result, we can anticipate a low cache hit rate and many random disk accesses [107] in the back-end data center. As shown in Figure 3.1, the back-end data center employs a parallel storage architecture. Each storage node is an I/O server, which can have either a single disk or a disk array. In this chapter, we assume that a single disk is attached to each node. However, our scheme is based on a very general system model and can therefore be easily applied to the case of disk arrays; one simple way is to treat each disk array as a single logical disk attached to an I/O server node.

Within this setting, the percentage of video files cached by CDN servers plays a decisive role in shaping the workload at the back-end storage server. To study its effect, we calculated the inter-arrival times at the back-end from an online YouTube dataset [15] (which contains around 160,000 videos) for different CDN caching percentages, assuming that request arrivals follow a Poisson process. As shown in Figure 3.3, the inter-arrival time changes exponentially with the CDN-caching percentage. According to a study in [37], the amount of YouTube traffic that goes through the CDN is limited due to redirection cost. In our study, we assume a range between 3% and 15%, which corresponds to inter-arrival times between 30 ms and 100 ms for our dataset.

Figure 3.3: CDN caching and inter-arrival time at the back-end data center

3.3 System Model

Figure 3.4 shows the architecture of a typical video server and the data path for video delivery. When a client requests a video for playback, the dispatcher parses the request and locates all the blocks of the video. One workload is created for each block and added to the service queue. Then, the main memory buffer cache is checked for each workload. If the requested block is cached, data is read out from main memory. Otherwise, the workload is added to the disk service queue and a disk access is initiated.

Figure 3.4: The video server model

Disk energy constitutes the main part of energy consumption, accounting for 86% of the total energy in a typical EMC Symmetrix 3000 storage system [51]. Moreover, disks are only accessed on cache misses, while the memory cache is traversed for all video blocks, so it is harder to exploit techniques such as workload scheduling to save energy for main memory. For these reasons, we focus on the disk drives in our energy management scheme.

As mentioned earlier, we use disks with partial spin-down capability. Such disks have multiple power modes, including one active mode, one idle mode and a few sleep modes. In the active mode, the disk is actively performing a read or write operation. In the idle mode, the disk is not serving any request but is still spinning at full speed. In any of the sleep modes, the disk spins at a lower rotational speed and thus consumes less power. However, some energy and time penalty is incurred to spin the disk up in order to serve a request. The detailed disk model is presented in Section 3.6.1; a simplified sketch of such a multi-mode power model is given below.
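The sketch below captures the essentials of a partial spin-down disk as described above: each mode has a power level, a spin-up time, and a spin-up energy. The numeric values are placeholders for illustration only; the parameters actually simulated in this chapter appear in Tables 3.2 and 3.3.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PowerMode:
    name: str
    power_w: float      # power drawn while in this mode
    spinup_s: float     # time to return to full speed (R_i)
    spinup_j: float     # energy cost of waking up (O_i)

# Hypothetical mode table for illustration only.
MODES = [
    PowerMode("idle", 11.6, 0.0, 0.0),
    PowerMode("LS1", 10.3, 0.15, 3.0),
    PowerMode("LS3", 7.1, 0.6, 8.0),
    PowerMode("LS5", 2.6, 2.1, 20.0),
]

def energy_for_idle_period(mode: PowerMode, idle_s: float) -> float:
    """Energy of spending an idle period in `mode` and then spinning up,
    i.e. E_i(t) = P_i (t - R_i) + O_i, used in Section 3.5.1."""
    return mode.power_w * max(idle_s - mode.spinup_s, 0.0) + mode.spinup_j
```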
To facilitate video delivery, video files are often divided into fixed-size blocks in modern video services such as Netflix [82]. Large-scale video servers also use parallel architectures to improve throughput and scalability, which means these equal-sized blocks can be stored on different nodes. Note that we are not referring to fine-grained file striping at the disk-array level. Rather, the division happens at the storage node level, where any of the lower-level striping techniques might be applied.

Due to the periodic nature of video retrieval, video servers usually adopt a round-based scheduling framework [36, 51, 34]. Time is divided into equal-sized rounds to facilitate disk scheduling and admission control. During each round, the SCAN algorithm batches a group of workloads together and accesses them in the order of track positions [80]. For admission control, as the block size is fixed in our problem setting, we can simply apply a constant limit on the number of blocks that can be served per round. Another advantage of the round-based framework is that the whole delivery process can be easily pipelined [88]: when a request arrives at round a, one video block from a previous request will be fetched from disk to memory in round a+1, and another from memory to the network interface controller in round a+2.

With a fixed block size, we only need to decide the starting disk index for each video. The stripe length simply equals the video size divided by the block size. Traditionally, we would like to concentrate the popular videos as much as possible to create more energy-saving opportunities [76]. However, this approach requires frequent data migration, which is impractical for VSS given the high dynamics of users and content. Also, a high degree of concentration means severe load imbalance, which hurts system throughput, especially during heavy-load periods. In our study, we follow a heuristic data placement strategy that creates a reasonable level of skewness. Among the non-popular videos served from the back-end data center, we place files with relatively high access rates (top 5%) evenly across all disks. Then, we place the most unpopular files using the concentration strategy. The resulting inter-arrival time distribution for different disks is shown in Figure 3.5. Although disk layout is not the focus of this chapter, efficient data placement algorithms can easily be incorporated into our scheme to improve the overall performance.

Figure 3.5: Inter-arrival times for different disks

Note that energy and power are two different concepts. Lower power levels often degrade system performance, which in turn prolongs the time required to finish a given task. Since energy is power multiplied by time, total energy consumption may actually increase in this case. Therefore, reducing power does not always lead to energy savings. In this chapter, we aim at optimizing the energy consumption of the system.

3.4 Workload Scheduling and Modeling

3.4.1 Parallel Video Workload Scheduling

When a cache miss is expected, the system needs to decide when to initiate a disk access operation. For the leading block of a video, this has to be done immediately to avoid further delay. For the remaining blocks, however, disk access can be postponed, because the typical data transfer rate is much higher than the playback rate of video streams, even for HD videos.
For each video block, we can define a maximum waiting time (MWT) as the longest time it can wait without extra penalty on service delay. The MWT of the j-th block of video i can be obtained by calculating the playback time of all the preceding blocks:

$$MWT_{i,j} = \frac{(j-1)\,b}{r_i}, \qquad (3.1)$$

where b is the block size and r_i the bit-rate of video i.

Clearly, the deadline of a workload equals the arrival time of the request plus the MWT of the requested video block. With known deadlines, we can apply the earliest-deadline-first (EDF) scheduling algorithm, which maximizes disk utilization and minimizes delay [61]. However, to save energy, it is desirable to schedule the workloads so that the disk can have longer idle periods. Also, for a typical server load, the average disk utilization is low [50]. Therefore, we apply the EDF algorithm within a time period which we name the scheduling window. For each new request arrival, the workload for the leading block preempts and is scheduled into the following round, potentially delaying some non-leading blocks. The scheduling window controls the average length of disk idle time, which will be discussed in Section 3.6.2. A sketch of this windowed-EDF selection is given below.
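The following sketch shows one way the per-round batch selection could be organized: leading blocks are served immediately, and the remaining capacity goes to the earliest deadlines that fall inside the scheduling window. It is a simplified illustration, not the simulator's actual implementation; the tuple layout and names are hypothetical.

```python
import heapq

def mwt(block_index: int, block_size_bits: float, bitrate_bps: float) -> float:
    """Maximum waiting time of block j (0-based), Equation (3.1)."""
    return block_index * block_size_bits / bitrate_bps

def schedule_round(now: float, pending: list, window_s: float, limit: int) -> list:
    """Pick up to `limit` workloads for the next round: leading blocks first,
    then EDF among workloads whose deadlines fall inside the window.
    `pending` holds (deadline, is_leading, workload_id) tuples."""
    leading = [w for w in pending if w[1]]
    eligible = [w for w in pending if not w[1] and w[0] <= now + window_s]
    batch = sorted(leading, key=lambda w: w[0])[:limit]
    heapq.heapify(eligible)
    while len(batch) < limit and eligible:
        batch.append(heapq.heappop(eligible))
    return batch
```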
3.4.2 Modeling of Disk Idle Time

The disk idle time plays a crucial role in energy management [110]. The arrival of client requests at video servers typically follows a Poisson process [51, 92]. Therefore, if we did not consider data striping, the disk idle time would be exponentially distributed. In our parallel delivery setting, however, we need a more elaborate model.

In general, a disk in sleep mode needs to be woken up if either of two events happens: 1) a new request arrives and its leading block requires access to this disk; or 2) a non-leading block that requires disk access is due soon. As mentioned in Section 3.4.1, a workload for the j-th block of video i will be due MWT_{i,j} seconds after the request arrival time. Equivalently, MWT_{i,j} represents how far ahead the deadline of such a workload is known to the system.

We can treat all workloads from leading blocks and the yet-unknown workloads from non-leading blocks as random arrivals. We name this type of event a random-triggered wake-up. It can be modeled as the first arrival of a time-varying Poisson process. Figure 3.6 illustrates the model. For each disk, if we sort all non-leading blocks that are not cached according to their MWTs (denoted k_0, k_1, ...), we can divide the future timeline into a number of intervals. Within the k_0 window, the arrival rate equals the aggregate rate of all uncached leading blocks on the disk. Beyond that, we need to include the arrival rates of some uncached non-leading blocks: between k_i and k_j, the arrival rates of blocks whose MWTs equal k_i are added. We denote the resulting arrival rates by λ_0, λ_1, and so on.

Figure 3.6: Modeling disk wake-up

If there is some pending workload that was not processed in the last scheduling cycle, its deadline serves as a deterministic upper bound on the future disk idle time. Let us denote the disk idle time by T_s, the time until a random-triggered wake-up by T_r, and the time until the first known deadline by t_D. Then, T_s = min(T_r, t_D). If there is no pending workload known to the server, T_s simply equals T_r.

Let us first assume t_D ∈ (k_{n-1}, k_n]. In this case, the time-varying Poisson process consists of n segments, within each of which the arrival rate is constant. The pdf of T_s can be obtained by combining the time-varying Poisson process with the deterministic boundary, as shown in Equation (3.2):

$$p(t) = \begin{cases} \lambda_0 e^{-\lambda_0 t} & \text{if } t < k_0, \\ \Lambda_i \lambda_i e^{-\lambda_i t} & \text{if } t \in [k_{i-1}, \min(k_i, t_D)), \\ \Lambda_n\, \delta(t - t_D)\, e^{-\lambda_n t_D} & \text{if } t \ge t_D, \end{cases} \qquad (3.2)$$

where i = 1, 2, ..., n and $\Lambda_x = \prod_{j=0}^{x-1} e^{-(\lambda_j - \lambda_{j+1}) k_j}$.

If the deadline of the first known arrival is so early that t_D < k_0, we simply have p(t) = λ_0 e^{-λ_0 t}. On the other extreme, if t_D > k_m, where k_m is the largest MWT of all uncached blocks on the disk, we set n = m. Finally, if there is no pending workload, t_D goes to infinity; in this case, the last segment becomes t ∈ (k_{m-1}, ∞), and there is no deterministic boundary.

Equation (3.2) serves as the foundation of our energy-delay optimization algorithm, as we shall see in Section 3.5. Its detailed derivation is presented in Section 3.8. The important notations used in this chapter are listed in Table 3.1.

Table 3.1: Important Notations
T      Length of one scheduling round
n_l    Maximum number of blocks a disk can process in one round
T_s    Idle interval length of the disk
T_r    Time until random-triggered wake-up
t_D    Time until the first unscheduled known deadline
k_i    The i-th smallest MWT of the uncached blocks on the disk
λ_i    The arrival rate in the i-th time interval
P_i    Power level of the i-th low-speed mode
R_i    Spin-up time of the i-th low-speed mode
O_i    Energy cost of a disk wake-up from the i-th low-speed mode
λ      The Lagrangian multiplier that imposes the delay constraint

3.5 Energy Optimization with Delay Constraint

The goal of energy management here is to save energy by letting disks switch to sleep modes whenever possible without inducing excessive service delay. Since different sleep modes have different power profiles and spin-up penalties, it is crucial to decide when and how deep the disks should sleep. For the former question, we use a very simple heuristic: if a disk has been idle for a number of rounds (set to 4 in our experiments), it enters one of the sleep modes. For the latter, we investigate two approaches in this section. First, we briefly introduce the threshold-based approach, which is widely applied in disk energy management [10, 35, 50, 51, 110]. Next, we present the proposed prediction-based mode transition (PMD) algorithm and discuss how it achieves optimal sleep mode decisions.

3.5.1 Threshold-Based Energy Management

The threshold-based disk energy management policy exploits the energy consumption profile of disks with multiple power modes. The key concept is the break-even time [10, 110], defined as the minimum amount of idle time that would justify a partial spin-down operation. The break-even time can be obtained as follows. For mode i and idle interval t, the energy consumption is E_i(t) = P_i(t - R_i) + O_i, where P_i, R_i and O_i are the power consumption, recovery time and spin-up energy of mode i, respectively. The minimum energy consumption can then be obtained by plotting the lines E_i(t) and finding the lower envelope of all lines [110], as shown in Figure 3.7 (assuming a 4-speed disk). The intersections t_1, t_2, t_3 are the break-even times.

Figure 3.7: Energy consumption for a 4-speed disk

If we had perfect knowledge of the disk idle time, we could minimize the energy consumption by choosing a power mode according to the lower envelope. In practice, however, the future disk idle time cannot be known beforehand. The practical approach is to use thresholds: after a disk has been idle in a power mode for a certain time, it is switched into the next lower power mode. One widely applied solution [110] is to use the break-even times as thresholds, which has been shown to be 2-competitive with the perfect energy minimization scheme [110]. A sketch of the break-even computation is given below.
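Reusing PowerMode and energy_for_idle_period from the sketch in Section 3.3, the break-even time between two modes and the lower-envelope choice can be computed as below. This is only an illustration of the definitions above, with hypothetical function names.

```python
def break_even(shallow: PowerMode, deep: PowerMode) -> float:
    """Idle time at which the deeper mode starts to pay off: the
    intersection of the two energy lines E_i(t) = P_i (t - R_i) + O_i."""
    dp = shallow.power_w - deep.power_w
    if dp <= 0:
        raise ValueError("deep mode must draw less power")
    deep_const = deep.spinup_j - deep.power_w * deep.spinup_s
    shallow_const = shallow.spinup_j - shallow.power_w * shallow.spinup_s
    return (deep_const - shallow_const) / dp

def best_mode_for_known_idle(modes: list[PowerMode], idle_s: float) -> PowerMode:
    """Lower envelope of Figure 3.7: the mode minimizing energy for a
    known idle time (the oracle that thresholds approximate)."""
    return min(modes, key=lambda m: energy_for_idle_period(m, idle_s))
```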
3.5.2 Prediction-Based Energy Optimization

Although the threshold-based approach exploits disk idleness, it does not take workload characteristics into account. Here, we leverage the model presented in Section 3.4 to make optimal sleep mode decisions by prediction.

To make decisions that minimize energy, we need to predict how energy consumption will change under different power modes. To simplify the problem, we assume that the total service time and the average power consumption of the active periods are not affected by mode decisions. The assumption is valid because we are dealing with energy consumption over a long period of time. Therefore, we can use the average power of the sleep and transition periods to measure the energy cost of each speed mode:

$$\bar{P}_i = \frac{P_i T_s + O_i}{T_s + R_i}. \qquad (3.3)$$

Another major problem with the threshold-based energy management scheme is that it focuses only on energy minimization and ignores the effect of sleep modes on service delay. To address this issue, let us predict how service delay will be affected by choosing different speed modes. Since the disk cannot serve any workload during the transition period, all workloads that are due in that period will suffer extra delay in addition to the one-round cache miss delay. Also, if the number of pending workloads exceeds the admission limit for one round, the later-arriving requests will be delayed further. We employ a simple approach to estimate the total delay. Let λ_k be the arrival rate in the segment where the disk wake-up occurs (according to the time-varying Poisson model introduced in Section 3.4.2), and assume that a = λ_k R_i / n_l is an integer. The total service delay caused by sleeping can then be evaluated as

$$D_i = \sum_{j=1}^{a}(R_i + jT) = \left(\frac{\lambda_k T}{2 n_l^2} + \frac{1}{n_l}\right)\lambda_k R_i^2 + \frac{T}{2 n_l}\,\lambda_k R_i, \qquad (3.4)$$

where T is the round length and n_l is the admission limit per round.

In general, deeper sleep modes lead to lower energy consumption but a higher penalty in service delay. The goal of our energy management scheme is to minimize energy under a constraint on service delay, denoted by D_c. This constrained minimization takes the form

$$\min\{E\}, \quad \text{subject to } D < D_c. \qquad (3.5)$$

The optimization task in Equation (3.5) can be solved using Lagrangian optimization, where the delay term is weighted by a Lagrangian multiplier [43]. Following the Lagrangian formulation, the minimization problem can be written as

$$\min\{C\}, \quad \text{where } C = E + \lambda D. \qquad (3.6)$$

By minimizing the Lagrangian energy-delay cost function C for a specific λ, we solve the same optimization problem as in Equation (3.5) for a particular D_c.

In the proposed energy management scheme, we estimate the cost function for the different power modes and choose the one that minimizes it, thereby optimizing the overall energy-delay performance. To do so, we make use of the distribution of the disk idle time T_s as formulated in Equation (3.2) and estimate the Lagrangian cost function in Equation (3.6), using P̄_i in Equation (3.3) as the energy term and D_i in Equation (3.4) as the delay term.
Our objective function J_i is defined as the expected value of the cost function for power mode i and can be written as

$$J_i = E(\bar{P}_i + \lambda D_i) = \int_t \left(\bar{P}_i + \lambda D_i \,\middle|\, T_s = t\right) p(t)\,dt = (O_i - P_i R_i)\int_t \frac{p(t)}{t + R_i}\,dt + P_i + \lambda \int_t D_i\, p(t)\,dt. \qquad (3.7)$$

The delay term D_i is a quadratic function of both λ_k and R_i. While R_i is a constant, λ_k changes with respect to time. To simplify the calculation, we assume that it stays constant within each specific idle period. We use the expectation of T_s to decide the corresponding value of λ_k by mapping E[T_s] to the correct segment of the time-varying Poisson process shown in Figure 3.6. With this simplification, we have

$$J_i = (O_i - P_i R_i)\int_t \frac{p(t)}{t + R_i}\,dt + (P_i + \lambda D_i). \qquad (3.8)$$

To select the optimal mode M, we minimize the objective function:

$$M = \arg\min_i J_i.$$

Note that the idle interval distribution changes with cache replacement operations and request arrivals (i.e., the t_D term). Therefore, our objective function needs to be evaluated in real time. One issue is that the number of time segments defined by k_0, k_1, ... can be large if many video blocks are stored on the disk. To simplify the calculation, we can group blocks with similar MWTs together and reduce the number of segments.

In addition, the integration in Equation (3.8) is unbounded. Fortunately, we do not need very high precision to compare a small number of power modes, so fast numerical methods suffice for our purpose.

The PMD algorithm is applied every time the disk becomes idle, with the following steps:
1. Update the idle interval model shown in Figure 3.6 by evaluating the time intervals, the arrival rates and the deterministic bound.
2. Calculate the expected idle interval length and use it to select λ_k. Then, calculate the estimated delay using Equation (3.4).
3. Calculate the cost function according to Equation (3.8) for each mode and pick the optimal one. The Lagrangian multiplier λ can be adjusted to control the delay level; a larger λ leads to lower service delay.
4. Switch the disk to the selected mode in the next round.

PMD requires some basic workload information, including the total arrival rate and the popularity distribution of video files. These values do not change frequently over time [49], and can be obtained from past statistics and simple prediction techniques such as the moving average model. A sketch of the mode-decision step is given below.
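The sketch below illustrates the PMD decision, reusing the PowerMode class from Section 3.3. For simplicity it replaces the closed-form integral in Equation (3.8) with Monte Carlo sampling of the idle-time distribution, which is a substitution of ours, not the fast numerical method used in the dissertation; all parameter values are placeholders.

```python
import math
import random

def sample_idle_time(lambdas, ks, t_D=math.inf):
    """Draw T_s = min(T_r, t_D), where T_r is the first arrival of the
    time-varying Poisson process of Figure 3.6 (rate lambdas[i] on the
    i-th segment delimited by the MWTs in ks)."""
    t, bounds = 0.0, list(ks) + [math.inf]
    for rate, end in zip(lambdas, bounds):
        if t >= t_D:
            return t_D
        gap = random.expovariate(rate) if rate > 0 else math.inf
        if t + gap < end:
            return min(t + gap, t_D)
        t = end
    return t_D

def pmd_select_mode(modes, lambdas, ks, t_D, lam, T, n_l, samples=2000):
    """Monte Carlo estimate of J_i in Equation (3.8) for each candidate
    sleep mode (PowerMode instances); return the argmin mode."""
    draws = [sample_idle_time(lambdas, ks, t_D) for _ in range(samples)]
    mean_ts = sum(draws) / samples
    seg = sum(1 for k in ks if k <= mean_ts)          # segment holding E[T_s]
    lam_k = lambdas[min(seg, len(lambdas) - 1)]       # wake-up arrival rate
    best, best_cost = None, math.inf
    for m in modes:
        e_inv = sum(1.0 / (t + m.spinup_s) for t in draws) / samples
        delay = ((lam_k * T / (2 * n_l ** 2) + 1 / n_l) * lam_k * m.spinup_s ** 2
                 + T / (2 * n_l) * lam_k * m.spinup_s)            # Eq. (3.4)
        cost = ((m.spinup_j - m.power_w * m.spinup_s) * e_inv
                + m.power_w + lam * delay)                        # Eq. (3.8)
        if cost < best_cost:
            best, best_cost = m, cost
    return best
```

Raising `lam` makes the delay term dominate, which steers the choice toward shallower sleep modes, mirroring step 3 of the algorithm.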
3.6 Experimental Results and Evaluation

3.6.1 Simulation Environment

To evaluate our energy management algorithm, we simulate a parallel storage system based on the Seagate Cheetah 15K.7 disk [85]. We also incorporate a shared cache memory into our simulator, using LRU as the cache replacement algorithm. System parameters, including the number of disks, round length and block size, are set according to the size of our dataset and the narrow striping policy introduced in Section 3.3. The admission limit is set by calculating how many video blocks the disk can read per round, taking into account the seek and rotational overhead. Depending on the workload, the active energy consumption per round is calculated using the model proposed in [97]. The related disk and system parameters are shown in Table 3.2.

Table 3.2: Simulation Parameters
Number of Disks              20
Max. Disk Rotation Speed     15000 RPM
Idle Power                   11.6 W
Active Power                 16.3 W
Cache Memory Size            32 GB
Block Size                   1.5 MB
Round Length                 0.15 secs
Admission Limit per Round    4

For the parameters associated with the different speed modes, we refer to both academic papers [39, 50, 51] and product specifications [73, 86]. We notice that for the spin-up recovery time, there is a wide gap between theory and practice. For example, in [39] and [50], the transition times are calculated to be on the order of 10 ms, and a linear relation is assumed between the spin-up time and the speed difference. In real products [73, 86], their magnitudes range from the order of 100 ms to 10 s, and the relation is clearly not linear. In our simulation, we compromise between the two. We assume 5 low-speed modes, LS1 to LS5, with rotation speeds from highest to lowest. We set the spin-up time of LS1 to the order of 100 ms and used a linear model to extrapolate the next two speed modes (LS2 and LS3). For the lowest speed modes (LS4 and LS5), we added some extra penalties, as observed in real products. The parameters of the different speed modes are shown in Table 3.3.

Table 3.3: Parameters for Different Speed Modes
Mode               LS1     LS2     LS3     LS4     LS5
RPM                13000   11000   8000    5000    1000
Idle Power (W)     10.3    9.01    7.1     5.13    2.6
Recovery Time (s)  0.15    0.3     0.6     1.2     2.1

We generated the client requests using an online YouTube dataset available at [15]. We used the newest available dataset (from 2008), which contains the view counts of 161,085 videos over 3 weeks. We used it to decide the relative popularities of videos and to calculate the aggregate arrival rate. Then, we combined it with another dataset from the same source containing video size and bitrate information. Since the typical video request arrival pattern follows a Poisson process [51, 92], we generated requests by directing exponential arrivals to videos according to their relative popularities.

3.6.2 Effects of Scheduling Window

The concept of the scheduling window was introduced in Section 3.4.1. Basically, it decides the degree of workload concentration in time. Workloads are more scattered with a small scheduling window, while they tend to be served together with a large window. For energy management, the scheduling window plays an important role because it affects the length of disk idleness. Figure 3.8 shows how we can control the average idle time by adjusting the scheduling window.

Figure 3.8: Average idle time and scheduling window

In general, having a large scheduling window is advantageous because longer idle periods can be exploited for energy saving. Yet, longer disk idle time means more frequent use of lower speed modes, which can potentially lead to more delays. This is true for the threshold-based algorithm, which has no delay control mechanism. For our PMD algorithm, however, it is safe to apply a large scheduling window because we jointly optimize energy and service delay.

To verify this claim, we plot the energy-delay curve in Figure 3.9. The average power was obtained by dividing the total energy by the system running time, which is fixed (30 minutes); it is therefore equivalent to the system energy consumption. The service delay was calculated for each request by summing the initial latency and all extra delays of non-leading blocks. Different delay levels were achieved using different values of λ.

Figure 3.9: Energy-delay performance for PMD under different scheduling window sizes

Figure 3.10: Comparison of energy-delay performance for inter-arrival times of (a) 30 ms, (b) 40 ms, (c) 50 ms and (d) 100 ms
As indicated in the figure, even for a scheduling window as large as 15 secs, the energy-delay performance continues to improve as we increase the window size. Moreover, even in the low-delay region, increasing the scheduling window does not hurt the energy-saving performance.

3.6.3 Energy-Delay Performance

We evaluate the energy-delay performance of both the threshold-based energy management (TEM) and our PMD algorithm by comparing how much energy can be saved by allowing larger average service delay. As mentioned in Section 3.2, the inter-arrival time is determined by the percentage of CDN-cached videos and is calculated to be between 30 ms and 100 ms for CDN-caching percentages ranging from 3% to 15%. For each setting, we simulated the system for 30 minutes. In PMD, we can adjust λ to impose different constraints on service delay; we fix the scheduling window size to 22.5 secs. TEM has no real mechanism to control delays. Therefore, we change the scheduling window to adjust the average length of the disk idle time, thereby achieving different delay levels.

We plot the average service delay against the average power consumption (which is equivalent to energy) of the system in Figure 3.10. Due to the "long-tail" effect mentioned in Section 3.2, the cache hit rate is less than 30% in our experiment. This means that around 70% of the blocks are subject to a cache miss penalty of one round (0.15 sec in our setting). Therefore, we expect the minimum average service delay to be around 0.1 sec. The leftmost parts of the figures correspond to this region, which represents extremely tight delay constraints.

In all settings, PMD consistently outperforms TEM. The energy saving gains of PMD over TEM are shown for both the low- and high-delay regions. Not surprisingly, PMD saves more energy for longer inter-arrival times. Short inter-arrival times indicate high server load, which inevitably leads to much less disk idleness. Since PMD does not deal with the energy consumption of the active periods, the portion of energy we can optimize is much smaller during high-load periods. Furthermore, the gain is more significant if we allow more delay in the system, because there is more room to exploit all the power modes. In these regions, PMD makes more intelligent use of the lower speed modes by prediction, whereas under TEM the energy savings achieved by lower speed modes are largely offset by high delay penalties. As shown in Figure 3.10, the energy saving gain of PMD under less strict delay constraints ranges from 16.4% under high server load to 35.1% under low server load.

In low-delay regions, PMD does not gain as much, because the lower speed modes are unlikely to be selected under the tight delay budget. Still, PMD consumes 4.6%–10.3% less energy than TEM even at the minimum-delay end. The main reason it works well in these regions is that PMD allows the system to apply a large scheduling window while still keeping firm control of service delay.

Finally, it can be observed from the low-load scenario in Figure 3.10 (c) that TEM is unable to achieve low service delay. The reason is that idle periods are very long under low server load, which leads to frequent use of lower power modes. Without delay control, these modes incur excessive spin-up time penalties.
As energy saving 51 techniques are most commonly applied when server loads are low, the ability of PMD to offer low-delay guarantee under these scenarios is promising for practical real-time servers. Overall, PMD exhibits superior energy-delay performance compared to TEM under different server loads. Energy saving is more evident if we do not require minimal service delay and when the server load is lower. Also, PMD can enable the system to operate in a wider range of energy levels and delay requirements. Moreover, PMD is able to provide very tight delay constraints under all settings. 3.7 Conclusions In this chapter, we demonstrated that energy and service delay can be jointly optimized in large-scale video sharing servers. More specifically, we investigated the characteris- tics of video-sharing workload in a parallel system and developed a model for the disk idle time to assist the power mode decision at the disk level. Based on the model, we proposed a novel energy management scheme that achieves energy minimization under a wide range of delay constraints. Compared to the traditional threshold-based approach, our algorithm can save up to 16% 35% more energy under the same delay constraint. Furthermore, PMD can offer low delay guarantee which is not achievable by TEM. Our work can be extended in several areas. First, we are in the process of design- ing better data placement policies that can improve the performance of the PMD algo- rithm. Second, we only optimized energy consumption for the idle periods of the disk, which prevented us from achieving more energy saving especially under high server load. We can investigate the use of more efficient caching algorithms or full-DRPM [39] to improve our scheme along this direction. Third, we have yet to examine the 52 energy consumption of memory cache. We are interested in extending our EDO frame- work to include memory energy. Finally, we plan to consider the storage of video with multiple layers (as in scalable video coding) and error resilience information. 3.8 Appendix: Distribution of Disk Idle Time The model of disk idle interval has been introduced in Section 3.4.2. We know that T s = min(T r ;t D ). Let the MWTs (k 0 ;k 1 ;:::;k m ) divide the timeline as shown in Figure 3.6. The distribution of T r follows a time-varying Poisson process, where we have a constant arrival rate i in interval [k i1 ;k i ). LetN(t) be the number of arrivals in the interval [0;t]. Then, we can evaluate the probabilityP (T r >t). For the interval t k 0 , P (T r > t) = P [N(t) = 0] = e 0 t . For interval t2 (k i1 ;k i ) whereim, the term can be derived as follows: P (T r >t) =P [N(t) = 0] =P [N(k 0 ) = 0] i1 Y j=1 P [N(k j )N(k j1 ) = 0] P [N(t)N(k i1 ) = 0] =e 0 k 0 ( i1 Y j=1 e j (k j k j1 ) )e i (tk i1 ) = i1 Y j=0 e ( j j+1 )k j e i t The cdf ofT r is simplyP (T r t) = 1P (T r >t). The arrival rate in intervalt2 (k m ;1) is m . Thus, the cdf in that interval will be the same as that int2 (k m1 ;k m ]. With a deterministic upper boundt D , the cdf of the disk idle time will become: 53 P (T s t) = 8 < : P (T r t) ift<t D 1 iftt D Now assumet D 2 (k n1 ;k n ] and let x = Q x1 j=0 e ( j j+1 )k j , we can calculate the cdf for each segment, starting from (0;k 0 ) to [k n1 ;t D ). Then, the cdf can be written as: P (T s t) = 8 > > > < > > > : 1e 0 t ift<k 0 1 i e i t ift2 [k i1 ; min(k i ;t D )) 1 iftt D ; wherei = 1; 2;:::;n. Note that the cdf has a jump at t = t D caused by the cut-off effect of t D . 
Chapter 4

Energy-Aware Cache Management for Video Sharing Servers

4.1 Introduction

In the last chapter, we presented the PMD algorithm. Based on the model of disk idle time, PMD converts the power mode decision problem into a constrained optimization. More specifically, PMD selects optimal sleep modes for disks and minimizes disk energy under delay constraints.

Disk access operations have high energy and time costs. Fortunately, not all accesses to the storage system need to go to disks. If the requested blocks are placed in the storage cache, we can avoid disk accesses. Modern storage systems adopt large storage caches to reduce the number of disk accesses. The storage cache can be either volatile or non-volatile memory; the latest EMC Symmetrix storage systems with hundreds of hard drives can have up to 128 GB of memory cache [23]. Note that the storage cache is different from disk buffers, which are typically small (1–4 MB).

The cache management policy is central to system performance. In particular, different cache management policies lead to different disk access sequences. Traditional caching algorithms, including LFU and LRU, focus on reducing disk accesses. While fewer disk accesses generally lead to reduced disk energy, minimizing disk accesses does not minimize energy consumption. The reason is that disk energy saving is mainly achieved by increasing disk idle time and putting the disk into sleep modes. If disk accesses are minimized but the resulting access sequence wakes the disk up often, energy efficiency will be low.

In this chapter, we study the effects of cache management policies on disk energy efficiency. More specifically, we look into how to better use the cache to improve scheduling efficiency and reduce mode transition overheads. We continue to use the system model and mathematical framework introduced in Chapter 3. We propose two algorithms that optimize cache utilization for energy efficiency. First, we develop a new caching utility for cache replacement, which takes into account both disk access and mode transition costs. Second, we design a prefetching algorithm that effectively reduces the number of mode transitions of the disks.

We used DRPM as the disk model in Chapter 3. Although DRPM technology was proposed a decade ago [39], industrial adoption has been slow; as of this writing, practical disks have only one low-RPM mode. To demonstrate that our framework is also effective for currently available disks, we adopt a different disk model in this chapter based on the technical description of Seagate PowerChoice [86], as summarized in Table 4.1.

Table 4.1: Low power modes and corresponding recovery penalty
Mode   Description      Power Savings   Recovery Time (s)
L1     Active idle      0%              0
L2     Unloaded heads   23%             0.5
L3     Reduced speed    35%             1
L4     Stopped motor    54%             8

The rest of the chapter is organized as follows. Section 4.2 discusses the effects of scheduling and caching on energy efficiency. Section 4.3 presents our energy-delay-optimized cache replacement scheme. Section 4.4 proposes the prediction-based energy-efficient prefetching algorithm. Section 4.5 evaluates the proposed algorithms. Section 4.6 concludes the chapter.
4.2 Workload Scheduling and Energy Efficiency

In this section, we study the factors that affect energy efficiency. In particular, we look at the roles caching and scheduling play in determining the overall energy efficiency. Note that to measure "energy efficiency", we use the concept of energy-delay cost introduced in Chapter 3.

4.2.1 Review of Workload Scheduling

In Section 3.4, we introduced the workload scheduling algorithm of the video sharing server. At the disk level, the system must decide when to initiate accesses when cache misses occur. A disk access has to be launched immediately if the pending workload was created for the leading block of a video; otherwise, the service delay increases. For other workloads, there is no need to fetch the blocks right away, because the data transfer rate is typically much higher than the playback rate of any video. The longest time that a disk access operation can be delayed equals the playback time of all blocks preceding the requested one. We define this quantity as the maximum waiting time (MWT). The MWT of the j-th block of video i can be written as

$$MWT_{i,j} = \frac{(j-1)\,b}{r_i}, \qquad (4.1)$$

where b is the block size and r_i the bit-rate of video i. If the workload is to access the j-th block of video i, its deadline is given by

$$Deadline_{i,j} = t_{req} + MWT_{i,j} + AD_i^{(j-1)}, \qquad (4.2)$$

where t_req is the arrival time of the request, and AD_i^{(j-1)} is the accumulated delay of the request up to the time the previous block is delivered. For the leading block, the deadline is its arrival time. If a deadline has already been missed for a block, the deadlines of the following blocks are extended, since they will only be needed after the preceding block has been played.

With Equation (4.2) in place, we may apply the earliest-deadline-first (EDF) scheduling algorithm, which maximizes disk utilization and minimizes delay [61]. However, we do not process a workload whose deadline is too far from the current time, for two reasons. First, accessing these blocks in an unconstrained manner may lead to many out-of-order deliveries. Second, under EDF the disk is likely to be kept busy fetching blocks for future workloads, with less opportunity to switch to low-power modes. To prevent these undesirable effects, we apply a window constraint to the EDF scheduling: only workloads with deadlines within the window are scheduled in the current service cycle. While plain EDF achieves optimal utilization, the performance of our windowed EDF scheduling is only marginally affected, because the average disk utilization in a typical video server is low, as most popular videos go to the CDN and the storage cache [50]. The effect of the scheduling window was discussed in Section 3.6.2 and will be elaborated in more detail in this chapter. A small sketch of the deadline bookkeeping in Equation (4.2) is given below.
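The sketch below shows how the accumulated-delay term in Equation (4.2) extends the deadlines of later blocks once a delivery runs late. The block size, bitrate, and timestamps are hypothetical.

```python
def block_deadline(t_req: float, j: int, block_size_bits: float,
                   bitrate_bps: float, accumulated_delay: float) -> float:
    """Deadline of the j-th block (0-based) per Equation (4.2):
    request arrival + MWT of Equation (4.1) + delay accumulated so far."""
    mwt = j * block_size_bits / bitrate_bps
    return t_req + mwt + accumulated_delay

# Example: a 1.5 MB block at 6 Mbps gives a 2 s MWT per block; a late
# delivery of block 1 pushes the deadline of block 2 out by the lateness.
t_req, acc = 100.0, 0.0
for j, delivered_at in enumerate([100.0, 102.5, 104.2]):
    due = block_deadline(t_req, j, 1.5e6 * 8, 6e6, acc)
    acc += max(delivered_at - due, 0.0)   # lateness extends later deadlines
```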
4.2.2 Scheduling Window and Disk Idle Time

In the last chapter, we examined how to make the optimal power mode decision by evaluating the energy-delay cost associated with each mode. While PMD makes the optimal mode decision, it cannot optimize the disk access sequence, which directly affects the overall energy efficiency. More specifically, the pattern in which disks are accessed determines the average idle length of each sleep period, which is of fundamental importance in energy management [110]. One way to alter the disk access sequence is to change the size of the scheduling window. In particular, a larger scheduling window can typically lead to longer disk idle times. The advantage of PMD in containing delays makes it work well with large scheduling windows, which is why we can achieve better energy-delay performance than threshold-based approaches, as demonstrated in Chapter 3.

However, the gain of applying a large scheduling window with PMD saturates and diminishes. This can be illustrated by plotting the average idle length for different scheduling window sizes under PMD, as shown in Figure 4.1. We can see that when the window size is large (over roughly 20 secs), continuing to increase it hardly increases the average idle time.

Figure 4.1: Average idle time and scheduling window under PEDO

The reason lies in the fact that changing the scheduling window size does not change the workload arrival sequence. To schedule a workload, the system must know its deadline, which is MWT seconds after its request arrival time. Thus, blocks with shorter MWTs are more difficult to schedule in a timely manner. Even when we increase the scheduling window size, the difficulty associated with these blocks remains. We call them the less schedulable blocks.

4.2.3 Workload Schedulability

Besides the power mode decision, the mode transition overhead is a major factor in the overall energy efficiency. In the previous subsection, we noted the effect of workload schedulability on the average disk idle time, which is directly related to the mode transition overhead. In fact, given a similar level of total idleness, which is decided by the workload arrival pattern, the average idle length is inversely proportional to the number of mode transitions. As the existence of less schedulable blocks reduces the disk idle length, it increases the number of mode transitions.

Furthermore, less schedulable blocks can incur higher penalties when mode transitions take place. Assume that the disk is in mode i; it takes R_i for the disk to return to the active mode. Since the disk cannot serve requests during transition periods, any workload that is due before the disk wakes up will be delayed. However, if the MWT is greater than R_i, we can wake up the disk early enough that it can still deliver the requested block in time; in this case, the extra delay cost is hidden. On the contrary, the less schedulable blocks are likely to miss their deadlines due to their shorter MWTs.

Note that the total mode transition penalty is the product of the number of transitions and the penalty per transition. As schedulability contributes to both terms of the product, it directly impacts the overall mode transition overhead of the server system. Notably, the less schedulable blocks become the bottleneck of system energy efficiency. To support this argument, we look at how the energy-delay performance changes with different scheduling window sizes, as plotted in Figure 4.2. We can see that while there is significant improvement in the high-delay regions when the window size increases from 3 secs to 22.5 secs, there is virtually no gain from 22.5 secs to 30 secs. This is consistent with the observation that the average idle length hardly increases after reaching 20 secs, as shown in Figure 4.1.

Figure 4.2: Energy-delay performance for PEDO under different scheduling window sizes

More interestingly, there is very little improvement in the low-delay regions even when the window size changes from 4.5 secs to 22.5 secs. This can be attributed to the fact that the use of low power modes becomes very conservative in order to satisfy tight delay constraints.
In other words, the expensive mode transition overheads overshadow the advantages of a large scheduling window when low-delay constraints are imposed. To further improve the energy-delay performance, we need to find ways to improve the schedulability of disk-accessing workloads and reduce mode transition penalties.

4.3 Energy-Delay Optimized Cache Replacement

As mentioned in the last section, blocks with shorter MWTs are less schedulable, which leads to increased mode transition overheads and subsequently reduced energy efficiency. Note that we are only concerned with workloads that require disk access. In other words, if the less schedulable blocks can be put into the cache, they will not require disk access, and the overall schedulability can be improved. Furthermore, the deeper the sleep mode a disk operates in, the more severe the transition penalty will be. Therefore, it is wise to keep in the cache those blocks that reside on disks that are in, or will transition to, a lower power mode. Like workload schedulability, this effect is related to mode transition penalties.

Disk accesses fetch video blocks into the cache, and cache replacement algorithms decide which blocks are kept in the cache and which are discarded. While traditional cache replacement algorithms reduce the number of disk accesses and the corresponding energy consumption, they do not address the mode transition overhead, which can be equally important to the overall energy efficiency.

4.3.1 Disk Access and Mode Transition Costs

Essentially, we need a caching policy that takes into account both the disk access and the mode transition costs. Here the cost is defined as the Lagrangian cost function of Section 3.5.2:

$$\min\{C\}, \quad \text{where } C = E + \lambda D. \qquad (4.3)$$

The energy-efficient cache utility function should measure the reduction in the joint energy-delay cost when a block is in the cache. This is equivalent to the additional cost the block incurs on the system when it is not in the cache. Using this guideline, we can derive the cache utility as follows.

First, we calculate the cost of the block if it is requested and causes a disk access. To simplify the derivation, we assume that this happens when the disk is already awake; the scenario in which the workload wakes up the disk is discussed later. Energy-wise, the disk processes one more workload. Without this workload, the disk could process anything from 1 to n_l - 1 workloads in that round. We therefore use the average increment in average power to quantify this effect. For the delay cost, assuming there is no additional queuing delay, the workload incurs a one-round cache miss delay only if it is a leading block. Summarizing the above, the cost of accessing block (i, j) (the j-th block of video i) is

$$C_a(i,j) = \frac{1}{n_l}\sum_{w=0}^{n_l - 1}\big(P_a(w+1) - P_a(w)\big) + \lambda T\, I_{j=0}, \qquad (4.4)$$

where P_a(x) is the average power of a round in which x workloads are accessed by a disk, and I_{j=0} is 1 only for a leading block and 0 otherwise.

Next, we calculate the cost of a mode transition when it takes place. If a workload happens to wake up a disk operating in mode m, the additional delay of the workload associated with block (i, j) can be written as

$$l_{i,j} = \max(T + R_m - MWT_{i,j},\, 0), \qquad (4.5)$$

where T is the scheduling round length and R_m stands for the recovery time.

In addition to the delay penalty, we need to calculate the energy term of the cost function, which is simply the average power of the transitional rounds. Obviously, the energy cost depends on the current disk mode.
However, even if the disk is currently active, cache replacement decisions can still affect mode transitions in the future. To account for this effect, we use the power mode of the last sleep cycle when the disk is in the active mode. If triggered by block (i, j), the mode transition cost becomes

$$C_r(i,j) = \frac{O_m}{R_m} + \lambda\, l_{i,j}, \qquad (4.6)$$

where m is the current power mode, or the previous sleep mode if the disk is currently active.

4.3.2 Combining Access and Transition Costs

An ideal utility should combine the above two cost functions. To do so, we calculate the expected number of each of these events. For the former event (accessing the disk for block (i, j)), the expected number of accesses should be proportional to the average request rate of video i, denoted by p_i, if we look at long-term statistics. However, the existence of pending workloads shapes the access pattern in the short term, which in turn affects cache efficiency. The impact of this short-term information varies across blocks. The most popular blocks, for example, are expected to be requested far more often in the long run than the least popular blocks in the cache. Therefore, short-term knowledge (i.e., the deadlines of pending workloads) is not as important for the popular blocks as for the unpopular ones.

To make the best use of both long-term statistics and the existing information on pending deadlines, we define the access frequency as the expected number of workloads over a certain time window. The size of the window decides how much short-term knowledge is taken into account. The window size varies from block to block, depending on how long the block is expected to stay in the cache. For popular blocks, we use a larger window so that we can benefit from the long-term expectation that they will be accessed often. For unpopular blocks, the turnover rate is expected to be high, which means that a block can be swapped out of the cache very soon. In this case, the available deterministic information brings more benefit than popularity, so it is advantageous to apply a small window.

In short, the size of the time window should relate to the popularities of videos, which follow an exponential distribution, as shown in Figure 3.2(b). Therefore, it is reasonable to define the time window based on an exponential function. But if the block has pending requests, the time window should be large enough to include the earliest time at which all these workloads can be served. Based on these two principles, we design the window function as follows.

For each cached block (i, j), we can obtain a rank, r_{i,j}, based on its popularity. If the block has pending requests, the deadline of the latest workload is denoted by T_p(i, j). The time window is then calculated as

$$L_{i,j} = \max\!\big(\alpha\, e^{\beta r_{i,j}},\; T_p(i,j) - T_w\big), \qquad (4.7)$$

where T_w is the size of the scheduling window discussed in Section 3.4.1, and α and β are trained parameters (selected to be 2000 and -4.6×10⁻⁴, respectively, in our experiments).

After obtaining the window size, we can estimate the access frequency by combining the deterministic and stochastic information. Recalling the constraints on deadlines due to the MWT, we have

$$f_{i,j} = \begin{cases} \dfrac{n}{L_{i,j}} & \text{if } L_{i,j} \le MWT_{i,j}, \\[2mm] \dfrac{p_i\,(L_{i,j} - MWT_{i,j}) + n}{L_{i,j}} & \text{otherwise}, \end{cases} \qquad (4.8)$$

where n is the number of pending workloads for block (i, j).

The latter event (the disk being woken up by block (i, j)) is part of the disk re-activation process.
The latter event (the disk being woken up by block $(i,j)$) is part of the disk re-activation process. Its expectation is proportional to the contribution of block $(i,j)$ to the time-varying Poisson process described in Section 3.4.2. With the inclusion of block $(i,j)$, we can recalculate the new pdf of the disk idle time $T_s$ and its expectation using Equation (3.2). Assuming that this new disk idle time is $T'_s$, the expected increase in the number of mode transitions is proportional to

$$\Delta\lambda_{i,j} = \frac{1}{E(T'_s)} - \frac{1}{E(T_s)}. \qquad (4.9)$$

Finally, we can combine the two costs, weighted by their expected numbers of occurrences, to obtain the final utility function:

$$C_{i,j} = f_{i,j}\, C_a(i,j) + \Delta\lambda_{i,j}\, C_r(i,j). \qquad (4.10)$$

Note that as the MWT becomes smaller, the weight of $C_r$ grows more quickly. In particular, if the MWT is 0, which means the block is a leading block, $\Delta\lambda_{i,j}$ is very close to $p_i$. In this case, the mode transition penalty has almost the same weight as the disk access cost. Furthermore, the disk access cost for leading blocks is greater, as there will be a one-round cache miss delay. Thus, the utility function biases strongly towards leading blocks. On the contrary, blocks with very long MWTs contribute little to the wake-up penalty. As noted above, this utility function depends on the current disk mode: if a disk is currently in a sleep mode, the utility accounts for the direct mode transition cost, while for an active disk we substitute the power mode of its last sleep cycle, since replacement decisions still affect future transitions.

In the Energy-Delay Optimized Cache Replacement (EDOC) algorithm, Equation (4.10) is evaluated whenever there is a cache miss and the cache is full. The block with the least caching utility is victimized and swapped out of the cache. In this way, the cache replacement decision leads to a lower expected energy-delay cost than LRU/LFU. We present the performance of EDOC in Section 4.5.
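The following is a minimal sketch of how Equations 4.4-4.6 and 4.10 might be combined into the EDOC utility and victim selection. The block dictionary fields, the round-power function `P_a`, and the Lagrange multiplier `lam` are stand-ins; the access frequency `f` and transition-rate increment `dlam` are assumed to be precomputed per Equations 4.8 and 4.9.

```python
def access_cost(P_a, n_l, T, lam, is_leading):
    """Eq. 4.4: expected cost of serving the block from an awake disk."""
    avg_power_inc = sum(P_a(w + 1) - P_a(w) for w in range(n_l)) / n_l
    return avg_power_inc + (lam * T if is_leading else 0.0)

def transition_cost(O_m, R_m, T, mwt, lam):
    """Eqs. 4.5-4.6: cost when this block wakes the disk from mode m."""
    extra_delay = max(T + R_m - mwt, 0.0)   # Eq. 4.5
    return O_m / R_m + lam * extra_delay    # Eq. 4.6

def edoc_utility(block, P_a, n_l, T, lam):
    """Eq. 4.10: occurrence-weighted combination of the two costs."""
    c_a = access_cost(P_a, n_l, T, lam, block["mwt"] == 0)
    c_r = transition_cost(block["O_m"], block["R_m"], T, block["mwt"], lam)
    return block["f"] * c_a + block["dlam"] * c_r

def pick_victim(cache, P_a, n_l, T, lam):
    """On a miss with a full cache, evict the least useful block."""
    return min(cache, key=lambda b: edoc_utility(b, P_a, n_l, T, lam))
```

The key design point is that `pick_victim` deliberately keeps blocks that are expensive to re-fetch *or* likely to trigger a wake-up, whereas LRU/LFU considers only recency or frequency.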
4.4 Prediction-Based Energy-Efficient Prefetching

The main idea of EDOC is to put less schedulable blocks into the cache so that the disk-access workload is easier to schedule. In this way, the cache is better utilized as far as energy efficiency is concerned. The cache replacement operations are, however, passive: they occur only when there are cache misses, which are rare under low service loads. To further improve the "energy awareness", we can develop a proactive caching policy. More specifically, we can selectively fetch some less schedulable blocks into the cache before disks go to sleep. Since a disk stays idle for a few rounds before it enters a sleep mode, as described in Section 3.5, these periods can be utilized for prefetching.

4.4.1 Prefetching Decisions

It is important to note that the number of disk accesses may increase with prefetching, as we need to fetch additional blocks from disks. Therefore, we face a trade-off between the number of disk accesses and the workload schedulability (or mode transition penalties). To optimize this trade-off, we need to decide the number of blocks that will be prefetched. As the prefetching decision is made close to the time when the system decides for a disk to go to a sleep mode, the two decisions can be combined.

Mathematically, we need to incorporate the cost of prefetching operations into the formulation of PEDO, which was proposed in Section 3.5.2. Recall that the Lagrangian cost function we formulated for PEDO in Equation 3.7 has an energy term and a delay term. The energy term is the average power of the non-active periods, which include sleep and transition periods. To incorporate the effect of prefetching, the prefetching energy and time need to be added. Assume we are to prefetch $m$ blocks and switch the disk to the $i$th mode; the energy cost can be evaluated as

$$\bar{P}(i,m) = \frac{E_p(m) + P_i\, T_s(m) + O_i}{T_p(m) + T_s(m) + R_i}, \qquad (4.11)$$

where $E_p(m)$ is the prefetching energy and $T_p(m)$ is the prefetching time. These two terms depend only on $m$ and can be easily calculated from the disk power model. The sleep time $T_s$ is a function of $m$, since the prefetching decision affects the distribution of disk idle time.

In Section 3.5.2, we simplified the derivation of the delay estimation by assuming that the arrival rate stays constant over time. To make it more accurate for our prefetching decisions, we can use the time-varying Poisson model in Equation 3.2. Recall from Equation 3.4 that the delay cost for mode $i$ can be estimated as

$$D_i = \sum_{j=1}^{a}(R_i + jT) = \frac{\lambda^2 T R_i^2}{2 n_l^2} + \frac{\lambda\,(T + 2R_i)\, R_i}{2 n_l}. \qquad (4.12)$$

With the arrival rate varying over different time intervals indexed by $k$, as shown in Figure 3.6, the expected delay cost becomes

$$E(D_i) = \frac{T R_i^2}{2 n_l^2}\int_t \lambda^2(t)\, p(t)\, dt + \frac{(T + 2R_i)\, R_i}{2 n_l}\int_t \lambda(t)\, p(t)\, dt. \qquad (4.13)$$

The integral $\int_t \lambda^r(t)\, p(t)\, dt$ can be derived in general for any $r$ by substituting the pdf of $T_s$ (Equation 3.2) into Equation 4.13. The expression for the most general case (where the first known deadline $t_D$ exists and is no less than $k_0$) can be derived from the pdf of the disk idle time (Equation 3.2), as written in Equation 4.14:

$$E[\lambda^r] = \int_t \lambda^r(t)\, p(t)\, dt = \lambda_0^r \int_0^{k_0} \lambda_0 e^{-\lambda_0 t}\, dt + \sum_{j=1}^{n-1} \lambda_j^r \int_{k_{j-1}}^{k_j} \beta_j\, \lambda_j e^{-\lambda_j t}\, dt + \lambda_n^r \left( \int_{k_{n-1}}^{t_D} \beta_n\, \lambda_n e^{-\lambda_n t}\, dt + \beta_n\, e^{-\lambda_n t_D} \right)$$
$$= \lambda_0^r\,\big(1 - e^{-\lambda_0 k_0}\big) + \sum_{j=1}^{n-1} \lambda_j^r\, \beta_j\,\big(e^{-\lambda_j k_{j-1}} - e^{-\lambda_j k_j}\big) + \lambda_n^r\, \beta_n\, e^{-\lambda_n k_{n-1}}, \qquad (4.14)$$

where $j = 1, 2, \ldots, n$ and $\beta_x = \prod_{j=0}^{x-1} e^{(\lambda_j - \lambda_{j+1}) k_j}$. The expressions for the exceptional cases (with very small or very large MWT) can be derived trivially.

In the proposed prefetching algorithm, we estimate the expected cost function for each selection of power mode $i$ and each possible number of prefetched blocks $m$. With prefetching incorporated into the formulation, the arrival rates $\lambda_0, \lambda_1, \ldots$ vary with different prefetching decisions, and the distribution $p(t)$ also depends on $m$. Therefore, following a procedure similar to that for PEDO in Section 3.5.2, the objective function $J(i,m)$ can be written as

$$J(i,m) = E\big(\bar{P}(i,m) + \lambda D(i,m)\big) = \int_t \big(\bar{P}(i,m) + \lambda D(i,m) \mid T_s(m) = t\big)\, p_m(t)\, dt$$
$$= \big(E_p(m) + O_i - P_i R_i\big) \int_t \frac{p_m(t)}{t + R_i + T_p(m)}\, dt + P_i + \lambda\, E(D(i,m)). \qquad (4.15)$$

The calculation of $p_m(t)$ for different $m$ is straightforward, as we only need to change the set of arrival rates $\lambda_0, \lambda_1, \ldots$. With the same sets of arrival rates, we can apply Equation 4.13 to derive $D(i,m)$ for different $m$. The other details will be discussed in the next subsection.
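Operationally, this joint decision is a small exhaustive search over candidate modes and prefetch counts. Below is a hedged sketch; the routine `expected_cost`, which should evaluate Equation 4.15, is left abstract, and the function name is hypothetical.

```python
def peep_decision(modes, m_p, n_l, expected_cost):
    """Pick the (power mode, prefetch count) pair minimizing Eq. 4.15.

    modes: candidate low-power mode indices; m ranges over 0..m_p*n_l
    prefetched blocks (0 means "no prefetching"); expected_cost(i, m)
    evaluates the objective J(i, m)."""
    best = None
    for i in modes:
        for m in range(m_p * n_l + 1):
            cost = expected_cost(i, m)
            if best is None or cost < best[0]:
                best = (cost, i, m)
    _, mode, m_blocks = best
    rounds = -(-m_blocks // n_l)   # Eq. 4.16: ceil(m / n_l) rounds
    return mode, m_blocks, rounds
```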
4.4.2 Prefetching Process

Before we can make a prefetching decision as described in the previous subsection, we need to find the set of blocks residing on the disk that are candidates for the prefetching action. For this purpose, we can use the caching utility function in Equation (4.10). Since prefetching incurs extra disk accesses for the benefit of better schedulability, it calls for precaution. In particular, we should not allow too many prefetching accesses. Moreover, there is a risk that the disk may not enter any sleep mode after prefetching, in which case these extra accesses are wasted.

To address these two issues, we limit the number of blocks to be prefetched and postpone the prefetching operation. These measures are implemented as follows. Recall that we allow a fixed number of blocks, $n_l$, to be accessed from a disk in each round. To maximize utilization, we should make the disk as busy as possible. Thus, the prefetching time is determined by the number of prefetched blocks via

$$T_p(m) = \left\lceil \frac{m}{n_l} \right\rceil T. \qquad (4.16)$$

To restrict the number of prefetched blocks, we impose a cap on the number of prefetching rounds. Let $m_p$ denote the maximum number of prefetching rounds and $n_p$ be the actual number of prefetching rounds ($n_p = \lceil m/n_l \rceil$). The disk does not enter any sleep mode until it has been idle for a few rounds, and we use $n_t$ to denote this threshold. To ensure maximum disk sleep time, we demand $m_p < n_t$. The whole prefetching process is summarized below.

1. When the disk has been idle for $n_t - m_p$ rounds, go through the list of blocks and find the uncached ones with the highest $m_p$ caching utilities as prefetching candidates.
2. Perform the prefetching decision by calculating the prefetching utilities for all modes and all prefetching options (from 1 to $m_p$). Decide the optimal decision pair (the $i$th mode, $m$ blocks to prefetch).
3. Calculate the actual number of prefetching rounds, $k = \lceil m/n_l \rceil$.
4. If the disk has been idle for $n_t - k$ rounds, start prefetching blocks.
5. If there are new workloads due in the current round, abort the prefetching operation. Otherwise, wait until prefetching finishes and switch the disk to the $i$th low-power mode.

The prediction-based energy-efficient prefetching (PEEP) algorithm requires the decision routine to be run $m_p n_t$ times, while PMD requires only $n_t$ iterations. The complexity of the two routines is almost the same. Since the complexity of PMD is low, PEEP is not computationally intensive either.

4.5 Experimental Results and Evaluation

The simulation environment is largely the same as that in Chapter 3, except that a more practical low-power technology is used. We use the same simulator based on the Seagate Cheetah 15k.7 disk [85]. The simulation and system parameters can be found in Table 3.2. The low-power modes were simulated according to Table 4.1. In this section, we evaluate the performance of three proposed schemes:

1. PMD with LFU cache replacement (PMD);
2. PMD with EDOC cache replacement (PMD+EDOC);
3. PEEP with EDOC cache replacement and up to 8 blocks to prefetch each time (PEEP+EDOC).

[Figure 4.3: Average idle time and scheduling window sizes]

The benchmark algorithm is the threshold-based mode decision (TMD) scheme introduced in Section 3.5.1.

4.5.1 Schedulability Analysis

In Section 4.2, we addressed the schedulability issue by looking at the effects of the scheduling window on the average disk idle time and the overall energy efficiency for PMD. In this section, we analyze the performance of PMD+EDOC and PEEP+EDOC by conducting the same experiments. The results were obtained by running the system for 30 minutes with an inter-arrival time of 40 ms, which indicates a medium server load.

We first plot the average disk idle time for different scheduling window sizes under the three algorithms in Figure 4.3. As we can see, PMD+EDOC increases the average idle length over PMD, and PEEP+EDOC improves over PMD+EDOC. More importantly, the average idle length continues to increase under PEEP+EDOC and PMD+EDOC well beyond 20 seconds, while in PMD the increase saturates there.
This indicates that the disk access sequence has been reshaped and the number of less schedulable workloads is reduced.

[Figure 4.4: Energy-delay performance of PEEP under different scheduling window sizes]

Recall from Section 4.2 that the use of a large scheduling window only improves the high-delay part of the energy-delay performance, and no improvement at all is shown when the scheduling window is increased from 22.5 seconds to 30 seconds. Here, we show the results for PEEP+EDOC in Figure 4.4. As shown in the figure, the improvement is significant over all delay ranges. From a window size of 3 seconds to one of 22.5 seconds, the energy consumption is reduced by 25% under an average delay constraint of 0.12 second. An extra 3% saving is achieved by increasing the window size to 30 seconds.

To conclude, the proposed algorithms enable us to apply a large scheduling window to achieve more energy saving without sacrificing the service delay. In particular, PMD+EDOC and PEEP+EDOC lead to better cache utilization and an improved disk access sequence, since disk-access workloads become more schedulable. As a result, we can take greater advantage of large scheduling window sizes.

4.5.2 Comparison of Energy-Delay Performance

[Figure 4.5: Comparison of the energy-delay performance of four algorithms with different inter-arrival times: (a) 30 ms, (b) 40 ms, (c) 50 ms, (d) 100 ms]

[Figure 4.6: Comparison of energy consumption at low delay levels: (a) average delay 0.1 s, (b) average delay 0.12 s]

We evaluate the performance of the three algorithms by plotting the energy-delay curves in Figure 4.5. We used the same experimental setting as in Chapter 3: we simulated the system for 30 minutes and tested under four levels of server load by changing the inter-arrival time. EDOC needs no additional parameters. For PEEP, we set the number of prefetching rounds to 2, allowing the system to prefetch up to 8 blocks each time (since a disk can access 4 blocks per round).

Following the observations in Sections 4.2.3 and 4.5.1, we set the scheduling window size to 22.5 seconds for PMD, as increasing it further would not improve its performance. For EDOC and PEEP, the improved schedulability allows a larger window to yield more energy savings, so we increased the window size to 30 seconds in these two algorithms.

Performance of PMD under the new disk model

Although the performance of PMD has been evaluated extensively in Chapter 3, we shall analyze its performance under the new disk model, which is based on a currently available disk and more conservative than the one adopted in Chapter 3. As we can see from Figure 4.5, PMD still outperforms TMD in all settings under the new disk model. To better evaluate the performance, the percentages of energy saving of PMD over TMD with two average delay levels (0.12 seconds and 0.14 seconds) and four inter-arrival times (30 ms, 40 ms, 50 ms and 100 ms) are shown in Table 4.2.

Table 4.2: Energy saving of PMD over TMD
Average Delay | 30 ms | 40 ms | 50 ms | 100 ms
0.12 s        | 5.7%  | 7.5%  | 8.5%  | 11.0%
0.14 s        | 7.6%  | 9.7%  | 11.2% | 15.0%

As shown in Table 4.2, the percentage of energy reduction of PMD over TMD ranges from 7.6% to 15.0% in the longer-delay scenario (0.14 s), and from 5.7% to 11.0% in the shorter-delay scenario (0.12 s). We see that PMD saves more energy for longer inter-arrival times, which indicate lower service loads.
This is expected because the amount of disk idleness inevitably shrinks as we increase the load. Since PMD does not address energy consumption in active periods, the portion of energy that can be optimized is smaller under higher loads. Energy saving is more significant if we allow longer delays in the system, since there is more room to utilize the low-power modes. In these regions, PMD makes more intelligent, prediction-based use of the lower power modes than TMD, under which the energy saving achieved by the lower power modes is largely offset by the high delay penalty. Considerable energy reduction is also achieved in the shorter-delay scenario. The main reason is that PMD allows the system to use a large scheduling window while keeping firm control of service delay. However, PMD does not gain as much in this case because the delay constraint prevents the system from selecting the lower power modes.

Performance of PMD+EDOC and PEEP+EDOC

Compared to PMD with traditional caching, which only achieves significant energy saving in high-delay regions, PMD+EDOC and PEEP+EDOC improve energy efficiency in all settings across all delay levels. The low-delay region is of particular interest, since VSS users have become increasingly sensitive to service delays. To better visualize the benefits of EDOC and PEEP under tight delay constraints, we plot the energy consumption levels in Figure 4.6 for average delay levels of 0.1 second and 0.12 second. The saving percentages are listed in Table 4.3, indicating that the savings for all service loads are significant and consistent.

As mentioned earlier, PMD alone does not gain much energy saving in low-delay regions because the lower power modes are unlikely to be selected due to the heavy delay penalties of mode transitions. With PMD+EDOC and PEEP+EDOC, we can effectively reduce the number of mode transitions by getting the troublemakers (the less schedulable blocks) into the cache. In addition, PMD+EDOC and PEEP+EDOC achieve consistent improvement across different service loads. Recall that PMD is less effective when the service load is higher, because idle periods then make up a smaller percentage of time. Since PMD with EDOC and PEEP can make non-active periods longer and more continuous, their improvement in the high-load region is comparable to that in the low-load region.

Table 4.3: Energy saving under low delay levels.
(a) Average Delay: 0.1 s
          | 30 ms | 40 ms | 50 ms | 100 ms
PMD+EDOC  | 23.0% | 21.8% | 23.2% | 22.9%
PEEP+EDOC | 27.0% | 26.9% | 28.6% | 29.1%
(b) Average Delay: 0.12 s
          | 30 ms | 40 ms | 50 ms | 100 ms
PMD+EDOC  | 22.6% | 23.1% | 23.7% | 24.0%
PEEP+EDOC | 26.1% | 27.9% | 29.3% | 28.7%

Furthermore, we observe that PMD+EDOC and PEEP+EDOC are able to achieve very low delay levels (0.05 to 0.07 sec), which are not attainable with TMD or PMD alone. The ability to offer such a low-delay guarantee makes our scheme a promising approach for practical real-time Internet services.

In addition to this current disk model, we also simulated our system for future disks equipped with more power modes, possibly utilizing the DRPM technology [39]. We simulated six power modes as listed in Table 4.4a. Compared to PowerChoice™ [86], two intermediate power modes are used. The resulting energy savings under PEEP+EDOC are reported in Table 4.4b. As we can see, with the introduction of additional intermediate power modes, we are able to save considerably more energy. To summarize, PMD+EDOC and PEEP+EDOC optimize cache utilization for the benefit of energy efficiency.
The increased schedulability of workloads leads to longer average idle times and reduced mode transition overheads. As a result, we observe consistent and significant energy saving by both PMD+EDOC and PEEP+EDOC in the low-delay, high-service-load scenario, which is not achievable using PMD alone. Finally, we find that our algorithms work better when more power modes are provided.

Table 4.4: Parameters and results for a possible future disk model.
(a) Power modes
Mode              | L1   | L2   | L3   | L4 | L5
Idle Power (W)    | 10.3 | 9.28 | 7.54 | 6  | 5.3
Recovery Time (s) | 0.15 | 0.3  | 0.9  | 3  | 8
(b) Energy saving under PEEP+EDOC
Delay    | 30 ms | 40 ms | 50 ms | 100 ms
0.1 sec  | 31.2% | 33.0% | 34.1% | 35.1%
0.12 sec | 28.7% | 32.1% | 33.5% | 36.1%

4.6 Conclusion

In this chapter, we continued on the path of energy-delay optimization and investigated how to better use the memory cache to improve overall energy efficiency. We studied ways to optimize the disk access sequence through caching to improve workload schedulability and reduce the mode transition overhead. The proposed cache replacement and prefetching schemes can effectively increase the average disk idle length and significantly improve the overall energy-delay performance of the VSS storage system, especially under very tight delay constraints. More specifically, the disk subsystem consumes 22%-29% less energy with the proposed schemes under average delay levels from 0.1 to 0.12 second.

It should be noted that the energy saving potential of our algorithms has been limited by the current disk model. If we apply them to a DRPM-based disk model such as the one used in Chapter 3, we can save more than 40% energy under PEEP+EDOC compared to running under TMD. By experimenting with different disk models, we demonstrate that our algorithms are generally applicable to all disks with multiple low-power modes.

This work can be extended in several ways. First, better data placement policies can be developed to further improve the energy-delay performance. Second, one can investigate the energy consumption of the memory cache; extending the proposed energy-delay optimization framework to include memory energy is a promising research direction. Finally, it is interesting to consider the storage and service of scalable video with a multi-layer representation.

Chapter 5
Learning-Based Placement Optimization in Video Servers

5.1 Introduction

We mentioned the importance of placement in Chapter 3. The way video blocks are stored onto disks directly alters the workload access pattern and thereby affects energy efficiency. In a video server environment as described in Chapter 3, each video is divided into equal-sized blocks. Therefore, we have the choice of placing each of the blocks onto any one of the storage nodes in the server system. As independent workloads are created for each block, there is basically no restriction on the placement of the blocks of any single piece of video content. In this chapter we follow the same assumption made in Chapters 3 and 4 that a single disk is attached to each storage node; again, our framework can be easily applied to systems with disk arrays.

The problem of placing video blocks onto multiple disks is essentially a file assignment problem (FAP). FAP for parallel storage systems has been extensively studied since the 1980s. The basic formulation is defined by a set of $M$ storage nodes, $N$ files, and a cost function. The goal is to find the file-disk allocation that optimizes the cost function [27].
It has been shown that the optimization of any cost function in FAP is an NP-complete problem [27]. Therefore, most previous studies focus on the development of efficient heuristic algorithms [106].

As in previous chapters, we aim at the joint optimization of both energy and delay. To optimize energy efficiency in video data centers, we need to understand how different arrangements of video blocks contribute to energy consumption and service delay. To do so, we must take into account the important factors in the operation of the whole system, including caching, scheduling and power mode switching. Therefore, we can take advantage of our knowledge of the specific algorithms and schemes proposed in Chapters 3 and 4. However, the general optimization framework should work for any video storage system regardless of its caching, scheduling and power management schemes.

In this chapter, we first approach the problem heuristically: we start from the conventional concentration strategy and modify it to take into account the uniqueness of scheduling in video storage servers. Then, we move beyond heuristics and propose a fundamentally different framework for placement optimization based on machine learning. By extracting representative features from different placement schemes and observing simulation results, we can learn accurate prediction models for both energy and delay. Next, we use the learned models to formulate video data placement as a constrained optimization task, which is not immediately solvable. Nevertheless, we can simplify it into a form that can be resolved by off-the-shelf optimization tools. We finally discuss the process of optimization and the conversion of optimization results into actual data placement schemes.

The rest of the chapter is organized as follows. Section 5.2 presents a heuristic approach based on the concentration strategy. Section 5.3 proposes our learning-based optimization framework for video servers. Section 5.4 evaluates the proposed algorithm. Section 5.5 concludes the chapter. The related work has been summarized previously in Section 1.2.2.

5.2 Heuristic Placement

Traditional placement heuristics mainly aim at improving system throughput. In our problem setting, maximum throughput is bounded by the basic disk scheduling parameter: the admission limit (i.e., the number of blocks allowed each round). In a static placement scenario, the known workload level should not exceed the maximum level the system can sustain. Otherwise, we will need to acquire more disks, upgrade to faster disks, or improve other parts of the system (e.g., increase the cache size).

A well-known placement strategy in the context of energy conservation is to concentrate popular data on the minimum set of disks [76]. According to this strategy, we should place blocks onto each disk according to the popularity of the video content. That is, the first disk should be loaded with as many popular blocks as possible, and the same rule applies to the rest of the disks and the remaining blocks. In this way, we obtain a skewed distribution of file popularity across disks. While the first few disks might need to stay active most of the time, we can maximize the number of disks that can stay in low-power modes as they are lightly loaded.

While data concentration based on popularity can result in energy saving, it does not take into account the unique characteristics of video workloads.
Specifically, mode transition overhead contributes significantly to energy consumption and service delay, as detailed in Chapter 4. By moving the less schedulable blocks into the cache, we have seen that the overall energy-delay performance improves significantly. By the same token, if these blocks can be concentrated onto a minimum set of disks just like the popular ones, the lightly loaded disks will be left with blocks that are both unpopular and easily schedulable, which leads to low levels of both disk access and mode transition costs.

Therefore, instead of popularity-based concentration, it is better to combine popularity and schedulability into the reference metric. Recall that schedulability depends on the Maximum Waiting Time (MWT), and blocks with shorter MWTs are harder to schedule. We propose scheduling-aware concentration (SAC) based on the following metric:

$$M_{i,j} = \left(1 + \frac{\theta}{1 + MWT_{i,j}}\right) p_i, \qquad (5.1)$$

where $MWT_{i,j}$ is the MWT of the $j$th block of the $i$th video and $p_i$ the popularity of the $i$th video. The parameter $\theta$ controls the weighting on schedulability. Since the effect of schedulability is noticeable only when the MWT is less than or close to the wake-up time penalty, we can use a relatively small value for $\theta$; from our experiments we found $\theta = 2$ to be a good choice.

To maintain quality of service, we need to control the delay level. To do so, we have to make sure that no disk incurs excessive delays. In addition to the one-round delay for each disk access, which is imposed by the round-based, pipelined scheduling scheme, there are two main causes of extra delay: power mode transitions and queuing for disk service. Mode transition delay is already handled by the energy-delay optimization framework used in both Chapter 3 and Chapter 4. Therefore, the major concern here is to contain queuing delay. When we concentrate blocks according to Equation 5.1, there is a risk of overloading some disks. According to queuing theory for disk services, queuing delay relates to disk utilization. If we assume an M/M/1 queue, the average waiting time for each workload is given by [42]:

$$E(T_{wait}) = E(T_{ser})\, \frac{u}{1-u}, \qquad (5.2)$$

where $T_{ser}$ is the disk service time. In our system, the disk service time is given by $T/n_l$, where $T$ is the round length and $n_l$ the maximum number of blocks a disk can process per round.

To set an upper limit on the average waiting time, we can restrict the aggregate load on each disk, which is easily calculated by adding the popularities of stored video blocks and multiplying the sum by the total arrival rate. Assuming the upper limit on the average waiting time is $t_c$, we require

$$E(T_{wait}) = E(T_{ser})\, \frac{u}{1-u} \le t_c, \qquad (5.3)$$

which is equivalent to the following condition on the utilization level:

$$u \le \frac{t_c\, n_l}{T + t_c\, n_l}. \qquad (5.4)$$

The disk utilization is directly proportional to the aggregate arrival rate. Therefore, the inequality condition 5.4 on the utilization level can be easily mapped to a cap on the aggregate arrival rate at the disk. Denoting this cap on the disk load level by $l_c$, we have

$$\frac{n_l}{T}\, u \le l_c \quad \text{and} \quad l_c = n_l\left(\frac{1}{T} - \frac{1}{T + t_c\, n_l}\right). \qquad (5.5)$$
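The following compact sketch combines the Equation 5.1 metric with the Equation 5.5 cap under stated assumptions (each block carries a popularity share `p` and an `mwt`; θ = 2 as found above); the step-by-step summary of SAC follows the sketch.

```python
def load_cap(T, n_l, t_c):
    """Eq. 5.5: maximum aggregate arrival rate a disk may carry."""
    return n_l * (1.0 / T - 1.0 / (T + t_c * n_l))

def sac_metric(p, mwt, theta=2.0):
    """Eq. 5.1: popularity boosted for hard-to-schedule short-MWT blocks."""
    return (1.0 + theta / (1.0 + mwt)) * p

def sac_place(blocks, n_disks, capacity, T, n_l, t_c, total_rate):
    """Fill disks in descending metric order, honoring capacity and cap."""
    cap = load_cap(T, n_l, t_c)
    order = sorted(blocks,
                   key=lambda b: sac_metric(b["p"], b["mwt"]), reverse=True)
    layout, load = [[] for _ in range(n_disks)], [0.0] * n_disks
    disk = 0
    for b in order:
        # close the current disk once it is full or would exceed the cap
        while disk < n_disks and (len(layout[disk]) >= capacity
                                  or load[disk] + b["p"] * total_rate > cap):
            disk += 1
        if disk == n_disks:
            raise RuntimeError("workload exceeds what the disks can sustain")
        layout[disk].append(b)
        load[disk] += b["p"] * total_rate
    return layout
```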
To summarize, the proposed SAC algorithm consists of the following steps:

1. Calculate the placement utility for each video block using Equation 5.1. Then sort the blocks by their placement utility in descending order and place the block indices in a list.
2. Calculate the load cap for each disk according to Equation 5.5.
3. Start placing blocks onto the first available disk on the list until either 1) the disk is full (or meets its target capacity), or 2) the aggregate load exceeds the load cap.
4. Repeat step 3 until all blocks are placed.

5.3 Learning-Based Placement Optimization

While the SAC placement algorithm is a reasonable heuristic that can improve energy efficiency, it does not address the placement optimization problem directly. To the best of our knowledge, no previous work has proposed a practical optimization technique for video data placement, or for the more general file assignment problem, that does not rely on heuristics. In this section, we take a new look at the problem and introduce a new framework based on a properly formed mathematical optimization. By solving this optimization task, we effectively work out a solution to what had been deemed unsolvable before.

5.3.1 Basic Framework

A proper formulation of our problem can be stated as follows. We have $N$ blocks (or files) and $M$ disks. For each of the $N$ files, we need to pick one of the $M$ disks for storage. We have a constraint on disk capacity ($C \le c_{max}$) and another on load ($L \le l_c$), as discussed in the previous section. Our goal is to find a data allocation scheme (a set of all block-disk mappings) $s$, among all possible schemes (whose union is denoted by $S$), that optimizes the energy-delay cost function, formulated via Lagrangian relaxation of the delay-constrained energy minimization problem as in Chapter 3:

$$s_o = \arg\min_{s \in S} C = \arg\min_{s \in S}\left\{\sum_{i=0}^{M} E_i(s) + \lambda D(s)\right\}. \qquad (5.6)$$

There are two main difficulties in this optimization problem. First, the size of the problem grows exponentially with the number of blocks: there are $M^N$ possible assignments in total, making the problem intractable to solve exactly. Second, it is very difficult to derive analytical models that represent the energy and delay terms in Equation 5.6 in terms of individual blocks. The reason is that it is the ensemble effect of all block-disk allocations, together with countless actions of disk scheduling, caching and mode transitions, that results in the specific values of energy consumption and service delay.

To attack this complex problem, we first need to reduce its size. As discussed in Section 5.2, popularity and schedulability are the two defining properties of each block. Blocks affect the overall energy efficiency in different ways because they have different popularities and MWTs, or $(p, m)$ pairs as we write for simplicity. Although a full description of the $(p, m)$ pair for each block accounts for individuality very well, it is not necessary for modeling how energy and delay develop during the retrieval and delivery process. Consider the case where we have a block $i$ on disk 1 and two blocks $j$ and $k$ on disk 2. If $p_i = p_j + p_k$ and their MWTs are the same, the two disks are expected to have the same workload access pattern. That is, instead of considering each block separately, it is possible to group some of them together and examine their collective effect on energy efficiency.

To make the problem solvable, we cannot classify blocks into groups according to the exact values of their popularities and MWTs. Rather, we can afford to lose some granularity and use only a handful of groups, defined by a set of ranges of popularities and MWTs.
Provided that blocks within the same group contribute in proportion to their popularities to the energy-delay performance, knowing their aggregate arrival rates might suffice to make reasonable predictions of the final energy consumption and delay level. If such a classification scheme exists and we can divide all video blocks into $n$ groups regardless of the size of the video database, the problem becomes the distribution of $n$ groups onto $M$ disks. A possible distribution scheme is shown in Table 5.1.

Table 5.1: An example of class distribution.
       | Group 1 | Group 2 | Group 3
Disk 1 | 80%     | 20%     | 90%
Disk 2 | 10%     | 50%     | 5%
Disk 3 | 5%      | 15%     | 2%
Disk 4 | 5%      | 15%     | 3%

As we can see, the size of the matrix is only $M \times n$, where $n$ is a small constant integer, thus making the problem mathematically tractable.

Let us denote the group distribution matrix of size $M \times n$ as $D$. Then, the optimization problem in Equation 5.6 is converted to one that searches for the optimal $D$ in the set of all possible group distributions. Although one $D$ can map to infinitely many possible placement schemes, we shall see that with proper classification the final costs of all these possibilities are very similar.

The next task is to find the relationship between these group distributions and the energy-delay cost function. This amounts to predicting energy consumption and service delay from the input distribution $D$. Unfortunately, even this much-simplified relationship is difficult to obtain, and a simple-minded derivation may result in inaccurate predictions, which would not be useful in the final optimization step.

To deal with this situation, we adopt a learning-based approach to find accurate prediction models. With a good block classification scheme, all the individual quantities in a group distribution $D$ can be influential to energy and delay. Therefore, they are good candidates for features in the learning process. By feeding different group distributions into the system, together with their simulation results, we can establish a prediction model for the cost function required in Equation 5.6. The learning process is illustrated in Figure 5.1.

[Figure 5.1: Framework for learning the prediction model]

After learning, we need to tackle the simplified version of Equation 5.6. By solving the optimization task, we obtain an optimized distribution $D$. Then, we have to convert this distribution to an actual data placement scheme. In the rest of the section, we describe the two important steps of our LEarning-based Placement Optimization (LEPO) algorithm: 1) feature extraction and model learning, and 2) optimization and final data placement.

5.3.2 Feature Extraction and Model Learning

Simplified Formulation

To learn an accurate model, we need enough training data to cover a wide range of possible group distributions. If we consider a complete class distribution as in Table 5.1 as one data point, the search pool becomes huge for a large-scale system with many storage nodes. A more efficient approach is to consider the group distribution on each disk as a data point.

Although each disk (or storage node) operates independently, the energy consumption of different disks is not strictly independent, because the scheduling of one disk might change due to operations of others. Furthermore, service delay is calculated per request, and a request involves multiple blocks.
In spite of these dependencies, however, it is reasonable to assume independence and treat each disk separately in the learning process. For energy consumption, a scheduling change seldom alters the active energy consumption (i.e., accessing the blocks), because the scheduling window is large relative to the typical schedule change caused by delays in previous blocks, and changes in inactive energy occur infrequently. For service delay, the sum of per-block delays correlates well with the average per-request delay. Compared to the benefits of efficiently obtaining many more data points and of a much smaller search pool than under a full-system training strategy, the loss of accuracy this assumption incurs is a small price to pay.

With the use of group distributions and disk-based learning, the problem is simplified and can be rewritten as

$$D_o = \arg\min_{D} \left\{\sum_{i=0}^{M} \big(\hat{E}(\vec{d}_i) + \lambda \hat{D}(\vec{d}_i)\big)\right\}, \qquad (5.7)$$

where $\vec{d}_i$ denotes the $i$th row vector in a distribution matrix $D$, representing the share of each block class that the $i$th disk gets. Note that this simplified formulation requires general models that predict disk-level energy and delay from an arbitrary distribution vector satisfying certain constraints, which will be presented later.

Block Classification and Features

As mentioned earlier, we classify video blocks according to their ranges of popularity and MWT. Within each of the resulting groups, we need to measure the aggregate contribution of all blocks in the group and use it as a feature value to train the prediction models. This is only possible if the blocks in each class share certain important characteristics and contribute to energy and delay in ways that can be easily captured mathematically.

[Figure 5.2: Hit rate distribution for leading blocks for inter-arrival times of (a) 100 ms and (b) 16 ms]

The leading blocks (LBs) have zero MWT and are particularly important for overall energy efficiency, as analyzed in Chapter 4. The remaining non-leading blocks (NLBs) have different characteristics according to their MWTs. Those with short MWTs behave similarly to LBs in the sense that they are more likely to cause mode transitions, though they are less influential. In addition, they are also more likely to be cached by energy-aware caching algorithms. Therefore, it makes sense to treat LBs, short-MWT blocks (SMBs) and high-MWT blocks (HMBs) separately. Along this dimension, we use these three MWT classes, which require only one parameter: the threshold on MWT separating SMBs and HMBs.

For both LBs and NLBs, blocks with different popularities also behave differently. Most importantly, the more popular blocks are more likely to be cached, thereby reducing the costs associated with disk access. To look deeper into caching behaviors, we plot the cache hit rate against popularity for various workload levels in Figures 5.2 and 5.3. All the videos are divided into 450 bins according to their popularity, and a hit rate is calculated for each of these bins.
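A sketch of this binning measurement follows, assuming per-block hit and access counters exported by the simulator; the function and argument names are hypothetical.

```python
import numpy as np

def binned_hit_rates(ranks, hits, accesses, n_bins=450):
    """Per-bin cache hit rates over popularity ranks (cf. Figs. 5.2-5.3)."""
    ranks = np.asarray(ranks, dtype=float)
    hits = np.asarray(hits, dtype=float)
    accesses = np.asarray(accesses, dtype=float)
    edges = np.linspace(ranks.min(), ranks.max(), n_bins + 1)
    idx = np.clip(np.digitize(ranks, edges) - 1, 0, n_bins - 1)
    rates = np.zeros(n_bins)
    for b in range(n_bins):
        total = accesses[idx == b].sum()
        if total > 0:
            rates[b] = hits[idx == b].sum() / total
    return rates
```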
In low-load scenarios (inter-arrival time equals 100ms), the frequency of cache replacement is lower than the high-load cases (inter-arrival time equals 16ms) and therefore allows more blocks to sit in cache stably. Also, for non-leading blocks, the portion that are always cached is much smaller than leading blocks. For the purpose of feature selection, we need to identify these groups of blocks and remove them from the training process. In addition, for the leading blocks there are clear transitions from “always cached” to “never cached”. The transitional region looks narrow on Figure 5.2, but they make up quite a significant part of the total workload. Thus, it is beneficial to single out this class of leading blocks. To measure the contribution of this class of blocks, we can use divide it further into smaller groups; within each of these groups we can assume each block contribute in proportion to its popularity. 91 Caching statics of non-leading blocks as shown in Figure 5.3. It can be seen that hit rate distributed exponential with respect to popularity ranking in the high-load (16ms) case, while in low-load case we see most blocks have low hit-rate, which means that they do not benefit too much from caching. Exponential behavior is expected because that is the distribution of video popularities is essentially exponential, as we showed earlier in Chapter 3. When the service load is very low, however, the frequency of being requested for the moderately popular content is not high enough for them to be kept in cache. Nevertheless, the distribution is still largely exponential, only with a faster decay coefficient and more fluctuation in the low-popularity region. When load level increases, the rate of decay becomes slower and smoother. Although the ideal way to take into account the hit-rate distribution in this case is to multiply the popularity of each block with a multiplier that decays exponentially just as the hit rates, such an approach would complicate the later stages of our approach. In particular, it would be hard to impose the necessary constraints (e.g. cap on disk load and making sure all block groups are fully placed) and convert feature vector to a placement scheme. Therefore, we choose to use the same approach as we did to the leading blocks: divide the blocks into groups based on their popularity range and assume each block of a group contribute in proportion to its popularity. Finally, we also remove the very tail part of the blocks because they are very rarely requested. If we plot the CDF of the video popularity ranks that are requested, the final 25% of the least popular videos get only 0.044% of the requests. Therefore, it is safe to ignore this 25% part, which not only improves learning accuracy but also benefits the optimization process by reducing space requirement, as we shall see in Section 5.3.3. It is clear the caching statistics of different service load levels vary a lot. Therefore, it is ideal to separate the learning process for different inter-arrival times. However, this does not make sense in practice, as it is impossible to create a separate model for all 92 the possible load levels. A simpler approach is to put the request inter-arrival time as an extra feature in the feature vector and let machine learning techniques deal with the discrepancies. Learning Prediction Models To obtain accurate prediction models for disk-level energy consumption and service delay, we apply Support Vector Regression (SVR) [28] using the libSVM toolbox [13]. 
Training data are generated by running the simulation with the same system configuration under different placement schemes and arrival rates. To cover the wide range of possibilities, we created a large number of placement schemes, including random placements, balanced placements and different levels of concentration schemes, among many others. In total we gathered around 4000 data points as training data.

From these data points, we classify blocks according to the aforementioned guidelines and use the class distribution in learning. In particular, we normalize the class distribution row vectors (i.e., one per disk) and use the concatenation of each row vector with the arrival rate as a feature vector. A data point thus encodes the information of all blocks stored on a disk, together with the workload level (arrival rate). For each arrival rate, a single simulation run suffices to obtain the hit-rate distribution, as plotted in Figures 5.2 and 5.3. We can then identify the blocks that are always cached and ensure that none of the subsequent steps considers them.

To get the best performance, we start from the simplest classification (only two classes, LBs and NLBs) and gradually increase the number of classes (features) by splitting groups as described earlier. The classification schemes we tried include the following:

1. LBs and NLBs (2 classes).
2. Split LBs according to their popularities. The threshold is selected, based on the hit-rate distribution, to be the point where the hit rate falls below 10% (3 classes).
3. Split NLBs based on MWT. The threshold is selected, by experiment, to be twice the scheduling window (4 classes).
4. Split each of the NLB classes into two based on popularity. We use the popularity ranking of 12,000 as the threshold (6 classes).
5. Apply one more popularity threshold, selected to be the popularity ranking of 48,000 (8 classes).

The thresholds are selected based on both the hit-rate distribution and the balance of the distribution among classes. For each classification scheme, we use the normalized load assignment values as features. We applied SVR with a Gaussian kernel to energy prediction and SVR with a linear kernel to delay prediction. The results are evaluated using 5-fold cross-validation, and the squared correlation coefficient (SCC) is used to assess the prediction accuracy, as shown in Figure 5.4.

[Figure 5.4: Learning results for different numbers of classes]

We can see that the first two splits (splitting LBs, and the MWT-based classification) improve the prediction accuracy significantly, indicating that these classification techniques are very effective. Subsequent splitting based on the popularity of NLBs continues to improve results, albeit much less noticeably. Compared to the 6-class case, the SCC for delay decreases when the number of classes increases to 8. Therefore, we choose to use 6 classes in our learning process, which gives SCC values of 0.94 and 0.98 for delay and energy prediction, respectively.

5.3.3 Optimization and Final Placement

In the previous section, we built accurate prediction models. The SVR regression models can be expressed mathematically as in Equations 5.8 and 5.9. The feature vector $\vec{x}$ is the concatenation of the aggregate load distribution $\vec{d}$ with the arrival rate, and $\vec{x}_n$ denotes the feature vector of the $n$th training data point.
$$\hat{E}(\vec{x}) = \sum_{n=1}^{N} \omega_n \exp\big(-\gamma\, \|\vec{x} - \vec{x}_n\|^2\big) + b \qquad (5.8)$$

$$\hat{D}(\vec{x}) = \sum_{n=1}^{N} \alpha_n\, (\vec{x} \cdot \vec{x}_n) + c \qquad (5.9)$$

As we are dealing with the static placement problem, the optimization problem is solved for a constant arrival rate. Since a placement scheme needs to allocate all the video blocks to the given set of disks, the distribution (i.e., the percentages) of each class needs to add up to 1. This equality constraint is placed on the column vectors of the distribution matrix $D$. Furthermore, as discussed in Section 5.2, we need to impose a cap on the total load of each disk, which can be calculated using Equation 5.5; this inequality constraint is applied to each row vector of $D$. For simplicity, we use $A$, $B$, $A'$ and $B'$ to represent the matrices encoding these constraints. Their computation is trivial and is therefore omitted here. The optimization task can now be formulated as in Equation 5.10:

$$\text{minimize}_{D} \ \sum_{i=0}^{M} \big(\hat{E}(\vec{d}_i) + \lambda \hat{D}(\vec{d}_i)\big) \quad \text{subject to} \quad AD \le B \ \text{and} \ A'D = B'. \qquad (5.10)$$

There is yet one more important constraint to consider: disk capacity. This constraint can be either the physical capacity of the disk or some target level set by the application to ensure quality of service. In either case, we can impose a constant constraint on how many blocks can be stored on each disk. To incorporate this constraint into the optimization formulation in 5.10, we need to write it in terms of the distribution matrix, which contains the load distribution of each class. While it is not possible to deduce the number of blocks from an arbitrary disk load distribution (i.e., a row vector $\vec{d}_i$), we can obtain the minimum number of blocks (MNB) given the cumulative load level (CLL) of a class (i.e., the cumulative distribution of block popularity in the class). By sorting the blocks in each class by their arrival rates, we can plot a CLL-MNB curve for each class, as exemplified in Figure 5.5.

[Figure 5.5: Relationship between cumulative load and space requirement]

These load-to-space functions give lower bounds on the space requirement for given input load levels. More specifically, given a scalar value $d_{i,j}$, the load level of the $j$th class on the $i$th disk, we can get the minimum number of blocks needed to satisfy the load assignment. If we sum up the lower bounds of all classes, we get the minimum space requirement for the disk, as written in Equation 5.11:

$$S_i \ge \sum_{j=1}^{n} f_j(d_{i,j}). \qquad (5.11)$$

The lower bound is achieved if we start placing the most popular blocks of the class until the load assignment is satisfied, i.e., allocating the head part of the curve. As we can see, the curves in Figure 5.5 grow exponentially, which means that the tail part requires far more space than the head part. Therefore, we cannot apply these bounds separately in a multi-disk system. Instead, we need to consider the cumulative load assignment. Taking the load assignment in Table 5.2 as an example, we can derive the following bounds:

Table 5.2: An example of class distribution.
       | Group 1 | Group 2
Disk 1 | 80%     | 20%
Disk 2 | 10%     | 50%

$$S_1 \ge b_1 = f_1(0.8) + f_2(0.2)$$
$$S_2 \ge b_2 = f_1(0.1) + f_2(0.5)$$
$$S_1 + S_2 \ge b_3 = f_1(0.9) + f_2(0.7)$$

The minimum per-disk space requirement for the 2-disk system is $\max(b_1, b_2, b_3/2)$. If $b_3/2$ is greater than both $b_1$ and $b_2$, it gives the lower bound of the required space for both disks.
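A small sketch of how the per-class load-to-space function and the two-disk bounds above can be computed; the Zipf-distributed rates are synthetic placeholders rather than the actual trace data.

```python
import numpy as np

def load_to_space_curve(block_rates):
    """Build the CLL-to-MNB function f_j of one class: sort blocks by
    arrival rate (most popular first) and accumulate their load shares."""
    rates = np.sort(np.asarray(block_rates, dtype=float))[::-1]
    cll = np.cumsum(rates) / rates.sum()   # cumulative load level
    def f(load):
        if load <= 0:
            return 0
        return int(np.searchsorted(cll, min(load, cll[-1])) + 1)
    return f

# Two synthetic classes with Zipf-like rates, mirroring Table 5.2:
f1 = load_to_space_curve(np.random.default_rng(1).zipf(1.5, 1000))
f2 = load_to_space_curve(np.random.default_rng(2).zipf(1.5, 1000))
b1 = f1(0.8) + f2(0.2)            # per-disk bound for disk 1
b2 = f1(0.1) + f2(0.5)            # per-disk bound for disk 2
b3 = f1(0.9) + f2(0.7)            # cumulative bound for both disks
per_disk_requirement = max(b1, b2, b3 / 2)
```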
Equality is achievable by exchanging the head and tail parts of the curve within the same class, which can be done at very fine granularity thanks to the large number of blocks with varying arrival rates. Therefore, by considering the cumulative bound, we can estimate the per-disk space requirement. If we assume all disks in the system share the same capacity constraint, the per-disk space requirement estimated in this way can be used to check whether that constraint can be satisfied.

To extend the 2-disk space estimation to a general many-disk system, we have to loop over all disk indices and classes and calculate both the per-disk bounds and the cumulative bounds. In theory, we would need to evaluate every element of the power set of $M$ disk indices, which makes the complexity grow exponentially with the number of disks. In practice, however, we can use a greedy approach to speed it up. We know from the load-space relationship illustrated in Figure 5.5 that the space requirement increases monotonically with the load level. Therefore, for each class, we only need to check the disk with the highest load assignment, followed by the sum of the top two disks, and so on. By evaluating these bounds, we are guaranteed to obtain the maximum space requirement over all disks.

Our space estimation algorithm follows this simple guideline. For each class $j$, we sort the disk indices by the partial sum of load levels, $\sum_{k=1}^{j} d_{i,k}$, creating a list that prioritizes the disks. Then, we go through the list one by one and calculate the maximum disk requirement. In this way, the General Space Estimation Algorithm (GSEA) runs in $O(nM\log M)$ time. The details are summarized in Algorithm 1. By applying GSEA, we can compute the space requirement for any distribution allocation matrix $D$. If we know the space constraint $\zeta$, the optimization becomes:

$$\text{minimize}_{D} \ \sum_{i=0}^{M} \big(\hat{E}(\vec{d}_i) + \lambda \hat{D}(\vec{d}_i)\big) \quad \text{subject to} \quad AD \le B, \ A'D = B' \ \text{and} \ S(D) \le \zeta. \qquad (5.12)$$

Algorithm 1 General Space Estimation Algorithm
procedure ESTIMATE_SPACE({d_{i,j}})                ▷ coefficients in D
    S ← 0
    for j = 1 to n do                              ▷ loop over n classes
        sort disk indices by Σ_{k=1..j} d_{i,k} in descending order; store in SL
        for i = 1 to M do                          ▷ loop over M disks
            ii ← SL(i)                             ▷ get disk index from the sorted list
            S ← max{ S, Σ_{k=1..n} f_k(d_{ii,k}), (1/i) Σ_{k=1..n} f_k(Σ_{l=1..i} d_{SL(l),k}) }
        end for
    end for
    return S
end procedure
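A direct Python rendering of Algorithm 1 follows, reusing per-class load-to-space callables such as those built in the earlier sketch; it is a simplified illustration, not a production implementation.

```python
def estimate_space(d, f):
    """Python rendering of Algorithm 1 (GSEA).

    d[i][j]: load of class j assigned to disk i (M disks, n classes);
    f[k]:    load-to-space function of class k."""
    M, n = len(d), len(d[0])
    S = 0.0
    for j in range(n):
        # prioritize disks by the partial load sum over classes 0..j
        SL = sorted(range(M), key=lambda i: sum(d[i][: j + 1]), reverse=True)
        for rank, ii in enumerate(SL, start=1):
            per_disk = sum(f[k](d[ii][k]) for k in range(n))
            cumulative = sum(
                f[k](sum(d[SL[l]][k] for l in range(rank))) for k in range(n)
            ) / rank
            S = max(S, per_disk, cumulative)
    return S
```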
Now that we have formulated the optimization problem with all the necessary constraints, we can proceed to solve it. Unfortunately, this is difficult for two reasons. First, the equality constraints make the problem non-convex, which means that we may get stuck in local minima during optimization. Second, the space constraint is calculated through a complex procedure and is highly nonlinear. As most optimization tools only deal with linear constraints, the problem in Equation 5.12 would otherwise require very sophisticated tools (e.g., genetic algorithms).

One effective way to deal with the non-convexity is to start from different initial points and repeat the optimization process several times. Although it is not guaranteed that we reach the true global minimum, this allows us to keep the best-performing local minimum, which in general works reasonably well. As most off-the-shelf optimization toolboxes can handle this issue, such an approach is easily applicable.

To overcome the nonlinear constraint, we first removed the space constraints and tried to solve the optimization in Equation 5.10 using the interior point method [25]. During optimization, we calculated the space requirement at each step. We repeated this procedure for different initial points and arrival rates. It turns out that in all cases the space requirement increases monotonically as we approach the optima. This is not surprising: as the space requirement increases, the bulk of the workload gets allocated onto a smaller set of disks, which means we have more lightly loaded disks and can apply more low-power modes. In a way, the optimization process gradually concentrates the workload to improve energy efficiency.

Therefore, the partially optimized results actually satisfy a continuously increasing set of space constraints. If we reduce the optimization step size, we can obtain results that satisfy any practical constraint on disk capacity. Although this approach cannot guarantee the optimality of the result, we are certain to get gradually improved results by repeating the process in an informed way. Recall that we ignore a considerable portion of blocks in the feature extraction stage (the always-cached blocks and those in the heavy tail). Therefore, when the optimization starts, the space requirement is almost always considerably lower than the practical minimum (i.e., when all disks store the same number of blocks). Hence, by the time the minimum space constraint is reached, the result is guaranteed to be better than the initial point, and the same holds for all the ensuing steps. After we obtain a load assignment scheme from a partial result satisfying a certain constraint, we can modify it slightly so that only the least significant part of the load distribution (i.e., the class covering the least popular range of NLBs with long MWTs) changes while the space requirement decreases. Then, we use this modified distribution as our new input and run the optimization again, repeating until the result converges.

When LEPO is applied in practice to large-scale systems, the execution of GSEA may become the bottleneck and slow down the process. For one thing, we used in GSEA the load-space relation from raw data, whose size can be huge in practice; we would basically need to store a giant look-up table, which could be a real issue. Also, GSEA needs to be run on every disk, for every class, at every optimization step; when the number of disks is very large, this can become very slow.

To solve the first issue, we can use the sum of two exponential functions to model the curves in Figure 5.5. The curve fitting is very accurate (the $R^2$ coefficient is over 0.99), so we only need to store four parameters per class instead of a huge look-up table. For the second problem, we can focus on the highly loaded disks and omit some of the least loaded disks in GSEA. For example, if a class load assignment changes from 70% to 80%, the consequence for the energy-delay cost can be large; in contrast, the difference between a 0.1% and a 0.2% assignment is negligible. Hence, we can apply a threshold for each class and only consider the selected disks.
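For the curve-fitting shortcut above, here is a sketch using SciPy's curve_fit on synthetic data; the coefficients are illustrative only, not the fitted values from our system.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_exp(x, a1, b1, a2, b2):
    """Sum of two exponentials replacing a raw CLL-to-MNB look-up table."""
    return a1 * np.exp(b1 * x) + a2 * np.exp(b2 * x)

cll = np.linspace(0.01, 1.0, 100)                      # cumulative load levels
mnb = 5 * np.exp(3.0 * cll) + 0.5 * np.exp(7.0 * cll)  # synthetic MNB samples

params, _ = curve_fit(two_exp, cll, mnb, p0=(1.0, 2.0, 1.0, 6.0), maxfev=10000)
pred = two_exp(cll, *params)
r2 = 1 - np.sum((mnb - pred) ** 2) / np.sum((mnb - mnb.mean()) ** 2)
# Only the four fitted parameters per class need to be stored.
```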
The final conversion step requires some additional decisions, and there are multiple ways to make them. One simple approach is to place the blocks greedily, mirroring the space estimation: use the head part of each class to populate the most heavily loaded disks. After this initial placement, we can exchange blocks between disks so that the load distribution stays the same while the numbers of blocks are evened out.

To sum up, the LEPO algorithm works as follows:

1. Create a range of different data layouts for the system under different arrival rates.
2. Extract features from these data layouts to create load assignment matrices as training data. Run simulations for all these layouts; the resulting disk-level energy and delay values become the training labels.
3. Apply SVR with a Gaussian kernel for energy and SVR with a linear kernel for delay. Record the two prediction models.
4. Use the interior point method to solve the optimization task in Equation 5.10 step by step. At each step, apply GSEA to compute the space requirement. Record all the partial results and their space requirements.
5. Repeat Step 4 for different initializations until reasonable results are obtained. Initial points can be designed heuristically, randomly, or by modifying previously obtained partial results.
6. Convert the resulting load assignment to an actual placement scheme.

The space requirement can be linked to the system cost of purchasing or upgrading disk drives. A simple way to quantify this cost is to compute the ratio of the space requirement to the minimum average space occupancy, which is achieved when all disks store the same number of blocks, as shown in Equation 5.13:

$$F(D) = \frac{S(D)}{S_0}. \qquad (5.13)$$

The capacity cost factor is not only a convenient way to describe the space constraint, but it can also be used to express the joint energy-delay-cost optimization, as shown in Equation 5.14:

$$\text{minimize}_{D} \ \sum_{i=0}^{M} \big(\hat{E}(\vec{d}_i) + \lambda \hat{D}(\vec{d}_i)\big) + \mu F(D) \quad \text{subject to} \quad AD \le B, \ A'D = B' \ \text{and} \ F(D) \le F_0. \qquad (5.14)$$

In practical applications, other forms of the capacity cost factor might be used in this joint optimization. The procedure would be similar to the one described above: instead of taking different partial results, we can calculate the joint cost for different capacity cost factors and select the one that gives the minimum total cost. This approach would be very helpful in the planning stage when setting up or upgrading a large-scale video service.

5.4 Experimental Results and Evaluation

The simulation environment is the same as that in Chapter 4. We use the same simulator based on the Seagate Cheetah 15k.7 disk [85]. The simulation and system parameters can be found in Table 3.2, and the low-power modes were simulated according to Table 4.1. In this section, we compare the performance of the SAC heuristic placement and the LEPO algorithm.

In both algorithms, we need to set a cap on the maximum disk load, which in practice is done by setting an upper limit on the average waiting time and then applying Equation 5.5. In our experimental setting (where $T$ is 0.15 second and $n_l$ is 4), we cap the average waiting time at 0.5 second, which is looser than the delay levels in most of our previous results. According to Inequality 5.4 and Equation 5.5, the maximum utilization rate is around 93% and the maximum arrival rate per disk is around 25 workloads per second. We ran everything using the EDOC+PEEP algorithm. During the learning and optimization process, we fixed the Lagrangian multiplier used in the power mode decision at 5.
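Plugging the stated parameters into Inequality 5.4 and Equation 5.5 confirms the quoted limits:

$$u \le \frac{t_c\, n_l}{T + t_c\, n_l} = \frac{0.5 \times 4}{0.15 + 0.5 \times 4} = \frac{2}{2.15} \approx 0.93,$$

$$l_c = n_l\left(\frac{1}{T} - \frac{1}{T + t_c\, n_l}\right) = 4\left(\frac{1}{0.15} - \frac{1}{2.15}\right) \approx 24.8 \ \text{workloads per second}.$$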
5.4 Experimental Results and Evaluation

The simulation environment is the same as that in Chapter 4. We use the same simulator based on the Seagate Cheetah 15k.7 disk [85]. The simulation and system parameters can be found in Table 3.2, and the low-power modes were simulated according to Table 4.1. In this section, we compare the performance of the SAC heuristic placement and the LEPO algorithm. All experiments were run with the EDOC+PEEP algorithm, and during the learning and optimization process we fixed the Lagrangian multiplier used in the power mode decision at 5. In both algorithms, we need to set a cap on the maximum disk load, which can be done in practice by setting an upper limit on the average waiting time and then applying Equation 5.5. In our experimental setting (where T is 0.15 second and n_l is 4), we cap the average waiting time at 0.5 second, which is higher than most of our previous results. According to Inequality 5.4 and Equation 5.5, the maximum utilization rate is around 93% and the maximum arrival rate per disk is around 25 workloads per second.
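As a back-of-the-envelope check of these numbers, the sketch below assumes an M/M/1-style waiting-time relation W = ρs/(1 − ρ) with effective per-request service time s = T/n_l as a stand-in for Equation 5.5; this assumed form is ours, but it reproduces the quoted values.

```python
T = 0.15     # scheduling round length in seconds (experimental setting)
n_l = 4      # parallelism factor from the experimental setting
W_cap = 0.5  # cap on the average waiting time in seconds

s = T / n_l                    # assumed effective per-request service time
rho_max = W_cap / (W_cap + s)  # invert W = rho * s / (1 - rho) at the cap
lam_max = rho_max / s          # arrival rate at the maximum utilization

print(f"rho_max ~ {rho_max:.0%}, lambda_max ~ {lam_max:.1f} workloads/s")
# -> rho_max ~ 93%, lambda_max ~ 24.8, matching the values quoted above
```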
As mentioned earlier, we gathered 4000 data points with the 7 features and two labels (energy and delay). The optimization involves some repetition due to the non-convexity and the partial results needed to satisfy the space constraints. We plot the best optimization results for two different inter-arrival times, 100 ms and 30 ms, in Figure 5.6. The actual energy consumption corresponds well with the values predicted by the model. The predicted delay is not shown because disk-level delay and average service delay are of different orders of magnitude; however, they correlate very well. Furthermore, both energy and delay decrease when the available capacity increases.

[Figure 5.6: The optimization results for two arrival rates. (a) Inter-arrival time: 100 ms. (b) Inter-arrival time: 30 ms.]

When the workload level increases, the benefit of having more disk capacity diminishes, as evidenced by the smaller slope of the curves in Figure 5.6b compared with Figure 5.6a. In fact, when we continue to increase the load level and reduce the inter-arrival time to 16 ms (average disk utilization rises to 80%), relaxing the space constraint makes very little difference.

To compare the performance of SAC and LEPO, we plot the energy consumption for the two inter-arrival times, 100 ms and 30 ms, under service delays of 0.1 sec and 0.12 sec for four different capacity cost factors in Figure 5.7. The percentage savings are listed in Table 5.3. In addition, the results under an inter-arrival time of 16 ms, which corresponds to a very high level of average disk utilization, are shown in Figure 5.8 for the case where F = 1.

[Figure 5.7: Comparison of energy consumption with low delay levels. (a) 100 ms, 0.1 sec. (b) 100 ms, 0.12 sec. (c) 30 ms, 0.1 sec. (d) 30 ms, 0.12 sec.]

As we mentioned earlier, increasing capacity does not improve the energy-delay performance in this case. The results demonstrate the effectiveness of LEPO for different server load levels and space capacity constraints. In general, more saving can be achieved when the average delay level is lower. When the average delay is lower, mode transitions become more costly; since optimizing the data placement with LEPO has the overall effect of reducing mode transitions and improving scheduling for the whole system, its advantage translates into larger savings in the low-delay scenario. In addition, the percentage of saving decreases when the system load becomes heavier. This is because it is the inactive part of the energy consumption that can be further reduced, and when the system load is lighter, there is more room to optimize.

[Figure 5.8: Comparison of energy consumption when the system is close to fully utilized.]

Moreover, it is noteworthy that the energy saving of LEPO generally goes down as the space constraint becomes more relaxed. The reason is that when we can afford more space, workloads are effectively concentrated onto a subset of the disks, thereby restricting the effect of LEPO to those active disks. Hence, it amounts to optimizing the placement over a smaller system, which makes the relative saving smaller. Note that this effect disappears as the system becomes close to fully loaded, as in the case where the inter-arrival time reaches 16 ms. Finally, it is interesting to note that the energy consumption given by LEPO under the minimum capacity requirement (F = 1.0) is lower than that of SAC under the maximum capacity requirement (F = 2.21 or F = 2.45). This means that even with less than half of the disk capacity, LEPO still provides schemes that outperform SAC.

Table 5.3: Energy saving under low delay levels.

(a) Inter-arrival time: 100 ms
           F = 1.0   F = 1.13   F = 1.49   F = 2.21
0.1 sec    14.9%     14.6%      14.8%      14.2%
0.12 sec   15.2%     14.0%      11.5%      10.0%

(b) Inter-arrival time: 30 ms
           F = 1.0   F = 1.18   F = 1.60   F = 2.45
0.1 sec    11.4%     12.0%      8.7%       9.6%
0.12 sec   8.2%      8.4%       7.4%       8.0%

To summarize, LEPO improves upon the heuristic concentration technique under different service load levels, service delay levels, and disk capacity constraints. By accurately predicting energy and delay from features, LEPO is able to create data placement schemes that lead to better energy efficiency. Compared to the SAC heuristic, LEPO can save up to 15% energy under low levels of service delay. Even when the system is about 80% utilized, it consumes 9% to 11% less energy with LEPO than with SAC.

5.5 Conclusion

In this chapter, we propose an effective and practical learning-based optimization technique for static data placement in video servers. To simplify the data placement problem in the context of a multi-disk environment, which is known to be NP-hard, we use a concise set of features to represent the complete data allocation schemes. Then, we apply machine learning techniques to obtain accurate prediction models for energy and delay, with which we are able to formulate the optimization problem mathematically. Finally, although this optimization task is not readily approachable, we develop a work-around and solve a simplified problem that yields energy-efficient data layouts. Experiments show that the proposed LEPO algorithm can create data placement schemes that work under a wide range of service load levels and save energy under different delay and capacity constraints.

Chapter 6

Conclusion and Future Work

In this dissertation, we demonstrated that energy and service delay can be jointly optimized in large-scale video sharing servers. More specifically, we investigated the characteristics of video-sharing workload in a parallel multi-disk storage system and developed a model for the disk idle time to assist energy-aware decisions and operations. Based on the model, we proposed a novel prediction-based mode decision (PMD) algorithm, which makes optimal power mode decisions at the disk level. Compared to the traditional threshold-based approach, our algorithm can save 5.7%–15.0% (and 16%–35.0% with DRPM) more energy under the same delay constraint. Furthermore, PMD can offer a low-delay guarantee that is not achievable with the traditional threshold-based mode decision (TMD) approach.

Then, we extended our energy-delay analysis and examined the effect of cache management policies. In particular, we studied how to optimize the disk access sequence through more efficient caching so that we could improve workload schedulability and reduce the mode transition overhead. We proposed energy-delay-optimized cache replacement (EDOC), which includes a novel energy-aware caching utility. We also incorporated prefetching into our energy-delay optimization framework and introduced the prediction-based energy-efficient prefetching (PEEP) algorithm. The combination of EDOC and PEEP can effectively increase the average disk idle length and significantly improve the overall energy-delay performance of the VSS storage system, especially when delay constraints are very tight: under PEEP+EDOC, the disk subsystem consumes 22%–29% (and around 40% with DRPM) less energy under tight delay constraints.
Furthermore, we addressed the problem of data placement in the context of multi-disk data allocation. Although the problem is NP-hard in its original form, since the problem size grows exponentially with the scale of the system, it becomes mathematically tractable once a set of representative features is used to describe the data-disk assignment. We extracted the features and trained support vector regression (SVR) models to establish the relationship between the assignment feature matrix and disk-level energy and delay values. Finally, we used the learned models to formulate a constrained optimization task and presented an efficient approach to solve the problem. The resulting data placement schemes created by our algorithm consume considerably less energy (up to 15%) than the best-working heuristics for a wide range of conditions, including different service load levels, delay requirements, and space capacity constraints.

To sum up, we showed that the dual optimization of energy and delay can be applied to different procedures of VSSs to improve the overall energy efficiency. While all these techniques are effective and applicable, they all assume a stationary workload. In practical VSSs, we observe variations not only in the server load but also in the video database, where the number of videos increases over time and the popularity of each video is constantly changing. To deal with dynamic workloads, we need a prediction model and an adaptation mechanism. More specifically, the system should be able to detect workload changes and react to them in a timely manner. In most energy management schemes, these two stages are separated. For example, Tan et al. applied reinforcement learning to predict workload behavior [91]; their system makes power mode decisions based on its observations of power consumption and queue length. To apply similar approaches to our problem, we need a workload model that captures both the video-sharing workload and the parallel video retrieval process. Alternatively, it is also possible to combine workload prediction and energy management decisions. In [26], the authors included several well-known power management policies (e.g., fixed timeout, adaptive timeout, stochastic modeling) and learned their "success probabilities" online. Although only one policy is applied at a time, the potential performance of all the others is evaluated as well, which contributes to updating the learned weights. In this way, the scheme adaptively picks the best policy in real time. The goal is to design an energy-efficient video-sharing server that works robustly in optimal configurations under a changing and scalable environment.
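To illustrate the flavor of such a combined scheme, the sketch below implements a generic multiplicative-weights policy selector; the policy names, the loss scale, and the update rule are our illustrative assumptions rather than the exact algorithm of [26].

```python
import random

policies = ["fixed_timeout", "adaptive_timeout", "stochastic_model"]
weights = {p: 1.0 for p in policies}  # one learned weight per candidate policy
eta = 0.1                             # learning rate of the weight update

def choose_policy():
    # Sample a policy with probability proportional to its current weight.
    total = sum(weights.values())
    r, acc = random.uniform(0.0, total), 0.0
    for p in policies:
        acc += weights[p]
        if r <= acc:
            return p
    return policies[-1]

def update_weights(losses):
    # losses[p] in [0, 1]: the energy-delay cost each policy *would* have
    # incurred on the last idle period; all policies are evaluated even
    # though only one was applied, mirroring the idea in [26].
    for p in policies:
        weights[p] *= (1.0 - eta) ** losses[p]
```

Per idle period, the server would apply choose_policy(), evaluate all candidates in hindsight, and call update_weights(), so the selection gradually shifts toward the policy that best matches the current workload.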
Bibliography

[1] ALMEIDA, V., BESTAVROS, A., CROVELLA, M., AND DE OLIVEIRA, A. Characterizing reference locality in the WWW. In Fourth International Conference on Parallel and Distributed Information Systems (Dec. 1996), pp. 92–103.

[2] ANDERSON, C. The Long Tail: Why the Future of Business Is Selling Less of More. Hyperion, 2006.

[3] ARMBRUST, M., FOX, A., GRIFFITH, R., JOSEPH, A. D., KATZ, R. H., KONWINSKI, A., LEE, G., PATTERSON, D. A., RABKIN, A., STOICA, I., AND ZAHARIA, M. Above the clouds: A Berkeley view of cloud computing. Tech. Rep. UCB/EECS-2009-28, EECS Department, University of California, Berkeley, Feb. 2009.

[4] BARROSO, L., AND HOLZLE, U. The case for energy-proportional computing. Computer 40, 12 (Dec. 2007), 33–37.

[5] BENINI, L., BOGLIOLO, A., AND DE MICHELI, G. A survey of design techniques for system-level dynamic power management. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 8, 3 (June 2000), 299–316.

[6] BIANCHINI, R., AND RAJAMONY, R. Power and energy management for server systems. Computer 37, 11 (Nov. 2004), 68–76.

[7] BOSTOEN, T., MULLENDER, S., AND BERBERS, Y. Power-reduction techniques for data-center storage systems. ACM Computing Surveys 45, 3 (2013).

[8] BROOKS, D., AND MARTONOSI, M. Dynamic thermal management for high-performance microprocessors. In The Seventh International Symposium on High-Performance Computer Architecture (HPCA) (2001), pp. 171–182.

[9] BROWN, D. J., AND REAMS, C. Toward energy-efficient computing. Queue 8, 2 (Feb. 2010), 30:30–30:43.

[10] CARRERA, E. V., PINHEIRO, E., AND BIANCHINI, R. Conserving disk energy in network servers. In Proceedings of the 17th Annual International Conference on Supercomputing (New York, NY, USA, 2003), ICS '03, ACM, pp. 86–97.

[11] CHA, M., KWAK, H., RODRIGUEZ, P., AHN, Y.-Y., AND MOON, S. I tube, you tube, everybody tubes: analyzing the world's largest user generated content video system. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (New York, NY, USA, 2007), IMC '07, ACM, pp. 1–14.

[12] CHANDRA, S., AND VAHDAT, A. Application-specific network management for energy-aware streaming of popular multimedia formats. In USENIX Annual Technical Conference, General Track (2002), C. S. Ellis, Ed., USENIX, pp. 329–342.

[13] CHANG, C.-C., AND LIN, C.-J. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2 (2011), 27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[14] CHEN, G., HE, W., LIU, J., NATH, S., RIGAS, L., XIAO, L., AND ZHAO, F. Energy-aware server provisioning and load dispatching for connection-intensive internet services. In Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation (Berkeley, CA, USA, 2008), NSDI '08, USENIX Association, pp. 337–350.

[15] CHENG, X., DALE, C., AND LIU, J. Dataset for statistics and social network of YouTube videos. http://netsg.cs.sfu.ca/youtubedata/, 2008.

[16] CHERKASOVA, L., AND GUPTA, M. Analysis of enterprise media server workloads: access patterns, locality, content evolution, and rates of change. IEEE/ACM Transactions on Networking 12, 5 (2004), 781–794.

[17] CHERKASOVA, L., AND STALEY, L. Building a performance model of streaming media applications in utility data center environment. In CCGRID (2003), IEEE Computer Society, pp. 52–59.

[18] CHOU, C.-F., GOLUBCHIK, L., AND LUI, J. Striping doesn't scale: how to achieve scalability for continuous media servers with replication. In Proceedings of the 20th International Conference on Distributed Computing Systems (2000), pp. 64–71.

[19] CHRISTENSEN, K., REVIRIEGO, P., NORDMAN, B., BENNETT, M., MOSTOWFI, M., AND MAESTRO, J. IEEE 802.3az: the road to energy efficient Ethernet. Communications Magazine, IEEE 48, 11 (Nov. 2010), 50–56.

[20] CHURCH, K., GREENBERG, A., AND HAMILTON, J. On delivering embarrassingly distributed cloud services. In HotNets (2008).

[21] CISCO SYSTEMS INC. Cisco visual networking index - forecast and methodology, 2010-2015. Cisco White Paper, 2011.

[22] COLARELLI, D., AND GRUNWALD, D. Massive arrays of idle disks for storage archives. In SC (2002), pp. 1–11.

[23] CORP, E. EMC Symmetrix VMAX 10K storage system, 2012.
[24] DAN, A., SITARAM, D., AND SHAHABUDDIN, P. Scheduling policies for an on-demand video server with batching. In ACM Multimedia (1994), M. Blattner and J. O. Limb, Eds., ACM Press, pp. 15–23.

[25] DANTZIG, G. B. Linear Programming and Extensions. Princeton University Press, 1998.

[26] DHIMAN, G., AND ROSING, T. Dynamic power management using machine learning. In Proceedings of the 2006 IEEE/ACM International Conference on Computer-Aided Design (2006), ACM, pp. 747–754.

[27] DOWDY, L. W., AND FOSTER, D. V. Comparative models of the file assignment problem. ACM Computing Surveys (CSUR) 14, 2 (1982), 287–313.

[28] DRUCKER, H., BURGES, C. J., KAUFMAN, L., SMOLA, A., AND VAPNIK, V. Support vector regression machines. Advances in Neural Information Processing Systems (1997), 155–161.

[29] ECONOMOU, D., RIVOIRE, S., AND KOZYRAKIS, C. Full-system power analysis and modeling for server environments. In Workshop on Modeling, Benchmarking and Simulation (MOBS) (2006).

[30] ELNOZAHY, E., KISTLER, M., AND RAJAMONY, R. Energy-efficient server clusters. In Power-Aware Computer Systems, vol. 2325 of Lecture Notes in Computer Science. Springer, 2003, pp. 179–197.

[31] ELNOZAHY, M., KISTLER, M., AND RAJAMONY, R. Energy conservation policies for web servers. USITS '03 (2003).

[32] EPA, U. Report to Congress on server and data center energy efficiency. U.S. Environmental Protection Agency (July 2007).

[33] FAN, X., WEBER, W.-D., AND BARROSO, L. A. Power provisioning for a warehouse-sized computer. In Proceedings of the 34th Annual International Symposium on Computer Architecture (New York, NY, USA, 2007), ISCA '07, ACM, pp. 13–23.

[34] FORTE, D., AND SRIVASTAVA, A. Energy-aware video storage and retrieval in server environments. In Green Computing Conference and Workshops (IGCC), 2011 International (July 2011), pp. 1–6.

[35] GARG, R., SON, S. W., KANDEMIR, M., RAGHAVAN, P., AND PRABHAKAR, R. Markov model based disk power management for data intensive workloads. In Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (Washington, DC, USA, 2009), CCGRID '09, IEEE Computer Society, pp. 76–83.

[36] GEMMELL, D., VIN, H., KANDLUR, D., VENKAT RANGAN, P., AND ROWE, L. Multimedia storage servers: a tutorial. Computer 28, 5 (May 1995), 40–49.

[37] GILL, P., ARLITT, M., LI, Z., AND MAHANTI, A. YouTube traffic characterization: a view from the edge. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (New York, NY, USA, 2007), IMC '07, ACM, pp. 15–28.

[38] GUHA, A. The evolution to network storage architectures for multimedia applications. In Proceedings of the IEEE International Conference on Multimedia Computing and Systems - Volume 2 (Washington, DC, USA, 1999), ICMCS '99, IEEE Computer Society, pp. 9068–.

[39] GURUMURTHI, S., SIVASUBRAMANIAM, A., KANDEMIR, M., AND FRANKE, H. DRPM: dynamic speed control for power management in server class disks. In Proceedings of the 30th Annual International Symposium on Computer Architecture (June 2003), pp. 169–179.

[40] HAMIDZADEH, B., AND TSUN-PING, J. Dynamic scheduling techniques for interactive hypermedia servers. IEEE Transactions on Consumer Electronics 45, 1 (Feb. 1999), 46–56.

[41] HEATH, T., DINIZ, B., CARRERA, E. V., MEIRA, JR., W., AND BIANCHINI, R. Energy conservation in heterogeneous server clusters. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (New York, NY, USA, 2005), PPoPP '05, ACM, pp. 186–195.

[42] HENNESSY, J. L., AND PATTERSON, D. A. Computer Architecture: A Quantitative Approach. Elsevier, 2012.
[43] HUGH EVERETT III. Generalized Lagrange multiplier method for solving problems of optimum allocation of resources. Operations Research 11, 3 (1963), pp. 399–417.

[44] INTEL OPEN SOURCE TECHNOLOGY CENTER. Getting maximum mileage out of tickless (June 2007).

[45] ISCI, C., BUYUKTOSUNOGLU, A., CHER, C.-Y., BOSE, P., AND MARTONOSI, M. An analysis of efficient multi-core global power management policies: Maximizing performance for a given power budget. In 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-39 (Dec. 2006), pp. 347–358.

[46] ISCI, C., CONTRERAS, G., AND MARTONOSI, M. Live, runtime phase monitoring and prediction on real systems with application to dynamic power management. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (Washington, DC, USA, 2006), MICRO 39, IEEE Computer Society, pp. 359–370.

[47] ISCI, C., AND MARTONOSI, M. Runtime power monitoring in high-end processors: Methodology and empirical data. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (Washington, DC, USA, 2003), MICRO 36, IEEE Computer Society, pp. 93–.

[48] JIN, H., TAN, G., AND WU, S. Clustered multimedia servers: Architectures and storage systems. Annual Review of Scalable Computing (2003).

[49] KANG, X., ZHANG, H., JIANG, G., CHEN, H., MENG, X., AND YOSHIHIRA, K. Measurement, modeling, and analysis of internet video sharing site workload: A case study. In IEEE International Conference on Web Services, ICWS '08 (Sept. 2008), pp. 278–285.

[50] KIM, H., KIM, E. J., AND MAHAPATRA, R. N. Power management in RAID server disk system using multiple idle states. In Proceedings of International Workshop on Unique Chips and Systems (UCAS) (2005), pp. 53–59.

[51] KIM, M., AND SONG, M. Saving energy in video servers by the use of multi-speed disks. IEEE Transactions on Circuits and Systems for Video Technology PP, 99 (2011), 1.

[52] LE, K., BIANCHINI, R., MARTONOSI, M., AND NGUYEN, T. Cost- and energy-aware load distribution across data centers. Proceedings of HotPower (2009).

[53] LEE, J. Parallel video servers: a tutorial. IEEE Multimedia 5, 2 (Apr.-Jun. 1998), 20–28.

[54] LEE, J., AND WONG, P. Performance analysis of a pull-based parallel video server. IEEE Transactions on Parallel and Distributed Systems 11, 12 (Dec. 2000), 1217–1231.

[55] LEE, J. Y. B., AND LEE, C. H. Design, performance analysis, and implementation of a super-scalar video-on-demand system. IEEE Transactions on Circuits and Systems for Video Technology 12, 11 (2002), 983–997.

[56] LEIGH, K., AND RANGANATHAN, P. Blades as a general-purpose infrastructure for future system architectures: Challenges and solutions. HP Labs Technical Report (2007).

[57] LI, H., CHER, C.-Y., VIJAYKUMAR, T., AND ROY, K. VSV: L2-miss-driven variable supply-voltage scaling for low power. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-36 (Dec. 2003), pp. 19–28.

[58] LIM, K., RANGANATHAN, P., CHANG, J., PATEL, C., MUDGE, T., AND REINHARDT, S. Understanding and designing new server architectures for emerging warehouse-computing environments. In Proceedings of the 35th International Symposium on Computer Architecture, ISCA '08 (June 2008), pp. 315–326.

[59] LIM, M. Y., FREEH, V. W., AND LOWENTHAL, D. K. Adaptive, transparent frequency and voltage scaling of communication phases in MPI programs. In SC 2006 Conference, Proceedings of the ACM/IEEE (Nov. 2006), p. 14.
[60] LIU, C., QIN, X., KULKARNI, S., WANG, C., LI, S., MANZANARES, A., AND BASKIYAR, S. Distributed energy-efficient scheduling for data-intensive applications with deadline constraints on data grids. In IPCCC (2008), T. Znati and Y. Zhang, Eds., IEEE, pp. 26–33.

[61] LIU, C. L., AND LAYLAND, J. W. Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of the ACM 20, 1 (Jan. 1973), 46–61.

[62] MAKAROFF, D., NEUFELD, G., AND HUTCHINSON, N. An evaluation of VBR disk admission algorithms for continuous media file servers. In Proceedings of the Fifth ACM International Conference on Multimedia (New York, NY, USA, 1997), MULTIMEDIA '97, ACM, pp. 143–154.

[63] MANZANARES, A., RUAN, X., YIN, S., NIJIM, M., LUO, W., AND QIN, X. Energy-aware prefetching for parallel disk systems: Algorithms, models, and evaluation. In Eighth IEEE International Symposium on Network Computing and Applications, NCA 2009 (July 2009), pp. 90–97.

[64] MEISNER, D., GOLD, B. T., AND WENISCH, T. F. PowerNap: eliminating server idle power. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (New York, NY, USA, 2009), ASPLOS '09, ACM, pp. 205–216.

[65] MOHAPATRA, S., CORNEA, R., DUTT, N. D., NICOLAU, A., AND VENKATASUBRAMANIAN, N. Integrated power management for video streaming to mobile handheld devices. In ACM Multimedia (2003), L. A. Rowe, H. M. Vin, T. Plagemann, P. J. Shenoy, and J. R. Smith, Eds., ACM, pp. 582–591.

[66] MORAD, T. Y., WEISER, U. C., KOLODNYT, A., VALERO, M., AND AYGUAD, E. Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors. Computer Architecture Letters 5, 1 (2006), 14–17.

[67] MOUNTROUIDOU, X., RISKA, A., AND SMIRNI, E. Adaptive workload shaping for power savings on disk drives. In ICPE (2011), pp. 109–120.

[68] MOUTON, J. Enabling the vision: Leading the architecture of the future. Keynote at Server Blade Summit (2004).

[69] NARASIMHA REDDY, A., AND WYLLIE, J. I/O issues in a multimedia system. Computer 27, 3 (March 1994), 69–74.

[70] NATHUJI, R., ISCI, C., AND GORBATOV, E. Exploiting platform heterogeneity for power efficient data centers. In Fourth International Conference on Autonomic Computing, ICAC '07 (June 2007), p. 5.

[71] NATHUJI, R., AND SCHWAN, K. VirtualPower: coordinated power management in virtualized enterprise systems. In SOSP (2007), T. C. Bressoud and M. F. Kaashoek, Eds., ACM, pp. 265–278.

[72] NATHUJI, R., AND SCHWAN, K. VPM tokens: virtual machine-aware power budgeting in datacenters. In HPDC (2008), M. Parashar, K. Schwan, J. B. Weissman, and D. Laforenza, Eds., ACM, pp. 119–128.

[73] NEXSAN. Nexsan energy efficient AutoMAID technology. http://www.slideshare.net/socialnexsan/nexsan-auto-maid-marketing-report, 2009.

[74] PAPATHANASIOU, A. E., AND SCOTT, M. L. Energy efficient prefetching and caching. In Proceedings of the Annual Conference on USENIX Annual Technical Conference (Berkeley, CA, USA, 2004), ATEC '04, USENIX Association, pp. 22–22.

[75] PATEL, C. D., AND SHAH, A. J. Cost model for planning, development and operation of a data center. HP Labs Technical Report (June 2005).

[76] PINHEIRO, E., AND BIANCHINI, R. Energy conservation techniques for disk array-based servers. In Proceedings of the 18th Annual International Conference on Supercomputing, ICS '04 (2004), 68.

[77] QIU, L., PADMANABHAN, V. N., AND VOELKER, G. M. On the placement of web server replicas. In INFOCOM 2001, Twentieth Annual Joint Conference of the IEEE Computer and Communications Societies (2001), vol. 3, IEEE, pp. 1587–1596.
[78] RAGHAVENDRA, R., RANGANATHAN, P., TALWAR, V., WANG, Z., AND ZHU, X. No "power" struggles: coordinated multi-level power management for the data center. In Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (New York, NY, USA, 2008), ASPLOS XIII, ACM, pp. 48–59.

[79] RANGANATHAN, P., LEECH, P., IRWIN, D., AND CHASE, J. Ensemble-level power management for dense blade servers. In Proceedings of the 33rd Annual International Symposium on Computer Architecture (Washington, DC, USA, 2006), ISCA '06, IEEE Computer Society, pp. 66–77.

[80] REISSLEIN, M., HARTANTO, F., AND ROSS, K. W. Interactive video streaming with proxy servers. Information Sciences 140 (2002), 3–31.

[81] RIVOIRE, S., RANGANATHAN, P., AND KOZYRAKIS, C. A comparison of high-level full-system power models. In Proceedings of the 2008 Conference on Power Aware Computing and Systems (Berkeley, CA, USA, 2008), HotPower '08, USENIX Association, pp. 3–3.

[82] RONCA, D. A brief history of Netflix streaming, May 2013.

[83] SAXENA, M., SHARAN, U., AND FAHMY, S. Analyzing video services in web 2.0: a global perspective. In NOSSDAV (2008), C. Griwodz and L. C. Wolf, Eds., ACM, pp. 39–44.

[84] SCHEUERMANN, P., WEIKUM, G., AND ZABBACK, P. Data partitioning and load balancing in parallel disk systems. The VLDB Journal 7, 1 (1998), 48–66.

[85] SEAGATE. Seagate Cheetah 15K.7 SAS product manual. http://www.seagate.com/staticfiles/support/disc/manuals/enterprise/cheetah/15K.7/100516226a.pdf, 2010.

[86] SEAGATE TECHNOLOGY. Reducing storage energy consumption by up to 75%. http://www.seagate.com/files/docs/pdf/whitepaper/tp608-powerchoice-tech-provides-us.pdf, 2011.

[87] SHARMA, R., BASH, C., PATEL, C., FRIEDRICH, R., AND CHASE, J. Balance of power: dynamic thermal management for internet data centers. IEEE Internet Computing 9, 1 (Jan.-Feb. 2005), 42–49.

[88] SITARAM, D., AND DAN, A. Multimedia Servers: Applications, Environments, and Design. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2000.

[89] SON, S. W., AND KANDEMIR, M. Energy-aware data prefetching for multi-speed disks. In Proceedings of the 3rd Conference on Computing Frontiers (New York, NY, USA, 2006), CF '06, ACM, pp. 105–114.

[90] SRIKANTAIAH, S., KANSAL, A., AND ZHAO, F. Energy aware consolidation for cloud computing. In Proceedings of the 2008 Conference on Power Aware Computing and Systems (Berkeley, CA, USA, 2008), HotPower '08, USENIX Association, pp. 10–10.

[91] TAN, Y., LIU, W., AND QIU, Q. Adaptive power management using reinforcement learning. In Proceedings of the 2009 International Conference on Computer-Aided Design (2009), ACM, pp. 461–467.

[92] TANG, W., FU, Y., CHERKASOVA, L., AND VAHDAT, A. MediSyn: a synthetic streaming media service workload generator. In Proceedings of the 13th International Workshop on Network and Operating Systems Support for Digital Audio and Video (New York, NY, USA, 2003), NOSSDAV '03, ACM, pp. 12–21.

[93] TANG, X., AND XU, J. On replica placement for QoS-aware content distribution. In INFOCOM 2004, Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies (2004), vol. 2, IEEE, pp. 806–815.

[94] TEWARI, R., AND ADAM, N. R. Distributed file allocation with consistency constraints. In Proceedings of the 12th International Conference on Distributed Computing Systems (1992), IEEE, pp. 408–415.
[95] TEWARI, R., MUKHERJEE, R., DIAS, D., AND VIN, H. Design and performance tradeoffs in clustered video servers. In Proceedings of the Third IEEE International Conference on Multimedia Computing and Systems (June 1996), pp. 144–150.

[96] TOLIA, N., WANG, Z., MARWAH, M., BASH, C., RANGANATHAN, P., AND ZHU, X. Delivering energy proportionality with non energy-proportional systems: optimizing the ensemble. In Proceedings of the 2008 Conference on Power Aware Computing and Systems (Berkeley, CA, USA, 2008), HotPower '08, USENIX Association, pp. 2–2.

[97] VARKI, E., MERCHANT, A., XU, J., AND QIU, X. Issues and challenges in the performance analysis of real disk arrays. IEEE Transactions on Parallel and Distributed Systems 15, 6 (June 2004), 559–574.

[98] VERMA, A., KOLLER, R., USECHE, L., AND RANGASWAMI, R. SRCMap: Energy proportional storage using dynamic consolidation. In FAST (2010), pp. 267–280.

[99] VIN, H., GOYAL, P., AND GOYAL, A. A statistical admission control algorithm for multimedia servers. In Proceedings of the Second ACM International Conference on Multimedia (New York, NY, USA, 1994), MULTIMEDIA '94, ACM, pp. 33–40.

[100] WEISS, A. Computing in the clouds. netWorker 11, 4 (Dec. 2007), 16–25.

[101] WEISSEL, A., AND BELLOSA, F. Dynamic thermal management for distributed systems. In Proceedings of the First Workshop on Temperature-Aware Computer Systems (TACS '04) (2004).

[102] WOLFSON, O., JAJODIA, S., AND HUANG, Y. An adaptive data replication algorithm. ACM Transactions on Database Systems (TODS) 22, 2 (1997), 255–314.

[103] WON, Y., KIM, J., AND JUNG, W. Energy-aware disk scheduling for soft real-time I/O requests. Multimedia Systems 13 (2008), 409–428. DOI 10.1007/s00530-007-0107-8.

[104] WU, D., HOU, Y., ZHU, W., ZHANG, Y.-Q., AND PEHA, J. Streaming video over the internet: approaches and directions. IEEE Transactions on Circuits and Systems for Video Technology 11, 3 (March 2001), 282–300.

[105] XIE, T. SEA: A striping-based energy-aware strategy for data placement in RAID-structured storage systems. IEEE Transactions on Computers 57, 6 (June 2008), 748–761.

[106] XIE, T., AND SUN, Y. A file assignment strategy independent of workload characteristic assumptions. ACM Transactions on Storage (TOS) 5, 3 (2009), 10.

[107] YOUTUBE/GOOGLE, INC. Google Tech Talks: YouTube scalability. Seattle Conference on Scalability, http://www.youtube.com/watch?v=ZW5_eEKEC28, Oct. 2007.

[108] YU, H., ZHENG, D., ZHAO, B. Y., AND ZHENG, W. Understanding user behavior in large-scale video-on-demand systems. In EuroSys (2006).

[109] ZHU, Q., SHANKAR, A., AND ZHOU, Y. PB-LRU: a self-tuning power aware storage cache replacement algorithm for conserving disk energy. In Proceedings of the 18th Annual International Conference on Supercomputing (New York, NY, USA, 2004), ICS '04, ACM, pp. 79–88.

[110] ZHU, Q., AND ZHOU, Y. Power-aware storage cache management. IEEE Transactions on Computers 54, 5 (May 2005), 587–602.