Continuous Media Placement and Scheduling in Heterogeneous Disk Storage Systems
Roger Zimmermann
1999

Abstract

A number of recent technological trends have made data intensive applications such as continuous media (audio and video) servers a reality. These servers store and retrieve a large volume of data using magnetic disks. Servers consisting of heterogeneous disk drives have become a fact of life for several reasons. First, disks are mechanical devices that might fail. The failed disks are almost always replaced with new models. Second, the current technological trend for these devices is one of annual increase in both performance and storage capacity. Older disk models are discontinued because they cannot compete with the newer ones in the commercial arena. With a heterogeneous disk subsystem, the system should support continuous display while managing resources intelligently in order to maximize their utilization. This dissertation describes a taxonomy of techniques that ensure a continuous display of objects using a heterogeneous disk subsystem. This taxonomy consists of: (a) strategies that partition resources into homogeneous groups of disks and manage each independently, and (b) techniques that treat all disks uniformly, termed non-partitioning techniques. We introduce and evaluate three non-partitioning techniques: Disk Merging, Disk Grouping, and Staggered Grouping. Our results demonstrate that Disk Merging is the most flexible scheme while providing a competitive, low cost per simultaneous display. Finally, using an open simulation model, we compare Disk Merging with a partitioning technique. The obtained results confirm the superiority of Disk Merging.

Chapter 1

Introduction

The reasonable man adapts himself to the world. The unreasonable man tries to adapt the world to himself. Therefore all progress depends on the unreasonable man.
– George Bernard Shaw

During the past few years it has become technically feasible to implement continuous media (CM) storage servers because of the increase in available computing power and advances in networking and data storage technologies. The exponential improvements in solid state technology (i.e., processors and memory) [1] as well as increased bandwidth and storage capacities of modern magnetic disk drives have allowed even personal computers to support audio and video clips. Table 1.1 shows a selection of commercially available CM server implementations [SMI96, SNI96, SCI95, nCU97]. Many applications that traditionally were the domain of analog video are evolving to utilize digital video. For example, terrestrial broadcasters in the U.S. will start to transmit some of their programs in digital form by the end of the year 1998. Satellite broadcast networks, such as DirecTV™, were designed from the ground up with a completely digital infrastructure [BGR94]. The proliferation of digital audio and video has been facilitated by the wide acceptance of standards for compression and file formats, such as MPEG-2 [2] [Gal91]. Consumer electronics are also adopting these standards in products such as the digital versatile disk (DVD [3]) and digital VHS [4] (D-VHS). Furthermore, increased network capacity for local area networks (LAN) and advanced streaming protocols (e.g., RTSP [5]) allow remote viewing of video clips. In the future, the Internet may become a primary carrier of continuous media. These advances have produced interest in new applications such as digital libraries, video-on-demand or movie-on-demand services, distance training and learning, etc.
Continuous media far exceed the resource demands of traditional data types and require massive amounts of space and bandwidth for their storage and display. The storage media of choice for such objects are usually magnetic disk drives because of their high performance and moderate cost. At the time of this writing, other storage technologies are either slow (magneto-optical discs), provide limited random access (tapes), or offer limited write capabilities (CD-ROM, DVD). The capacity and speed of magnetic disk drives have improved steadily over the last few decades. Table 1.2 shows the characteristics of three successive generations of disk drives from a commercial manufacturer. The storage capacity has roughly doubled every three years (a rate of approximately 26% per year) [PGK88]. More recently, the rate has accelerated to approximately 50% annually (see Figure 1.1). The disk transfer rate (i.e., the bandwidth) follows a similar trend, with an annual improvement of approximately 40% (see Figure 1.2).

[1] Popularly indicated by Moore's Law: the observation that the logic density of silicon integrated circuits has closely followed the curve (bits per square inch) = 2^(t-1962), where t is the time in years; that is, the amount of information storable on a given amount of silicon has roughly doubled every year since the technology was invented.
[2] The Motion Picture Expert Group (MPEG) has standardized several video and audio compression formats.
[3] DVD is a standard for optical discs that features the same form factor as CD-ROMs but holds up to 4.7 GB of data (8.5 GB with a dual-layer option). They can store video (MPEG-2), audio, or computer data (DVD-ROM). A writable version is planned for the near future [DVD96].
[4] Video Home System: a half-inch video cassette format.
[5] The Real Time Streaming Protocol is an Internet Engineering Task Force (IETF) proposed standard for the control of streaming media on the Internet.

The technological advances to reduce the seek time have been more moderate, at a rate of five to ten percent per year (see Figure 1.3) [Gro97b, PGK88, RW94b]. At the same time the cost per megabyte has been declining steadily (see Figure 1.4). Table 1.3 shows the characteristics of hypothetical disks extrapolated from these technological trends.

To achieve the high bandwidth and massive storage required for multi-user CM servers, disk drives are commonly combined into disk arrays to support many simultaneous I/O requests. Such a large-scale storage system suffers from two limitations that might introduce heterogeneity into its disk array. First, disks are mechanical devices that might fail. Because the technological trend for these devices is one of annual increase in both performance and storage capacity, it is usually more cost-effective to replace a failed disk with a new model. Moreover, in the fast-paced disk industry, the corresponding model may be unavailable because it was discontinued by the manufacturer. Second, if the disk array needs to be expanded due to increased demand for either bandwidth or storage capacity, it is usually most cost-effective to add current-generation disks. For these reasons heterogeneous storage systems are a common occurrence. Hence, a CM server should manage these heterogeneous resources intelligently in order to maximize their utilization and to guarantee jitter-free video and audio displays. Before we explore the current techniques in support of CM servers with heterogeneous storage, we briefly introduce the requirements for the display of continuous media.
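As a quick sanity check on the quoted growth rates (my own arithmetic, not part of the original text), an annual capacity increase of roughly 26% does indeed correspond to a doubling about every three years, while 50% per year doubles capacity in well under two years:

\[
1.26^{3} \approx 2.0, \qquad \frac{\ln 2}{\ln 1.5} \approx 1.7 \ \text{years}.
\]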
Unlike traditional data types, such as records, text, and even still images, continuous media objects are usually large in size. For example, a two hour MPEG-2 encoded movie requires approximately 4 gigabytes (GB) of storage (at a display rate of 3-15 megabits per second (Mb/s)). Figure 1.5 compares the space requirements for a one hour video clip encoded in different industry standard digital formats. Second, their isochronous nature requires timely, real-time retrieval of data blocks at a pre-specified rate. For example, the NTSC [6] video standard requires that 30 video frames per second are displayed to a viewer. If data blocks are delayed in their delivery to the decoding station then the display might suffer from disruptions and delays.

Digital continuous media streams can be encoded using either constant bit-rate (CBR) or variable bit-rate (VBR) schemes. As the name implies, the consumption rate of a CBR media stream is constant over time, while VBR streams use variable rates to achieve maximal compression. In this dissertation we will assume CBR media to provide a focused discussion. The extensions of our techniques to support VBR constitute one of our future research directions and are described in Chapter 7.

To support the continuous display of a video object, for example X, from a disk array based storage server, X is commonly striped into n equi-sized blocks: X_0, X_1, ..., X_{n-1} (see Figure 1.6a) [Pol91, TPBG93, CL93, BGMJ94, NY94a]. Both the display time of a block and its transfer time from the disk are a fixed function of the display requirements of an object and the transfer rate of the disk, respectively. Using this information, the system stages a block of X (say X_0) from the disk into main memory and initiates its display. It schedules the disk drive to read X_1 into memory prior to the completion of the display of X_0. This ensures a smooth transition between the two blocks in support of continuous display. This process is repeated until all blocks of X have been retrieved and displayed. The periodic nature of the data retrieval and display process gives rise to the definition of a time period (T_p): it denotes the time required to display one data block.

Note that the display time of a block is in general significantly longer than its transfer time from the disk drive (assuming a compressed video object). Thus, the bandwidth of a disk can be multiplexed among several displays. Effectively, each time period is partitioned into slots which are guaranteed to be long enough to handle the retrieval of a single media block (see Figure 1.6b) [LS93, VSR94, GZS+97]. With a multi-disk architecture, the data blocks are assigned to the disks in a round-robin manner to distribute the load imposed by a display evenly. With a homogeneous disk storage system, a display occupies one slot and migrates from one disk to the next at the end of each time period [7]. This paradigm is no longer appropriate for a heterogeneous storage system, because fast disks can accommodate more slots per time period. However, since streams move in a round-robin manner from disk to disk, only the number of slots supported by the slowest participating disk drive can be allocated. Otherwise, some of the streams would need to be abandoned during a transition from a fast to a slow disk. As a result, the fast disk drives will be idle for part of each time period, wasting bandwidth that could have been used to support additional displays.

Only a few multi-disk designs and implementations to display continuous media from heterogeneous storage systems have been reported in the literature.
They can broadly be classified into two groups: (1) designs that partition a heterogeneous storage system into multiple, homogeneous sub-servers, and (2) designs that present a virtual homogeneous view on top of the physical heterogeneity. We will consider the two groups in turn.

[6] National Television Standard Committee.
[7] All requests (i.e., streams) that accessed disk d_i to retrieve their current data block will advance to disk d_{(i+1) mod D}, where D is the total number of disk drives in the system, to retrieve the following block during the next round.

Vendor              Product              Max. no. of users     Max. bandwidth
Starlight           StarWorks-200M       133 @ 1.5 Mb/s        200 Mb/s
Sun                 MediaCenter 1000E    270 @ 1.5 Mb/s        400 Mb/s
Storage Concepts    VIDEOPLEX            320 @ 1.5 Mb/s        480 Mb/s (a)
nCUBE               MediaCUBE 30         270 @ 1.5 Mb/s        400 Mb/s
nCUBE               MediaCUBE 300        1,100 @ 1.5 Mb/s      1,650 Mb/s
nCUBE               MediaCUBE 3000       20,000 @ 1.5 Mb/s     30,000 Mb/s
(a) The VIDEOPLEX system does not transmit digital data over a network but uses analog VHS signals instead.

Table 1.1: A selection of commercially available continuous-media servers.

Model                          ST31200WD         ST32171WD         ST34501WD
Series                         Hawk 1LP          Barracuda 4LP     Cheetah 4LP
Manufacturer                   Seagate Technology, Inc. (all three models)
Capacity C                     1.006 GB          2.061 GB          4.339 GB
Avg. transfer rate R_D         3.47 MB/s         7.96 MB/s         12.97 MB/s
Spindle speed                  5,400 rpm         7,200 rpm         10,033 rpm
Avg. rotational latency        5.56 msec         4.17 msec         2.99 msec
Worst case seek time           21 msec           19 msec           16 msec
Surfaces                       9                 5                 8
Cylinders #cyl                 2697              5177              6582
Number of zones Z              23                11                7
Sector size                    512 bytes         512 bytes         512 bytes
Sectors per track ST           59 - 106          119 - 186         131 - 195
Sector ratio ST_{Z_0}/ST_{Z_{N-1}}   106/59 = 1.8     186/119 = 1.56    195/131 = 1.49
Introduction year              1990              1993              1996

Table 1.2: Parameters for three commercial disk drives.

Model (a)                      D1998             D1999             D2000
Introduction year              1998              1999              2000
Capacity C                     17 GB             23 GB             37 GB
Avg. transfer rate R_D         18 MB/s           25 MB/s           36 MB/s
Spindle speed                  12,000 rpm        12,000 rpm        15,000 rpm
Avg. rotational latency        2.50 msec         2.50 msec         2.00 msec
Worst case seek time           13 msec           11 msec           9 msec
(a) These fictitious model names are used to reference the data sets of this table for the purpose of this thesis.

Table 1.3: Estimated parameters for disk drives of the near future based on the projections from Figures 1.1 through 1.4.

[Figure 1.1: Capacity improvement [Pat93]. Disk capacity (MB per disk) versus production year, 1977-2001; roughly +26% per year historically, +50 to +60% per year more recently.]

[Figure 1.2: (Media) data rate improvement [Gro97c]. Data rate (MB/s) versus production year, 1990-2000; approximately +40% per year.]

[Figure 1.3: Seek, rotation, and access times improvement [Gro97b]. Seek time, rotational latency, and access time (msec) versus production year, 1990-2000; approximately -5% per year.]

[Figure 1.4: Cost/megabyte decline [Gro97a]. Cost ($/MB) versus production year, 1980-2000; approximately -40% per year.]

[Figure 1.5: Storage requirements for a ninety minute video clip digitized in different industry standard encoding formats: MPEG-2 (3 Mb/s, 1.9 GB), DV 4:1:1 (31 Mb/s, 20.4 GB), DV 4:2:2 (50 Mb/s, 33.0 GB), Digi Beta (90 Mb/s, 59.3 GB), D1 (270 Mb/s, 178.0 GB), HDTV (1.2 Gb/s, 791.0 GB).]

By partitioning a heterogeneous storage system into a set of homogeneous subservers, each can be configured optimally, thus improving the disk bandwidth utilization.
With this approach, each video object is striped only across the storage devices of the subserver on which it is placed. The access pattern to a collection of video or movie clips is usually quite skewed; for example, 20% of the clips may account for 80% of all retrieval requests [GKSZ97]. Because the subservers will have different performances (bandwidth and/or storage capacity), it becomes important to select the appropriate server on which to place each video clip so that the imposed load is balanced evenly. One metric that can be employed to make such placement decisions is the bandwidth to space ratio (BSR) of each subserver [DS95]. However, should the access pattern change over time, then a subserver may become a bottleneck. Load-balancing algorithms are commonly used to counter such effects, but they have the disadvantage of being detective rather than preventive, i.e., they can only try to remedy the situation after an imbalance or a bottleneck has already occurred. Replication can help to reduce the likelihood of bottlenecks, but the replicated objects waste valuable storage space and hence the resulting system may not be cost-effective.

The most relevant work has been proposed in a study that developed declustering algorithms to place data over all the disk drives present in a heterogeneous, multi-node storage system [TJH+94, CRS95]. In its simplest form, this technique is based on the bandwidth ratio of the different devices, leading to the concept of virtual disks. A variation of the technique considers capacity constraints as well. This scheme fails to address two issues. First, the presented algorithms assume a fixed bandwidth for each disk drive without considering the inevitable and variable mechanical positioning overhead. This may result in system configurations that are not necessarily optimal from a cost-effectiveness perspective. As demonstrated in Chapter 4, one may vary the block size to enhance the cost-effectiveness of a configuration. Second, large disk arrays that use striping across all their disks are especially vulnerable to failures. The risk of a disk failure increases proportionally with the number of drives. For example, if one disk has a mean time between failures (MTBF) of 500,000 hours (57 years) then, with a system that consists of 1,000 disks, a potential failure occurs every 500 hours (21 days); see Table 6.1 in Chapter 6. Because striping distributes each object across all disk drives, the failure of a single disk will affect all video clips, resulting in disruptions of service.

[Figure 1.6: Striping of object X in a storage system with four disks (d_0 through d_3). Fixed data blocks are placed and retrieved in a round-robin manner. During the display time of a block (termed a time period) multiple block retrievals (each in its own time slot) are scheduled for each disk drive on behalf of different streams. Figure 1.6a shows the data placement (X_0 and X_4 on d_0, X_1 and X_5 on d_1, X_2 and X_6 on d_2, X_3 and X_7 on d_3); Figure 1.6b shows the data retrieval, with each time period divided into slots.]

This dissertation investigates continuous display with heterogeneous disk storage systems and proposes cost-effective techniques that improve upon traditional approaches. Its contributions are three-fold:

1. It combines data placement algorithms for heterogeneous storage systems with detailed disk modeling and scheduling techniques to provide guaranteed continuous media display.
2. It describes a configuration planner that takes the requirements of a CM application and the characteristics of a heterogeneous storage system as inputs and produces a set of parameters that most cost-effectively configures the system.

3. It provides a framework of fault tolerance techniques applicable to heterogeneous storage systems.

We have evaluated the proposed techniques using analytical and simulation models to ensure their feasibility and performance. We have further verified the accuracy of the results by implementing them in the storage subsystem of Mitra (a CM server research prototype [GZS+97]).

1.1 Organization

The remainder of this thesis is organized as follows. Chapter 2 surveys previous and related work in the field of continuous media storage systems. Single disk, multi-disk, and heterogeneous storage systems are detailed. It is followed by a discussion of the fundamental principles of continuous media display in Chapter 3. Further contained in that chapter is an overview of the mechanical and electrical characteristics of modern magnetic disk drives and how the different aspects of such devices can effectively be modeled analytically. Chapter 4 introduces three non-partitioning techniques for heterogeneous CM storage systems: Disk Grouping, Staggered Grouping, and Disk Merging. It also features a configuration planner that finds the most cost-effective Disk Merging configuration for a given application. The first part of Chapter 5 provides an evaluation of the three techniques based on analytical results. In the second part one non-partitioning technique (Disk Merging) is compared to a simple partitioning scheme with a simulation model. A framework of fault tolerance techniques applicable to heterogeneous storage systems is proposed in Chapter 6. Conclusions and future work are outlined in Chapter 7. Some of the detailed parameters used for disk modeling are provided in Appendix A, and Appendix B contains hardness results.

Chapter 2

Previous Work

2.1 Introduction

A number of studies have investigated CM storage servers in recent years. They commonly focus on multi-disk architectures, because the number of CM streams that can be supported concurrently from a single disk is limited [1]. To increase the number of streams, multiple disks (also referred to as disk arrays or disk farms) are commonly employed [TPBG93, LS93, LPS94, VSR94, Vin94, VRG95, GVK+95, HLL+95, ORS96b, MNO+96, GZS+97]. A possible multi-disk architecture based on the popular SCSI [2] I/O bus is illustrated in Figure 3.2. Two main issues need to be addressed when designing the storage system of a high-performance CM server: data placement and scheduling algorithms.

A common technique to place CM data across multiple disks is striping (or some variation thereof, for example staggered striping [BGMJ94]). Such round-robin data placement has been widely reported to balance the load imposed on individual disk drives evenly, and furthermore, to support the maximum achievable throughput (e.g., [TPBG93, LS93, HLL+95, VRG95, GZS+97]). However, striping data across devices with varying storage capacities poses new challenges. The isochronous nature of CM streams requires that data blocks are retrieved in a periodic manner with real-time deadlines that cannot be missed for a smooth display.
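Before turning to scheduling, the round-robin placement just mentioned can be illustrated with a minimal C sketch. This is my own illustration, not code from the dissertation; the disk and block counts are made-up values. Logical block X_i of an object simply lands on disk d_(i mod D), as in Figure 1.6a.

    #include <stdio.h>

    /* Round-robin striping: logical block i of an object is stored on disk (i mod D).
     * This mirrors the placement of Figure 1.6a; the values below are illustrative. */
    int block_to_disk(int block_index, int num_disks) {
        return block_index % num_disks;
    }

    int main(void) {
        const int D = 4;          /* number of disks, as in Figure 1.6            */
        const int num_blocks = 8; /* blocks X_0 .. X_7 of a hypothetical object X */

        for (int i = 0; i < num_blocks; i++) {
            printf("X_%d -> disk d_%d\n", i, block_to_disk(i, D));
        }
        return 0;
    }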
Numerous scheduling techniques have been developed, applicable either at the disk level, to allow the multiplexing of the read/write heads of each disk on behalf of multiple displays (see Section 3.2.2 for more details), or at the stream level, to provide the round-robin movement across the disk drives. But again, these techniques need to be fine-tuned and adapted for the context of heterogeneous devices.

[1] Disks with the highest available transfer bandwidth today (1998) can support approximately 25-30 MPEG-2 streams at 3.5 Mb/s.
[2] Small Computer System Interface.

2.2 Heterogeneous Storage Systems

Techniques designed for homogeneous storage systems assume a fixed number of slots per time period. In a heterogeneous storage environment fast (i.e., high bandwidth) disks will spend less time to complete a block transfer and accordingly accommodate more slots. However, since streams move in an ordered, round-robin manner from disk to disk, only the number of slots supported by the slowest participating disk drive can be allocated. As a result, the fast disk drives will be idle for a fraction of each time period, wasting part of their increased bandwidth that could have been used to support additional displays. The utilization can be improved by partitioning a heterogeneous storage system into a set of homogeneous subservers, each configured optimally. Every video object is striped across the storage devices of the subserver on which it is placed (a striping group). However, balancing the load across subservers becomes a challenge.

The access pattern to a collection of video or movie clips is usually quite skewed; for example, references to 20% of the clips may account for 80% of all retrieval requests [GKSZ97]. Individual subservers will have different performances based upon the characteristics of the disk drives used (bandwidth and storage capacity). Therefore, it becomes significant to select the appropriate server on which to place each video clip so that the load is balanced evenly. One metric that can be employed to make such placement decisions is the bandwidth to space ratio (BSR) of each subserver [DS95]. However, should the access pattern change over time, then a subserver may become a bottleneck. For some applications the access pattern can change quite unpredictably. A video-on-demand server, for example, may be subject to long-term changes because of external events. The release of a new movie may introduce some uncertainty as to how well it will be received and how quickly public interest will wane. Word-of-mouth may increase the popularity of a movie unexpectedly (a "sleeper" hit in Hollywood terms). Or the death of a well-known movie actor or actress may suddenly revitalize the public interest in movies that he or she participated in. In the short term the access pattern during a twenty-four hour period may change due to a higher popularity of children-oriented programming in the afternoon and more requests for adult-oriented material in the evening. Load-balancing algorithms are commonly used to counter such effects, but they have the disadvantage of being detective rather than preventive, i.e., they can only try to remedy an undesirable situation due to an imbalance once a bottleneck occurs. Furthermore, the overhead incurred by moving CM objects from one subserver to another may be significant. Recall that CM objects are usually large in size and that valuable bandwidth will be consumed during the transfer of data blocks from one subserver to another.
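To make the BSR metric concrete, the following C sketch is my own illustration: the subserver figures are invented, and the placement rule shown (matching a clip's bandwidth-to-size demand ratio to the subserver whose BSR is closest) is only one plausible way to use the metric, not necessarily the exact policy of [DS95].

    #include <stdio.h>
    #include <math.h>

    /* Hypothetical subserver description: aggregate disk bandwidth (MB/s) and
     * storage capacity (GB). The figures below are invented for illustration. */
    struct subserver {
        const char *name;
        double bandwidth_mbps;   /* aggregate transfer rate in MB/s */
        double capacity_gb;      /* aggregate storage space in GB   */
    };

    /* Bandwidth-to-space ratio (BSR) of a subserver: MB/s per GB of storage. */
    static double bsr(const struct subserver *s) {
        return s->bandwidth_mbps / s->capacity_gb;
    }

    int main(void) {
        struct subserver servers[] = {
            { "old disks (Hawk-class)",      14.0,  4.0 },
            { "mid disks (Barracuda-class)", 32.0,  8.0 },
            { "new disks (Cheetah-class)",   52.0, 17.0 },
        };
        int n = sizeof(servers) / sizeof(servers[0]);

        /* A clip's own demand ratio: expected concurrent bandwidth (MB/s) drawn
         * by all simultaneous viewers of the clip, divided by its size (GB).   */
        double clip_bandwidth = 3.5;
        double clip_size_gb   = 4.0;
        double clip_ratio     = clip_bandwidth / clip_size_gb;

        /* One plausible placement rule: put the clip where its demand ratio is
         * closest to the subserver's BSR, so bandwidth and space fill up evenly. */
        int best = 0;
        for (int i = 1; i < n; i++)
            if (fabs(bsr(&servers[i]) - clip_ratio) < fabs(bsr(&servers[best]) - clip_ratio))
                best = i;

        for (int i = 0; i < n; i++)
            printf("%-30s BSR = %.2f MB/s per GB\n", servers[i].name, bsr(&servers[i]));
        printf("Place clip (ratio %.2f) on: %s\n", clip_ratio, servers[best].name);
        return 0;
    }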
Replication can help to reduce the likelihood of bottlenecks, but the replicated objects waste scarce storage space and the resulting system may not be cost-effective.

A technique that avoids these load-balancing issues altogether has been proposed in a study that developed declustering algorithms to place data across all the disk drives present in a heterogeneous, multi-node storage system [TJH+94, CRS95]. The study describes a distributed parallel data storage architecture that is based on several server workstations connected through a high speed network. Each server maintains its own local secondary storage, which consists of a variable number of magnetic disk drives. Therefore, individual servers deliver different streaming performance to the network and the end users. As a sample application, a single-user, high-bandwidth terrain visualization is presented which retrieves data in parallel at a very high data rate (300-400 Mb/s) from the participating servers. The study observes that this architecture could be used for multiple, lower-bandwidth streams, but no details are presented. The proposed technique is, in its simplest form, based on the bandwidth ratio of the different devices, and the concept of virtual disks is introduced. A second scheme considers capacity constraints as well. This scheme fails to adequately address two issues. First, the presented algorithms assume a very simple disk model with a fixed bandwidth for each disk drive. The inevitable and variable mechanical positioning overheads (head seek times, rotational latency) are not considered. This simplification may necessitate a very conservative estimate of the transfer rates and hence result in system configurations that are not necessarily optimal from a cost-effectiveness perspective. Second, large disk arrays that use striping across all their disks are especially vulnerable to failures. The risk of a disk failure increases proportionally with the number of drives. For example, if one disk has a mean time between failures (MTBF) of 500,000 hours (57 years) then, with a system that consists of 1,000 disks, a potential failure occurs every 500 hours (21 days); see Table 6.1 in Chapter 6. Because striping distributes each object across all disk drives, the failure of a single disk will affect all stored video clips, resulting in service disruptions.

2.3 Multi-Zone Disk Drives

A number of studies have concentrated on data placement and disk head scheduling in multi-zone disk drives (e.g., [HMM93, Bir95, GKSZ96, GIKZ96, TKKD96]). Most of these techniques are either orthogonal to the techniques in this dissertation or they can be adapted to heterogeneous disk environments. Incorporating multi-zone schemes is attractive because they can increase a server's overall performance. For example, track-pairing enables a server to utilize close to the average transfer rate of a zoned disk as compared with the rate of the innermost (slowest) zone [Bir95]. This directly results in a higher number of supported streams. Section 4.6 details how a paradigm that is based on logical tracks [HMM93] can be combined with each of the three non-partitioning techniques introduced in Chapter 4.

2.4 Summary

Most of the previous research in the field of continuous media storage servers has focused on homogeneous systems. However, these techniques are not suitable for heterogeneous storage environments.
Two prior studies have investigated magnetic disk drive heterogeneity with either multiple partitions of homogeneous subservers [DS95] or the concept of uniform, logical disks [TJH+94, CRS95]. The former suffers from difficult-to-address load-balancing aspects, while the latter neglected to consider disk modeling details and fault tolerance issues. Therefore, new solutions remain desirable.

Chapter 3

Fundamentals of CM Display and Magnetic Disk Drives

3.1 Continuous Display Overview

To support a continuous display of a video object, for example X, several studies have proposed to stripe X into n equi-sized blocks: X_0, X_1, ..., X_{n-1} [Pol91, TPBG93, CL93, BGMJ94, NY94a]. Both the display time of a block and its transfer time from the disk are a fixed function of the display requirements of an object and the transfer rate of the disk, respectively. Using this information, the system stages a block of X (say X_0) from the disk into main memory and initiates its display. It schedules the disk drive to read X_1 into memory prior to completion of the display of X_0. This ensures a smooth transition between the two blocks in order to support a continuous display. This process is repeated until all blocks of X have been retrieved and displayed. The system needs to estimate the disk service time in order to stage a block into memory in a timely manner to avoid starvation of data, i.e., hiccups. Section 3.2.2 describes techniques to model the characteristics of disk drives to obtain the necessary service time estimates. The periodic nature of the data retrieval and display process gives rise to the definition of a time period (T_p): the time required to display a block, i.e., T_p = B / R_C, where B denotes the size of each block X_i and R_C is the consumption rate of X. We are making the simplifying assumption in this thesis that the objects that constitute the video server database belong to a single media type and require a fixed bandwidth for their display. This assumption can be relaxed and the proposed techniques extended by employing various variable bit-rate techniques as surveyed in [AMG98].

Most multi-disk designs utilize striping to assign data blocks of each CM file to individual disks. With striping, a file is broken into (fixed) striping units which are assigned to the disks in a round-robin manner [SGM86].

[Figure 3.1: Continuous display of multiple objects (W, X, ..., Z) by multiplexing the bandwidth of a disk. Within one time period T_p the disk retrieves one block per active display (W_i, X_j, ..., Z_k), interleaved with mechanical positioning delays, while the system concurrently displays the blocks fetched in the previous period.]

[Figure 3.2: Storage subsystem architecture. The CPU, memory, and host adapter are connected by the system bus (e.g., PCI); the host adapter (the initiator, SCSI ID 7) links the system bus to the I/O bus (e.g., SCSI), to which the disks (the targets, SCSI IDs 0, 1, ..., 15) are attached. Note: narrow SCSI supports 8 devices, while wide SCSI supports 16 devices.]

There are two basic ways to retrieve striped data: (a) in parallel, to utilize the aggregate bandwidth of all the disks (this is typically done in RAID [1] systems), or (b) in a cyclic fashion to reduce the buffer requirements (this method is sometimes referred to as simple striping or RAID level 0). Both scheduling techniques can also be combined in a hierarchical fashion by forming several clusters of disks [GK95]. Data retrieval proceeds in parallel within a cluster and in cycles across the clusters.
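To make these quantities concrete, the following C sketch is my own back-of-the-envelope illustration (the block size is an assumed value; the disk figures are taken from the Hawk 1LP entry of Table 1.2). It computes the time period T_p = B/R_C, a pessimistic per-block service time, and the resulting number of slots a single disk could serve per time period.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        /* Assumed values for illustration only. */
        double B_mbytes   = 1.0;    /* block size B in MBytes                      */
        double R_C_mbps   = 3.5;    /* consumption rate R_C in Mb/s (MPEG-2-like)  */
        double R_D_mbytes = 3.47;   /* disk transfer rate R_D in MB/s (Hawk 1LP)   */
        double seek_ms    = 21.0;   /* pessimistic (worst case) seek time in msec  */
        double rot_ms     = 5.56;   /* average rotational latency in msec          */

        /* Time period: how long one block lasts on the screen, T_p = B / R_C. */
        double T_p = (B_mbytes * 8.0) / R_C_mbps;              /* seconds */

        /* Service time for one block: transfer time plus positioning overhead. */
        double T_service = B_mbytes / R_D_mbytes               /* transfer */
                         + (seek_ms + rot_ms) / 1000.0;        /* overhead */

        /* Number of slots: how many block retrievals fit into one time period. */
        int slots = (int)floor(T_p / T_service);

        printf("T_p = %.2f s, T_service = %.3f s, slots per time period = %d\n",
               T_p, T_service, slots);
        return 0;
    }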
When identical disks are used, all the above techniques feature perfect load balancing during data retrieval, and on average an equal amount of data is stored on every disk. Level 1 and above of RAID systems have only been analyzed for homogeneous disk arrays, since their performance depends critically on the slowest disk drive (parity information must be calculated from the data of all disks, and read or written for each I/O operation to complete [Fri96]).

Note that the display time of a block is in general significantly longer than its transfer time from the disk drive (assuming a compressed video object). Thus, the bandwidth of a disk drive can be multiplexed among several displays referencing different objects (see Figure 3.1). However, a magnetic disk drive is a mechanical device. Multiplexing it among several displays causes it to incur mechanical positioning delays. The source of these delays is described in Section 3.2.1, which provides an overview of the internal operation of disk drives. Such overhead is wasteful and it reduces the number of simultaneous displays supported by the system [2]. Section 3.2.2 details how advanced scheduling policies can minimize the impact of these wasteful operations.

[1] Redundant Arrays of Inexpensive Disks [PGK88].
[2] The disk performs useful work when it transfers data.

3.2 Target Environment

An illustration of our target hardware platform is provided in Figure 3.2. The system bus is high performance, with nanosecond latency and in excess of 100 MBytes per second transfer rate once the bus arbitration overhead is considered. It connects all the major components of the computer system: the memory, the CPU (central processing unit), and any attached devices, such as display, network, and storage subsystems. Within the storage subsystem each individual device, e.g., a disk or a tape, is attached to the I/O bus, which in turn is connected to the system bus through a host adapter. The host adapter translates the I/O bus protocol into the system bus protocol, and it may improve the performance of the overall system by providing caching and off-loading low-level functions from the main processor. The disk subsystem is central to this study and is detailed in the next section.

[Figure 3.3: Disk drive internals. The schematic shows the spindle, platters, read/write heads, arms and arm assembly (actuator), and the track, sector, and cylinder structure.]

3.2.1 Modern Disk Drives

The magnetic disk drive technology has benefited from more than two decades of research and development. It has evolved to provide a low latency (on the order of milliseconds) and a low cost per MByte of storage (a few cents per MByte at the time of this writing in 1998). It has become commonplace, with annual sales in excess of 30 billion dollars [oST94]. Magnetic disk drives are commonly used for a wide variety of storage purposes in almost every computer system. To facilitate their integration and compatibility with a wide range of host hardware and operating systems, the interface that they present to the rest of the system is well defined and hides many of the complexities of the actual internal operation. For example, the popular SCSI (Small Computer System Interface, see [ANS94, Ded94]) standard presents a magnetic disk drive to the host system as a linear vector of storage blocks (usually of size 512 bytes each).
When an application requests the retrieval of one or several blocks, the data will be returned after some (usually short) time, but there is no explicit mechanism to inform the application exactly how long such an operation will take. In many circumstances such a "best effort" approach is reasonable because it simplifies program development by allowing the programmer to focus on the task at hand instead of the physical attributes of the disk drive. However, for a number of data intensive applications, for example continuous media servers, exact timing information is crucial to satisfy the real-time constraints imposed by the requirement for a jitter-free delivery of audio and video streams. Fortunately, with a model that imitates the internal operation of a magnetic disk drive it is possible to predict service times at the level of accuracy that is needed to design and configure CM server storage systems. Below, we will first give an overview of the internal operation of modern magnetic disk drives. Next, we will introduce a model that allows an estimation of the service time of a disk drive. This will build the basis for our introduction of techniques that provide CM services on top of heterogeneous storage systems.

Internal Operation

A magnetic disk drive is a mechanical device, operated by its controlling electronics. The mechanical parts of the device consist of a stack of platters that rotate in unison on a central spindle (see [RW94b] for details). Presently, a single disk contains one, two, or as many as fourteen platters [3] (see Figure 3.3 for a schematic illustration). Each platter surface has an associated disk head responsible for reading and writing data. The minimum storage unit on a surface is a sector (which commonly holds 512 bytes of user data). The sectors are arranged in multiple, concentric circles termed tracks. A single stack of tracks across all the surfaces at a common distance from the spindle is termed a cylinder. To access the data stored in a series of sectors, the disk head must first be positioned over the correct track. The operation to reposition the head from the current track to the target track is termed a seek. Next, the disk must wait for the desired data to rotate under the head. This time is termed rotational latency.

[3] This is the case, for example, for the Elite series from Seagate Technology, Inc.

In their quest to improve performance, disk manufacturers have introduced the following state-of-the-art design features into their most recent products:

• Zone-bit recording (ZBR). To meet the demands for a higher storage capacity, more data is recorded on the outer tracks than on the inner ones (tracks are physically longer towards the outside of a platter). Adjacent tracks with the same data capacity are grouped into zones. The tighter packing allows more storage space per platter compared with a uniform amount of data per track [Tho96]. Moreover, it increases the data transfer rate in the outer zones (see the sector ratio in Table 1.2). Section 4.6 details how multi-zoning can be incorporated in our designs.

• Statistical analysis of the signals arriving from the read/write heads, referred to as the partial-response maximum-likelihood (PRML) method [Goo93, Way94, FASN96]. This digital signal processing technique allows higher recording densities because it can filter and regenerate packed binary signals better than the traditional peak-detection method.
For example, the switch to PRML in Quantum Corporation's Empire product series allowed the manufacturer to increase the per-disk capacity from 270 MBytes to 350 MBytes [FASN96].

• High spindle speeds. Currently, the most common spindle speeds are 4,500, 5,400, and 7,200 rpm. However, high performance disks use 10,000 and 12,000 rpm [Gut98] (see Table 1.2). All other parameters being equal, this results in a data transfer rate increase of 85% or 122%, respectively, over a standard 5,400 rpm disk.

Some manufacturers also redesigned disk internal algorithms to provide uninterrupted data transfer (e.g., to avoid lengthy thermal recalibrations) specifically for CM applications. Many of these technological improvements have been introduced gradually, with new disk generations entering the marketplace every year. To date (i.e., in 1998) disk drives of many different performance levels with capacities ranging from 1 up to 18 gigabytes are commonly available [4].

[4] The highest capacity disk drives currently available, with 47 GB of storage space, are the Elite 47 models from Seagate Technologies, Inc.

The next section details disk modeling techniques to identify the physical characteristics of a magnetic disk in order to estimate its service time. These estimates have been applied to develop and implement Mitra, a scalable continuous media server [GZS+97]. Our techniques employ no proprietary information and are quite successful in developing a methodology to estimate the service time of a disk.

3.2.2 Disk Drive Modeling

Disk drive simulation models can be extremely helpful to investigate performance enhancements or tradeoffs in storage subsystems. Hence, a number of techniques to estimate disk service times have been proposed in the literature [Wil77, BG88, RW94b, WGPW95, GSZ95]. These studies differ in the level of detail that they incorporate into their models. The level of detail should depend largely on the desired accuracy of the results. More detailed models are generally more accurate, but they require more implementation effort and more computational power [Gan95]. Simple models may assume a fixed time for I/O, or they may select times from a uniform distribution. However, to yield realistic results in simulations that imitate real-time behavior, more precise models are necessary. The most detailed models include all the mechanical positioning delays (seeks, rotational latency), as well as on-board disk block caching, I/O bus arbitration, controller overhead, and defect management. In the next few paragraphs we will outline our model and which aspects we chose to include.

On-Board Cache

Originally, the small buffers of on-board memory implemented with the device electronics were used for speed-matching purposes between the media data rate and the (usually faster) I/O bus data transfer rate. They have more recently progressed into dynamically managed, multi-megabyte caches that can significantly affect performance for traditional I/O workloads [RW94b]. In the context of CM servers, however, data is retrieved mostly in sequential order and the chance of a block being re-requested within a short period of time is very low. Most current disk drives also implement aggressive read-ahead algorithms which continue to read data into the cache once the read/write head reaches the final position of an earlier transfer request. The assumption is that an application has a high chance of requesting the data in the cache because of spatial locality.
Such a strategy works very well in a single-user environment, but has a diminishing effect in a multi-programming context because disk requests from multiple applications may be interleaved and thus requests may be spatially dispersed [5]. In a CM server environment, a single disk may serve dozens of requests in a time period on behalf of different streams, with each request being rather large in size. It has been our observation that disk caches today are still not large enough to provide any benefits under these circumstances. Hence, our model does not incorporate caching strategies.

Data Layout

To make a disk drive self-contained and present a clean interface to the rest of the system, its storage is usually presented to the host system as an addressable vector of data blocks (see Section 3.2.1). The logical block number (LBN) of each vector element must internally be mapped to a physical media location (i.e., sector, cylinder, and surface). Commonly, the assignment of logical numbers starts at the outermost cylinder, covering it completely before moving on to the next cylinder. This process is repeated until the whole storage area has been mapped. This LBN-to-PBN mapping is non-contiguous in most disk drives because of techniques used to improve performance and failure resilience. For example, some media data blocks may be excluded from the mapping to store internal data needed by the disk's firmware. For defect management, some tracks or cylinders in regular intervals across the storage area may be skipped initially by the mapping process to allow the disk firmware to later relocate sectors that became damaged during normal operation to previously unused areas. As a result, additional seeks may be introduced when a series of logically contiguous blocks is retrieved by an application. It is possible to account for these anomalies when scheduling data retrievals, because most modern disk drives can be queried as to their exact LBN-to-PBN function. The effects of using accurate and complex mapping information versus a simple, linear approximation have been investigated and quantified in [WGP94]. This study reports possible marginal improvements of less than 2% in combination with seek-reducing scheduling algorithms.

Scheduling Algorithms

Scheduling algorithms that attempt to reduce the mechanical positioning delays can drastically improve the performance of magnetic disk drives. Figure 3.6 shows the seek and rotational latency overhead incurred by three different disk drive types when retrieving a data block that requires the read/write heads to move across half of the total number of cylinders. As illustrated, the overhead may waste a large portion of a disk's potential bandwidth. Furthermore, it is interesting to note that newer disk models spend a proportionally larger amount of time on wasteful operations (e.g., the Cheetah 4LP model is newer than the Barracuda 4LP, which in turn is newer than the Hawk 1LP). The reasons can be found in the technological trends, which indicate that the media data rate is improving at a rate of approximately 40% per year (Figure 1.2) while the mechanical positioning delays lag with an annual decrease of only 5% (Figure 1.3). Hence, seek-reducing algorithms are essential in conjunction with modern, high performance disk drives.

More than 30 years of research have passed since Denning first analyzed the Shortest Seek Time First (SSTF) and the SCAN or elevator [6] policies [Den67, RW94a].
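As an aside, the core step shared by SCAN-style policies can be shown in a few lines of C. This is my own sketch with invented cylinder numbers, not code from the dissertation: the requests pending for a time period are sorted by target cylinder so that the heads service them in a single sweep instead of zig-zagging.

    #include <stdio.h>
    #include <stdlib.h>

    /* Sort pending block requests by target cylinder so the heads visit them in
     * one sweep (the core of SCAN/elevator ordering). Cylinder numbers are made up. */
    static int by_cylinder(const void *a, const void *b) {
        int ca = *(const int *)a, cb = *(const int *)b;
        return (ca > cb) - (ca < cb);
    }

    int main(void) {
        int pending[] = { 2051, 317, 1488, 2690, 95, 1763 };  /* target cylinders */
        int n = sizeof(pending) / sizeof(pending[0]);

        qsort(pending, n, sizeof(int), by_cylinder);

        printf("Service order (one sweep): ");
        for (int i = 0; i < n; i++)
            printf("%d ", pending[i]);
        printf("\n");
        return 0;
    }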
Several variations of these algorithms have been proposed in the literature, among them Cyclical SCAN (C-SCAN) [SLW96], LOOK [Mer70], and C-LOOK [WGP94] (a combination of C-SCAN and LOOK). Specifically designed with CM applications in mind was the Grouped Sweeping Scheme (GSS) [YCK93]. There are various tradeoffs associated with choosing a disk scheduling algorithm. For CM retrieval it is especially important that a scheduling algorithm is fair and starvation-free such that real-time deadlines can be guaranteed.

[5] This problem can be alleviated to some extent with the implementation of segmented caches.
[6] This algorithm first serves requests in one direction across all the cylinders and then reverses itself.

[Figure 3.4: Example SCSI bus scalability with a Fast&Wide bus that can sustain a maximum transfer rate of 20 MBytes/sec. The graphs show read requests issued to both the innermost and outermost zones for varying numbers of disks (2 to 12). The bandwidth scales linearly up to about 15 MBytes/sec (75%) and then starts to flatten out as the bus becomes a bottleneck.]

A simplistic approach to guarantee a hiccup-free continuous display would be to assume the worst case seek time for each disk type, i.e., setting the seek distance d equal to the total number of cylinders, d = #cyl. However, if multiple displays (N) are supported simultaneously, then N fragments need to be retrieved from each disk during a time period. In the worst case scenario, the N requests might be evenly scattered across the disk surface. By ordering the fragments according to their location on the disk surface (i.e., employing the SCAN algorithm), an optimized seek distance of d = #cyl / N can be obtained, reducing both the seek and the service time. This scheduling technique was employed in all our analytical models and simulation experiments.

Bus Interface

We base our observations on the SCSI I/O bus because it is most commonly available on today's high performance disk drives. The SCSI standard is both a bus specification and a command set to efficiently use that bus (see [ANS94, Ded94]). Figure 3.2 shows how the host adapter (also called the initiator) provides a bridge between the system bus (based on any standard or proprietary bus, e.g., PCI [7]) and the SCSI I/O bus. The individual storage devices (also called targets in SCSI terminology) are directly attached to the SCSI bus and share its resources.

There is overhead associated with the operation of the SCSI bus. For example, the embedded controller on a disk drive must interpret each received SCSI command, decode it, and initiate its execution through the electronics and mechanical components of the drive. This controller overhead is usually less than 1 millisecond for high performance drives ([RW94b] cite 0.3 to 1.0 msec, [WGPW95] measured an average of 0.7 msec, and [Ng98] reports 0.5 msec for a cache miss and 0.1 msec with a cache hit). As we will see in the forthcoming sections, CM retrieval requires large block retrievals to reduce the wasted disk bandwidth. Hence, we consider the controller overhead negligible (less than 1% of the total service time) and do not include it in our model. The issue of the limited bus bandwidth should be considered as well.
The SCSI I/O bus supports a peak transfer rate of 10, 20, 40, or 80 MBytes per second depending on the SCSI version implemented on both the controller and all the disk devices [8]. Current variations of SCSI are termed Fast, Fast&Wide, Ultra, Ultra&Wide, or Ultra2&Wide, respectively. If the aggregate bandwidth of all the disk drives attached to a SCSI bus exceeds its peak transfer rate, then bus contention occurs and the service time for the disks with lower priorities [9] will increase. This situation may arise with today's high performance disk drives. For example, SCSI buses that implement the wide option (i.e., 16-bit parallel data transfers) allow up to 15 individual devices per bus, which together can transfer data in excess of 150 MBytes per second. Furthermore, if multiple disks want to send data over this shared bus, arbitration takes place by the host bus adapter to decide which request has higher priority. In an earlier study we investigated these scalability issues and concluded that bus contention has a negligible effect on the service time if the aggregate bandwidth of all attached devices is less than approximately 75% of the bus bandwidth (see Figure 3.4) [GSZ95]. We assume that this rule is observed to accomplish a balanced design.

[7] Peripheral Component Interconnect, a local bus specification developed for 32-bit or 64-bit computer system interfacing.
[8] The SCSI protocol implemented over FibreChannel achieves up to 100 MB/s.
[9] Every SCSI device has a fixed priority that directly corresponds to its SCSI ID. This number ranges from 0 to 7 for narrow (8-bit) SCSI. The host bus adapter is usually assigned ID number 7 because this corresponds to the highest priority. The wide SCSI specification was introduced after the narrow version. Consequently, the priorities in terms of SCSI IDs are 7...0, 15...8, with 7 being the highest and 8 the lowest.

[Figure 3.5: Example measured and modeled seek profile for a disk of type ST31200WD. Seek time (0-25 ms) is plotted against the number of cylinders traversed (up to about 2500); the modeled profile closely follows the measured one.]

Mechanical Positioning Delays and Data Transfer Time

The disk service time (denoted T_Service) is composed of the data transfer time (desirable) and the mechanical positioning delays (undesirable), as introduced earlier in this chapter. For notational simplicity in our model, we will subsume all undesirable overhead into the expression for the seek time (denoted T_Seek). Hence, the composition of the service time is as follows:

\[
T_{Service} = T_{Transfer} + T_{Seek} \qquad (3.1)
\]

The transfer time (T_Transfer) is a function of the amount of data retrieved and the data transfer rate of the disk: T_Transfer = B / R_D. The seek time (T_Seek) is a non-linear function of the number of cylinders traversed by the disk heads to locate the proper data sectors. A common approach (see [RW94b, WGPW95, GSZ95, Ng98]) to model such a seek profile is a first-order approximation with a combination of a square-root and a linear function as follows:

\[
T_{Seek}(d) =
\begin{cases}
c_1 + c_2 \sqrt{d} & \text{if } d < z \text{ cylinders} \\
c_3 + c_4 \, d & \text{if } d \geq z \text{ cylinders}
\end{cases}
\qquad (3.2)
\]
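A minimal C sketch of this service-time model follows. It is my own illustration, not code from the dissertation; it plugs in the ST31200WD constants that appear in Table 3.1 below, and the block size is an assumed value.

    #include <stdio.h>
    #include <math.h>

    /* Seek-time model of Equation 3.2 for the ST31200WD (Hawk 1LP), using the
     * constants of Table 3.1. c1 and c3 already include the average rotational
     * latency (5.56 msec). d is the seek distance in cylinders. */
    static double t_seek_ms(double d) {
        const double c1 = 3.5 + 5.56,    c2 = 0.303068;
        const double c3 = 7.2535 + 5.56, c4 = 0.004986;
        const double z  = 300.0;         /* switch-over point in cylinders */
        return (d < z) ? c1 + c2 * sqrt(d) : c3 + c4 * d;
    }

    int main(void) {
        const double R_D = 3.47;          /* MB/s, average transfer rate of the ST31200WD */
        const double B   = 0.5;           /* MB, block size (assumed for illustration)    */
        const double d   = 2697.0 / 2.0;  /* seek over half of the cylinders              */

        /* Equation 3.1: service time = transfer time + seek (incl. rotational latency). */
        double t_transfer_ms = B / R_D * 1000.0;
        double t_service_ms  = t_transfer_ms + t_seek_ms(d);

        printf("T_Transfer = %.1f ms, T_Seek = %.1f ms, T_Service = %.1f ms\n",
               t_transfer_ms, t_seek_ms(d), t_service_ms);
        return 0;
    }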
Disk Model               Hawk 1LP            Barracuda 4LP       Cheetah 4LP
Metric                   ST31200WD           ST32171WD           ST34501WD           Units
Seek constant c1         3.5 + 5.56 (a)      3.0 + 4.17 (a)      1.5 + 2.99 (a)      msec
Seek constant c2         0.303068            0.232702            0.155134            msec
Seek constant c3         7.2535 + 5.56 (a)   7.2814 + 4.17 (a)   4.2458 + 2.99 (a)   msec
Seek constant c4         0.004986            0.002364            0.001740            msec
Switch-over point z      300                 600                 600                 cylinders
Total size #cyl          2697                5177                6578                cylinders
(a) Average rotational latency based on the spindle speed: 5,400 rpm, 7,200 rpm, and 10,033 rpm, respectively.

Table 3.1: Seek profile modeling parameters for three commercially available disk models.

where d is the seek distance in cylinders. In addition, the disk heads need to wait on average half a platter rotation (once they have reached the correct cylinder) for the data to move underneath the heads. Recall that this time is termed rotational latency, and we include it in our model of T_Seek (see c_1 and c_3 in Table 3.1). Every disk type has its own distinct seek profile. Figure 3.5 illustrates the measured and modeled seek profile for a Seagate ST31200WD disk drive (shown without adding the rotational latency) and Table 3.1 lists the corresponding constants used in Equation 3.2. These empirical results have been obtained with the embedded differential Fast&Wide SCSI I/O interface of an HP 9000 735/125 workstation. Throughout this thesis we will use subscripts (e.g., R_Di, T_Seek_i) to denote the parameters of a specific disk type (e.g., Seagate ST31200WD).

3.2.3 Low-Level SCSI Programming Interface

Information about disk characteristics such as the seek profile or the detailed zone layout is not generally available. Hence, it is necessary to devise methods to measure these parameters. The SCSI standard specifies many useful commands that allow device interrogation. However, SCSI devices are normally controlled by a device driver, which shields the application programmer from all the low-level device details and protects the system from faulty programs. In many operating systems, for example HP-UX and Windows NT, it is possible to explicitly send SCSI commands to a device through specialized system calls. Great care must be taken when issuing such commands because the usual error detection mechanisms are bypassed.

The format to initiate a SCSI command is operating system dependent. For example, HP-UX utilizes a variation of the ioctl() system call while Windows NT provides the advanced SCSI programming interface (ASPI) [10]. The syntax and semantics of the actual SCSI commands are defined in the SCSI standard document [ANS94]. Three types of SCSI commands can be distinguished according to their command length: 6, 10, and 12 bytes. The first byte of each command is called the operation code and uniquely identifies the SCSI function. The rest of the bytes are used to pass parameters. The standard also allows vendor-specific extensions for some of the commands, which are usually described in the vendor's technical manuals for the device in question.

Figure 3.7 shows a sample code fragment that allows the translation of linear logical block addresses into corresponding physical locations of the form <head/cylinder/sector>. It uses the SEND DIAGNOSTICS command to pass the translate request to the device (the translate request is identified through the XLATE_PAGE code in the parameter structure). By translating a range of addresses of a magnetic disk drive, track and cylinder sizes as well as zone boundaries can be identified.
Appendix A contains tables that illustrate the results for different disk types.

[10] ASPI was defined by Seagate Technology, Inc.

3.2.4 Validation

To increase confidence in the conclusions drawn from analytical and simulation models, we have verified and validated these models against a real implementation. We reported preliminary results in earlier studies [GSZ95]. For this dissertation we tested the accuracy of our models extensively to demonstrate a close match between analytical, simulation, and implementation results. For example, the disk utilization graphs presented in Section 4.5 for a storage system consisting of one ST34501WD and four ST31200WD disks show a deviation of the simulation and implementation results of no more than +2.9%/-6.9% from the analytically predicted values.

[Figure 3.6: Mechanical positioning overhead as a function of the retrieval block size (up to 1000 KB) for three disk drive models: ST31200WD (Hawk 1LP, introduced 1990), ST32171WD (Barracuda 4LP, introduced 1993), and ST34501WD (Cheetah 4LP, introduced 1996). The overhead includes the seek time for the disk heads to traverse over half of the total number of cylinders plus the average rotational latency, i.e., T_Seek(#cyl/2) (for details see Equation 3.2 and Table 3.1). The technological trends (see Figures 1.2 and 1.3) are such that the data transfer rate is improving faster than the delays due to mechanical positioning are reduced. Hence the wasteful overhead is increasing, relatively speaking, and seek-reducing scheduling algorithms are becoming more important.]

3.3 Summary

Magnetic disk drives are the storage devices of choice for CM servers because of their high performance and low cost. However, they consist of a sophisticated combination of mechanics and electronics that needs to be well-understood to operate smoothly within the real-time environment required to deliver CM streams. This chapter first described the fundamental techniques to display isochronous media types. Next, it provided an overview of the internal operation of modern magnetic disk drives and how it relates to the CM display paradigm. Third, disk drive modeling techniques were introduced that allow the accurate prediction of disk performance, such as the service time for a data block retrieval. Techniques to measure or extract some of the modeling parameters directly from the actual physical devices were provided.

The next chapter introduces three non-partitioning CM storage solutions based on the detailed understanding of magnetic disk drives.

int TranslateAddr (fd, addr)
/* This routine translates an address from the LBA (logical block address) format */
/* to the physical sector format. It will return 0 on success and -1 on error.    */
int fd;                                   /* File descriptor.      */
int addr;                                 /* Address to translate. */
{
    unsigned short parm_len;
    struct xlate_addr_page buff;

    /* Set up the input data structure. */
    memset(&buff, 0, sizeof(buff));
    buff.page_code    = XLATE_PAGE;
    buff.reserved1    = 0x00;
    buff.page_len     = 0x0A;
    buff.supp_format  = 0x00;
    buff.xlate_format = PHYSICAL_SECTOR;
    buff.addr[0] = (unsigned char)(addr >> 24);
    buff.addr[1] = (unsigned char)(addr >> 16);
    buff.addr[2] = (unsigned char)(addr >> 8);
    buff.addr[3] = (unsigned char)addr;
    parm_len = BUFF_LEN;

    /* Set up and send the command. */
    memset(&sctl_io, 0, sizeof(sctl_io)); /* Clear reserved fields.     */
    sctl_io.flags = 0x00;                 /* No input data is expected. */
Appendix A contains tables that illustrate the results for different disk types.

3.2.4 Validation

To increase confidence in the conclusions drawn from analytical and simulation models we have verified and validated these models against a real implementation. We reported preliminary results in earlier studies [GSZ95]. For this dissertation we tested the accuracy of our models extensively to demonstrate a close match between analytical, simulation, and implementation results. For example, the disk utilization graphs presented in Section 4.5 for a storage system consisting of one ST34501WD and four ST31200WD disks show a deviation of the simulation and implementation results of no more than +2.9%/−6.9% from the analytically predicted values.

[Figure 3.6: plot of the mechanical positioning overhead [%] versus the retrieval block size [KB] for the ST31200WD (Hawk 1LP, introduced 1990), the ST32171WD (Barracuda 4LP, introduced 1993), and the ST34501WD (Cheetah 4LP, introduced 1996).]

Figure 3.6: Mechanical positioning overhead as a function of the retrieval block size for three disk drive models. The overhead includes the seek time for the disk heads to traverse over half of the total number of cylinders plus the average rotational latency, i.e., T_Seek(#cyl/2) (for details see Equation 3.2 and Table 3.1). The technological trends (see Figures 1.2 and 1.3) are such that the data transfer rate is improving faster than the delays due to mechanical positioning are reduced. Hence the wasteful overhead is increasing, relatively speaking, and seek-reducing scheduling algorithms are becoming more important.

3.3 Summary

Magnetic disk drives are the storage devices of choice for CM servers because of their high performance and low cost. However, they consist of a sophisticated combination of mechanics and electronics that needs to be well understood to operate smoothly within the real-time environment required to deliver CM streams. This chapter first described the fundamental techniques to display isochronous media types. Next, it provided an overview of the internal operation of modern magnetic disk drives and how it relates to the CM display paradigm. Third, disk drive modeling techniques were introduced that allow the accurate prediction of disk performance, such as the service time for a data block retrieval. Techniques to measure or extract some of the modeling parameters directly from the actual physical devices were provided.

The next chapter introduces three non-partitioning CM storage solutions based on the detailed understanding of magnetic disk drives.

#include <stdio.h>
#include <string.h>
#include <sys/scsi.h>   /* HP-UX: struct sctl_io, SIOC_IO, S_GOOD */

/* XLATE_PAGE, PHYSICAL_SECTOR, BUFF_LEN, PF, CMDsend_diag, struct
 * xlate_addr_page, RecvDiagCmd() and PrintXlateAddr() are defined elsewhere
 * in the measurement tool or in the HP-UX SCSI headers. */

int TranslateAddr (fd, addr)
/* This routine translates an address from the LBA (logical block address) format */
/* to the physical sector format. It will return 0 on success and -1 on error.    */
int fd;                             /* File descriptor.      */
int addr;                           /* Address to translate. */
{
    unsigned short parm_len;
    struct xlate_addr_page buff;
    struct sctl_io sctl_io;         /* HP-UX SCSI pass-through control structure. */

    /* Set up the input data structure. */
    memset(&buff, 0, sizeof(buff));
    buff.page_code    = XLATE_PAGE;
    buff.reserved1    = 0x00;
    buff.page_len     = 0x0A;
    buff.supp_format  = 0x00;
    buff.xlate_format = PHYSICAL_SECTOR;
    buff.addr[0] = (unsigned char)(addr >> 24);
    buff.addr[1] = (unsigned char)(addr >> 16);
    buff.addr[2] = (unsigned char)(addr >> 8);
    buff.addr[3] = (unsigned char)addr;
    parm_len = BUFF_LEN;

    /* Set up and send the command. */
    memset(&sctl_io, 0, sizeof(sctl_io));   /* Clear reserved fields.        */
    sctl_io.flags       = 0x00;             /* No input data is expected.    */
    sctl_io.cdb[0]      = CMDsend_diag;     /* SEND DIAGNOSTIC(6) command.   */
    sctl_io.cdb[1]      = 0x00 | PF;        /* Page format.                  */
    sctl_io.cdb[2]      = 0x00;             /* Reserved.                     */
    sctl_io.cdb[3]      = parm_len >> 8;    /* Parameter list length, MSB.   */
    sctl_io.cdb[4]      = parm_len;         /* Parameter list length, LSB.   */
    sctl_io.cdb[5]      = 0x00;             /* Control.                      */
    sctl_io.cdb_length  = 6;                /* 6 byte command.               */
    sctl_io.data        = &buff;            /* Data buffer location.         */
    sctl_io.data_length = parm_len;         /* Maximum transfer length.      */
    sctl_io.max_msecs   = 10000;            /* Allow 10 seconds for command. */

    if (ioctl(fd, SIOC_IO, &sctl_io) < 0) {          /* Request was invalid.          */
        printf("Error: request was invalid!\n");
        return(-1);
    } else if (sctl_io.cdb_status == S_GOOD) {       /* Device is ready.              */
        if (RecvDiagCmd(fd, &buff) == 0) {           /* Receive translated address.   */
            PrintXlateAddr(&buff, addr);             /* Print translated address.     */
            return(0);
        } else
            return(-1);
    } else {                                         /* Error condition.              */
        printf("Error: state %d!\n", sctl_io.cdb_status);
        return(-1);
    }
}

Figure 3.7: Sample code fragment to interrogate the controller electronics of a SCSI disk about the translation of logical block addresses (of a linear vector) into physical ⟨head/cylinder/sector⟩ locations under HP-UX 9.07.

Chapter 4

Three Non-partitioning Techniques

This chapter describes three non-partitioning techniques that support the hiccup-free display of isochronous media, such as audio and video, from a heterogeneous storage system. The three techniques differ in how they place data and schedule disk bandwidth. These decisions impact how much of the available disk bandwidth and storage space can be utilized, and how much memory is required to support a fixed number of simultaneous displays. The three alternative techniques are Disk Grouping, Staggered Grouping, and Disk Merging. They are illustrated with an example storage system consisting of six disks, two of each disk type detailed in Table 1.2. We assume a location independent, constant average transfer rate for each disk model even though these are multi-zone drives that provide a different transfer rate for each zone. This simplifying assumption was made to provide a focused presentation of the non-partitioning techniques. It is eliminated in Section 4.6.

We concentrate on constant bit rate media data types whose bandwidth requirement (R_C)[1] remains fixed as a function of time for the entire display time of a clip. To guarantee a hiccup-free display, a clip must be produced at a rate of R_C. This is accomplished using the concept of a time period (T_p) and logical blocks (B_l) [TPBG93, RV93, AOG91]. Both data placement and retrieval are performed in a round-robin manner. Once a request arrives referencing object X, the system produces a logical block of X per time period, starting with X_0. The display time of a block is equivalent to the duration of a time period. A client initiates the display of the first block of X (X_0) at the end of the time period that retrieved this block. This enables the server to retrieve the blocks of the next time period using an elevator algorithm to minimize the impact of seeks [YCK93]. Our multi-node target infrastructure is illustrated in Figure 6.1 and the architectural details of each storage node are presented in Figure 3.2. An implementation of our target architecture is realized in Mitra, an operational CM server research prototype [GZS+97].

[1] The definitions of the parameters used to describe the different techniques are summarized in Table 4.1.

Below, we describe the organization of logical blocks and the concept of a time period with each technique. We identify the different system parameters and present optimization techniques along with a configuration planner for computing optimal values for these system parameters. The first two techniques are neither as flexible nor as efficient as Disk Merging. By flexibility we mean that they cannot support an arbitrary number of disks with different physical characteristics. They are described due to their simplicity. In addition, they serve as a foundation for Disk Merging.
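The period-driven, round-robin organization described above can be summarized in a few lines. The sketch below is purely illustrative (the names and the fixed number of logical disks are assumptions, not part of the Mitra prototype): block X_i of an object whose first block resides on logical disk `first` is read from logical disk (first + i) mod D_l during one time period and displayed during the following period.

    #include <stdio.h>

    int main(void)
    {
        int D_l = 2;      /* logical disks, as in the six-disk example of this chapter */
        int first = 0;    /* logical disk that stores X_0 (an assumption)              */
        int i;

        for (i = 0; i < 6; i++)
            printf("period %d: read X_%d from logical disk %d, display it during period %d\n",
                   i, i, (first + i) % D_l, i + 1);
        return 0;
    }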
4.1 Disk Grouping

As implied by its name, this technique groups physical disks into logical ones and assumes a uniform characteristic for all logical disks. To illustrate, the six physical disks of Figure 4.1 are grouped to construct two homogeneous logical disks. In this figure, a larger physical disk denotes a newer disk model that provides both a higher storage capacity and a higher performance.[2] The blocks of a movie X are assigned to the logical disks in a round-robin manner to distribute the load of a display evenly across the available resources [SGM86, BGMJ94, ORS96a, FT96]. A logical block is declustered across the participating physical disks [GR93]. Each piece is termed a fragment (e.g., X_0.0 in Figure 4.1). The size of each fragment is determined such that the service time (T_Service) of all physical disks is identical (for an example see Figure 4.2). With a fixed storage capacity and pre-specified fragment sizes for each physical disk, one can compute the number of fragments that can be assigned to a physical disk. The storage capacity of a logical disk (the number of logical blocks it can store) is constrained by the physical disk that can store the fewest fragments. For example, to support 96 simultaneous displays of MPEG-2 streams (R_C = 3.5 Mb/s), the fragment sizes[3] of Table 4.2 enable each logical disk to store only 5,338 blocks. The remaining space of the other two physical disks (ST34501WD and ST31200WD) can be used to store traditional data types, e.g., text, records, etc. During off-peak times, the idle bandwidth of these disks can be employed to service requests that reference this data type.

[2] While this conceptualization might appear reasonable at first glance, it is in contradiction with the current trend in the physical dimensions of disks, which calls for a comparable volume when the same form factor is used (e.g., 3.5" platters). For example, the ST31200WD holding 1 GByte of data has approximately the same physical dimensions as the ST34501WD holding 4 GBytes.
[3] Fragment sizes are approximately proportional to the disk transfer rates, i.e., the smallest fragment is assigned to the disk with the slowest transfer rate. However, different seek overheads may result in significant variations (see Figure 4.2).

Term          | Definition                                                                    | Units
R_D_i         | Transfer rate of physical disk i                                              | MB/s
R_C           | Display bandwidth requirement (consumption rate)                              | Mb/s
X             | Continuous media object                                                       |
X_i           | Logical block i of object X                                                   |
X_i.j         | Fragment j of logical block i (of object X)                                   |
n             | Number of sub-fragments that constitute a fragment                            |
d^p_i         | Physical disk drive i                                                         |
d^l_i         | Logical disk drive i                                                          |
D             | Number of physical disk drives                                                |
D_l           | Number of logical disk drives                                                 |
p_i           | Number of logical disks that map to physical disk type i                      |
p_0           | Number of logical disks that map to the slowest physical disk type            |
m             | Number of different physical disk types                                       |
K             | Number of physical disks that constitute a logical disk                       |
N             | Number of simultaneous displays supported by a single logical disk            |
N_Tot         | Number of simultaneous displays supported by the system (throughput)          |
B_i           | Size of a fragment assigned to physical disk i                                | MB
B_l           | Size of a logical block assigned to a logical disk                            | MB
M             | Total amount of memory needed to support N_Tot streams                        | MB
L_max         | Maximum latency (without queuing)                                             | sec
L_avg-min     | Average-minimum latency (T_p/2)                                               | sec
C             | Cost per stream                                                               | $
C'            | Adjusted cost per stream (including cost for unusable space)                  | $
M$            | Cost per megabyte of memory                                                   | $
D$_i          | Cost per disk drive of physical disk model i                                  | $
W$            | Cost per megabyte of unused storage space                                     | $
S             | Storage space usable for continuous media                                     | MB
W             | Percentage of storage space unusable for continuous media                     | %
G             | Parity group size                                                             |
G_i           | Parity group i; a data stripe allocated across a set of G logical disks and protected by a parity code |
L             | Disk utilization (also referred to as load) of an individual, logical disk    |
T_p           | Duration of a time period (also round or interval)                            | sec
S_p           | Duration of a sub-period                                                      | sec
T_Seek_y(x)   | Seek time to traverse x cylinders (0 < x ≤ #cyl_y) with the seek model of disk type y (#cyl_y denotes the total number of cylinders of disk type y); e.g., Table 3.1 | msec
T_Transfer_y  | Time to transfer a block of data from a disk of type y                        | msec
T_Service_y   | Total time to retrieve a block of data from a disk of type y (T_Service_y = T_Transfer_y + T_Seek_y) | msec
#disk         | Number of physical disks per type                                             |
#seek         | Number of seeks incurred per time period                                      |
n_i           | Number of seeks incurred because of sub-fragmentation                         |
#cyl          | Number of cylinders of a disk drive                                           |

Table 4.1: List of parameters used repeatedly in this thesis and their respective definitions.

[Figure 4.1: the six physical disks d^p_0 ... d^p_5 are grouped into two logical disks d^l_0 and d^l_1; each logical block X_i is declustered into fragments X_i.0, X_i.1, X_i.2, and the retrieval schedule proceeds in time periods T_P0, T_P1, T_P2, ... (e.g., 48 slots per logical disk per time period); the display of X starts at the end of the period that retrieved X_0.]

Figure 4.1: Disk Grouping.

When the system retrieves a logical block into memory on behalf of a client, all physical disks are activated simultaneously to stage their fragments into memory. This block is then transmitted to the client.

We now derive equations that specify the size of a block, the size of a fragment, the duration of a time period, the amount of memory required at the server, and the maximum startup latency a client observes with this technique. To guarantee a hiccup-free display, the display time of a logical block must be equivalent to the duration of a time period:

    T_p = B_l / R_C    (4.1)

Moreover, if a logical disk services N simultaneous displays then the duration of each time period must be greater than or equal to the service time to read N fragments from every physical disk i (d^p_i) that is part of the logical disk. Thus, the following constraint must be satisfied:

    T_p ≥ N × T_Service_i = N × ( B_i / R_D_i + T_Seek_i(#cyl_i / N) )    (4.2)

Substituting T_p with its definition from Equation 4.1, plus the additional constraint that the time period must be equal for all physical disk drives (K) that constitute a logical disk, we obtain the following system of equations (note: we use T_Seek_i as an abbreviated notation for T_Seek_i(#cyl_i / N)):

    B_l = Σ_{i=0}^{K−1} B_i
    B_l / (R_C × N) = B_0 / R_D_0 + T_Seek_0
    B_l / (R_C × N) = B_1 / R_D_1 + T_Seek_1                              (4.3)
        ...
    B_l / (R_C × N) = B_{K−1} / R_D_{K−1} + T_Seek_{K−1}

Solving Equation Array 4.3 yields the individual fragment sizes B_i for each physical disk type i:

    B_i = R_D_i × [ T_Seek_i × N × R_C − P_A + P_B ] / ( Σ_{z=0}^{K−1} R_D_z − R_C × N )    (4.4)

    with  P_A := Σ_{x=0}^{i−1} R_D_x × T_Seek_i + Σ_{x=i+1}^{K−1} R_D_x × T_Seek_i
          P_B := Σ_{y=0}^{i−1} R_D_y × T_Seek_y + Σ_{y=i+1}^{K−1} R_D_y × T_Seek_y

Figure 4.2 illustrates how each fragment size B_i computed with Equation 4.4 accounts for the difference in seek times and data transfer rates to obtain identical service times for all disk models.

[Figure 4.2: service-time break-down per disk model for the block sizes of Table 4.2 — ST31200WD (Hawk 1LP): 166.0 KB fragment, 46.8 msec transfer, 11.3 msec seek/rotational overhead (19.5%); ST32171WD (Barracuda 4LP): 395.4 KB, 48.5 msec, 9.6 msec (16.5%); ST34501WD (Cheetah 4LP): 687.4 KB, 51.8 msec, 6.3 msec (10.9%).]

Figure 4.2: Mechanical positioning overhead (seek and rotational delay) when retrieving blocks from three different disk models. (The block sizes correspond to the configuration shown in Table 4.2.) The retrieval block sizes have been adjusted to equalize the overall service time. Note that at first the Cheetah 4LP model seems to have the lowest overhead. However, this is only the case because of its large block size. For small block sizes the Cheetah wastes a higher percentage of its bandwidth. This trend is common for newer disk models (see Figure 3.6).

Disk Model (Seagate) | Fragment size [KByte] | Fragment display time [sec] | Number of fragments | % Space for traditional data
(N = 48, D_l = 2, N_Tot = 96)
ST34501WD            | B_2 = 687.4           | 1.534                       | 6,463               | 17.4%
ST32171WD            | B_1 = 395.4           | 0.883                       | 5,338               | 0%
ST31200WD            | B_0 = 166.0           | 0.371                       | 6,206               | 14.0%

Table 4.2: Disk Grouping configuration example with three disk drive types (2 × ST31200WD, 2 × ST32171WD, and 2 × ST34501WD). It supports N_Tot = 96 displays of MPEG-2 (R_C = 3.5 Mb/s) with a logical block size of B_l = B_0 + B_1 + B_2 = 1,248.7 KByte.

To support N displays with one logical disk, the system requires 2 × N memory buffers. Each display necessitates two buffers for the following reason: one buffer contains the block whose data is being displayed (it was staged in memory during the previous time period), while a second is employed by the read command that is retrieving the subsequent logical block into memory.[4] The system toggles between these two frames as a function of time until a display completes. As one increases the number of logical disks (D_l), the total number of displays supported by the system (N_Tot = D_l × N) increases, resulting in a higher amount of memory required by the system:

    M = 2 × N_Tot × B_l    (4.5)

[4] With off-the-shelf disk drives and device drivers, a disk read command requires the address of a memory frame to store the referenced block. The amount of required memory can be reduced by employing specialized device drivers that allocate and free memory on demand [NY94b].

In the presence of N_Tot − 1 active displays, we can compute the maximum latency that a new request might observe. In the worst case scenario, a new request might arrive referencing a clip X whose first fragment resides on logical disk 0 (d^l_0) and miss the opportunity to utilize the time period with sufficient bandwidth to service this request. Due to the round-robin placement of data and activation of logical disks, this request must wait for D_l time periods before it can retrieve X_0. Thus, the maximum latency L_max is:

    L_max = D_l × T_p = D_l × B_l / R_C    (4.6)

These equations quantify the resource requirement of a system with this technique. They can be employed to conduct an analytical comparison with the other techniques (see Section 5.2).
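The following sketch evaluates Equations 4.1 through 4.6 for the example configuration discussed above. It is not part of the thesis software: it solves Equation Array 4.3 directly through the per-disk constraint rather than the expanded form of Equation 4.4, uses the transfer rates of 3.47, 7.96, and 12.97 MB/s quoted later in Table 4.7, and treats the seek overheads of Figure 4.2 as fixed inputs. With these placeholders it approximately reproduces the fragment and block sizes of Table 4.2.

    #include <stdio.h>

    #define K 3   /* physical disks per logical disk */

    int main(void)
    {
        /* ST31200WD, ST32171WD, ST34501WD transfer rates (MB = 1024 KByte). */
        double R_D[K]    = { 3.47, 7.96, 12.97 };      /* MB/s               */
        double T_seek[K] = { 0.0113, 0.0096, 0.0063 }; /* sec, cf. Figure 4.2 */
        double R_C = 3.5 / 8.0;        /* MPEG-2, 3.5 Mb/s expressed in MB/s */
        int    N = 48, D_l = 2;        /* displays per logical disk, logical disks */
        double sum_R = 0.0, sum_RT = 0.0, B_l, B_i, T_p, M, L_max;
        int    i;

        for (i = 0; i < K; i++) { sum_R += R_D[i]; sum_RT += R_D[i] * T_seek[i]; }

        /* Equation Array 4.3: B_l/(R_C*N) = B_i/R_D[i] + T_seek[i] for each i. */
        B_l   = R_C * N * sum_RT / (sum_R - R_C * N);  /* logical block, MB  */
        T_p   = B_l / R_C;                             /* Equation 4.1, sec  */
        M     = 2.0 * (N * D_l) * B_l;                 /* Equation 4.5, MB   */
        L_max = D_l * T_p;                             /* Equation 4.6, sec  */

        printf("B_l = %.1f KByte, T_p = %.3f s, M = %.1f MB, L_max = %.2f s\n",
               B_l * 1024.0, T_p, M, L_max);
        for (i = 0; i < K; i++) {                      /* per-disk fragment sizes */
            B_i = R_D[i] * (B_l / (R_C * N) - T_seek[i]);
            printf("B_%d = %.1f KByte\n", i, B_i * 1024.0);
        }
        return 0;
    }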
It minimizes the amount of memory required to supportN simultaneous displays based on the following observation: Not all fragments of a logical block are needed at the same time. Assuming that each fragment is a contiguous chunk of a logicalblock, fragmentX 0.0 is required first, followed byX 0.1 andX 0.2 during a single time period. Staggered Grouping ordersthe physical disks that constitute a logicaldisk based on their transfer rate and assumes that the first portion of a block is assigned to the fastest disk, second portion to the second fastest disk, etc. The explanation for this constrained placement is detailed in the next paragraph. Next, it staggers the retrieval of each fragment as a function of time. Once a physical disk is activated, it retrievesN fragments of blocks 4 With off-the-shelf disk drives and device drivers, a disk read command requires the address of a memory frame to store the referenced block. The amount of required memory can be reduced by employing specialized device drivers that allocate and free memory on-demand [NY94b]. 30 CHAPTER4. THREE NON-PARTITIONING TECHNIQUES X 0 X 1 X 0.0 X 0.1 X 0.2 X 1.0 X 1.1 X 1.2 S P0 S P1 S P2 Physical Logical T P0 T P2 Time Start display of X Retrieval Schedule X 2.0 X 2.1 X 2.2 X 2 X 3 T P1 X 0.2 d 2 X 0.1 X 0.0 X 1.2 X 1.1 X 1.0 d 1 d 0 d 5 d 4 d 3 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... (e.g., 16 slots each) d 0 l d 1 l p p p p p p Figure 4.3: Staggered Grouping. procedure PeakMem (Sp,RC,B0,...,BK−1,K); /* Consumed: the amount of data displayed during a sub-period Sp. */ /* K: the number of physical disks that form a logical disk. */ Consumed=Sp×RC; Memory =Peak =B0 +Consumed; for i = 1 to K−1 do Memory =Memory+Bi−Consumed; if (Memory >Peak) then Peak =Memory; if ((B0 +B1)>Peak) then Peak =B0 +B1; return(Peak); end PeakMem; Figure4.4: AlgorithmforthedynamiccomputationofthepeakmemoryrequirementwithStaggeredGrouping. referenced by theN active displays that require these fragments, see Figure 4.3. The duration of this time is termed a sub-period (S pi ) and is identical in length for all physical disks. Due to the staggered retrieval of the fragments, the display of an object X can start at the end of the first sub-period (as opposed to the end of a time period as is the case for Disk Grouping). The ordered assignment and activation of the fastest disk first is to guarantee a hiccup-free display. In particular,forslowerdisks,thedurationofasub-periodmightexceedthedisplaytimeofthefragmentretrieved fromthosedisks. Fragmentsizesareapproximatelyproportionaltoadisk’sbandwidthandconsequentlyslower disks are assigned smaller fragments. As an example, with the parameters of Table 4.2, each sub-period is 0.929 seconds long (a time period is 2.787 seconds long) with each disk retrieving 16 fragments: the two logical disks read 48 blocks each to supportN Tot = 96 displays, hence, 16 fragments are retrieved during each of the three sub-periods. The third column of this table shows the display time of each fragment. More data is retrieved from the Seagate ST34501WD (with a display time of 1.534 seconds) to compensate for the bandwidth of the slower ST31200WD disk (with a display time of 0.371 seconds). If the first portion of block X (X 0.0 ) is assigned to the ST31200WD then the client referencing X might suffer from a display hiccup after 0.371 seconds because the second fragment is not available until 0.929 seconds (one sub-period) later. 
By assigning the first portion to the fastest disk (Seagate ST34501WD), the system prevents the possibility of hiccups.

Recall that an amount of data equal to a logical block is consumed during a time period. The amount of memory required by the system is a linear function of the peak memory (denoted P̂) required by a display during a time period, N, and D_l, i.e., M = N × D_l × P̂. For a display, the peak memory can be determined using the simple dynamic computation of Figure 4.4. This figure computes the maximum of two possible peaks: one that corresponds to when a display is first started (labeled (1) in Figure 4.5) and a second that is reached while a display is in progress (labeled (2) in Figure 4.5). Consider each in turn. When a retrieval is first initiated, its memory requirement is the sum of the first and second fragments of a block, because the display of the first fragment is started at the end of the first sub-period (and the buffer for the next fragment must be allocated in order to issue the disk read for B_1). For an in-progress display, during each time period that consists of K sub-periods, the display is one sub-period behind the retrieval. Thus, at the end of each time period, one sub-period's worth of data remains in memory (the variable termed Consumed in Figure 4.4). At the beginning of the next time period, the fragment size B_0 is added to the memory requirement because this much memory must be allocated to activate the disk. Next, from the combined value, the amount of data displayed during a sub-period is subtracted and the size of the next fragment is added, i.e., the amount of memory required to read this fragment. This process is repeated for all sub-periods that constitute a time period. Figure 4.5 illustrates the resulting memory demand as a function of time.

[Figure 4.5: peak memory [KB] (roughly 100 to 1,100) of one display over time, divided into the sub-periods S_P0–S_P2 of time periods T_P0–T_P2; the start-up peak (1) and the recurring peak (2) at the beginning of each time period are marked; C denotes the amount consumed per sub-period.]

Figure 4.5: Memory requirement for Staggered Grouping.

Either of the two peaks could be the highest, depending on the characteristics and the number of physical disks involved. In the example of Figure 4.3 (and with the fragment sizes of Table 4.2), the peak amount of required memory is 1,103.7 KBytes, which is reached at the beginning of each time period (peak (2) in Figure 4.5). This compares favorably with the two logical blocks (2.44 MBytes) per display required by Disk Grouping. If no additional optimization is employed then the maximum value of either peak (1) or (2) specifies the memory requirement of a display.
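The pseudocode of Figure 4.4 translates almost literally into C. The function below is such a rendering (it assumes K ≥ 2 and consistent units); fed with the fragment sizes of Table 4.2 ordered fastest disk first (687.4, 395.4, 166.0 KByte), S_p = 0.929 seconds, and R_C = 448 KByte/s, it returns the ≈1,103.7 KByte peak quoted above.

    /* C rendering of the PeakMem computation of Figure 4.4: returns the peak
     * buffer space one display needs under Staggered Grouping.  B[0..K-1] are
     * the fragment sizes (fastest disk first, K >= 2), S_p the sub-period
     * length, R_C the consumption rate; units only need to be consistent. */
    double PeakMem(double S_p, double R_C, const double B[], int K)
    {
        double consumed = S_p * R_C;        /* data displayed per sub-period     */
        double memory, peak;
        int i;

        memory = peak = B[0] + consumed;    /* start of the first sub-period     */
        for (i = 1; i < K; i++) {           /* remaining sub-periods of a period */
            memory = memory + B[i] - consumed;
            if (memory > peak)
                peak = memory;
        }
        if (B[0] + B[1] > peak)             /* peak (1): display start-up        */
            peak = B[0] + B[1];
        return peak;
    }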
We now derive equations that specify the remaining parameters of this technique in a manner analogous to the previous section. The sub-periods S_p_i are all equal in length and collectively add up to the duration of a time period T_p:

    S_p ≡ S_p_0 = S_p_1 = ... = S_p_{K−1}    (4.7)

    T_p = Σ_{i=0}^{K−1} S_p_i = K × S_p    (4.8)

The total block size B_l retrieved must be equal to the consumed amount of data:

    B_l = Σ_{i=0}^{K−1} B_i = R_C × Σ_{i=0}^{K−1} S_p_i = R_C × T_p    (4.9)

Furthermore, the sub-period S_p must satisfy the following inequality for the block size B_i on each physical disk i (d^p_i):

    S_p ≥ N × ( B_i / R_D_i + T_Seek_i(#cyl_i / N) )    (4.10)

Solving for the fragment sizes B_i leads to a system of equations similar to the Disk Grouping Equation Array 4.3, with the difference that T_p is substituted by S_p. As a consequence of the equality T_p = K × S_p, every instance of N is replaced by the expression N × K. The solution (i.e., the individual block size B_i) of this system of equations is analogous to Equation 4.4 where, again, every instance of N is replaced with N × K (and hence, for notational convenience, T_Seek_x is equivalent to T_Seek_x(#cyl_x / (N × K))). The total number of streams for a Staggered Grouping system can be quantified as:

    N_Tot = K × D_l × N    (4.11)

Recall that there is no closed form solution for the peak memory requirement (see Figure 4.4 for its algorithmic computation). Hence, the total amount of memory M at the server side is derived as:

    M = N_Tot × PeakMem(S_p, R_C, B_0, ..., B_{K−1}, K)    (4.12)

Consequently, the maximum latency L_max is given by:

    L_max = K × D_l × S_p = D_l × T_p = D_l × B_l / R_C    (4.13)

4.3 Disk Merging

This technique separates the concept of a logical disk from the physical disks altogether. Moreover, it forms logical disks from fractions of the available bandwidth and storage capacity of the physical disks, as shown in Figure 4.6. These two concepts are powerful abstractions that enable this technique to utilize an arbitrary mix of physical disk models. To illustrate, consider the following example.

Example 4.3.1: Assume the six physical disk drives of the storage system shown in Figure 4.6 are configured such that the physical disks d^p_0 and d^p_3 each support 1.8 logical disks. Because of their higher performance, the physical disks d^p_1 and d^p_4 realize 4.0 logical disks each and the disks d^p_2 and d^p_5 support 6.2 logical disks. Note that fractions of logical disks (e.g., 0.8 of d^p_0 and 0.2 of d^p_2) are combined to form additional logical disks (e.g., d^l_1, which contains block X_1 in Figure 4.6). The total number of logical disks adds up to ⌊(2 × 1.8) + (2 × 4.0) + (2 × 6.2)⌋ = 24. Assuming N = 4 MPEG-2 (3.5 Mb/s) streams per logical disk, the maximum throughput that can be achieved is 96 streams (24 × N = 24 × 4).
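The arithmetic of Example 4.3.1 is formalized by Equation 4.18 below; as a quick check, the fragment of C code that follows sums the contributions of the three disk types and rounds down (the small epsilon only guards against floating-point round-off).

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double p[]     = { 1.8, 4.0, 6.2 }; /* logical disks per physical disk of each type */
        int    disks[] = { 2, 2, 2 };       /* physical disks per type                      */
        int    N = 4, D_l, i;               /* N: streams per logical disk                  */
        double sum = 0.0;

        for (i = 0; i < 3; i++)
            sum += p[i] * disks[i];
        D_l = (int)floor(sum + 1e-9);       /* round down, cf. Equation 4.18 */
        printf("D_l = %d logical disks, N_Tot = %d streams\n", D_l, D_l * N); /* 24, 96 */
        return 0;
    }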
The detailed design of this technique is as follows. First, it selects how many logical disks should be mapped to each of the slowest physical disks and denotes this factor with p_0.[5] With no loss of generality, for Disk Merging we index the disk types in order of increasing performance. For example, in Figure 4.6, the two slowest disks d^p_0 and d^p_3 each represent 1.8 logical disks, i.e., p_0 = 1.8. There are tradeoffs associated with choosing p_0 which will be addressed in detail with a configuration planner in Section 4.4. Next, the time period T_p and the block size necessary to support p_0 logical disks on a physical disk can be established by extending Equation 4.2 and solving for the block size B_l as follows:

    T_p ≥ N × ( (B_l × p_0) / R_D_0 + #seek(p_0) × T_Seek_0( #cyl_0 / (#seek(p_0) × N) ) )    (4.14)

    B_l = ( N × R_C × R_D_0 × #seek(p_0) × T_Seek_0( #cyl_0 / (#seek(p_0) × N) ) ) / ( R_D_0 − p_0 × N × R_C )    (4.15)

[5] Subscripts for p_i indicate a particular disk type, not individual disks.

[Figure 4.6: the six physical disks d^p_0 ... d^p_5 (p_0 = 1.8 for d^p_0 and d^p_3, p_1 = 4.0 for d^p_1 and d^p_4, p_2 = 6.2 for d^p_2 and d^p_5) are mapped to 24 logical disks d^l_0 ... d^l_23 that store blocks X_0 ... X_23 in a round-robin fashion; where fractions of physical disks are combined, blocks are split into fragments (e.g., X_1.0/X_1.1 and X_13.0/X_13.1); the retrieval schedule proceeds over time periods T_P0 ... T_P4.]

Figure 4.6: Disk Merging.

As with the two previously introduced techniques, all the logical disks in the system must be identical. Therefore, T_p and B_l obtained from Equations 4.14 and 4.15 determine how many logical disks map to the other, faster disk types i, 1 ≤ i ≤ m−1, in the system. Each factor p_i must satisfy the following constraint:

    B_l / (N × R_C) = (B_l × p_i) / R_D_i + #seek(p_i) × T_Seek_i( #cyl_i / (#seek(p_i) × N) )    (4.16)

To compute the values for p_i from the above equation we must first quantify the number of seeks, i.e., #seek(p_i), that are induced by p_i. The function #seek(p_i) consists of two components:

    #seek(p_i) = ⌈p_i⌉ + (n_i − 1)    (4.17)

Because the disk heads need to seek to a new location prior to retrieving a chunk of data of any size—be that a full logical block or just a fraction thereof—the number of seeks is at least ⌈p_i⌉. However, there are scenarios when ⌈p_i⌉ underestimates the necessary number of seeks because the fractions of logical disks no longer add up to 1.0. In these cases, the expression (n_i − 1) denotes any additional seeks that are needed. To illustrate, if there are two physical disk types with p_0 = 3.7 and p_1 = 5.6 then it is impossible to directly combine the fractions 0.7 and 0.6 to form complete logical disks. Hence, one of the fractions must be further divided into parts, termed sub-fragments. However, this process will introduce additional seeks. Specifically, assuming n_i sub-fragments, the number of seeks will increase by (n_i − 1). Because seeks are wasteful operations, it is important to find an optimal division of the fragments to minimize the total overhead. For large storage systems employing a number of different disk types, it is a challenge to find the best way to divide the fragments.

Further complicating matters is the phenomenon that the value of n_i depends on p_i and, conversely, p_i might change once n_i is calculated. Because of this mutual dependency, there is no closed form solution for p_i from Equation 4.16. However, numerical solutions can easily be found. An initial estimate for p_i may be obtained from the bandwidth ratio of disk types i and 0 (p_i ≈ (R_D_i / R_D_0) × p_0). When iteratively refined, this value converges rapidly towards the correct ratio. The configuration planner of Section 4.4 uses such a numerical method. Appendix B presents a detailed algorithm to determine n_i.
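A minimal sketch of such a numerical solution is shown below. It simplifies Equation 4.16 in two ways that must be kept in mind: the seek term is folded into a constant placeholder and #seek(p_i) is approximated by ⌈p_i⌉, i.e., the additional (n_i − 1) sub-fragment seeks of Appendix B are ignored. With the Table 4.3 block size of 0.780 MB for the ⟨p_0 = 1.8, N = 4⟩ configuration and a 10 msec placeholder seek term, the iteration settles near the p_2 ≈ 6.2 of Example 4.3.1.

    #include <math.h>
    #include <stdio.h>

    /* Iterative solution of Equation 4.16 for p_i, with two simplifications:
     * the seek term is a constant placeholder ts and #seek(p_i) ~ ceil(p_i). */
    static double solve_p(double R_Di, double ts, double B_l, double R_C, int N, double p_init)
    {
        double p = p_init, p_new = p_init;
        int iter;

        for (iter = 0; iter < 100; iter++) {
            double seeks = ceil(p);                                /* simplified #seek(p_i) */
            p_new = (R_Di / B_l) * (B_l / (N * R_C) - seeks * ts); /* rearranged Eq. 4.16   */
            if (fabs(p_new - p) < 1e-6)
                break;
            p = p_new;
        }
        return p_new;
    }

    int main(void)
    {
        double R_D0 = 3.47, R_D2 = 12.97;     /* MB/s: ST31200WD and ST34501WD        */
        double B_l = 0.780, R_C = 3.5 / 8.0;  /* MB and MB/s (cf. Table 4.3)          */
        double ts = 0.010;                    /* placeholder seek term per access, s  */
        double p0 = 1.8;
        int    N = 4;

        /* initial estimate from the bandwidth ratio, then refine iteratively */
        printf("p_2 ~ %.2f\n", solve_p(R_D2, ts, B_l, R_C, N, (R_D2 / R_D0) * p0));
        return 0;
    }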
Based on the number of physical disks in the storage system and the p_i values for all the types, the total number of logical disks (D_l) is defined as[6]:

    D_l = ⌊ Σ_{i=0}^{m−1} ( p_i × #disk_i ) ⌋    (4.18)

[6] m denotes the number of different disk types employed and #disk_i refers to the number of disks of type i.

Once a homogeneous collection of logical disks is formed, the block size B_l of a logical disk determines the total amount of required memory, the maximum startup latency, and the total number of simultaneous displays. The equations for these are an extension of those described in Section 4.1 and are as follows. The total number of streams a system can sustain is proportional to the number of logical disks and the number of streams each logical disk can support:

    N_Tot = N × D_l    (4.19)

The total amount of memory M is:

    M = 2 × N_Tot × B_l    (4.20)

while the maximum latency L_max is no more than:

    L_max = D_l × T_p = D_l × B_l / R_C    (4.21)

The overall number of logical disks constructed primarily depends on the initial choice of p_0. Figure 4.7 shows possible configurations for a system based on two disks each of type ST31200WD, ST32171WD, and ST34501WD. In Figure 4.7a, system throughputs are marked in the configuration space for various sample values of p_0. A lower value of p_0 results in fewer, higher performing logical disks. For example, if p_0 = 1.8 then each logical disk can support up to 4 displays simultaneously (N ≤ 4). Figure 4.7a shows the total throughput for each N = 1, 2, 3, 4. As p_0 is increased, more logical disks are mapped to a physical disk and the number of concurrent streams supported by each logical disk decreases. For example, a value of p_0 = 7.2 results in logical disks that can support just one stream (N = 1). However, the overall number of logical disks also roughly increases four-fold. The maximum value of the product p_0 × N is approximately constant[7] and limited by the upper bound of R_D_0 / R_C. In Figure 4.7 this limit is R_D_0 / R_C = (3.47 MB/s = 27.74 Mb/s) / (3.5 Mb/s) = 7.93. Pairs of ⟨p_0, N⟩ that approach this maximum will yield the highest possible throughput (e.g., ⟨3.6, 2⟩ or ⟨7.8, 1⟩). However, the amount of memory required to support such a high number of streams increases exponentially (see Figure 4.7b). The most economical configurations with the lowest cost per stream (i.e., approximately 100 streams, see Figure 4.7c) can be achieved with a number of different, but functionally equivalent, ⟨p_0, N⟩ combinations. The number of possible configurations in the two-dimensional space shown in Figure 4.7a will increase if p_0 is incremented in smaller deltas than Δp_0 = 0.2. Hence, for any but the smallest storage configurations it becomes difficult to manually find the minimum cost system based on the performance objectives for a given application. However, an automated configuration planner (see Section 4.4) can rapidly search through all configurations to find the ones that are most appropriate.

[7] It is interesting to note that, as a result of this property, the perceived improvement of the average seek distance d from d = #cyl_i / N (Equation 4.2) to d = #cyl_i / (#seek(p_i) × N) (Equation 4.16) does not lower the seek overhead.

[Figure 4.7: (a) configuration space: number of logical disks versus number of streams for p_0 factors between 1.4 and 7.8 and N = 1, 2, 3, 4; (b) memory [MB] versus number of streams; (c) cost per stream [$] versus number of streams.]

Figure 4.7: Sample system configurations for different values of p_0 (1.0 ≤ p_0 ≤ 7.8 with increments of Δp_0 = 0.2) for a storage system with 6 physical disks (2 × ST31200WD, 2 × ST32171WD, and 2 × ST34501WD). Streams are of type MPEG-2 with a consumption rate of R_C = 3.5 Mb/s.

Disk Model (Seagate) | p_i        | Logical block size [MB] | Logical disk size [MB] | % Space for traditional data | Overall % space for trad. data
(N = 1, D_l = 96, N_Tot = 96)
ST34501WD            | p_2 = 25.2 |                         | 172.2                  | 23.3%                        |
ST32171WD            | p_1 = 15.6 | 0.554                   | 132.1                  | 0%                           | W = 14.4%
ST31200WD            | p_0 = 7.2  |                         | 139.7                  | 5.4%                         |
(N = 4, D_l = 24, N_Tot = 96)
ST34501WD            | p_2 = 6.2  |                         | 699.8                  | 26.4%                        |
ST32171WD            | p_1 = 4.0  | 0.780                   | 515.3                  | 0%                           | W = 16.5%
ST31200WD            | p_0 = 1.8  |                         | 558.9                  | 7.8%                         |

Table 4.3: Two configurations from Figure 4.7 shown in more detail. Their points in the configuration space of Figure 4.7a are ⟨p_0 = 1.8, N = 4⟩ and ⟨p_0 = 7.2, N = 1⟩, respectively. Streams are of type MPEG-2 (R_C = 3.5 Mb/s).
[Figure 4.8: Stage 1 (Configuration Enumerator) reads (d) the number and models of disk drives together with a database of disk model characteristics and emits all possible configurations Q; Stage 2 (Configuration Selector) filters them using (a) the minimum number of streams, (b) the maximum storage space for non-CM use, and (c) the cost per MB for non-CM space, and outputs the proposed configurations Q'.]

Figure 4.8: Configuration planner structure.

4.3.1 Storage Space Considerations

Utilizing most of the bandwidth of the physical disks in a system to lower the cost per stream C is certainly a compelling aspect of selecting a system configuration (i.e., p_0 and N). However, for some applications the available storage space is equally or more important. With Disk Merging the storage space associated with each logical disk may vary depending on which physical disk it is mapped to. For example, Table 4.3 shows two possible configurations for the same physical disk subsystem. Because the round-robin assignment of blocks results in a close to uniform space allocation of data, the system will be limited by the size of the smallest logical disk. Both sample configurations support N_Tot = 96 streams. While one fails to utilize 14.4% of the total disk space in support of continuous media (W = 14.4%), the other wastes 16.5% of the disk space (W = 16.5%). In general, this space is not wasted and can be used to store traditional data, such as text and records, that are accessed during off-peak hours. Alternatively, the number of logical disks can purposely be decreased on some of the physical disks to equalize the disk space. The latter approach will waste some fraction of the disk bandwidth. In summary, if a configuration fully utilizes the disk bandwidth some storage space may be unused, or conversely, if all of the space is allocated then some bandwidth may be idle. Both approaches will be addressed by the configuration planner in Section 4.4.
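The space accounting behind the W values of Table 4.3 can be reproduced with a short calculation: every logical disk is only filled up to the capacity of the smallest logical disk, and the remainder counts as unusable for continuous media. The sketch below does this for the ⟨p_0 = 7.2, N = 1⟩ configuration and arrives at the quoted W = 14.4%.

    #include <stdio.h>

    int main(void)
    {
        double p[]    = { 7.2, 15.6, 25.2 };      /* ST31200WD, ST32171WD, ST34501WD   */
        double size[] = { 139.7, 132.1, 172.2 };  /* MB per logical disk (Table 4.3)   */
        double smallest = 132.1;                  /* smallest logical disk (ST32171WD) */
        int    disks = 2;                         /* physical disks per type           */
        double total = 0.0, unusable = 0.0;
        int    i;

        for (i = 0; i < 3; i++) {
            total    += disks * p[i] * size[i];
            unusable += disks * p[i] * (size[i] - smallest);
        }
        printf("W = %.1f%% of %.0f MB cannot hold continuous media\n",
               100.0 * unusable / total, total);  /* ~14.4% */
        return 0;
    }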
4.4 Configuration Planner for Disk Merging

The same hardware configuration can support a number of different system parameter sets depending on the selected value of p_0 (see Figure 4.7). The total number of streams, the memory requirement, the amount of storage space unusable for continuous media, and ultimately the cost per stream are all affected by the choice of p_0. The value of p_0 is a real number and theoretically an infinite number of values are possible. But even if p_0 is incremented by some discrete step size to bound the search space, it would be very time consuming and tedious to manually find the best value of p_0 for a given application. To alleviate this problem, a configuration planner is presented in this section that can select an optimal value for p_0 based on a number of parameters of the desired system.

Figure 4.8 shows the schematic structure of the configuration planner. It works in two stages: Stage 1 enumerates all the possible configuration tuples based on a user-supplied description of the disk subsystem under consideration. The calculations are based on the analytical models of Section 4.3. Stage 2 filters the output of Stage 1 by removing configurations that do not meet the minimum performance constraints required by the application. The result of Stage 2 is a set of proposed configurations that provide the lowest cost per stream or, alternatively, make the best use of the available storage space.

    for p_0 = 1 to R_D_0 / R_C step Δp_0 do
        for N = 1 to R_D_0 / (R_C × p_0) step 1 do
            /* Computation of the tuples Q. */
        end for;
    end for;

Figure 4.9: The two nested loops that define the configuration planner search space.

4.4.1 Operation

The detailed operation of the planner is as follows. The number and type of disks of the storage system under consideration are input parameters into Stage 1. This information is used to access a database that stores the detailed physical characteristics of these disk drive types. (This information is similar to the examples listed in Tables 1.2 and 3.1.) With p_0 = 1 the analytical models of Section 4.3 are then employed (by iterating over values of N) to produce output values for the total number of streams N_Tot, the number of logical disks D_l, the memory required M, the maximum latency L_max, the logical block size B_l, the storage space usable for continuous media S, the percentage of the total storage space that cannot be used for continuous media W, the cost per stream C (see Equation 4.22), and p_i, i ≥ 1. When N reaches its upper bound of R_D_0 / (R_C × p_0), the value of p_0 is incremented by a user-specified amount, termed the step size and denoted with Δp_0, and the iteration over N is restarted. This process is repeated until the search space is exhausted. Stage 1 produces all possible tuples Q = ⟨N_Tot, D_l, M, L_max, B_l, S, W, C, p_0, p_1, ..., p_{m−1}⟩ for a given storage subsystem. Table 4.4 shows an example output for a 30 disk storage system with three disk types.

To provide the output of Stage 1 in a more useful and abbreviated form, the data is fed into Stage 2, which eliminates tuples Q that do not meet certain constraints and then orders the remaining tuples according to an optimization criterion. The input parameters to Stage 2 are (a) the minimum desired number of streams, (b) the maximum desired cost per stream, and (c) a hypothetical cost per MByte of disk space that cannot be utilized by CM applications. Stage 2 operates as follows. First, all tuples Q that do not provide the minimum number of streams desired are discarded. Second, an optimization component calculates the adjusted cost C' per stream for each remaining tuple Q' according to Equation 4.23:

    C = ( M × M$ + Σ_{i=0}^{D−1} D$_i ) / ( N × D_l )    (4.22)

    C' = C + ( W / (100% − W) ) × S × W$    (4.23)

where M$ is the cost per MByte of memory, D$_i is the cost of each physical disk drive i, and W$ is the cost per MByte of unused storage space.

By including a hypothetical cost for each MByte of storage that cannot be utilized for CM (input parameter (c)), Stage 2 can optionally optimize for (i) the minimum cost per stream, or (ii) the maximum storage space for continuous media, or (iii) a weighted combination of (i) and (ii). The final, qualifying tuples Q' will be output by Stage 2 in sorted order of ascending total cost per stream C', as shown in Table 4.5. For many applications the lowest cost configuration will also be the best one. In some special cases, however, a slightly more expensive solution might be advantageous. For example, the startup latency L_max of a stream is a linear function of the number of logical disk drives in the system (see Equation 4.21). Hence, a designer might choose, for example, the fourth configuration from the top in Table 4.5 at a slightly higher cost per stream but for a drastically reduced initial latency (57.4 seconds versus 273.2 seconds, maximum). Alas, in some cases the final decision for a certain configuration is application-specific, but the planner will always provide a comprehensive list of candidates to select from.
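Equations 4.22 and 4.23 are simple enough to check by hand or with a few lines of C; the fragment below plugs in the first row of Table 4.5 (using the S value of the corresponding Table 4.4 tuple) and reproduces C = $57.17 and C' ≈ $547.4.

    #include <stdio.h>

    /* Cost model of Equations 4.22 and 4.23, checked against the first row of
     * Table 4.5: 424 streams, M = 239.05 MB, W = 13.24%, S = 64254.7 MB,
     * 30 disks at $800 each, memory at $1/MB, unusable space at $0.05/MB. */
    int main(void)
    {
        double M = 239.050, M_dollar = 1.0;  /* memory [MB] and $/MB        */
        double disk_cost = 30 * 800.0;       /* sum of D$_i over all disks  */
        double N_tot = 424.0;                /* N x D_l                     */
        double W = 13.240, S = 64254.7, W_dollar = 0.05;

        double C  = (M * M_dollar + disk_cost) / N_tot;     /* Equation 4.22 */
        double C1 = C + (W / (100.0 - W)) * S * W_dollar;   /* Equation 4.23 */

        printf("C = $%.2f per stream, C' = $%.2f\n", C, C1);
        return 0;
    }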
4.4.2 Search Space Size

It is helpful to quantify the search space that the planner must traverse. If it is small then an exhaustive search is possible. Otherwise, heuristics must be employed to reduce the time or space complexity. The search space is defined by Stage 1 of the planner. The algorithm consists of two nested loops, as illustrated in Figure 4.9.

N_Tot | D_l | M [MB]   | L_max [sec] | B_l [MB] | S [MB]  | W [%]  | C [$]  | p_0     | p_1      | p_2
55    | 55  | 1.446    | 1.652       | 0.0131   | 55330.0 | 25.290 | 436.39 | 1.0     | 1.7 (a)  | 2.8
100   | 50  | 4.571    | 2.612       | 0.0229   | 50300.0 | 32.082 | 240.05 | 1.0     | 1.5      | 2.5
156   | 52  | 11.390   | 4.339       | 0.0365   | 52312.0 | 29.365 | 153.92 | 1.0     | 1.6 (b)  | 2.6
216   | 54  | 24.674   | 7.050       | 0.0571   | 54324.0 | 26.649 | 111.23 | 1.0     | 1.7 (c)  | 2.7
285   | 57  | 52.336   | 11.963      | 0.0918   | 57342.0 | 22.574 | 84.39  | 1.0     | 1.8 (d)  | 2.9
366   | 61  | 119.001  | 22.667      | 0.1626   | 61366.0 | 17.140 | 65.90  | 1.0     | 1.9 (e)  | 3.2
455   | 65  | 351.495  | 57.387      | 0.3863   | 63792.9 | 13.863 | 53.52  | 1.0     | 2.1      | 3.4
106   | 106 | 5.933    | 6.780       | 0.0280   | 62418.9 | 15.719 | 226.47 | 1.5 (f) | 3.5      | 5.6
208   | 104 | 26.436   | 15.106      | 0.0635   | 63042.4 | 14.877 | 115.51 | 1.5 (g) | 3.4      | 5.5
315   | 105 | 82.056   | 31.259      | 0.1302   | 63648.5 | 14.058 | 76.45  | 1.5     | 3.4      | 5.6
420   | 105 | 246.774  | 70.507      | 0.2938   | 63648.5 | 14.058 | 57.73  | 1.5     | 3.4      | 5.6
525   | 105 | 1684.140 | 384.946     | 1.6039   | 63648.5 | 14.058 | 48.92  | 1.5     | 3.4      | 5.6
97    | 97  | 4.433    | 5.067       | 0.0229   | 48791.0 | 34.120 | 247.47 | 2.0     | 3.1      | 4.6 (h)
208   | 104 | 23.761   | 13.577      | 0.0571   | 52312.0 | 29.365 | 115.50 | 2.0     | 3.2      | 5.2
360   | 120 | 117.050  | 44.590      | 0.1626   | 60360.0 | 18.499 | 66.99  | 2.0     | 3.8      | 6.2
192   | 192 | 22.153   | 25.318      | 0.0577   | 61830.0 | 16.514 | 125.12 | 2.5 (g) | 6.4      | 10.3
328   | 164 | 90.099   | 51.485      | 0.1373   | 63774.3 | 13.888 | 73.45  | 2.5 (f) | 5.3      | 8.6
519   | 173 | 1380.632 | 525.955     | 1.3301   | 63670.2 | 14.029 | 48.90  | 2.5 (f) | 5.6      | 9.2
147   | 147 | 10.733   | 12.266      | 0.0365   | 49294.0 | 33.440 | 163.34 | 3.0     | 4.5      | 7.2
360   | 180 | 117.050  | 66.886      | 0.1626   | 60360.0 | 18.499 | 66.99  | 3.0     | 5.7      | 9.3
223   | 223 | 31.544   | 36.051      | 0.0707   | 63833.8 | 13.808 | 107.76 | 3.5 (f) | 7.2      | 11.6
478   | 239 | 502.537  | 287.164     | 0.5257   | 63151.2 | 14.730 | 51.26  | 3.5     | 7.8 (i)  | 12.6
206   | 206 | 23.532   | 26.894      | 0.0571   | 51809.0 | 30.045 | 116.62 | 4.0     | 6.4      | 10.2
290   | 290 | 60.515   | 69.160      | 0.1043   | 64267.7 | 13.222 | 82.97  | 4.5     | 9.3      | 15.2
278   | 278 | 51.051   | 58.344      | 0.0918   | 55933.6 | 24.475 | 86.51  | 5.0     | 8.8 (i)  | 14.0
356   | 356 | 117.952  | 134.802     | 0.1657   | 64361.1 | 13.096 | 67.75  | 5.5 (g) | 11.4     | 18.7
361   | 361 | 117.375  | 134.143     | 0.1626   | 60527.7 | 18.272 | 66.81  | 6.0     | 11.5 (f) | 18.6
424   | 424 | 239.050  | 273.200     | 0.2819   | 64254.7 | 13.240 | 57.17  | 6.5 (f) | 13.6     | 22.3
456   | 456 | 352.267  | 402.591     | 0.3863   | 63933.1 | 13.674 | 53.40  | 7.0     | 14.7 (j) | 23.9
515   | 515 | 1080.622 | 1234.997    | 1.0491   | 63557.8 | 14.181 | 48.70  | 7.5     | 16.7     | 27.3

(a) through (j): Fractions of logical disks must sometimes be divided into parts such that complete logical disks can be assembled (see Section 4.3 and Appendix B). For example, fractions of 10 × 0.7 and 10 × 0.8 become useful when 0.7 is divided into two parts, 0.2 + 0.5. Then 5 disks can be assembled from the 0.5 fragments and 10 disks from a combination of the 0.2 + 0.8 fragments. The configuration planner automatically finds such divisions if they are necessary. For this table the complete list is as follows: (a) 0.2+0.5, (b) 0.2+0.4, (c) 0.2+0.2+0.3, (d) 0.1+0.2+0.5, (e) 0.1+0.8, (f) 0.1+0.4, (g) 0.2+0.3, (h) 0.1+0.5, (i) 0.2+0.2+0.4, and (j) 0.1+0.1+0.5.

Table 4.4: Planner Stage 1 output for 30 disks (10 × ST31200WD, 10 × ST32171WD, and 10 × ST34501WD). The raw tuples Q are sorted by increasing values of p_0 (third column from the right). Streams are of type MPEG-2 with R_C = 3.5 Mb/s and the value of p_0 was iterated over a range of 1.0 ≤ p_0 ≤ 7.5 (R_D_ST31200WD / R_C = 7.93) with increments of Δp_0 = 0.5. M$ = $1 per MB of memory and D$_i = $800 per disk drive.
N_Tot | D_l | M [MB]   | L_max [sec] | B_l [MB] | W [%]  | C [$] | C' [$] | p_0 | p_1  | p_2
424   | 424 | 239.050  | 273.200     | 0.2819   | 13.240 | 57.17 | 547.43 | 6.5 | 13.6 | 22.3
356   | 356 | 117.952  | 134.802     | 0.1657   | 13.096 | 67.75 | 552.69 | 5.5 | 11.4 | 18.7
456   | 456 | 352.267  | 402.591     | 0.3863   | 13.674 | 53.40 | 559.75 | 7.0 | 14.7 | 23.9
455   | 65  | 351.495  | 57.387      | 0.3863   | 13.863 | 53.52 | 566.88 | 1.0 | 2.1  | 3.4
519   | 173 | 1380.632 | 525.955     | 1.3301   | 14.029 | 48.90 | 568.39 | 2.5 | 5.6  | 9.2
525   | 105 | 1684.140 | 384.946     | 1.6039   | 14.058 | 48.92 | 569.50 | 1.5 | 3.4  | 5.6
515   | 515 | 1080.622 | 1234.997    | 1.0491   | 14.181 | 48.70 | 573.81 | 7.5 | 16.7 | 27.3
420   | 105 | 246.774  | 70.507      | 0.2938   | 14.058 | 57.73 | 578.30 | 1.5 | 3.4  | 5.6
328   | 164 | 90.099   | 51.485      | 0.1373   | 13.888 | 73.45 | 587.73 | 2.5 | 5.3  | 8.6
478   | 239 | 502.537  | 287.164     | 0.5257   | 14.730 | 51.26 | 596.70 | 3.5 | 7.8  | 12.6
315   | 105 | 82.056   | 31.259      | 0.1302   | 14.058 | 76.45 | 597.02 | 1.5 | 3.4  | 5.6
366   | 61  | 119.001  | 22.667      | 0.1626   | 17.140 | 65.90 | 700.60 | 1.0 | 1.9  | 3.2
361   | 361 | 117.375  | 134.143     | 0.1626   | 18.272 | 66.81 | 743.42 | 6.0 | 11.5 | 18.6
360   | 120 | 117.050  | 44.590      | 0.1626   | 18.499 | 66.99 | 751.99 | 2.0 | 3.8  | 6.2

Table 4.5: Planner Stage 2 output for 30 disks (10 × ST31200WD, 10 × ST32171WD, and 10 × ST34501WD). The input data for this example was as listed in Table 4.4. These final tuples Q' are sorted according to their increasing, adjusted cost C' (fourth column from the right), which includes a hypothetical cost for each MByte that cannot be utilized for continuous media (in this example W$ = $0.05 per MByte, M$ = $1 per MByte, and D$_i = $800 per disk drive). The minimum number of streams desired was 300.

Figure 4.9 also shows the three parameters that define the size of the search space: the disk bandwidth of the slowest disk drive R_D_0, the bandwidth requirement of the media R_C, and the step size Δp_0. The outer loop, iterating over p_0, is terminated by its upper bound of R_D_0 / R_C. This upper bound describes into how many logical disks the slowest physical disk can be divided. The inner loop iterates over N to describe the number of streams supported by one logical disk. Hence, the upper bound of the inner loop is dependent on the outer loop. A low value of p_0 results in fewer, higher performing logical disks. As p_0 is increased, more logical disks are mapped to a physical disk and the number of concurrent streams N supported by each logical disk decreases. Therefore the maximum value of the product p_0 × N is approximately constant at a value of R_D_0 / R_C. The total number of executions of the program block enclosed by the two loops can be estimated as:

    E_#loops = (1/Δp_0) × ( R_D_0/R_C + R_D_0/(2 × R_C) + R_D_0/(3 × R_C) + ... + 1 )    (4.24)

             ≈ (1/Δp_0) × (R_D_0/R_C) × ( 1 + 1/2 + 1/3 + 1/4 + ... + 1/(R_D_0/R_C) )    (4.25)

There is no closed form solution of Equation 4.24. However, since the number of terms enclosed by the parentheses is ⌊R_D_0/R_C⌋, a loose upper bound for E_#loops is:

    E_#loops < (1/Δp_0) × ( R_D_0/R_C )²    (4.26)

Table 4.6 shows two sample disk configurations, both with approximately the same aggregate disk bandwidth (200 MB/s) but with a different number of disk types. Figures 4.10 and 4.13 show the corresponding search spaces (R_C = 3.5 Mb/s, Δp_0 = 0.1). In Figure 4.10 the slowest disk is of type ST31200WD, which limits the ratio R_D_0/R_C to 7.93. The number of configuration points in Figure 4.10 is 139. The estimated search space size (using Equation 4.24) is E_#loops = 160, or about 15.1% higher. The same calculation for Figure 4.13 yields E_#loops = 580 (with 469 actual points, an over-estimation of 23.7%).
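The actual number of configuration points is easy to reproduce by iterating the two loops of Figure 4.9 with a counter. The sketch below does so for the parameters of Figure 4.10 (R_D_0 = 3.47 MB/s, R_C = 3.5 Mb/s, Δp_0 = 0.1) and arrives at the 139 points quoted above; the integer loop index avoids accumulating floating-point error in p_0.

    #include <stdio.h>

    int main(void)
    {
        double R_D0 = 3.47;              /* slowest disk (ST31200WD), MB/s */
        double R_C  = 3.5 / 8.0;         /* 3.5 Mb/s expressed in MB/s     */
        double dp0  = 0.1;               /* step size                      */
        double ratio = R_D0 / R_C;       /* upper bound for p_0 (~7.93)    */
        double p0;
        int    i, N, points = 0;

        for (i = 0; (p0 = 1.0 + i * dp0) <= ratio; i++)   /* outer loop of Figure 4.9 */
            for (N = 1; N <= (int)(ratio / p0); N++)      /* inner loop               */
                points++;

        printf("configuration points: %d\n", points);     /* prints 139 */
        return 0;
    }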
Note that the number of possible configuration points depends on the ratio between the media bandwidth requirement and the data production rate of the slowest disk in the system. It does not depend on the overall number of disk drives. Table 4.7 shows a few additional configuration spaces and estimates. Note that at each configuration point, for all disk types other than the slowest one, p_i must be iteratively computed. The upper bound illustrated in Equation 4.26 is less than quadratic and the examples of Table 4.7 indicate that the exponent is approximately 1.4 (a five-fold increase in the disk bandwidth results in a ten-fold increase in the search space, i.e., (18 MB/s / 3.47 MB/s)^1.4 = 5.2^1.4 ≈ 10 ≈ 1381/139).

Configuration                          | A        | B
Aggregate disk bandwidth               | 200 MB/s | 200 MB/s
Number of different disk types         | 3        | 2
Total number of disks                  | 33       | 20
Number of disks of type ST34501WD      | 5        | 8
Number of disks of type ST32171WD      | 8        | 12
Number of disks of type ST31200WD      | 20       | –

Table 4.6: Two sample storage subsystem configurations: A with three disk types, and B with two disk types. Both provide the same aggregate disk bandwidth of 200 MB/s.

Slowest disk type (model, bandwidth) | Search space: planner | Search space: estimate (E_#loops)
ST31200WD (3.47 MB/s)                | 139                   | 160 (+15.1%)
ST32171WD (7.96 MB/s)                | 469                   | 580 (+23.7%)
ST34501WD (12.97 MB/s)               | 898                   | 1030 (+14.7%)
D1998 (18 MB/s)                      | 1381                  | 1600 (+15.9%)

Table 4.7: Sample configuration planner search space sizes (R_C = 3.5 Mb/s). All configurations used three disk types with 10 disks per type.

However, to find an optimal configuration the planner must consider that fragments sometimes need to be divided into sub-fragments (see Section 4.3). Unfortunately, the runtime of the algorithm to find the best fragment division is exponential (see Appendix B). Hence, an exhaustive search for large storage systems becomes impractical. In that case heuristics need to be employed. For example, the algorithm described in Appendix B might be modified to take advantage of a polynomial-time approximation [IK75].

4.4.3 Step Size Δp_0

Ideally the value for p_0 is a real number and infinitely many values are possible within a specified interval. The smaller the step size Δp_0, the closer this ideal is approximated. However, a disk subsystem can only support a finite number of streams N_Tot and the number of logical disks for any configuration must be between 1 and N_Tot (i.e., a logical disk cannot support less than one stream). Because of this limitation, many values of p_0 will map to effectively the same final configurations. Since the step size Δp_0 is directly related to the execution time of the planner, it should be chosen sufficiently small to generate all possible configurations but large enough to avoid excessive and wasteful computations. Empirical evidence suggests that values between 0.01 and 0.1 result in an efficient and complete search space traversal.

4.4.4 Planner Extension to p_0 < 1

Thus far, an implicit assumption for Disk Merging has been that a physical disk would be divided into one or more logical disks, i.e., p_0 ≥ 1. However, we have already investigated how to combine block fragments into complete logical disks. By generalizing this idea we can extend the Disk Merging technique to include p_0 values that are finer in granularity, i.e., 0 < p_0 < 1.