USC Computer Science Technical Reports, no. 598 (1994)
Management of Space in Hierarchical Storage Systems

Shahram Ghandeharizadeh, Douglas J. Ierardi, Roger Zimmermann
Department of Computer Science
University of Southern California
Los Angeles, California
December 1994

Abstract
The past decade has witnessed a proliferation of repositories whose workload consists of queries that retrieve information. These repositories provide online access to vast amounts of data and serve as an integral component of many applications, e.g., library information systems, scientific applications, and the entertainment industry. Their storage subsystems are expected to be hierarchical, consisting of memory, magnetic disk drives, optical disk drives, and tape libraries. The database itself resides permanently on the tape. Objects are swapped onto either the magnetic or optical disk drives on demand and later deleted when the available space of a device is exhausted. This behavior will generally cause fragmentation of the disk space over a period of time, resulting in a non-contiguous layout of disk-resident objects. As a consequence, the disk is required to reposition its read head multiple times (incurring seek operations) whenever a resident object is retrieved. This may reduce the overall performance of the system.
This study investigates four alternative techniques to manage the available space of mechanical devices in such hierarchical storage systems. Conceptually, these techniques can be categorized according to how they optimize several quantities, including the fragmentation of disk-resident objects, the amount of wasted space, and adaptation to the evolving access pattern of an application. For each of these alternative strategies, we identify the fundamental factors that impact the performance of the system and develop analytical models that quantify each factor. These models can be employed by a system designer to choose among competing strategies based on the physical characteristics of both the system and the target application.
This research was supported in part by the National Science Foundation under grants IRI-…, IRI-… (NYI award), and CDA-…, and a Hewlett-Packard unrestricted cash/equipment gift.
1 Introduction
A recent trend in the area of databases has been an increase in the number of repositories whose primary functionality is to disseminate information. These systems are expected to play a major role in library information systems, scientific applications (e.g., the Brookhaven protein repository [BKW], the human genome repository [Cou], etc.), the entertainment industry, health care information systems, knowledge-based systems, etc. These systems exhibit the following characteristics. First, they provide online access to a vast amount of data. Second, only a small subset of the data is accessed at a given point in time. Third, a major fraction of their workload consists of read-only queries. Fourth, objects managed by these systems are typically large and irregularly structured. Fifth, their applications consume the data at a high rate and almost always exhaust the available disk bandwidth. Hence, they face the traditional I/O bottleneck phenomenon. As an example, consider the following two applications.
The health care industry envisions the use of image repositories to manage X-rays, PET and MRI scans, along with other patient records. These repositories enable a physician to retrieve and display an image for further analysis. The size of a still image may vary from several kilobytes to hundreds of megabytes, if not gigabytes, depending on whether it is color or black and white, its resolution, and its level of detail. For example, an uncompressed … pixel gray-scale image might be … megabytes in size. The same image in color would be … megabytes in size. A repository managing thousands of images might be hundreds of gigabytes in size, with only a small fraction of images being accessed frequently, e.g., those corresponding to the patients currently undergoing diagnosis and treatment. Typically, an application will retrieve an image in a sequential manner for display. The faster the image can be retrieved, the sooner it can be displayed (due to the availability of fast CPUs).
The entertainment industry envisions the use of video repositories to provide the so-called video-on-demand service: the ability to display the movie of choice to a client upon request. Video objects are large in size. For example, a two-hour uncompressed video clip based on NTSC (the US standard established by the National Television System Committee) for network-quality video is approximately … gigabytes in size. Moreover, it requires a … megabits per second (mbps) sustained bandwidth for its continuous display. With a lossy compression technique, MPEG [Gal], that reduces the bandwidth requirement of this object to … mbps, this object is approximately … gigabytes in size. A repository that contains thousands of such objects is terabytes in size, with only a handful of them (say, the … to … most popular movies) having the highest frequency of access. A client generally retrieves a movie in a sequential manner for display.

The large size of these databases has led to the use of hierarchical storage structures. This is motivated primarily by dollars and sense. Storing terabytes of data using DRAM would be very expensive. Moreover, it would be wasteful, because only a small fraction of the data is referenced at any given instant in time (i.e., due to locality of references). A similar argument applies to other devices, e.g., magnetic disks. The most practical choice would be to employ a combination of fast and slow devices, where the system controls the placement of the data in order to hide the high latency of slow devices using fast devices.

[Figure 1: Hierarchical storage system. Stratum 0: memory; stratum 1: magnetic disks; stratum 2: optical disks; stratum 3: tape drives. Strata closer to memory offer faster service time; strata closer to tape offer lower cost per megabyte and higher density.]
Assume a hierarchical storage structure consisting of random access memory (DRAM), magnetic disk drives, optical disks, and a tape library [CHL] (see Figure 1). As the different strata of the hierarchy are traversed, starting with memory (termed stratum 0), both the density of the medium (the amount of data it can store) and its latency increase, while its cost per megabyte of storage decreases. At the time of this writing, these costs vary from $… per megabyte of DRAM, to $… per megabyte of disk storage, to $… per megabyte of optical disk, to less than $… per megabyte of tape storage. An application referencing an object that is disk resident observes both the average latency time and the delivery rate of a magnetic disk drive, both of which are superior to those of the tape library. An application would observe the best performance when its working set becomes resident at the highest level of the hierarchy: memory. However, in our assumed environment, the magnetic disk drives are the more likely staging area for this working set, due to the large size of objects. Typically, memory would be used to stage a small fraction of an object for immediate processing and display. We define the working set [Den] of an application as a collection of objects that are repeatedly referenced. For example, in existing video stores, a few titles are expected to be accessed frequently, and a store maintains several (sometimes many) copies of these titles to satisfy the expected demand. These movies constitute the working set of a database system whose application provides a video-on-demand service.
In general, assuming that the storage structure consists of strata 0 through n, we assume that the database resides permanently on stratum n. For example, Figure 1 shows a system with four strata in which the database resides on stratum 3. Objects are swapped in and out of a device at strata i < n based on their expected future access patterns, with the objective of minimizing the frequency of access to the slower devices at higher-numbered strata. This objective minimizes the average latency time incurred by requests referencing objects.
At some point during the normal mode of operation, the storage capacity of the device at stratum i will be exhausted. Once an object $o_x$ is referenced, the system may determine that the expected future reference to $o_x$ is such that it should reside on a device at this stratum. In this case, other objects should be swapped out in order to allow $o_x$ to become resident here. However, this migration of objects in and out of strata may cause the available space of the devices to become fragmented, resulting in the non-contiguous layout of their resident objects. Unlike DRAM, optical and magnetic disk drives and tape drives are mechanical devices. Storing an object non-contiguously would cause the device to reposition its read head when retrieving the object, reducing the overall performance of the device. When it is known that a collection of blocks will be retrieved sequentially, as in the applications considered here, it is advantageous to store the file contiguously. To demonstrate the significance of this factor, [GRa] reports that a typical magnetic disk supported by an adequate I/O subsystem can sustain a data rate of … mbps as long as it is allowed to move its arm monotonously in one direction. With random block accesses scattered across the disk at saturation point, one would observe a data rate of … mbps from that same disk. This analysis assumes a block size of … kilobytes and a service time of … milliseconds to read a block. In addition, applications that employ continuous media data types (e.g., audio and video) need to ensure continuous display of each object. To achieve this, such systems must be able to predict the service time of a device, such as a disk drive, in order to schedule the display of one or more objects effectively. When the number of seeks encountered during retrieval of an object is unpredictable, the application may have no choice but to make a conservative estimate of the expected number of seeks to achieve a sufficiently high confidence in its ability to sustain continuous display. As a consequence, memory is wasted, since more data must be staged in memory than is absolutely necessary. See Appendix A for further details of this application.
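The gap between sequential and random access reported above follows from simple arithmetic. The sketch below assumes that every randomly placed block costs one full service time (seek, rotational latency, and transfer) and reports rates in megabits per second; the function names and the block size, service time, and sequential rate in the usage lines are hypothetical placeholders, since the figures from [GRa] are not preserved in this copy.

```python
def random_block_rate_mbps(block_kbytes: float, service_ms: float) -> float:
    """Data rate (10**6 bits/s) when each block read costs one full service time.

    Assumptions: 1 KB = 1024 bytes, and service_ms covers seek, rotational
    latency, and transfer for a single randomly placed block.
    """
    bits_per_block = block_kbytes * 1024 * 8
    return bits_per_block / (service_ms / 1000.0) / 1e6


def sequential_advantage(sequential_mbps: float, block_kbytes: float,
                         service_ms: float) -> float:
    """How many times faster the disk is when its arm moves in one direction."""
    return sequential_mbps / random_block_rate_mbps(block_kbytes, service_ms)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to illustrate the calculation.
    rate = random_block_rate_mbps(block_kbytes=8, service_ms=16)
    print(f"random access: {rate:.3f} mbps")  # random access: 4.096 mbps
    print(f"sequential is {sequential_advantage(32.0, 8, 16):.2f}x faster")
```

In this simplified model, doubling the block size doubles the random-access rate if the per-block service time stays fixed, which is one way to see why larger blocks help sequential workloads.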
To illustrate the increase in the number of seeks as objects are swapped in and out, consider the curve corresponding to Standard in Figure …. This curve presents the average number of disk seeks required per retrieval of an object as a function of time for a file system that partitions the available disk space into blocks and manages blocks on an individual basis. Details of the experimental design are outlined in Section 4. The disk starts with a few seeks on behalf of each object and settles at an average of … seeks during its later stages of operation. In addition to an increase in the average number of seeks, the variance in this quantity also increases. Of course, the system may employ a reorganization process to ensure the contiguity of the blocks that comprise an object. With systems that have a down time (i.e., become unavailable for some duration of time periodically), the reorganization procedure can be activated as an offline activity during this period. However, there are applications that cannot tolerate such down time. For example, health care information systems are expected to provide uninterrupted service 24 hours a day, year round. For systems of this sort, the reorganization procedure must be an online process. One may design an effective reorganization process based on the characteristics of the target application. However, it is likely to suffer from the following limitations:
1. The overhead of reorganization can be high if invoked frequently.
2. The reorganization process can respond only when it has detected an undesirable behavior, namely too many seeks. Consequently, the user may observe a lower performance than expected for a while before the reorganization process can remedy the situation.
3. The reorganization process will almost certainly fail in environments where the frequency of access to the objects changes in such a way that the identity of the objects resident in a stratum changes frequently. For example, it may happen that by the time the reorganization process groups the blocks of an object $o_x$ together, the system has already elected to replace $o_x$ with another object that is expected to have a higher number of future references.
One may design a space management technique that ensures a contiguous layout of each object, e.g., REBATE [GI]. Generally speaking, there is a tradeoff between the amount of contiguity guaranteed for the layout of each object on a device at stratum i and the amount of wasted space on that device. For example, a technique that ensures the contiguous layout of each object on a magnetic disk may waste a substantial amount of disk space. This tradeoff might be worthwhile if the working set of the target application can become resident on the magnetic disks. It would not be worthwhile if the penalty incurred due to an increasing number of references to slower devices at higher-numbered strata outweighs the benefits of eliminating disk seeks.

Footnote (offline versus online reorganization): By offline, we mean that a process is allowed to utilize all of the system resources in order to reorganize the layout of objects as fast as possible. By online, we mean that the process is allowed to utilize only a fraction of the resources to reorganize the layout of objects, in order to enable the system to continue to provide service to user requests. In the latter case, the users may observe a degradation in system performance.
The contributions of this paper are twofold. First, it employs the design of the UNIX Fast File System [MJLF], termed Standard, to describe the design of three new space management policies: Dynamic, REBATE [GI], and EVEREST. While Standard packs objects onto the disk without ensuring the contiguity of each object, both Dynamic and REBATE strive to ensure the contiguous layout of each object. EVEREST, on the other hand, strikes a compromise between the two conflicting goals (contiguous layout versus wasted space) by approximating a contiguous layout of each object. In a dynamic environment where the frequency of access to the objects evolves over time, the design of Standard, Dynamic, and REBATE can benefit from a reorganization process that detects and eliminates an undesirable side effect:

Standard benefits because the reorganization process can ensure a contiguous layout of each object once the system has detected too many seeks per request.
Dynamic benefits because the reorganization process detects and eliminates its wasted space.
REBATE benefits because the reorganization process maximizes the utilization of space by detecting and reallocating space occupied by the infrequently accessed objects.

EVEREST is a preventive technique that avoids these undesirable side effects. To the best of our knowledge, the design of EVEREST is novel and has neither been proposed nor investigated to this date.
Second, this study identifies the fundamental factors that impact the average service time of the system using alternative space management policies, and models them analytically. These models were verified using a simulation study. They quantify the amount of useful work (transfer of data) and wasteful work (seeks, preventive operations, reorganization, and access to slower devices due to wasted space) attributed to a design. The models are independent of the strategies described in this paper and can be employed to evaluate other space management techniques. Thus, they can be employed by a system designer to quantify the tradeoff of one technique relative to another with respect to a target application and hardware platform.

[Figure: Architecture (memory, CPU, magnetic disk, and tape attached to a system bus)]
The rest of this paper is organized as follows. Section 2 describes our target hardware platform. In Section 3, we describe the four alternative space management techniques using this platform. Section 4 demonstrates the tradeoffs associated with these techniques using a simulation study. In Section 5, we develop analytical models that quantify the factors that impact the average service time of a system with alternative strategies. Our conclusions are contained in Section 6.

2 Target Environment
In order to focus on alternative techniques to manage the space of a mechanical device, this study makes the following simplifying assumptions:

1. The environment consists of three strata: memory, disk, and tape library. The service time of retrieving an object from tape is significantly higher than that from magnetic disk.
2. The database resides permanently on the tape. The magnetic disk is used as a temporary staging area for the frequently accessed objects in order to minimize the number of references to the tape.
3. All devices are visible to the user via memory (see the Architecture figure). The memory is used as a temporary staging area, either to service a pending request or to transfer an object from tape to the magnetic disk drive.
4. The system accumulates statistics on the frequency of access (heat) [CABK] to the objects as it services requests for the users. It employs this past history to predict the future number of references to an object.
5. Each referenced object is retrieved sequentially and in its entirety.
6. Either all or none of an object is resident at a stratum; the system does not maintain a portion of an object resident on a stratum. This assumption is justified in the following paragraphs.
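The assumption that the system accumulates access statistics (heat) leaves the estimator itself unspecified. A minimal sketch, assuming heat is simply the fraction of all requests observed so far that referenced an object (with no aging of old statistics), could look like this; the class and method names are hypothetical.

```python
from collections import Counter


class HeatTracker:
    """Hypothetical heat estimator: heat = share of all past requests for an object."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()
        self._total = 0

    def record(self, obj: str) -> None:
        """Note one request that referenced `obj`."""
        self._counts[obj] += 1
        self._total += 1

    def heat(self, obj: str) -> float:
        """Estimated probability that the next request references `obj`."""
        if self._total == 0:
            return 0.0
        return self._counts[obj] / self._total


if __name__ == "__main__":
    tracker = HeatTracker()
    for request in ["a", "b", "a", "c", "a"]:
        tracker.record(request)
    print(tracker.heat("a"), tracker.heat("d"))  # 0.6 0.0
```

A system facing an evolving access pattern would likely weight recent requests more heavily; that refinement is not shown here.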
With the assumed architecture, the time required to read an object $o_x$ from a device is a function of the size of $o_x$, the transfer rate of the device, and the number of seeks incurred when reading $o_x$. The time to perform a seek may include the time required for the read head to travel to the appropriate location containing the referenced data, the rotational latency time, and the time required to change the physical medium when necessary. The number of accesses to a device on behalf of an object depends on the frequency of access to that object (its heat).
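The factors named above combine into a simple cost model: a fixed cost per seek plus the transfer time of the object. A minimal sketch, assuming a constant transfer rate and a fixed average cost per repositioning (both hypothetical parameters with illustrative values), shows how scattering an object's blocks inflates its retrieval time:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device parameters (names and values are illustrative)."""
    transfer_mb_per_s: float  # sustained transfer rate once the head is positioned
    seek_s: float             # average cost of one repositioning: arm travel,
                              # rotational latency, and any media exchange


def retrieval_time(device: Device, size_mb: float, num_seeks: int) -> float:
    """Seconds to read an object sequentially: seek overhead plus transfer time."""
    if size_mb < 0 or num_seeks < 0:
        raise ValueError("size and seek count must be non-negative")
    return num_seeks * device.seek_s + size_mb / device.transfer_mb_per_s


if __name__ == "__main__":
    disk = Device(transfer_mb_per_s=4.0, seek_s=0.02)
    contiguous = retrieval_time(disk, size_mb=100.0, num_seeks=1)
    scattered = retrieval_time(disk, size_mb=100.0, num_seeks=200)
    print(f"{contiguous:.2f} s vs {scattered:.2f} s")  # 25.02 s vs 29.00 s
```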
Once object $o_x$ is referenced, if it is not disk resident and there is sufficient space to store $o_x$, then $o_x$ is rendered disk resident. If the disk drive has insufficient space to store $o_x$, then the system must determine whether it should delete one or more objects (victims) from this device in favor of $o_x$. Generally speaking, the following policy is employed for object replacement. The system determines a collection of least frequently accessed objects, say $k$ of them, whose total size exceeds the size of $o_x$. If the total heat of these objects, $\sum_{j=1}^{k} \mathit{heat}(o_j)$, is lower than $\mathit{heat}(o_x)$, then the system deletes these $k$ objects in favor of $o_x$. As described in Section 3, this general replacement policy cannot be enforced with all space management techniques, in particular REBATE and Dynamic. In those cases, we describe its necessary extensions.
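A minimal sketch of this general policy follows, assuming sizes and heats are already known for every resident object. Following the text literally, only the victims' total size is compared against the size of $o_x$ (it must reach or exceed it; free space too small to hold $o_x$ on its own is not counted), and ties in heat are broken arbitrarily. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# name -> (size, heat); sizes in any consistent unit, e.g., blocks
Residents = Dict[str, Tuple[int, float]]


def choose_victims(residents: Residents, x_size: int, x_heat: float,
                   free_space: int) -> Optional[List[str]]:
    """General replacement policy sketch.

    Returns [] if o_x fits in the free space, the list of victims to delete
    if replacing them is worthwhile, or None if o_x should stay off the disk.
    """
    if x_size <= free_space:
        return []
    victims: List[str] = []
    total_size, total_heat = 0, 0.0
    # Visit the least frequently accessed (coldest) objects first.
    for name, (size, heat) in sorted(residents.items(), key=lambda kv: kv[1][1]):
        victims.append(name)
        total_size += size
        total_heat += heat
        if total_size >= x_size:
            break
    else:
        return None  # deleting every resident object still frees too little space
    # Replace only if the victims are, in aggregate, colder than o_x.
    return victims if total_heat < x_heat else None


if __name__ == "__main__":
    residents = {"a": (40, 0.05), "b": (30, 0.10), "c": (50, 0.30)}
    print(choose_victims(residents, x_size=60, x_heat=0.2, free_space=0))  # ['a', 'b']
    print(choose_victims(residents, x_size=60, x_heat=0.1, free_space=0))  # None
```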
An alternative to the assignment imposed by the all-or-none residency assumption might be to stripe an object across the different strata such that each stratum performs its fair share of the overall imposed work when a request references this object. In the following paragraphs, we describe this paradigm and its limitations. These limitations justify the all-or-none assumption.

With the striping paradigm, each object is striped into n fragments, with each fragment assigned to a device at stratum i, 0 < i ≤ n (no fragments are assigned to memory). In order to avoid the situation in which one device is waiting for another while requests wait in a queue, the system can choose appropriate sizes for the different fragments of each object so that the service time of each device at stratum i, 0 < i ≤ n, is almost identical. Every time the object is referenced, the devices at all strata are activated, each for the same amount of time. Hence, all devices contribute an equal share to the work performed for each request. To illustrate, assume that the rate of data delivery is $t_T$ for tape and $t_D$ for magnetic disk, with $t_D > t_T$. Moreover, assume that this delivery rate is computed by considering the overhead of the initial and subsequent seeks attributed to the retrieval of an object from a device. With these parameters, this paradigm assigns a fraction $t_D/(t_D + t_T)$ of $o_x$ to the magnetic disk and a fraction $t_T/(t_D + t_T)$ of $o_x$ to the tape. Once $o_x$ is referenced, both devices are activated simultaneously, with each completing its retrieval of its fragment of $o_x$ at approximately the same time.
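The fragment sizes follow from requiring equal service times: a device delivering at rate r for time T transfers r·T, so each device's share of an object must be proportional to its delivery rate. The sketch assumes constant delivery rates that already fold in seek overhead, as the paragraph does; the device names, rates, and function names are hypothetical.

```python
from typing import Dict


def stripe_fractions(rates: Dict[str, float]) -> Dict[str, float]:
    """Fraction of each object placed on each device, proportional to its rate."""
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("delivery rates must be positive")
    return {device: rate / total for device, rate in rates.items()}


def completion_times(size: float, rates: Dict[str, float]) -> Dict[str, float]:
    """Time each device needs to deliver its fragment (equal by construction)."""
    fractions = stripe_fractions(rates)
    return {device: size * fractions[device] / rates[device] for device in rates}


if __name__ == "__main__":
    rates = {"disk": 4.0, "tape": 1.0}  # hypothetical: disk delivers 4x faster
    print(stripe_fractions(rates))       # {'disk': 0.8, 'tape': 0.2}
    print(completion_times(100.0, rates))
```

Note that the faster device receives the larger fragment, which is exactly the property the next paragraph identifies as the first limitation of striping.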
This paradigm suffers from the following limitations. First, for each object $o_x$, it requires the size of $o_x$'s disk-resident fragment to be larger than the fragment that is resident on the tape, placing larger fragments on devices that have a smaller storage capacity. If all objects are required to have their disk-resident fragments physically present on the disk drive, then the amount of required disk storage would be larger than that of the tape, resulting in a high storage cost. One may reduce the amount of required disk storage (in order to reduce cost) by rendering a subset of objects tape resident in their entirety. Once such an object, say $o_x$, is referenced, the system employs the tape to retrieve $o_x$ without the participation of the magnetic disk. During this time, the system may service other requests by retrieving their disk-resident fragments. However, should these requests require access to the tape-resident fragments of their referenced objects, then, in effect, the tape has fallen behind and become a bottleneck for the entire system: at some point, the memory (as a temporary staging area for these other objects) will be exhausted, and the disk will sit idle and wait for the tape to catch up.
A second limitation of this approach is its requirement that the different devices must be synchronized so that they complete servicing requests simultaneously. This involves a computation of the delivery rate of a device, perhaps in the presence of a variable number of seeks. This synchronization avoids the scenario in which one device waits for another in the presence of pending requests. Such synchrony is difficult to achieve even in an environment that consists of homogeneous devices, such as multiple magnetic disk drives [PGK, Gib]. It becomes more challenging in a heterogeneous system, where each device exhibits its own unique physical characteristics. Due to these limitations, we elected to eliminate striping from further consideration for the remainder of this paper.
3 Four Alternative Space Management Techniques
This section presents four alternative space management techniques: Standard, Dynamic, EVEREST, and REBATE. Standard refers to the most common organization of disk space in current operating systems. We have elected to use the UNIX Fast File System [MJLF] (termed UNIX FFS) to represent this class of models. Hence, for the remainder of this paper, Standard refers to UNIX FFS. While Dynamic and EVEREST are two different algorithms, each can be viewed as an extension of the Standard model. REBATE, however, is a more radical departure that partitions the available disk space into regions, where each region manages a unique collection of similarly sized objects. (A region is not equivalent to a cylinder group as described by UNIX FFS.)

Table: Characteristics of alternative space management techniques

Technique | Contiguous layout | May require reorganization | Wastes space
Standard  | NO                | YES                        | NO
Dynamic   | YES               | YES                        | YES
EVEREST   | NO                | NO                         | NO
REBATE    | YES               | YES                        | YES
Both Dynamic and REBATE ensure a contiguous layout of each object. Moreover, both techniques illustrate the benefits and difficulties involved in providing such a guarantee while maintaining a sufficiently high utilization of the available space. The design of Dynamic, for example, demonstrates that a smart algorithm for ensuring contiguity of objects is both difficult to implement and computationally expensive to support. In addition, it wastes space. REBATE, on the other hand, attempts to simplify the problem by partitioning the available disk space into regions. Within each region, the space is partitioned into fixed-size frames that are shared by the objects corresponding to that region. In general, however, the use of frames of fixed size will increase the amount of wasted space, and the partitioning of resources makes the technique sensitive to changes in the heat of objects. This sensitivity to changing heats motivates the introduction of a reorganization process that detects these changes and adjusts the amount of space allocated to each region and/or the sets of objects managed by each region.
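To see why fixed-size frames waste space, consider a hypothetical assignment rule in which each object is managed by the region whose frame size is the smallest one that can hold it. This rule, the function name, and the frame and object sizes below are assumptions made for illustration; the paragraph above does not specify how REBATE chooses its frame sizes.

```python
import bisect
from typing import Dict, List


def frame_waste(frame_sizes: List[int], object_sizes: List[int]) -> Dict[int, int]:
    """Internal waste per region when every object occupies one fixed-size frame.

    Hypothetical rule: an object belongs to the region with the smallest
    frame that is at least as large as the object.
    """
    frames = sorted(frame_sizes)
    waste = {f: 0 for f in frames}
    for size in object_sizes:
        i = bisect.bisect_left(frames, size)  # index of the first frame >= size
        if i == len(frames):
            raise ValueError(f"no region has frames large enough for size {size}")
        waste[frames[i]] += frames[i] - size
    return waste


if __name__ == "__main__":
    # Frame and object sizes in blocks (illustrative values only).
    print(frame_waste([10, 50, 200], [8, 10, 30, 120, 180]))
    # {10: 2, 50: 20, 200: 100}
```

The waste grows with the spread of object sizes inside each region, and a change in heat that shifts which objects are worth keeping can leave one region's frames idle while another region runs short, which is the sensitivity the paragraph describes.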
EVEREST, on the other hand, does not ensure a contiguous layout of each object. Instead, it approximates a contiguous layout by representing an object as a small collection of chunks. Each chunk consists of a variable number of contiguous blocks. However, the number of chunks per object and the number of blocks per chunk are a fixed function of the size of an object and configuration parameters. Moreover, the number of chunks is small: it is bounded logarithmically in the object's size. In contrast to the other strategies, EVEREST is preventive rather than detective in its management of space fragmentation. Its advantages include ease of implementation, a minimal amount of wasted space (comparable to Standard in this respect), and no need for an auxiliary reorganization technique. Moreover, the basic parameters of the EVEREST scheme can serve to tune the performance of the system by trading off the time spent in its preventive maintenance against the time attributed to seeks between chunks of resident objects.
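The paragraph states only that the number of chunks is a fixed function of object size and configuration parameters, bounded logarithmically in the object's size. One simple way to obtain such a bound, assumed here purely for illustration and not necessarily EVEREST's actual allocation scheme, is to restrict chunk sizes to powers of a hypothetical configuration parameter ω (`omega`) and to split an object according to the base-ω digits of its size in blocks.

```python
from typing import List


def chunk_sizes(num_blocks: int, omega: int) -> List[int]:
    """Split an object of num_blocks blocks into chunks whose sizes are powers of omega.

    Hypothetical scheme: the chunks follow the base-omega digits of num_blocks,
    so there are at most (omega - 1) chunks of each size.
    """
    if omega < 2:
        raise ValueError("omega must be at least 2")
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    power = 1
    while power * omega <= num_blocks:
        power *= omega  # largest power of omega not exceeding num_blocks
    sizes: List[int] = []
    remaining = num_blocks
    while remaining > 0:
        count, remaining = divmod(remaining, power)
        sizes.extend([power] * count)
        power //= omega
    return sizes


def chunk_bound(num_blocks: int, omega: int) -> int:
    """Upper bound on the chunk count: (omega - 1) times the number of base-omega digits."""
    digits, n = 1, num_blocks
    while n >= omega:
        n //= omega
        digits += 1
    return (omega - 1) * digits


if __name__ == "__main__":
    print(chunk_sizes(100, 2))  # [64, 32, 4]
    print(chunk_sizes(100, 3))  # [81, 9, 9, 1]
```

Under this assumed scheme, ω is the kind of tuning knob the paragraph mentions: it changes how many chunks, and hence how many inter-chunk seeks, an object of a given size may require.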
We describe each technique in turn, starting with Standard.

3.1 Standard
Traditionally, file systems have provided a device-independent storage service to their clients. They were not targeted to manage the available space of a hierarchical storage structure. However, they serve as an ideal foundation to describe the techniques proposed in this study. We use the UNIX Fast File System (UNIX FFS) as a representative of the traditional file systems. We could not justify the use of Sprite-LFS [RO] in this role (and its detailed design), because it is an extended version of UNIX FFS designed to enhance the performance of the file system for small writes [RO]; conversely, our target environment assumes a workload consisting of large sequential reads and writes. Similarly, we have avoided file systems that support extent-based allocation (e.g., WiSS [CDKK]), because their design targets files that are allowed to grow and shrink dynamically; the objects in our assumed environment are static in size.
With UNIX FFS, the size of a block for device i determines the unit of transfer between this device and the memory. With objects (files) that are retrieved in a sequential manner, the utilization of a device is enhanced with larger block sizes, because the device spends more of its time transferring data (performing useful work) instead of repositioning its read head (wasteful, or at least potentially avoidable, work). For example, with UNIX FFS, which supports small files, the performance of a magnetic disk drive was improved by a factor of more than two by changing the block size of the file system from … to … bytes [MJLF]. A disadvantage of using large block sizes is internal fragmentation of the space allocated to a block: an object consists of several blocks, with its last block remaining partially empty. UNIX FFS minimizes this waste of space as follows. It divides a single block into m fragments; the value of m is determined at system configuration time. Physically, a file is represented as l blocks and at most m − 1 fragments. Once a file grows to consist of m fragments, UNIX FFS restructures the space to form a contiguous block from these fragments.
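The arithmetic of the block/fragment scheme can be sketched as follows. This is a simplified model that assumes the file's tail is rounded up to whole fragments and that a tail needing all m fragments becomes one more full block; the real UNIX FFS allocator also decides where fragments live within blocks and reallocates them as files grow, which this sketch ignores. The function names and the block size and m used in the usage lines are hypothetical.

```python
from typing import Tuple


def ffs_layout(file_bytes: int, block_bytes: int, m: int) -> Tuple[int, int, int]:
    """Return (full_blocks, fragments, wasted_bytes) for a simplified FFS-style layout."""
    if m < 1 or block_bytes % m != 0:
        raise ValueError("block size must be a positive multiple of m")
    frag_bytes = block_bytes // m
    blocks, tail = divmod(file_bytes, block_bytes)
    fragments = -(-tail // frag_bytes)  # ceiling division
    if fragments == m:                  # m fragments are merged into one block
        blocks, fragments = blocks + 1, 0
    wasted = blocks * block_bytes + fragments * frag_bytes - file_bytes
    return blocks, fragments, wasted


def whole_block_waste(file_bytes: int, block_bytes: int) -> int:
    """Waste if the last block were allocated whole (no fragments at all)."""
    return -(-file_bytes // block_bytes) * block_bytes - file_bytes


if __name__ == "__main__":
    print(ffs_layout(20000, 8192, 8), whole_block_waste(20000, 8192))
    # (2, 4, 480) 4576
```

With fragments, the waste for this hypothetical file drops from most of a block to less than one fragment, which is the saving the paragraph describes.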
The advantages of this technique include its simplicity, its ready availability from the commercial arena, its enhancement of the utilization of space by minimizing waste, and its flexibility: it can employ the general replacement policy that was outlined in Section 2 to respond to changing patterns of access to the objects. A limitation of this technique, however, is its inability to ensure a contiguous layout of the blocks of an object on the surface of a device. As the system continues operation and objects are swapped in and out, it will scatter the blocks of a newly materialized object across the surface of the device. This motivates the adoption of a reorganization procedure that groups the blocks of each object together to ensure its contiguity. Section 1 sketches the drawbacks inherent in such a procedure.
3.2 Dynamic
The method that we term Dynamic is an extension of Standard (Section 3.1) that attempts to guarantee the contiguity of all disk-resident objects. Similar to Standard, the available disk space is partitioned into blocks of a fixed size. However, whenever an object is rendered disk resident, Dynamic requires that the sequence of blocks allocated to that object be physically adjacent. The goal of the object replacement criterion is similar to that described in Section 2, namely, to maximize the workload of the device, or the total heat contributed by the collection of disk-resident objects. However, the way that Dynamic strives to achieve this objective differs in the following way. Let $o_x$ be an object requiring $b$ blocks, and assume that $o_x$ is not disk resident. The replacement policy considers all possible contiguous placements of $o_x$ on the disk. If there is some free region that contains $b$ free blocks, then $o_x$ can be made disk resident in this region, and the workload of the set of disk-resident objects increases. On the other hand, if no such free region exists, then it must be the case that every sequence of $b$ contiguous blocks contains all or part of some other resident objects. To be specific, let us fix some sequence of $b$ blocks. Assume that these blocks contain all or part of objects $o_i, \ldots, o_j$, together with zero or more free blocks. If Dynamic were to make $o_x$ resident in these blocks, the disk-resident copies of these objects would be destroyed in whole or in part. However, since we have assumed that no object may reside partially on the disk, whenever a single block occupied by a resident object is overwritten, this object is destroyed in its entirety. To determine how the workload might change if $o_x$ is made resident in these blocks, we would like to quantify the amount of work contributed by the current configuration and compare that to the work expected from the proposed change. To do this, we define work as follows.
Definition. If $o_x$ is an object, then

$$\mathit{work}(o_x) = \mathit{heat}(o_x) \cdot \mathit{size}(o_x).$$

The definition captures the idea that the part of the disk's workload that may be attributed to requests for object $o_x$ is not merely a function of its heat, but also depends on the amount of time used by the disk to service these requests. This, in turn, depends upon the object's size. During any period of time during which the objects' heats remain fixed, one expects that this time will be proportional to $\mathit{work}(o_x)$ for each $o_x$ that is actually disk resident, if we neglect the initial seek for each access to $o_x$.
To illustrate why Dynamic's replacement policy requires work rather than heat alone, consider the following example. An object $o_x$ of heat 0.1 requires … blocks to become disk resident. On the disk there is a region of … contiguous blocks, in which … are free and … are occupied by an object $o_y$ of heat 0.2. On the one hand, we can expect object $o_y$ to receive twice as many requests as object $o_x$. On the other hand, suppose that the time required to service a single request for $o_y$ is $t$ (neglecting the initial seek). Then each request for $o_x$ requires $kt$ time, where $k = \mathrm{size}(o_x)/\mathrm{size}(o_y)$; in this example $o_x$ is more than twice as large as $o_y$, so $k > 2$. Based on these heats, we can expect about one in ten requests to reference $o_x$ and one in five to access $o_y$. So, over a sufficiently long sequence of requests, one expects that

$$\frac{\text{time servicing requests for } o_x}{\text{time servicing requests for } o_y} = \frac{0.1 \cdot kt}{0.2 \cdot t} = \frac{\mathrm{work}(o_x)}{\mathrm{work}(o_y)} > 1.$$

Materializing $o_x$ in this region will thus increase the expected workload of the disk, even though $o_x$ is the colder of the two objects.
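A minimal sketch of the Definition applied to this example. The heats 0.1 and 0.2 come from the text; the block counts 10 and 4 are hypothetical values, chosen only so that $o_x$ is more than twice as large as $o_y$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskObject:
    name: str
    heat: float  # fraction of requests that reference the object
    size: int    # size in blocks (hypothetical values below)


def work(o: DiskObject) -> float:
    """work(o) = heat(o) * size(o), per the Definition."""
    return o.heat * o.size


o_x = DiskObject("o_x", heat=0.1, size=10)  # colder but larger
o_y = DiskObject("o_y", heat=0.2, size=4)   # twice as hot, smaller

# Heat alone favours o_y; work favours o_x because each of its requests
# keeps the disk busy k = 10 / 4 = 2.5 times longer.
print(f"work(o_x)={work(o_x):.2f}  work(o_y)={work(o_y):.2f}  "
      f"ratio={work(o_x) / work(o_y):.2f}")
```

With these sizes the ratio is 1.25, so replacing $o_y$ with $o_x$ raises the expected disk workload.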
Dynamic's replacement policy may be stated succinctly as follows. On each request for an object $o_x$ that is not disk resident, Dynamic considers all sequences of blocks where $o_x$ may be placed. For each possible placement, it evaluates the expected change in the workload of the disk. If materializing $o_x$ will increase this quantity, Dynamic stores $o_x$ in the region that maximizes the workload of the disk. Otherwise, $o_x$ is not materialized.
When Dynamic considers placing $o_x$ in a sequence of $b$ blocks, it first evaluates the work contributed by the current residents $o_i, \ldots, o_j$ of those blocks. We define the quantity

$$\sum_{k=i}^{j} \mathrm{work}(o_k)$$

to be the work associated with these blocks. Rendering $o_x$ disk resident by overwriting these objects would increase the workload of the disk by $\mathrm{work}(o_x)$ and reduce it by the current work associated with objects $o_i, \ldots, o_j$. Hence, the expected change in the workload of the disk is

$$\Delta = \mathrm{work}(o_x) - \sum_{k=i}^{j} \mathrm{work}(o_k).$$

If $\Delta$ is positive, then Dynamic materializes $o_x$, as doing so increases the workload of the disk.
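For a single candidate placement, the decision rule reduces to one subtraction. This sketch, with hypothetical function names, takes precomputed work values for $o_x$ and for the residents $o_i, \ldots, o_j$ of the candidate blocks.

```python
from typing import Sequence


def workload_change(work_x: float, displaced: Sequence[float]) -> float:
    """Delta = work(o_x) - sum(work(o_k)) over the residents o_i..o_j
    that would be destroyed by writing o_x into these blocks."""
    return work_x - sum(displaced)


def should_materialize(work_x: float, displaced: Sequence[float]) -> bool:
    """Materialize only when the disk's expected workload strictly grows."""
    return workload_change(work_x, displaced) > 0


# Hypothetical candidate: two residents with work 1.0 and 2.5.
print(workload_change(5.0, [1.0, 2.5]), should_materialize(3.0, [1.0, 2.5]))
```

A fully free region has an empty `displaced` list, so any object with positive work is materialized there.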
The algorithm that Dynamic uses to determine when and where to materialize an object $o_x$ is a straightforward scan of the disk, or rather of a memory-resident data structure that records the layout of the current disk-resident population. To illustrate, assume that $o_x$ needs $b$ blocks to become disk resident. To maximize device utilization, Dynamic must find the $b$ contiguous blocks on the device that contribute the least to the device workload. Conceptually, this can be achieved by placing a window of $b$ blocks at one end of the device and calculating the total workload of all objects that can be seen through this window. The window is then slid down the length of the disk. Every time the set of objects visible through the window changes, the visible workload is recalculated and the overall minimum value $m$ is recorded. After the entire disk is scanned, $m$ is compared to $\mathrm{work}(o_x)$: if $\mathrm{work}(o_x) > m$, then $o_x$ is materialized in the sequence of blocks with associated workload $m$. Otherwise, $o_x$ is not materialized on the disk.
The actual calculation can be simplified somewhat by keeping an appropriate memory-resident image of the disk's organization. For this, we employ a list of intervals. Each interval corresponds either to some sequence of blocks occupied by a single object, or to a maximal contiguous sequence of free blocks. All intervals are annotated with their size and, when the blocks are not free, their resident object. When a request is made for an object $o_x$ requiring $b$ blocks, Dynamic begins by gathering intervals from the head of the list until at least $b$ blocks have been accumulated. Say this window consists of intervals $I_1, \ldots, I_j$. The total workload of the objects represented among these intervals is recorded. Then, to slide the window to its next interval, the first interval $I_1$ is omitted, and zero or more of the intervals $I_{j+1}, I_{j+2}, \ldots$ are added to the window until it again contains at least $b$ blocks. The process is repeated across the entire list while retaining a pointer to the window of minimum workload. The entire algorithm is linear in the number of disk-resident objects $d$, since the number of intervals (free and occupied) is no more than $2d + 1$ and each interval is added to and removed from the window at most once.
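The sliding-window scan over the interval list can be sketched as follows. This is an illustrative Python version under our own assumptions: intervals are (size, object) pairs in disk order, free runs carry `None`, each resident object occupies exactly one interval (Dynamic keeps objects contiguous), and `works` maps object names to their work. The names `best_window` and `place` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (size in blocks, resident object name, or None for a maximal free run)
Interval = Tuple[int, Optional[str]]


def best_window(intervals: List[Interval], works: Dict[str, float],
                b: int) -> Optional[Tuple[int, float]]:
    """Return (starting block, visible workload m) of the cheapest window of
    at least b blocks, or None if no such window exists. Each interval is
    added to and removed from the window at most once, so this is linear."""
    best: Optional[Tuple[int, float]] = None
    window_blocks, window_work = 0, 0.0
    hi = 0            # index of the next interval to add
    start_block = 0   # block offset of intervals[lo], the window's first interval
    for lo in range(len(intervals)):
        # Grow the window from the right until it spans at least b blocks.
        while window_blocks < b and hi < len(intervals):
            size, obj = intervals[hi]
            window_blocks += size
            if obj is not None:
                window_work += works[obj]
            hi += 1
        if window_blocks < b:
            break  # the remaining suffix of the disk is too short
        if best is None or window_work < best[1]:
            best = (start_block, window_work)
        # Slide: omit the first interval of the window.
        size, obj = intervals[lo]
        window_blocks -= size
        if obj is not None:
            window_work -= works[obj]
        start_block += size
    return best


def place(intervals: List[Interval], works: Dict[str, float],
          b: int, work_x: float) -> Optional[int]:
    """Dynamic's decision: materialize o_x at the cheapest window only if
    work(o_x) > m; return the starting block, or None if not materialized."""
    best = best_window(intervals, works, b)
    if best is not None and work_x > best[1]:
        return best[0]
    return None


intervals = [(4, "A"), (2, None), (3, "B"), (5, None), (2, "C")]
works = {"A": 2.0, "B": 0.5, "C": 3.0}
print(best_window(intervals, works, 6))  # (4, 0.5): overwrites only B
```

Windows need only start at interval boundaries: starting inside an occupied interval destroys that object anyway, and starting later inside a free run only pushes the window's end further right.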
The advantage of this procedure is its ability to guarantee a contiguous layout of objects. In addition, like Standard, it always uses the most up-to-date heat information in making decisions about disk residency, so its disk configuration is adaptive and responds to changing access patterns. It uses the heat statistics to pack the hottest objects onto the disk; colder objects are removed from the disk when hotter objects are found to increase the disk's workload. However, each of these decisions must be made in a greedy, local manner by considering objects as they are requested. The decisions are further constrained by the current organization of the disk, since Dynamic does not change the layout of objects that remain disk resident. More global reorganization of this sort may be effected by an auxiliary reorganization policy.
Nevertheless, Dynamic can suffer from the following limitations. First, it will almost certainly waste disk space because it does not permit the discontinuous layout of an object. Smaller cold or free sequences of blocks can become temporarily unusable when sandwiched between two hot objects. In effect, the method is restricted in its later placement of data by the current layout, which in turn can evolve in an unpredictable manner. Second, Dynamic optimizes the workload of the disk from only a local, greedy perspective, and hence it may perform wasteful work. For example, it may render an object $o_x$ disk resident only to overwrite it with a hotter object soon thereafter. Of course, as in the case of Standard, Dynamic can also be augmented with a reorganization scheme that attempts to optimize the layout of disk-resident objects from a more global perspective; such a reorganization process would be subject to the same limitations as outlined in Section …. Finally, the algorithm discussed above for determining whether an object should be materialized and where it should be placed, although linear in the number of disk-resident objects, may be time-consuming when the number of objects is large. This may add significant computational overhead to every request.
EVEREST
EVEREST is an extension of Standard designed to approximate a contiguous layout of each object on the disk drive. Its basic unit of allocation is a block, also termed a section of height 0. Within the EVEREST scheme, these blocks can be combined in a tree-like fashion to form larger contiguous sections. As illustrated in Figure …, only sections of size $\mathrm{size}(\text{block}) \times B^i$ (for $i \ge 0$) are valid, where the base $B$ is a system configuration parameter. If a section consists of $B^i$ blocks, then $i$ is said to be the height of the section. In general, $B$ physically adjacent sections of height $i$ might be combined to construct a section of height $i + 1$.

Figure …: Physical division of disk space into blocks (numbered 0 to 15) and the corresponding logical view of the sections (depths 0 to 4), with an example base of B = 2; adjacent sections of equal size at the same depth are buddies.

To illustrate, the disk in Figure … consists of 16 blocks, and the system is configured with $B = 2$. Thus, the size of a section may vary from 1 up to 16 blocks; in essence, a binary tree is imposed upon the sequence of blocks. The maximum height, given by

$$N = \left\lceil \log_B \frac{\mathrm{Capacity}}{\mathrm{size}(\text{block})} \right\rceil,$$

is 4. With this organization imposed upon the device, sections of height $i$ cannot start at just any block number, but only at offsets that are multiples of $B^i$. This restriction ensures that any section, with the exception of the one at height $N$, has a total of $B - 1$ adjacent buddy sections of the same size at all times. With the base-two organization of Figure …, each block has one buddy. This property of the hierarchy of sections is used when objects are allocated, as described below.
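The section geometry reduces to integer arithmetic on block offsets. A sketch, assuming capacity is expressed in blocks (so Capacity/size(block) is simply the block count); the helper names are hypothetical. Computing $N$ with a loop instead of `math.log` avoids floating-point rounding at exact powers of $B$.

```python
from typing import List


def max_height(num_blocks: int, B: int) -> int:
    """N = ceil(log_B(num_blocks)), computed with integers only."""
    n, span = 0, 1
    while span < num_blocks:
        span *= B
        n += 1
    return n


def is_valid_section(offset: int, height: int, B: int) -> bool:
    """A section of height i may start only at a multiple of B**i."""
    return offset % (B ** height) == 0


def buddies(offset: int, height: int, B: int) -> List[int]:
    """Offsets of the B - 1 buddies of a valid section below height N:
    the other children of the same height i + 1 parent section."""
    size = B ** height
    parent_start = offset - offset % (size * B)
    return [parent_start + k * size for k in range(B)
            if parent_start + k * size != offset]


# The 16-block, B = 2 disk of the figure.
print(max_height(16, 2), buddies(7, 0, 2), buddies(6, 1, 2))  # 4 [6] [4]
```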
Section Organization and Management of the Free List
With EVEREST, a portion of the available disk space is allocated to objects; the remainder, should any exist, is free. The sections that constitute the available space are handled by a memory-resident free list. This free list is actually maintained as a sequence of lists, one for each section height. (To simplify the discussion, assume that the total number of blocks is a power of $B$. The general case can be handled similarly and is described in Section ….) Information about an unused section of height $i$ is enqueued in the list that handles sections of that height. To simplify object allocation, the following bounded list length property is always maintained:

Property (bounded list length): For each height $i < N$, at most $B - 1$ free sections of height $i$ are allowed.

Informally, this property implies that whenever there is sufficient free space in the free list of height $i$, EVEREST must compact these free sections into sections of a larger height.
Allocation of an Object
The bounded list length property allows for straightforward object materialization. The first step is to check whether the total number of blocks in all the sections on the free list is greater than or equal to the number of blocks, denoted no-of-blocks($o_x$), that the new object $o_x$ requires. If this is not the case, one or more victim objects are elected and deleted. The procedure for selecting a victim is the same as that described in Section …; the deletion of a victim object is described further in Section … below. Assuming at this point that enough free space is available, $o_x$ is divided into its corresponding sections according to the following scheme. First, the number $m = $ no-of-blocks($o_x$) is converted to base $B$. For example, if $B = 2$ and no-of-blocks($o_x$) $= 11$, then its binary representation is $1011_2$. The full representation of such a converted number is

$$m = d_j B^j + \cdots + d_2 B^2 + d_1 B^1 + d_0 B^0.$$

In our example, the number $1011_2$ can be written as $1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0$. In general, for every digit $d_i$ that is nonzero, $d_i$ sections are allocated from height $i$ of the free list on behalf of $o_x$. In our example, $o_x$ requires one section from height 3, no sections from height 2, one section from height 1, and one section from height 0. For each object, the number $k$ of contiguous pieces equals the number of ones in the binary representation of $m$, or, with a general base $B$, $k = \sum_{i=0}^{j} d_i$, where $d_j$ is the most significant digit. Note that $k$ is always bounded by $(B - 1)\lceil \log_B m \rceil$. For any object, $k$ defines the maximum number of disk seeks required to retrieve that object; the minimum is 1, if all $k$ sections are physically adjacent.

(A lazy variant of this scheme would allow these lists to grow longer and perform compaction upon demand, i.e., when large contiguous blocks are required. This would be complicated, as a variety of choices might exist when merging blocks, and it would require the system to employ heuristic techniques to guide the search space of this merging process. To simplify the description, we focus on an implementation that observes the invariant described above.)

A complication arises when no section of the right height exists. For example, suppose that a section of size $B^i$ is required, but the smallest section larger than $B^i$ on the free list is of size $B^j$ ($j > i$). In this case, the section of size $B^j$ can be split into $B$ sections of size $B^{j-1}$. If $j - 1 = i$, then $B - 1$ of these are enqueued on the list of height $i$ and the remainder is allocated. However, if $j - 1 > i$, then $B - 1$ of these sections are enqueued at level $j - 1$, and the splitting procedure is repeated on the remaining section. Whenever the total amount of free space on these lists is sufficient to accommodate the object, then for each section that the object occupies there is always a section of the appropriate size or larger on the list. The splitting procedure sketched above guarantees that the appropriate number of sections, each of the appropriate size, will be allocated, and that the bounded list length property is never violated.
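The base-$B$ decomposition and the splitting procedure can be sketched together. Assumptions, not from the text: free lists are a dict from height to a list of section offsets, the first enqueued section is taken, and sections are allocated from the largest height down (the text does not fix an order; largest-first keeps the free-space argument above simple). All names are hypothetical.

```python
from typing import Dict, List, Tuple

FreeLists = Dict[int, List[int]]  # height -> offsets of free sections


def base_b_digits(m: int, B: int) -> List[int]:
    """Digits d_0, d_1, ..., d_j of m in base B, least significant first."""
    digits = []
    while m > 0:
        m, d = divmod(m, B)
        digits.append(d)
    return digits


def take_section(free: FreeLists, height: int, B: int) -> int:
    """Pop a free section of `height`, splitting a larger one if needed."""
    if free.get(height):
        return free[height].pop(0)
    larger = [h for h, secs in free.items() if h > height and secs]
    if not larger:
        raise MemoryError(f"no free section of height >= {height}")
    j = min(larger)  # smallest section larger than B**height
    offset = free[j].pop(0)
    while j > height:
        j -= 1
        size = B ** j
        # Split into B pieces of height j: enqueue B - 1, keep splitting the first.
        free.setdefault(j, []).extend(offset + k * size for k in range(1, B))
    return offset


def allocate(m: int, free: FreeLists, B: int) -> List[Tuple[int, int]]:
    """Return (offset, height) for the d_i sections taken at each height i."""
    total_free = sum(len(secs) * B ** h for h, secs in free.items())
    if total_free < m:
        raise MemoryError("victims must be deleted first")
    digits = base_b_digits(m, B)
    sections = []
    for height in reversed(range(len(digits))):  # largest sections first
        for _ in range(digits[height]):
            sections.append((take_section(free, height, B), height))
    return sections


free = {4: [0]}                  # one free 16-block section, B = 2
sections = allocate(11, free, 2)
print(sections)                  # [(0, 3), (8, 1), (10, 0)]
```

Allocating 11 blocks from an empty 16-block disk yields sections of 8, 2, and 1 blocks and leaves one free section each at heights 0 (block 11) and 2 (blocks 12 to 15), so no list exceeds $B - 1$ entries.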
The design of EVEREST is related to the buddy system proposed in [Kno, LD] for an efficient main memory storage allocator (DRAM). The difference is that EVEREST satisfies a request for $b$ blocks by allocating a number of sections whose total number of blocks equals $b$. The storage allocator algorithm, on the other hand, allocates one section that is rounded up to $2^{\lceil \lg b \rceil}$ blocks, resulting in fragmentation and motivating the need for either a reorganization process or a garbage collector [GRb].
Deallocation of an Object
When the system elects that an object must be materialized and there is insufficient free space, one or more victims are removed from the disk. Reclaiming the space of a victim requires two steps for each of its sections. First, the section must be appended to the free list at the appropriate height. Second, the system must ensure that the bounded list length property is not violated. Therefore, whenever a section is enqueued in the free list at height $i$ and the number of sections at that height is equal to or greater than $B$, then $B$ sections must be combined into one section at height $i + 1$. If the list at height $i + 1$ now violates the property, then once again space must be compacted and moved to height $i + 2$. This procedure might be repeated several times; it terminates when the length of the list for a higher height is less than $B$.

Compaction of $B$ free sections into a larger section is simple when the sections are all adjacent to each other: in this case, the combined space is already contiguous. Otherwise, the system might be forced to exchange an occupied section of an object with one on the free list to ensure the contiguity of an appropriate sequence of $B$ sections at the same height. The following algorithm achieves space contiguity among $B$ free sections at height $i$:

1. Check whether there are at least $B$ sections for height $i$ on the free list. If not, stop.
2. Select the first section, denoted $s_j$, and record its block number (i.e., its offset on the disk drive). The goal is to free the $B - 1$ sections physically adjacent to $s_j$.
3. Calculate the block numbers of $s_j$'s buddies. EVEREST's division of disk space guarantees the existence of $B - 1$ buddy sections physically adjacent to $s_j$.
4. For every buddy $s_k$ ($0 \le k \le B - 1$, $k \ne j$), if it exists on the free list, mark it.
5. Any unmarked buddies $s_k$ currently store parts of other objects. The space must be rearranged by swapping these $s_k$ sections with sections on the free list. Note that for every buddy section that must be freed, a section exists on the free list. After space has been swapped between every unmarked buddy section and a free list section, enough contiguous space has been acquired to create a section at height $i + 1$ of the free list.
6. Go back to Step 1.

Figure …: Deallocation of an object. The example sequence shows the removal of an object from the initial disk-resident object set {$o_1$, $o_2$, $o_3$}, with base two (B = 2). (a) Two sections (7 and 14) are already on the free list, and an object is deallocated. (b) Sections 7 and 13 should be combined; however, they are not contiguous. (c) The buddy of section 7 is 6; data must move from 6 to 13. (d) Sections 7 and 6 are contiguous and can be combined. (e) The buddy of section 6 is 4; data must move from 4 to 14. (f) Sections 4 and 6 are now adjacent and can be combined. (g) The final view of the disk and the free list after removal of the object.

To illustrate, consider the organization of space in Figure …(a). The initial set of disk-resident objects is {$o_1$, $o_2$, $o_3$}, and the system is configured with $B = 2$. In Figure …(a), two sections are on the free list, at heights 0 and 1 (addresses 7 and 14, respectively), and the victim object is deleted. Once block 13 is placed on the free list in Figure …(b), the number of sections at height 0 increases to $B$, and the list must be compacted according to the algorithm above. As sections 7 and 13 are not contiguous, section 13 is elected to be swapped with section 7's buddy, i.e., section 6 (Figure …(c)). In Figure …(d), the data of section 6 is moved to section 13, and section 6 is now on the free list. The compaction of sections 6 and 7 results in a new section with address 6 at height 1 of the free list. Once again, a list of length two at height 1 violates the bounded list length property, and blocks (4, 5) are identified as the buddy of section 6 in Figure …(e). After the data is moved from blocks (4, 5) to (14, 15) in Figure …(f), another compaction is performed, and the final state of the disk space emerges as in Figure …(g).
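The deallocation steps can be sketched on a toy in-memory disk. Assumptions: `disk` is a list holding the owner of each block, free lists are sets of offsets per height, "the first section" in Step 2 is taken to be the one with the lowest block number (as the Implementation Considerations below suggest), data is moved eagerly rather than in the virtual two-phase manner described below, and per-object section metadata is not updated. The layout of $o_1$, $o_2$, $o_3$ is invented; only the free sections (7 and 14) and the victim's block (13) come from the figure's example.

```python
from typing import Dict, List, Optional, Set, Tuple

Disk = List[Optional[str]]       # disk[k] = object in block k, None if free
FreeLists = Dict[int, Set[int]]  # height -> offsets of free sections


def compact(disk: Disk, free: FreeLists, B: int) -> None:
    """Restore 'at most B - 1 free sections per height', lowest height first."""
    height = 0
    while height <= max(free, default=0):
        secs = free.get(height, set())
        while len(secs) >= B:                       # Step 1 (re-checked: Step 6)
            size = B ** height
            s_j = min(secs)                         # Step 2: lowest block number
            parent = s_j - s_j % (size * B)         # Step 3: buddies share a parent
            group = [parent + k * size for k in range(B)]
            unmarked = [s for s in group if s not in secs]          # Step 4
            spare = sorted(s for s in secs if s not in group)
            for dst, src in zip(spare, unmarked):   # Step 5: move data out
                disk[dst:dst + size] = disk[src:src + size]
                disk[src:src + size] = [None] * size
                secs.discard(dst)
            secs.difference_update(group)
            free.setdefault(height + 1, set()).add(parent)
        height += 1


def deallocate(disk: Disk, free: FreeLists,
               sections: List[Tuple[int, int]], B: int) -> None:
    """Free a victim's (offset, height) sections, then compact the lists."""
    for offset, height in sections:
        disk[offset:offset + B ** height] = [None] * (B ** height)
        free.setdefault(height, set()).add(offset)
    compact(disk, free, B)


disk = (["o1"] * 4 + ["o2"] * 2 + ["o3"] + [None] + ["o1"] * 4 + ["o3"]
        + ["victim"] + [None] * 2)
free = {0: {7}, 1: {14}}
deallocate(disk, free, [(13, 0)], B=2)
print(free)  # {0: set(), 1: set(), 2: {4}}
```

The run reproduces the figure: block 6 moves to 13, blocks (4, 5) move to (14, 15), and blocks 4 to 7 end up as one free section at height 2.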
Once all sections of a deallocated object are on the free list, the iterative algorithm above is run on each list from the lowest to the highest height. The algorithm is somewhat simplified because it does not support the following scenario: a section at height $i$ is not on the free list, but it has been broken down to a lower height (say $i - 1$) and not all of its subsections have been used, so one of them is still on the free list at height $i - 1$. In such cases, the free list for height $i - 1$ must be updated with care, because those free sections have moved to new locations. In addition, the algorithm described above performs more work than is strictly necessary. A single section of small height, for example, may end up being read and written several times as it is combined into larger and larger sections. This can be eliminated as follows. The algorithm is first performed virtually, that is, in main memory, as a compaction algorithm on the free lists. Once it completes, the entire sequence of operations performed determines the ultimate destination of each modified section. These sections are then read and written directly to their final locations. The total amount of data moved (read and then written) during any compaction operation is no more than $B - 1$ times the total amount of free space on the list. For example, when $B = 2$, in the worst case the number of bytes written due to preventive operations is no more than the number of bytes materialized, in an amortized sense. One may expect, however, that for a collection of objects of varying sizes this number will be smaller.
The value of $B$ impacts the frequency of preventive operations. If $B$ is set to its minimum value (i.e., $B = 2$), then preventive operations are invoked frequently, because every time a new section is enqueued there is a 50% chance that a height of the free list will consist of two sections, violating the bounded list length property. Increasing the value of $B$ therefore relaxes the system, because it reduces the probability that an insertion into the free list violates the property. However, it also increases the number of seeks observed when retrieving an object and the expected number of bytes migrated per preventive operation. For example, at the extreme value $B = n$, where $n$ is the total number of blocks, the organization of blocks consists of two levels, and for all practical purposes EVEREST reduces to a variant of Standard.
The design of EVEREST suffers from the following two limitations. First, it incurs a fixed number of seeks (although few) when reading an object. Second, the overhead of its preventive operations may become significant if many objects are swapped in and out of the disk drive; this happens when the working set of an application cannot become resident on the disk drive. The primary advantage of EVEREST's elaborate object deallocation technique is that it avoids the internal and external fragmentation of space described for traditional buddy systems (see [GRb]).
Implementation Considerations
In an actual implementation of EVEREST, it might be infeasible to fix the number of blocks as an exact power of $B$. Rather, one would generally fix the block size of the file system based on the physical characteristics of both the device and the objects in the database. This is possible with some minor modifications to EVEREST. The most important implication of an arbitrary number of blocks is that some sections may not have the correct number of buddies ($B - 1$ of them). However, we can always move those sections to one end of the medium, for example, to the side with the highest block offsets. Then, instead of choosing the first section in Step 2 of the object deallocation algorithm (Section …), one should choose the one with the lowest block number. This ensures that the sections towards the critical end of the disk, which might not have the correct number of buddies, are never used in Steps … and … of the algorithm.
REBATE
REBATE [GI] partitions the available space of a device $i$ into $g$ regions ($G_1, G_2, \ldots, G_g$) by analyzing the storage capacity of the device (termed $C_i$) and the size and frequency of access of each object in the database (termed $\mathrm{size}(o_x)$ and $\mathrm{heat}(o_x)$, respectively) [CABK]. Each region $G_j$ occupies a contiguous amount of space. The amount of space allocated to region $G_j$, termed $\mathrm{space}(G_j)$, is determined such that the overall utilization of the space is maximized, i.e., the probability that a byte from a region contains useful data (data that is most likely to be accessed in the future) is maximized and is approximately the same for all regions. Each region manages a set of unique objects $\mathrm{OBJ}(G_j) = \{o_1, o_2, \ldots, o_k\}$. The minimum and maximum sizes of an object managed by a region $G_j$, termed $\min(G_j)$ and $\max(G_j)$ respectively, are unique. The space allocated to region $G_j$ is split into $\mathcal{F}_j$ fixed-sized frames, where

$$\mathcal{F}_j = \left\lfloor \frac{\mathrm{space}(G_j)}{\max(G_j)} \right\rfloor.$$

All objects whose sizes lie in the range from $\min(G_j)$ to $\max(G_j)$ are managed by region $G_j$ and compete for its frames. REBATE wastes disk space when the size of a frame is larger than its occupying object. To minimize this waste, the regions are constructed so that all objects in $\mathrm{OBJ}(G_j)$ have approximately the same size. Hence, REBATE attempts to minimize the value $\max(G_j) - \min(G_j)$ for each region $G_j$. If $o_x$ maps to region $G_j$ and does not currently occupy a frame of $G_j$, then the system compares $\mathrm{work}(o_x)$ with the work of the other objects that currently occupy a frame of $G_j$. It replaces the object with the least imposed work, say object $o_y$, only if $\mathrm{work}(o_y) < \mathrm{work}(o_x)$. Otherwise, $o_x$ does not become resident on this stratum. Further details are presented in [GI], which also provides an efficient dynamic programming algorithm for constructing optimal region-based partitions when accurate data on the heat of objects is available.
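REBATE's frame count and per-region replacement rule, sketched in Python. Assumptions: sizes and space are in blocks, an empty frame is filled before any replacement is considered, and the class and function names are hypothetical. The construction of the regions themselves (the dynamic programming algorithm of [GI]) is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Region:
    """One region G_j; sizes in blocks (illustrative)."""
    space: int      # space(G_j)
    min_size: int   # min(G_j)
    max_size: int   # max(G_j)
    residents: List[str] = field(default_factory=list)

    @property
    def frames(self) -> int:
        return self.space // self.max_size  # F_j = floor(space / max)


def region_for(regions: List[Region], size: int) -> Optional[Region]:
    """The region whose [min(G_j), max(G_j)] range contains the size."""
    for g in regions:
        if g.min_size <= size <= g.max_size:
            return g
    return None


def on_request(g: Region, obj: str, work: Dict[str, float]) -> bool:
    """Return True if obj is resident in g after the request."""
    if obj in g.residents:
        return True
    if len(g.residents) < g.frames:  # assumption: use empty frames first
        g.residents.append(obj)
        return True
    victim = min(g.residents, key=lambda o: work[o])  # least imposed work: o_y
    if work[victim] < work[obj]:
        g.residents[g.residents.index(victim)] = obj
        return True
    return False


regions = [Region(1000, 180, 250), Region(3000, 600, 900)]
work = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0, "e": 0.5, "f": 5.0}
g = region_for(regions, 200)
for name in "abcd":
    on_request(g, name, work)
print(g.frames, on_request(g, "e", work), on_request(g, "f", work), g.residents)
```

Here the first region holds four frames; the cold object `e` is rejected, while the hot object `f` displaces `a`, the resident with the least work.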
With REBATE, the system might be required either to construct new regions or to reallocate space among the existing regions, for at least two reasons. First, a new object might be introduced whose size is larger than the size of the objects that constitute the present database. In this case, none of the existing frames can accommodate this object, and a new region of larger frames must be introduced. Second, the access pattern to the objects might evolve such that one or several of the current regions deserve more space than is already allocated to them, while other regions deserve proportionally less space. Hence, the design of REBATE includes a reorganization technique that periodically proposes a new organization of the regions and renders it effective only if its expected improvement in the actual hit ratio observed by the device (that is, the effective utilization of its space) exceeds a preset threshold. This online reorganization procedure is described further in [GI].

REBATE may suffer from two limitations. First, in a system where object sizes are not naturally clustered into like-sized classes, REBATE may waste space. This underutilization of available space in turn increases the frequency of access to the tertiary storage device when compared to Standard, which packs objects onto the disk drive without ensuring their contiguity. Yet even where such natural classes exist, determining a truly optimal partition of the device's space amongst regions is an NP-hard problem; REBATE's compromise of settling for region-based partitions may in fact be suboptimal in certain worst-case scenarios [GI]. Overall, whether REBATE outperforms Standard depends on a number of factors, including the amount of wasted space, the size of the working set of an application relative to the capacity of the device, the overhead of performing seeks when retrieving an object, and the penalty incurred in accessing the device at the next stratum. Under our assumptions, the impact of the last factor is significant.
Second, REBATE partitions the available disk space among multiple regions, necessitating a reorganization process when deployed for a database where the frequency of access to its objects varies dynamically over time. This reorganization procedure is undesirable for several reasons. First, reorganization can be expensive if it evaluates alternative layouts of regions by invoking the REBATE algorithm too frequently. Second, the reorganization procedure can only respond after it has detected a lower hit ratio than expected. Consequently, the user must observe a higher latency than expected for a while before the reorganization procedure can recognize and remedy the situation. Third, the reorganization procedure will almost certainly fail in environments where the frequency of access to the objects changes in a manner that forces frequent reallocation of space among regions: a ping-pong effect in which space is shuffled back and forth to follow the regions of the hottest objects. When these frequencies change too often, an even worse situation arises, in which the reorganization procedure instantiates a new layout corresponding to heat values that have already changed. The system is thus trying to predict the future, yet its only guide in this task is the statistical information that it can accumulate. When these quantities are unreliable or show large and frequent variation, such predictive methods are bound to fail. In this circumstance, the use of an online reorganization method can itself further degrade the system's performance.
Performance Evaluation
To quantify the performance tradeoffs of Standard, Dynamic, EVEREST, and REBATE, we conducted a number of simulation studies. The simulation model evolved over a period of twelve months. During this period, we conducted many experiments and gained insights into the factors that impact the performance of the system with alternative space management techniques, the experimental design of the simulator, and the results that were important to present. Indeed, EVEREST was designed only once we had understood the tradeoffs associated with Standard, Dynamic, and REBATE.
Almost all the components of the simulator are straightforward, except for the Driver module, which generates requests, each referencing an object. It is complicated because it employs several distributions to generate requests following an arbitrary pattern of access to the objects. An arbitrary request generator was desirable for several reasons. First, it eliminated the possibility of bias towards a technique. Second, we believe that it models reality, because the pattern of access to the objects is typically unknown in real applications. Third, with this model it is straightforward to evaluate the accuracy of the statistical modules that estimate heats. We observed that the statistics module is fairly accurate.
The design of the Driver is based on the assumption that the heat of objects evolves gradually. For example, one may sample the distribution of access to the objects at two different points in time and observe that, for the first sample, …% of requests are directed to …% of the objects (the first access pattern), while for the second sample …% of accesses are directed to …% of the objects (the second access pattern). Our assumption states that the heat evolved incrementally and that at some point it was more uniform than both access patterns. The Driver models this paradigm by using a normal distribution to model each of the two access patterns. Next, it migrates the heat of objects in ten steps from the first pattern to the second. After the first interval, the distribution is more uniform than both patterns, and it is most uniform at the fifth interval. Starting with the sixth interval, the Driver progresses towards the second access pattern; by the tenth interval, it is producing requests to the objects based on the second access pattern. The details are provided in Section …. We investigated simpler designs for generating requests, e.g., changing the distribution of access from the first pattern to the second in one step, and observed no change in the final conclusions.
Early on, we employed the average service time observed with alternative space management techniques as the criterion for comparing one strategy with another. This was a mistake, because it hid the factors that impact the performance of the system with alternative techniques by associating weights with them; these weights describe the physical characteristics of the devices in the hierarchy. By focusing on these factors instead of the average service time, we were able to develop analytical models that incorporate the physical parameters of a system to compute the average service time (see Section …). These analytical models were validated against the simulation study with less than a …% margin of error. Using these models, a system designer may choose parameter values corresponding to a target hardware platform to evaluate alternative techniques.
Figure …: Block diagram of the Simulation Model. The Driver module draws two heat distributions ($\sigma_{heat,1}$, $\sigma_{heat,2}$) of 1,000 heat values each, passes each through a Randomizer and an Attenuator (0% to 100%, set by the knob), and combines them in an Adder and a Normalizer; together with 1,000 size values ($\sigma_{size}$), a Request Queue Generator emits a queue of 100,000 object-id requests. The queue feeds the Space Management module (space management algorithm), which interfaces with the Device Emulation module, while the Heat Statistics module maintains timestamp queues and calculates heat values.
Simulation Model
The simulation model consists of four components: Driver, Space Management, Device Emulation, and Heat Statistics (see Figure …). The Driver module generates a synthetic workload by constructing a queue of object requests. The Space Management module realizes the different space management algorithms and interfaces with the Device Emulation module. The Device Emulator models a magnetic disk drive with its seek and transfer times (see [RW]) and a simple tertiary device, i.e., a tape drive. Finally, the Heat Statistics module gathers information about the sequence of requests and compiles this data into a heat value for every object. The Space Management module then uses the estimated heat values to decide which objects should be disk resident. The simulator was implemented in the C programming language.
The Driver module uses three input parameters to generate a sequence of object requests: two heat distributions, $\sigma_{heat,1}$ and $\sigma_{heat,2}$, and a knob. The knob determines the role of each heat distribution in generating the requests. As illustrated in Figure …, when the knob is equal to 100% for $\sigma_{heat,1}$, its value is 0% for $\sigma_{heat,2}$; in this case, the value chosen by $\sigma_{heat,1}$ alone determines the final queue of requests.

To describe how requests are generated using a normal distribution, assume that the knob is equal to 100% for $\sigma_{heat,1}$. In this case, object heats (i.e., the frequency of their appearance in the request queue) obey a normal distribution with a mean of zero and a standard deviation of $\sigma_{heat}$ (here $\sigma_{heat,1}$). Objects are viewed as uniformly distributed sample points in the interval …. With a small value of $\sigma_{heat}$, the access pattern is skewed and most accesses are concentrated on a small subset of the objects. With larger values of $\sigma_{heat}$, the heats of the objects become more uniform. As a rule of thumb, approximately …% of the heat will be concentrated on a $\sigma_{heat}$ fraction of all objects and …% of the heat on a …$\sigma_{heat}$ fraction. For example, when $\sigma_{heat}$ = …, approximately …% of the heat is concentrated in …% of the objects. As the value of $\sigma_{heat}$ increases, this ratio changes rapidly: at $\sigma_{heat}$ = …, nearly …% of the heat is concentrated in …% of the objects, while at $\sigma_{heat}$ = …, less than …% of the heat is concentrated on …% of the objects. For values of $\sigma_{heat}$ beyond …, this distribution is nearly uniform. Objects are assigned heats in a purely random manner; there is no intentional correlation between the size and the heat of objects.
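The heat assignment just described can be sketched in Python. This is a minimal, hypothetical sketch, not the simulator's C code: it assumes the sample points are evenly spaced over [0, 1) (the paper's interval is not reproduced here), that an object's heat is proportional to the zero-mean normal density at its point, and that the heats are then shuffled across object ids so that heat carries no correlation with object identity or size. The function names are illustrative.

```python
import math
import random


def normal_heats(n, sigma):
    """Heats for n evenly spaced sample points in [0, 1), proportional to the
    N(0, sigma) density at each point and normalized to sum to 1."""
    points = [(j + 0.5) / n for j in range(n)]
    raw = [math.exp(-0.5 * (x / sigma) ** 2) for x in points]
    total = sum(raw)
    return [h / total for h in raw]


def assign_heats(n, sigma, rng):
    """Hand the heats to object ids 0..n-1 in random order, so that heat is
    uncorrelated with object id (and therefore with object size)."""
    heats = normal_heats(n, sigma)
    rng.shuffle(heats)
    return heats


def share_of_hottest(heats, fraction):
    """Fraction of the total heat held by the hottest `fraction` of objects."""
    ranked = sorted(heats, reverse=True)
    k = round(fraction * len(ranked))
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    rng = random.Random(42)
    for sigma in (0.1, 0.5, 2.0):
        heats = assign_heats(1000, sigma, rng)
        # Smaller sigma -> more of the heat sits on the hottest 10% of objects.
        print(sigma, round(share_of_hottest(heats, 0.10), 3))
```

Under these assumptions, the hottest 10% of objects hold about 68% of the heat when $\sigma_{heat}$ = 0.1 (the one-standard-deviation share of a normal curve) and only slightly more than 10% when $\sigma_{heat}$ = 2.0, which mirrors the trend from skewed to nearly uniform access described above.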
The Driver uses two heat distributions to generate the final queue of requests. Both distributions are based on the same normal distribution with a mean of zero. The knob controls to what extent each of the two heat distributions is used in generating the request queue. When the value of the knob changes, the heat essentially migrates from one set of objects to another. At one end, 100% of the $\sigma_{heat,1}$ curve and 0% of the $\sigma_{heat,2}$ curve are in effect. At the other end, the percentages are reversed, i.e., 0% of the $\sigma_{heat,1}$ curve and 100% of the $\sigma_{heat,2}$ curve are used. For every object $o_x$, the (attenuated) values $heat_1(o_x)$ and $heat_2(o_x)$ are added and stored in the array $heat(o_x)$. This array is then normalized such that $\sum_j heat(o_j) = 1$. Finally, the request queue is generated from the $heat(o)$ array.

The Space Management module services the requests that are generated by the Driver module. It has access to a synthetic database that consists of 1,000 objects. A normal distribution of the sizes guarantees a fixed average object size for all experiments, controlled by the input parameter $\sigma_{size}$. Each space management algorithm, implementing the Standard, Dynamic, EVEREST, and REBATE policies, is a plug-in module.
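The Driver's pipeline (two heat curves weighted by the knob, an adder, a normalizer, and a request generator) might look like the following. This is a hypothetical sketch: it assumes the knob acts as a linear weight in [0, 1] on the first curve and its complement on the second, and it draws each request independently with probability equal to the object's normalized heat. `mix_heats` and `request_queue` are illustrative names.

```python
import random


def mix_heats(heat1, heat2, knob):
    """Blend two per-object heat arrays.

    `knob` in [0, 1] is the weight given to heat1 (1.0 means "100% of
    sigma_heat,1"); heat2 receives 1 - knob. The sums are renormalized so the
    resulting heats add up to 1.
    """
    if not 0.0 <= knob <= 1.0:
        raise ValueError("knob must lie in [0, 1]")
    mixed = [knob * a + (1.0 - knob) * b for a, b in zip(heat1, heat2)]
    total = sum(mixed)
    return [h / total for h in mixed]


def request_queue(heats, length, rng):
    """Draw `length` object ids; object j appears with probability heats[j]."""
    return rng.choices(range(len(heats)), weights=heats, k=length)


if __name__ == "__main__":
    rng = random.Random(7)
    hot_first = [0.7, 0.2, 0.1, 0.0]   # toy heat curve standing in for sigma_heat,1
    hot_last = [0.0, 0.1, 0.2, 0.7]    # toy heat curve standing in for sigma_heat,2
    for knob in (1.0, 0.5, 0.0):
        heats = mix_heats(hot_first, hot_last, knob)
        queue = request_queue(heats, 10_000, rng)
        print(knob, [round(queue.count(j) / len(queue), 2) for j in range(4)])
```

As the knob moves from 1.0 to 0.0, the observed request frequencies shift from the first toy curve to the second, which is the "migration" of heat between object sets that the experiments rely on.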
The Device Emulation part of the simulator consists of data structures and routines to emulate a magnetic disk drive and a tertiary device. We employed the analytical models of [RW] to represent the seek operation, the transfer rate, and the latency of a magnetic disk drive. The tertiary device is simplified, and only its transfer rate is modeled. This module is also responsible for gathering the statistics that are used to compare the effectiveness of the different space management policies: the number of seeks performed on behalf of an object and the average percentage of disk space that remains idle.
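The disk side of the emulator can be illustrated with the kind of two-regime seek curve described in [RW], in which short seeks grow roughly with the square root of the seek distance and long seeks grow linearly. The constants below are placeholders chosen so that the two regimes meet at the knee; they are not the drive parameters used in this study. As in the text, the tertiary device is reduced to a transfer rate. All names are illustrative.

```python
import math

# Hypothetical drive constants, chosen only so the two seek regimes meet
# at the knee; they are not the parameters used in the paper.
SHORT_A_MS, SHORT_B_MS = 3.0, 0.4     # short seeks: a + b * sqrt(d)
LONG_A_MS, LONG_B_MS = 7.8, 0.008     # long seeks:  a + b * d
KNEE_CYLINDERS = 400
AVG_ROTATIONAL_LATENCY_MS = 5.0
DISK_MB_PER_S = 4.0
TERTIARY_MB_PER_S = 0.5


def seek_time_ms(distance):
    """Seek time for a head movement of `distance` cylinders."""
    if distance <= 0:
        return 0.0
    if distance < KNEE_CYLINDERS:
        return SHORT_A_MS + SHORT_B_MS * math.sqrt(distance)
    return LONG_A_MS + LONG_B_MS * distance


def disk_read_s(size_mb, seek_distances):
    """Read one object: each repositioning costs a seek plus the average
    rotational latency, then the bytes stream at the transfer rate."""
    positioning_ms = sum(seek_time_ms(d) + AVG_ROTATIONAL_LATENCY_MS
                         for d in seek_distances)
    return positioning_ms / 1000.0 + size_mb / DISK_MB_PER_S


def tertiary_read_s(size_mb):
    """The tertiary device is modeled by its transfer rate alone."""
    return size_mb / TERTIARY_MB_PER_S


if __name__ == "__main__":
    # A contiguous object pays one repositioning; a fragmented one pays one per piece.
    print(disk_read_s(4.0, [250]))
    print(disk_read_s(4.0, [250, 30, 900, 120]))
    print(tertiary_read_s(4.0))
```

Counting one repositioning per non-contiguous piece is what makes fragmentation visible as a higher number of seeks per disk hit.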
The Space Management module does not have access to any heat information that exists in the Driver and must learn about it by gathering statistics from the issued requests. The learning process is as follows. The module keeps a queue of timestamps for every object as well as an estimated heat value. All the queues are initially empty, and the heat values are uniformly set to $1/n$, where $n$ is the total number of objects. Upon the arrival of a request referencing object $o_x$, the current time is recorded in the queue of object $o_x$. Whenever the timestamp queue of object $o_x$ becomes full, the heat value of that object is updated according to

$$heat_{new}(o_x) = c \cdot \frac{1}{K-1} \sum_{i=2}^{K} \frac{1}{t_i - t_{i-1}} + (1 - c) \cdot heat_{old}(o_x)$$

where $K$ is the length of the timestamp queue (set to …), $c$ is a constant between 0 and 1 (set to …), and $t_i$ is one individual timestamp. After the update is completed, the queue of this object is flushed and new timestamps can be recorded. This approach is similar to the concept of the Backward K-distance used by the authors of [OOW] in the LRU-K algorithm. The two schemes differ in three ways. First, the heat estimates are not based on the interval between the first and the last timestamp in the queue but are averages over all the intervals. Second, the heat value of an object $o_x$ is only updated when the timestamp queue of $o_x$ is full, which reduces overhead. Third, the previous heat value $heat_{old}(o_x)$ is taken into account, with weight $(1 - c)$, when $heat_{new}(o_x)$ is calculated. Together, these measures balance the need to smooth out short-term fluctuations in the access pattern against the need to remain responsive to longer-term trends.
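The learning procedure can be sketched as a small class. This is a hypothetical illustration, assuming that time is counted in requests, that the new estimate is the mean of the reciprocal gaps between the $K$ queued timestamps, and that it is blended with the previous heat using weight $c$ (and $1 - c$ for the old value). The defaults `k=4` and `c=0.5` are placeholders; the study's values of $K$ and $c$ are not reproduced here.

```python
from collections import deque


class HeatStatistics:
    """Learns per-object heat from request timestamps (hypothetical sketch).

    Time is measured in request numbers, so timestamps for one object are
    strictly increasing and no gap is zero.
    """

    def __init__(self, n_objects, k=4, c=0.5):
        if k < 2 or not 0.0 < c <= 1.0:
            raise ValueError("need k >= 2 and 0 < c <= 1")
        self.k = k                                   # timestamp queue length K
        self.c = c                                   # weight of the new estimate
        self.heat = [1.0 / n_objects] * n_objects    # uniform initial heats 1/n
        self.stamps = [deque() for _ in range(n_objects)]

    def record(self, obj, now):
        """Log a reference to `obj` at time `now`; update its heat when full."""
        queue = self.stamps[obj]
        queue.append(now)
        if len(queue) < self.k:
            return
        ts = list(queue)
        gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
        estimate = sum(1.0 / g for g in gaps) / len(gaps)   # mean of 1/gap
        self.heat[obj] = self.c * estimate + (1.0 - self.c) * self.heat[obj]
        queue.clear()                                  # flush the queue


if __name__ == "__main__":
    stats = HeatStatistics(n_objects=3, k=4, c=0.5)
    pattern = [0, 1, 0, 2]             # object 0 is referenced every 2nd request
    for t in range(16):
        stats.record(pattern[t % 4], t)
    print([round(h, 3) for h in stats.heat])
```

Because updates happen only when a queue fills, an object's heat changes at most once every $K$ references, which is the overhead reduction the text points out.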
Figure …: Values over time of three of the input parameters for the simulation experiments: $\sigma_{heat,1}$, $\sigma_{heat,2}$, and the knob, which moves between 100% of $\sigma_{heat,1}$ and 100% of $\sigma_{heat,2}$ over the simulation cycles, with a new randomization at each extreme.
Experimental Design
The two simulation model input parameters, $\sigma_{heat,1}$ and $\sigma_{heat,2}$, are used to model how the heat of individual objects might change over time. The relevant parameters of the experiments are summarized in Table …. The value of $\sigma_{heat,1}$ is always held constant at …. The parameter $\sigma_{heat,2}$ is initially set to …. The value of the knob is initialized to 100% of $\sigma_{heat,1}$. After … requests, the value of the knob is decremented by …%, and therefore the fraction of requests corresponding to $\sigma_{heat,2}$ increases from 0% to …% and that of $\sigma_{heat,1}$ decreases to …%. This process continues, with the knob value decreasing by …% after every … requests, until its value reaches 0%; at this point all requests correspond to the $\sigma_{heat,2}$ distribution. The heat represented by $\sigma_{heat,1}$ is now redistributed by invoking the randomization routine, and the value of the knob starts to increase in …% increments. Each time, a new queue of requests is generated. This procedure is repeated many times. At the extreme values of the knob for $\sigma_{heat,1}$ (i.e., 0% and 100%), a random number generator is employed to ensure that the identity of the frequently accessed objects changes, requiring the system to learn the identity of the frequently accessed objects each time. The above experiment was repeated a total of … times, each time with a different value of $\sigma_{heat,2}$. The values used are listed in Table …, and Figure … illustrates the process.
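The knob schedule can be sketched as a generator. This is a hypothetical sketch: the step size is a parameter because the study's step and batch length are not reproduced in this section, and each yielded value would govern one batch of requests. The function name is illustrative.

```python
def knob_schedule(step, sweeps):
    """Yield (knob, randomize) pairs, one per batch of requests.

    `knob` is the weight of sigma_heat,1 in [0, 1]. It starts at 1.0, moves
    down by `step` per batch to 0.0, then back up to 1.0, for `sweeps`
    one-way sweeps. `randomize` is True when the knob reaches an extreme,
    where the distribution that just went out of use is re-randomized.
    `step` is assumed to divide 1 evenly (e.g. 0.1 or 0.25).
    """
    n = round(1.0 / step)
    level, direction = n, -1     # integer level avoids floating-point drift
    yield 1.0, False
    for _ in range(sweeps * n):
        level += direction
        at_extreme = level in (0, n)
        yield level / n, at_extreme
        if at_extreme:
            direction = -direction


if __name__ == "__main__":
    for knob, randomize in knob_schedule(step=0.25, sweeps=2):
        print(f"{knob:.2f}", "re-randomize" if randomize else "")
```

Tracking an integer level rather than repeatedly subtracting a float step keeps the knob landing exactly on 0.0 and 1.0, so the randomization points are hit reliably.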
Device parameters:
  Disk size: … GB
  Database (also tertiary) size: … GB
  Block size (where applicable): … kB
Object parameters:
  Number of objects: 1,000
  Maximum object size: … MB
  Minimum object size: … MB
  Average object size: … MB
Input parameters:
  $\sigma_{size}$: …
  $\sigma_{heat,1}$: …
  $\sigma_{heat,2}$: …
Table …: Simulation Parameters
Performance Results
Figure … presents the number of seeks observed per request that finds its referenced object on the disk drive, termed a disk hit. With an empty disk, Standard lays out the referenced object contiguously. However, after a few iterations of the knob changing its value, each request observes on average more than … seeks. Dynamic and REBATE ensure a contiguous layout and observe zero seeks per disk hit. As expected, due to its preventive style, EVEREST keeps the number of seeks constant in this experiment; this number represents the total seeks required both to service a request observing a disk hit and to carry out the preventive operations performed by EVEREST.

Figure … demonstrates the disadvantages of laying out an object contiguously with Dynamic and REBATE. Dynamic wastes …% to …% of the available disk space (this is explained in Section …). REBATE wastes approximately …% of the disk space due to internal fragmentation of a frame. Both Standard and EVEREST utilize the available space to its fullest potential. They do waste a small fraction of space (less than …%) because of our assumption that an object must be resident in its entirety: no partial materialization of an object is allowed.
Figure … quantifies the overhead attributed to the preventive characteristics of EVEREST. The number of preventive operations performed depends on how frequently the replacement policy is activated to locate and delete victim objects. To illustrate, the peaks in Figure …a correspond to the values of $\sigma_{heat,2}$ (knob = …% for $\sigma_{heat,1}$). As described in Section … (Table …), the value of $\sigma_{heat,2}$ increases from … to …. At $\sigma_{heat,2}$ = …, the distribution of access to the objects is fairly uniform, motivating the replacement policy to delete several objects from the disk in favor of others. The amount of work (disk activity) performed per preventive operation depends on the degree of fragmentation of sections on the disk drive. Figures …b and …c show the number of seeks incurred and the amount of data migrated per preventive operation. While there is significant variation, on average a preventive operation requires … seeks and the migration of … MByte of data (note that this is …% of the average requested object size). Once amortized across all the requests, this overhead becomes negligible, as illustrated by the number of incurred seeks in Figure ….

Figure …: Number of seeks per disk hit, plotted against the number of heat cycles (curves: Standard, EVEREST, and Dynamic/REBATE). This number does not include the first seek required to access the first block of an object.

Figure …: Wasted disk space (%), plotted against the number of heat cycles (curves: REBATE, Dynamic, and Standard/EVEREST).

Figure …: Overhead attributed to the preventive characteristic of EVEREST, plotted against the number of heat cycles. (a) The number of preventive operations. (b) The number of seeks per preventive operation; each migration of a section requires two seeks (read + write). (c) MBytes of data migrated per preventive operation.

Figure … demonstrates the following. First, the number of preventive operations should be a small fraction of the total number of requests serviced by a device. This clearly states the need for the existence of a working set; otherwise, the number of objects replaced may become significant, and the overhead attributed to the preventive nature of EVEREST may in turn dominate the average service time of the device. Second, the latency incurred by a request might vary depending on whether a preventive operation is invoked and on the amount of work performed by that preventive operation.
Finally, we compared the obtained results with a scenario in which the system was allowed access to the queue of requests and could compute the heat of the objects with 100% accuracy, instead of employing the Heat Statistics module to learn the heat information (see Section … for the details of this module). The obtained results were almost identical, demonstrating that the Heat Statistics module has no significant impact on the obtained results and that the technique it employs to compute heat statistics is effective in our experimental design.
Analytical Models
In this section we develop analytical models that approximate the average service time of a system based on its physical characteristics and on the fundamental factors that impact the performance of the system with the alternative space management techniques. These abstract models are useful because a system designer may manipulate the values of their parameters to understand the benefits of one strategy as compared to another. They have been validated using the experimental simulation model.
The performance of the system with a space management strategy is impacted by the following factors:
1. The average number of seeks incurred when reading an object ($F$) and the average time to perform a seek ($S_{Seek}$).
2. The number of preventive operations performed ($P$) and the average time to perform one such operation ($S_{Prev}$).
3. The number of bytes reorganized ($U$) and the number of seeks attributed to the reorganization procedure ($E$).
4. The amount of wasted space ($W$) and its expected hit ratio.
We analyze the average service time of the system with a given strategy to service a fixed number of requests and quantify what fraction of this service time is attributed to each of these factors. One or more of these factors might be nonexistent for a strategy. For example, REBATE incurs the overhead of neither preventive operations nor the seeks attributed to the retrieval of an object. In this case, the values of the appropriate parameters are zero ($F$ and $P$ for REBATE), enabling the model to eliminate the impact of these factors. Refer to Table … for a list of factors attributed to the different space management techniques described in this paper. We assume that the system has accumulated the statistics shown in Table …. We describe each factor and its corresponding analytical model in turn.

Term: Definition
$C$: Storage capacity of the magnetic disk
$R_T$: Total number of requests issued during a fixed period of time
$R_H$: Total number of requests that observe a disk hit during the fixed period of time
$Size_{Avg\_req}$: Average number of bytes retrieved per request
$B$: Total number of bytes retrieved by the $R_T$ requests
$H$: Total number of bytes found on the disk by the $R_T$ requests
$P$: Average number of preventive operations per disk hit
$W$: Total number of bytes wasted by a strategy
$U$: Total number of bytes reorganized (read + write)
$E$: Total number of seeks attributed to the reorganization procedure
$F$: Average number of seeks per disk hit
$D_{Tertiary}$: Delivery rate of the tertiary storage device (incorporates the seek time of the device)
$T_{Disk}$: Transfer rate of the disk drive
$\mathit{byte\_hit}$: Fraction of a byte that observed a hit ($\mathit{byte\_hit} = H/B$)
$S_{Seek}$: Average service time for a seek
$S_{Prev}$: Average service time for a preventive operation
Table …: List of parameters used by the analytical models
The portion of average service time attributed to seeks incurred when reading an object is $F \cdot S_{Seek}$, where $F$ is the average number of seeks performed on behalf of a retrieval from disk and $S_{Seek}$ is the average service time to perform a seek. These statistics can be gathered as the system services requests.
The portion of average service time attributed to preventive operations is defined as $P \cdot S_{Prev}$, where $P$ is the average number of preventive operations per disk hit and $S_{Prev}$ is the average service time to perform a preventive operation.

Figure …: Components of average service time for a single queue of requests (seek time, preventive operations, disk transfer time, tertiary service time, reorganization, and wasted disk space):

$$\text{Average service time} = F \cdot S_{Seek} + P \cdot S_{Prev} + \frac{H}{R_T \cdot T_{Disk}} + \frac{B - H}{R_T \cdot D_{Tertiary}} + \frac{U}{R_H \cdot T_{Disk}} + \frac{E}{R_H} \cdot S_{Seek} + \mathit{byte\_hit} \cdot \frac{W}{C} \cdot \frac{Size_{Avg\_req}}{D_{Tertiary}}$$
The amount of time attributed to disk transfer is a function of the average number of bytes retrieved from the disk per request, $H/R_T$, and the transfer rate of the disk drive, $T_{Disk}$:

$$\frac{H/R_T}{T_{Disk}}$$

Similarly, the amount of time spent transferring data from tertiary storage is a function of the average number of bytes retrieved from the tertiary device per request, $(B - H)/R_T$, and the delivery rate of the tertiary storage device:

$$\frac{(B - H)/R_T}{D_{Tertiary}}$$

The delivery rate of the tertiary storage device incorporates the average number of seeks incurred by this device per request and the overhead of such seeks.
A technique that employs a reorganization process reads and writes a fixed number of bytes ($U$), causing the device to incur a fixed number of seek operations ($E$). This overhead, averaged across the $R_H$ requests that observe a disk hit, is

$$\frac{U}{R_H \cdot T_{Disk}} + \frac{E}{R_H} \cdot S_{Seek}$$
A technique such as REBATE may waste disk space. However, its impact might be negligible if the wasted space is not expected to have a high hit ratio. Assume a metric that defines what fraction of each byte on the disk should observe a hit, termed the byte hit ratio ($\mathit{byte\_hit}$); its details are presented in the following paragraphs. The wasted space reduces the chance of a byte hit by a fixed margin, $\mathit{byte\_hit} \cdot W/C$. This causes a fixed number of bytes of the average request, $\mathit{byte\_hit} \cdot \frac{W}{C} \cdot Size_{Avg\_req}$, to be retrieved from the tertiary storage device, and the overhead of reading these bytes can be quantified as

$$\mathit{byte\_hit} \cdot \frac{W}{C} \cdot \frac{Size_{Avg\_req}}{D_{Tertiary}}$$
The byte hit ratio is a function of both the size of the working set of the system and the storage capacity of the disk drive. When the working set of an application is larger than the storage capacity of the disk, every byte becomes valuable because each byte held on the disk reduces the number of bytes retrieved from the tertiary storage device. In this case, the byte hit ratio is defined as $H/B$. When the working set is smaller than the storage capacity of the disk, the probability of a wasted byte observing a hit is a function of the database size, the amount of wasted space, and the pattern of access to the objects. For example, if one assumes that references are randomly distributed across the objects (bytes) that constitute the remainder of the database (i.e., those that are not part of the working set), then $\mathit{byte\_hit}$ might be defined in terms of the database size, $size(DB)$, and the amount of wasted space, $W$.
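The components can be combined into a single estimate of the average service time. The sketch below is hypothetical (the class and function names are illustrative): it adds the six terms exactly as this section states them, assumes consistent units (bytes, seconds, bytes per second), and by default uses $\mathit{byte\_hit} = H/B$, the case in which the working set exceeds the disk capacity. A different byte hit ratio can be passed in for the smaller-working-set case.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Statistics from the parameter table; units are assumed consistent
    (bytes, seconds, bytes per second)."""
    C: float             # storage capacity of the magnetic disk
    R_T: float           # requests issued in the period
    R_H: float           # requests that observed a disk hit
    size_avg_req: float  # average bytes retrieved per request
    B: float             # bytes retrieved by the R_T requests
    H: float             # of those, bytes found on the disk
    P: float             # preventive operations per disk hit
    W: float             # bytes wasted by the strategy
    U: float             # bytes reorganized (read + write)
    E: float             # seeks attributed to reorganization
    F: float             # seeks per disk hit
    D_tertiary: float    # tertiary delivery rate (includes its seeks)
    T_disk: float        # disk transfer rate
    S_seek: float        # average time per seek
    S_prev: float        # average time per preventive operation


def service_time_components(s, byte_hit=None):
    """Cost of each factor as stated in the text; byte_hit defaults to H / B."""
    if byte_hit is None:
        byte_hit = s.H / s.B
    reorganization = 0.0
    if s.R_H:
        reorganization = s.U / (s.R_H * s.T_disk) + (s.E / s.R_H) * s.S_seek
    return {
        "seek": s.F * s.S_seek,
        "preventive": s.P * s.S_prev,
        "disk_transfer": (s.H / s.R_T) / s.T_disk,
        "tertiary": ((s.B - s.H) / s.R_T) / s.D_tertiary,
        "reorganization": reorganization,
        "wasted_space": byte_hit * (s.W / s.C) * s.size_avg_req / s.D_tertiary,
    }


def average_service_time(s, byte_hit=None):
    """Sum of all components."""
    return sum(service_time_components(s, byte_hit).values())


if __name__ == "__main__":
    # Toy numbers (hypothetical); a REBATE-like strategy has F = P = 0.
    s = ModelStats(C=100, R_T=10, R_H=5, size_avg_req=10, B=100, H=50,
                   P=0, W=10, U=0, E=0, F=0, D_tertiary=1, T_disk=10,
                   S_seek=0.01, S_prev=0)
    for name, t in service_time_components(s).items():
        print(f"{name:15s} {t:.3f}")
    print("total", average_service_time(s))
```

Keeping the terms separate, rather than returning only the total, lets a designer see which factor dominates for a given strategy, which is the comparison the models are meant to support.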
We verified the analytical models using the simulator as follows. The simulation was run for a period of time in order to accumulate the values of the parameters outlined in Table …. Next, the average service time of the system as computed by the simulator was compared with that predicted by the analytical model. In almost all cases there was a perfect match; the highest observed margin of error was less than …%. It is important to note that these models should be extended with queuing times in the presence of both multiple users and multiple disk drives; this is beyond the focus of this study.
Conclusions
In this paper we have studied alternatives in space management for large repositories of objects that are generally retrieved sequentially and in their entirety. Such repositories might be found in various multimedia applications, such as video-on-demand servers, and in numerous scientific applications, such as the Brookhaven and Cambridge databases of molecular structures. To isolate the factors that contribute significantly to the performance of such a system, we have sampled the space of storage management policies for a fixed hierarchical architecture. The simulation results and their analyses permit one to isolate the tradeoffs inherent in various designs: tradeoffs between wasted time (seeking) and wasted space; between local greedy techniques for optimizing a device's workload (as in Standard or Dynamic) and those that impose a more global order on the medium (REBATE or EVEREST); and between detective and preventive strategies for adapting to a changing workload. However, a complete evaluation of these tradeoffs depends on both the physical characteristics of the system and the target application. For example, the impact of wasted space and wasted time upon the actual workload of the device depends critically on its seek time and bandwidth, on the block size and average object size, and on the size of the resident working set relative to the capacity of the device. Whether a system should impose a global order on its device, and effectively partition its space, may depend upon the changeability of the working set and the expected or observed variance in the heats of objects. Similarly, the policy adopted to organize and reorganize space depends upon the characteristics of the tasks for which it is deployed: whether a working set exists, how quickly it changes, and how predictable its evolution is.
While we believe that this study is complete in its treatment of the issues that arise in designing strategies to manage the space of a mechanical device, it raises two related research topics that deserve further investigation. First, implementation details of Dynamic, REBATE, and EVEREST are lacking and require further consideration should a system designer elect to employ one of these strategies in a file system. In particular, we intend to investigate the crash-recovery component of these strategies, i.e., the component that enables the device to recover to a consistent state after a power failure. Second, the management of objects across the different strata of a hierarchical storage structure requires further analysis. In particular, a management technique should decide whether it is worthwhile to allow multiple copies of an object, with each copy residing at a different stratum of the hierarchy, e.g., one copy on the magnetic disk and a second on the optical disk in Figure ….

References
[BGMJ] S. Berson, S. Ghandeharizadeh, R. Muntz, and X. Ju. Staggered striping in multimedia information systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data.
[BKW] F. C. Bernstein, T. F. Koetzle, G. J. B. Williams, E. F. Meyer, M. D. Brice, J. R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi. The Protein Data Bank: A Computer-Based Archival File for Macromolecular Structures. Journal of Molecular Biology.
[CABK] G. Copeland, W. Alexander, E. Boughter, and T. Keller. Data Placement in Bubba. In Proceedings of the ACM SIGMOD International Conference on Management of Data.
[CDKK] H. T. Chou, D. J. DeWitt, R. Katz, and A. Klug. Design and implementation of the Wisconsin Storage System. Software: Practice and Experience.
[CHL] M. Carey, L. Haas, and M. Livny. Tapes hold data, too: Challenges of tuples on tertiary storage. In Proceedings of the ACM SIGMOD International Conference on Management of Data.
[CL] H. J. Chen and T. Little. Physical Storage Organizations for Time-Dependent Multimedia Data. In Proceedings of the Foundations of Data Organization and Algorithms (FODO) Conference, October.
[Cou] National Research Council. Mapping and Sequencing the Human Genome. Committee on the Human Genome, Board on Basic Biology. National Academy Press, April.
[Den] P. J. Denning. The Working Set Model for Program Behavior. Communications of the ACM.
[Gal] D. Le Gall. MPEG: a video compression standard for multimedia applications. Communications of the ACM, April.
[GI] S. Ghandeharizadeh and D. Ierardi. Management of Disk Space with REBATE. In Proceedings of the Third International Conference on Information and Knowledge Management (CIKM), November.
[Gib] G. A. Gibson. Redundant Disk Arrays: Reliable, Parallel Secondary Storage. MIT Press, Cambridge, Mass.
[GRa] J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques, chapter …. Morgan Kaufmann.
[GRb] J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques, chapter …. Morgan Kaufmann.
[Kno] K. C. Knowlton. A fast storage allocator. Communications of the ACM, October.
[LD] H. R. Lewis and L. Denenberg. Data Structures and Their Algorithms, chapter …. HarperCollins.
[MJLF] M. McKusick, W. Joy, S. Leffler, and R. Fabry. A Fast File System for UNIX. ACM Transactions on Computer Systems, August.
[NY] R. T. Ng and J. Yang. Maximizing Buffer and Disk Utilizations for News On-Demand. In Proceedings of the International Conference on Very Large Databases, September.
[OOW] E. J. O'Neil, P. E. O'Neil, and G. Weikum. The LRU-K Page Replacement Algorithm for Database Disk Buffering. In Proceedings of the ACM SIGMOD International Conference on Management of Data.
[PGK] D. Patterson, G. Gibson, and R. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID). In Proceedings of the ACM SIGMOD International Conference on Management of Data, May.
[RO] M. Rosenblum and J. Ousterhout. The Design and Implementation of a Log-Structured File System. ACM Transactions on Computer Systems.
[RW] C. Ruemmler and J. Wilkes. An Introduction to Disk Drive Modeling. IEEE Computer, March.
[TPBG] F. A. Tobagi, J. Pang, R. Baird, and M. Gang. Streaming RAID: A Disk Array Management System for Video Files. In First ACM Conference on Multimedia, August.
Appendix A: Display of Continuous Media
This appendix explains why it is important to predict the service time of a mechanical device (e.g., a disk drive) in order to schedule it effectively when displaying an object of a continuous media data type (e.g., video). To simplify the discussion, assume that the target environment consists of some memory, one disk drive, and a single tertiary storage device, as described in Section …. Moreover, assume that the bandwidths of both the tertiary storage device and the magnetic disk drive exceed the bandwidth required to display compressed video objects. To ensure a continuous display of a video object $X$ and minimize the amount of required memory, several studies [BGMJ, NY, CL, TPBG] have proposed that $X$ be striped into $n$ equi-sized subobjects $X_1, X_2, \ldots, X_n$. Each subobject $X_i$ represents a contiguous portion of $X$. To display $X$ from a device (say, the disk drive), the system stages $X_1$ from the disk drive into memory. It schedules the disk drive to read $X_2$ such that it becomes memory resident before the display of $X_1$ completes. Next, it initiates the display of $X_1$. This process is repeated for each pair $X_i$ and $X_{i+1}$ until all subobjects of $X$ are displayed. For complete details of this mechanism, see [BGMJ].
In order to schedule the disk drive to satisfy the constraint that $X_{i+1}$ is rendered memory resident before the display of $X_i$ completes, the system must compute the display time of $X_i$ and the time required to read $X_{i+1}$ from the disk drive. Consider each factor in turn.
The display time of $X_i$ is a function of its size and the bandwidth required to display it. If the bandwidth required to display $X$ is … Mbps and the size of each of its subobjects is … KByte, then the display time of $X_i$ is … seconds. The size of the subobjects of different media types is proportional to their bandwidth requirements [BGMJ]. For example, if the bandwidth required to display object $Y$ is … Mbps (twice that of $X$), then each of its subobjects is twice the size of those of $X$ (… KBytes); note, however, that their display times are identical, which simplifies scheduling. Thus, in a database consisting of $n$ media types, each with a unique bandwidth requirement, one would find $n$ classes of subobjects, each with a unique size.
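The display-time arithmetic and the proportional sizing rule can be checked with a few lines of Python. The bandwidths and sizes in the example are hypothetical, the function names are illustrative, and the sketch assumes decimal units (1 KByte = 8,000 bits, 1 Mbps = 10^6 bits per second).

```python
def display_time_s(subobject_kbytes, display_mbps):
    """Seconds to display one subobject, taking 1 KByte = 8,000 bits and
    1 Mbps = 1,000,000 bits per second (decimal units, an assumption)."""
    return subobject_kbytes * 8_000 / (display_mbps * 1_000_000)


def subobject_kbytes(reference_kbytes, reference_mbps, media_mbps):
    """Size a media type's subobjects in proportion to its bandwidth, so
    every type's subobjects take the same time to display."""
    return reference_kbytes * media_mbps / reference_mbps


if __name__ == "__main__":
    # Hypothetical figures: X needs 1.5 Mbps and uses 750-KByte subobjects.
    x_time = display_time_s(750, 1.5)
    y_size = subobject_kbytes(750, 1.5, 3.0)   # Y needs twice X's bandwidth
    print(x_time, y_size, display_time_s(y_size, 3.0))
```

With these numbers the script prints `4.0 1500.0 4.0`: both media types end up with a 4-second display time per subobject, which is what lets a scheduler treat every subobject class with the same time slot.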
The service time of the disk drive to read subobject $X_i$ depends on its size, the transfer rate of the disk, and the number of disk seeks incurred when reading it. The number of seeks introduces variability in service time among the subobjects of $X$. These seeks can be eliminated by storing each subobject contiguously on the disk drive. In a file system based on Standard, where the available disk space is partitioned into fixed-size pages, the contiguity of a subobject is ensured when its size is smaller than that of a disk page. Otherwise, one may not assume that a subobject is stored contiguously. This is because subobjects are swapped in and out of the available disk space depending on their expected future frequency of access, in order to minimize the number of accesses to the tertiary storage device and thus provide the user with a low latency. However, this causes the disk space to become fragmented over time. The system is then forced either to store a subobject non-contiguously (introducing seeks) or to delete more data than necessary (essentially subobjects that correspond to other objects) to ensure the contiguous layout of each new subobject. In the first case, the seeks are undesirable. In the second case, the number of references to the tertiary storage device increases, because the deleted subobjects may have a high expected future access frequency. REBATE and Dynamic ensure a contiguous layout of each object. EVEREST bounds the number of seeks performed when retrieving an object, which enables a scheduler to compute an upper bound on the service time of the disk.
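The value of a seek bound can be made concrete: once the number of repositionings per subobject is bounded, the worst-case read time is bounded too, and the scheduler can check the constraint that $X_{i+1}$ is resident before $X_i$ finishes displaying. This hypothetical sketch folds rotational delay into a per-seek maximum; the function names and drive figures are illustrative.

```python
def worst_case_read_s(size_mb, transfer_mb_per_s, max_seeks, max_seek_s):
    """Upper bound on the time to read one subobject when the layout policy
    guarantees at most `max_seeks` repositionings (including the first),
    each taking at most `max_seek_s` seconds (rotational delay folded in)."""
    return max_seeks * max_seek_s + size_mb / transfer_mb_per_s


def prefetch_is_safe(display_s, size_mb, transfer_mb_per_s, max_seeks, max_seek_s):
    """True if X_{i+1} is guaranteed to be in memory before X_i finishes
    displaying, i.e. the worst-case read fits inside one display time."""
    bound = worst_case_read_s(size_mb, transfer_mb_per_s, max_seeks, max_seek_s)
    return bound <= display_s


if __name__ == "__main__":
    # Hypothetical drive: 4 MB/s transfer, at most 20 ms per repositioning.
    for seeks in (1, 4, 200):
        print(seeks, prefetch_is_safe(0.5, 0.75, 4.0, seeks, 0.02))
```

Without a bound on seeks (as with Standard once the disk is fragmented), no comparable worst case is available, so the scheduler cannot make this check with certainty.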
Linked assets
Computer Science Technical Report Archive
Conceptually similar
PDF
USC Computer Science Technical Reports, no. 625 (1996)
PDF
USC Computer Science Technical Reports, no. 578 (1994)
PDF
USC Computer Science Technical Reports, no. 584 (1994)
PDF
USC Computer Science Technical Reports, no. 628 (1996)
PDF
USC Computer Science Technical Reports, no. 612 (1995)
PDF
USC Computer Science Technical Reports, no. 601 (1995)
PDF
USC Computer Science Technical Reports, no. 685 (1998)
PDF
USC Computer Science Technical Reports, no. 627 (1996)
PDF
USC Computer Science Technical Reports, no. 602 (1995)
PDF
USC Computer Science Technical Reports, no. 610 (1995)
PDF
USC Computer Science Technical Reports, no. 659 (1997)
PDF
USC Computer Science Technical Reports, no. 766 (2002)
PDF
USC Computer Science Technical Reports, no. 748 (2001)
PDF
USC Computer Science Technical Reports, no. 589 (1994)
PDF
USC Computer Science Technical Reports, no. 600 (1995)
PDF
USC Computer Science Technical Reports, no. 699 (1999)
PDF
USC Computer Science Technical Reports, no. 587 (1994)
PDF
USC Computer Science Technical Reports, no. 870 (2005)
PDF
USC Computer Science Technical Reports, no. 558 (1993)
PDF
USC Computer Science Technical Reports, no. 591 (1994)
Description
Shahram Ghandeharizadeh, Douglas J. Ierardi, Roger Zimmermann. "Management of space in hierarchical storage systems." Computer Science Technical Reports (Los Angeles, California, USA: University of Southern California. Department of Computer Science) no. 598 (1994).
Asset Metadata
Creator
Ghandeharizadeh, Shahram
(author),
Ierardi, Douglas J.
(author),
Zimmermann, Roger
(author)
Core Title
USC Computer Science Technical Reports, no. 598 (1994)
Alternative Title
Management of space in hierarchical storage systems (title)
Publisher
Department of Computer Science, USC Viterbi School of Engineering, University of Southern California, 3650 McClintock Avenue, Los Angeles, California, 90089, USA
(publisher)
Tag
OAI-PMH Harvest
Format
40 pages
(extent),
technical reports
(aat)
Language
English
Unique identifier
UC16270859
Identifier
94-598 Management of Space in Hierarchical Storage Systems (filename)
Legacy Identifier
usc-cstr-94-598
Rights
Department of Computer Science (University of Southern California) and the author(s).
Internet Media Type
application/pdf
Copyright
In copyright - Non-commercial use permitted (https://rightsstatements.org/vocab/InC-NC/1.0/)
Source
20180426-rozan-cstechreports-shoaf
(batch),
Computer Science Technical Report Archive
(collection),
University of Southern California. Department of Computer Science. Technical Reports
(series)
Access Conditions
The author(s) retain rights to their work according to U.S. copyright law. Electronic access is being provided by the USC Libraries, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
USC Viterbi School of Engineering Department of Computer Science
Repository Location
Department of Computer Science. USC Viterbi School of Engineering. Los Angeles, CA, 90089
Repository Email
csdept@usc.edu
Inherited Values
Title
Computer Science Technical Report Archive
Coverage Temporal
1991/2017