Data Centers Power Reduction: A Two Time Scale Approach for Delay Tolerant Workloads

Yuan Yao∗, Longbo Huang∗, Abhishek Sharma∗, Leana Golubchik∗ and Michael Neely∗
∗University of Southern California, Los Angeles, CA 90089
Email: {yuanyao, longbohu, absharma, leana, mjneely}@usc.edu
Abstract—In this work we focus on a stochastic optimization based approach to making distributed routing and server management decisions in the context of large-scale, geographically distributed data centers, a setting that offers significant potential for power cost reduction. Our approach considers such decisions at different time scales and offers provable power cost and delay characteristics. The utility of our approach and its robustness are also illustrated through simulation-based experiments under delay tolerant workloads.
I. INTRODUCTION

Over the last few years, the demand for computing has grown significantly. This demand is being satisfied by very large scale, geographically distributed data centers, each containing a huge number of servers. While the benefits of having such infrastructure are significant, so are the corresponding energy costs. As per the latest reports, several companies own a number of data centers in different locations, each containing thousands of servers – Google (≈1 million), Microsoft (≈200K), Akamai (60-70K), INTEL (≈100K) – and their corresponding power costs are on the order of millions of dollars per year [1]. Given this, a reduction of even a few percent in power cost can result in savings of millions of dollars.
Extensive research has been carried out to reduce power cost in data centers, e.g., [2], [3], [4], [5], [6], [7]; such efforts can (in general) be divided into the following two categories. Approaches in the first category attempt to save power cost through power efficient hardware design and engineering, which includes designing energy efficient chips, DC power supplies, and cooling systems. Approaches in the second category exploit three different levels of power cost reduction at existing data centers, as follows. Firstly, at the server level, power cost reduction can be achieved via power-speed scaling [5], where the idea is to save power usage by adjusting the CPU speed of a single server. Secondly, at the data center level, power cost reduction can be achieved through data center right sizing [7], [4], where the idea is to dynamically control the number of activated servers in a data center to save power. Thirdly, at the inter-data center level, power cost reductions can be achieved by balancing workload across centers [2], [3], where the idea is to exploit the price diversity of geographically distributed data centers and route more workload to places where the power prices are lower. Our work falls under the second category, where our goal is to provide a unifying framework that allows one to exploit power cost reduction opportunities across all these levels. Moreover, the non-work-conserving nature of our framework allows us to take advantage of the temporal volatility of power prices while offering an explicit tradeoff between power cost and delay.
Consider a system of M geographically distributed data centers, each consisting of a front end proxy server and a back end server cluster, as shown in Figure 1. At different time instances, workload arrives at the front end proxy servers, which have the flexibility to distribute this workload to different back end clusters. The back end clusters receive the workload from the front end servers and have the flexibility to choose when to serve that workload, by managing the number of activated servers and the service rate of each server.

The problem then is to make the following three decisions, with the objective of reducing power cost: (i) how to distribute the workload from the front end servers to the back end clusters, (ii) how many servers to activate at each back end cluster at any given time, and (iii) how to set the service rates (or CPU power levels) of the back end servers.

Our proposed solution exploits temporal and spatial variations in the workload arrival process (at the front end servers) and the power prices (at the back end clusters) to reduce power cost. It also facilitates a cost vs. delay trade-off, which allows data center operators to reduce power cost at the expense of increased service delay. Hence, our work is suited for delay tolerant workloads such as massively parallel and data intensive MapReduce jobs. Today, MapReduce based applications are used to build a wide array of web services – e.g., search, data analytics, social networking, etc. Hence, even though our proposed solution is more effective for delay tolerant workloads, it is still relevant to many current and future cloud computing scenarios.
Our contributions can be summarized as follows:
• We propose a two time scale control algorithm aimed at reducing power cost and facilitating a power cost vs. delay trade-off in geographically distributed data centers (Sections II and III).
• By extending the traditional Lyapunov optimization approach, which operates on a single time scale, to two different time scales, we derive analytical bounds on the time average power cost and service delay achieved by our algorithm (Section IV).
• Through simulations based on real-world data sets, we show that our approach can reduce the power cost by as much as 18%, even for state-of-the-art server power consumption profiles and data center designs (Section VI). We also show that our approach is environmentally friendly, in the sense that it not only reduces power cost but also the actual power usage.
• We demonstrate the robustness of our approach to errors in estimating data center workload, both analytically as well as through simulations (Sections VI-C and VI-D).
II. PROBLEM FORMULATION

We first formulate our problem and then discuss the practical aspects of our model. We consider M geographically distributed data centers, denoted by $\mathcal{D} = \{D_1, \ldots, D_M\}$, where the system operates in slotted time, i.e., $t = 0, 1, 2, \ldots$. Each data center $D_i$ consists of two components, a front end proxy server, $S^F_i$, and a back end server cluster, $S^B_i$, that has $N_i$ homogeneous servers. Fig. 1 depicts our system model. Below we first present our model's components.
[Figure 1 omitted: schematic of data centers $1, 2, \ldots, M$, each with a front end proxy server $S^F_i$ (queue $Q^F_i$) feeding a back end cluster $S^B_i$ (queue $Q^B_i$).]
Fig. 1. A model of M geographically distributed data centers.
A. The Workload Model

In every time slot $t$, jobs arrive at each data center. We denote the amount of workload arriving at $D_i$ by $A_i(t)$, and let $A(t) = (A_1(t), \ldots, A_M(t))$ denote the arrival vector. In our analysis, we first assume that $A(t)$ is i.i.d. over time slots with $\mathbb{E}\{A(t)\} = \lambda \triangleq (\lambda_1, \ldots, \lambda_M)$. We later discuss how our results can be extended to the case when $A(t)$ evolves according to more general random processes. We also assume that there exists some $A_{max}$ such that $A_i(t) \leq A_{max}$ for all $i$ and $t$. Note that the components in $A(t)$ can be correlated – this is important in practice as arrivals to different data centers may be correlated.
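As a concrete illustration of these assumptions, the sketch below (our own; all numeric parameters are hypothetical, not from the paper) draws one arrival vector $A(t)$ whose components are correlated across data centers via a shared shock and are truncated at $A_{max}$:

```python
import numpy as np

M, A_MAX = 7, 500.0            # hypothetical: 7 data centers, per-slot workload cap
rng = np.random.default_rng(0)

def sample_arrivals(mean=200.0, common_std=40.0, local_std=20.0):
    """One slot's arrival vector A(t): a common shock induces correlation
    across centers; clipping enforces 0 <= A_i(t) <= A_max."""
    common = rng.normal(0.0, common_std)          # shared component
    local = rng.normal(0.0, local_std, size=M)    # independent per-center component
    return np.clip(mean + common + local, 0.0, A_MAX)

A_t = sample_arrivals()        # A(t) = (A_1(t), ..., A_M(t))
```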
B. The Job Distribution and Server Operation Model

A job first arrives at the front end proxy server of $D_i$, $S^F_i$, and is queued there. The proxy server $S^F_i$ then decides how to distribute the awaiting jobs to the back end clusters at different data centers for processing. To model this decision, we use $\mu_{ij}(t)$ to denote the amount of workload routed from $D_i$ to $D_j$ at time $t$, and use $\mu_i(t) = (\mu_{i1}(t), \ldots, \mu_{iM}(t))$ to denote the vector of workload routing rates at $S^F_i$. We assume that in every time slot, $\mu_i(t)$ must be drawn from some general feasible workload distribution rate set $\mathcal{R}_i$, i.e., $\mu_i(t) \in \mathcal{R}_i$ for all $t$. We assume only that each set $\mathcal{R}_i$ is time invariant and compact, and that each set $\mathcal{R}_i$ contains the constraint that $\sum_j \mu_{ij}(t) \leq \mu_{max}$ for some finite constant $\mu_{max}$. Note that this assumption is not at all restrictive. For example, $\mathcal{R}_i$ can contain the constraint that $\mu_{ij}(t) = 0$ for all $t$, to represent the restriction that jobs arriving at $D_i$ cannot be processed at $D_j$, e.g., because $D_j$ does not have a data set needed for the corresponding computation.
For each data center $D_i$, the jobs routed to its back end cluster are queued in a shared buffer. The data center then controls its back end cluster as follows. In every time slot $t_k = kT$, with $k = 0, 1, \ldots$ and $T \geq 1$, the data center first decides on the number of servers to activate. We denote the number of active servers at time $t$ at $D_i$ by $N_i(t)$, where $N_i(t) \in \{N_i^{min}, N_i^{min}+1, N_i^{min}+2, \ldots, N_i\}$, with $N_i$ being the total number of servers at the back end cluster $S^B_i$, and $N_i^{min}$, $0 \leq N_i^{min} \leq N_i$, being the minimum number of servers that should be activated at all times for data center $D_i$. If at time slot $t_k$ we have $N_i(t_k) > N_i(t_{k-1})$, then we activate more servers. Otherwise, we simply put $N_i(t_{k-1}) - N_i(t_k)$ servers to sleep. The reasons for having $N_i(t)$ change only every $T$ time slots are: (i) activating servers typically costs a non-negligible amount of time and power, and (ii) frequently switching back and forth between active and sleep states can result in reliability problems [8]. In addition to deciding on the number of active servers, the data center sets the service rate of each active server every time slot. This can be achieved by using techniques such as power scaling [5]. For ease of presentation, below we assume that all active servers in a data center operate at the same service rate, and denote the active servers' service rate at $D_i$ by $b_i(t)$, $0 \leq b_i(t) \leq b_{max}$, where $b_{max}$ is some finite number. We note that our model can be extended to more general scenarios. A further discussion of the above assumptions is given in Section II-E.
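To make the two time scale structure concrete, here is a minimal control-loop skeleton (our own sketch; `choose_N`, `choose_b`, and the `state` object are hypothetical placeholders for SAVE's actual rules in Section III). Server counts change only at slot boundaries $t = kT$, while service rates are tuned every slot:

```python
def run(T, num_slots, M, choose_N, choose_b, state):
    """Two time scale loop: N_i re-chosen every T slots, b_i re-chosen every slot."""
    N = [0] * M
    for t in range(num_slots):
        if t % T == 0:                                       # slow scale: t = kT
            N = [choose_N(i, t, state) for i in range(M)]    # activate / sleep servers
        b = [choose_b(i, t, N[i], state) for i in range(M)]  # fast scale: per-slot rates
        state.step(t, N, b)                                  # serve work, update queues
```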
C. The Cost Model

We now specify the cost model. For time slot $t$, we use $t_T = \lfloor t/T \rfloor\, T$ to denote the last time before $t$ when the number of servers was changed. Then at time slot $t$, by running $N_i(t_T)$ servers at speed $b_i(t)$, $D_i$ consumes a total power of $P_i(N_i(t_T), b_i(t))$. We assume that the function $P_i(\cdot,\cdot)$ is known to $D_i$, and that there exists some $P_{max}$ such that $P_i(N_i(t_T), b_i(t)) \leq P_{max}$ for all $t$ and $i$. Such power consumption will in turn incur some monetary cost for the data centers, of the form "power × price". To also model the fact that each data center may face a different power price at time slot $t$, we denote the power price at $D_i$ at time slot $t$ by $p_i(t)$. We assume that $p(t) = (p_1(t), \ldots, p_M(t))$ varies every $T_1 \geq 1$ time slots, where $T = cT_1$ for some integer $c$. We assume $p(t)$ is i.i.d., and every $T_1$ time slots each $p_i(t)$ takes a value in some finite state space $\mathcal{P}_i = \{p^1_i, \ldots, p^{|\mathcal{P}_i|}_i\}$. We also define $p_{max} \triangleq \max_{i,j} p^j_i$ as the maximum power price that any data center can experience. We use $\pi_i(p)$ to denote the marginal probability that $p_i = p$. An example of these different time scales is given in Figure 2.

Finally, we use $f_i(t) = P_i(N_i(t_T), b_i(t))\, p_i(t)$ to denote the power cost at $D_i$ in time slot $t$. It is easy to see that if we define $f_{max} \triangleq M P_{max} p_{max}$, then $\sum_i f_i(t) \leq f_{max}$ for all $t$.
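In code, the per-slot cost $f_i(t) = P_i(N_i(t_T), b_i(t))\, p_i(t)$ can be evaluated directly from these definitions (a sketch; `P_i` is a placeholder for the data center's known power function):

```python
def last_provisioning_time(t, T):
    return (t // T) * T                      # t_T = floor(t / T) * T

def slot_cost(P_i, N_hist, b_i_t, p_i_t, t, T):
    """f_i(t) = P_i(N_i(t_T), b_i(t)) * p_i(t): power times price.
    N_hist maps a slot index to the server count chosen at that slot."""
    t_T = last_provisioning_time(t, T)
    return P_i(N_hist[t_T], b_i_t) * p_i_t
```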
[Figure 2 omitted: timeline of slots $0, \ldots, T, 2T$ with price changes every $T_1$ slots.]
Fig. 2. An example of different time scales $T$ and $T_1$. In this example, $T = 8$, $T_1 = 4$, and $T = 2T_1$.
D. The data center Power Cost Minimization (PCM) problem

Let $Q(t) = (Q^F_i(t), Q^B_i(t),\, i = 1, \ldots, M)$, $t = 0, 1, \ldots$, be the vector denoting the workload queued at the front end servers and the back end clusters at time slot $t$. We use the following queueing dynamics:

$$Q^F_i(t+1) = \max\Big[Q^F_i(t) - \sum_j \mu_{ij}(t),\, 0\Big] + A_i(t), \quad (1)$$
$$Q^B_i(t+1) \leq \max\Big[Q^B_i(t) - N_i(t)\, b_i(t),\, 0\Big] + \sum_j \mu_{ji}(t). \quad (2)$$

The inequality in (2) is due to the fact that the front end servers may allocate a total routing rate that is more than the actual workload queued. In the following, we assume that data centers can estimate the unfinished workload in their queues accurately. The case when such estimation has errors is discussed in Section IV. Throughout the paper, we use the following definition of queue stability:

$$Q \triangleq \limsup_{t\to\infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{i=1}^{M} \mathbb{E}\big\{Q^F_i(\tau) + Q^B_i(\tau)\big\} < \infty. \quad (3)$$
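A direct transcription of the queue updates, treating (2) with equality (i.e., assuming the routed amount never exceeds the workload actually queued), might look as follows:

```python
def update_queues(QF, QB, mu, A, N, b):
    """One-slot update per (1)-(2). QF, QB: length-M backlogs; mu: MxM routing
    matrix with mu[i][j] = workload routed from D_i to D_j; A: arrivals;
    N, b: active servers and service rate per center."""
    M = len(QF)
    QF_next = [max(QF[i] - sum(mu[i]), 0.0) + A[i] for i in range(M)]          # (1)
    QB_next = [max(QB[j] - N[j] * b[j], 0.0) +
               sum(mu[i][j] for i in range(M)) for j in range(M)]              # (2)
    return QF_next, QB_next
```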
We then define a feasible policy to be one that chooses $N_i(t)$ every $T$ time slots subject to $N_i^{min} \leq N_i(t) \leq N_i$, and chooses $\mu_{ij}(t)$ and $b_i(t)$ every time slot subject only to $\mu_i(t) \in \mathcal{R}_i$ and $0 \leq b_i(t) \leq b_{max}$. We then define the time average cost of a policy $\Pi$ to be:

$$f^{\Pi}_{av} \triangleq \limsup_{t\to\infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{i=1}^{M} \mathbb{E}\big\{f^{\Pi}_i(\tau)\big\}. \quad (4)$$

Here, $f^{\Pi}_i(\tau)$ denotes the power cost incurred by policy $\Pi$ at time slot $\tau$. We call every feasible policy that ensures (3) a stable policy, and use $f^*_{av}$ to denote the infimum average power cost over all stable policies. The objective of our problem is to find a stable policy that chooses the number of activated servers $N_i(t)$ every $T$ time slots, and chooses the workload distribution rates $\mu_{ij}(t)$ and the service rates $b_i(t)$ every single time slot, so as to minimize the time average power cost. We refer to this as the data center Power Cost Minimization (PCM) problem in the remainder of the paper.
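Empirically, the time average cost (4) of a simulated policy is just the running mean of the per-slot total cost over a finite trace, e.g.:

```python
def time_average_cost(cost_trace):
    """Finite-horizon estimate of (4): (1/t) * sum_{tau < t} sum_i f_i(tau),
    where cost_trace[tau] is the list of per-center costs f_i(tau)."""
    t = len(cost_trace)
    return sum(sum(per_center) for per_center in cost_trace) / t
```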
E. Model Discussion and Practical Considerations

We now discuss the assumptions made in our model. Above, we assume that in time slot $t$ all activated servers at $D_i$ have the same service rate $b_i(t)$. This is not restrictive. Indeed, let us focus on one data center in a single time slot and consider the following formulation. Let the power consumption of a server with service rate $b$ be $P_{server}(b)$; if the $N$ activated servers in one data center run at service rates $b_1, b_2, \ldots, b_N$, then the total power consumed by these servers can be written as $P_{total} = \sum_{j=1}^{N} P_{server}(b_j)$. Let us also assume that the actual power consumption of this data center, $P_{center}$, has the form $P_{center} = C_1 \times P_{total} + C_2$, where $C_1$ and $C_2$ are constants or functions of $N$ only. Finally, we assume that $P_{server}(b)$ is a convex function of $b$, which it typically is (see [6], [9]). Then according to Jensen's inequality, we have

$$P_{center} = C_1 P_{total} + C_2 = C_1 \sum_{i=1}^{N} P_{server}(b_i) + C_2 \geq C_1 N\, P_{server}\Big(\frac{\sum_{i=1}^{N} b_i}{N}\Big) + C_2.$$

This indicates that, to minimize the power consumption without reducing the amount of workload served, all servers should have the same service rate. This justifies our assumption.
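A quick numeric check of this Jensen's inequality argument, using a hypothetical convex cubic power curve in the spirit of the experimental model (17): splitting a fixed total service rate unevenly across servers never costs less than splitting it evenly.

```python
P_server = lambda b: b**3 / 100.0 + 150.0   # convex in b (hypothetical constants)

N, total_rate = 4, 8.0
uneven = [1.0, 1.0, 2.0, 4.0]               # sums to 8.0
even = [total_rate / N] * N                 # all servers at the same rate

assert abs(sum(uneven) - total_rate) < 1e-9
print(sum(map(P_server, uneven)))           # 600.74
print(sum(map(P_server, even)))             # 600.32 <= uneven total, as Jensen predicts
```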
We also assume that jobs can be served at more than one data center. When serving certain types of jobs, such as I/O intensive ones, practical considerations such as data locality should also be taken into account. This scenario can easily be accommodated in our model by imposing restrictions on $\mathcal{R}_i$ at the front end servers. Moreover, service providers like Google replicate data across (at least) two data centers [10]. This provides flexibility for serving I/O intensive jobs.
We also assume that the data centers can observe/measure $Q(t)$, i.e., the unfinished work of all incoming workload, accurately. However, in many cases, the available information contains only the number of jobs and the number of tasks per job. With this, we can estimate the amount of workload of the incoming jobs. In addition, in Section IV we show that even if the estimation is not accurate, we can still prove bounds on the performance of our approach. Moreover, in Section VI-C we show the robustness of our approach against workload estimation errors through experiments.

Finally, we assume that a server in the sleep state consumes much less power than an idle server. According to [7], a server consumes 10 Watts in the sleep state, as compared to 150 Watts in the idle state. This indicates that our assumption is reasonable. We also assume that servers can be activated and put to sleep immediately. We note that waking up from sleep takes around 60 seconds, during which the server cannot perform any work. This should not be ignored if the control actions on activating servers are made frequently. However, when we choose $T$, the period of such actions, to be large, potentially no less than an hour, we can assume that the wake up time is amortized over the relatively long period during which the server is active. The effect of $T$ is further discussed in Section VI, where we give experimental results.
III. THE SAVE ALGORITHM

We solve PCM through our StochAstic power redUction schEme (SAVE). We first describe SAVE's control algorithms and discuss the corresponding insights and implementation related issues. SAVE's derivation and analytical performance bounds are discussed in the following section.

A. SAVE's control algorithms

SAVE's three control actions are:
• Front end Routing: At every time slot $t = kT$, $k = 0, 1, \ldots$, each $D_i$ solves for $\mu_{ij}$ to maximize

$$\sum_{j=1}^{M} \mu_{ij}\,\big[Q^F_i(t) - Q^B_j(t)\big] \quad (5)$$

subject to the constraint that $\mu_i = (\mu_{i1}, \ldots, \mu_{iM}) \in \mathcal{R}_i$. Then in every time slot $\tau \in [t, t+T-1]$, $D_i$ distributes up to $\mu_{ij}(\tau)$ amount of workload to the back end cluster at $D_j$, $1 \leq j \leq M$.
• Back end Server Management: At time slot $t = kT$, data center $D_i$ chooses the number of servers $N_i(t) \in [N_i^{min}, N_i]$ to minimize the following quantity:

$$\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1} \sum_j \big[V f_j(\tau) - Q^B_j(t)\, N_j(t)\, b_j(\tau)\big] \,\Big|\, Q(t)\Big\} \quad (6)$$

Then data center $D_i$ uses $N_i(t)$ servers over the time frame $[t, t+T-1]$. In every time slot $\tau \in [t, t+T-1]$, each data center $D_i$ chooses the service rate of its servers $b_i(\tau)$ (note that $N_i(t)$ is determined at time slot $t$) to minimize:

$$V f_j(\tau) - Q^B_j(t)\, N_j(t)\, b_j(\tau) \quad (7)$$

(an illustrative closed-form sketch of this rate selection, under the experimental power model, follows this list).
• Queue Update: Update the queues using (1) and (2).
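As an illustration of the per-slot rate selection (7): under the cubic power model later used in (17), the objective $V f_j(\tau) - Q^B_j(t) N_j(t) b$ is convex in $b$, so the clipped stationary point is optimal. The sketch below is our own derivation for that specific model (all constants hypothetical); SAVE itself only requires minimizing (7) over $b \in [0, b_{max}]$.

```python
def best_rate(V, Q_B, price, A=10.0, PUE=1.3, b_max=10.0, alpha=3):
    """Minimize V*N*PUE*price*(b**alpha / A + P_idle) - Q_B*N*b over b in [0, b_max].
    For alpha = 3, setting the derivative to zero gives
    b* = sqrt(Q_B * A / (3 * V * PUE * price)); the factor N and the P_idle term
    scale/shift the objective but do not move the minimizer."""
    assert alpha == 3, "closed form below is derived for the cubic model only"
    b_star = (Q_B * A / (3.0 * V * PUE * price)) ** 0.5
    return min(max(b_star, 0.0), b_max)
```

Note how a large backlog $Q^B_j(t)$ or a low price pushes the rate toward $b_{max}$, while a high price (or large $V$) throttles service; this is exactly the non-work-conserving behavior discussed next.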
SAVE works at two different time scales. The front end routing decisions and the selection of the number of active servers, $N_i(t)$, are made every $T$ slots, while the back end servers' service rates are updated in each slot. This two time scale mechanism is important from an implementation perspective because waking up servers from the sleep state usually takes much longer than server speed scaling. The Back end Server Management step involves minimizing (6), an expectation over future (power cost) events. We show in Section III-C that this can be carried out through learning.
B. Properties of SAVE

We highlight two interesting properties of SAVE. First, it is not work-conserving. A back end cluster $S^B_j$ may choose not to serve jobs in a particular time slot, even if $Q^B_j > 0$, due to a high power price at $D_j$. This may introduce additional delay but can reduce the power cost, as shown in Section VI-A.

Second, SAVE can provide opportunities for bandwidth cost savings because (a) it provides an explicit upper bound on the workload sent from $S^F_i$ to $S^B_j$, and (b) these routing decisions remain unchanged for $T$ time slots. If $T$ is large, this can provide opportunities for data centers to optimize network routing ahead of time to reduce bandwidth cost. As highlighted in [2], content distribution networks like Akamai can incur significant bandwidth costs. Incorporating bandwidth costs into our model is part of future work.
C. Implementing the algorithm

Note that the routing decisions made at the front end servers do not require any statistical knowledge of the random arrivals and prices. All that is needed is that $D_i$'s back end cluster broadcasts $Q^B_i(t)$ to all front end proxy servers every $T$ time units. This typically requires only a few bits and takes very little time to transmit. Then each data center $D_i$ computes $\mu_{ij}$ for each $j$. The complexity of maximizing $\sum_j \mu_{ij}[Q^F_i(t) - Q^B_j(t)]$ depends on the structure of the set $\mathcal{R}_i$. For example, if $\mathcal{R}_i$ only allows one $\mu_{ij}$ to take nonzero values, then we can easily find an optimal solution by comparing the weight-rate product of each link and picking the best one.
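For instance, in that single-nonzero-$\mu_{ij}$ case, the maximization of (5) at $S^F_i$ reduces to a comparison over back ends, as in the following sketch (our own illustration of this special case, with a simple per-slot cap $\mu_{max}$; not the general solver over $\mathcal{R}_i$):

```python
def front_end_route(i, QF, QB, mu_max):
    """Maximize sum_j mu_ij * (QF_i - QB_j) when only one mu_ij may be nonzero:
    the best destination j* minimizes QB_j; route mu_max there iff the weight
    QF_i - QB_j* is positive, otherwise route nothing."""
    j_star = min(range(len(QB)), key=lambda j: QB[j])
    mu = [0.0] * len(QB)
    if QF[i] - QB[j_star] > 0:
        mu[j_star] = mu_max
    return mu
```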
In the second step, we need to minimize the quantity (6). This in general requires statistical knowledge of the random power prices. However, we use the following procedure to carry out our computation. At every $t = kT$, $k \geq 0$, we use the empirical distribution of prices over a time window of size $L$ to compute the expectation. Specifically, if $t \geq L$, let $n^i_p(t, L)$ be the number of times the event $\{p_i(t) = p\}$ appears in the time interval $[t-L+1, t]$. Then use $n^i_p(t, L)/L$ as the probability $\pi_i(p)$ for estimating the expectations. Since all $p_i(t)$ take only finitely many values, it is well known that:

$$\lim_{L\to\infty} \frac{n^i_p(t, L)}{L} = \pi_i(p). \quad (8)$$

Therefore, as we increase the number of samples, the estimate becomes better and better. Note that in this procedure, we use the fact that (6) can be decomposed into a summation of $M$ expectations, and that each expectation only requires the marginal distribution of the prices.
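The empirical estimate $n^i_p(t,L)/L$ of $\pi_i(p)$ in (8) is straightforward to maintain from a sliding window of one data center's observed prices:

```python
from collections import Counter

def empirical_price_dist(price_history, L):
    """Estimate pi_i(p) ~ n_p(t, L) / L from the last L observed prices of one
    data center, returned as a dict {price: empirical probability}."""
    window = price_history[-L:]
    counts = Counter(window)
    return {p: n / len(window) for p, n in counts.items()}
```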
IV. SAVE: DESIGN AND PERFORMANCE ANALYSIS

SAVE's focus on reducing power cost along with queue stability suggests a design approach based on the Lyapunov optimization framework [11]. This framework allows us to include power costs in the Lyapunov drift analysis, a well-known technique for designing stable control algorithms. However, vanilla Lyapunov optimization based algorithms operate on a single time scale. We extend this approach to two different time scales, and derive analytical performance bounds analogous to the single time scale case. We now highlight the key steps in deriving SAVE, and then characterize its power cost and delay performance.
A. Algorithm Design

We first define the Lyapunov function, $L(t)$, that measures the aggregate queue backlog in the system:

$$L(t) \triangleq \sum_{i=1}^{M} \big([Q^F_i(t)]^2 + [Q^B_i(t)]^2\big). \quad (9)$$

Next, we define the $T$-slot Lyapunov drift, $\Delta_T(t)$, as the expected change in the Lyapunov function over $T$ slots:

$$\Delta_T(t) \triangleq \mathbb{E}\big\{L(t+T) - L(t) \,\big|\, Q(t)\big\}. \quad (10)$$

Following the Lyapunov optimization approach, we add the expected power cost over $T$ slots (i.e., a penalty function), $\mathbb{E}\{\sum_{\tau=t}^{t+T-1} \sum_j f_j(\tau)\}$, to (10) to obtain the drift-plus-penalty term. A key derivation step is to obtain an upper bound on this term. The following lemma gives such an upper bound for our case (see [12] for the proof).
Lemma 1. Let $V > 1$ and $t = kT$ for some nonnegative integer $k$. Then under any possible actions $N_i(t) \in [N_i^{min}, N_i]$, $\mu_i(t) \in \mathcal{R}_i$, and $b_i(t) \in [0, b_{max}]$, we have:

$$\Delta_T(t) + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f_j(\tau)\,\Big|\,Q(t)\Big\} \leq B_1 T + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f_j(\tau)\,\Big|\,Q(t)\Big\}$$
$$-\;\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_i Q^F_i(\tau)\Big[\sum_j \mu_{ij}(\tau) - A_i(\tau)\Big]\,\Big|\,Q(t)\Big\}$$
$$-\;\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j Q^B_j(\tau)\Big[N_j(t)\, b_j(\tau) - \sum_i \mu_{ij}(\tau)\Big]\,\Big|\,Q(t)\Big\}. \quad (11)$$

Here $B_1 \triangleq M A_{max}^2 + \sum_i N_i^2 b_{max}^2 + (M^2 + M)\mu_{max}^2$.
The main design principle in Lyapunov optimization is to choose control actions that minimize the R.H.S. of (11). However, for any slot $t$, this requires prior knowledge of the future queue backlogs $Q(\tau)$ over the time frame $[t, t+T-1]$. These backlogs depend on the job arrival processes $A_i(t)$ and on SAVE's decisions $\mu_{ij}(t)$, $b_i(t)$, and $N_i(t)$ (which depend on the time varying power prices). Hence, minimizing the R.H.S. of (11) requires information about the random job arrival and power price processes. This information may not always be available. In SAVE we address this by approximating future queue backlog values with the current values, i.e., $Q^F_i(\tau) = Q^F_i(t)$ and $Q^B_j(\tau) = Q^B_j(t)$ for all $t < \tau \leq t+T-1$. However, this simplification forces a "loosening" of the upper bound on the drift-plus-penalty term, as shown in the following lemma (see [12] for the proof).
Lemma 2. Let $t = kT$ for some nonnegative integer $k$. Then under any possible actions $N_i(t)$, $\mu_{ij}(t)$, $b_i(t)$ that can be taken, we have:

$$\Delta_T(t) + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f_j(\tau)\,\Big|\,Q(t)\Big\} \leq B_2 T + \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_i Q^F_i(t)\, A_i(\tau)\,\Big|\,Q(t)\Big\}$$
$$-\;\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_{i,j} \mu_{ij}(\tau)\Big[Q^F_i(t) - Q^B_j(t)\Big]\,\Big|\,Q(t)\Big\}$$
$$+\;\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j \Big[V f_j(\tau) - Q^B_j(t)\, N_j(t)\, b_j(\tau)\Big]\,\Big|\,Q(t)\Big\}. \quad (12)$$

Here $B_2 \triangleq B_1 + (T-1)\sum_j \big[N_j^2 b_{max}^2 + (M^2+1)\mu_{max}^2\big]/2 + (T-1) M A_{max}^2/2$, matching the constant derived in the Appendix.
Comparing (12) with (5), (6), and (7), we can see that SAVE chooses $N_i(t)$, $\mu_{ij}(t)$, $b_i(t)$ to minimize the R.H.S. of (12).
B. Performance bounds

Theorem 2 (below) provides analytical bounds on the power cost and delay performance of SAVE. To derive these bounds, we first need to characterize the optimal time average power cost $f^*_{av}$ that can be achieved by any algorithm that stabilizes the queues. Theorem 1 (below) shows that a stationary randomized algorithm can achieve the minimum time average power cost $f^*_{av}$ possible for a given job arrival rate vector $\lambda = (\lambda_1, \ldots, \lambda_M)$, where $\lambda_i = \mathbb{E}\{A_i(t)\}$. We define stationary randomized algorithms as the class of algorithms that choose $N_i(t)$, $\mu_{ij}(t)$, and $b_i(t)$ according to a fixed probability distribution that depends on $A_i(t)$ and $f_j(t)$ but is independent of $Q(t)$. In Theorem 1, $\Lambda$ denotes the capacity region of the system – i.e., the closure of the set of rates $\lambda$ for which there exists a joint workload assignment and computational capacity adaptation algorithm that ensures (3).
Theorem 1. (Optimality over Stationary Randomized Policies) For any rate vector $\lambda \in \Lambda$, there exists a stationary randomized control policy $\Pi_{opt}$ that chooses $N_i(t)$, $i = 1, \ldots, M$, every $T$ slots, and chooses $\mu_i(t) \in \mathcal{R}_i$ and $b_i(t) \in [0, b_{max}]$ every time slot purely as functions of $p_i(t)$ and $A_i(t)$, and achieves the following for all $k = 0, 1, 2, \ldots$:

$$\sum_{\tau=kT}^{kT+T-1} \sum_{i=1}^{M} \mathbb{E}\big\{f^{\Pi_{opt}}_i(\tau)\big\} = T f^*_{av}(\lambda),$$
$$\sum_{\tau=kT}^{kT+T-1} \mathbb{E}\Big\{\sum_i \mu^{\Pi_{opt}}_{ij}(\tau)\Big\} = \sum_{\tau=kT}^{kT+T-1} \mathbb{E}\big\{N^{\Pi_{opt}}_j(kT)\, b^{\Pi_{opt}}_j(\tau)\big\},$$
$$\sum_{\tau=kT}^{kT+T-1} \mathbb{E}\big\{A_i(\tau)\big\} = \sum_{\tau=kT}^{kT+T-1} \mathbb{E}\Big\{\sum_j \mu^{\Pi_{opt}}_{ij}(\tau)\Big\}.$$

Proof: It can be proven using Caratheodory's theorem as in [11]. Omitted for brevity.
The following theorem presents bounds on the time average power cost and queue backlogs achieved by SAVE.

Theorem 2. (Performance of SAVE) Suppose there exists an $\epsilon > 0$ such that $\lambda + 2\epsilon\mathbf{1} \in \Lambda$. Then under the SAVE algorithm, we have:

$$Q_T \triangleq \limsup_{K\to\infty} \frac{1}{K} \sum_{k=0}^{K-1} \sum_{i=1}^{M} \mathbb{E}\big\{Q^F_i(kT) + Q^B_i(kT)\big\} \leq \frac{B_2 + V f_{max}}{\epsilon}, \quad (13)$$
$$f^{SAVE}_{av} \triangleq \limsup_{t\to\infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{i=1}^{M} \mathbb{E}\big\{f_i(\tau)\big\} \leq f^*_{av} + \frac{B_2}{V}. \quad (14)$$

Here $f^*_{av}$ is the optimal cost defined in Section II and $\mathbf{1}$ denotes the vector of all 1's.

Proof: See [12].

We can extend the results in Theorem 2 to Markovian arrival processes using techniques developed in [13]. We omit the details here due to space limitations.
What happens when SAVE makes its decisions based on queue backlog estimates $\hat{Q}(t)$ that differ from the actual queue backlogs? The following theorem shows that the SAVE algorithm is robust against queue backlog estimation errors.

Theorem 3. (Robustness of SAVE) Suppose there exists an $\epsilon > 0$ such that $\lambda + 2\epsilon\mathbf{1} \in \Lambda$. Also suppose there exists a constant $c_e$ such that at all times $t$, the estimated backlog sizes $\hat{Q}^F_i(t), \hat{Q}^B_i(t)$ and the actual backlog sizes $Q^F_i(t), Q^B_i(t)$ satisfy $|\hat{Q}^F_i(t) - Q^F_i(t)| \leq c_e$ and $|\hat{Q}^B_i(t) - Q^B_i(t)| \leq c_e$. Then under the SAVE algorithm, we have:

$$Q_T \triangleq \limsup_{K\to\infty} \frac{1}{K} \sum_{k=0}^{K-1} \sum_{i=1}^{M} \mathbb{E}\big\{Q^F_i(kT) + Q^B_i(kT)\big\} \leq \frac{B_3 + V f_{max}}{\epsilon}, \quad (15)$$
$$f^{SAVE}_{av} \triangleq \limsup_{t\to\infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{i=1}^{M} \mathbb{E}\big\{f_i(\tau)\big\} \leq f^*_{av} + \frac{B_3}{V}. \quad (16)$$

Here $f^*_{av}$ is the optimal cost defined in Section II, and $B_3 = B_2 + 2Tc_e(\mu_{max} + A_{max} + N_{max} b_{max} + M\mu_{max})$.

Proof: See [12].

By comparing inequalities (16) and (14), we can see that with inaccurate information, we need to set $V$ to a larger value to obtain the same time average power cost as with accurate information. However, this will result in higher average queue backlogs (compare inequalities (15) and (13)). Hence, SAVE works even with inaccurate queue backlog information, but its robustness is achieved at the expense of a power cost vs. delay trade-off. We further demonstrate SAVE's robustness using simulations in Section VI.
V. EXPERIMENTAL SETUP

The goal of this experimental study is to evaluate SAVE under real world settings using real world data sets. Our evaluation scenario consists of 7 data centers at different geographic locations. Next, we describe the three main components of our simulations – the electricity prices and job arrivals at different data centers, the system parameters, and the alternate techniques against which we compare SAVE.

A. Data sets

Electricity prices. We downloaded the hourly electricity prices for 7 hubs at different geographic locations from [14]. These hubs supply electricity to large cities such as Boston and New York, and to sites like Palo Alto, CA and Ashburn, VA that host Google's data centers. To fully exploit the cost savings due to temporal power price variations, we would have preferred to have prices at a time granularity that exhibits high variability, e.g., 5 minute intervals [2]. However, since we had access to only the hourly prices, we use interpolation to generate prices at 5 minute intervals. For more details on this, please see [12].
Workload. We chose MapReduce jobs as representative of delay tolerant workloads, and generate workload according to the published statistics on MapReduce usage at Facebook [15]. Each job consists of a set of independent tasks that can be executed in parallel. A job is completed when all its tasks are completed. We make the following assumptions in our experiments: (i) all tasks belonging to a job have the same processing time, while tasks from different jobs can have different processing times; (ii) jobs can be served in any of the 7 data centers; and (iii) all the tasks from the same job must be served at the same back end cluster. Regarding (i), in practice, tasks from the same MapReduce job (and other parallel computations) exhibit variability in processing times. However, techniques exist for reducing both the prevalence and the magnitude of task duration variability for MapReduce jobs [16]. Hence, (i) is not a significant oversimplification. As explained in Section II-E, we believe (ii) is also reasonable. Assumption (iii) is not required by our approach. Rather, it is motivated by the fact that, in practice, partitioning tasks from the same MapReduce job across geographically distant clusters can degrade overall performance due to network delays.

We choose the execution time of a task belonging to a particular job to be uniformly distributed between 1 and 60 seconds¹, with the "job size" distribution (i.e., number of tasks per job) given in Table I; these distributions (job execution time and "job size") correspond to data reported in [15].

¹Recall that all tasks within a job have the same execution time.
TABLE I
DISTRIBUTION OF JOB SIZES

# Tasks | 1  | 2  | 10 | 50 | 100 | 200 | 400 | 800 | 4800
%       | 38 | 16 | 14 | 8  | 6   | 6   | 4   | 4   | 4
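For instance, workload traces matching Table I can be generated by sampling job sizes from the listed distribution, with uniform task durations as described above:

```python
import random

SIZES = [1, 2, 10, 50, 100, 200, 400, 800, 4800]   # tasks per job (Table I)
WEIGHTS = [38, 16, 14, 8, 6, 6, 4, 4, 4]           # percentages (Table I)

def sample_job(rng=random):
    """One job: a task count drawn from Table I and a shared task duration,
    uniform on [1, 60] seconds (all tasks in a job share the same duration)."""
    n_tasks = rng.choices(SIZES, weights=WEIGHTS, k=1)[0]
    task_time = rng.uniform(1, 60)
    return n_tasks, task_time
```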
We generated 5 groups of delay tolerant workloads. Each group consists of 7 different Poisson job arrival traces – one for each cluster. Group 1 has "homogeneous" arrival rates: the arrival process for each cluster is Poisson with a rate of 15 jobs per time slot, where the length of one time slot is 15 seconds. Groups 2 to 5 have "heterogeneous" arrival rates. The average arrival rate across all data centers is kept at 15 jobs per time slot, but as the group index grows, the variance of the arrival rates grows. For example, Group 2 has arrival rates ranging from 14 to 16 jobs per time slot, whereas Group 5 has arrival rates ranging from 12 to 18 jobs per time slot. We note that the assumption of Poisson distributed arrivals is not groundless; in fact, it is suggested by the measurements in [15].
B. Experimental Settings

Power Cost Function. SAVE can handle a wide range of power cost functions, including non-convex functions. In our experiments, we model power consumption $P(N_i, b_i)$ as:

$$P(N_i(t), b_i(t)) = N_i(t)\Big(\frac{b_i(t)^{\alpha}}{A} + P_{idle}\Big) \cdot PUE \quad (17)$$

In (17), $A$, $P_{idle}$, and $\alpha$ are constants determined by the data center. Specifically, $P_{idle}$ is the average idle power consumption of a server, and $b_i(t)^{\alpha}/A + P_{idle}$ gives the power consumption of a server running at rate $b_i(t)$. In our experiments we choose $\alpha = 3$, $P_{idle} = 150$ Watts, and $A$ such that the peak power consumption of a server is 250 Watts. The model (17) and all its parameters are based on the measurements reported in [9]. The $PUE$ term accounts for additional power usage (such as cooling) for having $N_i(t)$ servers active. PUE values for today's data centers lie between 1.3 and 2 [17]. We chose PUE = 1.3 in all of our experiments. This choice is pessimistic, in the sense that SAVE will achieve larger power cost reductions when the PUE is higher.
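The model (17) with the stated constants can be written directly; the constant $A$ is pinned down by the 250 W peak, since $b_{max}^{\alpha}/A + P_{idle} = 250$ with $P_{idle} = 150$ gives $A = b_{max}^3/100$. The value $b_{max} = 10$ below is a hypothetical normalization (the paper does not state a numeric $b_{max}$):

```python
ALPHA, P_IDLE, P_PEAK, PUE = 3, 150.0, 250.0, 1.3

def power(N_active, b, b_max=10.0):
    """Equation (17): P = N_i(t) * (b**alpha / A + P_idle) * PUE, with A chosen
    so that a server at full rate b_max draws P_PEAK = 250 W before the PUE factor."""
    A = b_max**ALPHA / (P_PEAK - P_IDLE)   # here: 1000 / 100 = 10
    return N_active * (b**ALPHA / A + P_IDLE) * PUE

print(power(1, 10.0))   # one server at peak rate: 250 * 1.3 = 325.0 W
```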
System Parameters. We set $N_i$, the number of servers in data center $i$, to 1000 for all 7 data centers. Each server can serve up to 10 tasks at the same time. With a 15 jobs per slot arrival rate, this gives an average server utilization of about 60%. We set the bandwidth between the front end servers and the back end clusters to a large value, based on real world practice [10]. Hence, a large amount of workload can be sent from one data center to another within one time slot. We vary $N_i^{min}$ across a range of values to explore its impact on SAVE's performance (see Section VI-B).
C. Simulated Schemes for Comparison

We compare SAVE with the following work-conserving schemes, which either represent current practices in data center management or are simple heuristic based approaches proposed by others.

Local Computation. All the requests arriving at $S^F_i$ are routed to $S^B_i$ (the local back end); i.e., $\mu_{ii} = A_i$ and $\mu_{ij} = 0$ if $j \neq i$.
Load Balancing. The amount of workload routed from $D_i$ to $D_j$, $\mu_{ij}$, is proportional to the service capacity of $D_j$, regardless of $D_j$'s power prices. Intuitively, this scheme should have good delay characteristics.

Low Price. This scheme is similar to the heuristic proposed in [2] that routes more jobs to data centers with lower power prices. However, no data center receives workload beyond the 95th percentile of its service capacity. Due to the discreteness of job sizes and the constraint that all tasks of one job must be served at the same back end cluster, it is difficult to ensure that in each time slot the cluster with the lowest power price always runs close to, but not over, its capacity. Thus, in this scheme, the workload is routed such that the long term arrival rate at the back end cluster with the lowest average power price is close to the 95th percentile of its capacity. We then route workload to the cluster with the second lowest price, and so on.
In all these schemes, we assume that all servers are activated at all times.² However, we assume that the service rates of the servers can be tuned in every slot. We also simulate the following scheme that powers down idle servers:

Instant On/Off. Here, the routing decisions between front end servers and back end clusters are exactly the same as in the Load Balancing scheme. However, now not all servers are active in all time slots. In every slot, each data center is able to activate or put to sleep any number of servers with no delay or power cost, and also to adjust servers' service rates. This idealized scheme represents the upper bound of power cost reductions achievable for the single data center case by any work-conserving scheme in our experimental settings. It is highly impractical because it assumes that servers can switch between the active and sleep states at the same frequency as power scaling (once every 15 seconds in our simulations).

²According to [18], a large fraction (about 80%) of data center operators do not identify idle servers on a daily basis.
VI. EXPERIMENTAL RESULTS

We now evaluate SAVE through simulation based experiments, using the experimental setup described above.

A. Performance Evaluation

The performance of SAVE depends on the parameters $V$ and $T$. We show experimental results of all schemes on all data sets under different $V$ and $T$ values (with the other parameters fixed). For power cost reduction results, we use the Local Computation scheme as a baseline. For all other schemes, we show the percentage of average power cost reduction as compared to the Local Computation scheme. Specifically, let $PC_X$ denote the average power cost of scheme $X$. We use $\frac{PC_{L.C.} - PC_X}{PC_{L.C.}} \times 100$ to quantify the power cost reduction due to scheme $X$. (Here $L.C.$ is short for Local Computation.) For delay results, we show the schemes' average delay (in number of time slots). We omit the delay results of the On/Off scheme as they are nearly identical to those of the Load Balancing scheme – the maximum difference is $\approx 0.03$ time slots (0.45 seconds). For all comparison schemes, we show average values (power cost reduction and delay) over all arrival data sets. For SAVE, we use curves to represent average values and bars to show the corresponding ranges.
We first fix $T$ to 240 time slots (one hour) and run experiments with different $V$ values. The results are shown in Figures 3(a) and (b). From Figure 3(a) we can see that as $V$ goes from 0.01 to 100, the power cost reduction grows from an average of around 0.1% to about 18%. The On/Off scheme achieves a power cost reduction of about 9.7%. If we choose $V$ to be greater than 5, then SAVE results in larger power cost reductions than the On/Off scheme. This is because (i) our approach considers differences in power prices across different data centers, and (ii) our approach is not work conserving and can adjust service rates at different data centers according to power prices. We also note that the Low Price scheme gives a small power cost reduction (of 0.5%) – i.e., sending more workload to data centers with low power prices in a greedy fashion does not lead to significant savings in power cost. In Figure 3(b), we observe that when $V$ is small ($< 0.1$) the average delay of SAVE is quite small and close to the delay of the Load Balancing scheme. Increasing $V$ results in larger delay as well as larger power cost reductions. In general, $V$ in SAVE controls the trade-off between delay and power cost; e.g., when $V$ is large, SAVE outperforms the On/Off scheme (which is an impractical scheme, as noted above) in power cost reduction.

We fix $V$ to 10 and vary $T$ from 30 time slots (7.5 minutes) to 1080 time slots (4.5 hours), which is a sufficient range for exploring the characteristics of SAVE. (Note that servers are activated and put to sleep every 10 minutes in [4] and every hour in [19].) The corresponding results for the different schemes are shown in Figures 3(c) and (d). From Figure 3(c) we can see that changing $T$ has a relatively small effect on the power cost reductions of our SAVE approach. The average power cost reduction fluctuates between 8.7% and 13.6% when $T$ varies from 30 to 1080 time slots. In most cases, it results in higher cost reductions than the On/Off scheme. However, we note that $T$ has a larger impact on average delay, as shown in Figure 3(d). In the extreme case, when $T = 1080$ time slots, the average delay is close to 64 time slots. This is not surprising – recall that in the bound on queue size given in Theorem 2, the $B_2$ term is proportional to $T$, i.e., the delay increases with $T$. However, reasonable delay values are possible with appropriate choices of $T$; e.g., if we choose $T$ to be 240 time slots (1 hour), SAVE gives an average delay of 14.8 time slots (3.7 minutes). From this set of results we can see that for delay tolerant workloads, SAVE can take infrequent server activation/sleep actions (once an hour or less often) and still achieve significant power cost reduction.
[Figure 3 omitted: four panels, (a) power cost reduction (%) vs. $V$, (b) delay (time slots) vs. $V$, (c) power cost reduction (%) vs. $T$, (d) delay (time slots) vs. $T$, comparing SAVE, Local, Load Balancing, On/Off, and Low Price.]
Fig. 3. Average power cost and delay of all schemes under different V and T values.
[Figure 4 omitted: (a) power cost reduction (%) and (b) delay (time slots) vs. $N^{min}$ for all schemes; (c) difference in power cost reduction (%) and (d) difference in delay (time slots) vs. $V$ for the robustness test.]
Fig. 4. Average power cost and delay of all schemes under different N_min values, and robustness test results.
B. The impact of N_min

In this set of experiments we keep the $V$ and $T$ values unchanged, but vary $N^{min}$ from 0 to 50% of the number of servers in a data center. The results are depicted in Figures 4(a) and (b). Figure 4(b) indicates that increasing $N^{min}$ improves delay performance; e.g., when it increases from 0 to 20% of the number of servers, the average delay decreases significantly, from about 72.5 to 25.9 time slots. At the same time, as shown in Figure 4(a), the effect of $N^{min}$ on power cost reduction is relatively small. This makes intuitive sense. When $N^{min}$ grows larger, more servers are activated regardless of job arrivals, providing more slots to serve jobs and thus reducing average delay. On the other hand, adding more active servers reduces the service rate required of each server, which compensates for the extra power consumed by the added servers.
C. Robustness Characteristics

As mentioned in Section II-E, our SAVE algorithm needs to know the amount of workload of each job. In practice, when this is not available, we use estimated values. In this set of experiments we explore the influence of estimation errors on the performance of SAVE. To do this, for each arriving job, we add a random estimation error (±50%, uniformly distributed) to the amount of workload it contains. This gives us one error data set for each arrival data set. We run SAVE on these data sets, but let SAVE make all decisions on control variables based on the amount of workload with estimation errors. Only when a job gets served does the server learn the exact amount of workload it actually contains.
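The error-injection step described above amounts to a one-line perturbation per job, e.g.:

```python
import random

def with_estimation_error(true_workload, rng=random):
    """Return the erroneous workload estimate SAVE acts on: the true value
    perturbed by a uniform multiplicative error in [-50%, +50%]. The true
    value is revealed only when the job is actually served."""
    return true_workload * (1.0 + rng.uniform(-0.5, 0.5))
```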
We run experiments on all 5 pairs of data sets for different $V$ values, and compare the results to those obtained using the original arrival data sets. In Figures 4(c) and (d), we use the results on the data sets without estimation errors as the baseline, and show the differences in power cost reduction percentage and delay (in time slots) due to the injected estimation errors. From Figure 4(c) we can see that for all $V$ values we experimented with, the difference (due to errors) in power cost reduction is between −1.0% and 0.7%. As shown in Figure 4(d), estimation errors result in changes in average delay, but only in the range of −1.2 to 0.9 time slots. To conclude, SAVE is robust to workload estimation errors.
D. Actual Power Consumption of SAVE

SAVE is designed to reduce the cost of power in geographically distributed data centers, as this is one major concern for large computational facility providers. At the same time, with more attention paid to the social and ecological impact of large computational infrastructures, it is also desirable to consider environmentally friendly approaches, i.e., while reducing the cost of power, it is also desirable to reduce the actual consumption of power. To this end, we record the actual power consumption of all simulated schemes. In Figure 5 we show the percentage of average power consumption reduction by SAVE with different $V$ values, relative to the Local Computation scheme.

[Figure 5 omitted: power usage reduction (%) vs. $V$ for SAVE, Local, Load Balancing, On/Off, and Low Price.]
Fig. 5. Differences in average power usage reduction for different V values.

Figure 5 illustrates that as $V$ ramps from 0.01 to 100, the actual power consumption reduction goes from about 0.1% to 10.3%. When $V = 10$, the reduction is around 8.7%. This indicates that SAVE is environmentally friendly, in the sense that, while it reduces power cost, it also reduces actual power consumption significantly. As a comparison, the Low Price scheme is not environmentally friendly – although it reduces power cost (see Figure 3(a)), it consumes more power than the Local Computation scheme.
VII. RELATED WORK

As mentioned in Section I, work on power cost reduction can be classified into three broad categories – single server, single data center, and multiple data centers. For a single server, researchers have proposed scheduling algorithms to minimize power consumption subject to job deadlines [20], or to minimize average response time subject to a power constraint [21]. Wierman et al. use dynamic CPU speed scaling to minimize a weighted sum of mean response time and power consumption [5]. A survey of work on single server power cost reduction is given in [22]. For a single data center, Gandhi et al. provide management policies that minimize mean response time under a total power cap [9] or minimize the product of response time and power cost [7]. Chen et al. propose solutions based on queueing models and control theory to minimize the server energy as well as data center operational costs [8]. Lin et al. design an online algorithm, which is 3-competitive, to minimize a convex function of power usage and delay [4]. SAVE differs from these works in three ways: (i) it leverages spatio-temporal differences in job arrivals and power prices at different geographic locations to achieve power cost reduction; (ii) all of the work mentioned above, except [21] and [20], assumes closed form convex expressions for service delay or convex delay-cost functions, whereas SAVE does not make these assumptions as they may not always hold, especially for delay tolerant jobs; and (iii) SAVE does not rely on predictions of workload arrival processes, as [8] and [4] do.

Power cost reduction across multiple data centers is an area of active research. Qureshi et al. proposed the idea of exploiting spatial diversity in power prices to reduce cost by dynamically routing jobs to data centers with lower prices [2]. They also provide a centralized heuristic for doing so that is similar to the Low Price scheme we evaluated in Section VI. Rao et al. provide an approximation algorithm for minimizing the total power cost subject to an average delay constraint [19], while Liu et al. design load balancing algorithms to minimize a metric that combines power and delay costs [3]. Both papers make routing and server on/off decisions based on predictions of arrival rates and closed form convex expressions for average delay. [6] makes control decisions at three levels – server, data center, and across multiple data centers – in one time slot by solving a deterministic convex optimization problem. All of these works use a work conserving scheduler, exploit only the spatial differences in job arrivals and power prices, and operate on a single time scale. In contrast, SAVE exploits both the spatial and temporal differences in job arrivals and power prices by using a non work conserving scheduler. This leads to greater power cost reductions when serving delay-tolerant workloads. Moreover, it works on two time scales to reduce the server on/off frequency.

The Lyapunov optimization technique that we use to design SAVE was first proposed in [23] for network stability problems. It was generalized in [11] for network utility optimization problems. Recently, Urgaonkar et al. used this technique to design an algorithm for joint job admission control, routing, and resource allocation in a virtualized data center [24]. However, they consider power reduction in a single data center only. To the best of our knowledge, our work is the first to apply a two time scale network control methodology to distributed workload management for geographically distributed data centers.
VIII. CONCLUSIONS
In this paper, we propose a general framework for power
cost reduction in geographically distributed data centers. Our
approach incorporates routing and server management actions
on individual servers, within a data center, and across multiple
data centers, and works at multiple time scales. We show
that our approach has provable performance bounds and is
especially effective in reducing power cost when handling
delay tolerant workloads. We also show that our approach
is robust to workload estimation errors and can result in
significant power consumption reductions.
REFERENCES

[1] www.gizmodo.com/5517041/.
[2] A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, and B. Maggs, "Cutting the electric bill for internet-scale systems," in SIGCOMM, 2009.
[3] Z. Liu, A. Wierman, S. Low, and L. Andrew, "Greening geographical load balancing," in ACM SIGMETRICS, 2011.
[4] M. Lin, A. Wierman, L. Andrew, and E. Thereska, "Dynamic right-sizing for power-proportional data centers," in IEEE INFOCOM, 2011.
[5] A. Wierman, L. Andrew, and A. Tang, "Power-aware speed scaling in processor sharing systems," in IEEE INFOCOM, 2009.
[6] R. Stanojevic and R. Shorten, "Distributed dynamic speed scaling," in IEEE INFOCOM, 2010.
[7] A. Gandhi, V. Gupta, M. Harchol-Balter, and A. Kozuch, "Optimality analysis of energy-performance trade-off for server farm management," Perform. Eval., vol. 67, pp. 1155–1171, November 2010.
[8] Y. Chen, A. Das, W. Qin, A. Sivasubramaniam, Q. Wang, and N. Gautam, "Managing server energy and operational costs in hosting centers," in ACM SIGMETRICS, 2005.
[9] A. Gandhi, M. Harchol-Balter, R. Das, and C. Lefurgy, "Optimal power allocation in server farms," in ACM SIGMETRICS, 2009.
[10] http://googleenterprise.blogspot.com/2010/03/disaster-recovery-by-google.html.
[11] L. Georgiadis, M. J. Neely, and L. Tassiulas, "Resource allocation and cross-layer control in wireless networks," Foundations and Trends in Networking, vol. 1, no. 1, pp. 1–149, 2006.
[12] Y. Yao, L. Huang, A. Sharma, L. Golubchik, and M. Neely, "Power cost reduction for delay tolerant workloads in distributed data centers: a two time scale approach," University of Southern California, Computer Science Department, Tech. Rep., July 2011.
[13] L. Huang and M. J. Neely, "Max-weight achieves the exact [O(1/V), O(V)] utility-delay tradeoff under Markov dynamics," arXiv:1008.0200v1, 2010.
[14] Federal Energy Regulatory Commission, www.ferc.gov.
[15] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, "Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling," in EuroSys, 2010.
[16] G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris, "Reining in the outliers in map-reduce clusters using Mantri," in OSDI, 2010.
[17] www.google.com/corporate/green/.
[18] "Unused servers survey results analysis," http://www.thegreengrid.org/.
[19] L. Rao, X. Liu, L. Xie, and W. Liu, "Minimizing electricity cost: Optimization of distributed internet data centers in a multi-electricity-market environment," in IEEE INFOCOM, 2010.
[20] F. Yao, A. Demers, and S. Shenker, "A scheduling model for reduced CPU energy," in IEEE FOCS, 1995.
[21] K. Pruhs, P. Uthaisombut, and G. Woeginger, "Getting the best response for your erg," ACM Trans. Algorithms, vol. 4, pp. 1–17, July 2008.
[22] S. Albers, "Energy-efficient algorithms," Commun. ACM, vol. 53, pp. 86–96, May 2010.
[23] L. Tassiulas and A. Ephremides, "Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks," IEEE Trans. on Automatic Control, vol. 37, no. 12, pp. 1936–1949, Dec. 1992.
[24] R. Urgaonkar, U. Kozat, K. Igarashi, and M. Neely, "Dynamic resource allocation and power management in virtualized data centers," in IEEE/IFIP NOMS, 2010.
APPENDIX - PROOFS

A. Proof of Lemma 1

Here we prove Lemma 1.

Proof: Let $t = kT$ for some $k \in \mathbb{Z}^+$. Consider any $\tau \in [t, \ldots, t+T-1]$. Squaring both sides of the queueing dynamic (1) and using the fact that for any $x \in \mathbb{R}$, $(\max[x, 0])^2 \leq x^2$, we have:

$$[Q^F_i(\tau+1)]^2 \leq [Q^F_i(\tau)]^2 + \Big[\sum_j \mu_{ij}(\tau)\Big]^2 + [A_i(\tau)]^2 - 2 Q^F_i(\tau)\Big[\sum_j \mu_{ij}(\tau) - A_i(\tau)\Big].$$

Summing the above over $i = 1, \ldots, M$ and using the facts that $\sum_j \mu_{ij}(\tau) \leq \mu_{max}$ and $A_i(t) \leq A_{max}$, we have:

$$\sum_i \big([Q^F_i(\tau+1)]^2 - [Q^F_i(\tau)]^2\big) \leq M(\mu_{max}^2 + A_{max}^2) - \sum_{i=1}^{M} 2 Q^F_i(\tau)\Big[\sum_j \mu_{ij}(\tau) - A_i(\tau)\Big]. \quad (18)$$

Repeating the above steps for the back end backlogs, we get:

$$\sum_j \big([Q^B_j(\tau+1)]^2 - [Q^B_j(\tau)]^2\big) \leq \sum_i N_i^2 b_{max}^2 + M^2 \mu_{max}^2 - \sum_{j=1}^{M} 2 Q^B_j(\tau)\Big[N_j(t)\, b_j(\tau) - \sum_i \mu_{ij}(\tau)\Big]. \quad (19)$$

Note that in (19) the number of servers is $N_j(t)$ because it is determined every $T$ slots. Now multiplying (18) and (19) by $1/2$, summing them together, and taking expectations over $A_i(\tau)$ and $p_i(\tau)$, $i = 1, \ldots, M$, $\tau \in [t, \ldots, t+T-1]$, conditioning on $Q(t)$, we get:

$$\Delta_1(\tau) \leq B_1 - \mathbb{E}\Big\{\sum_i Q^F_i(\tau)\Big[\sum_j \mu_{ij}(\tau) - A_i(\tau)\Big]\,\Big|\,Q(t)\Big\} - \mathbb{E}\Big\{\sum_j Q^B_j(\tau)\Big[N_j(t)\, b_j(\tau) - \sum_i \mu_{ij}(\tau)\Big]\,\Big|\,Q(t)\Big\}. \quad (20)$$

Here $B_1 = M A_{max}^2 + (M^2 + M)\mu_{max}^2 + \sum_i N_i^2 b_{max}^2$. Summing (20) over $\tau = t, t+1, \ldots, t+T-1$, and using the definition of $\Delta_T(t)$, we have:

$$\Delta_T(t) \leq B_1 T - \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_i Q^F_i(\tau)\Big[\sum_j \mu_{ij}(\tau) - A_i(\tau)\Big]\,\Big|\,Q(t)\Big\} - \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j Q^B_j(\tau)\Big[N_j(t)\, b_j(\tau) - \sum_i \mu_{ij}(\tau)\Big]\,\Big|\,Q(t)\Big\}. \quad (21)$$

Now adding the power cost over the frame, i.e., the term $V\,\mathbb{E}\big\{\sum_{\tau=t}^{t+T-1}\sum_{j=1}^{M} f_j(\tau) \,\big|\, Q(t)\big\}$, to both sides proves the lemma.
We now prove Lemma 2.

Proof: Recall that we have:

$$\Delta_T(t) + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f_j(\tau)\,\Big|\,Q(t)\Big\} \leq B_1 T + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f_j(\tau)\,\Big|\,Q(t)\Big\}$$
$$-\;\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_i Q^F_i(\tau)\Big[\sum_j \mu_{ij}(\tau) - A_i(\tau)\Big]\,\Big|\,Q(t)\Big\} - \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j Q^B_j(\tau)\Big[N_j(t)\, b_j(\tau) - \sum_i \mu_{ij}(\tau)\Big]\,\Big|\,Q(t)\Big\}. \quad (22)$$

Now by (1) and (2), we see that for any $\tau \in [t, t+T-1]$, we have:

$$Q^F_i(t) - (\tau - t)\mu_{max} \leq Q^F_i(\tau) \leq Q^F_i(t) + (\tau - t) A_{max},$$
$$Q^B_j(t) - (\tau - t) N_j b_{max} \leq Q^B_j(\tau) \leq Q^B_j(t) + (\tau - t) M \mu_{max}.$$

Therefore,

$$\sum_{\tau=t}^{t+T-1}\sum_j Q^B_j(\tau)\Big[N_j(t)\, b_j(\tau) - \sum_i \mu_{ij}(\tau)\Big]$$
$$\geq \sum_{\tau=t}^{t+T-1}\sum_j \big[Q^B_j(t) - (\tau-t) N_j b_{max}\big] N_j(t)\, b_j(\tau) - \sum_{\tau=t}^{t+T-1}\sum_j \big[Q^B_j(t) + (\tau-t) M \mu_{max}\big] \sum_i \mu_{ij}(\tau)$$
$$\geq \sum_{\tau=t}^{t+T-1}\sum_j Q^B_j(t)\Big[N_j(t)\, b_j(\tau) - \sum_i \mu_{ij}(\tau)\Big] - \sum_{\tau=t}^{t+T-1}(\tau-t)\sum_j \big[N_j^2 b_{max}^2 + M^2 \mu_{max}^2\big]$$
$$\geq \sum_{\tau=t}^{t+T-1}\sum_j Q^B_j(t)\Big[N_j(t)\, b_j(\tau) - \sum_i \mu_{ij}(\tau)\Big] - T(T-1)\sum_j \big[N_j^2 b_{max}^2 + M^2 \mu_{max}^2\big]/2. \quad (23)$$

Similarly, we have:

$$\sum_{\tau=t}^{t+T-1}\sum_i Q^F_i(\tau)\Big[\sum_j \mu_{ij}(\tau) - A_i(\tau)\Big]$$
$$\geq \sum_{\tau=t}^{t+T-1}\sum_i \big[Q^F_i(t) - (\tau-t)\mu_{max}\big]\Big[\sum_j \mu_{ij}(\tau)\Big] - \sum_{\tau=t}^{t+T-1}\sum_i \big[Q^F_i(t) + (\tau-t) A_{max}\big] A_i(\tau)$$
$$\geq \sum_{\tau=t}^{t+T-1}\sum_i Q^F_i(t)\Big[\sum_j \mu_{ij}(\tau) - A_i(\tau)\Big] - T(T-1) M \big[\mu_{max}^2 + A_{max}^2\big]/2. \quad (24)$$

Therefore, by defining $B_2 = B_1 + (T-1)\sum_j [N_j^2 b_{max}^2 + (M^2+1)\mu_{max}^2]/2 + (T-1) M A_{max}^2/2$ and using (23) and (24) in (22), we get:

$$\Delta_T(t) + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f_j(\tau)\,\Big|\,Q(t)\Big\} \leq B_2 T + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f_j(\tau)\,\Big|\,Q(t)\Big\}$$
$$-\;\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_i Q^F_i(t)\Big[\sum_j \mu_{ij}(\tau) - A_i(\tau)\Big]\,\Big|\,Q(t)\Big\} - \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j Q^B_j(t)\Big[N_j(t)\, b_j(\tau) - \sum_i \mu_{ij}(\tau)\Big]\,\Big|\,Q(t)\Big\}.$$

Rearranging the terms on the R.H.S., we get:

$$\Delta_T(t) + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f_j(\tau)\,\Big|\,Q(t)\Big\} \leq B_2 T + \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_i Q^F_i(t)\, A_i(\tau)\,\Big|\,Q(t)\Big\}$$
$$-\;\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_{i,j} \mu_{ij}(\tau)\Big[Q^F_i(t) - Q^B_j(t)\Big]\,\Big|\,Q(t)\Big\} + \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j \Big[V f_j(\tau) - Q^B_j(t)\, N_j(t)\, b_j(\tau)\Big]\,\Big|\,Q(t)\Big\}.$$

This completes the proof.
B. Proof of Theorem 2

We prove Theorem 2 here.

Proof: Let $t = kT$ for some $k \in \{0, 1, \ldots\}$. Using Lemma 2 and the fact that SAVE is constructed to minimize the R.H.S. of (12), we have:

$$\Delta_T(t) + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f^{SAVE}_j(\tau)\,\Big|\,Q(t)\Big\} \leq B_2 T + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f^{alt}_j(\tau)\,\Big|\,Q(t)\Big\}$$
$$-\;\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_i Q^F_i(t)\Big[\sum_j \mu^{alt}_{ij}(\tau) - A_i(\tau)\Big]\,\Big|\,Q(t)\Big\} - \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j Q^B_j(t)\Big[N^{alt}_j(t)\, b^{alt}_j(\tau) - \sum_i \mu^{alt}_{ij}(\tau)\Big]\,\Big|\,Q(t)\Big\}. \quad (25)$$

Here $alt$ represents any alternative policy that can be implemented over the frame from $t$ to $t+T-1$. Now since $\lambda + 2\epsilon\mathbf{1} \in \Lambda$, it can be shown using Theorem 1 that there exists a stationary and randomized policy $\Pi'_{opt}$ that achieves the following:

$$\sum_{\tau=kT}^{kT+T-1}\sum_{i=1}^{M} \mathbb{E}\big\{f^{\Pi'_{opt}}_i(\tau)\big\} = T f^*_{av}(\lambda + 2\epsilon\mathbf{1}), \quad (26)$$
$$\sum_{\tau=kT}^{kT+T-1} \mathbb{E}\Big\{\sum_i \mu^{\Pi'_{opt}}_{ij}(\tau)\Big\} = \sum_{\tau=kT}^{kT+T-1} \mathbb{E}\big\{N^{\Pi'_{opt}}_j(kT)\, b^{\Pi'_{opt}}_j(\tau)\big\} - \epsilon, \quad \forall\, j, \quad (27)$$
$$\sum_{\tau=kT}^{kT+T-1} \mathbb{E}\big\{A_i(\tau)\big\} = \sum_{\tau=kT}^{kT+T-1} \mathbb{E}\Big\{\sum_j \mu^{\Pi'_{opt}}_{ij}(\tau)\Big\} - \epsilon, \quad \forall\, i. \quad (28)$$

Here $f^*_{av}(\lambda + 2\epsilon\mathbf{1})$ is the minimum cost corresponding to the rate vector $\lambda + 2\epsilon\mathbf{1}$. Plugging (26), (27) and (28) into (25), we get:

$$\Delta_T(t) + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f^{SAVE}_j(\tau)\,\Big|\,Q(t)\Big\} \leq B_2 T + V T f^*_{av}(\lambda + 2\epsilon\mathbf{1}) - \epsilon T \sum_i Q^F_i(t) - \epsilon T \sum_j Q^B_j(t).$$

Now we can take expectations on both sides over $Q(t)$ to get:

$$\mathbb{E}\big\{L(t+T) - L(t)\big\} + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f^{SAVE}_j(\tau)\Big\} \leq B_2 T + V T f^*_{av}(\lambda + 2\epsilon\mathbf{1}) - \epsilon T\, \mathbb{E}\Big\{\sum_i Q^F_i(t)\Big\} - \epsilon T\, \mathbb{E}\Big\{\sum_j Q^B_j(t)\Big\}. \quad (29)$$

Rearranging the terms, and using the fact that $0 \leq f^*_{av}(\lambda + 2\epsilon\mathbf{1}) \leq f_{max}$, we get:

$$\mathbb{E}\big\{L(t+T) - L(t)\big\} + \epsilon T\Big(\mathbb{E}\Big\{\sum_i Q^F_i(t)\Big\} + \mathbb{E}\Big\{\sum_j Q^B_j(t)\Big\}\Big) \leq B_2 T + V T f_{max}.$$

Summing the above over $t = kT$, $k = 0, 1, \ldots, K-1$, rearranging the terms, using the fact that $L(t) \geq 0$ for all $t$, and dividing both sides by $\epsilon K T$, we have:

$$\frac{1}{K}\sum_{k=0}^{K-1}\Big(\mathbb{E}\Big\{\sum_i Q^F_i(kT)\Big\} + \mathbb{E}\Big\{\sum_j Q^B_j(kT)\Big\}\Big) \leq \frac{B_2 + V f_{max}}{\epsilon}.$$

Taking a limsup as $K \to \infty$ proves (13). To prove (14), using (29), we have:

$$V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f^{SAVE}_j(\tau)\Big\} \leq B_2 T + V T f^*_{av}(\lambda + 2\epsilon\mathbf{1}).$$

Summing the above over $t = kT$, $k = 0, \ldots, K-1$, and dividing both sides by $KTV$, we have:

$$\frac{1}{KT}\,\mathbb{E}\Big\{\sum_{\tau=0}^{KT-1}\sum_j f^{SAVE}_j(\tau)\Big\} \leq f^*_{av}(\lambda + 2\epsilon\mathbf{1}) + \frac{B_2}{V}.$$

Now (14) follows by taking a limsup as $K \to \infty$, using Lebesgue's dominated convergence theorem, and then letting $\epsilon \to 0$.
C. Proof of Theorem 3

Proof: It suffices to show that using the weights $\hat{Q}^F_i(t)$ and $\hat{Q}^B_i(t)$, we still minimize the right-hand side of (25) to within some additive constant. To see this, suppose now $\hat{Q}^F_i(t)$ and $\hat{Q}^B_i(t)$ are used to carry out the SAVE algorithm; then we see that we try to minimize:

$$\mathrm{Obj}(\hat{Q}(t)) \triangleq V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f_j(\tau)\,\Big|\,Q(t)\Big\} - \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_i \hat{Q}^F_i(t)\Big[\sum_j \mu_{ij}(\tau) - A_i(\tau)\Big]\,\Big|\,Q(t)\Big\}$$
$$-\;\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j \hat{Q}^B_j(t)\Big[N_j(t)\, b_j(\tau) - \sum_i \mu_{ij}(\tau)\Big]\,\Big|\,Q(t)\Big\}.$$

Denote $e^F_i(t) = \hat{Q}^F_i(t) - Q^F_i(t)$ and $e^B_i(t) = \hat{Q}^B_i(t) - Q^B_i(t)$. Rewrite the above as:

$$\mathrm{Obj}(\hat{Q}(t)) = V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f_j(\tau)\,\Big|\,Q(t)\Big\} - \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_i Q^F_i(t)\Big[\sum_j \mu_{ij}(\tau) - A_i(\tau)\Big]\,\Big|\,Q(t)\Big\}$$
$$-\;\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j Q^B_j(t)\Big[N_j(t)\, b_j(\tau) - \sum_i \mu_{ij}(\tau)\Big]\,\Big|\,Q(t)\Big\} - \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_i e^F_i(t)\Big[\sum_j \mu_{ij}(\tau) - A_i(\tau)\Big]\,\Big|\,Q(t)\Big\}$$
$$-\;\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j e^B_j(t)\Big[N_j(t)\, b_j(\tau) - \sum_i \mu_{ij}(\tau)\Big]\,\Big|\,Q(t)\Big\}. \quad (30)$$

Now denote the minimum value of $\mathrm{Obj}(Q(t))$ by $\mathrm{Obj}^*$, i.e., the minimum of the R.H.S. of (25) with $Q(t)$, and denote the minimum value of $\mathrm{Obj}(\hat{Q}(t))$ by $\mathrm{Obj}^{\dagger}$. Then using the fact that $|e^F_i(t)|, |e^B_i(t)| \leq c_e$ for all $t$, and the facts that $|\sum_j \mu_{ij}(\tau) - A_i(\tau)| \leq \mu_{max} + A_{max}$ and $|N_j(t)\, b_j(\tau) - \sum_i \mu_{ij}(\tau)| \leq N_{max} b_{max} + M\mu_{max}$, we see that

$$\mathrm{Obj}^{\dagger} \leq \mathrm{Obj}^* - \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_i e^F_i(t)\Big[\sum_j \mu_{ij}(\tau) - A_i(\tau)\Big]\,\Big|\,Q(t)\Big\} - \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j e^B_j(t)\Big[N_j(t)\, b_j(\tau) - \sum_i \mu_{ij}(\tau)\Big]\,\Big|\,Q(t)\Big\}.$$

Using this and (30), we see that:

$$V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j f^{\dagger}_j(\tau)\,\Big|\,Q(t)\Big\} - \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_i Q^F_i(t)\Big[\sum_j \mu^{\dagger}_{ij}(\tau) - A_i(\tau)\Big]\,\Big|\,Q(t)\Big\}$$
$$-\;\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1}\sum_j Q^B_j(t)\Big[N^{\dagger}_j(t)\, b^{\dagger}_j(\tau) - \sum_i \mu^{\dagger}_{ij}(\tau)\Big]\,\Big|\,Q(t)\Big\} \leq \mathrm{Obj}^* + 2Tc_e(\mu_{max} + A_{max} + N_{max} b_{max} + M\mu_{max}).$$

Here $f^{\dagger}_j(\tau)$, $\mu^{\dagger}_{ij}(\tau)$, $N^{\dagger}_j(t)$ and $b^{\dagger}_j(\tau)$ are the actions taken by the policy based on $\hat{Q}(t)$. This shows that (25) holds with $Q(t)$ replaced by $\hat{Q}(t)$ and $B_2$ replaced by $B_3 = B_2 + 2Tc_e(\mu_{max} + A_{max} + N_{max} b_{max} + M\mu_{max})$. The rest of the proof follows similarly as the proof of Theorem 2. This completes the proof.