Studies on the impact of long-term correlation on computer network performance
INFORMATION TO USERS
This manuscript has been reproduced from the microfilm master. UMI
films the text directly from the original or copy submitted. Thus, some
thesis and dissertation copies are in typewriter face, while others may be
from any type of computer printer.
The quality of this reproduction is dependent upon the quality of the
copy submitted. Broken or indistinct print, colored or poor quality
illustrations and photographs, print bleedthrough, substandard margins,
and improper alignment can adversely affect reproduction.
In the unlikely event that the author did not send UMI a complete
manuscript and there are missing pages, these will be noted. Also, if
unauthorized copyright material had to be removed, a note will indicate
the deletion.
Oversize materials (e.g., maps, drawings, charts) are reproduced by
sectioning the original, beginning at the upper left-hand corner and
continuing from left to right in equal sections with small overlaps. Each
original is also photographed in one exposure and is included in reduced
form at the back of the book.
Photographs included in the original manuscript have been reproduced
xerographically in this copy. Higher quality 6” x 9” black and white
photographic prints are available for any photographs or illustrations
appearing in this copy for an additional charge. Contact UMI directly to
order.
UMI
A Bell & Howell Information Company
300 North Zeeb Road, Ann Arbor MI 48106-1346 USA
313/761-4700 800/521-0600
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
STUDIES ON THE IMPACT OF LONG-TERM CORRELATION ON
COMPUTER NETWORK PERFORMANCE
by
Hany D. Alsaialy
A Thesis Presented to the
FACULTY OF THE SCHOOL OF ENGINEERING
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
Master of Science
(Computer Engineering)
December 1998
©1998 Hany D. Alsaialy
UMI Number: 1394895
UMI Microform 1394895
Copyright 1999, by UMI Company. All rights reserved.
This microform edition is protected against unauthorized
copying under Title 17, United States Code.
UMI
300 North Zeeb Road
Ann Arbor, MI 48103
This thesis, written by
Hany D. Alsaialy
under the guidance of his/her Faculty Committee and
approved by all its members, has been presented to and
accepted by the School of Engineering in partial
fulfillment of the requirements for the degree of
Master of Science (Computer Engineering)
Date: December 4, 1998
Faculty Committee
Contents
List of Figures
List of Tables
I Link-layer Modeling
1 Introduction
1.1 Long-Range Dependence in Traffic streams
2 Basic Clustering Method and Traffic Models Proposed
2.1 How to cluster data
2.2 Clustering method and Markov Modeling
2.3 Clustering Method and Semi-Markov Modeling
3 Simulation Results
3.1 Matching Mean, Variance, and CLR
3.2 LBC and VT plots
4 Conclusion
II Transport-layer Modeling
5 Introduction
5.1 Related Work
6 Issues and Traffic Model Proposed
6.1 Web Workload traffic (for request arrivals and volumes)
6.2 The Superposition of Fractal Renewal Processes Model (Sup-FRP)
6.3 The Sup-FRP Match the Arrival Process
6.4 Heavy-Tailed Distributions: A Note
7 Network Model
7.1 Simulation Environment
8 Simulation Results
9 Conclusion
10 Appendix: Derivation of the Sup-FRP
Bibliography
List of Figures
1.1 The main fundamental constraints in ATM networks [CG96].
3.1 Dominant regions for the three main components of a trace [CG95].
3.2 A single queue single server system.
3.3 Effect of short-range dependence is pronounced with small buffer size.
3.4 Effect of long-range dependence is pronounced as buffer size increases.
3.5 The semi-Markov process matches well under realistic operational
scenarios.
3.6 The leaky-bucket contour plot for the four traces.
3.7 The variance-time plot for the four traces.
3.8 The eight quantized levels of the processed trace.
3.9 The three data traces shown at different levels of aggregation (Note:
a Poisson process was used in the DTMP and SMP state).
6.1 The Sup-FRP process shown graphically. Each FRP is i.i.d. (M=3
in the figure).
6.2 EDC match between the Sup-FRP and the Web request arrival process
[Ryu98].
7.1 The two node simulation topology.
8.1 Effects from correlated arrivals get pronounced as the buffer size
increases (we used B=2KB, 4KB, and 64KB).
8.2 Effects of changing the Hurst parameter of the Arrival process and
File size pdf on Throughput.
8.3 File size distribution impacts performance at low utilization levels.
The arrival process impacts performance at high utilization levels.
8.4 Both a strongly correlated arrival process and heavy-tailed file pdf
result in high performance degradation.
8.5 Average file transmission time increases with the Sup-FRP arrival
process.
8.6 Variance in file transmission time increases with both the Sup-FRP
arrival process and Heavy-tailed file pdf.
10.1 Life-time and Residual-life interval.
List of Tables
3.1 Statistics of the real and synthesized data traces.
8.1 Network Utilization and equivalent Link Capacity.
8.2 Effects of correlated arrivals and heavy-tailed file size pdf on
performance: Summary of Results.
Abstract
STUDIES ON THE IMPACT OF LONG-TERM CORRELATION ON
COMPUTER NETWORK PERFORMANCE
Hany D. Alsaialy
In Part I of this thesis we present a framework for modeling Variable-Bit-Rate
(VBR) traffic based on semi-Markov processes. We propose an algorithm based
on a simple clustering method to build semi-Markov models for computer network
traffic modeling. We introduce our novel mechanism by giving its detailed algorithm,
and later analyze its performance by means of simulation. We reveal the efficacy
of the proposed method for Long-Range Dependence (LRD) traffic modeling under
realistic buffer sizes, and compare the performance of a real computer network VBR
traffic trace with the synthesized traces obtained using our mechanism.
In Part II, we present a mechanism for the modeling of the Internet’s World
Wide Web (WWW or Web) traffic based on the superposition of independent and
identically distributed (i.i.d.) point processes. We describe the Superposition of
Fractal Renewal Processes (Sup-FRP) traffic model used for modeling Web request
arrivals. We show that neglecting the correlation found in real Web requests will
lead to inaccurate performance evaluations. We show the performance impact of
correlated Web requests on throughput and average response time, and compare
our findings with previously reported results.
Acknowledgments
I am indebted to a large number of people who contributed directly and indirectly
in the development of this thesis.
First and foremost, I would like to thank Professor John Silvester, my academic
advisor, whose constant supervision and perceptive insights helped me finish this
work. I am also very thankful to Dr. Bo Ryu from the Hughes Research Labs
(HRL) without whom, part II of this thesis wouldn’t have been possible. I would
like to thank Professor Deborah Estrin and Dr. Antonio Ortega for serving on my
committee.
I would like to acknowledge my sponsoring agency, the Institute of Public
Administration (IPA), Riyadh, Saudi Arabia, for their support throughout my studies
in the United States. I would also like to acknowledge the HRL Labs,
Malibu, California, USA, for their partial support of part II of this thesis.
Last, but by no means least, I would like to thank my Mom and Dad for their
unconditional love and constant support, as well as my brother Sami and everyone who
contributed to and supported me throughout this work. I thank you all.
Hany D. Alsaialy
Part I
Link-layer Modeling
Chapter 1
Introduction
In recent years, significant attention has been paid to computer network
traffic modeling. This research has increased tremendously since the discovery of
Long-Range Dependence (LRD) in real computer network traffic streams [LTWW94].
In [LTWW94], several traces of real computer traffic were collected over a three-year
period from both Local Area Networks (LANs) and Wide Area Networks (WANs).
When they were analyzed, they revealed what is now commonly known as LRD. But
what does LRD mean? Simply, and at a high level of abstraction, LRD states that
"things that happen far apart cannot be considered independent". Later, we will
give several mathematical characterizations of LRD. After the first published results
revealed the omnipresence of LRD in computer network traffic streams, other
research studies were done and similar results obtained [GW94]. An example of such
a stream is the one from the Star Wars movie encoded using the MPEG standard.
When modeling computer network traffic, it is important to identify the various
types of traffic that are found in today's networks and the quality of service (QOS)
expected by the user.
It is clear that the difference in network traffic modeling lies not only in the
type of data being transferred (e.g. voice, video, data) or whether the data being
transferred is encoded or not, but also in the QOS required by the user. As noted,
there is a relationship between the QOS (which concerns the user) and the resources
that need to be allocated in order to guarantee the service required (which concerns
network management). Unfortunately, there is an inverse relation between QOS and
network utilization (e.g. resource allocation); in other words, as utilization increases,
overall delays increase, which adversely affects the QOS of the users. Both Short-
Range Dependence (SRD) and LRD phenomena may impact this utilization/QOS
trade-off, so it is important to use traffic models that appropriately represent both
factors.
Before proceeding, we emphasize two different aspects of network performance:
i) network management, and ii) user requirements. We will later reveal their
importance to the concept of traffic modeling. We identify three fundamental constraints
[CG96]; each constraint affects either overall network utilization or user performance
(i.e. QOS). These three main constraints are:
1. (Maximal) Delay constraint.
2. (Maximal) Loss constraint.
3. (Minimal) Utilization constraint.
Figure 1.1 shows the constraints graphically¹. It is important to keep these
constraints in mind when designing computer network traffic models.
1.1 Long-Range Dependence in Traffic streams
In the introduction we gave a brief description of LRD; we now present the
concept of LRD from a mathematical point of view. There are several ways to
describe the concept of LRD. We briefly present some methods here; more details can be
found in [LTWW94].
¹ Throughout this thesis "frame" refers to "frame-time" = 1/24 second.
[Figure: allowable region bounded by the maximal delay constraint and the
minimal utilization constraint; x-axis: Capacity (cells/frame)]
Figure 1.1: The main fundamental constraints in ATM networks [CG96].
Let $X = \{X_i : i \ge 0\}$ be a wide-sense stationary random process. $X$ is LRD if
the variance of its aggregate process $X^{(m)}$ decreases slowly as $m$ grows. This can be
checked by the variance-time plot. The variance of the aggregate process $X^{(m)}$ is
defined as the variance of the observations in the new process $X^{(m)}$. $X^{(m)}$ is obtained
by averaging the original process $X$ over non-overlapping blocks of size $m$; that is,
$X_i^{(m)} = \frac{1}{m}(X_{im-m+1} + \dots + X_{im})$, $i \ge 1$. Mathematically, for a process to be LRD
we require $\mathrm{var}(X^{(m)}) \sim C_1 m^{-D}$ as $m \to \infty$, $0 < D < 1$, where $D = 2 - 2H$. $H$ is
known as the Hurst parameter. For a process to be LRD we require $1/2 < H < 1$.
Since we have $D = 0$ for $H = 1$, and $D = 1$ for $H = 1/2$, the presence of LRD
can be determined by the slope of the variance-time plot on log-log axes; a slope of 0
(i.e. $H = 1$) indicates strong LRD, while a slope of $-1$ (i.e. $H = 1/2$) indicates the
absence of LRD, that is, the process is SRD.
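The variance-time test described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the thesis: the function names, the aggregation levels, and the least-squares fit of the log-log slope are all assumptions made here. It aggregates a series over blocks of size $m$, fits the slope of $\log \mathrm{var}(X^{(m)})$ against $\log m$, and converts the slope to $H$ through $D = 2 - 2H$.

```python
import math
import random

def aggregate(x, m):
    """Average x over non-overlapping blocks of size m (the process X^(m))."""
    n_blocks = len(x) // m
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]

def variance(x):
    """Population variance of a list of numbers."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def variance_time_slope(x, levels):
    """Least-squares slope of log10 var(X^(m)) versus log10 m."""
    pts = [(math.log10(m), math.log10(variance(aggregate(x, m)))) for m in levels]
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    num = sum((px - mx) * (py - my) for px, py in pts)
    den = sum((px - mx) ** 2 for px, _ in pts)
    return num / den

def hurst_from_slope(slope):
    """The slope equals -D and D = 2 - 2H, so H = 1 + slope / 2."""
    return 1.0 + slope / 2.0

# Illustrative input only: an i.i.d. Gaussian series has no LRD,
# so the fitted slope should be close to -1 (H close to 1/2).
rng = random.Random(1)
iid = [rng.gauss(0.0, 1.0) for _ in range(20000)]
slope = variance_time_slope(iid, [1, 2, 4, 8, 16, 32, 64])
h_est = hurst_from_slope(slope)
```

For a trace with LRD the same fit would be expected to give a slope noticeably shallower than $-1$.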
Other conditions that can be used to test a random process for LRD include
testing the autocovariance function $\gamma(k)$, the power spectral density $f(\omega)$, or the
mean of its rescaled adjusted range $R(n)/S(n)$. Mathematically, we require
$\gamma(k) \sim C_2 k^{-D}$ as $k \to \infty$, $0 < D < 1$; this can be checked by the empirical
autocorrelation function. We also require $f(\omega) \sim C_3 \omega^{-\alpha}$ as $\omega \to 0$, $0 < \alpha < 1$; this can be
checked by the periodogram. Finally, $E[R(n)/S(n)] \sim C_4 n^{H}$ as $n \to \infty$, $1/2 < H < 1$; this
can be checked by the rescaled adjusted range plot (also called the pox diagram
of R/S) [BSTW95]. Note that $D = 1 - \alpha$. See [BSTW95], [GW94], [LTWW94],
[LTWW95] for more details.
In our simulations we will look at the variance-time plot, and other tools we later
describe.
Chapter 2
Basic Clustering Method and
Traffic Models Proposed
In this chapter we introduce the notion of clustering; we first give a generic
clustering algorithm, and later introduce the data we wish to cluster.
2.1 How to cluster data
For the time being we introduce the general clustering algorithm; later we will
describe the actual data used in our experiments, and see how the clustering method
can be used to build the proposed traffic models.
Clustering of data is performed in the following way:
Algorithm 1 (Clustering)
1 begin read the first sample $x_1$, set $C_1 \leftarrow x_1$, $n_1 \leftarrow 1$, and initialize $Threshold_1$ ¹
2 do read the next sample $x_i$
3 do for all clusters $j$ (with centers $C_j$)
4 if there exists a distance, $d_{ij}$, from $x_i$ to $C_j$ less than the threshold of
cluster $j$, that is,
$d_{ij} = \|x_i - C_j\| < Threshold_j$
5 then select cluster $j$ having the smallest $d_{ij}$
6 end for
7 if cluster $j$ (steps 4, 5) is found then
8 update the center of cluster $j$ by:
$C_j \leftarrow \frac{n_j C_j + x_i}{n_j + 1}$
where $n_j$ = total number of samples in cluster $j$ before adding $x_i$.
9 update $n_j \leftarrow n_j + 1$
10 update $Threshold_j$
11 else form a new cluster as performed in 1
12 until end of stream (or for a reasonable time enough to characterize traffic)
13 end
¹ A description of the method we use to calculate the threshold is given in chapter 3.
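A minimal Python sketch of the clustering steps above is given here for scalar samples (such as cells/frame). It is an illustration under stated assumptions rather than the thesis implementation: the function name `cluster_stream` and the `threshold_fn` hook are hypothetical, and the center update is the running mean over the samples assigned to a cluster. The threshold rule itself is left to the caller, since the thesis defines its own rule in chapter 3.

```python
def cluster_stream(samples, threshold_fn):
    """One-pass clustering of scalar samples.

    threshold_fn(center) returns the threshold for a cluster centered at
    `center` (hypothetical hook; supply whatever rule is appropriate).
    Returns (centers, counts, labels), where labels[i] is the cluster of sample i.
    """
    centers, counts, thresholds, labels = [], [], [], []
    for x in samples:
        best, best_d = None, None
        # Steps 3-6: find the closest cluster whose threshold admits x.
        for j, c in enumerate(centers):
            d = abs(x - c)
            if d < thresholds[j] and (best_d is None or d < best_d):
                best, best_d = j, d
        if best is None:
            # Step 11: no admissible cluster, so x starts a new one.
            centers.append(float(x))
            counts.append(1)
            thresholds.append(threshold_fn(float(x)))
            best = len(centers) - 1
        else:
            # Steps 8-10: running-mean center update, count, threshold.
            n = counts[best]
            centers[best] = (n * centers[best] + x) / (n + 1)
            counts[best] = n + 1
            thresholds[best] = threshold_fn(centers[best])
        labels.append(best)
    return centers, counts, labels

# Toy usage with a fixed threshold of 5 (illustrative values only).
centers, counts, labels = cluster_stream([10, 12, 50, 11, 52], lambda c: 5.0)
```

The returned `labels` sequence is what a later step would use to count transitions between clusters.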
In our experiments we used a stream of VBR video traffic, and converted the
stream representing bits/frame to 53-byte ATM cells/frame. We left the
measurement rate of 24 frames/sec unchanged. Hence, clustering is performed per frame.
The stream of the full-length Star Wars movie used was originally coded using the
MPEG standard and measured at frame resolution. Since this known stream resembles
real VBR traffic, we chose it to verify the efficacy of the proposed model. Note
that this stream is bursty and exhibits LRD, as will be shown later.
2.2 Clustering method and Markov Modeling
In this section we describe the method we use to build a pure Markovian model
based on the clusters obtained by the algorithm just described.
The data stream represents a sequence of random events; we will therefore specify
the analogy of the clustering method to a Discrete-Time Markov Process (DTMP).
As part of the clustering process described above, we also maintain the cluster
transition history in the matrix $T$, which specifies the number of transitions $T_{ij}$ from
cluster $i$ to cluster $j$, $\forall i, j$.
The analogy with the DTMP is now clear and easy to formalize. The mapping
is done as follows:
• The number of states of the DTMP $\longleftrightarrow$ Number of clusters.
• The average input rate generated in state $i$ $\longleftrightarrow$ Final location of the cluster
center $C_i$.
• The transition probabilities $P_{ij}$ between states $i, j$ $\longleftrightarrow$ $T_{ij} / \sum_j T_{ij}$.
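The mapping from the cluster transition history to DTMP transition probabilities can be sketched as follows. This is a hedged illustration: the helper names are hypothetical, and row-normalizing the transition counts is assumed to be the intended estimate of $P_{ij}$.

```python
def transition_counts(labels, n_states):
    """Count transitions between consecutive cluster labels (the matrix T)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    return counts

def transition_probabilities(counts):
    """Row-normalize the counts; a row with no observations stays all zero."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs

# Toy label sequence (illustrative only): transitions 0->0, 0->1, 1->0, 0->1, 1->1.
T = transition_counts([0, 0, 1, 0, 1, 1], 2)
P = transition_probabilities(T)
```

Each row of `P` sums to one whenever the corresponding state was left at least once in the trace.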
Throughout this thesis, we will use the terms cluster and state (i.e. state of the
Markov or semi-Markov chain) interchangeably.
2.3 Clustering Method and Semi-Markov Modeling
We are now ready to define the steps required to build a Semi-Markov Process
(SMP) using the clustering method. We would like to mention that an SMP is not
Markovian; in other words, an SMP may or may not be SRD depending on the
holding time in each state (for more details see [Nel95]). An SMP is a generalization
of a Markov chain. In such a process we include a pdf to specify the holding
time in each state. Note that if the state holding time follows the geometric or
exponential distribution, we would then have a Markov chain. Any other pdf yields
an SMP. In this study, we use holding times distributed according to a Pareto
distribution (other holding time distributions could be used as well).
In order to model the holding time $H_j$ for any state $j$ of the semi-Markov chain,
we need a sequence of random events representing these holding times. Let $X_{i,j}$ be
the random variable counting the number of $j \to j$ transitions in a single visit to
state $j$, and $m_j$ be the number of visits to state $j$ observed during the trace; in other
words, define $\{X_{m_j,j}\} = X_{1,j}, \dots, X_{m_j,j}$.
To estimate the parameters of a Pareto pdf defining the holding time of state $j$
of our SMP, given $\{X_{m_j,j}\}$, we use the method of maximum likelihood.
The Pareto pdf is defined as follows [AFT98]:
$$f(x \mid \alpha_j, k_j) = \alpha_j k_j^{\alpha_j} x^{-(\alpha_j + 1)}, \quad \alpha_j, k_j > 0,\ x \ge k_j \qquad (2.1)$$
where $\alpha_j$, $k_j$ are the Pareto parameters and the subscript $j$ refers to cluster $j$.
Hence, the likelihood of $\alpha_j$ is:
$$\mathrm{lik}(\alpha_j) = \prod_{i=1}^{m_j} f(X_{i,j} \mid \alpha_j, k_j) \qquad (2.2)$$
and the log-likelihood of $\alpha_j$ is:
$$l(\alpha_j) = \sum_{i=1}^{m_j} \log f(X_{i,j} \mid \alpha_j, k_j) \qquad (2.3)$$
Solving $l'(\alpha_j) = 0$ for $\alpha_j$ we get:
$$\hat{\alpha}_{j,MLE} = \frac{m_j}{\sum_{i=1}^{m_j} \log X_{i,j} - m_j \log \hat{k}_j} \qquad (2.4)$$
where $\hat{k}_j$ is an estimate of $k_j$ that can be found by solving for $k_j$ numerically using
Newton's method [PVTF92].
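As a hedged illustration of the estimator above, the sketch below computes the shape estimate for one state from a list of observed holding times. The function name is hypothetical, and one simplification is assumed: where the thesis finds the scale parameter numerically with Newton's method, this sketch takes the smallest observation as the scale unless the caller supplies a value. The shape formula follows from setting the derivative of the log-likelihood to zero, which gives the number of samples divided by (sum of log samples minus number of samples times log scale).

```python
import math
import random

def pareto_mle(samples, k=None):
    """Maximum-likelihood estimate of the Pareto shape for one state.

    k is the scale (minimum) parameter. If omitted, it defaults to
    min(samples), which is an assumption of this sketch, not the thesis method.
    Returns (alpha_hat, k_used).
    """
    if k is None:
        k = min(samples)
    m = len(samples)
    denom = sum(math.log(x) for x in samples) - m * math.log(k)
    if denom <= 0:
        raise ValueError("samples must not all equal the scale parameter")
    return m / denom, k

# Synthetic check (illustrative values): draw Pareto holding times with a
# known shape and scale, then recover the shape.
rng = random.Random(7)
true_alpha, true_k = 1.5, 2.0
data = [true_k * rng.paretovariate(true_alpha) for _ in range(50000)]
alpha_hat, k_hat = pareto_mle(data)
```

With many samples the estimate should land close to the shape used to generate the data; with only a handful of visits to a state the estimate can be quite noisy.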
We adjust the transition probabilities $P_{ij}$ to eliminate the self loops by:
$$P_{ij} \leftarrow \frac{P_{ij}}{1 - P_{ii}} \quad (j \ne i), \qquad P_{ii} \leftarrow 0 \qquad (2.5)$$
The SMP described operates in the following fashion:
1. After entering state $i$, find the holding time $H_i$ that the process spends in
state $i$ before moving to the next state $j$.
2. Select the next state $j \ne i$ where the transition is to be made.
This is the simplest definition for operating an SMP. In our simulations we used
this definition; however, we would like to describe another that requires further
investigation:
1. After entering state $i$, select the next state $j \ne i$ where the transition is to be
made.
2. Find the holding time $H_{ij}$ that the process will spend in state $i$ before moving
to the next state $j$ (where $H_{ij}$ is defined as the holding time in state $i$ given
that the next state is $j$).
In this definition, the state holding time depends on the next state transition;
in other words, several pdfs define the holding time in a given state. Intuitively, we
believe this approach would yield better modeling results.
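The simpler of the two operating definitions (draw a holding time for the current state, then pick a different next state) can be sketched as follows. This is an illustration under several assumptions that are not taken from the thesis: self loops are removed by dividing each off-diagonal probability by one minus the diagonal entry, holding times are Pareto draws rounded to a whole number of frames (at least one), each state emits a constant rate equal to its cluster center, and all function and parameter names are hypothetical.

```python
import random

def remove_self_loops(P):
    """Zero the diagonal and renormalize each row by 1 - P[i][i]."""
    Q = []
    for i, row in enumerate(P):
        rest = 1.0 - row[i]
        Q.append([0.0 if j == i else (p / rest if rest > 0 else 0.0)
                  for j, p in enumerate(row)])
    return Q

def generate_smp(Q, rates, alpha, k, n_frames, rng, start=0):
    """Generate n_frames per-frame rates from a semi-Markov chain.

    Q      : transition matrix without self loops (every state needs an exit)
    rates  : rate emitted in each state (e.g. the cluster centers)
    alpha,k: per-state Pareto shape and scale for the holding time
    """
    out, state = [], start
    while len(out) < n_frames:
        # Step 1: draw the holding time for the current state (in frames).
        hold = max(1, int(round(k[state] * rng.paretovariate(alpha[state]))))
        hold = min(hold, n_frames - len(out))
        out.extend([rates[state]] * hold)
        # Step 2: select the next state j != current state.
        state = rng.choices(range(len(Q)), weights=Q[state])[0]
    return out

# Toy three-state example (illustrative values only).
Q = remove_self_loops([[0.5, 0.25, 0.25], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]])
trace = generate_smp(Q, rates=[10, 40, 120], alpha=[1.5, 1.5, 1.5],
                     k=[1.0, 1.0, 1.0], n_frames=500, rng=random.Random(3))
```

The second definition discussed above would replace the per-state `alpha[state]` and `k[state]` with parameters indexed by the pair (current state, next state), drawn after the next state is chosen.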
Chapter 3
Simulation Results
We use the algorithms described in chapter 2 to build both DTMPs and SMPs
by using the VBR video stream from the Star Wars movie, and analyze our results
by simulation. We compare several aspects of the original and model-generated
streams, including the Cell-Loss Rate (CLR) under different network scenarios. We
will also compare the behavior of the streams by the Leaky-Bucket Contour plot
(LBC) [LNR94]. For the LBC plot method, there are several regions of interest, as
described in [CG96]; see Figure 3.1. The three main regions are: i) LRD dominant
region, ii) SRD dominant region, and iii) Marginal distribution dominant region.
Notice that in the clustering method we produce a processed stream that is a
smoothed version of the original. The resulting stream will vary in number of
clusters depending on the threshold size of the clusters.
[Figure: LBC plot regions labeled LRD Dominant Region, SRD Dominant Region,
and Marginal Dominant Region; x-axis: Capacity (cells/frame), 0 to 500]
Figure 3.1: Dominant regions for the three main components of a trace [CG95].
The threshold of a given cluster can be fixed (e.g. Parzen-window method),
or made sensitive to the location of the cluster (e.g. k-nearest-neighbor method);
for more details see [DH73]. In our experiments we use the latter approach¹ by
setting the threshold as $Threshold_j \leftarrow \min[A + (b \cdot C_j),\ thresh_{max}]$, where the
coefficient $b$ multiplying $C_j$ is a fraction set by $B$, and $A$,
$B$, and $thresh_{max}$ are set before each simulation run. $C_j$ is the center of cluster
$j$. Since the threshold is set according to the cluster center, we obtained larger
thresholds for higher values of $C_j$; this helped compensate for the larger variability at
the higher levels. In our experiments we adjusted $A \in [5, 15]$, $B \in [10, 20]$, and
$thresh_{max} \approx 90$. From several simulation runs we found that approximately eight
clusters gave good results in characterizing the original stream. We later show the
similarity of the processed stream to the original one, and show that the
only difference is in the marginal distribution. In other words, both the original and
the smoothed stream are similar in terms of LRD and SRD, as the LBC plot will
verify. The use of eight quantized levels (clusters in our case) to provide an accurate
representation of the stream agrees with results reported in [SS93].
¹ Informally speaking, our approach is intermediate between the two methods, since we adjust
the size of the threshold during training. This is known as the "Relaxation method". See [DH73].
3.1 Matching Mean, Variance, and CLR
Table 3.1 summarizes some statistical results from our experiments.
Trace              Samples   Mean     Peak/Mean Ratio   Variance   States
MPEG stream        174136    36.289   12.015            1835.554   353
Processed stream   174136    35.914   11.305            1836.113   8
DTMP stream        174136    35.725   11.756            1815.807   7-12
SMP stream         174136    26.124   15.962            1225.562   7-12
Table 3.1: Statistics of the real and synthesized data traces.
The 353 states (or quantized levels) resulted from converting the original stream
to ATM cells. In addition, the range in the number of states for the DTMP and SMP
streams is due to the fact that several experiments were conducted, each with a slightly
different cluster size. We would also like to mention that we used a deterministic
process while in the DTMP/SMP state. We can, however, observe that both a
deterministic and a Poisson process would produce similar results by comparing the
original and processed stream. We observe that the performance impact (e.g. Cell-
Loss Rate, CLR) is due to: a) the number of states, b) the transition probabilities $P_{ij}$, and
c) the holding times $H_j$. The high-frequency component, on the other hand, has only a
marginal impact.
We can see from Table 3.1 that the simple statistics from the DTMP closely
matched the average statistics from the original stream. However, we need to
investigate the behavior of the streams, since we still have no indication of the correlation
structure. We further analyze the streams by comparing CLR for different buffer
sizes and capacity values, to compare the effects of Long-Range Dependence (LRD)
and Short-Range Dependence (SRD). Figures 3.3, 3.4, and 3.5 show the CLR
obtained in our simulation. We applied all four streams to a simple FCFS single queue
single server system; see Figure 3.2.
[Diagram: traffic arrival (cells/frame) enters a queue of size Q served by a link
of capacity C; outputs are traffic departure (cells/frame) and cell loss (cells/frame)]
Figure 3.2: A single queue single server system.
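A minimal sketch of how a cell-loss rate can be measured for a per-frame trace fed into a single finite queue served at a fixed capacity is given below. It is not the simulator used in the thesis: the function name is hypothetical, and a fluid approximation within each frame is assumed (the frame's arrivals are added, up to `capacity` cells are served, and whatever exceeds the buffer afterwards is dropped).

```python
def cell_loss_rate(arrivals, capacity, buffer_size):
    """Fraction of offered cells that are lost.

    arrivals    : cells arriving in each frame
    capacity    : cells the link can serve per frame
    buffer_size : maximum number of cells that can wait in the queue
    """
    queue = 0.0
    lost = 0.0
    total = 0.0
    for a in arrivals:
        total += a
        # Add this frame's arrivals and serve up to `capacity` cells.
        queue = max(queue + a - capacity, 0.0)
        # Anything that does not fit in the buffer is lost.
        if queue > buffer_size:
            lost += queue - buffer_size
            queue = float(buffer_size)
    return lost / total if total else 0.0

# Toy trace (illustrative only): 10 cells/frame offered to a link serving 5.
clr = cell_loss_rate([10, 10, 10], capacity=5, buffer_size=3)
```

Sweeping `capacity` at a fixed `buffer_size`, or `buffer_size` at a fixed `capacity`, gives the two kinds of CLR curves discussed in this section.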
In Figure 3.3 we used a small buffer size over a wide range of capacity allocations
(Maximum Queueing Delay (MQD) ≈ 4.5 to 45 ms). All streams seem to match the
behavior of the original stream well because: a) the effect of SRD is pronounced; or
b) we see little effect from LRD. Therefore a well-designed SRD Markovian model
gave good results under these operational conditions (i.e. relatively small buffer).
In Figure 3.4, we fixed the link capacity to 60 cells/frame (≈ 600 Kbps) and
showed results over a wide range of buffer sizes (MQD ≈ 7 to 350 ms). As the buffer
size increases, the effect of LRD becomes more pronounced, making the pure DTMP
fail to model the original stream. On the other hand, the SMP model gives better
results, as expected. We reemphasize the importance of using realistic buffer sizes
in simulation, given the QOS expected by the users: as we know, buffer sizes
are finite and often small in a real-life computer network. More results showing
the dominant effect of short-term correlation on CLR can be found in [RE96]. We
see, therefore, the efficacy of the proposed semi-Markovian model for VBR traffic
modeling.
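The unit conversions behind the quoted link rates and queueing delays can be checked with a short calculation. The sketch below assumes 53-byte cells and 24 frames per second, as stated in chapter 2, and takes the maximum queueing delay to be the time needed to drain a full buffer at the link rate; the function names are hypothetical.

```python
FRAMES_PER_SEC = 24   # measurement rate of the trace
CELL_BITS = 53 * 8    # one 53-byte ATM cell

def capacity_bps(cells_per_frame):
    """Link capacity in bits per second."""
    return cells_per_frame * CELL_BITS * FRAMES_PER_SEC

def mqd_ms(buffer_cells, cells_per_frame):
    """Maximum queueing delay in milliseconds for a full buffer."""
    return 1000.0 * buffer_cells / (cells_per_frame * FRAMES_PER_SEC)

link_60 = capacity_bps(60)       # 60 cells/frame in bits per second
mqd_small = mqd_ms(10, 60)       # 10-cell buffer at 60 cells/frame
mqd_large = mqd_ms(500, 60)      # 500-cell buffer at 60 cells/frame
```

With these assumptions, 60 cells/frame corresponds to 610,560 bits per second, and buffers of 10 and 500 cells at that capacity give delays of roughly 6.9 ms and 347 ms, in line with the approximate figures quoted above.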
[Figure: Cell-Loss Rate for Buffer = 40 cells; x-axis: Capacity (cells/frame),
50 to 400; traces: Star Wars, Processed, SMP, DTMP]
Figure 3.3: Effect of short-range dependence is pronounced with small buffer size.
[Figure: Cell-Loss Rate with Capacity = 60 cells/frame; x-axis: Buffer Size (cells),
0 to 500; traces: Star Wars, Processed, SMP, DTMP]
Figure 3.4: Effect of long-range dependence is pronounced as buffer size increases.
[Figure: Cell-Loss Rate with Capacity = 250 cells/frame; x-axis: Buffer Size (cells),
20 to 120; traces: Star Wars, Processed, SMP, DTMP]
Figure 3.5: The semi-Markov process matches well under realistic operational
scenarios.
To validate our results we conducted additional experiments for realistic
values of buffer sizes and link capacities. Figure 3.5 shows results obtained using a
link capacity of 250 cells/frame (≈ 2.5 Mbps) and buffer sizes ranging from 10 to 120 cells
(MQD ≈ 2 to 20 ms). We can clearly see the good match from the SMP model.
3.2 LBC and VT plots
To fully investigate the behavior of the proposed model we present results ob
tained by the Leaky-Bucket Contour plot (LBC) [LNR94], as well as a variance-time
plot for all our streams.
Figures 3.6 and 3.7 show additional results obtained by analyzing our four streams
of real and synthesized data. In the LBC plot of Figure 3.6, we assumed the buffer to
be infinite (i.e. CLR=0) and the simulation records the maximum buffer occupancy
for a range of link capacities; as in [CG96] we plot capacity versus the buffer-empty-time
(T), which was found by T = B/C. In other words, T is simply the MQD
measured in frame times (i.e. MQD = T/24 in our case). Figure 3.6 shows that all
streams match in part of the SRD dominant region; however, the DTMP,
being a pure Markovian process, failed, as expected, to match the original stream in
the LRD dominant region. The SMP with non-memoryless state holding times gave
better results and appears to match some of the correlation structure of the original
stream. It is important to remember, however, that the maximal delay and minimal
utilization constraints found in real-life networks restrict the allowable region for
capacity allocation and buffer sizes. As shown in Figure 3.6, the lower and upper
bounds separating the three regions are equivalent to MQD ≈ 4.167 ms and 416.67 ms,
respectively. We believe, therefore, that the proposed semi-Markov model mimics
well the behavior of a real VBR traffic stream under realistic operational scenarios.
[Figure 3.6 plot: LBC plot against Capacity (cells/frame) for the Star Wars, Processed, SMP, and DTMP traces, with the LRD Dominant, SRD Dominant, and Marginal Dominant regions marked.]
Figure 3.6: The leaky-bucket contour plot for the four traces.
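The LBC procedure described above can be sketched in a few lines. This is only an illustrative sketch, not the simulator used in the thesis: the function names and the toy trace are assumptions, and the 24 frames/sec rate is taken from the MQD = T/24 relation in the text.

```python
def max_buffer_occupancy(cells_per_frame, capacity):
    """Peak backlog (cells) of an infinite buffer drained at `capacity` cells/frame."""
    backlog = 0.0
    peak = 0.0
    for arrivals in cells_per_frame:
        # Cells arriving in this frame join the queue; up to `capacity` are served.
        backlog = max(0.0, backlog + arrivals - capacity)
        peak = max(peak, backlog)
    return peak


def lbc_points(cells_per_frame, capacities, frames_per_second=24.0):
    """Return (C, B, T, MQD in seconds) for each candidate capacity C."""
    points = []
    for capacity in capacities:
        b = max_buffer_occupancy(cells_per_frame, capacity)
        t = b / capacity                      # buffer-empty time, in frame times
        points.append((capacity, b, t, t / frames_per_second))
    return points


toy_trace = [100, 300, 300, 50]               # made-up cells per frame, not real data
for point in lbc_points(toy_trace, [150, 200, 250]):
    print(point)
```

Plotting C against T for a real trace would give one contour of the kind shown in Figure 3.6.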
Figure 3.7 shows that the SMP is in fact more bursty in nature than its counterpart
Markovian model; we can see that the stream generated by the SMP captured
more of the correlation of the original stream. The DTMP, being inherently SRD,
quickly loses its correlation, as revealed by a slope of -1 (i.e. H = .5).
[Figure 3.7 plot: Variance-Time plot against Aggregation Level for the Star Wars, Processed, SMP, and DTMP traces.]
Figure 3.7: The variance-time plot for the four traces.
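As a rough illustration of how a variance-time slope maps to the Hurst parameter, the sketch below aggregates a series over blocks of size m, fits the slope of log-variance against log-m, and uses H = 1 + slope/2 (so a slope of -1 gives H = .5). The function names, the aggregation levels, and the synthetic Gaussian input are assumptions made for this example; they are not the thesis data.

```python
import math
import random


def aggregated_variance(series, m):
    """Variance of the block means after averaging `series` over blocks of size m."""
    n_blocks = len(series) // m
    means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
    mu = sum(means) / n_blocks
    return sum((x - mu) ** 2 for x in means) / n_blocks


def hurst_from_variance_time(series, levels):
    """Least-squares slope of log10(variance) vs log10(m), and H = 1 + slope / 2."""
    xs = [math.log10(m) for m in levels]
    ys = [math.log10(aggregated_variance(series, m)) for m in levels]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, 1.0 + slope / 2.0


rng = random.Random(1)                        # fixed seed so the run is repeatable
uncorrelated = [rng.gauss(0.0, 1.0) for _ in range(20000)]
slope, hurst = hurst_from_variance_time(uncorrelated, [1, 2, 4, 8, 16, 32, 64])
print(slope, hurst)                           # an SRD series should sit near -1 and 0.5
```

A long-range dependent series would show a shallower slope and therefore an estimate of H above .5.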
Finally, Figures 3.8 and 3.9 depict the actual traces. In Figure 3.8 we show a
small segment of the original and processed trace showing the eight quantized levels.
In Figure 3.9, we show the original, DTMP and SMP trace plotted at different time
scales. It is clear that both the SMP and Star Wars trace appear to be more bursty
and much more similar compared to the trace generated by the DTMP.
[Figure 3.8 plots: 1% segment of the "Star Wars" trace and 1% segment of the "Processed" trace; time unit = .04167 sec.]
Figure 3.8: The eight quantized levels of the processed trace.
[Figure 3.9 plots: Star Wars, DTMP, and SMP traces, each shown at time units of .04167 sec, .4167 sec, and 4.167 sec.]
Figure 3.9: The three data traces shown at different levels of aggregation (Note:
Poisson process was used in the DTMP and SMP state).
Chapter 4
Conclusion
From results obtained by simulation, we conclude that the proposed clustering
method enabled us to create accurate models for Variable-Bit-Rate (VBR) traffic
in computer networks. We used a similar algorithm to build semi-Markovian
models and found it to be simple and efficient. Results showed that under
realistic operational network scenarios, the impact of Long-Range Dependence
(LRD) on the Cell-Loss-Rate (CLR) is not of practical importance. We
showed that the use of well designed semi-Markovian models can give satisfactory
results for computer network traffic modeling and performance evaluation.
We believe that there are still several issues that require further investigation.
Among those is how to formalize an efficient way to define the cluster sizes for best
matching a given stream. We would also like to propose the use of different pdfs for
state holding times, to see their goodness of fit by means of statistical tools such as
the Quantile-Quantile plot [LK91]. We also propose using the data obtained by the
clustering method to directly specify an empirical distribution for the modeling of
state holding times. As described in Section 2.3, we propose the use of conditional
probability for modeling state holding times.
We would also like to mention the possibility of utilizing the proposed clustering
method with the Hidden Markov Models (HMM) described in [RJ93]. We propose
the use of the clustering method to identify the number of states in the Markov
chain and corresponding average rate for each state, and then use the theory of
HMM to find the transition probabilities.
Part II
Transport-layer Modeling
Chapter 5
Introduction
Simulation of computer networks is considered an efficient tool used in analysis
and performance evaluation. In today’s computer networks (e.g. the Internet),
a range of users generate different types of traffic, and expect different types of
response from the underlying network (i.e. each with a different set of rules to
specify their QOS). In traffic modeling for simulation it is, hence, important to
choose the best models that mimic the behavior of real users to ensure a correct
analysis from a simulation point of view. This enables us to correctly predict the
performance of a network prior to its deployment or redesign.
The concept of Long-Range-Dependence (LRD) initially reported in [LTWW94]
opened new challenges in the engineering of computer network traffic. For many
years researchers relied on Markovian models for network performance prediction,
however, these models are inherently Short-Range Dependent (SRD). Other recent
studies also revealed the self-similarity or "fractal" nature of streams collected over
the Internet [Ryu98]. But what are the implications of LRD found in these streams
from an engineering point of view, and why is it important to find new models that
capture these features? We summarize the answer as follows:
1. The cell/packet loss probability in network queues decays faster when Markovian
types of traffic models are used compared to comparable "fractal" models
(i.e. with similar first and second order statistics).
2. The distribution of asymptotic queue size (i.e. maximum queue occupancy)
decays faster in Markovian models.
These are some of the reasons encouraging researchers to gain a better understanding
and find appropriate mathematical models that capture the behavior of
real computer network streams.
We just mentioned the effects of using inaccurate performance models (e.g.
Markovian). The effects in a real network from a user's perspective are increased
response times caused by either the very long queues or, for reliable protocols (e.g.
TCP), the retransmission of dropped packets. For protocols such as ATM, a degradation
in the QOS (e.g. increased cell-loss rate) will result.
Having traffic models that correctly mimic real computer network traffic streams
allows us to generate a variety of synthesized streams that can be used in simulation,
so that we may obtain correct analysis and performance evaluation of computer
networks.
We classify the modeling and simulation of computer networks into two main
categories: i) Link-level, and ii) Application or Transport-level. In the former, models
are constructed and model parameters are matched from a previously collected
stream or group of streams. These models can be later used to generate synthesized
streams in simulation. The latter approach is similar; however, the modeling is performed
above the link layer. In other words, while in the former method we usually
analyze long traces of data at the packet or cell level, in the latter approach we
look at either user behavior or, more generally, higher-layer protocols (e.g. file size
distributions, request lengths, etc.). In this part of the thesis, we focus on the latter
aspect; specifically, we analyze the behavior of the popular World Wide Web (also
known as the Web or WWW).
5.1 Related Work
Several studies have attempted to understand how self-similarity arises in computer
networks. In [PKC96] the ON/OFF model described in [WTSW95] was
used to simulate multiple client-server sessions emulating the behavior of Web traffic.
[CB96] suggested that the reasons for traffic self-similarity can be attributed
to the heavy-tailed nature of file size distributions available on the Web. [PF94]
showed similar results modeling FTP bursts. Other results focused on modeling
via Cumulative-Distribution Functions (CDFs) estimated from empirical results,
see [DJ91], [Edd96], [Mah97]. [Mah97] estimated several empirical CDFs to model
HTTP traffic. The CDFs capture parameters of Web client/server behavior such
as HTTP request/reply lengths, document sizes and user think time. All of these
studies require the use of a TCP algorithm to get packet-level results since the
modeling is performed above the transport layer.
Before proceeding, we identify alternatives to model-driven simulations, and describe
their major drawbacks:
1. Trace-driven simulation:
One way to simulate real traffic is by using a trace or a group of traces collected
from real computer networks. The three main drawbacks to this approach are:
(a) A trace represents only a particular instance of "history".
(b) The simulation is limited by the "length" of the trace.
(c) We may need a large number of traces to accurately verify the simulation
results.
2. Empirical-driven simulation:
In this method, an empirical cumulative-distribution-function (CDF) is defined
from a collection of real traffic traces, and then used to generate random
events in the simulation. This method too has its drawbacks, which we identify
briefly in the following two points:
(a) In many cases the empirical CDF captures only first-order statistics (distribution).
However, higher order statistics may be important.
(b) Values of the random variables are "bounded" by the minimum and maximum
values used to generate the empirical CDF. Large values, however,
are not unlikely and have higher probability than previously thought (e.g.
heavy-tailed distribution).
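Drawback (b) can be illustrated with a small sketch of sampling from an empirical CDF by inversion. The helper name and the observation list are made up for this example; the point is only that no draw can exceed the largest observed value.

```python
import math
import random


def make_empirical_sampler(observations):
    """Return a function mapping u in [0, 1] to a quantile of the empirical CDF."""
    data = sorted(observations)
    n = len(data)

    def sample(u):
        # Smallest observation whose empirical CDF value is at least u.
        idx = min(n - 1, max(0, math.ceil(u * n) - 1))
        return data[idx]

    return sample


observed_sizes = [3, 1, 2, 10]                # hypothetical measured values
sample = make_empirical_sampler(observed_sizes)
rng = random.Random(0)
draws = [sample(rng.random()) for _ in range(1000)]
print(min(draws), max(draws))                 # always inside the observed range
```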
Even though empirical-driven simulations can be considered a somewhat more
realistic approach than trace-driven simulations, the drawbacks mentioned suffice to
motivate researchers to look for alternative methods in their simulations to ensure
realistic results.
The use of validated traffic models for simulation not only simplifies the simulation
itself but can improve the simulation by giving more realistic results and permitting
efficient and accurate network performance evaluation under a wide variety
of scenarios. We briefly point out some advantages associated with this approach:
1. Virtually no limit in the length of the simulation.
2. A wider variety of scenarios can be simulated since traffic models can be
adjusted to given input parameters with specific traffic characteristics.
Chapter 6
Issues and Traffic Model Proposed
In previous sections, we briefly mentioned that today's computer networks include
a wide variety of heterogeneous traffic types. These traffic types not only differ in
nature, but users differ in the quality of service (QOS) they expect. It is of little
practical use to propose a generic traffic model; we, therefore, focus on traffic generated
by the popular World Wide Web (WWW or Web), with emphasis on the
request arrival process.
6.1 Web Workload traffic (for request arrivals and
volumes)
Traffic generated by the Web is considered the leading source of backbone network
traffic found in today's Internet [Mah97]. WWW uses the Hypertext Transfer
Protocol (HTTP) as its application layer protocol. HTTP uses the Transmission
Control Protocol (TCP), a guaranteed delivery protocol, as its transport layer
protocol.
While users of this type of traffic tolerate delays (compared to other Internet
users such as users of real-time applications), there is little, if any, tolerance to loss
of data. This implies that the network can be operated at wider ranges of utilization
so long as data is not lost (this is, of course, guaranteed by TCP). We emphasize
the possibility of operating the network at relatively high utilization levels (recall
that there is a trade-off between utilization and delay).
From analyzing a NASA and a Berkeley Web trace, [Ryu98] found that the
request arrivals are well modeled by a superposition of fractal renewal processes
(Sup-FRP). In contrast to previous related work (e.g. [PKC96]), we show that ignoring
second-order statistics in the Web arrival process will lead to inaccurate
performance results in terms of response time and packet loss.
6.2 The Superposition of Fractal Renewal
Processes Model (Sup-FRP)
The Sup-FRP is a process generated from M independent and identically distributed
(i.i.d.) fractal renewal processes (FRPs). Each FRP is defined by the
following pdf:
\[
p(t) = \begin{cases}
\gamma A^{-1} e^{-\gamma t / A} & \text{for } t \le A \\
\gamma e^{-\gamma} A^{\gamma} t^{-(\gamma + 1)} & \text{for } t > A
\end{cases}
\tag{6.1}
\]
with 1 < γ < 2. The parameter A serves as a threshold between exponential
behavior and power-law behavior of interarrival times. Figure 6.1 shows a graphical
realization of the Sup-FRP.
In the Appendix we describe the algorithm used to generate the Sup-FRP arrival
process in detail.
[Figure 6.1 diagram: M independent processes FRP(1), FRP(2), ..., FRP(M) on a common time axis beginning at the Simulation Start Time, and their superposition, the Sup-FRP.]
Figure 6.1: The Sup-FRP process shown graphically. Each FRP is i.i.d. (M=3 in
the figure).
To fully describe the Sup-FRP model we need the following three values to be
known:
1. γ: the shape of the pdf.
2. A: the cut-off value of the pdf.
3. M: the number of i.i.d. FRPs.
What is usually known, or can be estimated, from a given stream of traffic are
the following three parameters:
1. H: the Hurst parameter defining the degree of self-similarity.
2. λ: the average arrival rate.
3. T_0: the onset time (i.e. the level of aggregation where the fractal behavior
starts, see [RL96], [RL98]).
There is a relationship between traffic and model parameters; the following set
of equations defines this relationship. Details can be found in [RL96], [RL98]:
\[ \gamma = 2 - \alpha \tag{6.2} \]
\[ H = \frac{\alpha + 1}{2} \tag{6.3} \]
\[ \lambda = M \gamma \left[ 1 + (\gamma - 1)^{-1} e^{-\gamma} \right]^{-1} A^{-1} \tag{6.4} \]
\[ T_0^{\alpha} = 2^{-1} \gamma^{-2} (\gamma - 1)^{-1} (2 - \gamma)(3 - \gamma) e^{-\gamma} \left[ 1 + (\gamma - 1) e^{\gamma} \right]^{2} A^{\alpha} \tag{6.5} \]
It is fairly simple to solve this system of equations to derive the Sup-FRP parameters.
From Eq. (6.3) we can find α. Solving Eq. (6.2) we get γ. Next Eq. (6.5)
yields A, and finally we obtain the value of M from Eq. (6.4).
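The solution order described above can be sketched as follows. This is a hedged illustration, not the author's code: it assumes the relations read H = (α + 1)/2, γ = 2 − α, λ = Mγ[1 + (γ − 1)⁻¹e^(−γ)]⁻¹A⁻¹ and T_0^α = K(γ)·A^α, with K(γ) = ½γ⁻²(γ − 1)⁻¹(2 − γ)(3 − γ)e^(−γ)[1 + (γ − 1)e^γ]². The function name and the sample inputs are made up for the example.

```python
import math


def sup_frp_parameters(hurst, arrival_rate, onset_time):
    """Map (H, lambda, T_0) to (alpha, gamma, A, M); requires 0.5 < H < 1."""
    alpha = 2.0 * hurst - 1.0                 # from H = (alpha + 1) / 2
    gamma = 2.0 - alpha                       # from gamma = 2 - alpha
    k = (0.5 * gamma ** -2 / (gamma - 1.0) * (2.0 - gamma) * (3.0 - gamma)
         * math.exp(-gamma) * (1.0 + (gamma - 1.0) * math.exp(gamma)) ** 2)
    cutoff = onset_time * k ** (-1.0 / alpha)  # solve T_0^alpha = k * A^alpha for A
    # Solve the rate relation for M; this is a real number and is not rounded here.
    m = arrival_rate * cutoff * (1.0 + math.exp(-gamma) / (gamma - 1.0)) / gamma
    return alpha, gamma, cutoff, m


alpha, gamma, cutoff, m = sup_frp_parameters(hurst=0.8, arrival_rate=10.0, onset_time=0.2)
print(alpha, gamma, cutoff, m)
```

In practice M has to be an integer, so rounding it changes the achieved arrival rate slightly; one could then readjust A or accept the small mismatch.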
6.3 The Sup-FRP Matches the Arrival Process
From the analysis of two Web traces from NASA and Berkeley, Ryu [Ryu98]
matches the IDC curve (Index of Dispersion for Counts, also known as the Fano
factor F(T)) of the Web request arrival process and the Sup-FRP process. Figure
6.2 verifies this match. The IDC is defined as the variance of the number of arrivals
in a given time window of width T divided by the mean number of arrivals in T.
Note that for a Poisson point process (i.e. exponentially distributed interarrivals),
the IDC curve value is 1 over the entire range of time scales. Recall that both the
mean and variance of the Poisson process are identical, and the resulting aggregate
process is still Poisson [Kle75].
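The IDC definition above can be checked numerically with a short sketch. The function name, the seed, and the synthetic Poisson arrivals are assumptions for the example; the estimate simply counts arrivals in consecutive windows of width T and divides the variance of the counts by their mean.

```python
import random


def idc(arrival_times, window):
    """Variance-to-mean ratio of arrival counts in consecutive windows of width `window`."""
    horizon = max(arrival_times)
    n_windows = int(horizon // window)        # only complete windows are used
    counts = [0] * n_windows
    for t in arrival_times:
        k = int(t // window)
        if k < n_windows:
            counts[k] += 1
    mean = sum(counts) / n_windows
    variance = sum((c - mean) ** 2 for c in counts) / n_windows
    return variance / mean


rng = random.Random(7)                        # fixed seed so the run is repeatable
clock = 0.0
poisson_arrivals = []
for _ in range(20000):
    clock += rng.expovariate(10.0)            # exponential interarrivals, rate 10/sec
    poisson_arrivals.append(clock)

for width in (0.5, 1.0, 5.0):
    print(width, idc(poisson_arrivals, width))  # should stay close to 1 for Poisson input
```

For a fractal arrival process the same estimate would grow with the window width instead of staying flat.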
We, therefore, use the Sup-FRP process to model the Web request arrival process.
[Figure 6.2 plot: Index of Dispersion for Counts (IDC) functions of the Berkeley trace, the analytical IDC, and Sup-FRP sample paths.]
Figure 6.2: IDC match between the Sup-FRP and the Web request arrival process
[Ryu98].
6.4 Heavy-Tailed Distributions: A Note
A random variable X follows a heavy-tailed distribution if its complementary
distribution (also known as the survivor distribution) has the form [AFT98]:
\[ P[X > x] \sim x^{-\alpha}, \quad \text{as } x \to \infty, \; 0 < \alpha < 2 \tag{6.6} \]
An example is the Pareto distribution with probability density function:
\[ f(x \mid \alpha, k) = \alpha k^{\alpha} x^{-(\alpha + 1)}, \quad \alpha, k > 0, \; x \ge k \tag{6.7} \]
There are several properties associated with heavy-tailed distributions such as
infinite mean and variance. The mean of the Pareto pdf is:
\[ E(X) = \int_{k}^{\infty} x f(x \mid \alpha, k) \, dx = \frac{\alpha k}{\alpha - 1} \tag{6.8} \]
The variance is:
\[ \mathrm{Var}(X) = E(X^2) - E(X)^2 = \frac{\alpha k^2}{(\alpha - 2)(\alpha - 1)^2} \tag{6.9} \]
Hence, for α ≤ 2, the distribution has infinite variance, and for α ≤ 1, the
distribution also has infinite mean.
The value H (the Hurst parameter) and the parameter α of the Pareto pdf are
related by the equation [PKC97], [AFT98]: H = (3 − α)/2. Therefore, for LRD (i.e.
.5 < H < 1) we require 1 < α < 2. For α > 2 the process is SRD (i.e. H < .5).
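The Pareto relations above are easy to tabulate. The sketch below assumes the standard Pareto forms E(X) = αk/(α − 1) and Var(X) = αk²/((α − 2)(α − 1)²), together with H = (3 − α)/2 from the text; the function names and the inverse-transform sampler are illustrative additions.

```python
import math
import random


def pareto_mean(alpha, k):
    """Mean of a Pareto(alpha, k) variable; infinite when alpha <= 1."""
    return math.inf if alpha <= 1.0 else alpha * k / (alpha - 1.0)


def pareto_variance(alpha, k):
    """Variance of a Pareto(alpha, k) variable; infinite when alpha <= 2."""
    if alpha <= 2.0:
        return math.inf
    return alpha * k ** 2 / ((alpha - 2.0) * (alpha - 1.0) ** 2)


def hurst_from_pareto_shape(alpha):
    """H = (3 - alpha) / 2, as quoted in the text."""
    return (3.0 - alpha) / 2.0


def pareto_sample(u, alpha, k):
    """Inverse-transform draw: solves P[X > x] = (k / x)^alpha = u for x, 0 < u <= 1."""
    return k * u ** (-1.0 / alpha)


rng = random.Random(3)
sizes = [pareto_sample(1.0 - rng.random(), 1.4, 1.0) for _ in range(5)]
print(hurst_from_pareto_shape(1.4), pareto_mean(1.4, 1.0), sizes)
```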
Chapter 7
Network Model
The network model we use in our simulation is a fairly simple one. Figure 7.1
shows our simple two node topology.
The reason for this simple model is twofold. From a simulation point of view,
it allows us to investigate the behavior of the transport layer (TCP) and eliminate
the effects of the routing protocol. From a modeling point of view, it enables us
to gain a better understanding of the effects of using the proposed arrival process.
As shown in Figure 7.1, the request arrival process at node G2 initiates FTP
sessions. Note that an HTTP request may initiate several FTP sessions. The Sup-FRP
model proposed will be used to model the initiation (i.e. request arrival) of
an FTP session, not the arrival of an HTTP request. In other words, we model the FTP
requests seen from the Web server at G2.
In our analysis we study the performance of the downstream traffic (G2 → G1)
for a wide range of buffer sizes. We also study network performance at different
utilization levels (e.g. at different link speeds). By fixing the arrival process and
changing the available capacity we investigate the effects on the response time seen by
the user.
[Figure 7.1 diagram: Web request arrival process (Sup-FRP), Client, Web Server, and multiple FTP sessions between them.]
Figure 7.1: The two node simulation topology.
7.1 Simulation Environment
To study the performance impact of the proposed traffic model on TCP¹, we
used the LBNL Network Simulator (ns) [MF98]. ns is an event-driven simulator
derived from S. Keshav's REAL network simulator. We modified ns by adding an
implementation of the Sup-FRP model.
¹ Our focus here is to verify the general impact of the arrival process on network performance.
We will, therefore, use only one of the available TCP flavours, namely, "Tahoe" TCP.
To measure performance, we recorded throughput for each FTP session:
\[ \text{Throughput} = \frac{\text{File Size (bits)}}{\text{File Transmission Time (simulated seconds)}} \tag{7.1} \]
As described in [PD96], the throughput can be thought of as the achieved bandwidth,
compared to the available bandwidth.
For each simulation run, we average throughput over all FTP sessions. For a
given scenario (i.e. combination of arrival process and file size distribution), we
find the averaged throughput at different levels of network utilization ρ in the range
[0.1, 0.9]. Utilization is found by:
\[ \rho = \frac{\lambda F}{C} \tag{7.2} \]
where λ = average arrival rate (requests/sec), F = average file size (bits), and
C = link capacity (bps).
To vary ρ, we adjusted the value of C and fixed λ and F for a given run.
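As a worked illustration of the utilization relation ρ = λF/C, the sketch below solves it for the link capacity at each target utilization. The function names are made up for the example, and the inputs (10 requests/sec and a 9.375KB average file, with 1KB taken as 1000 bytes) are assumptions chosen for illustration.

```python
def utilization(arrival_rate, mean_file_bits, capacity_bps):
    """rho = lambda * F / C."""
    return arrival_rate * mean_file_bits / capacity_bps


def capacity_for_utilization(rho, arrival_rate, mean_file_bits):
    """Solve rho = lambda * F / C for the link capacity C (in bps)."""
    return arrival_rate * mean_file_bits / rho


def session_throughput(file_bits, transmission_seconds):
    """Per-session throughput: file size divided by its transmission time."""
    return file_bits / transmission_seconds


ARRIVAL_RATE = 10.0                           # requests/sec (assumed input)
MEAN_FILE_BITS = 9.375 * 1000 * 8             # 9.375KB with 1KB = 1000 bytes (assumption)

for step in range(1, 10):
    rho = step / 10.0
    mbps = capacity_for_utilization(rho, ARRIVAL_RATE, MEAN_FILE_BITS) / 1e6
    print(rho, round(mbps, 4))
```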
Chapter 8
Simulation Results
We performed several experiments for different arrival processes (e.g. Exponential
and Sup-FRP) and file size distributions (e.g. Exponential and Pareto). For a
given experiment, we kept the average arrival rate and mean file size equal to ensure
a parsimonious comparison. In other words, whether the request interarrival time was
exponentially distributed or followed the Sup-FRP process, we used the same average
arrival rate (for the Sup-FRP process we adjusted the onset time (T_0) ≈ [.1, .3]
and the Hurst parameter (H) ≈ .8). The same holds for file sizes. Table 8.1 summarizes
values of link capacity (C) for different values of utilization (ρ). We used an average
arrival rate (λ) of 10 arrivals/sec¹ and an average file size (F) of 9.375KB. The
G2 → G1 link delay was fixed at 20ms. In each simulation run, we generated 3000
arrivals (i.e. approximately 5 simulated minutes).
Utilization ρ        .1    .2    .3    .4     .5    .6    .7    .8     .9
Capacity C (Mbps)    7.5   3.75  2.5   1.875  1.5   1.2   1.07  .9375  .8333
Table 8.1: Network Utilization and equivalent Link Capacity.
¹ The low arrival rate (e.g. λ = 10) was chosen to lower the number of superimposed FTP
sessions. We believe that similar results consistent with our findings will be achieved for larger-scale
simulations (e.g. λ = 1000).
[Figure 8.1 plot: "Effect of using the Sup-FRP as the Web request arrival process", plotted against Utilization; curves for Sup-FRP arrivals with exponential file sizes, labeled B=2KB and B=64KB.]
Figure 8.1: Effects from correlated arrivals get pronounced as the buffer size increases
(we used B=2KB, 4KB, and 64KB).
In Figure 8.1 we see the effect of using correlated arrivals. We used two different
arrival processes, one uncorrelated (exponentially distributed interarrivals) and the
other correlated (Sup-FRP arrivals). To evaluate the effects of the arrival process
on throughput and eliminate effects due to heavy-tailed file size pdf, we used exponentially
distributed file sizes. We performed several simulation runs with buffer
sizes of 2KB, 4KB, and 64KB. As the buffer size increases, we observe a pronounced
effect of Sup-FRP on average throughput. It is interesting to see that strongly correlated
arrivals not only lowered the achieved throughput, but had a stronger effect
at higher utilization levels; we attribute this result to: a) increased buffer occupancy
at higher utilization, and b) increased number of packets dropped, hence, increased
transmission times due to the retransmission of lost packets.
In the next experiment we investigated the performance impact using both a
correlated arrival process and a heavy-tailed file size distribution. As reported in
[CB96], traffic self-similarity may be attributed to the heavy-tailed nature of file
sizes found in the Web. Even though several studies (e.g. [PKC96]) relied on the
findings of [CB96], we emphasize the importance of characterizing the nature of
files requested (in contrast to files found or observed for multiple requests). In
[AFT98] it is argued that the presence of caching in the Web has the effect of
making the set of transmitted files relatively insensitive to the set of files requested,
and distributionally similar to the set of available files.
[Figure 8.2 plot: "Throughput degr. due to Correlated Arr. and Heavy-tailed File pdf", plotted against Utilization; curves: Sup-FRP Arr., Exp. File and Exp. Arr., Pareto File, labeled H=.65 and H=.95.]
Figure 8.2: Effects of changing the Hurst parameter of the Arrival process and File
size pdf on Throughput.
In Figure 8.2, we compare the performance impact due to a heavy-tailed file
size distribution and correlated request arrivals for different values of H (the Hurst
parameter).
We observe the performance impact when file sizes follow a heavy-tailed distribution
(e.g. Pareto), and notice that the effect is mainly at lower utilization levels (ρ in
the range [0.1,0.5]). The arrival process, as previously observed, has a predominant
effect at higher utilization levels (ρ in the range [0.5,0.9]). Figure 8.3 summarizes
the main results (for H = .8).
[Figure 8.3 plot: "Sup-FRP (Web request arrival) Vs. Pareto (Web volume)", plotted against Utilization; curves: Exp. Arr., Exp. File; Sup-FRP Arr., Exp. File; Exp. Arr., Pareto File.]
Figure 8.3: File size distribution impacts performance at low utilization levels. The
arrival process impacts performance at high utilization levels.
Since the observed Web request arrival process is well matched by the Sup-FRP
process [Ryu98], and previous studies reported that Web file sizes are well modeled
by the Pareto pdf [PKC96], [Mah97], [AFT98], we analyze the performance impact
using the Sup-FRP process to simulate the Web request arrival process, and the
Pareto pdf to simulate Web volumes. Figure 8.4 reveals an enormous degradation
in performance due to both phenomena.
[Figure 8.4 plot: "Effect of Both: Sup-FRP (Web req. arr.) and Pareto (Web volume)", plotted against Utilization; curves: Sup-FRP Arr., Exp. File; Exp. Arr., Pareto File; Sup-FRP Arr., Pareto File.]
Figure 8.4: Both a strongly correlated arrival process and heavy-tailed file pdf
result in high performance degradation.
In Figure 8.5, we plot the average FTP session delay recorded. The Sup-FRP
resulted in longer delays compared to delays predicted by both an uncorrelated
arrival process (e.g. exponentially distributed interarrivals), and heavy-tailed file
size distribution (e.g. Pareto). Our intuition attributes this observation to: a)
overall increased queue lengths with the Sup-FRP arrival process, and b) increase
in the number of dropped packets for most of the FTP sessions with the Sup-FRP
arrival process. We also argue that the lower delays observed at higher utilization
for Pareto distributed file sizes may be due to the high variability in the file sizes
and increased number of small files.
To investigate the results of Figure 8.5 we plot the delay variance for all FTP
sessions. Figure 8.6 shows the variance plotted on a log scale. We observe that
using a heavy-tailed file size pdf introduces high variability at lower utilization
levels, compared to the Sup-FRP process, which affects variability at higher
utilization levels.
In Table 8.2, we summarize our results. We describe the effects of correlated
arrivals (e.g. Sup-FRP) and the heavy-tailed distributed file sizes (e.g. Pareto) on
throughput and response time. We compare the effects of both phenomena with
uncorrelated arrivals and exponentially distributed file sizes.
[Figure 8.5 plot: "Increased average delay with the Sup-FRP arrival process", plotted against Utilization; curves: Sup-FRP Arr., Pareto File; Sup-FRP Arr., Exp. File; Exp. Arr., Pareto File; Exp. Arr., Exp. File.]
Figure 8.5: Average file transmission time increases with the Sup-FRP arrival process.
[Figure 8.6 plot: "Large Delay Variance for Sup-FRP Arr. & Pareto file pdf", plotted against Utilization; curves: Sup-FRP Arr., Pareto File; Sup-FRP Arr., Exp. File; Exp. Arr., Pareto File; Exp. Arr., Exp. File.]
Figure 8.6: Variance in file transmission time increases with both the Sup-FRP arrival
process and Heavy-tailed file pdf.
Effects                   Correlated Arrivals (Sup-FRP)       Heavy-Tailed File Dist. (Pareto)
Average Throughput        degradation at higher util.         degradation at lower util.
                          levels (ρ in [0.5, 0.9]).           levels (ρ in [0.1, 0.5]).
Average Response Time     increases faster from lower         only little incr. at lower util.,
                          utilization levels (ρ ≈ .65).       due to incr. no. of small files.
Response Time Variance    increased at higher utilization     higher over the entire range
                          levels (ρ in [0.5, 0.9]).           of utilization.
Table 8.2: Effects of correlated arrivals and heavy-tailed file size pdf on performance:
Summary of Results.
Chapter 9
Conclusion
We analyzed the performance impact in Internet Web traffic due to correlated
arrivals and heavy-tailed file size distribution. We used the Sup-FRP process proposed
in [Ryu98] for modeling the Web request arrival process and compared the
performance impact with uncorrelated arrivals (i.e. exponentially distributed interarrivals)
and heavy-tailed file size pdf (e.g. Pareto). From several experiments, we
show that ignoring the correlation found in the Web request arrival process will lead
to inaccurate performance analysis.
As reported in [PKC96], [AFT98], and [Mah97], we verified that the heavy-tailed
nature of files found in the Web leads to a performance degradation. Furthermore, we
show that the network performance impact due to heavy-tailed file size distribution
is mostly at low utilization levels. We also show that correlated Web requests (e.g.
Sup-FRP) affect performance at higher utilization levels.
Since Web users are tolerant to delays (compared to users of real-time applications),
we believe it is not unlikely to operate networks at high utilization levels.
Therefore, we believe that the arrival process has an impact on performance and
needs to be well characterized.
Another issue we believe deserves investigation is to compare transmission time
distributions with the Sup-FRP arrival process, and both exponential and Pareto file
size pdfs. [AFT98] shows that there does not seem to be strong sample correlation
between file sizes and transmission times. We, therefore, believe that the arrival
process may be the reason for the observed heavy-tailed nature in the distribution
of transmission times. In addition, in order to fully validate the Sup-FRP model,
we propose to compare our results to empirical studies previously reported (e.g.
[Mah97], [DJ91]).
Chapter 10
Appendix: Derivation of the
Sup-FRP
We present a complete derivation for the Sup-FRP process described in Section
6.2.
The interarrival time of each Fractal-Renewal-Process (FRP) of Figure 6.1 is
defined by the pdf:
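\[
p(t) = \begin{cases}
\gamma A^{-1} e^{-\gamma t / A} & \text{for } t \le A \\
\gamma e^{-\gamma} A^{\gamma} t^{-(\gamma + 1)} & \text{for } t > A
\end{cases}
\tag{10.1}
\]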
We find the cumulative-distribution-function (CDF) for the interarrival time:
for t ≤ A
\[ F(t) = \int_{0}^{t} \gamma A^{-1} e^{-\gamma s / A} \, ds = 1 - e^{-\gamma t / A} \tag{10.2} \]
similarly for t > A
\[ F(t) = \int_{0}^{A} \gamma A^{-1} e^{-\gamma s / A} \, ds + \int_{A}^{t} \gamma e^{-\gamma} A^{\gamma} s^{-(\gamma + 1)} \, ds = 1 - e^{-\gamma} A^{\gamma} t^{-\gamma} \tag{10.3} \]
[Figure 10.1 diagram: the interval between arrivals A_{i-1} and A_i around the Simulation Start Time. X: Life-time interval (original pdf). Y: Residual-life interval (residual-life pdf).]
Figure 10.1: Life-time and Residual-life interval.
Using the inverse method and solving both equations we get:
U > e-7
(10.4)
U < e->
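The inverse-method step of Eq. (10.4) can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis's simulation code: the function name `frp_interarrival` and its parameters are hypothetical, and the uniform variate `u` is assumed to stand for 1 - F(t), as the thresholds in Eq. (10.4) imply.

```python
import math
import random


def frp_interarrival(gamma, A, u=None, rng=random):
    """Draw one FRP interarrival time by inverting Eqs. (10.2)-(10.3).

    gamma and A are the parameters of the pdf in Eq. (10.1).
    u is a uniform variate on (0, 1] standing for 1 - F(t) (an assumption
    of this sketch); if omitted, one is drawn from rng.
    """
    if u is None:
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
    if u >= math.exp(-gamma):
        # Exponential body of the pdf: result lies in [0, A]
        return -(A / gamma) * math.log(u)
    # Power-law tail of the pdf: result is larger than A
    return A * math.exp(-1.0) * u ** (-1.0 / gamma)
```

Both branches return exactly A at the threshold u = e^{-gamma}, which is a quick way to check that the two cases of Eq. (10.4) join continuously.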
Since the continuous pdf of (10.1) is not the exponential pdf (i.e., it is not
memoryless), we need to derive the distribution of the residual life to model the
first interval $T_0$ of each FRP; see Figures 6.1 and 10.1. From [Kle75], the pdf of
the residual life is defined as follows:
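$$
p_{residual}(t) = \frac{1 - F(t)}{E(t)} \tag{10.5}
$$
where $E(t)$ is the mean interarrival time:
$$
E(t) = \int_0^A t \, \gamma A^{-1} e^{-\gamma t / A} \, dt + \int_A^{\infty} t \, \gamma e^{-\gamma} A^{\gamma} t^{-(\gamma + 1)} \, dt = \frac{A}{\gamma (\gamma - 1)} \left( e^{-\gamma} + \gamma - 1 \right) \tag{10.6}
$$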
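The closed form in Eq. (10.6) can be checked by evaluating the two integrals separately. The following is a sketch of that calculation; the substitution variable $x$ is introduced here only for illustration, and the tail integral converges only under the assumption $\gamma > 1$.

```latex
\begin{align*}
\int_0^A t \, \gamma A^{-1} e^{-\gamma t / A} \, dt
  &= \frac{A}{\gamma} \int_0^{\gamma} x e^{-x} \, dx
   = \frac{A}{\gamma} \left[ 1 - (1 + \gamma) e^{-\gamma} \right],
   \qquad x = \gamma t / A, \\
\int_A^{\infty} t \, \gamma e^{-\gamma} A^{\gamma} t^{-(\gamma + 1)} \, dt
  &= \gamma e^{-\gamma} A^{\gamma} \, \frac{A^{1 - \gamma}}{\gamma - 1}
   = \frac{\gamma e^{-\gamma} A}{\gamma - 1}, \\
E(t)
  &= \frac{A \left[ (\gamma - 1) \left( 1 - (1 + \gamma) e^{-\gamma} \right) + \gamma^{2} e^{-\gamma} \right]}{\gamma (\gamma - 1)}
   = \frac{A \left( e^{-\gamma} + \gamma - 1 \right)}{\gamma (\gamma - 1)}.
\end{align*}
```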
the pdf of the residual life is then:
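$$
p_{residual}(t) = \begin{cases}
\dfrac{\gamma (\gamma - 1) e^{-\gamma t / A}}{A \left( e^{-\gamma} + \gamma - 1 \right)} & \text{for } t \le A \\[2ex]
\dfrac{\gamma (\gamma - 1) e^{-\gamma} A^{\gamma} t^{-\gamma}}{A \left( e^{-\gamma} + \gamma - 1 \right)} & \text{for } t > A
\end{cases} \tag{10.7}
$$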
As before, we find the CDF for the residual-life interval:
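for $t \le A$:
$$
F_{residual}(t) = \int_0^t \frac{\gamma (\gamma - 1) e^{-\gamma s / A}}{A \left( e^{-\gamma} + \gamma - 1 \right)} \, ds = \frac{(\gamma - 1) \left( 1 - e^{-\gamma t / A} \right)}{e^{-\gamma} + \gamma - 1} \tag{10.8}
$$
and similarly for $t > A$:
$$
\begin{aligned}
F_{residual}(t) &= \int_0^A \frac{\gamma (\gamma - 1) e^{-\gamma s / A}}{A \left( e^{-\gamma} + \gamma - 1 \right)} \, ds + \int_A^t \frac{\gamma (\gamma - 1) e^{-\gamma} A^{\gamma} s^{-\gamma}}{A \left( e^{-\gamma} + \gamma - 1 \right)} \, ds \\
&= \frac{1}{e^{-\gamma} + \gamma - 1} \left( \gamma - 1 + e^{-\gamma} - \gamma e^{-\gamma} A^{\gamma - 1} t^{1 - \gamma} \right)
\end{aligned} \tag{10.9}
$$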
Using the inverse method and solving both equations we get:
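$$
T_{residual} = \begin{cases}
-\gamma^{-1} A \ln \left[ (\gamma V - 1) (\gamma - 1)^{-1} e^{-\gamma} \right] & V \ge 1 \\
A V^{1/(1 - \gamma)} & V < 1
\end{cases} \tag{10.10}
$$
where,
$$
V = \frac{1 + (\gamma - 1) e^{\gamma}}{\gamma} \, U \tag{10.11}
$$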
Hence, as shown in Figure 6.1, to generate Sup-FRP arrivals, we use $T_{residual}$ of Eq.
(10.10) to find the first interval $T_0^i$ of each FRP, and $T$ of Eq. (10.4) to find every
subsequent interval ($j > 0$), for all $i = 1, 2, \ldots, M$.
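The generation procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code used in the thesis: the names `frp_interarrival`, `frp_residual`, `sup_frp_arrivals`, `horizon` and `seed` are hypothetical, the uniform variate is assumed to stand for one minus the CDF value, and the branch expressions for the residual life are re-derived here from Eqs. (10.8), (10.9) and (10.11).

```python
import math
import random


def frp_interarrival(gamma, A, u):
    """Interarrival time from Eq. (10.4); u is uniform on (0, 1]."""
    if u >= math.exp(-gamma):
        return -(A / gamma) * math.log(u)
    return A * math.exp(-1.0) * u ** (-1.0 / gamma)


def frp_residual(gamma, A, u):
    """First (residual-life) interval from Eq. (10.10); u is uniform on (0, 1]."""
    v = u * (1.0 + (gamma - 1.0) * math.exp(gamma)) / gamma  # Eq. (10.11)
    if v >= 1.0:
        arg = (gamma * v - 1.0) / (gamma - 1.0) * math.exp(-gamma)
        # max() guards against a tiny negative value from rounding when arg ~ 1
        return max(0.0, -(A / gamma) * math.log(arg))
    return A * v ** (1.0 / (1.0 - gamma))


def sup_frp_arrivals(M, gamma, A, horizon, seed=0):
    """Superpose M independent FRPs and return sorted arrival times < horizon."""
    rng = random.Random(seed)
    arrivals = []
    for _ in range(M):
        # The first interval of each FRP uses the residual-life distribution
        t = frp_residual(gamma, A, 1.0 - rng.random())
        while t < horizon:
            arrivals.append(t)
            # Every later interval uses the ordinary interarrival distribution
            t += frp_interarrival(gamma, A, 1.0 - rng.random())
    arrivals.sort()
    return arrivals
```

For example, `sup_frp_arrivals(M=5, gamma=1.5, A=1.0, horizon=100.0)` returns the merged arrival times of five FRPs over the first 100 time units. The sketch assumes $\gamma > 1$, which Eq. (10.6) needs for a finite mean.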
Bibliography
[AFT98] R. J. Adler, R. E. Feldman, M. S. Taqqu, "A Practical Guide to Heavy
Tails," Birkhauser, pp. 3-25, 1998.
[BSTW95] J. Beran, R. Sherman, M. S. Taqqu, and W. Willinger, "Long-Range
Dependence in Variable-Bit-Rate Video Traffic," IEEE Trans. Commun.,
vol. 43, no. 2/3/4, pp. 1566-1579, Feb./March/April 1995.
[CB96] M. Crovella and A. Bestavros, "Self-similarity in World Wide Web traffic:
Evidence and possible causes," In Proceedings of the ACM SIGMETRICS
Conference on Measurement & Modeling of Computer Systems,
pp. 160-169, Philadelphia, PA, May 1996.
[CG96] C.-H. Chou and E. Geramiotis, "Performance Prediction and Resource
Allocation for Long-Range Dependent Traffic in ATM Networks," Proc.
of the 30th Annual Conference on Information Sciences and Systems,
pp. 198-204, March 1996.
[DH73] R. O. Duda and P. E. Hart, "Pattern Classification and Scene Analysis,"
John Wiley & Sons, 1973.
[DJ91] Peter B. Danzig and Sugih Jamin, "tcplib: A library of TCP internetwork
traffic characteristics," Technical Report USC-CS-91-495, Computer
Science Department, University of Southern California, Los Angeles,
CA, 1991.
[Edd96] Rusty Eddy, "HTTP analysis of IP level traces," Feb. 1996, Available at
http://catarica.usc.edu/eddy/http-traffic/http-traces.html.
[GW94] M. Garrett and W. Willinger, "Analysis, Modeling and Generation of
Self-Similar VBR Video Traffic," Proc. ACM SIGCOMM '94, pp. 269-280,
Sep. 1994.
[Kle75] Leonard Kleinrock, "Queueing Systems, Volume I: Theory," John Wiley
& Sons, 1975.
[LK91] A. M. Law and W. D. Kelton, "Simulation Modeling and Analysis,"
Second Edition, McGraw-Hill, pp. 375-379, 1991.
[LNR94] D. M. Lucatoni, M. F. Neuts, and A. R. Reibman, "Methods for
Performance Evaluation of VBR Video Traffic Models," IEEE/ACM Trans.
Networking, vol. 2, no. 2, pp. 176-180, April 1994.
[LTWW94] W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson,
"On the Self-Similar Nature of Ethernet Traffic (Extended Version),"
IEEE/ACM Trans. Networking, vol. 2, no. 1, pp. 1-15, Feb. 1994.
[LTWW95] W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson,
"Self-Similarity in High-Speed Packet Traffic: Analysis and Modeling of
Ethernet Traffic Measurements," Statistical Science, vol. 10, no. 1, pp. 67-85,
1995.
[Mah97] Bruce A. Mah, "An Empirical Model of HTTP Network Traffic,"
Proceedings of INFOCOM '97, Kobe, Japan, April 1997.
[MF98] S. McCanne and S. Floyd, "ns - Network Simulator," Available at
http://www-mash.cs.berkeley.edu/ns/.
[Nel95] R. Nelson, "Probability, Stochastic Processes, and Queueing Theory,"
Springer-Verlag, pp. 352-356, 1995.
[PD96] Larry L. Peterson & Bruce S. Davie, "Computer Networks, A System
Approach," Morgan Kaufmann Publishers, 1996.
[PF94] V. Paxson and S. Floyd, "Wide Area Traffic: The Failure of Poisson
Modeling," In Proc. ACM SIGCOMM '94, pp. 257-268, 1994.
[PKC96] K. Park, G. T. Kim, and M. E. Crovella, "On the relationship between
file sizes, transport protocols, and self-similar network traffic," in
Proceedings of the Fourth International Conference on Network Protocols
(ICNP '96), pp. 171-180, October 1996.
[PKC97] K. Park, G. T. Kim, and M. E. Crovella, "On the Effect of Traffic
Self-Similarity on Network Performance," CSD-TR 97-024, Dept. of
Computer Sciences, Purdue University, July 1997.
[PVTF92] W. H. Press, W. T. Vetterling, S. A. Teukolsky, B. P. Flannery,
"Numerical Recipes in C," Second Edition, Cambridge University Press, 1992.
[RE96] B. K. Ryu and A. Elwalid, "The importance of Long-Range Dependence
of VBR video traffic in ATM traffic engineering: Myths and Realities,"
In Proc. ACM SIGCOMM, San Francisco, CA, 1996.
[RJ93] L. Rabiner and B. H. Juang, "Fundamentals of Speech Recognition,"
Prentice Hall, 1993.
[RL96] B. K. Ryu and S. B. Lowen, "Point process approaches to the modeling
and analysis of self-similar traffic: Part I - Model construction," In Proc.
IEEE INFOCOM '96, San Francisco, CA, 1996.
[RL98] B. K. Ryu and S. B. Lowen, "Point process models for self-similar
network traffic, with applications," Stochastic Models (M. Neuts, Ed.), vol.
14, no. 3, pp. 735-761, 1998.
[Ryu98] Bo Ryu, "Modeling and Simulation of Broadband Satellite Networks:
Part II - Traffic Modeling," submitted to IEEE Communications Magazine,
1998.
[SS93] P. Skelly and M. Schwartz, "A Histogram-Based Model for Video Traffic
Behavior in an ATM Multiplexer," IEEE/ACM Trans. Networking, vol.
1, no. 4, pp. 446-459, Aug. 1993.
[WTSW95] W. Willinger, M. Taqqu, R. Sherman, and D. Wilson, "Self-similarity
through high-variability: statistical analysis of Ethernet LAN traffic at
the source level," In Proc. ACM SIGCOMM '95, pp. 100-113, 1995.
Asset Metadata
Creator: Alsaialy, Hany D. (author)
Core Title: Studies on the impact of long-term correlation on computer network performance
Contributor: Digitized by ProQuest (provenance)
Degree: Master of Science
Degree Program: Computer Engineering
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: Computer Science; engineering, electronics and electrical; OAI-PMH Harvest
Language: English
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c16-29145
Unique identifier: UC11341289
Identifier: 1394895.pdf (filename), usctheses-c16-29145 (legacy record id)
Legacy Identifier: 1394895.pdf
Dmrecord: 29145
Document Type: Thesis
Rights: Alsaialy, Hany D.
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA