SYSTEMATIC PERFORMANCE AND ROBUSTNESS TESTING OF
TRANSPORT PROTOCOLS WITH CONGESTION CONTROL
by
Shirin Ebrahimi-Taghizadeh
____________________________________________________________________
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
May 2009
Copyright 2009 Shirin Ebrahimi-Taghizadeh
DEDICATION
To my dear father, mother and brother (Rabie, Tara and Shahin)
for their continuous love and support
To my beloved Shahryar for his endless patience, love and inspiration
and
To the memory of my dear uncle, Mansour Valieh, with whom I started
my journey on the path of higher education
ACKNOWLEDGEMENTS
I would like to hereby extend my great appreciation to Professor Sandeep Gupta and
Professor Ahmed Helmy for their supervision, encouragement and support during
my years at USC graduate school and in preparation of this doctoral thesis.
I would like to express my eternal gratitude to my dear father, Rabie Ebrahimi-
Taghizadeh, my dear mother, Tara Valieh, and my dear brother, Shahin Ebrahimi-
Taghizadeh, for their unconditional love, support and sacrifices without which
completion of my studies would not have been possible.
I would also like to express my foremost appreciation and gratitude to my beloved
husband, Shahryar Karimi-Ashtiani who not only has been my greatest source of
motivation, inspiration and encouragement but also provided me with invaluable
technical insights and advice in preparation of this dissertation. This great
achievement was never possible without his love, patience and commitment.
My special thanks also go to Professor Alice Parker and Professor Ali Zahid for their
advice, impeccable mentorship and support. It has been my great honor and pleasure
to have learned from and worked with them.
I also wish to thank my dear friends Maryam Soltan, Fariba Ariaei and
Shideh Shahidi for their great friendship and emotional support. I will always cherish
the pleasant memories we made during our graduate studies.
I would also like to thank my lab colleagues and friends Ganesha, Shamim, Jabed,
Karim, Fan, Wei-jen, Shao-cheng and Sapon. It has been a great pleasure to work
and study with you.
TABLE OF CONTENTS
Dedication ......................................................................................................................... ii
Acknowledgements ......................................................................................................... iii
List of Tables ................................................................................................................. viii
List of Figures .................................................................................................................. ix
Abbreviations .................................................................................................................. xii
Abstract .......................................................................................................................... xiii
Chapter 1 Introduction .................................................................................................... 1
1.1 TCP/IP layered architecture ................................................................................. 1
1.1.1 Transport protocols ...................................................................................... 2
1.1.1.1 Congestion control and loss recovery ................................................. 3
1.1.1.2 Performance and robustness measures ................................................ 4
1.2 Motivation ............................................................................................................ 5
1.3 Contribution of this dissertation ........................................................................... 6
1.4 Review of the related work ................................................................................ 10
1.4.1 TCP related work ....................................................................................... 10
1.4.2 XCP related work ....................................................................................... 15
Chapter 2 TCP vs. TCP ................................................................................................ 20
2.1 Background on TCP ........................................................................................... 20
2.1.1 Slow start .................................................................................................... 21
2.1.2 Congestion avoidance ................................................................................ 21
2.1.3 Loss recovery mechanisms ........................................................................ 22
2.1.3.1 Fast retransmit ................................................................................... 22
2.1.3.2 Fast recovery ...................................................................................... 23
2.1.3.3 Time out ............................................................................................. 23
2.1.4 TCP New Reno .......................................................................................... 24
2.1.5 TCP SACK ................................................................................................. 25
2.1.6 Other TCP variants .................................................................................... 25
2.2 Interaction of TCP short-lived and long-lived flows ........................................ 26
2.3 Arrangement of an adversarial congestion scenario ......................................... 28
2.4 Investigating effect of temporal, spatial and scaling parameters ...................... 35
2.5 Packet-level simulations .................................................................................... 36
2.6 Single bottleneck scenarios ................................................................................ 37
2.6.1 Effect of d, p, m and n on throughput reduction ....................................... 39
2.6.2 Testing other TCP variants and packet drop policies ............................... 45
2.6.3 Testing effect of targeting different links .................................................. 48
2.7 Multiple bottleneck scenarios ............................................................................ 50
2.7.1 Testing RTT-heterogeneous long-lived flows .......................................... 52
2.8 Test-bed Experiments ......................................................................................... 55
2.9 Conclusions ........................................................................................................ 58
Chapter 3 A systematic performance and robustness testing framework ................... 62
3.1 Framework overview ......................................................................................... 62
3.2 Framework components ..................................................................................... 67
3.2.1 Step 1: Reviewing protocol specification .................................................. 69
3.2.2 Step 2: Identifying phases of operation and creating the phase
transition diagram ..................................................................................... 70
3.2.3 Step 3: Determining desired phase transitions and the corresponding
external events ........................................................................................... 73
3.2.4 Step 4: Selecting paths of malicious flows (spatial parameters) .............. 75
3.2.4.1 Direct path .......................................................................................... 76
3.2.4.2 Indirect path ....................................................................................... 80
3.2.5 Step 5: Configuration of malicious flows (scaling and temporal
parameters) ................................................................................................ 82
3.2.6 Step 6: Selecting the appropriate scenario ................................................ 85
3.2.7 Step 7: Damage assessment for affected flows ......................................... 85
3.2.8 Conclusions ................................................................................................ 86
Chapter 4 XCP vs. XCP ............................................................................................... 89
4.1 Background on XCP .......................................................................................... 89
4.2 From protocol specification to abstract rules .................................................... 92
4.3 Phases of operation ............................................................................................. 94
4.3.1 Aggregate throughput model ..................................................................... 94
4.3.2 Analysis of XCP MIMD rule .................................................................. 100
4.3.3 Analysis of the XCP AIMD rules ............................................................ 102
4.3.4 Phase transition diagram .......................................................................... 109
4.3.5 Configuring temporal, spatial and scaling parameters of malicious
flows in adversarial scenarios ................................................................ 112
4.3.5.1 Creating a severe congestion interval ............................................. 113
4.3.5.2 Creating a severe congestion scenario ............................................ 113
4.3.5.3 Effect of phase of operation on success of adversarial
congestion scenarios ....................................................................... 115
4.3.5.4 Determining number of malicious XCP flows and transition
time to directly or indirectly reach Phase-2 from Phase-3 ............. 118
4.3.5.5 Selecting path of malicious flows ................................................... 122
4.3.5.6 Simulation results ............................................................................ 123
4.4 Conclusions ...................................................................................................... 126
4.5 Potential future extensions ............................................................................... 129
Bibliography ................................................................................................................. 131
LIST OF TABLES
Table 1-1: Comparing UDP attacks with our adversarial scenarios using
short-lived TCP flows .................................................................................. 15
Table 2-1: Effect of adversarial scenarios using short-lived TCP flows on
throughput of long-lived TCP flows ............................................................ 36
Table 2-2: Setup of the test-bed machines .................................................................... 56
Table 3-1: Desired phase transitions for TCP Tahoe .................................................... 74
Table 3-2: Link properties and load conditions ............................................................ 78
LIST OF FIGURES
Figure 1-1: TCP/IP layered architecture ........................................................................... 2
Figure 2-1: An example of an adversarial scenario ....................................................... 28
Figure 2-2: General pattern of short-lived flows ........................................................... 29
Figure 2-3: A single group of fully-overlapped short-lived flows in each
congestion interval ...................................................................................... 33
Figure 2-4: Single bottleneck topology .......................................................................... 37
Figure 2-5: Effect of congestion recurrence period on throughput reduction%
for d=0.75 sec ............................................................................................... 40
Figure 2-6: Effect of congestion recurrence period on throughput reduction%
for d=1 sec .................................................................................................... 40
Figure 2-7: Frequency response of the long-lived TCP flows for d=1sec .................... 41
Figure 2-8: Overall throughput of short-lived flows ..................................................... 43
Figure 2-9: Effect of n on throughput reduction percentage ......................................... 44
Figure 2-10: Effect of aggregation of long-lived flows ................................................. 45
Figure 2-11: Comparing various TCP flavors, DropTail .............................................. 46
Figure 2-12: Comparing various TCP flavors, RED ..................................................... 47
Figure 2-13: Comparing the effects of targeting different links from node a
to node b...................................................................................................... 48
Figure 2-14: Overall throughput of short-lived flows targeting different links
from node a to node b ................................................................................. 50
Figure 2-15: Multiple-bottleneck topology .................................................................... 51
Figure 2-16: Effects of targeting backbone links ........................................................... 53
Figure 2-17: Effects of targeting non-backbone links ................................................... 53
Figure 2-18: Comparing throughput of short-lived flows targeting backbone
and non-backbone links .............................................................................. 55
Figure 2-19: Topology of the test-bed for the experiments ........................................... 56
Figure 2-20: Results of experiment 1 (targeting L2), experiment 2 (targeting
the bottleneck, L1) and experiment 3 (targeting both L1 and L2) ............ 57
Figure 3-1: Framework block diagram .......................................................................... 67
Figure 3-2: Phases of operation ..................................................................................... 71
Figure 3-3: Evolution of transmission rate for TCP Tahoe ........................................... 72
Figure 3-4: Phase transition diagram for TCP Tahoe .................................................... 73
Figure 3-5: Arbitrary topology and traffic distribution ................................................. 78
Figure 3-6: Identifying direct path candidates in our arbitrary topology ...................... 80
Figure 3-7: Indirect path candidates ............................................................................... 81
Figure 3-8: Tradeoff between number and duration of concurrent malicious
flows ............................................................................................................ 84
Figure 4-1: Symmetric dumbbell topology with XCP flows and cross traffic ............. 95
Figure 4-2: Effect of cross traffic on aggregate throughput .......................................... 99
Figure 4-3: Comparing aggregate throughput obtained by modeling and
simulation ................................................................................................... 99
Figure 4-4: Effect of control interval ........................................................................... 101
Figure 4-5: Effect of bottleneck capacity ..................................................................... 101
Figure 4-6: Phase-1 ....................................................................................................... 106
Figure 4-7: Phase-2 ....................................................................................................... 107
Figure 4-8: Phase-3 ....................................................................................................... 107
Figure 4-9: Phase-4 ....................................................................................................... 108
Figure 4-10: Phase-5 ..................................................................................................... 108
Figure 4-11: Phase-6 ..................................................................................................... 109
Figure 4-12: Phase transition diagram ......................................................................... 110
Figure 4-13: Adversarial scenarios against target XCP flows in Phase-2 .................. 125
Figure 4-14: Adversarial scenarios against target XCP flows in Phase-3 or
Phase-4 .................................................................................................. 125
ABBREVIATIONS
ACK ............................................................................................ Acknowledgment packet
AIMD .............................................................. Additive Increase Multiplicative Decrease
AQM ....................................................................................... Active Queue Management
AWND ............................................................................................... Advertised Window
CBR ......................................................................................................... Constant Bit Rate
CWND ............................................................................................... Congestion Window
DDoS .................................................................................... Distributed Denial of Service
DoS .......................................................................................................... Denial of Service
DupACK ................................................................................ Duplicate Acknowledgment
FAST ........................................................................................... Fast AQM Scalable TCP
IP .............................................................................................................. Internet Protocol
MIMD .................................................... Multiplicative Increase Multiplicative Decrease
MSS ............................................................................................. Maximum Segment Size
RTT ......................................................................................................... Round Trip Time
RTO ............................................................................................ Retransmission Time Out
RED ............................................................................................ Random Early Detection
SACK .................................................................................... Selective Acknowledgement
SSTHRESH ...................................................................................... Slow Start Threshold
TCP ................................................................................... Transmission Control Protocol
UDP .............................................................................................. User Datagram Protocol
XCP ...................................................................................... Explicit Congestion Protocol
ABSTRACT
Many modern variations of transport protocols are equipped with improved congestion
control algorithms that are proactive rather than merely reactive to congestion.
Consequently, it is expected that they will rarely need to invoke loss recovery
mechanisms (namely, abrupt reduction in transmission rate and timeout intervals).
Therefore, unlike congestion control algorithms, the loss recovery mechanisms have
undergone little or no change as the Internet has evolved and scaled up in terms of
dimension, traffic and bandwidth.
There is, however, insufficient insight regarding the worst-case performance and robustness
of transport protocols with congestion control under severe congestion conditions
when even the tightest and most-enhanced congestion control mechanisms fail to
prevent severe congestion leading to packet drops and are thus forced to invoke their
legacy loss recovery mechanisms.
We develop a systematic framework to test performance and robustness of transport
protocols with congestion control by creating severe congestion scenarios that (a)
stress these protocols and expose performance vulnerabilities and robustness issues of
their congestion control and loss recovery mechanisms, (b) identify and highlight
harmful side effects of employing certain scheduling, active queue management
(AQM) and routing techniques along with these protocols, (c) serve as a benchmark
for non-malicious worst-case performance analysis in order to identify unintentional
occurrences of our scenarios in networks, and (d) virtually constitute an ideal Denial of
Service (DoS) or Distributed Denial of Service (DDoS) attack scenario, since such
scenarios are inherently more difficult to detect.
Using our systematic framework at the flow-level, we define and model our severe
congestion scenarios in terms of scaling, temporal and spatial parameters of the flows
involved in the scenario.
Our main case studies include several variants of TCP and a promising, efficient and
TCP-friendly transport protocol for high bandwidth-delay environments: eXplicit
Congestion Protocol (XCP). In conclusion, our study exposes vulnerabilities of the
rate-adjustment rules used in the congestion control mechanisms of these protocols and
provides systematic ways to exploit their vulnerabilities to invoke the protocols’ loss
recovery mechanisms repeatedly. We verify our findings by studying our severe
congestion scenarios in a packet-level simulation platform and running test-bed
experiments.
Chapter 1
Introduction
1.1 TCP/IP layered architecture
The Transmission Control Protocol/Internet Protocol (TCP/IP) model, also known as
the Internet reference model, collectively defines a set of rules to enable different types
of computer communication. The TCP/IP suite also provides a framework for a set of
detailed standards to establish end-to-end connectivity and specify formatting,
addressing, transmission, routing and delivery of data from source to destination.
TCP/IP is generally characterized as a layered architecture with five layers of
abstraction: Application, Transport, Internet, Link and Physical layer [23][32][50].
Alternatively, some of the literature takes a top-down approach with little emphasis on
physical-layer design issues and defines TCP/IP as having four layers [33][43][51][52].
Figure 1-1 illustrates how the five-layer TCP/IP architecture enables an end-to-end
communication between device A and device B through a physical link. The Application
or process-to-process layer is where data is created and communicated to the
communication partners or peers. The Transport or host-to-host layer provides a logical
communication link between two end-hosts, thus masking the underlying physical
connection and network topology. This is where transport protocols operate.
1.1.1 Transport protocols
The main responsibility of the transport layer is end-to-end data transfer
independent of the underlying network, which allows applications to seamlessly
communicate through transport protocols. A common analogy to describe the
responsibility of a transport layer protocol is a literal transport mechanism such as a
vehicle whose responsibility is to safely deliver the passengers or cargo from source
to destination. However, if a lower layer or an upper layer is responsible for safe
delivery, the transport layer may no longer be required to perform its function reliably
and safely. Transport protocols are further categorized into two main classes:
connection-oriented and connectionless.
Figure 1-1: TCP/IP layered architecture
While a connection-oriented transport protocol such as TCP deploys mechanisms to
ensure integrity, in-order delivery and reliability of data transfer, a connectionless
transport protocol such as UDP either provides a completely unreliable data transfer
service to upper layer protocols or deploys a loose mechanism to enable data transfer
with some degree of reliability without any guarantees. The choice of appropriate
transport protocol therefore depends on the requirements of the upper layer
application. The focus of this dissertation is on the class of transport protocols that
provide reliable end-to-end data transfer to the application layer by deploying
congestion control and loss recovery mechanisms independent of underlying network
technologies. Such a reliable service ensures in-order, error-free data delivery in its
integrity to the application layer [33][43][50][52].
1.1.1.1 Congestion control and loss recovery
Congestion control mechanisms are algorithms and procedures that may be deployed
at the transport layer to monitor an explicit or implicit congestion signal and to adjust
the transmission rate such that what appears to be a congestion condition (according to
the congestion signal) can be either reactively or proactively avoided.
Loss recovery mechanisms are algorithms and procedures that are deployed as a
fail-safe for the congestion control mechanisms. Regardless of the quality and tightness
of congestion control mechanisms, they are always deployed alongside loss recovery
mechanisms. When congestion control mechanisms fail to prevent congestion,
congested network buffers will ultimately overflow and drop packets, and loss
recovery mechanisms must subsequently be invoked to retransmit the lost packets in
order to ensure reliability of data transfer.
1.1.1.2 Performance and robustness measures
By using a combination of congestion control and loss recovery mechanisms,
reliability of data transfer can be ensured; however, this can compromise the
performance and robustness of the transport protocol.
The measure of performance for a transport protocol is tied to the utilization of the
bottleneck link capacity along the end-to-end path of data transfer from source to
destination. The measure of robustness for a transport protocol corresponds to the queue
size at the bottleneck link buffer, the steady-state oscillation of the transmission rate
and the end-to-end transfer delay. To make matters more complicated, these
performance and robustness measures are unfortunately not independent of each
other. Ideally transport protocols should operate at 100% utilization with zero queue
size at the bottleneck link buffer, bounded end-to-end delay and a stable steady state
transmission rate. We aim to challenge the performance and robustness of these
transport protocols under severe congestion scenarios and to identify the breaking
points and vulnerabilities of such protocols and demonstrate how much the worst-
case departs from ideal conditions.
In the context of performance, the terms “fairness” and “friendliness” are often used.
Both terms are open to interpretation. Therefore, in order to avoid confusion, we
define and use the following notions of fairness and TCP-friendliness in this
dissertation.
Our notion of fairness is the same as described in the max-min fairness algorithm in
[4] i.e. a flow is at its max-min fair transmission rate if it is not possible to increase
its rate without reducing another flow’s equal or smaller rate. Also, when it is said
that a transport protocol is a TCP-friendly protocol, it means that a flow of that
protocol type will achieve no more than the average throughput of a TCP flow under
similar circumstances.
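As an illustration of this notion (a sketch of our own for this discussion, not part of any protocol under test), a max-min fair allocation on a single shared link can be computed by progressive filling:

```python
def max_min_fair_share(capacity, demands):
    """Progressive-filling sketch of max-min fairness on a single shared link.

    Flows with small demands are satisfied first; the remaining capacity is
    split equally, so no flow's rate can be increased without reducing the
    rate of a flow with an equal or smaller allocation.
    """
    allocation = {flow: 0.0 for flow in demands}
    remaining = float(capacity)
    unsatisfied = dict(demands)
    while unsatisfied and remaining > 0:
        equal_share = remaining / len(unsatisfied)
        bottlenecked = {f: d for f, d in unsatisfied.items() if d <= equal_share}
        if not bottlenecked:
            # Every remaining flow wants more than the equal share: split evenly.
            for f in unsatisfied:
                allocation[f] += equal_share
            break
        for f, d in bottlenecked.items():
            allocation[f] += d       # fully satisfy the flow with the small demand
            remaining -= d
            del unsatisfied[f]
    return allocation

# Example: a 10 Mb/s link shared by flows demanding 2, 4 and 8 Mb/s
# yields max-min fair rates of 2, 4 and 4 Mb/s respectively.
print(max_min_fair_share(10.0, {"f1": 2.0, "f2": 4.0, "f3": 8.0}))
```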
1.2 Motivation
Among transport protocols with congestion control, TCP [44] is the dominant one
in the Internet. As the Internet scales up in
terms of dimension, traffic and bandwidth, many performance-enhancing
improvements and variations of the congestion control algorithm of TCP have been
suggested to accommodate this growth. The improved protocols strive to maintain
high utilization, stability and fairness. Most of them benefit from using better
congestion signals to obtain a fine-grained perception of network congestion and use
a combination of proactive and reactive measures to avoid heavy congestion and the
resulting packet losses.
It is commonly believed that improved congestion control algorithms rarely need to
invoke loss recovery mechanisms (namely abrupt reduction in transmission rate and
timeout intervals). Consequently, almost all modern variations of TCP simply inherit
the loss recovery mechanisms of the legacy TCP.
Due to insufficient understanding of worst-case performance and robustness of this
class of protocols with respect to their congestion control and loss recovery
mechanisms under severe congestion, it is imperative to systematically test and
expose the vulnerabilities of such protocols and evaluate their worst-case
performance.
1.3 Contribution of this dissertation
This dissertation focuses on systematic testing of performance and robustness of
transport protocols with congestion control under severe congestion conditions. It
provides a systematic test framework and reports on findings of two main case
studies.
Chapter 2 presents the test case study of TCP. In this case study, we develop TCP
adversarial scenarios where we employ short-lived TCP flows as malicious flows to
adversely target long-lived TCP flows [16][17]. We use simulations, analysis, and
test-bed experiments to systematically study the dependence of the severity of impact
on long-lived TCP flows on key parameters of short-lived TCP flows – including
their paths, durations, and numbers, as well as the intervals between consecutive
flows. We derive characteristics of pattern of short-lived flows that exhibit extreme
adverse impact on long-lived TCP flows. While randomly generated sequences of
short-lived TCP flows may provide some reductions (up to 10%) in the throughput of
the long-lived flows, the scenarios we generate cause much greater reductions (more
than 85%) for several TCP variants (Tahoe, Reno, New Reno, Sack) [19], and for
different packet drop policies: DropTail, and Random Early Detection (RED [22]).
Counter to common beliefs, our study also shows that (a) targeting bottleneck links
may not always cause maximal performance degradation for long-lived TCP flows,
and (b) constant-bit-rate (CBR) UDP flows are not the only adversary of TCP or
TCP-friendly flows, and short-lived TCP flows can also have an equally-adversarial
impact on long-lived TCP flows.
This further motivated us to suggest a systematic framework that (a) at the high-level
produces the scaling, temporal and spatial parameter settings for the severe
congestion scenarios for a specified transport protocol in an arbitrary topology, (b)
provides an estimate of the damage to target flows of that protocol type as a result
of these congestion scenarios, and (c) identifies breaking points of the congestion control
mechanisms of such a protocol associated with its loss recovery mechanisms.
Chapter 3 presents the procedural steps of our systematic framework. Our framework
is applicable to both window-based and rate-based TCP-friendly transport protocols
with congestion control. It is based on creating an adversarial congestion
environment for target flows, i.e., flows of the protocol type whose robustness is
being evaluated.
Our framework admits three sets of inputs:
(a) An arbitrary network topology configuration in the form of a directed
weighted graph.
(b) Traffic information of network flows (including the target flows)
characterized by their numbers, their flow type, protocol type and their paths.
(c) Protocol specifications of the network flows characterized as a set of rules
for adjusting the transmission rate under different circumstances.
Assuming that target flows in our adversarial scenarios are in steady-state, our
framework formulates an interaction model between malicious flows and target
flows to:
(a) Determine how many, how long and where to inject the malicious flows
in order to create a maximally-invasive congestion scenario.
(b) Provide an estimation of the damage in order to re-adjust the above-
mentioned parameters for the next round of severe congestion.
(c) Determine how often to repeat this scenario to prolong the drastic
performance degradation of the target flows.
The set of answers to the above questions exposes and quantifies the breaking points
and performance flaws of the target protocol and forms the structure of an
adversarial congestion scenario.
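A minimal sketch of how these inputs and outputs could be organized (the names and fields below are our own illustrative assumptions; the framework itself is specified procedurally in Chapter 3):

```python
from dataclasses import dataclass, field

@dataclass
class TopologyInput:
    """Input (a): an arbitrary topology as a directed weighted graph."""
    # (src, dst) -> {"capacity_mbps": ..., "delay_ms": ...}
    links: dict = field(default_factory=dict)

@dataclass
class FlowGroup:
    """Input (b): traffic information for a group of network flows."""
    count: int
    flow_type: str        # e.g. "long-lived" or "short-lived"
    protocol: str         # e.g. "TCP NewReno", "XCP"
    path: list            # sequence of nodes traversed

@dataclass
class AdversarialScenario:
    """Framework output: configuration of the malicious flows."""
    num_malicious_flows: int      # how many (scaling parameter)
    flow_duration_s: float        # how long (temporal parameter)
    injection_path: list          # where to inject (spatial parameter)
    recurrence_period_s: float    # how often to repeat (temporal parameter)
    estimated_damage_pct: float   # predicted throughput reduction of target flows
```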
In Chapter 4, we follow the steps in our systematic framework to test performance
and robustness of a highly promising transport protocol, developed as an efficient
and TCP-friendly transport protocol especially for high bandwidth delay
environments: eXplicit Congestion Protocol (XCP) [18][28]. We also provide a
microscopic mathematical modeling of XCP aggregate throughput and validate its
accuracy through packet-level simulations.
Our test results point out the vulnerabilities and breaking points of XCP flows
associated with their congestion control and loss-recovery mechanisms and provide
a measure of unfairness and unfriendliness among short-lived and long-lived XCP
flows, which are seemingly fair and friendly to each other since they belong to the same class of
protocols. This is the first and the only study that shows how XCP congestion control
mechanisms can be manipulated in a variety of severe congestion scenarios to reach
a breaking point while fully complying with the protocol specification and without
any modification or change in the original implementation of the protocol designers.
Our simulation results confirm a throughput reduction of up to 100% for target flows
during the congestion interval. In the next section, we explain how the contributions
of this dissertation are positioned among the related work.
1.4 Review of the related work
In this section we provide a review of the related work for TCP protocol and XCP
protocol.
1.4.1 TCP related work
Variants of TCP protocols [19] constitute the majority of flows in the network
(namely TCP Reno and TCP NewReno [20]). The related research in this area of
computer networks addresses a broad range of issues, from modeling of TCP flows,
revisions of TCP congestion-control algorithms, proposals for new TCP-friendly
protocols, and performance-improving AQM, Quality of Service (QoS), packet-scheduling
and routing techniques, to UDP-based DoS attacks and the required
detection mechanisms [27][39]. The following is a summary of the literature most
related to our work in the context of TCP.
All TCP flows in the Internet fall into one of the following two categories: (a) short-
lived and (b) long-lived. The majority of TCP flows in the Internet are short-lived
(e.g. web-like traffic). The main distinction between short-lived and long-lived TCP
flows (also called mice and elephants, respectively) is how the congestion window
grows. Short-lived TCP flows spend most of their lifetime in the slow start phase
when the congestion window (consequently the transmission rate) is increased
exponentially. Long-lived TCP flows also start in the slow start phase, but they
spend most of their lifetime in the congestion avoidance phase in which they perform
Additive Increase Multiplicative Decrease (AIMD) congestion control which is to
increase the congestion window (and therefore transmission rate) linearly and to
decrease congestion window (consequently the transmission rate) as required.
A lot of research has been conducted to develop separate models for mice [40] and
elephants [41] in order to predict their performance. Padhye et al. have developed an
analytical model for the steady state throughput of a bulk transfer TCP flow as a
function of loss rate and round trip time [41]. This model captures the behavior of
TCP’s fast retransmit mechanism as well as the effect of TCP’s timeout mechanism.
On the other hand, Mellia et al. [40] have proposed an analytical model to predict
TCP performance in terms of the completion time for short-lived flows.
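For reference, a widely quoted approximate form of the bulk-transfer (elephant) model gives the steady-state throughput as a function of the round trip time, the loss rate p and the retransmission timeout; the sketch below is our paraphrase of that approximation, not a reproduction of the exact expressions in [41]:

```python
from math import sqrt

def tcp_bulk_throughput(rtt, p, rto=1.0, b=1, mss=1460, w_max=None):
    """Approximate steady-state throughput (bytes/s) of a bulk-transfer TCP flow.

    rtt and rto are in seconds, p is the packet loss probability and b is the
    number of packets acknowledged per ACK. Throughput falls off roughly as
    1/(rtt*sqrt(p)) for small p, with timeouts dominating as p grows.
    """
    denom = rtt * sqrt(2 * b * p / 3) \
        + rto * min(1.0, 3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2)
    pkts_per_sec = 1.0 / denom
    if w_max is not None:
        pkts_per_sec = min(pkts_per_sec, w_max / rtt)  # cap at the maximum window
    return pkts_per_sec * mss

# Example: RTT = 100 ms and 1% loss give roughly 1.5e5 bytes/s (about 1.2 Mb/s).
print(tcp_bulk_throughput(rtt=0.1, p=0.01))
```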
Meanwhile, various active queue management schemes [24][27] and routing schemes [54]
have been proposed to ensure fairness between short-lived and long-lived flows, especially
under competition for bandwidth when links operate close to their capacity. Guo and
Matta proposed to employ a new TCP service in edge routers [24]. In this
architecture, TCP flows are classified based on their lifetime and short-lived flows
are given preferential treatment inside the bottleneck queue so that short connections
experience less packet drop rate than long connections. They have shown that
preferential treatment is necessary to improve response time for short-lived TCP
flows, while ensuring fairness and without hurting the performance of long-lived
flows. Additionally Kantawala and Turner have studied the performance
improvements that can be obtained for short-lived TCP flows by using more
sophisticated packet schedulers [27]. They have presented two different packet-drop
policies in conjunction with a simple fair queuing scheduler that outperform RED
and Blue packet-drop policies for various configurations and traffic mixes.
Furthermore, Vutukury and Garcia-Luna-Aceves have proposed a heuristic and
efficient algorithm for QoS-routing to accommodate low startup latency and high
call acceptance rates, which is especially attractive for the short-lived flows [54].
Moreover, Jin et al. have developed a new version of TCP, called FAST TCP [26], in
which they use queuing delay in addition to packet loss as a congestion measure.
This allows a finer-grained measure of congestion and helps maintain stability as the
network scales up. Meanwhile FAST TCP employs pacing at the sender to reduce
burstiness and massive losses. It also converges rapidly to a neighborhood of the
equilibrium value after loss recovery by dynamically adjusting the AIMD parameters
with more aggressive increase and less severe decrease as congestion window
evolves. Katabi et al. [28] have decoupled utilization control from fairness control
and proposed a new feedback-based transport protocol with congestion control called
XCP. Their protocol uses explicit congestion notification to adjust the transmission
rate and is shown to outperform TCP in both conventional and high-bandwidth delay
environments in terms of efficiency, fairness, stability, queue sizes and packet losses.
Kuzmanovic and Knightly [34] investigated a class of low-rate UDP denial of
service attacks which are difficult for routers and counter-DOS mechanisms to
detect. They have developed low-rate DoS traffic patterns using short-duration bursts
of UDP flows. Through a combination of analytical modeling, simulation and
Internet experiments, they have shown that such periodic low-rate attacks are highly
successful against both short-lived and long-lived TCP flows.
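As a rough illustration of such a periodic low-rate pattern (the parameter values below are our own assumptions for illustration, not taken from [34]), the attack traffic can be summarized as an on/off burst schedule whose period is matched to the minimum RTO of roughly one second:

```python
def burst_schedule(period=1.0, burst_len=0.15, total_time=10.0):
    """Illustrative on/off schedule for a periodic low-rate burst pattern.

    Each burst lasts on the order of an RTT so that the bottleneck queue fills
    and packets are dropped, and bursts recur once per period (on the order of
    the minimum RTO), so targeted TCP flows repeatedly time out while the
    average attack rate stays low.
    """
    intervals, t = [], 0.0
    while t < total_time:
        intervals.append((round(t, 3), round(min(t + burst_len, total_time), 3)))
        t += period
    return intervals

print(burst_schedule())  # [(0.0, 0.15), (1.0, 1.15), ..., (9.0, 9.15)]
```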
In this rather significant body of work, the performance enhancements in the form of
more efficient congestion control algorithms and new protocols have rendered the
improvement of loss-recovery mechanisms of TCP protocols much less compelling.
Also performance-improving AQM, QoS, scheduling and routing techniques all
preferentially treat some types of flows without contemplating the negative
consequences and increased vulnerabilities of the network in the event of a DoS or a
DDoS attack using these types of flows. In particular, in the context of DoS and
DDoS attacks and the corresponding detection mechanisms, the focus has mostly
been on CBR-based UDP flows.
In summary, the above-mentioned improvements have significantly overshadowed
the potential vulnerabilities in congestion control mechanisms that can be exploited to
invoke the conventional loss recovery mechanisms. There is no study that
systematically tests performance and robustness of this class of protocols with
respect to these mechanisms and evaluates inter-protocol or intra-protocol unfairness
and unfriendliness among flows that belong to the same class (e.g. short-lived and
long-lived TCP flows) under severe congestion (adversarial or not). Also, several
misconceptions prevail:
(a) Membership in this class naturally implies a sufficient level of fairness
and friendliness among the interacting flows.
(b) It is safe to provide preferential treatment to some flows in the network in
order to improve the overall performance.
(c) The adversary of TCP and TCP-friendly flows is none other than UDP
flows.
Our work breaks new ground in that it presents a new perspective and a novel,
systematic approach to creating adversarial congestion scenarios in order to put the
performance and robustness of this class of protocols to the test and provide a measure of
unfairness and unfriendliness among TCP or TCP-friendly flows in terms of the
reduction in their throughput under severe congestion.
Table 1-1 shows the percentage reduction in throughput of long-lived flows when
attacked by UDP flows [34]. P is the congestion recurrence period. The table also
shows that an adversarial scenario using a carefully selected sequence of short-lived
flows achieves nearly equal reduction in throughput [16][17].
1.4.2 XCP related work
The following is a summary of the literature most related to our work in the context
of XCP. The related research in this area addresses a wide range of issues, from simple
timescale modeling, analysis of the XCP equilibrium and simulation-based suggestions
to enhance the congestion control mechanisms of XCP, to implementation and
deployment issues in the Internet.
As also mentioned in TCP related work, Katabi et al. [28] used a control theory
framework in a novel approach to congestion control and developed a new
congestion control mechanism, i.e., XCP that decouples bandwidth utilization from
fairness control and significantly outperforms almost all varieties of TCP transport
protocol in both conventional and high bandwidth-delay environments. It has been
shown that XCP stays fair, achieves high utilization, and maintains small standing
queues and near-zero packet drops in steady state [12][18][28][59].
Table 1-1: Comparing UDP attacks with our adversarial scenarios using
short-lived TCP flows

Type of malicious flows                                    Long-lived TCP flow throughput degradation
UDP constant bit rate flows                                Up to 100%
UDP short bursts with P=1 sec                              More than 90% [34]
Random mix of TCP short-lived and long-lived flows         Up to 10%
Specific pattern of short-lived TCP flows with P=1 sec     More than 85% [16][17]
Cheng et al [12] investigate the dynamics of XCP using a single bottleneck topology
and through analysis and simulation under certain simplifying assumptions show that
the bottleneck link utilization converges to 100% at an exponential rate and that
throughput of any arbitrary set of XCP flows will converge to its max-min fair share
which confirms the findings of the original XCP design paper [28].
S. Low et al [36] theoretically prove that XCP equilibrium solves a constrained
max-min fairness problem and provide an algorithm to compute this equilibrium.
They furthermore show that XCP achieves max-min fairness in a single bottleneck
topology; however, in an arbitrary network of multiple bottleneck links, the additional
constraint in the XCP max-min fairness problem can cause a flow to achieve an
arbitrarily small fraction of its max-min fair rate. While this issue can lead to
unfairness, with proper choice of XCP control parameters, XCP still achieves at least
80% utilization [36].
Wang and Mills [55], however, believe that the window-based dynamic model derived
by S. Low et al. [36] is too complicated to be easily used and therefore propose a
simple rate-based model to analyze XCP equilibrium performance. They conclude
that their model can reproduce most of the results obtained from the window-based
model and derive the appropriate settings for XCP control parameters in order to
achieve high utilization [55].
Zhang and Henderson [59] investigate the implications of deploying XCP in real
networks by conducting an experimental study. They implement XCP in the Linux
kernel and report on implementation challenges due to lack of large native data types
or floating point arithmetic. Furthermore they identify several possibilities for XCP
to enter into incorrect feedback control loops. These include leaving out framing and
Medium Access Control (MAC) overhead in raw bandwidth estimation, error in
estimation of bandwidth in shared medium such as wireless network or Ethernet,
insufficient kernel and socket buffer sizes at both sender and receiver sides and
deployment of XCP in hybrid networks where not all routers are XCP-enabled. The
identified challenges are believed to be intrinsic to the XCP design, which makes it
imperative to further improve XCP design before large-scale changes to end-hosts
and routers and deployment in the real network.
The work of Zhang and Henderson [59] is the most comprehensive related work in
the context of XCP that reveals some oversights in the XCP protocol that may become
problematic during operational deployment of XCP in real networks. While they
show sensitivity of XCP control algorithms to the use of incorrect information
provided by well-intentioned sources, they also point out the potential of a new array
of DoS attacks by misbehaving XCP users that lie about their rate or cheat and
modify the XCP router feedback [59].
Consequently deployment of XCP in public networks may require preventative
measures such as active monitoring on a per flow basis to detect and isolate
misbehaving users [59]. This is clearly in contrast to one of the attractive features of
XCP, i.e. not maintaining a per flow state at the routers [18][28].
In the context of DoS and XCP, C. Wilson et al. [56] investigate DoS attacks where a
small number of misbehaving XCP end-hosts (senders or receivers) can ignore or
overwrite the XCP router feedback to increase their share of the bandwidth, thus
severely degrading performance of other well-intentioned and cooperative XCP users
and stay undetected without any explicit and XCP-specific detection mechanism.
Zhang and Ahmed [60] provide a control theoretic analysis using a revised fluid flow
model for XCP and conclude that XCP will not settle at zero steady state error which
is further shown to be bounded by the estimation error. They suggest that this bound
can be exploited in XCP router queue size planning. Although the results of this
study confirm their earlier observations [59], they suggest using more accurate
models by applying methods from digital control and robust control theory, since
otherwise employing a fully discrete and range-limited model will always render it
nonlinear and not amenable to closed-form analysis [60].
Our XCP case study findings depart from any prior performance and robustness
analysis work in this area since the identified vulnerabilities are not due to
deployment conditions or estimation errors. Our adversarial scenarios reveal
vulnerabilities assuming perfect deployment conditions, zero capacity estimation
error and full compliance with XCP protocol specification.
As a DoS or DDoS attack, our adversarial congestion scenarios can create the same
severe outcome of the above-mentioned attacks due to other vulnerabilities in XCP
congestion control mechanisms and stay undetected. Unlike other DoS or DDoS
attacks, our scenarios do not require any change or modification in the XCP control
algorithms. In fact our XCP senders, receivers and routers fully comply with XCP
protocol specification and do not cheat or lie about the rate or the amount of
feedback.
Our XCP case study also provides a discrete range-limited fluid flow model for the
aggregate XCP throughput and our simulation results confirm the accuracy of this
model. Unlike the model in [12], our throughput model captures both transient and
steady state aggregate behavior of XCP flows.
Chapter 2
TCP vs. TCP
This chapter presents a comprehensive case study on testing the robustness and
performance of four versions of the TCP protocol (Tahoe, Reno, NewReno and SACK)
with two packet drop policies (DropTail and RED) under severe congestion.
2.1 Background on TCP
In this section we provide a background on TCP congestion control algorithms and
loss recovery mechanisms and a summary of existing variations of TCP.
Adaptive congestion control is an end-to-end distributed algorithm employed at the
transport level of the TCP/IP layered architecture to ensure fairness, stability and
high utilization when network resources namely bandwidth and buffer space are
shared among competing network flows [33][43][50][52]. The first proposed
congestion-responsive transport protocol is TCP [44] which carries the majority of
the Internet traffic today. TCP is developed as a highly reliable, end-to-end, window-
based protocol for computer networks. Modern implementations of TCP contain four
intertwined algorithms: slow start, fast retransmit, fast recovery and congestion
avoidance [2]. In order to avoid flooding the receiver and the network, the TCP sender
must regulate the rate at which it injects segments into the network by observing the
rate at which acknowledgments (ACKs) are received. To perform this flow control, the
TCP sender and receiver maintain two parameters, respectively: a window of outstanding
(unacknowledged) segments, i.e., the congestion window, and a window size advertised
by the receiver (awnd). The congestion window (cwnd) allows the TCP sender to assess
the level of perceived network congestion, while the advertised window (awnd) informs
the sender about the available buffer space at the receiver. The sender then only
transmits up to the minimum of cwnd and awnd [2][33][43][50][52].
2.1.1 Slow start
When a new TCP connection is established, TCP enters slow start where cwnd
evolves exponentially. On each acknowledgement for new data, cwnd is increased by
one segment until at some point the capacity of the network is reached and packet
losses are experienced at congested routers [2][25][33][43][50][52].
2.1.2 Congestion avoidance
During the congestion avoidance regime, cwnd is increased either by one segment per
round trip time or one segment per window (Additive Increase); and if a packet loss
is detected by receiving three or more duplicate ACKs, cwnd is reduced to half its
current size (Multiplicative Decrease) [2][25][33][43][50][52].
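A compact sketch of the window growth described above, with cwnd measured in segments (an illustrative simplification of ours, not an exact TCP implementation):

```python
def on_new_ack(cwnd, ssthresh, awnd):
    """Grow cwnd on each ACK of new data and return the new send window.

    Slow start: +1 segment per ACK (exponential growth per RTT).
    Congestion avoidance: +1/cwnd per ACK (roughly +1 segment per RTT).
    The sender may keep at most min(cwnd, awnd) segments outstanding.
    """
    if cwnd < ssthresh:
        cwnd += 1.0              # slow start
    else:
        cwnd += 1.0 / cwnd       # additive increase
    return cwnd, min(cwnd, awnd)
```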
2.1.3 Loss recovery mechanisms
For TCP, there are two indications of packet loss: (a) the reception of duplicate
ACKs, and (b) the expiration of the timeout timer. A TCP receiver sends a duplicate
acknowledgement each time it receives a segment with an out-of-order sequence
number due to packet loss, delay variations, bit errors, etc. The TCP sender, however,
interprets the reception of three or more duplicate ACKs as a strong indication of a
packet loss due to congestion, and performs a loss recovery function based on
the following mechanisms: fast retransmission and fast recovery [2]
[20][25][33][43][50][52].
2.1.3.1 Fast retransmit
In earlier releases of TCP such as TCP Tahoe [19], fast retransmission was
implemented as a loss recovery measure less drastic than timeout. Using the fast
retransmission algorithm, upon reception of the third duplicate ACK, the sender records
half of the current value of cwnd (at least two segments or packets) in the slow start
threshold (ssthresh) parameter, then sets cwnd to one segment and, without
waiting for the retransmission timer to expire, retransmits the lost packet. After
successful loss recovery, the sender invokes slow start until cwnd is equal to ssthresh, at
which point congestion avoidance takes over [2][33][43][50][52].
2.1.3.2 Fast recovery
TCP Reno [19] featured a major improvement over TCP Tahoe by using the fast
recovery mechanism together with fast retransmission. The intuition behind fast
recovery is that if the network can deliver duplicate ACKs, it must not be heavily
congested. Therefore it is possible to use the network resources to some extent while
in fast retransmission. Furthermore it would be unnecessary to make an abrupt
change in transmission rate by invoking slow start after loss recovery under
moderate congestion conditions. In fast recovery, ssthresh is set to half of the current
size of cwnd, and cwnd is reduced to ssthresh plus the number of duplicate ACKs
required to trigger fast retransmit, i.e., three segments. Each time another duplicate
ACK arrives, cwnd is incremented by one segment and a new packet is injected into the
network, until the retransmitted packet and all the packets sent afterwards are
acknowledged. The sender then exits fast recovery, sets cwnd to ssthresh and enters
congestion avoidance [2][20][33][43][50][52].
2.1.3.3 Time out
In the event that packet losses are detected by the expiration of a timer (timeout
mechanism), TCP sets the ssthresh to half of the current cwnd, reduces cwnd to one
segment, retransmits the missing segment, performs slow start until cwnd reaches
ssthresh and then enters congestion avoidance [2][33][43][50][52]. Thus, TCP
congestion control mechanisms run on two timescales, one short and one long:
Round Trip Time and Retransmission Time Out, respectively. Allman and Paxson
have experimentally shown that TCP nearly obtains maximum throughput if there
exists a lower bound of one second for RTO [2]. Moreover, they found that in order
to achieve the best performance and ensure that the congestion is cleared, all flows
are required to have a minimum timeout of one second.
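The loss recovery reactions described in Sections 2.1.3.1 to 2.1.3.3 can be summarized in the following sketch (an illustrative simplification of ours, with cwnd in segments):

```python
MIN_RTO = 1.0  # lower bound of one second on the retransmission timeout [2]

def on_three_dup_acks(cwnd):
    """Reno-style reaction to the third duplicate ACK (fast retransmit/recovery):
    record half the window in ssthresh, inflate cwnd by the three duplicates,
    and retransmit the lost segment without waiting for the RTO timer."""
    ssthresh = max(cwnd / 2.0, 2.0)
    cwnd = ssthresh + 3.0
    return cwnd, ssthresh           # the lost segment is retransmitted here

def on_timeout(cwnd, rto):
    """Reaction to an expired retransmission timer: halve ssthresh, shrink cwnd
    to one segment (restart in slow start) and back off the timeout value."""
    ssthresh = max(cwnd / 2.0, 2.0)
    cwnd = 1.0
    rto = max(2.0 * rto, MIN_RTO)   # RTO is doubled on each successive timeout
    return cwnd, ssthresh, rto      # the missing segment is retransmitted here
```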
2.1.4 TCP New Reno
Although TCP Reno presented a major performance enhancement over TCP Tahoe
for recovering from a single packet loss without reducing the congestion window to
one, it did not address the problem of multiple packet losses within a congestion
window [19][33][43][50][52]. In order to recover from multiple packet losses in a
window, the TCP sender needs to receive three duplicate ACKs for each lost packet.
However, since it halves cwnd every time it enters fast retransmission, it is nearly
impossible to recover from multiple packet losses by fast retransmission alone. In other
words, as the congestion window gets smaller, there will not be enough packets left
in the window to transmit and trigger duplicate ACKs, so eventually a timeout will
occur.
A solution is provided in the next release of TCP called New Reno [20]. During fast
recovery, for every duplicate ACK, an unacknowledged packet from the front of the
transmission buffer is retransmitted. Furthermore, for every regular ACK (not a
duplicate ACK) that acknowledges an outstanding packet, a new packet from the
back of the transmit buffer is sent, which also resets the retransmission timer.
Retransmitting outstanding packets allows recovering from multiple losses (holes in
the range of sequence numbers) within a window, and transmitting new data helps
maintain high utilization, all as a form of loss recovery without going into
timeout. TCP New Reno [20] therefore outperforms TCP Reno and TCP Tahoe
significantly.
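The distinguishing behavior of New Reno during fast recovery can be sketched as follows (an illustrative simplification of ours, where recover_seq denotes the highest sequence number outstanding when loss was detected):

```python
def on_ack_in_fast_recovery(ack_seq, recover_seq, retransmit, send_new):
    """New Reno sketch: a partial ACK (one that does not cover recover_seq)
    indicates another hole in the window, so the next missing segment is
    retransmitted and fast recovery continues; a full ACK ends fast recovery."""
    if ack_seq < recover_seq:
        retransmit(ack_seq)   # resend the first still-unacknowledged segment
        send_new()            # keep new data flowing to maintain utilization
        return True           # remain in fast recovery
    return False              # full ACK: exit fast recovery
```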
2.1.5 TCP SACK
Another solution is offered in the form of an option in another extension to TCP
Reno. During loss recovery, the selective acknowledgement (SACK) option [19] is
enabled by using some extra bytes in each duplicate ACK to carry information about
the packets that have been delivered successfully. Therefore, it suffices for a TCP
sender supporting the SACK option to selectively retransmit the packets that have been
lost, and hence recover from both single and multiple packet losses within a window.
2.1.6 Other TCP variants
Over time, many revisions and improvements of TCP congestion control algorithms
have been introduced in the form of new variants of TCP, including TCP Vegas [7], TCP
Hybla [10], TCP Westwood [11], H-TCP [49], BIC TCP [57], FAST [26], XCP
[18][28], Scalable TCP [31] and HS-TCP [21]. The loss recovery mechanisms of
these new variants, however, have undergone little or no change, and in the event that
their congestion control mechanisms fail to avoid congestion and packet drops
happen, they still rely on legacy loss recovery mechanisms as the last resort to recover
from congestion and packet losses. There is, however, no systematic study that tests
the robustness of their congestion control mechanisms and evaluates the
vulnerabilities of this large class of protocols under severe congestion conditions
with respect to these mechanisms.
In this case study [16][17], we test robustness of Tahoe, Reno, New Reno and SACK
with different packet drop policies (DropTail, RED[22]). This comprehensive case
study and the obtained conclusions provide both motivation and a solid foundation to
develop a systematic framework to test performance and robustness of other
transport protocols with congestion control. We apply our framework to test
performance and robustness of XCP under severe congestion and provide a
comparative analysis based on the results produced by our adversarial congestion
scenarios.
2.2 Interaction of TCP short-lived and long-lived flows
The focus of this study is to systematically investigate the adverse impact of short-
lived TCP flows on long-lived TCP flows. Our work significantly departs from
prior studies in several ways. First, we use short-lived TCP flows (not UDP flows)
in adversarial congestion scenarios against long-lived TCP flows. This allows us to
consider (a) scenarios where short-lived flows are malicious, i.e., designed to
intentionally disrupt long-lived flows, as well as (b) scenarios where the short-lived
flows are normal flows that coincidently adversely affect the long-lived flows.
Second, in contrast to previously studied scenarios using UDP flows against TCP
flows [34], we show that scenarios using short-lived flows at bottleneck links do not
necessarily cause maximum performance degradation for long-lived TCP flows.
Finally, we derive rules to identify locations and durations of short-lived flows, and
intervals between them that cause significant throughput degradation for long-lived
flows. Our work is the first one to study and generate scenarios in which short-lived
TCP flows target long-lived TCP flows so as to drastically affect their performance
[16][17].
We evaluate the effectiveness of our scenarios by measuring the reduction in
throughput of long-lived TCP flows. Simulation results show more than 85%
reduction for various TCP flavors (Tahoe, Reno, New Reno and Sack) and different
packet drop policies (DropTail, RED[22]) [16][17].
The scenarios where the short-lived flows are normal flows that severely affect long-
lived flows are useful for better characterizing worst-case performance of TCP. This
can be especially useful in cases requiring satisfaction of QoS guarantees. These
scenarios can also help obtain better estimates of average-case performance of TCP.
2.3 Arrangement of an adversarial congestion scenario
Consider an illustrative adversarial scenario with a single long-lived TCP flow that
passes through a bottleneck link along its path (Figure 2-1).
Figure 2-1: An example of an adversarial scenario
Initially, the long-lived TCP flow is in the congestion avoidance phase and performs
AIMD. We assume that the maximum congestion window is limited by the
bottleneck capacity. Now assume that a malicious user creates severe congestion on
a link along the path of the long-lived flow by sending multiple short-lived TCP
flows that can be visualized as a series of spikes (Figure 2-2). During each burst,
when the total traffic generated by the short-lived flows and the single long-lived
flow exceeds the capacity of that link, packet losses induced are sufficient to force
the long-lived flow to back off for an RTO of one second [16][17][34]. Suppose the
RTO timer expires at time=t1. At this time the congestion window is set to one, the
value of RTO is doubled and packet transmission is resumed by retransmission of the
packet lost at time=t1. Now if the same pattern of short-lived flows repeats between
time=t1 and time=t1+2RTT such that the retransmitted packet is also lost, the sender
of the long-lived TCP flow now has to wait for 2RTO seconds until the
retransmission timer expires. As a result, the long-lived TCP flow repeatedly enters
the retransmission time out phase, whose duration is doubled every time retransmission of a
lost packet fails. Consequently, the long-lived flow obtains nearly zero throughput.
As a result of this congestion scenario, the throughput of the long-lived TCP flow is
reduced by almost 100%. However, the short-lived TCP flows (unlike UDP attacks)
are also affected by the congestion since their window evolution is subject to the
TCP slow start rules. Hence the efficacy of short-lived TCP flows in conducting the
above scenario is unclear.
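The compounding cost of these repeated losses can be seen in a small back-of-the-envelope sketch (illustrative numbers; the six consecutive losses and the 1-second minimum RTO are assumed, not measured):

def backoff_schedule(min_rto=1.0, consecutive_losses=6):
    # Each failed retransmission doubles the RTO, so after k consecutive losses the
    # sender has been silent for min_rto * (2**k - 1) seconds in total.
    waits = [min_rto * 2 ** k for k in range(consecutive_losses)]
    return waits, sum(waits)

waits, total_idle = backoff_schedule()
print(waits)        # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
print(total_idle)   # 63.0 seconds of near-zero throughput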
Let us define the following adversarial scenarios. Figure 2-2 depicts the periodically-
injected short-lived flows.

Figure 2-2: General pattern of short-lived flows (spikes (i,j) of duration d(i,j), gaps g(i,j)
between successive spikes, an additional gap G at the end of each period P, peak congestion
windows W(i,j), last power-of-two windows M(i,j) and partial-window data r(i,j))

Each spike is a short-lived TCP flow in slow start and its
congestion window grows exponentially from 1 to its maximum value W(i,j) before it either
hits congestion and enters time out or is terminated. In general, the maximum achieved
window size W(i,j) may not be a power of two. Let M(i,j) denote the last value of the
congestion window that is a power of two and r(i,j) stand for the amount of data (in
bytes) that is transferred in a partial window afterwards. Also let C denote the capacity
of the targeted link, R_L the aggregate throughput of the long-lived flows, d(i,j) the duration
of the spike - which is the time it takes for a short-lived flow to time out or be
terminated - R the average rate of the spikes in a period of P sec, and N(i,j) the total
number of packets sent in a spike. In general, there can be n groups of spikes in one period,
with a time gap g(i,j) between successive spikes and another time gap G at the end of
each periodic interval before the next interval starts.
In order to force the long-lived flows to time out, the overall throughput of all flows
(short-lived and long-lived) should exceed the targeted link capacity such that many
packets are lost from the corresponding window of data. Therefore the average rate of the
short-lived flows in a period should satisfy the following condition:

R \geq C - R_L .    (1)

Since short-lived flows are in slow start and the congestion window evolves
exponentially, it takes (\log_2 M(i,j) + 1) round trips to transmit the full windows and one
more round trip to send the partial window. Equation (2) gives d(i,j) in terms of M(i,j),
r(i,j) and the round trip time RTT of the short-lived flows:

d(i,j) = ( \log_2 M(i,j) + 1 + \lceil r(i,j)/W(i,j) \rceil ) \cdot RTT .    (2)

Also the total number of data packets sent in each spike (i.e. N(i,j)) can easily be
found from M(i,j) and r(i,j) as shown in Equation (3):

N(i,j) = 2 M(i,j) - 1 + r(i,j) .    (3)

Equation (4) gives the period of the congestion interval in terms of the durations of
spikes and the time gaps:

P = \sum_{i=1}^{n} ( d(i,j) + g(i,j) ) + G .    (4)

Thus the average rate of the short-lived flows in a period is the sum of the throughputs of
the rows in Figure 2-2. The throughput of each row is the amount of data transmitted
in one interval by the spikes in that row divided by the interval duration. Equation
(5) shows the average rate of the short-lived flows in a period:

R = \sum_{j=1}^{m} \frac{ S \sum_{i=1}^{n} N(i,j) }{ \sum_{i=1}^{n} ( d(i,j) + g(i,j) ) + G } ,    (5)

where S denotes the packet size.
In order to satisfy the condition in (1), R should be maximized. Therefore the time gaps
in Figure 2-2 should be removed, i.e. g(i,j) = G = 0. Using these values, we get:

R = \sum_{j=1}^{m} \frac{ S \sum_{i=1}^{n} N(i,j) }{ \sum_{i=1}^{n} d(i,j) } .    (6)

Also it seems that increasing n would augment the summation in the numerator of
Equation (6) and consequently boost R. However it should be noted that the value of
N(i,j) inversely depends on n. In other words, increasing n means placing more non-
overlapping groups of short-lived flows in a period, which results in smaller spikes
(both in height and width) and consequently smaller N(i,j). However if there is only one
group of short-lived flows in a period, they will have the entire period to open and
grow their congestion window to the value allowed by the spare capacity of the
targeted link.
Since the flows in this group are fully overlapped, R is the sum of their throughputs.
Hence increasing m will increase R until the condition in (1) is satisfied
(Equation (7)). Conversely, further increasing m will result in smaller spikes, since the
short-lived flows will start competing with each other. Hence we put n = 1 and
bound R to C - R_L in Equation (6). By substituting in Equation (1) we get:

R = \sum_{j=1}^{m} \frac{ N(1,j) \cdot S }{ d(1,j) } \geq C - R_L .    (7)
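To make the rate condition concrete, the following minimal Python sketch (illustrative only; the capacity, long-lived throughput, window and packet-size values are hypothetical, and it uses the forms of Equations (3) and (7) as reconstructed above) estimates how many concurrent short-lived flows a single group would need in order to exceed the spare capacity of a targeted link:

import math

def packets_per_spike(last_pow2_window, partial_packets):
    # Equation (3): packets sent while the window doubles 1, 2, ..., M, plus the partial window.
    return 2 * last_pow2_window - 1 + partial_packets

def flows_needed(capacity_bps, long_lived_bps, spike_duration_s, pkt_bytes,
                 last_pow2_window, partial_packets):
    # Equation (7) with n = 1: the group's aggregate rate m * N * S / d must reach C - R_L.
    spare_bps = capacity_bps - long_lived_bps
    per_flow_bps = packets_per_spike(last_pow2_window, partial_packets) * pkt_bytes * 8 / spike_duration_s
    return math.ceil(spare_bps / per_flow_bps)

# Hypothetical numbers: a 3 Mbps link carrying 2 Mbps of long-lived traffic,
# 1-second spikes, 1000-byte packets, windows peaking at 16 packets plus 4 extra.
print(flows_needed(3e6, 2e6, 1.0, 1000, last_pow2_window=16, partial_packets=4))   # 4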
Now recall from the illustrative adversarial scenario that the effective time interval is
of the order of the RTT of the long-lived flows. Therefore the condition in (1) on the average
rate seems rather conservative. Figure 2-3 shows a single group of fully-overlapped
short-lived flows in each congestion interval. As can be seen from this Figure, short-
lived flows have a high transmission rate around the peak of the spikes. In other
words, it suffices to have this peak rate exceed the spare capacity of the targeted link,
provided that the buffers are already full. However, it takes a time t_f to fill the
buffers along the targeted path. Suppose the initial queue size in the buffer is q and
the maximum buffer size is B. Then the time it takes to completely fill the buffers is:

t_f = \frac{ B - q }{ R' + R_L - C } ,    (8)

where R' is the throughput of the short flows during this time interval.
Equation (9) shows R' in terms of M(1,j), r(1,j) and d(1,j) for a single group of short-lived
flows, similar to Equation (7) for R:

R' = \sum_{j=1}^{m} \frac{ N(1,j) \cdot S }{ d(1,j) } .    (9)

The modified condition is therefore:

R_{peak} \geq C - R_L ,    (10)

where R_{peak} is the throughput of the short-lived flows during the effective congestion
interval d_{eff}.

Figure 2-3: A single group of fully-overlapped short-lived flows in each congestion interval
However by this time the buffers of the targeted link(s) are nearly full, therefore the
short-lived flows can send at most one more round of packets, i.e., at most one full
window before they start losing packets. Since the short-lived flows are still in the slow
start phase, the next full window will be at most twice as large as the last full
congestion window. Therefore R_{peak} for a single group of m short-lived flows is:

R_{peak} = \sum_{j=1}^{m} \frac{ 2 M'(1,j) \cdot S }{ RTT } ,    (11)

where M'(1,j) is the last full congestion window of flow j at the time the buffers become full.
Naturally it follows that the effective interval d_{eff} is the portion of the spike that remains
after the buffers have been filled. As mentioned before, in the case of a single long-lived
flow, d_{eff} is of the order of the RTT of the long-lived flow. In a heterogeneous environment
with multiple long-lived flows with different round trip times, an analogous argument
suggests that d_{eff} should be greater than or equal to all the round trip times of the
long-lived flows. Hence most of the long-lived flows (ideally all of them) are forced to
time out simultaneously for at least RTO seconds. In this case RTO is the minimum
retransmission time out among the heterogeneous long-lived flows. Obviously during the
timeout phase, the condition in (8) does not hold. Furthermore, the interval between
successive d_{eff} intervals, which is ideally of the order of RTO, gives the short-lived
flows a chance to gain more energy, increase their overall rate and prepare to create
congestion during d_{eff}.
2.4 Investigating effect of temporal, spatial and scaling parameters
We take the idea of creating such sustained patterns of severe congestion and explore
it in several dimensions. We investigate the scalability of such scenarios in terms of
the number of long-lived flows. Furthermore, we study the effects of the temporal
distribution of malicious flows. Additionally, we investigate the spatial distributions
of various adversarial scenarios on multiple links in an attempt to locate the most
vulnerable targets. Ultimately, we suggest the most effective settings for the pattern
of short-lived flows during the congestion interval that maximizes its effects on the
performance of the long-lived TCP flows [16][17].
Table 2-1 summarizes the results obtained for various settings. Our simulation
results indicate that in a random mix of traffic where the target location is also
randomly selected among all the links shared by short-lived and long-lived flows, the
throughput degradation for long-lived flows is less than 10%. The details will be
explained in the next section. Since the steady state performance of long-lived TCP
flows (such as FTP transfers that are used in the simulation of congestion scenarios)
is characterized by their throughput, we consider the percentage reduction in overall
throughput of long-lived flows as the evaluation metric.
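For reference, this evaluation metric is simply computed as below (a trivial helper with made-up throughput numbers):

def throughput_reduction_percentage(baseline_bps, under_attack_bps):
    # Percentage reduction in the overall throughput of the long-lived flows.
    return 100.0 * (baseline_bps - under_attack_bps) / baseline_bps

print(throughput_reduction_percentage(2.8e6, 0.35e6))   # 87.5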
2.5 Packet-level simulations
In this section we explore the impact of short-lived TCP adversarial scenarios on the
performance of long-lived TCP flows [16][17]. We designed a series of detailed
packet level adversarial scenarios to answer the following questions. As the number
of long-lived TCP flows increases, how should the pattern of the adversarial scenario
formed by short-lived TCP spikes change in order to maintain the same near-zero
throughput? What links should be the targets of short-lived TCP adversarial
scenarios to achieve the most aggregate throughput reduction? In an arbitrary
topology some long-lived TCP flows may share one or more bottlenecks but all may
not share the same bottlenecks. In a large-scale scenario, how are the period and
duration of short-lived TCP spikes determined? The first group of such scenarios is
designed for a chain topology to study the effects of aggregation of homogeneous (in
terms of RTT) long-lived TCP flows in the event of an adversarial scenario created by
short-lived TCP flows. The second group of adversarial scenarios is designed for an
arbitrary topology (produced using a random topology generator) to investigate the
effects of aggregation of heterogeneous (in terms of RTT) long-lived TCP flows with
multiple bottleneck links. In general, we refer to the link(s) with the minimum unused
capacity as the bottleneck(s).

Table 2-1: Effect of adversarial scenarios using short-lived TCP flows on
throughput of long-lived TCP flows

  Congestion recurrence period, P    d = 0.5 sec    d = 0.75 sec    d = 1 sec
  0.5 sec
  1 sec                              > 75%          > 80%           > 80%
  1.5 sec                            > 80%          > 85%           > 85%
  2 sec                              > 75%          > 80%           > 85%
  2.5 sec                            > 65%          > 65%           > 65%
2.6 Single bottleneck scenarios
In this section, we describe our adversarial scenarios on the single bottleneck
topology (shown in Figure 2-4) simulated using ns-2 [8].

Figure 2-4: Single bottleneck topology (links L1, L2, L4, L5 and L6-L10: 5 Mbps bandwidth,
10 ms delay; bottleneck link L3: 3 Mbps, 5 ms delay; links L11-L15: 5 Mbps, 50 ms delay)

In this topology, five groups of long-lived TCP flows share a chain of links, and each
of these links is also shared with a group of short flows. Link L3 is the bottleneck
link with a bandwidth of 3 Mbps and a one-way propagation delay of 5 msec. Initially,
the long-lived TCP flows are in the congestion avoidance phase. We assume that the
maximum congestion windows of all TCP connections are limited by the network
capacity. Short-lived TCP flows are periodically injected on links L1 to L5 for five
successive intervals (periods) according to the pattern depicted in Figure 2-3. There
are 5 sets of concurrent short-lived TCP flows in each time slot. All short-lived TCP
flows that are in a set have the same source and destination. Each set is identified by
one of the following pairs of source and destination nodes: (11, 1), (12, 2), (13, 3),
(14, 4), (15, 5). For instance, Set 3 is the group of short-lived flows that start at node
13 and end at node 3.
We measure the aggregate throughput reduction percentage of the long flows and
plot it vs. m, the number of concurrent short-lived flows in a set sent in a time slot.
We also measure the overall throughput of short-lived flows and plot it against m. We try to
identify the effective values for the parameters involved in the adversarial scenario pattern,
i.e., d, P, m and n. Preliminary observations indicate that temporal distribution of such
scenarios on these links is less effective as compared to simultaneous scenarios on
the targeted links. Therefore, throughout the rest of the adversarial scenarios, malicious
short-lived flows are spatially distributed on multiple links but temporally
concurrent. In this case d refers only to the slow start duration.
In the following section, we present a baseline set of simulations to identify the best
settings for parameters of the short-lived flows. We verify the findings from our
analytical model for short-lived flows through extensive simulations with different
parameter settings for duration, period and aggregation of short-lived flows.
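As an illustration of how such an injection pattern could be scripted for a simulator or traffic generator (a minimal sketch with the baseline values d = P = 1 sec; it is not the actual simulation scripts used here), the start and stop times of the short-lived flow sets can be laid out as follows:

def spike_schedule(num_periods=5, period=1.0, duration=1.0, flows_per_set=30, num_sets=5):
    # One congestion interval per period: in every period, num_sets sets of
    # flows_per_set concurrent short-lived flows start together and are terminated
    # after duration seconds.
    events = []
    for k in range(num_periods):
        start = k * period
        for s in range(num_sets):
            events.append({"set": s + 1, "start": start,
                           "stop": start + duration, "flows": flows_per_set})
    return events

for event in spike_schedule(num_periods=2):
    print(event)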
2.6.1 Effect of d, P, m and n on throughput reduction
In Figures 2-5 and 2-6, all the short-lived flows are terminated after d = 0.75 sec
and d = 1 sec, respectively. The period of congestion recurrence is changed from
P = 0.5 sec to P = 2.5 sec and the throughput reduction percentage of long-lived flows is
plotted vs. m, for n = 1 (Figures 2-5 and 2-6).
In this set of simulations, the TCP version for both long-lived and short-lived flows
is Reno and the packet drop policy at all buffers is DropTail. As can be seen in these
Figures, regardless of the value of m, the highest percentage reduction corresponds to
a congestion recurrence period of one second. Interestingly, this is the default value of
the TCP Retransmission Time Out (RTO). In other words, when the congestion
recurrence period matches the RTO of the TCP long flows, the throughput
degradation is maximized. As the congestion recurrence period increases, the gap
between expiration of the retransmission timers of the long flows and the subsequent
congestion interval increases and consequently long flows obtain higher throughputs.
In Figures 2-5 and 2-6, for m greater than 15, most of the short-lived flows are already in the
time out phase when they are terminated at 1 sec. For m up to 15 the most
effective scenario corresponds to d = 1 sec and P = 1 sec. In this case, even a low-
rate stream of malicious flows can significantly reduce the throughput of the long-
lived flows. For higher values of m, the percentage reduction is almost equal for
d = 0.75 sec and d = 1 sec. The time it takes for the short-lived flows in these
scenarios to enter retransmission time out varies between about 0.75 sec and 1 sec.
For m up to 15 and d = 1 sec, however, most of the short-lived flows are still in slow
start, competing with long-lived flows for 0.25 sec longer than when d = 0.75 sec,
and obviously at a higher rate. As a result, more and more long-lived flows are
forced to time out for m up to 15. As m increases, more short-lived flows enter time out,
thus the throughput reduction percentage for long-lived flows saturates and
increasing d from 0.75 sec to 1 sec does not make much difference.

Figure 2-5: Effect of congestion recurrence period on throughput reduction % for d = 0.75 sec
(throughput reduction percentage vs. m, the number of short flows per interval, for P = 0.5 to 2.5 sec)

Figure 2-6: Effect of congestion recurrence period on throughput reduction % for d = 1 sec
(throughput reduction percentage vs. m for P = 0.5 to 2.5 sec)

As indicated by the shape of the curves, increasing the rate of malicious flows does
not monotonically increase the percentage reduction in the throughput of long-lived
flows. Once the overall throughput of the long-lived and malicious short-lived flows
reaches the capacity of the corresponding shared links, increasing the rate of the
malicious short flows does not further reduce the throughput of the long-lived flows.
Figure 2-7 depicts the frequency response of the long-lived TCP flows to the short-lived
TCP flow adversarial scenarios, in terms of percentage throughput reduction vs. the
period of congestion recurrence, for several values of m (the number of concurrent
short-lived flows per interval). The most degradation in throughput of the long-lived
flows occurs when the congestion recurrence period is close to the TCP null
frequency, which is identified to be the minimum retransmission time out, i.e., 1 sec.
However, it is worth mentioning that the TCP null frequency is practically more than 1
sec, since all flows will not enter time out exactly at the same time.

Figure 2-7: Frequency response of the long-lived TCP flows for d = 1 sec
(throughput reduction percentage vs. attack period P in seconds, for m = 10, 20 and 30)
In [34], Kuzmanovic and Knightly have developed a simple model to capture the
frequency response behavior of a long-lived TCP flow under UDP short-burst
attacks. Equation (12) gives their model for the normalized throughput \rho(T) of a single
long-lived TCP flow in terms of the attack period T and the minimum retransmission
time out RTO_{min}:

\rho(T) = \frac{ \lceil RTO_{min}/T \rceil \, T - RTO_{min} }{ \lceil RTO_{min}/T \rceil \, T } .    (12)
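A quick numeric check of this model, as reconstructed above (with the 1-second minimum RTO used throughout this chapter), shows the null at short attack periods:

import math

def normalized_throughput(period_s, min_rto=1.0):
    # Model from [34]: the flow is silenced for min_rto after each burst and transmits
    # until the next burst; the ceiling handles periods shorter than min_rto.
    k = math.ceil(min_rto / period_s)
    return (k * period_s - min_rto) / (k * period_s)

for p in (0.5, 1.0, 1.5, 2.0):
    print(p, round(normalized_throughput(p), 2))   # 0.0, 0.0, 0.33, 0.5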
Although long-lived TCP flows in our scenarios suffer from the same null frequency,
their frequency response at other frequencies does not follow the trend in [34].
Figure 2-8 shows the overall throughput of the short-lived flows vs. m, for different
congestion recurrence periods. For a fixed number of short-lived flows with a fixed
duration of d = 1 sec, as the period of the congestion recurrence increases, the time
gap between successive congestion intervals becomes larger. Consequently the
average throughput of short-lived flows decreases.
Figure 2-9 shows the effect of n on the performance of long-lived flows. During
each congestion recurrence period, n consecutive groups of simultaneous short-
lived flows are injected on the targeted links. In order to fit more such groups in one
period, i.e., to increase n, d must decrease, i.e., short-lived flows must be terminated
sooner. Consequently the overall rate of the short flows will decrease and fewer
long-lived flows will time out. As can be seen, at n = 1, the throughput reduction
percentage for the long-lived flows is maximized. According to the observations, it
seems that d = 1 sec, P = 1 sec and n = 1 are thus far the most effective settings
[16][17]. The rest of the simulation results are obtained with these settings.
Figure 2-8: Overall throughput of short-lived flows (overall short-flow throughput in KB/sec
vs. m, the number of short flows per interval, for d = 1 sec and P = 0.5 to 2.5 sec)
In the following set of simulations, we study the scalability of our adversarial
scenarios in terms of long-lived flows and evaluate the effectiveness of our
adversarial scenarios on large aggregations of long-lived flows.
Figure 2-10 depicts the effect of aggregation of homogeneous long-lived flows. As
the number of long-lived flows increases, the throughput reduction percentage
slightly decreases. However the adversarial scenarios are still highly successful and
severely degrade the throughput of long-lived flows.
Figure 2-9: Effect of n on throughput reduction percentage (throughput reduction percentage
vs. m, the number of short flows per interval, for d = 1 sec, P = 1 sec and n = 1, 2, 4, 6 and 8)
2.6.2 Testing other TCP variants and packet drop policies
We further explore the effect of using several TCP variants for different packet drop
policies. Figure 2-11 shows the simulation results for different TCP variants with
DropTail packet drop policy. TCP variants are modifications of the original TCP
algorithm to help TCP flows survive multiple packet drops from a window or within
an RTT and to help maintain a high throughput under moderate congestion. However
simulation results indicate that all variants are significantly affected by the
adversarial scenarios. This is because when many short-lived flows, all in slow-start,
compete with long-lived TCP flows that are in congestion avoidance, so many
packets from the window are lost that even these improvements fail to make up for
vulnerabilities inherent to TCP.

Figure 2-10: Effect of aggregation of long-lived flows (throughput reduction percentage
vs. the number of short flows per interval, for d = 1 sec and P = 1 sec, with 5 groups of
3, 6, 9 and 12 long-lived flows)
Authors in [34] proposed randomization of RTO to reduce the effectiveness of the
short-burst UDP attacks. We have not yet tested our scenarios against this method.

Figure 2-11: Comparing various TCP flavors, DropTail (throughput reduction percentage
vs. m, the number of short flows per interval, for d = 1 sec and P = 1 sec; Reno, New Reno,
Tahoe and Sack with DropTail)

Figure 2-12 demonstrates the same effect for the above TCP variants with RED
packet drop policy. The main objective of employing RED in routers is to prevent
global synchronization of TCP flows. It has been shown in [34] that RED is unable
to avoid the synchronization effects if the source of synchronization is an external
malicious source such as DoS attacks. Interestingly, we observe that even if the
source of synchronization is TCP itself, RED fails to avoid the synchronization of
TCP flows, cutting their window in half or entering the time out almost
simultaneously. Since long-lived TCP flows have the same minimum retransmission
time out value, when the congestion recurrence period matches this value, it
repeatedly induces a synchronization effect by forcing the long-lived flows to enter
the time out at about the same time and exit the time out nearly together.
Figure 2-12: Comparing various TCP flavors, RED (throughput reduction percentage vs. m,
the number of short flows per interval, for d = 1 sec and P = 1 sec; Reno, New Reno, Tahoe
and Sack with RED)
2.6.3 Testing effect of targeting different links
Although the impact of target location is not explicitly captured in our analytical
model for short-lived flows, it is of great importance to the outcome of the
adversarial scenario. Through this set of simulations, we identify the most vulnerable
target locations in order to effectively disrupt the throughput of long-lived TCP
flows. Figure 2-13 depicts the effect of adversarial scenarios on different links in
Figure 2-4 for the following settings: 1 , 1 and 30 . The
horizontal axis shows the targeted links between node (a) and node (b). For instance
‘2to4’ means targeting links L3 and L4 that are between node 2 and node 4.
Figure 2-13: Comparing the effects of targeting different links from node a to node b
(throughput reduction percentage for targeted paths 0to1 through 4to5, for d = 1 sec,
P = 1 sec and m = 30)
As can be seen in the first five bars, the percentage throughput reduction increases as
more links are targeted. The 5th bar shows the largest throughput reduction
percentage when all 5 links (L1 to L5) are targeted. The next 4 bars correspond to the
congestion recurrence intervals initiated on links between node 1 and node 2, 3, 4
and 5, respectively. Again increasing the targeted path length increases the
throughput degradation of the long-lived flows. The next bar shows the effect of
targeting the bottleneck link. Surprisingly, the bottleneck link turns out to be more
robust in these scenarios. The key reason for this behavior goes back to the definition
of the bottleneck link. The bottleneck link is a link with the least unused capacity.
The effectiveness of our scenarios highly depends on the ability of malicious short-
lived flows to force the long-lived flows to enter timeout. When there is little
capacity left on the targeted link, short-lived flows are not able to open their
congestion window to a level where they inject a sufficient number of packets on the
link to create many correlated packet losses from long-lived flows. Thus fewer long-
lived flows will enter retransmission timeout and the adverse impact on them will not
be maximal. Simulations show a mere 25% throughput degradation when only the
bottleneck link is targeted. However, targeting the bottleneck link and other links
increases the effectiveness of our scenarios to more than 80%. It is also observed that
targeting links closer to the destination is slightly more effective because packets that
are dropped on the last links have already traveled previous links and added to the
traffic and contributed to the congestion on those links.
Figure 2-14 shows the overall throughput of the short-lived flows when different
links are targeted. Obviously targeting more links using the same pattern requires
more short-lived flows. As a result the overall rate of short-lived flows increases
when more links are targeted. Finally, when the bottleneck link is targeted, the
overall rate of the short-lived flows is minimal for the reason mentioned above.
Figure 2-14: Overall throughput of short-lived flows targeting different links from node a
to node b (overall short-flow throughput in KB/sec for d = 1 sec, P = 1 sec and m = 30)

2.7 Multiple bottleneck scenarios
In this section we describe the adversarial scenarios on an arbitrary topology (Figure
2-15) simulated in ns-2 [26]. In this topology, we identify four duplex links B1 to B4
as the bottleneck links. 48 groups of 3 long-lived flows originate and end in different
stubs. These flows are further categorized into eight larger groups according to the
bottleneck links that they share. The throughput reduction for each of these groups, or
T_ij, is measured and plotted vs. m. Here i and j denote the nodes incident on the bottleneck
link that is shared among the flows in each group. For instance, T03 is the percentage
reduction in the throughput of all the flows that pass through link B4 in the direction
from 0 to 3. All links have a bandwidth of 15 Mbps and the propagation delays vary
from 1 msec to 49 msec.
As before, we assume that maximum congestion windows of all TCP connections are
limited by network capacity, and before the congestion interval of short-lived flows
starts, the long- lived TCP flows are in the congestion avoidance phase.
Figure 2-15: Multiple-bottleneck topology (bottleneck links B1-B4; non-bottleneck links A1-A12)
The simulation results obtained here are consistent with those obtained from the
single bottleneck topology simulation.
2.7.1 Testing RTT-heterogeneous long-lived flows
We further use this arbitrary topology to investigate the effects of aggregation of
heterogeneous long-lived TCP flows (in terms of RTT) with multiple bottleneck
links. Figures 2-16 and 2-17 show the simulation results for two adversarial
scenarios.
In the first scenario short-lived TCP flows are periodically injected on links B1 to B4
in both directions using the pattern shown in Figure 2-3. Note that there are two
groups (T02 and T20) that pass through two bottleneck links.
The throughput reduction percentage for each of the other six groups is inversely
proportional to the unused capacity at the corresponding bottleneck link. This unused
capacity of the link depends on the bandwidth delay product of the link, the buffer
size and the number of flows passing through that link. As can be seen from Figure
2-16, the more capacity left on a link, the more short-lived flows will be able to
utilize this capacity. On the other hand, the throughputs of the two groups that pass
through two bottleneck links are more degraded since they are targeted on two links.
This is of course consistent with the results obtained from the chain topology, as
more links are targeted we notice more reduction in throughput of the long-lived
flows.
Figure 2-16: Effects of targeting backbone links (throughput reduction percentage vs. m,
the number of short flows per interval, for d = 1 sec and P = 1 sec; groups T03, T30, T12,
T21, T13, T31, T20 and T02)

Figure 2-17: Effects of targeting non-backbone links (throughput reduction percentage vs. m
for d = 1 sec and P = 1 sec; same eight groups)
In the second scenario, 12 duplex links (A1 to A12) that are not the bottleneck links
are targeted by the same pattern of short-lived flows. Figure 2-17 shows the
throughput reduction percentage of the 8 above-mentioned groups vs. m.
It is worth mentioning that in this scenario each of the flows in these groups passes
through two targeted links. For m up to 15, the effectiveness of our scenarios is highly
dependent on the unused capacity of the targeted links. However as m increases,
more and more short-lived flows are injected on these links such that many
correlated packet losses are created. Consequently the throughput degradation varies
between 80% and 90% in a quasi-stable manner. Comparing the two scenarios, we
conclude that the second scenario (non-backbone links) is more effective since the
percentage throughput reduction for each group of long-lived flows in this scenario
is about 5% to 30% higher. However Figure 2-18 shows that the overall rate of the
short-lived flows in the second scenario is almost 3 times as high as that in the first
scenario, which is consistent with our prior observations from the single bottleneck
topology. Recall that in the second scenario, 12 duplex links are targeted whereas in
the first scenario only four duplex links are targeted. The results obtained from this
set of simulations indicate that the proposed adversarial pattern is highly successful
in heterogeneous-RTT environments with multiple bottlenecks and a high level of
multiplexing.
2.8 Test-bed Experiments
In order to verify the findings from simulations, we set up a test-bed with the
topology shown in Figure 2-19 to run several experiments. Table 2-2 shows the
required tools and utilities on the test-bed machines for the experiments. Links L1
(the bottleneck) and L2 are set to 32Mbps and 40Mbps respectively. The
corresponding buffers are limited to 200 packets.
All other links have a bandwidth of 100 Mbps. A total of 50 concurrent long-lived flows
are created by downloading 50 copies of a 4 MB file from web servers W1 and W2 to
client C3. After 10 seconds, client C3 starts creating 10 sets of short-lived flows. To do so,
every one second, C3 downloads 200 concurrent copies of a small file from web server W4
(experiment 1).
Figure 2-18: Comparing throughput of short-lived flows targeting backbone and
non-backbone links (overall short-flow throughput in MB/sec vs. m, the number of short
flows per interval, for d = 1 sec and P = 1 sec)
In the second experiment, C3 creates the same number of long-lived flows and C2
creates the short-lived flows in the same manner but from web server W3.
In the third experiment, both C2 and C3 create short-lived flows by downloading
multiple copies of the small file from web servers W3 and W4, respectively, while C3
creates the long-lived flows.
Table 2-2: Setup of the test-bed machines

  Machine           Function      OS        Tools
  W1, W2, W3, W4    Web server    Linux     httpd, tcpdump, zebra
  C1                Client        FreeBSD   ALTQ, zebra
  C2                Client        FreeBSD   ALTQ, zebra
  C3                Client        Linux     httpd, tcpdump, zebra
Figure 2-19: Topology of the test-bed for the experiments (web servers W1-W4 and clients
C1-C3 connected by 100 Mbps links; link L1, the bottleneck, set to 32 Mbps and link L2
to 40 Mbps)
In the fourth experiment, no short-lived flows are created. Traces of all downloads are
collected using tcpdump. We measure the throughput of the long-lived flows when the
short-lived downloads are in process (Experiments 1, 2 and 3) and compare it with the
case where only long-lived flows exist (Experiment 4).
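A rough sketch of how the bursts of short-lived downloads in these experiments could be scripted (illustrative only; the URL and file name are placeholders and this is not the actual test-bed tooling):

import threading, time, urllib.request

def fetch(url):
    try:
        urllib.request.urlopen(url, timeout=5).read()
    except OSError:
        pass   # losses and resets are expected under severe congestion

def burst_of_short_flows(url, copies=200):
    # One burst: 'copies' concurrent downloads of a small file (short-lived TCP flows).
    threads = [threading.Thread(target=fetch, args=(url,)) for _ in range(copies)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def run_short_flow_generator(url="http://server-w4.example/small_file", bursts=10, period=1.0):
    # One burst every second for ten seconds, as in experiment 1.
    for _ in range(bursts):
        started = time.time()
        burst_of_short_flows(url)
        time.sleep(max(0.0, period - (time.time() - started)))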
Figure 2-20: Results of experiment 1 (targeting L2), experiment 2 (targeting the
bottleneck, L1) and experiment 3 (targeting both L1 and L2): throughput reduction
percentage under RED and DropTail

Figure 2-20 shows the result of the above three experiments for two packet drop
policies: RED and DropTail. As can be seen in this Figure, targeting L1, which is the
bottleneck link, as compared to targeting L2 is less effective for both packet drop
policies. This observation is completely consistent with the simulation results.
However, the amount of reduction in the throughput of the long-lived flows depends
on the ability of short-lived flows to force them to time out. This is also dependent on
the spare capacity on the targeted links. Therefore if the targeted link is highly loaded,
such as a bottleneck link, the short-lived flows will not be able to open their congestion
window much in order to inject enough packets to fill up and overflow the buffers and
create many packet losses for the long-lived flows. Also as the experiment results
show, targeting more links has a more severe impact on the throughput of the long-
lived flows (targeting L1 and L2).
2.9 Conclusions
We used a combination of modeling and systematic simulations to test several
variants of the TCP protocol with different packet drop policies under severe congestion
conditions. In our adversarial congestion scenarios, we deployed short-lived TCP
flows to test the performance and robustness of long-lived TCP flows, and we showed
that large reductions in the throughput (>85%) of long-lived TCP flows can be
obtained using short-lived TCP flows in severe congestion scenarios with the
following characteristics:
(a) The ideal duration of each adversely-affecting short-lived flow must be 1 sec, a
parameter related to the Retransmission Time Out interval (RTO) of TCP
(identified as the NULL frequency of TCP).
(b) The rate of recurrence of congestion intervals created by temporally over-
lapping groups of short-lived flows must also resonate with TCP’s NULL
frequency to achieve worst-case performance degradation for target flows.
(c) The number of short-lived flows in each group is determined as a function of
the spare capacity of the target links.
(d) Consecutive groups of short-lived flows must follow each other back-to-back
with no temporal gap in order to create a sustained pattern of congestion and
continuously invoke the loss recovery mechanism of target flows.
(e) Having a shorter roundtrip time (RTT) gives the short-lived flows an advantage
in obtaining more throughput.
(f) It is ideally preferred that each group of short-lived flows shares only one link
with the path of target flows. It is also best if this common link is neither
heavily congested nor lightly loaded prior to injecting short-lived flows.
(g) Ideally, individual groups of short-lived flows must be injected on all links of the
entire path of the target flow in order to create severe congestion.
(h) In each congestion interval, the buffer at the congested link must be kept full
(hence dropping all incoming packets) for a period of time that is greater than
or equal to RTT of target flows.
Contrary to common belief, injecting the malicious flows on the bottleneck links
alone does not always lead to maximum throughput reduction for the target flows.
This is due to the fact that if the bottleneck link is heavily congested, then the
malicious short-lived TCP flows cannot obtain sufficient bandwidth.
Furthermore, we conducted test-bed experiments and were able to validate the
scenarios where targeting the bottleneck link did not lead to significant percentage
throughput reduction. This verified that the above conclusion is not an artifact of
simulation inaccuracies.
Our results demonstrate that even TCP-friendly flows, if carefully orchestrated, can
severely degrade throughput of long-lived TCP flows. They do so even when the
number of long-lived flows is large. In other words our adversarial scenarios are
scalable since the ratio of number of short-lived to number of long-lived flows is
always within normally expected limits.
Our results further demonstrate how short-lived TCP flows can starve long-lived
TCP flows without even having the advantage of a shorter RTT. This shows the
potentially harmful consequences of deploying scheduling, active queue
management (AQM) and routing techniques that provide a preferential treatment to
short-lived flows and give them higher priority over long-lived flows.
Our adversarial congestion scenarios virtually make an ideal DoS or DDoS attack
since they are more difficult to detect. They can also serve as a benchmark for non-
malicious worst-case performance analysis in order to identify unintentional
occurrences of such scenarios in the networks.
In summary, our adversarial congestion scenarios identify the following
vulnerabilities in TCP congestion control and loss recovery mechanisms:
(a) TCP congestion control allows a TCP flow to either aggressively or
conservatively increase its transmission rate. If a group of TCP flows that
adjust their rates aggressively share network resources with another group of
TCP flows that adjust their transmission rate conservatively, severe
performance degradation for the second group of flows can occur that would
lead to unfairness and unfriendliness.
(b) The time out mechanism of TCP can be invoked repeatedly through successive
severe congestion intervals, which means the throughput of TCP flows can stay at
zero theoretically, or very close to zero practically, for a long time.
(c) Since congestion control mechanisms are reactive rather than proactive and
preventative, severe performance degradation and instability can occur before
any alleviating steps can be taken.
Chapter 3
A systematic performance and robustness testing framework
3.1 Framework overview
TCP vs. TCP comprehensive case study [16][17] and its conclusions are the
motivation and foundation to develop a systematic framework to test robustness of
other transport protocols with congestion control under severe congestion conditions.
In order to better understand the circumstances of severe congestion conditions, we
start by examining the characteristics of congestion conditions. This analysis is only
at the level of abstraction for transport layers and does not take into account any
additional delays, congestion conditions, non-congestion losses and back-off and
retransmission delays at underlying layers such as IP or MAC.
Assuming fixed bandwidth and propagation delays, a heavy or severe congestion
condition is manifested by increased end-to-end queuing delay (and therefore
transfer delay) without packet loss or with packet loss (single or multiple drops in
one RTT). All transport protocols with congestion control that have been developed
so far employ some form of explicit or implicit congestion signal to detect such a
congestion condition. Protocols with proactive, precise and fine-grained congestion
signals detect congestion at the early stages, e.g., when the perceived queuing delay
reaches a certain threshold and protocols with reactive course-grained congestion
signals detect it only after the packet drops occur (either by timeout as a result of
non-receipt of acknowledgments for outstanding data, or through receipt of duplicate
acknowledgements). In any case, all these protocols respond to a notification of
congestion condition by reducing the transmission rate. We can therefore conclude
that the most drastic response to congestion is reducing the transmission rate to zero.
While the existence of worst-case congestion is a sufficient condition for having a
worst-case response, i.e., transmission rate reduction to zero, this response does not
necessarily correspond to the worst-case congestion condition. Since we cannot
provide guarantees for a worst-case congestion condition, we examine characteristics
of a severe congestion condition and greedily try to take a severe congestion
condition as close as possible to the worst-case congestion. Heavy or severe
congestion condition is characterized by [33][43][50][52]:
1) Maximized end-to-end delay (by increasing queuing delay).
2) Maximum waste of resources (in this case link bandwidth) due to the
maximum number of packet drops.
3) Consequently minimized transmission rate (reduced to zero).
The combined effect of the above three factors can be captured by a well-known
performance and robustness metric: throughput. Therefore in our framework we use
throughput of target flows, both as an objective function and an evaluation metric for
our scenarios. Testing performance and robustness of a congestion control transport
protocol thus translates to stressing a transport protocol under severe congestion
scenarios such that throughput is minimized (ideally reduced to zero).
Our framework is therefore based on creating severe congestion scenarios (aka
adversarial congestion scenarios) in a given arbitrary environment for the flows of
the protocol type that is under test for robustness and performance (i.e., the target
flows). The rest of network traffic (background flows) fall into three categories:
(a) Unaffected by the congestion scenario in which case their effect can be
factored in our framework merely as an aggregate throughput that offsets the
available capacity in the network if that information is available.
(b) Adversely affected by our congestion scenario either unintentionally or
deliberately as an accessory to create the congestion scenario.
(c) Favorably affected by our congestion scenario again either unintentionally or
deliberately as an accessory to create the congestion scenario.
Additionally, we select another group of flows either from the same type or the same
family of congestion-responsive protocols that share many characteristics (such as
self-clocking and loss-recovery mechanisms) with the target flows to act as
malicious flows that cause congestion in our adversarial scenarios.
The reason for choosing the same class of protocols is threefold. First, it invalidates
a common misconception that only CBR UDP flows can severely impact congestion-
responsive flows in adversarial scenarios. Second, it allows us to study the
interaction of congestion-responsive flows under heavy congestion and expose and
evaluate the resulting intra-protocol unfairness and unfriendliness. Third, it makes
our adversarial scenarios applicable as DoS or DDoS attacks that are virtually
undetectable by existing detection mechanisms, since our malicious flows and our
target flows exhibit similar dynamics, unlike CBR UDP flows. Our selection criteria
for malicious flows is based on their ability to aggressively adjust their transmission
rate and effortlessly blend well with the rest of flows to stay above suspicion in an
adversarial congestion scenario [16][17].
In our framework we use the following four guidelines [4][16][17][33][43][50][52]
to create severe congestion condition:
(a) Ideally individual groups of malicious flows must be injected on all the links
along the end-to-end path of target flows such that all buffers along the end-to-
end path become full and hence maximize the queuing (and the end-to-end)
delay for target flows. However it should be noted that in networking
environments where queuing delay is not the dominant delay component in the
end-to-end delay (compared to propagation delay), the effect of maximizing
queuing delay may be negligible. Regardless of the significance of queuing
delay, for a severe congestion scenario to result in packet drops, filling up
buffers is a necessary task.
(b) In each congestion interval, the buffer at the congested link, must be kept full
(hence dropping all incoming packets) for a period of time that is greater than
or equal to RTT of target flows [34].
(c) Consecutive groups of malicious flows must follow each other back-to-back
with no temporal gap (and if required some temporal overlap) in order to
ensure a sustained pattern of congestion with a traffic intensity greater than 1
at each individual queue formed at each buffer [16][17].
(d) Finally it is ideally preferred that malicious flows in each group share only one
link with the path of target flows, therefore filling the buffer corresponding to
that link, and share no other link among themselves [16][17].
It should be noted however that the hostile behavior of malicious flows is not
entirely intrinsic as it depends on several scaling, temporal and spatial parameters
including their total number, paths, durations as well as the intervals between
consecutive groups of such flows and the number of flows in each group [16][17].
The above four guidelines along with proper choice of scaling, spatial and temporal
parameters form our adversarial congestion scenarios. Although simulations at the
packet-level are not an integral part of our approach for finding our severe
congestion scenarios, we will evaluate our scenarios using a packet-level simulation
platform.

3.2 Framework components

Figure 3-1: Framework block diagram

Figure 3-1 shows the block diagram of our framework for scenario generation. Our
framework admits three sets of inputs:
(a) An arbitrary network topology configuration in the form of a directed weighted
graph where edges represent communication links and vertices or nodes represent
sources and destinations of network flows.
(b) Traffic information of the network flows (including the target flows)
characterized by their numbers, their flow type, protocol type and their end-to-end
paths.
(c) Protocol specifications of target and malicious flows.
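One possible way to encode these three inputs for a scenario generator (a hypothetical data layout chosen for illustration; the field names are assumptions, not part of the framework's specification) is:

from dataclasses import dataclass, field

@dataclass
class Link:
    src: str
    dst: str
    bandwidth_bps: float
    delay_s: float                     # edge weight in the directed graph

@dataclass
class Flow:
    kind: str                          # "target", "background" or "malicious"
    protocol: str                      # e.g. "TCP Tahoe"
    path: list                         # ordered node names along the end-to-end path

@dataclass
class ScenarioInputs:
    topology: list = field(default_factory=list)        # (a) list of Link objects
    flows: list = field(default_factory=list)           # (b) traffic information
    protocol_specs: dict = field(default_factory=dict)  # (c) target/malicious protocol specs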
Our framework then creates adversarial congestion scenarios by strategically
injecting several groups of malicious flows in the given topology to minimize
throughput of the target flows. We define an adversarial congestion scenario as a
series of consecutive severe congestion intervals characterized by scaling, temporal
and spatial parameter of malicious flows that create the congestion. For the reasons
mentioned in the previous section, malicious flows are selected either from the same
type or the same family of congestion-responsive protocols that share many
characteristics (such as self-clocking and loss-recovery mechanisms) with the target
flows [16][17]. The selection criteria for malicious flows are also based on their
ability to aggressively adjust their transmission rates and seamlessly blend with the
rest of flows to virtually go undetected in an adversarial congestion scenario
[16][17]. Under certain circumstances, our framework further deploys some
background flows (i.e., flows that co-exist with target flows in the network) as
accessories to establish a transitive connection between the malicious flows and
target flows.
Next we examine each step of the framework (Figure 3-1) in detail and use TCP
Tahoe protocol [19][33][43][50][52] as an example to illustrate the required tasks or
procedures to determine and configure scaling, spatial and temporal parameters of an
adversarial congestion scenario.
3.2.1 Step 1: Reviewing protocol specification
The first step is to review the protocol specification [2][44] for both malicious flows
and target flows to identify protocol variables that characterize the transmission rate
and the corresponding minimum and maximum values, the timescale of protocol
operation as well as abstract rules for adjusting the transmission rate or the timescale.
Suppose the protocol under test is TCP Tahoe. We can therefore identify cwnd
(congestion window), ssthresh (slow start threshold), RTT (round trip time) and RTO
(retransmission timeout) as variables that determine the average transmission rate of
TCP Tahoe [41]. We also learn that a sender using TCP Tahoe protocol operates on
two timescales, a short timescale in the order of RTT and a long time scale in the
order of RTO [2][44]. For simplicity, we can assume that both timescale parameters
are fixed during our adversarial scenarios and focus on rules to adjust the
transmission rate.
By definition, transmission rate of a window-based transport protocol, such as TCP
Tahoe, is the amount of data sent during each time interval divided by the duration of
that time interval, i.e., cwnd divided by RTT [33][41][43][50][52]. Since we assume
RTT to be fixed, in order to determine rate adjustment rules we need to determine the
adjustment rules for cwnd. In other words, how the sending side of the protocol decides
to increase or decrease the amount of data sent during each RTT.
Again considering a TCP Tahoe sender as an example, the rules to increase or
decrease the value of cwnd consist of an MIMD rule that increases cwnd by one packet
or segment size for each new acknowledgment packet received (before an
acknowledgment timer expires) and decreases cwnd to one packet or segment size if
three duplicate acknowledgments arrive or an acknowledgment timer expires. It also
uses an AIMD rule that increases cwnd by one segment or packet size per RTT,
assuming all outstanding data is acknowledged, and decreases cwnd in a similar way
as the MIMD rule. Finally TCP Tahoe has a rule not to send any data in three cases:
during handshaking and connection setup (initial state), during its longer operation
timescale (RTO) and during handshaking and tearing down the connection (final
state). In all three cases, cwnd is not increased or decreased and transmission rate for
data packets is effectively fixed at zero although very small control packets are
exchanged during initial and final state [2][44].
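The rate-adjustment rules identified in this step can be summarized in a small segment-granularity sketch (a simplification for illustration; it abstracts away the real sender's byte counting, timers and handshake states):

def on_new_ack(cwnd, ssthresh):
    # Slow start: one extra segment per new ACK (the window doubles every RTT).
    # Congestion avoidance: roughly one extra segment per RTT.
    if cwnd < ssthresh:
        return cwnd + 1
    return cwnd + 1.0 / cwnd

def on_loss_signal(cwnd):
    # Three duplicate ACKs or an expired acknowledgment timer: cwnd falls back to
    # one segment and the sender slow-starts toward the halved threshold.
    return 1, max(int(cwnd // 2), 2)     # (new cwnd, new ssthresh)

cwnd, ssthresh = 1.0, 32
for _ in range(6):                        # six new ACKs arrive
    cwnd = on_new_ack(cwnd, ssthresh)
print(cwnd)                               # 7.0 segments after six ACKs in slow start
print(on_loss_signal(cwnd))               # (1, 3)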
3.2.2 Step 2: Identifying phases of operation and creating the phase
transition diagram
Once we identify the rules used to adjust the transmission rate, we consider each rule
as a phase of operation for the protocol. In other words, a phase of operation (PO) is
characterized by the growth or shrinking rules for transmission rate (e.g., cwnd in
case of a window-based protocol such as TCP Tahoe). We can therefore
mathematically quantify different phases of operation and associate a level of
aggressiveness to each phase: idle, conservative, moderate or aggressive. Figure 3-2
illustrates such an arbitrary classification.
Figure 3-2: Phases of operation
Figure 3-3 shows a similar classification for evolution of transmission rate for TCP
Tahoe. Next we create a phase transition diagram where each phase is characterized
by the protocol variables used to determine the transmission rate and the rule or
policy to adjust the transmission rate. In this diagram, each edge marks a transition
from one phase to another if the condition specified on the edge (the trigger for phase
transition) is satisfied.
Figure 3-4 shows the phase transition diagram and the associated triggers in terms of
mathematical and logical conditions for TCP Tahoe. As mentioned before, we use
the same protocol for both target and malicious flows in our adversarial scenarios.
Therefore we refer to the same phase transition diagram to determine the desired
phase of operation for both malicious and target flows as well as the desired phase
transition for target flows as the outcome of the adversarial congestion scenario in
order to satisfy the objective function, which is to minimize the throughput of target
flows.

Figure 3-3: Evolution of transmission rate for TCP Tahoe
Unless otherwise specified, in this chapter we use index T to denote the variables and
parameters of target flows and index M to denote the variables and parameters of
malicious or adversarial flows.
Figure 3-4: Phase transition diagram for TCP Tahoe
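For illustration, the same diagram can be captured as a small transition table keyed by (phase, trigger). The phase names and trigger labels below are assumptions based on the rules identified in Step 1; the authoritative version is the one shown in Figure 3-4.

```python
# A hypothetical encoding of the TCP Tahoe phase transition diagram (cf. Figure 3-4).
# Keys are (current phase, trigger); values are the next phase. Labels are illustrative.
TAHOE_TRANSITIONS = {
    ("initial/final", "connection established"):   "slow start",
    ("slow start", "cwnd reaches ssthresh"):       "congestion avoidance",
    ("slow start", "3 duplicate ACKs"):            "slow start",        # cwnd reset to 1
    ("slow start", "RTO expires"):                 "timeout",
    ("congestion avoidance", "3 duplicate ACKs"):  "slow start",
    ("congestion avoidance", "RTO expires"):       "timeout",
    ("timeout", "RTO wait elapsed"):               "slow start",
    ("slow start", "connection closed"):           "initial/final",
    ("congestion avoidance", "connection closed"): "initial/final",
}

def next_phase(phase, trigger):
    # Return the next phase, or stay in the current phase if the trigger is not listed.
    return TAHOE_TRANSITIONS.get((phase, trigger), phase)

print(next_phase("congestion avoidance", "RTO expires"))  # -> timeout
```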
3.2.3 Step 3: Determining desired phase transitions and the
corresponding external events
As we learned from the TCP vs. TCP case study, we prefer to start and keep malicious
flows in an aggressive PO where the transmission rate is adjusted quite aggressively
in order to obtain a high transmission rate or throughput in a short amount of time
[16][17]. Since the objective function is to minimize the throughput of target flows
(ideally reduce it to zero), we are specifically interested in the sequence of events
that trigger a transition from any PO to a conservative or ideally an idle PO for target
flows. This can be determined by examining our phase transition diagram. Since the
number of rate adjustment rules (hence the total number of identified phases) for all
similar transport protocols is small, the complexity of this task is not a concern. In our
TCP Tahoe example, slow start or Phase-2 is an aggressive phase, while congestion
avoidance or Phase-3 is a relatively moderate (or even conservative) phase and time
out and the initial/final state are the idle phases where the transmission rate is zero.
In this example therefore we create and keep the malicious flows in Phase-2 (the
aggressive phase) and try to force the target flows to reach Phase-4 (time out phase)
regardless of their initial phase.
Table 3-1 shows a list of desired transitions from any PO to an idle PO as well as the
required mathematical and logical conditions for TCP Tahoe as an example.
Table 3-1: Desired phase transitions for TCP Tahoe
As shown in this table, we also need to determine the external cause or event that
satisfies these conditions for phase transitions. Furthermore we determine ways to
create such an event in order to trigger the corresponding phase transition. Since the
objective function is to minimize the throughput (ideally reduce to zero), the desired
phase transition is from any phase to Phase-4. If the initial phase for target flows is
unknown, our congestion scenario can be designed assuming steady state (or Phase-3
in case of TCP Tahoe) as the initial phase [16][17].
Based on the timing semantics of the phase transition diagram, the time it takes to
transition from any phase to Phase-4, exit and repeat, determines the recurrence
period of our congestion interval. In this example, this time is identified as RTO.
In this step we can also provide a qualitative answer for duration of congestion
interval. The duration of each congestion interval is in fact the duration of each
group of concurrent malicious flows and is determined by the time it takes to fill the
buffer plus at least one RTT_T (round trip time of target flows) [16][17][34]. Since we
want to repeatedly and periodically transition to Phase-4, naturally this duration
should be less than or equal to recurrence period of congestion intervals, i.e., RTO
[16][17][34]. In Step 5 we provide a quantitative answer for this parameter.
Next, we explore the given network topology and traffic in order to select
appropriate paths for malicious flows. It should be noted that Step 4 can be
performed in parallel with steps 1 through 3.
3.2.4 Step 4: Selecting paths of malicious flows (spatial parameters)
In this step, we determine paths of malicious flows based on the following criteria:
accessibility, length or hop count, load level and path type (direct or indirect).
Based on the accessibility of links of the path of target flows, the path of each
malicious flow either directly shares a link with the path of a target flow or has an
indirect contact with the path of one or more target flows by two degrees of
separation through intermediate paths. Therefore we may also refer to our adversarial
congestion scenarios as either direct or indirect scenarios.
Naturally, we prefer to have a direct path for our malicious flows, since it provides
direct control with less dynamics [16][17]. However, if links of the path of target
flows are not directly accessible or there is a concern about detection of our direct
congestion scenario, we try to find feasible indirect paths for malicious flows. In this
case we use background flows on the intermediate paths to serve as accessories to
our malicious scenarios. As a result, they may be adversely or favorably affected.
Since this step of the framework is independent of the protocol specification, we use
an arbitrary topology and traffic distribution as an example to demonstrate the
required procedures to enumerate path candidates and to identify feasible path
candidates.
3.2.4.1 Direct path
As mentioned before, we attempt to create adversarial congestion scenarios that are
as close as possible to a worst-case congestion condition. Therefore we prefer to
greedily create as many congested links as possible to maximize the effect of our
scenarios on target flows. In other words, we would like to have a spatial distribution of
groups of malicious flows over the entire path of target flows. Ideally each of these
groups shares only one link with the path of target flows and does not share any links
with each other. This limits the congested links for the path of malicious flows to one
and eliminates unnecessary first-order interaction among spatially-distributed
malicious flows. Furthermore, our heuristics from TCP vs. TCP case study show that
a moderately-loaded link is a better candidate as the common link between target
flows and malicious flows since it has sufficient spare capacity to allow the
malicious flows to obtain a sufficiently-high throughput [16][17].
Figure 3-5 shows the arbitrary topology and traffic distribution used as an example.
Unless otherwise specified, index B denotes parameters of the background flows,
index F denotes traffic in forward and index R denotes traffic in reverse direction.
Each continuous arrow represents the communication link in the forward direction,
denoted by lk_i. Similarly the communication link in the reverse direction is denoted
by rlk_i. Traffic on reverse links travels in the opposite direction of the continuous
arrows.
Table 3-2 shows the properties of each link in terms of capacity, buffer size and
propagation delay. As shown in the table, the amount of spare capacity can therefore
be determined directly if rates of target and background flows are known prior to
congestion. Otherwise we can use the max-min fairness algorithm [4] to determine
the link utilization and therefore the spare capacity on each link, based on the
number and distribution of target and background flows. However the max-min
fairness algorithm [4] provides the spare capacity on each link in the ideal case and
therefore may underestimate the real spare capacity for an arbitrary topology and
traffic distribution. Alternatively a more conservative approach can be used to
determine the spare capacity of each link by assuming zero rates for all existing
target and background flows and considering all links empty and unutilized.
Table 3-2: Link properties and load conditions
Figure 3-5: Arbitrary topology and traffic distribution
Next,
assuming all paths and nodes are accessible, we enumerate all direct paths based on
the following criteria: type, hop count or length and load level.
1) Type: determine all direct paths (level-0). These paths share one and only one link
with the path of the target flows [16][17].
2) Length or hop count: enumerate all direct (level-0) paths that are two hops or longer
(one hop for the shared link and at least one more hop as the incoming link to the
congested buffer).
3) For each identified direct (level-0) path apply one of the following two load
conditions [17]:
Load condition 1: for each link lk_i or rlk_i upstream or downstream of the shared link with
the path of target flows, the spare capacity must be at least the capacity of the shared link, i.e.,
C_i - r_B - r_T ≥ C_shared ,
where r_B and r_T are the rates of background and target flows, C_i - r_B - r_T is the spare
capacity on each of the above-mentioned upstream or downstream links and C_shared is the
capacity of the shared link.
Load condition 2: As mentioned before, if the rates of the target and background
flows are not known, a more conservative condition, i.e., C_i ≥ C_shared, can be applied.
(A small sketch of this check is given right after this list.)
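A minimal sketch of this feasibility check is given below. The data layout and helper name are hypothetical; rates set to None stand for the case where target and background rates are unknown and the conservative condition is applied.

```python
# Hypothetical check of the two load conditions for a candidate direct path.
# links: list of (capacity, background_rate, target_rate) tuples for every link of the
# candidate path upstream or downstream of the shared link; rates may be None if unknown.

def path_satisfies_load_condition(links, shared_link_capacity):
    for capacity, r_b, r_t in links:
        if r_b is None or r_t is None:
            # Load condition 2 (conservative): treat the link as empty.
            spare = capacity
        else:
            # Load condition 1: spare capacity after background and target traffic.
            spare = capacity - r_b - r_t
        if spare < shared_link_capacity:
            return False            # this link cannot carry enough malicious traffic
    return True

# Example: 10 Mb/s shared link, two 100 Mb/s access links carrying 20 + 30 Mb/s each.
print(path_satisfies_load_condition([(100e6, 20e6, 30e6), (100e6, 20e6, 30e6)], 10e6))
```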
Figure 3-6 shows direct path candidates identified in our arbitrary topology for this
example. After eliminating poor candidates, i.e., direct path candidates that do not
satisfy the required load condition, we have three good direct path candidates.
In Step 5 we determine the configuration of malicious flows, i.e., the scaling and
temporal parameters for malicious flows for each of the good direct path candidates.
Figure 3-6: Identifying direct path candidates in our arbitrary topology
3.2.4.2 Indirect path
In cases where no direct path candidates are accessible or pass the type, load, and
length criteria, we try to identify paths that provide an indirect way of affecting
target flows by interacting with background flows. In other words, we attack some
background flows in the network such that the pressure of the bottleneck link for
other background flows is relieved, which causes an increase in their transmission
rates. If such background flows exist and share a non-bottleneck link with the paths
of target flows, we can therefore affect the target flows by creating this indirect surge
of traffic on the paths of target flows.
In determining an indirect way of affecting target flows we require three types of
path as defined below. We define level-0 paths as paths with at least one background
flow that share a link other than their bottleneck link with a path of target flows. We
also define level-1 paths as paths with at least one background flow that have a
common link with path of level-0 flows but do not share any link with any path of
target flows. This common link must be the bottleneck for level-0 flows. As before
we prefer to limit the number of common links between each two groups of such
flows to one, in order to reduce and control the dynamics of our congestion scenarios
[16][17]. Finally, we define level-2 paths as paths with at least one background flow
that share one and only one link with one and only one level-1 path and shares no
links with any path of target flows or any level-0 path (to reduce dynamics and first-
order interaction among background flows). Similar to direct scenarios, we apply the
length (hop-count) and load condition to all identified level-2 paths and their
common link with level-1 paths and eliminate poor candidates. Figure 3-7 shows the list
of indirect path candidates.
Figure 3-7: Indirect path candidates
3.2.5 Step 5: Configuration of malicious flows (scaling and temporal
parameters)
In this step we determine the scaling and temporal parameters of malicious flows for
each identified path candidate. If scaling and temporal parameters cannot be
determined for any of the path candidates, we conclude that our scenarios are not
feasible for the given topology and traffic distribution.
We start by selecting a malicious flow type and the corresponding PO in order to
formulate the transmission rate of malicious flows. In this example, malicious flows
are short-lived TCP flows in slow start or Phase-2 (the aggressive PO).
Assuming each packet size (denoted by s) is 8000 bits and based on the rate adjustment
rule in slow start, we determine the rate of a single malicious flow after i round trip times as
r_M[i] = 2^(i-1) · s / RTT_M .    (1)
Next, for each selected path candidate we apply knowledge from queuing theory
[4][16][17][33] to fill and cause an overflow for the buffer corresponding to the
shared link between path of target flows and level-0 paths in the direct scenario, or
the shared link between level-1 and level-2 paths in the indirect scenario.
Assuming a buffer with maximum size B and the corresponding outgoing link
with capacity C, let r_T be the average rate of target flows in steady state, r_B the
average rate of background flows (other flows entering the above-mentioned link)
and r_M the average rate of malicious flows entering this buffer; then t_f, the
time it takes to fill the buffer (assuming it is initially empty), is found as:
t_f = B / (r_T + r_B + r_M - C) .    (2)
In this equation we substitute for B, C, r_T and r_B as our known parameters to
compute t_f and r_M (the unknowns). Considering only one malicious flow and
assuming it takes k_f round trip times for a single malicious flow in slow start to reach a
transmission rate sufficiently high to fill the buffer, we substitute Equation (1) for
r_M and t_f = k_f · RTT_M in Equation (2). Since we started with a path candidate
we also know the value of RTT_M. We can therefore iteratively solve Equation (2)
and find k_f, and hence the number of round trip times it takes a single malicious
flow to fill the buffer. However, using a single malicious flow does not provide a
robust congestion scenario, since this malicious flow is also of a protocol type that is
congestion responsive and may lose packets and reduce its rate. Therefore we need
to use more malicious flows in order to make our scenarios more robust and highly
successful. However there is a tradeoff between the number of concurrent malicious
flows and the duration of malicious flows (Figure 3-8) [16][17].
Using n ≥ 2 concurrent malicious flows instead of one, we have t_f = k_n · RTT_M, where
k_n < k_f (see Figure 3-8). Since we have already determined the value of k_f, we can start
from n = 1 and iterate over increasing values of n such that the n concurrent flows together
fill the buffer within k_n round trip times, with k_n ≤ k_f.
Figure 3-8: Tradeoff between number and duration of concurrent malicious flows
Finally, after we converge on acceptable values for n and k_n, we need to check the
following condition as mentioned in Step 3 [17]. For each path candidate we need to
check whether
k_n · RTT_M + RTT_T ≤ RTO_T .    (3)
In other words we need to ensure that the determined parameter settings and the RTT
of the candidate path are such that the duration of the congestion interval is less than or
equal to the recurrence period of congestion intervals. If none of the path candidates
satisfies this condition with the determined parameter settings, we conclude that no
feasible scenario can be created given the arbitrary topology and traffic distribution.
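Assuming the buffer-fill model of Equations (1) through (3) as reconstructed above, the search over the number of concurrent malicious flows and their duration can be sketched as follows. All parameter values and symbol names are illustrative, not taken from the dissertation's experiments.

```python
# Sketch of Step 5: pick the smallest number n of concurrent slow-start flows whose
# aggregate rate can fill the buffer within k round trip times, then check Equation (3).
# Quantities are illustrative; rates in bits/s, buffer in bits, times in seconds.

def slow_start_rate(i, s=8000, rtt=0.05):
    return (2 ** (i - 1)) * s / rtt          # Equation (1): rate after i RTTs (doubling per RTT)

def fill_time(n, k, B, C, r_t, r_b, s=8000, rtt=0.05):
    r_m = n * slow_start_rate(k, s, rtt)     # aggregate malicious rate after k RTTs
    excess = r_t + r_b + r_m - C
    return B / excess if excess > 0 else float("inf")   # Equation (2)

def configure(B, C, r_t, r_b, rtt_m, rtt_t, rto_t, s=8000, n_max=50, k_max=20):
    for n in range(1, n_max + 1):
        for k in range(1, k_max + 1):
            if fill_time(n, k, B, C, r_t, r_b, s, rtt_m) <= k * rtt_m:
                # Equation (3): the congestion interval must fit inside one RTO of the target.
                if k * rtt_m + rtt_t <= rto_t:
                    return n, k
    return None                              # no feasible configuration for this path

print(configure(B=500_000, C=10e6, r_t=6e6, r_b=2e6, rtt_m=0.05, rtt_t=0.08, rto_t=1.0))
```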
3.2.6 Step 6: Selecting the appropriate scenario
If more than one path candidate passes the above-mentioned test, we can select among
the path candidates based on the following criteria: detectability of path, proximity of
the shared target link to destination node and total rate of malicious flows (the lower
the better). Direct paths, which provide more control, are always preferred
to indirect paths. Furthermore, based on our TCP vs. TCP case study we suggest to
use all feasible direct paths to make the congestion scenarios as severe as possible
[16][17].
3.2.7 Step 7: Damage assessment for affected flows
In this step, we evaluate the severity of our congestion scenarios by providing an
estimation of performance degradation for target flows in terms of their throughput
during and after the congestion interval. This is determined by the rate adjustment
rule during an idle or a conservative PO for the target flows and the number of target
flows that are actually forced to transition to a conservative or an idle PO. The
maximum degradation for target flows is when all target flows are forced to
transition to idle phase (100% throughput reduction).
Our simulation results verify that the throughput degradation for long-lived TCP flows is
more than 85% when all direct paths are accessible and used. This means that a
majority of long-lived TCP flows are forced into an idle PO and reduce their rate to
zero as a result of multiple packet drops, while the rest of the target flows are forced into a
conservative PO and reduce their rate by as much as 50%.
The results of damage assessment as well as topology and traffic evaluations can be
used to adjust and refine the parameters of our congestion scenarios for the
subsequent rounds of congestion. In order to create another round of congestion
interval, we need to first re-evaluate the topology and traffic distribution and then
repeat steps 4 through 6.
Additionally we can perform our congestion scenarios in a packet-level simulation to
verify the findings of our framework and the severity of our congestion
scenarios. However, the simulations at the packet level are not an integral part of our
framework.
3.2.8 Conclusions
Our TCP vs. TCP comprehensive case study [16][17] and its conclusions provided
the motivation and foundation to develop a novel systematic framework to test
performance and robustness of transport protocols with congestion control under
severe congestion conditions. To our knowledge the proposed systematic framework
is also the first study to introduce and utilize the concept of phase transition diagram
for transport protocols with congestion control based on abstract rate adjustment
rules of each protocol.
While creating and studying the finite state machine of complex protocols such as
TCP [2][44] is far from trivial, the proposed phase transition diagram breaks new
ground in masking the unnecessary details and providing a simple, efficient and
flexible tool at an appropriate level of abstraction to explore various states of a
transport protocol under different network congestion conditions.
Although creating a phase transition diagram requires manual inspection of the
protocol specification, in steps 1 and 2 of our framework we provide
guidelines and suggestions to facilitate this task. Other steps of our systematic
framework can be automated in order to accommodate large scale topologies and
traffic distributions. The test automation decision however depends on several
factors including financial, organizational and effectiveness impacts and is beyond
the scope of this dissertation.
In summary our systematic framework is based on creating severe congestion
scenarios in an arbitrary environment for the flows of the protocol type that is under
test for robustness and performance. We have explained the framework steps to
systematically create such scenarios and used TCP Tahoe and an arbitrary topology
and traffic distribution as a comprehensive example to demonstrate the operational
details of each step. Through a combination of modeling and simulation, we have
derived rules and heuristics which we have applied in order to determine the most
appropriate settings for the parameters of our scenarios. In this case, our heuristics
show that the duration of the malicious flows should be less than or equal to the
duration of each congestion interval. Furthermore, the duration of each congestion
interval is both analytically and heuristically shown to be related to the timeout (RTO)
frequency of the target flows. Once the duration of malicious flows is determined
and fixed, the number of co-existing malicious flows only depends on the spare
capacity of the attack path as well as the rate adjustment rule for the aggressive PO.
Since all malicious flows in a congestion scenario are of the same type and in the
same PO, there should not be any temporal gap between consecutive congestion
intervals.
In Chapter 4, we follow the steps of our framework to systematically test
performance and robustness of XCP protocol [18][28].
Chapter 4
XCP vs. XCP
In this chapter we apply our systematic testing framework to evaluate performance
and robustness of XCP [18][28] under severe congestion and identify vulnerabilities
of its congestion control and loss recovery mechanisms. This is the first and only
study that investigates the worst-case performance of XCP without any modification
or change in the protocol specification or implementation and with the assumption
that all control and estimation algorithms of XCP are correct and based on error-free
estimation of link bandwidth.
4.1 Background on XCP
XCP is a congestion control protocol developed for high bandwidth-delay product networks
that has received a significant amount of attention in the past few years due to its
attractive performance and robustness features. XCP has been shown to achieve high
utilization, small standing queues and fairness. The XCP design rationale is based on the
following three principles [18][28]:
1) Deploying a congestion signal in the form of a precise and explicit feedback
from network rather than an implicit and imprecise congestion signal such as
packet loss.
2) Adjusting the aggressiveness of sources proportional to feedback delay, in
order to establish stability.
3) Decoupling the dynamics of the controlled signal (aggregate traffic) from
dynamics of rapidly-changing parameters (number of flows traversing the
bottleneck link) in order to achieve robustness to congestion.
XCP decouples the efficiency control from fairness control and employs a
multiplicative increase and multiplicative decrease (MIMD) principle for efficiency
control and an additive increase multiplicative decrease principle (AIMD) for
fairness control.
The objective of XCP efficiency or utilization controller is to maximize bottleneck
link utilization while minimizing packet loss and queuing delay. In order to do so, in
every control interval an aggregate feedback is calculated and is allocated such that
when it is positive, the increase in transmission rate of all XCP flows is the same and
when it is negative the decrease in rate of an XCP flow is proportional to its current
rate.
Although theoretically packet losses are expected to be rare for XCP, it must still deploy
loss recovery mechanisms. Since the current implementation of XCP congestion
control in the network simulator is a header option on top of TCP,
XCP naturally inherits the loss recovery mechanisms of TCP. XCP implementation
(and real network deployment) requires changes to both sending and receiving end-
hosts as well as the intermediate routers. Furthermore, XCP requires complete
cooperation of both end-hosts with the XCP router. Since co-existence of XCP and
non-XCP flows will lead to severe performance degradation for XCP flows,
designers of XCP suggest using separate queues at the router to isolate XCP traffic
from non-XCP traffic and use of fixed or dynamic weights for the service rate of
each queue [28]. Since the only available implementation of XCP in the ns-2
network simulator [8] currently uses fixed weights and separate queues for XCP and
non-XCP traffic, which effectively eliminate any interaction between XCP and non-
XCP flows, in this case study we consider a completely XCP-enabled network with a
traffic mix of only XCP flows.
With every data packet (also known as a segment), an XCP sender includes in the XCP
header (as a TCP header option) its estimate of the RTT, its current sending rate and a
requested change in its sending rate, and sends the packet to the first XCP-enabled router
along the end-to-end path to the destination. Each XCP-enabled router updates the
feedback field in the packet header with the smaller of the following two values: the
per-packet feedback calculated in the previous control interval and the requested
change in sending rate (carried in packet header). Each XCP router thereafter updates
the feedback field in the packet header only if its calculated per-packet feedback
allocation is tighter than what the previous router allocated. Ultimately the packet
will carry the feedback allocation based on the decision of the bottleneck link router
to the XCP receiver, which will in turn copy the allocated feedback into the
corresponding field in an ACK packet going back to XCP sender. Thus, arrival of
ACK packets at XCP sender serves two purposes: it acknowledges the receipt of a
data packet and it conveys the information regarding the allowed sending rate for
that XCP flow. All these operations are performed on a per-packet basis; therefore the
XCP router does not maintain per-flow state [18][28].
4.2 From protocol specification to abstract rules
Based on the XCP protocol specification [18], Equation (1) shows how the XCP efficiency
controller in an XCP router calculates the aggregate feedback φ[k] in the k-th control interval
for an outgoing link with capacity C:
φ[k] = α · d[k] · S[k] - β · q[k] ,    (1)
where α = 0.4 and β = 0.226 are XCP control parameters,
S[k] = C - y[k]    (2)
is the bottleneck link spare capacity during the k-th interval, φ[k] is the aggregate
feedback calculated in the k-th control interval and applied during the (k+1)-th control
interval, y[k] is the total input traffic rate on the link during the k-th control interval, q[k] is
the persistent queue size at the bottleneck router buffer during the k-th control
interval and d[k] is the duration of the k-th control interval, which is the average RTT of
all flows that traverse this link. However, q[k] only exists when the condition in
Equation (3) is satisfied in the k-th control interval:
y[k] > C .    (3)
Also we should note that q[k] is a range-limited variable and does not take negative
values or values greater than the maximum buffer size. Since router buffer sizes are
usually set to the bandwidth round trip delay product, when and if q[k]
exists it will be in the following range:
0 ≤ q[k] ≤ C · d[k] .    (4)
In order for each XCP flow to achieve its max-min fair share of the bottleneck
bandwidth while the aggregate traffic converges to full utilization of the bottleneck
bandwidth, XCP allows for 10% shuffling (simultaneous allocation and de-allocation
of traffic) when the aggregate feedback becomes very small. Equation (5) shows how
this is determined:
h[k] = max(0, γ · y[k] · d[k] - |φ[k]|) ,    (5)
where γ is 0.1 and h[k] is the amount of shuffled traffic. Nevertheless the overall
change in traffic (with or without shuffling) in the next control interval is φ[k]. This
is under the assumption that all XCP sources are cooperative, and hence will adjust
their sending rates according to their allocated shares of the aggregate feedback.
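For concreteness, the per-control-interval computation of Equations (1) through (5) can be sketched as follows, treating y as the aggregate input rate; the function name and the example values are illustrative.

```python
# Per-control-interval XCP efficiency controller computation (Equations (1)-(5) above).
ALPHA, BETA, GAMMA = 0.4, 0.226, 0.1

def xcp_router_feedback(C, y, q, d):
    """Return (phi, h): aggregate feedback and shuffled traffic for one control interval.

    C: link capacity (bits/s), y: aggregate input rate (bits/s),
    q: persistent queue (bits, 0 <= q <= C*d), d: control interval (s).
    """
    spare = C - y                                  # Equation (2)
    phi = ALPHA * d * spare - BETA * q             # Equation (1)
    h = max(0.0, GAMMA * y * d - abs(phi))         # Equation (5): 10% shuffling
    return phi, h

# Example: a 10 Mb/s link that is 95% utilized with a small persistent queue.
print(xcp_router_feedback(C=10e6, y=9.5e6, q=20_000, d=0.05))
```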
4.3 Phases of operation
According to our test framework we need to determine phases of operation based on
the abstract rules of the protocol to adjust transmission rate (or cwnd). XCP
specification, however, does not provide the rate adjustment rule as a closed function
of time but as a function of spare capacity and persistent queue in one control
interval. Therefore our first task in identifying the phases of operation is to obtain a
better understanding of throughput or sending rate as a function of time by deriving a
model for the aggregate throughput or the aggregate sending rate of XCP flows that
share a single bottleneck link.
4.3.1 Aggregate throughput model
Figure 4-1 illustrates a symmetric dumbbell topology with a single bottleneck link
between routers R0 and R1 and two groups of XCP flows that traverse this
bottleneck link in opposite directions. This model allows us to consider the effect of
ACK packets in either direction. Unlike XCP data packets, the sending rate of ACK
packets is not controlled by the XCP aggregate feedback. Therefore it is important
to investigate the effect of ACK traffic in our throughput model. We use the notation
and parameters defined in the previous section and use the index F to denote forward
traffic and the index R to denote reverse traffic.
Figure 4-1: Symmetric dumbbell topology with XCP flows and cross traffic
Assuming all forward and reverse XCP flows have the same RTT and packet and ACK size,
start at the same time and update their throughput or sending rate at the beginning of each
control interval based on the feedback calculated in the previous control interval, the amount of
traffic generated in one control interval in the forward direction (from R0 to R1)
is y_F and in the reverse direction (from R1 to R0) is y'_R. XCP controllers divide
time into control intervals of duration d (which is the average RTT of all flows that
traverse that link). For simplicity we assume d is constant and equal to the RTT.
Therefore the aggregate throughput in each control interval is given by y_F/d in the
forward direction and y'_R/d in the reverse direction.
The following equations are obtained based on Equations (1) through (4) for the
forward bottleneck link, where ρ is the ratio of ACK packet size to data packet size:
φ_F[k] = α · (C·d - y_F[k] - ρ·y'_R[k]) - β · q_F[k] .    (6)
S_F[k] = C - (y_F[k] + ρ·y'_R[k]) / d .    (7)
q_F[k] = q_F[k-1] + y_F[k] + ρ·y'_R[k] - C·d .    (8)
y_F[k+1] = y_F[k] + φ_F[k] .    (9)
Similarly for the reverse bottleneck link we have:
φ_R[k] = α · (C·d - y'_R[k] - ρ·y_F[k]) - β · q_R[k] .    (10)
S_R[k] = C - (y'_R[k] + ρ·y_F[k]) / d .    (11)
q_R[k] = q_R[k-1] + y'_R[k] + ρ·y_F[k] - C·d .    (12)
y'_R[k+1] = y'_R[k] + φ_R[k] .    (13)
Equations (5) through (13) can be further simplified to obtain the following system of
equations for the forward and reverse bottleneck links:
y_F[k+1] = (1 - α)·y_F[k] - α·ρ·y'_R[k] + α·C·d - β·q_F[k] .    (14)
q_F[k] = q_F[k-1] + y_F[k] + ρ·y'_R[k] - C·d .    (15)
y'_R[k+1] = (1 - α)·y'_R[k] - α·ρ·y_F[k] + α·C·d - β·q_R[k] .    (16)
q_R[k] = q_R[k-1] + y'_R[k] + ρ·y_F[k] - C·d .    (17)
After taking Laplace transform, we obtain the following system of equations for
forward and reverse bottleneck links:
1
. 18
. 19
1
. 20
. 21
After substituting for and in and we obtain:
1
. 22
1
. 23
Solving the above system of equations yields:
. 24
After taking the inverse Laplace transform, we have:
2 12 1 1 . 25
2 12 1 1 . 26
In above equations and are unit step functions.
In order to ensure that the boundary conditions y_F(0) = 0 and y'_R(0) = 0 are met, the
above equations are modified such that the coefficient of the queue term becomes zero when
there is no persistent queue, i.e., when the sending rate in one control interval is less than the
bottleneck link capacity:
2 12 1 max 0,
|
|
1 max 0,
|
|
. 27
2 12 1 max 0,
|
|
1 max 0,
|
|
. 28
Figure 4-2 shows that the steady-state aggregate throughput is lower when cross
traffic exists. In the rest of this chapter, we assume there is no cross traffic.
Figure 4-3 compares the aggregate throughput obtained by simulation with our
model above. As can be seen, there is a small time shift of about 2 control intervals
(or 2RTTs) between the two curves. This corresponds to the initial RTT for
handshaking and establishing an XCP connection and a second RTT for the XCP
controller to make the first round of parameter estimations.
Figure 4-2: Effect of cross traffic on aggregate throughput
Figure 4-3: Comparing aggregate throughput
obtained by modeling and simulation
Our model does not take into account this initial delay. It is otherwise precise with
less than 1% steady state error. Figures 4-4 and 4-5 show how the aggregate
throughput varies for different control intervals and different bottleneck link
capacities.
It should be noted that when we vary the value of the control interval or the bottleneck
link capacity, we also adjust all maximum buffer sizes, which are set to the bandwidth-delay
product (C·d) as recommended by the networking literature and as is common practice in
simulations and real networks.
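The qualitative shape of the aggregate throughput in Figure 4-3 can be reproduced by iterating the efficiency controller over control intervals, as in the sketch below. This is a simplification under the assumptions of this section (no cross traffic, constant control interval, buffer equal to C·d) and it ignores the initial two-interval delay noted above; it is not the ns-2 implementation.

```python
# Discrete-time sketch of the aggregate XCP throughput on one bottleneck link.
# Assumes no cross traffic, a constant control interval d and buffer size C*d.
ALPHA, BETA = 0.4, 0.226

def aggregate_throughput(C=10e6, d=0.05, intervals=30):
    y, q = 0.0, 0.0                     # aggregate rate (bits/s) and persistent queue (bits)
    trace = []
    for _ in range(intervals):
        phi = ALPHA * d * (C - y) - BETA * q        # aggregate feedback (bits)
        y = max(0.0, y + phi / d)                   # rate applied in the next interval
        q = min(max(0.0, q + (y - C) * d), C * d)   # queue builds only when y exceeds C
        trace.append(y / C)                         # normalized link utilization
    return trace

for k, u in enumerate(aggregate_throughput(), start=1):
    if k in (2, 4, 6, 10, 30):
        print(f"interval {k}: utilization = {u:.2f}")
```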
4.3.2 Analysis of XCP MIMD rule
Figure 4-3 shows how XCP efficiency-controller adjusts the aggregate sending rate
or throughput (and therefore the aggregate congestion window) of XCP flows that
share the same bottleneck link. As our model shows, between the 3rd and the 4th
control interval the XCP efficiency controller allocates bandwidth and increases the
sending rate of XCP flows most aggressively. Furthermore it appears that in the
subsequent control intervals XCP efficiency controller is less aggressive and more
conservative in allocating bandwidth and may increase or decrease the aggregate
flow rate.
Figure 4-4 shows that as control interval decreases, the feedback delay decreases
therefore XCP efficiency controller allocates bandwidth at a faster pace and
consequently the convergence time reduces. While this may seem ideal from a
Figure 4-4: Effect of control interval
Figure 4-5: Effect of bottleneck capacity
control theory standpoint, from a networking point of view this means a large burst
of input traffic is sent during the 3rd and the 4th control intervals towards the
bottleneck link, leading to a rather large queue at the corresponding buffer.
Figure 4-5 shows that as the bottleneck capacity C increases, the XCP efficiency controller
allocates bandwidth more conservatively between the 5th and the 10th control intervals. It
also seems that the steady-state error reduces as C increases.
While this analytical modeling provides us with good insights about the aggregate
behavior of XCP flows dictated by XCP efficiency controller and a qualitative view
of different phases of operation at the aggregate level (MIMD), it does not identify
the triggers that cause changes from an aggressive phase to a conservative phase and
vice versa. Nor does it provide sufficient quantitative insight about the phases of
operation. Therefore to obtain a quantitative understanding of XCP phases of
operation and identify these triggers as required by our testing framework, we need
to further explore the AIMD bandwidth allocation rules enforced by the XCP
fairness controller.
4.3.3 Analysis of the XCP AIMD rules
Depending on aggregate XCP flow rate in a control interval, XCP fairness controller
uses different AIMD rules to fairly distribute link bandwidth among all XCP flows
that traverse that link. In other words, the choice of the AIMD rule for fairness
control depends on the aggregate flow rate allocated by the MIMD rule. Based on
XCP protocol specification [18], we derive an expression for each AIMD rule as
well as the corresponding valid range for the aggregate flow rate. This valid range
and the corresponding AIMD rule together mark a phase of operation. We can
therefore identify all phases of operation and the triggers that cause transition from
one phase to another.
Let Δr_i[k] be the allocated change in sending rate of flow i in the k-th control
interval (consumed in the next control interval); then the allocated change in
congestion window of flow i in the k-th control interval is given by
Δcwnd_i[k] = Δr_i[k] · rtt_i[k] .
Therefore the congestion window of flow i in the next control interval is
cwnd_i[k+1] = max(cwnd_i[k] + Δcwnd_i[k], s) .
In addition to the following MIMD rule
φ[k] = α · d[k] · S[k] - β · q[k] ,
the XCP protocol specification and design paper [18][28] define the net per-packet
feedback
F_i[k] = p_i[k] - n_i[k]
as the difference between a positive feedback
p_i[k] = [ (h[k] + max(φ[k], 0)) / (d[k] · Σ_pkts (rtt·s/cwnd)) ] · rtt_i[k]^2 · s / cwnd_i[k]
and a negative feedback
n_i[k] = [ (h[k] + max(-φ[k], 0)) / (d[k] · L[k] · s) ] · rtt_i[k] · s ,
where s is the packet size, rtt_i[k] is the RTT of flow i in the k-th control interval,
h[k] = max(0, γ · y[k] · d[k] - |φ[k]|) ,
γ = 0.1, L[k] is the number of packets seen by the XCP router in one control interval and the
sum Σ_pkts is taken over all packets seen by the router during that interval.
Assuming the same packet size s for all flows, the overall increase in congestion
window of flow i in the k-th control interval, denoted by Δcwnd_i⁺[k], is:
Δcwnd_i⁺[k] = (h[k] + max(φ[k], 0)) · rtt_i[k] / Σ_pkts (rtt·s/cwnd) ,    (29)
and the overall decrease in congestion window of flow i in the k-th control interval,
denoted by Δcwnd_i⁻[k], is:
Δcwnd_i⁻[k] = (h[k] + max(-φ[k], 0)) · cwnd_i[k] / (L[k] · s) .    (30)
Therefore the overall change in congestion window of flow i in the k-th control interval
is:
Δcwnd_i[k] = (h[k] + max(φ[k], 0)) · rtt_i[k] / Σ_pkts (rtt·s/cwnd)
            - (h[k] + max(-φ[k], 0)) · cwnd_i[k] / (L[k] · s) .    (31)
Given that φ[k] = 0.4 · d[k] · (C - y[k]) - 0.226 · q[k] may take a positive or negative
value and consequently h[k] = max(0, 0.1 · y[k] · d[k] - |φ[k]|) may become either zero
or a positive value, we can derive four different expressions for Δcwnd_i[k].
Considering the initial state and the timeout phase, we can identify a total of six
phases of operation for an XCP flow.
Phase-1 is the initial state, during which no data has yet been sent and the throughput is therefore zero (Figure 4-6).
Phase-2 corresponds to φ[k] ≥ 0 and h[k] = 0 and is determined as follows:
If φ[k] ≥ 0 and h[k] = 0, then y[k] ≤ 0.8·C - 0.452·q[k]/d ,
Since q[k] = q[k-1] + (y[k] - C)·d, then y[k] ≤ 0.862·C - 0.311·q[k-1]/d .
The valid range for y[k] is marked by the red region in Figure 4-7 and
Δcwnd_i[k] = φ[k] · rtt_i[k] / Σ_pkts (rtt·s/cwnd) .    (32)
Phase-3 corresponds to φ[k] ≥ 0 and h[k] > 0 and is determined as follows:
If φ[k] ≥ 0 and h[k] > 0, then y[k] > 0.8·C - 0.452·q[k]/d .
Since q[k] = q[k-1] + (y[k] - C)·d, then y[k] > 0.862·C - 0.311·q[k-1]/d .
The valid range for y[k] is marked by the red region in Figure 4-8 and
Δcwnd_i[k] = (h[k] + φ[k]) · rtt_i[k] / Σ_pkts (rtt·s/cwnd) - h[k] · cwnd_i[k] / (L[k]·s) .    (33)
Phase-4 corresponds to φ[k] < 0 and h[k] > 0 and is determined as follows:
If φ[k] < 0 and h[k] > 0, then y[k] < 1.33·C - 0.753·q[k]/d .
Since q[k] = q[k-1] + (y[k] - C)·d, then y[k] < 1.188·C - 0.431·q[k-1]/d .
The valid range for y[k] is marked by the red region in Figure 4-9 and
Δcwnd_i[k] = h[k] · rtt_i[k] / Σ_pkts (rtt·s/cwnd) - (h[k] + |φ[k]|) · cwnd_i[k] / (L[k]·s) .    (34)
Phase-5 corresponds to φ[k] < 0 and h[k] = 0 and is determined as follows:
If φ[k] < 0 and h[k] = 0, then y[k] ≥ 1.33·C - 0.753·q[k]/d .
Since q[k] = q[k-1] + (y[k] - C)·d, then y[k] ≥ 1.188·C - 0.431·q[k-1]/d .
The valid range for y[k] is marked by the red region in Figure 4-10 and
Δcwnd_i[k] = -|φ[k]| · cwnd_i[k] / (L[k]·s) .    (35)
Phase-6 is the timeout phase, during which the throughput is also zero although the flow has not terminated (Figure 4-11).
Figure 4-6: Phase-1
Figure 4-7: Phase-2
Figure 4-8: Phase-3
Figure 4-9: Phase-4
Figure 4-10: Phase-5
Figure 4-11: Phase-6
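Using the ranges derived above, the phase of operation can be classified directly from φ[k] and h[k]. The following sketch assumes the sign conventions reconstructed in this section and is meant only as a summary of the six phases.

```python
# Classify the XCP phase of operation from the aggregate feedback and shuffled traffic.
ALPHA, BETA, GAMMA = 0.4, 0.226, 0.1

def classify_phase(C, y, q, d, started=True, timed_out=False):
    if not started:
        return "Phase-1 (initial state, zero throughput)"
    if timed_out:
        return "Phase-6 (timeout, zero throughput)"
    phi = ALPHA * d * (C - y) - BETA * q
    h = max(0.0, GAMMA * y * d - abs(phi))
    if phi >= 0 and h == 0:
        return "Phase-2 (aggressive increase, y below ~0.862C)"
    if phi >= 0 and h > 0:
        return "Phase-3 (shuffling, small positive feedback)"
    if phi < 0 and h > 0:
        return "Phase-4 (shuffling, small negative feedback)"
    return "Phase-5 (aggressive decrease, y above ~1.188C)"

print(classify_phase(C=10e6, y=4e6, q=0, d=0.05))           # expected: Phase-2
print(classify_phase(C=10e6, y=13e6, q=100_000, d=0.05))    # expected: Phase-5
```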
4.3.4 Phase transition diagram
Having examined different phases of operation, we conclude that triggers that cause
transition from one phase of operation to another are manifested by a change in the value
of y[k]. In other words, when the aggregate traffic on the link (as observed by the
XCP router) shifts from one range of values to another, the XCP fairness controller
transitions to another phase of operation where it uses a different AIMD rule to
determine how bandwidth can be fairly allocated to XCP flows on that link. Changes
in the value of y[k] for a particular link can occur as a result of a flow arrival event,
with or without buffer overflow and packet loss (a burst of packets greater than the
buffer size), or a flow departure event on the corresponding link. A flow arrival event is characterized by
arrival of one or more new XCP flows to a particular link in the network. Similarly a
flow departure event is characterized by the departure of one or more existing XCP
flows from a particular link in the network.
Figure 4-12: Phase transition diagram
Figure 4-12 shows a phase transition diagram. This diagram illustrates how the
above mentioned triggers can cause phase transitions. It should be noted that a phase
transition among Phases 3 and 5 requires at least one control interval due to the fact
that XCP fairness controller needs at least one control interval to measure the
aggregate input traffic on its link in order to select one of the four AIMD rules. A
transition from Phase-1 to Phase-2 needs one control interval and an extra RTT for
establishing the XCP connection and initial handshaking. Transition from Phases 2
through 5 to Phase-6 also requires at least one control interval but a transition from
Phase-6 to Phase-2 takes one RTO which can be in the order of seconds.
Based on this diagram (and following the next step in our test framework presented
in Chapter 3) we create severe congestion scenarios that cause a phase transition
from any state to the idle state (i.e., time out). Regardless of the starting phase,
ideally target XCP flows must all be forced to end in Phase-6 in our adversarial
congestion scenarios in order to have zero throughput. It is worth mentioning that
in Phase-1, target XCP flows have zero throughput, hence selecting
Phase-1 as the initial phase for target flows in an adversarial scenario is meaningless.
Therefore the starting phase of target flows in our adversarial scenarios is limited to
Phases 2 through 5.
Although the objective function of our framework is to minimize throughput of
target flows (which can be achieved by reaching Phase-6) and although theoretically
it is possible to reach Phase-6 from any phase, we identify a subset of all such
transitions that require smaller numbers of malicious flows as preferred candidates
for creating an adversarial congestion scenario. The reason for this is twofold. First
we prefer our adversarial scenarios to be scalable when the number of target XCP
flows increases. Second we prefer our adversarial congestion scenarios to be as
undetectable as possible. As a corollary, this makes our adversarial scenarios
attractive as DDoS or DoS attacks.
4.3.5 Configuring temporal, spatial and scaling parameters of
malicious flows in adversarial scenarios
Let n_M, cwnd_M and RTT_M denote the number, congestion window and RTT of
malicious flows in one group during one congestion interval. For the reasons
mentioned in Chapter 3, we select our malicious flows from the same type of flows
as target flows. Therefore in this case study, our malicious flows are also XCP flows.
This naturally adds to the challenge of creating a successful adversarial scenario,
because after the initial two control intervals, our malicious flows will follow the
orders from the XCP router to adjust their transmission rates. At the same time, this
makes our scenarios even more interesting, since our malicious flows are in fact fully
cooperative and congestion responsive. Consequently the effective duration of
malicious XCP flows is of the order of two control intervals from the start time of
the flows. Since the duration of a control interval is given by mean RTT of all XCP
flows seen by the XCP controller on that link, the duration of malicious XCP flows
can be less than 2·RTT_M or greater than 2·RTT_M, depending on the path of malicious
XCP flows. In the rest of this chapter we assume the only flow arrivals and
departures are controlled arrival and departure events in an attempt to create
adversarial congestion scenarios. We also assume that all flow arrivals and
departures happen at the beginnings of control intervals.
4.3.5.1 Creating a severe congestion interval
To create a severe congestion interval, we send a group of malicious XCP flows to
the XCP buffer to fill it up in at most one control interval. If it takes any longer than
one control interval to fill the buffer, XCP controller will consider the built-up queue
as a persistent queue and calculate the aggregate feedback such that it will be drained
in subsequent intervals. Therefore we must limit the time to fill the buffer to at most
one control interval. The number of malicious flows required for this purpose is
therefore determined by the buffer size, bandwidth of the corresponding outgoing
link, the duration of control interval and RTT of malicious flows. The required
number of malicious flows to create a buffer overflow therefore can be obtained in
this way: the total traffic injected by the malicious flows during one control interval must
exceed what the link drains plus the buffer size, i.e.,
n_M · (cwnd_M · s / RTT_M) · d ≥ C·d + B ,
where C is the link capacity, B is the buffer size [28][39], d is the control interval (the time it
takes to completely fill the buffer), s is the packet size (or the MSS) and cwnd_M is the
congestion window (in packets) of the XCP malicious flows. Therefore
n_M ≥ RTT_M · (C·d + B) / (cwnd_M · s · d) .    (36)
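With Equation (36) as reconstructed above, the required group size can be estimated as in the following sketch; the parameter values are hypothetical and chosen only for illustration.

```python
import math

# Sketch of Equation (36): smallest number of malicious XCP flows whose traffic,
# injected during one control interval d, exceeds what the link drains plus the buffer.
def malicious_flows_for_overflow(C, B, d, s, cwnd_m, rtt_m):
    per_flow_bits = (cwnd_m * s / rtt_m) * d      # bits one malicious flow injects in d seconds
    return math.ceil((C * d + B) / per_flow_bits)

# Example: 10 Mb/s link, buffer = C*d, d = 60 ms, 1000-byte packets, cwnd of 4 packets.
C, d = 10e6, 0.06
print(malicious_flows_for_overflow(C=C, B=C * d, d=d, s=8000, cwnd_m=4, rtt_m=0.03))
```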
4.3.5.2 Creating a severe congestion scenario
As discussed in Chapter 3, to create a severe congestion scenario, we need to
maximize the queuing delay (and therefore the end-to-end delay) and probability of
multiple packet loss for the target flows to waste network resources and bring the
throughput of target flows to zero. Therefore we need to create the above-explained
severe congestion interval for each buffer along the path of target XCP flows using
malicious XCP flows.
In order to create and maintain a full buffer in each XCP router along the path of
target flow, we send several groups of malicious XCP flows on each link along the
path of XCP target flows in consecutive rounds of congestion intervals. The first
group on each link will fill up the buffer and subsequent groups will ensure that the
buffer remains full. In other words, we create short but intense bursts of traffic using XCP
flows to fill the buffer and keep it full. Recall that the effective duration of each group of
malicious flows is of the order of two control intervals and, since during their first
RTT (during handshaking) no data packets are generated, we need to have a temporal
overlap between successive groups of malicious flows such that the handshaking time of
the next round of malicious flows overlaps with the last RTT of the current round of
malicious flows. This ensures that in the subsequent congestion intervals malicious
traffic intensity stays high enough to keep the buffers full during each control
interval.
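The timing of these successive groups can be sketched as follows. The overlap of exactly one RTT_M between consecutive groups and the parameter values are assumptions for illustration, not measurements from the dissertation.

```python
# Sketch of the start times of successive groups of malicious XCP flows.
# Each group is effective for about two control intervals after an initial handshaking RTT,
# so the next group starts one RTT_M before the current group's effective window ends.
def group_start_times(d, rtt_m, num_groups):
    starts, t = [], 0.0
    for _ in range(num_groups):
        starts.append(round(t, 3))
        effective_end = t + rtt_m + 2 * d     # handshake + ~2 control intervals of data
        t = effective_end - rtt_m             # overlap next handshake with the last RTT
    return starts

print(group_start_times(d=0.06, rtt_m=0.03, num_groups=5))
```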
We may alternatively refer to such a severe congestion scenario as an adversarial
congestion scenario. Creating and maintaining full buffers by itself is a necessary but
not a sufficient condition for having multiple packet drops. Therefore in the next
section, we investigate the effect of phase of operation of target XCP flows on
success of our adversarial scenarios in creating multiple packet drops for each XCP
flow.
4.3.5.3 Effect of phase of operation on success of adversarial
congestion scenarios
In general, the probability of packet drops increases with burstiness of traffic. In
order to maximize the probability of multiple packet drops, during our adversarial
scenarios XCP target flows must ideally be in a phase of operation where the
corresponding AIMD rule allocates bandwidth such that the target XCP traffic
arrives in large bursts. We were able to identify such a phase by using our aggregate
throughput model (the MIMD rule) presented in Section 4-3-1 and by the first AIMD
rule. The AIMD rule for Phase- 2 shows that per-flow allocated feedback only has a
positive term, which means that all XCP flows are instructed by XCP router to
aggressively increase their congestion windows. In other words, in this phase XCP
router allocates the largest fraction of link spare bandwidth to XCP target flows in
each control interval. Since XCP flows have not yet reached self-clocking in this
phase, a large aggregate feedback (which happens during the 3rd and 4th control
intervals) generates a large burst of XCP packets.
Unlike Phase-2, in Phases 3, 4 and 5 the aggregate feedback is either negative or a
very small positive number, thus it does not result in bursty XCP traffic. Furthermore
in these three phases, XCP flows have already reached self-clocking, therefore their
data packets are spaced and do not necessarily form bursts (unless due to ACK
compression as a result of cross traffic). Therefore in our adversarial scenario we
prefer the target XCP flows to be either in Phase-2 or to make a transition to Phase-2
before reaching Phase-6. This will maximize the probability of creating multiple
packet drops.
As shown by the phase transition diagram, a transition to Phase-2 from Phases 3, 4 and 5
is possible by either a flow departure event or a flow arrival event that results in a
subset of XCP flows losing packets and reducing their rate by invoking fast retransmit or
going to timeout. Consequently the aggregate traffic is reduced to region R2.
Therefore a transition to Phase-2 is inevitable for all flows that did not lose packets,
or lost packets but did not go to timeout. This could be the outcome of a partially-
successful adversarial scenario (i.e., one that does not create severe congestion).
To create a transition from Phases 3, 4 or 5 to 6 by a flow departure event, we first
need to have a flow arrival event without necessarily causing any buffer overflows
and packet drops. This causes the XCP fairness controller to de-allocate the
bandwidth from existing target XCP flows and allocate it to our malicious flows.
Depending on the corresponding AIMD rule the convergence time to fairness varies.
While Phase-5 may seem a better candidate in terms of convergence time due to its
negative aggregate feedback (and therefore more aggressive rate-reduction
principle), this also means that our malicious flows like the rest of flows will receive
a negative feedback and thus cannot increase their rate. Therefore Phases 3 and 4,
where XCP fairness controller performs a 10% shuffling and gives new flows a
higher chance of joining the network and obtaining bandwidth, are better candidates
for this purpose. It appears that Phase-4 may be a better candidate than Phase-3. This
is due to the fact that in this phase while fairness controller allows new flows to join
by performing shuffling, the aggregate feedback to XCP target flows is negative and
arrival of new flows will cause a greater negative feedback to aggregate XCP flows.
There is however the possibility that our flow arrival event causes a transition from
Phase-4 to Phase-5 in as fast as one control interval which is undesired for the
reasons mentioned earlier.
Given this tradeoff, we identify Phase-3 as the better candidate. This is due to the
fact that in Phase-3, it is possible to design a flow arrival event that ensures either no
transition to any other phase or a single transition to Phase-4. In the next section, we
investigate both cases and determine the required number of malicious flows as well
as the overall time it takes to transition from Phase-3 to Phase-2 and the overall time
it takes to transition from Phase-3 to Phase-4 then to Phase-2. It should be noted that
Phase-3 and Phase-4 both correspond to XCP steady state. However we identified
Phase-3 as the best candidate among Phases 3, 4 and 5 for either a direct or an
indirect transition to Phase-2 and will focus on creating adversarial scenarios that
take advantage of such transitions.
4.3.5.4 Determining number of malicious XCP flows and
transition time to directly or indirectly reach Phase-2 from
Phase-3
By examining both the aggregate MIMD rule and the AIMD rule for Phase-4, we
notice that the number of malicious flows required for this transition depends on the
number of existing target flows.
XCP fairness controller allocates positive bandwidth such that all flows achieve the
same throughput regardless of their RTTs. Since we intend to create a transition to
Phase-2 by having a flow departure event, any ratio of the number of malicious flows
to the number of target flows greater than 1 to 1, is sufficient to create a transition to
Phase-2 after departure of malicious flows.
The higher the ratio, the more the spare capacity and the burstier the bandwidth
allocation in Phase-2. For instance a ratio of 9 to 1 means that after departure of
malicious flows, about 90% of bottleneck capacity is now spare capacity which can
be allocated to target XCP flows during Phase-2 quite aggressively and create a
highly bursty traffic.
On the other hand a higher ratio implies a larger number of malicious flows,
therefore a higher cost and a higher chance of detection for our adversarial scenarios
when they are used as DoS or DDoS attacks. It should also be noted that number of
malicious flows should not be so high as to cause a transition to Phase-5.
If our flow arrival event can cause at least one packet drop for each target XCP flow
in the aggregate (in Phase-3), and even if each target XCP flow detects this packet
drop not by timeout but due to receipt of duplicate acknowledgements and reduces its
rate by 50% (fast retransmission), assuming that all malicious flows leave the
network right after causing packet drops, the aggregate traffic rate will be reduced to
a value in R2, i.e., the region corresponding to Phase-2. However since Phase-3 does
not cause bursty XCP traffic, it is unlikely to cause packet drops for all XCP flows
and the probability of creating a packet drop for all target XCP flows decreases as
the number of target XCP flows in the aggregate increases. Therefore it is not
possible to ensure a transition from Phase-3 to Phase-2 by relying on probabilistic
packet drops in this phase. Thus we design a flow arrival event such that our
malicious flows (that are fully cooperative with the XCP router) can secure a
significant share of the link capacity, allocated to them by XCP fairness controller.
It should be noted that our flow arrival event must be such that it does not cause a
transition to Phase-5. In other words the overall XCP traffic after the flow arrival
event must still be in R3 or R4.
The amount of time we keep the malicious flows in the network depends on the time
that it takes for both target flows and malicious flows to reach their max-min fair
rate. During this time interval, the fairness controller tries to enforce fairness among
all flows and the efficiency controller tries to maintain a high utilization therefore the
aggregate rate oscillates around corresponding bottleneck link bandwidth.
Next we determine the number of malicious flows required for this transition and the
time that it takes for this transition to occur by examining the following two
boundary conditions corresponding to Phase-3 and Phase-4.
Let y_M and y_T denote the total malicious traffic corresponding to our flow arrival
event and the total target XCP traffic, respectively. Since target XCP flows are in Phase-3 when
malicious XCP flows arrive, y_T is in R3, and since we want to stay in either Phase-3
or Phase-4 and avoid a transition to Phase-5 as a result of our flow arrival event, the
overall traffic y_T + y_M must be either in R3 or R4.
Let n_M, cwnd_M and RTT_M denote the number, initial congestion window and RTT of the
malicious XCP flows participating in the flow arrival event. Then:
y_M = n_M · cwnd_M · s / RTT_M ,
and since the initial congestion window is 2 segments, the number of malicious flows
used in our flow arrival event is obtained as follows:
n_M = y_M · RTT_M / (2·s) .    (37)
As mentioned before, the higher the number of malicious flows, the more the
allocated bandwidth to them as a result of max-min rate allocation performed by
the XCP fairness controller. Therefore once they leave the network, the link bandwidth
is highly underutilized and XCP efficiency-controller quickly and aggressively
allocates the spare capacity to target XCP flows, consequently creating a highly
bursty traffic.
Thus we select n_M such that
n_M ≤ RTT_M · (1.188·C - y_T) / (2·s) .    (38)
This provides us with some margin of error in order to avoid a transition to Phase-5.
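Under Equations (37) and (38) as reconstructed above, and taking roughly 1.188·C as the boundary of the Phase-5 region, the size of the flow arrival event can be sketched as follows. Both the boundary value and the initial window of two segments are assumptions.

```python
import math

# Sketch of sizing the flow arrival event (Equations (37)-(38) as reconstructed above):
# choose as many malicious XCP flows as possible without pushing the aggregate
# traffic past the approximate Phase-5 boundary of ~1.188*C.
def flow_arrival_event_size(C, y_target, rtt_m, s, init_cwnd_pkts=2, boundary=1.188):
    y_malicious_max = boundary * C - y_target             # headroom below the Phase-5 region
    if y_malicious_max <= 0:
        return 0
    per_flow_rate = init_cwnd_pkts * s / rtt_m             # initial rate of one malicious flow
    return math.floor(y_malicious_max / per_flow_rate)     # Equation (38), rounded down

# Example: steady-state target traffic at 9 Mb/s on a 10 Mb/s link.
print(flow_arrival_event_size(C=10e6, y_target=9e6, rtt_m=0.03, s=8000))
```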
We can further conclude that the arrival of these n_M malicious XCP flows will surely
cause a transition from Phase-3 to Phase-4 in one control interval. We can then use
the AIMD rule for Phase-4, i.e., Equation (34), recursively to determine the time it
takes both malicious and target flows to reach their max-min fair rate. The overall
lifetime of the malicious flows used in the flow arrival event is as follows:
T_M = RTT_M + d + m·d ,
where RTT_M is the initial RTT to establish the malicious XCP connections
(handshaking time), d is the single control interval it takes to transition from Phase-3
to Phase-4 and m is the number of control intervals it takes to reach the max-min fair
rate (depending on how many iterations over Equation (34) are required). Therefore
we terminate the XCP malicious flows after T_M seconds and immediately start to create
an adversarial congestion scenario as described in Section 4.3.5.2. At this point all
target XCP flows are in Phase-2 and the bottleneck link has suddenly become highly
underutilized. This will prompt XCP efficiency-controller to allocate large fractions
of this spare capacity which will result in bursty XCP traffic. Subsequently, we
conduct an adversarial congestion scenario to cause a transition from Phase-2 to
Phase-6 for target XCP flows.
In summary, whether we start our adversarial congestion scenarios when target XCP
flows are already in Phase-2 or after they are forced to make a transition to Phase-2,
we can prolong the effect of our congestion scenarios by periodically toggling
between Phase-2 and Phase-6. In other words, once target XCP flows are in Phase-2,
consecutive rounds of adversarial congestion scenarios can force them to toggle
between Phase-2 and Phase-6 indefinitely, and the duration of each period is
approximately determined by RTO + 2·d.
4.3.5.5 Selecting path of malicious flows
As explained in Chapter 3, we select an end-to-end path for malicious flows in the
adversarial scenarios based on the following criteria: accessibility, type (direct or
indirect), hop-count or length, load-level, proximity of the shared link between target
flows and malicious flows to destination and overall rate of malicious flows.
In short, path of malicious flows either directly shares a link with path of target flows
or has an indirect contact with path of target flows by two degrees of separation
through intermediate paths. Therefore our congestion scenarios are either direct
scenarios or indirect scenarios.
Naturally, we prefer to have a direct path for our malicious flows since it provides
direct control with less dynamics (lower dependence on background traffic).
However, if links of the path of target flows are not directly accessible or there is a
concern about detection of our direct congestion scenario, we try to find a feasible
indirect path for malicious flows. In this case we use background flows on the
intermediate paths to serve as an accessory to our malicious scenarios. As a result,
they may be adversely or favorably affected. As before, if no feasible direct or
indirect path exists, our scenarios will not be feasible for the given topology and
traffic distribution.
Equations (36) and (38) show how the number of malicious flows depends on the RTT of
their path. As shown in Chapter 3 (example of TCP Tahoe), the spare capacity of all
the links on path of malicious flows upstream or downstream of the shared link with
the target flows must be greater than capacity of the shared link (conservative load
condition). If no single path can accommodate the amount of XCP malicious traffic,
it is possible to distribute the XCP malicious flows on several incoming links to the
XCP router at the bottleneck link. However, it is preferred that all these paths share
the same RTT with each other and the same bottleneck link with the target XCP
flows. This may require precise synchronization among all sources of malicious XCP
flows. Therefore, this is the least-desired option in path selection.
4.3.5.6 Simulation results
Figures 4-13 and 4-14 show the results of packet level simulations in ns-2 [8] to
verify our findings through the application of the framework. The simulation topology is
shown in Figure 4-1 with a bottleneck capacity of 10 Mb/s and a control interval of 0.06 s. The target XCP flows
originate at n0 and end at d0. Malicious XCP flows originate at d1 and end at n1.
The selected path of malicious flows is a 3-hop direct path that shares the R0-R1 link
with the path of target flows.
Figure 4-13 shows that an adversarial scenario against target XCP flows that are in
Phase-2 is highly effective and forces the XCP target flows to make a transition to
Phase-6 (i.e., zero throughput during timeout). After approximately RTO + 2·d
seconds, target XCP flows exit the loss recovery mode and change their phase of
operation from Phase-6 to Phase-2, and another round of the adversarial congestion
scenario forces them again to make a transition to Phase-6 (a second timeout).
Figure 4-14 shows an adversarial scenario against the target XCP flows while they
oscillate between Phase-3 and Phase-4, i.e., in steady state. As a result of a flow
arrival event at t = 1 s, the XCP fairness controller reduces the throughput of the
target flows in order to fairly allocate bandwidth to the new flows. However, at
t = 4 s, as a result of a flow departure event, link R0-R1 becomes highly underutilized
and the XCP efficiency controller aggressively allocates the spare capacity to the
target XCP flows (a transition to Phase-2). At this point we conduct two consecutive
rounds of adversarial congestion scenarios, which cause the target XCP flows to toggle
between Phase-2 and Phase-6 (timeout). In both cases, once we terminate the adversarial
scenarios, the target XCP flows make a transition to Phase-2 (the transient state), then
Phase-5 (the overshoot after the transient state), and finally settle into steady state,
where they oscillate between Phase-3 and Phase-4. This phase-transition profile is also
consistent with our aggregate throughput model shown in Figure 4-3.
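As a rough, illustrative way of reading such a phase-transition profile off a throughput trace (the thresholds and the sample trace below are hypothetical, and the chapter's actual phase definitions are based on the controller state rather than on throughput alone), the following sketch labels each throughput sample relative to the bottleneck capacity.

# Crude heuristic sketch: label throughput samples with an approximate phase
# of operation by comparison with the bottleneck capacity.
# Thresholds and the example trace are hypothetical.

def label_phase(throughput, capacity, idle_frac=0.05,
                steady_lo=0.85, steady_hi=1.05):
    if throughput <= idle_frac * capacity:
        return "Phase-6-like (timeout, near-zero throughput)"
    if throughput < steady_lo * capacity:
        return "Phase-2-like (transient ramp-up)"
    if throughput <= steady_hi * capacity:
        return "Phase-3/4-like (steady-state oscillation)"
    return "Phase-5-like (overshoot above capacity)"

if __name__ == "__main__":
    capacity = 10.0                                   # Mb/s, hypothetical
    trace = [0.3, 2.0, 6.0, 10.8, 9.6, 9.4, 0.1, 0.1, 4.0, 9.5]
    for t, x in enumerate(trace):
        print(f"t={t}s  {x:5.1f} Mb/s  ->  {label_phase(x, capacity)}")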
Figure 4-13: Adversarial scenarios against target XCP flows in Phase-2
Figure 4-14: Adversarial scenarios against target XCP flows in
Phase-3 or Phase-4
4.4 Conclusions
In this case study, we applied our framework systematically to identify
vulnerabilities in the congestion control and loss recovery mechanisms of XCP under
severe congestion scenarios.
We derived a time-scale model for the aggregate XCP throughput based on the
MIMD rules of the XCP efficiency controller and verified the accuracy of the model
through simulation. By examining this model, we were able to qualitatively identify
XCP's phases of operation.
In order to gain a microscopic and quantitative understanding of XCP behavior,
we started from the abstract rules of the XCP specification [18], derived four different
AIMD rules for the XCP fairness controller, and mathematically characterized a total of
six phases of operation (including the initial/final state and the timeout state). We
also derived bounds and valid ranges for the aggregate
throughput in each phase of operation. We then identified triggers that cause
transitions from one phase to another and developed a phase transition diagram.
Based on this diagram, we explored all possible ways to reach Phase-6 (the timeout
state) from any other phase, and after several rounds of elimination we identified
the two most effective scenarios, which severely degrade the throughput of XCP
flows by causing a phase transition either from Phase-2 (the transient state) to
Phase-6, or from Phase-3 or Phase-4 (the steady state) to Phase-2 and then
Phase-6. Furthermore, we determined the criteria for choosing the number, the
duration, and the paths of our malicious flows in each scenario. Finally, we verified
our findings through packet-level simulations in the ns-2 [8] network simulator.
By applying our systematic framework we identified the following vulnerabilities in
XCP congestion control and loss recovery mechanisms:
1- Through carefully orchestrated arrivals and departures of malicious flows, the XCP
fairness controller can be manipulated into allocating bandwidth to the malicious flows.
This means that even without an adversarial congestion scenario that leads to packet
drops, a malicious source can send multiple XCP flows instead of one and obtain more
than its fair share without violating the XCP specification rules or changing the
protocol stack (a small numeric illustration follows this list).
2- It is possible to force XCP to transition from any phase to Phase-2, during which
the XCP efficiency controller allocates bandwidth quite aggressively, leading to highly
bursty XCP traffic and significantly increasing the probability of multiple packet
drops in a highly dynamic network. As our aggregate throughput model shows, this
effect is more pronounced as the control interval (feedback delay) decreases. In
addition, ACK compression caused by cross traffic can further increase the burstiness
of the forward traffic and thus the probability of multiple packet drops (see the
feedback sketch after this list).
3- Our aggregate throughput model captures the effect of reverse-traffic ACK packets
on forward-traffic data packets as a slight steady-state link underutilization in a
symmetric topology. In an asymmetric topology, however, the steady-state
underutilization can be significant, for the following reason: the ratio of reverse
ACK traffic to forward data traffic can be non-negligible on links with asymmetric
delay and bandwidth properties, and since the rate of ACK traffic cannot be regulated
by the XCP fairness and efficiency controllers, XCP ACK traffic appears as
non-responsive flows co-existing with responsive XCP data packets. This can lead to
the XCP router misestimating the aggregate feedback and entering an incorrect feedback
loop (a numeric sketch of this bias also follows the list).
4- The current XCP implementation inherits the loss recovery mechanisms of TCP;
therefore, under severe congestion, XCP performance and robustness are reduced to
the same severe level as TCP's (i.e., effectively zero throughput).
5- XCP traffic can be burstier than TCP traffic because of its more aggressive
MIMD rule for efficiency control. Therefore, in a highly dynamic network
environment with fast arrivals and departures of short-lived (web-like) traffic, where
the amount of spare capacity changes rapidly, the XCP efficiency controller
aggressively adapts to the changing aggregate dynamics and reallocates the spare
capacity to the existing flows; the aggregate throughput thus becomes more bursty and
oscillatory, and the probability of multiple packet drops increases.
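To make the first vulnerability concrete, here is a purely hypothetical numeric illustration (not a measurement from this chapter): with per-flow fairness, a source that opens k flows at a bottleneck shared with N single-flow sources receives roughly k/(N + k) of the capacity.

# Hypothetical numeric illustration of vulnerability 1: per-flow fairness lets
# a source that opens several flows capture a larger share of the bottleneck.

def malicious_share(capacity, honest_sources, malicious_flows):
    """Share obtained by the malicious source under equal per-flow allocation."""
    per_flow = capacity / (honest_sources + malicious_flows)
    return per_flow * malicious_flows

if __name__ == "__main__":
    capacity = 10.0  # Mb/s, hypothetical bottleneck
    for k in (1, 2, 5, 10):
        share = malicious_share(capacity, honest_sources=9, malicious_flows=k)
        print(f"{k:2d} malicious flows -> {share:4.1f} Mb/s "
              f"({100 * share / capacity:.0f}% of the bottleneck)")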
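The aggressiveness behind the second vulnerability can be illustrated with the XCP efficiency controller's aggregate feedback rule from Katabi et al. [28], φ = α·d·S − β·Q, where d is the control interval (average RTT), S the spare bandwidth, and Q the persistent queue. The sketch below uses the default α and β from [28]; the link and traffic values are hypothetical. It shows how a suddenly large spare capacity is handed back to the flows within roughly one control interval, which is the source of the bursty ramp-up in Phase-2.

# Illustrative sketch of the XCP aggregate feedback, phi = alpha*d*S - beta*Q,
# following Katabi et al. [28]. alpha/beta are the defaults from [28];
# the link and traffic values below are hypothetical.

ALPHA = 0.4
BETA = 0.226

def aggregate_feedback(capacity, input_rate, queue_bits, control_interval):
    """Feedback (in bits) distributed to flows over one control interval."""
    spare = capacity - input_rate            # S: spare bandwidth (bits/s)
    return ALPHA * control_interval * spare - BETA * queue_bits

if __name__ == "__main__":
    capacity = 10e6          # 10 Mb/s bottleneck (hypothetical)
    control_interval = 0.06  # 60 ms average RTT (hypothetical)
    # After a mass departure the link is nearly idle, so the controller hands
    # out most of the spare capacity within about one control interval:
    for input_rate in (9.5e6, 5e6, 1e6):
        phi = aggregate_feedback(capacity, input_rate, queue_bits=0.0,
                                 control_interval=control_interval)
        print(f"input {input_rate / 1e6:4.1f} Mb/s -> "
              f"feedback {phi / 1e6:+.3f} Mb per control interval")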
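Finally, a hypothetical numeric sketch of the bias behind the third vulnerability (all values are made up, not measurements): because the router's spare-bandwidth estimate is computed from the total input traffic, unregulated ACK traffic sharing an asymmetric link shrinks, and can even negate, the spare bandwidth seen by the efficiency controller, even when the responsive data traffic alone leaves the link underutilized.

# Hypothetical illustration of vulnerability 3: non-responsive ACK traffic on
# the same link biases the spare-bandwidth estimate of the efficiency
# controller. All values are made up.

def estimated_spare(capacity, data_rate, ack_rate):
    """The router measures total input traffic; it cannot separate responsive
    XCP data from unregulated ACK traffic riding the same link."""
    return capacity - (data_rate + ack_rate)

if __name__ == "__main__":
    capacity = 1.0e6   # 1 Mb/s asymmetric link (hypothetical)
    data_rate = 0.6e6  # responsive XCP data traffic on that link
    for ack_rate in (0.0, 0.2e6, 0.5e6):
        s = estimated_spare(capacity, data_rate, ack_rate)
        print(f"ACK load {ack_rate / 1e3:5.0f} kb/s -> "
              f"estimated spare {s / 1e3:6.0f} kb/s")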
Eliminating the vulnerabilities of the XCP fairness controller requires maintaining
per-flow state, which undermines one of XCP's attractive features, namely that it does
not maintain per-flow state.
Eliminating the vulnerabilities associated with the XCP efficiency controller (its
creation of bursty traffic) requires reducing the aggressiveness of its bandwidth
allocation, which again undermines another attractive feature of XCP, namely its
ability to converge rapidly to high utilization.
In conclusion, while XCP outperforms TCP in terms of utilization and robustness in
steady state, it exhibits a worst-case performance and robustness profile similar to
TCP's, while also being vulnerable to new DoS and DDoS attacks.
4.5 Potential future extensions
While our study provides a systematic approach to testing the performance and
robustness of transport protocols with congestion control, it leaves the following
avenues open for future research:
(a) To study and quantify the probabilistic occurrence of our congestion scenarios in
the Internet, given the increasing prevalence of web-like short and parallel traffic
(i.e., groups of short-lived flows), bulk transfers, and streaming multimedia (i.e.,
long-lived flows).
(b) To study the pattern and degrading effects of our adversarial scenarios in the
context of DoS and DDoS attacks in order to:
1. Develop specific and precise detection mechanisms.
2. Monitor and protect the exposed accessible points of entry.
3. Seek effective counter-measures.
(c) Employ the findings of our study as guidelines in seeking better alternatives for
congestion control algorithms and loss recovery mechanisms, in order to eliminate
the identified vulnerabilities of transport protocols.
(d) Investigate reverse-path severe congestion scenarios and provide a cost-benefit
comparative analysis between reverse-path and forward-path severe congestion
scenarios.
(e) Examine the temporal correlation of end-to-end delay variations for flows that
share congested links in an attempt to develop a better understanding of the effect of
shared congestion on spatial propagation of congestion.
Bibliography
[1] F. Abrantes and M. Ricardo, “XCP for Shared-Access Multi-Rate Media,” ACM
Sigcomm Computer Communication Review, Vol. 36, No.3, July 2006.
[2] M. Allman, V. Paxson, and W. Stevens, “TCP Congestion Control,” RFC 2581,
http://www.ietf.org/rfc/rfc2581.txt, April 1999.
[3] M. Allman and V. Paxson, “On estimating end-to-end network path properties,”
ACM Sigcomm, September 1999.
[4] D. Bertsekas and R. Gallager, “Data Networks,” Prentice Hall, December 1991.
[5] M. Blumenthal and D. Clark, “Rethinking the design of the Internet: The end-to-end
arguments vs. the brave new world,” ACM Transactions on Internet Technology,
August 2001.
[6] B. Braden, et al., "Recommendations on Queue Management and Congestion
Avoidance in the Internet," RFC 2309, April 1998.
[7] L. Brakmo, S. O'Malley, and L. Peterson, “TCP Vegas: New techniques for
congestion detection and avoidance,” ACM Sigcomm, August 1994.
[8] L. Breslau, D. Estrin, K. Fall, S. Floyd, J. Heidemann, A. Helmy, P. Huang, S.
McCanne, K. Varadhan, Ya Xu, Haobo Yu, “Advances in network simulation,”
IEEE Computer, Vol. 33, May 2000.
[9] R. Bush and D. Meyer, “Some Internet Architectural Guidelines and Philosophy,”
Internet Engineering Task Force, http://www.isi.edu/in-notes/rfc3439.txt,
December 2002.
[10] C. Caini, “TCP Hybla: a TCP enhancement for heterogeneous networks,”
International Journal of Satellite Communications and Networking, Vol. 22,
August 2004.
[11] C. Casetti, M. Gerla, S. Lee, S. Mascolo and M. Sanadidi, “TCP with Faster
Recovery,” Milcom, October 2000.
[12] S. Cheng et al., “Microscopic Time-scale Analysis of XCP,” IEEE ICCT,
November 2006.
[13] D. Clark, "Window and Acknowledgement Strategy in TCP,” RFC 813, July
1982.
[14] D. Comer, “Internetworking with TCP/IP: Principles, Protocols and
Architecture,” Pearson Prentice Hall 2005.
[15] M. Dye, “Network Fundamentals: CCNA Exploration Companion Guide (Cisco
Networking Academy Program),” Cisco Press, November 2007.
[16] S. Ebrahimi-Taghizadeh, A. Helmy and S. Gupta, “A Systematic Simulation-
based Study of Adverse Impact of Short-lived TCP Flows on Long-lived TCP
Flows,” ACM Sigcomm, September 2004.
[17] S. Ebrahimi-Taghizadeh, A. Helmy and S. Gupta, “TCP vs. TCP: a Systematic
Study of Adverse Impact of Short-lived TCP Flows on Long-lived TCP Flows,”
IEEE Infocom, March 2005.
[18] A. Falk et al. “Specification for the Explicit Control Protocol (XCP)”, Internet-
Draft, http://www.isi.edu/isi-xcp/docs/draft-falk-xcp-spec-03.txt, July 2007.
[19] K. Fall and S. Floyd, “Simulation-based Comparisons of Tahoe, Reno, and
SACK TCP,” ACM Computer Communications Review, Vol. 26, No. 3, July
1996.
[20] S. Floyd, “The NewReno Modification to TCP's Fast Recovery Algorithm,”
RFC 2582, April 1999.
[21] S. Floyd, “HighSpeed TCP for Large Congestion Windows,” RFC 3649,
December 2003.
[22] S. Floyd and V. Jacobson, “Random Early Detection gateways for Congestion
Avoidance,” IEEE/ACM Transactions on Networking Vol.1 No.4, August 1993.
[23] B. Forouzan, “Data Communications and Networking,” 4th Edition, McGraw-Hill,
2006.
[24] L. Guo and I. Matta, “The War between Mice and Elephants,” IEEE
ICNP, November 2001.
[25] V. Jacobson, “Congestion Avoidance and Control,” ACM Sigcomm, September
1988.
[26] C. Jin, D. X. Wei, and S. H. Low, “FAST TCP: motivation, architecture,
algorithms, performance,” IEEE Infocom, March 2004.
[27] A. Kantawala and J. Turner, “Queue Management for Short-Lived TCP Flows in
Backbone Routers,” High-Speed Symposium, IEEE Globecom, November
2002.
[28] D. Katabi, M. Handley, and C. Rohrs, “Congestion Control for High
Bandwidth-Delay Product Networks,” ACM Sigcomm, August 2002.
[29] D. Katabi, “XCP Performance in the Presence of Malicious Flows,” PFLDnet,
February 2004.
[30] F. Kelly, “Fairness and stability of end-to-end congestion control,” European
Journal of Control, Vol. 9, September 2003.
[31] T. Kelly, “Scalable TCP: Improving Performance in Highspeed Wide Area
Networks,” Computer Communication Review Vol. 33 No. 2, April 2003.
[32] C. Kozierok, "The TCP/IP Guide: A Comprehensive, Illustrated Internet
Protocols Reference", No Starch Press, March 2005.
[33] J. Kurose, K. Ross, “Computer Networking: A Top-Down Approach,” 4th
Edition, Pearson Addison Wesley, 2008.
[34] A. Kuzmanovic and E. Knightly, “Low-Rate TCP-Targeted Denial of Service
Attacks,” ACM Sigcomm, August 2003.
[35] D. Lopez-Pacheco and C. Pham, “Robust Transport Protocol for Dynamic High-
Speed Networks: enhancing the XCP approach,” IEEE ICON, November 2005.
[36] S. Low et al., “Understanding XCP: Equilibrium and Fairness”, IEEE Infocom,
March 2005.
[37] S. Low, L. Peterson, and L. Wang, “Understanding Vegas: a duality model,”
ACM Sigmetrics Performance Evaluation Review Vol. 29 No. 1, June 2001.
[38] S. Low, “A duality model of TCP and queue management algorithms,”
IEEE/ACM Transanctions on Networking, Vol.11 No.4, August 2003.
[39] D. Medhi and K. Ramasamy, “Network Routing: Algorithms, Protocols, and
Architectures,” The Morgan Kaufmann Series in Networking, Elsevier 2007.
[40] M. Mellia, I. Stoica, and H. Zhang, “TCP Model for Short Lived Flows,” IEEE
Communications Letters, February 2002.
[41] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, “Modeling TCP Throughput:
A Simple Model and Its Empirical Validation,” ACM Sigcomm, August 1998.
[42] F. Paganini, Z. Wang, S. Low, and J. Doyle, “A new TCP/AQM for stability and
performance in fast networks,” IEEE Infocom, April 2003.
[43] L. Peterson and B. Davie, “Computer Networks, A Systems Approach,” 4th
Edition Morgan Kaufmann Elsevier, 2007.
[44] J. Postel, “Transmission control protocol,” RFC 793, September 1981.
[45] K. Ramakrishnan, S. Floyd, and D. Black, "The Addition of Explicit Congestion
Notification (ECN) to IP," RFC 3168, September 2001.
[46] Y. Sakumoto et al. “On XCP stability in a Heterogeneous Network”, IEEE
ISCC, July 2007.
[47] Y. Sakumoto et al. “Increasing Robustness of XCP (eXplicit Control Protocol)
for Dynamic Traffic,” IEEE Globecom, November 2007.
[48] R. Sherwood, B. Bhattacharjee, and R. Braud, "Misbehaving TCP receivers can
cause internet-wide congestion collapse," ACM CCS, November 2005.
[49] R. Shorten and D. Leith, “H-TCP: TCP for high-speed and long-distance
networks,” PFLDnet, 2004.
[50] W. Stallings, “Data and Computer Communications,” Prentice Hall, 2006.
[51] W. Stevens, “TCP/IP Illustrated: the protocols”, Addison Wesley, 1994.
[52] A. Tanenbaum, “Computer Networks,” Prentice Hall, 2002.
[53] G. Vinnicombe, “On the stability of networks operating TCP-like congestion
control,” IFAC World Congress, 2002.
[54] S. Vutukury and J. Garcia-Luna-Aceves, “WiNN: An Efficient Method for
Routing Short-Lived Flows,” IEEE ICT, February 2003.
[55] P. Wang and D. Mills, “Simple Analysis of XCP Equilibrium Performance,”
IEEE CISS, March 2006.
[56] C. Wilson et al. “Fairness Attacks in the Explicit Control Protocol”, IEEE
IWQoS, June 2007.
[57] L. Xu, K. Harfoush and I. Rhee, “Binary Increase Congestion Control for Fast,
Long-Distance Networks,” IEEE Infocom, March 2004.
[58] G. Yang, M. Gerla and M. Y. Sanadidi, “Randomization: Defense against Low-
rate TCP-targeted Denial-of-Service Attacks,” IEEE ISCC, 2004.
[59] T. Zhang and T. Henderson, “An Implementation and Experimental Study of the
eXplicit Control Protocol (XCP),” IEEE Infocom, March 2005.
[60] Y. Zhang and M. Ahmed, “A Control Theoretic Analysis of XCP,” IEEE
Globecom, March 2005.
Abstract
Many modern variations of transport protocols are equipped with improved congestion control algorithms that are proactive rather than reactive to congestion. Consequently, it is expected that they will rarely need to invoke loss recovery mechanisms (namely, abrupt reduction of the transmission rate and timeout intervals). Therefore, unlike congestion control algorithms, the loss recovery mechanisms have undergone little or no change as the Internet has evolved and scaled up in terms of dimension, traffic and bandwidth.