DETECTING PERIODIC PATTERNS IN INTERNET TRAFFIC WITH SPECTRAL
AND STATISTICAL METHODS
by
Xinming He
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
December 2006
Copyright 2006 Xinming He
Dedication
To my beloved parents
Acknowledgments
My graduate study at USC has been one of the most rewarding experiences in my life. Many people
have guided and encouraged me during this process. I would like to take this opportunity to express
my deep gratitude for their support.
First I would like to thank my advisers, Prof. Christos Papadopoulos and Prof. John Heidemann. They gave me invaluable guidance on learning new things, organizing my thoughts, and converting my ideas into viable research directions. I greatly appreciate the freedom they gave me to explore new and exciting research areas while gently and patiently leading me throughout the process. Their constant support and encouragement made my graduate study at USC very rewarding and enjoyable.
I would also like to thank Prof. Urbashi Mitra, Prof. Ramesh Govindan, and Prof. Leana Golubchik for serving on my qualifying exam and dissertation committees and giving me invaluable feedback and suggestions that greatly improved my dissertation. Additionally, I have benefited immensely from my interactions with Prof. Antonio Ortega during our regular group meetings. His feedback at various stages of my research has helped me make significant improvements in my work.
Furthermore, I would like to thank my fellow graduate students, Gen Barlett, Xuan Chen, Debojyoti Dutta, Ramki Gummadi, Pavlin Radoslavov, and Rishi Sinha, for their help and suggestions. They have made my studies at USC intellectually stimulating and fruitful.
I have been extremely fortunate to have the complete support of my family during my graduate study. My parents always encouraged me to pursue my dreams and gave me great freedom in choosing my goals and the way to achieve them. I would like to thank my sisters and brothers-in-law, who have given me strength throughout this process. I cannot imagine reaching this achievement without the full support of my family.
Finally, I gratefully acknowledge the financial support provided by NSF through grant ANI-0133975 during 2001-2006.
Table of Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract

1 INTRODUCTION
1.1 Motivation
1.2 Thesis Statement
1.3 Approach
1.4 Key Contributions
1.5 Applications
1.5.1 Capacity Planning and Traffic Engineering
1.5.2 Detecting Denial-of-Service Attacks
1.5.3 Detecting Under-Performing TCP Flows
1.6 Organization

2 SPECTRAL ANALYSIS OF INTERNET TRAFFIC
2.1 Introduction
2.2 A Spectral Analysis Methodology
2.3 Data Generation
2.4 Data Representation
2.4.1 Fourier Transform of Network Traffic
2.4.2 Selection of Parameters
2.4.2.1 Spectra for Periodic Packet Streams
2.4.2.2 Aliasing and Sampling Rate
2.4.2.3 Segment Length
2.5 Detection
2.6 Summary

3 VISUALIZING REGULAR PATTERNS IN INTERNET TRAFFIC
3.1 Introduction
3.2 Regular Patterns in Internet Traffic
3.2.1 Pattern Imposed by Bottleneck Links
3.2.1.1 TCP Traffic through a 100Mbps Bottleneck
3.2.1.2 TCP Traffic through a 10Mbps Bottleneck
3.2.1.3 UDP Traffic through a 10Mbps Bottleneck
3.2.2 Pattern Imposed by Denial of Service Attacks
3.2.3 Pattern Imposed by TCP Windowing Behavior
3.2.4 Summary of Regular Patterns
3.3 Effect of Cross Traffic
3.3.1 Classification of Cross Traffic
3.3.2 Impact of Unobserved Bottleneck Traffic
3.3.3 Impact of Unobserved Non-bottleneck Traffic
3.3.4 Impact of Observed Non-bottleneck Traffic
3.3.5 Summary of Cross Traffic Impact
3.4 Wide-area Network Experiments
3.5 Summary

4 NON-PARAMETRIC DETECTION OF PERIODIC PATTERNS
4.1 Introduction
4.2 Maximum Likelihood Detection
4.2.1 Maximum Likelihood Test Rule
4.2.2 Two Phases in Maximum Likelihood Detection
4.3 Four Detection Algorithms
4.3.1 Single-Frequency Detection Algorithm
4.3.2 Top-Frequency Detection Algorithm
4.3.3 Top-M-Frequencies Detection Algorithm
4.3.4 All-Frequencies Detection Algorithm
4.4 Evaluation with Real Internet Traffic
4.4.1 Experiment Setup
4.4.2 Performance Comparison of Different Algorithms
4.4.3 Frequency Window Location
4.4.4 Frequency Window Size
4.4.5 Sampling Rate and Segment Length
4.4.6 Transport Protocol
4.4.7 Selection of Fixed Frequency Windows
4.4.8 Training Data Variation
4.4.9 Effect of Signal-to-noise Ratio
4.5 Utilizing Harmonics
4.5.1 A Detection Algorithm Using Harmonics
4.5.2 Evaluation
4.6 Summary

5 PARAMETRIC DETECTION OF PERIODIC PATTERNS
5.1 Introduction
5.2 Candidate Parameters
5.3 Parametric Detection with Traffic Volume
5.3.1 Relationship between Traffic Volume and Spectrum
5.3.2 Three Approaches for Utilizing Traffic Volume Information
5.3.2.1 Middle-line Approach
5.3.2.2 Threshold-line Approach
5.3.2.3 Pocket Algorithm
5.4 Evaluation with Real Internet Traffic
5.5 Summary

6 RELATED WORK
6.1 Internet Measurement
6.2 Spectral Analysis of Network Traffic
6.2.1 Spectral Analysis of DoS Attacks, Network Anomalies, and Bottlenecks
6.2.2 Network Tomography
6.3 Non-Spectral Analysis of Network Traffic
6.3.1 Detection of DoS Attacks
6.3.2 Detection of Bottlenecks and Estimation of Link Capacity
6.4 Summary

7 CONCLUSIONS AND FUTURE WORK
7.1 Conclusions
7.2 Future Directions
7.2.1 Quantitative Model of Periodic Patterns and New Parametric Detection
7.2.2 Temporal and Spatial Stability
7.2.3 Alternative Detection Features
7.2.4 Application to Other Periodic Patterns
7.3 Final Notes

Reference List
List of Tables

3.1 Mapping in the experiment for unobserved bottleneck traffic
3.2 Throughput with Iperf TCP flow as observed bottleneck traffic and web traffic as unobserved bottleneck traffic
3.3 Throughput with Iperf UDP flow as observed bottleneck traffic and web traffic as unobserved bottleneck traffic
3.4 Mapping in the experiment for unobserved non-bottleneck traffic
3.5 Throughput with Iperf TCP flow as observed bottleneck traffic and web traffic as unobserved non-bottleneck traffic or observed non-bottleneck traffic
3.6 Throughput with Iperf UDP flow as observed bottleneck traffic and web traffic as unobserved non-bottleneck traffic or observed non-bottleneck traffic
3.7 Mapping in the experiment for observed non-bottleneck traffic
3.8 Summary of the impact from cross traffic
4.1 Comparison of detection features
4.2 Experiment scenarios
4.3 Traceroute result for the flow path in the experiment
4.4 Performance comparison of the four non-parametric detection algorithms
4.5 Statistics of the peak amplitudes (after log) in different frequency windows
5.1 Performance of parametric detection with packet rate information
5.2 Performance of parametric detection with bit rate information
List of Figures

2.1 Experimental methodology
2.2 Two phases for packet capturing
2.3 Packet delivery methods in wired networks
2.4 Steps to obtain the spectral representation for Internet traffic traces
2.5 Spectra for periodic packet streams of different packet rates with 8kHz sampling rate and 1s segment length
2.6 Spectra for an 800Hz sinusoidal wave with different sampling rates and 1s segment length
2.7 Spectra for a periodic packet stream of 800 packets per second with different sampling rates and 1s segment length
2.8 Spectra for a periodic packet stream of 800 packets per second with 8kHz sampling rate and different segment lengths
3.1 Spectral signature of a 100Mbps link saturated with a TCP flow
3.2 Ethernet frame format
3.3 Spectral signature of a 10Mbps link saturated with a TCP flow
3.4 Spectral signature of a 10Mbps link saturated with different size UDP packets
3.5 Spectral signature of an Mstream attack stream through a 10Mbps link
3.6 An example of TCP windowing behavior
3.7 Spectral signature of an under-performing TCP flow
3.8 Spectra of a TCP flow with increasing window size
3.9 Different types of cross traffic
3.10 Testbed topology
3.11 Experiment setup for unobserved bottleneck traffic
3.12 Power spectra of observed bottleneck flow as unobserved bottleneck traffic increases
3.13 Experiment setup for unobserved non-bottleneck traffic
3.14 Power spectra of observed bottleneck traffic as unobserved non-bottleneck traffic increases
3.15 Experiment setup for observed non-bottleneck traffic
3.16 Power spectra as background traffic increases
3.17 Experiment setup
3.18 Power spectra of aggregate traffic at USC Internet-2 link
4.1 Steps in the training phase
4.2 Steps in the detection phase
4.3 Four approaches in selecting the detection feature x
4.4 PDFs of the amplitude (after log) at 796Hz for group H0 (lines on the left) and HB (lines on the right), with dashed lines representing empirical PDFs and solid lines representing the log-normal approximation
4.5 PDFs of the peak amplitude (after log) in the [780Hz, 800Hz] window for H0 (lines on the left) and HB (lines on the right), with dashed lines representing empirical PDFs and solid lines representing the log-normal approximation
4.6 Performance of the Single-Frequency Algorithm
4.7 Performance of the Top-Frequency Algorithm, Ws = 20Hz
4.8 Performance of the Top-10-Frequencies Algorithm, Ws = 20Hz
4.9 Performance of the All-Frequencies Algorithm, Ws = 20Hz
4.10 Statistics of the peak amplitudes for H0 and HB
4.11 Detecting the 10Mbps bottleneck (T10L) with varying window locations, Ws = 20Hz
4.12 Spectrum and packet interarrival time distribution of the isolated Iperf TCP flow (T10L)
4.13 Detecting the 100Mbps bottleneck (T100H) with varying window locations, Ws = 200Hz
4.14 Detecting the 10Mbps bottleneck (T10L) with varying window sizes, Wc = 790Hz
4.15 Detecting the 100Mbps bottleneck (T100H) with varying window sizes, Wc = 8100Hz
4.16 Detection accuracy as a function of segment length
4.17 Detecting the 10Mbps bottleneck saturated with different protocols
4.18 Spectrum and packet interarrival time distribution of the isolated Iperf UDP flow (U10L)
4.19 Differences between using the fixed [780Hz, 800Hz] window and the best windows for the training set in the T10L scenario
4.20 Impact of training data on detection accuracies
4.21 Training with different traffic volume traces
4.22 Detection accuracy as a function of signal-to-noise ratio using fixed frequency windows specified in Section 4.4.7
4.23 Spectrum from 0Hz to 10kHz of a trace segment in T10L
4.24 Detection accuracies with increasing number of frequency windows explored
4.25 Accuracy on individual trace pairs after training with the 11am trace pair
4.26 Mean of the peak amplitude (after log) in the 1st and 2nd frequency windows for individual trace pairs
5.1 Peak amplitude in the [780Hz, 800Hz] window vs packet rate for trace segments in the T10L scenario
5.2 Peak amplitude in the [780Hz, 800Hz] window vs bit rate for trace segments in the T10L scenario
5.3 Separation by the Middle-line Approach for parametric detection
5.4 Separation by the Threshold-line Approach for parametric detection
Abstract

Internet traffic contains a rich set of periodic patterns. Examples include regular packet transmissions along bottleneck links, periodic routing information exchange, and periodicities inside Denial-of-Service attack streams. Analyzing such periodic patterns has wide applications, including a better understanding of network traffic dynamics, diagnosis of network anomalies, and detection of Denial-of-Service attacks. However, current understanding of periodic behavior in aggregate traffic is quite limited. Many previous approaches analyze traffic on a per-flow basis and are not suited to analyzing high-speed aggregate traffic. In addition, a number of approaches only indicate that they may reveal periodic patterns, but fall short of proposing automatic detection algorithms and quantitatively evaluating their performance.

This thesis explores the application of spectral and statistical methods to detect periodic patterns in Internet traffic. In our approach we first apply spectral techniques to obtain the traffic spectrum, and then use algorithms based on rigorous statistical methods to automatically detect periodic patterns from the traffic spectrum. One salient feature of our approach is that it operates at the aggregate traffic level and does not require flow separation.

We first carry out controlled lab experiments to demonstrate the spectral characteristics of various periodic patterns. We then propose four non-parametric detection algorithms and evaluate their performance using real-world Internet traffic. Results show that one of them, the Top-Frequency Algorithm, is the best choice in terms of detection performance and algorithm simplicity. It can provide excellent accuracy (up to 95%) in detecting the periodic pattern caused by a bottleneck link, even when the traffic through the bottleneck accounts for less than 10% of the aggregate traffic observed at the monitoring point.

We also investigate two extensions to our algorithms. The first is to utilize harmonics; the second is parametric detection that considers the variation of traffic spectra with other factors, such as traffic volume. Evaluation results show that we can get significant improvement by considering harmonics for traffic similar to the training data, and marginal improvement by considering traffic volume in parametric detection.
Chapter 1
INTRODUCTION
1.1 Motivation
There exist a variety of processes that govern the generation and shaping of Internet traffic. Many of them are periodic and reside at different communication layers. For example, congestion due to fixed bandwidth at the link layer can cause packets to be transmitted back-to-back along the bottleneck link, creating strong periodic patterns in network traffic. At the network and transport layers, routing information exchange and the TCP windowing mechanism can result in periodic packet transmissions on the network. Strong regularities also exist in traffic automatically generated by machines at the application layer, such as zombies in DoS (Denial-of-Service) attacks or misconfigured DNS clients. Such periodic processes imprint their own periodic signatures on network traffic.
Studying such periodicities has wide applications, including a better understanding of network traffic dynamics, diagnosis of network congestion and anomalies, and detection of DoS attacks. For example, traffic congestion along a bottleneck link will result in a stream of back-to-back packet transmissions. We will show that such periodic transmissions can survive and be detected at a downstream observation point despite interference from cross traffic. This enables us to infer the existence of the bottleneck link and some of its properties, such as the link bandwidth, by studying the trace gathered at the remote downstream observation point. Another example is the detection of attacks attempting to overload a web server through repeated requests by zombies. Requests originated automatically by machines are typically more regular than human-originated web requests, and this regularity can be used to distinguish the two. In Section 1.5 we explore the potential applications of studying periodic patterns in more detail.
Spectral techniques have been widely used in many other fields to detect hidden patterns and trends against a noisy background. Examples include sonar detection of submarine signals from the ocean acoustic background, analysis of weather data to model patterns for forecasting, and applications in stock and other financial markets. In the past few years, researchers have begun to apply spectral techniques to analyze network traffic for various purposes. Their work presents strong evidence that applying such techniques to network traffic is a very promising approach to studying DoS attacks [36, 37], DNS traffic behavior [13], traffic anomalies [9], and even protocol behavior in encrypted traffic [65]. We cover this work in more detail in Sections 2.1 and 6.2.
However, current understanding of periodic behavior in general aggregate traffic is limited. Many previous approaches analyze traffic on a per-flow basis and are not suited to analyzing high-speed aggregate traffic. In addition, a number of approaches only indicate that they may reveal periodic patterns, but fall short of proposing automatic detection algorithms and quantitatively evaluating their performance. Hence it is important to devise automatic detection algorithms that operate at the aggregate traffic level so that they can keep up with high-speed aggregate traffic.
1.2 Thesis Statement
Our thesis is that algorithms using statistical methods can robustly detect periodic patterns in Internet traffic without flow separation by extracting the signatures of such patterns in the spectral representation of Internet traffic. We show this by quantitatively evaluating the performance of such algorithms using real Internet traffic. Results show that they can achieve up to 95% accuracy in detecting the periodic pattern caused by traffic congestion along a bottleneck link, even if the background traffic volume at the observation point is more than ten times the traffic through the bottleneck link.
1.3 Approach
Our basic approach is to passively monitor Internet traffic and analyze it in the spectral domain by applying the Fourier transform to the traffic. In this way, the periodic patterns inside the Internet traffic are revealed as distinct strong frequencies in the spectral representation. Each strong frequency is the inverse of the corresponding period.
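The period-to-frequency mapping can be sketched with a short, hypothetical example (the 800 packets/s rate, 8kHz sampling rate, and 1s segment echo parameter choices discussed in Chapter 2; the code itself is an illustration, not the thesis's implementation):

```python
import numpy as np

# A constant-rate packet stream, binned into a time series of packet counts,
# shows a spectral peak at the packet rate: 800 packets/s over a 1 s segment
# sampled at 8 kHz.
fs = 8000                    # sampling rate (Hz)
series = np.zeros(fs)        # packet counts per 1/fs-second bin, 1 s segment
series[::10] = 1.0           # one packet every 10 bins = 800 packets/s

spectrum = np.abs(np.fft.rfft(series))
freqs = np.fft.rfftfreq(len(series), d=1.0 / fs)
# Search below 1 kHz so we report the fundamental rather than a harmonic.
peak = freqs[1 + np.argmax(spectrum[1:1000])]
print(peak)  # → 800.0
```

A perfectly periodic stream also produces harmonics at integer multiples of the fundamental, which is why the search above is restricted to frequencies below the second harmonic.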
Since we carry out analysis on the traffic gathered at a remote observation point without flow separation, there will be interference from cross traffic, which creates serious challenges in detecting the periodic patterns from the spectral representation of the aggregate traffic. To meet these challenges, we resort to rigorous statistical methods and develop detection algorithms based on the Bayes Maximum Likelihood Classifier [27]. Details of these algorithms are covered in Chapter 4. Furthermore, we investigate influential factors on the detection problem, and propose parametric detection algorithms that take the network traffic volume into consideration to improve performance. We cover parametric algorithms in more detail in Chapter 5. All detection algorithms are evaluated using real Internet traffic traces.
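The flavor of such a classifier can be sketched as follows (a minimal sketch with synthetic numbers and assumed Gaussian models of the log-amplitude feature, not the thesis's trained detector):

```python
import numpy as np

# Two-class maximum-likelihood decision: model a detection feature x (e.g.,
# the log of the peak spectral amplitude in a frequency window) as Gaussian
# under each hypothesis, then pick the hypothesis with the higher likelihood.
rng = np.random.default_rng(0)
h0_train = rng.normal(2.0, 0.5, 1000)   # feature under H0: no bottleneck
hb_train = rng.normal(4.0, 0.5, 1000)   # feature under HB: bottleneck present

def fit(xs):
    # Estimate the Gaussian parameters from training data.
    return xs.mean(), xs.std()

def log_likelihood(x, mu, sigma):
    # Log of the Gaussian density, dropping the constant term.
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)

mu0, s0 = fit(h0_train)
mub, sb = fit(hb_train)

def classify(x):
    # Equal priors: choose the class with the larger likelihood.
    return "HB" if log_likelihood(x, mub, sb) > log_likelihood(x, mu0, s0) else "H0"

print(classify(4.1), classify(1.8))  # → HB H0
```

With equal priors and equal variances this reduces to thresholding the feature at the midpoint between the two class means; the training phase estimates the per-class distributions and the detection phase applies the likelihood test to new trace segments.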
1.4 Key Contributions
There are several contributions of this thesis. First, we validate an experimental methodology that guides the application of spectral techniques to Internet traffic by following it to detect periodic patterns in Internet traffic. Second, we investigate the impact of aliasing on accurate spectral representation of periodic patterns and solutions to reduce its impact. Third, we visually demonstrate the spectral characteristics of various periodic patterns and the impact of cross traffic. Finally, and most importantly, we enrich the experimental methodology by proposing both non-parametric and parametric detection algorithms based on rigorous statistical methods for detecting periodic patterns even when they are mixed with noisy background traffic, and we systematically evaluate the performance of these algorithms using real-world Internet traffic.

Overall, our techniques are completely passive and incur no additional network overhead. One salient feature of our techniques is that they operate at the aggregate traffic level and do not require flow separation. As a case study, we carry out an extensive investigation of the detection of periodic patterns caused by traffic congestion along bottleneck links. We show that our algorithms can remotely detect the existence of a bottleneck with an accuracy of up to 95% even if the traffic through the bottleneck accounts for less than 10% of the aggregate traffic observed at the monitoring point.
1.5 Applications
There are many applications of studying periodicities in Internet traffic. We discuss several possibilities below.
1.5.1 Capacity Planning and Traffic Engineering
An important job in capacity planning and traffic engineering is to understand which parts of the network are limited by the bandwidth of bottleneck links. Identifying bottleneck links is useful in capacity planning because it supports informed decisions about link upgrades. It also helps with traffic engineering by letting operators make informed decisions about routing and label switching to alleviate persistent congestion.
There are two common definitions of a bottleneck link [73]. The first defines a bottleneck as a tight link, the link with the minimum available bandwidth along a flow path, while the second defines a bottleneck as a narrow link, the link with the minimum raw capacity along a flow path. While both definitions are suitable for studying individual flow paths, they are not suitable for studying traffic congestion from the perspective of a network. For example, any flow path always contains a bottleneck link under these two definitions; however, such a link may be of no interest to network operators if it is not congested. Hence, in this thesis we define a bottleneck link as a link with full or near-full utilization. A salient feature of such a link is that it transmits packets back-to-back. Our definition of a bottleneck link takes the perspective of the network and is more suitable for studying traffic congestion.
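Because a fully utilized link sends frames back to back, the fundamental frequency of the resulting periodic pattern is roughly the link capacity divided by the on-wire frame size. The arithmetic can be sketched as follows; the 1538-byte figure (a 1518-byte maximum Ethernet frame plus an assumed 8-byte preamble and 12-byte inter-frame gap) is an illustrative assumption, and actual peak locations depend on the traffic's real packet-size mix:

```python
# Back-of-the-envelope estimate of the fundamental frequency of a saturated
# link: one frame every frame_bytes * 8 / capacity_bps seconds.
def fundamental_hz(capacity_bps, frame_bytes):
    return capacity_bps / (frame_bytes * 8)

print(fundamental_hz(10e6, 1538))   # ≈ 812.7 Hz for a 10 Mbps link
print(fundamental_hz(100e6, 1538))  # ≈ 8127 Hz for a 100 Mbps link
```

Smaller frames shift the peak to higher frequencies, which is why real traffic with mixed packet sizes produces spectral energy in a window of frequencies rather than a single line.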
Current approaches to detecting bottleneck links typically use network monitoring (with SNMP [16]) and traffic matrix estimation [80]. While essential, these approaches are limited to links that can be directly monitored, and are often computed at fairly coarse timescales (5 minutes per reading or more). In contrast, our approach based on rigorous statistical methods is able to detect bottleneck traffic (defined as the traffic that traverses the bottleneck link) from the observed aggregate traffic, and hence can infer the presence of a bottleneck link that is far away from the monitoring point. Our experiments in Chapter 3 show that a bottleneck link creates strong periodic packet transmissions in bottleneck traffic, and that this periodicity can survive and manifest itself at the downstream monitoring point despite interference from cross traffic. By examining the periodicity in network traffic, our system can infer the existence of bottleneck links that are not directly monitored, such as those in an upstream ISP. For example, results in Chapter 4 show that our system can achieve up to 95% accuracy in detecting a remote bottleneck link even if the traffic through the bottleneck accounts for less than 10% of the aggregate traffic observed at the monitoring point.
In addition to detecting problems in neighboring networks, remote monitoring may reduce costs and allow monitoring of old or slow equipment. Moreover, our approach is able to detect bottleneck traffic over timescales of seconds rather than minutes. While one would obviously make planning and even traffic engineering decisions infrequently, evaluation of short-term behavior may provide a better view of worst-case performance than consideration of average behavior over longer timescales.

Although our current techniques do not reveal the exact location of the bottleneck link, they can be used as an early-warning system to alert network operators to the presence of saturated links and trigger further investigation if needed.
1.5.2 Detecting Denial-of-Service Attacks
Although smart DoS attacks have been described in [46], many DoS attacks harm their victims by
saturating links. Our system can directly detect these bottlenecks. More importantly, a distributed
6
DoS attack is formed when many individual hosts (often called zombies) direct their traffic at a single victim. The stream from any individual zombie is fairly small, and it may be difficult to detect when mixed with other traffic. However, individual zombie attack traffic is often quite periodic, either because it saturates the zombie's access link, or because the attack application itself is periodic. (See [36] for a more detailed examination of these issues.)
Although our current work does not focus specifically on zombie detection, our approaches are directly applicable to this problem. Today, zombie detection is often done via signature-based schemes, or by simply detecting high-rate flows. The approaches in this thesis avoid the need for signature discovery and distribution, and promise to be much more sensitive than simple rate-based detection schemes.
Zombies that carry out DDoS attacks typically transmit packets as fast as they can, saturating the access link of the machine (e.g., cable or DSL links). While attack tools typically randomize packet contents, they do not randomize packet size: they either use very small packets in an attempt to overload the interrupt handler at the target, or large packets to overload a network link. Such behavior results in very strong frequencies in the traffic. Our techniques can reveal spectral characteristics of DoS attacks (see an example in Section 3.2.2) and alert a network operator to the presence of DoS attacks, thus enabling zombie detection at the source. Our techniques can also be used as a means for anomaly detection, providing a hint to distinguish flow behavior. For example, suppose a network operator receives a complaint that a zombie is attacking a particular target. The network operator can search for strong periodic behavior in packets toward the target, which is a strong indication that an attack is indeed taking place. This determination cannot be made on the basis of data or packet rate alone.
1.5.3 Detecting Under-Performing TCP Flows
The throughput of a TCP flow is limited by the TCP window, which specifies the amount of data a sender can transmit before receiving an acknowledgment from the receiver. If the window size is lower than the BDP (Bandwidth-Delay Product) of the underlying path, the TCP flow will not achieve the maximum throughput that the path can support.
As we will show in Chapter 3, a window-limited TCP flow has a unique packet-transmission pattern, and such a pattern can be revealed by our techniques in the spectral representation. Hence our techniques can detect under-performing TCP flows, so that end hosts can take steps to increase the window size and achieve higher throughput.
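The interaction between window size, BDP, and achievable throughput can be sketched numerically. The sketch below is our own illustration; the path parameters (100Mbps capacity, 50ms RTT, 64KB window) are assumed values for the example, not measurements from this work.

```python
# Illustrative path parameters (assumed, not measured).
bandwidth_bps = 100e6          # 100 Mbps path capacity
rtt_s = 0.05                   # 50 ms round-trip time

# Bandwidth-Delay Product: bytes that must be "in flight" to fill the pipe.
bdp_bytes = bandwidth_bps / 8 * rtt_s
print(bdp_bytes)               # 625000.0

# A 64 KB window caps throughput at window/RTT, far below capacity.
window_bytes = 64 * 1024
max_throughput_bps = window_bytes * 8 / rtt_s
print(max_throughput_bps)      # 10485760.0 (about 10.5 Mbps on a 100 Mbps path)
```

Since the window (64KB) is far below the BDP (625KB) here, the flow is window limited and wastes most of the path capacity.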
1.6 Organization
The remainder of this dissertation is organized as follows. In Chapter 2, we present a methodology for spectral analysis of Internet traffic and the details of its first two major components: how to capture packet traces and how to obtain their spectral representation. We then visually demonstrate the spectral characteristics imposed by three different processes (traffic congestion along the bottleneck link, DoS attacks, and TCP windowing behavior) and how cross traffic can affect these characteristics in Chapter 3. In Chapter 4, we propose four non-parametric detection algorithms based on the Bayes Maximum Likelihood Classifier and evaluate their performance using real Internet traffic. We also present an extension to the algorithm that utilizes harmonics in the detection and evaluate its performance. Then in Chapter 5 we introduce a parametric detection algorithm that considers not only the spectrum of the traffic, but also knowledge of other factors, such as aggregate traffic volume, and investigate how this can improve performance. Chapter 6 gives a survey of related work in Internet traffic analysis, and Chapter 7 concludes the dissertation and presents directions for future work.
Chapter 2
SPECTRAL ANALYSIS OF INTERNET TRAFFIC
2.1 Introduction
The Internet has evolved from a testbed used mostly by scientists thirty years ago into a vital part of the world economy and the daily lives of millions of people around the world. In the United States alone, the number of broadband subscribers had reached 49 million by December 2005, according to statistics from the OECD (Organisation for Economic Co-operation and Development) [4]. Understanding the Internet at such a large scale is critical to making it robust and protecting it against intentional and unintentional service disruptions.
This dissertation focuses on the rich set of periodic patterns inside Internet traffic. A number of processes that govern the generation of Internet traffic are periodic. Examples include back-to-back packet transmissions on the bottleneck link, periodic routing information exchange in the IP layer, transport-layer effects such as TCP windowing behavior, and application-layer effects such as mis-configured DNS clients and zombies in Denial-of-Service attacks. Analyzing such periodic patterns can aid in a better understanding of Internet traffic and in the detection of network anomalies and Denial-of-Service attacks.
A number of techniques have been proposed to retrieve periodic patterns from Internet traffic. However, they are insufficient to capture and understand the periodic behavior of Internet traffic at the aggregate level. For example, a common method is to analyze packet interarrival times (for example, [44]). However, for the high-speed networks of interest, such an approach is infeasible as it requires the separation of flows; per-flow analysis cannot even keep up with moderate-speed routers. Other approaches such as statistical sampling [30, 29] result in coarse information and inevitably miss some fine-grain network phenomena.
We propose a methodology based on statistical signal processing of spectral signals to meet this challenge. Spectral techniques have been widely used in many other fields to detect hidden patterns and trends against a noisy background. Examples include sonar detection of submarine signals against the ocean acoustic background, processing of weather data to model its patterns and forecast its future, and analysis of the stock market and other financial markets. We believe that they can also be used successfully for our goal of detecting periodic patterns inside Internet traffic at the aggregate level, with no flow separation or combination, to reduce the processing overhead.
We are not alone in suggesting that spectral techniques can be applied to Internet traffic. In recent years there have been a number of papers from the network research community that take advantage of such techniques in analyzing Internet traffic. Examples include fingerprinting DoS attacks based on their spectra [36], detecting DoS attacks and other network anomalies by analyzing IP flow-level and SNMP information using wavelets [9], and identifying TCP flows based on their windowing behavior [19, 65]. This work shows that spectral techniques can be a very powerful tool for network traffic analysis.
However, it is insufficient to carry out spectral analysis with only visual evidence that spectral methods work, for example, by showing that a spectral representation of a network trace looks substantially different depending on whether some phenomenon of interest is present or not. While visual approaches help motivate the need for spectral techniques, as warned by Partridge in [63, 64], there is a danger in blindly applying spectral techniques to the networking field without careful analysis and knowledge of the ground truth. Without careful examination of the ground truth, we may reach wrong conclusions or incorrectly interpret the pretty pictures obtained from spectral analysis.
To face this challenge, it is essential to have a clear methodology that defines the path from the raw data to the final conclusion. We define such a methodology, and in it we go beyond the visual evidence in two ways. First, rather than relying on human operators, we propose to apply rigorous statistical methods to automatically examine the visual evidence and draw conclusions. Second, we push to understand the underlying causes of the patterns detected in the network traffic and use that knowledge to design better detection algorithms.
In the following sections, we present our methodology, which guides the application of spectral techniques to detect periodic patterns in Internet traffic. We first give an overview of the methodology with a summary of each of its components. Then we describe in detail its first two major components, Data Generation and Data Representation. We also give a summary of the last component, Detection, which will be elaborated further in Chapters 4 and 5 after we visually demonstrate the capability of the spectral representation technique in revealing periodic patterns in Chapter 3.
2.2 A Spectral Analysis Methodology
Figure 2.1 illustrates our methodology for applying spectral and statistical methods to detect periodic patterns in Internet traffic (originally proposed by Antonio Ortega). It contains three major components: Data Generation, Data Representation, and Detection.

Figure 2.1: Experimental methodology

The more detailed steps inside the three components include:
1. Representative Data Collection: This is the generation of representative data sets. The raw data can be gathered from real-world traffic, generated in controlled lab experiments, or, at a more abstract level, synthesized by network simulations. From these we select representative data for further processing.
2. Real-world Event Selection: This is the selection of interesting real-world events that can be directly measured. Example real-world events are packet arrivals, packet losses, TCP connection establishment, TCP connection tear-down, etc. The corresponding raw measurement data can be packet arrival times, packet lengths, packet delays, packet loss rates, connection durations, etc. For our purpose of detecting periodic patterns in the aggregate traffic, we select packet arrivals as the measurable event, and the corresponding raw measurement data are packet arrival times. We do not gather flow-level information, to reduce the processing overhead.
3. Time Domain Representation: This involves the conversion from raw traffic measurements to a signal represented by a time series. This process is also called sampling. Depending on the sampling intervals, we can have even sampling with uniform sampling intervals, or uneven sampling with varying sampling intervals. Compared with even sampling, uneven sampling can significantly reduce the number of samples, and hence the processing overhead. The selection of even or uneven sampling depends on which spectral transform is to be performed in the following step. For the Fourier transform or wavelet analysis we must use even sampling; Lomb periodograms are compatible with both even and uneven sampling.
4. Frequency Domain Representation: In this step we transform the time domain signal representation into a frequency domain representation. Possible transform techniques include the Fourier transform [11, 12], Lomb periodograms [65], and wavelet analysis [52, 6] with different types of mother wavelets. Among them, the Fourier transform is the simplest and has the longest history; it has been applied successfully to many fields since Fourier first proposed it nearly two centuries ago. The fast Fourier transform (FFT), published by James W. Cooley and John W. Tukey [24] in 1965, significantly reduced the computational complexity of the Fourier transform. In contrast, wavelet analysis matured in the 1980s. Its main difference from the Fourier transform is that wavelets are localized in both time and frequency, whereas the standard Fourier transform is only localized in frequency. Both the discrete Fourier transform and the discrete wavelet transform require evenly sampled input time series. Lomb periodograms were introduced in 1976 [62] to take advantage of uneven sampling, but their application is more restricted compared with the other two transform techniques.
5. System Modeling: Based on knowledge gained from the above steps, we model the underlying processes to capture the key elements that create the periodic patterns in the traffic and reshape them as the traffic flows through the network. We do not intend to model general Internet traffic, as the difficulty of modeling such traffic has been widely recognized [68]. Instead we model only the specific aspects of the traffic that affect detection, to reduce the modeling complexity.
6. Non-parametric Detection: Non-parametric detection methods identify and detect high-level events, such as the existence of bottlenecks and DoS attacks, using heuristics. They do not require explicit modeling of the underlying processes. One commonly used statistical method is the Bayes Maximum Likelihood Classifier [27], which treats the detection problem as a binary hypothesis testing problem and uses statistics learned from a training set to infer the proper threshold for classification. We have proposed several non-parametric detection algorithms based on this method; more details can be found in Chapter 4.
7. Parametric Detection: Compared with non-parametric detection, parametric detection utilizes the system model of the underlying processes and other information to improve the detection accuracy for the event of interest. For example, we have identified several key parameters that affect the accuracy of detecting bottleneck links and investigated the performance improvement from utilizing one of them, the aggregate traffic volume. The algorithm and results will be presented in Chapter 5.
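As a concrete illustration of the maximum-likelihood style of detection mentioned in step 6, the sketch below builds a minimal two-class classifier over a single scalar feature (e.g., spectral power near a candidate frequency). It is our own simplified stand-in for the algorithms of Chapter 4: the Gaussian class models, function names, and synthetic training data are all assumptions made for the example.

```python
import numpy as np

def fit(samples):
    """Estimate a Gaussian model (mean, std) for one hypothesis."""
    return float(np.mean(samples)), float(np.std(samples))

def classify(value, h0, h1):
    """Return 1 if hypothesis H1 (pattern present) is more likely than H0."""
    def loglik(v, mu, sigma):
        return -np.log(sigma) - (v - mu) ** 2 / (2 * sigma ** 2)
    return int(loglik(value, *h1) > loglik(value, *h0))

# Synthetic training features: without (H0) and with (H1) the periodic pattern.
rng = np.random.default_rng(0)
h0 = fit(rng.normal(0.2, 0.05, 500))
h1 = fit(rng.normal(0.6, 0.05, 500))
print(classify(0.55, h0, h1), classify(0.25, h0, h1))  # 1 0
```

Because both likelihoods are Gaussian with similar spread, this classifier is equivalent to thresholding the feature roughly midway between the two class means, which is the intuition behind the threshold-based algorithms described later.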
In the following two sections we describe in detail the first two major components of the methodology, Data Generation and Data Representation. We also summarize the last component, Detection; more details of the Detection component will be presented in Chapters 4 and 5, after we visually demonstrate the capability of the spectral representation techniques in revealing periodic patterns in Chapter 3.
2.3 Data Generation
The first component of our methodology is the generation of representative data sets. The raw measurement data can be gathered from real-world traffic, generated from controlled lab experiments, or, at a more abstract level, synthesized from network simulations. From these we select representative data for further processing. In our work we mostly use real traffic traces gathered through controlled lab experiments and real-world Internet experiments. These experiments are similar in that the traffic data obtained are real and gathered through physical devices; the only difference is the environment in which they are gathered. In this section we describe how real network traces can be gathered.
There are basically two steps in trace gathering, as illustrated in Figure 2.2. The first step is packet delivery, in which packets are delivered to the trace machine. The second is packet recording, in which the trace machine records the packet stream.

Figure 2.2: Two phases for packet capturing

For packet delivery, if the trace machine is in the traffic path, it can observe the traffic directly and no special action is needed. However, putting the trace machine directly in the traffic path will adversely affect the traffic, as the trace machine now has two jobs to do: forwarding the traffic in addition to recording it. It is better to have a dedicated trace machine that is outside the normal traffic path. In this case, we need to employ some technique to deliver the traffic to the trace machine.
There are three types of techniques for delivering traffic to a trace machine that is not in the normal traffic path in wired networks. The first one works if the monitored link passes through a shared Ethernet hub. Once we connect the trace machine to the shared hub, the broadcast nature of the hub gives the trace machine access to the traffic going through the hub. This is illustrated in Figure 2.3(a). Although this method is easy and the least expensive, its usage is limited because it requires a shared hub on the monitored link. Shared hubs are becoming less common now, as they are being replaced by switched hubs, which provide better performance.
The second method is port mirroring, which is commonly available in most enterprise-level routers. In this method, the router forwards a copy of the traffic through one or more ports to the mirror port, which is connected to the trace machine. Figure 2.3(b) illustrates how it works. While this method can be commonly applied, it adds a burden to the router, which can adversely affect the router's performance. In addition, packets to the trace machine are subject to additional queuing delay if packets from multiple ports or directions are forwarded to the same mirroring port. To obtain accurate packet timestamping, we should avoid mirroring multiple ports to one port.

(a) Packet delivery method I: Shared Hub (b) Packet delivery method II: Port Mirroring (c) Packet delivery method III: In-line Tapping
Figure 2.3: Packet delivery methods in wired networks
The third method is called in-line tapping, in which a network tap is physically attached to the monitored link, providing full access to the network traffic through the link. Examples of network taps include fiber taps, which split the light signal traversing a fiber link and deliver it to both the destination and the trace machine, and copper taps, which regenerate the signal transmitted through a copper link and send it to the trace machine. This method is illustrated in Figure 2.3(c). In-line tapping does not affect network operation except when it is first installed. It is able to keep up with high-speed links and gives the trace machine full access to the traffic, including accurate packet timing. But it is expensive, as extra hardware must be purchased.
Once the packets are delivered to the trace machine, we need to timestamp and record them for the packet recording phase. We can use either normal network cards with commonly available sniffing tools (e.g., tcpdump) or specialized devices. While using common tools is inexpensive, it may suffer from inaccurate timestamping. For example, a Gigabit Ethernet card typically employs interrupt coalescing to reduce the number of interrupts to the operating system. While coalescing improves system performance, it introduces significant jitter into packet timestamps, making this approach unsuitable for monitoring high-speed links (e.g., 1Gbps links). For such links we need specialized devices, e.g., Endace DAG network monitoring cards [75], to obtain accurate, high-resolution timestamps for each packet.
2.4 Data Representation
Once we gather the raw measurement data, we need to transform them into an appropriate representation to reveal the patterns embedded in the traffic. Several researchers have explored different forms of spectral representation of network traffic, including Power Spectral Density (through the Fourier transform) [36], Lomb periodograms [65], and wavelets [9]. In our work we select Power Spectral Density (through the Fourier transform) as our spectral representation of network traffic, since it is simple and effective.
In this section we first describe how to transform the raw measurement data into an appropriate spectral representation, and then discuss the aliasing effect, which creates a serious challenge for obtaining an accurate spectral representation, and how to reduce its impact. We use the same Power Spectral Density technique as in [36], but we go beyond it by investigating the aliasing effect and how to reduce its impact.
In the next chapter we will visually demonstrate the capability of the Power Spectral Density technique in revealing periodic patterns, and associate each pattern with the underlying process that generates it. We will also present experimental results that qualitatively demonstrate the impact of cross traffic on the periodic patterns. Based on the intuition from these results, we will present several non-parametric detection algorithms and preliminary parametric detection algorithms that perform fairly well in detecting periodic patterns in Internet traffic in Chapters 4 and 5. It is future work to quantitatively model the system and further improve our detection algorithms based on the system model.
2.4.1 Fourier Transform of Network Traffic
There are three main steps from the raw packet trace to the corresponding spectral representation. The process is illustrated in Figure 2.4. We describe them in detail below.

Figure 2.4: Steps to obtain the spectral representation for Internet traffic traces

First, we select packet arrivals as the measurable real-world events. The only information we need from the packet trace is the packet arrival times; we extract them from the trace to form a sequence of arrival times. In addition, we divide a packet trace into smaller segments of the same length (ℓ seconds each) before extracting packet times from it. The purpose of segmentation is that we can later calculate and compare the spectra of segments of the same length. The segment length ℓ is a configurable parameter, and we discuss it further in the next subsection.
Second, we sample each segment with a sampling rate p (we discuss how to select a proper p in Section 2.4.2) to obtain a time series X, where X(i) is the number of packets that arrive in the time interval [i/p, (i+1)/p). The time is relative to the start of the segment, and i varies from 0 to ℓ·p − 1. This results in N = ℓ·p samples for each segment. In addition, we subtract the mean arrival rate before proceeding with the spectral transform in the next step, since the mean value results in a DC component in the spectrum that does not provide useful information for our purposes.
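The segmentation-and-sampling step can be sketched in a few lines of NumPy. This is our own illustration (the function name and the 800pps example stream are assumptions), not the system's actual implementation:

```python
import numpy as np

def to_time_series(arrival_times, seg_len, rate):
    """Bin packet arrival times (seconds, relative to the segment start)
    into counts per sampling interval, then subtract the mean."""
    n = int(seg_len * rate)                      # N = l * p samples
    counts, _ = np.histogram(arrival_times, bins=n, range=(0.0, seg_len))
    return counts - counts.mean()

# Idealized 800 pps stream over a 1 s segment, sampled at 8 kHz.
arrivals = np.arange(800) * 0.00125              # one packet every 1.25 ms
x = to_time_series(arrivals, seg_len=1.0, rate=8000)
print(len(x))                                    # 8000
```

The resulting zero-mean series is what the spectral transform in the final step operates on.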
Finally, to get the spectral domain representation, we compute the Power Spectral Density (PSD) by performing the discrete-time Fourier transform on the auto-covariance function (ACF) of the time series. The auto-covariance is a measure of how similar the stream is to itself shifted in time by offset k [11, 12]. When k = 0 we compare the packet stream to itself, and the auto-covariance is maximum and equals the variance of the packet stream. When k > 0 we compare the packet stream with a version of itself shifted by lag k. The auto-covariance sequence c(k) at lag k is

c(k) = \sum_{t=0}^{N-k-1} (X(t) - \bar{X})(X(t+k) - \bar{X}),   (2.1)

where \bar{X} is the mean of X(t), N is the number of samples, and k varies from −N+1 to N−1. We do not divide the sequence by N, as the usual auto-covariance definition does, because N reflects the trace segment length and we want to keep the impact of the segment length in the final result.
The power spectral density is obtained by applying the discrete-time Fourier transform to the auto-covariance sequence of length M. While the PSD contains both phase and amplitude information, we are mostly interested in the amplitude, calculated as follows:

S(f) = \left| \sum_{k=0}^{M-1} c(k) e^{-i 2\pi f k} \right|   (2.2)

The spectrum amplitude S(f) captures the power, or strength, of the individual observable frequencies embedded in the time series. Meanwhile, we calculate the cumulative spectrum P(f) as the power in the range 0 to f, and normalize P(f) by the total power to get the Normalized Cumulative Spectrum (NCS) C(f):

P(f) = \sum_{i=0}^{f-1} \frac{S(i) + S(i+1)}{2}   (2.3)

C(f) = \frac{P(f)}{P(f_{max})}   (2.4)
Intuitively, the spectrum S(f) captures the power or strength of the individual observable frequencies embedded in the time series, while the normalized cumulative spectrum C(f) shows their relative strength. The spectrum can be compared both across time, for consecutive segments gathered at the same point, and across space, for segments gathered at different points in the network, to study how it evolves over time and across the network.
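The computation of Equations (2.1) through (2.4) can be sketched as below. This is a straightforward NumPy rendition of the definitions, not the dissertation's actual implementation; as a simplification it transforms only the one-sided lags 0..N−1 of the auto-covariance, and the function names are our own.

```python
import numpy as np

def psd_via_acf(x, rate):
    """Amplitude spectrum per Eq. (2.2): DFT magnitude of the unnormalized
    auto-covariance (Eq. 2.1) of the mean-subtracted series x."""
    n = len(x)
    xc = x - x.mean()
    acf = np.correlate(xc, xc, mode="full")[n - 1:]   # keep lags 0 .. n-1
    spec = np.abs(np.fft.rfft(acf))
    freqs = np.fft.rfftfreq(len(acf), d=1.0 / rate)
    return freqs, spec

def ncs(spec):
    """Normalized cumulative spectrum per Eqs. (2.3)-(2.4)."""
    p = np.concatenate(([0.0], np.cumsum((spec[:-1] + spec[1:]) / 2.0)))
    return p / p[-1]

# Ideal 800 pps comb sampled at 8 kHz for 1 s: strong component at 800 Hz.
x = np.zeros(8000)
x[::10] = 1.0
freqs, spec = psd_via_acf(x, 8000)
print(round(freqs[np.argmax(spec[:1000])]))  # 800
```

For a periodic input the amplitude spectrum concentrates at the base packet rate and its harmonics, which is the property the detection algorithms of later chapters exploit.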
2.4.2 Selection of Parameters
There are two important parameters in the calculation of the traffic spectrum: the sampling rate p and the segment length ℓ. In the following subsections we discuss how to select proper values for them. We start with spectra for idealized periodic packet arrivals, and then investigate the impact of sampling rate and segment length, with general guidelines for configuring their values.
2.4.2.1 Spectra for Periodic Packet Streams
To understand the effect of sampling rate and segment length, we start with a simple scenario in which we assume the packet stream has an idealized periodic pattern, with one packet arrival every T seconds. Hence the packet rate is 1/T packets per second (pps). Such a packet stream of infinite length can be viewed as an impulse train, or a comb function, in the time domain. The function has value infinity at the times when there is a packet arrival and value 0 at all other times:

comb_T(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT)   (2.5)
The frequency response of this impulse train is another comb function whose period equals 1/T. It has a base frequency equal to 1/T and an infinite number of harmonics that are multiples of the base frequency:

FFT(comb_T(t)) = comb_{1/T}(f) = \sum_{n=-\infty}^{\infty} \delta(f - n/T)   (2.6)
Figure 2.5(a) shows the spectrum of an example of such a packet stream. In this example, the packet stream has one packet arrival every 0.00125 seconds, i.e., a packet rate of 800pps. To obtain its spectrum, we set the sampling rate to 8kHz and the segment length to 1 second. We see spikes at 800Hz and its harmonics in the spectrum. The amplitude at the 800Hz base frequency is equal to 635,000, very close to the square of the packet rate. The amplitudes at the harmonics are less than the amplitude at the base frequency, and decrease for higher harmonics.
Harmonics may create some difficulty in correctly interpreting the traffic spectrum, as a harmonic of one periodic packet stream may be misinterpreted as the base frequency of another periodic packet stream. Fortunately we can distinguish them by their amplitudes, as the amplitude of the former would be much less than the amplitude of the latter. For example, Figure 2.5(b) shows the spectrum for a periodic packet stream of 1600pps. The amplitude at its base frequency of 1600Hz is four times the amplitude of the harmonic at 1600Hz for the 800pps periodic stream.
As we vary the sampling rate and segment length, the spectrum of the periodic packet stream changes accordingly. We discuss their impact and how to select proper values for them in the following two subsections.
Figure 2.5: Spectra (PSD and NCS) for periodic packet streams of different packet rates with 8kHz sampling rate and 1s segment length. (a) 800 packets per second; (b) 1600 packets per second.
Figure 2.6: Spectra (PSD and NCS) for an 800Hz sinusoidal wave with different sampling rates and 1s segment length. (a) 1kHz sampling rate; (b) 2kHz sampling rate.
2.4.2.2 Aliasing and Sampling Rate
By the Nyquist-Shannon sampling theorem, the sampling rate must be at least twice the highest frequency component of a signal for accurate representation. If this condition is not satisfied, a frequency component higher than half the sampling rate is folded back and represented by a lower frequency in the spectrum. This artifact is called aliasing.
Aliasing has a negative impact, as it gives a misleading representation of real-world events. For example, Figure 2.6(a) shows the aliasing effect when an 800Hz sinusoidal wave is sampled at a rate of 1kHz. The segment length is 1 second. We can see that the 800Hz signal is folded back to 200Hz. The peak at 200Hz in the spectrum is an alias of the original 800Hz signal. Without other knowledge of the real signal, aliasing would make us falsely conclude that the signal is a 200Hz sinusoidal wave, whereas in reality it is an 800Hz sinusoidal wave. In contrast, Figure 2.6(b) shows the spectrum with a sampling rate of 2kHz, which correctly represents the original 800Hz signal with a single peak at 800Hz.

In general, if the sampling rate is p, a sinusoidal wave of F Hertz will be represented by a peak at a frequency f calculated as follows:

f = F mod p,        if F mod p < p/2
f = p − (F mod p),  otherwise   (2.7)
As we saw in the previous subsection, the frequency response of an impulse train contains the base frequency and an infinite number of harmonics, so the band of the impulse train is not bounded by any value. In other words, no matter what sampling rate we choose, the impulse train always contains higher-frequency components, and such components will be folded back, creating aliases. The locations of the aliases are governed by the rule in Equation 2.7.
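Equation 2.7 is easy to apply programmatically. The helper below is a direct transcription (the function name is ours), checked against the 800Hz example of Figure 2.6:

```python
def alias_freq(F, p):
    """Apparent frequency of a tone at F Hz sampled at p Hz (Eq. 2.7)."""
    r = F % p
    return r if r < p / 2 else p - r

# The 800 Hz wave of Figure 2.6: aliased at 1 kHz, faithful at 2 kHz.
print(alias_freq(800, 1000))  # 200
print(alias_freq(800, 2000))  # 800
```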
Traditionally, people use an analog or digital low-pass filter to remove frequency components above half the sampling rate to prevent aliasing. However, an analog filter cannot be applied to the packet arrival time sequence, as the signal is already digital. A digital low-pass filter also fails to remove the aliases that fall in the pass band, as the filter is applied after the sampling. In other words, if a
Figure 2.7: Spectra (PSD and NCS) for a periodic packet stream of 800 packets per second with different sampling rates and 1s segment length. (a) 3kHz sampling rate; (b) 30kHz sampling rate; (c) partial spectrum in 0Hz-1500Hz with 30kHz sampling rate.
high-frequency component is mapped to a frequency (alias) in the pass band according to the rule in Equation 2.7, it will pass the digital low-pass filter and be represented by an alias.
The only remaining way to reduce the impact of aliasing is to use a high sampling rate. Generally speaking, in the spectrum of a periodic packet arrival sequence of finite length, the amplitude of a harmonic tends to decrease as its frequency increases. A higher sampling rate can therefore reduce the aliasing effect in a fixed low-frequency band, because only higher-frequency harmonics with weaker amplitudes can then affect the spectrum in that band.
This is illustrated by the example in Figure 2.7. In this example, the periodic packet stream has a packet rate of 800pps and the segment length is 1 second. We sample it at two different rates, 3kHz and 30kHz; the spectra under these two sampling rates are shown in Figures 2.7(a) and 2.7(b), respectively. Figure 2.7(b) shows that the amplitudes of the harmonics decrease as their frequencies increase. Figure 2.7(c) shows the partial spectrum in [0Hz, 1500Hz] with a 30kHz sampling rate. We can see that it represents the frequency band [0Hz, 1500Hz] much better than Figure 2.7(a). There is only one clear peak, at 800Hz, in this figure, while Figure 2.7(a) shows clear peaks not only at 800Hz, but also at 600Hz, 1000Hz, 1200Hz, and 1400Hz, which are aliases of the harmonics above 1500Hz.
However, a higher sampling rate also means higher overhead for both storage and processing. A compromise has to be made between reducing the overhead and obtaining a better spectral representation. Unless specified otherwise, in this dissertation we select a conservative sampling rate of 200kHz for most periodic patterns in Internet traffic. This sampling rate is sufficiently high to capture the periodic pattern of transmitting 1500-byte packets over a 100Mbps bottleneck link (8333 packets per second). A more thorough exploration of the trade-offs in selecting a proper sampling rate is the subject of future work.
2.4.2.3 Segment Length
Besides sampling rate, another important parameter for the spectral representation is the trace segment length ℓ. Figure 2.8 shows the spectra for a periodic packet stream of 800pps with different segment lengths. For the left subgraph the segment length is 2 seconds, while for the right subgraph it is 4 seconds. Both subgraphs use the same sampling rate of 8kHz. As we can see, both subgraphs show spikes around 800Hz and its harmonics. The difference is the amplitude
[Figure 2.8, PSD and NCS vs. frequency plots. (a) Spectrum with 2s segment length. (b) Spectrum with 4s segment length.]
Figure 2.8: Spectra for a periodic packet stream of 800 packets per second with 8kHz sampling rate
and different segment length
of these spikes. The amplitude at 800Hz with a 4s segment length is four times the amplitude at 800Hz with a 2s segment length, which in turn is four times the amplitude at 800Hz with a 1s segment length, shown in Figure 2.5(a). In general, the amplitude at the base frequency grows as the square of the segment length; specifically, it is approximately equal to the square of the product of segment length and packet rate.
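This square-law growth can be checked numerically. The sketch below measures the PSD amplitude at the base frequency of an ideal 800pps stream for 1s, 2s, and 4s segments at an 8kHz sampling rate; the bin-and-FFT model and the `base_freq_psd` helper are illustrative assumptions of ours.

```python
import numpy as np

def base_freq_psd(rate_pps, seg_len, fs):
    """PSD amplitude at the base frequency of an ideal periodic packet
    stream, using a bin-and-FFT model (an illustrative sketch)."""
    n = int(fs * seg_len)
    series = np.zeros(n)
    times = np.arange(int(rate_pps * seg_len)) / rate_pps
    idx = np.floor(times * fs + 1e-6).astype(int)
    np.add.at(series, idx, 1)            # one impulse per packet
    k = int(rate_pps * seg_len)          # FFT bin of the base frequency
    return np.abs(np.fft.rfft(series)[k]) ** 2

for seg in (1, 2, 4):
    print(seg, base_freq_psd(800, seg, 8000))
```

Each doubling of the segment length quadruples the amplitude, matching the (segment length × packet rate)² rule stated above.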
While a longer segment length increases the signal amplitude, potentially making it easier to recognize the periodic pattern inside the traffic, it also significantly increases the computation overhead for obtaining the spectrum. We need to make a compromise when selecting a proper segment length. On one hand, if the segment length ℓ is too short, we may not have enough samples for the spectrum to reveal interesting patterns inside the packet trace; what it shows may be temporary or transient phenomena on the network. On the other hand, if it is too long, we may unnecessarily increase the computation overhead. In addition, patterns inside the packet stream may have changed during this long period. For the detection of bottleneck traffic and the other periodic patterns investigated in this paper, we suggest using segment lengths on the order of seconds. Unless
specified otherwise, we use a default value of 1 second for the segment length. We will present a further investigation of the impact of different segment lengths on detecting periodic patterns in Chapter 4.
2.5 Detection
One salient feature of our methodology is that we go beyond visual evidence by applying rigorous statistical methods to automatically examine that evidence and draw conclusions. We divide all detection algorithms into two categories, non-parametric detection and parametric detection. The classification is based on how much information the algorithm needs for detection. Non-parametric detection algorithms require only the spectral information of the traffic as input, while parametric detection algorithms consider not only the spectral information but also other parameters such as traffic volume.
The advantage of non-parametric detection algorithms is that they use simple heuristics for detecting periodic patterns and do not require explicit modeling of the underlying processes. Results show that with simple heuristics they can achieve good performance in the common cases. Parametric detection algorithms, on the other hand, are proposed for their capability to look beyond simple heuristics about the periodic patterns. They aim for better performance by exploiting models of how the periodic patterns are formed and reshaped.
For non-parametric detection we have built a framework based on Bayes Maximum Likelihood Detection. In this framework we treat the detection of a periodic pattern as a binary hypothesis testing problem, where hypothesis H_B corresponds to the presence of such a pattern and hypothesis H_0 corresponds to its absence. We select a feature from the spectral representation, for example the peak amplitude in a frequency window, as the feature for detection. Then we use training to estimate the probability density functions of the feature under both hypotheses. To decide whether such a periodic pattern is present in unknown traffic, we calculate the likelihood of each hypothesis and declare the presence of the periodic pattern in the traffic if H_B is more likely than H_0.
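The framework above can be illustrated with a toy implementation. This hedged sketch assumes Gaussian feature densities and uses made-up training values; the dissertation's actual features, training data, and density estimates may differ.

```python
import numpy as np

def fit_gaussian(samples):
    """Estimate a Gaussian density for a feature from training samples."""
    s = np.asarray(samples, dtype=float)
    return s.mean(), s.std(ddof=1)

def log_likelihood(x, mean, std):
    """Log of the Gaussian probability density at x."""
    return -0.5 * np.log(2 * np.pi * std**2) - (x - mean)**2 / (2 * std**2)

# Hypothetical training features (e.g. peak PSD amplitude in a window):
# H_0 = pattern absent, H_B = pattern present.
h0 = fit_gaussian([0.9, 1.1, 1.0, 1.2, 0.8])
hb = fit_gaussian([4.8, 5.3, 5.0, 4.6, 5.2])

def detect(feature):
    """Declare the periodic pattern present iff H_B is more likely."""
    return log_likelihood(feature, *hb) > log_likelihood(feature, *h0)

print(detect(4.9), detect(1.0))   # True False
```

The detector simply compares the two likelihoods per the maximum-likelihood rule; priors and decision costs could be folded in the same way.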
We have designed four specific non-parametric detection algorithms, each utilizing a different kind of feature for detection. The Single-Frequency Algorithm considers the amplitude at a single frequency, while the Top-Frequency Algorithm checks the peak amplitude in a frequency window. The other two algorithms, the Top-M-Frequencies Algorithm and the All-Frequencies Algorithm, use multiple amplitudes to form a multivariate detector. Details of these algorithms and their evaluation using real Internet traffic can be found in Chapter 4.
For parametric detection we have investigated a number of parameters that play a role in the generation and evolution of the periodic pattern. Given the constraints on our system that no flow-level analysis can be performed and that the decision has to be made solely based on the observed traffic, we select the aggregate traffic volume as a key parameter and explore several approaches to utilizing it to improve the detection algorithm's performance. Details of these approaches and the evaluation results can be found in Chapter 5. As we gain a better understanding of the processes that generate and reshape the periodic patterns in our future work, we will be able to design better parametric detection algorithms that take advantage of the system model.
2.6 Summary
As the Internet evolves, it is important to understand Internet traffic at a growing scale. Flow-based analysis is unable to keep up with high speed network traffic. Hence we proposed a statistical signal processing methodology to analyze Internet traffic at the aggregate level and extract periodic patterns. Our intuition is based on existing work which has shown that spectral techniques provide a promising approach to analyzing Internet traffic. We have gone beyond current work by relying on rigorous statistical methods to automatically detect periodic patterns in Internet traffic and by pushing to understand the underlying causes of the patterns detected in the network traffic. In this chapter we presented a summary of our methodology and the details of its first two major components: how to capture packet traces and how to obtain their spectral representation. We also gave a summary of the Detection component, which will be elaborated further in Chapters 4 and 5. As we will see in subsequent chapters, this methodology can be used successfully to detect various periodic patterns in Internet traffic.
Chapter 3
VISUALIZING REGULAR PATTERNS IN INTERNET TRAFFIC
3.1 Introduction
There are a variety of periodic patterns in Internet traffic. In Chapter 2 we presented a methodology to detect such patterns using spectral techniques, along with the details of how to capture Internet traffic and obtain its spectral representation. Before we introduce algorithms to automatically detect periodic patterns in the spectral domain, in this chapter we visually present the spectral characteristics of three regular patterns to demonstrate the capability of our technique to reveal the existence of such patterns in Internet traffic.
The advantage of visual representations is that they can be easily understood and digested by readers for illustration purposes. On the other hand, the drawback of visual representations is that they require human interpretation, which can be subjective, qualitative, and infeasible for complex scenarios. In this chapter we focus on visual representations of the spectra to build intuition for the signature we associate with each pattern. In Chapters 4 and 5, we build upon this intuition to design algorithms that rely on rigorous statistical methods to detect periodic patterns quantitatively.
The three patterns we select are those caused by traffic congestion along bottleneck links, DoS (Denial-of-Service) attacks, and TCP windowing behavior, respectively. We visually present their spectral signatures by plotting both the PSD (Power Spectral Density) and the NCS (Normalized Cumulative Spectrum) of traffic containing the respective pattern. We first present the spectral signatures of these three patterns in a simple topology with little or no cross traffic interference. Then we use the pattern caused by bottlenecks as a representative case to investigate how cross traffic can affect the spectral signature, using controlled lab experiments. Finally we carry out wide-area network experiments to demonstrate the feasibility of detecting periodic patterns in real Internet traffic.
3.2 Regular Patterns in Internet Traffic
There are many periodic patterns in the Internet. For example, traffic congestion at a bottleneck link results in packets being transmitted back-to-back along the bottleneck link. At the network and transport layers, many protocols exchange information periodically, such as the periodic routing information exchange in routing protocols. Strong regularities also exist in traffic automatically generated by machines at the application layer, such as zombies in DoS (Denial-of-Service) attacks or misconfigured DNS clients. Such periodic processes imprint their own periodic signatures on network traffic. In this section, we visually demonstrate the spectral characteristics of such periodic patterns. We select three types of patterns as representatives to show the capability of our spectral technique to reveal them. These three types of patterns are imposed by bottleneck links, by DoS attacks, and by the TCP windowing mechanism. They are described below in Sections 3.2.1, 3.2.2, and 3.2.3.
3.2.1 Pattern Imposed by Bottleneck Links
When a link is congested, it sends packets out back-to-back. Assuming all packets are of the same size, we will see a single periodic pattern with period equal to the packet interarrival time. The packet interarrival time here is calculated as follows:

Interarrival Time = Packet Size / Link Bandwidth    (3.1)

The base frequency of this pattern is the reciprocal of the interarrival time. Hence it can be calculated by the following formula:

Base Frequency = Link Bandwidth / Packet Size    (3.2)

Even if packets are of different sizes, a few common packet sizes tend to dominate the traffic (see, for example, work by CAIDA [20], Katabi et al. [44], and Sinha et al. [72]), and such traffic still shows strong energy at the frequencies determined by the link bandwidth and packet size distribution when it experiences congestion along the bottleneck link.
In this subsection, we examine the periodicities imposed by bottleneck links by conducting experiments in a simple topology in which the sender and the receiver are directly connected through a switched Ethernet hub. We select a switched hub instead of a shared hub because a switched hub forwards traffic more efficiently and is becoming more common today. We run tcpdump on the receiver side to gather packet traces. We vary the Ethernet link bandwidth and the traffic type to obtain the spectra under different scenarios.
[Figure 3.1, PSD and NCS vs. frequency plots. (a) Complete spectrum. (b) Partial spectrum.]
Figure 3.1: Spectral signature of a 100Mbps link saturated with a TCP flow
For these experiments, we use Iperf [78] to generate controlled TCP and UDP packet streams. Iperf can mimic file downloads with TCP or CBR (Constant-Bit-Rate) traffic sent via UDP. Each experiment lasts 30 seconds. We cut the trace into individual segments of 1 second and use the technique in Section 2.4 to calculate their spectral representations. Although there is some variation in the spectra among these segments, the variation is small. Thus we present the result of only an arbitrarily selected trace segment here.
3.2.1.1 TCP Traffic through a 100Mbps Bottleneck
In the first experiment, we use a single Iperf TCP flow to saturate a 100Mbps switched Ethernet link with 1500-byte packets. We set the TCP socket buffer size to 128K bytes. Since the sender and receiver are directly connected through the Ethernet link with an RTT (Round-Trip Time) of less than 1 ms, the 128K-byte TCP buffer is more than enough for the TCP flow to capture the entire link bandwidth. In the experiment, the actual bit rate of the TCP flow (including Ethernet packet overhead) reaches 100Mbps, showing that the TCP flow can fully utilize the Ethernet link bandwidth.
Figure 3.2: Ethernet frame format
Figure 3.1(a) illustrates the complete measured spectrum of the packet stream along the Ethernet link. We define the complete measured spectrum as the spectrum that shows the energy or strength at all observable frequencies from 0Hz up to half of the sampling rate. Since we use a default sampling rate of 200kHz, the highest observable frequency here is 100kHz. In the spectrum we observe spikes around the 8130Hz base frequency and at multiples of this frequency. The amplitude at 8130Hz in the PSD reaches nearly 17,000,000. We call the 8130Hz base frequency the bottleneck frequency.
To fully understand why the strong energy appears at 8130Hz, we refer to the Ethernet frame format specified in the IEEE 802.3 2002 Standard [3] and illustrated in Figure 3.2. According to this Standard, each Ethernet frame carries a total of 38 bytes of overhead: an 8-byte preamble, 6-byte destination MAC address, 6-byte source MAC address, 2-byte type/length field, 4-byte CRC, and 12-byte inter-packet gap. The maximum length of the Ethernet data payload is 1500 bytes, which is what the Iperf TCP flow uses in this experiment. Hence the packet interarrival time for the Iperf flow according to formula 3.1 would be (1500 + 38) * 8 bits / 100Mbps = 0.12304ms. The reciprocal of 0.12304ms is a frequency of 8127.44Hz. Since tcpdump has a time resolution of 1 microsecond, the majority of the packet interarrival times would be recorded as 0.123ms, resulting in strong energy around 8130Hz, the reciprocal of 0.123ms, in the actual spectral representation.
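The interarrival-time arithmetic above generalizes to the other experiments in this chapter. A small sketch (the `bottleneck_freq` helper is ours; the 46-byte minimum-payload case anticipates the Mstream experiment in Section 3.2.2):

```python
# Base (bottleneck) frequency per formula 3.2, with the 38 bytes of
# per-frame Ethernet overhead (preamble, addresses, type/length, CRC,
# inter-packet gap) folded into the on-the-wire packet size.

def bottleneck_freq(link_bps, payload_bytes, overhead_bytes=38):
    """Packets per second when payload-sized frames saturate the link."""
    return link_bps / ((payload_bytes + overhead_bytes) * 8)

print(round(bottleneck_freq(100e6, 1500), 2))   # 8127.44 Hz
print(round(bottleneck_freq(10e6, 1500), 2))    # 812.74 Hz
print(round(bottleneck_freq(10e6, 46), 2))      # 14880.95 Hz
```

The first two values match the 8130Hz and 813Hz spikes reported for the 100Mbps and 10Mbps experiments, after tcpdump's 1-microsecond rounding.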
From this experiment we can see that high energy around the 8130Hz base frequency and its harmonics is a strong indication of the presence of traffic through a 100Mbps bottleneck link. The
[Figure 3.3, PSD and NCS vs. frequency plots.]
Figure 3.3: Spectral signature of a 10Mbps link saturated with a TCP flow
harmonics exist in the spectrum for the reasons explained in Section 2.4.2. Since harmonics do not provide additional information, for simplicity we focus mainly on the partial spectrum that contains only the base frequency for the visual demonstration of the traffic spectra in this chapter. For example, we focus on the partial spectrum from 0Hz to 10kHz in Figure 3.1(b) because this partial spectrum contains only the 8130Hz base frequency for the experiment with a 100Mbps bottleneck link.
3.2.1.2 TCP Traffic through a 10Mbps Bottleneck
In the second experiment, we repeat the prior experiment but with a 10Mbps switched Ethernet link. The TCP flow can still fully utilize the link bandwidth, with an actual bit rate of 10Mbps (including the Ethernet packet overhead). This bit rate is one tenth of that in the previous experiment, since the link bandwidth is only one tenth of the previous one.
Figure 3.3 shows the corresponding spectrum of the packet stream. As expected, we see strong energy around 813Hz, which is the base frequency for the 10Mbps bottleneck link. It agrees with the maximum packet rate of 1500-byte packets over the 10Mbps Ethernet link. The PSD amplitude at 813Hz reaches 510,000, which is much lower than the amplitude at 8130Hz for the experiment with a 100Mbps link in Figure 3.1(b). Harmonics of the 813Hz base frequency are not included
in the graph since, for simplicity in the visual demonstration, we focus on the partial spectrum that covers only the 813Hz base frequency.
3.2.1.3 UDP Traffic through a 10Mbps Bottleneck
To investigate how the spectrum changes with CBR traffic, we saturate a 10Mbps link with an Iperf UDP flow. We first configure Iperf to send out 1472-byte UDP packets at a sending rate of 10Mbps. The Ethernet frame data payload length is then 1500 bytes after adding the 28-byte UDP/IP headers. Again we observe a bit rate of 10Mbps (including Ethernet packet overhead), showing that the UDP flow can also fully capture the link bandwidth.
Figure 3.4(a) depicts the spectrum of the Iperf UDP flow. It includes a single peak at the 813Hz base frequency (harmonics at higher frequencies are omitted, as we focus on the partial spectrum containing only the base frequency for simplicity in the visual demonstration). The PSD amplitude at 813Hz is 612,000, which is a bit higher than the amplitude at 813Hz for the TCP flow in Figure 3.3. This agrees with our intuition that packet transmissions of UDP traffic are more regular than those of TCP traffic, because UDP traffic typically does not adjust its sending rate according to the network condition, while TCP traffic does. However, the difference here is not very significant because we have no competing cross traffic. We will see increasing differences between TCP and UDP traffic in the presence of competing cross traffic when we investigate the impact of cross traffic in Section 3.3.
In the next experiment, we configure Iperf to send out 772-byte UDP packets at a sending rate of 10Mbps. The Ethernet frame data payload length is then 800 bytes after adding the 28-byte UDP/IP headers. The bit rate along the Ethernet link is still 10Mbps (including Ethernet packet overhead). The spectrum in Figure 3.4(b) shows a spike around 1492Hz, which agrees with the base frequency
[Figure 3.4, PSD and NCS vs. frequency plots. (a) With 1472-byte UDP packets. (b) With 772-byte UDP packets.]
Figure 3.4: Spectral signature of a 10Mbps link saturated with different size UDP packets
calculated by formula 3.2 with 10Mbps link bandwidth and 838-byte packet size (800-byte data
payload plus 38-byte Ethernet overhead). In the PSD, the amplitude at 1492Hz reaches 1,944,000,
which is more than 3 times the amplitude at 813Hz for the experiment with 1500-byte packets in
Figure 3.4(a).
From the above experiments, we see that the spectrum of a bottleneck link can vary according to a number of factors. Among them, link bandwidth and packet size distribution are the two most important, and they determine where the energy spikes appear, i.e., the bottleneck frequency and its harmonics. Beyond that, UDP or CBR streams appear more regular than TCP flows, resulting in a higher amplitude at the bottleneck frequency.
3.2.2 Pattern Imposed by Denial of Service Attacks
In Denial of Service (DoS) attacks, attacking machines typically send packets as fast as possible in order to overwhelm the victim so that it does not have enough resources, e.g. bandwidth and CPU power, to continue normal service for legitimate users. If we look at individual attack packet flows, they demonstrate strong periodicities as regulated by the attack packet size, bottleneck
[Figure 3.5, PSD and NCS vs. frequency plots.]
Figure 3.5: Spectral signature of an Mstream attack stream through a 10Mbps link
link speed, hardware and software of the attack machine (e.g. CPU power, attack tool algorithm), etc. This translates into strong energy at the associated frequencies in the spectral domain.
To validate the ability of our spectral technique to reveal DoS attacks, we carry out an experiment with the Mstream [2] attack tool. In the experiment, two machines are connected directly through a 10Mbps switched Ethernet hub. One machine runs Mstream to send 40-byte TCP packets as fast as possible. We run tcpdump on the receiver side to gather a 30-second packet trace. We cut the trace into 1-second segments and calculate their spectra according to the formulas in Section 2.4.
Figure 3.5 shows the spectral characteristics of the attack stream in one trace segment. Spectra for other trace segments are similar. We see a sharp energy peak around the 14881Hz base frequency. The 14881Hz frequency agrees with the highest packet rate possible over the 10Mbps Ethernet link. Although the attack tool sends only 40-byte packets, the Ethernet layer pads the payload to 46 bytes, the minimum Ethernet payload length. The packet interarrival time would therefore be (46 + 38) * 8 bits / 10Mbps = 67.2 microseconds. This corresponds to a frequency of 14880.95Hz, which matches closely the actual spectral representation in our experiment.
Compared with the spectrum of the same link saturated by a TCP flow with 1500-byte packets in Figure 3.3, the DoS attack stream has an amplitude that is almost two orders of magnitude higher. This
(a) Initial Slow-start state (b) Steady state
Figure 3.6: An Example of TCP Windowing Behavior
unusually strong energy at the high frequencies can be an indication of ongoing Denial-of-Service
attacks.
There are certain ways that an attacking machine can try to evade detection through spectral analysis. One is to introduce random delay between two consecutive packets. This adds randomness to the packet interarrival time distribution, causing the energy to be spread over a large range of frequencies in the spectral representation, so that it becomes more difficult to detect the attack. Another approach is to randomize the packet size, which can also spread out the energy. But both approaches reduce the effectiveness of the attack, as they reduce the attack packet rate compared with the highest rate achieved by sending the smallest packets as fast as possible. A more thorough investigation of detecting DoS attacks with spectral techniques can be found in the paper by Hussain et al. [35].
[Figure 3.7, PSD and NCS vs. frequency plots. (a) Spectrum in 0Hz - 10kHz with 64K-byte window. (b) Spectrum in 0Hz - 200Hz with 64K-byte window.]
Figure 3.7: Spectral signature of an under-performing TCP flow
[Figure 3.8, PSD and NCS vs. frequency plots. (a) Spectrum in 0Hz - 10kHz with 256K-byte window. (b) Spectrum in 0Hz - 10kHz with 1M-byte window.]
Figure 3.8: Spectra of a TCP flow with increasing window size
3.2.3 Pattern Imposed by TCP Windowing Behavior
Several network protocols also exhibit periodic behavior. For example, at the transport level, TCP employs a windowing mechanism for flow control and congestion control. Other protocols, such as BGP, exchange periodic updates. Such behavior can be captured and characterized using spectral techniques, which in turn can help diagnose problems with these protocols. For example, spectral techniques can detect under-performing TCP flows that fail to utilize the full capacity of the path because their sender buffers are smaller than the Bandwidth-Delay Product (BDP) along the flow path. Such flows show strong regularity around their Round-Trip Time (RTT) due to the TCP windowing mechanism that we will describe shortly. Hence the existence of such flows can be detected by our spectral techniques. Once we detect the existence of such flows, we can carry out further analysis using more expensive flow-based techniques to identify them, so that the corresponding end hosts can be notified (through out-of-band channels) to increase their TCP buffer size and achieve higher throughput.
Figure 3.6 illustrates the TCP windowing mechanism, in which ACKs from the receiver control the sender's injection of packets into the network. For simplicity we assume all TCP data packets are of the same size and that the TCP sending buffer, which is smaller than the bandwidth-delay product along the flow path, can hold only four data packets. In addition we assume the receiver advertises a sufficiently large window to the sender, and for illustration purposes we do not consider competition from cross traffic.
Overall the TCP flow has two states, the slow-start state and the steady state. In the slow-start state, the sender sends two data packets back-to-back for each ACK packet received, essentially doubling the sending window size every RTT. In our example the TCP flow is in the slow-start state for the first two RTTs, as illustrated in Figure 3.6(a). In the first RTT the sender sends one packet and then gets an ACK packet. In the second RTT it sends out two packets and gets two ACK packets. At the beginning of the third RTT the sender's window size becomes four packets, and the window will not grow any further due to the limit of the sending buffer size. From this point on the TCP flow is in the steady state. In this state, the sender repeatedly sends out four data packets back-to-back and then waits for the ACKs before sending out four more data packets. The steady state is illustrated in Figure 3.6(b). In both the slow-start state and the steady state, we see a busy-then-idle pattern that repeats every RTT. If delayed ACKs are implemented, the receiver sends an ACK only upon receiving every second data packet, or when a timer expires because the second data packet has not arrived. Although delayed ACKs change the number of ACKs in our example, we still see the busy-then-idle pattern every RTT. More details on delayed ACKs can be found in [7].
For the above example, we expect to see two prominent frequencies, one related to the back-to-back packet transmission and the other related to the busy-then-idle pattern every RTT. The first can be calculated using formula 3.2, and we call it the link frequency. We call the second the RTT frequency, and it can be calculated as follows:

RTT Frequency = 1 / RTT    (3.3)
To test the ability of our spectral technique to detect such under-performing TCP flows, we conduct a cross-country experiment with a TCP flow whose window size is much smaller than the bandwidth-delay product of the path. The experiment involves two machines located on the east and west coasts of the US, respectively. We first generate a TCP flow between these two machines using Iperf with the default window size of 64K bytes. The path has an RTT of around 68.5 ms. The minimum link bandwidth along the path is 100Mbps, so the default window size is less than the bandwidth-delay product of the path, which equals 6.85M bits or 856K bytes. As a result, the TCP flow is unable to utilize the full capacity of the path. It has a throughput of 7.8Mbps over the 30-second experiment, which closely matches our expectation (64 * 1024 * 8 bits / 68.5 ms = 7.65Mbps).
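The window-limited throughput and the RTT frequency of formula 3.3 for this experiment can be checked with a few lines of arithmetic (the helper names are ours, for illustration):

```python
# Window-limited TCP throughput and RTT frequency for the
# cross-country experiment above (illustrative arithmetic).

def window_limited_throughput(window_bytes, rtt_s):
    """Bits per second when the send window, not the path, is the limit."""
    return window_bytes * 8 / rtt_s

def rtt_frequency(rtt_s):
    """Formula 3.3: the busy-then-idle repetition rate."""
    return 1.0 / rtt_s

rtt = 0.0685                                                      # 68.5 ms
print(round(window_limited_throughput(64 * 1024, rtt) / 1e6, 2))  # 7.65 Mbps
print(round(rtt_frequency(rtt), 1))  # 14.6 Hz, near the observed 14.5Hz spike
```

The second value anticipates the low-frequency spikes discussed next.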
Figure 3.7 shows the spectrum of this flow in a 1-second segment. There are two important points to observe. The first is the clear energy concentration around 8kHz indicated by the NCS in the left subgraph. The 8kHz corresponds to the back-to-back transmission of 1500-byte packets over the 100Mbps link. It has an amplitude of around 30,000, only 1/500 of the amplitude for the 100Mbps bottleneck spectrum in Figure 3.1(b); the reason is that the 100Mbps link is only lightly utilized. The second point is the energy spikes at a much lower frequency, 14.5Hz, and its harmonics. The spikes at the lower frequencies can be seen more clearly by zooming into the [0Hz, 200Hz] range in the right subgraph. The 14.5Hz frequency coincides with the RTT of the flow. This signal is a result of the windowing behavior of TCP. Since the sending window is too small to fill the pipe, TCP degrades to the busy-then-idle pattern similar to Figure 3.6, and this pattern is captured by the spectral representation. Therefore, our technique is able to detect an under-performing TCP flow that is not configured properly to utilize the full capacity of its path.
In Figure 3.8 we show what happens when we gradually increase the sending window size. With an increasing window size, the energy around the 8kHz link frequency becomes stronger while the energy around the RTT frequency becomes relatively weaker, indicating that TCP can better utilize the path capacity. In fact, the TCP throughput reaches 32.4Mbps with a 256K-byte sending window and 93.3Mbps with a 1M-byte sending window. As the sending window size approaches the bandwidth-delay product, the energy around the RTT frequency becomes much weaker compared with the energy around the 8kHz link frequency, and the graph indicates a healthy TCP flow.
We can see from the above experiments that under-performing TCP flows can be detected through their strong low frequency components, i.e., the energy concentration around their RTT frequencies. Once we detect the existence of such flows, we can do further analysis using more expensive flow-based techniques to identify them, so that the corresponding end hosts can be notified to increase their TCP buffer size and achieve higher throughput.
3.2.4 Summary of Regular Patterns
In the above subsections we have visually demonstrated the spectral characteristics of three different regular patterns in Internet traffic. For the pattern caused by traffic congestion along a bottleneck link, the spectrum shows strong energy around the base frequency regulated by the bottleneck link bandwidth and packet size, as in formula 3.2. For DoS attacks, the periodicity can also be calculated using formula 3.2 if link bandwidth is the resource exhausted by the attack. Since many attacks use small packet sizes, we see a much higher frequency component compared to the spectrum of a bottleneck saturated with normal traffic. The pattern in TCP windowing behavior is revealed by its low frequency component, the reciprocal of the RTT of the TCP flow. In summary, these patterns show unique spectral characteristics, and automatic algorithms can be designed to detect their existence using these characteristics.
In addition to detecting these three types of regular patterns, our spectral techniques can be used to characterize other periodic patterns, including those at the application layer. For example, by monitoring traffic for a web server on a normal day, we can create a signature of the normal pattern of web requests. Then, if the server is attacked, there will be a sudden change in the pattern of web requests, which will be visible in the spectrum.
Finally, we point out that the input data for spectral analysis need not come just from packet
traces. Time-series data can be created from many other sources, such as SNMP traces, NetFlow
logs, etc. Signatures could be constructed for several applications, such as RealAudio, VoIP, Mail,
etc. There are many possibilities, but research is needed to determine which of these sources of data
are more interesting and how to best represent them for the human operator. Part of our future work is to
discover more regular patterns in Internet traffic using spectral techniques.
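As a concrete illustration of this pipeline, packet arrival timestamps can be binned into a fixed-rate count series whose FFT yields a power spectral density. The helper below is our sketch, not the dissertation's tool chain; the 2 kHz sampling rate is an assumption chosen to match the 0-1000 Hz spectra shown in this chapter:

```python
import numpy as np

def packet_psd(timestamps_s, sample_rate_hz=2000):
    """Bin packet arrival times into a fixed-rate count series and
    estimate its power spectral density with an FFT."""
    t = np.asarray(timestamps_s, dtype=float)
    t = t - t.min()
    n_bins = int(np.ceil(t.max() * sample_rate_hz)) + 1
    series, _ = np.histogram(t, bins=n_bins,
                             range=(0.0, n_bins / sample_rate_hz))
    series = series - series.mean()          # drop the DC component
    spectrum = np.fft.rfft(series)
    psd = np.abs(spectrum) ** 2 / len(series)
    freqs = np.fft.rfftfreq(len(series), d=1.0 / sample_rate_hz)
    return freqs, psd

# A synthetic 813 Hz packet train should show its spectral peak there:
ts = np.arange(0.0, 1.0, 1.0 / 813.0)
freqs, psd = packet_psd(ts)
peak = freqs[np.argmax(psd[1:]) + 1]
```

The same function would accept timestamps parsed from a tcpdump trace, an SNMP counter series, or NetFlow logs, which is why the choice of data source is largely independent of the spectral machinery.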
In this section we focus on the graphical representation of the spectral characteristics of periodic
patterns. In the next section we will use controlled lab experiments to investigate how cross
traffic can affect the spectral characteristics of periodic patterns. Based on the intuition gained from
these experiments, we design automatic detection algorithms that can quantitatively detect periodic
patterns in Chapters 4 and 5.
3.3 Effect of Cross Traffic
In previous experiments, we have studied the spectral characteristics of the regular patterns caused
by bottleneck links, DoS attacks, and TCP windowing behavior when there is little or no competing
cross traffic. In this section, we will use the pattern caused by bottleneck links as a representative of
regular patterns to investigate how cross traffic can affect the spectral signature of regular patterns.
We first classify cross traffic into three different types in Section 3.3.1. Then we carry out
experiments to visually demonstrate the impact of each type of cross traffic on the signature of
the regular pattern caused by bottleneck links in Sections 3.3.2, 3.3.3, and 3.3.4. We expect
cross traffic to have a similar effect on the spectral signatures of DoS attacks and under-performing
TCP flows. The impact of cross traffic will be considered in the design of our automatic detection
algorithms in Section 4.3.
Figure 3.9: Different types of cross traffic
3.3.1 Classification of Cross Traffic
Figure 3.9 illustrates three types of cross traffic that can affect our observation of the bottleneck
traffic. In this figure, traffic travels from source S to destination D and passes a bottleneck link
between R1 and R2. We monitor traffic along the link between R3 and R4. We are interested in
observing the bottleneck signal generated at the R1-R2 link at the monitored link between R3 and R4.
We identify three types of cross traffic in the graph depending on whether the cross traffic traverses
the bottleneck link and the monitored link or not. They are listed below:
Unobserved bottleneck traffic: cross traffic that traverses the bottleneck link but does not reach
the monitored link. Such traffic carries part of the energy from the signature imposed by the
bottleneck. Missing this traffic may attenuate the signal strength observed at our monitored
link.
Unobserved non-bottleneck traffic: cross traffic that shares some link(s) with the observed bottleneck
traffic, but does not go through the bottleneck and is not observed at the monitored link.
Such traffic can distort the signal of the bottleneck link as it competes with bottleneck traffic
in the shared path, introducing variation in packet arrival times and making the signal more
noisy.
Figure 3.10: Testbed topology
Observed non-bottleneck traffic: cross traffic that does not go through the bottleneck link but
is observed at the monitored link. We also call it background traffic. Its impact comes from
two aspects: (a) like unobserved non-bottleneck traffic, it competes with bottleneck traffic
in the shared path; (b) it directly influences the spectrum of the observed aggregate traffic, as the
aggregate contains both bottleneck traffic and background traffic.
We carry out controlled lab experiments to investigate the impact of each of the three types of
cross traffic. In the experiments, we use the dumbbell topology depicted in Figure 3.10 instead of the
network in Figure 3.9, since the dumbbell topology is easier to construct and can satisfy our need
to evaluate the impact of each type of cross traffic separately.
In the dumbbell topology, all links go through switched Ethernet hubs. We run tcpdump to
gather the traffic along the monitored link. We use Iperf to generate either a TCP flow or a UDP
flow from node S to node D to saturate a bottleneck link, and use a tool called Surge [8] to generate
web traffic from node N1 to node N4 as cross traffic. We vary the bottleneck link and the monitored
Figure 3.11: Experiment setup for unobserved bottleneck traffic
Table 3.1: Mapping in the experiment for unobserved bottleneck traffic

Node in the logical topology:   S  D  A1  R1  A2  R2  A3  R3  A4  R4
Node in the physical topology:  S  D  N1  N2  N4  N3  -   -   -   -
link in the dumbbell topology to get different experiment scenarios and investigate the impact of different
types of cross traffic. Sections 3.3.2, 3.3.3, and 3.3.4 show the experiments for each of the three
types of cross traffic, including the corresponding mapping from the nodes in the logical topology
in Figure 3.9 to the nodes in the physical dumbbell topology in Figure 3.10.
3.3.2 Impact of Unobserved Bottleneck Traffic
We first investigate the impact of unobserved bottleneck traffic. The exact experiment setup is
illustrated in Figure 3.11. Table 3.1 shows the mapping from the nodes in the logical topology in
Figure 3.9 to the nodes in the physical dumbbell topology in Figure 3.10. In this setup
all links have a capacity of 100Mbps except the bottleneck link N2-N3, which has a capacity of
10Mbps. We run tcpdump at node D to monitor the traffic along the link from N3 to D. We create
an Iperf flow in either TCP mode or UDP mode with a configured sending rate of 10Mbps from S to
D to see how TCP and UDP react differently to cross traffic. The Iperf flow serves as the observed
Table 3.2: Throughput with Iperf TCP flow as observed bottleneck traffic and web
traffic as unobserved bottleneck traffic

Simulated   Web traffic (link N1-N2)   Iperf TCP flow (link N3-D)   Aggregate traffic (link N2-N3)
web users   bit rate, packet rate      bit rate, packet rate        bit rate, packet rate
10           0.82Mbps,  85pps           8.93Mbps, 745pps             9.75Mbps, 830pps
80           3.56Mbps, 379pps           6.15Mbps, 513pps             9.71Mbps, 892pps
640          5.19Mbps, 606pps           4.45Mbps, 371pps             9.64Mbps, 977pps
Table 3.3: Throughput with Iperf UDP flow as observed bottleneck traffic and web
traffic as unobserved bottleneck traffic

Simulated   Web traffic (link N1-N2)   Iperf UDP flow (link N3-D)   Aggregate traffic (link N2-N3)
web users   bit rate, packet rate      bit rate, packet rate        bit rate, packet rate
10           0.55Mbps,  61pps           9.19Mbps, 766pps             9.74Mbps, 828pps
80           2.26Mbps, 231pps           7.46Mbps, 622pps             9.72Mbps, 854pps
640          3.45Mbps, 328pps           6.28Mbps, 524pps             9.73Mbps, 852pps
bottleneck traffic as it goes through the bottleneck link N2-N3 and passes the monitored link N3-D.
We use Surge [8] to generate web traffic between nodes N1 and N4. The web traffic serves as
unobserved bottleneck traffic as it goes through the bottleneck link N2-N3 but does not pass the
monitored link N3-D.
In the experiment we vary the number of web users simulated by Surge to control the volume
of the web traffic that competes with the Iperf flow along the bottleneck link N2-N3. The
throughput of the traffic along link N1-N2, link N2-N3, and link N3-D is summarized in Table 3.2
for the TCP case and Table 3.3 for the UDP case. As we increase the number of web users simulated
in Surge, the web traffic volume increases. This results in a decrease of the Iperf TCP/UDP traffic
volume, because the web traffic and the Iperf traffic share the same bottleneck link N2-N3. But
the decrease for the Iperf UDP flow is much less dramatic than for the Iperf TCP flow with the same
number of simulated web users. For example, when the number of simulated web users increases
from 10 to 80, the packet rate of the Iperf UDP flow decreases from 766 pps to 622 pps, a drop of
144 pps, while the Iperf TCP flow has a drop of 232 pps from 745 pps to 513 pps. This agrees with
Figure 3.12: Power spectra of the observed bottleneck flow as unobserved bottleneck traffic increases.
Each subgraph plots PSD and NCS over 0-1000 Hz: (a) Iperf TCP flow with light web traffic;
(b) Iperf UDP flow with light web traffic; (c) Iperf TCP flow with medium web traffic;
(d) Iperf UDP flow with medium web traffic; (e) Iperf TCP flow with heavy web traffic;
(f) Iperf UDP flow with heavy web traffic.
our intuition that UDP traffic is less sensitive to cross traffic than TCP traffic, as the UDP sender
typically does not adjust its sending rate according to the network condition while the TCP sender
does.
Figure 3.12 shows the corresponding spectra of the observed bottleneck traffic (i.e., the Iperf
bottleneck flow observed on link N3-D), as we increase the number of simulated web users from
10 to 80 and 640. The three left subgraphs are the spectra for the TCP case, while the three right
subgraphs are for the UDP case. For both the TCP and UDP cases, increasing unobserved bottleneck
traffic results in the following three types of changes in the spectra of the observed bottleneck traffic.
In addition, the change is generally less dramatic for the UDP case than for the TCP case, as we
will see in the specific numbers below.
The first change is the decrease of the peak amplitude around the 813Hz base frequency in the
PSD. For example, when the number of simulated web users increases from 10 to 80, the peak
amplitude around the base frequency for the Iperf UDP flow decreases from 390,000 to 140,000,
a decrease of 64%, while for the Iperf TCP flow the peak amplitude decreases from 240,000 to
53,000, a decrease of 78%. The decrease occurs because the unobserved bottleneck traffic
also carries part of the energy of the periodic pattern caused by the bottleneck link. Missing such
traffic at the monitored link means the energy in the spectrum decreases accordingly.
The second change is a wider spread of the energy around the base frequency, i.e., a less steep
increase around the base frequency in the NCS. For example, when the number of simulated web
users is 10, the NCS for TCP has an increase of 0.008 from 805Hz to 818Hz (a 13Hz-wide window),
while the NCS for UDP has an increase of 0.008 from 808Hz to 816Hz (an 8Hz-wide window). When
the number of simulated web users increases to 80, the NCS for TCP has an increase of 0.006 from
790Hz to 830Hz (a 40Hz-wide window), while the NCS for UDP has an increase of 0.006 from 800Hz
to 820Hz (a 20Hz-wide window). The wider spread of the energy around the base frequency is due
to the increasing interference from the unobserved bottleneck traffic. As we use web traffic for the
unobserved bottleneck traffic and the web traffic consists of packets of various sizes, the competition
between the increasing web traffic and the observed bottleneck traffic leads to less regular packet
interarrival times inside the observed bottleneck traffic, resulting in wider energy spread.
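The energy spread measured above can be computed directly from the spectrum. A minimal sketch, reading the NCS as the running fraction of total spectral energy (this reading, and the function names, are our assumptions consistent with how NCS values are used in this section):

```python
import numpy as np

def ncs(psd):
    """Normalized cumulative spectrum: running sum of the PSD divided
    by the total spectral energy."""
    p = np.asarray(psd, dtype=float)
    return np.cumsum(p) / p.sum()

def ncs_increase(freqs, psd, f_lo, f_hi):
    """NCS increase across [f_lo, f_hi]: the fraction of total energy
    that falls inside that frequency window."""
    c = ncs(psd)
    idx = np.where((np.asarray(freqs) >= f_lo) & (np.asarray(freqs) <= f_hi))[0]
    lo, hi = idx[0], idx[-1]
    return c[hi] - (c[lo - 1] if lo > 0 else 0.0)

# Toy spectrum with half the energy at 813 Hz and half at 100 Hz:
freqs = np.arange(1001.0)
psd = np.zeros(1001)
psd[813] = 2.0
psd[100] = 2.0
spread_energy = ncs_increase(freqs, psd, 800, 820)
```

A narrow window capturing a given NCS rise indicates a concentrated peak; having to widen the window for the same rise, as happens when cross traffic increases, indicates energy spread.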
The third change is that with the presence of cross traffic the peak amplitude is no longer fixed
at 813Hz as in the case with no cross traffic in Figure 3.3, but rather shifts its location within a small
window around the 813Hz base frequency. For example, the peak amplitude appears at 810Hz in
Figure 3.12(e). For the 30 trace segments with 640 simulated web users, the exact location of the
peak amplitude varies from 798Hz to 816Hz. This shift of the peak amplitude location is due to the
variation of the web traffic over different times and different numbers of simulated web users. As
the volume and composition of the web traffic vary, the exact peak amplitude location also varies
depending on the exact interaction between the web traffic and the observed bottleneck flow.
Overall, the changes in the spectra of the observed bottleneck traffic make it more difficult to detect the
presence of bottleneck traffic. Hence special attention is required to handle them well in the
design of automatic detection algorithms.
3.3.3 Impact of Unobserved Non-bottleneck Traffic
To investigate the effect of unobserved non-bottleneck traffic, we use the experiment setup in
Figure 3.13. Table 3.4 shows how we map the nodes in the logical topology in Figure 3.9 to the
nodes in the physical dumbbell topology in Figure 3.10. Compared with the experiment setup for
unobserved bottleneck traffic in Figure 3.11, the bottleneck is moved from link N2-N3 to link S-N2.
Again the bottleneck link has a capacity of 10Mbps while the other links have a capacity of 100Mbps.
We run tcpdump at node D to capture the traffic through the link N3-D. Again we generate an
Iperf flow either in TCP mode or UDP mode with a configured sending rate of 10Mbps between S
and D, and this Iperf flow serves as the observed bottleneck traffic. We use Surge [8] to generate
Figure 3.13: Experiment setup for unobserved non-bottleneck traffic
Table 3.4: Mapping in the experiment for unobserved non-bottleneck traffic

Node in the logical topology:   S  D  A1  R1  A2  R2  A3  R3  A4  R4
Node in the physical topology:  S  D  -   -   N1  N2  N4  N3  -   -
web traffic between N1 and N4, and this web traffic serves as the unobserved non-bottleneck traffic
as it does not go through the bottleneck link S-N2 and does not pass the monitored link N3-D.
We vary the number of web users simulated by Surge from 10 to 80 and 640 to control the volume
of unobserved non-bottleneck traffic that shares the non-bottleneck link N2-N3 with the observed
bottleneck traffic. The corresponding throughput along link N1-N2, link N2-N3, and link N3-D is
summarized in Tables 3.5 and 3.6. We observe that as the volume of unobserved non-bottleneck
traffic increases, there is only limited impact on the throughput of the observed bottleneck traffic since
the shared link N2-N3 is not congested. For example, when the number of simulated web users
increases from 10 to 80, the packet rate of the Iperf TCP flow decreases from 811 pps to 805 pps,
while the packet rate of the Iperf UDP flow decreases from 812 pps to 807 pps. Both changes are
very small.
Table 3.5: Throughput with Iperf TCP flow as observed bottleneck traffic and web
traffic as unobserved non-bottleneck traffic or observed non-bottleneck traffic

Simulated   Web traffic (link N1-N2)   Iperf TCP flow (link N3-D)   Aggregate traffic (link N2-N3)
web users   bit rate, packet rate      bit rate, packet rate        bit rate, packet rate
10           0.40Mbps,   50pps          9.73Mbps, 811pps            10.13Mbps,  861pps
80           5.89Mbps,  582pps          9.66Mbps, 805pps            15.55Mbps, 1387pps
640         15.19Mbps, 1810pps          9.46Mbps, 791pps            24.65Mbps, 2601pps
Table 3.6: Throughput with Iperf UDP flow as observed bottleneck traffic and web
traffic as unobserved non-bottleneck traffic or observed non-bottleneck traffic

Simulated   Web traffic (link N1-N2)   Iperf UDP flow (link N3-D)   Aggregate traffic (link N2-N3)
web users   bit rate, packet rate      bit rate, packet rate        bit rate, packet rate
10           0.50Mbps,   69pps          9.74Mbps, 812pps            10.24Mbps,  881pps
80           5.21Mbps,  568pps          9.68Mbps, 807pps            14.89Mbps, 1375pps
640         12.72Mbps, 1584pps          9.56Mbps, 797pps            22.28Mbps, 2381pps
Figure 3.14 shows the spectra of the observed bottleneck traffic as the number of simulated web
users increases. The three subgraphs on the left side correspond to the TCP case, while the three
subgraphs on the right side correspond to the UDP case. Like the results for unobserved bottleneck
traffic in Figure 3.12, we observe three types of changes in the spectra of the observed bottleneck
traffic. These changes include a decrease in the peak amplitude around the base frequency, a spread
of the energy around the base frequency, and a change in the peak amplitude location. These changes
are due to the contention between the observed bottleneck traffic and the unobserved non-bottleneck
traffic at link N2-N3. The contention changes the packet interarrival times of the observed bottleneck
traffic, and these changes in the interarrival times result in changes in the spectrum.
However, these changes are much smaller than the changes introduced by unobserved bottleneck
traffic for the same number of simulated web users. For example, when the number of simulated web
users increases from 10 to 80, the peak amplitude around the base frequency for the Iperf UDP flow
decreases from 560,000 to 530,000, a decrease of only 5%, while for the Iperf TCP flow the peak
amplitude decreases from 250,000 to 200,000, a decrease of 20%. For comparison, the decrease
Figure 3.14: Power spectra of the observed bottleneck traffic as unobserved non-bottleneck traffic increases.
Each subgraph plots PSD and NCS over 0-1000 Hz: (a) Iperf TCP flow with light web traffic;
(b) Iperf UDP flow with light web traffic; (c) Iperf TCP flow with medium web traffic;
(d) Iperf UDP flow with medium web traffic; (e) Iperf TCP flow with heavy web traffic;
(f) Iperf UDP flow with heavy web traffic.
with unobserved bottleneck traffic is 64% for the UDP case and 78% for the TCP case, as shown in
Figure 3.12. The reason for the limited impact of unobserved non-bottleneck traffic is that the
Figure 3.15: Experiment setup for observed non-bottleneck traffic
Table 3.7: Mapping in the experiment for observed non-bottleneck traffic

Node in the logical topology:   S  D  A1  R1  A2  R2  A3  R3  A4  R4
Node in the physical topology:  S  D  -   -   -   -   N1  N2  N4  N3
capacity of the link N2-N3 is much larger than the offered load (only about a 15% utilization level
with 80 simulated web users). As contention between the observed bottleneck traffic and the unobserved
non-bottleneck traffic at link N2-N3 is therefore rare, the spectra of the observed bottleneck traffic
do not change much.
Like the results for unobserved bottleneck traffic in Figure 3.12, we see in Figure 3.14 that the
unobserved non-bottleneck traffic causes greater changes in the TCP case than in the UDP case.
The reason is that the TCP flow adjusts its sending rate according to feedback from the
network while the UDP flow does not.
3.3.4 Impact of Observed Non-bottleneck Traffic
Finally, we consider the effect of observed non-bottleneck traffic. The experiment setup is illustrated
in Figure 3.15. Table 3.7 shows the mapping from the nodes in the logical topology in Figure 3.9 to
the nodes in the physical dumbbell topology in Figure 3.10. The experiment setup is the same as the
Figure 3.16: Power spectra as background traffic increases.
Each subgraph plots PSD and NCS over 0-1000 Hz: (a) Iperf TCP flow with light web traffic;
(b) Iperf UDP flow with light web traffic; (c) Iperf TCP flow with medium web traffic;
(d) Iperf UDP flow with medium web traffic; (e) Iperf TCP flow with heavy web traffic;
(f) Iperf UDP flow with heavy web traffic.
experiment setup for unobserved non-bottleneck traffic in Figure 3.13, except that the monitored link
is the link N2-N3 instead of the link N3-D. So now the web traffic between N1 and N4 serves as
observed non-bottleneck traffic, as it does not cross the bottleneck link S-N2 but passes the monitored
link N2-N3. The throughput along link N1-N2, link N3-D, and link N2-N3 is the same as reported
in Tables 3.5 and 3.6.
Figure 3.16 shows the spectra of the aggregate traffic through the monitored link N2-N3 as the number of
simulated web users increases. We have the following observations. First, the energy around the
813Hz base frequency in the PSD is almost the same as the result in Figure 3.14, which shows
limited changes as the number of simulated web users increases. The reason for this is that the
interarrival times of the observed bottleneck traffic are largely unchanged by the observed non-bottleneck
traffic along the monitored link N2-N3, as this link has low utilization. So the energy around the 813Hz base
frequency in the PSD does not change much.
The second observation is the significant increase of the energy at other frequencies (especially
frequencies lower than 100Hz) in the PSD. Correspondingly, we see a bigger jump from
0Hz to 100Hz in the NCS. This results in a less prominent increase over the frequencies around the 813Hz
base frequency in the NCS. For example, the NCS only has an increase of 0.002 from 800Hz to 820Hz
with 640 simulated web users, while it increases by 0.004 from 800Hz to 820Hz with 80 simulated
web users. The reason is that the observed aggregate traffic contains not only the bottleneck flow,
but also the web traffic, which has strong low frequency components.
3.3.5 Summary of Cross Traffic Impact
In this section we have investigated the impact of cross traffic on the spectral characteristics of
periodic patterns using the pattern caused by bottleneck links as a case study. We first classify cross
traffic into three different types depending on whether the cross traffic traverses the bottleneck link
and the monitored link or not. Then we carry out controlled lab experiments to see how each of the
three types of cross traffic can affect the spectrum of the traffic observed at the monitored link.
Table 3.8: Summary of the impact from cross traffic

Cross traffic type                  Lower peak      Wider energy    Shifting location    Other frequency
                                    amplitude       spread          of peak amplitude    components
Unobserved bottleneck traffic       strong impact   strong impact   strong impact        little impact
Unobserved non-bottleneck traffic   medium impact   medium impact   medium impact        little impact
Observed non-bottleneck traffic     medium impact   medium impact   medium impact        strong impact
Table 3.8 summarizes the impact from cross traffic. Overall, cross traffic can cause four types of
changes to the traffic spectrum: lower peak amplitude, wider energy spread, a shifting location
of the peak amplitude, and the introduction of other frequency components. Unobserved bottleneck
traffic has a strong impact on the first three types of changes, as it carries part of the signal caused by
the bottleneck link. But it has little impact on introducing other frequency components since it does
not add new traffic to the observed traffic. In comparison, both unobserved non-bottleneck traffic
and observed non-bottleneck traffic have only a medium impact on the first three types of changes,
since the link they share with the observed bottleneck traffic is not congested. While unobserved
non-bottleneck traffic has little impact on adding other frequency components, observed
non-bottleneck traffic has a strong impact on introducing other frequency components because it
is part of the observed traffic.
3.4 Wide-area Network Experiments
The previous examples show that strong periodicities of bottleneck traffic can be detected from the
spectra of aggregate traffic in controlled lab experiments with a simple topology and artificial cross
traffic. Things become more complex when we move to wide-area network experiments with real
Internet traffic. First, we have a much more complex network topology. The monitored link
may be far away from the actual bottleneck link. Second, we have much richer cross traffic,
Figure 3.17: Experiment setup
which can interact with the bottleneck traffic and affect the spectra of the aggregate trace gathered
at the monitored link. The cross traffic can include all three types of cross traffic as defined in
Section 3.3. The richer cross traffic poses serious challenges to our goal of detecting bottleneck
traffic from the aggregate trace without flow separation, since, first, the periodicities of bottleneck traffic may
be distorted by competing traffic before they reach the monitored link, and, second, the signal from
bottleneck traffic may be buried inside the spectra of the aggregate by background traffic.
We illustrate the difficulty by carrying out experiments on a wide-area network with richer,
live background traffic. In the experiment, we place the trace machine close to a router at the edge of
the USC network. Figure 3.17 illustrates the experiment setup. The router forwards all incoming traffic
through the USC Internet-2 link to the trace machine by port mirroring. The trace machine then records
all packets using an Endace DAG network monitoring card [75], which is capable of better-than-microsecond
resolution timestamps and can keep up with 1Gbps link speed. We introduce an Iperf
TCP flow which traverses a known 10Mbps bottleneck link and several other links before crossing
the USC Internet-2 link.
Figure 3.18: Power spectra of aggregate traffic at the USC Internet-2 link.
Each subgraph plots PSD and NCS over 0-1000 Hz: (a) with an Iperf flow through a 10Mbps
bottleneck link; (b) without the Iperf flow.
In the experiment, there is no other flow that shares the 10Mbps bottleneck link with the Iperf
bottleneck flow, so we do not have unobserved bottleneck traffic. As we only monitor traffic at
the Internet-2 link, we do not know for sure whether there is unobserved non-bottleneck traffic in the
experiment, but it is highly likely that unobserved non-bottleneck traffic exists because this is
a wide-area network experiment. Finally, we do have observed non-bottleneck traffic. In the
experiment the volume of observed non-bottleneck traffic is around 20.6Mbps, while the throughput
of the Iperf TCP bottleneck flow is around 9.03Mbps, less than half of the volume of observed
non-bottleneck traffic. Hence in the experiment we expect to see the same impact from observed
non-bottleneck traffic and possibly unobserved non-bottleneck traffic as we observed in
Section 3.3.3 and Section 3.3.4.
Figure 3.18(a) shows the spectrum of the observed aggregate traffic with the Iperf TCP bottleneck
flow inside. We see a spike around 793Hz. Compared with the spectra in Figure 3.3 for an
Iperf TCP flow through a 10Mbps bottleneck link with no cross traffic, we observe four changes in
this graph. First, we see a weaker peak amplitude of 140,000, only 27.5% of the peak amplitude for
the case without cross traffic. Second, there is a wider energy spread around the peak amplitude.
However, it is hard to quantify the energy spread here because the small jump around 800Hz in the
NCS is partly caused by the introduction of other frequency components. Third, the peak amplitude
appears at 793Hz, which differs from the 813Hz for the spectrum without cross traffic. Finally,
we see many other frequency components in the spectrum. These frequency components make the
jump around 800Hz in the NCS much less clear.
These four changes are consistent with our controlled lab experiment results for unobserved
non-bottleneck traffic and observed non-bottleneck traffic in Section 3.3. We see a strong impact from
the introduction of other frequency components here, which agrees with the impact from observed
non-bottleneck traffic in Section 3.3.4. We also see a medium impact for lower peak amplitude, wider
energy spread, and a shifted location of the peak amplitude, which agrees with the impact from both
unobserved non-bottleneck traffic and observed non-bottleneck traffic in Sections 3.3.3 and 3.3.4.
Despite the four changes, we still see a prominent spike in the PSD near the 813Hz base frequency,
suggesting that the bottleneck traffic can be detected even when mixed into the aggregate traffic. For
comparison, Figure 3.18(b) shows the spectrum of aggregate traffic at an earlier time without the Iperf
TCP bottleneck flow. The strongest energy around 790Hz is only one third of the peak amplitude when
the bottleneck flow is present. The traffic volume is about 17.8Mbps. This suggests that despite
the noise from background traffic and interference from competing traffic, the bottleneck traffic
can still be detected from the PSD of the aggregate traffic, though it becomes more difficult in the
presence of cross traffic. The results here indicate that we should examine the amplitude near the
predicted bottleneck frequency in the design of automatic detection algorithms.
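A crude version of this check compares the strongest PSD amplitude near the predicted frequency against the typical amplitude elsewhere. The sketch below is ours; the window width and threshold are illustrative choices, not the tuned parameters of the detection algorithms developed in Chapter 4:

```python
import numpy as np

def bottleneck_present(freqs, psd, predicted_hz,
                       window_hz=20.0, ratio_threshold=3.0):
    """Is the strongest PSD amplitude within predicted_hz +/- window_hz
    at least ratio_threshold times the median amplitude of the rest of
    the spectrum?"""
    freqs = np.asarray(freqs, dtype=float)
    psd = np.asarray(psd, dtype=float)
    near = np.abs(freqs - predicted_hz) <= window_hz
    peak = psd[near].max()
    background = np.median(psd[~near])
    return bool(peak >= ratio_threshold * background)

# Toy spectra: a spike near the predicted 813 Hz frequency vs. a flat one.
freqs = np.arange(1001.0)
spiked = np.ones(1001)
spiked[793] = 10.0                 # peak shifted by cross traffic, as in 3.18(a)
detected_spike = bottleneck_present(freqs, spiked, 813.0)
detected_flat = bottleneck_present(freqs, np.ones(1001), 813.0)
```

Using a window rather than the exact base frequency accommodates the peak-location shifts caused by cross traffic, such as the 793Hz spike observed here.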
3.5 Summary
In this chapter we visually demonstrated the spectral signatures of three different regular patterns in
Internet traffic, caused by traffic congestion along bottleneck links, DoS attacks, and TCP windowing
behavior, respectively. We see that each of them can be revealed by strong energy at the
corresponding frequencies in the spectrum of the traffic. For bottlenecks, the frequencies are determined
by the bottleneck bandwidth and the packet size distribution. For DoS attacks, the frequencies are
determined by the resource limit, such as the highest possible packet rate along the bottleneck link.
For TCP windowing behavior, the frequencies are determined by the RTTs of the under-performing
TCP flows. Our experiment results show that the signatures can survive despite the interference
from cross traffic. The results also show that TCP traffic is more sensitive to the impact of cross
traffic than UDP traffic, as expected.
Chapter 4
NON-PARAMETRIC DETECTION OF PERIODIC PATTERNS
4.1 Introduction
In the previous chapter we visually demonstrated the signatures of various periodic patterns
in network traffic and showed that cross traffic can affect the signatures and make them hard to
detect. To face the challenges in automatically detecting periodic patterns in network traffic, we
apply a classic detection method, Maximum Likelihood Detection [27], to develop four non-parametric
detection algorithms. These four algorithms are called non-parametric since they focus
mainly on the traffic spectrum for detection. In Chapter 5 we will introduce a
parametric algorithm which utilizes not only the traffic spectrum but also its variation with
other factors, such as traffic volume, in the detection.
We use the periodic pattern caused by traffic congestion along bottleneck links as a representative
to illustrate these algorithms and evaluate their performance on real Internet traffic. They can
be easily extended to detect other periodic patterns. For brevity we use bottleneck traffic to
denote not only the traffic that crosses the bottleneck link, but also the periodic pattern caused by
the bottleneck link and carried inside this traffic.
In this chapter we first describe how to apply Maximum Likelihood Detection to detect bottleneck traffic, using a generalized framework suitable for any detection feature, in Section 4.2. Then in Section 4.3 we present four specific algorithms that extract different features from the spectral representation of the traffic and make different assumptions to simplify their operation. They are called the Single-Frequency Algorithm, the Top-Frequency Algorithm, the Top-M-Frequencies Algorithm, and the All-Frequencies Algorithm. We explore these four algorithms to find a good trade-off between algorithm performance and algorithm complexity.

After presenting details of these four algorithms, we evaluate their performance using real Internet traffic in Section 4.4. We have also explored alternative detection features that are associated with harmonics, since bottleneck traffic shows strong energy not only around the base frequency governed by link bandwidth and packet size, but also around its harmonics. Section 4.5.1 presents our approach to utilizing harmonics and the resulting improvement in performance.
4.2 Maximum Likelihood Detection
Maximum Likelihood Detection [27] has long been used in many fields. To apply it to the detection of bottleneck traffic, we treat the detection problem as a binary-hypothesis-testing problem where hypothesis H_B corresponds to the presence of bottleneck traffic of a given type in the aggregate, and hypothesis H_0 corresponds to the absence of such bottleneck traffic.

For these two hypotheses, we select a feature drawn from the aggregate traffic as the random variable. For example, one possible feature is the highest amplitude in a frequency window of the spectrum of the aggregate traffic. We denote the PDFs (probability density functions) for the feature under these two hypotheses as p(x|H_0) and p(x|H_B), respectively. Informally, a probability density function can be seen as a smoothed-out version of a histogram. Formally, if a random variable has a probability density function f(x), then it has probability f(x)dx of taking a value in the infinitesimal interval [x, x + dx].
Below we first introduce the Maximum Likelihood Test Rule, which determines which hypothesis is more likely to be true for a given trace segment using the PDFs of the designated feature, in Section 4.2.1. Then we describe the two phases of the overall scheme, the training phase and the detection phase, in Section 4.2.2.
4.2.1 Maximum Likelihood Test Rule
Assuming we know the PDFs for both hypotheses H_0 and H_B, we can determine which hypothesis is more likely to be true for a given aggregate trace segment, i.e., whether the aggregate trace segment contains bottleneck traffic of a given type or not, by comparing the values of the two PDFs at X, where X is the value of the selected feature in the aggregate trace segment. The test rule is defined as follows:

if p(X|H_B) >= p(X|H_0), select H_B
if p(X|H_B) < p(X|H_0), select H_0    (4.1)
We do not consider the prior probabilities of the two hypotheses H_0 and H_B in our test rule because they are hard to obtain. If the prior information were available, we could improve the accuracy of the test by forming the Maximum a Posteriori (MAP) test, which compares P[H_0]p(X|H_0) and P[H_B]p(X|H_B), where P[H_0] and P[H_B] are the prior probabilities of hypotheses H_0 and H_B.
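To make the rule concrete, the following sketch implements the test (and its MAP variant) for the feature model adopted later in Section 4.3.1, where each PDF is Gaussian in the log domain; all parameter values in the example are hypothetical and only illustrate the mechanics:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate a Gaussian PDF with mean mu and standard deviation sigma at x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def ml_test(x, mu0, sigma0, muB, sigmaB):
    """Maximum Likelihood Test Rule (4.1): pick H_B iff p(x|H_B) >= p(x|H_0)."""
    return "H_B" if gaussian_pdf(x, muB, sigmaB) >= gaussian_pdf(x, mu0, sigma0) else "H_0"

def map_test(x, mu0, sigma0, muB, sigmaB, p0, pB):
    """MAP variant: weight each likelihood by the prior probability of its hypothesis."""
    return "H_B" if pB * gaussian_pdf(x, muB, sigmaB) >= p0 * gaussian_pdf(x, mu0, sigma0) else "H_0"
```

With equal standard deviations the ML rule reduces to comparing x against the midpoint of the two means, while a strong prior toward H_0 shifts the MAP decision boundary upward.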
Figure 4.1: Steps in the training phase
4.2.2 Two Phases in Maximum Likelihood Detection
In order to apply the Maximum Likelihood Test Rule in Section 4.2.1, we need to know the PDFs of the designated feature for each of the two hypotheses. Thus, our overall scheme has two phases: training and detection. We describe each of them in detail below.

In the training phase we estimate the probability density functions of the designated feature for each hypothesis. The steps are illustrated in Figure 4.1. In the first step, T1, we capture a training trace that we know contains no bottleneck traffic and another trace that contains bottleneck traffic of a given type, using the trace capturing techniques described in Section 2.3. Then in step T2 we segment both traces and calculate the spectrum for each segment according to the formulas in Section 2.4. Next, in step T3, we extract the value of the designated feature from the spectrum of each segment. We will discuss how to choose the designated feature shortly, in Section 4.3. In step T4 we estimate the PDFs of the designated feature based on its values across all segments for each of the two hypotheses. As a result we get the PDF for H_0 and the PDF for H_B. We then register this pair of PDFs in a database together with the associated bottleneck traffic type in step T5. We repeat the same steps T1-T5 to get the pairs of PDFs for other types of bottleneck traffic and register them in the database. Note that we always form a binary-hypothesis-testing problem for each type of bottleneck traffic, and we may use different features, e.g., peak amplitude in different frequency windows, for different types of bottleneck traffic.

Figure 4.2: Steps in the detection phase
Table 4.1: Comparison of detection features

Algorithm           Feature                                                     PDF Estimation
Single-Frequency    the amplitude at a single frequency (after log)             normal distribution
Top-Frequency       the highest amplitude in a frequency window (after log)     normal distribution
Top-M-Frequencies   the M highest amplitudes in a frequency window (after log)  multi-variate normal distribution
All-Frequencies     all amplitudes in a frequency window (after log)            multi-variate normal distribution
In the detection phase the algorithm uses the PDFs obtained in the training phase to determine whether an unknown trace segment contains bottleneck traffic or not. The detailed steps are illustrated in Figure 4.2. We first obtain a trace segment in step D1. Then we calculate the spectrum for the trace segment in step D2 using the formulas in Section 2.4. In step D3 we extract the value of the feature from the spectrum of the trace segment. Then in step D4 we match the feature value against a pair of PDFs registered in the database by applying the Maximum Likelihood Test Rule in Section 4.2.1 to determine which hypothesis is more likely. If H_B is more likely, then we declare the presence of bottleneck traffic of the given type in the trace segment. In step D5 we repeat steps D3 and D4 to test the trace segment against other pairs of PDFs registered in the database to see if it contains other types of bottleneck traffic.
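The two phases can be sketched end to end in Python for a peak-amplitude feature of the kind used in Section 4.3. This is an illustrative reconstruction rather than the exact implementation: the bin indices stand in for a frequency window chosen as discussed in Section 4.4, and the PDFs are modeled as Gaussians in the log domain.

```python
import numpy as np

def spectrum(segment):
    """Step T2/D2: amplitude spectrum of one sampled trace segment."""
    return np.abs(np.fft.rfft(segment))

def feature(segment, lo_bin, hi_bin):
    """Step T3/D3: log of the peak amplitude in a frequency-bin window."""
    return np.log(spectrum(segment)[lo_bin:hi_bin].max())

def train(segments_h0, segments_hb, lo_bin, hi_bin):
    """Step T4: fit a Gaussian (in the log domain) per hypothesis."""
    f0 = [feature(s, lo_bin, hi_bin) for s in segments_h0]
    fb = [feature(s, lo_bin, hi_bin) for s in segments_hb]
    return (np.mean(f0), np.std(f0)), (np.mean(fb), np.std(fb))

def detect(segment, params0, paramsB, lo_bin, hi_bin):
    """Step D4: True iff H_B is at least as likely as H_0 for this segment."""
    x = feature(segment, lo_bin, hi_bin)
    def logpdf(v, mu, sigma):
        # Shared constant -0.5*log(2*pi) cancels in the comparison.
        return -0.5 * ((v - mu) / sigma) ** 2 - np.log(sigma)
    return logpdf(x, *paramsB) >= logpdf(x, *params0)
```

A synthetic periodic arrival process (an impulse train buried in Poisson noise) is separable from pure noise with this pipeline, mirroring the bottleneck-versus-background setting.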
4.3 Four Detection Algorithms
Based on the feature selected for detection, we have devised four detection algorithms. Two of them, the Single-Frequency Algorithm and the Top-Frequency Algorithm, use a single variable for detection. The other two, the Top-M-Frequencies Algorithm and the All-Frequencies Algorithm, use multiple variables for detection. We investigate these four algorithms to explore how much we can gain in performance, in terms of detection accuracy (defined in Section 4.4), by examining more spectral information. Examining more spectral information increases the complexity of the algorithm. Hence we explore the trade-off between algorithm performance and algorithm complexity by studying these four algorithms.

Figure 4.3: Four approaches to selecting the detection feature x: (a) the Single-Frequency Method; (b) the Top-Frequency Method; (c) the Top-M-Frequencies Method, where M = 2; (d) the All-Frequencies Method
Table 4.1 summarizes the differences among the four algorithms. They differ not only in the selected detection feature, but also in the assumptions made to simplify their operation. For example, the Single-Frequency Algorithm assumes a normal distribution in the log domain for the selected detection feature, and then simplifies the test rule in Section 4.2.1 by calculating a cut-off threshold (see Section 4.3.1 for details). We believe that the four algorithms capture a good range of possible detection features. In Section 4.5.1 we will explore alternative detection features, including different approaches to utilizing harmonics.
The Single-Frequency Method uses the amplitude at one single frequency as the feature for detection. The intuition behind it is that when bottleneck traffic is present, the aggregate traffic spectrum will have strong amplitude at some particular frequencies; looking at any of these frequencies may yield a clue for detecting the bottleneck. The Top-Frequency Method uses the highest amplitude in a small frequency window. This accounts for the fact that the strong amplitude from the bottleneck link may appear in a close frequency range instead of at a fixed single frequency, due to the impact of cross traffic and other factors. A generalization of the Top-Frequency Method is the Top-M-Frequencies Method, which considers the M highest amplitudes in a frequency window, in the hope that by examining more frequencies we can get better performance. The All-Frequencies Method uses all amplitudes in a frequency window to form a multi-variate detector, with the same hope of improving performance by examining more information.
Figure 4.3 illustrates these four detection algorithms. In the figure, the Single-Frequency Algorithm uses the amplitude at 800Hz as the detection feature; the Top-Frequency Algorithm uses the highest amplitude in the [800Hz, 804Hz] window; the Top-M-Frequencies Algorithm considers the two highest amplitudes in the [800Hz, 804Hz] window; and the All-Frequencies Algorithm utilizes the amplitudes at all frequencies in the [800Hz, 804Hz] window. The 800Hz frequency and the [800Hz, 804Hz] frequency window are selected here for illustration only. In real operation they are selected according to a number of factors, most importantly the bottleneck type and packet size. In the following subsections we describe each of the four detection algorithms in more detail.
4.3.1 Single-Frequency Detection Algorithm
Figures 3.1 to 3.4 suggest that in the absence of background traffic, the periodicity exhibited by a bottleneck reveals itself through strong energy near the key frequency determined by the bottleneck bandwidth and packet size. Even when obscured by background traffic, the energy near the key frequency is still quite strong in Figure 3.18(a). Thus, we conjecture that the strong amplitude at the associated key frequency would be a feature of interest.

In Single-Frequency Detection, we generalize this observation by using the amplitude at any single fixed frequency F as the feature for detection. In the example illustrated in Figure 4.3(a), the feature is the amplitude at 800Hz. After obtaining the distribution of the amplitude at 800Hz across the training set, we can estimate the PDF for each hypothesis, and then apply the Maximum Likelihood Test Rule in Section 4.2.1 to classify a new trace using its amplitude at 800Hz.

Rather than directly comparing empirical probability density functions, we simplify the operation by first fitting a mathematical distribution to the data. This approach allows us to estimate the observed behavior parsimoniously and simplifies the hypothesis testing. As we do not have concrete intuition for a particular distribution model, we tested a number of distribution models and fit them to the actual data. In the end we selected the log-normal distribution as the model for the distribution under each hypothesis, since this distribution is simple and provides a good fit to the actual data.
A log-normal distribution means the log of the amplitude follows a Gaussian distribution. Thus for the two distributions of interest, the parameters which completely characterize the distributions are the mean and standard deviation, i.e., (μ_0, σ_0) and (μ_B, σ_B), and the distributions are given by

p(x|H_0) = (1 / (σ_0 √(2π))) e^{−(x − μ_0)² / (2σ_0²)}    (4.2)

p(x|H_B) = (1 / (σ_B √(2π))) e^{−(x − μ_B)² / (2σ_B²)}    (4.3)

where x is the log of the amplitude at the selected frequency. The mean and standard deviation under each hypothesis are determined from the training data.
Although the log-normal distribution does not always offer the best fit to the data, it provides a good trade-off between fit and a parsimonious model of the feature. Figure 4.4 shows the distributions of the amplitude at 796Hz for the training data associated with the two hypotheses, H_0 and H_B, in one experimental trace set involving a 10Mbps TCP bottleneck flow. We select 796Hz here as it is close to the key frequency for this experiment. All data are presented in the log domain. The dashed lines are the empirical PDFs for H_0 and H_B, while the solid lines are the log-normal distributions with the mean and standard deviation derived from the experimental set for H_0 and H_B. We see the solid lines are not far from the corresponding dashed lines. The Kolmogorov-Smirnov goodness-of-fit hypothesis test [17] shows that the largest distance between the CDFs of the empirical distribution and the corresponding log-normal distribution is less than 8.1% for H_0 and less than 9.5% for H_B.
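The reported distances can be reproduced in principle by computing the Kolmogorov-Smirnov statistic directly; the sketch below compares an empirical CDF against a fitted Gaussian (log-domain) CDF and is an illustration of the statistic, not the exact test harness used in the experiments:

```python
import numpy as np
from math import erf, sqrt

def ks_distance(samples, mu, sigma):
    """Largest vertical distance between the empirical CDF of `samples`
    and the CDF of a Gaussian with the given mean and standard deviation."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    # Gaussian CDF via the error function.
    model = np.array([0.5 * (1 + erf((v - mu) / (sigma * sqrt(2)))) for v in x])
    # The empirical CDF jumps at each sample; check both sides of each jump.
    upper = np.arange(1, n + 1) / n
    lower = np.arange(0, n) / n
    return max(np.abs(upper - model).max(), np.abs(lower - model).max())
```

A small distance for well-matched parameters, and a large one for mismatched parameters, is the behavior the goodness-of-fit claim above relies on.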
Figure 4.4: PDFs of the amplitude (after log) at 796Hz for group H_0 (lines on the left) and H_B (lines on the right), with dashed lines representing empirical PDFs and solid lines representing the log-normal approximation

With the log-normal distribution, we can simplify the Maximum Likelihood Test Rule in Section 4.2.1 by first solving the equation p(x|H_0) = p(x|H_B). This equation yields a quadratic equation which has two roots if σ_0 ≠ σ_B, and one root if σ_0 = σ_B. If it has two roots, we discard the root that contradicts the intuition that H_B should usually have a higher peak amplitude than H_0, since the presence of bottleneck traffic will typically increase the amplitude of the aggregate traffic. We select the root that agrees with this intuition as the cut-off threshold. If there is only one root, then this root should agree with the expectation and is selected as the cut-off threshold.
In both cases, the detection rule simplifies to a direct comparison between the cut-off threshold and the peak amplitude of the input trace in the selected window. If the latter is larger, then the input trace is classified as H_B, i.e., as having the associated bottleneck traffic. Otherwise, we select H_0 and declare that there is no such bottleneck traffic.
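Solving p(x|H_0) = p(x|H_B) under the two Gaussian (log-domain) models of Equations 4.2 and 4.3 reduces to a quadratic whose coefficients follow from equating the log-densities; a sketch, including the root-selection rule described above:

```python
import numpy as np

def lognormal_cutoff(mu0, s0, muB, sB):
    """Cut-off threshold for the log-domain feature: solve
    p(x|H_0) = p(x|H_B), where both PDFs are Gaussian."""
    if np.isclose(s0, sB):
        return (mu0 + muB) / 2.0  # equal variances: one root, the midpoint
    # Equating log-densities gives a*x^2 + b*x + c = 0:
    a = 1 / (2 * sB**2) - 1 / (2 * s0**2)
    b = mu0 / s0**2 - muB / sB**2
    c = muB**2 / (2 * sB**2) - mu0**2 / (2 * s0**2) - np.log(s0 / sB)
    disc = np.sqrt(b * b - 4 * a * c)
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    # Keep the root consistent with H_B sitting above H_0:
    # the crossing point between the two means.
    between = [r for r in roots if min(mu0, muB) <= r <= max(mu0, muB)]
    return between[0] if between else min(roots, key=lambda r: abs(r - (mu0 + muB) / 2))
```

With equal standard deviations the threshold is simply the midpoint of the two means; with unequal ones, the retained root lies between the means, where the two densities cross.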
4.3.2 Top-Frequency Detection Algorithm
Figure 4.5: PDFs of the peak amplitude (after log) in the [780Hz, 800Hz] window for H_0 (lines on the left) and H_B (lines on the right), with dashed lines representing empirical PDFs and solid lines representing the log-normal approximation

As the Single-Frequency Detection Algorithm only considers the amplitude at a single fixed frequency, it may not perform well, because the strong energy associated with the bottleneck may shift its location in the spectrum due to the impact of cross traffic, as we observed in Section 3.3. To deal with this shift of strong energy location we designed a new detection algorithm that uses the highest amplitude in a frequency window as the detection feature. It differs from the Single-Frequency Algorithm in that the latter uses the amplitude at a fixed single frequency.
In the example illustrated in Figure 4.3(b), we select the highest amplitude in the frequency window [800Hz, 804Hz] as the feature for detection. This frequency window is selected here for illustration only. In real operation the frequency window should be selected according to a number of factors, most importantly the bottleneck bandwidth and packet size. For example, we would check the window around 800Hz for a 10Mbps link saturated mostly with 1500-byte packets, as opposed to around 8kHz for a 100Mbps link. The window size should not be too small, or we may miss the strong energy associated with the bottleneck in some instances. Neither should it be too wide, as this may include strong energy caused by other network phenomena. We discuss the proper selection of window location and size in Sections 4.4.3 and 4.4.4 using real Internet traffic.
As in the Single-Frequency Detection Algorithm, we simplify the detection by approximating the training data with log-normal distributions and then calculating the cut-off threshold by solving the equation p(x|H_0) = p(x|H_B) directly. Figure 4.5 shows the distributions of the peak amplitude in the frequency window [780Hz, 800Hz] for the training data associated with the two hypotheses, H_0 and H_B, in the same experimental trace as in Figure 4.4. The frequency window [780Hz, 800Hz] is selected because it is shown to be a good detection window for this scenario according to our investigation of frequency window location and size in Sections 4.4.3 and 4.4.4. Again, all data are presented in the log domain. The dashed lines are the empirical PDFs for H_0 and H_B, while the solid lines are the log-normal distributions with the mean and standard deviation derived from the experimental set for H_0 and H_B. We can see that the two empirical PDFs are closely approximated by the log-normal distributions represented by the solid lines. The Kolmogorov-Smirnov goodness-of-fit hypothesis test [17] shows that the largest distance between the CDFs of the empirical distribution and the corresponding log-normal approximation is less than 3.3% for H_0 and less than 4.4% for H_B. In addition, there is a much wider separation between the lines for H_0 and H_B compared with Figure 4.4 obtained with the Single-Frequency Detection Method, suggesting that the Top-Frequency Method captures the difference between H_0 and H_B better than the Single-Frequency Method.
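Extracting the Top-Frequency feature amounts to mapping the window boundaries from Hz to FFT bins and taking the maximum amplitude; a minimal sketch, where the tone frequency and window in the test are illustrative stand-ins for a bottleneck signature:

```python
import numpy as np

def window_peak(segment, fs, f_lo, f_hi):
    """Top-Frequency feature: log of the highest amplitude whose frequency
    falls inside [f_lo, f_hi] Hz, for a segment sampled at fs Hz."""
    amp = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return np.log(amp[mask].max())
```

With a 1-second segment at 200kHz (the defaults used in Section 4.4), the spectral resolution is 1Hz, so a 20Hz window such as [780Hz, 800Hz] spans about 21 bins.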
4.3.3 Top-M-Frequencies Detection Algorithm
A generalization of the Top-Frequency Method is to use the M highest amplitudes in a frequency window, instead of only the single highest amplitude, for detection. This method considers more frequencies in the hope of better detection, since exploring more properties of objects usually leads to better classification. It works in the following way. For each training instance, it first picks the M highest amplitudes in a particular frequency window and puts them into a vector [x_1, x_2, ..., x_M] in order from high to low. It then estimates the joint distribution of this vector for each hypothesis. Figure 4.3(c) illustrates the selection of the two highest amplitudes in the frequency window [800Hz, 804Hz] as the feature for detection.
To simplify the algorithm we approximate the joint distribution under both hypotheses H_0 and H_B using multi-variate log-normal distributions. Thus for the two distributions of interest, the parameters which completely characterize the distributions are the mean vector and covariance matrix, i.e., (μ_0, C_0) and (μ_B, C_B), and the distributions are given by

Pr(x|H_0) = e^{−(1/2)(x − μ_0)^T C_0^{−1} (x − μ_0)} / √((2π)^M det(C_0))

Pr(x|H_B) = e^{−(1/2)(x − μ_B)^T C_B^{−1} (x − μ_B)} / √((2π)^M det(C_B))

where x is a vector containing the log of the M highest amplitudes in the frequency window, and (μ_0, C_0) and (μ_B, C_B) are determined from the training data.
Once we obtain (μ_0, C_0) and (μ_B, C_B) from the training data, we can apply the Maximum Likelihood Test Rule in Section 4.2.1 to test which hypothesis is more likely to be true for an input trace, with X formed from the M highest amplitudes in the corresponding frequency window. In comparison to using a single variable for detection, as in the Single-Frequency and Top-Frequency Algorithms, using multiple variables makes the algorithm more complex, as it must examine multiple variables and estimate their joint distribution, in the hope of achieving better detection accuracy.
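Evaluating the multivariate test only requires the two fitted mean vectors and covariance matrices; a sketch using the log-density form of the PDFs above (the example parameters in the test are hypothetical):

```python
import numpy as np

def mvn_logpdf(x, mu, C):
    """Log-density of a multivariate Gaussian with mean mu and covariance C."""
    d = x - mu
    M = len(mu)
    _, logdet = np.linalg.slogdet(C)
    # Solve C^{-1} d without forming the explicit inverse.
    return -0.5 * (d @ np.linalg.solve(C, d) + logdet + M * np.log(2 * np.pi))

def top_m_classify(x, mu0, C0, muB, CB):
    """Top-M-Frequencies test: pick H_B iff Pr(x|H_B) >= Pr(x|H_0)."""
    return "H_B" if mvn_logpdf(x, muB, CB) >= mvn_logpdf(x, mu0, C0) else "H_0"
```

Working with log-densities avoids the numerical underflow that the raw exponential form can produce for large M.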
4.3.4 All-Frequencies Detection Algorithm
In this method, we use all frequencies in a particular frequency window to form a multi-variate
detector. For example, Figure 4.3(d) illustrates the selection of all amplitudes in the [800Hz, 804Hz]
window across all instances to obtain the joint distribution.
Similar to the Top-M-Frequencies Method, we approximate the joint distributions under both H_0 and H_B using multi-variate log-normal distributions with parameters (μ_0, C_0) and (μ_B, C_B). We use the training data to estimate these parameters. We then plug the values into the PDFs and apply the Maximum Likelihood Test Rule in Section 4.2.1 to classify an input trace.
4.4 Evaluation with Real Internet Traffic
To systematically evaluate the performance of our detection algorithms, we artificially introduce bottleneck traffic into a wide-area network and observe it in packet traces gathered at the Internet-2 access link of USC. Our primary goal is to understand the performance of our detection algorithms under different network conditions and with different algorithm parameters. We first compare the results for the four detection algorithms under the same network condition and similar algorithm parameters in Section 4.4.2. The results show that the Top-Frequency Algorithm achieves a good balance between detection accuracy and algorithm complexity, so we pick it as a representative and investigate its performance under different conditions and with different algorithm parameters. We present the impact of various algorithm parameters, including window location, window size, and other parameters, in Sections 4.4.3, 4.4.4, and 4.4.5. We then consider the stability of the results with different protocols in Section 4.4.6 and how to select the proper frequency window in Section 4.4.7. Finally, we investigate the influence of using different training data in Section 4.4.8 and how the signal-to-noise ratio (defined as the ratio of bottlenecked traffic volume to background traffic volume) affects algorithm performance in Section 4.4.9.
Table 4.2: Experiment scenarios

Scenario  Bottleneck Traffic Type                         Background Traffic Volume
T10L      an Iperf TCP flow through a 10Mbps bottleneck   low (24Mbps to 59Mbps)
U10L      an Iperf UDP flow through a 10Mbps bottleneck   low (19Mbps to 57Mbps)
T10H      an Iperf TCP flow through a 10Mbps bottleneck   high (94Mbps to 186Mbps)
T100H     an Iperf TCP flow through a 100Mbps bottleneck  high (112Mbps to 204Mbps)
Table 4.3: Traceroute result for the flow path in the experiment
[cs599@garvey ]$ traceroute 204.57.6.162
traceroute to 204.57.6.162 (204.57.6.162), 30 hops max, 38 byte packets
1 router.postel.org (128.9.112.7) 0.535 ms 0.416 ms 0.408 ms
2 128.9.0.9 (128.9.0.9) 0.521 ms 0.547 ms 0.524 ms
3 198.32.16.6 (198.32.16.6) 0.498 ms 0.491 ms 0.444 ms
4 lax-hpr.losnettos-hpr.cenic.net (137.164.27.245) 0.937 ms 0.928 ms 0.869 ms
5 v251-gw-17.usc.edu (128.125.251.154) 0.886 ms 0.897 ms 0.848 ms
6 netlab-gw.usc.edu (204.57.3.234) 0.978 ms 1.078 ms 0.991 ms
7 kost.usc.edu (204.57.0.28) 1.246 ms 1.142 ms 1.147 ms
8 204.57.6.130 (204.57.6.130) 1.275 ms 1.488 ms 1.355 ms
9 204.57.6.162 (204.57.6.162) 1.305 ms 1.424 ms 1.474 ms
4.4.1 Experiment Setup
In our experiments we use the same wide-area network experiment setup as in Section 3.4: we monitor the traffic through the Internet-2 access link of USC. A router forwards all incoming packets through the link to our trace machine using port mirroring. These packets are then captured using an Endace DAG network monitoring card [75]. The bandwidth of the Internet-2 link is 1Gbps, while the actual load as measured in our experiments varies from 19Mbps to 204Mbps.

To create bottleneck traffic, we intentionally generate a flow using Iperf [78] from an outside source to an inside destination. The source is nine hops away from the destination (204.57.6.162). Table 4.3 shows the traceroute result for the flow path between the source and destination. The link between the source and the first hop (128.9.112.7) is the bottleneck link along the flow path, since this link has the lowest capacity in the path and is saturated by the Iperf flow with over 90% capacity utilization. No other flow shares this link during the experiment. The Iperf bottleneck flow traverses the Internet-2 link between 198.32.16.6 and 137.164.27.245, and is observed by the trace machine together with the background traffic through the Internet-2 link.
In the experiment setup we consider variations of three factors: the transport protocol of the bottleneck traffic (TCP or UDP), the bandwidth of the bottleneck link (10Mbps or 100Mbps), and the amount of background traffic at the monitored link (varying from less than 20Mbps to more than 200Mbps). Among all possible combinations of these variations we have investigated the four specific scenarios shown in Table 4.2. These four scenarios are selected because they provide good coverage of the variations of the three factors. For example, U10L differs from T10L in the transport protocol of the bottleneck traffic; T10L differs from T10H in background traffic volume; and T10H differs from T100H in the bandwidth of the bottleneck link.
For each of the four scenarios we gather a pair of packet traces at the monitored link every two hours for 24 hours. Each pair consists of a 5-minute trace gathered when there is no intentionally introduced bottleneck flow (H_0) and another 5-minute trace (H_B) gathered when we intentionally introduce a bottleneck Iperf flow. In all cases we use a default segment length ℓ of 1 second and a default sampling rate p of 200kHz. Thus each trace pair has 300 trace segments that contain no bottleneck traffic and 300 trace segments that contain the bottleneck Iperf flow.
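While the exact spectrum formulas are given in Section 2.4, a plausible sketch of the preprocessing implied here (binning packet arrival timestamps into a 200kHz counting process and cutting it into 1-second segments) is:

```python
import numpy as np

def segments_from_timestamps(timestamps, fs=200_000, seg_len=1.0):
    """Bin packet arrival timestamps (in seconds) into a sampled counting
    process at fs Hz, then cut it into segments of seg_len seconds each."""
    timestamps = np.asarray(timestamps, dtype=float)
    duration = np.ceil(timestamps.max())
    series, _ = np.histogram(timestamps, bins=int(duration * fs),
                             range=(0.0, duration))
    n = int(seg_len * fs)
    # Drop any trailing partial segment and reshape to (num_segments, n).
    return series[: len(series) // n * n].reshape(-1, n)
```

Each row of the result is one trace segment ready for the FFT; a 5-minute trace yields 300 such rows with the defaults above.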
To measure the performance of each detection algorithm, we follow this procedure. First, we select one trace pair as the training data set and train the algorithm on it. For the Single-Frequency Algorithm and the Top-Frequency Algorithm we obtain the cut-off threshold as the training result (see Sections 4.3.1 and 4.3.2 for details). For the Top-M-Frequencies Algorithm and the All-Frequencies Algorithm we calculate the mean vector and covariance matrix for both hypotheses H_0 and H_B to estimate the PDFs (see Sections 4.3.3 and 4.3.4 for details). Next, we use the training result to classify the trace segments in both the training data and other trace pairs under the same scenario, and compare the answer with the ground truth to find the probability that the algorithm gives the correct answer about the existence of bottleneck traffic in the trace. We define this probability as the Detection Accuracy. In our case, the detection accuracy is equal to the average of the true positive rate and the true negative rate, as we have an equal number of trace segments with and without the bottleneck flow.
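With equal numbers of H_0 and H_B segments, this accuracy equals the balanced average of the true positive and true negative rates, which can be computed as:

```python
import numpy as np

def detection_accuracy(predicted, truth):
    """Detection accuracy as defined above: the average of the true positive
    rate and the true negative rate. `predicted` and `truth` are boolean
    arrays, with True meaning 'bottleneck traffic present'."""
    predicted, truth = np.asarray(predicted), np.asarray(truth)
    tpr = np.mean(predicted[truth])     # fraction of H_B segments caught
    tnr = np.mean(~predicted[~truth])   # fraction of H_0 segments cleared
    return (tpr + tnr) / 2.0
```

Because the two classes are balanced in these experiments, this quantity coincides with the plain fraction of correctly classified segments.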
In this thesis we present two types of detection accuracies. The first is the accuracy on the training set itself; it measures the ability of the algorithm to distinguish trace segments in the training set. The second is the average accuracy on all other trace pairs gathered under the same scenario as the training set. In real operation, network operators can always calculate the accuracy on the training set, as they are assumed to know the ground truth about the training set, but it is hard to measure the accuracy on unknown traces without knowing their ground truth. We can measure both types of accuracies because we know the ground truth of all trace pairs in the experiments. We first use the accuracy on the training set to tune the algorithm parameters, such as selecting the proper detection window, and then present the average accuracy on other trace pairs to show the overall picture of the algorithm's performance. As we will see in Sections 4.4.3 and 4.4.4, there is a strong correlation between these two types of accuracies in their response to changes in the algorithm parameters. This strong correlation gives network operators confidence in the parameter values selected using the training set alone.
4.4.2 Performance Comparison of Different Algorithms
In Section 4.3 we described how each of the four non-parametric algorithms works in detecting bottlenecks. Their main difference lies in the feature used for detection. Here we compare the performance of these four algorithms using the traces in the T10L scenario, which targets detecting a 10Mbps bottleneck under low background traffic. We observe similar differences among the performance of these four algorithms under other scenarios.

Figure 4.6: Performance of the Single-Frequency Algorithm: (a) Accuracy on the 7am training set; (b) Average Accuracy on other traces

Figure 4.7: Performance of the Top-Frequency Algorithm, W_s = 20Hz: (a) Accuracy on the 7am training set; (b) Average Accuracy on other traces
Figures 4.6, 4.7, 4.8, and 4.9 show the performance of the four detection algorithms when we train each of them using the same trace pair gathered at 7am. In each figure, the left subgraph shows the detection accuracy of the algorithm on the training set itself. The right subgraph shows the corresponding average accuracy on the other trace pairs in the T10L scenario. For the Single-Frequency Algorithm, the x value is the frequency whose amplitude is used as the detection feature. For the other three algorithms, the x value is the center location of the frequency window used in gathering the detection feature, and we set the frequency window size (W_s) to 20Hz. For each x value, we learn the cut-off threshold or PDFs from the training set by following the steps in the training process using the corresponding detection feature. Then we use the cut-off threshold or PDFs to classify trace segments in the training set and other trace pairs to obtain the detection accuracy on the training set and the average accuracy on other trace pairs, respectively.

Figure 4.8: Performance of the Top-10-Frequencies Algorithm, W_s = 20Hz: (a) Accuracy on the 7am training set; (b) Average Accuracy on other traces

Figure 4.9: Performance of the All-Frequencies Algorithm, W_s = 20Hz: (a) Accuracy on the 7am training set; (b) Average Accuracy on other traces
We have the following three observations from these four Figures. First, all four Figures show
spikes around 790Hz and its harmonics. The 790Hz is associated to the base frequency caused by
saturating a 10Mbps bottleneck with 1500-byte packets. The spikes around 790Hz and its harmonics
clearly demonstrate that we can use the amplitudes near the base frequency and its harmonics to
detect the presence of bottleneck not only in the training data but also in the other traces. In the
86
1 3 5 7 9 11 13 15 17 19 21 23
10
10.5
11
11.5
12
12.5
13
Statistics of the Peak Amplitude
Gathering Time of the Training Set
Figure 4.10: Statistics of the peak amplitudes for H
0
and H
B
next subsection, we will further discuss the locations of the frequency windows that yield high
detection accuracies, and why the spikes are around 790Hz and its harmonics instead of 813Hz
and its harmonics as indicated by the spectrum for a TCP flow through a 10Mbps switched hub in
Figure 3.3.
Second, all four detection algorithms have higher detection accuracies on the training data itself
than the average accuracy on other traces. This is due to the mismatch between the statistics of
the training data and the statistics of other traces. For example, Figure 4.10 plots the statistics of
the peak amplitude in the [780Hz, 800Hz] window for all 12 trace pairs gathered in the T10L
scenario. The statistics include mean values and standard deviations. The x axis is the time of the
day when the trace pair was gathered. For each trace pair, there are three vertical lines in the Figure.
The middle one shows the mean values. It is drawn from μ0 to μB. The other two lines show the
standard deviations. The left one is drawn from μ0 - σ0 to μ0 + σ0, while the right one is drawn
from μB - σB to μB + σB. We can see significant variations of the statistics over different trace
pairs. For example, for the 7am trace pair, μ0 is 10.54 and μB is 11.46, while for the 11am trace
pair, μ0 is 11.25 and μB is 11.8. Since the statistics of these two trace pairs are quite different, the
cut-off threshold learned by training the Top-Frequency Algorithm on the 7am trace pair would not
perform well in distinguishing the trace segments in the 11am trace pair. This mismatch in statistics
Table 4.4: Performance comparison of the four non-parametric detection algorithms

Accuracy          Single-Frequency  Top-Frequency  Top-10-Frequencies  All-Frequencies
on training data  0.685             0.937          0.978               0.987
on other traces   0.604             0.711          0.734               0.701
contributes to a lower average accuracy on other traces than the accuracy on the 7am training trace
pair. We will present more details of the effect of match and mismatch in statistics in Section 4.4.8.
Third, for both types of detection accuracies, the Top-Frequency Algorithm performs much bet-
ter than the Single-Frequency Algorithm. On the other hand, the Top-M-Frequencies Algorithm and
the All-Frequencies Algorithm yield only small improvements over the Top-Frequency Algorithm.
Table 4.4 summarizes the best detection accuracies of the four detection algorithms from the four
Figures. It shows the Top-Frequency Algorithm leads the Single-Frequency Algorithm by 25.2%
for detection accuracy on the training set and 10.7% for the average accuracy on other traces. The
improvements of the Top-M-Frequencies Algorithm and the All-Frequencies Algorithm over the
Top-Frequency Algorithm are less than 5% for detection accuracy on the training set and less than
2.3% for the average accuracy on other traces. This demonstrates that the Top-Frequency Algorithm
has captured the most significant difference between hypotheses H0 and HB that persists across all
traces in the same scenario.
The Top-Frequency Algorithm is also less complex than the Top-M-Frequencies Algorithm and
the All-Frequencies Algorithm because it only gathers the distribution for one single variable and
uses it for detection instead of relying on multiple variables as in the Top-M-Frequencies Algorithm
and the All-Frequencies Algorithm. So our experiment results show that the Top-Frequency Algo-
rithm achieves a good trade-off between detection accuracy and algorithm complexity. In view of
this, we focus on the Top-Frequency Algorithm in our following investigation of the algorithm's
sensitivity to different network conditions and algorithm parameters.
Figure 4.11: Detecting the 10Mbps bottleneck (T10L) with varying window locations, Ws = 20Hz; detection accuracy vs. window location (0-9000Hz). (a) Accuracy on the 7am training set. (b) Average accuracy on other traces.
4.4.3 Frequency Window Location
There are many questions to explore for the algorithm's sensitivity. For example, how would the
algorithm perform when we vary its parameters such as window location and size, how would the
performance change with different training data sets, and what is the impact of the signal-to-noise
ratio? In this subsection we investigate the impact of window location. The other questions will be
studied in Sections 4.4.4 to 4.4.9.
For window location we expect the algorithm to perform best with the window located around
the base frequency calculated according to formula 3.2. It would be around 813Hz for 10Mbps
bottlenecks congested with 1500-byte packets. For 100Mbps bottlenecks it would be around 8130Hz.
To validate our hypothesis we repeatedly run the detection algorithm with the center of the frequency
window Wc varying from near 0Hz to near 10kHz. Here we focus on the results under two scenarios:
T10L, which targets detecting a 10Mbps bottleneck, and T100H, which targets detecting a 100Mbps
bottleneck.
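As a quick sanity check on these predicted values, the base frequency follows from dividing the link capacity by the time one maximum-size packet occupies the wire. Here we assume an on-the-wire size of 1538 bytes (1500-byte MTU plus Ethernet framing overhead); the exact overhead assumed determines whether one gets 813Hz or a nearby value:

```python
def base_frequency(link_bps, wire_bytes=1538):
    # A saturated link emits one packet every wire_bytes * 8 bit times,
    # so the packet rate (and hence the spectral base frequency) is:
    return link_bps / (wire_bytes * 8)

print(round(base_frequency(10e6)))   # → 813, for a 10Mbps bottleneck
print(round(base_frequency(100e6)))  # → 8127, close to the quoted ~8130Hz
```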
Figure 4.11 shows the impact of window location on detecting a 10Mbps bottleneck under the
T10L scenario. The left subgraph 4.11(a) shows the accuracy on the training set gathered at 7am,
while the right subgraph 4.11(b) depicts the corresponding average accuracy on other trace pairs.
In both subgraphs we use a fixed window size of 20Hz and vary the window center from 10Hz
to 9990Hz. We can see that the center of the frequency window has a huge effect on both types
of detection accuracies. Both types of accuracies reach peak values with windows around 790Hz
(close to the predicted 813Hz frequency) and its multiples. For example, the detection accuracy on
the training set can reach as high as 93% when the window is around 790Hz, but it drops to 50%
when the window is around 100Hz.
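The detection feature driving these curves, the peak spectral amplitude inside a window [Wc - Ws/2, Wc + Ws/2], and the sweep over window centers can be sketched as follows (function names and the plain array representation of the PSD are our own):

```python
import numpy as np

def window_peak(psd, freqs, w_center, w_size):
    """Peak spectral amplitude within [w_center - w_size/2, w_center + w_size/2]."""
    lo, hi = w_center - w_size / 2, w_center + w_size / 2
    mask = (freqs >= lo) & (freqs <= hi)
    return psd[mask].max()

def sweep(psd, freqs, w_size=20.0, step=20.0, f_max=10e3):
    """Slide the window center across the spectrum, as in Figure 4.11."""
    centers = np.arange(w_size / 2, f_max - w_size / 2, step)
    return centers, np.array([window_peak(psd, freqs, c, w_size) for c in centers])
```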
Furthermore, there is a strong correlation of the two types of accuracies in terms of their re-
sponse to the changes in window location. As the window location changes, the average accuracy
on other trace pairs will increase as the accuracy on the training set increases, and decrease as the
latter decreases. However, in general the former is lower than the latter. For example, the average
accuracy is only 71% while the accuracy on the training set reaches 93% with the window around
790Hz. This indicates some mismatch among the statistics of the training trace pair and other trace
pairs, which we have discussed in Section 4.4.2.
We have similar observations when we train the algorithm with different trace pairs under the
T10L scenario. The correlation between these two types of accuracies demonstrates that a window
with a high accuracy on the training set will generally lead to a high accuracy on other trace pairs.
This correlation gives network operators confidence both in the window selected by inspecting the
accuracy on the training set alone and in the answer returned by the detection algorithm on the
unknown trace. This is important because network operators do not know the ground truth of the
ongoing traffic without resorting to sophisticated and expensive flow-based analysis.
To find out why the detection accuracies peak near 790Hz instead of 813Hz as we have expected,
we isolate the Iperf bottleneck flow from the aggregate and plot its spectrum in Figure 4.12(a) for
Figure 4.12: Spectrum and packet interarrival time distribution of the isolated Iperf TCP flow (T10L). (a) Spectrum (PSD and NCS, 0-1000Hz). (b) PDF and CDF of packet interarrival times (1-6ms).
one trace segment. We can see that the spectrum suggests the existence of significant cross traffic,
as it is consistent with the results of prior lab-controlled experiments for cross traffic in Section 3.3.
In the spectrum the peak amplitude appears around 790Hz, a frequency lower than the 813Hz for the
case without cross traffic in Figure 3.3. In addition, the peak amplitude is only 55,000, much
weaker than the peak amplitude of 510,000 for the case without cross traffic. The CDF and PDF
of packet interarrival times for the isolated Iperf flow in Figure 4.12(b) show a spike-bump pattern,
also suggesting that the bottleneck Iperf flow experiences significant cross traffic on a downstream link
according to the findings in [44]. From the spectrum and the distribution of packet interarrival
times, we believe the bottleneck traffic spectrum is affected by significant cross traffic, and as a
result the detection accuracies peak around 790Hz instead of the predicted 813Hz base frequency.
Figure 4.13 reveals the impact of frequency window location on the accuracies of detecting a
100Mbps bottleneck under the T100H scenario. We fix the window size to 200Hz here, as the
predicted base frequency for a 100Mbps bottleneck link is ten times the value for a 10Mbps
bottleneck link. Again we see a strong correlation of the two types of accuracies in terms of their
Figure 4.13: Detecting the 100Mbps bottleneck (T100H) with varying window locations, Ws = 200Hz; detection accuracy vs. window location (0-9000Hz). (a) Accuracy on the 7am training set. (b) Average accuracy on other traces.
response to the changes in window location. Both types of accuracies peak around 8100Hz, close to the
predicted 8130Hz base frequency. For example, when the window is around 8100Hz, the accuracy
on the training trace pair gathered at 7am can reach 100%, while the average accuracy on other
trace pairs reaches 97%. These values are significantly higher than the accuracies for detecting the
10Mbps bottleneck in Figure 4.11, because we have much higher signal-to-noise ratios. The impact
of signal-to-noise ratio will be discussed in more detail in Section 4.4.9. Both experiments validate
our intuition about the location of the frequency window for the best detection accuracy.
4.4.4 Frequency Window Size
In the previous section we considered the impact of window location with fixed window sizes. In this
section we consider the impact of window size while fixing the window location. Our expectation
behind the choice of window size is that the window should be neither too narrow nor too wide,
since narrow windows might fail to capture the strong signal for the bottleneck that is shifted to
another location due to noise, and wide windows might result in increasing false positives due to
Figure 4.14: Detecting the 10Mbps bottleneck (T10L) with varying window sizes, Wc = 790Hz; detection accuracy vs. window size (0-100Hz), on the 7am training set and averaged over other traces.

Figure 4.15: Detecting the 100Mbps bottleneck (T100H) with varying window sizes, Wc = 8100Hz; detection accuracy vs. window size (0-1000Hz), on the 7am training set and averaged over other traces.
noise from unrelated events. Intuitively, we expect a window size of roughly 1% to 5% of the
predicted frequency to be appropriate.
To systematically study the impact of window size, we run the Top-Frequency Algorithm using
the trace pair gathered at 7am in scenario T10L as the training set while varying the window size
from 1Hz to 100Hz with the window center Wc fixed at 790Hz. Figure 4.14 shows the detection
results. The upper line is the accuracy on the 7am training set while the bottom line is the
corresponding average accuracy on other trace pairs under scenario T10L. We see that the two types
of accuracies react in a similar way to the changes in window size. For both types of accuracies,
we observe that window sizes between 20Hz and 30Hz (about 2.5% to 4% of the predicted 813Hz
base frequency) give the best results. In addition, smaller sizes show much lower accuracies because
with narrower windows it becomes possible to miss a signal that is shifted outside the window due
to noise. On the other hand, larger sizes also result in lower accuracies, but the penalty is not as
dramatic as with smaller sizes.
We have also conducted a similar investigation for detecting the 100Mbps bottleneck in the
T100H scenario with windows centered at 8100Hz. Figure 4.15 shows the detection results using
the trace pair gathered at 7am as the training set while varying the window size from 1Hz to 1000Hz.
Again we observe a strong correlation of the two types of accuracies in terms of their response to
the changes in window size. Both types of accuracies reach their peak values with the window size
in the range of 100Hz to 200Hz (about 1.2% to 2.5% of the predicted 8133Hz base frequency). In
addition, smaller sizes show much lower accuracies than larger window sizes.
4.4.5 Sampling Rate and Segment Length
Segment length and sampling frequency are also important parameters of the algorithm. A sampling
frequency that is too low will result in aliasing; one that is too high increases processing overhead.
In future work we will investigate the optimum balance between the two. For this work we choose
a conservative frequency of 200kHz, which allows detection of bottlenecks up to 100Mbps.
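Turning a trace segment into a spectrum at this sampling rate can be sketched along the following lines (an illustration only; we use a plain periodogram of the binned arrival counts, standing in for the spectrum estimation described in Chapter 2):

```python
import numpy as np

def segment_psd(arrival_times, fs=200_000, seg_len=1.0):
    """Bin packet arrival timestamps (in seconds) at sampling rate fs,
    then estimate the PSD of the resulting count signal."""
    n_bins = int(fs * seg_len)
    counts, _ = np.histogram(arrival_times, bins=n_bins, range=(0.0, seg_len))
    signal = counts - counts.mean()              # drop the DC component
    psd = np.abs(np.fft.rfft(signal)) ** 2 / n_bins
    freqs = np.fft.rfftfreq(n_bins, d=1.0 / fs)
    return freqs, psd
```

For a saturated 10Mbps bottleneck this places a strong spike at the base frequency near 813Hz and at its harmonics, which is exactly what the frequency windows above look for.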
We desire a relatively short segment length to allow rapid detection of bottleneck traffic. However,
as discussed in Section 2.4.2, the segment length cannot be too short or we cannot compute an
accurate spectrum. In addition, the algorithm may become too sensitive to transient flows.
To study the impact of segment length, we vary it from 1 second to 5 seconds. Again we use the
trace pair gathered at 7am in scenario T10L as the training set and use a fixed window size of 20Hz
and a fixed window location at 790Hz. Figure 4.16 shows the detection accuracy on the training
set and the average detection accuracy on other trace pairs with different segment lengths. We only
see small changes for both types of detection accuracies when we vary the segment length from 1
Figure 4.16: Detection accuracy as a function of segment length (1-5 seconds), on the training set and on other trace pairs.
second to 5 seconds. In future work we will explore other trace lengths. One practical constraint on
using longer trace segments is that we need longer traces so that we have enough segments
to form a distribution and estimate its parameters.
4.4.6 Transport Protocol
In Section 3.2.1 we have observed that TCP and UDP flows have different spectra. In general, UDP
flows are more regular and provide stronger signals than TCP flows. Here we evaluate how that
translates into detection accuracy.
Figure 4.17 compares the accuracy for detecting a 10Mbps link saturated by a TCP flow (scenario
T10L) and the accuracy for detecting the same 10Mbps link saturated by a UDP flow (scenario
U10L). Again we vary the window center from 10Hz to 9990Hz with a fixed window size of 20Hz.
The two top subgraphs 4.17(a) and 4.17(b) compare the accuracy on the 7am training trace
between TCP and UDP. We can see that the spikes for TCP are around 790Hz and its harmonics, while
the spikes for UDP are around 810Hz and its harmonics. The 810Hz base frequency for UDP is
closer to the predicted 813Hz base frequency for 10Mbps bottleneck links than the 790Hz base
frequency for TCP. In addition, UDP has higher detection accuracies (up to 98%) around its base
frequency than TCP (up to 93%). UDP also keeps high detection accuracies around the harmonics,
Figure 4.17: Detecting the 10Mbps bottleneck saturated with different protocols, Ws = 20Hz; detection accuracy vs. window location (0-9000Hz). (a) Accuracy on the 7am training set with the bottleneck saturated by TCP. (b) Accuracy on the 7am training set with the bottleneck saturated by UDP. (c) Average accuracy on other trace pairs with the bottleneck saturated by TCP. (d) Average accuracy on other trace pairs with the bottleneck saturated by UDP.
while TCP has a decay of detection accuracies among harmonics as they move further away from
the base frequency.
The two bottom subgraphs 4.17(c) and 4.17(d) compare the corresponding average accuracy
on other traces between TCP and UDP. Again we see that UDP has spikes around 810Hz and its
multiples, while TCP has spikes around 790Hz and its multiples. In addition, UDP has higher
average accuracies (up to 75%) around its base frequency than TCP (up to 71%). UDP also keeps
high detection accuracies around the harmonics while TCP has a decay of detection accuracies
among harmonics.
The differences in the detection accuracies between TCP and UDP are due to the fact that TCP
adjusts packet transmissions by considering feedback from the network while UDP does not in our
Figure 4.18: Spectrum and packet interarrival time distribution of the isolated Iperf UDP flow (U10L). (a) Spectrum (PSD and NCS, 0-1000Hz). (b) PDF and CDF of packet interarrival times (0-2.5ms).
experiments. This results in less regular packet transmissions for the TCP flow, which translates into
lower detection accuracies and a decay of detection accuracies among harmonics.

To affirm that UDP has more regular packet transmissions than TCP, we isolate the Iperf UDP
bottleneck flow from the aggregate and plot its spectrum in Figure 4.18(a) and the distribution of its
packet interarrival times in Figure 4.18(b). We can see that the Iperf UDP flow indicates much less
impact from cross traffic compared with the Iperf TCP flow as shown in Figures 4.12(a) and 4.12(b).
For example, the peak amplitude for the UDP flow is 170,000, three times the peak amplitude for
the TCP flow, and it appears around 810Hz instead of the 790Hz for the TCP flow. The energy also
has a much narrower spread than in the TCP case. For the UDP flow the NCS has an increase of 0.007
from 800Hz to 815Hz, while for the TCP flow the NCS has an increase of less than 0.007 from 780Hz
to 820Hz. The PDF and CDF of packet interarrival times for UDP show a single spike at 1.23ms,
while for TCP the interarrival times have a spike-bump pattern. These Figures demonstrate that the
UDP flow has more regular packet transmissions than the TCP flow.
Figure 4.19: Differences between using the fixed [780Hz, 800Hz] window and the best windows for the training set in the T10L scenario (difference in the accuracy on the training set and in the average accuracy on other traces, vs. gathering time of the training set).
4.4.7 Selection of Fixed Frequency Windows
Based on our investigation of the window location, window size, and transport protocol, we choose
to use the following fixed detection windows for the four scenarios, rather than the window that
yields the best detection accuracy on the training set found by exhaustively searching all possible window
locations and sizes. We call the latter the best training window. For scenarios T10H and T10L we use
the 20Hz wide window centered at 790Hz. For scenario U10L we select the same 20Hz wide
window size but centered at 810Hz. For scenario T100H we choose the 200Hz wide window centered
at 8100Hz.
A clear advantage of using these fixed windows over the best training window is the simplicity
and reduced operational cost of selecting the detection window. Results also show that the detection
accuracies with these fixed windows are comparable to the detection accuracies yielded by the best
training window, and in some cases the fixed windows can even yield a better average accuracy on
other traces than the best training window, because the latter only guarantees the best accuracy on
the training set, not the best average accuracy on other traces.
For example, Figure 4.19 shows the differences between using the fixed [780Hz, 800Hz] window
and using the best training window in the T10L scenario. The x axis represents the time of the
day when the training trace pair was gathered. For each training trace pair we find the best training
window that yields the best detection accuracy on the training set by exhaustively searching all possible
window locations and sizes. Then we calculate the accuracy on the training set and the average
accuracy on other traces using this window. In comparison we also calculate the corresponding
detection accuracies using the fixed [780Hz, 800Hz] window. The differences plotted in Figure 4.19
are obtained by subtracting the accuracies for the best training windows from the accuracies for the
fixed window. We can see that for most training traces the differences between using the fixed window
and the best training windows are small. The largest difference happens with the training trace
pair gathered at 11pm. With the best training window, the detection accuracy on this training trace
pair is 8.7% higher than the value using the fixed window, but the average accuracy on other traces
is 18% lower than the value using the fixed window.
In general, we should follow these principles in selecting the detection window. First,
we should use a detection window whose center is located near the predicted base frequency for
the bottleneck link (e.g., 813Hz for 10Mbps links and 8133Hz for 100Mbps links). Second, the
detection window size should be around 1% to 5% of the predicted base frequency. Third, the exact
location and size of the detection window to be used in real operation should be adjusted slightly
according to the specific network. Network operators should do some training to select the proper
detection window.
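These principles can be condensed into a starting-point heuristic (a sketch; the function name and the 2.5% default size fraction are our own choices, and per the third principle the result should still be refined by training on the operator's own network):

```python
def initial_window(link_bps, wire_bytes=1538, size_fraction=0.025):
    """Starting-point detection window from the selection principles:
    center at the predicted base frequency, size about 1%-5% of it."""
    base = link_bps / (wire_bytes * 8)   # predicted base frequency (Hz)
    return base, size_fraction * base    # (window center, window size) in Hz

center, size = initial_window(10e6)      # roughly (813Hz, 20Hz) for 10Mbps
```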
4.4.8 Training Data Variation
The accuracy of any training-based detection algorithm is influenced by the quality of the training
data. In this section we investigate the impact of using different training data on the two types of
Figure 4.20: Impact of training data on detection accuracies; detection accuracy vs. aggregate traffic volume (kilo-packets per second) of the training trace pair, with each point labeled by its gathering time. (a) Accuracy on the training set under T10L. (b) Average accuracy on other traces under T10L.
Figure 4.21: Training with different traffic volume traces; detection accuracy vs. traffic volume (kilo-packets per second). (a) Training with a low traffic volume trace (7am). (b) Training with a medium traffic volume trace (11am). (c) Training with a high traffic volume trace (3pm).
detection accuracies, the accuracy on the training trace pair and the average accuracy on other trace
pairs under the same scenario as the training trace pair. We select the T10L scenario for illustration.
Figure 4.20 shows the impact of using different training data on the two types of detection
accuracies. The left subgraph shows the accuracy on the training trace pair itself, as we vary the
trace pair used for training the algorithm. The x axis is the average aggregate traffic volume of the
training trace pair. The number beside each point in the graph indicates the time of the day when
the training trace pair was gathered. For example, the point denoted by 9 shows the accuracy
on the training trace pair gathered at 9am. In the right subgraph, each point represents the average
accuracy on all other trace pairs using the cut-off threshold learned from the corresponding training
trace pair. For example, the point denoted by 9 reflects the average accuracy on all other trace
pairs using the cut-off threshold learned from the training trace pair gathered at 9am. The x axis for
the right subgraph is also the average aggregate traffic volume of the training trace pair.
We see that in general training on trace pairs with lower aggregate traffic volume yields higher
accuracies on the training set itself, with the exception of the points denoted by 19, 21 and 23.
An intuition behind this is that lower aggregate traffic means less interference with the bottleneck
traffic from the background traffic on the monitored link, and this translates into a clearer signal for the
bottleneck traffic, making it easier to detect the presence of the bottleneck flow. On the other hand,
the average accuracy on other trace pairs has high values when we train on trace pairs with medium
aggregate traffic volume. This agrees with the intuition that it is better to train with the common
cases (medium traffic volume) instead of extreme cases (either low or high traffic volume).
As the average accuracy reported in Figure 4.20(b) may hide the variation among the accuracies
on individual trace pairs used for evaluation, we dissect it into individual graphs, each representing
the result with one training set. Figure 4.21 shows the detection accuracy when training with three
different trace pairs corresponding to low, medium, and high aggregate traffic volume at different
periods of the day. In each graph, a point represents the accuracy on the indicated trace pair using
the cut-off threshold learned through the training trace pair.
We see a strong correlation between the accuracy on a trace pair and its distance from the training
trace pair in terms of traffic volume. For example, in Figure 4.21(a), the trace pairs at 3am and 5am
have traffic volumes very close to that of the training trace pair gathered at 7am. Both have accuracies very
close to the accuracy on the training set at 7am. As the distance to the training set increases in terms
of packet rate, the detection accuracy on the corresponding trace pair decreases. This observation
suggests that we could take advantage of the similarity in traffic volume between the trace
pair used for training and the unknown trace to get high accuracies on the unknown trace. We could
use the cut-off threshold learned through a training set that has a traffic volume similar to the unknown
trace for detecting bottleneck traffic in the unknown trace. In Chapter 5 we will present a parametric
detection algorithm that utilizes the traffic volume for detection.
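This volume-matching idea can be sketched as follows (all names and the numeric volumes and thresholds are purely illustrative, not measured values from our traces):

```python
def pick_training_set(trained, unknown_volume):
    """Select the trained model whose aggregate traffic volume (kpps)
    is closest to the unknown trace's, then reuse its cut-off threshold."""
    return min(trained, key=lambda t: abs(t["volume"] - unknown_volume))

trained = [
    {"time": "7am",  "volume": 3.0,  "cutoff": 10.9},   # illustrative values
    {"time": "11am", "volume": 9.0,  "cutoff": 11.5},
    {"time": "3pm",  "volume": 16.0, "cutoff": 12.0},
]
best = pick_training_set(trained, unknown_volume=8.2)   # picks the 11am entry
```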
4.4.9 Effect of Signal-to-noise Ratio
Detection theory tells us that detection accuracy is related to the signal-to-noise ratio (SNR).
When detecting bottleneck traffic in aggregate network traffic, the signal is the intensity of the
bottleneck traffic, and the noise is the level of background traffic. Since the spectral representation
is obtained through the processing of packet arrivals, we define the signal-to-noise ratio as the ratio of
the bottleneck traffic packet rate to the background traffic packet rate.
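Under this definition the SNR is a simple packet-rate ratio; for example, plugging in the approximate T100H rates reported later in this section (around 7.5kpps of bottleneck traffic against 32-74kpps of background traffic):

```python
def snr(bottleneck_pps, background_pps):
    # SNR = bottleneck traffic packet rate / background traffic packet rate
    return bottleneck_pps / background_pps

print(round(snr(7_500, 74_000), 2))  # → 0.1, the heaviest-background T100H case
print(round(snr(7_500, 32_000), 2))  # → 0.23, the lightest-background T100H case
```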
Figure 4.22 shows how the detection accuracy varies as a function of signal-to-noise ratio using
the fixed frequency windows specified in Section 4.4.7. The left subgraph shows the detection accuracy
on the training trace pair, while the right subgraph shows the detection accuracy on all other trace
Figure 4.22: Detection accuracy as a function of signal-to-noise ratio using the fixed frequency windows specified in Section 4.4.7, for all four scenarios. (a) Accuracy on the training set. (b) Average accuracy on other traces.
pairs using the cut-off learned from the corresponding training trace pair. The x axis is the signal-
to-noise ratio for the training trace pair. Both subgraphs include all trace pairs under the four
experiment scenarios listed in Table 4.2 (different scenarios are indicated with different symbols).
As expected, we see a general trend that a higher SNR leads to a better detection accuracy.
The T100H scenario for 100Mbps TCP bottleneck traffic with high background traffic shows the
highest detection accuracies because the bottleneck traffic (around 7.5kpps) is quite large relative
to the background traffic (32 - 74 kpps), for SNRs of 0.1 - 0.23. Both the detection accuracy on the
training set and the average accuracy on other traces can reach over 94% in the T100H scenario. The
U10L scenario for 10Mbps UDP bottleneck traffic with low background traffic (denoted by *)
also has quite good detection accuracies even though the SNRs are lower (0.053 - 0.125). The
detection accuracy on the training set ranges from 75% to 99%, while the average accuracy on other
traces ranges from 76% to 86%. The T10L scenario for 10Mbps TCP bottleneck traffic with low
background traffic shows generally lower detection accuracies (around 63% to 94%) than the U10L
scenario at similar SNRs, for the reason discussed in Section 4.4.6. Finally, the T10H
scenario for 10Mbps TCP bottleneck traffic with high background traffic gets the lowest detection
accuracies (around 53% to 61%), as its SNRs are the lowest (0.01 - 0.02).
4.5 Utilizing Harmonics
Besides the four detection features listed in Table 4.1, there are some alternative detection features.
In this section we focus on exploring the use of harmonics for detection. We
know from Section 2.4.2 that the spectrum of a periodic packet arrival time sequence contains
strong energy not only at the base frequency, but also at its harmonics. This has been validated by
our controlled lab experiments (e.g., Figure 3.1).

One candidate approach to improving the performance of the detection algorithms is to utilize
harmonics, since strong energy in the harmonics also indicates the presence of the periodic pattern. To
do so, we have designed a detection algorithm which checks the peak amplitude not only in the
frequency window surrounding the base frequency associated with the bottleneck traffic, but also in
the frequency windows around the multiples of the base frequency. We present the details of this
detection algorithm in Section 4.5.1 and then evaluate its performance in Section 4.5.2.
4.5.1 A Detection Algorithm Using Harmonics
To consider harmonics information in detection we design the following detection algorithm. In the
algorithm we define Bf to be the base frequency, Ws to be the size of the base frequency window
(the window around the base frequency), N to be the number of frequency windows to be explored,
and F to be a function that maps an integer between 1 and N to a positive real number.

The algorithm explores the following frequency windows. The i'th frequency window has
a center at i * Bf and a size of F(i) * Ws.
Figure 4.23: Spectrum from 0Hz to 10kHz of a trace segment in T10L (PSD and NCS).
[i * B
f
- F(i) * W
s
/ 2, i * B
f
+ F(i) * W
s
/ 2], where i varies from 1 to N
It picks the peak amplitude in each of these windows, and uses their log values to form a multi-variate
vector [x_1, x_2, ..., x_N], where x_i is the log value of the peak amplitude in the i-th frequency
window. In the training phase, it estimates the PDFs of this vector for hypotheses H_B and H_0. As
in the Top-M-Frequencies Detection Algorithm, it assumes multi-variate normal distributions for
the PDFs, so that it only needs to measure the mean vector and covariance matrix. In the detection
phase, it uses the PDFs to classify a new trace according to the Maximum Likelihood Testing Rule
in Section 4.2.1.
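The steps above can be sketched as follows (a minimal numpy sketch under stated assumptions; the function names, window parameters, and PSD inputs are illustrative and not our actual implementation):

```python
import numpy as np

# Illustrative sketch of the harmonic detection algorithm: feature extraction
# over harmonic windows, Gaussian fitting, and maximum-likelihood classification.
def harmonic_features(psd, freqs, base_f, w_s, n_windows, F=np.sqrt):
    """x_i = log of the peak PSD amplitude in the i-th harmonic window,
    [i*base_f - F(i)*w_s/2, i*base_f + F(i)*w_s/2], for i = 1..n_windows."""
    feats = []
    for i in range(1, n_windows + 1):
        half = F(i) * w_s / 2.0
        mask = (freqs >= i * base_f - half) & (freqs <= i * base_f + half)
        feats.append(np.log(psd[mask].max()))
    return np.array(feats)

def fit_gaussian(rows):
    """Training phase: mean vector and covariance matrix of the feature vectors."""
    X = np.asarray(rows, dtype=float)
    return X.mean(axis=0), np.cov(X, rowvar=False)

def log_density(x, mean, cov):
    """Log of the multivariate normal density."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(x) * np.log(2 * np.pi))

def classify(x, model_h0, model_hb):
    """Maximum Likelihood Testing Rule: pick the hypothesis whose fitted
    density assigns the higher likelihood to the feature vector."""
    return "HB" if log_density(x, *model_hb) > log_density(x, *model_h0) else "H0"
```

In use, `harmonic_features` is applied to every training trace segment, `fit_gaussian` is run once per hypothesis, and `classify` implements the likelihood comparison of Section 4.2.1.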
To select a proper function for F we examine the energy spread in the traffic spectrum. Figure 4.23 shows the partial spectrum from 0Hz to 10kHz of the isolated bottleneck flow for a segment
in T10L. We see that the energy spikes become more spread out as they appear at higher
frequencies. We observe this general behavior for all trace segments in our experiments. This
suggests that we should use wider windows at higher frequency locations.
In view of this observation, we have explored the following three functions for F. All of them
have non-decreasing window sizes as i increases. The first one, F(i) = 1 for all i, dictates the
same size for all frequency windows. The second one, F(i) = i for all i, implies that the size of the
frequency window increases linearly with the window's sequence number i, while the last one, F(i)
= sqrt(i) for all i, specifies that the size increases as the square root of i. We believe
these functions provide a good coverage of proper choices for the function F.
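The three choices and the resulting window sizes can be sketched in a few lines (names illustrative):

```python
# Illustrative sketch of the three window-scaling functions F considered above.
F_CHOICES = {
    "constant": lambda i: 1.0,       # F(i) = 1: same size for every window
    "linear":   lambda i: float(i),  # F(i) = i: size grows linearly with i
    "sqrt":     lambda i: i ** 0.5,  # F(i) = sqrt(i): size grows with sqrt(i)
}

def window_size(i, w_s, choice):
    """Size of the i-th frequency window, F(i) * W_s."""
    return F_CHOICES[choice](i) * w_s

print(window_size(4, 20.0, "sqrt"))      # 40.0
print(window_size(4, 20.0, "linear"))    # 80.0
print(window_size(4, 20.0, "constant"))  # 20.0
```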
4.5.2 Evaluation
To evaluate the performance of this algorithm, we apply it to the Internet traces under
the T10L scenario. We select the T10L scenario since it is a very challenging scenario, with a wide
range of detection accuracies (from 63% to 94%) for the Top-Frequency Algorithm. It is our future work to
apply this algorithm to other scenarios, including the T10H scenario, which is the most challenging
scenario with the lowest detection accuracies among all scenarios. In addition, we have evaluated
the algorithm with all three different F functions and the results are similar. Here we focus
on the variant in which the size of the frequency window increases as the square root of the
window's sequence number i.
Figure 4.24 shows how the detection accuracy varies as N, the number of frequency windows
explored by the algorithm, increases. The left subgraph shows the accuracy on the training set itself.
Each line corresponds to a different training set. We can see that in general the detection accuracy
improves as N increases. The improvement is more significant when N increases from 1 to 2.
As N becomes larger, the improvement becomes less clear.
[Figure: detection accuracy (0-1) vs number of harmonic windows (1-12), one line per training trace pair (1am, 3am, 5am, 7am, 9am, 11am, 1pm, 3pm, 5pm, 7pm, 9pm, 11pm)]
(a) Accuracy on the training set under T10L

[Figure: average detection accuracy (0-1) vs number of harmonic windows (1-12), one line per trace pair]
(b) Average Accuracy on other traces under T10L

Figure 4.24: Detection accuracies with increasing number of frequency windows explored
However, such observations do not hold for the average detection accuracy on other trace pairs
shown in the right subgraph. As we explore more frequency windows, the average detection accuracy
on other trace pairs actually decreases slightly. This suggests that the single peak amplitude in
the base frequency window has already captured the most significant difference between H_0 and H_B that
is persistent across different trace pairs. Considering harmonics does not significantly improve the
average detection accuracy. The reason is the variation of the peak amplitude over different
traces, which we will see shortly when we explore the impact of using harmonics on
individual trace pairs below.
To explore the impact of using harmonics on individual trace pairs, we plot the detection accuracy
on each individual trace pair after training the algorithm with a specific trace pair. Figure 4.25
shows the detection accuracy on trace pairs obtained at 7am, 9am, 11am, 1pm, and 3pm, after we
train the algorithm with the trace pair obtained at 11am. We can see there are three groups of trace
pairs in terms of exploring harmonics. The first group contains the trace pairs obtained at 11am
and 1pm. The detection accuracy on these trace pairs increases as we explore more frequency
windows (i.e. consider more harmonics). The second group contains the trace pair obtained at 3pm.
The accuracy on this trace pair does not change much when we consider more harmonics. The
last group contains the trace pairs obtained at 7am and 9am. The accuracy on these trace pairs
actually decreases with more frequency windows.

[Figure: detection accuracy (0-1) vs number of harmonic windows (1-12), one line each for the 7am, 9am, 11am, 1pm, and 3pm trace pairs]
Figure 4.25: Accuracy on individual trace pairs after training with the 11am trace pair

[Figure: mean of the log of the peak amplitude vs time of the day (6-16), one line each for the 1st and 2nd frequency windows]
Figure 4.26: Mean of the peak amplitude (after log) in the 1st and 2nd frequency window for
individual trace pairs
To find out the reason behind the above observation, we examine the statistics of the peak
amplitude in each frequency window for these trace pairs. Specifically we inspect the mean values
and standard deviations of the log of the peak amplitude for hypotheses H_B and H_0. Figure 4.26
shows the mean values in the 1st and 2nd frequency windows for the trace pairs gathered at 7am,
9am, 11am, 1pm, and 3pm. The solid line is drawn from μ_0 (the mean value of the log of the
peak amplitude for H_0) to μ_B (the mean value of the log of the peak amplitude for H_B) in the 1st
frequency window, while the dashed line is drawn between μ_0 and μ_B in the 2nd frequency window.

Table 4.5: Statistics of the peak amplitudes (after log) in different frequency windows

Trace Pair           1st window      2nd window      3rd window      4th window      5th window
7am   (μ_0, σ_0)    10.536, 0.3271  10.490, 0.3070  10.505, 0.2864  10.506, 0.2815  10.503, 0.2563
      (μ_B, σ_B)    11.456, 0.3200  11.235, 0.2884  11.040, 0.2640  10.987, 0.2683  10.900, 0.2628
9am   (μ_0, σ_0)    10.766, 0.3360  10.701, 0.3212  10.704, 0.2983  10.681, 0.2845  10.684, 0.2839
      (μ_B, σ_B)    11.551, 0.3160  11.365, 0.3094  11.224, 0.2972  11.115, 0.2793  11.031, 0.2608
11am  (μ_0, σ_0)    11.245, 0.3328  11.173, 0.3073  11.111, 0.2878  11.137, 0.2658  11.158, 0.2815
      (μ_B, σ_B)    11.802, 0.3044  11.590, 0.2810  11.448, 0.2823  11.468, 0.2833  11.368, 0.2663
1pm   (μ_0, σ_0)    11.345, 0.3301  11.178, 0.3076  11.149, 0.2715  11.158, 0.2581  11.144, 0.2658
      (μ_B, σ_B)    11.832, 0.3251  11.598, 0.2664  11.462, 0.2734  11.386, 0.2697  11.323, 0.2502
3pm   (μ_0, σ_0)    11.475, 0.3075  11.384, 0.3032  11.316, 0.2850  11.269, 0.2859  11.295, 0.2796
      (μ_B, σ_B)    12.030, 0.3408  11.769, 0.2875  11.649, 0.2836  11.583, 0.2759  11.512, 0.2530
We can see a strong correlation between the detection accuracy in Figure 4.25 and the similarity
in the mean values shown in Figure 4.26. When μ_0 and μ_B for a trace pair are similar to μ_0 and
μ_B for the 11am trace pair, the detection accuracy on that trace pair will be close to the detection
accuracy on the 11am trace pair after training the detection algorithm with the 11am trace pair.
When the mean values become more different, the detection accuracy will be lower, and the more
frequency windows utilized by the detection algorithm, the worse the detection accuracy. This is
expected as the PDFs are determined by the mean and standard deviation. Table 4.5 shows the
statistics, including the mean and standard deviation of the log of the peak amplitude, for the five trace
pairs in the first five frequency windows. We can see that considering harmonics, especially
the first one to three harmonics, helps improve the detection accuracy when the trace under
detection has similar statistics to the training trace. If the statistics are very different, the detection
algorithm will perform poorly and it is counterproductive to consider more harmonics.
4.6 Summary
In this chapter, we proposed four non-parametric detection algorithms based on Maximum Likelihood
Detection [27]. They differ in how they choose the feature for detection. Two of them, the
Single-Frequency Algorithm and the Top-Frequency Algorithm, use a single random variable for
detection, while the other two, the Top-M-Frequencies Algorithm and the All-Frequencies Algorithm,
use multiple random variables.
Evaluation results using real Internet traces show that the Top-Frequency Algorithm achieves
a good trade-off between detection accuracy and complexity. It captures the most significant and
persistent change introduced by bottleneck traffic (and other periodic network phenomena), while
keeping the detection logic simple and efficient. In addition, we see significant variations of the
peak amplitude across different traces and evidence that similarity in traffic volume can affect detection
accuracy. We will explore this direction further for parametric detection in Chapter 5.
We also presented an extension to the detection algorithm that utilizes harmonics. Evaluation results
showed that it can improve the detection accuracy significantly if the training data are statistically
similar to the trace under detection.
Chapter 5
PARAMETRIC DETECTION OF PERIODIC PATTERNS
5.1 Introduction
In the last chapter we presented four detection algorithms based on Maximum Likelihood Detection
[27]. While they differ in how they select the detection feature, all of them are non-parametric in
the sense that they consider only the traffic spectrum information for the detection purpose. Specifically,
they gather the statistics of the detection feature drawn from the spectrum of the training trace
for both hypotheses (with bottleneck traffic and without bottleneck traffic), and use the statistical
differences between the two hypotheses to classify an unknown trace. However, the limitation of
these non-parametric algorithms is that they do not consider the variation of the feature statistics
over time, traffic volume, and other factors.
Experiment results in Sections 4.4.2, 4.4.8, and 4.5.2 have shown that the statistics of the
detection feature do have significant variations over traces gathered at different times; specifically,
they have a strong correlation with traffic volume. A natural extension to existing non-parametric
algorithms is to study the variation of the detection feature statistics with factors such as traffic
volume, and then utilize this knowledge in the detection process. This leads to the parametric
detection algorithm.
Essentially, how the detection feature statistics vary with other factors depends on how the
traffic spectrum changes with these factors. For parametric detection we first study the relationship
between the traffic spectrum and other factors, and then modify existing non-parametric algorithms
to take advantage of this relationship. The strong correlation between the detection feature statistics
and traffic volume motivates us to design the parametric detection algorithm. We expect to get
better detection accuracy with the parametric detection algorithm as it utilizes the relationship and
adjusts the detection process according to the input parameters.
As in Chapter 4, we use the pattern caused by traffic congestion along bottleneck links as a
representative of periodic patterns in Internet traffic in the design and evaluation of the parametric
detection algorithm for general periodic patterns. Since we have shown that the Top-Frequency Algorithm
achieves a good trade-off between detection accuracy and algorithm complexity among the
four non-parametric algorithms in Section 4.4.2, we pick this algorithm as the base for extension
and explore approaches to improve its performance by considering the variation of the spectrum
with the input parameter.
In the following sections, we first present the constraints on our system in measuring parameters,
and then go through a list of candidate parameters that can affect the traffic spectrum, selecting traffic
volume as a good parameter for exploration given the constraints, in Section 5.2. Then in Section
5.3 we propose three approaches to utilize the relationship between the traffic spectrum and traffic
volume for parametric detection. Finally, we evaluate each of the three approaches using real Internet
traffic in Section 5.4.
5.2 Candidate Parameters
There are many factors that affect the spectrum of a trace segment of Internet traffic. However,
not all of them can be measured and utilized for parametric detection due to the constraints on our
system. Basically there are two main constraints on what parameters we can measure and use for
parametric detection. First, the detection decision should be based solely on the trace gathered by
the observation machine. This means we have no direct knowledge about unobserved traffic, including
unobserved bottleneck traffic and unobserved non-bottleneck traffic. Second, the detection
algorithm should not do flow separation and grouping, as such operations are expensive for high
speed network traffic. This excludes measurements related to individual flows from consideration,
including the number of flows, flow duration, and flow rate.
Below is a list of influential factors that can change the spectrum of a trace segment for the
aggregate traffic observed at the monitored link. For each of them, we discuss its impact on the
spectrum and whether it is a good parameter for parametric detection given the constraints on our system.
Bottleneck Bandwidth. We have seen in Section 3.2.1 that the bandwidth of the bottleneck
has a direct impact on the spectrum of the traffic going through the bottleneck. When other
conditions are held stable, increasing the bottleneck bandwidth will increase the energy of the
bottleneck spectral signature and make it appear at a higher frequency location, as calculated in
formula 3.2. However, we do not know the bottleneck bandwidth in advance when detecting
bottlenecks, so it would not be a good parameter for parametric detection.
Bottleneck Traffic Composition. The composition of the bottleneck traffic includes the packet
size distribution, the number of flows going through the bottleneck, and the duration, rate, and type
(TCP/UDP) of such flows. These all affect packet spacing in the bottleneck traffic stream,
which in turn affects the spectrum. Among them, the packet size distribution plays a more
important role: the higher the percentage of small packets, the higher the energy of the bottleneck
spectral signature and the higher the frequency location at which it appears. In addition, UDP
flows tend to be more regular, as UDP flows typically do not adjust their sending rate the way TCP
flows do. Since we do not do flow separation for the aggregate traffic, we do not know the details
of each flow, and we do not know what is in the bottleneck traffic. So bottleneck traffic
composition is also not a good parameter for parametric detection.
Cross Traffic Composition. In Section 3.3 we classified three different types of cross
traffic based on their link sharing with the observed bottleneck traffic: unobserved
bottleneck traffic, unobserved non-bottleneck traffic, and observed non-bottleneck traffic. Experiment
results in that section show that unobserved bottleneck traffic has a direct impact
because it carries part of the energy imposed by the bottleneck. Observed non-bottleneck
traffic also plays an important role because it is a part of the aggregate traffic used in generating
the traffic spectrum. Unobserved non-bottleneck traffic has the least significant impact
because the link it shares with the observed bottleneck traffic is not congested. For each type
of cross traffic, its composition, including the packet size distribution, number of flows,
flow duration, flow rate, and flow types, can affect the spectrum of the observed
aggregate traffic. However, only the observed non-bottleneck traffic is captured by the trace
machine as part of the aggregate traffic. Since we do not do flow separation, we do not know
what is in the observed non-bottleneck traffic. So cross traffic composition is also not a
good parameter for parametric detection.
Aggregate Traffic Composition. Aggregate traffic at the monitored link consists of both observed
bottleneck traffic and observed non-bottleneck traffic. Its composition has a direct
effect on its spectrum. Since we do not carry out flow separation, we can only measure properties
related to the whole aggregate traffic, such as the traffic volume in terms of packet rate and
bit rate, the packet size distribution, etc. These can be used for parametric detection.
Given the constraints on our system, the only available parameters are measurements related
to the aggregate traffic at the monitored link. These measurements include traffic volume in terms
of packet rate and bit rate, packet size distribution, etc. In general, the higher the traffic volume of
the aggregate traffic, the stronger the energy in the traffic spectrum. Regarding the packet size
distribution, if the percentage of small packets increases, there will be an increase of energy in the
high frequency components.
We select traffic volume as a key parameter for our parametric detection algorithm, as traffic
volume is more straightforward to represent than the packet size distribution. In addition,
experiment results in Sections 4.4.2, 4.4.8, and 4.5.2 have shown a strong correlation between
traffic spectrum and traffic volume. In the following section, we explore three approaches to
utilize it for improving the detection algorithm's performance. Studying packet size distribution and
other parameters is part of our future work.
5.3 Parametric Detection with Traffic Volume
The idea of parametric detection is to study the relationship between the traffic spectrum and the input
parameter, and then use this knowledge in the detection process. In the previous section we
identified the traffic volume of the aggregate traffic as a key parameter that can be used for parametric
detection. In this section we present three approaches that exploit its relationship with the traffic
spectrum. Here we focus on improving the Top-Frequency Algorithm, since Section 4.4.2 has shown
that it captures the most significant difference introduced by the bottleneck link. In principle, it is
easy to modify the other non-parametric algorithms in a similar fashion to consider the traffic volume
information; however, we leave that to our future work.

[Figure: peak amplitude (in log) vs packet rate (Kpps); (a) trace segments in hypothesis H0, (b) trace segments in hypothesis HB]
Figure 5.1: Peak amplitude in the [780Hz, 800Hz] window vs packet rate for trace segments in the
T10L Scenario

[Figure: peak amplitude (in log) vs bit rate (Mbps); (a) trace segments in hypothesis H0, (b) trace segments in hypothesis HB]
Figure 5.2: Peak amplitude in the [780Hz, 800Hz] window vs bit rate for trace segments in the
T10L Scenario
5.3.1 Relationship between Traffic Volume and Spectrum
In the Top-Frequency Algorithm, we extract the peak amplitude in a selected frequency window
across the spectra of all trace segments in the training set to form two distributions, one for
hypothesis H_0 (with no bottleneck traffic) and the other for hypothesis H_B (with a particular type of
bottleneck traffic). Now we consider modifications to this algorithm by investigating the relationship
between the peak amplitude and the traffic volume of the trace segment.
Figure 5.1 plots the peak amplitude in the frequency window [780Hz, 800Hz] against the traffic
volume measured in terms of packet rate, using all traces in the T10L Scenario, which targets
detecting a 10Mbps bottleneck congested with TCP traffic. We select the T10L Scenario as it presents a
challenging scenario for improving the detection accuracies. The left subgraph is for hypothesis H_0
and the right subgraph is for hypothesis H_B. We can see that there is a general trend of increasing
peak amplitude with increasing packet rate for both hypotheses H_0 and H_B. The trend is a bit
clearer for hypothesis H_0.
Figure 5.2 plots the peak amplitude against the traffic volume measured in terms of bit rate.
We observe similar trends for the relationship between the peak amplitude and the bit rate. So
both packet rate and bit rate can be used for parametric detection. We will compare the performance
improvement with each of them in Section 5.4.
5.3.2 Three Approaches for Utilizing Traffic Volume Information
We have investigated three approaches to exploit the relationship between the peak amplitude and
traffic volume. Each of them has a slightly different assumption, but all of them produce a linear
function that calculates the cut-off threshold based on the input traffic volume. This cut-off threshold
can then be compared against the peak amplitude of an unknown trace to classify it, as in the
Top-Frequency Algorithm (Section 4.3.2). We describe each approach below.

[Figure: peak amplitude (in log) vs traffic volume with fitted separation lines; (a) based on the packet rate (Kpps), (b) based on the bit rate (Mbps)]
Figure 5.3: Separation by the Middle-line Approach for parametric detection
5.3.2.1 Middle-line Approach
This approach assumes a linear relationship between the peak amplitude and traffic volume for
both hypotheses in the training data. It uses linear regression [18] to fit a line to the data points
for each hypothesis by minimizing the sum of the squares of the vertical deviations from each data
point to the line. As a result, we get one line for hypothesis H_0 and another line for hypothesis
H_B. Each line is of the following form:
Y_c = a_c X_c + b_c        (5.1)
where c is the index for the hypothesis (0 or B), Y_c is the peak amplitude, X_c is the traffic volume
of the trace segment (either packet rate or bit rate), and a_c and b_c are constants derived from the data
points. After obtaining a_0, b_0, a_B, and b_B, we use the average values to obtain the middle line
Y = ((a_0 + a_B) / 2) X + ((b_0 + b_B) / 2)        (5.2)
as the line separating hypotheses H_0 and H_B. For classification, any point above the middle
line is classified as hypothesis H_B, while any point below the middle line is classified
as hypothesis H_0.
For example, Figure 5.3 shows the separation line between H_0 and H_B calculated by this approach
for the data points in the T10L scenario. The left subgraph is based on the relationship
between the peak amplitude and the packet rate, while the right subgraph is based on the relationship
between the peak amplitude and the bit rate. In both subgraphs, the two solid lines correspond to
the linear regressions of the data in H_0 and H_B, respectively, and the dashed line is the middle line
calculated for separating data points in H_0 and H_B.
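The Middle-line Approach can be sketched as follows (a minimal numpy sketch; the function names and data are illustrative, not our implementation):

```python
import numpy as np

# Illustrative sketch of the Middle-line Approach: a least-squares line per
# hypothesis, averaged into a single separating line (equations 5.1 and 5.2).
def fit_line(volume, peak_amp):
    """Linear regression: peak_amp ~= a * volume + b; returns (a, b)."""
    a, b = np.polyfit(volume, peak_amp, deg=1)
    return a, b

def middle_line(vol_h0, amp_h0, vol_hb, amp_hb):
    """Average the per-hypothesis coefficients to obtain the middle line."""
    a0, b0 = fit_line(vol_h0, amp_h0)
    ab, bb = fit_line(vol_hb, amp_hb)
    return (a0 + ab) / 2.0, (b0 + bb) / 2.0

def classify(volume, peak_amp, line):
    """Points above the middle line -> H_B, points below -> H_0."""
    a, b = line
    return "HB" if peak_amp > a * volume + b else "H0"
```

For instance, if the H_0 training points lie on Y = X + 2 and the H_B points on Y = 3X + 4, the middle line comes out as Y = 2X + 3.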
5.3.2.2 Threshold-line Approach
In this approach we take advantage of the training results of the original Top-Frequency Algorithm.
In the training phase, we obtain multiple pairs of traces. For each trace pair, we run the
original Top-Frequency Algorithm to obtain the cut-off threshold, and then plot the cut-off threshold
against the average traffic volume for the trace pair. This approach assumes a linear relationship
between the detection threshold and traffic volume, and uses linear regression to fit a line to the data
points, with x being the traffic volume and y being the threshold. It then uses this line to separate
hypotheses H_0 and H_B. An unknown trace with peak amplitude above the line for the corresponding
traffic volume is classified as hypothesis H_B; otherwise it is classified as hypothesis H_0.

[Figure: threshold vs traffic volume with fitted line; (a) based on the packet rate (Kpps), (b) based on the bit rate (Mbps)]
Figure 5.4: Separation by the Threshold-line Approach for parametric detection
Figure 5.4 shows the separation line obtained by this approach for the traces in the T10L Scenario.
The left subgraph is based on the packet rate, while the right subgraph is based on the bit
rate. We first calculate the cut-off thresholds for each of the 12 trace pairs using the Top-Frequency
Algorithm, and plot them against the average traffic volume of each trace pair. The line is then obtained
by using linear regression to fit the 12 data points.
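The Threshold-line Approach can be sketched similarly (illustrative names and data, not our implementation): each training trace pair contributes one (average volume, cut-off threshold) point, and a regression line through those points yields a volume-dependent detection threshold.

```python
import numpy as np

# Illustrative sketch of the Threshold-line Approach.
def threshold_line(pair_volumes, pair_thresholds):
    """Fit threshold ~= a * volume + b over the training trace pairs."""
    a, b = np.polyfit(pair_volumes, pair_thresholds, deg=1)
    return a, b

def classify(volume, peak_amp, line):
    """Peak amplitude above the fitted threshold for this volume -> H_B."""
    a, b = line
    return "HB" if peak_amp > a * volume + b else "H0"
```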
5.3.2.3 Pocket Algorithm
This approach also assumes a linear separation between the trace segments in hypothesis H_0 and the trace
segments in hypothesis H_B. It follows the Pocket Algorithm [31] to find a sub-optimal line for
the separation. The Pocket Algorithm is built on the Perceptron Learning Algorithm [31]
and is suitable for finding a linear separation for non-separable training data. The basic idea of the
Perceptron Learning Algorithm is to refine the weight vector of the separation line when it fails to
correctly classify a training example (negative reinforcement). If there exists a line that can correctly
separate all training examples, the Perceptron Learning Algorithm is guaranteed to find such a line.
However, it does not do well when such a line does not exist.
The Pocket Algorithm was proposed to deal with non-separable training data, where no line
can correctly separate all training examples. It considers not only negative
reinforcement, but also positive reinforcement, by counting how many training examples are
correctly classified by the algorithm. For faster convergence, we apply data normalization [77] to
the training data so that the normalized training data have zero mean and unit variance along each
of the two features, peak amplitude and traffic volume (packet rate or bit rate).
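The Pocket Algorithm over the two features can be sketched as follows (a minimal numpy sketch; the data layout and helper names are illustrative, not our implementation):

```python
import numpy as np

# Illustrative sketch of the Pocket Algorithm on (peak amplitude, traffic
# volume) features, with the normalization step applied before training.
def normalize(X):
    """Zero-mean, unit-variance scaling per feature."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pocket_train(X, y, passes=10):
    """Perceptron updates with a 'pocket': keep the weight vector that
    classifies the most training examples correctly. y holds labels in {-1, +1}."""
    Xa = np.hstack([X, np.ones((len(X), 1))])    # absorb the bias term
    w = np.zeros(Xa.shape[1])
    best_w, best_correct = w.copy(), 0
    for _ in range(passes):
        for xi, yi in zip(Xa, y):
            if np.sign(xi @ w) != yi:            # negative reinforcement
                w = w + yi * xi                  # standard perceptron update
                correct = int(np.sum(np.sign(Xa @ w) == y))
                if correct > best_correct:       # positive reinforcement
                    best_correct, best_w = correct, w.copy()
    return best_w

def pocket_classify(x, w):
    """Points on the positive side of the pocketed line are labeled H_B."""
    return "HB" if np.dot(np.append(x, 1.0), w) > 0 else "H0"
```

Unlike the plain perceptron, the pocketed weights only change when a candidate weight vector correctly classifies strictly more training examples, so the returned line is the best one seen even when the data are not linearly separable.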
5.4 Evaluation with Real Internet Traffic
To evaluate the performance of the three different approaches for parametric detection, we use
the traces gathered under the four different scenarios described in the previous chapter. For each
scenario, we use all 12 trace pairs for training and then calculate the detection accuracy of the
three approaches on the training data. As a comparison, we also run the original Top-Frequency
Algorithm on the same training data and calculate its detection accuracy. For
the Pocket Algorithm, we stop after running ten passes over the training data. In other words, each
data point in the training set is used on average ten times by the Pocket Algorithm.
Table 5.1 shows the detection accuracy of the original Top-Frequency Algorithm and the three
parametric detection approaches using the packet rate information. For each scenario, we use the
same fixed frequency windows as the ones selected in Section 4.4.7. The results show that the
Pocket Algorithm performs best, while the other two approaches are slightly better than the original
Top-Frequency Algorithm except for the T10H scenario. For example, for the T10L Scenario, the
Pocket Algorithm has 78.4% accuracy, the Threshold-line Approach has 77.6% accuracy, and the
Middle-line Approach has 77.5% accuracy. All are higher than the 75.8% accuracy of the original
Top-Frequency Algorithm. However, the improvement is not significant (less than 3%
for all scenarios).

Table 5.1: Performance of parametric detection with packet rate information
Data Set   Top-Frequency   Middle-line Approach   Threshold-line Approach   Pocket Algorithm
T10L       0.7578          0.7753                 0.7761                    0.7844
U10L       0.8472          0.8579                 0.8633                    0.8646
T100H      0.9693          0.9668                 0.9699                    0.9779
T10H       0.5510          0.5489                 0.5490                    0.5547

Table 5.2: Performance of parametric detection with bit rate information
Data Set   Top-Frequency   Middle-line Approach   Threshold-line Approach   Pocket Algorithm
T10L       0.7578          0.7374                 0.7290                    0.7650
U10L       0.8472          0.8317                 0.8358                    0.8522
T100H      0.9693          0.9628                 0.9715                    0.9894
T10H       0.5510          0.5456                 0.5486                    0.5550
Table 5.2 shows the evaluation results using the bit rate information. Again we see only limited
improvement of the three approaches over the original Top-Frequency Algorithm (less than 2% for
all scenarios), and in general the improvement using the bit rate information is a bit lower than the
improvement using the packet rate information. For example, for the T10L scenario, the Pocket
Algorithm has an accuracy of 76.5%, lower than the 78.4% accuracy using the packet rate information.
The Middle-line Approach and the Threshold-line Approach even have lower accuracies
than the original Top-Frequency Algorithm. Since the traffic spectrum is calculated from packet
arrival times, intuitively the packet rate is more relevant to the traffic spectrum than the
bit rate. This can explain the lower improvement using the bit rate information compared with the
improvement using the packet rate information.
As we see from the above two tables, parametric detection using either packet rate or bit rate
yields only a small improvement over the original non-parametric detection. We believe the reason
for the limited improvement is that we do not have a model of all important parameters that
affect the traffic spectrum. All three approaches consider only a single parameter, traffic volume.
There are many other important factors that have not been considered due to the constraints on our
system. For example, unobserved bottleneck traffic and unobserved non-bottleneck traffic are not
considered because they are not observed at the monitoring point.
Figures 5.1 and 5.2 show that the traffic spectrum can change fairly widely (by a difference of 2
on the log scale) even when the traffic volume remains the same. This confirms that traffic volume
information alone is not sufficient to model the process governing the traffic spectrum, and explains
the limited improvement from using only traffic volume information in parametric detection.
5.5 Summary
In this chapter, we investigated parametric detection, which considers new information in addition
to the spectrum information used for non-parametric detection. We first presented a list of
parameters that can shape the traffic spectrum, then discussed the constraints our system places on
parameter selection. In view of the constraints we picked traffic volume and proposed three
approaches to utilize it for parametric detection. We then evaluated their performance using real
Internet traces. The results show only limited improvement in detection accuracy from considering
traffic volume. This argues for a more complete and quantitative model of all important factors that
affect the traffic spectrum.
Chapter 6
RELATED WORK
There are numerous approaches to the analysis of network traffic. In the past, a number of systems
have been proposed to capture and analyze network traffic for various purposes. In the previous
chapters we presented a general framework to detect periodic patterns in Internet traffic using spectral
techniques, along with a range of detection algorithms that rely on rigorous statistical methods. In the
following sections we first review existing systems for Internet measurement. Then we survey
systems that analyze Internet traffic with spectral techniques and explain how our system differs from
them. Finally we review other techniques for detecting DoS attacks, bottlenecks, and related
problems.
6.1 Internet Measurement
In this section we briefly review both active and passive measurement systems developed for the
Internet. A measurement system is called active if it injects measurement traffic, such as probe
packets, into the network. A passive measurement system, on the other hand, does not inject any
measurement traffic into the network but rather observes the traffic already on it. In this
dissertation we developed a passive measurement system to capture network traffic and analyze it
in the spectral domain.
Many active measurement systems have been deployed in the past. A few of the larger ones
include NIMI [5], MINC [14], Surveyor [42], and AMP [53]. The
NIMI (National Internet Measurement Infrastructure) project aims to develop an architecture
for deploying and managing scalable active measurement systems. It uses tools such as ping, traceroute,
mtrace, and treno to perform the actual measurements. The MINC (Multicast-based Inference of Network-
internal Characteristics) system infers link loss rates, delay distributions, and topology
from the correlations among packets received after multicast probe messages are transmitted to
multiple destinations. Surveyor uses a set of synchronized hosts to measure one-way and round-trip
delay over various Internet paths. The AMP (NLANR Active Measurement Project) system employs
a set of monitoring stations to measure the performance of the vBNS backbone. In addition, companies
such as Keynote and Matrix conduct commercial network performance measurements [57, 1].
Examples of passive measurement systems include SNMP-based network traffic measurement
tools [16] and NetFlow [74]. SNMP is the most widely used network management protocol in
today's Internet. Agents update a management information base (MIB) within network routers,
and management stations retrieve MIB information from the routers using UDP. Most routers support
SNMP and implement public MIBs as well as vendor-specific private MIBs. Typical SNMP
information includes the number of packets and bytes received on an interface, the number of
transmission errors on a link, and the packet drop rate. Although SNMP provides some aggregate
statistics, it does not permit the detailed study of network dynamics that is the aim of the system
proposed in this dissertation. NetFlow is a monitoring system available on Cisco routers that collects
flow statistics observed at an interface. NetFlow provides more detail about flows than SNMP, but it
requires an external system to record the NetFlow data. Additionally, it incurs a significant
performance burden on routers, and to keep up with line rate it has to use sampling to reduce the
overhead.
Our system is a passive measurement system. Compared with other passive measurement systems,
we only need packet arrival times when gathering traffic. In addition, we apply spectral
techniques such as the Fourier transform to retrieve the spectral characteristics of periodic
network phenomena, and use rigorous statistical methods to quantitatively detect the existence of
such phenomena. To gather traffic traces, we use a variety of existing methods to passively
monitor traffic, including sniffing through a shared Ethernet hub, port mirroring, and in-line tapping,
with data capture by either non-commercial tools like tcpdump or specialized devices such as
Endace DAG network monitoring cards [75].
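The spectral analysis step can be sketched in a few lines: bin the packet arrival timestamps into a fine-grained count series, then take its discrete Fourier transform to obtain a traffic spectrum. The sketch below is illustrative rather than the exact pipeline used in this dissertation; the 1 ms bin width and the synthetic 100 packets/s stream are assumptions chosen for demonstration.

```python
import numpy as np

def traffic_spectrum(arrival_times, bin_width=0.001):
    """Bin packet arrival times (seconds) into a count series and
    return (frequencies in Hz, power spectrum) of that series."""
    arrival_times = np.asarray(arrival_times, dtype=float)
    n_bins = int(np.ceil(arrival_times.max() / bin_width))
    counts, _ = np.histogram(arrival_times, bins=n_bins,
                             range=(0.0, n_bins * bin_width))
    counts = counts - counts.mean()          # drop the DC component
    power = np.abs(np.fft.rfft(counts)) ** 2
    freqs = np.fft.rfftfreq(n_bins, d=bin_width)
    return freqs, power

# A synthetic stream with one packet every 10 ms (100 packets/s)
# should produce a strong spectral peak near 100 Hz.
arrivals = np.arange(0.0, 10.0, 0.01)
freqs, power = traffic_spectrum(arrivals)
band = (freqs > 1) & (freqs < 150)           # search below the harmonics
peak = freqs[band][np.argmax(power[band])]   # close to 100 Hz
```

In the actual system, detection features (such as the peak amplitude in a frequency window) are then computed from a spectrum of this kind.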
6.2 Spectral Analysis of Network Trafc
Recently, a number of researchers have applied spectral techniques to the analysis of network traffic
for various purposes. In this section, we review signal processing techniques used in analyzing
DoS attacks, network anomalies, and bottlenecks, as well as spectral techniques in the field of
network tomography.
6.2.1 Spectral Analysis of DoS Attacks, Network Anomalies, and Bottlenecks
Hussain et al. applied spectral techniques to packet arrival time series to distinguish single-source
from multi-source DDoS attacks [36]. They also proposed a spectral approach that uses the spectral
signature to identify repeated DoS attacks [37]. Both approaches, however, operate only after attack
packets have been detected and isolated from background traffic. More recently, they proposed using
wavelets to detect attacks from the aggregate traffic without flow separation [35]. Cheng et al. also
employed spectral analysis to separate normal TCP from DDoS traffic by exploiting the strong
periodicities around the RTTs of TCP flows [19]. Network spectroscopy [13] allows characterization
of link technology. Partridge et al. used Lomb periodograms to retrieve periodicities in wireless
communication, including CBR traffic and FTP traffic [65]. Partridge also warned of the dangers of
blindly applying signal processing techniques in networking without careful analysis and knowledge
of ground truth [63, 64].
Barford et al. used wavelets to analyze IP flow-level and SNMP information to detect DoS
attacks and other network anomalies [9]. Magnaghi et al. proposed a wavelet-based framework to
proactively detect network misconfigurations, which exploits TCP retransmission timeout events
during the opening phase of a TCP connection [49]. Kim et al. applied wavelet denoising to
improve the detection of shared congestion among different flows [45]. Their approach detects
shared congestion from the cross-correlation between the one-way delays experienced by packets
in different flows, and improves performance by using wavelet denoising to reduce the impact of
random queuing behavior while preserving the behavior of highly congested links. The technique
requires active probing to measure the one-way delay.
Unlike most of the above systems, we target the detection of periodic patterns inside Internet
traffic using spectral techniques at the aggregate level, without flow separation. We focus mostly
on the periodicities caused by traffic congestion along bottlenecks, as well as by DoS attack
streams and TCP windowing behavior. We examine their signatures in the spectral domain and
investigate how they are shaped by different factors, including bottleneck bandwidth, packet size,
and cross traffic. In addition, we rely on rigorous statistical methods to detect their existence.
We have proposed four non-parametric detection algorithms based on a classic detection method,
Maximum Likelihood Detection [27]. We have also studied a parametric detection algorithm that
exploits the variation of the traffic spectrum with traffic volume for better detection accuracy.
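To illustrate the maximum-likelihood flavor of these detectors (a simplified sketch, not any of the four algorithms themselves), one can fit a Gaussian to a scalar spectral feature under each hypothesis during training, then declare the pattern present when that class yields the higher likelihood. The training feature values below are invented purely for demonstration.

```python
import math

def fit_gaussian(samples):
    """Maximum-likelihood estimates of mean and variance for a 1-D feature."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, max(var, 1e-12)  # guard against zero variance

def log_likelihood(x, mean, var):
    """Log density of x under a Gaussian with the given parameters."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def pattern_present(feature, params_present, params_absent):
    """Declare the periodic pattern present iff its class likelihood is higher."""
    return log_likelihood(feature, *params_present) >= log_likelihood(feature, *params_absent)

# Hypothetical training features (e.g., peak spectral amplitude) for
# traces with and without the pattern -- values are illustrative only.
present = fit_gaussian([9.0, 10.2, 11.1, 9.7])
absent = fit_gaussian([2.1, 3.0, 2.6, 2.4])
decision = pattern_present(10.5, present, absent)   # True for this feature
```

With equal priors this likelihood comparison is exactly the Bayes maximum-likelihood decision rule for a one-dimensional feature.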
6.2.2 Network Tomography
Spectral techniques have also been used for network tomography [22]. Network tomography
is concerned with identifying network bandwidth, performance, and topology by taking measurements,
either actively from network nodes [28, 21, 23] or passively from existing traffic [54, 80].
Active probing techniques are designed to measure a specific network property. Multicast is well
suited for active probing and has been commonly used in the past [14, 28]. Each multicast packet
sent from the source is replicated across the network wherever there is a new branch in the multicast
tree. Consequently, when a packet gets dropped or queued on a certain link, all receivers downstream
of that link observe the effect of the loss or queuing behavior. Passive techniques estimate packet
loss by observing the evolution of application traffic, analyzing traffic measurements to find lossy
links that affect application performance. The input signals are typically packet delays, round-trip
times, and losses, from which tomography infers network characteristics using correlation techniques
such as maximum likelihood estimation and Bayesian inference.
Unlike our techniques, network tomography may require multiple observation points and the
separation of flows from the aggregate. Some network tomography techniques require the insertion
of active probing packets into the network. Moreover, the scope of network tomography is mainly
to characterize the network, whereas we try to detect periodic patterns inside the traffic.
6.3 Non-Spectral Analysis of Network Trafc
Besides spectral techniques, other techniques have long been used to analyze Internet traffic
for various purposes, including the detection of DoS attacks and bottlenecks. In this section,
we briefly review existing non-spectral techniques for detecting DoS attacks, bottlenecks, and
related problems.
6.3.1 Detection of DoS Attacks
As a rapidly growing problem, DoS attacks have attracted great interest from the network research
community, and numerous systems have been designed to tackle them. In [59] Mirkovic et
al. presented a systematic taxonomy of DDoS attacks and DDoS defense mechanisms. In general,
there are two strategies for detecting DoS attacks. The first, the pattern-based strategy,
stores signatures of known attacks, such as a particular combination of port number and byte string
in the packet payload, and monitors each flow for the presence of these signatures. Bro [67] and
Snort [71, 10] are two examples of the pattern-based strategy. While this strategy has a low
false positive rate, it may miss new attacks or even slight variations of old attacks.
The second strategy uses anomaly detection. It first models normal system behavior, and then
periodically compares the current system state with the model to detect anomalies.
The approaches presented in [79, 51, 32, 76, 60, 61, 58] are examples of anomaly detection.
Its advantage over pattern detection is that previously unknown attacks can be discovered.
The drawback is that anomaly detectors must trade their ability to detect attacks against their
tendency to misidentify normal behavior as an attack.
Our system follows the second strategy. A salient feature of our system is that we do not examine
any packet content, but rather rely solely on the periodicities caused by DoS attacks. Compared
with normal network traffic, DoS attacks introduce strong high-frequency components into the
traffic spectrum. We use this to detect DoS attacks from the aggregate traffic.
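As a hedged illustration of this idea (a toy feature, not the detector developed in this dissertation), one can compare the fraction of spectral energy above a cutoff frequency: smooth low-frequency traffic keeps that fraction near zero, while a rapid periodic attack stream drives it up. The bin width, cutoff, and synthetic series below are assumptions.

```python
import numpy as np

def high_freq_energy_ratio(counts, bin_width, cutoff_hz):
    """Fraction of non-DC spectral energy above cutoff_hz in a binned
    packet-count series -- an illustrative detection feature only."""
    counts = np.asarray(counts, dtype=float)
    counts = counts - counts.mean()          # remove the DC component
    power = np.abs(np.fft.rfft(counts)) ** 2
    freqs = np.fft.rfftfreq(len(counts), d=bin_width)
    total = power[1:].sum()
    return power[freqs > cutoff_hz].sum() / total if total > 0 else 0.0

t = np.arange(0.0, 2.0, 0.001)               # 1 kHz sampling for 2 s
smooth = 5 + np.sin(2 * np.pi * 5 * t)       # slow 5 Hz variation
flood = np.zeros(t.size)
flood[::4] = 1                               # a burst every 4 ms (250 Hz)

r_low = high_freq_energy_ratio(smooth, 0.001, 100.0)   # near 0
r_high = high_freq_energy_ratio(flood, 0.001, 100.0)   # near 1
```

A real detector would learn the decision threshold from training traces, as the algorithms in Chapter 4 do.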
6.3.2 Detection of Bottlenecks and Estimation of Link Capacity
For bottleneck detection, Katabi et al. proposed using statistics on packet interarrival times to
infer path characteristics such as bottleneck capacity and to detect bottleneck sharing among flows
based on entropy [44]. Their approach is passive, but it requires flow separation and combination
to reveal the periodic packet transmissions along the bottleneck. Hu et al. [33] presented a tool to
detect the location of a bottleneck in a path using active probing. This tool, however, requires active
probing of the network and is costly for routine bottleneck monitoring.
A number of techniques have been used for estimating link capacity and available bandwidth.
Specifically, Pathchar [38], Clink [26], Pchar [50], and the tailgating technique of [48] measure
per-hop capacity, while Bprobe [15], Nettimer [47], Pathrate [25], and the PBM methodology [66]
measure end-to-end capacity. CapProbe [43] also estimates end-to-end capacity, but with much
lower computation cost and faster convergence, by using the packet pair with the minimum delay sum.
Cprobe [15] and Pipechar [41] estimate the available bandwidth along a path based on the
dispersion of long packet trains at the receiver. However, [25] shows that the dispersion of long
packet trains does not measure the available bandwidth of a path; instead, it measures a different
throughput metric referred to as the Asymptotic Dispersion Rate (ADR). Pathload [39, 40]
measures the available bandwidth by varying the probing stream rate and observing the change in
the one-way delay of probing packets. Hu et al. [34] proposed two techniques, IGI and PTR,
to measure the available bandwidth using packet pairs. Another packet pair technique, Spruce,
was proposed in [73]. Other techniques for estimating available bandwidth include TOPP [55],
Delphi [69], Pathchirp [70], and Smart [56].
There are several key differences between the above techniques and ours. First, many of
these techniques rely on active measurements, whereas ours is passive. Second, many of them
need access to both ends of the path, whereas we assume a single observation point. Finally, the
above techniques that are passive all need to isolate specific flows, whereas our techniques
work on large traffic aggregates without flow separation and grouping.
6.4 Summary
While prior research has applied signal processing to detect various network phenomena, none has
been designed systematically to detect periodic patterns in aggregate traffic without flow
separation and grouping. In this dissertation, we transformed the packet stream into a spectral-domain
representation to reveal the existence of various periodic patterns caused by traffic congestion
along bottlenecks, DoS attacks, TCP windowing behavior, and other processes. Compared to past
research, our approach is passive (i.e., it does not require probing packets), requires only packet
arrival time information, operates on aggregate traffic so that individual flows do not have to be
extracted, transforms the data to a suitable spectral-domain representation rather than operating
on time-domain information, and uses rigorous statistical methods rather than relying on
qualitative visual evidence.
Chapter 7
CONCLUSIONS AND FUTURE WORK
We close this dissertation with final conclusions and a number of research problems that may be
addressed in future work.
7.1 Conclusions
Given the size and complexity of today's Internet, tools that help understand and diagnose
network problems are very important. Spectral techniques have great potential for creating
powerful tools that extract hidden patterns from network traffic to understand phenomena ranging
from network anomalies to application and protocol behavior, and detection theory provides the
foundation to turn suggestive visual representations into quantitative algorithms.
In this dissertation, we presented a methodology for applying these techniques to the detection of
periodic patterns in Internet traffic, including periodicities caused by traffic congestion along
bottleneck links, by Denial-of-Service attacks, and by TCP windowing behavior. Practical
applications of our methodology include troubleshooting, capacity planning, traffic estimation,
DDoS detection, and application monitoring. While we cannot pinpoint the exact location of the
source causing a periodic pattern, such as the location of a bottleneck, our techniques can
determine the existence of such a pattern and infer related information such as the bandwidth of
the bottleneck and the duration of the pattern.
In addition to visually demonstrating the spectral signatures imposed by bottleneck links and
other network processes, and the effect of cross traffic on those signatures, we proposed four
non-parametric detection algorithms based on the Bayes Maximum Likelihood Classifier to
automatically detect periodic patterns embedded in aggregate traffic. Specifically, we targeted the
detection of the pattern associated with bottleneck links and evaluated the performance of the four
algorithms on real-world Internet traffic.
Our results show that we can detect the periodic pattern caused by a remote bottleneck link
in aggregate traffic without flow separation, even when the observed bottleneck traffic is only
10% or less of the total aggregate. Our techniques are completely passive, suitable for routine
network monitoring, and can detect bottlenecks remotely with no need for direct observation. We
investigated the effects of several factors on detection performance, including the signal-to-noise
ratio, the selection of the detection window, and the variation across different training data.
The results show that among the four detection algorithms, the Top-Frequency Algorithm provides a
good trade-off between detection accuracy and algorithm complexity.
We also extended the basic detection algorithm in two directions. In the first, we proposed
taking advantage of the harmonics that commonly accompany periodic patterns to improve
detection performance. In the second, we developed a parametric model to study influential
information beyond the spectrum in the detection process. Specifically, we studied the impact of
traffic volume and proposed three approaches that use it together with the spectrum information
for better performance. The evaluation of these extensions shows that considering harmonics
yields significant improvement for traces with statistics similar to the training data, while
considering the aggregate traffic volume yields only marginal improvement.
In summary, the results in this thesis show that periodic patterns have unique spectral
characteristics, and that our algorithms, based on rigorous statistical methods, can detect periodic
patterns in Internet traffic despite interference from noisy cross traffic.
Our current work has some limitations. Above all, we do not have a quantitative model for
the evolution of the spectrum in the presence of competing cross traffic, even though we have
presented formulas 3.2 and 3.3 to calculate the base frequency of the periodic patterns and have
visually demonstrated the impact of cross traffic in Section 3.3. Without a quantitative model we
cannot accurately predict the traffic spectrum in the presence of cross traffic, and hence cannot
design a versatile parametric algorithm that operates under all scenarios with limited training.
We believe it is hard to model every aspect of the impact of cross traffic, just as it is hard to
model general Internet traffic, but it is still possible to model a limited number of parameters
to aid detection.
In addition, we have only carried out experiments in controlled lab environments or limited
wide-area network scenarios. For example, for the wide-area network experiments we have only one
observation point, which monitors USC Internet-2 traffic. To verify that our findings hold for
general networks we need to carry out experiments in more diverse environments, including
different monitoring points and different bottleneck locations. Addressing these limitations is
left for future work.
7.2 Future Directions
In the previous section we discussed a few limitations of our current work. In the future we
plan to address these limitations and strengthen our methodology as follows.
7.2.1 Quantitative Model of Periodic Patterns and New Parametric Detection
In our current work we have identified several periodic patterns in Internet traffic and examined
how they are generated. We have also proposed two formulas, 3.2 and 3.3, to calculate their base
frequencies. However, regarding their evolution in the presence of cross traffic, we have only a
visual demonstration of the changes it causes; we do not have a quantitative model for the
evolution of the periodic patterns under competing cross traffic.
A quantitative model is important because it would let us design more intelligent parametric
detection algorithms that need less training and achieve higher detection accuracy. A complete
model of the evolution of the periodic patterns is hard to obtain, but we believe it is feasible
to model a limited number of parameters to aid detection. This is our top priority for
future work.
7.2.2 Temporal and Spatial Stability
Currently we have only a limited set of experiments for evaluating our detection algorithms.
To gain a more thorough understanding of the temporal and spatial stability of the algorithms, we
should apply the detection methods in more diverse environments, including different monitoring
points, different bottleneck locations, different types of traffic composition (e.g., single flow
versus multiple flows), and different types of cross traffic. In this way we can examine whether
our current findings still hold, or whether new factors have to be included to improve our
detection algorithms. To do this we need access to more remote systems.
7.2.3 Alternative Detection Features
The success of the algorithms in Chapters 4 and 5 depends largely on the feature selected for
detection. We have used a variety of detection features, such as the amplitude at a single
frequency or the peak amplitude in a frequency window. We have also investigated the effect of
varying parameters, such as the frequency window size and location, on the performance of the
algorithms. Results show that the peak amplitude in a properly sized frequency window centered
around the base frequency captures the most significant difference introduced by the periodic
pattern. However, we need to confirm this as we gather more traces in diverse environments, and
should explore alternative detection features if our current findings do not hold well for the
new traces.
7.2.4 Application to Other Periodic Patterns
Currently we have applied our techniques to detect periodic patterns caused by bottleneck links,
by DoS attacks, and by TCP windowing behavior. We believe our techniques are applicable to many
other regular patterns, such as the periodicities embedded in voice and video streams, in
peer-to-peer network traffic, and in traffic generated by softbots. We plan to analyze Internet
traffic to identify other interesting periodic patterns. This will expand the applicability of
our methodology and help gain insight into other network phenomena.
7.3 Final Notes
In this thesis we show that we can reveal periodic patterns in Internet traffic by extracting their
spectral characteristics from the traffic spectrum. We design a range of automatic detection
algorithms based on rigorous statistical methods, and evaluation results using real-world Internet
traces show that they can provide very high accuracy in detecting periodic patterns even when the
patterns are mixed with noisy background traffic. Although we focus on the pattern caused by
traffic congestion along bottleneck links in the design and evaluation of the automatic
algorithms, they can easily be adapted to detect other periodic patterns in Internet traffic. A
system based on these algorithms has wide applications, including the detection of traffic
congestion along bottleneck links, DoS attacks, and under-performing TCP flows, and it helps us
better understand Internet traffic dynamics. All of these can contribute to the continued success
of the Internet.
Reference List
[1] The interaction of web content and internet backbone performance. http://www.keynote.com/services/html/wp-compdata.html, Keynote White Paper.
[2] Mstream: Distributed denial of service tool. May 2000. CERT incident Note IN-2000-05.
[3] IEEE 802.3-2002 standard. March 2002.
[4] OECD Broadband Statistics, December 2005. April 2006.
[5] Andrew Adams, Tian Bu, Timur Friedman, Joseph Horowitz, Don Towsley, Ramon Caceres, Nick Duffield, Francesco Lo Presti, Sue B. Moon, and Vern Paxson. The use of end-to-end multicast measurements for characterizing internal network behavior. IEEE Communications Magazine, 38(5):152–159, May 2000.
[6] Paul S. Addison. The Illustrated Wavelet Transform Handbook. Institute of Physics, 2002.
[7] M. Allman, S. Dawkins, D. Glover, J. Griner, D. Tran, T. Henderson, J. Heidemann, J. Touch,
H. Kruse, S. Ostermann, K. Scott, and J. Semke. Ongoing TCP research related to satellites.
RFC 2760, Internet Request For Comments, February 2000.
[8] Paul Barford and Mark Crovella. Generating Representative Web Workloads for Network
and Server Performance Evaluation. In Proceedings of the ACM SIGMETRICS'98, Madison,
Wisconsin, USA, June 1998.
[9] Paul Barford, Jeffery Kline, David Plonka, and Amos Ron. A Signal Analysis of Network Traffic Anomalies. In Proceedings of the ACM SIGCOMM Internet Measurement Workshop, Marseilles, France, November 2002.
[10] Jay Beale. Snort 2.1 Intrusion Detection. Syngress, second edition, May 2004.
[11] George Box, Gwilym Jenkins, and Gregory Reinsel. Time series analysis: forecasting and
control. Prentice-Hall, 1994.
[12] Ronald Bracewell. The Fourier Transform and Its Applications. McGraw-Hill, 1986.
[13] Andre Broido, Evi Nemeth, and kc Claffy. Spectroscopy of DNS Update Traffic. In Proceedings of the ACM SIGMETRICS, pages 320–321, San Diego, CA, June 2003.
[14] R. Caceres, Nick Duffield, J. Horowitz, and Don Towsley. Multicast-based inference of network-internal loss characteristics. IEEE Transactions on Information Theory, 45(7):2462–2480, 1999.
[15] Robert Carter and Mark Crovella. Measuring bottleneck link speed in packet-switched net-
works. Technical Report 1996-006, 1996.
[16] J Case, M Fedor, M Schoffstall, and J Davin. A Simple Network Management Protocol
(SNMP). Request For Comments (RFC) 1157, May 1990. http://www.ietf.org/rfc/rfc1157.txt.
[17] I.M. Chakravarti, R.G. Laha, and J. Roy. Handbook of Methods of Applied Statistics, Volume
I. John Wiley and Sons, 1967.
[18] Samprit Chatterjee and A. S. Hadi. Influential Observations, High Leverage Points, and Outliers in Linear Regression. Statistical Science, 1(3):379–416, 1986.
[19] Chen-Mou Cheng, H.T. Kung, and Koan-Sin Tan. Use of spectral analysis in defense against DoS attacks. In Proceedings of the IEEE GLOBECOM, pages 2143–2148, Taipei, China, November 2002.
[20] K. Claffy, G. Miller, and K. Thompson. The nature of the beast: Recent traffic measurements from an internet backbone. In Proceedings of International Networking Conference (INET)'98, Geneva, Switzerland, July 1998.
[21] M. Coates, R. Castro, and R. Nowak. Maximum likelihood network topology identification from edge-based unicast measurements. In Proceedings of the ACM SIGMETRICS, pages 11–20, Marina del Rey, CA, USA, June 2002.
[22] M. Coates, A. O. Hero, R. Nowak, and B. Yu. Internet tomography. IEEE Signal Processing Magazine, 19(3):47–65, May 2002.
[23] Mark Coates, Michael Rabbat, and Robert Nowak. Merging logical topologies using end-to-end measurements. In Proceedings of the ACM Internet Measurement Conference, pages 192–203, Miami Beach, FL, USA, October 2003.
[24] James W. Cooley and John W. Tukey. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19(90):297–301, 1965.
[25] C. Dovrolis, P. Ramanathan, and D. Moore. What do packet dispersion techniques measure? In Proceedings of the IEEE Infocom 2001, pages 905–914, Anchorage, Alaska, USA, April 2001.
[26] Allen B. Downey. Using pathchar to estimate internet link characteristics. In Proceedings of the ACM SIGCOMM Conference, pages 241–250, Cambridge, MA, USA, August 1999.
[27] Richard Duda, Peter Hart, and David Stork. Pattern Classification. Wiley Interscience, 2000.
[28] N.G. Duffield, J. Horowitz, F. Lo Presti, and D. Towsley. Multicast topology inference from measured end-to-end loss. IEEE Transactions on Information Theory, 48(1):26–45, January 2002.
[29] Cristian Estan, Stefan Savage, and George Varghese. Automatically inferring patterns of resource consumption in network traffic. In Proceedings of the ACM SIGCOMM Conference, pages 137–149, Karlsruhe, Germany, August 2003.
[30] Cristian Estan and George Varghese. New directions in traffic measurement and accounting. In Proceedings of the ACM SIGCOMM Conference, pages 323–336, Pittsburgh, Pennsylvania, USA, August 2002.
[31] Stephen I. Gallant. Neural Network Learning and Expert Systems. The MIT Press, 1993.
[32] T. M. Gil and M. Poletto. Multops: a data-structure for bandwidth attack detection. In Pro-
ceedings of 10th Usenix Security Symposium, Washington, D.C., USA, August 2001.
[33] Ningning Hu, Li Erran Li, Zhuoqing Morley Mao, Peter Steenkiste, and Jia Wang. Locating Internet Bottlenecks: Algorithms, Measurements and Implications. In Proceedings of the ACM SIGCOMM 2004, pages 41–54, Oregon, USA, August 2004.
[34] Ningning Hu and Peter Steenkiste. Evaluation and characterization of available bandwidth probing techniques. IEEE JSAC Special Issue in Internet and WWW Measurement, Mapping, and Modeling, 21(6):879–894, August 2003.
[35] Aleya Hussain. Measurement and Spectral Analysis of Denial of Service Attacks. Ph.D. thesis, University of Southern California, 2005.
[36] Aleya Hussain, John Heidemann, and Christos Papadopoulos. A Framework for Classifying Denial of Service Attacks. In Proceedings of the ACM SIGCOMM'2003, pages 99–110, Karlsruhe, Germany, August 2003.
[37] Aleya Hussain, John Heidemann, and Christos Papadopoulos. Identification of Repeated Denial of Service Attacks. In Proceedings of the IEEE Infocom 2006, Barcelona, Spain, April 2006.
[38] Van Jacobson. Pathchar: A Tool to Infer Characteristics of Internet Paths.
ftp://ftp.ee.lbl.gov/pathchar/.
[39] M. Jain and C. Dovrolis. Pathload: A measurement tool for end-to-end available bandwidth.
In Proceedings of Passive and Active Measurements (PAM) Workshop 2002, Fort Collins, CO,
USA, March 2002.
[40] Manish Jain and Constantinos Dovrolis. End-to-end available bandwidth: measurement methodology, dynamics, and relation with TCP throughput. ACM/IEEE Transactions on Networking, 11(4):537–549, August 2003.
[41] G. Jin, G. Yang, B. Crowley, and D. Agarwal. Network characterization service (NCS). In
Proceedings of the 10th IEEE Symposium on High Performance Distributed Computing, Ed-
inburgh, Scotland, August 2001.
[42] Sunil Kalidindi and Matthew Zekauskas. Surveyor: An Infrastructure for Internet Performance Measurements. In Proceedings of International Networking Conference (INET)'99, Stockholm, Sweden, June 1999.
[43] Rohit Kapoor, Ling-Jyh Chen, Li Lao, Mario Gerla, and M. Y. Sanadidi. CapProbe: A Simple and Accurate Capacity Estimation Technique. In Proceedings of the ACM SIGCOMM'2004, pages 67–78, Oregon, USA, August 2004.
[44] Dina Katabi and Charles Blake. Inferring congestion sharing and path characteristics from
packet interarrival times. Technical Report MIT-LCS-TR-828, Massachusetts Institute of
Technology, Laboratory for Computer Science, 2001.
[45] Min Sik Kim, Taekhyun Kim, YongJune Shin, Simon S. Lam, and Edward J. Powers. A Wavelet-Based Approach to Detect Shared Congestion. In Proceedings of the ACM SIGCOMM 2004, pages 293–306, Portland, Oregon, USA, August 2004.
[46] Aleksandar Kuzmanović and Edward W. Knightly. Low-rate TCP-targeted denial of service attacks (the shrew vs. the mice and elephants). In Proceedings of the ACM SIGCOMM Conference, Karlsruhe, Germany, August 2003.
[47] Kevin Lai and Mary Baker. Measuring bandwidth. In Proceedings of the IEEE Infocom 1999, pages 235–245, April 1999.
[48] Kevin Lai and Mary Baker. Measuring link bandwidths using a deterministic model of packet delay. In Proceedings of the ACM SIGCOMM'2000, pages 283–294, Stockholm, Sweden, September 2000.
[49] Antonio Magnaghi, Takeo Hamada, and Tsuneo Katsuyama. A Wavelet-Based Framework for Proactive Detection of Network Misconfigurations. In Proceedings of the ACM Workshop on Network Troubleshooting: Research, Theory and Operations Practice Meet Malfunctioning Reality, pages 253–258, Portland, Oregon, USA, August 2004.
[50] Bruce A. Mah. Pchar: A tool for measuring internet path characteristics. http://www.employees.org/bmah/Software/pchar/.
[51] Ratul Mahajan, Steven M. Bellovin, Sally Floyd, John Ioannidis, Vern Paxson, and Scott Shenker. Controlling high bandwidth aggregates in the network. In ACM Computer Communications Review, volume 32, pages 62–73, July 2002.
[52] Stephan Mallat. A Wavelet Tour of Signal Processing. Academic Press, 1998.
[53] Tony McGregor and Hans-Werner Braun. Balancing cost and utility in active monitoring: The AMP example. In Proceedings of International Networking Conference (INET) 2000, Yokohama, Japan, June 2000.
[54] A. Medina, N. Taft, K. Salamatian, S. Bhattacharyya, and C. Diot. Traffic matrix estimation:
Existing techniques and new directions. In Proceedings of the ACM SIGCOMM Conference,
pages 161–174, Pittsburgh, Pennsylvania, USA, October 2002.
[55] B. Melander, M. Bjorkman, and P. Gunningberg. A new end-to-end probing and analysis
method for estimating bandwidth bottlenecks. In Proceedings of IEEE Globecom 2000,
pages 415–420, San Francisco, California, USA, November 2000.
[56] Liu Min, Shi Jinglin, Li Zhongcheng, Kan Zhigang, and Ma Jian. A new end-to-end mea-
surement method for estimating available bandwidth. In Proceedings of the Eighth IEEE
International Symposium on Computers and Communications, pages 1393–1400,
Kemer-Antalya, Turkey, June 2003.
[57] MIQ rating methodology. http://rating.miq.net/method.html.
[58] Jelena Mirkovic, Greg Prier, and Peter Reiher. Attacking DDoS at the source. In Proceedings
of the IEEE International Conference on Network Protocols, pages 312–321, Paris, France,
November 2002.
[59] Jelena Mirkovic and Peter Reiher. A taxonomy of DDoS attack and DDoS defense mechanisms.
ACM SIGCOMM Computer Communication Review, 34(2):39–53, 2004.
[60] David Moore, Colleen Shannon, Douglas J. Brown, Geoffrey M. Voelker, and Stefan Savage.
Inferring Internet denial-of-service activity. ACM Transactions on Computer Systems,
24(2):115–139, May 2006.
[61] David Moore, Colleen Shannon, and Jeffery Brown. Code-Red: a case study on the spread and
victims of an internet worm. In IMC '02: Proceedings of the 2nd ACM SIGCOMM Conference
on Internet Measurement, pages 273–284, Marseille, France, November 2002.
[62] N. R. Lomb. Least-squares frequency analysis of unequally spaced data. Astrophysics and
Space Science, 39:447–462, 1976.
[63] Craig Partridge. Frequency analysis of protocols, February 2004. Talk given at U. Minnesota.
[64] Craig Partridge. Internet signal processing: Next steps, November 2004. Talk given at the
Workshop on Internet Signal Processing, San Diego, CA, USA.
[65] Craig Partridge, David Cousins, Alden Jackson, Rajesh Krishnan, Tushar Saxena, and W. Tim-
othy Strayer. Using Signal Processing to Analyze Wireless Data Traffic. In Proceedings of the
ACM Workshop on Wireless Security, pages 67–76, Atlanta, GA, September 2002.
[66] Vern Paxson. End-to-end internet packet dynamics. In Proceedings of the ACM SIGCOMM
Conference, pages 139–152, Cannes, France, October 1997.
[67] Vern Paxson. Bro: A system for detecting network intruders in real-time. Computer Networks:
The International Journal of Computer and Telecommunications Networking, 31(23–24):2435–2463,
December 1999.
[68] Vern Paxson and Sally Floyd. Why we don't know how to simulate the Internet. In Proceedings
of the 29th Winter Simulation Conference, pages 1037–1044, Atlanta, Georgia, December 1997.
[69] V. Ribeiro, M. Coates, R. Riedi, S. Sarvotham, B. Hendricks, and R. Baraniuk. Multifractal
cross traffic estimation. In Proceedings of the ITC Specialist Seminar on IP Traffic Measurement,
September 2000.
[70] Vinay Ribeiro, Rudolf Riedi, Richard Baraniuk, Jiri Navratil, and Les Cottrell. PathChirp:
Efficient available bandwidth estimation for network paths. In Proceedings of the Passive and
Active Measurement Workshop, La Jolla, CA, USA, April 2003.
[71] Martin Roesch. Snort - lightweight intrusion detection for networks. In Proceedings of the
13th USENIX Conference on System Administration, pages 229–238, Seattle, Washington, USA,
November 1999.
[72] Rishi Sinha, Christos Papadopoulos, and John Heidemann. Fingerprinting internet paths us-
ing packet pair dispersion. Technical Report USC/CS-TR-2006-876, University of Southern
California Computer Science Department, 2006.
[73] Jacob Strauss, Dina Katabi, and Frans Kaashoek. A measurement study of available band-
width estimation tools. In Proceedings of the 3rd ACM SIGCOMM Conference on Internet
Measurement, pages 39–44, Miami Beach, FL, USA, 2003. ACM Press.
[74] Cisco Systems. NetFlow services and applications.
http://www.cisco.com/warp/public/732/netflow, 2005.
[75] Endace Measurement Systems. Endace DAG network capture cards. http://www.endace.com/,
2005.
[76] BBN Technologies. Intrusion tolerance by unpredictability and adaptation.
[77] Sergios Theodoridis and Konstantinos Koutroumbas. Pattern Recognition, Second Edition.
Academic Press, 2003.
[78] Ajay Tirumala, Feng Qin, Jon Dugan, Jim Ferguson, and Kevin Gibbs. Iperf, 2003.
http://dast.nlanr.net/Projects/Iperf/.
[79] Jianxin Yan, Stephen Early, and Ross Anderson. The XenoService - a distributed defeat for
distributed denial of service. In Information Survivability Workshop 2000, Boston, Massachusetts,
USA, October 2000.
[80] Yin Zhang, Matthew Roughan, Carsten Lund, and David Donoho. An information-theoretic
approach to traffic matrix estimation. In Proceedings of the ACM SIGCOMM Conference,
pages 301–312, Karlsruhe, Germany, October 2003.
Abstract
Internet traffic contains a rich set of periodic patterns. Examples include regular packet transmissions along bottleneck links, periodic routing-information exchanges, and periodicities inside Denial-of-Service attack streams. Analyzing such periodic patterns has wide applications, including a better understanding of network traffic dynamics, diagnosis of network anomalies, and detection of Denial-of-Service attacks. However, current understanding of periodic behavior in aggregate traffic is quite limited. Many previous approaches analyze traffic on a per-flow basis and are ill-suited to high-speed aggregate traffic. In addition, a number of approaches suggest that they may reveal periodic patterns, but fall short of proposing automatic detection algorithms and quantitatively evaluating their performance.
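The spectral approach named in the title can be illustrated with a small sketch, not the dissertation's actual algorithm: bin packet arrivals into a counts-per-interval time series, compute its power spectrum, and look for a dominant peak. The bin width, sampling rate, and signal parameters below are illustrative assumptions.

```python
import numpy as np

# Hypothetical setup: packet counts binned at 1 ms intervals for 1 second.
fs = 1000.0                       # sampling rate in Hz (assumed bin width: 1 ms)
t = np.arange(0, 1.0, 1.0 / fs)
rng = np.random.default_rng(0)

# Background traffic modeled as Poisson noise, plus a 50 Hz periodic
# component standing in for, e.g., a pulsing attack stream.
counts = rng.poisson(5, t.size).astype(float) + 4.0 * np.sin(2 * np.pi * 50 * t)

# Power spectrum of the mean-removed series; rfft keeps positive frequencies.
power = np.abs(np.fft.rfft(counts - counts.mean())) ** 2
freqs = np.fft.rfftfreq(t.size, d=1.0 / fs)

# Skip the DC bin and report the frequency with the most power.
dominant = freqs[np.argmax(power[1:]) + 1]
print(dominant)  # the 50 Hz component should dominate the noise floor
```

In practice an automatic detector would also need a decision rule, e.g. comparing the peak against an estimated noise floor, which is where the statistical methods of the title come in.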
Asset Metadata
Creator: He, Xinming (author)
Core Title: Detecting periodic patterns in internet traffic with spectral and statistical methods
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science (Computer Networks)
Publication Date: 10/13/2006
Defense Date: 09/01/2006
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tags: anomaly detection, bottleneck detection, network traffic analysis, periodic patterns, spectral analysis
Language: English
Advisor: Papadopoulos, Christos (committee chair); Heidemann, John (committee member); Mitra, Urbashi (committee member)
Creator Email: xhe@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m95
Document Type: Dissertation
Rights: He, Xinming
Repository: Libraries, University of Southern California, Los Angeles, California (cisadmin@lib.usc.edu)