NETWORK RECONNAISSANCE USING BLIND TECHNIQUES

by

Genevieve Elizabeth Bartlett

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

August 2010

Copyright 2010 Genevieve Elizabeth Bartlett

Acknowledgments

I am indebted to my advisors, Prof. John Heidemann and Prof. Christos Papadopoulos. With much patience on their part, I have learned to structure my approach to problem solving and research. I am greatly appreciative of the time and effort they have put into me. I would also like to thank Prof. Golubchik, Prof. Antonio Ortega and Prof. Ramesh Govindan for being on my committees. I have benefited repeatedly from my interaction with Prof. Antonio Ortega and Prof. Urbashi Mitra during group meetings and discussions.

I would like to thank Jim Pepin, Walter Prue, and Sanford George of Los Nettos, and Maureen Dougherty, Mark Baklarz and Brian Yamaguchi of USC for supporting the collection of our datasets. I would also like to thank Fabio Silva and Yuri Pradkin for their work on our data collection system.

Lastly, I would like to thank my friends for their support, very valuable input and help on my work. In particular, I'd like to thank Nupur Kothari, Vivek Singh, Jonathan Kelly, Sean McPherson, Gautam Thatte, Xue Cai, Alefiya Hussain, Affan Syed, Rishi Sinha, Xinming He and all of ISI's EBEER crew—especially Yorgos Economou, Mike Ryan, and John Hickey.

Table of Contents

Acknowledgments
List of Tables
List of Figures
Abstract

Chapter 1 Introduction
  1.1 Reconnaissance Solution Space
  1.2 Thesis Statement
  1.3 Summary of Work
  1.4 Contributions and Dissertation Outline

Chapter 2 Service Discovery: A Study of Passive vs. Active Measurements
  2.1 Introduction
  2.2 Overview of Service Discovery Techniques
    2.2.1 Active Probing
    2.2.2 Passive Monitoring
    2.2.3 Discussion
  2.3 Methodology and Datasets
    2.3.1 Methodology for Active Probing
    2.3.2 Methodology for Passive Monitoring
    2.3.3 Datasets
  2.4 Evaluation of Service Discovery
    2.4.1 Completeness
    2.4.2 Server Discovery Over Time
    2.4.3 Effect of External Scans on Passive Monitoring
    2.4.4 How target type affects detection
    2.4.5 Discovery of UDP Services
  2.5 Sensitivity
    2.5.1 Time and Frequency of Active Probing
    2.5.2 Partial Perspectives in Passive Monitoring
    2.5.3 Passive Monitoring with Sampled Observations
  2.6 Related Work
  2.7 Conclusions

Chapter 3 Identifying Peer-to-Peer: An Example of a Specific Target
  3.1 Introduction
  3.2 Related Work
  3.3 Inherent Behaviors in P2P
    3.3.1 Peer Coordination and Failed Connections
    3.3.2 Bidirectional Connections
    3.3.3 User Accessibility
  3.4 Implementation
    3.4.1 Translating Behaviors to Metrics
    3.4.2 Metrics to Tests
    3.4.3 System Operation
    Algorithm to Process TCP connections
  3.5 Evaluation
    3.5.1 Detection Accuracy for BitTorrent
    3.5.2 Understanding false positive rate
    3.5.3 Effectiveness for Gnutella
    3.5.4 Estimating previously undetected P2P hosts
  3.6 Evaluation of Second Dataset
    3.6.1 Detection accuracy for BitTorrent in 2005 data
  3.7 Result Sensitivity
    3.7.1 Detection speed
    3.7.2 Sensitivity to Threshold Selection
    3.7.3 Sensitivity to Window Size
    3.7.4 Detection Sensitivity to Minimum Connections Threshold
  3.8 Using Peer-to-Peer Identification to Measure P2P Usage
  3.9 A Second Approach to Identify P2P
  3.10 Data Collection and Evaluation Methodology
  3.11 P2P Activity at USC
    3.11.1 Estimating Total P2P Activity
    3.11.2 Comparison of Detection Methods
    3.11.3 P2P Activity on Commercial vs. Academic Networks
    3.11.4 Type of Host Participating in P2P
    3.11.5 Ingress vs. Egress
    3.11.6 Determining Popular P2P Protocols
  3.12 P2P Usage: Related Work
  3.13 Conclusions

Chapter 4 Low Rate Periodic Communication: An Example of a General Behavior Target
  4.1 Introduction
  4.2 Related Work
    4.2.1 Application Detection
    4.2.2 Spectral Analysis
  4.3 Background
  4.4 Methodology
    4.4.1 Timeseries Extraction
    4.4.2 Multi-resolution Analysis
    4.4.3 Periodic Events and Energy
    4.4.4 Filter to Frequency
    4.4.5 Energy and Frequency to Detection
    4.4.6 Flipping of Frequency Bands
    4.4.7 Datasets
  4.5 Proof of Concept
  4.6 Evaluation of the Approach
    4.6.1 Does Identification Work?
    4.6.2 Effects of Noise
  4.7 What is periodic?
    4.7.1 Variety of applications that show periodic behavior
    4.7.2 Prevalence of periodic applications in real networks
  4.8 Applications
    4.8.1 Self-surveillance: Identifying Changes in Periodic Behavior of a Host
    4.8.2 Pre-filtering to assist network surveillance
  4.9 Advantages of Pruning
  4.10 Robustness
    4.10.1 Evasion
    4.10.2 Parameter sensitivity
  4.11 Summary

Chapter 5 Conclusions and Future Work
  5.1 Future Work
  5.2 Broad Future Work
  5.3 Conclusions

Bibliography

List of Tables

2.1 List of datasets. D TCP1-12h and D TCP1-18d are subsets of D TCP1.
2.2 Summary of completeness for active and passive methods at various duration using dataset D TCP1-18d.
2.3 Categorization from observations of IP addresses in D TCP1-12h.
2.4 Traits and subsequent categorization of IP addresses.
2.5 Summary of content served by web servers detected.
2.6 Summary of server discovery broken down by service type.
2.7 Summary of UDP services discovered.
2.8 Summary of servers found on each of the three monitored links.
3.1 Summary of Data Sets.
3.2 Summary of results of BitTorrent detection for 2006 Data Set.
3.3 Summary of results of BitTorrent detection for 2005 Data Set.
3.4 Well known ports used by our port-based method.
3.5 Summary of P2P activity at USC.
3.6 Summary of P2P Identified.
3.7 Summary of traffic volume over monitored links.
3.8 Break down by subnet of identified P2P activity.
3.9 Hosts identified as P2P broken down by protocol.
4.1 Variety of applications that show periodic behavior.
4.2 Prevalence of malware with periodic behavior on our network.
4.3 Breakdown of pre-filtering for RSS feed aggregators.

List of Figures

1.1 3-D solution space for network reconnaissance.
1.2 3-D depiction of how our work explores the network reconnaissance solution space.
2.1 3D depiction of how Chapter 2 explores the network reconnaissance solution space.
2.2 Weighted and unweighted cumulative server discovery over 12 hours for selected services.
2.3 Cumulative server discovery over 18 days, over all and non-transient addresses.
2.4 Comparison of cumulative server discovery over 90 days and 18 days, over all and non-transient addresses.
2.5 Cumulative server discovery with and without the effect of external network scans.
2.6 Server discovery grouped by transience of address block.
2.7 Server discovery over time for passive monitoring and active probing, broken down by protocol.
2.8 Comparison of network scanning at different times of day.
2.9 Cumulative server discovery with different duration, fixed-period sampling.
3.1 3D depiction of how Chapter 3 explores the network reconnaissance solution space.
3.2 BitTorrent Peer.
3.3 Gnutella Leaf Peer.
3.4 Time until detection.
3.5 Effect of metrics on lowerbound thresholds.
3.6 Effects of Chosen Constants.
3.7 Venn diagram of P2P hosts.
3.8 CDF of number of connections using a standard P2P port.
4.1 3D depiction of how Chapter 4 explores the network reconnaissance solution space.
4.2 Filter bank implementation.
4.3 Full decomposition of a filter tree.
4.4 Perfect artificial period of 8s (window aligned).
4.5 Long-duration artificial period of 600s.
4.6 Impulse response of low- and high-pass filters and combinations.
4.7 Filter depiction for three levels of filtering.
4.8 Periodicity in traffic to a BitTorrent tracker.
4.9 Periodicity in an RSS News feed reader (SNR: 0.62).
4.10 Mix of foreground traffic with simulated background traffic.
4.11 Effect of SNR on coefficient energy.
4.12 Visualization illustrating periodic behavior before and after removal of OS update checks.
4.13 Depiction of pruning.
4.14 Effect of jitter on coefficient energy. Original period 128s.

Abstract

Blind techniques to detect network applications—approaches that do not consider packet contents—are increasingly desirable because they have fewer legal and privacy concerns, and they can be robust to application changes and intentional cloaking. This body of work asserts that passive methods enable network reconnaissance without the need for deep-packet inspection. We demonstrate the validity of our assertion by first demonstrating the effectiveness of passive techniques and then presenting two separate passive techniques which aid in network reconnaissance.

Chapter 1
Introduction

To meet diverse user needs, large organizations often split control of their network amongst subgroups within the organization. These composite networks consist of multiple individually administered subnets, but often are protected and maintained by a central IT group. Network operators of composite networks do not directly operate computers on their network, yet are responsible for assessing network vulnerabilities, ensuring compliance with site-wide policies, and tracking services that affect provisioning. For example, within a university the history department may have its own IT group which caters to the history department's needs, but the campus IT group is still responsible for protecting and providing the appropriate infrastructure for the end-hosts within the history department as well as other departments.

Without direct control over individual end hosts on the network, or direct contact with users, central IT administrators must perform network reconnaissance to understand end hosts in a composite network. Understanding the end hosts in a network is the first step in protecting, auditing and improving a composite network.

Network reconnaissance protects composite networks by identifying software vulnerabilities before internet worms and botnet sweeps find and exploit vulnerable services. Performing network reconnaissance also highlights the type and extent of information an external attacker can obtain. Additionally, network reconnaissance can identify already exploited end hosts by identifying anomalous behavior from compromised hosts.

Many organizations have site-wide policies regarding the types of external services which may be offered by internal hosts. Network reconnaissance can identify policy violations in two ways. First, it can identify active violations through monitoring an end host's network behavior. Alternatively, it can identify hosts offering undesired services by using active approaches to identify services offered by hosts.

Network reconnaissance also aids in network planning.
Understanding what services are in use can identify who and how many users will be affected by a change in policy or configuration. Likewise, monitoring trends in service and host popularity, and following new services as they appear, identifies where network improvements will have the most benefit.

1.1 Reconnaissance Solution Space

Network reconnaissance techniques can be categorized by three main attributes: the target type of the reconnaissance, the method of collecting reconnaissance data and the level of data required. Together these three attributes can be viewed as three axes which describe a three-dimensional solution space for network reconnaissance. Here we discuss this solution space and describe the areas of this solution space this body of work covers.

Each axis in the solution space is a continuum between opposites. The target type of the reconnaissance ranges from a specific application to a general or broad range of applications. Data collection ranges from passive collection of data to active measurements. Finally, the level of data required to perform a reconnaissance technique ranges from deep, where deep packet inspection must be performed, to blind, where little to no information beyond the packet headers is needed.

Figure 1.1: 3-D solution space for network reconnaissance.

All reconnaissance methods aim at understanding a given target. This target may be specific or general. For example, a network operator may wish to identify and understand usage of a specific web browser client on his network. Studying the general class of peer-to-peer file sharing applications is a slightly less specific target. In contrast, measuring and studying the network as a whole, such as looking at the variety of protocols used, is a more general reconnaissance target.

Identifying end-hosts running specific applications aids network reconnaissance in two ways. First, particular applications may be of interest when planning for and protecting a composite network. For example, peer-to-peer file sharing applications may be of interest when performing policy auditing. Second, identifying which end-hosts use a particular application helps classify these end-hosts. Knowing which end-hosts are high-end servers and which are not helps when planning for a network and identifying anomalies.

Network reconnaissance methods are either based on passively monitoring the network to identify attributes of active end-hosts or actively probing end hosts to determine available services. Active probing gives an accurate depiction of all open and available services on a network at the time of the probe, but it may miss services which are only available intermittently or are hidden behind firewalls. In addition, probing is invasive and may be considered inappropriate when crossing organization boundaries. For example, though it may be legal for an ISP to probe its customers, an ISP may opt not to probe to avoid perturbing customers. Passive monitoring will detect all services that are exercised over the observation period, including services behind firewalls. Passive monitoring, unlike active probing, can also track client behavior.

With both passive and active reconnaissance methods, there is a temptation to ignore privacy concerns in order to gain more direct and immediate knowledge of end-hosts.
In active methods, specially crafted probes can reveal information such as the exact type of service being offered and the server software version, but these probes elicit responses from end-hosts which otherwise may not have been sent. In passive methods, deep packet inspection—inspection of packet content past the IP and secondary transport layer headers—can also reveal software version information but may expose personal information about a user, such as the web sites the user browses. While in some environments user privacy is not an issue, there are a number of situations where user privacy is a large concern.

This work advocates blind techniques—approaches which do not consider application-level packet content—for network reconnaissance. Blind techniques work by identifying end hosts exhibiting specific network behavior and do not use payload-based signatures. We advocate blind techniques for two main reasons: First, blind techniques arguably preserve user privacy far better than techniques which rely on deep packet inspection. Second, blind techniques are not affected by payload encryption or minor changes in payload content which can often be introduced when application protocols are improved and redesigned.

In this dissertation, we examine three network reconnaissance methods, each based on a different approach. We first look at a current approach to reconnaissance and compare this approach to a basic passive technique. Next we look at a passive technique which focuses on end-hosts using popular peer-to-peer (P2P) applications. Our last group of techniques looks for periodic communication between hosts or groups of hosts, and uses the detection of such behavior for both self-surveillance and as a first pass for detection of specific classes of applications such as malware.

1.2 Thesis Statement

This body of work asserts that passive methods enable network reconnaissance without the need for deep-packet inspection. Thus, our techniques preserve user privacy and are not affected by payload encryption and variation. We demonstrate the validity of our assertion by presenting three separate network reconnaissance techniques which do not require deep-packet inspection. First we demonstrate the effectiveness of simple passive techniques in finding available services on a network. Second, we present a passive network reconnaissance technique which examines network behavior of end-hosts to identify hosts involved in peer-to-peer networking. Last, we present a passive technique for identifying low-rate periodic signals within network traffic. With this last technique we demonstrate a self-surveillance application, as well as a first-pass detection reconnaissance technique.

Figure 1.2: 3-D depiction of how our work explores the network reconnaissance solution space.

1.3 Summary of Work

We substantiate our claim through three studies which all address aspects of network reconnaissance. In this section we briefly discuss how each study relates to network reconnaissance and summarize the contributions of each of these studies. Figure 1.2 depicts the areas of the solution space for network reconnaissance which we explore in this body of work.

Our first study is a quantitative comparison of passive versus active reconnaissance methods which explores opposite ends of the passive/active axis in the solution space (Section 1.1). In Figure 1.2, this study is depicted by the beige cylinder.
With decentralized network management, service discovery becomes an important part of maintaining and protecting computer networks. We explore two approaches to service discovery: active probing and passive monitoring. Active probing finds all services currently on the network, except services temporarily unavailable or hidden by firewalls; however, it is often too invasive, especially if used across administrative boundaries. Passive monitoring can find transient services, but misses services that are idle. We compare the accuracy of passive and active approaches to service discovery and show that they are complementary, highlighting the need for multiple active scans coupled with long-duration passive monitoring. We find passive monitoring is well suited for quickly finding popular services, finding servers responsible for 99% of incoming connections within minutes. Active scanning is better suited to rapidly finding all servers, which is important for vulnerability detection: one scan finds 98% of services in two hours, missing only a handful. External scans are an unexpected ally to passive monitoring, speeding service discovery by the equivalent of 9–15 days of additional observation. Finally, in this first body of work, we show how the use of static or dynamic addresses changes the effectiveness of service discovery, both due to address reuse and VPN effects.

Our second study looks at identifying a specific class of applications: peer-to-peer file sharing applications. We use blind techniques to identify our target application class and identify several behaviors that are inherent to peer-to-peer (P2P) traffic. This study demonstrates an example of a strictly passive reconnaissance method with a specific target, and largely blind techniques. The red area in Figure 1.2 depicts the area within the solution space this study explores. To validate, we use an active reconnaissance method that targets a very specific set of hosts we identify as participating in peer-to-peer file sharing, and uses deep packet inspection, which reads payload information, to identify key P2P peers. This validation explores a separate, nearly opposite area in the solution space, represented by the purple sphere.

We demonstrate that our passive techniques can detect both BitTorrent and Gnutella hosts using only packet header and timing information. We identify three basic behaviors: failed connections, the ratio of incoming and outgoing connections, and the use of unprivileged ports. We quantify the effectiveness of our approach using two day-long traces, achieving up to an 83% true positive rate with only a 2% false positive rate. Our system is suitable for on-line use, with 75% of new P2P peers detected in less than 10 minutes of trace data.

Our last study uses passive data and identifies regular communication between hosts which is non-user initiated. Such communication can be good (OS updates, RSS feed readers, and mail polling), bad (keyloggers, spyware, and botnet command-and-control), or ugly (adware or unauthorized peer-to-peer applications). Since many applications exhibit regular communication, this reconnaissance method has a general target. To study regular communication, we use passive and largely blind techniques, exploring a new area in the solution space compared to our previous two studies. This new area is depicted by the yellow sphere in Figure 1.2.

Non-user initiated communication in applications is often periodic but infrequent, perhaps every few minutes to few hours.
This infrequent communication and the complexity of today's systems make these applications difficult for users to detect and diagnose. We show that there are several classes of applications that show low-rate periodicity and demonstrate that they are widely deployed on public networks. In this last study we present a new approach to identify low-rate periodic network traffic. We employ signal-processing techniques, using discrete wavelets implemented as a fully decomposed, iterated filter bank. This approach allows us to cover very low-rate periodicities, from seconds to hours, and to identify approximate times when traffic changed. Network administrators and users can use our techniques for network- or self-surveillance. To measure the effectiveness of our approach, we show that it can detect changes in periodic behavior caused by events such as the installation of a keylogger, or an interruption in OS update checks and in the P2P application BitTorrent. We quantify the sensitivity of our approach, showing that we can find periodic traffic when it is at least 5–10% of overall traffic.

1.4 Contributions and Dissertation Outline

In the previous section we introduced our three studies and showed how they relate both to network reconnaissance and to the reconnaissance solution space. In this section, we summarize the main contributions of our three studies, and introduce the structure of the dissertation that follows.

In Chapter 2 we detail the work done in our comparison study of passive and widely practiced active reconnaissance methods [BHP07b, BHP07c]. Our work demonstrates the power of blind passive service discovery and how passive methods can be used independently or in conjunction with more traditional active methods. Our comparison study is the first quantitative comparison of multiple active scans to long-term passive monitoring. We compare passive and active service discovery in identifying all services, and are the first to compare passive and active discovery in identifying only the most popular services. We also quantify the effect of multiple periodic scans, the effects of external scans on passive methods, and each method's ability to identify services on transient hosts (hosts which do not have fixed IP addresses).

Chapter 3 looks at our reconnaissance method with a specific target: peer-to-peer file sharing. We introduce a set of passive techniques for identifying end-hosts running peer-to-peer applications [BHP06]. Our passive techniques for identifying peer-to-peer participants preserve user privacy by only considering TCP/IP packet headers. Our contribution is a set of three light-weight metrics suitable for on-line use. We demonstrate the ability of these metrics—individually and combined—to identify end-hosts participating in peer-to-peer. Additionally, we offer one of the first quantitative measurements of P2P file sharing [BHPP07]. We estimate that 3–13% of active hosts at USC participate in P2P and account for 21–33% of the total traffic volume at USC.

Our last study, detailed in Chapter 4, is focused on identifying low-rate periodic events in network traffic. Identifying low-rate periodic signals can aid in network reconnaissance by identifying groups of periodically coordinated hosts. Once such a group is identified, the uses and purposes of the end-hosts can be determined by examining the host(s) which coordinates with the group.
For example, a group of machines which periodically coordinate with a POP3 mail server are likely running mail clients which periodically check for new mail. These machines can therefore be classified as personal desktops or laptops and are not likely to be servers. Other sources of low-rate periodic communication include RSS feed aggregators and application update checks, as well as more malicious sources such as spyware update checks, botnet coordination and keylogger reports.

Additionally, identifying which coordinating servers an end-host periodically contacts, and the frequency of that contact, can potentially be used to develop a unique signature for the host. This signature can then be used to identify the host even if the end-host changes its IP address or even its hardware address.

To address the difficulty and complexity in finding low-rate periodic communication we propose a signal processing approach. Our proposed approach focuses only on timing information, allowing us to passively determine important network reconnaissance information about an end-host without relying on payload information. To our knowledge, we are the first to look at low-rate periodic events in network traffic and to study how this can be used to identify the purpose of end-hosts.

All three studies demonstrate blind techniques which classify end-hosts and aid in understanding a network, but each study takes a different approach and explores a significantly different area within the solution space. In all of our work, we are able to perform network reconnaissance and yield vital information from passive data without the need for deep-packet inspection. We use the success of our passive and blind techniques within these three areas to demonstrate the validity of our thesis.

Chapter 2
Service Discovery: A Study of Passive vs. Active Measurements

In this chapter we look at the vital network reconnaissance practice of service discovery. Service discovery is the process of identifying active services on a network—an important step in protecting and maintaining a network, as we will discuss later. This widely used practice is typically performed using actively collected data, where specifically designed probes are sent to hosts and the responses are collected and analyzed. In this work we perform a comparison of common active reconnaissance methods with passive methods, and demonstrate that passive methods obtain complementary and comparable results to the popular active methods.

The first part of our thesis states that passive methods can enable network reconnaissance. This chapter is central to supporting this statement because the majority of service discovery is currently performed via active methods. In this work we prove that passive methods successfully perform service discovery and in some cases provide results active methods cannot. We also explore in detail the trade-offs of solely using passive methods.

Figure 2.1: 3D depiction of how Chapter 2 explores the network reconnaissance solution space.

Figure 2.1 visually depicts the area of the 3D network solution space (Section 1.1) which we explore to prove our thesis. We use largely blind techniques to discover a variety of services, and explore the ends of the passive and active axis. We also study using a combination of active and passive methods for service discovery, and so in this chapter we also explore the space between the ends of the passive and active axis.
2.1 Introduction

Today's computer networks support very diverse sets of services, and network administrators must manage and protect an organization's network from vulnerability and inappropriate information disclosure. This diversity makes service discovery a necessary step in network reconnaissance. While in small organizations service discovery may be unnecessary since externally visible computers and services may be centrally managed, in large organizations and ISPs control of servers is delegated. Ultimately the responsibility for security and auditing often remains centralized, so in these cases service discovery becomes a central part of maintaining and protecting networks.

Service discovery is an essential capability for network administrators for the following reasons. First, it helps protect against software vulnerabilities. Internet worms and botnet sweeps exploit vulnerabilities in open network services. Rapid identification of vulnerable software is important after disclosure of an exploit; preemptive surveys can track an organization's service "surface area". Second, most organizations have policies about computer use, often including what external services may be offered. Service discovery supports auditing of policies. Third, service discovery is often the first step in network planning. Understanding what services are in use can identify who and how many users will be affected by a change in policy or configuration. Finally, service discovery can also help monitor trends in service popularity, as new services appear and the relative importance of services change.

Even if one cannot control individual hosts, access to the network allows two general methods to discover services: active probing and passive monitoring. With active probing, one attempts to contact each service at each host. Active probing gives an accurate depiction of all open and available services on a network at the time of the probe, but it may miss services which are only available intermittently or are hidden behind firewalls. In addition, probing is invasive and may be inappropriate when crossing organization boundaries (for example, an ISP probing its customers).

In passive monitoring, one observes network traffic destined to servers, building up a picture of active services over time. Passive monitoring will detect all services that are exercised over the observation period, including services behind firewalls and transient services—services which are available at a single IP for only a brief period, either because the service or server itself is shut down or the server is a transient host and does not have a fixed IP address. Since it is non-invasive, passive monitoring cannot be confused with malicious behavior. However, it misses services which are idle, even though they may still represent a vulnerability.

The following is a quantitative evaluation and comparison of passive monitoring and active probing for service discovery based on data collected at the University of Southern California. Although these approaches have been compared qualitatively in IT trade magazines [Sch05], there has been little quantitative exploration. Our comparison of passive and active service discovery is closer to Webster [WLZ06], but goes deeper by evaluating multiple periodic active scans (Section 2.4.2) and the effects of transient hosts and external scans (Sections 2.4.4 and 2.4.3).
Additionally, we investigate the sensitivity of passive/active discovery to time of day, monitor location and portion of traffic seen by the monitor (Section 2.5). Finally, using a larger and more varied population we confirm the core conclusions in Webster [WLZ06] that passive and active are effective and often complementary means of service discovery.

We find passive monitoring is well suited for quickly finding popular services, such as for trend monitoring; within minutes passive monitoring finds servers responsible for serving 99% of incoming connections. We find that active scanning is better suited to finding all servers, such as for vulnerability detection; one scan finds 98% of services in two hours, missing only a handful.

In addition, we look carefully at what network conditions affect the completeness of active and passive service detection. On our network, long-duration passive monitoring is ultimately reasonably successful at finding even idle servers (finding 72–91%); perhaps ironically, external, possibly malicious, scans of our network provide great assistance in rapidly detecting services. We also show how the use of static vs. dynamic addresses changes the effectiveness of service discovery. We see a great deal of ongoing service discovery with more dynamic addresses, corresponding to transient hosts that possibly reuse addresses. In addition, we show that services on VPN addresses are almost never discovered passively, but are found with active probing.

2.2 Overview of Service Discovery Techniques

We next describe briefly how active probing and passive monitoring are used to discover services, and review the trade-offs between these approaches.

2.2.1 Active Probing

Active probing finds services by sending packets to each host and monitoring its response. Active probing requires participation of the host running the service, so results can be affected by firewalls or host counter-measures. To discover available services, hosts are scanned by probing all target ports on each host on the network. Probes may be generic (specific only to the protocol, not the application), or customized to an expected application. Host discovery can speed service discovery by checking for host presence and skipping unused addresses.

For some services, a probe may need to be specific to a given application. However, the TCP connection setup suggests that for TCP services simply initiating a connection is a generic probe that will detect the presence of a server on a well-known port. This process of discovering TCP services is known as half-open scanning, where the prober attempts to set up a new TCP connection to a given port. A SYN-ACK response indicates a listening service; other possible responses include a TCP reset message, confirming no service runs on that port, or a lack of response, suggesting a firewall.

Generic TCP probing is insufficient, however, in two cases. First, it only tests for willingness to open a TCP connection, but not what service that connection supports. It will therefore misinterpret services running on non-standard ports, such as a web server running on the SMTP port. Second, it cannot classify servers that have no standard port, or those that use dynamic port assignment. For example, many RPC protocols allocate TCP ports dynamically and discover allocation through service brokers or portmappers (for example, [Sun88, BEK+00, RSGC+02]). To discover these services an active probe must be specifically designed for that service's protocol.
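To make the generic probe concrete, the short sketch below (an illustration added here, not part of the original measurement tooling) checks a handful of well-known TCP ports on one target. For simplicity it uses a full connect() rather than a true half-open scan, which requires raw sockets or a tool such as Nmap; the interpretation of the outcomes is otherwise the same. The target address, port list and timeout are illustrative assumptions.

#!/usr/bin/env python3
"""Sketch of a generic TCP service probe (illustrative only)."""
import socket

PROBE_PORTS = [21, 22, 80, 443, 3306]   # well-known TCP ports studied in this chapter
TIMEOUT_S = 2.0                         # illustrative per-probe timeout

def probe(host, port):
    """Return 'open', 'closed', or 'filtered' for host:port."""
    try:
        with socket.create_connection((host, port), timeout=TIMEOUT_S):
            return "open"        # handshake completed: a service is listening
    except ConnectionRefusedError:
        return "closed"          # RST received: no service on this port
    except (socket.timeout, OSError):
        return "filtered"        # no answer: possibly behind a firewall

if __name__ == "__main__":
    target = "192.0.2.10"        # placeholder address (TEST-NET-1), not a real host
    for port in PROBE_PORTS:
        print(port, probe(target, port))

The equivalent half-open probe of the same ports with Nmap is roughly nmap -sS -p 21,22,80,443,3306 against the target range, which sends only the initial SYN and never completes the handshake.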
Nevertheless, use of well-known ports is common today, and is a necessary means of coordination without a third party.

Though generic UDP probing gives ambiguous results, such probing is still possible for well-known UDP services. Certain protocols will respond to a "malformed" UDP packet and hence will respond to a generic UDP probe. In other cases, we can indirectly infer the presence of a UDP service by the lack of a negative response, since many hosts automatically generate ICMP port unreachable messages when no process is listening to a given UDP port. A lack of response is not definitive, but may indicate that a UDP server is present. In the majority of our study we focus on TCP services, but delve briefly into UDP service discovery in Section 2.4.5.

2.2.2 Passive Monitoring

Passive monitoring finds services on a network by observing traffic generated by servers and clients as it passes an observation point, and is generally invisible to the hosts running the services.

Passive monitoring requires support from the network operator, often with specialized hardware inserted at the monitoring point. There are multiple hardware devices available for passive monitoring, with different costs and tolerance of high traffic volumes. Many routers can "mirror" ports, sending copies of packets out another interface to a monitoring host. Port mirroring can often be added with no interruption to service, but may not support full channel capacity. Alternatively, hardware taps such as optical splitters place no additional burden on the router, but require a brief service interruption to install.

Detection of well-known services (both TCP and UDP) with passive monitoring is fairly straightforward. An exchange of traffic with a given host indicates an operational service. For TCP, monitoring need only capture TCP connection setup messages (SYN bit set); completion of the "three-way handshake" clearly indicates a service is available. Under normal operation, even just the presence of a positive response to a connection request (SYN-ACK) is sufficient evidence of a TCP service.

UDP services can also be identified by observing traffic; however, since UDP is a connectionless protocol, the concept of "server" and "client" is not clear without application protocol information. In addition, while bi-directional traffic positively indicates a UDP service, unidirectional traffic may also indicate a service (since UDP does not mandate a response), but may also indicate unsolicited probe traffic.

As with active probing, passive monitoring cannot identify services that do not run on well-known ports or are indirected, without protocol-specific decoders.

2.2.3 Discussion

Based on the descriptions of active and passive service discovery above, we next compare their advantages and disadvantages.

With few exceptions, active probing gives a complete report of all ports that are open and unprotected at the time of the probing. Active probing for services will miss ports that are filtered by firewalls or obscured by mechanisms such as port knocking [Krz03]. Arguably, protected services are less likely to be vulnerable to malicious scanning and unsolicited attacks, so detection of such services is less critical for vulnerability assessment. However, for goals of auditing and resource planning, discovery of all (including protected) services is important.

Active probing can often be done quite quickly. While scanners consume some bandwidth, they can be placed near the probed hosts where bandwidth is plentiful.
The main disadvantage of active probing is that it is very intrusive. Active probes solicit a response that would not have been sent otherwise. Probes can be detected and logged by the host or intrusion detection systems, particularly if one systematically scans all hosts in a region. Scanning across organizations (such as an ISP scanning its customers) may be considered unacceptable by the customers and may even be illegal. Even within a single organization, there can be a lack of central authority to coordinate and authorize scans. Recognizing these concerns, scanning tools such as Nmap support special scanning modes that intentionally slow their probe rate to conceal their behavior. Scanning is often intentionally avoided as a policy decision out of regard for client privacy. When active probing is used, it is often limited to short probes done relatively infrequently, or perhaps only carried out when motivated by a specific vulnerability.

A second disadvantage of active scanning is that it misses hosts that may be temporarily unavailable at the time of the scan. We quantify this effect in Section 2.4.1, and in fact show that the time of day of the scan matters (Section 2.5.1). This disadvantage can be mitigated with multiple active scans, as we show in Section 2.4.2, although additional scans may draw further notice from those operating the scanned hosts.

Passive monitoring has the advantage of being non-intrusive. In fact, it generally cannot be detected by either party of a conversation. As a result, use of passive monitoring is constrained primarily by policy decisions by the network operator. A second advantage of passive monitoring is that it can better detect active services running on transient hosts. Thus, vulnerabilities on machines that are frequently powered off, such as laptops, or on hosts temporarily disconnected from the network, may all be found. While it may seem surprising that one may run services on hosts that are intermittently available, we see that this effect can be significant in Section 2.4.4. Third, passive monitoring can catch services that active probing misses because of firewall configurations. Fourth, although not a primary focus of this work, passive monitoring can also provide insight into trends and other behaviors which active probing cannot. While monitoring servers, passive monitoring can also track clients, providing extra information such as server popularity and server load. Finally, since passive monitoring consumes no network resources (other than the monitoring host), it can be run on a long-term basis as part of normal operation.

The main disadvantage of passive monitoring is that it only detects services that are active. Silent servers therefore escape notice, even though they may still pose vulnerabilities or policy violations. We quantify the number of these silent servers in Section 2.4.4 by using active probes to discover servers which escape notice during passive monitoring. This disadvantage can be somewhat mitigated by long-term monitoring. We quantify the effect of the duration of passive monitoring in Section 2.4.2.

2.3 Methodology and Datasets

To compare passive monitoring with active probing we carried out five experiments in 2006 for periods of up to 90 days, as shown in Table 2.1. We next describe our data collection and give details on our experiments. The data was collected at the University of Southern California, with a student population of about 28,000 and faculty and staff adding another 10,500.
We describe this population in more detail in Section 4.4.7.

2.3.1 Methodology for Active Probing

Our active scans were performed by the staff of our campus network administration using Nmap [nma]. Probing was done from internal campus machines, thus both the probes and the responses were invisible to our passive monitoring. For larger experiments (Datasets D TCP1 and D TCPbreak), an address space of 16,130 IP addresses was split roughly in half and scanned separately by two internal machines. For smaller experiments, scanning was performed from a single internal machine. All IP addresses in the scanned space were probed (there was no separate phase for host discovery). For our larger datasets, probing took one to two hours to complete. Scans used Nmap's half-open scanning mode.

We focus on a set of standard TCP service ports: port 21 (FTP), 22 (SSH), 80 (web), 443 (SSL web) and 3306 (MySQL). We have chosen a small set of standard ports for simplicity and out of privacy concerns. We believe that our results hold for other services that use well-known ports.

To complement discovery of TCP-based services, one dataset (Dataset D UDP) uses Nmap's generic UDP probing to probe a set of four standard UDP ports: 80 (HTTP and other applications), 53 (DNS), 137 (Microsoft Windows NetBIOS Name Service) and 27015 (common multiplayer game port). We discuss results from our UDP scans in Section 2.4.5.

2.3.2 Methodology for Passive Monitoring

Our passive measurements are collected at the regional ISP serving our university as well as other academic and commercial institutions. Based on discussions with our IT staff, we estimate we capture over 99% of non-Internet2 traffic to and from the university. (In Section 2.5.2, we investigate how adding monitoring of Internet2 traffic affects our results.) We used a continuous network tracing infrastructure [HBP+05] and collected all TCP SYN, SYN-ACK and RST packets, as well as all UDP traffic.

To discover available TCP services, we assume that any host sending a SYN-ACK is running a service. TCP SYNs and RSTs are used in Section 2.4.3 to identify external hosts which scan the university network. To discover available UDP services, we assume that any host which sends UDP traffic from a well-known server port is running a UDP service on that port.

2.3.3 Datasets

Using the methodology described above, we collected five datasets summarized in Table 2.1. Each dataset has an active and a passive component: data from continuous passive monitoring and data from one or more active scans. Each dataset contains information for a set of IP addresses from one or more subnets on our campus. The total number of possible IP addresses in each set is listed in column six of Table 2.1.

Dataset          Passive                   Active        Target        Number of   Discussion
Name             Start Date     Duration   Scans         Services      addresses   Section
D TCP1           10 Aug. 2006   90 days    35 total      TCP/selected  16,130
D TCP1-12h       19 Sept. 2006  12 hours   once          TCP/selected  16,130      Section 3.11
D TCP1-18d       19 Sept. 2006  18 days    every 12 hrs  TCP/selected  16,130      Section 3.11
D TCP1-18d-trans 19 Sept. 2006  18 days    every 12 hrs  TCP/selected  2,296       Section 2.4.4
D TCP1-90d       10 Aug. 2006   90 days    -             TCP/selected  16,130      Section 2.4.2
D TCPbreak       16 Dec. 2006   11 days    every 12 hrs  TCP/selected  16,130      Section 2.5.2
D UDP            18 Oct. 2006   1 day      once          UDP/selected  16,130      Section 2.4.5

Table 2.1: List of datasets. D TCP1-12h and D TCP1-18d are subsets of D TCP1.

Four of our datasets cover 38 of the most densely populated subnets on campus. Together, these 38 subnets contain 16,052 IP addresses. Roughly 75% of this address space has assigned host names, and over 40% of the IP addresses we probed during our study responded with at least one TCP RST and/or TCP SYN-ACK, indicating at least 6,450 of the 16,130 IP addresses are assigned to live hosts.

Our main dataset, D TCP1-18d, is an 18-day period with concurrent active probes every 12 hours and passive collection over the entire period. We also use two variations of this dataset. The dataset is actually a subset of the longer D TCP1 dataset, which includes 90 days of passive monitoring, but we only have active measurements for 18 days, captured in D TCP1-18d. We use the full 90-day version to study very long duration passive monitoring in Section 2.4.2. We also use the first 12 hours of D TCP1-18d for our preliminary analysis, and D TCP1-18d-trans, the set of "transient" addresses of D TCP1-18d, in Section 2.4.4 to discover transient hosts.

Dataset D TCP1 was taken during the semester, when students, faculty and staff are present. Dataset D TCPbreak complements D TCP1 with a similar duration, but was taken during the December break in classes when many students are absent from campus, giving insight into how our results change with a reduced number of users. Dataset D UDP is used for a brief exploration into UDP service discovery and covers a selected set of 4 UDP ports (discussed in Section 2.4.5).

Our datasets are made available through the PREDICT program [The05, USC06a]. Due to privacy concerns, both passive and active results are anonymized after collection, and all processing was done on anonymized traces. The anonymized datasets are available through the PREDICT project [The05] or by contacting the authors.

2.4 Evaluation of Service Discovery

We next evaluate passive and active approaches to service discovery, considering completeness (Section 2.4.1), the importance of observation time, repeated probing, and external scans on completeness (Sections 2.4.2 and 2.4.3), and finally how the type of the target computer and service affects accuracy (Section 2.4.4).

2.4.1 Completeness

Our first goal is to evaluate completeness: how closely active or passive detection comes to detecting everything. To answer this question we first define ground truth and explore how close we come to detecting all servers. We then consider other definitions of completeness, such as all connections or all traffic.

Hosts as Ground Truth

We first establish the effectiveness of both methods. We look at the servers discovered by active and passive methods during a brief survey and compare the completeness each method achieves. For this comparison, we use the first 12 hours of passively collected data and the first active scan from dataset D TCP1-18d. We call this subset D TCP1-12h. It makes up 3% of dataset D TCP1-18d; we expand to consider all data in D TCP1-18d in Section 2.4.2.
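The passive half of this comparison reduces to the rule from Section 2.3.2: any address observed sending a SYN-ACK from a monitored port is counted as a server on that port. A minimal sketch of that extraction step follows; it is an illustration added here, assuming an offline Ethernet pcap file and the third-party dpkt library, whereas our actual collection used the continuous tracing infrastructure described earlier.

#!/usr/bin/env python3
"""Sketch of passive TCP server discovery from a packet trace."""
import socket
import dpkt

MONITORED_PORTS = {21, 22, 80, 443, 3306}    # TCP ports studied in this chapter

def passive_servers(pcap_path):
    """Return the set of (server_ip, port) pairs seen sending SYN-ACKs."""
    servers = set()
    with open(pcap_path, "rb") as f:
        for _ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP):
                continue
            tcp = ip.data
            if not isinstance(tcp, dpkt.tcp.TCP):
                continue
            # SYN and ACK both set marks the responder side of the handshake.
            syn_ack = (tcp.flags & dpkt.tcp.TH_SYN) and (tcp.flags & dpkt.tcp.TH_ACK)
            if syn_ack and tcp.sport in MONITORED_PORTS:
                servers.add((socket.inet_ntoa(ip.src), tcp.sport))
    return servers

if __name__ == "__main__":
    # "trace.pcap" is a placeholder file name, not one of the datasets above.
    for addr, port in sorted(passive_servers("trace.pcap")):
        print(addr, port)

Restricting the result to addresses inside the monitored subnets and comparing it with the host list reported by the active scan yields per-method sets of the kind summarized in Table 2.2.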
To compare passive and active methods we must first define ground truth. Ideally we would get ground truth by confirming, externally, what services run on each machine. However, we cannot do this for our dataset since it spans a significant portion of a university with hundreds of separately administered groups and thousands of privately run machines. Instead, we define ground truth as the union of servers found by passive and active methods. While we expect that passive monitoring will not give as complete a picture as active probing, we also expect passive monitoring to find a number of services active probing misses.

Percent of D TCP1-18d used             3%            6%            50%           100%
Passive duration in hours              12            25            205           410
Number of active scans                 1             2             17            35
Total servers found (union)            1,748 (100%)  1,848 (100%)  2,551 (100%)  2,960 (100%)
Passive AND Active                     286 (16%)     1,074 (58%)   1,738 (68%)   1,925 (65%)
Active OR Passive (but not both)
  Active only                          1,421 (81%)   716 (39%)     683 (27%)     848 (29%)
  Passive only                         41 (2.3%)     58 (3.1%)     130 (5.0%)    186 (6.3%)
Active                                 1,707 (98%)   1,790 (92%)   2,421 (95%)   2,773 (94%)
Passive                                327 (19%)     1,132 (61%)   1,868 (73%)   2,111 (71%)

Table 2.2: Summary of completeness for active and passive methods at various durations using dataset D TCP1-18d.

The leftmost column of Table 2.2 summarizes server discovery for each method as well as the union and overlap of the two methods. Combined, both methods find 1,748 hosts running one or more service of interest. Treating these 1,748 as the ground truth for completeness, a single network scan discovers 98% of all servers by detecting 1,707 hosts. Passive monitoring for 12 hours achieves only 19% completeness by detecting 327 servers. Given the large percentage of hosts missed, it is clear that passive monitoring by itself is not sufficient for situations when one must rapidly find all servers that meet a given criterion, such as doing a vulnerability scan immediately following the disclosure of a software flaw.

While Table 2.2 quantifies the overlap and completeness of passive and active methods, Table 2.3 gives context to these numbers by interpreting each combination of observations. For example, because 286 servers were found by both methods, we know that 16% of servers found in dataset D TCP1-12h are open and active servers, while the vast majority of servers (81%) are idle.

Despite the power of active probing, passive monitoring finds 41 servers (2.3%) that active probing fails to detect. Active probing may have missed these servers because the servers were born after the active scan completed, or these servers may be protected by a firewall that discards our active probes while accepting requests from other IP addresses. We look closer at firewalled services and server birth in Section 2.4.2. While 2% is a very small percentage of services found exclusively by passive monitoring, finding these few services may be valuable if, for example, one of these services violates policy. In cases where completeness is key, a combination of both methods is advantageous.

Other Measures of Completeness

In the last section we looked at completeness in terms of the absolute number of servers found. While finding all servers is important in some cases, such as identifying software vulnerabilities, in other cases one may care more about identifying popular or active services. We therefore next consider two alternate definitions of completeness that weigh service discovery by popularity, as reflected by the number of clients and the number of flows to a given service.
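Both weightings, defined in the next two paragraphs, can be written in a single form. In notation introduced here for illustration (not the dissertation's own), let S be the set of all servers eventually found by either method, D(t) the subset of S discovered by time t, and w(s) a server's weight. Weighted completeness at time t is then

\[
  C_w(t) \;=\; \frac{\sum_{s \in D(t)} w(s)}{\sum_{s \in S} w(s)},
\]

where w(s) = 1 gives the unweighted curve, w(s) equal to the number of unique client addresses contacting s gives client-weighted completeness, and w(s) equal to the number of flows to s gives flow-weighted completeness. The two-server example that follows is an instance of this formula: detecting only server A gives 9/(9+1) = 90% client-weighted completeness.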
First, we weigh by unique clients, by counting the number of unique client IP addresses that connect to the server during the duration of our measurements. When we first discover a server, we add the number of clients this IP address serves throughout the study. Thus, if there were only servers A and B to be discovered, with 9 and 1 clients over the traced duration respectively, we would discover 90% of the client-weighted servers when we detect server A. Second, we consider weighing by number of flows. This weighing by flows follows the same methodology as weighing by clients, but adjusted by flows over the dataset duration rather than unique clients.

Figure 2.2: Weighted and unweighted cumulative server discovery over 12 hours for selected services.

Figure 2.2 compares the weighted and unweighted completeness of active and passive discovery. As described above (Section 2.4.1), we see that passive discovery takes some time to find the 19% of hosts that it will find over 12 hours. However, we see that passive monitoring finds the most popular servers almost immediately: in fact it finds 99% of the client-weighted servers in 14 minutes, and 99% of the flow-weighted servers in 5 minutes. Thus, while passive is very poor at finding all servers, it can very rapidly find popular and active servers. We will see the cause of this difference when we look at server type in Section 2.4.4. The services that passive misses are rarely used with default configurations. In fact, passive monitoring actually finds the most popular servers faster than they would be found with active scanning. This can be seen in Figure 2.2, where our active scan takes well over an hour to find 99% of the flow- and client-weighted servers. This difference is because it is relatively slow to scan a large address space, particularly if the scan is rate-limited to reduce the effects on normal traffic, to avoid flooding hosts, or to avoid triggering intrusion-detection systems.

Figure 2.3: Cumulative server discovery over 18 days, over all and non-transient addresses.

2.4.2 Server Discovery Over Time

As demonstrated in the previous section, passive monitoring for a short period only observes a fraction of servers. However, active monitoring misses a few servers as well. In the following sections we look at extended service discovery, either through long duration passive monitoring, or through multiple rounds of active probes.

Effect of Duration on Passive Monitoring

In Section 2.4.1, Figure 2.2 demonstrates that passive monitoring continues to discover servers as time progresses; this trend suggests that a longer observation period is more effective. To confirm the benefits of longer duration, we look at server discovery over an 18-day period with dataset D TCP1-18d to see if discovery levels off. We expect that given sufficient time, passive monitoring will detect the majority of servers that active probing detects. Figure 2.3 depicts passive server discovery over time.
Separate lines depict server discovery over all IP addresses and over a subset of all IPs: IPs with non-transient addresses. In Section 2.4.1 we determined that within 12 hours, passive monitoring found 17% of the 1,714 servers found by one active probe. After another 17.5 days, passive monitoring detects 92.5% (1,587) of the 1,714 servers found by a single active probe. We conclude that long-duration passive monitoring can be very effective, although it may still fall short of active probing.

A significant portion of servers missed by passive monitoring are servers with transient IP addresses (such as PPP and VPN addresses). We discuss server discovery for servers using transient IP addresses in Section 2.4.4.

Over all IP addresses, transient and non-transient together, passive service discovery never levels off. Even in the last five days of monitoring during D TCP1-18d , new servers are still discovered at an average rate of one per hour. This continual discovery is not surprising since transient hosts have a strong effect: every time a server with a transient IP address disconnects, there is the potential of re-discovering this server at a new IP address the next time it connects. Additionally, transient IP addresses can represent many more hosts than static networks, with a variety of users connecting and disconnecting continually. Over non-transient hosts, discovery nearly levels off after 11 days, but even in the last five days hosts are still discovered at an average rate of one every 3 hours. We suggest that server request rates are heavy tailed, and so there are a number of very rarely accessed servers that require a long time to discover.

Figure 2.4: Comparison of cumulative server discovery over 90 days and 18 days, over all and non-transient addresses.

Extended Duration for Passive Monitoring

In the previous section, we found that new servers continue to be discovered even after 18 days of passive monitoring. In this section, we use D TCP1-90d to extend our passive monitoring period to 90 days to see if passive server discovery levels off. Figure 2.4 shows cumulative server discovery over time for all hosts, with an additional line for just servers with non-transient IP addresses. Server discovery over non-transient hosts drops to an average of just one newly discovered host every 12 hours in the last five days of monitoring. In contrast, server discovery over all hosts only drops to roughly one every hour and a half. Again, this difference can largely be explained by the effect of transient hosts, which are included in the total. We examine transient hosts further in Section 2.4.4.

Effect of Multiple Probes on Active Monitoring

Just as passive observation over a longer duration can find more hosts, we expect that multiple active probes will be more effective as well. Figure 2.3 shows server discovery as the number of probes increases over 18 days. Over all scans, the majority of servers (62%) are found in the first scan, but the last 10 scans still find 10–30 new servers per scan. Similar to passive discovery, this continuing increase in newly discovered servers is due to transient hosts. Figure 2.3 also shows server discovery over multiple probes for non-transient hosts only.
We observe that the number of discovered servers roughly levels off after five scans; however, new servers appear often enough in our environment that the last 10 scans done over the last five days each discover four servers per scan on average. This is close to the passive discovery rate after 10 days of monitoring, implying that even over extended duration passive monitoring can never fully catch up. Completeness Over Time Previously, for completeness, we defined ground truth as the union of a single active scan and 12 hours of passive observation. As shown in the previous section, multiple active scans discover a larger set of hosts, as does extending the duration of passive monitoring, so it is appropriate to revise the definition of ground truth. In this section we 31 D TCP1-12h address Passive Active categorization count yes yes active server address 286 no yes idle server address 1,421 yes no firewalled address or birth 41 no no non-server address 14,553 Table 2.3: Categorization from observations of IP addresses in D TCP1-12h . D TCP1-12h D TCP1-18d − D TCP1-12h address Passive Active Passive Active Transient categorization count yes yes yes yes * active server address 37 yes yes no no * server death 6 yes yes yes no * intermittent 1 yes yes no yes * mostly idle 242 no yes * * yes idle/intermittent 99 no yes yes * no semi-idle 1,247 no yes no * no idle 75 yes no * * yes intermittent 26 yes no yes yes no birth 1 yes no yes no no possible firewall 4 yes no no no no death 3 yes no no yes no birth/mostly idle 7 no no no no * non-server address 13,341 no no yes yes yes intermittent/active 188 no no yes yes no birth 125 no no no yes yes intermittent/idle 655 no no no yes no birth/idle 73 no no yes no yes possible firewall/intermittent 140 no no yes no no possible firewall/birth 31 Table 2.4: Traits and subsequent categorization of IP addresses. define ground truth as the union of all servers discovered by active and passive methods in dataset D TCP1-18d . Though 18 days of passive monitoring may be comparable to a single active scan, passive monitoring, when compared to multiple active scans is not nearly as effective. When we compare passive monitoring against 35 active probes taken over 18 days, (summarized in the last column of Table 2.2) we see that 18 days of passive monitoring detects only 71% of all servers. 32 Though passive misses a significant number of servers, as seen during our 12-hour study in Section 2.4.1, passive monitoring finds a handful of servers before active dis- covers them, as well as servers that are never discovered by any active scan. As shown in the last column of Table 2.2, at the end of 18 days and 35 scans, 6.3% of all servers found are never found by an active probe. In our preliminary analysis we used Table 2.3 to interpret our observations from one active probe and a short passive observation. Table 2.4 extends this classification to consider the implications of our additional scans and monitoring; we next look at how each group of servers from D TCP1-12h fare with longer surveillance. In our first survey using D TCP-12h , 286 servers were found by both passive and active methods. A handful of servers die off and are never seen again by either method. Only 37 of the original servers seen by both continue to be seen by both. However, this group of 37 active servers are the most active and popular servers, responsible for serving the majority of clients and connections to our campus (Section 2.4.1). 
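The mapping from observations to categories in Tables 2.3 and 2.4 above can be expressed directly in code. The sketch below is a hypothetical helper, not our analysis code; it shows the simple 12-hour case of Table 2.3, with a note on how the extended categorization of Table 2.4 generalizes it.

```python
# Minimal sketch of the Table 2.3 categorization: each address is labeled from
# whether passive monitoring and the active scan observed a service there during
# the first 12 hours. (Hypothetical helper for illustration only.)

def categorize_12h(seen_passive: bool, seen_active: bool) -> str:
    if seen_passive and seen_active:
        return "active server address"
    if seen_active:
        return "idle server address"
    if seen_passive:
        return "firewalled address or birth"
    return "non-server address"

# Table 2.4 extends the same idea: the tuple is widened with the passive and active
# observations from the remaining 17.5 days and with whether the address lies in a
# transient block, and each distinct tuple maps to one of the categories listed there.
```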
The majority (242) of servers first seen by both approaches are not seen by future passive monitoring, suggesting that these hosts are mostly idle and happened to be overheard in the first 12 hours of monitoring. The largest group of detected servers in D TCP1-12h were the 1,421 servers seen by active but not passive observation. The majority of these servers are mostly idle servers with fixed IP addresses and 1,247 of these servers are found with passive monitoring over extended time. A few servers (75) are still missed by passive scans. A slightly larger number of servers (99) are on transient addresses, explaining their intermittent behavior. Finally, most addresses (14,553) showed no servers present in our initial 12-hour study. While most of these addresses continue to not have servers (13,341), more than 33 1000 show activity in the longer period. We highlight two categories here. First, we see a significant number of new servers, either through later passive and active, or just active. Many of these are on transient addresses (188 detected by both, and 655 by active only), but a fair number are on stable addresses (125 detected by both, and 73 by active only). Second, we see 31 possible firewalled servers on stable addresses, as indicated by their lack of response to active probing but presence of traffic. Throughout the total 18 day study, we find 35 potentially firewalled servers (4 from the first 12 hours and 31 in the remaining time). We confirm these 35 servers are running a firewall by two methods: First, if during a single scan probes to these services receive TCP RST packets from some ports, but no responses from other ports, we assume the server is dropping probes to firewalled services and sending resets from ports not pro- viding services. Second, if activity to a server is passively observed during an active scan, we assume the server is available during probing, but blocking our probes. We confirmed 32 out of the 35 servers are running a firewall with the first method. We con- firmed 10 out of the 35 servers with the second method. Only one server could not be confirmed as firewall–protected. Though firewalled services represent a small fraction of all hosts found, as discussed in Section 2.4.1, context defines how important finding these services are. Thus, if completeness is the goal, a combination of both methods is beneficial. 2.4.3 Effect of External Scans on Passive Monitoring Figure 2.3 shows several large jumps in passive server discovery (for example, at 9-20 and again at 9-23). We determined that these jumps are due to external scans of the address space—in effect, potentially malicious external parties carrying out an active 34 scan of the address space we monitor. These scans benefit passive monitoring by unveil- ing otherwise inactive servers. We next evaluate how important these external scans are to passive monitoring. We expect that external scans contribute greatly to the server discovery in passive monitoring. Unpopular or unused services may never be discovered without these kind of systematic walks through the address space. We show that without external scans, passive monitoring is significantly hindered. To remove the effect of external scans from D TCP1-18d , we identify remote hosts which scan significant portions of our campus network during the 18 day period. 
We consider scanners to be any IP address which attempts to open TCP connections to 100 or more unique IP addresses on our network within 12 hours, and receives TCP RST responses from at least 100 of these contacted hosts. Our definition of scanners is not perfect; we miss scanners whose probing is rate-limited below our threshold, or which distribute probes over multiple source IP addresses. Our definition classifies 65 external IP addresses as scanners (only 0.001% of the external IPs seen contacting campus). While a broader definition of scanner may result in a larger number of detected scanners, we have certainly identified the hosts responsible for the largest scans. We will next show that these 65 scanners significantly change the effectiveness of passive monitoring.

Figure 2.5 shows the difference between passive server discovery with and without the use of external scans. In the first 12 hours, without external scans server discovery is effectively the same as server discovery with external scans. The first scan on 9/20 helps passive discovery find over 700 new servers, bringing the total of discovered servers to 1,224. Without the aid of scans, passive monitoring takes an additional 9.5 days to discover over 1,200 servers. Within three days, passive monitoring detects over 1,300 servers. Without scans, passive monitoring takes an additional 15 days to find over 1,300 servers. At the end of 18 days, passive monitoring detects 779 (36%) fewer servers when the effect of external scans is removed.

Given the significant number of servers discovered through passive monitoring of external scans, we conclude that passive server discovery in a protected (scanner-free) environment will be significantly delayed, and likely less effective. We expect that a broader definition of scanner would further slow passive discovery, but not qualitatively change this conclusion.

2.4.4 How target type affects detection

The previous sections evaluated passive monitoring and active probing based on their ability to detect select services across a large set of university machines. In the following sections we examine how the type of server and service affects detection by passive and active methods.

Server Purpose

Passive monitoring can only detect services that have active clients. It will not find unpopular services that are listening but never receive traffic. If this is the only reason services are missed by passive monitoring, servers missed by passive monitoring are all unpopular services. We hypothesize that many of these unpopular services are actually completely inactive and often are either accidental services from a default system installation, or services of strictly local interest, such as web control for a physical device.

It is difficult to measure the popularity of a service independently of passive monitoring; by definition we see popular services, and we have no way of evaluating how unpopular missed services are. However, for the special case of web servers, the content is usually human readable, so we can manually evaluate the content of the web server. To evaluate the content of discovered web servers, we first download root web pages from all web servers discovered during the 18 days in dataset D TCP1-18d . Each web server is contacted within a day of discovery.
We then categorize these root web pages into seven categories: custom content (content that is unique and likely is globally interesting), default content (such as the Apache server test page), minimal content (fewer than 100 bytes), device configura- tion/status pages (such as JetDirect printer pages), database interface pages (such as Oracle database front-ends), pages with restricted content (log in pages) and hosts which did not respond. To categorize web pages we developed a set of 185 web page signa- tures, which contain sets of strings commonly found in specific types of web pages. For example, one of our “default content” signatures matches 14 different strings often found in the default Apache web server page. We expect that passive monitoring has no problem finding web servers serving glob- ally interesting (custom) content. Additionally, we expect that pages missed by passive monitoring fall into less interesting categories such as “default content”. It is impossible to determine the global interest for configuration pages, database front-ends and pages with log in access without specific knowledge of their use within the organization. While we suspect many of these pages are intended only for campus use, there may be a set of external users accessing these documents. Table 2.5 summarizes the content of root web pages. Passive monitoring achieves the best completeness for custom content pages finding all custom content servers. Passive monitoring finds a surprising number of web servers hosting non-globally interesting content, finding 95% of the union. Finding this many servers of non-globally interesting content is contrary to what we expect; however, if we remove web servers only found through external scans, passive monitoring finds 69% of the 504 web servers 37 Total Passive Active OR Passive Active Passive Page type (Union) AND Active Active only Passive only Custom content 170 (100%) 151 (89%) 0 (0.0%) 19 (11%) 151 (89%) 170 (100%) Not globally interesting 504 (100%) 479 (95%) 23 (4.5%) 2 (0.39%) 502 (100%) 481 (95%) Default content 493 469 22 2 491 471 Minimal content 11 10 1 0 11 10 Unknown 1,446 (100%) 798 (55%) 474 (33%) 174 (12%) 1,272 (88%) 972 (67%) Config/status pages 683 212 327 144 539 356 Database interface 61 61 0 0 61 61 Restricted content 17 17 0 0 17 17 No response 685 508 147 30 655 538 Table 2.5: Summary of content served by web servers detected. identified as serving non-interesting content. Though 69% is still a surprisingly large percent, our method for removing external scans (described in Section 2.4.3) does not remove the effects of all scanners including some web crawlers, hence many non- interesting servers are still found. There are a large number of servers (685 servers) which did not respond after initial discovery. The vast majority of these servers have transient IP addresses, and are pos- sibly unintentional default web servers on dial-up machines, or potentially intentional web servers on machines with stable IP addresses, but their web server is found by active probing the host’s VPN interface. Transient Hosts We next consider transient hosts—hosts which change IP addresses, or which are often turned on and off. We expect that passive monitoring will out perform active probing in server discov- ery when looking at transient hosts since active probing may miss hosts that come and go. 
On the other hand, we expect relatively few active services to run on transient hosts, since just as transience makes them difficult for an active scan to find, it also makes them difficult for clients to find.

Figure 2.5: Cumulative server discovery with and without the effect of external network scans.

To evaluate this hypothesis, we compute service discovery for IP addresses that we know correspond to transient hosts. Our dataset is drawn from a large campus network with known blocks of addresses allocated to VPN, PPP, Wireless and DHCP hosts. Of the 16,130 addresses, 2,296 of them correspond to transient blocks (one /22 campus DHCP; two /23s, DHCP and wireless; and one /24 subnet, for VPNs); we call this subset D TCP1-18d-trans . We then compare active and passive server discovery over this subset.

Figure 2.6 shows server discovery over time for both passive monitoring and active probing, grouped by different address space classes. Ground truth is defined by the union of passive and active discovery of each service type. We omit wireless from this graph, since unfortunately we were not able to actively probe the wireless address range. In addition, passive monitoring found no services in the wireless region.

Figure 2.6: Server discovery grouped by transience of address block.

Overall, D TCP1-18d-trans confirms the relative performance of active and passive monitoring. Active probing usually discovers more hosts than passive monitoring, except for the PPP subset where they are relatively close. This result is perhaps not surprising for transient hosts since there are likely to be relatively few active users of services that come and go. However, our analysis of transient addresses is interesting because different kinds of transient address space show somewhat different results.

The data for DHCP addresses is most similar to our general results. This similarity can be explained because the majority of the DHCP addresses are dedicated to Residence Halls, with an allocation policy where each student keeps the same IP for a full semester or more. However, for PPP addresses, passive discovery finds about 15% more servers than active. We speculate that this inversion is because PPP hosts are typically active only for short periods of time.

Another significant difference is monitoring VPN addresses, where passive discovery finds almost no services (10 after 18 days), while active finds many (nearly 100 in the same time). A possible explanation for this is that VPN hosts often have two IP addresses, one that corresponds to VPN access and another that is direct access to the Internet. While active service discovery suggests that many of these hosts run services, passive discovery says that the VPN address is very rarely used. We speculate that users of services on these hosts are typically using the non-VPN address.

Finally, our focus on transient hosts suggests that address transience is a major cause of service birth and death. We reach this conclusion because, in Figure 2.6, server discovery does not level off.
However, because actual hosts-to-address mappings are transient, this discovery may represent a small number of hosts simply moving to different addresses rather than a large number of actual hosts. If that were the case, we would expect server discovery to converge when all transient addresses were marked as servers. While address reassignment may account for some server births, it does not account for all, as we can see when we compare server discovery with and without transient hosts in Figure 2.3. We review this question when we consider very long passive monitoring in Section 2.4.2.

Protocols

In the previous section we looked at how address stability affects server discovery. In this section we look at how different services and service types affect server discovery. We expect that different services are used in different ways and so may be available to different degrees. For example, a MySQL service may be firewall protected since it is a service provided to a limited number of users, whereas a web server typically has a more global audience and will not be firewall protected. To evaluate the effects of service type, we return to D TCP1-18d , but break out server discovery by different server types. We consider four services: Web, FTP, SSH and MySQL.

Service | Total (union) | Passive AND Active | Active only | Passive only | Active (non-exclusive) | Passive (non-exclusive)
Web | 2,120 (100%) | 1,428 (67%) | 497 (23%) | 195 (9.2%) | 1,925 (91%) | 1,623 (77%)
FTP | 815 (100%) | 566 (68%) | 241 (30%) | 8 (1.0%) | 807 (99%) | 574 (70%)
SSH | 925 (100%) | 701 (76%) | 221 (24%) | 3 (0.32%) | 922 (100%) | 704 (76%)
MySQL | 164 (100%) | 78 (48%) | 79 (48%) | 7 (4.2%) | 157 (96%) | 85 (52%)
Table 2.6: Summary of server discovery broken down by service type.

Figure 2.7 shows discovery over time for both passive and active for these specific services, and Table 2.6 summarizes server discovery. Ground truth is the union of active and passive discovery in D TCP1-18d . The results for specific services confirm our overall conclusion that active probing discovers more servers than passive monitoring. Passive monitoring discovers particularly few MySQL servers, achieving only 52% completeness, while active scans reach 96% completeness. We suspect that the majority of MySQL servers on campus are used locally, with little external access, or external access only through web interfaces. In Figure 2.7 the stepped and sudden increases in passive MySQL server discovery indicate MySQL servers are probed from external sources, yet interestingly, these scans are not nearly as helpful in passive service discovery as for other services. Upon inspection of our passive and active data, we find that 63 out of the 79 MySQL servers missed by passive responded to our campus probes on 9/29, just after a large external scan probed the campus address space for MySQL servers. Though these 63 MySQL servers were probed, we observed no responses. Potentially, these missed MySQL servers block probes from external sources, but still respond to our internal active probes, hindering passive discovery from our monitoring point, but enabling active discovery.

While active probing finds nearly all FTP and SSH servers (99% and 100% respectively), passive monitoring finds significantly fewer.
This suggests that many of these servers exist but that they are infrequently used. For FTP, this result is consistent with HTTP replacing FTP as the primary means of data dissemination. We presume that FTP servers are primarily legacy servers. For SSH, this result is consistent with a workstation model of use, where nearly all hosts are available for remote access via SSH, but that protocol is used primarily for maintenance, while most workstation access is direct at the console.

Figure 2.7: Server discovery over time for passive monitoring and active probing, broken down by protocol.

These results are dependent on the particular services we examined. While we expect our basic results to hold for other general well-known services, we speculate that protocols such as peer-to-peer file sharing may be different since they are known to have a much higher server turnover rate (churn) [BDGM02].

2.4.5 Discovery of UDP Services

The majority of this work considers only discovery of TCP services because the TCP connection setup makes them easy to discover. In this section we broaden our view to consider UDP service discovery with both active probing and passive monitoring. We consider four selected UDP services: port 80 (HTTP and other applications), 53 (DNS), 137 (Microsoft Windows NetBIOS Name Service), and 27015 (common multiplayer game port). Dataset D UDP collects 24 hours of passive monitoring and one active scan, both only considering the primary /16 network at USC. The active probes are not customized to an expected application; in other words, we use generic UDP probing for active host discovery. Our passive monitoring considers any packets with the above destination ports as indicating the presence of the corresponding service on that host.

Generic UDP probing is difficult because there is no generic positive response indicating a service is present. We therefore interpret only an ICMP port unreachable as a true negative response and a UDP reply as a true positive response. If a host responds to some probes and not to others, we know the host is alive, and can then consider a lack of response as suggesting a possibly open service. We are able to make this final conclusion on the assumption that we did not generate a proper application-specific request, but most kernels generate negative ICMP responses when no service is present. Finally, if no ports give an explicit response (either positive or negative), we assume no host is present. (Nmap contains support for service-specific probing; however, we were not allowed to use that service due to potential privacy concerns.)

We expect active probing to perform well at detecting DNS and NetBIOS name servers because these two protocols are common and these servers often respond to generic UDP probes.

Service (port) | All | Web (80) | DNS (53) | NetBIOS (137) | Gaming (27015)
Passive | 37 | 0 | 32 | 4 | 1
Active, definitely open (UDP response) | 116 | 0 | 52 | 64 | 0
Active, possibly open | 4,862 | 137 | 376 | 4,238 | 111
Active, no response from any probed port | 6,359 | - | - | - | -
Active, definitely closed (ICMP response) | 9,826 | 9,687 | 9,449 | 5,572 | 9,713
Table 2.7: Summary of UDP services discovered.

Table 2.7 summarizes services discovered by passive monitoring and active probing.
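The probe-interpretation rules described above can also be summarized in code. The sketch below is a simplified illustration under an assumed per-port outcome format; it is not the probing tool we used, and the labels simply mirror Table 2.7.

```python
# Simplified sketch of the UDP probe interpretation described above (assumed record
# format, not the probing tool itself). Each probed port on a host has an outcome of
# "udp_reply", "icmp_unreachable", or "no_response".

def classify_udp_host(outcomes: dict) -> dict:
    status = {}
    host_alive = any(o != "no_response" for o in outcomes.values())
    for port, outcome in outcomes.items():
        if outcome == "udp_reply":
            status[port] = "definitely open"
        elif outcome == "icmp_unreachable":
            status[port] = "definitely closed"
        elif host_alive:
            status[port] = "possibly open"      # host responded elsewhere, silent here
        else:
            status[port] = "no response"        # no evidence any host is present
    return status

print(classify_udp_host({53: "udp_reply", 80: "icmp_unreachable", 137: "no_response"}))
```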
Of the 37 servers found by passive monitoring only one was not found by active probing, indicating that considering any traffic from these selected ports to confirm the presence of a server obtains accurate, but not complete, results. The vast majority of hosts indicated as possible UDP servers by active probing sent no response to external sources. Given the prevalence of Microsoft Windows operating systems, which use the peer-to-peer NetBIOS name server protocol, it is not surprising that a large number of hosts on campus have port 137 open. We observe only 37 UDP servers on the NetBIOS port. Though NetBIOS has the potential to generate a significant amount of traffic, under normal circumstances NetBIOS traffic does not typically cross border routers.

2.5 Sensitivity

Section 2.4 presented the general results of our work, but deployment of either passive or active measurement requires understanding of a number of parameters, including when and how frequently to perform active probes and completeness of passive observation. We evaluate these factors here to understand their effect on our general results.

2.5.1 Time and Frequency of Active Probing

Our results in Section 2.4 based on active probing rely on probes done periodically at set intervals. In this section we explore how the time of day and the frequency of these probes affect active probing service discovery.

In datasets D TCP1-18d and D TCPbreak , active probes occur every 12 hours. Each scan started daily at 11am and then again at 11pm and took 90–120 minutes to cover the address space. We expect that the time these probes were done directly affects the number of servers discovered. To evaluate the effect of probe time-of-day we re-examine D TCP1-18d . We take the full D TCP1-18d , with both passive and active discovery, as ground truth. We compare three time-of-day dependent subsets against this baseline. First, we select the 17 probes taken every 24 hours in the daytime (11am) or at night (11pm). While these subsets capture time-of-day dependence, they also halve the scan frequency and so are not directly comparable to the 35-probe dataset. We therefore also consider a third subset where we take alternating day and night measurements from each consecutive day to get an unbiased mix of 17 day and night observations.

Figure 2.8: Comparison of network scanning at different times of day.

Figure 2.8 shows cumulative server discovery over multiple probes for the baseline and three subsets. We first evaluate time-of-day dependence, looking at the completeness that scanning at night and during the day achieves. Though the difference is small, scanning during the day is marginally more effective than scanning at night: scanning only at night reduces completeness by 3%. This is not surprising since we expect that there are more transient hosts with active services available during the day. While scanning at night finds 232 servers not found by scanning during the day, scanning during the day finds 325 not found at night. These differences strongly suggest that host discovery done every 24 hours is affected by diurnal patterns. The shortfall of probing once a day may be due to the use of fewer probes.
When we compare alternating probes at day and night, we can keep 17 probes as in day- or night-only, but factor out the time of day. In this case we see performance like day-only probing. This result suggests that the number of probes is more important than capturing day-only or night-only servers. Ultimately, by reducing the probe frequency we reduce our completeness by 8% after 18 days.

2.5.2 Partial Perspectives in Passive Monitoring

Ideally passive monitoring sees all traffic to and from the monitored network. However, this complete viewpoint may be difficult to achieve at multi-homed sites. The problem can be further complicated by policy restrictions which may limit the type of traffic transferred over monitored links. In this section we evaluate two types of partial perspectives in passive monitoring: homogeneous partial perspectives, where unmonitored links carry traffic of a similar composition to monitored links, and heterogeneous partial perspectives, where unmonitored links have different policy restrictions from monitored links and therefore carry a different composition of traffic.

Our university connects to the Internet through a regional network that has three peerings with commercial ISPs; in addition we have an Internet2 connection. For most of our datasets we monitor two of the three commercial peerings of our university's regional network, and we estimate we capture 99% of all university traffic not destined to Internet2. For dataset D TCPbreak , we also monitored our university's Internet2 peering. To evaluate our partial observation of a network we would like to compare a full and a partial observation. However, as described in Section 2.3.2, we do not have a complete monitoring view. While we cannot compare a complete view of traffic to a partial view, we can look at how subsets of our observation affect completeness of our results.

To evaluate the effect of homogeneous partial perspectives we use datasets D TCP1-18d and D TCPbreak , in which we monitored two commercial links. We can then compare servers found exclusively on each link to see how the homogeneous partial perspective of monitoring only one of the commercial links, and not both, would affect our results. Table 2.8 summarizes the number of servers found from each link (and possibly on other links) as well as the number of servers found exclusively on a specific link. In both D TCP1-18d and D TCPbreak both commercial links see the majority of total servers found, with a range of 0.05–9.5% of the servers found exclusively on a single commercial link.

Link | D TCP1-18d duplicative | D TCP1-18d exclusive | D TCPbreak duplicative | D TCPbreak exclusive
Commercial 1 | 1,874 (89%) | 201 (9.5%) | 1,770 (96%) | 59 (3.2%)
Commercial 2 | 1,874 (89%) | 39 (1.8%) | 1,711 (93%) | 1 (0.05%)
Internet2 | — | — | 669 (36%) | 3 (0.16%)
All | 2,111 | — | 1,835 | —
Table 2.8: Summary of servers found on each of the three monitored links.

Given the high number of servers found on both commercial links in both D TCP1-18d and D TCPbreak (89–96%), we conclude that a homogeneous partial perspective does not greatly affect core results.

To evaluate the effect of heterogeneous partial perspectives, we use dataset D TCPbreak , in which we monitored our university's Internet2 peering as well as the two commercial links monitored in our other datasets. We can then look at service discovery over the unrestricted commercial links and compare this discovery to service discovery over the Internet2 peering, which is restricted by Internet2's academic-only policy.
The D TCPbreak columns in Table 2.8 summarize the number of servers found from our monitored commercial links as well as an Internet2 link. Though both commercial links observe most servers, the Internet2 link observes only about 36% of the servers in D TCPbreak . We conclude that policies placed on monitored links can strongly affect service discovery with passive monitoring, though the effect is greatly dependent on the restrictiveness of the policy. From these results we can also conclude that, given the very small number of servers seen exclusively on Internet2 (3 servers total), an addition of Internet2 data to our main datasets (D TCP1-18d and D TCP1-90d ) would not greatly affect our results or change our conclusions.

2.5.3 Passive Monitoring with Sampled Observations

Our observation system is able to collect and process a complete trace because our link speeds are fairly low (1 Gb/s), we only collect packet headers (64 B/packet), and we only process TCP packets with SYN-ACK flags set. However, passive monitoring becomes hard at very high bitrates, such as a 10 Gb/s link speed, or when shifting to deeper packet inspection. An alternative to collecting a complete packet header trace is to sample packet headers and observe only a fraction of the traffic on a link. In this section we explore the effect of using various sampling durations on service discovery in passive monitoring.

There are several possible approaches to sampling: observing and then idling for a fixed period of time, collecting a fixed number of packet headers and then idling, or collecting each packet header with some (non-unity) probability. These approaches are increasingly amenable to higher speed or hardware realizations. Here we consider only sampling for fixed durations; evaluation of other kinds of sampling is left as future work.

We return to dataset D TCP1-18d to evaluate the effects of fixed-period sampling. In Figure 2.9 we sample data from the first 2, 5, 10 and 30 minutes of each hour (3%, 8%, 16%, and 50% of the data, respectively) and compare how each sample duration affects service discovery throughout the 18-day trace period. As in previous sections, we define ground truth as the union of servers found both passively and actively throughout the full dataset D TCP1-18d , then evaluate sampled data for completeness against this ground truth.

Figure 2.9: Cumulative server discovery with different duration, fixed-period sampling.

As expected, capturing a greater portion of the data provides a closer match to a complete observation. However, the relationship between sampling and coverage is not linear: capturing only 50% of the data does not require doubling the observation period to get the same results. In fact, sampling at 30-minute durations is almost as effective as monitoring continuously, with only a 5% drop in the number of servers discovered over 18 days. Capturing only 16% of the data results only in an 11% drop in discovered servers. The relationship between sampling duration and cumulative discovered servers is not directly proportional primarily because of the effect of external scans. As described in Section 2.4.3, external scans are important to the completeness of passive monitoring.
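Before returning to the role of external scans, a hedged sketch of the fixed-duration sampling evaluated above may help: the filter keeps only packet headers that arrive in the first N minutes of each hour. The record format and field names are assumptions for the example, not our collection pipeline.

```python
# Hedged sketch of fixed-period sampling: keep a packet header only if it arrives in
# the first `sample_minutes` of each hour. Timestamps are assumed to be UNIX seconds;
# this is an illustration, not the observation system described in Section 2.5.3.

def in_sample_window(timestamp: float, sample_minutes: int) -> bool:
    minute_of_hour = (int(timestamp) % 3600) // 60
    return minute_of_hour < sample_minutes

def sample_trace(packets, sample_minutes=30):
    """Yield only the packets that fall inside each hour's sampling window."""
    for pkt in packets:
        if in_sample_window(pkt["ts"], sample_minutes):
            yield pkt

# Example: a 30-minute window keeps roughly 50% of a uniform trace, matching the
# largest sampling duration compared in Figure 2.9.
```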
Since scans are often rapid but short, whether or not a scan is caught in sampled observation affects the coverage of that observation. Full and 30-minute samples both are greatly aided by scans on 9-20 in Figure 2.9, while the servers found in this scan are likely found by a different scan on 9-22 for the 5- and 10-minute samples.

2.6 Related Work

Beyond qualitative evaluations in trade publications [Sch05], there has been little evaluation of passive service discovery in the research community and even less comparing passive methods to periodic active scans for service discovery. Closest to our work is Webster et al. [WLZ06], where they passively monitored 800 workstations and servers located in a network demilitarized zone (DMZ) for a period of 86 days. All 800 hosts were actively probed twice: once ten days before the beginning of the passive study and once more at the end. Our work was independently developed and differs in many ways. We perform multiple active probes periodically during passive monitoring. Our host population is much larger and much more diverse, including multiple types of transient and dynamic hosts. Thus our study includes many additional dimensions such as the effectiveness of active discovery over multiple scans initiated at different times of the day, the effectiveness of both passive and active techniques for discovering services on transient and dynamic hosts, and the unintentional effects of external scans. Our study also covers additional metrics for measuring completeness, including a popularity metric derived independently from passive measurements, and the effects of sampling on passive monitoring.

There has been considerable interest in passive monitoring, leading to a number of widely used tools. De Montigny-Leboeuf et al. discuss how a variety of information can be obtained through passive monitoring and how this information can be used to aid in policy enforcement and intrusion detection [MLM04]. Tools such as P0f (Passive OS Fingerprinting) rely on examination of packet content (although they can also be used in active mode). Intrusion detection systems such as Bro [Pax99] and Snort [sno] rely mostly on passive monitoring to maintain situation awareness. Dayioglu et al. discuss how intrusion detection systems can benefit from using a hybrid approach of both active and passive methods [DÖ01]. Examples of hybrid approaches include Prelude, a hybrid IDS framework that combines a large number of other tools (e.g., Snort and Nessus), and Ettercap, a suite of attack tools for man-in-the-middle attacks. This work offers important insight into the power of passive monitoring, and our approach could benefit from their sometimes more sophisticated forms of monitoring. However, our work adds to this body of work a quantitative comparison between passive and active methods.

For our work we used Nmap [nma] to perform active probing, but there are a number of other network scanning tools available. Popular scanners such as Nessus [nes] offer a large number of tools to assess services and identify specific vulnerabilities in a network. Though all of these tools, including Nmap, offer optimizations and vulnerability identification not studied in this body of work, the core principle of active probing remains the same and our work can capitalize on better methods of active probing as they become available. Our work complements these tools, however, by indicating cases that passive monitoring can miss, such as transient hosts.
Finally, passive monitoring has been widely used for traffic analysis and model- ing (some examples include traffic engineering [DG00], web [SHJO01] and peer-to- peer [GDS + 03] workloads, and model parameterization [LH02]). Our work differs from this work in that we explore service discovery rather than modeling or analysis of a par- ticular service’s traffic. Our passive monitoring shares with this work the same set of questions about completeness when monitoring is only partial. 2.7 Conclusions Service discovery is vital for protecting and administrating networks across organiza- tional boundaries, as well as monitoring and researching growth trends. Often, con- straints such as time and privacy concerns limit the frequency of active scans or the 53 duration of passive monitoring, and it is important to understand how these constraints affect results. In this work, we quantified a variety of factors that directly impact passive and active service discovery. We have shown that passive and active service discovery are complimentary methods for discovering services on a network. While active discovery finds servers without relying on client activity, it misses services not available at the time of probing and those which actively block probes. Passive discovery quickly finds very popular services, even if these services are protected by firewalls. Over time, passive discovery is able to find intermittent and protected services that are missed by active probing. Interestingly, this process is greatly aided by external, potentially malicious scans. Service discovery is a central and key part of network reconnaissance. Through our comparison study we demonstrate that passive methods clearly enable network recon- naissance in this important area, and provide results which cannot be obtained through active methods alone. This comparison study provides a strong example within the solution space for network reconnaissance (discussed in Section 1.1), and supports our thesis statement by giving a powerful example of passive methods performing network reconnaissance without deep-packet inspection. Both our passive and active discovery methods refrain from using deep-packet inspection, and rely on port numbers to pro- vide application information. Even without deep packet inspection, our passive and active methods provide comprehensive and essential information for providing for and protecting a network. In addition to being direct proof that passive techniques enable network reconnais- sance, the conclusions drawn from this study indicate the usefulness of passive tech- niques outside of service discovery. The depth of information provided by passive techniques as well as its ability to find intermittent communication applies not just to 54 performing reconnaissance on servers, but also reconnaissance on peers and clients— regardless of the application target. For example, studying the popularity of a service (Section 2.4.1) gives information not just on the service, but the clients of the service as well. We can extrapolate the results of this study and see that passive techniques are beneficial in an ambit of targets ranging from a specific network application to a diverse set of applications. By quantifying the strengths and pitfalls of passive techniques in service discovery we can now intelligently apply passive techniques in other areas of reconnaissance. 
In Section 2.4.4, we briefly discussed how discovery of peer-to-peer communication may differ from traditional service discovery due to the high turn over rate peers. In the next chapter, we look in depth at targeting peer-to-peer communication and the success of passive methods at identifying such communication. 55 Chapter 3 Identifying Peer-to-Peer: An Example of a Specific Target Previously we described the solution space for network reconnaissance (Section 1.1). One of the three dimensions of the 3D network solution space is the target type, which ranges from general to specific. In the previous chapter, we looked at service discovery, which targets all available services on a network. While this set of targets is relatively general—since services is a broad category of applications—in this chapter we look at targeting a more specific class of applications—peer-to-peer file sharing (P2P) applica- tions. Both studies test the effectiveness of passive and blind techniques, but this study demonstrates the effectiveness of passive and blind techniques in identifying a more specific set of applications, rather than a general class of applications. The red area in Figure 3.1 depicts how this study of identifying P2P explores the solution space dis- cussed in Section 1.1. Through this study we demonstrate the validity of our thesis in this new area of network reconnaissance—the area of specific reconnaissance targets. Peer-to-peer file sharing is of particular interest in network reconnaissance for three main reasons. First, since peers typically accept connections and obtain data from unau- thenticated sources, every peer on a network is a potential security vulnerability. Also, file sharing can demand a significant portion of network bandwidth, so understanding 56 Figure 3.1: 3D depiction of how Chapter 3 explores the network reconnaissance solution space. the amount of peer sharing which occurs can be necessary for monitoring and plan- ning for a network. Last, organizations may maintain restrictive policies regarding file sharing, so monitoring peer-to-peer activity may be necessary for policy auditing. In this study we also explore a small and secondary area of the solution space. In part of our validation for our passive identification of P2P we use an active probing method targeted at a specific P2P program—BitTorrent. While exploring this area of reconnais- sance does not directly support our thesis, it does help quantify the effectiveness of our passive techniques. 57 We again support our thesis by showing that blind and passive techniques can func- tion effectively in an important area of network reconnaissance. 3.1 Introduction Identifying and filtering network traffic is central to firewalls and intrusion-detection systems. The majority of these systems deployed today use ports or packet signatures to classify traffic for filtering. While fast and effective for typical traffic, these approaches are becoming less and less effective because both ports and packet contents are easy to conceal, either intentionally or accidentally. We see three reasons for a greater need to identify network applications by packet header information alone, rather than packet payload. First, benign traffic often varies its port usage and packet contents. For example, traffic using remote-procedure calls, multiplexed protocols such as SOAP, or UDP-based protocols (like NSF or SIP) often varies port usage and communicates ports out-of-band. 
An increasing use of traffic encryption hides packet-contents, both with network-level approaches like IPsec, and application-level tunnels like ssh or TLS. Second, malware and protocols that receive mixed acceptance often intentionally hide identity by varying port usage and packet contents. Protocols such as Skype and P2P file sharing often hide themselves out of concern for restrictive use policies in some networks. Finally, even when traffic is not accidentally or actively concealing itself, ISP pol- icy concerns sometimes prevent analysis of data packet contents. For example, in the United States, laws about student privacy and wiretapping can be interpreted to preclude analysis of packet data contents. 58 The goal of this work is to identify network applications based on their inherent char- acteristics without considering packet contents. We therefore distinguish application behaviors that are easily changed or incidental, from those behaviors that are inherent and would incur a performance penalty or require application restructuring to change. In this sense, we are investigating blind techniques to identify applications [KPF05]. To identify inherent behaviors, we take a top-down approach, first identifying an application’s goals, then the network behaviors implied by these goals, and finally spe- cific metrics that capture these behaviors. We evaluate our approach by considering two popular P2P file sharing applications: BitTorrent and Gnutella. We evaluate our detection methods with two full day traffic traces taken from a regional ISP in 2005 and 2006, and compare our detection rates to ground truth obtained by manual analysis of the data. After evaluating our methods, we use our methods and more standard port-based methods for P2P detection, to study P2P usage at the University of Southern California. The contribution of this work is two-fold. First we identify and evaluate several metrics that are applicable to blind identification of multiple types of P2P file sharing applications. Second, we evaluate the usage of P2P in a university environment. We show that the metrics we identify can detect hosts running BitTorrent applications with an 83% true positive rate with a 2% false positive rate and detect hosts running Gnutella with a a 75% true positive rate with a 4% false positive rate. Of the P2P peers caught by our system, 75% required less than 10 minutes of trace data to determine P2P activity. Lastly, we show that previous estimates of P2P usage are on the high-side and P2P usage on our campus ranges from 21%-33% by traffic volume with only 3-13% of campus hosts participating in P2P sharing. 59 We suggest that our approach to identifying inherent behaviors will be applicable to other protocols. 3.2 Related Work There are three general areas of related work: detection based on network port usage, packet payload, and traffic behavior. Port-based detection works well for applications that use well-known port numbers. While it does not depend on packet payload, port usage is an incidental behavior that can be easily changed. Many applications, including most P2P applications, already include mechanisms allowing the user to change port assignment. Payload-based detection uses signatures derived from application-specific packet contents. Examples of techniques for P2P detection include by Sen et al. [SSW04] and Karagiannis et al. [KBB + 05]. 
A disadvantage of payload detection techniques is that signatures must be discovered in advance, either manually or automatically, and updated frequently. Once signatures are discovered payload detection is effective. However, they rely on incidental behavior because applications can easily alter packet contents to foil signatures, and do not work when payload is encrypted. In addition, they can not be used in environments where privacy concerns prohibit payload examination. Port- and payload-based signatures are widely used today. Unfortunately, both rely on relatively ephemeral behaviors: port assignments are easily changed, and payload contents can be hidden by encryption or randomization. We therefore do not consider these approaches further. An alternative is to detect based on network behaviors such as an application’s packet trace and communications pattern, since these can be often more difficult to conceal. Karagiannis et al., identify P2P traffic from connection patterns and the concurrent use 60 of UDP and TCP [KBFk04]. Constantinou and Mavrommatis classify P2P traffic based on connection direction and number of peers in a connected group [CM06]. In later work, Karagiannis et al. introduce BLINC [KPF05], a general classification mechanism that classifies hosts based on protocol usage, port usage and connection patterns. These methods rely on behavior that is inherent to P2P applications. While our approach is similar in that it uses inherent behavior, our metrics require significantly less state than the methods used in this previous work. Further more, we quantify our on-line detection time. We differ from BLINC further in that our metrics are selected to be specific to P2P file sharing, and work by drawing out the inherent characteristics which are unique to P2P. Closest to our work is that by Collins et al. [CR06] who distinguish BitTorrent flows from FTP, HTTP and SMTP flows between pairs of hosts. They study three metrics: packet size (looking for small control messages), amount of data exchanged between hosts, and rate of failed connections. We do not consider packet size to be an inher- ent metric since it is easily spoofable. The later two metrics are inherent, and we have independently determined that failed connections are an important indicator of P2P traf- fic. Our work differs from theirs through the addition of two other inherent behaviors (ratio of incoming-to-outgoing connections and privileged-to-non-privileged ports); by demonstrating that this approach applies to multiple kinds of P2P traffic, not just BitTor- rent; and by demonstrating that our approach can operate on-line rather than post-facto. 3.3 Inherent Behaviors in P2P In this section we investigate three behaviors inherent to P2P applications. In Section 3.4 we map these behaviors to specific metrics for detection. 61 0 200 400 600 800 1000 1200 1400 Time (in seconds) BitTorrent Active Incoming Connections Outgoing Connections Successful Failed Failed with RST Figure 3.2: BitTorrent Peer 0 200 400 600 800 1000 1200 1400 Time (in seconds) Gnutella Active Incoming Connections Outgoing Connections Successful Failed RST Figure 3.3: Gnutella Leaf Peer Our target applications are BitTorrent and Gnutella as specific targets. Both are file sharing protocols described in detail elsewhere. For our purposes, the important characteristics of BitTorrent is that a peer typically contracts a tracker to find out about other peers. It then directly contacts many peers (often 20) to exchange pieces of those files. 
Gnutella instead uses a two-tier system of leaf peers and ultrapeers. Leaf peers typically talk only to ultrapeers, while ultrapeers communicate widely with both each other and leaf peers. Figures 3.2 and 3.3 show new TCP connections for both BitTorrent and Gnutella. The x-axis represents time in seconds, while each connection is numbered sequentially on the y-axis. Symbols indicate when a connection is successfully started (a square) and when they fail to connect (a plus) or are terminated with a RST (an X). Figure 3.2 shows a BitTorrent client performing a partial BitTorrent download of a large ISO image over 62 15 minutes. Figure 3.3 depicts the connections made from a Gnutella leaf node that performs several searches and partial downloads over 1240s. Since the Gnutella peer was a leaf peer that shared no files, there were no incoming TCP connections. 3.3.1 Peer Coordination and Failed Connections P2P file sharing is effective because peers share with each other directly rather than only contacting centralized servers. Since peers are end-user machines, there is considerable churn as they come and go frequently [BDGM02]. Mechanisms which track the pres- ence of peers do so imperfectly, and this information is quickly out of date when given to a new peer. As a result, an inherent behavior of P2P sharing are many failed attempts to contact peers that have left the network. At the network level, these failed contacts result in TCP RST messages from a busy or no-longer participating peer, or in multiple SYN packets attempting to start a con- nection and timing out. We see both these behaviors in Figure 3.2: at time 0 four hosts out of ten send a reset and three hosts do not respond at all. At time 680s we observe the peer that we monitor attempt to replace a departed neighbor and there is another burst of failures. The same behavior occurs in Gnutella, where Figure 3.3, shows sev- eral attempted downloads (e.g. at 700s and 900s) In four out of ten of these attempted downloads, two or more sources never respond or sent a TCP reset. This behavior is not only common to P2P traffic, but relatively uncommon among more traditional client/server applications. In client/server protocols, the servers are often well known and persistent. Failures are usually due to misconfiguration or hard- ware failure and there are not usually small clusters of failures. 63 3.3.2 Bidirectional Connections P2P applications not only start connections with peers, but each peer attempts to main- tain this network independently. Since peers are equivalent, this means each initiates and receives new connections. Client/server hosts instead primarily either initiate con- nections (clients) or receive them (servers). Thus, unlike client-server applications, an inherent behavior of many hosts in a P2P application is a balance of both incoming and outgoing connections. We can see this in our samples, where in Figure 3.2, the BitTorrent peer makes 44 outgoing connections and accepts 30 incoming connections. Gnutella’s two-level architecture differs, and leaf nodes initiate connections (in Figure 3.3 the leaf starts 5 connections to ultrapeers at time 0), but ultrapeers maintain both incoming and outgoing connections. 3.3.3 User Accessibility P2P file sharing applications are typically user-level processes operating on a variety of platforms and user environments, using unprivileged ports. 
Thus, a P2P file-sharing connection will typically have source and destination ports above 1024, unlike server applications such as mail and web servers, which typically use well-known privileged ports. In Figure 3.2, out of 44 peers contacted by the BitTorrent peer, none were listening on a privileged port. Additionally, out of the 143 unique peers suggested by the tracker, only one was listening on a privileged port. All connections in the trace used unprivileged source ports. In Figure 3.3, out of 119 remote peers contacted by the Gnutella peer, none were listening on a privileged port. As with BitTorrent, all connections in the trace used unprivileged source ports.

3.4 Implementation

The previous section outlined three P2P file sharing application behaviors which are identifiable at the network level. In this section we translate these behaviors into specific, testable metrics and describe how they can be used to perform on-line detection. Given the metrics and corresponding tests, we re-evaluate the status of each host as new associated flows appear, looking for whether that host is P2P or non-P2P. We describe this process in detail in Section 3.4.3.

3.4.1 Translating Behaviors to Metrics

Peer coordination and failed connections: As discussed in Section 3.3.1, coordination with other peers often corresponds to bursts of failed connections. We capture this behavior with the following ratio of failed connections:

M_PC = failed_out / (successful_out + failed_out)

where failed_out is the total number of new outgoing connections that fail and successful_out is the total number of new outgoing connections that were successfully established. Values of M_PC tend to be low for normal clients and servers, medium (0.1–0.8, our thresholds) for P2P hosts, and high (more than 0.8) for hosts doing port scans.

Bidirectional connections: As discussed in Section 3.3.2, P2P clients both initiate and receive new connections. To capture this behavior we use the following ratio of bidirectional connections:

M_BC = successful_in / (successful_out + successful_in)

where successful_in and successful_out are the total numbers of new, successfully established, incoming and outgoing connections. The metric M_BC will be close to 1 for servers and close to 0 for clients, and we consider values between 0.2 and 0.9 indicative of P2P hosts.

User accessibility: As discussed in Section 3.3.3, although the individual port number varies, P2P clients connect to unprivileged ports, while other clients connect to standard servers on privileged ports. Thus we define:

M_UA = successful_user2user / (successful_in + successful_out)

where successful_user2user is the number of successful connections which have both source and destination ports in the unprivileged range, and successful_in + successful_out is the total number of new connections at that host which were successful. For clients and servers, the expected value of the ratio M_UA is near 0. Hosts doing user-level P2P run closer to 1; we consider any value over 0.2 to indicate a potential P2P host.

3.4.2 Metrics to Tests

We must now map the individual ratios M_X into binary tests that confirm or disclaim P2P traffic on a host. P2P traffic corresponds to medium values of each ratio, so we define high and low thresholds h_X and ℓ_X. In general, high values indicate non-P2P behaviors (such as port scanning), so exceeding h_X terminates the test, marking the host as non-P2P. Low values often occur when a new host appears, so we consider values below ℓ_X as inconclusive. A small sketch of how these per-host ratios can be computed appears below.
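This sketch is illustrative only: the counter names, the dataclass, and the code are ours and are not the implementation described in this dissertation; the threshold ranges are the ones quoted above.

# Minimal sketch (not the dissertation's implementation) of the per-host
# ratios from Section 3.4.1, computed from counters that can be kept on-line.
from dataclasses import dataclass

@dataclass
class HostCounters:
    failed_out: int = 0            # new outgoing connections that never completed
    successful_out: int = 0        # new outgoing connections that completed
    successful_in: int = 0         # new incoming connections that completed
    successful_user2user: int = 0  # successful connections with both ports > 1024

def m_pc(c: HostCounters) -> float:
    """Peer coordination / failed-connections ratio."""
    total = c.successful_out + c.failed_out
    return c.failed_out / total if total else 0.0

def m_bc(c: HostCounters) -> float:
    """Bidirectional-connections ratio."""
    total = c.successful_out + c.successful_in
    return c.successful_in / total if total else 0.0

def m_ua(c: HostCounters) -> float:
    """User-accessibility ratio."""
    total = c.successful_in + c.successful_out
    return c.successful_user2user / total if total else 0.0

# "Medium" values suggest P2P; the low/high cutoffs below are the ranges quoted in the text.
LOW  = {"pc": 0.1, "bc": 0.2, "ua": 0.2}
HIGH = {"pc": 0.8, "bc": 0.9, "ua": 1.0}  # the text gives no upper cutoff for M_UA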
Values in between the thresholds, after a warm-up number of connections, positively indicate a P2P host. Thus we define a test T_X as:

T_X = indeterminate, if M_X < ℓ_X;
T_X = p2p, if ℓ_X ≤ M_X < h_X;
T_X = non-p2p, if h_X ≤ M_X.

While each metric by itself corresponds to a specific P2P behavior, we found individual metrics to be noisy. We therefore test multiple metrics in parallel. A negative P2P result from any metric disqualifies a host, while a positive result from all metrics is required to flag the host as P2P.

Typically, we evaluate each metric over all connections over a sliding time window until we get either positive or negative confirmation of P2P activity. However, failed connections captured by M_PC primarily occur at the beginning of a P2P session. When combining multiple metrics, M_PC often triggers before the others, but then can be "washed out" by the time the other metrics trigger. We therefore define a "sticky" equivalent, M_sPC, which remains indicative of P2P traffic provided no metric has indicated non-P2P and the host was flagged as P2P over the last window of time.

3.4.3 System Operation

Our system runs on top of a continuous network tracing infrastructure [HBP+05]. We transform the packet-level trace into a flow-level trace by observing only the TCP SYN and SYN-ACK packets. We identify failed connections by four or more duplicate SYNs, and compute the ratio of incoming connections and privileged/non-privileged ports by looking at all flows to a particular destination. We process the data sequentially, on-line, evaluating the metrics for each source IP address once we have 10 connections or more. If at any time the metrics indicate a positive or negative result we classify the host as P2P or non-P2P and then discard any remaining information in the time window pertaining to that host. Otherwise, we continue to acquire information about that host until we reach a conclusion, or until flows time out after the configurable sliding window of time, currently set at 20 minutes.

The pseudo code in Algorithm 1 summarizes the steps taken each sliding time window.

Algorithm 1 Steps done each sliding window to process TCP connection records ordered by start time of the connection.
1: ClearHostStructure()
2: CR = ReadNextConnectionRecord()
3: while CR.StartTime < Window.EndTime do
4:   ThisHost = CR.USCHost
5:   if ThisHost ∉ FlaggedAsP2P and ThisHost ∉ FlaggedAsNotP2P then
6:     UpdateHostStructure(ThisHost, CR)
7:     HR = HostStructureRecord(ThisHost)
8:     if HR.ConnectionCount > RequiredMinConnections then
9:       {If we have seen a minimum number of "warm-up" connections, attempt to make a decision.}
10:      Result = PerformTests(HR)
11:      if Result = FlagAsP2P then
12:        Add ThisHost to FlaggedAsP2P
13:      else if Result = FlagAsNotP2P then
14:        Add ThisHost to FlaggedAsNotP2P
15:      else
16:        {No decision was made}
17:      end if
18:    end if
19:  end if
20:  CR = ReadNextConnectionRecord()
21: end while

Data Set Collection Times (local) Total Hosts BitTorrent Hosts Gnutella Hosts 2005 (Aug. 31) 2:00–23:59 10,415 251 n/a 2006 (Oct.
3) 0:00–23:59 9,656 132 160 Table 3.1: Summary of Data Sets metric: M PC M BC M UA M sPC+BC M sPC+UA M all Total unique hosts: 9,656 P2P hosts : 290 Known BitTorrent hosts: 130 True Positives 110 (85%) 114 (88%) 120(92%) 108 (83%) 109 (84%) 108 (83%) False Negatives 20 (15%) 16 (12%) 10 (8%) 22 (17%) 21 (16%) 22 (17%) Known Gnutella hosts: 160 True Positives 123 (77%) 109 (68%) 155 (97%) 93 (58%) 120 (75%) 91 (57%) False Negatives 37 (23%) 51 (32%) 5 (3%) 67 (42%) 40 (25%) 69 (43%) Other Hosts : 9,366 Likely non-P2P: 4,075 False Positives 530(13%) 1,018(25%) n/a 81(2%) n/a n/a True Negatives 3,545(87%) 3,057(75%) n/a 3,994(98%) n/a n/a Discarded as likely-P2P: 608 Unclassified hosts: 4,683 Flagged as P2P 702 (15%) 1,639(35%) 1,592(34%) 140(3%) 187(4%) 70(1%) Not flagged as P2P 3,981(85%) 3,044(65%) 3,091(66%) 4,543(97%) 4,496(96%) 4,613(99%) Table 3.2: Summary of results of BitTorrent detection for 2006 Data Set. 3.5 Evaluation We next evaluate our approach to determine how detection accuracy interacts with false positive rates. Our evaluation uses network packet traces from two network taps at Los Nettos, a regional ISP in the Los Angeles area serving both commercial and academic institutions. We collected two datasets, each about 24 hours long, August 31, 2005 [USC05] and October 3 2006 [USC06c].We see qualitatively similar results for both traces and present only the 2006 data here due to space constraints. 3.5.1 Detection Accuracy for BitTorrent We first look at detection accuracy to verify that our approaches do successfully identify P2P traffic. To establish ground truth, we classify some hosts as known BitTorrent hosts first by identifying flows on the default BitTorrent tracker port (6969). We then manually verify 69 that the destination was a BitTorrent server by contacting it ourselves within several hours of the trace collection. The Known BitTorrent section of Table 3.2 shows the 130 hosts we identified. We first observe that each individual metric is successful at detecting the majority of known BitTorrent hosts (85–92%), and that M UA detects the most. Our 2005 trace (not shown here due to space) shows similar qualitative results. Among the individual metrics, M UA appears to be the best, with both low false pos- itives and false negatives. However, this advantage is an artifact of our ground truth— since non-BitTorrent traffic (described in the next section) always includes some priv- ileged ports, this metric is artificially perfect. Unfortunately, it will also consistently classify any applications that use only non-privileged ports as P2P, including direct user-to-user chat programs and games. Our simple definition of ground truth places these applications in the unknown category and therefore excludes them from this anal- ysis. However, we suggest that UA is useful in M all because it helps eliminate the few client-server applications that would be detected as a false positive by M sPC+BC . Finally, we observe that the combined metrics M sPC+BC and M all perform almost as well as the stand-alone metrics at detecting true positives (83% and 84% compared to 85–92%), and the combined metrics perform much better in reducing false positives (2% instead of 13–25%). 3.5.2 Understanding false positive rate Even if the system performs well at detecting P2P hosts, it will not be useful if it also falsely tags many non-P2P hosts as P2P. We therefore evaluate the false positive rate of individual and combined metrics. 
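To make the rate bookkeeping behind Table 3.2 concrete, the following small sketch (ours and illustrative only; the set names are hypothetical) shows how true and false positive rates can be tabulated once a metric's flagged hosts and the ground-truth host sets are known.

# Illustrative sketch of rate tabulation (not from the dissertation's tools).
# flagged: hosts a metric labeled P2P; known_p2p and likely_non_p2p: ground-truth sets.
def rates(flagged, known_p2p, likely_non_p2p):
    tp = len(flagged & known_p2p)        # known P2P hosts correctly flagged
    fp = len(flagged & likely_non_p2p)   # likely non-P2P hosts incorrectly flagged
    return {
        "true_positive_rate": tp / len(known_p2p) if known_p2p else 0.0,
        "false_negative_rate": 1 - tp / len(known_p2p) if known_p2p else 0.0,
        "false_positive_rate": fp / len(likely_non_p2p) if likely_non_p2p else 0.0,
        "true_negative_rate": 1 - fp / len(likely_non_p2p) if likely_non_p2p else 0.0,
    }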
70 Although it is easy to confirm the presence of known P2P traffic to a host, it is signif- icantly more difficult to prove absence of P2P traffic. To establish a rough body of non- P2P hosts, we first remove all known P2P hosts from our population and select half of the remaining hosts. (We will use the other half in Section 3.5.4.) While these hosts include no known P2P hosts (those running on well known P2P ports), there may be some hosts using non-standard ports. We assume these non-standard ports are non-privileged, so we therefore discard 608 hosts that have only non-privileged-to-non-privileged ports as “potential P2P”. We label the remaining hosts as likely non-P2P. This decision is con- servative since we know that it is possible to tunnel P2P traffic over well-known ports (such as web port 80). We use this set of likely non-P2P hosts to look for false positives in our metrics. We expect that individual metrics will have some number of false positives: port scanners and misconfigured machines or servers can accidentally trigger M PC , and some services that have bidirectional traffic (such as DNS) and user machines that host some servers can trigger M BC . Note that we cannot consider M UA with this methodology because our definition of likely non-P2P distorts this metric. The likely non-P2P section of Table 3.2 shows these results. Individual metrics show moderate-to-high false positive rates (13–25%). Because the number of likely non- P2P hosts is so much larger than the number of known P2P hosts, these false positive rates imply 5–10 errors for every true positive. Such high false positive rates mean that an individual metric is impractical without additional confirmation. Examination of specific traces suggest that many M PC failures are due to false identification of port scans as P2P. We examined a few cases of M BC failure; they were typically due to user hosts that also run small server applications. 71 Our hope is that combining multiple metrics can reduce the false positive rate, and M sPC+BC shows a false positive rate of only 2% rather than 13–25%. This success is because the false positives are triggered by different circumstances. Combining all three metrics in M all eliminates all false positives, but as described above this is an anomaly due to our definition of likely-non-P2P. From our evaluation of true and false positives we conclude that the combined met- rics are essential to get good accuracy and few false positives. The combined metrics show only a few percent reduction (2–5%) in detection accuracy for BitTorrent (although a larger drop for Gnutella, 19%, when using all metrics), while the percent of false pos- itives is cut in four. 3.5.3 Effectiveness for Gnutella Since our detection methods are based on behaviors of P2P applications in general, and not specific to the BitTorrent protocol, we expect that our system is capable of detecting hosts running other P2P applications. To test this claim we next evaluate our approach on Gnutella hosts. We establish Gnutella ground truth as all hosts that contact known Gnutella ultra- peers. We track Gnutella ultrapeers by joining the Gnutella network repeatedly on the day of trace collection and recording lists of the suggested ultrapeers. Some protocol differences between BitTorrent and Gnutella affect our metrics, how- ever. We expect that metrics M PC and M UA will perform well at detecting Gnutella, but because of Gnutella’s two-tiered architecture, M BC will not perform nearly as well. 
The known Gnutella section in Table 3.2 shows that M PC alone detects 77% of the Gnutella hosts. However, as discussed in Section 3.5.2, this metric alone has a high false 72 positive rate and so we need combined metrics to reduce false positives. We see that M all is still fairly effective at detecting Gnutella, detecting 57% of the known Gnutella hosts. We observed that M BC does not work well with Gnutella because only ultrapeers are bidirectional, not Gnutella leaf nodes unless a leaf peer is uploading. We observe that M sPC+UA detects nearly as many true positives as M PC alone (75% vs. 77%), but significantly decreases false positives. For networks where Gnutella is very prevalent, this metric may be preferable to M all . 3.5.4 Estimating previously undetected P2P hosts Our above analysis isolated traffic into known-P2P and likely-non-P2P categories to study the accuracy of our approaches. We next look at unclassified traffic to estimate how many hosts appear to be P2P file sharing but escaped identification as known-P2P. To estimate P2P traffic in unclassified traffic, we start with the half of hosts not considered above. These hosts exclude all traffic to known trackers, so they all would be unclassified by detection schemes using known sites. We then run our detection algorithms on these hosts and examine the hosts flagged as P2P. The unclassified section of Table 3.2 shows our estimate of P2P traffic in this sample of hosts. Our combined metrics flagged 1.5% (70 hosts) of the unclassified hosts as running P2P applications. Our analysis of the false positive rate of M sPC+BC suggests that at least half of these are true positives. Although we do not know the likely true positive rate for M all , it should be greater since M all reduces further the number of flagged hosts by including metric M UA . To confirm that some of these 70 hosts have P2P traffic we looked at what ports they use. Of these 70 hosts, 17 made connections to remote hosts on default BitTorrent ports (6969, 6881–6888) and 15 made connections to remote hosts on the default Gnutella 73 metric: M PC M BC M UA M sPC+BC M sPC+UA M all Total unique hosts: 10,415 P2P hosts : 251 Known BitTorrent hosts: 251 True Positives 210 (84%) 219 (87%) 225 (90%) 201 (80%) 204 (81%) 201 (80%) False Negatives 41 (16%) 32 (13%) 26 (10%) 50 (20%) 47 (19%) 50 (20%) Other Hosts : 10,415 Likely non-P2P: 4,378 False Positives 525 (12%) 1,313 (30%) n/a 175 (4%) n/a n/a True Negatives 3,853 (88%) 3,065 (70%) n/a 4,203(96%) n/a n/a Discarded as likely-P2P: 704 Unclassified hosts: 5,082 Flagged as P2P 711 (14%) 1,677 (33%) 1,575 (31%) 101 (2%) 152 (3%) 50 (1%) Not flagged as P2P 4,371 (86%) 3,405 (67%) 3,507 (69%) 4,981 (98%) 4,930 (97%) 5,032 (99%) Table 3.3: Summary of results of BitTorrent detection for 2005 Data Set. port (6346), strongly suggesting that we successfully found true P2P traffic. (If the host had contacted a known tracker or ultrapeer we would have already classified it as known-P2P.) We conjecture that some of the other hosts were doing P2P sharing on non- standard ports, although we could not confirm that at the time. Finally, our analysis of these unclassified hosts sheds some light on the benefit of adding M UA to form M all — this addition reduces the number of hosts identified as potential P2P in half (70 vs. 187). Of the 17 hosts just described none are eliminated, suggesting (but not proving) that M all does not reduce the true positive rate. 
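The port cross-check just described can be sketched as follows. This is our illustration only (the connection-record shape is hypothetical), using the default BitTorrent and Gnutella ports named above.

# Illustrative sketch of the corroboration step: of the hosts flagged as P2P,
# find those that contacted a remote host on a default BitTorrent or Gnutella port.
BITTORRENT_PORTS = {6969} | set(range(6881, 6889))  # 6969 and 6881-6888
GNUTELLA_PORTS = {6346}

def corroborate(flagged_hosts, connections):
    # connections: iterable of (local_host, remote_port) pairs from the trace
    bt, gnut = set(), set()
    for host, remote_port in connections:
        if host in flagged_hosts:
            if remote_port in BITTORRENT_PORTS:
                bt.add(host)
            elif remote_port in GNUTELLA_PORTS:
                gnut.add(host)
    return bt, gnut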
3.6 Evaluation of Second Dataset

To further determine the effectiveness of our approach, we perform a second evaluation of our metrics using our second data set from 2005. We expect that the results from the 2005 data set are similar to the results obtained from the 2006 data set. Our results over the 2005 data set are summarized in Table 3.3.

3.6.1 Detection accuracy for BitTorrent in 2005 data

Following the same methodology as described in Section 3.5.1, we examine our 2005 data and show in this section that our detection accuracy is similar across both data sets.

The Known BitTorrent section of Table 3.3 shows the 251 hosts we identified in our 2005 data set as running BitTorrent. As expected, our detection accuracy with the 2005 data set is similar to our 2006 data set. Individual metrics are successful at detecting the majority of BitTorrent hosts (84–90%). Combined metrics perform almost as well (80–81%) as individual metrics over the 2005 data set. These results are comparable to the results we obtained using our 2006 data set. As seen in the 2006 data set, combining metrics slightly decreases the true positive rate, but significantly decreases the false positive rate (by up to 26%), as shown by the combined metric M_sPC+BC. Again, false positive rates involving metric M_UA are not included in the table because our definition of likely non-P2P hosts distorts the false positive rate for metric M_UA.

3.7 Result Sensitivity

We use several fixed constants in our system: the metric thresholds, the sliding time window size, and the minimum number of connections considered before a decision can be made. Ideally, varying these constants will have little to no effect on our results. In this section, we demonstrate how varying these constants affects our results.

3.7.1 Detection speed

As well as being accurate, we wish to detect P2P hosts quickly. To estimate detection time we consider known P2P hosts. We identify the first contact with a known tracker or ultrapeer as the "start" time and then determine how much later we classify that host as P2P. This start time corresponds to an unrealizable, idealized detection system based on a perfectly known P2P network, and so it represents a conservative estimate of our detection time.

[Figure 3.4: Time until detection—CDFs of detection time in minutes for 2006 Gnutella, 2005 BitTorrent, and 2006 BitTorrent.]

Figure 3.4 shows CDFs of detection time for both BitTorrent and Gnutella for our two traces. As is shown, about 75% of the time we can identify a P2P client in less than ten minutes, and about one-fifth of the time we can decide within a minute. Given that P2P applications often run for tens of minutes, we believe these detection times are more than sufficient for on-line identification.

To understand the cause of the delay we looked at how the combined metrics operate. For BitTorrent, M_PC triggers quickly, but M_BC is much slower. These timings are consistent with the behaviors they track, since failed connections occur when a new peer starts up and actively probes other peers, while bidirectional communication happens only later as other peers learn about the target host and connect to it. In the case of Gnutella, M_PC often does not trigger until several minutes after contacting ultrapeers. We believe this slower trigger is because ultrapeers are more reliable than typical BitTorrent peers, and with Gnutella M_PC is triggered only when a peer attempts to contact remote resources and download files.
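The detection-delay measurement behind Figure 3.4 can be sketched as follows. This is our illustrative sketch, assuming per-host timestamps for the first contact with a known tracker or ultrapeer and for the moment the host is flagged; the dictionary names are hypothetical.

# Illustrative sketch of tabulating detection delay (the quantity plotted in Figure 3.4).
# first_contact: host -> time of first flow to a known tracker/ultrapeer (seconds)
# flag_time:     host -> time the host was classified as P2P (seconds)
def detection_delay_cdf(first_contact, flag_time):
    delays = sorted(flag_time[h] - first_contact[h]
                    for h in flag_time if h in first_contact)
    n = len(delays)
    # (delay, fraction of detected hosts classified within that delay)
    return [(d, (i + 1) / n) for i, d in enumerate(delays)]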
In addition to the delay described above, our current implementation batches packet traces into 2–6 minute segments. Thus actual delay in our current implementation is up to 16 minutes 75% of the time. This batching is due to our data collection sys- tem [HBP + 05] and could be greatly reduced or eliminated by integrating trace collection with metric evaluation. 3.7.2 Sensitivity to Threshold Selection Our first set of constants are the lower and upper threshold bounds for each of our met- rics. We have set our threshold values fairly permissive, allowing a large range of values to trigger a metric. In this section we explore how the lower bound thresholds affect the distinguishing ability of metrics M PC and M BC . We do not consider metric M UA in this discussion due to the lack of good ground truth in determining false positives. We expect that there is no single cutoff point for each metric at which the metric performs ideally. There are two separate reasons a single cutoff point is problematic. First, our methods are designed to detect a variety of types of peers. The variety of behaviors leads to a variety of metric values. For example, a BitTorrent peer joining a peer group with a high rate of churn will have a much higher value for M PC than a Gnutella leaf peer performing a search in which only a few sources are non-responsive. Second, P2P behaviors at a host may be somewhat masked by other on-going activi- ties at a host. If a user is surfing the content on several reliable web servers while running a P2P peer, the successful HTTP connections will lower the value of M PC . Similarly, if 77 a user follows several bad links to web servers which are down, the failed connections to these servers will raise the value of M PC independently from the host’s P2P activity. As shown in Figure 3.5(a), metric M PC is a relatively strong distinguisher after a lower bound cutoff of 0.2. This is not true for metric M BC as seen in Figure 3.5(b). This is partly because there are many non-P2P activities which can trigger M BC including other user-to-user programs or a mix of server and client activities. As shown in section 3.5.2, it is the combination of metrics which reduces the false positive rate. Combining M PC with M BC reduces the false positive rate 8–26%. 3.7.3 Sensitivity to Window Size Our approach computes metrics based on connections in a sliding time window. During our evaluation of our method we set our sliding time window size to be 20 minutes. We expect that our results are not directly dependent on the size of the sliding win- dow since our metrics are updated per flow and not computed over the entire time win- dow. However, too large of a time window may cause undesirable merging of two distinct dynamic hosts which during the window share a single IP. To minimize the potential problem of dynamic hosts, we do not examine window sizes greater than 50 minutes. To confirm the independence of our detection ability on our sliding window size, we examined window sizes from 1 to 50 minutes, varying the window size by 5 minutes at a time. Figure 3.6(b) shows the effect of the window size the true positive rate and false negative rate. As expected, there is no significant increase in the false positives at any specific window size. There is, however, a very slight decline in the true positive rate as the window size increases. 
This is due mostly to M_sPC not triggering because activities at a host earlier during the time window "wash out" bursts of failed connections which occur later in the time window.

3.7.4 Detection Sensitivity to Minimum Connections Threshold

As with our sliding window size, the minimum number of connections we consider before attempting to make a decision is a fixed number. We wish to show that our choice of requiring at least 10 connections before a decision is made reduces the false positive rate without significantly reducing the true positive rate.

We expect that there is a range of threshold values for the minimum number of connections considered for which the results are roughly the same; however, at some point the minimum number of connections considered does greatly affect the results. If too few connections are considered, the likelihood of a host being falsely identified as P2P is expected to increase, since metrics can be easily triggered over a small number of connections. If too many connections are considered, metrics are likely to be "washed out" by other activities at the host.

We examined minimum-connection thresholds from 5 to 150. Figure 3.6(a) summarizes the effect of the minimum connections considered on the false positive rate and true positive rate. There is a slight increase in false positives at fewer than 10 connections, which is consistent with our expectations. The true positive rate drops steadily above 20 connections and significantly above 100 connections.

Partly, the drop in true positives is due to metrics being diluted by non-P2P activities at the host as more and more connections are considered in the decision. The overall steady decline in true positives is due to peers never making the required minimum connections. Requiring too many connections before a decision is made excludes peers which are relatively idle and never make the minimum number of connections during the time window.

[Figure 3.5: Effect of M_PC and M_BC lower-bound thresholds—(a) M_PC true and false positive rates and (b) M_BC true and false positive rates, plotted as positive rate versus lower threshold value for the 2005 and 2006 data sets.]

[Figure 3.6: Effects of chosen constants—(a) varying the minimum number of connections and (b) varying the sliding window size, plotted as true and false positive rates for the 2006 data set.]

Setting our minimum connection threshold to 10 avoids eliminating relatively idle peers while still reducing the false positives seen when too few connections are considered.

3.8 Using Peer-to-Peer Identification to Measure P2P Usage

Using the passive methods described in this chapter, and a new method which uses port numbers to identify P2P, we performed network reconnaissance at the University of Southern California to understand the usage of peer-to-peer file sharing on our network.
Since the emergence of peer-to-peer (P2P) file sharing applications in 1999 there has been a steady increase in the use of P2P file sharing. P2P sharing began primarily as a means to share music, often in violation of copyright laws. Today it is widely used to share a range of media, including music, video, and large data sets such as open source operating systems. It remains controversial, both because it supports a mix of illegal and legal content, and because widespread use can consume a significant amount of network bandwidth. Although there has been a great deal of interest in understanding P2P file sharing, there has been relatively little quantitative measurement. A study at the University of Calgary is one of the few well documented studies, reporting an average of 38% of bytes of traffic represent P2P sharing [MW06]. While this study includes two years of data, it provided relatively little detail about who uses P2P and how many hosts were involved. 82 Other reports are often anecdotal, leaving the details of the methodology in doubt, but reporting up to 80% of outgoing bytes at ISPs consisting of P2P traffic. Following P2P trends has become important because P2P consumes significant net- work resources. In addition, there is great interest in understanding the degree of shar- ing of illegal content, and the opportunities and frequency of use of P2P to share legal content. Our goal in this work is to prove quantitative evaluation of P2P file sharing, including information about the relative use on academic-only networks like Internet2, and use by students relative to the general university population. In this section we discus educated estimates of the amount of P2P file sharing detected over the University of Southern California’s campus network. We monitor two (of two) commercial access links to Los Nettos and one (of two) Internet2 links. Of all traffic passing over these links, we analyze traffic for USC’s two ranges of network addresses. Since P2P traffic often conceals itself in different ways we use two different, complementary methods to identify P2P traffic: first we identify P2P hosts based on communication over standard well-known P2P ports, second we use a novel technique that identifies P2P hosts based on their pattern of communication with other hosts, iden- tifying patterns that are inherent to P2P activities. From this analysis of these packet traces we present approximate statistics about P2P traffic quantities at USC. We estimate that 3–13% of active hosts at USC participate in P2P activities and account for at most 21–33% of the traffic volume at USC (Section 3.11.1). We also quantify activity on commercial networks as compared to academic networks like Inter- net2 (Section 3.11.3), and by some types of network access (wired, dormitory, etc., Section 3.11.4). We demonstrate that student lab networks and resident hall networks account for the majority of P2P activity at USC, indicating that students are the main users of P2P applications on campus. Because we do not have access to packet data, 83 these measures provide only bounds on P2P traffic, we cannot comment the types of data being shared (either music or data, restricted or freely available). 3.9 A Second Approach to Identify P2P In previous sections we discussed inherent behaviors of P2P (Section 3.3) and an imple- mentation (Section 3.4) of passive and blind techniques to identify peer-to-peer hosts and communication. These techniques are based on inherent network behavior. 
In this section, we introduce a secondary method of identifying peer-to-peer hosts based on a less inherent behavior—port usage. We use this secondary method to further study the use of peer-to-peer file sharing in our university environment.

The simplest method to identify hosts running a target application is to look at which port numbers a host has open for incoming connections. Firewalls and filters often rely on remote servers listening on one or more well known ports to determine which outgoing connections to block or allow. Port-based detection has in the past been an effective method to both identify hosts running a particular application and identify specific application flows. While many well known protocols typically do listen on a well known port, such as port 80 for web traffic, applications which have reasons to hide often use non-standard ports to evade detection. It has become increasingly clear that it is not possible to identify P2P flows based solely on port numbers [CFEK06, KBB+05], and in fact port numbers are not useful in general for application detection [CFEK06].

With traditional well-known port number detection, we can easily identify P2P hosts which still listen on a standard P2P port. To enhance traditional port number detection, we also look for hiding peers that communicate with peers which still listen on a standard P2P port. With this enhancement, it takes only one connection to identify a host participating in P2P activities, even if the host is not listening on a standard P2P port. The port numbers we use to identify P2P traffic are summarized in Table 3.4. We target five popular P2P protocols using 26 well known port numbers. We feel this set of ports is a complete set of well established popular P2P ports which non-P2P communication typically will not use.

Protocol     Standard Ports
BitTorrent   6969, 6881–6889
eDonkey      4661–4671
WinMx        6257, 6699
Gnutella     6346, 6347
Kazaa        1214
Table 3.4: Well known ports used by our port-based method.

3.10 Data Collection and Evaluation Methodology

In this section we describe how we collected data to estimate the amount of P2P traffic at USC, while Section 3.11 presents the results. Our evaluation uses USC network traffic captured from two of three commercial provider links at Los Nettos, a regional ISP, and one of two Internet2 links. Full network packet traces were collected during a 14 hour period from December 14th, 2006 at 9pm to December 15th at 11am [USC06b].

Over the 14 hour monitoring period, we compile a list of all hosts detected via the port-based method as discussed in Section 3.9 and all hosts detected via the inherent methods discussed in Section 3.4.

We also quantify the volume of P2P traffic at USC. Because both of our methods identify hosts participating in P2P and not individual P2P flows, our methods cannot directly calculate the P2P traffic volume. To estimate the P2P traffic volume, we count all bytes to and from an identified P2P host during the 14 hour monitoring period as P2P traffic. Counting all traffic to and from a host for the full monitoring duration is a conservative decision and will lead to an overestimate of P2P traffic volume, because a host participating in P2P will likely also be running other applications. (A small sketch of this host identification and byte attribution appears below.)

3.11 P2P Activity at USC

In the following sections we present estimates of the amount and types of P2P activity at USC.

3.11.1 Estimating Total P2P Activity

We begin by establishing upper and lower estimates of the total amount of P2P activity on the USC network.
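To make that accounting concrete before presenting the estimates, the following sketch (ours and illustrative only; the flow-record shape is hypothetical) combines the port-based host list of Section 3.9, the inherent-behavior host list of Section 3.4, and the byte attribution of Section 3.10 into the union (upper bound) and intersection (lower bound) used in this section.

# Illustrative sketch of the upper/lower-bound accounting used in Section 3.11.1.
WELL_KNOWN_P2P_PORTS = (
    {6969} | set(range(6881, 6890))   # BitTorrent: 6969, 6881-6889
    | set(range(4661, 4672))          # eDonkey: 4661-4671
    | {6257, 6699}                    # WinMx
    | {6346, 6347}                    # Gnutella
    | {1214}                          # Kazaa
)

def port_based_hosts(flows):
    # flows: iterable of (usc_host, local_port, remote_port, byte_count)
    hosts = set()
    for host, lport, rport, _ in flows:
        # one connection involving a standard P2P port is enough to flag the host
        if lport in WELL_KNOWN_P2P_PORTS or rport in WELL_KNOWN_P2P_PORTS:
            hosts.add(host)
    return hosts

def p2p_bounds(flows, port_hosts, inherent_hosts):
    flows = list(flows)
    upper = port_hosts | inherent_hosts   # union: upper-bound host set
    lower = port_hosts & inherent_hosts   # intersection: lower-bound host set
    # attribute all bytes to/from an identified host to P2P (an overestimate, as noted above)
    upper_bytes = sum(b for h, _, _, b in flows if h in upper)
    lower_bytes = sum(b for h, _, _, b in flows if h in lower)
    return upper, lower, upper_bytes, lower_bytes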
We first quantify how many hosts are detected in total (the union), by both methods (the intersection) and by only one of the two methods (see Figure 3.7). We claim the union represents an upper bound on the amount of P2P activity and the intersection represents a lower bound. We discuss the validity of this claim in the next section. Table 3.5 summarizes the number of hosts and volume of traffic which was seen by our three monitoring points. The union shows an upper bound of 13% of USC’s hosts participate in P2P activities and account for 33% of the total traffic volume seen over the three links we monitor. The intersection suggests a lower-bound estimate of only 3% of 86 Figure 3.7: Venn diagram of P2P hosts hosts volume Total 16,120 (100%) 1,431 (100%) GB Identified as P2P (UNION) 2,051 (13%) 468 GB (33%) Inherent-based Only 164 (1%) 26 GB (2%) Port-based Only 1,423 (9%) 139 GB (10%) Both (INTERSECTION) 464 (3%) 303 GB (21%) Not identified as P2P 14,069 (87%) 963 GB (67%) Table 3.5: Summary of P2P activity at USC 87 Counting Method Total Identified Identified by Identified by Identified by both (Union) Port-based Inherent (Intersection) Hosts 2,051 1,887 628 464 Bytes 468 GB 442 GB 329 GB 303 GB Table 3.6: Summary of P2P Identified USC’s hosts participate in P2P activities, accounting for 21% of the total traffic volume seen at our monitored links. From these results, it appears that prior reports that up to 80% of traffic is due to P2P applications do not apply to USC’s university environment. 3.11.2 Comparison of Detection Methods In the previous section we presented an estimate of the total amount of P2P activity present at USC. We based our estimate on activity identified by two methods. In this section, we give perspective on the upper and lower estimates given in the previous section by comparing the P2P behaviors caught by each method. We expect to see a significant overlap in the hosts detected by each method since both methods are designed specifically to detect P2P; however, we do not expect the overlap to be close to complete. Each method has limitations which cause the method to miss specific types of P2P behavior, and each method has separate causes for false identifications, causing a decrease in the intersection. Table 3.6 summarizes the overlap between hosts identified by the two methods as participating in P2P activities and bytes identified as P2P. Despite the limitations of each method, we claim that the union, with 2,051 hosts, represents a fair upper bound of P2P activity at USC. Each method’s limitations in detecting P2P activity is offset by the other method. Our inherent-behavior-based method will capture active P2P hosts, even hosts which do not use standard or well- known P2P ports. Our port-based detection needs only one connection to detect a P2P 88 host, and so is able to capture relatively idle P2P hosts which our inherent-behavior- based method may miss. Because hosts in the intersection were identified by two separate and independent methods, we believe these 464 hosts represent a solid set of true positives, and offer a reasonable lower bound. Port-based detection does not identify 26% of the 628 hosts identified by inherent- behavior-based detection. The fact that port-based detection misses a significant portion of hosts is not surprising since our port-based method will miss any P2P host which never communicates over a standard P2P port including hosts using P2P protocols which avoid using well established P2P ports all together. 
Inherent behavior based detection does not identify 75% of the 1,887 hosts iden- tified by port-based detection. It does, however, identify 68% of the 442 GB of traffic identified by port-based detection, implying the inherent based method catches the high- volume P2P hosts. Our inherent-based detection does miss idle hosts and is more sen- sitive to incomplete traffic views than our port-based detection. Our inherent-behavior- based method will miss P2P hosts which are relatively idle or which perform the major- ity of their P2P activity over an unmonitored link. (We require at least 10 new connec- tions to be made at a host within a 20 minute time window in order to reach a decision.) However, missing idle peers does not greatly affect our estimates in the previous section since idle hosts do not contribute greatly to the P2P traffic volume. We can quantify how many idle P2P hosts are missed by our inherent-behavior-based method by defining an idle P2P peer as any host which makes or receives relatively few connections (during our 14 hour monitoring period) over a well known P2P port. As seen in the CDF of the number of connections using a well known P2P port (Figure 3.11.2), of the hosts caught only by port-based detection, 67% made fewer than 20 89 0 0.2 0.4 0.6 0.8 1 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 CDF Number of Connections Using a Standard P2P Port Connections at hosts found by both methods Connetions at hosts found only by Port-Based Figure 3.8: CDF of number of connections using a standard P2P port connections over a known P2P port. Twenty connections is relatively few compared to other identified P2P hosts, and so the majority of hosts caught only by port-based detection can be considered idle peers. In contrast, of the hosts caught by both methods, only 18% had fewer than 20 P2P related connections. We conclude that the majority of hosts missed by the inherent-behavior-based detection are missed because not enough actual P2P activity is captured. 3.11.3 P2P Activity on Commercial vs. Academic Networks The previous two sections provided insight into the total amount of P2P activity at USC. In this section we give insight into who is participating in P2P networks by quantifying how much of the identified P2P activity is seen on the data collected with commercial ISP peers, versus how much is seen over one link with Internet2’s Abilene Network. We expect that the majority of P2P sharing is done between universities, and not between USC and commercial sites, for two main reasons: students are often attracted to 90 Link Type Total Ingress Egress Academic (Internet2 link) 705 GB 658 GB 47 GB Identified as P2P (union) 299 GB (42%) 297 GB (45%) 2 GB ( 4%) Inherent-based Only 16 GB ( 2%) 16 GB ( 2%) 0 GB ( 0%) Port-based Only 103 GB (15%) 102 GB (16%) 1 GB ( 2%) Both (intersection) 180 GB (25%) 179 GB (27%) 1 GB ( 2%) Not identified as P2P 406 GB (58%) 361 GB (55%) 45 GB (96%) Commercial 726 GB 70 GB 656 GB Identified as P2P (union) 169 GB (23%) 18 GB (26%) 151 GB (23%) Inherent-based Only 10 GB ( 2%) 1 GB ( 2%) 9 GB ( 1%) Port-based Only 36 GB ( 5%) 5 GB ( 7%) 31 GB ( 5%) Both (intersection) 123 GB (16%) 12 GB (17%) 111 GB (17%) Not identified as P2P 557 GB (77%) 52 GB (74%) 505 GB (77%) Table 3.7: Summary of traffic volume over monitored links. P2P and university networks often offer high-bandwidth connections with light restric- tions. 
The now defunct “i2hub” P2P service, which connected over 400 universities, is an example of the popularity of P2P sharing between university students [i2h]. Because the majority of Internet2’s Abilene Network participants are universities, P2P activity on the monitored Internet2 link is likely traffic between USC and other universities. Therefore, we expect that the majority of P2P activity is over the Internet2 link. Table 3.7 summarizes the traffic volume over the monitored Internet2 links and the two commercial provider links. Any traffic from a host identified as participating in P2P is counted as P2P traffic, as discussed in section 3.10. The traffic monitored at the Internet2 link has a significantly higher percentage of P2P traffic than the commercial provider links (42% vs. 23%). This difference supports our claim that significantly more P2P traffic is inter-campus than between campuses and commercial sites. 91 3.11.4 Type of Host Participating in P2P The previous section claims the majority of P2P traffic is over the Internet2 link because the majority of P2P users are students. To further prove this claim, in this section we look at which machines on campus are participating in P2P activities. The subnetworks we monitored can be broken into six groups: wireless subnets, student lab subnets, student residence subnets, PPP subnets, VPN subnets and other subnets, which include internal operation machines. When we look at the breakdown of how much P2P activity was found on the subnets in each group, we expect that a significant majority of P2P activity is found on the student related subnets, such as the residence subnets. Table 3.8 summarizes the break down of P2P activity by subnet groups. As expected, the majority of P2P traffic is detected on the student resident subnets, both by absolute volume and by percent of the total volume, with P2P accounting for 49–70% of resident hall traffic. By percent, the student labs also have a relatively high volume of P2P, with up to 45% of the total traffic identified as P2P. However, the overlap between the two methods for the student labs is nonexistent, implying that the P2P peers on the lab nets are idle. These results again indicate that students are the main contributors to P2P activity in the university environment. 3.11.5 Ingress vs. Egress The previous sections estimated the total P2P traffic volume at USC. In this section we look at the volume of P2P traffic leaving USC and the amount coming into USC to estimate to what extent USC provides content to P2P file sharing networks. 
92 hosts traffic volume Wireless (0–5% traffic due to P2P) 1,024 0.039 GB (100%) identified as P2P (UNION) 31 0.002 GB ( 5%) Identified by Inherent-based only 0 0 GB ( 0%) Identified by Port-based only 31 0.002 GB ( 5%) Identified by both 0 0 GB ( 0%) Not identified as P2P 993 0.037 GB ( 95%) Student Labs (0–45% traffic due to P2P) 1280 40.346 GB (100%) identified as P2P (UNION) 32 18.126 GB ( 45%) Identified by Inherent-based only 2 0.051 GB ( 0%) Identified by Port-based only 30 18.075 GB ( 45%) Identified by both 0 0 GB ( 0%) Not identified as P2P 1248 22.220 GB ( 55%) Residence Halls (49–70% traffic due to P2P) 9,984 518.722 GB (100%) identified as P2P (UNION) 1500 362.956 GB (70%) Identified by Inherent-based only 467 21.628 GB ( 4%) Identified by Port-based only 687 86.726 GB (17%) Identified by both 346 254.602 GB (49%) Not identified as P2P 8484 155.766 GB (30%) PPP (1–2% traffic due to P2P) 1024 50.114 GB (100%) identified as P2P (UNION) 53 0.906 GB ( 2%) Identified by Inherent-based only 24 0.158 GB ( 0%) Identified by Port-based only 17 0.332 GB ( 1%) Identified by both 12 0.416 GB ( 1%) Not identified as P2P 971 49.208 GB (98%) VPN (17–30% traffic due to P2P) 1024 285.387 GB (100%) identified as P2P (UNION) 402 85.643 GB (30%) Identified by Inherent-based only 124 3.202 GB ( 1%) Identified by Port-based only 174 34.172 GB (12%) Identified by both 104 48.269 GB (17%) Not identified as P2P 622 199.744 GB (70%) Other (0% traffic due to P2P) 1,784 536.693 GB (100%) identified as P2P (UNION) 33 1.238 GB ( 0%) Identified by Inherent-based only 11 0.462 GB ( 0%) Identified by Port-based only 20 0.473 GB ( 0%) Identified by both 2 0.303 GB ( 0%) Not identified as P2P 1,751 535.455 GB (100%) Table 3.8: Break down by subnet of identified P2P activity. Due to less restrictions in a university environment, coupled with high bandwidth connections, we expect USC peers to be strong content providers for commercial hosts. Between universities, we expect that the sharing is more mutual. The last two columns in Table 3.7 summarize the traffic entering and leaving USC over the Internet2 link and the commercial links. The volume of traffic leaving USC over the commercial links implies that USC is generally a content provider to non-university hosts (of the 726GB seen over the com- mercial links, 656GB is traffic is leaving USC). The percentages of P2P traffic in either 93 direction are roughly the same (26% of outgoing traffic is P2P, 23% of incoming is P2P), indicating that P2P data flow between USC and commercial sites is proportional to general data flow. Over the Internet2 link, the incoming to outgoing ratio of P2P bytes is nearly 150GB to 1GB. This vast difference implies P2P sharing between USC and other universities is not mutual, with USC leeching more P2P content than it shares. However, this ratio could be skewed due to our monitoring view point. 3.11.6 Determining Popular P2P Protocols The previous sections dealt with the amount of P2P activity at USC and the main con- tributors. In this section we look at which protocols appear to be popular and give insight into which protocols are easiest to detect. We expect to see BitTorrent and Gnutella among the most popular applications. Bit- Torrent has a unique, and popular web-integrated system for directly connecting users interested in downloading and/or sharing a specific resource [btw]. Gnutella has a pop- ular, long standing, network which connects millions of peers [RSR06] through a tiered system. 
Using our port-based method, we can estimate which protocols are used by the iden- tified P2P hosts. Table 3.9 summarizes the protocol break down of the hosts identified by the port-based method, as well as the number of hosts which were also identified by our inherent-behavior-based method. As expected, BitTorrent and Gnutella appear to be the most popular out of the five protocols our port-based method can identify. Though the overlap in host detection 94 All Identified Inherent Port-based Both (Union) Method Method (Intersection) Total 2,051 628 1,880 464 Protocol Estimated by Ports Used BitTorrent 341 44 341 44 Gnutella 866 145 866 145 Kazaa 55 3 55 3 eDonkey 210 28 210 28 Winmx 1 0 1 0 Multiple Protocols 414 244 414 244 Other 164 164 N/A N/A Table 3.9: Hosts identified as P2P broken down by protocol between the two methods is in the range of 0–13% for each of the protocols, the inherent- based method detects 79% of the bytes detected by the port-based method for the three most popular protocols (BitTorrent, Gnutella and eDonkey). A large number of P2P peers used a mix of port numbers leading us to believe that a large number of P2P users do not have a single protocol preference and use multiple types of P2P applications. There is also a greater overlap between the two methods for the multiple protocols category, indicating that there are fewer false identifications with either method when looking for hosts which use multiple P2P applications. This overlap is not surprising since a host is unlikely to contact or listen on multiple different well-known P2P ports unless it is involved in P2P activities. Also, if multiple P2P applications are run simul- taneously, our inherent-behavior-based method is more likely to detect the host since, presumably, the P2P behavior is increased. 3.12 P2P Usage: Related Work In this section we briefly discuss other areas of research related to this exploration of P2P usages at USC. 95 Closest to this study is a longitudinal comparison study done by Madhukar et al [MW06]. This study is a two year analysis of P2P activity at the University of Calgary and compares three methods of classifying P2P: a port-based method, a signature-based method and a blind technique based on work by Karagiannis et al [KBFk04]. Their find- ings suggest 30–70% of flows on their campus are due to P2P, with P2P responsible for an average of 38% of the bytes transfered. While our study does not compare methods of classification, our final findings offer a more in depth look at the users and sources of P2P in a university environment. Similar to our port-based method, Wagner et al. look for hiding peers which are not listening on standard ports to contact non-hiding peers occasionally [WDHP06]. Their PeerTracker algorithm is successful at detecting the majority of high-volume P2P hosts. 3.13 Conclusions In this chapter, we demonstrated we can use passive methods to identify peer-to-peer traffic. Targeting P2P is an important piece of network reconnaissance because the popu- larity and functionality of P2P directly affects network provisioning and security. Addi- tionally, due to the content often shared within P2P networks, may organizations have site-wide policies regarding peer-to-peer file sharing which central IT may be responsi- ble for enforcing. Our technique demonstrates two contributions to network reconnaissance. First we offer a new technique for identifying P2P by mapping inherent P2P behaviors into met- rics. 
Our technique can perform on-line detection of P2P and identify most participating hosts within less than 10 minutes of P2P activity. Using a combination of metrics allows for high accuracy and low false positives.

Second, we offer an in-depth study of peer-to-peer usage in a university environment. Based on our 14 hour study, we estimate that 21–33% of USC's traffic is P2P related and 3–13% of the active hosts on campus participate in P2P activities.

The work presented in this chapter supports our thesis statement in two ways. First, we offer a concrete example of passive and blind techniques which enable network reconnaissance for a targeted application. We show the success of passive and blind techniques at targeting a specific set of applications—peer-to-peer. Targeting this set of applications is important to network reconnaissance because P2P applications may represent network vulnerabilities or policy violations, and the widespread use of such applications can impact network provisioning.

Second, our approach generalizes to other passive and blind reconnaissance techniques. By identifying inherent network behaviors of an application, we can then build a reconnaissance application based on blind and passive techniques. This process can be followed to target any number of network applications with blind and passive techniques.

Targeting peer-to-peer file sharing is a relatively specific reconnaissance goal. In the next chapter we target a more diverse application set, which shares a single network behavior—low-rate periodic communication.

Chapter 4
Low Rate Periodic Communication: An Example of a General Behavior Target

In the previous chapter we looked at targeting a specific set of applications by first choosing network behavior unique to the set and then building a reconnaissance application based on that behavior. In this chapter we look at targeting a broader range of applications which share a common behavior. As done previously, we will demonstrate a process of choosing a network behavior and exploring passive reconnaissance applications based on the chosen behavior.

In this chapter we look at a single and prevalent network behavior—low-rate periodic communication. Such periodic communication is used by many programs, both good and bad, to poll for information and check for updates. Using signal processing we can identify low-rate periodic communication, and in this chapter we demonstrate two reconnaissance applications based on this identification.

To perform our analysis we use only packet header information, placing this technique towards the blind end of the blind vs. deep axis. Figure 4.1 depicts the area in the solution space explored by this study. This study explores a third and new area of the solution space, using the same process demonstrated in the last chapter—developing a reconnaissance application based on network behavior. Rather than use a set of network behaviors to target a specific class of applications, in this study we use a single network behavior to target a variety of applications.

[Figure 4.1: 3D depiction of how Chapter 4 explores the network reconnaissance solution space.]

4.1 Introduction

As systems become more complicated, system maintenance and inter-machine coordination have become increasingly automated. As a result of this automation, communication is no longer strictly driven by user actions. Instead, computer-initiated communication is now common.
Such automatic network communication means that users are 99 increasingly unaware of with whom and when their machine communicates, and what information is being shared. Positive examples of this automatic communication include automatic application or reporting of updates to the operating system or applications and automated tracking of information for later consumption, such as RSS feeds, aggregator polling for new feeds and auction bots checking the current status of an auction. Finally, long-running applications such as peer-to-peer file sharing coordinate periodically to share data and maintain an overlay network. While much of this communication is beneficial to users, not all automatic com- munication is desirable. Some applications may share more information than a user may like, such as web assistants that report the user’s click-stream to advertisers or keyloggers that share a user’s passwords. Other applications inject ads into user brows- ing, periodically polling their controlling sites for updates and new ad content. Finally, compromised computers are connected by increasingly sophisticated control networks, often operating over decentralized, peer-to-peer schemes to form botnets of hundreds of thousands of computers. Frequently, hidden communication is periodic and happens at timescales of minutes to hours. Application and OS updates often happen at a regular interval set by the vendor or user, usually from hourly to weekly. Applications which poll for new information do so on a periodic basis. For example, web sites such as cnn.com use a refresh HTML directive to cause web browsers to reload the page each half-hour. For one-time communications, such as reports of a new operating system or application installation, personal firewalls such as ZoneAlarm often alert users to some of this non-user-driven communication. For behind-the-scenes periodic communication however, a personal firewall does not typically alert a user every time the communication occurs, and such alerts would be quickly ignored if they could not be correlated with an activity. 100 These challenges point to a need for a self-surveillance tool that will warn users when suspicious network activity is detected from their machines. Such an application could run on the user’s machine, or on a network monitoring appliance to be insulated from compromise of end user machines. Completing this work, network administrators could use an application that watches aggregate network traffic for unexpected periodic traffic. While evaluation of these specific applications is outside the scope of this study, we explore the principles lay the foundation for self-surveillance (Sections 4.4 through 4.6), evaluate the range of such applications (Section 4.7) and two prototypes (Section 4.8). This study presents a new approach to understand low-rate, periodic communication of network flows. By low-rate, we mean communication that occurs every few minutes, hours, or days. By periodic, we mean new connections which occur at relatively fixed intervals. We show that such traffic is common in classes of both wanted and unwanted applications (Section 4.7.1), and that these applications are widely present on computers today (Section 4.7.2). Finally, we show that we can identify changes in periodic behavior that can indicate the presence of malware or termination of automatic updates in hosts (Section 4.8.1), and detect periodic RSS traffic in aggregate traffic (Section 4.8.2). Our contribution is three-fold. 
First, we identify low-rate periodicity as a phenom- ena in network traffic (Section 4.5). Second, we develop a novel method for detect- ing low-rate periodic signals in network traffic using flow-level periodicity and full wavelet expansion (Section 4.4). Although wavelets have been widely used for com- pression [TM02], and have been sometimes applied to traffic [FGHW99,BKPR02,Hus], we are the first to apply full expansion to detection of very low-rate periodic traffic. We show how our approach works (Section 4.6.1) and quantify its sensitivity to background traffic (Section 4.6.2). Finally, we show how detecting changes in periodic communi- cation can be used for self-surveillance and network surveillance (Section 4.8), for to 101 detect a range of positive or malicious behaviors that are pervasive in today’s networks (Section 4.7). 4.2 Related Work Our work builds on two general areas of related work: network-based detection of appli- cation and classification, and the application of spectral techniques to network traffic. 4.2.1 Application Detection There are several ways to detect applications in the network or on the host. Port and payload-based signature detection is widely used to identify applications and classify traffic. Although widely used in intrusion detection systems and traffic classification system today, port assignments can be easily changed and payload can be encrypted or randomized. An alternative to such signature-based schemes is identification of behaviors unique to specific applications, such as an applications pattern of connecting with other comput- ers. Examples of such schemes include work by Karagiannis et al. [KPF05, KBFk04], Constantinou and Mavrommatis [CM06], and Bartlett et al. [BHP07a] where behaviors such as port usage, protocol usage and number of connections made aid in classifying traffic. Our work also looks at network behavior to identify applications, however we look at the previously unexplored behavior of low-rate periodic communication. A final class of network-based detection tools identify traffic anomalies. These sys- tems characterize normal traffic and then watch for unexpected divergence using traffic entropy [GMT05,NSA + 08] or through signal-processing techniques [BKPR02]. Unlike 102 this prior work, we focus on identifying low-rate periodicity in applications, not general characteristics of aggregate traffic. Host-based malware detection represents another class of application-detection schemes related to our work. Malware detection tools such as virus- and spyware scan- ners. run locally on a host and use signatures specific to each malware program to either identify unwanted files on the local system or unwanted incoming downloads. Part of our work looks at detecting new installations of programs such as keyloggers and adware, but unlike typical malware detection, we are the first to look at general network behavior to identify some of these applications. 4.2.2 Spectral Analysis Recent research has applied spectral techniques to identify network anomalies, and study network characteristics such as routing and congestion. Current application of spectral techniques look for high-frequency occurrences to identify anomalous behavior [CKsT02,HHP03,BKPR02,MHK04]. Hussain et al. apply spectral techniques to timeseries of packet arrival times. Based on spectral character- istics, they are able to distinguish between single- and multi-source attacks [HHP03], and identify repeat attacks [HHP06]. Barford et al. 
use wavelets to analyze SNMP and flow-level information to identify DoS attacks and other high-frequency anomalies [BKPR02]. Magnaghi et al. detect anomalies within TCP flows using a wavelet-based approach to identify network misconfigurations [MHK04]. Spectral techniques have also been employed to identify bottleneck links [HPHH04, HPH+09] and routing information [PCJ+02]. We differ from previous work in three main ways. First, the majority of spectral analysis in computer networks studies events at the packet level; our work looks at periodicity at the flow level to identify hosts which maintain regular contact. Second, most prior work considers high-frequency behavior; we instead consider events which occur at very low frequencies (sub-1 Hz) and use long observation windows (hours to days) to see such low-rate events. Lastly, our application of iterated filtering and detection differs from previous applications of spectral analysis. Finally, there has been some work exploring general approaches to applying signal processing to network traffic [MOHP06]. Our work represents a specific instance of this approach as applied to low-rate periodicity.

4.3 Background

At the core, our work focuses on identifying and separating a periodic network behavior, the signal, from other network traffic, the noise. We show that important kinds of traffic show periodic behavior, providing the signal. The fields of signal processing and detection theory provide decades of research to build upon; we extend this work to develop processing techniques specific to our application, and to show that those techniques can successfully detect low-rate periodic activities in aggregate Internet traffic.

Both Fourier and wavelet analysis are widely used to study periodicity in signals. We choose wavelet analysis for two reasons. First, it is well suited to analyzing non-stationary signals with low signal-to-noise events. The wavelet transform represents complicated signals as a function of wavelets [VK95]. Each wavelet is a version of a basis function, possibly scaled in space or translated in time. We use the Haar wavelet as our basis function. The discrete Haar wavelet is:

    ψ(i) =  1   if i = 0
           −1   if i = 1
            0   otherwise

The second reason we chose wavelets is that they let us focus on frequency or time, or some combination of both. We use their ability to focus on frequency to indicate the presence or absence of periodic traffic (Section 4.6.1), and their temporal resolution to identify when that traffic starts and stops (Section 4.8.1). Discrete wavelets are scaled and translated only in discrete steps. One helpful way to view wavelets is as iterated filter banks, where a single wavelet acts as a band-pass filter, and scaling is in effect a low-pass filter. Each pass with a high- and low-pass filter pair splits the frequency spectrum roughly in half, decomposing the signal. The resulting set of wavelet coefficients contains both time and frequency information. Thus iterated application (Figure 4.2(c)) produces a particular split of frequencies (Figure 4.2(a)). The filter bank in Figure 4.2(c) is widely used in image compression. However, other configurations are possible. The filter bank in Figure 4.2(d) selects a different set of frequencies (Figure 4.2(b)). With very low-rate periods, this ability to "focus in" on the important parts of the frequency domain and ignore less interesting parts has potential to improve performance.

[Figure 4.2: Filter bank implementation and resulting split in the frequency domain. (a) Frequency split produced by the filters in (c); (b) frequency split produced by the filters in (d); (c) filter bank resulting in the split in (a); (d) filter bank resulting in the split in (b).]
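To make the filter-bank view concrete, the following minimal Python sketch (ours, for illustration only; it uses the averaging/differencing form of the Haar filters adopted later in Section 4.4.2) splits a timeseries once into a half-rate low-pass series and a half-rate high-pass series:

    def haar_split(x):
        """One level of Haar analysis: return (low, high) half-rate series.

        low[i]  = (x[2i] + x[2i+1]) / 2   -- pairwise average (low-pass)
        high[i] = (x[2i+1] - x[2i]) / 2   -- pairwise difference (high-pass)
        An odd trailing sample is dropped for simplicity.
        """
        n = len(x) - (len(x) % 2)
        low = [(x[i] + x[i + 1]) / 2.0 for i in range(0, n, 2)]
        high = [(x[i + 1] - x[i]) / 2.0 for i in range(0, n, 2)]
        return low, high

    # A spike every 4 samples: the difference (high-pass) output keeps the
    # rapid changes, while the average (low-pass) output keeps the slow trend.
    x = [1, 0, 0, 0] * 4
    low, high = haar_split(x)
    print(low)    # [0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0]
    print(high)   # [-0.5, 0.0, -0.5, 0.0, -0.5, 0.0, -0.5, 0.0]

Applying haar_split again to either output implements one more level of the iterated filter bank described above.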
4.4 Methodology

As just described, wavelets and iterated filter banks are a useful mechanism to identify periodicity in traffic. Although wavelets provide a well-developed mathematical theory, and there has been some work applying them to network traffic before, discovering infrequent periodic traffic is particularly demanding because of the long timescales involved. In this section we describe the four main parts of our approach (roughly following the outline of applying signal processing to networking [MOHP06]): extracting a timeseries of events from network traffic, decomposing the timeseries using an iterated filter bank, visualizing the resulting multi-resolution representation, and detecting the presence of a periodic signal. Our focus on long timescales influences each of these steps.

4.4.1 Timeseries Extraction

To apply signal processing to network traffic we first must generate a timeseries of events that represent network traffic [MOHP06]. We begin by discarding all traffic that is not of interest, then map events of interest into a fixed-interval timeseries of events per time period. Our first step is to subset traffic. As we show later (Section 4.6.2), the signal-to-noise ratio governs our ability to detect behavior of interest. Any irrelevant traffic we can discard decreases the rate of background noise, improving our sensitivity. Exactly what can be discarded depends on the application, and we use different rules in different cases. In general, we consider all TCP flows, but as one example, when searching for malware that is known to be sent over web connections, we could discard all traffic other than TCP connections to port 80. Next, we must define what represents an event of interest. We are interested in long-duration interactions, so we monitor TCP flows rather than individual packets (as has been done previously; see Section 4.2.2). Our input data source is a packet stream, so we identify new TCP flows by the presence of a SYN-ACK packet, and use the arrival time of the SYN-ACK as the start time of the connection and a single event. Malicious traffic with forged SYN-ACK packets may taint this data, but is unlikely to show long-term periodicity and so does not alter our results. After extracting and sampling events, we create a timeseries covering the duration of analysis. We use fixed-duration bins. Since our target events are infrequent (minutes or hours apart), we expect to study durations of 24 to 72 hours. Such studies require less precision than most prior analysis of periodic network traffic, so we use 1s bins.

4.4.2 Multi-resolution Analysis

The timeseries provides observations from the network, but our goal is to find periodic behavior in those observations. In Section 4.3 we described how discrete wavelets can be implemented as an iterative filter bank. Figure 4.2 showed how different combinations of filters can focus attention on different portions of the frequency spectrum. That figure illustrates two different filter configurations that extract particular bands. Using a different path through the filter tree efficiently gives more resolution in a target frequency range.
This ability to "focus in" on a range is useful if the target range is known a priori. However, in our work we do not have a pre-determined, specific range of frequencies we are interested in. Instead, we want to look for all possible low-rate periods—anything from a period of a few seconds to a few hours may be of interest. If we consider all the combinations of low- and high-pass filters, the full set can be viewed as a complete binary tree, which we will refer to as a filter tree. Therefore, we perform a full decomposition, and use all paths through the filter tree. Figure 4.3 shows a filter bank configured for a full decomposition and the logical frequency bands we extract.

[Figure 4.3: Full decomposition of a filter tree, with "flip" in covered frequency bands. (a) Frequency split resulting from the filters in (b); (b) the filter bank that produces the split in (a).]

Although wavelets are a relatively mature analytic approach, to our knowledge, we are the first to use a full decomposition to simultaneously explore large ranges of frequencies in network traffic. Full decomposition requires multiple operations on a single timeseries and appears quite expensive. We employ two optimizations that make our analysis quite efficient. First, because we use the Haar wavelet as our basis function, we can employ simple differencing and averaging implementations of our high- and low-pass filters. Given timeseries X, we get

    X_H = ( (x_2 − x_1)/2, (x_4 − x_3)/2, (x_6 − x_5)/2, ..., (x_n − x_{n−1})/2 )

and

    X_L = ( (x_1 + x_2)/2, (x_3 + x_4)/2, (x_5 + x_6)/2, ..., (x_{n−1} + x_n)/2 )

4.4.3 Periodic Events and Energy

Given a multi-scale decomposition of observations, we must know how to identify periodic events. The wavelet coefficients define the energy for a given timeseries X at some path P in the decomposition tree:

    e(X_P) ≡ ss(X_PL) + ss(X_PH),   where   ss(X) ≡ Σ_{i=0}^{n} x_i²

A concentration of energy in a narrow frequency range indicates the presence of a periodic signal. We show shortly how we use energy to automate detection (Section 4.4.5). One important aspect of our choice of the Haar wavelet as the basis is that it is orthogonal, so energy is conserved at each level of decomposition: e(X) = e(X_L) + e(X_H). This property means we can normalize energy and evaluate the relative energy of each decomposition as a percentage of total energy. Finally, we will see that it is possible to undershoot or overshoot a given frequency in the filter tree. With insufficient levels of decomposition, energy is spread uniformly across large frequency ranges. With excessive decomposition, imperfections in real-world periods cause the traffic's energy to be dispersed across several bins. These constraints again motivate our desire to adaptively expand the tree until we find periodic behavior.

[Figure 4.4: Perfect artificial period of 8s (window aligned).]

4.4.4 Filter to Frequency

While energy identifies the presence of periodic behavior, we also must know where that behavior occurs—at what frequency. We therefore must map a position in the filter tree to a specific range of periods. At first glance, filter mapping seems easy: low- and high-pass filters each separate the low and high frequency bands. Unfortunately, because filters are symmetric, repeated application of high-pass filters "flips" the covered frequency bands. If we define < as "covers lower frequencies than", and X_ab as applying filter a then b to timeseries X, then X_L < X_H and X_LL < X_LH, but X_HH < X_HL.
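Before turning to how we handle this flipping, the decomposition and energy bookkeeping of Sections 4.4.2 and 4.4.3 can be sketched in a few lines of Python (ours, for illustration only; it reuses haar_split from the earlier sketch). Because the averaging/differencing filters scale total energy by a constant factor per level, reporting each node's share of its level's energy yields the same percentages as the normalized view described above.

    def sum_sq(x):
        """ss(X): the sum of squared values, i.e. the energy of a series."""
        return sum(v * v for v in x)

    def full_decomposition(x, max_depth):
        """Expand every path of the filter tree and record relative energy.

        Returns {path: share}, where path is the string of 'L'/'H' filters
        applied from the root and share is that node's fraction of the
        energy at its level of the tree.
        """
        shares = {"": 1.0}
        level = {"": x}
        for _ in range(max_depth):
            children = {}
            for path, series in level.items():
                if len(series) >= 2:
                    low, high = haar_split(series)
                    children[path + "L"], children[path + "H"] = low, high
            if not children:
                break
            level_total = sum(sum_sq(s) for s in children.values()) or 1.0
            for path, series in children.items():
                shares[path] = sum_sq(series) / level_total
            level = children
        return shares

The exhaustive loop above is only for clarity; as discussed in Section 4.4.5, the real analysis prunes uninteresting branches rather than expanding every path to a fixed depth.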
We discuss flipping and how we correct for it in Section 4.4.6. The correct mapping of filters to frequencies is essential for proper detection, and it also supports visualization of energy over the frequency space. Figure 4.4 shows 6 levels of expansion of an artificial signal that occurs every eight seconds. In this figure, each row corresponds to one level of decomposition of the filter tree, the scale of the decomposition. The top row is scale 0, indicating the original timeseries; the next is scale 1, showing X_L and X_H; and so on. Each row is divided into several blocks, showing increasing frequency from left to right. The top row shows one block, by definition capturing 100% of the energy across all frequencies. Each lower row shows twice as many blocks, each representing energy over bands of half the previous frequency. Finally, we represent energy on the z-axis, using both color (white is large amounts of energy, black little) and a numeric value representing percentage of total energy. Because frequency bands become narrower at each level, we scale color in each row to the maximum energy in any band of that row. Thus in Figure 4.4, all blocks are either black or white since energy is perfectly distributed in this case, but in later examples intermediate cases appear.

[Figure 4.5: Long-duration artificial period of 600s.]

While we use visualization to observe a decomposition and assist our intuition, differences can be subtle, particularly in real data and when traffic with different periods is mixed. We therefore next present a quantitative detection method.

4.4.5 Energy and Frequency to Detection

We have shown how periodic events correspond to energy (Section 4.4.3), and how to relate that energy to frequencies (Section 4.4.4). We now combine these to describe our detection algorithm. We also exploit the temporal structure of wavelets to identify the start and stop times of a periodic behavior. For detection of the presence of periodic events, we have two thresholds: the range threshold on the size of the frequency range, and the energy threshold on the amount of energy in a given bin. We expect that energy from non-periodic events will disperse as we perform further decomposition, causing individual paths through the filter tree to correspond to narrower and narrower frequency ranges. Energy from periodic events will instead remain concentrated around a specific frequency throughout decomposition. Therefore, to identify a periodic set of events, we must look for strong energy in a narrow range of frequencies. We must vary our thresholds to match a given location in the filter tree. To the left of the tree, in the lower frequency ranges, we relax our range threshold since we expect lower-frequency periods to have more jitter (a few seconds of jitter on a half-hour period is not as significant as a few seconds off on a 5-second period). Further down in the tree, at higher levels of decomposition, we lower the energy threshold since each bin represents a narrower frequency range and overall energy will be dispersed over multiple bins. Specifically, we decrease the energy threshold such that the threshold for node n is

    t_energy = c_ℓ / n

for a constant c_ℓ set based on the tree depth (c_ℓ starts at 0.1 at the first few levels, then increases linearly). Although we expect events occurring at interval p to provide energy at frequency f = 1/p, they also contribute energy to harmonics at small integer multiples kf.
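The thresholding just described can be written compactly; the sketch below (ours; the constants are placeholders rather than the dissertation's tuned values, and the range threshold is omitted for brevity) flags tree nodes whose energy share exceeds the depth-dependent threshold. We then return to the question of harmonics.

    def energy_threshold(depth, c_l=0.1):
        """Adaptive threshold t_energy = c_l / n for a node at depth n.

        Deeper levels get a lower threshold because energy is spread over
        more, narrower bins.  (Placeholder: the text also grows c_l
        linearly after the first few levels, which is not modeled here.)
        """
        return c_l / max(depth, 1)

    def detect_periodic_nodes(shares, min_depth=4):
        """Return (path, share) pairs whose energy share exceeds the
        threshold for their depth.

        `shares` is the {path: share} map from full_decomposition();
        shallow levels are skipped because their bands are too wide to
        pin down a period.
        """
        hits = [(path, share) for path, share in shares.items()
                if len(path) >= min_depth and share >= energy_threshold(len(path))]
        return sorted(hits, key=lambda hit: -hit[1])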
We therefore choose the lowest frequency range in a harmonic set as the period of an identified frequency. Noise and the pair-wise operation of our filters can affect the harmonic frequencies in different ways, causing the harmonics to appear stronger than the base frequency. Occasionally we will identify a periodic signal from one of its harmonics and incorrectly identify the frequency as two or three times the signal's actual frequency.

Once we have identified the frequency range of a periodic series of events, we can estimate when the events started and stopped by looking at the timeseries of coefficients corresponding to that frequency range. Recall that each node at path P in the decomposition contains a timeseries X_P, and each element i in this timeseries indicates a time x_i^P. To find the beginning and ending of an event in time, we look for a consecutive series of strong coefficients x_i^P. Our current simple approach is to compute the mean μ = E[X_P], then search backwards in time to find the first x_b^P < μ as the beginning, and forwards through the signal x_i^P > μ to find the next x_e^P < μ, giving a period b ≤ i ≤ e. Often, the level of decomposition which identified the frequency range contains too little information in the time domain to make any useful statements about timing. In these cases, we can back up the filter tree two or more levels and examine coefficients at a level with better temporal resolution.

Optimizing

Second, although our filters reduce to a simple set of additions and subtractions, we can reduce the amount of work done by pruning out certain paths through the filter tree. We prune on both depth and breadth: we cease expanding the whole tree downwards when we have sufficient frequency resolution, and we need not expand a branch of the tree if that branch is "uninteresting", as defined next. We discuss the number of filtering steps pruning saves us in Section 4.9.

4.4.6 Flipping of Frequency Bands

As discussed in Section 4.4.4, repeated application of our filters leads to non-intuitive behavior where covered frequency bands "flip". We must take this flipping into account to correctly relate filter bands with specific frequencies in periodic communication. Without correctly accounting for flipping there is no way to even approximate a correct frequency range for identified periodic behavior. In this section we discuss why and when this flipping occurs, and we show how to account for it when mapping filter coefficients to frequency.

[Figure 4.6: Impulse response of low- and high-pass filters and combinations: (a) low-pass filter; (b) high-pass filter; (c) up-sampled low-pass filter; (d) up-sampled high-pass filter; (e) combination of two low-pass filters; (f) combination of two high-pass filters; (g) combination of a high- and then low-pass filter.]

[Figure 4.7: Filter depiction for three levels of filtering.]

Flipping is due to the shifting and contracting of the filters which occurs when we iteratively apply them. Flipping is a well-known occurrence in wavelets (for example, it is discussed in homework problem 3.16 of Vetterli and Kovacevic's Wavelets textbook [VK95]), yet to our knowledge it has never been precisely defined in published work. We found careful definition essential to apply wavelets to our problem, and such definition required a surprising level of care, even when working with wavelet experts.
In wavelets, each successive step repeatedly applies the low- and high-pass filters by first up-sampling the filters and then convolving them with the impulse response from the previous filter(s). This up-sampling both shifts and contracts the filters, changing the expected impulse response of convolved filters. To clarify when flipping occurs, we visually demonstrate how iterative application of our filters effects frequency bands. To start, we first show the impulse response of our low- and high-pass filters. We then show what happens to this response at each iterative application of our filters over three iterations. 117 Figures 4.6(a) and 4.6(b) depict the response of our low- and high-pass filters respec- tively. In these figures, the grey boxes represent the ranges of frequencies each filter passes. These representations depict the ideal frequency response, and do not indicate any overlap or fall-off which is present in the actual filters. Note that the filters are symmetrical aboutπ. From Figures 4.6(a) and 4.6(b), we can see that with the first iteration of a high- and low-pass filters, we have no unexpected results. The low-pass filter allows only the low frequencies to pass, and the high-pass filter allows only the high frequencies to pass. However, if we continue and iteratively filter by combining the first pass of filters with second pass, we discover unexpected results. To combine the first pass of filters with a second pass, we must first up-sample. Figures 4.6(c) and 4.6(d) demonstrate how up-sampling changes the response of the low- and high-pass filters respectively. For each filter, the range of frequencies passed is shifted to the right, and contracted to half its original width. We expect that if two iterations of low-pass filters would result in a combined fil- ter which passed only the lowest frequencies. In other words, if we convolve the up- sampled low-pass filter (Figure 4.6(c)) with the initial low-pass filter (Figure 4.6(a)), we pass only the lowest frequencies. Figure 4.6(e) depicts the combination of two low-pass filters (LL). The dark grey region is the spectral range of frequencies allowed to pass by the combination of two filters (i.e. where the filters align). As expected, the combined low-pass filters pass only the lowest frequencies. Likewise, when we combine two high-pass filters, intuitively we expect to pass only the highest frequencies, however this intuition is incorrect. Figure 4.6(f) shows the combination of two high-pass filters (HH). From the dark grey region in Figure 4.6(f) we can see that the combination of two high-pass filters passes the second highest frequency 118 range, and not the highest range as expected. Instead, the combination of a high and then low-pass filter (shown in Figure 4.6(g)) passes the highest range of frequencies. In other words, the passed frequency ranges of the HL and HH filters are flipped. We can visually see this continued flipping in Figure 4.7, which shows all com- binations of filters at the third level of decomposition. Flipping is marked by arrows indicating the direction we flip in order to correct for the effects of shifting and con- tracting. From Figure 4.7 we can see that flipping occurs every time the highest half of a frequency range is iteratively split again. In order to accurately narrow down the fre- quency of periodic behavior, we need to accurately translate between a filter path which results in identifying a periodic behavior and the frequency range passed by the filters. 
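The flip-correction just described reduces to a few lines of code. The following sketch (ours, not the dissertation's implementation; names and the 1 Hz default are illustrative) walks a filter path while tracking whether the retained band is currently spectrally inverted: whenever a step keeps the upper half of its parent's band, the orientation of the next split reverses, which is exactly the pattern of flips shown in Figure 4.7.

    def path_to_band(path, sample_rate_hz=1.0):
        """Map a filter path (a string of 'L'/'H') to its true frequency band.

        Starts from the full band [0, sample_rate/2] and halves it once per
        filter.  `flipped` records whether the kept band is spectrally
        inverted; when it is, 'L' selects the upper half and 'H' the lower,
        correcting for the band flipping of Section 4.4.6.
        Returns (f_low, f_high) in Hz.
        """
        lo, hi = 0.0, sample_rate_hz / 2.0
        flipped = False
        for f in path:
            mid = (lo + hi) / 2.0
            take_upper = (f == "H") != flipped   # 'H' means "upper" only when not inverted
            if take_upper:
                lo = mid
            else:
                hi = mid
            flipped = take_upper                 # keeping the upper half inverts the next split
        return lo, hi

    # Depth-2 check against Section 4.4.4: X_HH covers lower frequencies than X_HL.
    print(path_to_band("HH"))   # (0.25, 0.375)
    print(path_to_band("HL"))   # (0.375, 0.5)

A band (f_low, f_high) corresponds to periods between 1/f_high and 1/f_low seconds, matching the period ranges (for example, 292–315s) reported in the examples later in this chapter.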
4.4.7 Datasets

In this work, we use one main dataset, a four-day trace [USC08b] consisting of the majority of traffic entering and exiting the main USC border router, collected using LANDER [HBP+05]. Due to privacy concerns, we only collect TCP and IP packet headers. Since we are interested in analysis of TCP flows, we discard all but TCP setup packets (SYN, SYN-ACK, FIN, RST), and, as discussed in Section 4.4.1, we only use TCP SYN-ACKs for our analysis. Our dataset includes four days of data from Tuesday December 9th, 2008 00:01 to Friday December 12th 23:59 (local time). For most experiments, we use 18-hour subsets of this trace. We use two additional traces in Section 4.8.1, each a three-day trace of a particular machine [USC08a]. All of our traces are freely available by request from the authors.

4.5 Proof of Concept

To give a rough idea of how our approach works, we next show several examples of periodic behavior in network traffic. Through these examples, we wish to demonstrate first that popular classes of applications generate periodic traffic and that we can find such behavior. In the next section we move from these demonstrations to a more systematic exploration of the effectiveness of our approach. Here, we look at two different applications that show strong periodicity: BitTorrent and an RSS news feed aggregator.

We first consider BitTorrent. We build a timeseries of all TCP flows originating from a single host running several applications, including a BitTorrent client, a web browser, and an e-mail reader. We selected the host from our four-day trace [USC08b] based on port identification; we manually verified that it was running BitTorrent by connecting to the tracker with which it was conversing. We then extracted all TCP flows from the host (the host's client was not accepting incoming connections). Figure 4.8 shows our visualization of TCP flows from this host. The tracker is used to coordinate peers sharing a resource (file). In this case we see a strong periodicity around 300s: at the 11th decomposition (the lowest row), 4% of the energy is in the frequency range corresponding to a 292–315s period. We confirmed that this BitTorrent client contacts the tracker (to coordinate data exchange) every 600s. Although we do not see particularly strong energy at 600s, 300s is a harmonic of this frequency, as is 150s, which also shows strong energy. We conclude that our approach can detect regular control messages in BitTorrent traffic, and later in Section 4.6.2 we explore why the frequency of the periodic network behavior does not always correlate to the strongest concentration of energy.

[Figure 4.8: Periodicity in traffic to a BitTorrent tracker. (Single client, 300s period, 0.42 SNR).]

Figure 4.9 shows traffic from a single host running an RSS feed aggregator (the Wizz plug-in for Firefox). Again, we found this host in our four-day dataset, this time identifying it by its traffic with FeedBurner, a large hosting site for RSS feeds. The RSS aggregator has a configuration that polls for new information every 600s. As with BitTorrent, applying our approach shows energy indicating strong periodic behavior, although again it is strongest at 300s, a harmonic of the true application period.

[Figure 4.9: Periodicity in an RSS News feed reader (SNR: 0.62).]

These examples have shown that real applications have low-rate periodic behavior, and that our approach can pick that behavior out of aggregate traffic.
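For completeness, the timeseries-extraction step of Sections 4.4.1 and 4.4.7 can also be sketched. The version below is ours and assumes the packet headers have already been parsed into (timestamp, is_syn_ack) pairs; packet capture and header parsing are not shown.

    def flow_starts_to_bins(events, duration_s, bin_s=1.0):
        """Bin flow-start events into a fixed-interval timeseries.

        `events` is an iterable of (timestamp_s, is_syn_ack) pairs taken
        from packet headers; only SYN-ACKs are kept, since a SYN-ACK marks
        the start of a new TCP flow (Section 4.4.1).  Returns a list of
        counts, one per `bin_s`-second bin.
        """
        n_bins = int(duration_s / bin_s)
        series = [0] * n_bins
        for ts, is_syn_ack in events:
            if is_syn_ack:
                b = int(ts / bin_s)
                if 0 <= b < n_bins:
                    series[b] += 1
        return series

    # Hypothetical use for an 18-hour observation window with 1 s bins:
    # x = flow_starts_to_bins(events, duration_s=18 * 3600)
    # shares = full_decomposition(x, max_depth=16)   # from the earlier sketch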
Although both of these applications are benign, we will show later that malicious applications are detectable (Section 4.8.1) and common in our network (Section 4.7.2). We next look more carefully at how detection process works. 4.6 Evaluation of the Approach In this section we systematically evaluate our approach, beginning with artificial data to understand how observations of known behavior are affected by our our system (Sec- tion 4.6.1). We then define and explore noise (Section 4.6.2), the primary impediment to detection. 4.6.1 Does Identification Work? We begin by evaluating our method with artificial data. We start here because artificial data provides complete control over ground truth, allowing us to understand the benefits and limits of our approach. In this section we present a simple artificial signal with no background noise. For our example, we choose an artificial period of 600s. Many applications such as mail clients and RSS aggregators poll with periods of 10–60 minutes. A period of 600s offers two challenges: First, this period is not a power of two times our sample period, so it suffers from window misalignment and frequency band energy leakage. The easiest case would be an aligned signal, where the period of the signal is a power of two. Second, 600s is relatively large compared to our sampling rate of 1s (Section 4.4.1). Such sparse events require multiple levels of decomposition; our graphs omit the top 122 levels since in lower frequency signals the graphs simply show energy spread uniformly across all blocks. Figure 4.5 shows the visualization for an artificial timeseries with an event every 600s (1.6milliHz). We expect to see energy around 600s, but since 600s is not aligned, we expect some blurring. The figure shows strong energy (5%) in the 588–625s. We also see harmonics near 300s, 200s, and 150s. We conclude that, with sufficient levels of decomposition, our approach can detect low-rate periodic behavior. 4.6.2 Effects of Noise The previous section shows how our methodology works with pure signal. In real net- works, the traffic of interest the signal, will be mixed with other traffic, noise, that may distort and obscure the signal. In this section we explore the effects of adding back- ground noise to a controlled (artificial) periodic signal. Our goals are to evaluate how different types of noise distort the signal, and to define a measure of the signal-to-noise ratio (SNR) for our system to quantify interference. Controlled noise In this section we look at two types of background traffic as noise. We begin with simulated background traffic, since it can provide a perfectly controlled level of noise to exercise our system. We then shift to replays of real-world traffic from traces to better capture current network traffic. Since web traffic remains the dominant use of the network in terms of flows, we focus on web models to simulate background noise. We use the Surge web traffic model to generate controlled background traffic [BC98] (version 1.00a). We use the exam- ple Surge parameters, but we reduce the number of total documents retrieved to better 123 Figure 4.10: Mix of foreground traffic (600s period) with Surge simulated background traffic. model a single client (instead of a server). We then extract the traffic for one client and concatenate multiple such instances to generate an 18 hour trace. After Surge generates a web workload, we reduce this workload to a timeseries (TCP flow starts per unit time) as described in Section 4.4. 
Finally, we mix this traffic with our artificial signal by simply summing the timeseries, element by element. We use the 600s periodic signal from Section 4.6.1. Before we quantify effects of background noise, we first show visually how it changes our observations of the signal. We expect two kinds of interference: web traffic will induce other periodic signals (we know it has strong periodicities at small timescales [FGHW99]), and interference may blur the signal, similar to the distortion caused by non-alignment (Section 4.6.1). Figure 4.10 shows levels 6–12 of decomposition for our periodic signal and the combined Surge background traffic. By comparison, Figure 4.5 showed the same traffic without background traffic. We see the same periods and harmonics as before (around 600s, 300s, 200s, and 150s), but rather than 4–5% of energy, they show up as 1–2%. Thus, the main result of this simulated background traffic is a large increase in the "DC component" of our signal in the left-most bin (consistent application of low-pass filters). This result strongly suggests the importance of quantified (non-visual) methods for detecting periodic behavior. It also suggests the importance of quantifying noise to allow us to quantify sensitivity. We explore this question next.

Quantifying Noise with SNR

Figures 4.5 and 4.10 show that background traffic greatly reduces our ability to observe low-rate periodicity. To quantify this effect, we next define the signal-to-noise ratio (SNR) for our environment and estimate what SNR values allow detection. In general, SNR is the "amount" of signal relative to interfering background traffic. Events in our application are flow starts, so we define SNR as the ratio of periodic traffic starts to all traffic starts. To study SNR, we must vary it. There are several strategies to control SNR: either varying the intensity of foreground or background traffic, or changing the period of the foreground. We wish to draw on traces of real traffic, so we cannot easily vary the rate of background traffic (at least, not without possibly altering the nature of the application and TCP control loops). We therefore vary the period of the periodic foreground traffic, considering a range of values from periods of 1800s down to 1000s by 100s increments, then by 50s increments below. To quantify the change, we establish a target frequency range that shows the strongest energy with just foreground traffic at a specific level of decomposition (the level of decomposition is dependent upon the foreground traffic). In all cases this was the dominant frequency (the frequency at the period of the foreground traffic). We then compare the fraction of total energy in that bin with and without background traffic, and we report the fraction of original (foreground-only) energy that appears with mixed traffic.

[Figure 4.11: Effect of SNR on coefficient energy. The plot shows the percent of original (no-noise) coefficient energy versus the linear signal-to-noise ratio (SNR), for Surge-generated traffic and for Web Clients 1–3.]

Figure 4.11 shows the effect of the SNR on the original coefficient energy for varying foreground signals, and for four different background traffic loads: Surge (the squares connected with a line), and web traffic extracted from traces of three different clients. First we consider the Surge results. We see signal energy is fairly consistent above SNR values of 0.15—when there is at least one flow of signal to seven of background.
With sparser signals, the percent of energy quickly drops off. We applied our detection algorithm (Section 4.4.5), and found good detection as long as the energy at the target period was at least 40% of original, suggesting our approach might work reasonably to SNR values as low as 0.05 (one flow in 20 is signal). While Surge provides an easily controlled background load, the model is now fairly old and may not reflect current web traffic. We therefore chose three web clients at random from our week-long dataset and extracted 18-hour timeseries for each client. 126 We chose the 18-hour period when each client was most active. As with our artificial background traffic, we combine each web client’s timeseries with varying frequencies of periodic events to study the effect of SNRon detection. These values are shown as scatter plots of three different symbols on Figure 4.11. As with Surge, the strength of the signal to the three web clients are consistent and large for large SNRs (above 0.15), where noise only degrades the signal by 30% or less. However, as SNR falls, we begin to see inconsistent results—the fraction of original energy will be 60–80% for a signal at one period and 20–40% for a signal that is only 100s different. We believe this large variability is because of “accidental” low-rate periodicity in today’s web traffic. We conclude that our approach works well provided the signal is at least 10% of the selected traffic. While this may seem high at first, recall that we are interested in traffic a machine initiates (client actions). Thus, we may be able to even monitor busy servers after filtering out externally initiated traffic. The threshold may be quite reasonable for user machines that tend to be mostly idle or have idle periods long enough to detect periodic traffic. A difficult case would be busy clients, but even then further filtering of known benign traffic might be possible, which may bring the SNR down to the required detection level. 4.7 What is periodic? We have just shown that we can detect periodic behavior and identified the limits of detection. But how many periodic applications are out there and how many would a user care about? We next show that many applications are periodic and they occur in real networks. Moreover, we show that there are classes of periodic applications of interest to users and network operators. 127 4.7.1 Variety of applications that show periodic behavior In the previous sections, we demonstrated when and how often we can identify traffic that contains a set of periodic events occurs, even in the presence of noise. We now look at the variety of applications which exhibit periodic behavior that can be detected with our approach. As shown in Section 4.8.1, applications range from the beneficial to the malicious. We enumerate important classes and examples next. OS and application updaters: Nearly every current operating system today (Win- dows, MacOS, and most distributions of Linux including Fedora, Ubuntu, SUSE) includes an automatic update service—such as service is a necessity given the large number of Internet-facing programs that may have vulnerabilities. As attacks have become more sophisticated, an increasing number of applications also include auto- matic updates, including web browsers (Firefox, Google Chrome), tools for browsing web content such as Adobe Acrobat, and programs where the applications change fre- quently, such as peer-to-peer file sharing clients. Update polling periods vary, from once an hour to once a week. 
We demonstrate our observations of Fedora OS updates in Section 4.8.1. User services: Many user services poll the network regularly to track weather, stocks, news, and other information users are interested in. Weather monitoring services exist for every OS, for example, WeatherEye for Windows and MacOS, weather sidebar of gadgets for Windows, weather dashboard widgets for MacOS, and the Clock applet in Gnome on Linux. These services often poll a single site every 30 to 120 minutes. RSS News Feeds: While weather or stock monitors poll a central server, other applications watch a variety of user-selected sites. RSS News Aggregators and pod- cast trackers are probably the most commonly used example of such applications. Spe- cific examples of RSS readers include the Wizz plugin for Firefox, NewzCrawler and 128 FeedDemon for Windows, Shrook and Cyndicate for MacOS, Liferea and Akregator for Linux. Different tools use different default polling intervals, typically 30 minutes to an hour. Some tools adapt their polling frequency to the monitored website. We look at identifying periodic traffic from RSS feed aggregators in Section 4.8.2. Web Counters: Many web pages include counters and JavaScript to monitor when and how long a web page is seen. Examples include Google Analytics, Yahoo Web Analytics, Microsoft adCenter Analytics, and the Livejournal counter. These tools often insert code into a web site, so that every browser which views the site also contacts the count server. They create periodic behavior because many of the web pages in which they are embedded automatically refresh at regular intervals, often from 5 to 30 minutes. Even without of browser-side scripting languages, these refreshes can quickly generate a large amount of periodic traffic. We have seen such updates at news sites such as CNN, ESPN, and MSNBC. Peer-to-peer protocols: Peer-to-peer protocols must coordinate activities between peers, possibly mediated by a central tracker. Trackers can therefore be identified by regular polling from peers. We observe this periodic traffic at a BitTorrent tracker in Figure 4.8. We expect that the periodic traffic used to maintain the Kademlia ring will show strong periodic behavior [MM02]; Kademlia is the DHT protocol behind several widely used peer-to-peer services such eDonkey. Gnutella keep-alives are also often periodic. Peer-to-peer protocols coordinate fairly frequently, typically every 20 minutes or so. Adware: A number of tools derive revenue from displaying advertisements. Although sometimes chosen by users, these tools are often installed without complete user consent. Such tools often report back to their masters, either to fetch new ads or to report back user activity. There are hundreds of adware programs; we observed 36 129 in our incomplete survey of USC traffic (Section 4.7.2). We also reproduced periodic behavior in a Gator adware component that appeared in a Kazaa version 2005. These tools often probe very few minutes to few hours. Some adware aggregates reports about the user and requests to update ads into user- initiated search, and we did not expect them to cause periodic traffic on their own. How- ever, in one case (the ISTbar from 180 Solutions), we see periodic traffic. We speculate that this is the result of an auto-update service. Sypware and keylogging: Significantly more malicious than adware, spyware sur- reptitiously monitors what a user is doing. 
The most benign spyware may collect demographic information to support targeted advertising, while others may harvest passwords and bank accounts. We discuss detection of one keylogging application in Section 4.8.1.

Botnet command and control: Finally, many botnet systems employ command and control systems. Because these are adversarial we expect them to be difficult to detect. However, bots do query their masters periodically for updates and new email lists for spam, either directly, or via a peer-to-peer protocol that may cause traffic at regular intervals.

We have shown that we can identify example applications in several of these classes. This range of applications, and the growing use of periodic traffic for both good and ill, suggests that our ability to remotely detect such applications is an important new tool for network surveillance, particularly to identify changes in the network (Section 4.8.1). The problems created by some of these applications are sufficient to support an industry of adware and spyware detectors and removers. Blacklists and host-based malware scanners offer protection against a variety of malware, but they rely on external sources to maintain up-to-date information. The ability to identify the who, when and where of periodic behavior can help classify traffic and hosts (by looking at periodic services) and identify new adware without the need of a new signature (by looking at periodic contact with ad servers). Additionally, detection of new types of periodic traffic may support "zero-day" detection of malware that has not yet been identified and placed on a blacklist. Finally, discovering malware based on network traffic rather than with host-invasive software allows network administrators to remotely survey user machines.

Table 4.1: Variety of applications that show periodic behavior.

  Category             Examples            Seen?   Period (minutes)
  User services        WeatherEye          yes     30-120
  RSS News Feeds       NewzCrawler         yes     15-120
  Web Counters         Google Analytics    yes     5-30
  P2P Protocols        Gnutella            yes     2-120
  Adware               Gator, ISTbar       yes     15-60
  Spyware/Keyloggers   SpyBuddy            no      N/A
  Botnet cmd&ctl       (non-commercial)    no      N/A

4.7.2 Prevalence of periodic applications in real networks

In the previous section we looked at examples of when changes in periodic behavior are of interest, and demonstrated that we can pinpoint when these changes occur by looking at aggregate network traffic. In this section, we briefly discuss other applications which exhibit periodic behavior, discuss why changes in their periodic behavior are interesting, and explore the prevalence of such applications in real networks. Table 4.1 outlines seven categories of applications which we have investigated and identified as participating in periodic communication. A further discussion of these categories and example applications can be found in our technical report [BHP09]. We have found numerous examples of applications in five of these seven categories in our four-day trace from USC [USC08b]. However, individual example applications do not tell us how widespread hosts running applications with periodic behavior are. To evaluate how prevalent such applications are, we looked for periodic communication in our four-day trace. We compared this traffic with a widely referenced blacklist of IP addresses that have been identified as serving malware such as spyware and adware [YoS]. This list represents a candidate list of servers for questionable applications in three of the categories: User services, Web Counters and Adware.
We performed two levels of subsetting before applying multi-scale analysis. First, for each blacklisted address we took all flows to that destination. We found that many destinations had a mix of one-off connections as well as periodic behavior, so for a second level of subsetting, we selected subsets of source hosts in USC that transmit to that destination, in groups of up to twenty. This step helps our validation because it improves sensitivity: periodic traffic that could be lost in the aggregate of thousands of hosts can show up more strongly in smaller aggregates. Finally, we used our detection method (Section 4.4.5) to identify subsets that have strong periodic traffic at periods of 60s or longer. For destinations that have some periodic traffic, we then count all IP addresses at USC that contact that destination. Although these steps could be automated and applied systematically to the network by a monitoring appliance, we are not suggesting they lay out a practical implementation of such a service. However, this process could be inverted, so that each user (or potentially, each group of users on a LAN segment) monitors their traffic for changes in periodic behavior (as we describe in Section 4.8.1).

Table 4.2 shows the results of this analysis. We found traffic to 181 of the blacklisted destinations from our campus. About 45,000 IP addresses at USC had traffic to some of these sites, nearly one-third of all active campus addresses. (The presence of dynamic addresses means that this count may not correspond exactly to 45,000 users, since one user may occupy multiple addresses, and vice versa.)

Table 4.2: Prevalence of malware with periodic behavior on our network.

  Group                   Blacklisted Destinations   Unique IPs (users)
  active to anywhere      –                          128,614  [100%]
  active to blacklisted   181  (100%)                –
    Non-periodic          120  (66%)                 n/a
    Periodic              61   (34%)                 n/a
      User Services       5    (3%)                  22       [0%]
      Web Counters        15   (8%)                  16,405   [13%]
      Ad Servers          36   (20%)                 31,277   [24%]
      Other               5    (3%)                  6        [0%]

For the 61 blacklisted hosts that had periodic traffic, we manually examined each site and classified it in one of four categories (user services, web counters, ad servers, and other). We expected to see wide use of web counters and ad servers, since both support the advertising-driven nature of the current Internet. This observation strongly supports the presence of servers with periodic traffic and pervasive contact with those servers. While our approach cannot pick such traffic out of aggregate traffic (an area of planned future work), it could easily be used by a user to monitor their own computer for suspicious outgoing traffic. In addition, the large number of users accessing known adware sites (24% of active IP addresses) and the strong periodic nature of such traffic suggest that this is a promising target for future automated detection.

4.8 Applications

With an understanding of the underlying fundamentals of detection of low-rate periodic behavior (Section 4.6), and evidence that periodic behavior occurs (Section 4.7), we next look at two applications, self-surveillance and network surveillance, and then demonstrate that a variety of applications show low-rate periodicity and that those applications occur in real networks.

4.8.1 Self-surveillance: Identifying Changes in Periodic Behavior of a Host

In the previous sections, we demonstrated that malware shows periodic behavior that can be identified, even in the face of noise.
In this section we show how to identify dynamic changes in periodic behavior, and the time when these changes occur. We wish to identify changes in the periodic behavior of a given host to help users better understand activities on their computer. All operating systems and an increasing number of applications automatically poll for updates periodically. In addition, spyware and adware often report back to or request new information from the external master. In fact, applications sometimes do not reveal the presence of automatic polling, or how much information they disclose. Moreover, malware may terminate automatic updates after infecting a host. Thus, users will want to know when automatic checks stop, or when an automatic reporting service is added to their machines. We consider two examples where the change in periodic behavior is of interest, namely a change in OS update checks and a change in communication patterns created by the installation of a keylogger.

Detecting operating system updates

Security policies of all operating systems and many applications include automatic polling for updates, with typical periods ranging from 30 minutes to a week. Just as network administrators wish to detect the presence of bad behavior, the absence of good behavior may also be of great interest. In addition, since automatic updates are often disclosed only in the fine print of an end-user license agreement, users may also wish to know when a newly installed application performs regular update checks. To confirm we can see a change in update checks we monitored a lab machine running the Fedora 10 distribution of Linux for three days (dataset: [USC08a]). By default, Fedora polls update servers every hour using yum-updatesd. During the second day of the experiment, we disabled update checks at 2pm. The machine was lightly used for web browsing and e-mail over the three-day period. Our system correctly identifies periodic behavior near 3600s in this test traffic. Our algorithm to place events in time finds a change in this periodic behavior between noon and 9pm, consistent with our known time of 2pm. We have not tuned our algorithm for temporal placement; a more sophisticated approach would most likely narrow this window, although precision is ultimately limited by the 1-hour period.

To understand how our system can automatically identify the absence of OS checks, Figure 4.12 shows traffic periodicity with and without OS update checks. At the 16th level of decomposition of Figure 4.12(b) we see OS update polling appear as energy at the base period of one hour (3600s, two adjacent 3% bins), and as harmonics at half, three fourths and one and a half times the frequency (6553s, 4% energy; 4800s, 4%; and 2400s, 3%). Disabling updates, by contrast, shows no energy below the 14th level of decomposition (Figure 4.12(c)). Our algorithms detect periodicity automatically (Section 4.4.5) with an adaptive threshold. Figure 4.12(a) shows the numeric comparison corresponding to the whole 72-hour observation. As we can see, each of the periodicities that are visible in Figure 4.12(b) is above the detection threshold in Figure 4.12(a). More importantly, because detection is numeric, it can be automated and is more sensitive and consistent than human interpretation.

[Figure 4.12: Visualization illustrating periodic behavior before and after removal of OS update checks. (a) Automatic detection of OS updates: energy vs. detection threshold, all 72 hours. (b) Traffic with automatic polling for OS updates. (c) Traffic without automatic update polling.]
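The temporal placement used above follows the simple mean-crossing search of Section 4.4.5. A minimal sketch (ours; it assumes the coefficient series for the detected band has already been extracted, and it anchors the search at the strongest coefficient rather than at the point of detection) is:

    def bracket_event(coeffs, bin_s):
        """Estimate when a periodic behavior is present in a band.

        `coeffs` is the timeseries of absolute wavelet coefficients for the
        detected frequency band, one value per `bin_s` seconds at that
        level.  The mean serves as the threshold; we return the time span
        covered by the run of above-mean coefficients around the peak.
        Returns (start_s, end_s), or None if nothing exceeds the mean.
        """
        if not coeffs:
            return None
        mu = sum(coeffs) / float(len(coeffs))
        strong = [i for i, c in enumerate(coeffs) if c > mu]
        if not strong:
            return None
        peak = max(strong, key=lambda i: coeffs[i])
        b = peak
        while b > 0 and coeffs[b - 1] > mu:                  # search backwards for the first weak bin
            b -= 1
        e = peak
        while e < len(coeffs) - 1 and coeffs[e + 1] > mu:    # and forwards
            e += 1
        return b * bin_s, (e + 1) * bin_s

Coarse estimates like the noon-to-9pm window above come from this kind of search; when the detected level is too coarse in time, the same search can be run two or more levels up the tree, as described in Section 4.4.5.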
This example demonstrates that our method successfully identifies a periodic behavior, and can also identify when that behavior starts and stops. While in some cases system administrators may be able to directly monitor OS update polling if they have administrative access to the machines in question, we suggest our approach could be useful when only network access is possible or desirable, for example, due to privacy reasons. In addition, monitoring periodic checks is robust to a potentially changing set of servers hosting OS updates.

Detecting a keylogging application

OS updates are an example of desirable periodic behavior. We next look at an example of a periodic behavior which is undesirable, namely a keylogger. Many keyloggers report on user activity at specified intervals, to inform their masters what they have learned and that they are still operational—we confirmed supervisor-configured reporting intervals in both SpyBuddy and Keyboard Guardian. To investigate if we can detect keylogger reporting we installed Keyboard Guardian on a dedicated Windows computer. We monitored all TCP flows from the test machine for a three-day period while using the test machine for occasional e-mail and web browsing (dataset [USC08a]). On the second day of the experiment, we installed Keyboard Guardian at 4pm, and configured it to email reports every three hours. Our computer use compared to keylogger reporting resulted in an SNR of 0.1. Figures showing these results are omitted here due to space, but are available in our technical report [BHP09]. We ran our system on trace files collected from the test machine, and it correctly identified not only the periodic behavior but also the frequency of the reporting period (10,800s, 92μHz). Additionally, the system identified the presence of the signal between 12pm and 9pm on the second day of our experiment, correctly bracketing the 4pm installation time. This experiment shows we can detect low-rate but regular traffic, as well as changes in periodic communication associated with a known spyware tool. We anticipate that this approach could be used by a network administrator to monitor a large number of user machines, searching for malicious activity. Although organizations with central control could do such monitoring more easily by modifying software on individual machines, some companies (for example, Google) and most ISPs do not have this ability. While such network monitoring is possible today with centrally maintained blacklists, our approach detects behavioral changes, and so would apply to malware before its control site is blacklisted. After detection, network administrators could take action to further investigate, perhaps notifying the machine's owner or subjecting that host to more invasive monitoring or quarantine.

4.8.2 Pre-filtering to assist network surveillance

Self-surveillance allows an individual to track changes to their computer's operation, even with partial control or understanding of the machine. Network administrators have the even more challenging job of supervising all machines on a network, often with no understanding or control over those hosts. We next show that detection of low-rate periodic behavior can assist in network-wide surveillance.
Our goal in network pre-filtering is to identify applications of interest that show periodic behavior. We have identified particular applications such as OS updates and keyloggers, which are important for self-surveillance (Section 4.8.1), but also other interesting applications (Section 4.7.1). Pre-filtering is useful because it provides a lightweight, privacy-sensitive mechanism to reduce the monitoring load for a network operator. We discuss the cost below (Section 4.8.2), but we find that a large enterprise can easily pre-filter for tens of thousands of hosts on a single PC, because pre-filtering requires minimal state and operates on partially aggregated traffic. Pre-filtering is privacy-sensitive because it considers only communication patterns, not packet contents. We anticipate that our system could be deployed as a front-end to a more general response system, where the response may be an automatic e-mail to a user ("your system appears to have changed"), automatically triggered deep-packet inspection, or even manual investigation.

We next evaluate how pre-filtering might work and how we validate it for the specific example of periodic traffic (RSS aggregators).

Pre-filtering design

Our pre-filtering system works by detecting periodicities in traffic and marking hosts that show changes in periodic behavior for further analysis. We follow the methodology from Section 4.4 with the following modifications. A network tap captures packets in and out of the target network, but since we care only about flows, an intelligent NIC can discard the body of all flows, reducing the traffic that must be processed. Because we are concerned with long-term periodicities, we count events into long, fixed-time bins. Finally, we aggregate traffic by target host.

These choices allow pre-filtering to be quite efficient. Filtering to flows (rather than packets) at the NIC greatly reduces the packet rate, typically by a factor of 200:1 or more. Aggregation of traffic by target host means there is no need for flow separation. Aggregation by target and relatively long bin sizes mean the memory required is quite small: 340 bytes per host (at one byte per bin, with 256s bins, for a day's data), so tracking a large enterprise of 20,000 hosts requires 8MB of state. (Our current implementation uses 1s bins to support other analysis, incurring 8 extraneous levels of processing; a real application would directly use 256s bins and avoid this cost.) Standard DRAM (not expensive SRAM) is sufficient, since updates are rare (every 4 minutes). As described in Section 4.4.7, analysis is lightweight and can run in real time.
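A quick back-of-the-envelope check of that state estimate, using the bin size and host count quoted above (the arithmetic is ours, added purely for illustration):

```python
# Back-of-the-envelope check of per-host pre-filter state (illustrative only).
SECONDS_PER_DAY = 86400
BIN_SECONDS = 256        # one event counter per 256 s bin
BYTES_PER_BIN = 1        # one-byte counter per bin
HOSTS = 20000

bins_per_day = -(-SECONDS_PER_DAY // BIN_SECONDS)         # 338 bins (round up)
bytes_per_host = bins_per_day * BYTES_PER_BIN             # ~340 bytes with slack
total_bytes = HOSTS * bytes_per_host
print(bins_per_day, bytes_per_host, total_bytes / 2**20)  # ~6.5 MiB of counters
```

The raw counters alone come to roughly 6.5 MiB for 20,000 hosts, the same order as the 8MB figure above; the difference presumably allows for per-host bookkeeping beyond the counters themselves.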
Finally, the advantage of pre-filtering is that it avoids analysis of most hosts—we will show that in our case it avoids analysis for two-thirds of the hosts in the monitored network. In doing so, we miss very few suspicious hosts. We next describe how we evaluate pre-filtering accuracy for a specific application.

Evaluation methodology

Evaluation of pre-filtering requires that we identify what specific applications we wish to detect, that the application has periodic behavior, and that we find ground truth in some real network dataset. The target application we select is identifying users of end-host RSS aggregators at our university. Most RSS aggregators poll regularly for updates at intervals of 60 minutes, providing long-term periodic behavior. Although not perfect, we use communication with a well-known RSS site as ground truth.

Ground truth is the set of hosts which run one or more RSS aggregators. Privacy concerns require us to eliminate any application data in our input dataset, so we cannot use deep-packet inspection to confirm our classification. We therefore approximate ground truth by assuming that any host which contacts a popular RSS feed server is running an RSS aggregator. We choose the popular server FeedBurner as our feed server. (At the time of data collection, FeedBurner RSS was provided by dedicated servers.) We show later that other RSS servers appear in our data as false positives; reanalysis with a larger target list is future work.

We use a set of 500 randomly chosen hosts from our network to evaluate our pre-filtering capabilities. We run our methods on each host over all TCP traffic from the host during a 24-hour period of observation [USC08b]. In this aggregate traffic, we look for periodic behavior at 30 or 60 minutes, since these are common RSS polling periods. Presence of such traffic defines a host as a detected RSS user. We then compare the hosts we eliminate and the hosts we detect against our ground truth.

Table 4.3: Breakdown of pre-filtering for RSS feed aggregators.
  Randomly chosen hosts                            500   100%
  Detected as periodic (at 30 or 60 min.)          160    32%
    Contacted known feeder (true positive)          31     6%
    No contact w. known feeder (false positive)    129    26%
      manual evaluation                             10
        really aperiodic                             2
        non-RSS periodic                             3
        other RSS periodic                           5
  Not detected as periodic                         339    68%
    No contact w. known feeder (true negative)     321    64%
    Contacted known feeder (false negative)         19     4%
      low SNR                                       10
      different polling period                       7
      other                                          2

Evaluation of accuracy

We next evaluate the accuracy of using our approach to detect RSS in aggregate traffic. Our goal is to rule out many of our randomly chosen hosts (a large number of true negatives), while missing few or no RSS hosts (few false negatives). Accidentally detecting non-RSS hosts is an overhead, but since such hosts can be ruled out with additional processing, such false positives are not a primary concern. Moreover, our analysis is based on an aggregate view of all traffic out of a host, and since we know many non-RSS applications have periodic behavior, false positives are inevitable. Finally, we expect to understand the causes of any incorrect results.

Table 4.3 shows the results of our study. We rule out as true negatives 321 of the 500 hosts with pre-filtering, reducing by 64% the number of hosts that must be analyzed further. We identify 160 hosts for further inspection. Of these, only about 1 in 5 are RSS readers of our designated feed (true positives). We discuss reasons for the remaining false positives below.

Understanding false negatives: Of greater concern are the false negatives, the 19 hosts using RSS to our feed that we missed. We examined each case to understand the causes of these misses. The majority of the cases (10 of 19) were missed because of low signal-to-noise ratio. The SNRs range from 0.001 to 0.04, because these hosts have a large amount of HTTP traffic and we look at aggregate traffic to and from the host; these signal-to-noise values are at the edge of our detection ability (Section 4.6.2). It is possible that our system could detect these with a more careful algorithm. For example, user-generated HTTP traffic follows diurnal patterns, while automated RSS polling is often continuous, so one could likely improve SNR by discarding high-traffic periods.
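That last suggestion could be prototyped as a simple pass over the per-bin counts before computing band energy. The sketch below is ours and was not evaluated in this work; the quantile cutoff is an arbitrary assumption.

```python
# Possible refinement (not evaluated here): clip the busiest bins, which are
# dominated by interactive browsing, before looking for low-rate periodicity.
import numpy as np

def clip_busy_bins(bins, quantile=0.9):
    """Clip bins above the given traffic quantile to suppress diurnal bursts."""
    cutoff = np.quantile(bins, quantile)
    return np.minimum(bins, cutoff)   # keep event timing, drop excess volume
```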
We then found 7 hosts which had detectable periodic contact with our RSS feed, but with different periods than our target 30 or 60 minutes. These periods range from 20–90 minutes. These misses indicate the importance of understanding the target's signature. They also suggest that an adversary could vary the period as a countermeasure; we discuss countermeasures in Section 4.10.

Understanding false positives: Since we operate a pre-filter, false positives represent reduced performance, not an incorrect answer. However, we still wish to understand their causes to improve our algorithm. There are three potential sources of false positives: misidentification of aperiodic behavior as periodic, identification of periodic behavior that is not RSS-related, and identification of periodic RSS traffic that is not destined for our target. These cases represent failure of our mechanism, of our mapping of that mechanism to this application, and of our evaluation methodology, respectively. While we could not evaluate all false positives, we manually examined 10 of them. (We plan to complete this analysis in the very near future.)

In 80% of these hosts we found the presence of periodic behavior in the underlying data. For two hosts, although our automatic detection algorithm triggered, we were unable to identify specific periodic behavior through manual analysis. False positives due to non-RSS periodic traffic occur in three hosts. In two of these cases, this is due to non-RSS web traffic, possibly a self-refreshing web page; we could not identify the application for the third case. This second source of false positives is more challenging: such false positives are inevitable because we are looking at a very general behavior in traffic. Finally, in half the hosts we found periodic RSS traffic, but not to our target (instead to Yahoo and Google aggregators). This error represents a limitation of our definition of ground truth. Re-analysis with stronger ground truth is future work; because we lack packet contents, we cannot revise our analysis for our current dataset.

4.9 Advantages of Pruning

As described in our methodology, while we normally expand each node in the filter tree, we reduce the number of filtering operations we perform by pruning branches which appear "uninteresting" (Section 4.4.5). We define branches as uninteresting for two reasons. First, we prune if further division of the frequency domain gives us unnecessary resolution, such as resolution higher than our sampling rate. Second, we prune if the energy in the frequency band is lower than our detection threshold. In this section, we demonstrate the advantages of performing pruning by quantifying the reduction in the number of filtering steps we perform.

Before demonstrating the advantages of pruning when filtering real-world examples, we first show a simple example using an artificial signal. Figure 4.13 shows an 8s artificial signal with no background noise. Frequency bands marked with F or E are bands that we stop filtering due to our frequency or energy thresholds, respectively. Frequency bands marked with P are frequency bands we never reach because their branch was pruned further up the tree.

[Figure 4.13: Depiction of pruning. F = pruned due to frequency resolution; E = pruned due to low energy; P = parent already pruned.]

[Figure: number of filtering steps pruned as a function of SNR.]
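The pruning rule itself is compact. The sketch below reuses the toy Haar split from the earlier sketch and invents its own depth limit and energy cutoff, so it is only an illustration of the idea, not the dissertation's filter-tree code.

```python
# Illustrative recursive decomposition with pruning: stop when further splits
# exceed the useful frequency resolution (F) or a band's energy is too low (E).
import numpy as np

def haar_split(x):
    x = x[: len(x) - (len(x) % 2)]
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def decompose(band, total_energy, depth, max_depth, energy_frac=0.01):
    """Expand a band recursively; return (interesting leaf bands, filter steps)."""
    if depth >= max_depth or len(band) < 2:
        return [band], 0                    # F: resolution limit reached
    if np.sum(band ** 2) < energy_frac * total_energy:
        return [], 0                        # E: not enough energy to pursue
    low, high = haar_split(band)            # one filtering step performed
    leaves, steps = [], 1
    for child in (low, high):
        sub_leaves, sub_steps = decompose(child, total_energy,
                                          depth + 1, max_depth, energy_frac)
        leaves += sub_leaves
        steps += sub_steps
    return leaves, steps
```

Comparing the returned step count against the 2^max_depth - 1 splits of a full expansion gives one way to quantify the savings from pruning.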
4.10 Robustness

In previous sections we demonstrated that we can detect and identify changes in periodic behavior. In this section we briefly look at the robustness of our scheme, and discuss its sensitivity to parameter settings.

4.10.1 Evasion

As with most security mechanisms, detection of low-rate periodicity can be evaded by a determined attacker. Evasion can be accomplished in one of two ways.

First, as shown in Section 4.6.2, our detection is sensitive to noise. Decreasing the SNR to lower than 5% effectively hides all periodic behavior from our current implementation. Thus, a determined attacker can decrease the frequency of his traffic or generate spurious additional traffic.

Second, an application can evade our detection scheme by adding jitter to its periodic behavior. Increasing jitter diffuses the energy of the signal. As a simple example, we can vary the period of an artificial signal and study the effect this jitter has on the coefficient energy in a target frequency range. Figure 4.14 shows an artificial signal with a 128s period as the period is jittered by up to 50%, based on a 2-hour observation period. In this case, jitter of more than 15% is relatively effective at hiding the signal. Of course, the countermeasure to this behavior is to employ a longer observation period.

[Figure 4.14: Effect of jitter on coefficient energy; original period 128s. Axes: coefficient energy as a percent of the no-jitter case vs. jitter as a percentage of the period.]
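The effect of jitter can be reproduced qualitatively with a short simulation. The sketch below is ours: it uses a plain DFT rather than the wavelet decomposition, and an 8192s window (chosen so the 128s period divides it evenly, roughly the 2-hour observation above), so its numbers are only indicative.

```python
# Toy jitter experiment: how much energy remains at the 128 s fundamental as
# the period is jittered by a growing percentage.  Qualitative illustration.
import numpy as np

def energy_at_fundamental(period=128, jitter_frac=0.0, duration=8192, seed=1):
    rng = np.random.default_rng(seed)
    signal = np.zeros(duration)
    t = 0.0
    while t < duration:
        signal[int(t)] += 1.0                   # one event per (jittered) period
        t += period * (1 + jitter_frac * rng.uniform(-1, 1))
    spectrum = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    return spectrum[duration // period]         # DFT bin of the fundamental

baseline = energy_at_fundamental(jitter_frac=0.0)
for jitter in (0.05, 0.15, 0.30, 0.50):
    print(jitter, energy_at_fundamental(jitter_frac=jitter) / baseline)
```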
4.10.2 Parameter sensitivity

As described earlier, we have three main parameters to set when performing our full decomposition: the length of our observation period, how large a time bin we use, and how many levels of decomposition we perform. To increase our ability to find longer periods, we must increase the time-bin size, effectively performing a low-pass filter. For longer periods, further levels of decomposition might be necessary to see the low frequencies, and our observation period must be sufficiently long to observe several periods. A detailed study of these effects is an area of future work. However, we can describe the trade-offs: a longer bin size allows detection of longer periods, but reduces the temporal sensitivity and may increase noise (as more traffic is grouped into individual bins).

4.11 Summary

In this work we have shown that low-rate periodicity is common to several broad classes of applications, good (OS updates), bad (keyloggers and malware), and ugly (adware), and that these applications are widely deployed on public networks. We have explored a wavelet-based approach to identify such periodic behavior, and begun to explore the sensitivity and robustness of this approach. Promising applications of such analysis are self-surveillance, as a user watches his or her own traffic to detect unexpected changes, and pre-filtering, where a network operator deploys the scheme to reduce the number of hosts to carefully examine for possible malware infection.

This third and last study we present to support our thesis covers an entirely different area of network reconnaissance than either our service discovery study or our P2P study. In this chapter we looked at a variety of applications which share a network behavior and showed how we can target this general group of applications with passive and blind techniques. Again, we support our claim with our success at using blind techniques to perform network reconnaissance without deep-packet inspection.

Beyond offering an example of performing network reconnaissance using a network behavior, we again demonstrated the process of choosing a network behavior and building a reconnaissance application based on that behavior. While previously we demonstrated targeting a specific set of applications, in this chapter we explored targeting a variety of applications united by a shared network behavior.

Chapter 5

Conclusions and Future Work

We conclude this dissertation with a brief discussion of future directions for our work and final conclusions about our contributions.

5.1 Future Work

All three of our studies give compelling results to support our thesis, but given more time, each study has several avenues of future work which would help further explore and validate our results. In this section, we briefly discuss short-term future work for our three studies.

In Chapter 2 we detail a comparison study of active versus passive methods. We compared the two methods within a university environment. Future work in this area would include expanding the study to measure service discovery in other environments, such as residential networks. Additionally, due to privacy concerns, we limited the scope of our active scans. Further work in comparing passive and active methods in service discovery would include comparing passive methods to higher-frequency active scanning. Our current study looks at scans done every twelve hours, but an increase to every six hours may significantly improve active service discovery's ability to map dynamic networks such as DHCP networks.

Recent work by Heidemann et al. [HPG+08] and Cai et al. [CH09] uses active scanning to understand Internet address usage and map out network demographics. By adding what we have learned about passive service discovery to these studies, which currently use active scanning, we can strengthen their results in the same way that combining active and passive methods strengthens service discovery.

In our second study, in Chapter 3—identifying peer-to-peer network traffic—we base our detection on several network behaviors, and demonstrate that these behaviors are prevalent in Gnutella and traditional BitTorrent. Additional network behaviors are also potential candidates for peer-to-peer identification. For example, in trackerless BitTorrent, peers maintain a distributed hash table (DHT) to perform tracking. Communication performed by peers updating the DHT may be a distinctive enough network behavior to be used to identify trackerless BitTorrent and other peer-to-peer protocols which use decentralized tracking mechanisms.

Additionally, we can use our methods for identifying peer-to-peer communication to continue measuring current usage of P2P protocols. Since our measurement study in 2006, we expect the usage and variety of peer-to-peer protocols have changed, and a new measurement study could offer insight into the trends and goals of current peer-to-peer protocols. Beyond monitoring trends in file-sharing peer-to-peer, future work includes applying our techniques to protocols outside of file-sharing applications, such as the command-and-control peer networks used to organize and maintain botnets. Traffic from such networks is designed to be covert and is unlikely to be found using payload information, since the protocols and packet contents are both hidden and non-standard. Focusing on behaviors inherent to the nature of peer communication provides a stronger detection basis.
In the last chapter (Chapter 4), we present our study of low-rate periodic network communication. Our methods currently use a configurable window of network data to determine whether periodic communication is present. To build an on-line, continuous surveillance or detection system, the next step is to develop a windowed algorithm so that network data can be analyzed continuously and results produced at regular intervals.

In Chapter 4 we describe two applications for detection of low-rate periodic communication. A third application is to apply periodic detection to the problem of identifying users. In today's networks, users use a large range of devices, from desktops to mobile devices such as laptops and phones. Identifying regular communication which is unique to a user, such as the set of RSS feeds a user subscribes to, allows that user to be tracked. Tracking users can serve entrepreneurial purposes such as targeted advertising, but in the realm of network reconnaissance, user tracking enables better network provisioning and planning. Understanding which users use the provided devices and network services gives information about where upgrades and adjustments should occur. For example, knowing that the same users who use the student labs on a campus also use the wireless network across the street cues central IT to allow remote access to the labs from the wireless network.

5.2 Broad Future Work

In the previous section we discussed short-term future work for each of our three studies. In this section, we discuss directions for long-term future work based on this body of work.

In our last two chapters (Chapters 3 and 4), we demonstrated developing a reconnaissance application based on a network behavior. We foresee this process being fundamental to future work in network reconnaissance, as the targets of reconnaissance become more varied. In Chapter 4 we not only demonstrated the process of developing a reconnaissance application but also demonstrated the potential of identifying a network behavior which is shared by multiple applications. We anticipate there are many network behaviors which, if studied, can bring to light a range of applications that are beneficial to identify but were not previously considered within the scope of network reconnaissance.

For example, flow duration is a new network behavior to study. Statistical analysis of flow duration may reveal unique patterns for specific applications. Applications which could be targeted by looking at flow duration include Tor—an onion-routing application—and communication applications such as IRC, ssh, and voice over IP (VoIP) tunneled through TCP. Tor exhibits potentially unique flow-duration behavior because Tor uses the same route for connections which happen within roughly the same ten minutes. Other communication applications often exhibit unique flow duration due to the long-lived nature of their connections.

Building on this process for designing techniques furthers our work by adding to the variety of methods available for reconnaissance. However, we advocate moving beyond just developing a collection of methods to combining reconnaissance techniques into an automated system for identifying important changes on a network. Such a system would track and record activities for each host on a network. By understanding how each host on a network is used, hosts can then be classified into clusters of similar types of machines.
For example, a possible classification scheme would separate servers from machines which are predominantly user desktops. Once a sufficiently detailed scheme is determined for a network, identifying hosts which over time deviate from their original classification gives network operators a list of machines which may need additional attention. A change in classification signifies either that there has been an intentional change in the intended use of the host, or that the host has experienced a break-in which results in new behavior. If the host's usage is intentionally changed, flagging the host allows network operators to assess whether the new use of the host affects network provisioning and planning. In the case of a break-in, flagging the change as soon as possible enables corrective action.

Our research lays the groundwork for a more in-depth approach to network reconnaissance. Traditional reconnaissance focuses on tracking and providing for network services, but with the added information passive techniques provide, our research demonstrates the benefits of tracking all hosts and a variety of applications, not just services. Continued work in this area is increasingly important as the complexity and diversity of network applications grow.

5.3 Conclusions

Composite networks are networks which consist of many individually administered hosts and subnets. Such networks are commonplace on today's Internet, since large organizations can more easily address the diverse needs of users by splitting administration. While control and security of individual machines varies in composite networks, often the network as a whole is protected and maintained by a central IT group. These central IT groups are tasked with assessing network vulnerabilities, ensuring compliance with site-wide policies, and tracking services which affect network provisioning. Without direct control over end-host machines, central IT must use network reconnaissance to perform these tasks.

Our work systematically analyzes solutions to network reconnaissance. We first define a solution space for network reconnaissance using the three main attributes of any reconnaissance method: the target type of the reconnaissance, the method used to collect data, and the level of data required for analysis. Target types range from a specific target, such as a particular application, to a general target, such as a whole class of applications which share a behavior. The collection method can range from active, where specially crafted probes are sent to machines and the responses are analyzed, to passive, where network traffic is simply observed. The level of data required ranges from deep-packet inspection, where application payload data is analyzed, to blind techniques, where only packet headers or packet timing information is needed.

We assert that passive methods enable network reconnaissance without the need for deep-packet inspection. Though such techniques may require careful design, since they do not rely on straightforward payload signature matching or direct active probing, they are non-invasive and preserve user privacy. We support our thesis in two ways. First, we give three solid examples of successful passive and blind reconnaissance techniques which cover different areas of the 3D solution space. Second, from the evaluation of these techniques we can further understand how passive techniques would perform in reconnaissance areas beyond our three examples.
In our first study (Chapter 2), we demonstrated how passive and blind techniques enable a central part of network reconnaissance: service discovery. Service discovery is vital to protecting existing open services, and to tracking and provisioning for new service trends. We quantitatively compared passive service discovery to traditional active service discovery and analyzed in detail the trade-offs of using passive techniques. Passive methods can provide detailed information which active methods cannot, such as determining the popularity of a service. Additionally, passive methods quickly find popular services even when services are protected by firewalls.

Along with the benefits of passive service discovery, we also looked at the trade-offs of using only passive techniques, namely that data collection must be performed for extended periods of time and that inactive services may never be found. Understanding and quantifying the trade-offs of using passive techniques in service discovery allows us to intelligently design passive techniques that address other areas of reconnaissance outside of service discovery. The success of identifying servers by observing traffic implies we can also perform reconnaissance on clients as well as peers, regardless of the network application.

In Chapter 3 we give an explicit example of using passive and blind techniques to successfully identify hosts participating in peer-to-peer communication. This example targets a specific class of applications, and is tailored to identify behavior specific to this class. With this example, we demonstrate that with careful design we can target a specific application, such as the peer-to-peer file-sharing application BitTorrent, and that with some adjustment to our techniques we can broaden our target to include other peer-to-peer applications such as Gnutella. Again we support our thesis by demonstrating another successful set of passive and blind techniques which perform important reconnaissance: the identification of a widespread application which affects network security and provisioning.

In Chapter 3 we demonstrated not just a technique for identifying P2P traffic but also a process for designing passive and blind reconnaissance techniques. We demonstrate that by first identifying inherent network behaviors of a target application we can then design a passive and blind technique to identify that application. By following this process, any number of specific applications can be targeted by a passive and blind reconnaissance technique.

In Chapter 4 we demonstrated how, by targeting a network behavior, passive methods can identify a variety of applications which share a common communication pattern. By intelligently choosing the network behavior to study, we can build a range of surveillance applications based on passive methods. We demonstrated that by studying low-rate periodic communication we can use passive methods to perform self-surveillance and to perform pre-filtering to aid in application detection.

By targeting a network behavior, we demonstrated not only interesting and useful applications for identifying low-rate periodic network behavior, but also a new paradigm for network reconnaissance which is only possible through passive methods: using network behavior to perform reconnaissance. By choosing an inherent behavior, avoidance of detection becomes more difficult. User privacy can be maintained by choosing a behavior at the network level, rather than some pattern within packet payload.
By following this emphasis on inherent behavior at the network level, passive techniques can be used to target any number of general classes of applications.

Our three studies explore three separate aspects of the 3D solution space. We compare passive techniques to active techniques, and then show two different approaches to designing reconnaissance applications using passive techniques: targeting an application, and targeting a class of applications which share a common behavior. By choosing the network behavior properly, a wide range of reconnaissance applications can be built with an emphasis on user privacy. Through the process of basing a reconnaissance method on a network behavior, we demonstrate the validity of our thesis: passive techniques enable network reconnaissance without the need for deep-packet inspection.

Bibliography

[BC98] Paul Barford and Mark Crovella. Generating representative web workloads for network and server performance evaluation. In Proceedings of ACM SIGMETRICS: Joint International Conference on Measurement and Modeling of Computer Systems, pages 151–160, 1998.

[BDGM02] Mayank Bawa, Hrishikesh Deshpande, and Hector Garcia-Molina. Transience of peers and streaming media. In Proceedings of ACM HotNets, pages 107–112, Princeton, NJ, USA, October 2002.

[BEK+00] Don Box, David Ehnebuske, Gopal Kakivaya, Andrew Layman, Noah Mendelsohn, Henrik F. Nielsen, Satish Thatte, and Dave Winer. Simple Object Access Protocol (SOAP) 1.1. Technical Report NOTE-SOAP-20000508, W3C, May 2000.

[BHP06] Genevieve Bartlett, John Heidemann, and Christos Papadopoulos. Inherent behaviors for on-line detection of peer-to-peer file sharing. Technical Report ISI-TR-627, ISI, 2006.

[BHP07a] Genevieve Bartlett, John Heidemann, and Christos Papadopoulos. Inherent behaviors for on-line detection of peer-to-peer file sharing. In Proceedings of the 10th IEEE Global Internet, pages 55–60, Anchorage, Alaska, USA, May 2007. IEEE. An extended version is ISI-TR-2006-627.

[BHP07b] Genevieve Bartlett, John Heidemann, and Christos Papadopoulos. Understanding passive and active service discovery. In Proceedings of the ACM Internet Measurement Conference, pages 57–70, San Diego, California, USA, October 2007. ACM.

[BHP07c] Genevieve Bartlett, John Heidemann, and Christos Papadopoulos. Understanding passive and active service discovery (extended). Technical Report ISI-TR-2007-642, USC/Information Sciences Institute, May 2007. To appear, ACM Internet Measurement Conference 2007.

[BHP09] Genevieve Bartlett, John Heidemann, and Christos Papadopoulos. Using low-rate flow periodicities in anomaly detection. Technical Report ISI-TR-661, USC/Information Sciences Institute, July 2009.

[BHPP07] Genevieve Bartlett, John Heidemann, Christos Papadopoulos, and James Pepin. Estimating P2P traffic volume at USC. Technical Report ISI-TR-2007-645, USC/Information Sciences Institute, June 2007.

[BKPR02] Paul Barford, Jeffery Kline, David Plonka, and Amos Ron. A signal analysis of network traffic anomalies. In Proceedings of the ACM SIGCOMM Workshop on Mining Network Data (MineNet), pages 173–185, Marseilles, France, November 2002.

[btw] http://www.bittorrent.org/.

[CFEK06] Kenjiro Cho, Kensuke Fukuda, Hiroshi Esaki, and Akira Kato. The impact and implications of the growth in residential user-to-user traffic. In Proceedings of the ACM SIGCOMM Conference, pages 207–218. ACM, September 2006.
[CH09] Xue Cai and John Heidemann. Understanding address usage in the visible Internet. Technical Report ISI-TR-2009-656, USC/Information Sciences Institute, February 2009.

[CKsT02] Chen-mou Cheng, H. T. Kung, and Koan-sin Tan. Use of spectral analysis in defense against DoS attacks. In Proceedings of the IEEE GLOBECOM, pages 2143–2148, Taipei, Taiwan, 2002.

[CM06] Fivos Constantinou and Panayiotis Mavrommatis. Identifying known and unknown peer-to-peer traffic. In IEEE International Symposium on Network Computing and Applications (NCA), pages 93–102, Cambridge, MA, USA, July 2006.

[CR06] Michael P. Collins and Michael K. Reiter. Finding peer-to-peer file-sharing using coarse network behaviors. In Proceedings of the European Symposium On Research In Computer Security, pages 1–17, Hamburg, Germany, September 2006.

[DG00] Nick Duffield and Matthias Grossglauser. Trajectory sampling for direct traffic observation. In Proceedings of the ACM SIGCOMM Conference, pages 179–191, Stockholm, Sweden, August 2000. ACM.

[DÖ01] Burak Dayioglu and Attila Özgit. Use of passive network mapping to enhance signature quality of misuse network intrusion detection systems. In Proceedings of the Sixteenth International Symposium on Computer and Information Sciences, 2001.

[FGHW99] Anja Feldmann, Anna C. Gilbert, Polly Huang, and Walter Willinger. Dynamics of IP traffic: A study of the role of variability and the impact of control. In ACM SIGCOMM Conference, pages 301–313, August 1999.

[GDS+03] Krishna P. Gummadi, Richard J. Dunn, Stefan Saroiu, Steven D. Gribble, Henry M. Levy, and John Zahorjan. Measurement, modelling, and analysis of a peer-to-peer file-sharing workload. In Proceedings of the 19th Symposium on Operating Systems Principles, pages 314–329, Bolton Landing, NY, USA, October 2003. ACM.

[GMT05] Yu Gu, Andrew McCallum, and Don Towsley. Detecting anomalies in network traffic using maximum entropy estimation. In Proceedings of the ACM SIGCOMM Workshop on Internet Measurement (IMC), pages 345–350, October 2005.

[HBP+05] Alefiya Hussain, Genevieve Bartlett, Yuri Pryadkin, John Heidemann, Christos Papadopoulos, and Joseph Bannister. Experiences with a continuous network tracing infrastructure. In Proceedings of the ACM SIGCOMM Workshop on Mining Network Data (MineNet), pages 185–190, Philadelphia, PA, USA, August 2005.

[HHP03] Alefiya Hussain, John Heidemann, and Christos Papadopoulos. A framework for classifying denial of service attacks. In Proceedings of the ACM SIGCOMM Conference, pages 99–112, Karlsruhe, Germany, 2003.

[HHP06] Alefiya Hussain, John Heidemann, and Christos Papadopoulos. Identification of repeated denial of service attacks. In Proceedings of the IEEE Infocom, pages 1–15, Barcelona, Spain, April 2006. IEEE.

[HPG+08] John Heidemann, Yuri Pradkin, Ramesh Govindan, Christos Papadopoulos, Genevieve Bartlett, and Joseph Bannister. Census and survey of the visible Internet. In Proceedings of the ACM Internet Measurement Conference, pages 169–182, Vouliagmeni, Greece, October 2008. ACM.

[HPH+09] Xinming He, Christos Papadopoulos, John Heidemann, Urbashi Mitra, and Usman Riaz. Remote detection of bottleneck links using spectral and statistical methods. Computer Networks, 53(3):279–298, February 2009.

[HPHH04] Xinming He, Christos Papadopoulos, John Heidemann, and Alefiya Hussain. Spectral characteristics of saturated links. Technical Report USC-CSD-TR-827, University of Southern California Computer Science Department, June 2004.
[Hus] Alefiya Naveed Hussain. Measurement and Spectral Analysis of Denial of Service Attacks. PhD thesis, U. of Southern California, Comp. Sci. Dept.

[i2h] http://en.wikipedia.org/wiki/i2hub.

[KBB+05] Thomas Karagiannis, Andre Broido, Nevil Brownlee, kc claffy, and Michalis Faloutsos. Is P2P dying or just hiding? In Proceedings of the IEEE Global Internet, pages 1532–1538, January 2005.

[KBFk04] Thomas Karagiannis, Andre Broido, Michalis Faloutsos, and kc claffy. Transport layer identification of P2P traffic. In Proceedings of the ACM SIGCOMM Workshop on Internet Measurement (IMC), pages 121–134, Taormina, Sicily, Italy, October 2004.

[KPF05] Thomas Karagiannis, Konstantina Papagiannaki, and Michalis Faloutsos. BLINC: Multilevel traffic classification in the dark. In Proceedings of the ACM SIGCOMM Conference, pages 229–240, Philadelphia, PA, USA, August 2005. ACM.

[Krz03] Martin Krzywinski. Port knocking: Network authentication across closed ports. SysAdmin Magazine, 12(6):12–17, June 2003.

[LH02] Kun-Chan Lan and John Heidemann. Rapid model parameterization from traffic measurements. ACM Transactions on Modeling and Computer Simulation, 12(3):201–229, July 2002.

[MHK04] Antonio Magnaghi, Takeo Hamada, and Tsuneo Katsuyama. A wavelet-based framework for proactive detection of network misconfigurations. In Proceedings of the ACM Workshop on Network Troubleshooting, August 2004.

[MLM04] A. De Montigny-Leboeuf and F. Massicotte. Passive network discovery for real time situation awareness. In Proceedings of the RTO Information Systems Technology Panel (IST) Symposium on Adaptive Defence in Unclassified Networks, pages 288–300, November 2004.

[MM02] Petar Maymounkov and David Mazières. Kademlia: A peer-to-peer information system based on the XOR metric. In Proceedings of the International Workshop on Peer-to-Peer Systems, Cambridge, MA, USA, March 2002.

[MOHP06] Urbashi Mitra, Antonio Ortega, John Heidemann, and Christos Papadopoulos. Detecting and identifying malware: A new signal processing goal. IEEE Signal Processing Magazine, 23(5):107–111, September 2006.

[MW06] Alok Madhukar and Carey Williamson. A longitudinal study of P2P traffic classification. In Proceedings of the International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pages 179–188, Washington, DC, USA, September 2006. IEEE Computer Society Press.

[nes] Nessus vulnerability scanner. http://www.nessus.org.

[nma] Nmap ("Network Mapper"). http://insecure.org/nmap/.

[NSA+08] George Nychis, Vyas Sekar, David Andersen, Hyong Kim, and Hui Zhang. An empirical evaluation of entropy-based traffic anomaly detection. In 8th ACM SIGCOMM Workshop on Internet Measurement (IMC), pages 151–156, October 2008.

[Pax99] Vern Paxson. Bro: a system for detecting network intruders in real-time. Computer Networks (Amsterdam, Netherlands: 1999), 31(23–24):2435–2463, 1999.

[PCJ+02] Craig Partridge, David Cousins, Alden W. Jackson, Rajesh Krishnan, Tushar Saxena, and W. Timothy Strayer. Using signal processing to analyze wireless data traffic. In 1st ACM Workshop on Wireless Security, pages 67–76, 2002.

[RSGC+02] J. Rosenberg, H. Schulzrinne, A. Johnston, G. Camarillo, J. Peterson, M. Handley, R. Sparks, and E. Schooler. SIP: Session initiation protocol. RFC 3261, Internet Request For Comments, June 2002.

[RSR06] Amir Rasti, Daniel Stutzbach, and Reza Rejaie. On the long-term evolution of the two-tier Gnutella overlay. In Proceedings of the IEEE Infocom, pages 1–6, April 2006.

[Sch05] D. Schweitzer. Two sides of vulnerability scanning. http://www.computerworld.com/, February 2005.
[SHJO01] F. Donelson Smith, Felix Hernandez, Kevin Jeffay, and David Ott. What TCP/IP protocol headers can tell us about the web. In Proceedings of ACM SIGMETRICS, pages 245–256, Cambridge, MA, USA, June 2001. ACM.

[sno] Snort. http://www.snort.org/.

[SSW04] Subhabrata Sen, Oliver Spatscheck, and Dongmei Wang. Accurate, scalable in-network identification of P2P traffic using application signatures. In Proceedings of the International World Wide Web Conference, pages 512–521, 2004.

[Sun88] Sun Microsystems. RPC: remote procedure call protocol specification version 2. RFC 1057, Internet Request For Comments, June 1988.

[The05] The PREDICT Program. PREDICT: Protected repository for the defense of infrastructure against cyber-threats. http://www.predict.org, January 2005.

[TM02] D. S. Taubman and M. W. Marcellin. JPEG2000: Image Compression Fundamentals, Standards, and Practice. Kluwer Academic Publishers, Boston, MA, USA, 2002.

[USC05] USC/LANDER project. P2P 2005 dataset, PREDICT ID USC-LANDER/p2p_detection-200508, August 2005.

[USC06a] USC/LANDER project. Active/Passive datasets, PREDICT ID USC-LANDER/usc_lander_passive_active-20060919, September 2006.

[USC06b] USC/LANDER project. P2P 2006 14 hour dataset, PREDICT ID USC-LANDER/14_hour_p2p_study-20061214, December 2006.

[USC06c] USC/LANDER project. P2P 2006 dataset, PREDICT ID USC-LANDER/p2p_detection-20061003, October 2006.

[USC08a] USC/LANDER project. Specialized TCP flow traces, PREDICT ID USC-LANDER/specialized_tcp_flow_usc-20081209, December 2008.

[USC08b] USC/LANDER project. TCP flow traces, PREDICT ID USC-LANDER/tcp_flow_usc-20081209, December 2008.

[VK95] Martin Vetterli and Jelena Kovačevic. Wavelets and Subband Coding. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1995.

[WDHP06] Arno Wagner, Thomas Dubendorfer, Lukas Hammerle, and Bernhard Plattner. Flow-based identification of P2P heavy-hitters. In Proceedings of the ICISP International Conference on Internet Surveillance and Protection, page 15. IEEE Computer Society Press, August 2006.

[WLZ06] Seth Webster, Richard Lippmann, and Marc Zissman. Experience using active and passive mapping for network situational awareness. In Proceedings of the 5th IEEE International Symposium on Network Computing and Applications, pages 19–26, July 2006.

[YoS] YoSponge. http://www.geocities.com/yosponge/blockips.txt. Last updated July 2008.
Abstract
Blind techniques to detect network applications--approaches that do not consider packet contents--are increasingly desirable because they have fewer legal and privacy concerns, and they can be robust to application changes and intentional cloaking. This body of work asserts that passive methods enable network reconnaissance without the need for deep-packet inspection. We demonstrate the validity of our assertion by first demonstrating the effectiveness of passive techniques and then presenting two separate passive techniques which aid in network reconnaissance.