ENABLING EFFICIENT SERVICE ENUMERATION THROUGH SMART SELECTION OF MEASUREMENTS

by

Xun Fan

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

August 2015

Copyright 2015 Xun Fan

Dedication

To my dear parents, wife and lovely son.

Acknowledgments

It has been a long journey toward the completion of my Ph.D. study. Over these years, many people helped and inspired me, and they deserve my sincere gratitude.

First of all, I want to express my deepest appreciation to my advisor, Prof. John Heidemann, for his guidance, support, encouragement and patience throughout my entire Ph.D. study. He is kind and considerate, making me feel warm when difficulties of life appear. He is also a serious researcher, always rigorous in study and pursuing the highest standard in research. His guidance has helped me develop many important skills and abilities, including critical thinking, problem-solving, writing, presentation, and many more that I have not yet realized but will realize in the rest of my life. I believe the benefits of these skills will keep me company in my future career and life.

I would also like to specially thank my two co-advisors, Prof. Ramesh Govindan and Prof. Ethan Katz-Bassett. As world-class researchers, they helped me broaden my view of the problems and think from different angles. I would not have reached my current achievements without their help and mentorship.

I want to thank Prof. Konstantinos Psounis, Prof. Minlan Yu, and Prof. William G.J. Halfond for their service on my qualifying exam and dissertation committee.

I would like to thank my fellow friends and workmates at USC and ISI who supported me and brought me joy: Yuri Pradkin, Unkyu Park, Chengjie Zhang, Xue Cai, Lin Quan, Zi Hu, Calvin Ardi, Liang Zhu, Hang Guo, Abdul Qadeer, Abdulla Alwabel, Matt Calder, Hao Shi, Xiyue Deng, Lihang Zhao, Weiwei Chen, and many others. I also thank Joe Kemp, Alba Regalado and Jeanine Yamazaki for their help with administrative tasks at ISI.

Table of Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract

1 Introduction
  1.1 Thesis Statement
  1.2 Demonstrating the Thesis Statement
    1.2.1 Our Studies Support the Thesis
    1.2.2 Generalization
  1.3 Additional Contributions
  1.4 Structure of the Dissertation

2 Selecting Representative IP Addresses for Internet Topology Studies
  2.1 Introduction
  2.2 Methodology
    2.2.1 Hitlist Requirements
    2.2.2 Background: Internet censuses
    2.2.3 Prediction Method
    2.2.4 Gone-Dark Blocks
    2.2.5 Hitlist Description
  2.3 Evaluation
    2.3.1 Responsiveness
    2.3.2 Completeness
    2.3.3 Stability and Inertia
    2.3.4 Effects of Probe Frequency
    2.3.5 Effects on Other Research
    2.3.6 Cost of Hitlist Generation
  2.4 Other Observations
  2.5 Sharing Hitlists
    2.5.1 Benefits of Sharing Hitlists
    2.5.2 Costs of Sharing Hitlists
  2.6 Related Work
  2.7 Conclusions

3 Evaluating Anycast in the Domain Name System
  3.1 Introduction
  3.2 Background of the Domain Name System
  3.3 A Taxonomy of Anycast Configurations
  3.4 Methods for Anycast Discovery
    3.4.1 CHAOS Queries
    3.4.2 IN Queries
  3.5 Validation
    3.5.1 Methodology
    3.5.2 Recall
    3.5.3 Precision for CHAOS Queries
  3.6 Evaluation
    3.6.1 Anomalous Anycast Configurations
    3.6.2 Characterizing Anycast in Other Roots
    3.6.3 Anycast Use in Top-Level Domains
    3.6.4 How Many Anycast Providers Exist?
  3.7 Security Implications
    3.7.1 Limiting diagnosis to the provider
    3.7.2 Discouraging Masquerader spoofing
    3.7.3 Relationship to DNSsec
  3.8 Related Work
  3.9 Conclusions

4 Assessing Affinity Between Users and CDN Sites
  4.1 Introduction
  4.2 Background: Measuring CDNs
    4.2.1 DNS Redirection Basics
    4.2.2 Enumerating CDN Front-End Servers
    4.2.3 Geolocating Front-Ends
  4.3 Methodology
    4.3.1 Clustering Front-Ends
    4.3.2 Data Collection
  4.4 Validation
    4.4.1 Accuracy of Front-End Clustering
    4.4.2 Does Discarding Non-clustered IPs Affect Our Results?
    4.4.3 Are Prefixes Mapped Differently When Accessing Other TLDs of Google Search?
    4.4.4 How Often To Probe?
    4.4.5 Accuracy of Geolocation With Open Resolvers
  4.5 Dynamics of User Redirection
    4.5.1 Are User Prefixes Mapped to Different FE Clusters?
    4.5.2 Duration of User-to-FE Mapping
    4.5.3 Is There a Primary FE Cluster for Each Prefix?
  4.6 Impacts of User Redirection
    4.6.1 Distances of Mapping Changes
    4.6.2 Effects of Mapping Changes on Users
    4.6.3 The Geographic Footprint Seen by User Prefixes
  4.7 Reasons for Mapping Changes
    4.7.1 FE Cluster Drain and Restoration
    4.7.2 Load Balancing
    4.7.3 Reconfiguration of User-to-FE Cluster Mapping
    4.7.4 Unknown
  4.8 Prefix-FE Mapping Implications for Other Studies
    4.8.1 Impact on Client-Centric Geolocation
    4.8.2 Improving Drafting Behind Akamai
  4.9 Related Work
  4.10 Conclusions

5 Future Work and Conclusions
  5.1 Future Work
    5.1.1 Immediate Future Work for Our Studies
    5.1.2 Future Work Suggested by This Thesis
  5.2 Conclusions

Bibliography

List of Tables

2.1 IPv4 censuses [USC06] used in this chapter
2.2 Fraction of responsive representatives
2.3 Responsive representatives with power weighting
2.4 Causes of unsuccessful representatives
2.5 Fraction of representatives that are non-responsive
2.6 Released hitlists to-date
2.7 Prediction accuracy
2.8 Evaluation of random and informed representatives
3.1 Diversity of vantage points
3.2 Evaluation of IN queries coverage
3.3 Accuracy of CHAOS queries without traceroute
3.4 Accuracy of CHAOS queries augmented with traceroute
3.5 Anomalies found for F-root CHAOS records in Netalyzr data
3.6 Comparing measured against published numbers of anycast nodes
3.7 Characterizing L-root using IN queries over open resolvers
3.8 Interpretation of CHAOS queries and traceroute on TLD nameservers
3.9 Anycast discovered for TLD name servers
3.10 Anycast services discovered for TLD names
4.1 Judgement of the clustering result
4.2 Use of ASN and TTL-based clustering to fix false positives
4.3 Datasets collected as part of this chapter
4.4 Number of IPs and FE Clusters found
4.5 Rand index for our nodesets
4.6 Rand index under different levels of outlier removal
4.7 Statistics of the number of Google DNS groups
4.8 Number and percentage of prefixes mapped to unclustered Front-End IPs
4.9 Domain names of the Google search service we use
4.10 Statistics of the number of PlanetLab nodes
4.11 Top 10 source countries and their percentage of non-domestic mappings
4.12 Number and fraction of mapping changes caused by drain and restore
4.13 Percentage of occurrence of Google FE Clusters
4.14 Mapping change index of 7 bad-geoloc-FE-Clusters

List of Figures

1.1 Demonstrating the thesis statement
2.1 Weight of each bit for three different functions
2.2 Comparison of three history functions for selected addresses
2.3 A t cumulative distribution
2.4 Relative size of hitlist components
2.5 Effects of different inertia on representative churn
2.6 Effects of different inertia on responsiveness
2.7 Number of responsive /24 blocks increases when using more cycles
2.8 Frequency of responsiveness by last octet of the address (from it29w)
2.9 Rank of responsiveness by last octet
3.1 Example of DNS resolution process
3.2 Anycast and routing
3.3 Anycast node configurations
3.4 Recall as number of vantage points varies
3.5 Estimates of number of anycast services
3.6 Encrypted, changing replies
4.1 CDN Front-End Clusters
4.2 Distance plot of Google servers with airport codes
4.3 The output of the OPTICS clustering algorithm using reverse-TTL
4.4 CDF of fraction of time prefixes stay at unclustered Front-End
4.5 Effects of probe frequency
4.6 Comparing open-resolver-CCG results with ground truth
4.7 Number of FE Clusters and total mapping changes
4.8 The number of prefixes that saw prefix-FE Cluster mapping changes
4.9 Duration of prefix-FE Cluster mapping
4.10 Cumulative portion of users that saw mapping changes
4.11 CDF of mean prefix-FE Cluster mapping duration over all prefixes
4.12 CCDF of time fraction that prefixes spend at their primary FE Cluster
4.13 CDF of distance between switching pairs
4.14 CDF of maximum distance of switching pair
4.15 CDF of the number of times prefixes see distant mapping changes
4.16 Prefix-FE latency changes after a mapping change
4.17 Correlation of latency changes and distances of switching pairs for Google
4.18 Correlation of latency changes and distances of switching pairs for Akamai
4.19 Examples of relations of distance of switching pairs and latency changes
4.20 CDF of fraction of time user prefixes stay at a large-latency FE Cluster
4.21 CDF of maximum distance of switching pair
4.22 CDF of number of times prefixes see large-distance mapping changes
4.23 Total number of Google FE Clusters seen at each observation
4.24 CDF of the number of different countries to which prefixes are mapped
4.25 Number of Google FE Clusters seen from all routable /24 prefixes
4.26 Comparing the number of times prefixes see distant mapping changes
4.27 A Google FE Cluster shows a diurnal pattern in the number of user prefixes
4.28 Amplitude of FFT for an FE Cluster
4.29 Time fraction at primary for on-net-primary and off-net-primary prefixes

Abstract

The Internet is becoming more and more important in our daily lives. Both government and industry invest in the growth of the Internet, bringing more users to the world of networks. As the Internet grows, researchers and operators need to track and understand the behavior of global Internet services to achieve smooth operation. Active measurements are often used to study the behavior of large Internet services, and efficient service enumeration is required. For example, studies of Internet topology may need active probing of all visible network prefixes, and monitoring a large replicated service requires periodic enumeration of all service replicas. To achieve efficient service enumeration, it is important to select probing sources and destinations wisely. However, there are challenges in making smart selections of probing sources and destinations. Prior methods to select probing destinations are either inefficient or hard to maintain. Enumerating replicas of large Internet services often requires many widely distributed probing sources, and current measurement platforms do not have enough probing sources to approach complete enumeration of large services.

This dissertation makes the thesis statement that smart selection of probing sources and destinations enables efficient enumeration of global Internet services to track and understand their behavior. We present three studies to demonstrate this thesis statement. First, we propose a new automated approach to generate a list of destination IP addresses that enables efficient enumeration of responsive addresses across all networks. Second, we show that using a large number of widely distributed open resolvers enables efficient enumeration of anycast nodes, which helps study abnormal behavior of anycast DNS services. In our last study, we efficiently enumerate Front-End (FE) Clusters of Content Delivery Networks (CDNs) and use this efficient enumeration to track and understand the dynamics of the user-to-FE Cluster mapping of large CDNs. We achieve efficient enumeration of CDN FE Clusters by selecting probing sources from a large set of open resolvers. Our selected probing sources comprise a smaller number of open resolvers but provide the same coverage of CDN FE Clusters as the larger set.

In addition to our direct results, our work has also been used by several published studies to track and understand the behavior of the Internet and large network services.
These studies not only support our thesis as additional examples but also suggest that this thesis can further benefit many other studies that need efficient service enumeration to track and understand the behavior of global Internet services.

Chapter 1
Introduction

The Internet is becoming an important part of people's lives. In addition to connections at home and in the office, 3G/4G connects people to the Internet everywhere, at any time. Approximately 3 billion people were connected to the Internet by the end of 2014 [ITU14], and the impact of the Internet continues to grow. Governments from around the world keep investing in Internet infrastructure, both in developing countries [Reu13, Mar15] and in developed countries [Fed10, Kob13]. Large companies are also developing new technologies to either enable people to connect to the Internet [Lar13] or help reduce the cost of accessing the Internet [Con13].

Growth of the number of Internet users leads to growth of Internet services. With more users enjoying the benefits of online services, more network traffic is flowing to service providers. In addition, service providers also need to satisfy impatient users. For example, Amazon reports that every 100ms of delay costs 1% of sales [Lin06]. As a result, service providers need to distribute the traffic and reduce user access latency. To do so, Internet service providers replicate their services at multiple different locations (each location hosting a service replica) and keep increasing the number of service replicas. DNS root servers first deployed the anycast technique [PMM93, AL06] around the early 2000s [Har02] to replicate the root name services. With anycast, root name services now have more than 400 replicas [RD15], 30 times more than the original 13 sites [Kar05]. Large content providers have built their own datacenters and content delivery networks to replicate their services. For example, Facebook announced plans for three new datacenters during 2010 and 2011 [Hic10, Hel10, Mil11], and the number of serving sites of Google search increased by 600 percent in 10 months [CFH+13]. Large web service providers have started to use content delivery networks to replicate their services and can have more than 1000 replicas, each in a distinct network [DMP+02a].

The rapid growth of both the Internet infrastructure and global online services brings greater challenges for achieving smooth operation, diagnosing problems of a service, and measuring performance. Identifying outages is harder than before, because there are more hosts and networks in the Internet. Detecting service anomalies is not easy, as anomalies may happen in various parts of the service, such as network communication or server-side software or hardware. The fact that there are multiple replicas of the service makes anomaly detection more complicated. The performance of a large distributed service is not simply the performance of one service replica, but the combination of all service replicas and their interactions with users from all around the world.

Need to track and understand global Internet services: In order to solve the problems brought by the fast growth of the Internet, researchers and service providers need to track and understand the behavior of global Internet services. Service behavior includes many different properties, for example, topology, provisioning and user-to-service latency.
Those properties provide important information for achieving smooth operation, service diagnosis and performance measurement. Understanding topology helps find bottleneck links and helps understand reachability, both of which are important for smooth operation. Understanding service provisioning means knowing which service replica is serving which area of the Internet, so that users know where their data goes. Understanding user-perceived latency is one type of performance measurement.

This discussion shows that the continuous growth of the Internet has brought great challenges, and that to tackle these challenges we need to track and understand the behavior of global Internet services. We next discuss why researchers often need to enumerate services with active probing to study the behavior of global Internet services, and why that service enumeration needs to be efficient.

Need to efficiently enumerate services with active probing: Tracking and understanding the behavior of global Internet services often requires service enumeration with active measurements. We define service enumeration as counting and listing replicas of replicated global services and entities of the Internet. Service enumeration is an important step in many studies [BFR06, SPT04, AMSU11, HWLR08, BDS11] that try to understand Internet service behavior, and the enumeration is often done using active probing. First, the distributed nature of large Internet services means that researchers often need to enumerate services in order to study their behavior. For example, studying Internet topology involves enumerating all entities in the Internet so that the measured topology is closer to the real Internet. Depending on the granularity of the topology, the entities researchers enumerate could be organizations, ASes, routers or edge hosts. Full enumeration is important to study service performance and correctness. We have used it to study performance (§4.6.2) and to detect anycast hijacking (§3.6.1), and we show that third-party evaluation increases the transparency of service quality. Second, while passive measurements can be a good way to study service behavior, such as mining BGP tables to study AS-level topology [MKF+06, ZLMZ05] and studying service performance from the provider side [LHF+07, HBvR+13, KMS+09a, ZHR+12], in many cases active measurements are required. For example, studies that explore paths through the Internet or the edges of the Internet [GT00, CHK+09, FJJ+01, HFP+02] need active probing to enumerate responsive edge hosts. For various reasons, such as the need for a client-side view of the services, the need for third-party audit, and research purposes, service enumeration using active probing is required to study the behavior of large replicated services [BFR06, SPT04, AMSU11, HWLR08, BDS11].

Service enumeration based on active measurements often needs to be efficient to achieve the goal of tracking and understanding global service behavior. We define efficient as using shorter measurement time or generating less network traffic for measurements. Some longitudinal studies [SKB06, TWR11] prefer or even require a short measurement period in order to capture dynamics at the time scale of interest, such as for service monitoring and anomaly detection. In addition, to reduce measurement cost and possible negative effects on services or users, service enumeration using active measurements requires generating minimal network traffic.
Though efficient service enumeration is often required to study service behavior, it is not easy to achieve through active measurements. Using active probing for service enumeration often involves a large number of probing sources or destinations in order to approach full enumeration. There are challenges in making good selections of both probing sources and destinations that can enable efficient service enumeration. This thesis will demonstrate that smart selection of probing sources and destinations can overcome these challenges to achieve efficient service enumeration.

Challenges of selecting probing sources and destinations for efficient service enumeration: One challenge of selecting probing destinations for efficient service enumeration is to both minimize measurement traffic and reach more responsive destinations. Studies of the router-level topology of the Internet want to include as many network addresses or blocks as possible, so they often need a selected destination list to probe. Though it is possible to probe the entire IPv4 address space [HPG+08], using this strategy for mapping is time consuming and sometimes excessive, since many probes duplicate each other. A recent probing tool [DWH13] can scan the entire IPv4 address space in under 45 minutes. While it shortens the measurement time, this tool still sends many duplicated probes, and its aggressive probing is likely to trigger complaints from many networks. Previous studies have chosen random [CHK+09] or predefined [SBS08] representative addresses, which reduces probing traffic; but, as the chosen targets are often non-responsive, the efficiency of the active probing is greatly affected. To reach more responsive destinations, some studies use manually maintained lists of well-known sites [HFP+02]. However, the size of the Internet and the rate of churn in even well-known servers make manual maintenance untenable, as the list of selected servers quickly becomes incomplete.

In addition, selection of probing sources for efficient service enumeration is not easy either. Large distributed services have many service replicas at different physical and network locations. Active probing from outside the service usually cannot control which service replica is identified. Thus, in order to enumerate most or all of the service replicas, active probes from multiple different sources are required. Without enough probing sources to reach most or all of the service replicas, the recall of the service enumeration is greatly affected. The research and operations communities have set up platforms [CCR+03, Hou12, NCC15] to increase the number of vantage points for various studies. However, the maximum number of vantage points on those platforms is at most 8,000 [NCC15] (as of May 2015), which is insufficient to approach full enumeration (Chapter 3).

To demonstrate the importance of good selection of probing sources and destinations in service enumeration, we next introduce our overall thesis statement (§1.1), which asserts that smart selection of probing sources and destinations enables efficient service enumeration. We present three studies as examples to support the thesis statement (§1.2).

1.1 Thesis Statement

The thesis of this dissertation states that smart selection of probing sources and destinations enables efficient enumeration of global Internet services to track and understand their behavior. We approach full enumeration of those global services so as to get a better understanding of their behavior.
We demonstrate the validity of our assertion by presenting three complete studies. First, we propose a new method to generate an informed IP hitlist, a list of IP addresses in which each address represents a distinct /24 prefix and which together cover the allocated IPv4 address space. We evaluate the performance of our hitlist and compare using our hitlist to using randomly selected representative IP addresses. We show that the representative IP addresses (one per /24 prefix) selected by our method can efficiently reach many more edge links than random probing in a single round. Second, we show that active probing from open resolvers enables efficient enumeration of DNS anycast nodes with good recall, approaching full enumeration of anycast nodes. Enumeration of DNS anycast nodes helps monitor anomalies of an anycast service such as masquerading (unauthorized service replicas, without observed malicious behavior) or hijacking (unauthorized service replicas conducting malicious behavior). Last, we select different kinds of probing sources to periodically enumerate the serving infrastructure of large Content Delivery Networks (CDNs). To reduce user latency, CDNs deploy many Front-Ends (FEs), the servers that users connect to to request web pages or services (§4.2.1). We group CDNs' Front-Ends into FE Clusters to further reduce the number of probing sources needed to cover all FE Clusters. FE Clusters are FEs at a single physical and network location that provide the same services to end users (details of the definition in §4.2.1). By using this selection of probing sources and destinations, we can do efficient and periodic enumeration of CDNs' FE Clusters to study the dynamics of the mapping of users to CDN FE Clusters.

Figure 1.1: Demonstrating the thesis statement. Our three studies, Chapter 2 (A.), Chapter 3 (B.) and Chapter 4 (C.), each demonstrate the thesis statement as a specific example. Several other studies [HH12, QHP13, MdDP13] have already benefited from our contribution of applying smart selection of probing sources and destinations, and we expect more future studies that require efficient service enumeration to benefit from our contribution.

1.2 Demonstrating the Thesis Statement

In the following chapters, we support the thesis statement through three specific studies. Each study smartly selects either probing sources or destinations so as to enable efficient enumeration of global Internet services. We also show that the efficient enumeration enabled by our work can help study the behavior of global Internet services. Several other studies have benefited from our studies, and our thesis can generalize to many other classes of studies.

1.2.1 Our Studies Support the Thesis

Our three studies all support the thesis statement as strong examples. As illustrated in Figure 1.1, each of our three studies applies some insight (smart selection of measurements) to enable a specific mechanism (efficient enumeration of services) which supports various studies to understand service behavior. These studies can then benefit end users in different ways.

1.2.1.1 IP Hitlist Generation (Chapter 2)

In our first study, we demonstrate that smart selection of probing destinations enables efficient enumeration of the Internet (edge links), as shown in Figure 1.1 (A.). While we do not directly use the enumeration of edge links to study the behavior of the Internet, we are confident that our method can help other studies to better understand the behavior of the Internet.
First, we present our smart selection of probing destinations. We provide a new and automated method to generate an informed destination list (hitlist). Our hitlist contains a list of representative IP addresses, each from a single /24 prefix, and the whole list covers the whole IPv4 address space. More importantly, each representative IP address in our hitlist is the address most likely to be responsive in its /24 prefix. Unlike previous methods, which use random [SMW02, MIP+06a, BHM+07, MKBA+09] or manual [LABJ00, HFP+02] selection of representative IP addresses, our hitlist is an informed prediction based on the history of responsiveness and is generated and updated automatically. We evaluate our hitlist, quantify its accuracy, and conclude that only 50-60% of selected representatives are likely to respond three months later. To our knowledge we are the first to study hitlist responsiveness and accuracy. We also investigate why we miss over 40% and show that those non-responses are likely due to dynamic addressing (so no stable representative exists) or firewalls. This result suggests that no manual system could ever have been successful, due to natural turnover of addresses in parts of the network, further showing that our selection is smart.

Second, we show that this destination list, as stated in the thesis statement, enables efficient enumeration of responsive IP addresses. Although we do not guarantee full enumeration, we show that, compared to the previous random selection method, our hitlist provides 1.5 million (1.7×) more edge links to traceroute-based Internet topology studies. Last, using our work, others have shown that our informed hitlist helps understand different behaviors of the Internet, through massive IP geolocation [HH12] and Internet outage detection [QHP13].

1.2.1.2 DNS Anycast Characterization (Chapter 3)

In the second study, we show that smart selection of probing sources enables efficient enumeration of service replicas to detect anomalies of a service, as shown in Figure 1.1 (B.). Our research goal in this study is to characterize and enumerate nodes of anycast DNS services.

First, we make a smart selection of probing sources for efficient enumeration of anycast nodes. Our selected probing sources are a large number (300k) of open recursive name servers. Open recursive name servers (or open resolvers) are servers that accept and process queries made from any public Internet address, not just clients of the ISP operating the server; a more detailed definition can be found in this Internet-Draft [HSF15]. We find that anycast discovery is challenging because anycast configurations can be somewhat complex, existing diagnosis methods are not standardized and can result in measurement ambiguity, and the visibility of anycast servers can be topologically scoped, requiring a large number of vantage points. We conclude that using open recursive name servers is a smart selection after testing three different kinds of probing sources and finding that only open recursive name servers support efficient enumeration of DNS anycast nodes. The other two kinds of probing sources are public research infrastructure (PlanetLab), which contains several hundred sites, and 60k Internet users who used the Netalyzr [KWNP10] network debugging tool during a four-month period. While we use all available open resolvers we can get, suggesting many probing sources are redundant, this choice is required to maximize the coverage of anycast nodes.
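Chapter 3 details the query mechanics (§3.4); as a flavor, the sketch below, which assumes the dnspython library and uses a placeholder service address, issues the CHAOS-class TXT query for hostname.bind (§3.4.1) whose answer identifies which anycast node replied.

```python
# A minimal sketch, assuming the dnspython library, of the CHAOS-class
# query used for anycast node identification (Section 3.4.1): ask the
# (anycast) server for the TXT record "hostname.bind", which by
# convention names the node that answered. The address is a placeholder.
import dns.message
import dns.query
import dns.rdataclass
import dns.rdatatype

def identify_anycast_node(anycast_addr, timeout=2.0):
    """Return the TXT strings the answering anycast node reports."""
    query = dns.message.make_query("hostname.bind",
                                   dns.rdatatype.TXT,
                                   dns.rdataclass.CH)
    response = dns.query.udp(query, anycast_addr, timeout=timeout)
    return [rdata.to_text() for rrset in response.answer for rdata in rrset]

# Issuing this same query from many vantage points, each routed to a
# (possibly) different node, is what enumerates the anycast service.
print(identify_anycast_node("192.0.2.53"))  # placeholder anycast address
```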
As a result, our selection of open resolvers is a smart selection for our purpose, maximizing the coverage of anycast nodes, but not a cost-efficient choice of the kind required when reducing measurement traffic and shortening the enumeration are also goals. To our knowledge, we are the first to use these probing sources to enumerate anycast nodes and to evaluate the completeness of anycast node enumeration.

Second, we demonstrate the thesis statement that our selection of probing sources enables efficient enumeration of the anycast nodes of a global DNS service, when the goal is to maximize the recall of enumeration. Active probing from this large number of probing sources includes many redundant measurements, as the total number of anycast nodes is at most a few hundred. However, a large number of probing sources is essential to detect masquerading anycast nodes and to understand the service scope of each anycast node. We show that only our selection of probing sources (open recursive name servers) achieves both high enumeration recall (over 90%) and on-demand enumeration (for service anomaly detection), which are required for efficient enumeration of anycast nodes. Netalyzr users and PlanetLab nodes do not support efficient enumeration, because Netalyzr users do not support on-demand enumeration and PlanetLab nodes cannot guarantee high recall.

Last, we demonstrate that we can use the enumeration of anycast nodes to understand the behavior of anycast DNS services, which also suggests that with efficient enumeration we could track the behavior of anycast DNS services. We enumerate root name servers and top-level domain name servers with Netalyzr users and PlanetLab nodes to study abnormal behaviors of the DNS roots. We find one example of a masquerading F-root node and many potential masquerading root name server nodes. We also use PlanetLab nodes to study anycast deployment in top-level domain name servers and find that up to 72% of TLDs use anycast. The success in identifying masquerading root server nodes suggests that, using the more efficient enumeration enabled by probing with open recursive name servers, we will be able to do frequent enumerations to track masquerading anycast nodes.

1.2.1.3 Assessing CDN Front-End to User Mapping (Chapter 4)

In our third study we demonstrate that smart selection of probing sources enables efficient enumeration of service replicas to understand the performance of services, as shown in Figure 1.1 (C.).

CDNs improve performance by serving users from nearby FE Clusters (the user-to-FE Cluster mapping) and also spread users across FE Clusters for reasons such as load balancing and maintenance. Unlike the previous anycast study, which has many redundant probing sources, in this work we go one step further and reduce the redundancy of probing sources to enable fast and periodic enumerations. Our research goal is, first, to enumerate the number of different physical locations CDNs have to place their Front-End servers, and second, to study the properties of user-to-Front-End mappings. The CDNs we study are Google and Akamai.

First, we use three steps to achieve our smart selection of probing sources. In step one, we want to enumerate as many Front-End IP addresses as we can, so we use a large number of open resolvers to enumerate Akamai Front-End IP addresses and the EDNS client-subnet extension, as illustrated in prior work [CFH+13], to enumerate Google Front-End IP addresses (see the sketch below).
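Here is a minimal sketch of such an ECS-carrying query, assuming the dnspython library; the resolver, domain, and client prefix are placeholders, and [CFH+13] describes the actual enumeration methodology.

```python
# A minimal sketch, assuming the dnspython library, of an EDNS
# client-subnet (ECS) query as in [CFH+13]: the ECS option asks the
# resolver to answer as if the query came from the given client prefix,
# revealing which front-end IPs serve that prefix. The resolver, domain,
# and prefix below are placeholders.
import dns.edns
import dns.message
import dns.query
import dns.rdatatype

def front_ends_for_prefix(prefix, domain="www.google.com",
                          resolver="8.8.8.8"):
    """Return the set of A records answered for clients in `prefix`."""
    addr, plen = prefix.split("/")
    ecs = dns.edns.ECSOption(addr, int(plen))
    query = dns.message.make_query(domain, dns.rdatatype.A,
                                   use_edns=0, options=[ecs])
    response = dns.query.udp(query, resolver, timeout=2.0)
    return {rdata.address
            for rrset in response.answer
            if rrset.rdtype == dns.rdatatype.A
            for rdata in rrset}

# Sweeping one query per client /24 approximates the front-end view of
# users in each prefix without needing a vantage point inside it.
print(front_ends_for_prefix("203.0.113.0/24"))
```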
In step two, we introduce a clustering method to group the Front-End IP addresses into a smaller number of Front-End Clusters. This clustering is not only one of our research goals, but also helps us reduce the number of redundant probing sources. In the last step, we reduce the redundant probing sources and select a subset of them (32k, each from a distinct /24 prefix) that still covers all Front-End Clusters and also has good diversity in origin countries and ASes, to do fast and periodic enumerations.

Second, using this subset of probing sources enables us to do efficient enumerations of CDN Front-End Clusters. Our enumeration takes less than 15 minutes, and we repeat it every 15 minutes for one month. As far as we know, we are the first to study the dynamics of CDN user-to-FE Cluster mapping from a large number of user prefixes (more than 30,000).

Last, our periodic enumeration allows us to track and understand the behavior of large CDNs. Using the data collected from the periodic enumeration, we study the CDN user-to-Front-End Cluster mapping. We find that many prefixes (50-70%) switch between Front-End Clusters that are far away from each other, and these shifts sometimes (30-40%) result in large latency shifts (100ms or more). We then find that many prefixes are directed to several countries over the course of a month, complicating questions of jurisdiction. We also characterize the reasons for the user-to-FE Cluster mapping changes. We find that FE drain/restoration, load balancing and user-to-FE mapping reconfiguration are possible reasons for mapping changes.

1.2.2 Generalization

We believe our thesis statement also generalizes to many other studies. We have shown that each of our three pieces of work, IP hitlist generation, anycast characterization and CDN user mapping characterization, supports part of the thesis statement as a specific example. There are also some studies that benefit from our methods and results, supporting our thesis as additional examples. Moreover, we are confident that many other classes of studies can benefit from our results or methods to achieve efficient service enumeration and use that enumeration to track and understand the behavior of global Internet services.

Several other studies benefit from our work, either directly using the results of our work or using the methods our work proposes. Marchetta et al. [MdDP13] use our IP hitlist to detect third-party addresses in traceroute paths. Hu et al. [HH12] apply our idea of history-based destination prediction to improve the efficiency of large-scale active probing for IP geolocation. Recently, Quan et al. [QHP13] proposed an adaptive probing technique for efficiently discovering network outages, based on our method of selecting responsive IP addresses. Adaptive probing selects representative IP addresses based on prediction from the history of IP addresses' responsiveness, as we do in hitlist generation, and extends our work by selecting multiple representatives for each prefix when the first is not responsive, to efficiently and accurately identify prefixes that are in an outage. Inspired by our anycast characterization work, studies of the AS112 anycast service have started to use a large number of open recursive name servers as probing sources to enumerate AS112 anycast nodes [AS112b], as we do in anycast enumeration.
Similarly, inspired by our selection of open resolvers for anycast enumeration, L-root started to support DNS IN TXT queries for anycast node identification so as to be compatible with open resolvers [AM14]. These studies all benefit from our methods of making smart selections of probing sources or destinations to achieve efficient service enumeration. They then use the efficient service enumeration to study the behavior of the Internet [MdDP13, HH12, QHP13] and of large anycast DNS services [AS112b, AM14]. Thus, these studies all support our thesis statement as additional examples.

In addition, the above studies also suggest that many other classes of studies can benefit from our thesis. First, there are many classes of studies that need active probing to a large number of destinations, such as active-probe-based route hijack detection studies [ZZH+08], Internet reachability studies [LABJ00, BHM+07, QHP13], active-probe-based massive geolocation [HH12] and active-probe-based Internet topology studies [SMW02, MIP+06a]. These studies could take advantage of smart selection of probing destinations, predicting the responsiveness of IP addresses based on their responsiveness history, to efficiently identify responsive IP addresses or prefixes or to test reachability, as we do in Chapter 2. Second, different studies involving multiple probing sources can benefit from different strategies of probing-source selection. There are classes of studies that need to enumerate service replicas, such as studies that try to understand the provisioning of global services [AMSU11, BDS11, FDC14] and third-party studies that audit large online services [SPT04]. These studies can benefit from selecting the type of probing sources that are widely distributed and cover a large number of networks to increase the coverage of service replicas, such as the open recursive name servers we choose in Chapter 3. Also, there are studies of large services which need both longitudinal measurements and coverage of all service replicas, such as studies that monitor the performance of large replicated services [SKB06, TWR11]. These studies can benefit from smart selection of probing sources to first increase the coverage of service replicas, and then reduce redundant vantage points to shorten the measurement period, as we do in Chapter 4.

1.3 Additional Contributions

Each of our three studies supports part of the thesis statement and serves as an example showing that future work could select probing sources or destinations wisely to achieve efficient enumeration of globally distributed services and study their behavior. Therefore, our first contribution is to prove the thesis statement. In addition, our studies also make broader contributions that could benefit the research community, industry and others, summarized as follows.

In our DNS anycast characterization study, we discuss two security implications of our method. The first is how to protect information from attackers or competitors when anycast providers consider the details of their infrastructure to be proprietary. We propose approaches to limit the enumeration of anycast nodes to the anycast providers themselves, so that theoretically no one else can enumerate the anycast service to learn the identity of each anycast node or to tell how many different anycast nodes there are. Second, masqueraders and hijackers may prevent their fake nodes from being discovered. We discuss the challenges of identifying these masquerading nodes.
In our third work, which studies the dynamics of the user-to-FE Cluster mapping of CDNs, one contribution is to validate the accuracy of using the existing Client-Centric Geolocation (CCG) [CFH+13] technique with 600k open resolver prefixes, a much smaller number than the set of all routable /24 prefixes that CCG originally used. We find that CCG using 600k open resolver prefixes achieves accuracy similar to CCG using all routable /24 prefixes when geolocating Google Front-End IP addresses. The other contribution is that we evaluate how user-to-FE mapping changes affect previous studies. Since the CCG technique depends on users being mapped to a close FE, user-to-FE mapping changes may cause some users to be mapped to distant FEs, which may affect the accuracy of CCG. We evaluate whether user-to-FE mapping changes affect the accuracy of CCG and find that they do, but only for a few percent of FE Clusters.

1.4 Structure of the Dissertation

This dissertation is organized around our three specific studies that take advantage of smart selection of probing sources and destinations to enable efficient enumeration of large services. In Chapter 2, we present our new method for automated generation of an informed hitlist. In Chapter 3, we show how we enable efficient enumeration of anycast nodes, maximize the recall of enumeration, and use the enumeration to study the behavior of large DNS anycast services. In Chapter 4, we present our methods to enumerate and cluster CDN Front-End IPs, and our findings about the behavior of CDN user-to-Front-End mappings. In Chapter 5, we first explore possible directions for future studies based on this dissertation and then conclude.

Chapter 2
Selecting Representative IP Addresses for Internet Topology Studies

Our thesis is to understand the behavior of global Internet services through efficient service enumeration by smart selection of probing sources and destinations. In this chapter, we propose novel approaches to select probing destinations that enable efficient enumeration of the edge links of the Internet. Studies of Internet topology [SMW02, MIP+06a], reachability [LABJ00, BHM+07] and performance [HFP+02, MKBA+09] usually use a list of IP addresses as the destinations of traceroute or performance probes. Some prior studies generated these destination addresses manually [LABJ00, HFP+02], but the evolution and growth of the Internet make human maintenance untenable. Other studies generated the destination addresses by random selection [SMW02, MIP+06a, BHM+07, MKBA+09], but most random addresses fail to respond. In this chapter, we focus on developing a new approach to automatically generate this destination address list (hitlist) and maximize its responsiveness.

As discussed in §1.2, this chapter serves as strong evidence supporting the part of the thesis statement that smart selection of probing destinations enables efficient service enumeration. This chapter shows that smart selection of probing destinations (an IP hitlist) enables efficient enumeration of Internet edge links. Several other projects benefit from our efficient enumeration of Internet edge links to study various behaviors of the Internet. Hu et al. use our method of hitlist generation to geolocate IP addresses [HH12]. Quan et al. also use our method of hitlist generation to detect Internet outages [QHP13].
The fact that these studies benefit from our work in this chapter sug- gests that many other classes of studies can also benefit from smart selection of probing sources. Many studies need active probing to many destinations, such as some of the topology studies, prefix hijacking studies, performance evaluation and network reach- ability studies. These studies can all benefit from using responsiveness history of the addresses to select probing destinations, like we do in this chapter. Part of this chapter was published in the 2010 Internet Measurement Conference (IMC) [FH10]. 2.1 Introduction Smooth operation of the Internet is important to the global economy, so it is essen- tial that Internet users, providers, and policy makers understand its performance and robustness. Although on the surface, individuals care only about their personal perfor- mance, a full diagnosis of “why is my web connection slow?” must consider not just the user’s “first mile” connection, but dozens of servers that aect performance [CPRW03]. Web content providers invest great eort in optimizing page load times to sub-second values [MKBA + 09] and in building distributed content distribution networks that man- age trac (for example, [DMP + 02b]). Policy makers debate questions about universal access [Wya10], a nation’s relative availability for broadband access [Pfa09], and the robustness of what is recognized as critical infrastructure. To answer these questions, network researchers, operations, and industry have devel- oped a number of tools to map the Internet [GT00, FJJ + 01, SMW02, HFP + 02, MIP + 06a, CHK + 09], evaluate performance [Wol98,HFP + 02,MKBA + 09], consider questions about routing and reachability [WMW + 06,BHM + 07], or the performance of replica placement 18 (examples include [Wol98, CJJ + 02]), and evaluate topology robustness [AJB00]. With the Internet’s lack of centralization and multiple overlapping global “backbones”, active probing plays an essential role in this process, with traceroute and ping and their vari- ants providing the main source of router-level reachability. While one may add AS-level views [BMRU09], the Internet’s router-level topology is the focus of this chapter. Dier- ent router-level studies either target specific networks [SMW02] or the whole Internet. Here we are most interested in observing the whole Internet—more than three billion allocated IPv4 addresses. Studies of the entire Internet typically employ a hitlist—a list of IP address that can represent the billions of allocated addresses. The defining characteristic of a hitlist is completeness, where a representative is chosen for every autonomous system or, in our case, for every allocated block of addresses defined by a /24 prefix, the smallest unit typically present in a default-free routing table. Representatives provide a 256-fold (or more) reduction in scanning size, allowing Internet-wide studies to take place in hours instead of months and also generate less trac of the measurements. While a recent probing tool [DWH13] can scan the entire IPv4 address space in less than an hour, the aggressive probing will trigger many complaints from users in various networks. Although completeness is necessary to study the whole Internet, an ideal hitlist is also responsive and stable. A responsive representative replies to ICMP messages, allowing traceroute to confirm a path to the edge of the network, and ping to measure round-trip time to an edge host. 
To support longitudinal studies, the hitlist should be stable, with representative identities not changing frequently or arbitrarily.

Although hitlists are easy to define and have been used in topology studies for many years (we review related work in Section 2.6), they are surprisingly hard to create and maintain. Early hitlists were built manually from well-known sites [HFP+02], but the size of the Internet and the rate of churn in even well-known servers made manual maintenance untenable, as such lists quickly became incomplete. More recent studies have typically used randomly chosen representatives. While randomness has some advantages (it can be statistically unbiased), it sacrifices the secondary goals of stability and responsiveness.

The contribution of this chapter is to provide a new, automated method of hitlist generation that provides complete coverage while maximizing stability and responsiveness.[1] Our hitlists are constructed (Section 2.2) by mining data from IP address censuses: complete, ping-based enumerations of the allocated IPv4 address space taken every two to three months [HPG+08].

The second contribution of our work is to evaluate our hitlists (Section 2.3). Our hitlists are 100% complete as of when they are constructed, although where we have no history (in about two-thirds of the blocks) we select representatives at random. We define the accuracy of our hitlists as how many representatives are responsive three months after the hitlist is taken. We find that two-thirds of the allocated address space never responds to ICMP probes and so never has responsive representatives. In the remaining, responsive Internet, our hitlists select representatives that are responsive about 55% of the time. To our knowledge we are the first to study hitlist responsiveness and accuracy.

The final contribution of our work is what hitlists reveal about the nature of the Internet itself. We were surprised that, in spite of such complete input data, the responsiveness of our predicted representatives is not higher. We believe this upper bound on responsiveness characterizes the portion of the Internet that has an inherently high rate of address churn. One corollary of this limit to representative responsiveness is that no manual system could ever have been successful, due to natural turnover of addresses in parts of the network. We also characterize the distribution of addresses in each block and show that it strongly reflects address allocation patterns (Section 2.4).

We make our hitlists available free of charge, and they are already being used by several research projects. In Section 2.5 we discuss the security and policy issues involved in sharing this data.

[1] We would like to thank Randy Bush for suggesting the idea that the address censuses data [HPG+08] could support hitlist generation.

2.2 Methodology

We next describe the requirements of an IP hitlist (Section 2.2.1) and how we transform census data (reviewed in Section 2.2.2) using several possible prediction methods (Section 2.2.3) to get a good-quality hitlist. We also provide some details on how our implementation copes with Internet-sized datasets.

2.2.1 Hitlist Requirements

Our goal is to provide representatives that are responsive, complete and stable.

By responsive, we mean each representative is likely to respond to an echo request with an echo reply instead of an ICMP error code or not responding at all. As we describe below, we select representatives that have responded frequently in the past.
We do not guarantee that the address responded in the most recent census, but we bias our selection to favor recent results. We consider several prediction functions below in Section 2.2.3.

By complete, we mean we report one representative address for every allocated /24 block. Some groups have used other definitions of completeness, such as one representative per AS [BMRU09] or per ISP [SMW02]. AS- or ISP-complete hitlists will be both sparser and smaller than /24-complete maps, since ASes typically include routes for many prefixes, and the prefixes of an ISP often cover blocks larger than /24s. Instead, we select one representative for each /24 block for two reasons. The main reason is so that the hitlist is decoupled from the routing system, to allow independent study of the routing system itself, and because routing tables vary depending on when and where they are taken. Second, it is relatively easy for researchers to derive custom hitlists from ours, perhaps selecting sparser representatives per AS or per ISP. We believe our per-/24 representatives thus provide a more general and reusable result than more "cooked" alternatives.

By stable, we mean that representatives do not change arbitrarily. We change representatives when a new representative would significantly improve the score for that block, typically because the current representative has ceased to be reachable. We promote stability as a goal to simplify longitudinal studies, where frequent changes of representative would make comparisons across time more difficult. Stability also reduces the effects of transient routing outages or packet loss on the long-term hitlist. We implement stability by applying inertia as a threshold for changing a previously selected representative: currently we switch representatives only when the switch improves the block's score (which represents responsiveness) significantly, with an inertia of 0.34 (see Section 2.3.3), as sketched below.
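A minimal sketch of this inertia rule, using the normalized scores defined in Section 2.2.3 (the addresses and scores below are illustrative):

```python
# Sketch of the stability rule: keep the prior representative unless a
# challenger improves the block's normalized score (Section 2.2.3) by
# more than the inertia threshold. Addresses here are placeholders.
INERTIA = 0.34

def choose_representative(prior, prior_score, challenger, challenger_score):
    """Switch only when the score improvement is significant."""
    if prior is None or challenger_score - prior_score > INERTIA:
        return challenger
    return prior

# A challenger at 0.9 does not displace a prior representative at 0.7,
# since the 0.2 improvement is below the 0.34 inertia threshold.
print(choose_representative("192.0.2.57", 0.7, "192.0.2.90", 0.9))
```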
For this chapter we consider censuses starting in March 2006, as shown in Table 2.1, since censuses before this date used a slightly different collection methodology. The results of this chapter use all 22 censuses taken over the four years preceding analysis, but we update our results as new censuses become available.

A census elicits a number of responses, including ECHO REPLY messages as well as a variety of errors. Each census is quite large, with more than 3 billion records; 22 censuses comprise over 260 GB of raw data. We therefore pre-process all censuses into a history map convenient for analysis. A history map consists of a bitstring for each IP address, where each 1 indicates a positive response and each 0 indicates either a non-response or a negative response.

Census   Date         Duration (days)
it11w    2006-03-07   23
it12w    2006-04-13   24
it13w    2006-06-16   31
it14w    2006-09-14   31
it15w    2006-11-08   61
it16w    2007-02-14   50
it17w    2007-05-29   52
it18w    2007-09-14   47
it19w    2007-12-18   48
it20w    2008-02-29   86
it21w    2008-06-17   49
it22w    2008-09-11   35
it23w    2008-11-25   29
it24w    2009-02-03   29
it25w    2009-03-19   29
it26w    2009-05-27   31
it27w    2009-07-27   25
it28w    2009-09-14   30
it29w    2009-11-02   30
it30w    2009-12-23   29
it31w    2010-02-08   30
it32w    2010-03-29   29

Table 2.1: IPv4 censuses [USC06] used in this chapter.

In this chapter, we consider only echo replies ("positive" responses) as indicating a responsive address. We also explored treating both positive and negative responses (destination unreachable and similar error replies) as predicting a responsive address. However, we found that negative replies are only rarely helpful in predicting future responsiveness. We looked at it28w and found 64% of positive responses were also responsive in the next census (165M in it28w, 107M of which respond in it29w). By contrast, of the 50M negative replies in it28w, only 2.7% (1.4M) respond positively in it29w. We therefore believe that negative responses are of little value in predicting future responsiveness.

We next show how this history map can predict future response rates.

2.2.3 Prediction Method

Of our hitlist goals of responsiveness, completeness, and stability, completeness and stability are under our control, but responsiveness requires predicting the future. Our guidance in this task is the prior history of each address. We next review several prediction functions that strive to select the best representative for each /24 block, where best means most likely to respond in the future.

Prediction functions take the prior history of address a as input and weight that history in different ways. Each bit of the history is represented by r_i(a), the response (1 for positive, otherwise 0) of address a to the i-th probe, numbered from 0 (oldest) to N_h - 1 (the most recent observation). We consider several different weights w(i) to get scores s(a) of the form:

    s(a) = \sum_{i=0}^{N_h - 1} r_i(a) w(i)

For each block of addresses, the address with the highest s(a) is selected as the best representative. We may bias this towards prior representatives to promote stability. In the case of ties and no prior representative, we select any top-scoring address in the block at random.

We considered several possible weights w(i). The simplest is w(i) = 1, so all responses are averaged. To give more recent observations greater influence we consider two biased weights: linear weighting, w(i) = (i + 1)/N_h, and a power function, w(i) = 1/2^{N_h - i}. The weight of each observation for an 8-observation history is shown in Figure 2.1.
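To make the scoring concrete, the following sketch (in Python, with hypothetical helper names; the production pipeline is Perl over Hadoop, not this code) computes the normalized weighted score of one address's history bits and selects a block's best-scoring address:

    # Score one address's history bits (oldest first) under the three
    # weighting functions described above, normalized into [0, 1].
    def weight(i, n_h, kind):
        if kind == "average":
            return 1.0
        if kind == "linear":
            return (i + 1) / n_h
        if kind == "power":                # assumes w(i) = 1 / 2^(N_h - i)
            return 1.0 / 2 ** (n_h - i)
        raise ValueError(kind)

    def score(bits, kind="power"):
        n_h = len(bits)
        raw = sum(r * weight(i, n_h, kind) for i, r in enumerate(bits))
        return raw / sum(weight(i, n_h, kind) for i in range(n_h))

    def best_representative(block):
        """block: {address: list of history bits} for one /24."""
        return max(block, key=lambda a: score(block[a]))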
[Figure 2.1: Weight of each bit of history (old to new) for the three different functions: average, linear, and power.]

In addition, we can normalize scores by the maximum possible score (the minimum in all cases is zero), allowing all to fall in the range 0 to 1.

As an example of the different functions, Figure 2.2 shows scores for the three different weights and different history lengths. For simplicity, we assume N_h = 8, shorter than we use in practice (in Section 2.3.1.2 we vary history duration). We consider three cases, all with 4 of 8 responding, but either responding most recently (Figure 2.2a), in the middle past (Figure 2.2b), or alternating response and non-response (Figure 2.2c). To a first approximation, all three weights are about the same, particularly with intermittent responsiveness in Figure 2.2c. The differences in decay rates are more obvious when responsiveness is consistent for blocks of time, with power and linear decaying faster than average in Figures 2.2a and 2.2b. Finally, differences in history duration make a large difference when a block is non-responsive, comparing the left and right parts of Figures 2.2a and 2.2b, and these effects are even greater when comparing across weights (for example, compare history durations 1–4 of Figures 2.2a and 2.2b).

[Figure 2.2: Comparison of the three history functions for selected addresses: (a) 0000 1111, (b) 0011 1100, (c) 0101 0101.]

This framework provides flexibility, but requires setting several parameters. We later evaluate which weighting is best (Section 2.3.1.1), how much history is beneficial (Section 2.3.1.2), and the underlying reasons addresses are difficult to predict (Section 2.3.1.3).

2.2.4 Gone-Dark Blocks

Firewalls are probably the greatest impediment to active probing, since a conservative firewall can suddenly stop traffic to an entire block. We will see in Section 2.3.1.3 that gone-dark blocks are one cause of poor representative responsiveness. A gone-dark block is one that contained responsive addresses for some period of time, but then becomes unresponsive and stays that way, due to a firewall or possibly renumbering. While we must select a representative for each allocated block, even one populated only by non-responsive addresses, we would like to indicate our low expectations for gone-dark blocks.

We define a block as gone dark within history N_d if, for the most recent N_d observations, no address in the block responded, even though we had some positive response before those N_d observations. We add gone-dark analysis to our hitlist generation by overriding the representative's score with a designated "gone-dark" value to indicate our skepticism that it will reply. We explored different values of N_d (Section 2.3.1.2) and ultimately select N_d = N_h = 16, identifying as gone-dark only those addresses whose responses have aged out of our history. We use this large value of N_d because it maximizes the absolute number of responsive representatives, while decreasing the percentage of responsive, predicted representatives by only a small amount.

For gone-dark blocks, we still select as the representative the address with the best score.
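The gone-dark test itself is mechanical; a minimal sketch, assuming the block's history is the per-census OR of all its addresses' bits (oldest first), accumulated for the block's whole lifetime:

    def is_gone_dark(block_bits, n_d=16):
        # Gone dark: some positive response before the most recent n_d
        # observations, but none within them. With n_d equal to the full
        # history length N_h, only blocks whose responses have aged out
        # of the current history window qualify (block_bits here must be
        # the full record to date, which may be longer than N_h).
        earlier, recent = block_bits[:-n_d], block_bits[-n_d:]
        return any(earlier) and not any(recent)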
For allocated but never-responsive blocks, we select the .1 address as the representative, because that is the address most likely to be used first (Section 2.4). In Section 2.3.1.3 we show the contribution of gone-dark blocks to responsiveness.

2.2.5 Hitlist Description

To summarize, our hitlist contains three kinds of representatives for all allocated /24 blocks: informed and predicted representatives, where we select the best responder; gone-dark representatives, where some address once responded but has not recently; and allocated but never-responsive blocks, where we pick .1 as the representative.

Table 2.6 lists the hitlists we have publicly released to date. We identify hitlists by the name of the last census used in their creation, and include the number of censuses in the history. Thus HL28/16 uses 16 censuses through it28w. When necessary, we add the gone-dark window, so HL28/16-3 uses a window of 3. If no gone-dark window is specified, we disable gone-dark processing. In addition to these public hitlists, Tables 2.2 and 2.3 show unreleased hitlists used to evaluate our methods.

2.3 Evaluation

We next evaluate the success of our hitlist: how accurate are its predictions, and how complete and stable is it? We first consider how responsiveness is affected by choices in our prediction mechanism. In Section 2.3.1.3, we then look at causes of prediction failure. Finally, we consider completeness and stability.

2.3.1 Responsiveness

Our primary goal with prediction is responsiveness: how accurate is our prediction that the representatives in a hitlist will respond in the future? We define responsiveness accuracy from the number responding in the future, N_r, out of the number of predicted representatives (including representatives of gone-dark and informed predicted blocks), N_p, as:

    accuracy = N_r / N_p

Responsiveness accuracy is affected by our choice of history weighting and length. We consider these next, and then consider structural reasons perfect accuracy is impossible to achieve.

Our general approach to test responsiveness is to generate a hitlist, then evaluate it against ICMP probes in the next census. For example, the first line of Table 2.2 evaluates HL19/8, generated from the eight censuses from it12w through it19w, tested against it20w. This approach has the advantage of supporting retroactive evaluation of hitlist quality under different, controlled conditions. However, it also means each representative is given only one opportunity to be available. For this reason we report exact counts of results, without error estimates such as standard deviation. We evaluate the repeatability of our results by considering multiple hitlists at different times.

2.3.1.1 Comparing History Weights

We first consider how our weighting of prior history affects accuracy. Here we assume a history duration of 8 prior censuses (a reasonable choice, as evaluated next in Section 2.3.1.2), and from that history we predict the results of the next census for the three weights we defined in Section 2.2.3. Since the network is dynamic, our expectation is that biased weightings will perform best, since they favor recent information over older information.

hitlist   average   linear   power
HL19/8    0.50      0.51     0.51
HL21/8    0.53      0.54     0.55
HL23/8    0.53      0.54     0.54
HL25/8    0.53      0.54     0.54
HL27/8    0.54      0.55     0.55

Table 2.2: Fraction of responsive representatives across 5 different hitlists for three different history weights.

To answer this question, Table 2.2 compares our three weightings for several predictions.
Each row in the table evaluates a different hitlist as generated with the three different weights, evaluated over all predicted representatives (N_p). The most important observation is that all weights provide quite similar performance: the worst-case responsiveness is only 5% worse than the best. The linear and power functions provide marginally better responsiveness. The examples of the weights in Figure 2.2 suggest reasons why the difference is so small: for many histories, all three weights produce roughly the same relative scores.

2.3.1.2 Effects of History Duration

A second factor that can affect responsiveness is the duration of history considered in a prediction. Does more history provide more information, or does very old information become irrelevant or even misleading?

To study this question, we considered all history available to us at the time of analysis: 18 Internet censuses covering 3.5 years. We consider only the power weighting of history, and look at the responsiveness of our predictions.

          predicted               responsive representatives and fraction, by history length
hitlist   representatives (N_p)   4             8             12            16
HL19/-    3.091 (100%)            1.558 (50%)   1.586 (51%)   —             —
HL21/-    3.386 (100%)            1.813 (54%)   1.846 (55%)   —             —
HL23/-    3.613 (100%)            1.925 (53%)   1.948 (54%)   1.950 (54%)   —
HL25/-    3.794 (100%)            2.007 (53%)   2.049 (54%)   2.059 (54%)   —
HL27/-    3.971 (100%)            2.135 (54%)   2.179 (55%)   2.193 (55%)   2.200 (55%)

Table 2.3: Responsive representatives (in millions) with power weighting across 5 different hitlists for different history lengths.

Table 2.3 shows the responsiveness of our predictions as a function of history length, for five predictions. We see that very short histories are insufficient: prediction rates are 1–2% lower when fewer than 8 observations (about 1.5 years) are considered. On the other hand, we see no difference in prediction accuracy for histories from 8 to 16 censuses. (We also looked at history duration with the average function, and found there that long histories became slightly less accurate, although only by 1–2%. This observation argues in favor of a weighting that decays with history, like power.)

Finally, while longer histories may not improve the fraction that respond, they do provide information that allows more representatives to be selected. Table 2.3 shows the absolute number of responders as a function of history duration. Longer history allows 20k more responders with length 16 than with length 8. More history always increases the number responding, although with diminishing returns past 12 censuses or so. In practice, the incremental cost of longer history lengths is not large, so we use a history length of 16 censuses in our production lists.

Although 16 censuses provide slightly better results, the fraction responding, only 55%, seems lower than we might expect. We therefore next consider causes of non-responsiveness.

                                   HL28/16                  HL31/16
predicted representatives (N_p)    4,055,193  100%          4,307,644  100%
  not stable                       1,749,471   43%          1,820,806   42%
  gone dark                          703,987   17%            772,014   18%
responsive (N_r)                   2,250,091   56%          2,560,420   59%
non-responsive (N_n)               1,805,102   44% [100%]   1,747,224   41% [100%]
  not stable (only)                  590,472   14% [33%]      565,654   13% [32%]
  gone dark (only)                         0    0% [0%]             0    0% [0%]
  not stable and gone-dark           693,832   17% [38%]      766,338   18% [44%]
  just unlucky                       520,798   13% [29%]      415,232   10% [24%]

Table 2.4: Causes of unsuccessful representatives predicted from HL28/16 and HL31/16, evaluated against responses in it29w and it32w.
We don’t apply gone-dark window on prediction here, gone-dark blocks are detected separately with gone-dark window size of 3. 2.3.1.3 Causes of Failed Responses We found the observation that our best methods get only 55% responsiveness seems to be somewhat surprising. Surely such a large amount of history (over three years of full censuses) can be explored somehow to select representatives with greater accuracy. To answer that question, we next explore the causes of why representatives fail to respond. Our conclusion is that it is unlikely that any prediction can do better than about 70% because of the use of dynamic address assignment and firewalls. To support this claim, Table 2.4 counts prediction failures for HL28/16, tested against it29w (We found roughly similar results in examination of HL31/16 evaluated against it32w.) We see that 44% of representatives are non-responsive (1.8M of the 4M blocks). Two explanations account for the majority of our misses: blocks that use only dynamic address assignment, and “gone-dark” blocks. We consider each of these below. While dynamic addressing and firewalls are target-specific causes of representative non-responsiveness, measurement error is a possible source of uncertainty. We believe 33 that Internet census-taking methodology reduces these sources of error to random noise for reasons described in prior work [HPG + 08]. To summarize briefly: they monitor the network hosting the probes for local routing outages. Probes are in pseudorandom order, so routing outages in the middle or near the destination result in lower responsiveness in proportion to outage rates, but randomly distributed. Pseudorandom probing is spread over two months, so the probe rate to any individual /24 is well below typical ICMP rate limits. We considered packet loss and routing outages in the middle or of the network or near probe sources being potential sources of error. For more complete discussion of sources of error in Internet census-taking, and validation studies, we refer to prior work [HPG + 08]. Defining stable blocks: Blocks that lack stable addresses make representative selec- tion inherently dicult. In a block with a stable representative, it will likely remain responsive, but if all addresses in the block are unstable then the probability a repre- sentative will respond is equal to the occupancy of that block and independent of prior history. Addresses can lack stability either because the hosts using the addresses are only on intermittently, or because addresses in the block are allocated dynamically to a chang- ing population of computers. Multiple groups have used dierent techniques to identify dynamically assigned addresses in the Internet [SM06,XYA + 07,CH09]. A recent study estimates that about 40% of responsive Internet blocks are dynamic based on Internet address surveys using ICMP probes taken every 11 minutes for two weeks [CH09]. (We assume here that non-stable blocks are primarily due to dynamic addressing.) To evaluate the prevalence of stable and non-stable blocks, we would like to identify them from the history that we collect. Prior analysis of surveys used address availabil- ity and volatility to identify dynamic addressing. Availability is the fraction of times the address responds in all probes, while volatility is the fraction of times the address 34 changes between responsive and non-responsive [CH09]. While appropriate for survey data with 11-minute probes, volatility makes less sense when probes are months apart. 
To identify stable blocks with infrequent probes, we define a new metric, truncated availability: the fraction of time an address responds, counted from its first positive response. More formally, if r_i(a) is the response of address a to the i-th probe, the raw and scaled availability, A*(a) and A(a) (from [CH09]), and the truncated availability, A_t(a), are:

    A*(a) = \sum_i r_i(a)
    A(a) = A*(a) / N_h
    A_t(a) = A*(a) / L*(a)

where L*(a) is the length of a history, in observations, from the first positive response to the present.

While volatility and truncated availability are correlated, we found that low volatility and high truncated availability are both good predictors of a stable block. Low A_t values are a good predictor of intermittently used addresses. Continuing the examples in Figure 2.2, 00001111 has A_t = 1, while 01010101 has A_t = 0.57. While A_t is good at differentiating between these solid (00001111) and intermittent (01010101) addresses, it interacts with gone-dark addresses, which will have a string of trailing 0s. From these, we define a stable representative as one with A_t ≥ 0.9.

From Table 2.4 we find that 43% of all representatives are not stable by A_t < 0.9, a little higher than other independent observations of 40% [CH09] and of 34–61% [XYA+07] (these values are for a random sample of DNS and for Hotmail users, respectively). We do not claim strong validation of this exact percentage, because each work computes a percentage of a different population, with a different definition of what is dynamic or not stable. We claim only that our metric is of the right order of magnitude and so provides some insight into sources of non-responsive representatives.

non-responsive representatives
A_t < 0.9    1,284,304   (71%)
A_t ≥ 0.9      520,798   (29%)
total        1,805,102  (100%)

Table 2.5: Fraction of representatives that are non-responsive, based on A_t (HL28/16 tested against it29w).

Re-evaluating causes of non-responsive representatives: With these definitions, we return to Table 2.4. We see that gone-dark and not-stable blocks together contribute three-quarters of our misses: almost one-third are in not-stable-only blocks, and almost 40% are gone-dark. We therefore claim that three-quarters of our non-responses are due either to new firewalls or to selection of representatives in not-stable blocks, neither of which can ever have always-responsive representatives.

To support the claim that lower A_t values correlate with poorer response, Table 2.5 breaks out the 1.8M non-responsive representatives by two values of A_t. We see that 29% of non-responses come from stable blocks (A_t ≥ 0.9). Representatives with poor truncated availability (A_t < 0.9) account for more than two-thirds of non-responses. We conclude there are many unstable blocks that simply cannot be expected to support stable representatives. Also note that, by our definition of gone-dark, gone-dark blocks also qualify as not stable (because A_t < 0.9).

To show that our choice of threshold for A_t does not alter our conclusion, Figure 2.3 shows the cumulative distribution of A_t for both non-responsive and responsive representatives. It shows a large difference in responsiveness for any value of A_t.
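Truncated availability is straightforward to compute from the same history bits (a sketch; addresses with no positive response get A_t = 0):

    def truncated_availability(bits):
        """A_t(a): fraction of positive responses counted from the first
        positive response to the present; matches the examples above
        (00001111 -> 1.0, 01010101 -> 4/7 = 0.57)."""
        if 1 not in bits:
            return 0.0
        window = bits[bits.index(1):]
        return sum(window) / len(window)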
[Figure 2.3: Cumulative distribution of A_t for the responsive part of N_p, the non-responsive part of N_p, and all of N_p.]

class                  HL28/16        HL29/16        HL30/16        HL31/16        HL32/16
allocated /24 blocks   12.774         12.774         12.905         13.036         13.167
never responding        8.718          8.631          8.679          8.728          8.775
predicted               4.055          4.142          4.225          4.307          4.392
gone-dark               0.035          0.075          0.109          0.154          0.195
informed prediction     4.019 (100%)   4.066 (100%)   4.116 (100%)   4.153 (100%)   4.196 (100%)
changed Rep.            —              0.218 (5%)     0.341 (8%)     0.292 (7%)     0.306 (7%)
new Rep.                —              0.171 (4%)     0.082 (2%)     0.082 (2%)     0.084 (2%)
responsiveness          2.250          2.344          2.411          2.451          —

Table 2.6: Released hitlists to date, by last census used in prediction (top); numbers in millions. The top group of rows shows hitlist composition, including churn (changed) and new representatives ("Rep." for short) relative to the prior hitlist. The bottom line, responsiveness, evaluates each hitlist against the next census.

2.3.2 Completeness

To evaluate completeness, Figure 2.4 shows the absolute number of representatives using 16-deep histories through five different censuses, and Table 2.6 shows the raw data.

[Figure 2.4: Relative size of hitlist components: never-responsive blocks, not-recently-responsive blocks, and predicted blocks.]

We consistently see that about one-third of blocks have some history data allowing an informed selection of representatives (the white region of the graph, with around 4.2M blocks). By contrast, about two-thirds of blocks have never responded (the top grey regions).

In addition, this data shows gone-dark selection from Section 2.2.4. We identify about 0.3–1.5% of allocated blocks as formerly responsive (the black region in the middle of Figure 2.4). To guarantee completeness, we select random representatives for never-responsive blocks. However, we can see that we can provide informed choices for only a third of blocks.

Finally, we note that IANA releases new allocation maps only quarterly, and routing studies suggest this space becomes routable gradually [BHM+07], so we expect each hitlist to be useful for at least three months, about the frequency at which we update them.

2.3.3 Stability and Inertia

We next consider two aspects of hitlist stability: how much churn is there in the hitlist, with and without representative inertia, and how much does inertia reduce prediction accuracy? Recall that inertia is the amount I by which the prediction score must improve to change representatives. An inertia of I = 0 means we always pick the highest-ranked address in a block as the representative, independent of the representative in a hitlist based on prior censuses. As inertia approaches 1, we never switch representatives once chosen. For our production hitlists, we use I = 0.34, based on score changes due to weighting (Section 2.3.1.1) and the following analysis.

How much inertia affects churn: Churn is the rate at which we switch representatives for established blocks. Table 2.6 shows the amount of churn for four hitlists when using our standard inertia I = 0.34. Churn is shown in the "changed Rep." row, so in the HL32/16 column we see that about 7% of all predictions (306,588 representatives) changed relative to the prior hitlist (HL31/16).
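Concretely, the update rule with inertia can be sketched as follows (assuming the score() sketch given earlier; names are hypothetical):

    def update_representative(prev_rep, block, inertia=0.34):
        """Keep the previous representative unless another address in the
        block beats its score by more than the inertia threshold."""
        best = max(block, key=lambda a: score(block[a]))
        if prev_rep is None or prev_rep not in block:
            return best
        if score(block[best]) - score(block[prev_rep]) > inertia:
            return best
        return prev_rep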
Table 2.6 also shows that the rate of churn is relatively stable over time, with 5–7% of all informed predictions changing each census.

While Table 2.6 shows churn over time for a fixed inertia, in Figure 2.5 we vary inertia to observe its effect on churn. To estimate the relationship shown in this figure, we generate HL28/16, then modify it three times with censuses it29w, it30w, and it31w, using different levels of inertia. (Here we suspend gone-dark processing to focus only on inertia.) We then evaluate the hitlist against observations from census it32w. We evaluate inertia over several steps for two reasons. First, hitlist staleness is partially a function of time. Second, large values of inertia suppress changes from a single census or a few censuses.

[Figure 2.5: Effects of different inertia on representative churn (HL28/16 modified three times, by it29w through it31w; modified twice, by it29w and it30w; and once, by only it29w).]

As expected, Figure 2.5 shows that higher inertia suppresses churn, because it takes several new negative responses for a representative's score to change. In fact, given the weight decay in our power weighting, the score can change by only about 0.3 after one new census and by about 0.5 after two new censuses; so with three new observations here, an inertia of 0.2 can be overcome by one observation, I = 0.4 by two, and I = 0.6 by three, while I = 0.8 requires more than eight observations to change.

Inertia and responsiveness: Inertia is selected to keep hitlists stable, reducing the amount of arbitrary representative turnover in long-running experiments. Such turnover can be eliminated by simply never changing representatives (setting I = 1), but prior experience shows that the responsiveness of a static hitlist will degrade over time as servers move, losing as much as 2–3% per month for the early, web-server-based Skitter list [HFMkc01]. We would therefore like to know the trade-off between inertia and representative responsiveness.

[Figure 2.6: Effects of different inertia on responsiveness (HL28/16: modified once, by it29w, then tested against it30w; modified twice, by it29w and it30w, then tested against it31w; modified three times, by it29w through it31w, then tested against it32w).]

Figure 2.6 shows hitlist responsiveness for different values of inertia after this process. (This analysis was generated with the same multi-step process as Figure 2.5, described above.) We see that responsiveness degrades slightly for high inertia values, from 59% responsiveness with no inertia to a low of 53% responsiveness when I = 0.8, when there are effectively no changes. We conclude that a moderate inertia has little effect on responsiveness, costing at most 6 percentage points, even over eight months.

2.3.4 Effects of Probe Frequency

We build our hitlists from periodic censuses, so an important question is how the frequency of such censuses affects the quality of the hitlists. Intuitively, more frequent sampling can provide more information on address responsiveness, resulting in better predictions. Our current censuses are taken every two to three months (Table 2.1). More frequent collection is possible, but it should be justified, because data collection entails some operational cost (computing power, storage, etc.).
While Internet censuses cover the whole Internet every few months, we also have access to surveys that probe 1% of the Internet every 11 minutes for two weeks [HPG+08]. Here we turn to these much more frequent probes to evaluate the effects of different probe rates. A survey has about 1800 observations, but we downsample this information to get observations every 12, 24, or 48 hours, providing 28, 14, and 7 bits of history. We use survey 30 (taken at the same time as it30w, in December 2009) to make predictions, and test them against it31w. We use the average weighting function to evaluate these observations, since it seems unnecessary to favor recent observations when all are taken over a short, two-week period; in addition, in Section 2.3.1.1 we showed that weighting has relatively little effect on responsiveness. We then test the prediction from this history against the next census (it31w) to evaluate prediction accuracy. We compare the representatives found from the survey with those computed in HL30/16.

Probe Frequency         Responsive Representatives
Survey, 12 hours        15,484 (68%)
Survey, 24 hours        15,362 (67%)
Survey, 48 hours        15,197 (66%)
HL30/16 (3 months)      16,123 (70%)
total blocks analyzed   22,861 (100%)

Table 2.7: Prediction accuracy (as responsiveness) from different probe frequencies, for a given number of /24 blocks.

Table 2.7 compares responsiveness as a function of probe frequency. First, we see that, of the survey-derived hitlists, more frequent probing provides a slightly better prediction (68% responsive from 12-hour samples vs. 66% for 48 hours). This small improvement occurs because more frequent probing gives more information on host responsiveness. Second, we see that the census-derived hitlist is a better prediction than any of the survey-derived hitlists, by 2–4%. The main source of this difference is that the survey's hitlist finds some addresses that do not appear in the census-derived hitlist. These addresses seem less stable in the next census, because the survey-derived hitlist considers only a short period of history, while the census-based hitlist considers many months of history and so finds long-term stable addresses, if they exist. In addition, because our survey observations are taken at specific times of day, they may discover addresses that are up only during particular daily hours, while evaluating against the next census tests at random times of day.

2.3.5 Effects on Other Research

The above sections evaluate hitlists based on our goals: responsiveness, completeness, and stability. But hitlists are a tool to enable other research, so their ultimate benefits come from how they improve the quality of other network performance and topology studies.

Some network performance studies require responsiveness in their destinations. These include studies that evaluate performance [Wol98, HFP+02, MKBA+09], consider questions about routing and reachability [WMW+06, BHM+07], or study the performance of replica placement (examples include [Wol98, CJJ+02]). For studies that require end-to-end latency measurements, our representative selection methods optimize reachability within the constraints of sparse measurement. Our work also suggests directions for potential improvements: more frequent measurement could potentially better track reachable addresses in dynamically assigned blocks. In addition, our approach to stability assists evaluation of long-term performance trends.
Responsiveness is helpful but not essential for many other topology studies (such as [GT00, FJJ+01, SMW02, HFP+02, MIP+06a, CHK+09]). Most topology studies employ traceroutes to study paths across the Internet. A traceroute attempts to discover all IP addresses on the path towards a destination, and many such paths are aggregated with alias elimination [GT00, SMW02, Key08] to produce a router-level map. However, in such studies, the presence or absence of the destination itself affects only the last hop. Topology studies thus do not require responsive representatives, but they may benefit from responsiveness. Nevertheless, many have moved towards the use of random or deterministically selected representatives. As one example, topology probing in Skitter [HFP+02] began with a manually generated hitlist, but later shifted to random probing in Archipelago [CHK+09]. Two reasons for this shift were the difficulty of maintaining a responsive hitlist and the recognition that responsive targets are not essential.

Although responsiveness is not essential for topology studies that focus on the core of the Internet, it is important for studies that wish to explore the edge of the network. We can get a rough estimate of the number of edge links that are missed by randomly selected representatives: empirically, about 4–7% of the Internet responds to ICMP probes [HPG+08], so we expect that 93% of random representatives will not respond (approximating the distribution of responders as uniform, to get a rough estimate). If 55% of our hitlist representatives respond, that will improve edge detection for 48% of blocks. With 1.3 million allocated /24 blocks (as of November 2009), these statistics suggest that responsive hitlists will detect about 630,000 additional links compared to those found using a random hitlist. Some topology studies examining the core of the Internet find about 33M links, so this increases the size of the discovered Internet topology by 2% [CAI10a]. This simple analysis ignores correlation in the data, so it is only approximate.

To confirm this simple analysis, we consider CAIDA's Internet Topology Data Kit (ITDK-2010-01 [CAI10a]). ITDK combines the results of many traceroutes with alias resolution [GT00, CHK+09] to produce a router-level map of the Internet. ITDK is formed from 42 cycles of data, each including a traceroute to a randomly chosen representative in each /24 prefix.

Unfortunately, we cannot directly evaluate ITDK with different approaches to representative selection, because it is the result of several levels of processing (aggregation of multiple cycles of data, and duplicate and alias elimination). In addition, a direct comparison would be somewhat misleading, since the goal of ITDK is to map the Internet's core, not its edges. Instead, we obtained the raw IPv4 topology data [CAI10b] used to generate ITDK.

First, we want to know how many more links can be found by traceroute to hitlist representatives than to randomly selected representatives. To do so, we compare how many responsive /24 prefixes are found by each method. Note that a fair comparison would compare traceroutes to random representatives against traceroutes to hitlist representatives; however, we do not have traceroute data using hitlist representatives. Thus, we evaluate how many responsive /24 blocks are found by one cycle of random traceroute, while we test the responsiveness of our HL29/16 (based on censuses from 2007 through December 2009) by comparing to it30w, completed in January 2010.
(Here we report these results compared to a single ITDK cycle, #739, of the raw IPv4 data; we get very similar results comparing to cycles 738 and 740.) For traceroute, the responsiveness of representative addresses may not be related to the responsiveness of the /24 block, because a traceroute may find another router in the same /24 block among the intermediate hops. So when computing the number of responsive /24 blocks found by one cycle of random traceroute, we count all IP addresses that appear in the log, including intermediate hops and responsive destinations. Since we only report responsive destinations found by using hitlist representatives (equivalent to using ping), the result we report is a lower bound on the number of responsive /24 blocks that can be found by tracerouting to hitlist representatives.

routable /24 blocks studied                                           8,248,027   100%
no prediction                                                         4,239,166    51%
prediction (N_p)                                                      4,008,861    49%
  responsive (only destinations)                                      2,454,500    30% (100%)
random probing responsive (destinations and intermediate hops)          885,794    11%
random probing responsive (only destinations)                           730,496     9% (30%)
extra responsive blocks found (destinations and intermediate hops)    1,568,706    19%
extra responsive blocks found (only destinations)                     1,724,004    21% (70%)

Table 2.8: Evaluation of random and informed representatives as destinations. The random-probing results are from one cycle of IPv4 traceroute data. The responsiveness of the informed representatives is evaluated against a census.

Table 2.8 summarizes our observations when comparing our hitlist representatives with one cycle of random representatives using traceroutes. First, we see that we only have data to make predictions for just under half of the blocks (N_p of 4,008,861); for the remaining 51%, we too fall back on random probing. (We actually predict about 100k additional prefixes, because we use a more recent routing table than the IPv4 raw data; however, we omit these from the table to provide a fairer comparison.) Second, we see that random probing finds responsive addresses about 9% of the time if we consider only the responsiveness of the destination addresses of the traceroute, and about 11% when including intermediate hops. This result is somewhat better than we predicted (4–7% [HPG+08]), a difference probably arising because our prediction is over all allocated blocks, while here we study only routable blocks, and non-routable blocks cannot possibly have responsive representatives. Finally, we find about 1.5 million additional responsive blocks if we use our hitlist compared to random probing, about 1.7× more edge links than random probing alone finds. In addition, this 1.5 million increase is a lower bound, per our discussion in the previous paragraph. For studies of edge links, this is a significant
This comparison shows how many more responsive representatives can be found by using our hitlist than ran- dom probing when we only care about the responsiveness of the destinations, such as when using ping probing. We see from Table 2.8 that hitlist find 1.7 million additional responsive blocks, about 2:3 more than random probing. Second, using ITDK data, we can simulate how the number of /24 blocks increase after combining more cycles of randomly selected representatives when only using pings. Dierent from the above comparison which focuses on link discovery of tracer- oute, this comparison focus on responsive /24 blocks found by pings. 47 Figure 2.7 shows that the number of responsive /24 blocks increases when using more cycles of random probing of pings. Because each cycle probes a dierent random destination, total coverage will improve as it eventually finds responsive destinations by chance. The data confirms that one cycle of random representatives only finds about 0.7 million responsive representatives, but more cycles can find more. We also see that our hitlist (one cycle) finds the same number of responsive addresses as 17 cycles of random representatives. So, to find the same number of responsive blocks, using our hitlist only needs 1/17 probing eort. Care must be taken in making the above comparison, because it is important to note that ITDK is designed to study the Internet core, not the edges, and this goal is well met with random probing alone. However, our results show they could get additional coverage with no additional probing eort by changing to representatives selected with a method such as ours. 2.3.6 Cost of Hitlist Generation Finally, turning from hitlist quality, we consider the cost of generating hitlists. Hitlists are generated from a number of Internet censuses. We therefore review the cost of taking a census and then look at the cost of processing censuses to produce a hitlist. Hitlists require Internet censuses as input. New censuses are started every two to three months; during most of that time four machines are actively probing the Internet, while there is a week or two of setup and data curation time at the beginning and end of a census [HPG + 08]. Each census currently requires 12–15GB of archival storage Censuses are carried out as part of an ongoing Internet topology research project and are used for multiple purposes, but if carried out exclusively for hitlist generation they would represent an ongoing cost. 48 Processing hitlist from census data incurs several ongoing costs. First, we maintain a master history file with bitmaps of all censuses to date, indexed by IP address. Each census creates an observation file with about 1–2GB of new positive observations. We merge observations into an ongoing history file of all observations to date. Currently 22 observation files (32GB total) are merged into one 7.5GB history file, and the size of that file grows by about 300MB every new census. We parallelize our computation with the Hadoop implementation [Cut06] of map/reduce [DG04], running over a cluster of about 40 computers with about 120 CPU cores. With this parallelism, join in a new census into an existing history takes about half an hour, and evaluation of a new hitlist takes about another half hour (our code is written in Perl and not optimized for speed). Our map function groups results by block, while the reducer carries out the join or evaluation. 
2.4 Other Observations

Given no knowledge about a /24 block, which address is most likely to be responsive? (We thank kc claffy for suggesting the question of last-octet distribution for study.) This question has some bearing on which representative we should select for gone-dark blocks, or for newly allocated blocks with no census data yet.

Discussions with network operators suggest some network practices are common. Often addresses are allocated sequentially from the start of a block, and network managers often use the first or last address in a block for routers. Since address blocks are allocated in powers of two according to CIDR [FLYV93], we expect to see uneven use of the address space. Recent work has confirmed the visibility of allocation blocks in census data [CH09], but not last-octet usage.

[Figure 2.8: Frequency of responsiveness by last octet of the address (from it29w).]

To evaluate this question, Figure 2.8 shows the distribution of responsiveness over the last octet for all IP addresses in it29w. (We got similar results on the it28w and it30w censuses.) Consistent with expectation, the most responsive octet is .1, responding 0.86% of the time, more than twice as often as the median responsiveness (0.38%), and 1.5× more frequently than .129 (0.55%), the next most responsive last octet.

Figure 2.8 shows a pattern in responsiveness, with responses being most frequent at addresses that are one greater than a power of two. The top ten are ranked .1, .129, .65, .33, .2, .254, .17, .193, .97, .9, and of these only .2 and .254 do not follow this pattern. To show this trend more clearly, Figure 2.9 shows the rank of each last octet as the area of a circle, with the octets arranged in a sequential grid (the x axis lists octets sequentially in groups of 16, while each step up the y axis is 16 more than the previous). The vertical lines correspond to more frequent responses, with x = 1 showing strong response from .1, .129, .65, etc., and x = 9 for .9, etc. The other prominent features are .254 and the row across y = 0 (.1, .2, .3, etc.). While ranks exaggerate what can be small absolute differences, these strong patterns show that power-of-two allocations affect responsiveness.

[Figure 2.9: Rank (shown by circle area) of responsiveness by last octet, in a 16 × 16 grid by address (from it29w).]

2.5 Sharing Hitlists

Our goal in generating hitlists is to share them with other research groups carrying out topology studies. We offer them free-of-charge to all, and up to May 2015 we have provided them to 11 other projects from 8 different organizations. Although hitlists are not human subjects, networks are operated by and involve humans. Hitlist use by multiple projects prompts us to consider their distribution in the context of the Belmont protocols [The79], weighing the benefits and potential costs of sharing and designing policies accordingly.

2.5.1 Benefits of Sharing Hitlists

The benefits of sharing hitlists are similar to those of sharing other research results. Shared data is a boon to researchers. A common data source can lower the barrier to entry for future research, and it also makes it easier for researchers to compare their results. (For example, the TREC benchmarks are seen as essential to rapid advances in the field of Information Retrieval [DH06], although our efforts are far more modest.)
As importantly, we expect that the scrutiny of multiple researchers on a common dataset can often identify data or methodological errors that might otherwise go unnoticed. (In Internet topology, the problem of alias resolution is one that is still being refined [BSS08, Key08], nearly ten years after the first techniques [GT00].) For the hitlist creator, a shared result amortizes the operational costs of collecting and processing the input data (Internet censuses) needed to create hitlists. Finally, for the hitlist subjects, network operators in the Internet, a common source allows us to centralize "do-not-probe" blacklists and reduces raw data collection.

We have provided our hitlists to 11 other projects. Some of them are now published work and some are ongoing. Here we list four that have published results, as examples of how our hitlists benefit other studies. These studies either directly use our hitlist or use our hitlist-prediction method to select probing destinations. Marchetta et al. [MdDP13] choose traceroute probing destinations from our hitlist to study third-party addresses in traceroute paths. They benefit from the hitlist's high probability of positive response to ping to efficiently carry out large-scale traceroute probing (327K destinations from 53 PlanetLab nodes). Ackermann et al. [AA14] use our hitlist to filter out IP blocks that are either private or not assigned for public use, in support of their novel methodology for analyzing big datasets used in social-science research. While providing allocated, non-private IP blocks is only a natural consequence of making the hitlist complete, not the primary goal of generating it, we are happy to see our hitlist contribute in various ways. Hu et al. [HH12] use the hitlist prediction algorithm to select probing destinations to efficiently geolocate a large number of IP addresses. They do not directly use our hitlist, because our hitlist contains only one representative IP address for each /24 prefix and they need to probe three IP addresses per /24 prefix. To efficiently detect Internet outages, Quan et al. [QHP13] propose adaptive probing, which selects probing destinations based on their responsiveness history, the same concept we use in this chapter. The difference is that adaptive probing extends our work by probing multiple representatives in a /24 prefix if necessary, so as to confidently judge whether the prefix is in an outage.

2.5.2 Costs of Sharing Hitlists

Shared hitlists have some costs, however. Most serious is that a hitlist can focus the probing of several researchers on a specific representative address in a network, while independently derived hitlists are more likely to distribute probing load. Second, eventually hitlists will be acquired by malicious users on the Internet. Potential harms are hiding malicious traffic mixed with research traffic, and the slight risk that any list of known active IP addresses may attract additional malicious traffic such as worms or cracking attempts. While this is a risk, the effort to generate a hitlist is within the reach of a motivated individual, so strong restrictions on hitlists seem unwarranted.

Our current hitlist distribution policies are designed to balance risks with benefits. Although we share hitlists free-of-charge, we provide them subject to a usage agreement. Hitlist users may not redistribute hitlists, so we can establish this agreement directly with all users. Tracking hitlist users allows us to estimate load on representatives.
We also hope controlled hitlist distribution delays their acquisition by malicious parties. A 2013 project in our research group seeded the hitlist with representatives that we monitored to track load, and at that time we saw no extra load on the representatives compared to the Internet "background radiation". This observation suggests there was no evidence of probing traffic being focused on hitlist representatives at that time. As the number of users of our hitlist grows, we expect to review these policies as we gain more experience.

2.6 Related Work

Hitlists are used in active probing for studies of topology [GT00, FJJ+01, HFMkc01, SMW02, WCVY03, MIP+06a, SBS08, CHK+09], performance [Wol98, HFP+02, MKBA+09], and reachability [WMW+06, BMRU09], and for other purposes [Wol98, AJB00, CJJ+02]. Each of these studies uses some hitlist (sometimes called a seed or probe list) generated manually, randomly, or automatically from several sources. We review each hitlist-generation method next.

Early topology work used manually generated lists. Skitter is a well-known measurement tool developed at CAIDA to study the IPv4-interface-level topology of the core Internet [HFMkc01]. (With IP alias resolution, this can provide a router-level map.) It uses traceroutes from multiple locations to a hitlist of destinations. Their target address list was manually built from many sources, including tcpdump from the UCSD–CERF link, hostnames from search engines, and intermediate addresses seen in their own trace records. In 2000 their hitlist included about 313,000 destinations, and by 2004 it had grown to 971,080. While their hitlist was of high quality, they found it very labor-intensive to maintain, and responsiveness degraded over time as destinations changed. (They report that their initial, web-server-based list showed a 2–3% reduction each month in reachable destinations [HFMkc01].) The cost of manual list maintenance was one reason that prompted them to change to random probing with Archipelago [CHK+09]. More recently, Bush et al. [BMRU09] have maintained a manual list, derived from the Skitter list but augmented with guided scanning to cover each AS and provide 306,708 representatives. They require reachable addresses to study routing reachability [BMRU09]. We used a version of their list to seed our initial stable list, but our techniques provide much greater coverage at lower cost. Unlike all of these manual hitlists, our goal is to fully automate hitlist generation, allowing more complete and timely coverage.

Random representative selection allows low-cost generation of hitlists for much larger numbers of networks. Mercator developed informed random probing to adaptively adjust its probe list based on prior results [GT00]. By adaptively growing the hitlist, Mercator strives to quickly and efficiently discover a topology while minimizing hitlist size. Archipelago (Ark) is a measurement platform designed to support traceroute and other measurements [CHK+09], effectively a next-generation Skitter. Ark's hitlist covers all routed /24 blocks, choosing a random last octet within each /24 block. The random hitlists in Mercator and Ark are essential to cover the millions of /24 networks in today's Internet, but Mercator's adaptive algorithm means completeness is uncertain (although efficiency, not completeness, was their goal), and random probing in both Mercator and Ark may sacrifice responsiveness of the destination address.
Our hitlist provides complete coverage while maximizing responsiveness (both defined precisely in Section 2.2.1). In Section 2.3.5 we evaluated the degree to which informed hitlist generation may improve the number of links discovered in a topology study.

Rather than a random destination, DisCarte's hitlist selects the .1 address in each /24 block of the routed address space. DisCarte [SBS08] adds record-route information to traceroute probing to obtain more accurate and complete network topologies. DisCarte requires a responsive destination, and finds 376,408 responsive representatives among the .1 addresses of routed /24s. Our work confirms that the .1 address is responsive twice as often as the address with median responsiveness (Section 2.4), but we suggest that census-informed representative selection can achieve much better responsiveness.

Finally, there has been some work in IPv6 topology discovery. The Atlas system uses a manually generated list built from 6bone destinations [WCVY03], then expands it based on discoveries. The full address-space census that is the basis of our work is only applicable to the IPv4 address space, so combinations of active and passive methods as proposed in Atlas are essential for IPv6. As future work, comparison of both active and passive hitlist-generation approaches in the IPv4 space may provide a basis for inferring the coverage of passive studies of IPv6.

2.7 Conclusions

We have defined the properties that are important to hitlists: representatives that are responsive, stable, and provide complete coverage of the Internet. We have developed a fully automated algorithm that mines data from Internet censuses to select informed representatives for the visible Internet. We employ information that is available for about one-third of the Internet, and when an informed representative is available we see it is 50–60% likely to respond 2–3 months later. We showed that the primary reasons for prediction failure are blocks with dynamic addressing and gone-dark blocks that are probably behind firewalls.

This chapter serves as strong evidence for the part of the thesis statement that smart selection of probing destinations enables efficient enumeration of global Internet services (Section 1.2). In particular, we propose a new, automated approach to smartly select probing destinations (a hitlist). We then show that our smart selection of probing destinations enables efficient enumeration of Internet edge links. Though we do not provide full enumeration, our hitlist can reach many more edge links (1.7×) in one round of probing compared to previous random selection. Last, other studies benefit from our hitlists, or from our method of generating hitlists, to study different behaviors of the Internet, such as detecting third-party addresses in traceroute [MdDP13], massive IP geolocation [HH12], and Internet outage detection [QHP13]. These studies demonstrate that the efficient enumeration of Internet edge links enabled by our work in this chapter can help in understanding the behavior of the Internet.

In this chapter we showed that smart selection of probing destinations enables efficient service enumeration. In the next chapter, we will show that smart selection of probing sources can also enable efficient service enumeration to study global service behavior.

Chapter 3

Evaluating Anycast in the Domain Name System

In this chapter we evaluate different approaches to characterize and enumerate anycast services in the Domain Name System (DNS).
The Domain Name System, as described in RFC 1034 [Moc87a] and RFC 1035 [Moc87c], is a distributed system that allows queries for service names and returns Internet addresses or other information. Anycast is an important routing and addressing mechanism that is used by many critical production DNS services as well as many large content delivery networks. In anycast [PMM93], a single IP address (the anycast address) is allocated to multiple servers that are located in different topological locations (often also different geographic locations). Packets sent to the anycast address are routed to the topologically nearest server serving the anycast address. While the proximity, affinity, and load balancing of anycast have been explored by prior work [BFR06, Col05, SPT04, LHF+07, BLKT04], there has been little attention to third-party discovery and enumeration of the components of an anycast service. (The anycast study by Renesys [Dou13] was carried out in parallel with ours, and there have since been further studies of anycast enumeration [CJR+15].) Enumeration can reveal abnormal service configurations, such as masquerading (unauthorized organizations providing replicas of the service, even if no malicious behavior is observed from the unauthorized replicas) or hostile hijacking of anycast services, and can help characterize the extent of anycast deployment.

To support our thesis statement, the previous chapter provided strong evidence by demonstrating that smart selection of probing destinations enables efficient enumeration of Internet edge links. In this chapter, we demonstrate other parts of the thesis statement, as described in Section 1.2. We first show that smart selection of probing sources enables efficient enumeration of the anycast nodes of global anycast services. We select a large number of open resolvers as probing sources. This selection is smart because it is the only one that supports both high recall of enumeration and on-demand measurements, compared to the other two choices of probing sources. While we have many redundant probing sources, these probing sources are needed to maximize the recall of the enumeration. After demonstrating the smart selection of probing sources, and showing that it supports efficient anycast node enumeration, we also show that enumeration of anycast nodes helps identify abnormal service behavior such as masquerading and hijacking.

Our proposed method for enumerating anycast nodes has been adopted by two large operational anycast services, L-root [AM14] and AS112 [AS112b]. The adoption of our work in operation serves as strong support for our thesis statement. We also believe that many other studies can benefit from smart selection of probing sources. As illustrated in Section 1.2, studies that need to enumerate service replicas [AMSU11, BDS11, FDC14, SPT04] can benefit from smart selection of probing sources to increase the recall of replica enumeration; such studies can use probing sources that are widely distributed across a large number of networks, as we do in this chapter.

Part of this chapter was published in the IEEE International Conference on Computer Communications (INFOCOM) 2013 [FHG13a].

3.1 Introduction

Rapid response and high availability require that large network services be distributed widely, often with a single logical service being provided by distributed replicas accessed using a single logical identifier. Content delivery networks (for example, [DMP+02b]), mirroring services (for example, [CJJ+02]), URNs [SM94], and IP anycast [PMM93] all fit this model.
In IP anycast, as standardized by the IETF [PMM93, AL06], an anycast service operates on a single IP address, but multiple anycast nodes replicate that service at different physical locations. Each node may be implemented with one or more servers (physical or virtual machines), each of which listens to the anycast address and often also to one or more unicast addresses. Standard interdomain routing directs clients to the nearest replica and handles fail-over to other nodes as required. (We review details and terms in Section 3.3.)

Anycast is used for many core services of the Internet today. It is widely used for DNS [Har02]: as of April 2011, 10 out of 13 root name servers employ anycast [RD15]. Other uses include discovering IPv6-to-IPv4 relay routers [Hui01], sinkholes [GM03], load distribution [The12, EPS+98], and content delivery networks [Pri11, FMM+15, ALR+08]. Anycasted services benefit from the load splitting provided by the routing system (because of the use of anycast) and the ability to mitigate denial-of-service attacks [AL06], and research proposals have discussed improvements to scale to many anycast destinations [KW00].

The use of anycast for core Internet services suggests we need to understand its performance, use, and robustness. In this chapter, we focus on understanding anycast use in DNS. Extensive prior work (reviewed in Section 3.8) has measured server proximity, the affinity between clients and anycast services, and the performance of load balancing of anycasted DNS. However, to date there has been no effort to systematically discover and map anycast use in DNS. As we show in Section 3.6, such a capability can aid in diagnosing abnormal name service configurations, and help understand the extent of anycast deployment.

The first contribution of this chapter is to evaluate different approaches to automatically discover and enumerate all nodes of an anycast service. To understand the challenges in anycast discovery, we first classify anycast deployment configurations (Section 3.3). Anycast discovery is challenging because anycast configurations can be somewhat complex, existing diagnosis methods are not standardized and can result in measurement ambiguity, and the visibility of anycast servers can be topologically scoped, requiring a large number of vantage points.

We then discuss the design of two methods to enumerate anycast nodes. The first method uses an existing anycast diagnosis technique based on CHAOS-class TXT DNS records [WC07], but augments it with traceroute to identify non-cooperative anycast nodes and resolve ambiguities in CHAOS-based discovery (Section 3.4.1). This approach requires specific measurement support to be widely deployed, sometimes limiting its coverage. Our second method (Section 3.4.2) proposes the use of Internet-class (IN) TXT DNS records to enable the use of tens of thousands of recursive DNS servers as vantage points. Open recursive name servers (or open resolvers) are servers that accept and process queries made from any public Internet address, not just clients of the ISP operating the server; a more detailed definition can be found in this Internet-Draft [HSF15]. Finally, we identify security concerns raised by anycast diagnosis and describe how providers can control its use (Section 3.7). A careful validation of these methods, using 60k and 300k vantage points, reveals interesting trade-offs (Section 3.5).
CHAOS queries issued from 60k Netalyzr clients discover (measured using recall from information retrieval) 93% of F-root anycast servers. However, because the CHAOS query format is not standardized, different providers use different conventions to identify anycast servers; this results in a measurement ambiguity that can be resolved using traceroute probes. A smaller-scale experiment on PlanetLab using 238 nodes reveals that the precision of CHAOS queries can be improved from 65% to 89% using traceroute, and to 100% if the provider's CHAOS labeling conventions are known. Finally, we show that up to 90% recall is possible on demand when we shift to IN queries and 300k recursive DNS servers, as evaluated on the AS112 anycast service [The12].

More importantly, we find that, when choosing probing sources randomly, 10,000 or more vantage points are required on average to reach a recall of 80% with either method (Section 3.5.2) on the two large anycast services we study (F-root and AS112). For context, almost all prior work on anycast performance (the exception being [BFR06]) has used only hundreds of vantage points. Interesting future work may be to examine whether their conclusions would be significantly altered by a broader perspective as suggested by our approaches.

Our second contribution is to understand how anycast is used in practice over many services (Section 3.6). Until recently, the AS112 anycast service used manual methods to track its extent of deployment; our evaluations find that the manual list is out-of-date and incomplete, with about 26% of listed nodes no longer operational, and four providers operating multiple nodes. Recently, AS112 operators have adopted a discovery method similar to what we propose. Second, we evaluate anomalous anycast usage (Section 3.6.1). We found one third-party DNS server masquerading as an anycast node for a public root server, and hundreds of users observe what are likely in-path proxies. This demonstrates the importance of dynamic discovery methods to audit critical Internet infrastructure. Finally, in Section 3.6.3, we apply anycast discovery to servers for all top-level domains, showing that up to 72% of TLDs (top-level domains) may now be using anycast. Moreover, we are able to estimate the distribution of TLDs across providers of anycast service and find that almost half the TLDs are hosted by two anycast providers. Thus, our methods can lead to new insights about anycast usage and, in the future, enable an understanding of how this usage evolves over time.

Data we generated for this chapter is free upon request [LAN04].

3.2 Background of the Domain Name System

In this section, we briefly provide background about the Domain Name System (DNS). For more information about DNS, please refer to RFC 1034 [Moc87a] and RFC 1035 [Moc87c].
DNS is a distributed system that provides a mechanism for naming resources on the network and makes the names usable. The most common use case is translating domain names to the corresponding IP addresses. Each domain name has an authoritative name server, which is responsible for resolving the names under that domain. By resolving, we mean providing the information related to the names under that domain, such as IP addresses, the authoritative name server address of a sub-domain, etc. For example, the authoritative name server of example.com is usually responsible for resolving www.example.com, ftp.example.com, etc. The IP addresses of the authoritative name server of a particular domain name can be obtained from the authoritative name server of a higher-level domain. For example, the authoritative name server of example.com can be obtained from the authoritative name server of com. The com domain is also known as a top-level domain (TLD), as only the root domain (".") is above it. To get the addresses of the authoritative name servers of TLDs, one needs to contact the root name servers. The addresses of the root name servers are pre-loaded into the operating system, although there is work underway to refresh this information with a priming query [KL15].

When a user wants to know the IP address of a domain, the user will typically contact a DNS resolver run by his or her local ISP or by the user's own organization. The DNS resolver does the name resolution for the user and caches the records. With the cache on the resolver, the resolver does not need to contact the root and authoritative name servers of the domain names it looks up, and users may immediately get responses if the records of the domain are cached. DNS records have Time To Live (TTL) values specifying how long until the records expire.

An example of the DNS resolution process is shown in Figure 3.1. A user wants to know the IP address of a domain name (example.com), so the user sends a DNS query to a local recursive DNS resolver (message 1). The DNS resolver has no cached information, so it starts by contacting the root name server to get the authoritative name server of the com TLD (messages 2 and 3). Then the resolver contacts the authoritative name server of com to get the authoritative name server of example.com (messages 4 and 5). The last DNS query sent by the resolver is a query to the authoritative name server of example.com to get the IP address of example.com (messages 6 and 7). Last, the resolver sends the response back to the user (message 8).

Figure 3.1: Example of the DNS resolution process. The user sends a DNS request to the local DNS resolver (message 1), then the local resolver does the resolution for the user (messages 2 through 7). Last, the resolver sends the response to the user (message 8).
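To make the exchange concrete, the following minimal sketch reproduces the stub side of Figure 3.1 using the dnspython library; the public resolver address 8.8.8.8 is a stand-in for the user's local resolver and is not part of our measurement setup.

    import dns.resolver  # dnspython

    # Build a stub resolver pointed at one recursive resolver (message 1).
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = ["8.8.8.8"]  # hypothetical local resolver

    # The resolver performs messages 2-7 on our behalf and returns message 8.
    answer = resolver.resolve("example.com", "A")
    for record in answer:
        # qname, ttl, and the record itself correspond to the NAME, TTL,
        # and RDATA fields of the DNS record format discussed next.
        print(answer.qname, answer.rrset.ttl, record.address)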
DNS responses contain DNS records, which are the answers to the requests. DNS records have a standardized format, consisting of six fields: NAME (the domain name), TYPE (the record type), CLASS (the record class), TTL (the time interval the record can be cached), RDLENGTH (the length of the data field), and RDATA (the data field, for example the actual IP address). Whenever a DNS request is sent, the request specifies the NAME, TYPE, and CLASS (the other three fields are empty), and the corresponding reply contains values for all six fields (NAME, TYPE, and CLASS will be the same as in the request). NAME specifies which domain name the query is about. TYPE specifies the type of the DNS record. For example, if the DNS query requests an IP address, the TYPE is A (messages 1 and 6 in Figure 3.1); if the query requests the address of an authoritative name server, the TYPE is NS (messages 2 and 4 in Figure 3.1). In our anycast discovery, we use TXT as the record TYPE. A TXT record holds a descriptive text message, which in our case can be identification information about the anycast servers and nodes. CLASS specifies the class of the record. The most common CLASS is the IN class, meaning the Internet class. The IN class is the default class for queries that translate domain names to IP addresses, obtain authoritative name server addresses, and so on. Another CLASS is the CH class, meaning the CHAOS class, which has been re-purposed for server identification as we describe later. Since we use only the TXT type, in the rest of the chapter we use "IN query" for short to refer to a DNS request in the IN class and TXT type, and "CHAOS query" for short to refer to a DNS request in the CHAOS class and TXT type.

3.3 A Taxonomy of Anycast Configurations

IP anycast provides clients of a distributed service with a single-server abstraction [PMM93]. Clients send traffic to a designated IP address identifying the anycast service. However, the service itself is implemented by a service provider using one or more anycast nodes that can be physically distributed around the Internet. Standard routing protocols such as BGP ensure that the user's packets are sent to a nearby anycast node. Since successive packets from a client can be routed to different anycast nodes (e.g., as a result of network dynamics), anycast was originally used mainly for stateless, single-packet-exchange services like DNS [Har02] or datagram relay [Hui01]. However, after the deployment of anycast in content delivery networks, it turned out that anycast can work well with connection-oriented protocols such as TCP [Pri11, FMM+15].

Figure 3.2 shows how three anycast nodes cover six ASes; each node covers a different region, or catchment, as shown by different shades of gray. We now discuss anycast routing terminology (RFC 4786 [AL06]) and present a taxonomy of anycast node configurations; this taxonomy informs the design of our anycast enumeration methods.

Figure 3.2: Anycast and routing: three anycast nodes (N1, N2, and LN3) and their catchments (dark, light, and medium grey regions). Two nodes are observed from vantage points a to e.

Routing configurations: Anycast nodes have two levels of external visibility: global and local. Global nodes can be seen across multiple ASes, while local nodes are visible only within the hosting or adjacent ASes. In Figure 3.2, anycast nodes N1 and N2 are global, each with catchments encompassing multiple ASes (AS10 and AS11; and ASes 20, 22, and 23, respectively). Node LN3 is local and so affects only AS32. Global nodes advertise an anycast IP prefix to BGP and are visible Internet-wide [AL06]. Local nodes advertise an anycast IP prefix with the no-export BGP attribute [Abl03] and are visible only to adjacent autonomous systems. Larger anycast services often include both local and global nodes, but either may be omitted.

Anycast is available in both IPv4 and IPv6. In this chapter we consider only IPv4, although to our knowledge our approaches generalize to IPv6.

Node configurations: While routing allows clients to access a nearby anycast node, a node itself can be structurally complex; each node may be implemented by one or more anycast servers. Figure 3.3 classifies all important configurations that we have encountered.

The top row (T1) shows the simplest case, where a single server provides service at a given anycast node. This server listens to traffic on the anycast service address.
To provide access to management functions, it also typically listens on a second unicast address, unique to the server, but not shown in Figure 3.3.

Since anycast nodes are often placed in Internet exchange points (IXPs) with complex local topology, row T2 of Figure 3.3 and node N1 in Figure 3.2 show a single server with links to multiple adjacent routers, either connected by a shared LAN or with multiple network interfaces.

Figure 3.3: Anycast node configurations. Observed labels are in italics and penultimate-hop traceroute routers in bold. "VP" indicates vantage points, "R1, R2" indicate penultimate routers, "N1a, N1b" indicate servers in the same node, and "N1, N2" indicate different nodes.

For large services such as a top-level domain server, service at an anycast node may be provided by multiple physical servers. Cases T3 and T4 show multiple servers behind one (T3) or two or more (T4) routers. Node N2 in Figure 3.2 shows the T4 case.

Nodes or servers often have labels for diagnostic purposes. Current diagnostic practices encourage the use of unique labels [WC07], but in some cases, either due to misconfiguration or hijacking, different nodes can end up with the same labels. Case T5 identifies this incorrect case. We cannot distinguish T5 from T2 by external observation; we see neither case in our studies of the F-root and AS112 anycast services, but we do observe such cases in our study of TLDs (Section 3.6.3).

3.4 Methods for Anycast Discovery

We wish to discover all anycast nodes and servers. Since we focus on anycast services in DNS, our general method is to use special DNS queries to obtain the identity of each anycast node. Anycast nodes and servers cannot be enumerated simply by sending an IN-class DNS query to an anycast address, since standard responses contain no information specific to the anycast node. Instead, we must send queries from within the catchment of each anycast node that elicit unique information from that node, as shown in Figure 3.2. We next describe two such active probing methods.

3.4.1 CHAOS Queries

Anycast providers require methods to observe and debug their services. Their current methods use DNS records to identify individual anycast nodes and servers, as documented in RFC 4892 [WC07]. Although not mandatory, we find these conventions used widely (Section 3.6.3).

Since anycast is often used for DNS services, and DNS provides a convenient external query mechanism, RFC 4892 uses distinct DNS records to identify specific anycast servers. It re-purposes CHAOS-class records from the now-defunct Chaosnet network [Moo81] to provide anycast diagnosis. Standard DNS records [Moc87b] with class CHAOS, type TXT, and name hostname.bind or id.server are defined to return a unique string per anycast server. The contents of the string are provider-defined and not formally standardized, although we have identified common conventions (see [FHG12]).

In principle, the presence of these records should make identifying anycast servers trivial. Standard DNS tools (such as dig or nslookup) can retrieve this information. Because CHAOS records are tied to individual servers, they correctly identify single-server nodes (cases T1 and T2 in Figure 3.3) and can also detect each server in multi-server nodes (cases T3 and T4).

In practice, CHAOS records are not always sufficient to identify anycast servers. They are specified in an informational RFC, not in a mandated standard, so providers may choose not to provide them.
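Concretely, the basic diagnostic probe is a single CHAOS-class TXT query sent directly to the anycast address. The minimal sketch below, assuming the dnspython library, queries F-root's well-known anycast address 192.5.5.241; the reply labels whichever anycast server this vantage point's routing happens to reach.

    # Equivalent to: dig @192.5.5.241 hostname.bind CH TXT +short
    import dns.message
    import dns.query
    import dns.rdataclass
    import dns.rdatatype

    query = dns.message.make_query("hostname.bind",
                                   dns.rdatatype.TXT,
                                   dns.rdataclass.CH)
    # F-root's anycast address; the answer depends on this vantage point's catchment.
    response = dns.query.udp(query, "192.5.5.241", timeout=3)
    for rrset in response.answer:
        for txt in rrset:
            print(txt.strings)  # a provider-defined label naming the server reached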
CHAOS records were initially implemented in the BIND implementation of DNS (hence "bind" in the record name). As of Dec. 2011, half of the 16 different DNS implementations listed in Wikipedia support CHAOS records [Wik12], including all implementations we know of that are used to host large services (BIND, NSD, Nominum ANS, Microsoft DNS, PowerDNS, and UltraDNS). In addition, CHAOS records indicate anycast servers, but conventions to relate anycast servers to nodes are unspecified. Thus, the multi-server cases T3 and T4 in Figure 3.3, or the example N2 in Figure 3.2, require additional information to determine the two servers located at the anycast node. Finally, CHAOS records may be misconfigured (case T5 of Figure 3.3). For example, a DNS masquerader or hijacker may intentionally omit or provide duplicate CHAOS records (as shown in Section 3.6.1). These shortcomings motivate our design of a qualitatively different method based on IN queries (Section 3.4.2). However, it is possible to overcome some of these limitations by augmenting CHAOS queries with traceroute information.

Using Traceroute for Disambiguation: Consider a traceroute from a vantage point to its nearest anycast node. We simplify the path and focus on the penultimate router, or PR. The hypothesis behind this method is that each anycast node will have one PR, as exemplified by case T1 in Figure 3.3.

In practice, this hypothesis is only partially correct, since anycast nodes with a rich local topology sometimes have multiple PRs (case T2 of Figure 3.3) or multiple servers per node (cases T3 and T4). These cases complicate our analysis. Since these routers are nearly always co-located with the anycast node in the same IXP, we use simple heuristics to partially address this problem. We assume routers with "nearby" addresses are in the same IXP; currently, we define nearby as within the same /24 address block. In Section 3.5.3 we show how a combination of the methods can help.

From Figure 3.3, we see that traceroute complements CHAOS queries. Sometimes both methods work (case T1), or one of the two works (cases T2, T3, and T5). In case T4, both methods fail with an overcount of the anycast node, and when no vantage point is in the node's catchment, they undercount. When possible we use them together: if either method results in a single identifier, we know there is a single anycast node, even if the other suggests multiple nodes. We take observations from all vantage points as input, separately merge records with duplicate CHAOS and PR identifiers, and finally merge these lists to get a combined estimate; a sketch of this combining step appears at the end of this section.

Vantage Points: As a result of the specific naming convention used for anycast identification (hostname.bind), these records cannot be retrieved using recursive DNS queries. As such, use of this method requires customized software in each catchment. One option is to use public research infrastructure like PlanetLab. In our experiments we generally use 238 PlanetLab nodes, about one per unique site. However, as a research platform, PlanetLab servers do not provide the geographic and topological diversity we need to cover all catchment areas. Even today, with "only" around 500 sites, PlanetLab cannot cover all ASes. To overcome this limitation, we have also crowd-sourced anycast discovery. We asked the Netalyzr [KWNP10] developers to add our methods to their service. They implemented CHAOS queries, but omitted traceroute due to constraints of Java. In Section 3.5, we examine Netalyzr data obtained from about 62k vantage points.
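The following sketch shows one way the combining step could be implemented; it is our simplified reading of the heuristic, not the exact production code. Observations are (CHAOS label, penultimate-router IP) pairs, PRs within one /24 are treated as co-located in the same IXP, and labels that share a PR prefix are merged into a single node estimate.

    import ipaddress

    def estimate_nodes(observations):
        """observations: iterable of (chaos_label, pr_ip or None) pairs."""
        # Collect the /24 prefixes of the penultimate routers seen per label.
        by_chaos = {}
        for chaos, pr in observations:
            prefixes = by_chaos.setdefault(chaos, set())
            if pr is not None:
                prefixes.add(ipaddress.ip_network(pr + "/24", strict=False))
        # Merge labels whose PR prefixes overlap: routers in the same /24
        # are assumed to sit in the same IXP, hence the same anycast node.
        nodes = []
        for chaos, prefixes in by_chaos.items():
            labels = {chaos}
            overlapping = [n for n in nodes if prefixes & n["prefixes"]]
            for n in overlapping:
                nodes.remove(n)
                labels |= n["labels"]
                prefixes = prefixes | n["prefixes"]
            nodes.append({"labels": labels, "prefixes": prefixes})
        return nodes  # each entry is one estimated anycast node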
3.4.2 IN Queries

While the CHAOS query is current practice, its use requires diagnostic software at a vantage point in each anycast catchment. (For example, LN3 in Figure 3.2 is missed for this reason.) While Netalyzr's clients provide reasonable coverage, we consider an alternative that provides more convenient anycast discovery. Regular Internet-class (IN) DNS records support recursive DNS (rDNS) queries, allowing the use of tens of thousands of open recursive DNS servers as vantage points that can be accessed easily from a centralized measurement site. We therefore propose a new approach using IN TXT records for anycast enumeration.

For IN queries, we propose that each anycast service define a designated subdomain _ns-diagnostics delegated to the anycast server. Inside that subdomain, dedicated TXT-type resource records identify anycast servers (label _instance-id) and anycast nodes (label _node-id). Thus, a node of F-root could be identified by querying _node-id._ns-diagnostics.f.root-servers.net.

The key advantage of IN records is that, unlike CHAOS records, they can be retrieved through recursive queries. We place them as a subdomain of the service domain so they require no new protocols; we use an unusual designated subdomain so their label is unlikely to conflict with existing domains. Our mechanism therefore requires that each anycast service create a separate name for diagnostic information, and that each server populate that name with server-specific resource records following our convention. While placing diagnostic information in the public namespace risks mixing user and operational information, it is consistent with existing usage such as in-addr.arpa reverse resolution, and allows use of one protocol for both purposes.

We have offered this proposal for standardization [FHG11], but it is under consideration and not yet deployed. However, the AS112 anycast service uses a similar approach; it provides a proxy to evaluate our approach in Section 3.5.2.

Vantage Points: Our IN-class records can be queried using recursive DNS servers (rDNS), so they do not require custom diagnostic software (in PlanetLab or Netalyzr) at each vantage point. Many DNS servers offer recursive service, and a few hundred thousand of these support public queries. By sending queries indirectly through rDNS, each rDNS server effectively becomes a vantage point, potentially covering many more ASes and anycast catchments. We use open rDNS servers to quantify the performance of IN-query-based enumeration.
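Under the proposed convention, enumeration reduces to sending one ordinary recursive TXT query through each open resolver. The sketch below assumes the dnspython library and a pre-collected list of open resolver addresses; because the convention is not yet deployed, the diagnostic name here is illustrative and would not resolve today.

    import dns.message
    import dns.query
    import dns.rdatatype

    # The proposed (not yet deployed) diagnostic name for F-root:
    DIAG_NAME = "_node-id._ns-diagnostics.f.root-servers.net"

    def enumerate_via_rdns(resolver_ips):
        """resolver_ips: a list of open recursive resolvers (hypothetical input)."""
        node_ids = set()
        for ip in resolver_ips:
            # make_query sets the RD bit, so the open resolver recurses for
            # us from inside its own anycast catchment.
            query = dns.message.make_query(DIAG_NAME, dns.rdatatype.TXT)
            try:
                response = dns.query.udp(query, ip, timeout=3)
            except Exception:
                continue  # unresponsive or filtered resolver
            for rrset in response.answer:
                for txt in rrset:
                    node_ids.update(txt.strings)
        return node_ids  # distinct node identifiers observed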
3.5 Validation

We next evaluate the accuracy of CHAOS and IN queries, and illustrate the role that traceroute plays in improving the accuracy of CHAOS queries.

3.5.1 Methodology

We are interested in the efficacy of the anycast discovery methods discussed in the previous section. We evaluate this from many global vantage points against three large anycast services for which we have ground truth.

Vantage points: We use three different sets of vantage points: Netalyzr clients, rDNS servers, and PlanetLab. The number and reach of these vantage points are shown in Table 3.1. We note that, with the exception of PlanetLab, the number of vantage points in our study is an order of magnitude higher than in previous work [BFR06].

    Vantage points     number    countries   ASes
    PlanetLab          238       40          186
    Netalyzr clients   61,914    164         4,153
    rDNS               318,988   220         15,210

Table 3.1: Diversity of vantage points.

Targets: We study three anycast services in our experiments. In most cases we study the F-root DNS service run by ISC, and the Packet Clearing House (PCH) anycast service that provides service for 56 TLDs. We selected these providers as targets for two reasons. First, as large, professional anycast providers serving important major domains, they are representative of other major anycast providers. Second, both are non-profit organizations willing to support research, both with public descriptions of their infrastructure and a willingness to respond to our queries. To evaluate IN queries, we use AS112, an anycast service [The12] providing reverse name lookups for private address space [AS15].

Ground truth: We consider two types of ground truth: oracle truth and authority truth. By oracle truth, we mean the actual set of nodes that respond to an anycast address in the Internet at any instant. We identify it as "oracle" truth because defining it requires a perfect snapshot of network routing from all parts of the Internet, an impossible goal. We define authority truth as the list of anycast nodes that we get from the anycast service provider.

Oracle and authority truth can diverge for two reasons. First, third parties may operate anycast nodes for a given service with or without the provider's knowledge. Such third-party nodes would not be part of authority truth. We discuss an example of a third-party node for F-root in Section 3.6.1. Second, we derive authority truth from public web pages, which can sometimes lag the current operational system, as discussed in Section 3.5.3.

Metrics: Our active probing methods can result in several categories of results with respect to authority and oracle truth. This influences our choice of metrics. Consider Figure 3.2, where five vantage points (a through e) probe three anycast nodes. Probes from a through c find N1, and d and e find N2: the true positives (tp). Node LN3 is omitted because there are no vantage points in its catchment, so it is a false negative (an undercount, fn). To correct this error, we need a new vantage point in LN3's catchment; we study this relationship in Section 3.5.2.

There are three cases that might be classified as false positives. If we are unable to distinguish that two machines at N2 represent a single anycast node, then we would overcount. In an overcount, neither observation is completely wrong (since both N2a and N2b are anycast servers), but together they result in a mis-estimate of the number of anycast nodes. When we detect a node that we confirm is operated by the anycast provider but is not in authority truth, we have a missing authority ("missauth" for short). Finally, if a non-authorized anycast node appeared in the AS with vantage point b, we would record an extra node. An extra node is a false positive when compared to authority truth, but it is a true positive when compared to oracle truth. We define precision against authority and oracle truth:

    precision_authority = tp / (tp + overcount)                                          (3.1)

    precision_oracle = (tp + missauth + extra) / (tp + missauth + extra + overcount)     (3.2)

In general, we do not have false positives (because everything we find is an anycast server). Therefore authority precision reflects our level of accidental overcounts due to multi-server or multiple-PR nodes.

Recall captures the coverage of authority truth:

    recall = tp / (tp + fn)                                                              (3.3)

We do not define a recall for oracle truth because we do not have a complete set of the oracle population.
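These metrics are simple arithmetic over the counts defined above; expressed in code:

    def precision_authority(tp, overcount):
        return tp / (tp + overcount)                     # Equation 3.1

    def precision_oracle(tp, missauth, extra, overcount):
        found = tp + missauth + extra
        return found / (found + overcount)               # Equation 3.2

    def recall(tp, fn):
        return tp / (tp + fn)                            # Equation 3.3

    # For example, with the F-root CHAOS-only counts reported later in
    # Table 3.3: precision_authority(21, 12) is about 0.64, and
    # precision_oracle(21, 1, 0, 12) is about 0.65.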
3.5.2 Recall

Ultimately our recall is dominated by our ability to see different anycast nodes. At best, each vantage point is in a different catchment and sees a new node; at worst, they are all in the same catchment and we are uncertain whether the target is using anycast at all. In Figure 3.2, we see that vantage points a, b, and c duplicate each other, as do d and e. We next explore how the query method and the number of vantage points affect recall.

Recall for CHAOS Queries: We first consider recall for CHAOS queries. Figure 3.4 shows recall as a function of the number of vantage points for F-root from PlanetLab and Netalyzr. For each line, the rightmost point represents the complete observation. We also plot recall from smaller subsets of vantage points by taking 1000 random samples of smaller populations to estimate the effect of the number of vantage points on recall. For Netalyzr, we show quartiles and extrema with box plots; other cases have similar distributions, omitted for clarity.

Figure 3.4: Recall as the number of vantage points varies. F-root/PlanetLab uses CHAOS+traceroute (best case: 37%); F-root/Netalyzr uses CHAOS only (best: 93%); AS112/rDNS uses IN queries only (best: 90%). Lines show means of subsamples, and the rightmost point is our complete observation. Boxes on F-root/Netalyzr show median and quartiles, with whiskers showing extrema.

First, we see that with 62k vantage points, Netalyzr finds nearly all F-root anycast nodes at 93% recall (53 of the 57 official F-root nodes). By contrast, the 238 vantage points in PlanetLab provide a recall of only 37%. We also see a roughly logarithmic relationship between recall and the number of vantage points: recall grows very slowly with increasing numbers of vantage points. On average, about 10,000 vantage points are required to achieve 80% recall; we note that, with the exception of [BFR06], which used 20K rDNS servers, all prior anycast measurement studies have used far fewer vantage points.

Recall for IN Queries: Our proposal for standardizing IN queries for anycast identification is not yet widely deployed. Fortunately, AS112's anycast service is ideal to test our IN-queries approach because its providers have adopted the convention that each anycast node includes a unique hostname.as112.net IN TXT DNS record; these records can serve as a proxy for our IN-query-based approach.

We queried AS112 using over 300,000 rDNS servers, and found 65 servers; in contrast, issuing IN queries for AS112 from PlanetLab reveals only 17 of these servers. These statistics reveal the scale of vantage points required to enumerate anycast servers: almost 3 orders of magnitude more vantage points were required to quadruple the number of observed servers. This is more evident in Figure 3.4, which shows that 300,000 rDNS servers achieved a recall of 90%. Furthermore, our analysis of the recall achieved by subsets of rDNS servers reveals that almost 100,000 rDNS servers are required to achieve a recall of 80%. Intriguingly, rDNS exhibits lower recall than using Netalyzr clients (the line for AS112 is consistently lower than the line for F-Root); we have left to future work an understanding of whether this difference results from differences in the two anycast services, or arises from the type or placement of vantage points.
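The subsampling analysis behind Figure 3.4 can be sketched as follows, under the simplifying assumption that each vantage point observes exactly one node per round:

    import random

    def recall_curve(vp_to_node, authority_size, sizes, trials=1000):
        """vp_to_node: maps each vantage point to the node label it observed;
        authority_size: number of nodes in authority truth."""
        vps = list(vp_to_node)
        curve = {}
        for size in sizes:
            total = 0.0
            for _ in range(trials):
                sample = random.sample(vps, size)
                found = {vp_to_node[vp] for vp in sample}
                total += len(found) / authority_size
            curve[size] = total / trials  # mean recall at this VP count
        return curve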
To compute recall, we needed to calculate the authority list of anycast servers for AS112, and this proved tricky. The AS112 project maintains a voluntary list of known providers [AS112a]. However, because all AS112 nodes are run by volunteers, who use public information to set up new nodes [AM11] and who coordinate only loosely with each other, this list is both incomplete and out-of-date. In particular, AS112 coordinators and our data confirm that the list is missing some providers and has others that have ceased providing service. Each entry of the list identifies a provider by name and AS number. Some entries include one or more unicast IP addresses for an anycast node's DNS server. The list identifies providers, not anycast nodes, so even when up-to-date, it can under-represent providers that run multiple anycast nodes.

Table 3.2 compares anycast nodes found by our IN-queries approach to this list. We found that rDNS discovered 35 nodes that were not in the AS112 list, confirming that the voluntary list is incomplete and that automatic diagnosis is important. Moreover, rDNS discovered 42% of the provider's list; this value does not represent recall, however, because that list is also out-of-date: some providers shut their services down but neglected to remove themselves from the list.

    Found by rDNS, but not in ground truth                  35               missing: new
      provider not present in list                          26               missing: new
      providers with multiple nodes (4 providers)            9               missing: new
    Operator list (authority truth)                         70   100%        both known
      found by rDNS                                         30    42%        both known
      not found by rDNS                                     40    58%        known; possibly missing
      possibly alive                                        52    74% [100%]
        definitely alive                                    37        [71%]
          found by rDNS (and not BGP)                       30        [58%]  (rDNS recall)
          found by PlanetLab (and not BGP)                  14        [27%]  (PlanetLab recall)
          found by BGP (and not rDNS)                        7        [13%]
            have rDNS in address block                       1
            no rDNS in address block                         6
        not found by any means                              15        [29%]  interpretation uncertain
      possibly down                                         18    26%        out-of-date, corrected
        unicast IP known and down                           12
        unicast IP unknown, but rDNS in address block        6
    Conservative ground truth (52 + 35)                     87   100%
      found by rDNS (30 + 35)                               65    75%        (oracle recall)
    Realistic ground truth (37 + 35)                        72   100%
      found by rDNS (30 + 35)                               65    90%        (higher-bound recall)

Table 3.2: Evaluation of IN-queries coverage compared to the AS112 providers list as ground truth. Bracketed percentages are relative to the 52 possibly alive nodes.

To build a more accurate "ground truth", we evaluate which entries on the list are actually alive or can be confirmed as no longer operational. Besides rDNS probes, we confirm nodes are alive or down in two additional ways. First, when the list provides a unicast IP address for the node, we can confirm its presence with direct unicast DNS queries. Second, we use BGP, but only to confirm live nodes; BGP cannot confirm down nodes because BGP also has limited recall. We probe the AS112 anycast prefix from 40 open BGP looking glasses and Merit's BgpTables service [Mer11], which provides 38 BGP peers, and search for the provider's AS number in any AS paths.

Using these methods, we confirm that 18 nodes in the list (26%) are no longer operational. Finally, turning to the subset of 52 nodes in the list that we cannot prove are down, we see that IN queries alone find 30. From this analysis, there are two ways to define ground truth: (a) the nodes on the list that are possibly alive (52), plus those found by rDNS but not on the list (35); or (b) the nodes on the list known to be definitely alive (37), plus those found by rDNS but not on the list (35). Recall defined by (a) is 75%, while that defined by (b) is 90%.
We argue that (a) is a conservative choice, and that the true recall is likely to be closer to 90% (since we were able to determine that 18 of the nodes are no longer operational). The AS112 community has recently recognized the need for automated methods of node discovery, and has implemented an automated discovery method that also uses rDNS servers, obtaining similar corrections to their public ground truth [AS112b].

Role of Vantage Point Diversity: While we focus on the number of vantage points and their effect on recall, the role of additional VPs is their presence in multiple anycast catchments, that is, the diversity of their network locations. There is clear redundancy in coverage, since there exist fewer than 100 anycast nodes in each of our systems, while we probe from thousands of VPs. While seemingly wasteful, such redundancy is essential to detect masquerading anycast nodes and to understand the scope of each catchment.

In addition, the redundancy in coverage is a first step towards future selection of vantage points that aims to achieve cost-efficient measurements, meaning less measurement traffic with good recall of enumeration. A recent study on anycast enumeration [CJR+15] shows that selecting a subset (about 200) of vantage points from a larger set (about 8,000) can achieve a recall that is close to the recall of using the larger set. Since the recall of a subset of vantage points can't exceed the recall of the full set, getting a large number of vantage points to maximize recall is essential to improving the recall of any subset of these vantage points.

3.5.3 Precision for CHAOS Queries

While determining the ground truth (and therefore recall) for AS112 was challenging, CHAOS queries face a different challenge: since CHAOS queries are not standardized, providers adopt different conventions for labeling servers and nodes, and this can affect precision. To evaluate precision, we use our PlanetLab experiments on F-Root and PCH; this is the only set of vantage points from which we were able to issue both CHAOS queries and traceroutes, and precision can be affected by the choice of whether or not to use traceroutes.

Table 3.3 describes the precision of using CHAOS queries alone (without traceroute).

    CHAOS queries:            F-Root       PCH
    authority truth           57           53
    oracle truth              58           53
    records considered        216          215
    estimated anycast nodes   34           26
    true positives            21           26
    overcounts                12 (0)       0
    missing authority         1            0
    extra                     0            0
    authority precision       64% (100%)   100%
    oracle precision          65% (100%)   100%

Table 3.3: Accuracy of CHAOS queries without traceroute.

PCH precision is 100%. F-root precision falls to 64%, mostly because of 12 overcounts. These overcounts are due to T3 or T4 configurations where multiple servers provide service for a single node. Since the CHAOS records of ISC (Internet Systems Consortium, Inc. [Int15], the official operator of F-root) are per-server (not per-node), multi-server configurations result in overcounts.

CHAOS records also reveal one case of incomplete authority truth for F-root. Although missing from the public web page, ISC confirmed that the one anycast node we found should have been listed. This missing authority makes our oracle precision slightly better than our authority precision, from 64% to 65%.

Our basic CHAOS algorithm does not interpret the contents of the reply, because there is no formal standard. However, each anycast service provider has its own convention, something we explore in Section 3.6.4.
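To illustrate what a convention-aware decoder looks like, the sketch below parses one common hostname-style labeling pattern; the regular expression matches labels like lax1b.f.root-servers.org, splitting the node ("lax1") from the server ("b"). Conventions are provider-specific, so this pattern is illustrative rather than general.

    import re

    # Matches labels like "lax1b.f.root-servers.org": a three-letter airport
    # code plus an IXP number ("lax1") followed by a server letter ("b").
    LABEL = re.compile(r"^(?P<node>[a-z]{3}\d+)(?P<server>[a-z])\.", re.IGNORECASE)

    def node_of(chaos_label):
        match = LABEL.match(chaos_label)
        return match.group("node") if match else chaos_label  # fall back to raw label

With such a decoder, per-server CHAOS replies collapse to per-node identifiers, avoiding the multi-server overcounts described above.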
As an experiment, we decoded ISC's convention to extract the identities of both the anycast node and the specific server. We show the results of this F-Root-aware CHAOS algorithm in parentheses in Tables 3.3 and 3.4. This provider-specific interpretation makes the CHAOS method completely correct, suggesting it would be beneficial to standardize reply contents, or use other means of making this distinction.

In the absence of being able to confirm provider-specific conventions, it is also possible to use traceroute to improve precision. Combining traceroute with CHAOS queries introduces one new source of failure: if routing changes between the CHAOS observation and the traceroute, analysis could incorrectly combine observations from different nodes. We detected these cases and identified them as false combinations, removing them before analysis; they occurred primarily because our prototype took CHAOS and traceroute data hours apart. We plan to take CHAOS observations before and after traceroutes to automate detection of routing changes.

Table 3.4 measures how much our results improve by augmenting CHAOS queries with traceroute. Combining the two sources allows true positives to follow the larger of the two stand-alone methods for both targets. It reduces overcounts by 75% (3 instead of 12 or 13) for F-root, even without decoding F-root CHAOS replies, and eliminates overcounts for PCH. These improvements translate into better precision for the combined method. For F-Root, precision rises to 88% (compared to 64% or 58% authority precision, with similar results for oracle precision), and PCH precision remains at 100%, the maximum of the single-source algorithms.

    Combined Method:          F-root       PCH
    authority truth           57           53
    oracle truth              58           53
    records considered        225          223
    estimated anycast nodes   27           26
    true positives            21           26
    overcounts                3 (0)        0
    missing authority         2            0
    extra                     1            0
    authority precision       88% (100%)   100%
    oracle precision          89% (100%)   100%

Table 3.4: Accuracy of CHAOS queries augmented with traceroute.

Thus, because CHAOS conventions are not standardized, augmenting CHAOS queries with traceroute can improve precision significantly (from 65% to 89% for F-Root).

3.6 Evaluation

Methods for identifying anycast nodes can help uncover anomalies in anycast configuration, characterize the level of deployment of anycast among root name servers and TLDs, and help us understand how anycast is managed as a service by providers for use by DNS root and TLD operators. This section demonstrates these capabilities.

3.6.1 Anomalous Anycast Configurations

Root Masquerading: While validating CHAOS queries on F-Root, we encountered an anycast server that was not on ISC's list of F-Root anycast nodes, and which returned an empty CHAOS response. Discussions with ISC confirmed this site was masquerading as an F-Root anycast node: a non-ISC server operating on the F-root IP address.

ISC described two general cases where third parties operate nodes at the F-Root anycast address. First, some organizations operate local copies of the root zone, and internally masquerade responses to *.root-servers.org. While ISC discourages this behavior, redirection of internal traffic is generally left up to the organization. Second, other organizations have attempted to hijack root DNS servers from others, often to promote a modified root zone.

We observed this masquerading host from two vantage points inside CERNET, the China Education and Research Network. In both cases the PR of the target is 202.112.36.246, at AS4538 in CERNET.
The contents of the two zones appeared the same based on the SOA record, although we did not exhaustively compare the zones. ISC identified this non-ISC anycast node as a masquerading node, not a hijacked one, and we concur. While this case represents a network provider choosing to handle requests from their own users using masquerading, nearly the same mechanisms can be used to detect hijacking. This potential illustrates the benefits of actively monitoring anycast services, at least until use of DNSsec becomes pervasive.

In-Path Proxies and Others: Beyond masquerading, our anycast server discovery methods can identify other abnormal configurations. We detected these anomalies when analyzing our Netalyzr dataset. While Netalyzr does not augment CHAOS queries with traceroute, it does include CHAOS queries to each root server, and IP resolution requests for www.facebook.com and for a non-existent domain name RANDOM.com (where RANDOM is a string longer than 40 characters that triggers a non-existent-domain error message). It also reports when the CHAOS queries time out without response; we ignore these cases. We next use this information, with additional manual probing, to infer possible root causes of these abnormal responses for F-root.

In this dataset, we see two abnormal responses to CHAOS queries: incorrect CHAOS records and missing CHAOS records, making up about 1.8% of the observations (Table 3.5). We believe these observations detect in-path proxies. Usually end-systems are configured to use a local DNS resolver. An in-path proxy is a network middlebox that captures and redirects all DNS traffic directly.

    Total observations                                   61,914  100%
      expected replies                                   59,509  96.1%
      no reply                                            1,289   2.1%          firewall discards or routing failure
      abnormal replies                                    1,116   1.8%
        observations with fake F-root CHAOS                 355   0.6%  [100%]  in-path proxies
          got facebook or non-existent-domain               354        [99.7%]  in-path proxies
          neither facebook nor non-existent-domain            1         [0.3%]  in-path proxy or hijack/masquerade
        observations with empty F-root CHAOS                761   1.2%  (100%)
          got facebook or non-existent-domain               550   0.8%   (72%)  in-path proxies
          neither facebook nor non-existent-domain          211
            empty CHAOS for all roots                        93   0.15%         firewall or hijack/masquerade
            valid CHAOS for some other roots                117   0.2%          hijack/masquerade
            fake CHAOS for some other roots                   1                 in-path proxy

Table 3.5: Anomalies found for F-root CHAOS records in Netalyzr data.

We believe that incorrect F-root CHAOS records (355 cases, 0.6%) indicate in-path proxies that modify CHAOS queries, since we know all F-root nodes provide correct CHAOS records. We believe these are in-path proxies because almost always (354 of the 355 cases) the client also gets a direct reply for Facebook from the supposed F-root node. A true F-root server would not have directly responded with an entry for Facebook, but would have redirected to the .com DNS server. (The one case that omits Facebook is located in China, where Facebook is blocked.)

Empty CHAOS records occur more often (761 cases, 1.2%). In most of these cases (550; 72% of 761, 0.8% of all), we observe Facebook or non-existent-domain replies, suggesting an in-path proxy for the same reasons as above. However, in some of these cases, we see an empty CHAOS record for F-Root but get neither a Facebook nor a non-existent-domain reply. In some cases we get empty CHAOS records for all roots; in others we see some valid and some invalid CHAOS records. Without additional data (like traceroutes) we cannot diagnose these problems with certainty. We believe they are either firewalls or masqueraders.
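The decision logic behind this classification (summarized in Table 3.5) can be sketched as follows; the field names are hypothetical, and a real analysis needs the cross-root checks described above to resolve the ambiguous empty-record cases.

    def classify(chaos_reply, valid_labels, got_facebook, got_nxdomain):
        """chaos_reply: None (timeout), "" (empty record), or a label string;
        valid_labels: the provider's known-correct CHAOS strings."""
        if chaos_reply is None:
            return "no reply: firewall discard or routing failure"
        if chaos_reply in valid_labels:
            return "expected reply"
        direct_answers = got_facebook or got_nxdomain
        if chaos_reply:  # non-empty but fake CHAOS record
            return ("in-path proxy" if direct_answers
                    else "in-path proxy or hijack/masquerade")
        # empty CHAOS record
        return ("in-path proxy" if direct_answers
                else "firewall or hijack/masquerade (needs data for other roots)")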
To summarize, of about 62k unique IP addresses from the clients who used Netalyzr (in our data), our data suggests that 0.2% appear to be behind potentially masquerading F-root nodes, while 1.4% (0.6% + 0.8%) see in-path proxies, and about 0.15% see other unusual behavior. These observations suggest that DNS manipulation is not common, but does occur. They also suggest the need for external monitoring using our IN queries, and for additional information to disambiguate these cases, as with our use of traceroute.

3.6.2 Characterizing Anycast in Other Roots

We have confirmed that with nearly 62k vantage points, CHAOS queries discover almost all F-root nodes. We next examine other anycast root servers with CHAOS or IN queries. Since all roots have CHAOS records, we study all root servers discovered by Netalyzr clients (April 2012), and compare our findings to public records [RD15]. In addition, in Dec. 2012, L-root told us about their support of IN queries for server identification, so we also use IN queries to characterize L-root.

3.6.2.1 Characterizing Other Roots with CHAOS queries

We began by examining the CHAOS record formats for each root, finding that 9 use CHAOS records that embed location information in the string, while 2 have location information in some records but not all. Our measurements above assume providers use unique CHAOS strings. In Table 3.6 we compare the number of measured anycast nodes, from CHAOS queries with 62k Netalyzr vantage points, against the published number from root-servers.org (as of April 2012).

    DNS root servers        measured       published   found
    A (Verisign)            2         <    6           33%
    B (ISI)                 1         =    1           100%
    C (Cogent)              6         =    6           100%
    D (Univ. of Maryland)   1         =    1           100%
    E (NASA)                9         >    1           900%
    F (ISC)                 53        >    49          108%
    G (DISA)                6         =    6           100%
    H (U.S. ARL)            3         >    2           150%
    I (Autonomica)          39        >    38          103%
    J (Verisign)            59        <    70          84%
    K (RIPE)                17        <    18          94%
    L (ICANN)               78        <    107         73%
    M (WIDE)                6         =    6           100%

Table 3.6: Comparing measured against published numbers of anycast nodes for all anycast root servers (April 2012).

We expect 10 of the 13 to use anycast. In 3 of the 10 cases (F, H, and I) we detect anycast nodes not reported, suggesting public information is out-of-date, omitting up to 14 nodes. In addition, in one case, E, we find that it uses anycast although that use is not published. Examination of the CHAOS strings suggests that NASA has outsourced E-root anycast to PCH. In 4 cases (A, J, K, and L), we miss some nodes, either because recall with Netalyzr is not perfect, or because the published list is out-of-date. Finally, for three cases (C, G, and M), each with relatively few nodes, we find all.

3.6.2.2 Characterizing L-root with IN queries

Prompted by our proposal of using IN queries to facilitate probing from open resolvers, L-root started to support mapping of L-root using IN queries [AM14] in 2012. The configuration of the L-root IN record is not exactly the same as our proposal in Section 3.4.2. Instead of adding an IN TXT record directly under a subdomain of l.root-servers.net, L-root runs a dedicated, separate DNS server that serves the IDENTITY.L.ROOT-SERVERS.ORG zone. The IDENTITY.L.ROOT-SERVERS.ORG zone has IN TXT records for identification purposes, including hostnames and location information.
The DNS servers hosting the IDENTITY.L.ROOT-SERVERS.ORG zone are assigned IPv4 and IPv6 addresses that are covered by the same routing advertisement as the L-root server; thus, for identification purposes, they are the same as the L-root server. With this setting of L-root, we characterize L-root with IN queries sent to open resolvers. In addition, L-root also publishes the list of all L-root nodes in IN TXT records under the domain name NODES.L.ROOT-SERVERS.ORG. We use the list of anycast nodes in these IN TXT records as ground truth to evaluate our recall of L-root enumeration. The details of the IDENTITY.L.ROOT-SERVERS.ORG zone's setup and the domain name NODES.L.ROOT-SERVERS.ORG can be found in RFC 7108 [AM14].

Table 3.7 shows the result as of January 2013. We find 237 of 273 total anycast nodes of L-root, achieving a recall of 87%. This result again shows that using IN queries with a large number of open resolvers can achieve high recall of anycast node enumeration.

    Total number of nodes   Found   Recall
    273                     237     87%

Table 3.7: Characterizing L-root using IN queries over open resolvers (January 2013).

3.6.3 Anycast Use in Top-Level Domains

Anecdotal evidence suggests that anycast is widely used in DNS, but to our knowledge there has been no systematic study of how extensively it is used. In this section, we determine how many TLDs use anycast by using CHAOS queries (with traceroute) on PlanetLab. Although PlanetLab's recall is low, that should not affect the results discussed in this section, since we are not trying to enumerate all of the anycast servers in each TLD. Rather, we try to determine whether more than one server responds to a CHAOS query sent to a TLD nameserver.

Target: The targets for our study are the authoritative name servers for the country-code top-level domains (ccTLDs) and the generic TLDs (gTLDs), as listed by IANA [Int12]. Together there were 1133 IP addresses providing TLD nameservice in April 2012.

Methodology: We use CHAOS queries and traceroute against each IP address for each name server, querying from 240 PlanetLab vantage points. (We omit IN queries because, until further standardization, only AS112 and L-root support them.) We collected data in May 2011 and April 2012, and present the data collected on 2 April 2012. (We see similar results on other days in 2012, and fewer nodes in 2011.)

In Table 3.8 we interpret these results to identify definite and possible anycast services, since in this case there is no ground truth. Of these cases, CHAOS > 1 and PR > 1 together are the strongest evidence for anycast, though our combined method still finds a few T4 unicast cases. The other cases where CHAOS > 1 or PR > 1 are likely partially observed anycast addresses. We classify these results in three ways. Definite anycast means our method finds multiple nodes. Possible anycast means there are multiple records but we cannot guarantee anycast, such as when CHAOS == 0 and PR > 1, or CHAOS > 1 and PR == 0. Finally, definite unicast means the method confirms that there is only one node.

                       traceroute (number of PRs)
    CHAOS (# recs.)    > 1                               1          0
    > 1                anycast; T4 unicast               anycast; T3 unicast   anycast; T3 unicast
    1                  T2 unicast; mis-config anycast    unicast               unicast; mis-config anycast
    0                  non-BIND anycast; T2/T4 unicast   unicast               insufficient information

Table 3.8: Interpretation of CHAOS queries and traceroute on TLD nameservers.
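The interpretation matrix of Table 3.8 is mechanical enough to encode directly; the sketch below maps observation counts to the table's cells:

    def bucket(n):
        return "multi" if n > 1 else ("one" if n == 1 else "none")

    # (CHAOS-record bucket, PR bucket) -> interpretation, per Table 3.8.
    INTERPRETATION = {
        ("multi", "multi"): "anycast; T4 unicast",
        ("multi", "one"):   "anycast; T3 unicast",
        ("multi", "none"):  "anycast; T3 unicast",
        ("one",   "multi"): "T2 unicast; mis-config anycast",
        ("one",   "one"):   "unicast",
        ("one",   "none"):  "unicast; mis-config anycast",
        ("none",  "multi"): "non-BIND anycast; T2/T4 unicast",
        ("none",  "one"):   "unicast",
        ("none",  "none"):  "insufficient information",
    }

    def interpret(n_chaos, n_pr):
        return INTERPRETATION[(bucket(n_chaos), bucket(n_pr))]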
Results: Table 3.9 shows our results on anycast deployment in TLD name servers. We report definite anycast as the first number in parentheses, and possible anycast as the second.

                       traceroute (# PRs)
    CHAOS (# recs.)    > 1            1            0              (definite, possible) anycast
    > 1                255 (238, 0)   14 (3, 0)    159 (0, 159)   (241, 159)
    1                  99 (2, 1)      117 (0, 2)   312 (0, 0)     (2, 3)
    0                  44 (0, 44)     32 (0, 0)    101 (0, 0)     (0, 44)

    total TLD name servers and anycast: 1133 (243, 206)

Table 3.9: Anycast discovered for TLD name servers. The first number in parentheses is definite anycast; the second is possible anycast.

We observe that about 21% (243 of 1133) of TLD nameservers definitely use anycast, while another 18% (206 of 1133) are possibly anycasted. If we adopt definite anycast as a lower bound and definite plus possible as an upper bound, then at least 21% and perhaps up to 39% of TLD nameservers use anycast.

A complementary view classifies use of anycast by the name of the TLD, rather than by IP address. As there are always several authoritative name servers for a top-level domain name, we count a TLD name as definitely anycasted if at least one of its authoritative name servers is definitely using anycast. Table 3.10 shows anycast deployment in TLD names. When there are no definitely anycasted name servers, but at least one is possible anycast, then we count the TLD name as possible anycast. We see that at least 56% of the TLD names are definitely anycast, and up to 72% of TLD names possibly so. Thus, more than half and perhaps nearly three-quarters of TLDs include anycast as part of their DNS service.

    Number of TLD names   definite anycast   possible anycast   higher bound
    314 (100%)            177 (56%)          48                 225 (72%)

Table 3.10: Anycast services discovered for TLD names.

The main implication of these findings is that anycast is an important component of the DNS, and needs to be continuously monitored for abnormal configurations, masquerading or, worse, hijacking (Section 3.6.1).

3.6.4 How Many Anycast Providers Exist?

In the operational Internet, several providers provision anycast services; TLD managers either operate their own servers or use these providers. Thus, a single anycast provider may support multiple TLDs. In this section, we measure the number of anycast providers, and the distribution of TLDs across these providers. We next use our data of CHAOS queries to all TLDs from Section 3.6.3 to explore these providers. Our results here depend on CHAOS naming conventions, and do not require full enumeration, so the moderate recall from PlanetLab's few VPs is sufficient.

To identify anycast providers, we review the CHAOS queries to confirmed anycast nodes; we study the 243 definite anycast nodes (Section 3.6.3). To go from services to providers, we examine the patterns in their replies. We identify a potential provider based on either a unique pattern, or a provider-specific identifier in a standard pattern. For example, several organizations include the provider's domain name in their reply, while others use distinctive patterns. We see two general patterns: in the most common (22 likely providers found), the reply uses hostname-like strings, often encoding geographic information along with server and node identity. Examples of this format include lax1b.f.root-servers.org for server b at IXP 1 in Los Angeles, and host.bur.example.net for server host at an IXP near Burbank airport. The second format we identify, with 10 likely providers, is even more provider-specific, with just a serial number example1 for server 1 by provider example, or a server plus a geographic code s1.lax for server 1 in Los Angeles. From these kinds of patterns we find 32 unique providers in the entire set.
This count represents a likely lower bound: "likely" since it seems unlikely for a single provider to use very dissimilar patterns, and a lower bound since the second format is general enough that two providers may have adopted the same scheme coincidentally.

Figure 3.5 shows how many services each provider operates. We see that some providers are unique to one service (about two-thirds, 20 of 32). A few large providers operate many services, with the top two providers (PCH and DynDNS) accounting for more than 58% of services.

Figure 3.5: Estimates of the number of services (anycast IP addresses) operated by each estimated provider (as identified by CHAOS response patterns).

3.7 Security Implications

Root DNS operations are critical infrastructure, so we next explore the security implications of our approach. First, some anycast providers consider any details of their infrastructure proprietary, to avoid giving information to attackers or competitors. Second, attackers can use our discovery methods to masquerade or hijack anycast services. We discuss solutions to these security threats in the rest of this section.

3.7.1 Limiting Diagnosis to the Provider

While some anycast providers welcome open diagnostic tools, discussions of our proposed diagnosis mechanisms on IETF mailing lists suggest that several providers require the ability to limit their use. One concern is disclosure of details of anycast operation to competitors. A second concern is that diagnostic information may assist attacks on anycast infrastructure. For example, disclosure of the unicast address for an anycast server may enable a targeted denial-of-service attack.

The challenge in limiting access to diagnostic information is that diagnosis requires probing from many and possibly unknown public sites, such as PlanetLab nodes or rDNS. Traditional access control, such as a whitelist implemented by a firewall, is therefore likely porous and not completely effective, difficult to maintain, and may also reduce diagnosis accuracy.

We propose two methods to limit diagnosis to the provider: coordinated, changing queries, which are easier to deploy but less secure, and encrypted, changing replies, which require server-side software modifications of the DNS implementation but are more secure.

Coordinated, changing queries: To restrict access, providers can move information under a private name: instead of placing _node-id in _ns-diagnostics.example.org (discussed in Section 3.4.2), it is stored under nonce._ns-diagnostics.example.org, where nonce is a time-changing value known only to the providers and their authorized queriers. One way to compute the nonce is as a cryptographic hash of the time since an epoch concatenated with a secret value. Time provides a globally changing value; it should be rounded to a reasonable lifetime for the nonce (say, a few minutes), and both the current and prior nonce could be active to avoid requiring tight time synchronization. An attacker will not know the secret, so they can do nothing without snooping query traffic. An attacker can easily snoop traffic (say, by running an open recursive nameserver), but an attacker can only masquerade as the server during the nonce's relatively brief lifetime.

The coordinated, changing queries are easy to deploy. Operators only need a tool to compute the nonce and then update the zone configuration files of the anycast domain to put the resource record under the new nonce label.
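A minimal sketch of one possible nonce computation follows; the hash choice, window length, and label format here are illustrative assumptions, not part of any standard.

    import hashlib
    import time

    SECRET = b"provider-secret"  # shared only with authorized queriers
    LIFETIME = 300               # nonce lifetime: five minutes

    def nonce(offset=0):
        """offset=-1 yields the prior window's nonce, also accepted."""
        window = int(time.time()) // LIFETIME + offset
        digest = hashlib.sha256(str(window).encode() + SECRET).hexdigest()
        return digest[:16]       # short, DNS-label-safe prefix of the hash

    # An authorized querier forms the changing diagnostic name:
    name = nonce() + "._ns-diagnostics.example.org"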
However, the success of this mechanism depends on attackers' unawareness of the query names, which is not always guaranteed, as an attacker could learn the query name through a man-in-the-middle attack. Specifically, if an attacker owns a router on the path that the query packet traverses, or even one of the open resolvers used for enumeration, the attacker can see the query name. Though we apply a short TTL to the query names, attackers still have a chance to use the query string for enumeration. To provide a more secure way of enumeration, we propose encrypted, changing replies, which can theoretically disable unauthorized enumeration of anycast nodes.

Encrypted, changing replies: To diagnose or enumerate anycast nodes, queriers now send a DNS IN query for a static name like "ns-diagnostics.example.org" from multiple vantage points, and the replies are encrypted contents presented as ASCII strings. The process is shown in Figure 3.6. The querier and all the anycast nodes share a symmetric key K for encryption and decryption. The reply (R) is the encryption of a 3-tuple, consisting of the identity string of the anycast node (ID), the time the reply is sent (time), and a long sequence number (sequence) that is different for every reply within a certain period of time. By encrypting ID, time, and sequence together, all the replies look different, no matter whether the original queries go to the same or different anycast nodes. Only those who have the key K can decrypt the reply and get the IDs for diagnosis or enumeration (we sketch this scheme at the end of this section).

This method prevents unauthorized enumeration of anycast nodes because 1) unauthorized queriers cannot decrypt the reply, so they cannot get the plaintext ID; and 2) unauthorized queriers do not know how many anycast nodes there are, as all the replies they get are different.

[Figure 3.6: Encrypted, changing replies. All replies R1, R2, and R3 are different. Vantage points 1 (VP1) and 2 go to the same anycast node, but their replies still differ, because the time and sequence fields differ. Replies from different anycast nodes differ because the IDs differ (R1 and R2 carry ID1; R3 carries ID2).]

In addition, this method prevents hijacking or masquerading nodes from pretending to be a valid node by replaying old replies, and there are two ways to detect such a replay: first, checking the time field; second, auditing all historic replies, where an identical reply string indicates a replay.

This method requires modification of the server side of the DNS implementation, and thus is more difficult to deploy. But its advantage over the coordinated, changing queries is that the encrypted, changing replies theoretically disable unauthorized enumeration of anycast nodes.

3.7.2 Discouraging Masquerader Spoofing

A second threat is masqueraders, who may attempt to pretend to be legitimate anycast nodes. A masquerader will likely receive queries from our diagnosis system. The masquerader then has two possible actions to prevent itself from being identified: it could discard the probe and not reply, or it could generate its own reply, possibly replaying the response from a valid node. To protect against non-replying masqueraders, all legitimate anycast nodes must reply, and diagnostic queries must be retried several times to rule out packet loss. Protection from reply replay is difficult, since a masquerader could forward the request to a legitimate node out-of-band. If necessary, replays could be prevented by assigning each legitimate anycast node a public key and using a cryptographic challenge-response protocol.
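To make the encrypted, changing replies of Section 3.7.1 concrete, the sketch below uses the symmetric Fernet scheme from the Python "cryptography" package as a stand-in for whatever cipher a deployment would choose; the field names are ours, and only the general shape (encrypt ID, time, and sequence under a shared key K) comes from the design above:

    import json, time
    from cryptography.fernet import Fernet

    K = Fernet.generate_key()     # the symmetric key shared by querier and all nodes

    def make_reply(node_id, sequence, key=K):
        """Anycast-node side: encrypt (ID, time, sequence). The changing time
        and sequence (and the cipher's random IV) make every reply differ."""
        payload = json.dumps({"id": node_id, "time": time.time(), "seq": sequence})
        return Fernet(key).encrypt(payload.encode()).decode()   # ASCII reply string

    def read_reply(reply, key=K):
        """Authorized-querier side: decrypt and recover the node ID (the time
        field also supports the replay detection of Section 3.7.2)."""
        return json.loads(Fernet(key).decrypt(reply.encode()))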
3.7.3 Relationship to DNSsec

DNSsec provides origin authentication and data integrity in DNS [AAL+05]. In development for more than a decade, it has recently been deployed on root domains. However, DNSsec deployment does not fully address the problems we explore. By cryptographically ensuring the integrity of answers, DNSsec provides end-to-end validation of DNS contents. Our work in this chapter complements this role by providing diagnostic tools for DNS providers who use anycast. In addition, we provide auditing tools for the end user to assess service quality. We also provide tools to detect masquerading, helping identify some cases of possibly unexpected traffic diversion (although not guaranteeing query confidentiality).

3.8 Related Work

While there has been significant work exploring DNS performance and anycast use in root nameservers, to date there has been little work exploring anycast discovery, at least outside the operational community.

Anycast discovery: The DNS operational community has developed several techniques to support anycast diagnostics. The CHAOS query was first defined in RFC-4892 [WC07], and although originally developed in the BIND implementation of DNS, the approach is now supported in other DNS server software (see Section 3.4.1 for a partial list). We carefully validate the precision and recall of this method, and suggest ways to improve its precision using traceroute.

Subsequent standards activity has suggested the need for additional diagnostic methods. RFC-5001 [Aus07] defines a new NSID (name server identifier) option for DNS. By using a new option, it differs from RFC-4892 in specifying that rDNS will not forward NSID requests. Although the RFC explores several possible payloads NSID could return, it explicitly defers standardizing contents to future work. A recent Internet Draft proposes using unique-per-node AS numbers for anycast node identification [MDS10]. When this method is widely deployed, it can be used for anycast enumeration; we expect that our analysis of recall will apply here as well.

Complementary to our enumeration of anycast servers, Gibbard relies on published root and TLD server information to analyze their geographic distribution [GH07]. Our work focuses on automatic instead of manual methods to identify nodes, but we do not geolocate nodes. Cicalese et al. [CJR+15] propose a lightweight anycast enumeration and geolocation technique that uses multiple vantage points to issue latency measurements and uses speed-of-light constraints to enumerate and geolocate anycast nodes. Unlike our method, which focuses on DNS anycast services, their method can work with anycast services running on any protocol that supports latency measurements. However, as their method depends on vantage points that support latency measurements, the number of available vantage points is limited. In addition, they take a subset (about 200) of vantage points from a larger set (about 8,000) by simply choosing vantage points that are far away from each other. They show that, using this subset, the recall of enumeration is still close to the recall of the larger set. The different approaches to smart selection of probing sources between their work and ours stem from different research goals: their goal appears to be cost-efficient enumeration of anycast nodes, so they reduce redundant vantage points but sacrifice some recall, while our goal is to maximize recall, so we use every vantage point that is available, including many redundant probing sources.
Their results also suggest that our selection of probing sources can be used to drive a further selection of probing sources that supports cost-efficient measurements.

Root nameserver performance: Complementary to our work is a rather large body of work on measuring the proximity (client-to-server latency), affinity (the stability of the client-server association), and load balancing of DNS anycast. In general, methods to study proximity compare anycast query latency with unicast latency to anycast servers from several vantage points [BFR06, Col05, SPT04]. However, at least one piece of work has explored proximity by measuring server-side accesses by clients, and geolocating clients to estimate latency [LHF+07]. Several pieces of work have explored affinity by periodically probing anycast servers to determine when routing changes cause anycast packets to be routed to a different server [BF05, Bus05, BFR06]. Boothe et al. observe that anycast affinity measurement techniques can be used as a lightweight approach to understanding BGP dynamics, since anycast routes are propagated using BGP [BB05]. Finally, load balancing is assessed by measuring client accesses at anycast servers [BFR06, BLKT04].

Our work is inspired by these studies, but differs in several respects. While other work has explored the use of CHAOS records to study affinity [Bus05, BB05, BF05, Sek05], we extensively validate CHAOS query use for anycast server enumeration and use it to characterize the use of anycast in TLDs. Most prior work listed above has used hundreds of vantage points, usually from PlanetLab; as our work shows, anycast recall is modest at the scale of PlanetLab, implying that the conclusions drawn by prior work may need to be revisited. One exception is the work of Ballani et al., who used 20,000 rDNS servers [BFR06]; our evaluations contain an order of magnitude more vantage points. Finally, the use of IN-class records for identifying anycast servers is not new: it has been used in AS112, and Ballani et al. [BFR06] use a similar mechanism to study anycast load balance in a controlled setting. Our primary contribution is a concrete proposal to standardize anycast identification using IN queries, and a careful characterization of its recall properties.

3.9 Conclusions

Through its wide use in DNS, anycast has become an indispensable part of the Internet. We developed new methods that combine CHAOS queries with traceroutes, or use new IN records to support tens of thousands of open recursive DNS servers as vantage points. We find our methods have generally good precision and high recall. In particular, we find that the topological dispersion of anycast requires a very large number of vantage points to enable high recall; on average, 10,000 vantage points are required for a recall of 80% if selected randomly. Finally, our studies of the F-Root and PCH anycast infrastructure detect one third-party site masquerading as an anycast node and reveal several abnormal anycast configurations, and our evaluation of all country-code and generic top-level domain servers shows anycast is possibly used by 72% of the TLDs.

This chapter provides further strong evidence to support our thesis statement. We showed that smart selection of probing sources enables efficient enumeration of the anycast nodes of large DNS anycast services. We believe this selection of probing sources is a smart selection for three reasons.
First, we evaluate three kinds of probing sources (PlanetLab nodes, Netalyzr clients, and open resolvers) and find that open resolvers are the only kind that supports both high recall and on-demand measurements at the same time. Thus, the selection of open resolvers is smarter than selecting the other two kinds of probing sources. Second, while this selection of probing sources does not support cost-efficient probing, due to the many redundant probing sources, it is a smart selection for our purpose, maximizing the recall of anycast enumeration. Last, our selection of a large number of open resolvers is the first step towards further smart selection, such as a subset of open resolvers that supports cost-efficient measurements, as we do in the next chapter. We then use the enumeration of anycast nodes to identify abnormal service behaviors, such as masquerading anycast nodes and in-path proxies.

Our proposed method of using IN queries to facilitate probing from open resolvers has been adopted by two large DNS services, AS112 and L-root. The adoption of our work in DNS operations suggests that many other studies that need efficient enumeration of service replicas can also benefit from selecting a large number of widely distributed probing sources, as we do in this chapter.

In this chapter we selected a large number of open resolvers as probing sources to achieve high recall and on-demand measurements for anycast enumeration. In the next chapter, we again select open resolvers as probing sources, but use several techniques to reduce duplicated measurements. Without a large number of duplicated measurements, we can efficiently enumerate CDN Front-Ends and study the CDN Front-End-to-user mapping.

Chapter 4

Assessing Affinity Between Users and CDN Sites

In this chapter we study the dynamics of the association between users and Front-Ends (FEs) for two large content delivery networks (CDNs), Google and Akamai. Front-Ends of CDNs are servers that users connect to to request web pages or services (§4.2.1). Large web services employ CDNs to improve user performance. CDNs improve performance by serving users from nearby Front-End Clusters; Front-End (FE) Clusters are FEs in a single physical and network location that provide the same services (§4.2.1). CDNs also spread users across FE Clusters when one is overloaded or unavailable and others have unused capacity. Studying the dynamics of the user-to-FE Cluster mapping helps answer the questions of whether and how the user-to-FE Cluster mapping affects performance over time. In addition, studying these dynamics also helps users and regulators understand where their data goes over time.

This chapter supports the thesis statement as another strong example by demonstrating that smart selection of probing sources enables efficient enumeration of CDN FEs to track and understand the dynamics of the CDN user-to-FE Cluster mapping. Similar to the last chapter, we first use a large number of open resolvers to maximize the recall of enumeration of CDN FEs. To study the dynamics of the user-to-FE Cluster mapping, we need to do the enumeration periodically, and the enumeration should finish quickly in order to catch mapping dynamics that happen on short time scales. While theoretically we could parallelize the measurements and use all open resolvers to enumerate at the same frequency, parallel measurements require many resources and are hard to manage. As a result, we go one step further by reducing the redundant probing sources to enable short-interval, periodic enumeration of CDN FEs.
We show that using this efficient enumeration of CDN FE Clusters we are able to study the behavior of the user-to-FE Cluster mapping and understand its impacts. Our method of smart selection of probing sources in this chapter also generalizes to other studies, as discussed in §1.2. There are studies [SKB06, TWR11] that need both maximum coverage of service replicas and periodic enumerations; these studies can first use a large number of diverse probing sources to maximize recall, and then reduce redundant probing sources to make the enumeration more efficient, as we do in this chapter.

Part of the study of the CDN Front-End clustering algorithm in this chapter (§4.3.1, §4.4.1) was published in IMC 2013 [CFH+13]. Our previous work [CFH+13] was about enumerating, clustering, and geolocating Google FEs and understanding the use of those methods. The clustering in that previous work is a contribution of this dissertation, and we refer to the other contributions as prior work because they are out of the scope of this dissertation. The rest of this chapter, the study of the dynamics of the CDN user-to-FE Cluster mapping, was mostly published in the 7th International Workshop on Traffic Monitoring and Analysis (TMA) 2015 [FKBH15].

4.1 Introduction

Large web services serve their content from multiple sites to reduce latency to their customers, to spread load, and to provide redundancy in the face of local failure. These services use Content Distribution Networks (CDNs) that operate Front-End Clusters, each consisting of multiple servers in a specific location [ZHR+12, DMP+02b]. Users (perhaps all sharing a common network prefix) are often directed to specific FE Clusters dynamically. This user-to-FE Cluster mapping may result from routing (anycast with BGP) [FMM+15, Pri11, ALR+08] or from DNS [DMP+02a, KMS+09a] controlled by a mapping algorithm [WJFR10, GS95, CC95, CC97]. Ideally user prefixes map to the nearest FE Cluster to minimize network latency. In practice, the user-to-FE Cluster mapping is often more involved: a FE Cluster may be temporarily down, a nearby FE Cluster may be overloaded, estimates of user location may be incorrect or out-of-date, or peering may influence FE Cluster choice, as reported by Facebook [HBvR+13].

Need to study dynamics of CDN user-to-FE Cluster mapping: There are several reasons users, regulators, researchers, and CDN operators themselves should care about how CDNs map users to FE Clusters. Users care about performance, and we show that the choice of FE Cluster can result in noticeable performance differences (§4.6). Regulators and some users may care about where their data goes, particularly when different political jurisdictions have different requirements for privacy. Countries have different policies about censorship [Wik13], and requirements for law-enforcement access to user data vary by jurisdiction. Recent concerns about surveillance have prompted countries to suggest data should be kept domestically [Edg13]. While prior studies have enumerated and geolocated CDN networks [HWLR08, AMSU11, CFH+13], an understanding of dynamics helps interpret such mappings. Understanding mapping dynamics also helps researchers place prior studies into context (§4.8). Prior studies that look at the location of CDN FE Clusters [CFH+13], or use CDN mapping for other purposes [SKB06, CB08, OSRB12], may be affected if the measurements they build on change frequently, or direct users to distant FE Clusters.
Finally, a better understanding of the user-to-FE Cluster mapping will help CDN operators better understand how other CDNs work. If the mapping changes infrequently, are users sent out of their way for some time?

Contributions: In this chapter, our first contribution is to demonstrate the thesis statement. We use a large number of open resolvers (about 600k) to enumerate CDN Front-Ends and then select a subset of open resolvers as our smart selection of probing sources; the subset contains about 32k open resolver prefixes and has the same coverage of FE Clusters as the large set. This smart selection of probing sources enables us to do efficient enumeration of CDN Front-Ends to study the dynamics of the user-to-FE Cluster mapping. In addition, we make the following contributions.

The second contribution of this chapter is to cluster Front-End IP addresses into FE Clusters, adding outlier removal and dynamic thresholding to existing methods [XS05, MIP+06b]. We evaluate our clustering algorithm and show that our proposals of outlier removal and dynamic thresholding are essential to achieve enough resolution to distinguish different FE Clusters in the same city. FE Clusters represent unique network locations, a view that IP addresses, prefixes, or ASes can obscure.

Third, we provide the first evaluation of the dynamics of the user-to-FE Cluster mapping of two large CDNs from a large number of network prefixes. We collect data for the Google and Akamai CDNs from over 32k end-user prefixes, taken every 15 minutes for four weeks (§4.3.2). In addition, we use 192 PlanetLab nodes to measure network and application latency of the two CDNs over one week. We find that many user prefixes experience mapping changes frequently, with about 20% of Google user prefixes and 70-80% of Akamai user prefixes seeing more than 60 mapping changes (twice a day on average) in a month (§4.5.1). We also identify several reasons for these changes, including server drain (stopping the mapping of any users to a server, as described in §4.7), restoration (ending a drain and mapping users back to the previously drained servers), and load balancing.

Last, we show the effects of changes in user/FE Cluster associations on users. We find that, over one month, most prefixes (50-70%) are redirected from one FE Cluster to another that is very distant, and that sometimes (28-40%) these shifts result in large changes in latency. These shifts are usually brief, but a few users (2-5%) receive poor performance much of the time. We also look at the geographic footprint of which FE Clusters users employ (§4.6.3). We find that many prefixes are directed to several countries over the course of a month, complicating questions of jurisdiction.

4.2 Background: Measuring CDNs

Our study draws upon existing approaches to measure CDNs; our contributions include clustering of CDN Front-End IP addresses, and new long-term observations and analysis of dynamics.

4.2.1 DNS Redirection Basics

CDNs deploy Front-Ends around the Internet. Front-Ends (FEs) are servers that users connect to to request web pages or services. For our purposes, we are interested in FE Clusters, each of which represents the FEs in a single physical and network location that provide the same services. An example is shown in Figure 4.1: four FE Clusters are located at different places around the world; FE Clusters B, C, and D each have two FEs, and FE Cluster A has four FEs. We discuss our method to cluster Front-End servers into FE Clusters in more detail in §4.3.1.

[Figure 4.1: CDN Front-End Clusters.]
Some CDNs use DNS to direct users to a front-end. When a user performs a DNS lookup for CDN-hosted content, the CDN's DNS returns the IP addresses of one or more front-ends to serve that user. In practice, CDNs generally perform the same redirection for all users in a given network prefix [SBC+13]. We call this association between network prefix and front-end the CDN's prefix-FE Cluster mapping, since we do not differentiate between front-ends within a given cluster. In this chapter, we only consider CDNs that use DNS-based redirection (we expect some of our techniques from the previous chapter could be used to study CDNs that use anycast).

Generally, CDNs strive to map prefixes to nearby FE Clusters to reduce network latency; here, we consider latency-sensitive dynamic services rather than large cacheable objects. However, CDNs may also vary the prefix-FE Cluster mapping dynamically to optimize performance across the CDN, perhaps load balancing prefixes across several FE Clusters. CDNs may also change the mapping dynamically in response to FE Clusters going down for maintenance. As shown in Figure 4.1, the user prefix p is mapped to FE Cluster A because FE Cluster A is the closest to p; however, p may sometimes be mapped to FE Cluster B due to load balancing or other reasons. When p is mapped to FE Cluster A at one time, then mapped to FE Cluster B at a later time, we call this a prefix-FE Cluster mapping change, and we call (A, B) the switching pair. The goal of this chapter is to understand these mapping changes: how often they occur, how many users they affect, and where users go before and after.

4.2.2 Enumerating CDN Front-End Servers

Prior work [HWLR08, CFH+13] used active probing to enumerate CDN infrastructure, generally by making DNS queries for a target hostname (or hostnames) of a service that is hosted by the CDN. To explore the global reach of the CDN and uncover prefix-FE Cluster mappings, these probes must represent vantage points in different locations. There are three specific approaches to generate queries from many vantage points.

First, one can issue DNS queries from end-hosts in different networks, in our case PlanetLab nodes. We follow this approach as described in §4.3.2.2.

Second, one can use the DNS EDNS-client-subnet extension [CvLL12], which some CDNs now support. This extension embeds a client's prefix in the DNS query, and the CDN's DNS server replies with a FE Cluster selected for that client prefix. By varying the client subnet given in the DNS query, prior work emulated access to vantage points in client prefixes around the world [CFH+13, SBC+13]. We use EDNS-client-subnet DNS queries to study Google's prefix-FE Cluster mappings.

Finally, one can query open DNS resolvers, those which will reply to requests from any Internet host. We used open resolvers to enumerate anycast nodes in Chapter 3, and prior work [HWLR08] has used open resolvers in different networks to enumerate CDNs. Since open resolvers are often in people's homes, we use them judiciously to measure Akamai (§4.3.2.2), which does not support our EDNS-client-subnet technique.

Probing frequency: As a CDN can change the prefix-FE Cluster mapping at any time, to capture all prefix-FE Cluster mapping changes one should ideally send DNS probes as fast as possible. However, DNS responses have cache lifetimes indicated by their TTL (time-to-live) value, so issuing measurements every DNS TTL interval should capture changes as quickly as an end user might experience them. Following prior work [SKB06], we therefore probe on DNS TTL intervals to capture prefix-FE Cluster mapping changes.
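As an illustration of the EDNS-client-subnet technique, the sketch below issues one such query with the dnspython library; the hostname, the client prefix (a documentation prefix), and the target nameserver are placeholders, and real measurements would direct the query at the CDN's authoritative nameserver:

    import dns.edns, dns.message, dns.query, dns.rdatatype

    def frontends_for_prefix(qname, client_prefix, nameserver):
        """Ask for qname as if the query came from client_prefix/24 and
        return the front-end A records selected for that prefix."""
        ecs = dns.edns.ECSOption(client_prefix, srclen=24)
        query = dns.message.make_query(qname, dns.rdatatype.A,
                                       use_edns=0, options=[ecs])
        response = dns.query.udp(query, nameserver, timeout=5)
        return [rr.address for rrset in response.answer
                if rrset.rdtype == dns.rdatatype.A for rr in rrset]

    # e.g., frontends_for_prefix("www.google.com", "192.0.2.0", <authoritative NS IP>)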
4.2.3 Geolocating Front-Ends

To learn more about the FEs and about prefix-FE Cluster mapping changes, we also geolocate the FEs using our prior client-centric geolocation (CCG) approach [CFH+13]. CCG assumes that most user prefixes mapped to a particular front-end server will be located close to the front-end server itself, so the locations of these prefixes suggest the location of the front-end server. The locations of the prefixes come from the MaxMind database [Max13], and the location of the FE Cluster is computed as the geographic mean of the locations of the prefixes associated with that FE Cluster. CCG aggressively applies a number of filters to remove prefixes that may not be nearby. The types of prefixes that are filtered out include prefixes whose locations in the MaxMind database have a coarse granularity, and prefixes that are mapped to geographically distant FE Clusters.

4.3 Methodology

In this section, we discuss our method of Front-End clustering and how we collect data to study the user-to-FE Cluster mapping.

4.3.1 Clustering Front-Ends

Since we are interested in the mapping changes of FE Clusters, not IP addresses, we cluster the IP addresses from server enumeration into FE Clusters.

We cluster by embedding each Front-End in a higher-dimensional metric space, then clustering the front-ends in that metric space. Such an approach has been proposed elsewhere [MIP+06b, XS05, PS01]; our approach differs from prior work in using better clustering techniques and more carefully filtering outliers. In our technique, we map each front-end to a point in a high-dimensional space, where the coordinates are RTTs from n vantage points (in our case, 250 PlanetLab nodes at different geographical sites). The intuition underlying our approach is that two front-ends at the same physical location should have a small distance in the high-dimensional space. An alternative choice of coordinate is the reverse-TTL of a ping packet (as proposed in [MIP+06b]), which measures how many hops the packet traverses from the destination Front-End to our vantage point. As discussed in §4.4.1, TTL-based clustering is also a good metric, but not as good as RTT, so we use RTT as the primary metric for clustering and use TTL clustering as an additional reference to improve clustering results, as discussed later in this section.

Each vantage point i sends 8 pings within a few seconds to each Front-End IP, and we take the second-smallest RTT (ℓ_i) to approximate minimum latency while tolerating outliers. We combine the observations from the n VPs into a high-dimensional point C:

    C = {ℓ_1, ℓ_2, ℓ_3, ..., ℓ_n}

We then compute the distance d_ab of each pair of points (C_a of Front-End IP a and C_b of Front-End IP b) using the Manhattan distance. In computing this Manhattan distance, we (a) omit coordinates for which we received fewer than 6 responses to pings, and (b) omit the highest 20% of coordinate distances (|ℓ_{a,i} - ℓ_{b,i}|) to account for outliers caused by routing failures or by RTT measurements inflated by congestion.

Our form of outlier removal means only part of the coordinates are used to compute the Manhattan distance, so we may drop a different set of coordinates of the same point when we compute the distance from that point to two different other points. However, using a different set of coordinates to compute the distance should not affect the clustering accuracy much.
If two points are in the same cluster, the coordinate distances should always be small, and if two points are in different clusters, most of the coordinate distances should be large; the final Manhattan distance of two points therefore does not change much when only the set of coordinate distances used to compute it varies while the total number of coordinates stays the same. We validate our choice of omitting the largest 20% of coordinate distances in §4.4.1. Finally, we normalize this Manhattan distance:

    C_a = {ℓ_{a,1}, ℓ_{a,2}, ℓ_{a,3}, ..., ℓ_{a,n}}
    C_b = {ℓ_{b,1}, ℓ_{b,2}, ℓ_{b,3}, ..., ℓ_{b,n}}
    m = 0.8 * n
    d_ab = ( Σ_{i=1}^{m} |ℓ_{a,i} - ℓ_{b,i}| ) / m

(the |ℓ_{a,i} - ℓ_{b,i}| here are the smallest m values from {|ℓ_{a,1} - ℓ_{b,1}|, |ℓ_{a,2} - ℓ_{b,2}|, ..., |ℓ_{a,n} - ℓ_{b,n}|}).

We normalize the Manhattan distance to allow direct comparison in our clustering algorithm: absolute comparisons of C vary because the number of vantage points may differ when observations are taken at different times, since VPs come and go.

The final step is to cluster Front-Ends by their pairwise normalized Manhattan distance. We use the OPTICS algorithm [ABKS99] for this. OPTICS is designed for spatial data, so instead of explicitly clustering points, it outputs an ordering of the points that captures the density of points in the dataset. As such, OPTICS is appropriate for spatial data where there may be no a priori information about either the number of clusters or their size, as is the case for our setting. In the output ordering, each point is annotated with a reachability distance: when successive points have significantly different reachability distances, that is usually an indication of a cluster boundary. In our clustering we use a threshold of 2x: if the reachability distance of the current point is 2 or more times the previous point's reachability distance, we set the current point as the beginning of a new cluster. As we show in §4.4.1, this technique, which dynamically determines cluster boundaries, is essential to achieving good accuracy (we sketch the whole pipeline below).

Metric for Clustering Evaluation: The metric we use for assessing the accuracy of clustering is the Rand Index [Ran71]. The index is measured as the ratio of the sum of true positives (tp) and true negatives (tn) to the sum of these quantities and the false positives (fp) and false negatives (fn):

    R = (tp + tn) / (tp + tn + fp + fn)

Our definitions of tp, tn, fp, and fn in the context of our clustering are shown in Table 4.1: the result for a pair of Front-End IP addresses is a tp if the two IPs are in the same FE Cluster in the ground truth and are also reported to be in the same FE Cluster by our clustering; a tn if the two IPs are in different FE Clusters in the ground truth and in our clustering result; a fp if the two IPs are in different FE Clusters in the ground truth but are reported to be in the same FE Cluster by our clustering; and a fn if the two IPs are in the same FE Cluster in the ground truth but are reported to be in different FE Clusters by our clustering.

        same FE Cluster     same FE Cluster
        in ground truth     in our clustering
  tp    yes                 yes
  tn    no                  no
  fp    no                  yes
  fn    yes                 no

Table 4.1: Judgment of the clustering result of each pair of Front-End IP addresses, used in computing the Rand Index.

The Rand Index ranges from 0 to 1, with 0 meaning there are no true positives or true negatives (complete error) and 1 meaning there are no false positives or false negatives.
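The embedding-and-OPTICS pipeline described above can be summarized in a short sketch. The one below is a simplified reconstruction, not our measurement code: it assumes an RTT matrix with one row per front-end and one column per vantage point (NaN where fewer than 6 of the 8 pings were answered), and it borrows scikit-learn's OPTICS for the ordering and reachability computation:

    import numpy as np
    from sklearn.cluster import OPTICS

    def trimmed_manhattan(a, b, keep=0.8):
        """Normalized Manhattan distance between two front-end coordinate
        vectors, dropping missing VPs and the highest 20% of |l_a,i - l_b,i|."""
        diffs = np.abs(a - b)
        diffs = diffs[~np.isnan(diffs)]
        m = max(1, int(keep * len(diffs)))
        return np.sort(diffs)[:m].sum() / m

    def cluster_frontends(rtt):
        n = len(rtt)
        d = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                d[i, j] = d[j, i] = trimmed_manhattan(rtt[i], rtt[j])
        optics = OPTICS(min_samples=2, metric="precomputed").fit(d)
        order, reach = optics.ordering_, optics.reachability_
        labels, cur = np.empty(n, dtype=int), 0
        for k, idx in enumerate(order):
            # a 2x-or-more jump in reachability marks the start of a new FE Cluster
            if k > 0 and reach[idx] >= 2 * reach[order[k - 1]]:
                cur += 1
            labels[idx] = cur
        return labels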
Post-clustering Optimizations for Google Front-Ends: While our clustering has good accuracy, about 3% of pairs of FE IPs are still false positives or false negatives, as shown in §4.4.1. In §4.4.1 we show that we generally have very good accuracy when we evaluate the clustering on a set of Google FEs that have airport codes in their reverse DNS names. However, when we look at all the Google FEs, we find that a few percent (4-6%) of our clustering results conflict with two kinds of external information, described below. We believe those conflicts are false positives and false negatives, and we try to correct them using the external information. While we describe and validate these optimizations using Google Front-Ends, they apply to Akamai as well.

First, we use Google DNS replies to help fix potential false negatives. When a DNS query for www.google.com is sent to Google's authoritative name server, the reply always contains a group of Front-End IP addresses (a Google DNS group) that are in the same /24 prefix and are close in the address space. We believe that Front-End IP addresses in the same Google DNS group come from the same FE Cluster, and we show supporting evidence in §4.4.1. If we see IPs from the same Google DNS group clustered into different FE Clusters, we treat these FE Clusters as false negatives and merge them into a single one.

Second, we use the AS number (ASN) information of Front-End IP addresses, together with evidence from TTL-based clustering, to fix potential false positives. The intuition behind this fix is that the IPs of a single FE Cluster should come from the same AS. When we see that IPs in one FE Cluster have multiple ASNs, we treat them as potential false positives that may be corrected as follows. We only separate a FE Cluster that has multiple ASNs when either or both of the following two conditions are met: first, the IPs from different ASes are well ordered in the OPTICS output and the only reason they were not identified as different FE Clusters is that the distance between them did not reach the threshold (2x); second, TTL-based clustering reports that the IPs from different ASNs are in different FE Clusters.

Table 4.2 shows an example of using TTL-based clustering to further divide a FE Cluster that has multiple ASNs. There are 6 IPs (IP1 through IP6) which are reported by RTT-based clustering as one FE Cluster, RTT_a. IP1 and IP2 have ASN x, IP3 and IP4 have ASN y, and IP5 and IP6 have ASN z. We then look at these 6 IPs' FE Cluster assignments in TTL-based clustering: IP1 through IP4 are reported as FE Cluster TTL_a, and IP5 and IP6 as FE Cluster TTL_b. We therefore report IP1 through IP4 as one FE Cluster and IP5 and IP6 as a different FE Cluster.

                                IP1    IP2    IP3    IP4    IP5    IP6
  RTT clustering FE Cluster     RTT_a  RTT_a  RTT_a  RTT_a  RTT_a  RTT_a
  ASN                           x      x      y      y      z      z
  TTL clustering FE Cluster     TTL_a  TTL_a  TTL_a  TTL_a  TTL_b  TTL_b
  Final clustering FE Cluster   FN_a   FN_a   FN_a   FN_a   FN_b   FN_b

Table 4.2: Example of using ASN and TTL-based clustering to fix potential false positives in RTT-based clustering.

Finally, by checking against Google DNS group information, we find that 127 Google DNS groups (6% of 2,115) are separated after our clustering (a detailed analysis is in §4.4.1). By checking the ASNs of Google FEs, we find a total of 68 clusters (4.5% of 1,488) with multiple ASNs, among which 40 are separated just using the OPTICS output ordering. Of the remaining 28, 20 can be fixed by referring to the TTL clustering results, leaving only 8 (0.5%) that are either caused by stale ASN information or are false positives of our clustering.
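The ASN/TTL fix itself reduces to a small routine. The following sketch (the data-structure shapes are assumptions) applies the simpler of the two conditions, splitting a multi-ASN RTT cluster only where TTL-based clustering also separates it:

    def refine_cluster(ips, asn, ttl_cluster):
        """Split an RTT-based FE Cluster whose IPs span multiple ASes, but
        only where TTL-based clustering also separates them."""
        if len({asn[ip] for ip in ips}) <= 1:
            return [ips]                  # single AS: keep the cluster intact
        groups = {}
        for ip in ips:
            groups.setdefault(ttl_cluster[ip], []).append(ip)
        if len(groups) <= 1:
            return [ips]                  # TTL clustering does not separate them
        return list(groups.values())      # split along TTL-cluster boundaries

On the Table 4.2 input, this yields two clusters, {IP1, IP2, IP3, IP4} and {IP5, IP6}, matching the final row of the table.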
4.3.2 Data Collection

This chapter measures Google and Akamai using existing methodology; our contribution is new long-term observations and analysis of dynamics. Our datasets (Table 4.3) provide daily observations for a month from 10M prefixes, and frequent (15-minute) observations for a 30k subset of prefixes.

  name                          where used              target  coverage (prefixes)  frequency  start date (length, days)
  Google-15min-EDNS             §4.5.1, §4.6.1, §4.6.3  Google       32,871          15 min.    2014/03/28 (30)
  Akamai-Apple-15min-ODNS       §4.5.1, §4.6.1, §4.6.3  Akamai       29,535          15 min.    2014/03/28 (30)
  Akamai-Huff-15min-ODNS        §4.5.1, §4.6.1, §4.6.3  Akamai       28,308          15 min.    2014/11/17 (30)
  PlanetLab-DNS-TTL             §4.6.2                  both            192          20 s/5 m   2014/04/23 (7)
  Google-15min-early            §4.7                    Google       32,324          15 min.    2013/12/13 (30)
  Google-location-EDNS          §4.3.2.4                Google   10,057,110          1 day      2014/03/28 (30)
  Akamai-Apple-location-ODNS    §4.3.2.4                Akamai      271,357          once       2014/04/14 (-)
  Akamai-Huff-location-ODNS     §4.3.2.4                Akamai      185,370          once       2014/11/12 (-)
  Google-1day-EDNS              drain/restoration       Google   10,057,110          1 day      2014/04/01 (31)
  Google-HTTP-aliveness         drain/restoration       Google   10,057,110          1 day      2014/04/01 (31)
  Google-TLD-validation         §4.4.3                  Google          148          once       2015/04/09 (-)
  ODNS-2013                     §4.3.2                  -           271,357          once       2013/10/21 (-)
  Google-clustering-validation  §4.4.1                  Google            -          once       2013/05/04
  Google-clustering-140801      §4.4.1                  Google            -          once       2014/08/01

Table 4.3: Datasets collected as part of this chapter.

4.3.2.1 Targets

We focus on the Google and Akamai CDNs because they are massively distributed, host popular services, and use DNS (not anycast) to map users to FE Clusters. Following prior work [CFH+13, SKB06], we enumerate CDN infrastructure by issuing DNS queries for a service hosted by the CDN. For Google, we query for www.google.com. For Akamai, we query www.apple.com in the Akamai-Apple-15min-ODNS dataset and www.huffingtonpost.com in the Akamai-Huff-15min-ODNS dataset; both are static websites hosted by Akamai. We query two websites for Akamai because our initial queries, for www.apple.com, turned out to cover only a small set of Akamai's FE Clusters, while www.huffingtonpost.com has larger coverage. We expect our results for the specific Google and Akamai services that we study to generalize to the other services they operate that also use DNS-based redirection. Since the fundamentals of replica selection are similar, the results may also apply to application-level redirection, such as in YouTube and Akamai's web caching, but we do not evaluate application-level services in this chapter.

4.3.2.2 Specific Enumeration Methods with DNS

We want as complete an exploration of each CDN's prefix-FE Cluster mapping as possible. We use three techniques to get different kinds of coverage: EDNS-client-subnet and open resolvers for broad coverage, and PlanetLab for more controlled, detailed measurements.
Broad probing: We probe Google with the DNS EDNS-client-subnet extension, following prior work [CFH+13, SBC+13]. This approach allows one to emulate queries from any location, but while Google supports it, Akamai added support only for large open resolver operators, not end users [Sta14]. Thus we do not use it with Akamai, and instead probe Akamai with open DNS resolvers to make DNS queries from around the globe, again following prior work [HWLR08, FHG13b]. Open resolvers are often in people's homes, so we use them judiciously to measure Akamai. We choose a subset of the global open resolvers that we collected in 2013 (ODNS-2013) as the source user prefixes. It contains 32,871 open resolver IPs, each from a unique /24 prefix, and covers 180 countries/regions and 5,158 ASes. We use about 32k open resolvers so that our measurement setup can finish a round of queries in 15 minutes. The active measurements could be parallelized to use more open resolvers or to run more frequently, but we leave those measurements to future work.

To identify the subset of 32k open resolvers, we start with all open resolvers and take five complete enumerations of mappings for both CDNs over two months. We then discard those that do not respond in every trial, and finally we keep only those necessary to cover the IP-level enumeration that we saw in our five trials (we sketch this reduction below).

For Google, we issue DNS EDNS-client-subnet queries for the /24 prefixes of the chosen open resolvers (we always use /24 prefixes, so we write simply "prefix" from here on). Google hosts front-ends both on its backbone network and data centers (on-net) and in other ISPs around the world (off-net). We select prefixes to get broad coverage of FE Clusters. We find that our chosen open resolvers under-represent prefixes that are served directly from on-net FE Clusters. However, we believe our data is not drastically different from what we would observe from all routable /24 prefixes, as the difference is moderate (70% of prefixes are mapped to on-net FE Clusters in our data, versus 88% of all routable /24 prefixes in the Google-location-EDNS dataset). For Akamai, we probe directly through the chosen open resolvers.

We probe both Google and Akamai every 15 minutes for all 32,871 prefixes; we choose 15 minutes to limit the load we impose on open resolvers. Because of the nature of open resolvers, there is significant churn in which open resolvers respond. We post-process the data from open resolvers and discard prefixes that miss more than 10% of their probes. After this cleaning we retain 29,535 and 28,308 prefixes in Akamai-Apple and Akamai-Huff, respectively.

Table 4.4 shows the total number of front-end IP addresses we find using broad probing. In total, we find 24,150 Google front-end IPs. For Akamai, we find 685 front-end IPs hosting www.apple.com (the Akamai-Apple dataset) and 9,492 Akamai front-end IPs hosting www.huffingtonpost.com over 30 days (the Akamai-Huff dataset). We will see later that there are also many more FE Clusters hosting www.huffingtonpost.com than www.apple.com, and we believe this difference comes from the different SLAs used by the two sites.

                             Google          Akamai-Apple   Akamai-Huff
  Total IPs                  24,150 (100%)     685 (100%)   9,492 (100%)
  Un-clustered                1,471 (6%)        35 (5%)       649 (7%)
  Clustered                  22,679 (94%)      650 (95%)    8,843 (93%)
  Un-geolocated               2,049 (8%)        90 (13%)    1,593 (16%)
  Geolocated                 22,101 (92%)      595 (87%)    7,953 (84%)
  Clustered and geolocated   20,861 (86%)      577 (84%)    7,953 (84%)
  Total FEs (clusters)          983            336          1,195

Table 4.4: Statistics on the number of IPs and FE Clusters found for Google and Akamai. Datasets: Google-15min-EDNS, Akamai-Apple-15min-ODNS, and Akamai-Huff-15min-ODNS.
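The reduction step mentioned above is essentially a greedy set cover over the five enumeration trials. A minimal sketch, with assumed data shapes:

    def select_probing_sources(coverage):
        """coverage: dict mapping each responsive open resolver to the set of
        FE IPs it discovered across all trials. Returns a smaller resolver
        list that still covers every FE IP seen (greedy set cover)."""
        universe = set().union(*coverage.values())
        chosen, covered = [], set()
        while covered != universe:
            best = max(coverage, key=lambda r: len(coverage[r] - covered))
            gain = coverage[best] - covered
            if not gain:
                break                     # remaining IPs are uncoverable
            chosen.append(best)
            covered |= gain
        return chosen

Greedy set cover is not optimal, but it is a standard approximation, and any selection that preserves the union of discovered FE IPs preserves the coverage we need.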
Compared to published reports of the sizes of the Google [CFH+13] and Akamai [ZAC+13] CDNs, we know that the hundreds of FE Clusters we find are incomplete, but we believe we cover a good part of Google's CDN (about 70% of prior results [CFH+13]). Akamai runs tens of thousands of servers; our methodology tracks only the part of that infrastructure used by our targets. We focus on specific clients hosted by Akamai so we can study user-prefix dynamics for thousands of user prefixes without creating excessive measurement traffic. We observe about three times more IPs in Google's clusters compared to Akamai's. Our methodology of sampling specific URLs means that we do not fully enumerate clusters, and load balancing and other factors mean IP addresses do not necessarily indicate cluster size, so we focus on clusters rather than IP addresses.

To aid our later analysis of Google FE Cluster drain (§4.7), we also include two other datasets collected along with our prior work [CFH+13]: Google-1day-EDNS and Google-HTTP-Aliveness. Google-1day-EDNS is similar to Google-15min-EDNS in that it is also collected using DNS EDNS-client-subnet queries with a set of /24 prefixes. The difference is twofold: first, Google-1day-EDNS uses all routable /24 prefixes instead of the 32k open resolver prefixes of Google-15min-EDNS, so it has much larger coverage of /24 prefixes; second, Google-1day-EDNS is collected less frequently, only once per day. Using Google-1day-EDNS we can therefore only capture drains of Google FE Clusters that last more than one day.

In addition to Google-1day-EDNS, we use the Google-HTTP-Aliveness dataset to check the aliveness of the web service on every Google Front-End server. This dataset checks whether a Google Front-End server has a running web service by sending an HTTP HEAD request to every Google Front-End server and checking the response: if the server replies successfully and the string "google.com" is included in the response, we consider the web service alive on that Front-End server. Using the Google-HTTP-Aliveness dataset, we know whether a drain of a FE Cluster is due to the service going down or not, as explained in §4.7.
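A minimal sketch of such an aliveness check, using the Python standard library as a stand-in for the measurement client we actually ran (for a HEAD request, the string match necessarily applies to the response headers):

    import urllib.request

    def web_service_alive(frontend_ip, timeout=10):
        """Send an HTTP HEAD request to a front-end and report whether the
        response suggests a live Google web service."""
        req = urllib.request.Request(f"http://{frontend_ip}/", method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return any("google.com" in f"{k}: {v}"
                           for k, v in resp.getheaders())
        except OSError:
            return False    # no reply or connection error: treat as not alive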
For Akamai, we find 336 Akamai FE Clusters from 650 IP addresses in Akamai-Apple (with 35 IPs we could not cluster), and 1,195 Akamai FE Clusters from 9,492 IP addresses in Akamai-Hu dataset (with 649 IPs we could not cluster), We have no way of clustering and geolocating IP addresses that do not reply to measurements, so we must discard them. We show inx 4.4.2 that the part of IPs that we can’t cluster is relatively small compared to the measurable parts of the CDNs, and the prefixes are mapped to these IPs only for a small amount of time. So, discarding these IPs that we can’t cluster won’t aect our results. 4.3.2.4 Front-End Geolocation We geolocate FE Clusters in our datasets using our previous CCG technique (Client- Centric-Geolocation) [CFH + 13]. CCG geolocates FE Clusters by averaging the loca- tions of the prefixes they serve after aggressively removing prefixes clearly distant from 120 the FE. From that earlier study, we have daily measurements of Google since 2013. We use one month of that data, Google-location-EDNS, selecting the period and subset of prefixes to match our prefix-FE Cluster mapping datasets. We use an alternate source of data for geolocation since Akamai did not support EDNS-client-subnet queries when our measurements began (x 4.3.2.2). We collect data from open resolvers and apply the CCG algorithm to it ourselves. We use the whole set of open resolvers (ODNS-2013) we collected in 2013 as clients for CCG. The set of open resolver contains 600,000 open resolver IP addresses from 271,357 distinct /24 prefixes, covering 217 countries/regions and 11,793 ASes. Since it covers a fraction of the 10 million total routable /24 prefixes, we validate the use of CCG with open resolvers and find that it provides similar accuracy to CCG with all routable /24 IP prefixes. Our geolocation is accurate, with 90% of IP addresses having distance error within 500km (Figure 4.6 inx 4.4.5). CCG does not provide locations for 8% of Google IP addresses and about 16% of Akamai IPs (Table 4.4). Typically, CCG fails for Front-Ends that see an insucient number of clients, so these servers may be relatively unimportant. Since CCG applies to FEs and our study use FE Clusters, there are chances that FEs in a single FE Clusters may be geolocated to very dierent locations. We examine these cases and find that this inconsistency exists but is very rare. We look at the internal distance of each FE Cluster, which is the largest physical distance among every pair of FEs in a single FE Cluster based on the CCG results. We find that less than 1% of FE Clusters have their internal distance larger than 1,000km, and 5% of FE Clusters have their internal distance larger than 200km. While the CCG geolocation error is about 10% larger than 300km, greater than the inconsistency observed in clustering, we expect some of them are eects of geolocation errors. Thus, given that only a tiny number of clusters showing this inconsistency, we believe it won’t aect our result much. We 121 always assign a same location for every FEs in a FE Cluster, and the location is chosen randomly from the CCG result of one of the FEs in the FE Cluster. 4.4 Validation We next verify that our approach is correct. We first validate the accuracy of our cluster- ing algorithm. Then, we verify that discarding IPs that can not be clustered doesn’t aect our results. After that, we show that user prefixes are mapped to the same FE Clusters when accessing other top level domain names of google search as www.google.com. 
4.4 Validation

We next verify that our approach is correct. We first validate the accuracy of our clustering algorithm. Then, we verify that discarding IPs that cannot be clustered does not affect our results. After that, we show that user prefixes are mapped to the same FE Clusters when accessing other top-level domain names of Google search as when accessing www.google.com. Last, we consider probing frequency and the interaction between our approach and geolocation.

4.4.1 Accuracy of Front-End Clustering

To validate the accuracy of our clustering method, we run clustering on three groups of nodes for which we have ground truth: 72 PlanetLab servers from 23 different sites around the world; 27 servers from 6 sites, all in California, USA, some of which are very close (within 10 miles) to each other; and finally 75 Google front-end IP addresses that have airport codes in their reverse DNS names (14% of the 550 IPs having airport codes, and 0.9% of the 8,430 total Google IP addresses as of April 16th, 2013). These three sets are of different size and geographic scope, and the last set is a subset of our target, so we expect it to be most representative. In the absence of complete ground truth, we must rely on more approximate validation techniques: using PlanetLab, selecting a subset of servers with known locations, and using airport codes.

As we discussed earlier, there are both on-net and off-net Google FEs. The systematic naming of many Google on-net FEs with airport codes led us to validate with them; since many off-net FEs do not have uniform naming schemes, and most lack identifiable location or PoP codes that we could use, we did not use off-net FEs for validation. Off-net FEs mostly serve the hosting ISPs, and probably the customer ISPs of the hosting ISPs. We expect that some off-net FE Clusters are hosted at the same hosting location or peer with the same local PoPs, which will look identical in our measurements used for clustering. Thus, we expect that our clustering may show better performance for on-net FEs while having more false positives for off-net FEs. We use external information to help fix off-net clustering errors with our post-clustering optimizations (§4.3.1).

Our clustering method exhibits over 97% accuracy on the three test datasets. We show that it is important to eliminate outliers and to use dynamic cluster boundaries. On the Google IPs that have airport codes, our clustering shows one kind of false negative, which we believe to be correct, and no false positives at all. We also find that using RTT is slightly better than using TTL for clustering. We also discuss some internal consistency checks on our clustering algorithm that give us greater confidence in our results. Finally, to validate that the IPs of one Google DNS group all belong to a single FE Cluster, we use a clustering dataset covering all Google Front-End IP addresses, collected in August 2014 (dataset: Google-clustering-140801).

4.4.1.1 Performance of Front-End Clustering

As described in §4.3.1, we use the Rand Index to evaluate the performance of our clustering. Table 4.5 shows the Rand Index for the 3 node sets for which we have ground truth.

  Experiment   Rand Index   False negative   False positive
  PlanetLab    0.99         0                0.01
  CA           0.97         0.03             0
  Google       0.99         0.01             0

Table 4.5: Rand Index for our nodesets. Our clustering algorithm achieves over 0.97 across all nodesets, indicating very few false positives or negatives. Dataset: Google-clustering-validation.

We see that in each case, the Rand Index is upwards of 0.97.
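Computed over clustering labels, the Rand Index is only a few lines of code; a straightforward implementation:

    from itertools import combinations

    def rand_index(truth, predicted):
        """truth and predicted map each front-end IP to a cluster label
        (ground truth vs. our clustering); the index is the fraction of IP
        pairs on which the two clusterings agree (the tp and tn pairs)."""
        agree = total = 0
        for a, b in combinations(truth, 2):
            agree += ((truth[a] == truth[b]) == (predicted[a] == predicted[b]))
            total += 1
        return agree / total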
Since the Rand Index ranges from 0 to 1, with 1 meaning no error, our clustering is very accurate. This accuracy arises from two components of the design of our clustering method: eliminating outliers, which results in more accurate distance measures, and dynamically selecting the cluster boundary in our use of the OPTICS algorithm. Note that we get these results without our post-clustering refinements: those are applied after the OPTICS algorithm, and only in clustering all Google FEs (here we only validate the ones that have airport codes).

Our method does have a small number of false positives and false negatives. In the California nodeset, the method fails to set apart USC/ISI nodes from nodes on the USC campus (about 10 miles away, but with the same upstream connectivity), which leads to the 0.03 false positives. In the PlanetLab nodeset, some clusters have low reachability distances that confuse our boundary-detection method, resulting in some clusters being split in two. The Google nodeset reveals one false negative that we actually believe to be correct: the algorithm correctly identifies two distinct FE Clusters in mrs, as discussed later in §4.4.1.3.

4.4.1.2 Need to Remove Outliers

As stated in §4.3.1, we omit the 20% largest coordinate distances before the actual clustering to mitigate the effects of network congestion. Here we validate the need to omit high coordinate distances, and our choice of 20%. In Table 4.6, we show the clustering results (Rand Index) for the three nodesets when we omit different percentages of the highest coordinate distances.

  Percentage of highest coordinate distances omitted:
               0      10%    20%    30%
  PlanetLab    0.98   0.99   0.99   0.99
  CA           0.54   0.90   0.97   0.98
  Google       0.93   0.99   0.99   0.97

Table 4.6: Rand Index for our nodesets under different percentages of omitted coordinate distances. Dataset: Google-clustering-validation.

First, we can see that if we do not omit any coordinate distances, the Rand Index is always smaller than the Rand Indexes we get when we omit some of the high coordinate distances. The poor clustering performance when we keep all coordinate distances suggests there are always congested ping RTTs that affect the accuracy of our clustering. Second, we see that the three nodesets show different trends as we increase the percentage of highest coordinate distances omitted. The PlanetLab nodeset shows the same Rand Index when omitting 10%, 20%, and 30% of coordinate distances, probably because fewer than 10% of the vantage points see congested RTTs. For the CA nodeset, we get better clustering as the number of omitted high coordinate distances increases, suggesting there are many congested RTTs. For the Google nodeset, we get the best accuracy when we omit 10% or 20% of the highest coordinate distances, and performance starts to drop when we omit 30%. We examined the Google data and found that when we omit 30% of coordinate distances, we lose some valid data that is critical to distinguishing two FE Clusters that are close to each other. In sum, our observations suggest that omitting nothing and omitting too many high coordinate distances can both hurt clustering accuracy, so we choose 20%, as it provides the best results for two of our nodesets, including one of our targets, Google.
4.4.1.3 Need for Dynamic Boundaries

To show the importance of using dynamic cluster boundaries, Figure 4.2 shows the output of the OPTICS algorithm on the Google nodeset. The x-axis in this figure represents the ordered output of the OPTICS algorithm, and the y-axis the reachability distance associated with each node. Impulses in the reachability distance depict cluster boundaries, and we have verified that the nodes within each cluster all belong to the same airport code. In fact, as the figure shows, the algorithm correctly identifies all 9 Google FE Clusters. More interestingly, it shows that, within the single airport code mrs, there are likely two physically distinct FE Clusters. We believe this to be correct, from an analysis of the DNS names associated with those front-ends: all front-ends in one serving site have the prefix mrs02s04, and all front-ends in the other serving site have the prefix mrs02s05.

[Figure 4.2: Distance plot of Google servers with airport codes (mrs, muc, mil, sof, eze, sin, syd, bom, del). Servers in the same cluster have low reachability distance to each other and thus are output in sequence as neighbors; cluster boundaries are demarcated by large impulses in the reachability plot. Dataset: Google-clustering-validation.]

4.4.1.4 RTT vs. reverse-TTL

An alternative metric for the coordinates is reverse-TTL: the TTL value of the ICMP echo reply packet that we receive after sending a ping to a Google FE. It is originally set to 64 and is reduced by 1 each time the packet traverses a router; thus, the value we receive is 64 minus the number of routers the packet traversed. The benefits of reverse-TTL are twofold: first, it is not affected by network congestion, and second, it is not affected by forward paths. However, reverse-TTL-based (TTL-based, for short) clustering may be more sensitive than RTT when there are path changes. To understand the performance of TTL-based clustering, Figure 4.3 shows the OPTICS output when using reverse-TTL instead of RTT for the metric embedding, on the same set of Google servers as in our evaluation using RTT. TTL-based embedding performs reasonably well, but leaves the OPTICS algorithm unable to distinguish between the FE Clusters in bom and del, which RTT-based clustering can differentiate. Moreover, although reverse-TTL suggests the possibility of two FE Clusters in mrs, the reverse DNS names suggest that it mis-identifies which servers belong to which of these FE Clusters.

[Figure 4.3: The output of the OPTICS clustering algorithm when reverse-TTL is used for the metric embedding. With this metric, the clustering algorithm cannot distinguish the FE Clusters at Bombay (bom) and Delhi (del) in India, while RTT-based clustering can. Dataset: Google-clustering-validation.]

4.4.1.5 Other Consistency Checks

We also perform some additional consistency checks. We run our clustering algorithm against all Google front-end IPs that have airport codes (6.5%, 550 out of 8,430). We find that, except for the kind of false negatives mentioned above (multiple FE Clusters within the same airport code), the false-positive rate of our clustering is 0; no false positives means we never merge two different airport codes together. Furthermore, when our algorithm splits one airport code into separate clusters, the resulting clusters exhibit naming consistency: our algorithm always keeps IPs that have the same hostname pattern <airport code><two digits>s<two digits>, such as mrs02s05, in the same cluster.
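That consistency check is mechanical; a minimal sketch, with the cluster input shape assumed:

    import re

    PATTERN = re.compile(r"^([a-z]{3}\d{2}s\d{2})\.")   # <airport><nn>s<nn>., e.g., mrs02s05.

    def naming_consistent(cluster_hostnames):
        """True if every reverse-DNS name in a cluster that matches the
        pattern shares the same <airport><nn>s<nn> prefix."""
        prefixes = {m.group(1) for h in cluster_hostnames
                    if (m := PATTERN.match(h))}
        return len(prefixes) <= 1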
4.4.1.6 Front-End IP Addresses in One Google DNS Group Belong to the Same FE Cluster

We use Google DNS groups to fix potential false negatives. We now show evidence that IPs from the same Google DNS group belong to the same FE Cluster.

First, we find that for those Front-End IPs that have airport codes in their reverse DNS names, the IPs in one Google DNS group all have exactly the same hostname pattern. We believe this is strong evidence that Google tends to use IPs from the same FE Cluster when replying as a Google DNS group.

Second, we also see that IPs in one Google DNS group are very likely to be clustered together after our OPTICS clustering step, which strengthens our belief that they are each from a single FE Cluster. We use RTT-cluster to refer to a cluster that results from our RTT-based clustering (OPTICS clustering using RTT as coordinates), and TTL-cluster to refer to a cluster that results from our TTL-based clustering (OPTICS clustering using reverse-TTL as coordinates).

                                  RTT clustering
TTL clustering       Single cluster   Multiple clusters   Not clustered   Total
Single cluster       1,728 (82%)      97 (4%)             --              1,825 (86%)
Multiple clusters    220 (10%)        30 (2%)             --              250 (12%)
Not clustered        --               --                  40 (2%)         40 (2%)
Total                1,948 (92%)      127 (6%)            40 (2%)         2,115 (100%)

Table 4.7: Statistics of the number of Google DNS groups that are clustered in a single FE Cluster or separated into multiple FE Clusters. Dataset: Google-clustering-140801.

As shown in Table 4.7, there are 2,115 Google DNS groups in total, and 1,728 (82%) of them are each in a single RTT-cluster and also a single TTL-cluster. Given that our results after the OPTICS clustering algorithm (RTT-clusters and TTL-clusters) have good accuracy, as shown in §4.4.1.1, we believe most Google DNS groups that are each a single RTT-cluster and a single TTL-cluster are real FE Clusters. Thus, we can infer that for 82% of Google DNS groups, we have strong evidence that they are each in a single FE Cluster of Google.

In addition, we further investigate those Google DNS groups that are separated into multiple RTT-clusters or TTL-clusters to see whether the separations are actually false negatives. We look at how many Google DNS groups are reported as either a single RTT-cluster or a single TTL-cluster. We find that besides the 82% of Google DNS groups that are both a single RTT-cluster and a single TTL-cluster, an additional 10% of Google DNS groups are each in a single RTT-cluster and another 4% are each in a single TTL-cluster. As a result, for 14% of Google DNS groups we have some evidence that they should each be in a single FE Cluster, and for the other 82% we have strong evidence, leaving only 4% of Google DNS groups inconsistent with our clustering results. This result further strengthens our confidence that most Google DNS groups are each in a single FE Cluster.

To sum up, because airport codes in reverse DNS name patterns are consistent with Google DNS groups, and our clustering results also suggest most Google DNS groups should each be in a single cluster, we believe all IPs in the same Google DNS group are in the same FE Cluster.
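The Table 4.7 tally can be reproduced with a short sketch (ours; all names are illustrative), under the assumption that each clustering is available as a map from IP address to cluster id:

    from collections import Counter

    def classify_group(group_ips, ip_to_cluster):
        """Classify one DNS group. ip_to_cluster maps an IP to its cluster
        id, or to None if the IP was left unclustered."""
        clusters = {ip_to_cluster.get(ip) for ip in group_ips} - {None}
        if not clusters:
            return 'not clustered'
        return 'single cluster' if len(clusters) == 1 else 'multiple clusters'

    def crosstab(dns_groups, ttl_map, rtt_map):
        """Tally (TTL-cluster class, RTT-cluster class) pairs, as in Table 4.7."""
        return Counter((classify_group(g, ttl_map), classify_group(g, rtt_map))
                       for g in dns_groups)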
4.4.2 Does Discarding Non-clustered IPs Affect Our Results?

In this section, we examine whether discarding unclustered IPs (§4.3.2.3) substantially affects our results. To understand the influence of these unclustered IPs, we characterize how many prefixes were mapped to them and how long those prefixes stay there. To characterize how long prefixes stay on unclustered Front-End IPs, for each prefix we calculate the fraction of observations in which the prefix is mapped to unclustered IPs.

                                            Google          Akamai-Apple
Total prefixes                              32,871 [100%]   29,535 [100%]
Prefixes mapped to unclustered IPs           5,458 [17%]     2,479 [8%]
Prefixes never mapped to unclustered IPs    27,413 [83%]    27,056 [92%]

Table 4.8: Number and percentage of prefixes mapped to unclustered Front-End IPs.

Table 4.8 shows how many prefixes were mapped to unclustered Front-End server IPs. We can see that only a modest fraction of prefixes (about 17% for Google and 8% for Akamai) were mapped to unclustered Front-End IPs. In addition, Figure 4.4 shows the CDF of the fraction of time prefixes spend at unclustered Front-End IPs. We can see that only 9% of prefixes are mapped to Google unclustered IPs more than 20% of the time, and for Akamai, only 1% of prefixes are mapped to unclustered IPs more than 20% of the time. Thus, we conclude that discarding unclustered IPs will not affect our results much, because only a small number of prefixes are mapped to unclustered Front-End IPs and most prefixes only stay there for a short time.

Figure 4.4: CDF of fraction of time prefixes stay at unclustered Front-End IPs. Dataset: Google-15min-EDNS and Akamai-Apple-15min-ODNS.

4.4.3 Are Prefixes Mapped Differently When Accessing Other TLDs of Google Search?

Google search is not only available at the google.com domain, but also under many other TLDs, such as www.google.de, www.google.com.hk, and many more. Since we study www.google.com, we want to know whether www.google.com is representative of the other domain names of the Google search service. To do so, we check whether user prefixes are mapped differently when accessing different TLDs of Google search.

To understand whether Google maps different domain names differently, we choose 9 other TLDs of the Google search service together with www.google.com as study targets, listed in Table 4.9.

www.google.com      www.google.co.jp
www.google.de       www.google.com.br
www.google.co.uk    www.google.fr
www.google.ru       www.google.it
www.google.es       www.google.com.hk

Table 4.9: Domain names of the Google search service we use to study whether different domain names have different user-to-FE Cluster mappings.

To learn whether users are mapped differently when accessing these domains, we use 148 PlanetLab (PL) nodes to make DNS queries to these 10 Google domains once (Google-TLD-validation dataset in Table 4.3). For each PlanetLab node, if all 10 domains are served by IP addresses from the same FE Cluster, then we know different domains use the same user mapping. If different domains are served by different FE Clusters for a certain PlanetLab node, then we collect the reverse DNS names of the Front-End IP addresses for further analysis. From the pattern of the reverse DNS names, we can get a sense of whether the Front-End IPs are from the same location or organization. For example, if we see the same airport code in the reverse DNS names, we know they are from FE Clusters in the same city.
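A minimal sketch (ours, not the dissertation's code) of the per-node check: resolve all ten domains from one vantage point and test whether the returned front-end IPs fall in a single FE Cluster. We use the standard library resolver (socket.gethostbyname_ex); ip_to_cluster is assumed to come from the clustering step:

    import socket

    DOMAINS = ['www.google.com', 'www.google.co.jp', 'www.google.de',
               'www.google.com.br', 'www.google.co.uk', 'www.google.fr',
               'www.google.ru', 'www.google.it', 'www.google.es',
               'www.google.com.hk']

    def clusters_seen(ip_to_cluster):
        """Return the set of FE Clusters serving the ten domains from here."""
        seen = set()
        for domain in DOMAINS:
            _, _, ips = socket.gethostbyname_ex(domain)
            seen.update(ip_to_cluster.get(ip, 'unclustered') for ip in ips)
        return seen  # one element => all ten domains use the same FE Cluster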
The result is shown in Table 4.10.

Total PL nodes                                    148    100%
All Google domains mapped to same FE Cluster      143    96.6%
Google domains mapped to two FE Clusters            5    3.4%
  Same airport code in reverse DNS names            2    1.3%
  No airport codes in reverse DNS names             3    2.1%

Table 4.10: Statistics of the number of PlanetLab nodes that see IPs all from one FE Cluster and from more than one FE Cluster. Dataset: Google-TLD-validation.

We can see that for most PlanetLab nodes (96.6%), all 10 Google domains are mapped to IPs from the same FE Cluster, suggesting that for most users different domains are mapped the same way. The 5 PlanetLab nodes that see multiple FE Clusters see at most two. Using reverse DNS names, we infer that 2 of these PlanetLab nodes observe two different PoPs in the same city, as they see IPs with the same airport code; these 2 nodes suggest that a small number of users are mapped to nearby but distinct FE Clusters for different domains. For the remaining 3 PlanetLab nodes, the reverse DNS names they see have no airport codes and the patterns of the reverse DNS names differ between the two FE Clusters, so we treat them as mapped differently to two FE Clusters.

To conclude, we can confirm that for most prefixes, Google maps different domain names of Google search to the same FE Cluster, so our results on www.google.com should be representative of the other TLDs that the Google search service uses.

4.4.4 How Often To Probe?

Most of our measurements make probes every 15 minutes. However, the DNS TTL intervals of the two CDNs are both smaller than 15 minutes, suggesting prefixes may see mapping changes more frequently than we can observe. As a result, we want to know how many mapping changes we miss because we use a probing interval larger than the DNS TTL. Here we examine our rapid probing data, PlanetLab-DNS-TTL, taken as described in §4.3.2.2, to understand the effects of probing frequency.

We evaluate the effects of probe frequency by collecting data as fast as necessary (the PlanetLab-DNS-TTL dataset) and downsampling this data to 15-minute intervals to represent our broad probing method. By "as fast as necessary", we mean taking data at the same time interval as the DNS TTL value (20 seconds for Akamai and 5 minutes for Google), because this is the shortest interval at which users can see mapping changes. Because of oversampling, we can downselect at different phases to get multiple possible 15-minute equivalents (3 for Google and 45 for Akamai).

Figure 4.5: Effects of probe frequency on all and unique observed mapping changes. (Dataset: PlanetLab-DNS-TTL)

Defining ground truth as the number of mapping changes present in PlanetLab-DNS-TTL, the "Google all" and "Akamai all" lines in Figure 4.5 compare how many changes we see in downsampled data for all 192 prefixes. Because we have multiple downsampled datasets, we report the median percentage each prefix sees (the minimum and maximum are similar to the median). We see that a few prefixes (18% for Google and 6-10% for Akamai) are stable, where 15-minute sampling always sees all changes. However, for many prefixes (80% with Google, and 90% with Akamai), a 15-minute sampling interval under-reports changes, capturing fewer changes than might be seen in practice.

While these numbers suggest undersampling, we believe 15-minute probing captures the correct "character" of the prefixes. We believe that the very frequent changes that we miss are due to load balancing that is computed dynamically on each DNS request. To test this hypothesis, we look at the number of unique switching pairs over our 7-day observation. Measuring unique switching pairs steps back from the effects of load balancing, where prefix p might swap from FE A to B and back each observation.
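The distinction between total switches and unique switching pairs can be made concrete with a short sketch (ours; mappings is the chronological list of FE Cluster ids seen by one prefix):

    def switch_stats(mappings):
        """Count total switches and distinct directed switching pairs."""
        switches = [(a, b) for a, b in zip(mappings, mappings[1:]) if a != b]
        return len(switches), len(set(switches))

    total, unique = switch_stats(['A', 'B', 'A', 'B', 'A'])
    print(total, unique)  # 4 switches, but only 2 unique pairs: (A,B) and (B,A)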
Hundreds of switches between A and B and back represent only two unique switching pairs, (A, B) and (B, A). The "Google unique" and "Akamai unique" lines in Figure 4.5 show the fraction of unique switching pairs we observe after downsampling to 15 minutes. For Google, a few prefixes show diversity in switches, but most are very stable (90%). Akamai has a much shorter TTL and shows more diversity in switching pairs. About half of our observations see 70% of possible changes.

We conclude that our observations using 15-minute sampling provide a lower bound on the total number of changes. For Google, we will see almost all switching pairs, but we will under-observe switches at the frequent switchers. For Akamai, we see most unique changes from most sites, but we miss some. Missing switches at frequent switchers means our absolute counts will be low, but we probe fast enough to classify prefixes into frequent or infrequent switchers, so this underestimate will not alter our per-prefix classification. Understanding the effects of our measurement frequency helps interpret our data below.

4.4.5 Accuracy of Geolocation With Open Resolvers

In this chapter, we adapt the existing CCG algorithm [CFH+13] to use measurements from open resolvers rather than EDNS-client-subnet DNS measurements from all routable prefixes (EDNS-CCG). We use open-resolver-CCG to geolocate Akamai since it does not support EDNS-client-subnet. CCG geolocates FE Clusters by averaging the locations of the prefixes they serve after aggressively removing prefixes clearly distant from the FE. Since we are changing the input to CCG from all routable prefixes (with EDNS-client-subnet) to only those prefixes in which we know of open resolvers, this section evaluates whether this smaller input affects CCG accuracy.

Figure 4.6: Comparison of open-resolver-CCG results with ground truth on Google FE Clusters with known locations (lower line), and of open-resolver-CCG results with EDNS-CCG (upper line).

Ideally we would evaluate the accuracy of open-resolver-CCG against the ground-truth locations of Akamai's FE Clusters. However, Akamai locations are not public. We therefore study EDNS-CCG and open-resolver-CCG over Google's FE Clusters, where both data sources are available, using Google to calibrate open-resolver-CCG for use on Akamai.

First we assess the accuracy of open-resolver-CCG on Google FE Clusters where we have confidence in their location. We start with the 158 Google FE Clusters that have airport codes in their reverse-DNS names, taking these airports as ground truth. The lower line in Figure 4.6 shows the CDF of the distance between open-resolver-CCG results and airport-based ground truth. We see that open-resolver-CCG generally has good performance, with 90% of results within 500 km. However, we see that 14 FE Clusters' open-resolver-CCG results are off by 500 km or more, with one each at 8,430 km and 13,084 km. We believe these two largest outliers have incorrect reverse-DNS names.
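Distances such as those in Figure 4.6 (and the switching-pair distances discussed in §4.6) are great-circle distances between two coordinates; they can be computed, for example, with the haversine formula (our choice of formula; the dissertation does not specify one):

    import math

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance in km between two (lat, lon) points in degrees."""
        r = 6371.0  # mean Earth radius, km
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    print(round(haversine_km(34.05, -118.24, 48.86, 2.35)))  # LA to Paris, ~9100 km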
To broaden coverage beyond Google FE Clusters with airport codes, we next compare open-resolver-CCG with EDNS-CCG for all routable prefixes. With this comparison, we get a sense of whether open resolver prefixes are representative of all prefixes. The upper line (routable /24s) in Figure 4.6 shows the CDF of distances between open-resolver-CCG results and the results of CCG using all routable prefixes. We see that the evaluation of open-resolver-CCG over all /24s is even closer than the comparison with airport codes only: 50% of FE Clusters are within 50 km and 90% within 500 km. These two comparisons show that open-resolver-CCG has similar accuracy to EDNS-CCG.

4.5 Dynamics of User Redirection

Before we look at the impact of prefix-Front-End Cluster mapping on users (§4.6), we examine mapping changes overall and per-user, allowing us to begin to understand differences in their redirection dynamics.

4.5.1 Are user prefixes mapped to different FE Clusters?

We first examine how many mapping changes and how many FE Clusters each user prefix observes in the course of one month. Figure 4.7 shows the cumulative distribution. We see that 20% and 70% of prefixes observe more than 60 mapping changes ((A) and (B) in Figure 4.7) in a month (an average of two a day) for Google and Akamai, respectively, suggesting mapping changes are common for many prefixes. (The number of changes we report here is much smaller than in prior work [SKB06] because we report the changes between clusters, not just IP addresses.) In addition, we see that most user prefixes have fairly stable mappings for Google, with 92% of them being mapped to at most 4 FE Clusters ((C) in Figure 4.7). Akamai user prefixes seem to experience more variation, with only around 40% being mapped to 4 FE Clusters or fewer and 14% being mapped to 20 or more FE Clusters ((D) and (E) in Figure 4.7). This analysis shows that mapping changes are common, with some users changing frequently and most occasionally.

Figure 4.7: Number of different FE Clusters and number of total mapping changes prefixes observe in one month for Google and Akamai. Dataset: Google-15min-EDNS, Akamai-Apple-15min-ODNS and Akamai-Huff-15min-ODNS.

We also look at the rate of the mapping changes to understand how many changes happen during each pair of observations. Figure 4.8 shows how many prefix-FE Cluster mapping changes happen at each observation time, with the figure showing the number of prefixes that the CDN directs to a different FE than in the previous round. The number of mapping changes varies for both CDNs, but both show that change happens between every observation. For Google, about 4% of prefixes change at each observation time (1,200 out of 32k prefixes), while for Akamai about 11% change (3,500 out of 32k). We believe this steady rate of change corresponds with load balancing.

Figure 4.8: The fraction of prefixes that see prefix-FE Cluster mapping changes at each observation time (every 15 minutes); panels (a) Google, (b) Akamai-Apple, and (c) Akamai-Huff. (Dataset: Google-15min-EDNS, Akamai-Apple-15min-ODNS and Akamai-Huff-15min-ODNS)

Both Akamai and Google have variation above a relatively steady baseline of changes. Google shows large peaks, sometimes 9× the baseline, while Akamai's variation is much smaller, with peaks only 3× the baseline. We will explain later (§4.7) that the large Google peaks are due to a drain of FE Clusters.
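A sketch (ours) of the per-round computation behind Figure 4.8, assuming each observation round is available as a map from prefix to FE Cluster id:

    def change_fractions(rounds):
        """For each pair of consecutive rounds, the fraction of prefixes
        directed to a different FE Cluster than in the previous round."""
        fractions = []
        for prev, cur in zip(rounds, rounds[1:]):
            changed = sum(1 for p in cur if p in prev and cur[p] != prev[p])
            fractions.append(changed / len(cur))
        return fractions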
4.5.2 Duration of User-to-FE Mapping

While we know that many prefixes change prefix-FE Cluster mappings between certain observation times, those aggregate statistics may hide how frequently each prefix changes. That is, is everyone always changing, or do some prefixes change frequently while others are stable? We next look at this question in several ways.

4.5.2.1 How long are prefixes stable?

We first consider how long a given prefix-FE Cluster mapping lasts. To measure this value, we take a certain observation time t and measure how long each mapping at that instant is unchanged. For example, if prefix p is mapped to FE A at time t, we look backwards in time to $t_s$, when p was first mapped to FE A, then forward to time $t_e$, when the prefix was mapped to some other FE, and report $D_p(t) = t_e - t_s$.

Given this definition, we look at a randomly selected time and plot the CDF of $D_p(t)$ for all prefixes at that time. (We reviewed data for two random times, but report only one because the other is similar.)

Figure 4.9 shows the stability of all prefixes for both CDNs at the randomly selected observation time. We see that prefix stability varies greatly (by a factor of 1000). Stability in Google's CDN seems somewhat bimodal, with most prefixes (80%) stable for more than 24 hours. Akamai, by comparison, shows a smooth distribution of prefix-FE Cluster mapping stability. Overall, Akamai prefix-FE Cluster mappings are much less stable, with only 34% lasting more than 24 hours.

Figure 4.9: Duration of prefix-FE Cluster mapping. (Dataset: Google-15min-EDNS, Akamai-Apple-15min-ODNS and Akamai-Huff-15min-ODNS)

4.5.2.2 Are there stable prefixes?

The somewhat bimodal distribution of stability for Google leads us to investigate whether there are particular prefixes that are more stable than others across time. To characterize prefix stability, Figure 4.10 shows the cumulative fraction of prefixes that experience prefix-FE Cluster mapping changes over one month in our datasets. Almost all prefixes (96%) see mapping changes over the month for Akamai, while a significant fraction (22%) of prefixes are stable for Google. We conclude that most prefixes eventually see a FE change, but some prefixes see frequent changes while others are much more stable.

4.5.2.3 Are there frequent switchers?

The opposite of stable prefixes are frequent switchers. We next look at this subset of prefixes.

Figure 4.10: Cumulative portion of users who have ever experienced prefix-FE Cluster mapping changes. Dataset: Google-15min-EDNS and Akamai-Apple-15min-ODNS.

To characterize the stability of a prefix, we compute the mean of $D_p$ (§4.5.2.1), $D^m_p$, over all prefix-FE Cluster mappings seen by that prefix during the whole month of observation. We then plot the CDF of $D^m_p$ over all prefixes; a sketch of this computation appears below.
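A minimal sketch (ours) of the run lengths behind $D_p$ and the per-prefix mean $D^m_p$, assuming observations are taken at a fixed 15-minute interval:

    def durations(observations, interval_hours=0.25):
        """Lengths, in hours, of each maximal run of an unchanged mapping.
        observations is the chronological list of FE Cluster ids for one prefix."""
        runs, run_len = [], 1
        for prev, cur in zip(observations, observations[1:]):
            if cur == prev:
                run_len += 1
            else:
                runs.append(run_len * interval_hours)
                run_len = 1
        runs.append(run_len * interval_hours)
        return runs

    def mean_duration(observations):
        runs = durations(observations)
        return sum(runs) / len(runs)  # D^m_p for this prefix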
Figure 4.11 shows the CDF of $D^m_p$ for all prefixes. This graph shows that some prefixes retain their mapping for days, while others change every hour or more frequently. The mean mapping for 8.5% of prefixes for Google and 19% of prefixes for Akamai lasts less than 1 hour. At the same time, 78% of prefixes for Google and 15-17% of prefixes for Akamai are stable for 12 hours or more. This data suggests that some prefixes see frequent mapping changes, while others change only occasionally.

Figure 4.11: CDF of mean prefix-FE Cluster mapping duration ($D^m_p$) over all prefixes. (Dataset: Google-15min-EDNS, Akamai-Apple-15min-ODNS and Akamai-Huff-15min-ODNS)

4.5.3 Is There a Primary FE Cluster for Each Prefix?

We have shown that many prefixes are mapped to different FE Clusters, and some of them experience frequent mapping changes. However, as CDNs tend to direct users to FE Clusters that have low access latency, we expect that most user prefixes will have a primary FE Cluster to which they are mapped most of the time. We show in §4.7.2 that this kind of analysis helps us identify mapping changes that are potentially caused by load balancing.

To identify the primary FE Cluster for a particular prefix, we first find all FE Clusters that this prefix has been mapped to, and then identify the FE Cluster that has the largest time fraction. We calculate the time fraction of each FE Cluster by counting the number of observations in which the prefix is mapped to that FE Cluster and dividing it by the total number of observations in our dataset.

Figure 4.12 shows the CCDF of the time fraction that prefixes are mapped to their primary FE Cluster. We see that 80% of Google user prefixes are mapped to their primary FE Cluster more than 60% of the time. For Akamai, 60% of prefixes are mapped to their primary FE Cluster more than 60% of the time. This confirms our assumption that most prefixes have a primary FE Cluster and are mapped to it most of the time.

Figure 4.12: CCDF of time fraction that prefixes are mapped to their primary FE Cluster.
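A sketch (ours) of identifying the primary FE Cluster and its time fraction for one prefix:

    from collections import Counter

    def primary_cluster(observations):
        """observations: chronological list of FE Cluster ids for one prefix.
        Returns the most frequently seen FE Cluster and its time fraction."""
        counts = Counter(observations)
        cluster, n = counts.most_common(1)[0]
        return cluster, n / len(observations)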
4.6 Impacts of User Redirection

We next consider how changes to prefix-FE Cluster mappings affect users. We begin by considering how far a prefix's FE changes: if prefix p switches from FE Cluster A to B, what is the distance between A and B? Distance does not directly measure user performance, so we then measure application-level performance on the subset of prefixes where we can. Finally, we consider the geographic footprint of FE Clusters per prefix to understand how mapping affects the legal jurisdiction of data storage.

Figure 4.13: CDF of distance between switching pairs over all prefixes after a random observation time t for Google and Akamai.

4.6.1 Distances of Mapping Changes

We next examine the distance between the FE Clusters that users switch between. We expect that a user would see little latency change when switched between nearby FE Clusters, while mapping changes between very distant FE Clusters are more likely to lead to a large latency change. Unless the client is equidistant between the old and new FE Clusters, a large change in FE distance therefore suggests a non-optimal choice of FE.

We measure the distance between the switching pair of a prefix-FE Cluster mapping change. We randomly choose an observation time t, then find the switching pair of the next mapping change (A, B) for each prefix after time t. We then plot the CDF of the distance between A and B over all prefixes. We see nearly identical distributions after three trials and so report one case as representative.

Figure 4.14: CDF of maximum distance of switching pair seen in one month over all prefixes.

Figure 4.13 shows the CDF of the distance between the switching pair for all prefixes at one randomly chosen observation time for Google and Akamai. While some prefixes switch between FE Clusters that are near each other (about 26-33% are within 100 km), many prefixes change between FE Clusters that are far apart. More than 50% of Google changes and 30% of Akamai changes move between switching pairs more than 1000 km apart.

Long distance remapping: When measured at a random time, we see that many prefixes change between FE Clusters that are distant from each other. We next consider this question for every time over a month. Figure 4.14 plots the distribution of the maximum distance of switching pairs seen by every prefix in one month. Many prefixes experience long-distance changes. For example, 50% of prefixes switch between Google FE Clusters that are at least 1000 km apart, and 60-70% experience such a switch for Akamai servers. Figure 4.15 shows the distribution of the number of times prefixes experience large-distance switching pairs. We see that a few Google prefixes (9%) and many Akamai prefixes (40-50%) move large distances (at least 1000 km) more than 10 times in a single month, suggesting it is not rare for these long-distance re-mappings to happen. In §4.7 we explore reasons why these changes may occur.

Figure 4.15: CDF of the number of times each prefix sees large distance mapping changes in a month.

4.6.2 Effects of Mapping Changes on Users

4.6.2.1 Larger Distance Leads to Larger Latency

While §4.6.1 showed that users are sometimes mapped to FE Clusters in very different places, it does not directly measure performance. While a prefix equidistant between two FE Clusters may see similar performance from both, in most cases we expect that a prefix that is redirected to a very different place will see different user-visible performance.

Here we study measurements taken from 192 prefixes hosting PlanetLab sites, since evaluating user performance requires measurements taken from inside each prefix. Although these sites are only a small subset, we verified that they are generally representative of our measurements with 32,871 prefixes (§4.6.2.3).

We assess user performance by measuring network latency and application performance. We measure network and application latencies every DNS TTL, and also immediately after we observe that a prefix p has changed its mapping from FE Cluster A to B (prior work measured latency [SKB06, TWR11], but not around mapping changes). We measure network latency with ICMP echo requests (ping), observing $RTT_{p,A}$ and $RTT_{p,B}$. We measure application latency by fetching a web page to observe $PFT_{p,A}$ and $PFT_{p,B}$. To avoid noise in individual observations, each observation uses two pings and one page fetch, and analysis uses the second smallest of the 10 most recent observations.
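The smoothing rule just described can be maintained incrementally; a small sketch (ours) using a 10-observation window:

    from collections import deque

    class LatencyTracker:
        def __init__(self, window=10):
            self.recent = deque(maxlen=window)  # keeps only the last `window` samples

        def add(self, sample_ms):
            self.recent.append(sample_ms)

        def value(self):
            """Second smallest of the recent observations (smallest if only one)."""
            ordered = sorted(self.recent)
            return ordered[1] if len(ordered) >= 2 else ordered[0]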
For Google we fetch a 75 kB web page corresponding to a search for "USA" (http://www.google.com/search?q=USA). For Akamai we fetch the 9.5 kB home page of Apple (http://www.apple.com). We then evaluate the absolute value of the difference of these metrics: $RTT_{p,A,B} = |RTT_{p,A} - RTT_{p,B}|$ and $PFT_{p,A,B} = |PFT_{p,A} - PFT_{p,B}|$. We use the absolute value to judge overall changes, since our data shows that at steady state, mapping changes generally alternate between nearer and farther FE Clusters.

For each prefix, we evaluate all mapping changes over the entire measurement period, giving a set of observations of many $RTT_{p,A,B}$ and $PFT_{p,A,B}$. Since changes are generally symmetric, we merge the (A, B) and (B, A) directions and take the median value of all observations to get $RTT^m_{p,A,B}$ and $PFT^m_{p,A,B}$. Finally, to understand whether large-distance switches affect performance, we divide observations into distant switches, where A and B are 1000 km apart or more, and near switches, where they are less than 1000 km apart. We then plot the CDF of $RTT^m$ and $PFT^m$ for each group.

Figure 4.16: Prefix-FE latency changes after a mapping change, measured by RTT (dashes) and page fetch time (solid), for (a) Google and (b) Akamai. Left lines are near switches, right lines are far switches. (Dataset: PlanetLab-DNS-TTL)

Figure 4.17: Correlation of latency changes and distances of switching pairs for Google, for (a) RTT and (b) page fetch time. We omit a small number of data points that are more than 12k km, or more than 350 ms for RTT and 1200 ms for page fetch time, but the best-fit line is generated with all data points. (Dataset: PlanetLab-DNS-TTL)

Figure 4.18: Correlation of latency changes and distances of switching pairs for Akamai, for (a) RTT and (b) page fetch time. We omit a small number of data points that are more than 12k km, or more than 350 ms for RTT and 1200 ms for page fetch time, but the best-fit line is generated with all data points. (Dataset: PlanetLab-DNS-TTL)

Figure 4.16 shows results for Google and Akamai. We first see that switches between distant FE Clusters (the wider, right-most lines) show much greater performance changes than switches between nearby ones (the thinner, left lines). For Google, near switches show smaller performance changes ($RTT^m$ < 50 ms and $PFT^m$ < 150 ms), while in the far-switch group, more than 40% have changes more than twice that ($RTT^m$ > 100 ms and $PFT^m$ > 400 ms). The results for Akamai are similar, with only 4% of near switches showing $RTT^m$ > 100 ms, while the number is 25% for far switches.
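A sketch (ours) of the per-pair summary used above: merge the two directions of a switching pair, take the median latency difference, and split pairs at the 1000 km threshold. The distance table keyed by sorted pairs is our assumption:

    from statistics import median

    def summarize(observations, distances, threshold_km=1000):
        """observations: dict (A, B) -> list of |RTT_A - RTT_B| samples;
        distances: dict keyed by sorted (A, B) -> km between the FE Clusters."""
        merged = {}
        for (a, b), samples in observations.items():
            key = tuple(sorted((a, b)))  # merge the (A, B) and (B, A) directions
            merged.setdefault(key, []).extend(samples)
        near, far = [], []
        for pair, samples in merged.items():
            (far if distances[pair] >= threshold_km else near).append(median(samples))
        return near, far  # median latency change per near / far switching pair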
The above analysis separates the mapping changes into two categories with a specified threshold (1000 km) and looks at the distribution of each category. To show whether large distances of switching pairs lead to large latency changes without a specified threshold, Figure 4.17 and Figure 4.18 show the relationship between latency changes and distances of switching pairs. The bins in the graph represent how many switching pairs have the corresponding latency difference and distance. To make the figure more visible, we adjust the scale of the density and use the same color for bins that have at least 5 (≥5) switching pairs (1% of bins are ≥5 for Google and 7% for Akamai). We also compute a linear fit for the data and show this best-fit line on the graph.

We see that for Google, the relation is clear: larger distances lead to larger changes in access latency. Interestingly, we see that the distance of switching pairs and the latency differences show a possible linear relation, with correlation coefficients of 0.96 and 0.89 for the RTT and page fetch time data, respectively. The possible linear relation not only shows that larger distances lead to larger latency changes, but also suggests that user prefixes are usually close to one of the Google FE Clusters. As shown in Figure 4.19, user prefix $p_1$ experiences a mapping change from FE Cluster A to FE Cluster B. The distance between A and B is $d_{A,B}$ (3,000 km). We believe the possible linear relation reflects the relation of distance and latency, specifically of $d_{A,B}$ and $RTT_{A,B}$ (for example, 50 ms).

Figure 4.19: Examples of relations of distance of switching pairs and latency changes.

However, in Figure 4.17 the latency is not the direct latency from A to B, but the difference of the latencies measured from a vantage point inside the prefix to each FE Cluster (that is, $RTT_{p_1,A,B} = |61 - 9| = 52$ ms taken from prefix $p_1$). In order for $RTT_{p_1,A,B}$ to be linearly related to $d_{A,B}$ in a similar way as $RTT_{A,B}$, $RTT_{p_1,A,B}$ should be similar to $RTT_{A,B}$; in our example, 52 ms is close to 50 ms. $RTT_{p_1,A,B}$ will be similar to $RTT_{A,B}$ when the prefix is close to one FE Cluster or the other. Thus, we believe the possible linear relationship suggests that $RTT_{p,A,B}$ is close to $RTT_{A,B}$, which happens when, as in Figure 4.19, $p_1$ is close to one of the FE Clusters (B in the case of Figure 4.19). Geolocation errors may also affect our results, but they are not likely to be the primary cause of the possible linear relationship. We leave further investigation of this possible linear relationship between latency differences and distances of switching pairs to future work.

For Akamai, the data also shows that a large switching-pair distance is more likely to cause a large latency difference. We can see from the figure that when the distances of switching pairs are small (<1000 km), small latency differences are dominant, and when the distances of switching pairs are large (>1000 km), large latency differences are dominant. We also see that many switching pairs are outside the diagonal, suggesting there are various combinations of latency differences and distances. We next discuss the potential reasons for these various combinations.

We compute and plot a best-fit line, but that line matches the data poorly: the correlation coefficient is 0.44 for RTT and 0.16 for page fetch time.
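The best-fit line and correlation coefficient can be computed directly; a minimal sketch (ours, using numpy, with one entry per switching pair in each array):

    import numpy as np

    def fit_and_correlate(distances_km, latency_diffs_ms):
        x = np.asarray(distances_km, dtype=float)
        y = np.asarray(latency_diffs_ms, dtype=float)
        slope, intercept = np.polyfit(x, y, 1)  # least-squares best-fit line
        r = np.corrcoef(x, y)[0, 1]             # Pearson correlation coefficient
        return slope, intercept, r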
For Akamai, some large switching-pair distances are associated with small latency differences (bins below the best-fit line, such as bin X in Figure 4.18a, with a 10,000 km distance but only about a 20 ms latency change), and some small switching-pair distances are associated with large latency differences (bins above the best-fit line, such as bin Y in Figure 4.18a, with a distance of a few hundred kilometers but about a 260 ms latency change). We next provide our hypotheses for these two situations.

First, cases like bin X in Figure 4.18a, where a large switching-pair distance is associated with a small latency change, suggest that the prefix is at a location that has similar distances to both FE Clusters. An example is shown in Figure 4.19: the distance between FE Clusters C and D is 12,000 km ($d_{C,D} = 12000$). The latency from user prefix $p_2$ to C is about 120 ms, and the latency from $p_2$ to D is about 140 ms. Thus, $RTT_{p_2,C,D}$ is only 20 ms while $d_{C,D}$ is very large, 12,000 km. In a situation like this, where the user prefix has similar distances to the two FE Clusters, the latency difference will likely be small no matter what the distance between the two FE Clusters is.

Second, cases like bin Y in Figure 4.18a, where a small switching-pair distance is associated with a large latency change, suggest that the network path from the prefix to one of the FE Clusters is congested, or that the path is much longer than the path to the other FE Cluster. For example, in Figure 4.19, $p_1$ experiences a mapping change between FE Cluster B and FE Cluster E, where the latency from $p_1$ to B is only 9 ms but the latency from $p_1$ to E is more than 100 ms, while B and E are very close to each other ($d_{B,E} = 100$ km). In this situation, though the two FE Clusters are close to each other ($d_{B,E}$ is small), $RTT_{p_1,B,E}$ will still be large. Besides these two hypotheses, we believe network congestion can be a reason for bins at any place in the figure (above the best-fit line, on the line, and below it), and so can geolocation errors. Investigation of the details of Akamai is future work.

To summarize, prefixes that switch between FE Clusters that are far apart tend to also observe large network and page-fetch latency changes.

4.6.2.2 How Long Do Prefixes Stay On Non-Optimal FE Clusters?

Fortunately, we next show that switches that increase user latency are usually brief for most prefixes. We analyze our PlanetLab data to see what fraction of time user prefixes spend in a mapping that has large latency (for this subset of data). We focus on distant switching pairs, those with distance larger than 1000 km, and of these, those with large differences in page-fetch times ($PFT^m$ > 100 ms). The resulting subset contains all prefixes with large-distance switches that raise application latency. Finally, we look at how long each prefix remained at the larger-latency FE Cluster, computing the fraction of observations the prefix spent there.

Figure 4.20 shows the CDF of the fraction of time user prefixes spend on FE Clusters with large latency (where page-fetch time is 100 ms worse than in the prior mapping). Most of these FE Clusters are used only briefly (97% of Google and 93% of Akamai prefixes spend less than 5% of their time at FE Clusters with high application latency). But the tail is long, with 2% of Google and 5% of Akamai prefixes spending more than 60% of their time on distant FE Clusters and seeing higher application latencies, even though lower-latency FE Clusters exist.
Figure 4.20: CDF of fraction of time user prefixes spend on a FE Cluster with large latency (where page-fetch time is 100 ms worse than in the prior mapping).

4.6.2.3 PlanetLab Data Representativeness

In previous sections we have shown that mapping changes between FE Clusters that are far from each other tend to lead to large access latency, and that a few user prefixes are mapped to large-latency FE Clusters most of the time. These performance measurements must be taken from each vantage point, so these results are available only for the 192 prefixes in PlanetLab. The open resolver data covers 32k prefixes. Are the measurements from PlanetLab representative of those from more prefixes, or of the whole Internet?

For comparison, we pick two matching weeks from our datasets with open resolvers (Google-15min-EDNS and Akamai-15min-ODNS) and PlanetLab (PlanetLab-DNS-TTL). As we want to understand the latency effects of switches between distant pairs, we use two distance-related metrics to study the representativeness of the PlanetLab data: first, the maximum distance (between before and after FE Clusters) over all FE Cluster changes for each prefix (Figure 4.21), and second, the number of times a prefix sees mapping changes between distant (more than 1000 km) switching pairs (Figure 4.22). We focus our analysis on the set of prefixes that have mapping changes, and exclude those that never changed.

Figure 4.21: CDF of maximum distance of switching pair seen by PlanetLab prefixes and open resolver prefixes, for those prefixes that see multiple FE Clusters in one week, with and without the mapping changes caused by the outlier Bulgaria Google FE Cluster. Dataset: PlanetLab-DNS-TTL, Google-15min-EDNS, Akamai-15min-ODNS.

Before discussing the data we need to remove one outlying cluster: we find that for Google, one cluster in Bulgaria attracts 26 U.S. academic sites (14% of all PlanetLab sites we use), skewing the PlanetLab measurements. This cluster causes PlanetLab to misrepresent typical behavior, as we confirm in the lines with and without this cluster in Figure 4.21 and Figure 4.22. Since this skew is due to the interaction of a single cluster with our academic-based observers, we remove it from the remaining analysis to support comparing PlanetLab with our general Internet data. In addition, we confirm that removing this Google FE Cluster doesn't affect the latency results in Figures 4.16 and 4.20.

Figure 4.22: CDF of number of times PlanetLab and open resolver prefixes see large distance mapping changes in one week, with and without the mapping changes caused by the outlier Bulgaria Google FE Cluster. Datasets: PlanetLab-DNS-TTL, Google-15min-EDNS, Akamai-15min-ODNS.

We see that the two datasets show only a small difference in the distribution of the maximum distance of switching pairs seen by prefixes.
Figure 4.21 shows the results. For Akamai, the largest difference between the two datasets happens at 1,800 km (on the x axis), where 69% of open resolver prefixes have their maximum switching-pair distance less than 1,800 km, while 82% of PlanetLab prefixes do, a difference of 13% of prefixes. For Google, the largest difference between the two datasets happens at around 3,000 km (on the x axis), where 81% of open resolver prefixes have their maximum switching-pair distance less than 3,000 km, while 91% of PlanetLab prefixes do, a difference of 10% of prefixes. We believe that a difference of 10-13% of prefixes on a CDF is not large enough to affect the representativeness of the PlanetLab data.

Then, we see that the difference between the two datasets in the distribution of the number of times a prefix sees mapping changes between distant FE Clusters is also not large. Figure 4.22 shows the results. For Akamai, the largest difference between the two datasets happens at 9 mapping changes (on the x axis), where 66% of open resolver prefixes see mapping changes between distant FE Clusters 9 or fewer times, while 80% of PlanetLab prefixes do, a difference of 14% of prefixes. For Google, the largest difference happens at 6 mapping changes, where 90% of open resolver prefixes see mapping changes between distant FE Clusters 6 or fewer times, while 75% of PlanetLab prefixes do, a difference of 15% of prefixes. Again, we believe a maximum difference of 14-15% of prefixes on a CDF does not affect the representativeness of the PlanetLab dataset.

To sum up, we showed that PlanetLab is generally representative of prefixes across the open resolver datasets, since the CDFs are close after the elimination of one outlier for Google.

4.6.3 The Geographic Footprint Seen by User Prefixes

Prefix-FE Cluster mapping changes are sometimes across long distances, suggesting that users may see FE Clusters in different countries. (We use the term "country" generically, sometimes considering smaller or larger regions.) For some users, traffic leaving a given country may raise concerns about privacy or legal jurisdiction. We next show that some prefixes in many countries are often mapped abroad. We are aware that the accuracy of geolocation may not provide enough resolution to distinguish small countries, but we see prefixes from some countries mapped to very distant other countries, which should not be affected by geolocation error. In addition, we manually check (by looking at traceroute, reverse DNS names, and whois information) some results we report that are among countries close to each other, to assess the correctness of our claims.

First, we assess how many countries each prefix is mapped to over the course of a month. Figure 4.24 shows this distribution. We see that more than half of prefixes are mapped to different countries over time (50% for Google, and 60-70% for Akamai). It is common for a user to be served from multiple countries. We caution that this result reflects two biases in our data: first, our prefix selection under-represents prefixes that are served directly from the provider, as described in §4.3.2.2.
Second, because of cluster drain (§4.7), we expect many prefixes to shift from off-net FE Clusters, in many countries, to on-net FE Clusters, in only a few countries.

We next consider from where prefixes are served. For each service we select the 10 countries that originate the most user prefixes in our data, then identify from where they are served. (We exclude prefixes (27% of Google and 23% of Akamai) that are never served domestically, on the assumption that they have no local option or that our geolocation is wrong.) For each country we consider two questions: what portion of prefixes leave the country, and where does their traffic go?

Table 4.11 shows the results for Google and two Akamai-hosted websites. (The top countries differ because the CDNs are different, and our Akamai target (www.apple.com) is not served from Russia.) For each country, the first column shows how many of that country's prefixes are sometimes mapped outside its borders. The following three columns show which other countries most often provide service. We see that all but the U.S. have many non-domestic mappings: around 50% of user prefixes for Google and 90% for Akamai. We see that Google often serves from the U.S., Belgium, and the Netherlands, perhaps because those countries have good connectivity and host Google datacenters [Rob13].

(a) Google
source                non-domestic %   1st        2nd        3rd
us (United States)    11%              be (4%)    nl (4%)    de (3%)
kr (S. Korea)         97%              jp (58%)   us (19%)   cn (18%)
ru (Russia)           99%              us (35%)   be (6%)    nl (5%)
jp (Japan)            55%              us (30%)   nl (9%)    be (7%)
br (Brazil)           48%              nl (18%)   be (17%)   us (14%)
tw (Taiwan)           45%              us (24%)   be (9%)    nl (9%)
cn (China)            51%              us (27%)   nl (11%)   be (11%)
it (Italy)            60%              us (40%)   de (19%)   fr (5%)
gb (U. Kingdom)       54%              us (40%)   nl (19%)   be (8%)
au (Australia)        52%              us (24%)   nl (18%)   be (11%)

(b) Akamai-Apple
source                non-domestic %   1st        2nd        3rd
us (United States)    12%              ca (5%)    mx (3%)    gb (1%)
kr (S. Korea)         93%              jp (84%)   cn (67%)   us (28%)
jp (Japan)            64%              us (51%)   cn (37%)   nl (4%)
br (Brazil)           94%              us (93%)   cl (14%)   de (12%)
tw (Taiwan)           91%              cn (90%)   jp (70%)   us (32%)
cn (China)            99%              jp (98%)   us (89%)   ch (78%)
gb (U. Kingdom)       98%              be (77%)   de (73%)   se (62%)
ca (Canada)           99%              us (99%)   mx (5%)    gb (3%)
hk (Hong Kong)        15%              cn (12%)   id (5%)    us (5%)
th (Thailand)         94%              my (80%)   id (78%)   cn (74%)

(c) Akamai-Huff
source                non-domestic %   1st        2nd        3rd
us (United States)    98%              ca (38%)   gb (27%)   fr (27%)
kr (S. Korea)         99%              tw (99%)   jp (6%)    nl (3%)
ru (Russia)           96%              se (74%)   no (43%)   de (40%)
jp (Japan)            100%             cn (92%)   us (67%)   vn (9%)
br (Brazil)           83%              us (78%)   cl (53%)   ar (35%)
tw (Taiwan)           99%              cn (74%)   us (72%)   vn (48%)
cn (China)            99%              jp (93%)   us (89%)   gb (67%)
hk (Hong Kong)        90%              cn (88%)   jp (25%)   vn (12%)
tr (Turkey)           91%              it (82%)   se (46%)   de (23%)
fr (France)           99%              pl (69%)   gb (57%)   es (56%)

Table 4.11: Top 10 source countries (with ISO country codes), their percentage of prefixes that had been mapped to FE Clusters in other countries, and the three non-domestic countries serving them.

For Akamai, we see that U.S. FE Clusters serve prefixes from other countries, perhaps because of good U.S. connectivity. Akamai traffic shows stronger geographic locality: for example, in the Akamai-Apple dataset, Thailand prefixes remain in Asia and U.K. prefixes in Europe, and in the Akamai-Huff dataset, French and Turkish prefixes remain in Europe and Hong Kong prefixes in Asia. Surprisingly, most Chinese prefixes are sent abroad. For this part of the analysis, we were able to manually check the correctness of most Asia-related results.
However, as the reports are mostly about geographic locality, we believe the analysis will not be affected by geolocation error.

Our measurements of both CDNs show that prefixes are often mapped outside their originating country. Countries that have expressed privacy concerns, such as Brazil [Edg13], or regions with strict privacy laws, such as the European Union, may find that traffic leaving their legal jurisdiction weakens their ability to implement some policies. In the Akamai-Apple data, 12% of Brazil prefixes are sent to Europe (Germany), while in Akamai-Huff, FE Clusters in Argentina take the place of the German FE Clusters, probably due to more FE Clusters serving the Huffington Post (we manually confirmed this result is correct, not geolocation error). In other cases, prefixes in some countries find services in others that have strict limits on domestic handling of some topics. Examples include South Korea and Japan receiving service from China (with limits on Chinese politics; geolocation correctness confirmed by manual check), and, in the Akamai-Apple data, Brazil being served from Germany (with limits on Nazi politics). While such issues may not be a concern for Apple's or the Huffington Post's home page, they may be for other services using these CDNs.

4.7 Reasons for Mapping Changes

We have shown that mapping changes are common. We next evaluate why they occur. Although we cannot categorize every change, we see three general reasons: FE Clusters draining and restoring, load balancing, and user-to-FE Cluster mapping reconfiguration. We cannot completely separate these categories without inside knowledge of each CDN. However, our external observations provide some evidence of each.

4.7.1 FE Clusters Drain and Restoration

CDNs sometimes drain some of their FE Clusters, assigning no user prefixes to them, in order to, for example, perform maintenance or troubleshoot problems. For example, Facebook recently drained an entire datacenter as part of an infrastructure stress test [Yev14].

As an example drain event, Figure 4.23 shows the number of active FE Clusters in Google over our Google-15min-EDNS dataset. We see a large drop around April 23rd (from 900 to 60 FE Clusters). Examination of the clusters before and after the drop shows that Google stopped directing clients to all FE Clusters not in Google ASes (the off-net FE Clusters). They restored broader service, then shut off-net FE Clusters again on April 28th.

Is the service down when FE Clusters drain? To understand what happens when we see FE Clusters drain, we next compare two datasets (Google-1day-EDNS and Google-HTTP-aliveness). We confirm the drain exists for all routable /24 prefixes, and demonstrate that the service remains operational even though DNS is not forwarding new clients to it. While the Google-1day-EDNS and Google-HTTP-aliveness datasets are only measured once a day, they have complete coverage of all routable /24 prefixes, which means their results generally reflect the truth for the whole Internet.

Figure 4.23: Total number of Google FE Clusters seen from all prefixes at each observation.
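Drain events like the April 23rd drop can be flagged mechanically; a sketch (ours, with an arbitrary drop threshold of our choosing):

    def drain_events(rounds, drop_factor=0.5):
        """rounds: chronological list of dicts prefix -> FE Cluster id.
        Flag rounds where the count of distinct active FE Clusters falls
        below drop_factor times the previous round's count."""
        counts = [len(set(r.values())) for r in rounds]
        return [(i, prev, cur)
                for i, (prev, cur) in enumerate(zip(counts, counts[1:]), start=1)
                if cur < prev * drop_factor]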
Figure 4.25 shows the number of Google FE Clusters observed by all routable /24 prefixes and the number of Google FE Clusters running web services. We first see that the drains on April 23rd and April 28th are confirmed to happen for all routable /24 prefixes. The figure also shows that despite the large drain in the number of FE Clusters observed using EDNS DNS queries, the number of FE Clusters that have running web services almost never changes. This means that even though Google did not direct users to many of their FE Clusters, almost all of their FE Clusters were able to provide web service. This kind of drain of the FE Clusters seen by user prefixes may be due to internal test or research purposes; for example, Facebook once shut an entire datacenter down to test resilience [Yev14].

Does service drain affect our analysis? We checked whether these drains bias our previous observations (§4.6.1 and §4.6.2). Our interest in understanding the effects of these large-scale drains stems from the fact that such drains do not happen often, so they may affect our results, which are supposed to report common behaviors. To understand whether these drains bias our study, we re-examined the distance user prefixes switched with and without the days on which all off-net FE Clusters drained. We first re-plot Figure 4.14 without the mapping changes caused by the FE Clusters draining and restoring on the three days, but the result remains almost the same (no visible difference from the Google line in Figure 4.14). We then re-plot Figure 4.15 the same way, and Figure 4.26 shows the comparison. We see that the difference is still very small: only 2% fewer prefixes see large-distance mapping changes after deleting those caused by FE Clusters draining and restoring. So we conclude that Google's FE Clusters draining and restoring on those three days does not much affect our results on large-distance mapping changes.

Figure 4.24: CDF of the number of different countries to which prefixes are mapped.

Figure 4.25: Total number of Google FE Clusters seen from all routable /24 prefixes every day, and the number of Google FE Clusters with running web service. Datasets: Google-1day-EDNS and Google-HTTP-aliveness.

                                             Google             Akamai
Total mapping changes                        3,580,061  100%    10,390,023  100%
Possible FE Cluster drain and restoration      318,987    9%       242,958    2%
Not FE Cluster drain and restoration         3,261,074   91%    10,147,065   98%

Table 4.12: Number and fraction of mapping changes that are due to potential drain and restore of FE Clusters in one month for Google and Akamai. Dataset: Google-15min-EDNS and Akamai-15min-ODNS.

Table 4.12 shows the total number and fraction of mapping changes we see in one month that are caused by off-net FE Cluster drain and restoration on the three days. We see that 91% and 98% of mapping changes for Google and Akamai, respectively, are confirmed not to be caused by drain and restoration of FE Clusters. This also suggests that drain and restoration of FE Clusters contributes only a small portion of mapping changes, so regular changes in mapping dominate our observations.

4.7.2 Load Balancing

We observe three patterns of behavior that we believe are due to load balancing of user prefixes across multiple FE Clusters.

Figure 4.26: Comparing the CDF of the number of times prefixes see large-distance mapping changes with and without FE Cluster drain/restore.
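Returning to the Table 4.12 accounting, a sketch (ours) of classifying each mapping change by whether it falls inside a known drain/restore window:

    def classify_changes(changes, drain_windows):
        """changes: list of (timestamp, prefix, old_fe, new_fe);
        drain_windows: list of (start, end) timestamps of drain/restore events."""
        possible_drain, other = [], []
        for change in changes:
            t = change[0]
            if any(start <= t <= end for start, end in drain_windows):
                possible_drain.append(change)
            else:
                other.append(change)
        return possible_drain, other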
Frequent Mapping Changes
We sometimes see some prefixes (about 10% for Google and 30% for Akamai) switch between two FE Clusters quite frequently (on average every hour). We sampled 10 prefixes from each of these groups, and for each prefix, both FE Clusters they switched between are close to each other (within 200 km). This behavior may indicate that the CDN is spreading load between FE Clusters at two different PoPs.

FE Cluster Diurnal Traffic Pattern
We see that a few Google FE Clusters (about 10 of 900) display diurnal patterns (as seen in spectral analysis [QHP14]) in the number of user prefixes mapped to them during a day, suggesting some load balancing due to changes in diurnal traffic patterns. An example FE Cluster in Mumbai, India is shown in Figure 4.27. We can see that the number of prefixes mapped to this FE Cluster shows a daily pattern: there are peaks and valleys every day, and the time interval between peaks seems stable at around one day.

Figure 4.27: A Google FE Cluster in Mumbai, India shows a diurnal pattern in the number of user prefixes mapped to it. Dataset: Google-15min-EDNS.

To confirm the diurnal pattern and identify the FE Clusters that show diurnal patterns, we use the discrete fast Fourier transform (FFT) to identify periodicity for each FE Cluster, as was done in [QHP14]. The input to the discrete FFT is the sequence of the number of prefixes mapped to the particular FE Cluster at each observation (every 15 minutes), and evidence of a diurnal traffic pattern is identified through an outstanding amplitude at the 24-hour or 12-hour period in the FFT output.

The resulting FFT amplitudes for the Mumbai FE Cluster are shown in Figure 4.28. For comparison, Figure 4.28a shows the result for the Mumbai FE Cluster, which has a visually periodic pattern in the number of prefixes at each observation, and Figure 4.28b shows another FE Cluster (California, United States) that does not have that visually periodic pattern. We can see that the Mumbai FE Cluster has its two largest amplitudes at 24 hours and 12 hours, while the California FE Cluster does not have outstanding amplitudes at 24 and 12 hours.

Figure 4.28: Amplitude of the FFT for (a) a FE Cluster showing a diurnal pattern and (b) a FE Cluster not showing a diurnal pattern. Dataset: Google-15min-EDNS.

Figure 4.29: CCDF of time fraction that prefixes are directed to their primary FE Cluster, for those prefixes whose primary FE Cluster is on-net and those whose primary FE Cluster is off-net.

Using the discrete FFT, we are able to identify 10 (out of 900) Google FE Clusters that show a diurnal pattern, suggesting there are mapping changes due to diurnal traffic dynamics.
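A minimal sketch (ours) of the FFT periodicity test on a per-cluster series of 15-minute prefix counts; the flagging threshold would be a parameter of our choosing, not the dissertation's:

    import numpy as np

    def diurnal_amplitudes(counts, interval_min=15):
        """FFT the (mean-removed) series and report the amplitudes at the
        24-hour and 12-hour periods plus the median amplitude for scale."""
        counts = np.asarray(counts, dtype=float)
        amps = np.abs(np.fft.rfft(counts - counts.mean()))
        freqs = np.fft.rfftfreq(len(counts), d=interval_min * 60)  # Hz
        periods_h = np.divide(1.0, freqs, out=np.zeros_like(freqs),
                              where=freqs > 0) / 3600
        def amp_at(hours):
            return amps[np.argmin(np.abs(periods_h - hours))]
        return amp_at(24), amp_at(12), np.median(amps)

A cluster would be flagged as diurnal when the 24-hour or 12-hour amplitude stands well above the median amplitude.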
Inferring from Primary FE Cluster Choices

We also expect some of Google's mapping changes to be load balancing, inferring this from how Google chooses primary and secondary FE Clusters for each prefix. Because off-net FE Clusters are mostly inside local ISPs' networks, and so are likely used to serve those ISPs' users, we expect them to have less capacity than on-net FE Clusters, making off-net FE Clusters more sensitive to load changes.

To test this hypothesis, we first look at the distribution of the fraction of time spent on on-net primary FE Clusters and on off-net primary FE Clusters, as shown in Figure 4.29. We see that prefixes with on-net primary FE Clusters generally spend more time on their primary FE Clusters than prefixes with off-net primary FE Clusters, suggesting that on-net FE Clusters may have more capacity and so are more stable.

We further categorize all prefixes based on whether their primary and secondary FE Clusters are on-net or off-net. Table 4.13 shows the result.

  Primary   Secondary   Percentage
  on-net    on-net      44%
            off-net      3%
            none        22%
            all         69%
  off-net   on-net      23%
            off-net      8%
            none         0%
            all         31%

Table 4.13: Percentage of occurrence of different combinations of on-net/off-net primary and secondary FE Clusters for Google user prefixes.

We see that no matter what type (on-net or off-net) the primary FE Cluster is, the secondary FE Cluster is more likely to be on-net. This suggests that on-net FE Clusters may have more capacity, so that when off-net FE Clusters cannot bear the load, traffic is directed to on-net FE Clusters. In sum, we believe load balancing is the reason for some of the mapping changes from an off-net primary to an on-net secondary.

4.7.3 Reconfiguration of User-to-FE Cluster Mapping

Both Google and Akamai strive to optimize performance for users by associating prefixes with nearby FE Clusters [KMS+09b, DMP+02a]. Long-term shifts in routing, user population, and FE Cluster deployments may shift this mapping as the CDN re-optimizes. In early data (the Google-15min-early dataset) we saw that Google would occasionally shift one-third of user prefixes at the same time [ZHR+12]. These bulk shifts have diminished in recent observations of Google and never appeared in Akamai, but both CDNs currently have a few percent of user prefixes that have stable mappings for weeks.

4.7.4 Unknown

We also observe some mapping changes that are not explained by the above reasons. For example, we see Google sometimes map prefixes to very distant Google FE Clusters (across continents) for a single observation. We are unsure why this occurs.

4.8 Prefix-FE Mapping Implications for Other Studies

Our study of prefix-FE Cluster mapping gives insight into two prior studies. First, Client Centric Geolocation (CCG) uses prefix locations to infer FE locations; we evaluate the stability of its results in the face of mapping changes. Second, we suggest how a prior system that exploited Akamai's redirections to find detour routes could benefit from our findings.

4.8.1 Impact on Client Centric Geolocation

In this section, we show that prefix-FE Cluster mapping changes can cause client-centric geolocation to place a small fraction of FE Clusters in very different locations at different points in time.

As discussed in §4.3.2.4, because CCG geolocates an FE Cluster based on the locations of the clients mapped to it [CFH+13], prefix-FE Cluster mapping changes may affect the results of CCG. We say "may" affect because CCG does not necessarily use all clients that are mapped to an FE in its geolocation. Instead, CCG uses heuristics to filter out clients unlikely to be near the FE.
For example, CCG pings all front-end IP addresses from all PlanetLab nodes, then uses the round-trip times to establish speed-of-light constraints on where the FE Cluster could possibly be. When geolocating an FE Cluster, CCG excludes all client locations that are outside the feasible region. With this filtering, clients that are far away from the FE may be excluded during geolocation.

To understand the impact that mapping changes may have on CCG, we use the Google-location-EDNS dataset to first identify Google FEs that are geolocated at very different places within a month. We expect that if prefix-FE Cluster mapping changes are going to affect CCG, then some Google FEs geolocated with CCG will have different locations within one month. Our Google-location-EDNS dataset geolocates all FEs every day, which means that for every FE Cluster we have 30 latitude and longitude pairs, one pair per day. We then compute the maximum distance among the 30 latitude and longitude pairs and consider FE Clusters whose maximum distance is at least 500 km. We find that among the 1,491 Google FE Clusters in the Google-location-EDNS dataset, 77 (5%) have a maximum distance of at least 500 km; we call these Google FE Clusters bad-geoloc-FE-Clusters.

The maximum distances of the bad-geoloc-FE-Clusters come from different pairs of dates. For easier processing, we choose the pair of dates that has the largest number of bad-geoloc-FE-Clusters, April 3 and 4. In total, 7 bad-geoloc-FE-Clusters have their maximum distance of locations from these dates.

Because CCG used fixed client locations throughout the month, if it places an FE in different locations on different days, it must be because it used a different set of clients to locate the FE at the two times t1 and t2. Since CCG filters out clients before estimating the FE location as the average of the clients that survive filtering, the sets could differ because Google mapped different sets of clients to the FE, or because different sets of clients survived filtering at t1 than at t2. We refer to the final (post-filtering) clients used to locate the FE at t1 as POST_t1 and those used at t2 as POST_t2. The two sets must differ due to (1) prefix-FE Cluster mapping changes and/or (2) filtering. Prefix-FE Cluster mapping changes may cause the pre-filtering set of client prefixes mapped to the FE Cluster at t1 (PRE_t1) to differ from the pre-filtering set at t2 (PRE_t2), in turn causing POST_t1 and POST_t2 to differ. Even if PRE_t1 and PRE_t2 are the same, the filtering algorithm may filter out different client prefixes at t1 and t2, causing POST_t1 to differ from POST_t2. Thus, to understand whether the different geolocations are due to prefix-FE Cluster mapping changes (the focus of this chapter) or to filtering, we perform the following steps:

1. For every client prefix p1 ∈ POST_t1 with p1 ∉ POST_t2:
   if p1 ∈ PRE_t2, add p1 to S_filtering; else, add p1 to S_map_change.

2. For every client prefix p2 ∈ POST_t2 with p2 ∉ POST_t1:
   if p2 ∈ PRE_t1, add p2 to S_filtering; else, add p2 to S_map_change.

3. Compute the mapping change index as:

   index = |S_map_change| / (|S_map_change| + |S_filtering|)

If the index is greater than 0.5, then most of the prefixes potentially responsible for a geolocation change result from changes in prefix-FE Cluster mapping, while an index below 0.5 means that most of the differences in prefixes result from filtering.
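These steps are straightforward to implement; the following is a minimal sketch, assuming the pre- and post-filtering client prefix sets for the two days are available as Python sets (the names mirror the steps above and are otherwise illustrative):

```python
def mapping_change_index(pre_t1, post_t1, pre_t2, post_t2):
    """Classify prefixes that appear in only one post-filtering set as
    due to remapping or to filtering, and return the index in [0, 1]."""
    s_filtering, s_map_change = set(), set()
    for p in post_t1 - post_t2:
        # p located the FE at t1 but not at t2
        (s_filtering if p in pre_t2 else s_map_change).add(p)
    for p in post_t2 - post_t1:
        # p located the FE at t2 but not at t1
        (s_filtering if p in pre_t1 else s_map_change).add(p)
    total = len(s_map_change) + len(s_filtering)
    return len(s_map_change) / total if total else 0.0
```

An index near 1 for an FE Cluster means its geolocation change is explained almost entirely by remapping.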
We compute the mapping change index of the 7 bad-geoloc-FE-Clusters that CCG placed in locations at least 500 km apart between April 3rd and April 4th. Table 4.14 shows the results.

  Total bad-geoloc-FE-Clusters   7   100%
  index = 1                      4    58%
  index = 0.99                   1    14%
  index = 0.98                   1    14%
  index = 0.35                   1    14%

Table 4.14: Mapping change index of the 7 bad-geoloc-FE-Clusters. Bad-geoloc-FE-Clusters are those FE Clusters that are geolocated very differently within a month.

The mapping change index essentially measures the chance that the bad geolocation is caused by mapping changes. Table 4.14 shows that the FE Clusters have four different mapping change index values, but most have an index of 1, meaning that all changes were due to remappings and none to filtering.

To sum up, prefix-FE Cluster mapping changes do affect CCG for some FE Clusters. In our sample, more than half (58%) of the FE Clusters with large changes in estimated geolocation were caused solely by prefix-FE Cluster mapping changes. However, only 5% of FE Clusters experience large changes in their estimated location. So prefix-FE Cluster mapping changes do cause large location differences, but only for a small fraction of FE Clusters.

4.8.2 Improving Drafting Behind Akamai

Prior work ("Drafting") has exploited Akamai's CDN to find detour routes that have lower latency than direct paths in the Internet [SKB06]. Although that work leveraged Akamai, our evaluation of short- and long-distance switches suggests possible improvements to their use of Akamai. The Drafting algorithm considers two paths between pairs of nodes in an overlay network: the direct path between the nodes, and a one-hop detour based on Akamai's recommendation. Their algorithm gets a recommendation from Akamai by considering a detour path that forwards near one of the FE Clusters currently provided by Akamai's DNS-based FE-selection method.

From our findings, we suggest an additional option: they should also consider a detour path through a hop that was previously recommended by Akamai. §4.6.1 shows that Akamai often recommends different FE Clusters at different times, and for 30% of prefixes these FE Clusters are far apart. In short, Akamai's FE selection does not always return the closest site, so use of some history can help.

4.9 Related Work

Prior work compared the performance of CDN-selected front-end servers against other servers of the same CDN [SKB06, TWR11, KWZ01, OSRB12]. Su et al. use Akamai's choice of server location to influence their selection of network paths, leveraging Akamai's network measurements [SKB06]. Triukose et al. compare the page-download performance of the Akamai-selected server with 80 other randomly selected Akamai servers to study whether CDNs enhance performance [TWR11]. Krishnamurthy et al. study CDN DNS load-balancing performance by using two dozen clients to detect DNS load balancing every 30 minutes and performing file downloads when observing CDN server changes [KWZ01]. Otto et al. compare HTTP latency between CDN servers returned by different DNS servers to measure the impact of using remote DNS on CDN performance [OSRB12]. Our work differs from this prior work by exploring how CDNs change their prefix-FE Cluster mappings over time, and how these changes affect network and application latency for users.

The Ono system uses a large set of clients (120,000 in 15,000 prefixes) to study affinity between users and CDN servers [CB08]. They use this information to improve peer selection in peer-to-peer networks and reduce cross-ISP traffic.
Our work also uses a large set of client prefixes to assess user-to-CDN affinity, but we focus on understanding the properties of prefix-FE Cluster mapping changes and their potential impact on both users and previous CDN studies.

Huang et al. studied the cache dynamics from users to Facebook Edge Caches as viewed from within Facebook [HBvR+13]. Facebook optimizes to balance latency, server load, and peering cost, sometimes directing users to caches that are not physically nearest. Our work complements theirs by looking from the user side, providing a third-party, user-side view of the services.

Torres et al. studied the mechanism and policy of user-to-content-server mapping in YouTube using video flow data collected from 5 distinct locations over a week [TFK+]. They geolocate YouTube datacenters using Constraint-Based Geolocation (CBG) [GZCF06] and find that a non-negligible fraction of traffic is served by non-preferred datacenters. They find that the reasons for non-preferred datacenter accesses include load balancing, DNS server variations, limited availability of rarely accessed videos, and alleviating hot-spots due to popular videos. Our work differs from theirs by focusing on the effects of user-to-FE Cluster mapping changes on users, while they focus on understanding the mapping dynamics themselves. We also have broader coverage of user prefixes and CDN FE Clusters, while their view is deeper, from a few vantage points.

Casas et al. [CFB13] and Finamore et al. [FGM+12] each study associations between web services, hosting organizations, content-server IPs, and service provisioning. They use min-RTT estimates to cluster IPs to datacenters. They use measurements from one ISP and observe user/datacenter switches suggesting load balancing. We also cluster IPs to datacenters, but with many vantage points [CFH+13]. While their clustering may be enough for their research purposes, we show that using many vantage points in clustering provides better accuracy in our study. Both their work and ours identify load balancing and mapping changes, but they apply their work to provisioning, while we study its effects on end users.

Fiadino et al. use a month of HTTP flow data collected from a major European ISP to study traffic anomalies caused by cache selection dynamics and their impacts on both the ISP and users [FDC14, FDB+14]. They identify an anomaly in Facebook traffic by observing a large amount of flows shifting from Akamai to other organizations hosting Facebook, and report that the anomaly may increase the transit cost of the users' ISP. They also found a YouTube traffic anomaly that shifted traffic to different sets of /24 subnets of YouTube, and found that the shift affected user-experienced throughput. Our work differs from theirs in the following ways. First, the methodologies are quite different: they detect synchronized mapping changes for particular web services by watching for large shifts in flow volumes, while we directly measure target FE Clusters with EDNS-client-subnet and direct DNS queries. Their approach is ideal for studying a single ISP when traffic data is available, but, second, our approach provides much broader coverage: we examine 32k user prefixes from hundreds of countries and ASes, while their study focuses only on users of a single ISP. Last, we study how often users' traffic changes countries.

4.10 Conclusions

This chapter provides the first evaluation of the dynamics of CDN redirection of users' network prefixes to Front-End Clusters from a large range of prefixes.
We propose an improved IP address clustering algorithm to accurately cluster topologically and geographically close Front-End IP addresses into FE Clusters. We gather new data about Google and Akamai, and we find that some prefixes switch between FE Clusters that are long distances apart, often seeing large changes in latency and application-level performance. While most prefixes stay only briefly on FE Clusters with large application-level latency, a few percent of prefixes are mapped to those FE Clusters much of the time. We characterize reasons for the mapping changes and find evidence of FE Cluster drain/restoration, load balancing, and other causes. We also find that many user prefixes are directed to multiple countries in a month, complicating questions of jurisdiction.

This chapter provides the third strong piece of evidence supporting our thesis statement, as discussed in §1.2. We take three steps to achieve our smart selection of probing sources. We first choose a large number of open resolvers (600k) to maximize the recall of enumeration of CDN Front-Ends. Second, we remove many redundant probing sources and use the remaining ones (32k) to enable efficient enumeration of FE Clusters that can finish in 15 minutes. This smart selection of probing sources is essential to support the kind of analysis we did in this chapter: we want to study the dynamics of user-to-FE Cluster mapping, which requires longitudinal, short-interval enumerations. At the same time, our targets are large CDNs, each with many FE Clusters around the world; if our probes covered only a small part of their networks, our results could be biased. Thus, first maximizing coverage of FE Clusters and then reducing redundant probing sources to enable short-interval enumeration is ideal for our study. We then show that this smart selection of probing sources enables efficient enumeration of FE Clusters: we can finish one round of enumeration within 15 minutes while still covering most FE Clusters. Last, using this efficient enumeration of CDN FE Clusters, we are able to study the dynamics of user-to-FE Cluster mapping. We believe other studies that need both longitudinal measurements and coverage of service replicas can also benefit from the way we perform smart selection of probing sources.

Chapter 5

Future Work and Conclusions

In this chapter, we discuss possible directions for future work and conclude this thesis.

5.1 Future Work

There are directions for immediate future work that would strengthen the validity of our claims. In addition, our work also suggests future studies that could benefit from ideas similar to those in this dissertation. We discuss both next.

5.1.1 Immediate Future Work for Our Studies

In Chapter 2 we proposed a novel method to generate an IP hitlist that efficiently enumerates Internet edge links. There are two directions for future work. First, we would like to better understand the stability of Internet address usage. Our study showed that only 50–60% of informed representatives respond three months later, implying a great deal of churn in Internet address usage. We would like to better understand why usage of the remaining addresses cannot be better predicted. More detailed analysis of gone-dark and unstable blocks may provide more information, and combining our data with information about IANA assignments, RIR registrations, and block reassignments may explain causes of failure to respond. Alternatively, changes in firewalling may explain address instability.
Second, we expect to get more information about how hitlists are used. Several groups expressed concern that representatives may become overloaded as multiple groups target them with measurement traffic. Our research group conducted an experiment in 2013 and confirmed that, at that time, there was no evidence of the hitlist focusing probing traffic on representatives. However, as the number of hitlist users grows, we expect to run automated, periodic monitoring of hitlist usage. If representative load becomes high, it will suggest operational changes, such as multiple representatives per /24 block.

In Chapter 3 we evaluated different methods for enumeration of DNS anycast. We proposed using IN queries, which are supported by a large number of open resolvers, to achieve high recall. There are three areas of immediate future work. The first is to reduce the number of open resolvers to avoid duplicated queries. Typically there are only hundreds of anycast nodes for an anycast service, yet we use 300k open resolvers for enumeration. While using as many open resolvers as possible is the best way to guarantee maximum recall, we think there are ways to reduce the number of open resolvers while keeping recall high. One possible direction is to remove duplicate open resolvers that sit in the same network, organization, or even AS. Recent work [CJR+15] has reported that simply selecting vantage points that are physically distant from each other can achieve good recall, but a more systematic study could suggest specific rules for choosing open resolvers and evaluate the enumeration recall of the different rule options (a sketch of one such rule follows).
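As one illustration of such a rule, the sketch below down-samples open resolvers by origin AS, keeping a few per AS; the resolver-to-ASN mapping and the per-AS quota are assumptions for illustration, not a rule we have evaluated:

```python
import random
from collections import defaultdict

def downsample_by_asn(resolvers, resolver_asn, per_asn=2, seed=0):
    """Keep at most `per_asn` open resolvers per origin AS, on the
    assumption that resolvers in the same AS often reach the same
    anycast node and so contribute largely redundant queries.

    resolvers:     iterable of resolver IP addresses
    resolver_asn:  dict mapping resolver IP -> origin ASN
    """
    by_asn = defaultdict(list)
    for r in resolvers:
        by_asn[resolver_asn[r]].append(r)
    rng = random.Random(seed)        # deterministic for repeatability
    sample = []
    for members in by_asn.values():
        rng.shuffle(members)
        sample.extend(members[:per_asn])
    return sample
```

Evaluating such a rule would mean comparing the recall of the down-sampled set against the full set of open resolvers.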
The second area for future work is further study of the security issues of anycast enumeration. As we briefly discussed in §3.7, discouraging masquerader spoofing is not easy, particularly protecting against reply replay. Reply replay is essentially a man-in-the-middle attack. While there are solutions based on authentication and encryption, those solutions work well when both the client and the server side can easily perform authentication and encryption. In our case, we propose using Internet open resolvers as vantage points to relay DNS queries; we cannot run any code on open resolvers, which means the traditional solutions for man-in-the-middle attacks will not work. As a result, further study is needed to prevent reply replay in the context of anycast enumeration. The last area of future work is to extend our anycast enumeration to non-DNS anycast. Anycast is also used by CDNs to route users to nearby servers, such as EdgeCast, CloudFlare [Pri11], and Microsoft's own CDN [FMM+15]. Since our anycast enumeration is specific to anycast used in the DNS system, we expect that future work would extend the methods to cover anycast used in CDNs.

In Chapter 4 we studied CDN Front-End-to-user mapping from a large number of user prefixes; our target services included Google search and two large websites hosted on Akamai. There are several directions for future work. The first possible direction is to extend the study to more Google services. We want to understand whether Google uses the same DNS-based user-mapping algorithm for all of their services or different algorithms for different services; understanding this question helps us assess how widely our results apply. In addition, we expect to extend the study to other large CDNs that use DNS to map users to their Front-Ends. The second direction is to further characterize the impacts of mapping changes on users. We have shown that many mapping changes are between distant switching pairs, and these mapping changes are likely to cause large latency changes. However, a further question is how long user prefixes stay mapped to a potentially distant FE Cluster; answering it would help us understand how much distant mapping changes affect users. The third direction is to parallelize the probing to include more user prefixes and use a shorter probing interval. While the parallelism requires more resources, more user prefixes make our results more general, and a shorter probing interval reduces the number of mapping changes we miss. The fourth direction could be making our results end-user friendly. We show some aggregated results about the geographic footprint of user prefixes in our work, but those results cannot directly benefit end users. We expect that browser add-ons that show which FE Cluster is currently serving the user and prompt a notice when the user experiences a mapping change could be appealing to end users. Last, we can further characterize the reasons for the mapping changes. In our study, we find evidence of several reasons for mapping changes, but we did not characterize the reason for each individual mapping change we observed. Future work could develop novel algorithms to characterize the reason for every mapping change and study how many mapping changes are caused by each reason. For example, estimating the load change on an FE Cluster can help determine whether load balancing happened, and thus whether the mapping changes associated with that FE Cluster were caused by load balancing.

5.1.2 Future Work Suggested by This Thesis

In addition to the immediate future work above, which is closely related to our three specific studies, our thesis suggests a much wider range of opportunities for future studies. Our thesis demonstrates the benefit of smart selection of probing sources and destinations for studies of global service behavior. We next suggest two areas of future work based on smart selection of probing destinations and probing sources, respectively.

First, future work would benefit from our hitlist, which helps reach responsive targets efficiently. Our hitlist contributes the most value where the costs of probing are high. An example case is active probing from inside a cloud service. Such probing is very useful for third-party studies of the outside view of cloud services, such as user-perceived latency, provisioning, and availability. Traditionally, to study the behavior of an Internet service by active probing from outside, researchers need access to a large number of vantage points, which is very hard to get. Cloud services provide an opportunity for researchers to probe from inside, which makes probing a large number of user prefixes possible. However, since cloud providers often charge for network traffic, probing many unresponsive destinations that provide no useful information is costly. As a result, we believe future work could use our hitlist to do cost-effective third-party studies of cloud service behavior. Researchers could set up virtual machines (VMs) in the target cloud service and make active probes from those VMs to targets suggested by our hitlist. Note that most cloud providers have multiple datacenter locations, so measurements need to cover all datacenters. Possible measurements could be pings, which provide both network latency and aliveness information; a minimal probing sketch follows.
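As a sketch of such probing (assuming a Linux VM where the system ping command is available; the aliveness-only output and the parallelism level are illustrative choices):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def ping_once(addr, timeout_s=2):
    """Send one ICMP echo via the system ping; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), addr],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return addr, result.returncode == 0

def probe_hitlist(representatives, parallelism=100):
    """Probe hitlist representatives in parallel from a cloud VM.
    Records aliveness only; RTTs could be parsed from ping output
    for latency studies."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return dict(pool.map(ping_once, representatives))
```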
Network latency measurements can be used by clients of cloud services for two purposes: comparing different cloud service providers, and deciding which cloud datacenter is best for their own users. Aliveness measurements can be used for the same purposes. In addition, such studies of cloud services could even extend our method of hitlist generation by further reducing the number of representatives in the destination list. For example, if an organization assigns several /24 prefixes to boxes in the same building, one responsive IP address across those /24 prefixes would be enough as a representative. The expected outcome would be that certain cloud providers or datacenters out-perform others for users within certain geographic areas, or for users in certain network locations such as ISPs or ASes.

Second, future work would benefit from using open resolvers to study the behavior of global Internet services. Open resolvers provide quick access to a large number of widely distributed vantage points and are ideal probing sources for studies of DNS services. One example future study that takes advantage of open resolvers is to measure and compare managed DNS providers. Managed DNS services arose to provide organizations with simplicity, performance, and reliability improvements, and often other features such as geographic load balancing. As many options are available, users may wonder which provider to choose, and this question leads to the need to compare different managed DNS providers. Because of differences in server locations and target markets, different DNS providers may out-perform others at different locations. As a result, to make a fair comparison, the measurement should include many diverse probing sources from different locations, which can be done using open resolvers. An existing technique [GSG02] can be used to estimate name resolution latency between the open resolvers and the managed DNS providers (a rough sketch follows). Thus, researchers could study the latency and availability of managed DNS providers from a large number of vantage points. In addition, longitudinal measurements are needed to reach reliable conclusions that do not suffer from temporary network dynamics. Consequently, researchers could down-sample open resolvers to both keep diversity and reduce redundant probes, as we do in Chapter 4. A possible outcome would be that certain managed DNS providers best serve end users in certain geographic areas or network locations.
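For illustration, the sketch below shows how such a King-style estimate [GSG02] might look using the dnspython library. It assumes we can pick a name under a zone hosted by the target provider that is unlikely to be cached (for example, a random label), so the recursive query forces the resolver to contact the provider's authoritative servers:

```python
import time
import dns.flags
import dns.message
import dns.query

def query_rtt(resolver_ip, qname, recurse, timeout=3.0):
    """Time one UDP DNS query sent to an open resolver."""
    q = dns.message.make_query(qname, "A")
    if not recurse:
        q.flags &= ~dns.flags.RD   # no recursion: client<->resolver only
    start = time.monotonic()
    dns.query.udp(q, resolver_ip, timeout=timeout)
    return time.monotonic() - start

def king_estimate(resolver_ip, uncached_name):
    """Approximate resolver-to-provider latency as the extra time a
    recursive query for an uncached name takes over a direct round
    trip to the resolver, in the spirit of King [GSG02]."""
    t_direct = query_rtt(resolver_ip, uncached_name, recurse=False)
    t_recursive = query_rtt(resolver_ip, uncached_name, recurse=True)
    return t_recursive - t_direct
```

In practice, repeated measurements and outlier filtering would be needed, since a single query can hit retransmissions or transient load.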
5.2 Conclusions

The Internet is growing fast and is an important part of people's lives. As more routers and links join the Internet, online services are becoming bigger, and service providers have started to replicate their services to distribute traffic and reduce user latency. As the Internet grows, keeping the Internet and online services operating smoothly requires researchers and operators to track and understand the behavior of large Internet services. Different studies try to understand service behavior, and many of them need efficient service enumeration. To achieve smooth operation of the Internet, researchers need to learn more about the topology, reachability, and performance of the Internet; many of these studies need the enumeration of Internet hosts and links. To achieve smooth operation of large online services, researchers study the provisioning and performance of those services; many of these studies need to enumerate the service replicas. To reduce measurement traffic and support longitudinal studies, the enumeration of the services often needs to be efficient. This dissertation shows that efficient enumeration of services is both important and challenging, and we propose methods to overcome the challenges and achieve efficient service enumeration, which can further benefit many different studies aimed at smooth operation of both the Internet and global services.

Thesis statement

This dissertation makes the following thesis statement: smart selection of probing sources and destinations enables efficient enumeration of global Internet services to track and understand their behavior.

Demonstrating the thesis statement

To demonstrate the thesis statement, we presented three specific studies, each using either smart selection of probing destinations or smart selection of probing sources to enable efficient service enumeration.

In our first work, we achieved smart selection of probing destinations (the hitlist) through informed prediction of the responsiveness of IP addresses. Specifically, we used the history of responsiveness of each IP address to predict the chance that the address would be responsive, and we selected the address most likely to be responsive as the representative of its /24 address block. Our hitlist then contained one such representative IP address per allocated /24 block, and thus could provide good responsiveness. Our hitlist is a smart selection of probing destinations because it can be generated automatically and periodically, and researchers can find more (1.7× more) responsive representative addresses using the hitlist than using random probing in a single round.
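As a reminder of how such selection can work, the sketch below scores the addresses in one /24 block by their response history; the exponential-decay weighting is an illustrative assumption standing in for the informed prediction described above:

```python
def pick_representative(history):
    """Choose the representative for one /24 block.

    history: dict mapping last octet -> list of 0/1 responses,
             ordered oldest to newest, one entry per census.
    Scores each address with a recency-weighted sum of past
    responsiveness (the exact weighting here is illustrative) and
    returns the last octet judged most likely to respond.
    """
    def score(responses):
        s, w = 0.0, 1.0
        for seen in reversed(responses):   # newest census first
            s += w * seen
            w *= 0.5                       # older censuses count less
        return s
    return max(history, key=lambda octet: score(history[octet]))
```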
In our second study, our smart selection of probing sources was a large number of open resolvers in the Internet, which support both high recall of enumeration and on-demand measurements. We used these probing sources to enumerate anycast nodes of DNS services, and we compared the performance of our selected open resolvers to two other types of probing sources: PlanetLab nodes and Netalyzr users. We showed that PlanetLab nodes could not achieve high recall and Netalyzr users could not support on-demand measurements; only our selection of a large number of open resolvers supported both. While our selection of open resolvers contained many duplicated probing sources, and so was not cost efficient (sending a large amount of traffic and requiring a large amount of time, bandwidth, and CPU), we believe the redundancy was required to maximize enumeration recall for the detection of service anomalies. We also believe that our smart selection of probing sources in the second work can be used to drive further smart selections for studies that cannot afford many redundant measurements, because without using a large number of probing sources to maximize recall, it is hard to know which probing sources each observe a different anycast node. For these reasons, we believe our selection of open resolvers as probing sources is a smart selection for anycast enumeration. In our third study, we smartly selected probing sources by down-sampling a large number of open resolvers to study the dynamics of user-to-FE Cluster mapping in CDNs. Specifically, we used a large number of open resolvers to enumerate CDN Front-Ends. In order to make the enumeration finish in a short time (15 minutes in our study), we selected a subset of open resolvers with the same coverage as the large set. We also clustered the Front-Ends into FE Clusters, so that when we consider coverage of CDN infrastructure, we only need to consider the coverage of a small number of FE Clusters rather than a large number of Front-Ends. Our selection of a subset of open resolvers is a smart selection because it covers a good part of both CDNs and supports short-interval probing (15 minutes).

The second part of the thesis statement was that smart selection of measurements enables efficient enumeration of global services. Our studies each demonstrated this. In our first study, we evaluated and compared the responsiveness of our hitlist to that of random probing, and we found that using our hitlist, researchers could find at least 1.7× more links than traceroute to randomly selected representatives in a single round. Thus, using our hitlist representatives as probing destinations, researchers can find many more links for the same amount of probing traffic than using random representatives; this result showed that our hitlist enables efficient enumeration of Internet edge links. In our second study, we evaluated three different probing sources for anycast enumeration and found that our selection of a large number of open resolvers was the only one that supported both on-demand measurements and high-recall enumeration of anycast nodes; the evaluation showed that our smart selection of probing sources enables efficient enumeration of anycast nodes. In our last study, our smart selection of probing sources enabled efficient enumeration of CDN Front-Ends: we could finish one round of enumeration in 15 minutes and periodically enumerated CDN Front-Ends for four weeks.

The last part of our thesis statement was that efficient enumeration of global services helps track and understand the behavior of those services. Our studies each demonstrated this as well. For the first study, our hitlists, together with our method to generate them, were used in other research projects to understand various behaviors of large Internet services, including detection of third-party addresses in traceroute, geolocation of massive numbers of IP addresses, and efficient detection of Internet outages. In our second work, we used the enumeration of anycast nodes to detect anomalies in anycast services: we found a masquerading F-root node that was not operated by the authorized organization, and also many potential masquerading root name server nodes. In our third study, we used the efficient enumeration of CDN Front-Ends to study the dynamics of user-to-FE Cluster mapping and identified its impacts over time. We found that user-to-FE Cluster mapping changes were common, and that some of the mapping changes were between distant FE Clusters and likely to cause large latency changes. We also found that many users were mapped to non-domestic FE Clusters, complicating questions of jurisdiction.

Generalization

Our three studies supported our thesis statement as specific examples, and together they suggest our thesis generalizes to many other studies. There are published studies that directly benefit from our results or methods. Our IP hitlist and the method to generate it have been used by other studies of Internet behavior [MdDP13, HH12, QHP13].
Inspired by our proposal of using IN queries to take advantage of open resolvers for anycast enumeration, the AS112 project started to use open resolvers to enumerate their nodes [AS112b], and L-root started to support IN TXT queries for anycast enumeration [AM14]. These studies suggest our thesis can benefit many other classes of studies. Studies that need active probing of a large number of destinations could benefit from smart selection of probing destinations, as we do to generate the hitlist. Studies that need to maximize the recall of enumeration of service replicas could benefit from selecting widely distributed probing sources, as we do for anycast enumeration. Studies that need both a short measurement period and coverage of service replicas could benefit from first increasing the recall of enumeration and then reducing redundant probing sources, as we do in the CDN user-affinity study.

Additional contributions

In addition to demonstrating the thesis statement, our studies presented additional contributions that provide new knowledge or methods useful for other studies. In our anycast enumeration study, we discussed potential security implications of our proposed methods, and we proposed approaches to limit the enumeration of anycast nodes to the anycast providers. In our study of the dynamics of CDN user-to-FE Cluster mapping, we evaluated the use of CCG over open resolvers, and we showed that using CCG over 600k open resolvers provides accuracy similar to using CCG over all routable /24 prefixes. This evaluation extends the application scope of CCG.

To sum up, in this dissertation we showed that efficient enumeration of services is a mechanism by which many studies can achieve the goal of smooth operation of both the Internet and global Internet services. For example, we showed that our smart selection of probing destinations, the hitlist, enabled efficient edge-link enumeration (1.7× more edge links than random probing in a single round), which can further help achieve smooth operation of the Internet. We also showed that smart selection of measurements can help achieve efficient service enumeration; for example, our selection of open resolvers achieved both high recall of anycast enumeration (greater than 90%) and on-demand measurements. We presented three studies to demonstrate that smart selection of probing sources and destinations enables efficient enumeration of global Internet services to track and understand their behavior. In addition, our thesis benefited published studies that seek to understand different behaviors of large Internet services, such as Internet outage detection and geolocation of massive numbers of IP addresses. We also proposed directions for future work that can benefit from our thesis, such as using the hitlist to study the performance of cloud services and using open resolvers to compare managed DNS services. These published and future works together suggest that our thesis can generalize to many classes of studies to help make the Internet better, such as Internet reachability studies and provisioning of global Internet services.

Bibliography

[AA14] Klaus Ackermann and Simon D. Angus. A resource-efficient big data analysis method for the social sciences: The case of global IP activity. Procedia Computer Science, 29:2360–2369, 2014.
[AAL+05] R. Arends, R. Austein, M. Larson, D. Massey, and S. Rose. DNS security introduction and requirements. RFC 4033, 2005.
[ABKS99] Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, and Jörg Sander. OPTICS: Ordering points to identify the clustering structure. In SIGMOD, 1999.
[Abl03] Joe Abley. Hierarchical anycast for global service distribution. Technical Report ISC-TN-2003-1, ISC, March 2003.
[AJB00] Réka Albert, Hawoong Jeong, and Albert-László Barabási. Error and attack tolerance in complex networks. Nature, 406:378–382, July 27 2000.
[AL06] J. Abley and K. Lindqvist. Operation of anycast services. RFC 4786, 2006.
[ALR+08] H. A. Alzoubi, S. Lee, M. Rabinovich, O. Spatscheck, and J. Van der Merwe. Anycast CDNs revisited. In the 17th International Conference on World Wide Web, pages 277–286. ACM, 2008.
[AM11] J. Abley and W. Maton. AS112 nameserver operations. RFC 6304, Internet Request For Comments, July 2011.
[AM14] J. Abley and T. Manderson. A summary of various mechanisms deployed at L-root for the identification of anycast nodes. RFC 7108, 2014.
[AMSU11] Bernhard Ager, Wolfgang Mühlbauer, Georgios Smaragdakis, and Steve Uhlig. Web content cartography. In IMC, 2011.
[AS112a] AS112 Project. AS112 server operators listing. http://public.as112.net/node/10, 2012.
[AS112b] AS112 Project. How many public AS112 nodes are there as of March 31, 2012? http://public.as112.net/node/30, April 2012.
[AS15] J. Abley and W. Sotomayor. AS112 nameserver operations. RFC 7534, 2015.
[Aus07] R. Austein. DNS name server identifier (NSID) option. RFC 5001, 2007.
[BB05] Peter Boothe and Randy Bush. Anycast measurements used to highlight routing instabilities. In NANOG 34, May 2005.
[BDS11] Karyn Benson, Rafael Dowsley, and Hovav Shacham. Do you know where your cloud files are? In Cloud Computing Security Workshop, 2011.
[BF05] H. Ballani and P. Francis. Towards a global IP anycast service. In Proc. of ACM SIGCOMM. ACM, August 2005.
[BFR06] Hitesh Ballani, Paul Francis, and Sylvia Ratnasamy. A measurement-based deployment proposal for IP anycast. In IMC, pages 231–244, Rio de Janeiro, Brazil, October 2006.
[BHM+07] Randy Bush, James Hiebert, Olaf Maennel, Matthew Roughan, and Steve Uhlig. Testing the reachability of (new) address space. In Proc. of ACM Workshop on Internet Network Management, pages 236–241, Kyoto, Japan, August 2007. ACM.
[BLKT04] P. Barber, M. Larson, M. Kosters, and P. Toscano. Life and times of J-root. In NANOG 34 meeting, October 2004.
[BMRU09] Randy Bush, Olaf Maennel, Matthew Roughan, and Steve Uhlig. Internet optometry: assessing the broken glasses in Internet reachability. In Proc. of ACM Internet Measurement Conference, pages 242–253. ACM, November 2009.
[BSS08] Adam Bender, Rob Sherwood, and Neil Spring. Fixing Ally's growing pains with velocity modeling. In Proc. of 8th ACM Internet Measurement Conference, pages 337–342, Vouliagmeni, Greece, October 2008. ACM.
[Bus05] Randy Bush. DNS anycast stability: some initial results. CAIDA/WIDE workshop, http://www.caida.org/workshops/wide/0503/slides/050311.wide-anycast.pdf, March 2005.
Server selection using dynamic path characterization in wide-area networks. In Proc. of IEEE Infocom, Kobe, Japan, April 1997. IEEE. [CCR + 03] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson, M. Wawrzoniak, and M. Bowman. Planetlab: an overlay testbed for broad-coverage ser- vices. ACM SIGCOMM CCR, 33(3):3–12, 2003. [CFB13] Pedro Casas, Pierdomenico Fiadino, and Arian Bar. Ip mining: Extracting knowledge from the dynamics of the internet addressing space. In ITC, 2013. [CFH + 13] Matt Calder, Xun Fan, Zi Hu, Ethan Katz-Bassett, John Heidemann, and Ramesh Govindan. Mapping the Expansion of Google’s Serving Infras- tructure. In Proc. of ACM Intermet Measurement Conference, October 2013. [CH09] Xue Cai and John Heidemann. Understanding address usage in the visible internet. Technical Report ISI-TR-2009-656, USC/Information Sciences Institute, February 2009. [CHK + 09] Kimberly Clay, Young Hyun, Ken Keys, Marina Fomenkov, and Dmitri Krioukov. Internet mapping: from art to science. In Proc. of IEEE Cyber- security Applications and Technologies Conference for Homeland Secu- rity (CATCH), pages 205–211, Alexandria, V A, USA, March 2009. IEEE. [CJJ + 02] Eric Cronin, Sugih Jamin, Cheng Jin, Anthony R. Kurc, Danny Raz, and Yuval Shavitt. Constrained mirror placement on the Internet. IEEE JSAC, 20(7):1369–1383, September 2002. 193 [CJR + 15] Danilo Cicalese, Diana Joumblatt, Dario Rossi, Marc-Olivier Buob, Jor- dan Aug´ e, and Timur Friedman. A fistful of pings: Accurate and lightweight anycast enumeration and geolocation. In Proc. of IEEE Info- com, Hong Kong, April 2015. IEEE. [Col05] L. Colitti. Eect of anycast on k-root. dns-oarc workshop, july 2005. DNS-OARC Workshop, July 2005. [Con13] Josh Constine. Facebook And 6 Phone Companies Launch Internet.org To Bring Aordable Access To Everyone. http://techcrunch.com/2013/ 08/20/facebook-internet-org/, 2013. [CPRW03] David D. Clark, Craig Partridge, J. Christopher Ramming, and John T. Wroclawski. A knowledge plane for the Internet. In Proc. of ACM SIG- COMM, pages 3–10, Karlsruhe, Germany, August 2003. ACM. [Cut06] Doug Cutting. Scalable computing with Hadoop. http://wiki.apache. org/lucene-hadoop-data/attachments/HadoopPresentations/ attachments/yahoo-sds.pdf, May 2006. Lecture note. [CvLL12] C. Contavalli, W. van der Gaast, S. Leach, and E. Lewis. Client sub- net in DNS requests, April 2012. Work in progress (Internet draft draft- vandergaast-edns-client-subnet-01). [DG04] Jerey Dean and Sanjay Ghemawat. MapReduce: Simplified data pro- cessing on large clusters. In Proc. of USENIX OSDI, pages 137–150, San Francisco, California, USA, December 2004. USENIX. [DH06] Alex Dekhtyar and Jane Human Hayes. Good benchmarks are hard to find: Toward the benchmark for information retrieval applications in soft- ware engineering. In Proc. of 22nd International Conference on Software Maintenance, Philadelphia, Pennsylvania, USA, September 2006. ACM. [DMP + 02a] John Dilley, Bruce Maggs, Jay Parikh, Harald Prokop, Ramesh Sitaraman, and Bill Weihl. Globally distributed content delivery. Internet Computing, IEEE, 6(5):50–58, 2002. [DMP + 02b] John Dilley, Bruce Maggs, Jay Parikh, Harald Prokop, Ramesh Sitara- man, and Bill Weihl. Globally distributed content delivery. IEEE Internet Computing, 6(5):50–58, September 2002. [Dou13] Doug Madory and Chris Cook and Kevin Miao. Who Are the Anycast- ers? https://www.isc.or://www.nanog.org/sites/default/files/ wed.general.cowie_.anycasters.37.pdf, October 2013. 194 [DWH13] Zakir Durumeric, Eric Wustrow, and J. 
Alex Halderman. ZMap: Fast Internet-wide scanning and its security applications. In Proceedings of the 22nd USENIX Security Symposium, August 2013. [Edg13] Anna Edgerton. NSA Spying Allegations Put Google on Hot Seat in Brazil. http://www.businessweek.com/news/2013-10-28/ nsa-spying-allegations-put-google-on-hot-seat-corporate-brazil, 2013. [EPS + 98] R. Engel, V . Peris, D. Saha, E. Basturk, and R. Haas. Using IP anycast for load distribution and server location. In Proc. of Global Internet, Decem- ber 1998. [FDB + 14] Pierdomenico Fiadino, Alessandro D’Alconzo, Arian Bar, Alessandro Finamore, and Pedro Casas. On the detection of network trac anomalies in content delivery network services. In ITC, 2014. [FDC14] Pierdomenico Fiadino, Alessandro D’Alconzo, and Pedro Casas. Char- acterizing web services provisioning via cdns: The case of Facebook. In TRAC, 2014. [Fed10] Federal Communications Commission. Connecting America: The National Broadband Plan. http://transition.fcc.gov/ national-broadband-plan/national-broadband-plan.pdf, 2010. [FGM + 12] Alessandro Finamore, Vinicius Gehlen, Marco Mellia, Maurizio Munaf` o, and Saverio Nicolini. The need for an intelligent measurement plane: The example of time-variant cdn policies. IEEE NETWORKS, 2012. [FH10] Xun Fan and John Heidemann. Selecting representative ip addresses for internet topology studies. In IMC, pages 411–423, Melbourne, Australia, November 2010. ACM. [FHG11] X. Fan, J. Heidemann, and R. Govindan. Improving diagnostics of name server anycast instances (draft-anycast-diagnostics-01.txt). discussed on dnsop@ietf.org mailing list and at http://www.isi.edu/ ~ xunfan/ research/draft-anycast-diagnostics.txt, October 2011. [FHG12] Xun Fan, John Heidemann, and Ramesh Govindan. Characterizing any- cast in the domain name system (extended). Technical report, Information Sciences Institute, May 2012. [FHG13a] Xun Fan, John Heidemann, and Ramesh Govindan. Evaluating anycast in the domain name system. In Proc. of IEEE Infocom, pages 1681–1689, Turin, Italy, April 2013. IEEE. 195 [FHG13b] Xun Fan, John Heidemann, and Ramesh Govindan. Evaluating anycast in the domain name system. In INFOCOM, 2013 Proceedings IEEE, pages 1681–1689. IEEE, 2013. [FJJ + 01] Paul Francis, Sugih Jamin, Cheng Jin, Yixin Jin, Danny Raz, Yuval Shavitt, and Lixia Zhang. IDMaps: A global internet host distance esti- mation service. ACM/IEEE Transactions on Networking, 9(5):525–540, October 2001. [FKBH15] Xun Fan, Ethan Katz-Bassett, and John Heidemann. Assessing anity between users and CDN sites. In International Workshop on Trac Mon- itoring and Analysis (TMA), April 2015. [FLYV93] V . Fuller, T. Li, J. Yu, and K. Varadhan. Classless inter-domain rout- ing (CIDR): an address assignment and aggregation strategy. RFC 1519, Internet Request For Comments, September 1993. [FMM + 15] Ashley Flavel, Pradeepkumar Mani, David Maltz, Nick Holt, Jie Liu, Yingying Chen, and Oleg Surmachev. Fastroute: A scalable load-aware anycast routing architecture for modern cdns. In 12th USENIX Sympo- sium on Networked Systems Design and Implementation (NSDI 15), pages 381–394, Oakland, CA, May 2015. USENIX Association. [GH07] Steve Gibbard and Packet Clearing House. Geographic implications of dns infrastructure distribution. The Internet Protocol Journal, 10(1):12– 24, 2007. [GM03] B. Greene and D. McPherson. Isp security: Deploying and using sink- holes. NANOG talk,http://www.nanog.org/mtg-0306/sink.html, June 2003. [GS95] James D. Guyton and Michael F. Schwartz. 
Locating nearby copies of replicated internet servers. In Proc. of ACM SIGCOMM, pages 288–298, Cambridge, Massachusetts, August 1995. ACM. [GSG02] K.P. Gummadi, S. Saroiu, and S.D. Gribble. King: Estimating latency between arbitrary internet end hosts. In Proc. of the 2nd ACM SIGCOMM Workshop on Internet measurment, pages 5–18. ACM, 2002. [GT00] Ramesh Govindan and Hongsuda Tangmunarunkit. Heuristics for Internet map discovery. In Proc. of IEEE Infocom, pages 1371–1380, Tel Aviv, Israel, March 2000. IEEE. 196 [GZCF06] Bamba Gueye, Artur Ziviani, Mark Crovella, and Serge Fdida. Constraint- based geolocation of Internet hosts. IEEE/ACM TON, 14(6):1219–1232, December 2006. [Har02] T. Hardie. Distributing authoritative name servers via shared unicast addresses. RFC 3258, 2002. [HBvR + 13] Qi Huang, Ken Birman, Robbert van Renesse, Wyatt Lloyd, Sanjeev Kumar, and Harry C Li. An analysis of Facebook photo caching. In ACM SOSP, 2013. [Hel10] Miguel Helft. Breaking ground on our first custom data center. http://bits.blogs.nytimes.com/2010/11/11/ facebook-chooses-north-carolina-for-new-data-center/, 2010. [HFMkc01] Bradley Huaker, Marina Fomenkov, David Moore, and kc clay. Macro- scopic analyses of the infrastructure: measurement and visualization of internet connectivity and performance. http://www.caida.org/ outreach/papers/pam2001/skitter.xml, November 2001. [HFP + 02] Bradley Huaker, Marina Fomenkov, Daniel J. Plummer, David Moore, and k clay. Distance metrics in the internet. In Proc. of IEEE Interna- tional Telecommunications Symposium. IEEE, 2002. [HH12] Zi Hu and John Heidemann. Towards geolocation of millions of IP addresses. In IMC, 2012. [Hic10] Matt Hicks. Breaking ground on our first custom data center. http:// blog.facebook.com/blog.php?post=262655797130, 2010. [Hou12] Packet Clearing House. Looking glass. http://lg.pch.net/cgi-bin/ lgform.cgi, accessed in 2012. [HPG + 08] John Heidemann, Yuri Pradkin, Ramesh Govindan, Christos Papadopou- los, Genevieve Bartlett, and Joseph Bannister. Census and survey of the visible Internet. In Proc. of ACM Intermet Measurement Conference, pages 169–182, V ouliagmeni, Greece, October 2008. ACM. [HSF15] P. Homan, A. Sullivan, and K. Fujiwara. Dns terminology (draft-ietf- dnsop-dns-terminology). Internet-Draft, 2015. [Hui01] C. Huitema. An anycast prefix for 6to4 relay routers. RFC 3068, 2001. 197 [HWLR08] Cheng Huang, Angela Wang, Jin Li, and Keith W. Ross. Measuring and evaluating large-scale CDNs. Technical Report MSR-TR-2008-106, Microsoft Research, October 2008. [Int12] Internet Assigned Numbers Authority. Root zone database. http://www. iana.org/domains/root/db/, accessed in 2012. [Int15] Internet Systems Consortium, Inc. https://www.isc.org, 2015. [ITU14] ITU. ITU releases 2014 ICT figures. http://www.itu.int/net/ pressoffice/press_releases/2014/23.aspx, 2014. [Kar05] Daniel Karrenberg. DNS root name servers frequently asked questions, January 2005. [Key08] Ken Keys. IP alias resolution techniques. Technical report, CAIDA, 2008. [KL15] P. Koch and M. Larson. Initializing a dns resolver with priming queries draft-ietf-dnsop-resolver-priming-05. https://tools.ietf.org/html/ draft-ietf-dnsop-resolver-priming-05, 2015. [KMS + 09a] Rupa Krishnan, Harsha V . Madhyastha, Sridhar Srinivasan, Sushant Jain, Arvind Krishnamurthy, Thomas Anderson, and Jie Gao. Moving beyond end-to-end path information to optimize CDN performance. In IMC, pages 190–201, 2009. [KMS + 09b] Rupa Krishnan, Harsha V . 
Madhyastha, Sridhar Srinivasan, Sushant Jain, Arvind Krishnamurthy, Thomas Anderson, and Jie Gao. Moving beyond end-to-end path information to optimize cdn performance. 2009. [Kob13] Nicole Kobie. Britain’s superfast broadband: who’s paying and when it’ll arrive. http://www.pcpro.co.uk/news/broadband/378754/ britains-superfast-broadband-whos-paying-and-when-it-ll-arrive, 2013. [KW00] Dina Katabi and John Wroclawski. A framework for scalable global IP- anycast (GIA). In Proc. of ACM SIGCOMM, pages 3–15, Stockholm, Sweeden, August 2000. ACM. [KWNP10] C. Kreibich, N. Weaver, B. Nechaev, and V . Paxson. Netalyzr: Illuminat- ing the edge network. In IMC, pages 246–259, 2010. 198 [KWZ01] Balachander Krishnamurthy, Craig Wills, and Yin Zhang. On the use and performance of content distribution networks. In Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, pages 169–182. ACM, 2001. [LABJ00] Craig Labovitz, Abha Ahuja, Abhijit Bose, and Farnam Jahanian. Delayed internet routing convergence. ACM SIGCOMM Computer Communica- tion Review, 30(4):175–187, 2000. [LAN04] LANDER. LANDER: Los Angeles Network Data Exchange and Reposi- tory. Project website http://www.isi.edu/ant/lander, 2004. [Lar13] Frederic Lardinois. Google X Announces Project Loon: Balloon-Powered Internet For Rural, Remote And Under- served Areas. http://techcrunch.com/2013/06/14/ google-x-announces-project-loon-balloon-powered-internet-for-rural-remote-and-underserved-areas/, 2013. [LHF + 07] Z. Liu, B. Huaker, M. Fomenkov, N. Brownlee, and K. Clay. Two days in the life of the dns anycast root servers. In Passive and Active Network Measurement, pages 125–134, 2007. [Lin06] Greg Linden. Make data useful. http://www.scribd.com/doc/4970486/ Make-Data-Useful-by-Greg-Linden-Amazon-com, 2006. [Mar15] Angelica Mari. Brazilian government promises major broadband investment. http://www.zdnet.com/article/ brazilian-government-promises-major-broadband-investment/, 2015. [Max13] MaxMind. http://www.maxmind.com/app/ip-location/, accessed in 2013. [MdDP13] Pietro Marchetta, Walter de Donato, and Antonio Pescap. Detecting third- party addresses in traceroute ip timestamp option. In Passive and Active Measurement, pages 21–30, Helsinki, Finland, March 2013. [MDS10] Danny McPherson, Ryan Donnelly, and Frank Scalzo. Unique per- node origin ASNs for globally anycasted services. Active Internet-Draft, November 2010. [Mer11] Merit Network, Inc. bgptables. http://bgptables.merit.edu/, accessed in 2011. 199 [Mil11] Rich Miller. Facebook goes global with data center in swe- den. http://www.datacenterknowledge.com/archives/2011/10/27/ facebook-goes-global-with-data-center-in-sweden/, 2011. [MIP + 06a] Harsha V . Madhyastha, Tomas Isdal, Michael Piatek, Colin Dixon, Thomas Anderson, Arvind Krishnamurthy, and Arun Venkataramani. iPlane: An information plane for distributed services. In Proc. of 7th USENIX OSDI, pages 367–380, Seattle, WA, USA, November 2006. USENIX. [MIP + 06b] Harsha V . Madhyastha, Tomas Isdal, Michael Piatek, Colin Dixon, Thomas Anderson, Arvind Krishnamurthy, and Arun Venkataramani. iPlane: An information plane for distributed services. In OSDI, 2006. [MKBA + 09] Harsha V . Madhyastha, Ethan Katz-Bassett, Thomas Anderson, Arvind Krishnamurthy, and Arun Venkataramani. iPlane Nano: Path prediction for peer-to-peer applications. In Proc. of 6th USENIX Symposium on Net- work Systems Design and Implementation, Boston, MA, USA, April 2009. USENIX. 
[MKF + 06] Priya Mahadevan, Dmitri Krioukov, Marina Fomenkov, Bradley Huaker, Xenofontas Dimitropoulos, kc clay, and Amin Vahdat. The internet as- level topology: three data sources and one definitive metric. ACM Com- puter Communication Review, 36(1):17–26, January 2006. [Moc87a] P. Mockapetris. Domain names - concepts and facilities. RFC 1034, 1987. [Moc87b] P. Mockapetris. Domain names—concepts and facilities. RFC 1034, Inter- net Request For Comments, November 1987. [Moc87c] P. Mockapetris. Domain names - implementation and specification. RFC 1035, 1987. [Moo81] David A Moon. Chaosnet. 1981. [NCC15] RIPE NCC. Ripe atlas. https://atlas.ripe.net/, 2015. [OSRB12] John S Otto, Mario A S´ anchez, John P Rula, and Fabi´ an E Bustamante. Content delivery and the natural evolution of dns: remote dns trends, per- formance issues and alternative solutions. In Proceedings of the 2012 ACM conference on Internet measurement conference, pages 523–536. ACM, 2012. [Pfa09] Eric Pfanner. Broadband speeds surge in many countries. New York Times, page B8, Oct. 1 2009. 200 [PMM93] C. Partridge, T. Mendez, and W. Milliken. Host anycasting service. RFC 1546, 1993. [Pri11] Matthew Prince. A brief primer on anycast. https://blog.cloudflare. com/a-brief-anycast-primer/, 2011. [PS01] Venkata N. Padmanabhan and Lakshminarayanan Subramanian. An investigation of geographic mapping techniques for Internet hosts. In SIG- COMM, 2001. [QHP13] Lin Quan, John Heidemann, and Yuri Pradkin. Trinocular: understanding internet reliability through adaptive probing. In Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, pages 255–266. ACM, 2013. [QHP14] Lin Quan, John Heidemann, and Yuri Pradkin. When the Internet sleeps: Correlating diurnal networks with external factors. In Proc. ofACM Inter- met Measurement Conference, page to appear, Vancouver, BC, Canada, November 2014. ACM. [Ran71] William M Rand. Objective criteria for the evaluation of clustering meth- ods. Journal of the American Statistical association, 66(336):846–850, 1971. [RD15] ROOT-DNS. http://www.root-servers.org/, 2015. [Reu13] Reuters. China to invest $323 bln to expand broadband to all-minister. http://www.reuters.com/article/2013/09/18/ china-broadband-idUSL3N0HE0O720130918, 2013. [Rob13] Frances Robinson. Google Sets Big Belgian Invest- ment. http://blogs.wsj.com/brussels/2013/04/10/ google-sets-big-belgian-investment/, April 2013. [SBC + 13] Florian Streibelt, Jan B¨ ottger, Nikolaos Chatzis, Georgios Smaragdakis, and Anja Feldmann. Exploring EDNS-client-subnet adopters in your free time. In IMC, 2013. [SBS08] Rob Sherwood, Adam Bender, and Neil Spring. DisCarte: A disjunc- tive Internet cartographer. In Proc. of ACM SIGCOMM, pages 303–315, Seatle, Washington, USA, August 2008. ACM. [Sek05] Yuji Sekiya. Passive and active dns measurement: update. CAIDA/WIDE workshop, http://www.caida.org/workshops/wide/ 0503/slides/sekiya.pdf, March 2005. 201 [SKB06] Ao-Jan Su, David R. Chones Aleksandar Kuzmanovic, and Fabi´ an E. Bustamante. Drafting behind Akamai (Travelocity-based detouring). In SIGCOMM, 2006. [SM94] Karen Sollins and Larry Masinter. Functional requirements for uniform resource names. RFC 1737, December 1994. [SM06] Matthew Sullivan and Luis Munoz. Suggested generic DNS nam- ing schemes for large networks and unassigned hosts. Work in progress Internet draft draft-msullivan-dnsop-generic-naming-schemes- 00.txt, April 2006. [SMW02] Neil Spring, Ratul Mahajan, and David Wetherall. Measuring ISP topolo- gies with Rocketfuel. 
In ACM CCR, pages 133–145, Pittsburgh, Pennsyl- vania, USA, August 2002. ACM. [SPT04] S. Sarat, V . Pappas, and A. Terzis. On the use of anycast in dns. Technical report, Johns Hopkins University, 2004. [Sta14] Stacey Higginbotham. Akamai signs deal with opendns to make the web faster. http://gigaom.com/2014/06/03/ akamai-signs-deal-with-opendns-to-make-the-web-faster/, 2014. [TFK + ] Ruben Torres, Alessandro Finamore, Jin Ryong Kim, Marco Mellia, Mau- rizio M Munafo, and Sanjay Rao. Dissecting video server selection strate- gies in the YouTube CDN. In 31st International Conference on Dis- tributed Computing Systems (ICDCS), pages 248–257. IEEE. [The79] The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont report: Ethical prin- ciples and guidelines for the protection of human subjects of research. Technical report, Department of Health, Education, and Welfare, April 1979. [The12] The Domain Name System Operations Analysis and Research Center. The as112 project. http://www.as112.net, accessed in 2012. [TWR11] Sipat Triukose, Zhihua Wen, and Michael Rabinovich. Measuring a com- mercial content delivery network. In Proceedings of the 20th international conference on World wide web, pages 467–476. ACM, 2011. 202 [USC06] USC/LANDER Project. Internet IPv4 address space census. PREDICT ID USC-LANDER/internet_address_survey_it11w-20060307. Retrieval information for this and other censuses is at http://www.isi.edu/ant/ traces/, March 2006. [WC07] S. Woolf and D. Conrad. Requirements for a mechanism identifying a name server instance. RFC 4892, 2007. [WCVY03] D.G. Waddington, F. Chang, R. Viswanathan, and B. Yao. Topology dis- covery for public IPv6 networks. ACM Computer Communication Review, 33(3):59–68, July 2003. [Wik12] Wikipedia. Comparison of dns server software. http://en.wikipedia. org/wiki/Comparison_of_DNS_server_software, accessed in 2012. [Wik13] Wikipedia. Internet censorship by country. http://en.wikipedia.org/ wiki/Internet_censorship_by_country, accessed in 2013. [WJFR10] Patrick Wendell, Joe Wenjie Jiang, Michael J. Freedman, and Jennifer Rexford. DONAR: Decentralized server selection for cloud services. In Proc. of ACM SIGCOMM, pages 231–242, New Delhi, India, August 2010. ACM. [WMW + 06] Feng Wang, Zhuoqing Morley Mao, Jia Wang, Lixin Gao, and Randy Bush. A measurement study on the impact of routing events on end-to- end Internet path performance. In Proc. of ACM SIGCOMM, Pisa, Italy, August 2006. ACM. [Wol98] Rich Wolski. Dynamically forecasting network performance using the network weather service. Journal of Cluster Computing, 1:119–132, Jan- uary 1998. Also released as UCSD technical report TR-CS96-494. [Wya10] Edward Wyatt. Despite ruling, F.C.C. says it will move forward on expanding broadband. New York Times, page B3, April 15 2010. [XS05] Qiang Xu and Jaspal Subhlok. Automatic clustering of grid nodes. In Proc. of 6th IEEE International Workshop on Grid Computing, 2005. [XYA + 07] Yinglian Xie, Fang Yu, Kannan Achan, Eliot Gillum, Moises Goldszmidt, and Ted Wobber. How dynamic are IP addresses? In Proc. of ACM SIGCOMM, pages 301–312, Kyoto, Japan, August 2007. ACM. 203 [Yev14] Yevgeniy Sverdlik. Facebook turned o entire data center to test resiliency. http://www. datacenterknowledge.com/archives/2014/09/15/ facebook-turned-off-entire-data-center-to-test-resiliency/, 2014. 
[ZAC + 13] Mingchen Zhao, Paarijaat Aditya, Ang Chen, Yin Lin, Andreas Hae- berlen, Peter Druschel, Bruce Maggs, Bill Wishon, and Miroslav Ponec. Peer-assisted content distribution in akamai netsession. In Proceedings of the 2013 conference on Internet measurement conference, pages 31–42. ACM, 2013. [ZHR + 12] Yaping Zhu, Benjamin Helsley, Jennifer Rexford, Aspi Siganporia, and Sridhar Srinivasan. LatLong: Diagnosing wide-area latency changes for CDNs. IEEE Transactions on Network and Service Management, 9(1), September 2012. [ZLMZ05] Beichuan Zhang, Raymond Liu, Daniel Massey, and Lixia Zhang. Col- lecting the internet as-level topology. ACM Computer Communication Review, 35(1):53–61, January 2005. [ZZH + 08] Z. Zhang, Y . Zhang, Y .C. Hu, Z.M. Mao, and R. Bush. Ispy: detecting ip prefix hijacking on my own. In Proc. of ACM SIGCOMM, pages 327–338, August 2008. 204
Asset Metadata
Creator: Fan, Xun (author)
Core Title: Enabling efficient service enumeration through smart selection of measurements
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science (Computer Networks)
Publication Date: 07/28/2015
Defense Date: 05/28/2015
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tag: computer networks, Internet measurement, network measurement, OAI-PMH Harvest
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Heidemann, John (committee chair); Govindan, Ramesh (committee member); Katz-Bassett, Ethan (committee member); Psounis, Konstantinos (committee member)
Creator Email: xunfan@outlook.com, xunfan@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-611799
Unique Identifier: UC11300645
Identifier: etd-FanXun-3728.pdf (filename); usctheses-c3-611799 (legacy record id)
Legacy Identifier: etd-FanXun-3728.pdf
Dmrecord: 611799
Document Type: Dissertation
Rights: Fan, Xun
Type: texts
Source: University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA