Measuring the Impact of CDN Design Decisions

by

Matt Calder

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

May 2019

Copyright 2019 Matt Calder

Acknowledgements

First, I would like to thank my advisor, Ethan Katz-Bassett, for his guidance, friendship, and insistence on excellent work over the last six years. I am coming out of my Ph.D. with confidence I never could have imagined at the beginning. I would also like to thank Ramesh Govindan for his support in the early years of my Ph.D., and the rest of my committee, John Heidemann and Kostas Psounis. I am grateful for the fantastic friends I made working in the NSL lab at USC, Kyriakos Zarifis, Tobi Flach, and Calvin Ardi. I thank Lizsl De Leon for her amazing help over the years.

A good deal of my Ph.D. work was carried out at Microsoft. Odin started as a tiny piece of the Footprint project at MSR with the support of Jitu and Ratul. It was eventually brought into production under the Azure Frontdoor team. I would like to thank the members of MOI who helped make Odin a success: Madhura Phadke, Jose Nunez de Caceres Estrada, Minerva Chen, and Manuel Schröder.

I am forever grateful to Bob Morris from UMass Boston, who took me under his wing while I was a broke undergraduate, gave me a job, exposed me to academic research, and sent me to my first conference (in New Zealand!). I'm also thankful to Robert Cohen for involving me in his research projects. Swami Iyer and Josh Reyes are two friends from UMass Boston who had a massive impact on my success while I was there. Thank you both. I would like to thank my previous advisor at the University of Edinburgh, Mahesh Marina, for his support in me continuing my Ph.D. at USC.

Finally, I would like to thank my family. I thank my parents for their constant support and optimism. They instilled me with a work ethic which helped me keep going during difficult times. I owe a great deal to my wife, Akari, who understood how important this journey was for me and never wavered in her support. I am also grateful to my kids, Karin and Senan, who gave me a healthy new perspective on life.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract

Chapter 1: Introduction
  1.1 Mapping CDN Infrastructure
  1.2 Supporting CDN Operations with Internet Measurement
  1.3 CDN Redirection
    1.3.1 Analyzing the Performance of an Anycast CDN
    1.3.2 A Client Centric Approach to Latency Map Construction

Chapter 2: Background
  2.1 Content Delivery Networks
  2.2 Architecture
  2.3 Client Redirection

Chapter 3: Mapping the Expansion of Google's Serving Infrastructure
  3.1 Introduction
  3.2 Goal and Approach
  3.3 Methodology
    3.3.1 Enumerating Front-Ends
    3.3.2 Client-centric Front-End Geolocation
    3.3.3 Clustering front-ends
    3.3.4 Impact on Client Performance
      3.3.4.1 Announced /24 Prefixes
      3.3.4.2 RIPE Atlas
      3.3.4.3 PlanetLab
  3.4 Validation
    3.4.1 Coverage of Front-End Enumeration
    3.4.2 Accuracy of Client-Centric Geolocation
  3.5 Mapping Google's Expansion
    3.5.1 Growth Over Time
    3.5.2 Characterizing the Expansion
    3.5.3 Impact on Geolocation Accuracy
  3.6 Client Performance Impact
  3.7 Using Our Mapping
  3.8 Conclusions

Chapter 4: Odin: A Scalable Fault-Tolerant CDN Measurement System
  4.1 Introduction
  4.2 Background
    4.2.1 Microsoft's Network
    4.2.2 Comparison to Other CDNs
  4.3 Goals and Requirements
  4.4 Limitations of Existing Solutions
  4.5 Design Decisions
  4.6 System Design
    4.6.1 Client
    4.6.2 Orchestration Service
    4.6.3 Reporting
    4.6.4 Analysis Pipeline
  4.7 Supporting CDN Operations with Odin
    4.7.1 Identifying and Patching Poor Anycast Routing
    4.7.2 Monitoring and Improving Service Availability
      4.7.2.1 Preventing Anycast Overload
      4.7.2.2 Monitoring the Impact of Anycast Route Changes on Availability
    4.7.3 Using Measurements to Plan CDN Evolution
      4.7.3.1 Comparing Global vs Regional Anycast
  4.8 Evaluation and Production Results
    4.8.1 Odin Improves Service Performance
      4.8.1.1 Odin Patches Anycast Performance
    4.8.2 Using Odin to Identify Outages
    4.8.3 Using Odin to Evaluate CDN Configuration
    4.8.4 Evaluating Odin Coverage
  4.9 Conclusion
  4.10 Supplemental
Chapter 5: Examining CDN Redirection As a Design Decision
  5.1 Introduction
  5.2 A Client Centric Approach to CDN Latency Map Creation
    5.2.1 Introduction
    5.2.2 Existing Approaches
    5.2.3 Client-Centric Mapping
    5.2.4 Evaluation
    5.2.5 Dispelling Mistaken Conventional Wisdom
  5.3 Analyzing the Performance of an Anycast CDN
    5.3.1 Introduction
    5.3.2 Methodology
    5.3.3 Routing Configuration
    5.3.4 Data Sets
      5.3.4.1 Passive Measurements
      5.3.4.2 Active Measurements
    5.3.5 Choice of Front-ends to Measure
    5.3.6 CDN Size and Geo-Distribution
    5.3.7 Anycast Performance
    5.3.8 Addressing Poor Performance
    5.3.9 Conclusion
  5.4 Conclusion

Chapter 6: Literature Review
  6.1 Mapping CDN Infrastructure
  6.2 Measuring CDN Performance
  6.3 Anycast
  6.4 DNS Latency Map Creation

Chapter 7: Conclusions and Future Work
  7.1 Contributions
  7.2 Future Directions
  7.3 Summary

Bibliography

List of Tables

3.1 The number of client prefixes excluded from CCG by filtering.
3.2 Comparison of Google front-ends found by EDNS and open resolver.
3.3 Difference in daily vs. cumulative count of Google IPs, /24s, and ASes.
3.4 Classification of ASes hosting Google serving infrastructure over time.
4.1 Goals of Odin and requirements to meet our operational CDN needs.
5.1 The performance improvement in the 75th and 95th percentile from a 2-month roll-out using the CCM mapping technique over May and June 2017.

List of Figures

2.1 High-level architecture of many CDNs.
2.2 Example of DNS resolution from a client to a CDN front-end.
3.1 Discovery of front-ends topologically nearby RIPE Atlas probes.
3.2 Comparison of our client-centric geolocation against traditional techniques.
3.3 Impact of filtering techniques on client-centric geolocation.
3.4 Growth in Google front-end IP addresses, /24s, and ASes over time.
3.5 A world wide view of the expansion in Google's infrastructure.
3.6 Number of countries hosting Google serving infrastructure over time. Dates shown are between 2012-10-17 and 2015-11-30. Our data after this period observed no new countries.
3.7 CDF of number of sites in different types of ISPs from August 2013.
3.8 Impact of our various techniques to filter client locations when performing client-centric geolocation on Google front-ends with known locations.
3.9 Distances from client prefixes to Google front-end locations.
3.10 Distances from all /24 prefixes to Google-hosted and ISP-hosted front-ends.
3.11 The distribution of latency and HTTP GET search improvement for RIPE Atlas probes.
3.12 The difference in front-end performance for PlanetLab nodes before and after Google's front-end expansion. Figure (a) on the left shows the difference for three HTTP-related metrics. Figure (b) on the right shows the difference in performance for page load time.
3.13 The relationship between number of Google IPs discovered and number of vantage points.
4.1 Odin architecture overview.
4.2 Odin measurement types.
4.3 Three types of Internet faults that may occur when fetching measurement configuration or uploading reports.
4.4 Topology of backup paths when a front-end is unreachable.
4.5 Regional anycast performance.
4.6 Debugging 2017 availability drop between Helsinki front-end and AS1759 users in Finland.
4.7 Relative difference per hour in percentage of reports received through the backup path across four weekdays.
4.8 Latency degradation of 5-region vs. global anycast.
4.9 Percent of measurement coverage based on 4 Odin-embedded applications with different user populations.
4.10 The average error of observed failure rate, as a function of number of measurements and true failure rate.
5.1 Distribution of average distance between Microsoft users and their LDNS.
5.2 Aggregation of measurements for DNS mapping.
5.3 Performance difference between CCM map and geolocation map.
5.4 Diminishing returns of measuring to additional front-ends.
5.5 Distances in kilometers (log scale) from volume-weighted clients to nearest front-ends.
5.6 The fraction of requests where the best of three different unicast front-ends outperformed anycast.
5.7 The distance in kilometers (log scale) between clients and the anycast front-ends they are directed to.
5.8 Anycast poor-path prevalence during April 2015.
5.9 Anycast poor-path duration across April 2015.
5.10 The cumulative fraction of clients that have changed front-ends at least once by different points in a week.
5.11 The distribution of change in client-to-front-end distance (log scale) when the front-end changes.
5.12 Improvement over anycast from making LDNS or ECS-based decisions.

Abstract

High performance Internet services are critical to the success of online businesses because poor user experience directly impacts revenue. To provide low latency and high availability service, companies often use content delivery networks (CDNs) to deliver content quickly and reliably to customers through a globally distributed network of servers. In order to meet the performance demands of customers, CDNs continuously make crucial network design decisions that impact end-user performance. However, little is known about how CDN design decisions impact end-user performance or how to evaluate them effectively.

In this thesis, we aim to help content providers and researchers understand the performance impact of CDN design changes on end-users. To achieve this, we examine a collection of measurement results and an Internet measurement system deployed in production at a large content provider. With our measurement results we look at the impact of two important CDN design decisions: expansion of CDN front-end deployment and the choice among popular redirection strategies. Our design and evaluation of a measurement system at Microsoft demonstrates how CDNs can use end-user Internet measurements to support operations.

First, we look at a massive expansion of Google's serving infrastructure into end-user networks to reduce latency to their services. We first explore measurement techniques using the client-subnet-prefix DNS extension to completely enumerate and geolocate Google's serving infrastructure. Our longitudinal measurements then capture a large expansion of Google sites from primarily on Google's own network into end-user networks, greatly reducing the distance between end-users and Google services. We then examine the performance impact of Google's expansion by conducting a study of user performance to the servers users are directed to before and after the expansion.

Second, we examine Odin, a production measurement system deployed at Microsoft to support CDN and network operations. Odin was designed to overcome the measurement limitations of existing approaches (Section 4.4) and to take advantage of Microsoft's control of first-party end-user applications. We demonstrate that Odin delivers measurements even in the presence of Internet outages and supports a number of critical CDN operational scenarios such as traffic management, outage identification, and network experimentation.

Third, we look at anycast and DNS redirection, the two common strategies used for serving latency-sensitive content. We first examine a technique for constructing DNS latency maps that improves performance over existing approaches. We then use that approach to compare the performance of DNS and anycast redirection, finding that most of the time anycast directs users to a low-latency server but that DNS performance is better in the tail.

Chapter 1

Introduction

Content delivery networks (CDNs) are global-scale networks designed to improve user web experience by delivering low-latency and highly available content, primarily by serving content close to users.
For large content providers, the performance gains offered by CDNs are critical because slow performance directly impacts revenue. For example, both Amazon [113] and Yahoo [140] have disclosed that hundreds of milliseconds of additional latency have a major impact on revenue.

Different CDN designs offer different performance characteristics. Two such design points are deployment strategy, in what geographic regions and networks to deploy CDN serving infrastructure, and client redirection strategy, the protocols controlling how end-users are directed to a particular front-end. Since end-user performance plays such a vital role in the success of online economies, the CDN design decisions which impact this performance must be carefully evaluated. However, there is little understanding of how to evaluate client-impacting design decisions of CDN structure and behavior because of several challenges.

First, understanding the structure and strategy of state-of-the-art content provider networks is difficult. Many large content providers treat the geographic locations, inter-connectivity, capacity, and operations of their CDNs as confidential, yet the majority of Internet traffic is controlled by roughly a dozen content providers [26,110]. Since these "Internet giants" dominate today's Internet traffic, understanding their structure and strategies is critical to understanding the Internet. Existing approaches to mapping CDN infrastructures have incomplete coverage of Internet paths between the CDN and end-users, providing only a partial view of infrastructure deployment [28,107,125,142,146].

Second, even with a good technical understanding of the state-of-the-art structure and strategy of CDNs, it is unclear whether existing measurement approaches are adequate. Existing approaches include third-party measurement platforms [9,10,13,22,24], server-side instrumentation [36,52,72,108,135,154,160], and issuing layer 3 measurements toward [54,64,106,133,143] or from [157] CDN targets. All of these approaches suffer from technical or deployment limitations which prevent them from offering sufficient visibility into the Internet paths required for CDN operations.

Third, the performance trade-offs of different strategies for directing users to the closest content remain largely unexplored, leaving a major gap in understanding of a critical aspect of CDN design.

Thesis Statement: Client-side measurements allow complete coverage of CDN users over time and across diverse Internet paths to enable new analytical power to understand the impact of CDN design decisions.

In this thesis, we describe a collection of measurement results and a system to help content providers and researchers understand the impact of CDN design decisions on end-users. Specifically, we use client-side measurements to address three issues: (1) how to map the content serving infrastructure of a large CDN over time; (2) how an Internet measurement system can support CDN operations; and (3) understanding the client performance trade-offs between redirection strategies.

Critical to the impact of this work is our choice to focus our studies on two of the largest content providers on the Internet today, Google and Microsoft. For the majority of this work, we studied the performance impact of CDN design decisions using real Microsoft end-users. The promise of our initial measurement results led to the development of a production measurement system for supporting Microsoft's CDN operations.
This fact reinforces the relevance of the research challenges addressed in this thesis and the impact of the corresponding results. The following sections in this chapter provide summaries of the work conducted to support this thesis.

1.1 Mapping CDN Infrastructure

Modern content distribution networks both provide bulk content delivery and act as "serving infrastructure" for web services in order to reduce user-perceived latency. Serving infrastructures, such as Google's, are composed of globally distributed sets of colocated clusters of servers called front-ends. CDNs are now critical to the online economy, making it imperative to understand their size, geographic distribution, and growth strategies. To this end, we develop techniques that enumerate IP addresses of front-ends in these infrastructures, find their geographic location, and identify the association between clients and front-ends. While general techniques for front-end enumeration and geolocation can exhibit large error, our techniques exploit the design and mechanisms of serving infrastructure to improve accuracy.

In Chapter 3, we explore the use of the EDNS-client-subnet DNS extension to measure which clients a service maps to which of its serving sites. Unlike previous techniques for mapping serving sites, our approach maximizes network diversity and scale by enabling DNS resolution from all possible vantage points on the Internet. We devise a novel technique that uses this mapping to geolocate front-ends by combining noisy information about client locations with speed-of-light constraints. We demonstrate that this technique substantially improves geolocation accuracy relative to existing approaches. We also cluster front-end IP addresses into physical sites by measuring RTTs and adapting the cluster thresholds dynamically. Google's serving infrastructure has grown dramatically over a ten month period, and we use our methods to chart its growth and understand its content serving strategy. We find that the number of Google serving sites has increased more than sevenfold, and most of the growth has occurred by placing servers in large and small ISPs across the world, not by expanding Google's backbone.

1.2 Supporting CDN Operations with Internet Measurement

While CDNs aim to deliver high performance Internet services using a world-wide deployment of front-ends, the heterogeneous connectivity of clients and the dynamic nature of interconnection between the 60 thousand ASes on the Internet make understanding end-user performance a challenge. Common issues such as outages, routing changes, and congestion can hurt end-user performance by increasing latency and lowering availability. Without continuous insight on performance between users, servers, and external networks, CDNs will not be able to attain their full potential performance.

In Chapter 4, we describe Odin, a global-scale fault-tolerant Internet measurement system we designed to support CDN operations. Odin has been deployed in production for two years at Microsoft, where it continuously supports many aspects of CDN operations such as traffic management, network diagnostics, planning, and experimentation. Existing approaches suffer from unrepresentative performance and insensitivity to Internet events. Odin overcomes these limitations by using client-side, active layer-7 measurements from the massive population of Microsoft end-users.
Using Odin, we helped Microsoft improve latency by 10% or more for almost 60% of high traffic volume countries for Azure [7] cloud customers deployed in multiple regions.

1.3 CDN Redirection

1.3.1 Analyzing the Performance of an Anycast CDN

Content delivery networks must balance a number of trade-offs when deciding how to direct a client to a CDN server. Whereas DNS-based redirection requires a complex global traffic manager, anycast depends on BGP to direct a client to a CDN front-end. Anycast is simple to operate, scalable, and naturally resilient to DDoS attacks. This simplicity, however, comes at the cost of precise control of client redirection. We examine the performance implications of using anycast in a global, latency-sensitive CDN. We analyze millions of client-side measurements from the Bing search service to capture anycast versus unicast performance to nearby front-ends. We find that anycast usually performs well despite the lack of precise control but that it directs roughly 20% of clients to a suboptimal front-end. We also show that the performance of these clients can be improved through a simple history-based prediction scheme.

1.3.2 A Client Centric Approach to Latency Map Construction

DNS-based redirection remains the most popular method for directing clients to nearby CDN front-ends for latency sensitive content because it provides a level of control unavailable with techniques such as anycast, without the performance penalty of HTTP redirection. While CDNs such as Akamai and large content providers such as Google and Facebook use DNS redirection to control a commanding portion of today's Internet traffic, the control-plane algorithms and the data driving them have remained confidential, leading to mystery and misconception in industry and the research community. The few approaches described by previous work have limitations that fail to meet the performance demands of today's CDNs.

In this section, we describe an approach for a DNS-redirection control plane algorithm, primarily driven by large-scale end-user measurements, that identifies the best performing front-end for an LDNS. Our approach overcomes limitations of existing approaches by (1) removing assumptions about the relationships between end-users and their LDNSes, (2) weighting decisions based on actual traffic volume, and (3) fine-grained monitoring of paths between the CDN and its users. We dispel common misconceptions about how modern DNS redirection operates and demonstrate how our approach greatly improved performance of Microsoft cloud customers over a previously deployed approach.

Chapter 2

Background

In this section, we provide background on CDNs and highlight some of the technical challenges addressed in this thesis.

2.1 Content Delivery Networks

Adding even a few hundred milliseconds to a webpage load time can cost service providers users and business [113,140], so providers seek to optimize their web serving infrastructure to deliver content quickly to clients. Whereas once a website might have been served from a single location to clients around the world, today's major services rely on much more complicated and distributed infrastructure. Providers replicate their services at serving sites around the world and try to serve a client from the closest one [108]. CDNs initially sped delivery by caching static content and some forms of dynamic content within or near client networks.
Today, CDNs provide a large number of services critical for online businesses, such as caching, website acceleration, load balancing, and security services such as DDoS mitigation and firewalls.

Content delivery networks were pioneered by Akamai [124] in 1998 by providing distributed Internet caching services for content providers. As CDNs have evolved into a critical part of the Internet, different categories have emerged. We call traditional CDNs such as Akamai third-party CDNs because they are independent businesses, operating their own networks, that serve content for content producers (e.g., media companies). A private or first-party CDN is one that is operated by the content producer to serve their own content. Examples of companies operating first-party CDNs are Google, Facebook, Netflix, and Microsoft. Other types of third-party CDNs include TelCo CDNs, operated by large tier 1 transit provider ISPs such as Level3 and AT&T; meta-CDNs (e.g., Cedexis); and peer-to-peer CDNs (e.g., commercial BitTorrent). This thesis will focus primarily on challenges facing first-party CDNs.

2.2 Architecture

A CDN has geographically distributed server clusters (known as front-ends, edges, or proxies), each serving nearby users to shorten paths and improve performance [19,71,88,135] (see Figure 2.1). Many front-ends are hosted in CDN "points of presence" (PoPs), physical interconnection points where the CDN peers with other ISPs. CDNs typically deploy PoPs in major metro areas. Some CDNs also deploy front-ends in end-user networks, known as off-nets, or in datacenters. A front-end serves cached content immediately and fetches other content from a back-end. Back-ends can be complex web applications. Some CDNs operate their own backbone networks, providing global interconnect between PoPs and back-ends.

Front-ends may also serve as reverse proxies, terminating users' TCP connections and multiplexing the requests to the appropriate back-ends via pre-established warm TCP connections [51,129]. The back-ends forward their responses back to the same front-ends, which relay the responses to users. Reverse proxies accelerate websites because shorter round-trip times between clients and front-ends enable faster congestion window growth and TCP loss recovery [71].

Figure 2.1: High-level architecture of many CDNs. A popular strategy is to host FEs, or front-ends, in a CDN's PoP or in end-user networks (off-net FEs). FEs fetch content from back-ends, which may be internal (a content provider's private infrastructure), Cloud (public infrastructure offered by the CDN for content hosting), or external to the CDN's network.

2.3 Client Redirection

CDNs direct clients to a front-end by client redirection. While there are several approaches, this thesis considers the two most common redirection mechanisms CDNs use for latency sensitive traffic.

DNS: The client will fetch a CDN-hosted resource via a hostname that belongs to the CDN. The client's local DNS resolver (LDNS), typically configured by the client's ISP, will receive the DNS request to resolve the hostname and forward it to the CDN's authoritative nameserver. The CDN makes a performance-based decision about what IP address to return based on which LDNS forwarded the request, as shown in Figure 2.2. DNS redirection allows relatively precise control to redirect clients on small timescales by using small DNS cache TTL values.

Figure 2.2: Example of DNS resolution from a client to a CDN front-end. The client requests DNS resolution for www.cdn.com via its LDNS, 1.2.3.4.
If the LDNS does not have a cached record, it requests resolution from the authoritative DNS for cdn.com. The authoritative DNS returns a record for Front-end A because the CDN believes A will offer the best performance for clients served by 1.2.3.4.

DNS-based redirection faces some challenges because the authoritative DNS only sees the IP of the LDNS, and not the client, so a CDN must make decisions at the granularity of LDNS rather than client. An LDNS may be distant from the clients that it serves or may serve clients distributed over a large geographic region, such that there is no good single redirection choice an authoritative resolver can make. This situation is very common with public DNS resolvers such as Google Public DNS and OpenDNS, which serve large, geographically disparate sets of clients [50]. A proposed solution to this issue is the EDNS client-subnet-prefix standard (ECS), which allows a portion of the client's actual IP address to be forwarded to the authoritative resolver, allowing per-prefix redirection decisions [50,60].

Another issue with DNS redirection is honoring of DNS TTL records. Previous work showed that DNS records are used far after their TTL has expired [12,73,128]. For example, Dropbox [12] found that even after changing a record with a 1 minute TTL, after one hour, 5% of traffic continued to resolve to the old record.

Anycast: Anycast is a routing and addressing strategy where multiple paths exist to two or more destinations with the same IP address. For Internet anycast in the context of content delivery, a CDN announces the same BGP prefix from multiple PoPs. BGP routes clients to a PoP based on BGP's notion of best path. Once traffic enters the CDN's network, intra-domain policy routes it to the nearest anycast destination. Because CDNs often co-locate front-ends and PoPs, CDN intra-domain paths are very short, so the majority of the path is determined by BGP (examples of non-CDN anycast exceptions to this are Google Public DNS [18] and multi-region cloud application deployments [15], where there are many fewer DNS sites and cloud regions than PoPs [47], so traffic may traverse Google's backbone to reach its destination). Because anycast defers client redirection to Internet routing, it offers operational simplicity. Anycast has an advantage over DNS-based redirection in that each client redirection is handled independently, avoiding the LDNS problems described above. From this point forward when we refer to "anycast", we mean Internet anycast where PoPs and front-ends are co-located, unless explicitly stated otherwise.

Internet anycast has some well-known challenges. First, anycast is unaware of network performance, because BGP is unaware, so it does not react to changes in network quality along a path. Second, anycast is unaware of server load. If a particular front-end becomes overloaded, it is difficult to gradually direct traffic away from that front-end, although there has been recent progress in this area [74]. Simply withdrawing the route to take that front-end offline can lead to cascading overloading of nearby front-ends. Third, anycast routing changes can cause ongoing TCP sessions to terminate and need to be restarted. In the context of the Web, which is dominated by short flows, this does not appear to be an issue in practice [74,111]. Many companies, including Cloudflare, CacheFly, Edgecast, and Microsoft, run successful anycast-based CDNs.
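As a concrete illustration of the ECS mechanism described above, the sketch below issues a query through a public recursive resolver while attaching a client subnet. It is illustrative only: it assumes the dnspython package, that Google Public DNS (8.8.8.8) forwards ECS upstream for the queried name, and it uses documentation prefixes as stand-in client subnets.

```python
# Sketch: resolve a CDN-hosted name "as if" the query came from a given
# client /24, using the EDNS Client Subnet (ECS) option.
# Assumes: pip install dnspython; the recursive resolver forwards ECS upstream.
import dns.edns
import dns.message
import dns.query
import dns.rdatatype

def resolve_with_ecs(hostname, client_prefix, resolver="8.8.8.8"):
    addr, plen = client_prefix.split("/")          # e.g. "203.0.113.0/24"
    query = dns.message.make_query(hostname, "A")
    query.use_edns(0, options=[dns.edns.ECSOption(addr, int(plen))])
    response = dns.query.udp(query, resolver, timeout=5)
    return [rdata.address
            for rrset in response.answer
            for rdata in rrset
            if rdata.rdtype == dns.rdatatype.A]

# Different emulated client prefixes may receive different front-end IPs,
# which is the behavior Chapter 3 exploits for enumeration.
print(resolve_with_ecs("www.google.com", "203.0.113.0/24"))
print(resolve_with_ecs("www.google.com", "198.51.100.0/24"))
```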
Other Redirection Mechanisms: Whereas anycast and DNS direct a client to a front-end before the client initiates a request, the response from a front-end can also direct the client to a different server for other resources, using, for example, HTTP status code 3xx [21] or manifest-based redirection common for video [27]. These schemes are common in media content delivery and application/OS upgrades but add extra RTTs, and thus are not suitable for latency-sensitive services such as search. We do not consider them further in this thesis.

Chapter 3

Mapping the Expansion of Google's Serving Infrastructure

3.1 Introduction

Internet traffic has changed considerably in recent years, as access to content is increasingly governed by web serving infrastructures, made up of globally distributed front-end servers. Clients of these infrastructures are directed to nearby front-ends, which either directly serve static content (e.g., video or images from a content distribution network like Akamai), or use split TCP connections to relay web access requests to back-end datacenters (e.g., Google's search infrastructure) [51,71,129].

Web service providers employ serving infrastructures to optimize user-perceived latency [136]. They invest heavily in building out these infrastructures and develop sophisticated mapping algorithms to direct clients to nearby front-ends (Section 5). Over the course of our study, as we discuss later, Google's serving infrastructure increased sevenfold in size. Given the increasing economic importance of these serving infrastructures, we believe it is imperative to understand the content serving strategies adopted by large web service providers. Specifically, we are interested in the geographic and topological scope of serving infrastructures, their expansion, and how client populations impact build-out of the serving infrastructure.

Several prior studies have explored static snapshots of content-distribution networks [32,94,125], often focusing on bulk content delivery infrastructures [94], new mapping methodology [32], or new DNS selection methods [125]. In contrast, our work focuses on web serving infrastructures, develops more accurate methods to enumerate and locate front-ends and serving sites, and explores how one infrastructure, Google's, grows over ten months of active buildout.

The first contribution of this chapter is a suite of methods to enumerate the IP addresses of front-ends, geolocate them, and cluster them into serving sites. Our methods exploit mechanisms used by serving infrastructures to optimize client-perceived latency. To enumerate the IP addresses, we use the EDNS-client-subnet prefix extension [50,60] that some serving infrastructures, including Google, use to more accurately direct clients to nearby front-ends. A front-end IP address may sit in front of many physical server machines. In this work, we focus on mapping out the front-end IP addresses, but we do not attempt to determine the number of physical servers.

We develop a novel geolocation technique and show that it is substantially more accurate than previously proposed approaches. Our technique, client-centric geolocation (CCG), exploits the sophisticated strategies providers use to map customers to their nearest serving sites. CCG geolocates a server from the geographic mean of the (possibly noisy) locations for clients associated with that server, after using speed-of-light constraints to discard misinformation.
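To make the CCG intuition concrete, the following is a simplified, self-contained sketch, not the pipeline detailed in Section 3.3: it assumes client coordinates have already been obtained from a geolocation database, uses a single probe's RTT to the front-end for the speed-of-light bound (roughly 200 km per millisecond of one-way delay in fiber), and takes a naive centroid of the surviving client locations.

```python
# Toy sketch of client-centric geolocation (CCG): place a front-end at the
# mean location of the clients directed to it, after dropping clients that
# fall outside the speed-of-light feasible region implied by a measured RTT.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometers.
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def ccg_estimate(client_locs, probe_lat, probe_lon, rtt_ms):
    # client_locs: [(lat, lon), ...] for prefixes this front-end serves.
    # One-way delay is at most rtt_ms / 2; light in fiber covers ~200 km/ms.
    radius_km = (rtt_ms / 2.0) * 200.0
    feasible = [(la, lo) for la, lo in client_locs
                if haversine_km(la, lo, probe_lat, probe_lon) <= radius_km]
    if not feasible:
        return None
    # Naive centroid (ignores longitude wrap-around); the full CCG additionally
    # prunes clients more than one standard deviation beyond the mean distance.
    lat = sum(la for la, _ in feasible) / len(feasible)
    lon = sum(lo for _, lo in feasible) / len(feasible)
    return lat, lon
```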
While EDNS-client-subnet has been examined before [119,125], we are the first to use EDNS-client-subnet to (1) completely enumerate a large content delivery infrastructure; (2) demonstrate its benefit over existing enumeration techniques; and (3) geolocate the infrastructure. We also cluster the front-end IP addresses into serving sites using an OPTICS-based [37] RTT clustering technique [67]. These changes provide enough resolution to distinguish different sites in the same city. These sites represent unique network locations, a view that IP addresses, prefixes, or ASes can obscure.

Our second major contribution is a detailed study of Google's web serving infrastructure and its expansion over a 3.5-year period. To our knowledge, we are the first to observe rapid growth of the serving infrastructure of a major content provider. We find that Google's serving infrastructure has grown over sevenfold in the number of front-end sites, with serving sites deployed in over 140 countries and nearly 1600 new ASes. Its growth strategy has been to move away from serving clients from front-ends deployed on its own backbone and towards serving from front-ends deployed in lower tiers of the AS hierarchy; the number of /24 prefixes served off Google's network more than quadrupled during the expansion. Furthermore, these new serving sites, predictably, have narrow customer cones, serving only the customers of the AS the site is deployed in. Finally, we find that the expansion has noticeably reduced the distribution of geographic distances from the client to its nearest front-end server, and that this shift can also reduce the error in geolocating front-ends using client locations alone, but not enough to obviate the need for CCG's filtering techniques.

Our third contribution is an examination of whether, and to what extent, the newly deployed infrastructure reduced latency between clients and Google. We first look at the reduction in geographic distance between clients and Google as a proxy for latency and find many clients are substantially closer to their front-ends after the expansion. We then use vantage points from the global PlanetLab [55] and RIPE Atlas [22] testbeds to quantify the performance improvements experienced by vantage points served by the new front-ends. We find that 60-70% of vantage points saw some latency improvement and 40% saw improvements of 20ms or more.

An explicit non-goal of this work is to estimate the increase in Google's serving capacity: in placing front-ends in ISPs around the world, Google's expansion presumably focused on improving the latency of Web accesses through split-TCP connections [51,71,129], so proximity of front-ends to clients and good path performance between clients and front-ends were more important than capacity increases.

Google's expansion showcases an important CDN design decision to shift from infrastructure hosted primarily on the CDN's network into thousands of other ISPs to further reduce latency to end-users. To support our thesis, we show how enumeration techniques with partial client coverage fail to uncover all serving infrastructure. In contrast, our EDNS-client-subnet technique supports complete client coverage over time (3.5 years) and captures the full impact of the improvement in distance between Google and clients.

3.2 Goal and Approach

Our goal is to understand content serving strategies for large IPv4-based serving infrastructures, especially that of Google.
Serving strategies are defined by how many serving sites and front-end servers a serving infrastructure has, where the serving sites are located geographically and topologically (i.e., within which ISP), and which clients access which serving sites. Furthermore, services continuously evolve serving strategies, so we are also interested in measuring the evolution of serving infrastructures. Of these, Google's serving infrastructure is inarguably one of the most important, so we focus our attention on this infrastructure. To this end, we develop novel measurement methods to enumerate front-end servers, geolocate serving sites, cluster front-end servers into serving sites, and quantify client performance impact.

The challenge in devising these measurement methods is that serving infrastructures are large, distributed entities, with thousands of front-end servers at hundreds of serving sites spread across dozens of countries. Traditional approaches to enumerating serving sites would require perspectives from a very large number of topological locations in the Internet, much larger than the geographic distribution provided by research measurement infrastructures like PlanetLab. Moreover, existing geolocation methods that rely on DNS naming [97,138] or geolocation databases [96,120,131] do not work well on these serving infrastructures, where location-based DNS naming conventions are not consistently employed. While our measurement methods use these research infrastructures for some of their steps, the key insight in the design of the methods is to leverage mechanisms used by serving infrastructures to serve content. Because we design them for serving infrastructures, these mechanisms can enumerate and geolocate serving sites more accurately than existing approaches, as we discuss below.

Our method to enumerate all front-end server IP addresses within the serving infrastructure uses the EDNS-client-subnet extension. As discussed in Section 2, Google (and some other serving infrastructures) use this extension to address the problem of geographically distributed clients using a resolver that prevents the serving infrastructure from optimally directing clients to front-ends. We use this extension to enumerate front-end IP addresses of a serving infrastructure from a single location: this extension can emulate DNS requests coming from every active prefix in the IP address space, effectively providing a very large set of vantage points for enumerating front-end IP addresses.

To geolocate front-end servers and serving centers, we leverage another mechanism that serving infrastructures have long deployed, namely sophisticated mapping algorithms that maintain performance maps to clients with the goal of directing clients to the lowest latency available server. These algorithms have the property that clients that are directed to the server are likely to be topologically, and probably geographically, close to the server. We exploit this property to geolocate front-end servers: essentially, we approximate the location of a server by the geographical mean of client locations, a technique we call client-centric geolocation or CCG. We base our technique on this intuition, but we compensate for incorrect client locations and varying density of server deployments. We use existing measurement infrastructure (PlanetLab) to collect RTT measurements to all front-end IP addresses and cluster them into serving sites using an OPTICS-based clustering technique [47,67].
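As a rough illustration of that clustering step, the sketch below embeds each front-end as a vector of RTTs from landmark vantage points and groups front-ends with nearly identical vectors. It uses scikit-learn's OPTICS implementation as a stand-in for the cited technique, and the RTT values are fabricated for illustration.

```python
# Illustrative only: cluster front-end IPs into serving sites by treating each
# front-end as a point in "RTT space" (one coordinate per landmark node).
import numpy as np
from sklearn.cluster import OPTICS

front_ends = ["198.51.100.1", "198.51.100.2", "203.0.113.7", "203.0.113.9"]
# rtt_ms[i][j] = RTT from landmark j to front-end i (made-up numbers).
rtt_ms = np.array([
    [12.1, 85.3, 140.2],
    [12.4, 84.9, 139.8],   # near-identical vector: likely the same site
    [98.0, 20.5, 170.1],
    [97.6, 21.0, 171.3],
])

labels = OPTICS(min_samples=2, max_eps=5.0).fit_predict(rtt_ms)
for ip, site in zip(front_ends, labels):
    print(ip, "-> site", site)
```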
Using these measurement methods over a ten month period, we are able to study Google's serving infrastructure and its evolution. Coincidentally, Google's infrastructure has increased sevenfold over this period, and we explore salient properties of this expansion: where (geographically or topologically) most of the expansion has taken place, and how it has impacted clients.

Finally, we evaluate the performance impact of Google's expansion on vantage points from PlanetLab and RIPE Atlas. We measure performance from each vantage point to the front-ends selected by Google before and after the expansion, as well as other nearby front-ends. We examine two metrics, Page Load Time (PLT) and search request latency, that correspond more closely with user-perceived performance than RTT.

3.3 Methodology

In this section, we discuss the details of our measurement methods for enumerating front-ends, geolocating them, clustering them into serving sites, and measuring their performance.

3.3.1 Enumerating Front-Ends

Our first goal is to enumerate the IP addresses of all front-ends within a serving infrastructure. We do not attempt to identify when multiple IP addresses belong to one computer, or when one address fronts for multiple physical computers. An IP address can front hardware from a small satellite proxy to a huge datacenter, so careful accounting of public IP addresses is not particularly meaningful.

Many large serving infrastructures, such as Akamai [124], use mapping algorithms and DNS redirection. One way to enumerate front-ends is to issue DNS requests from multiple vantage points. Each request returns a front-end near the querying vantage point. The completeness of this approach is a function of the number of vantage points.

We emulate access to vantage points around the world using the client-subnet prefix DNS extension standard built on the EDNS extension mechanism (we call this approach EDNS-client-subnet). As of May 2013, EDNS-client-subnet is supported by Google, CacheFly, EdgeCast, ChinaCache and CDN 77. We use a patch to dig (http://wilmer.gaa.st/edns-client-subnet/) that adds support for EDNS-client-subnet, allowing the query to specify the client prefix. In our measurements of Google, we issue the queries through Google Public DNS's public recursive nameservers, which pass them on to the service we are mapping. The serving infrastructure then returns a set of front-ends it believes is best suited for clients within the client prefix. EDNS-client-subnet allows our single measurement site to solicit the recommended front-end for each specified client prefix. Using EDNS-client-subnet, we effectively get a large number of vantage points. We query using client prefixes drawn from 10 million routable /24 prefixes obtained from a RouteViews BGP snapshot. A full enumeration of Google using this approach takes about a day.

3.3.2 Client-centric Front-End Geolocation

Current geolocation approaches are designed for generality, making few or no assumptions about the target. Unfortunately, this generality results in poor performance when geolocating serving infrastructure. For example, MaxMind's free database [120] places all Google front-ends in Mountain View, the company's headquarters. (MaxMind may have more accurate locations for IPs belonging to eyeball ISPs, but IPs belonging to transit ISPs will have poor geolocation results [97].)
General approaches such as CBG [85] work best when vantage points are near the target [104], but front-ends in serving infrastructures are sometimes in remote locations, far from public geolocation vantage points. Techniques that use location hints in DNS names of front-ends or routers near front-ends can be incomplete [94]. Our approach combines elements of prior work, adding the observation that today's serving infrastructures use privileged data and advanced measurement techniques to try to direct clients to nearby front-ends [142]. While we borrow many previously proposed techniques, our approach is unique and yields better results.

We base our geolocation technique on two main assumptions. First, a serving infrastructure tries to direct clients to a nearby front-end, although some clients may be directed to distant front-ends, either through errors or a lack of deployment density. Second, geolocation databases have accurate locations for many clients, at least at country or city granularity, but also have poor granularity or erroneous locations for some clients.

Combining these two assumptions, our basic approach to geolocation, called client-centric geolocation (CCG), is to (1) enumerate the set of clients directed to a front-end, (2) query a geolocation database for the locations of those clients, and (3) assume the front-ends are located geographically close to most of the clients. To be accurate, CCG must overcome challenges inherent in each of these three steps of our basic approach:

1. We do not know how many requests different prefixes send to a serving infrastructure. If a particular prefix does not generate much traffic, the serving infrastructure may not have the measurements necessary to direct it to a nearby front-end, and so may direct it to a distant front-end.

2. Geolocation databases are known to have problems including erroneous locations for some clients and poor location granularity for other clients.

3. Some clients are not near the front-ends that serve them, for a variety of reasons. For example, some front-ends may serve only clients within certain networks, and some clients may have lower latency paths to front-ends other than the nearest ones. In other cases, a serving infrastructure may direct clients to a distant front-end to balance load or may mistakenly believe that the front-end is near the client. Or, a serving infrastructure may not have any front-ends near a particular client.

We now describe how CCG addresses these challenges.

Selecting client prefixes to geolocate a front-end. To enumerate front-ends, CCG queries EDNS using all routable /24 prefixes. However, this approach may not be accurate for geolocating front-ends, for the following reason. Although we do not know the details of how a serving infrastructure chooses which front-end to send a client to, we assume that it attempts to send a client to a nearby front-end and that the approach is more likely to be accurate for prefixes hosting clients who query the service a lot than for prefixes that do not query the service, such as IP addresses used for routers. To identify which client prefixes can provide more accurate geolocation, CCG uses traceroutes and logs of users of a popular BitTorrent extension, Ono [54], from March 2013. From the user logs we obtain a list of 2.6 million client prefixes observed to participate in BitTorrent swarms because they are likely to host Internet users.
We assume that a serving infrastructure is likely to also observe requests from these prefixes.

Overcoming problems with geolocation databases. CCG uses two main approaches to overcome errors and limitations of geolocation databases. First, we exclude locations that are clearly inaccurate, based on approaches described in the next paragraph. Second, we combine a large set of client locations to locate each front-end and assume that the majority of clients have correct locations that will dominate the minority of clients with incorrect locations.

To generate an initial set of client locations to use, CCG uses a BGP table snapshot from RouteViews [122] to find the set of prefixes currently announced, and breaks these routable prefixes up into 10 million /24 prefixes (in Section 3.4.1, we verify that /24 is often the correct prefix length to use). It then queries MaxMind's GeoLiteCity database to find locations for each /24 prefix. We chose MaxMind because it is freely available and is widely used in research.

CCG prunes three types of prefix geolocations as untrustworthy. First, it excludes prefixes for which MaxMind indicates it has less than city-level accuracy. This heuristic excludes 1,966,081 of the 10 million prefixes (216,430 of the 2.6 million BitTorrent client prefixes). Second, it uses a dataset that provides coarse-grained measurement-based geolocations for every IP address to exclude prefixes that include addresses in multiple locations [90]. Third, it issues ping measurements from all PlanetLab locations to five responsive addresses per prefix (as we show later, PlanetLab contains a sufficient number of vantage points for speed-of-light filtering to give accurate geolocation), and excludes any prefixes for which the MaxMind location would force one of these ping measurements to violate the speed of light. Combined, these exclude 8,396 of the 10 million prefixes (2,336 of the 2.6 million BitTorrent client prefixes). With these problematic locations removed, and with sets of prefixes likely to include clients, CCG assumes that both MaxMind and the serving infrastructure we are mapping likely have good geolocations for most of the remaining prefixes, and that the large number of accurate client geolocations should overwhelm any remaining incorrect locations.

Dealing with clients directed to distant front-ends. Even after filtering bad geolocations, a client may be geographically distant from the front-end it is mapped to, for two reasons: the serving infrastructure may direct clients to distant front-ends for load-balancing, and in some geographical regions, the serving infrastructure deployment may be sparse, so that the front-end nearest to a client may still be geographically distant.

To prune these clients, CCG first uses speed-of-light constraints, as follows. It issues pings to the front-end from all PlanetLab nodes and uses the speed of light to establish loose constraints on where the front-end could possibly be [85]. When geolocating the front-end, CCG excludes any clients outside of this region. This excludes 4 million out of 10 million prefixes (1.1 million out of 2.6 million BitTorrent client prefixes). It then estimates the preliminary location for the front-end as the weighted average of the locations of the remaining client prefixes, then refines this estimate by calculating the mean distance from the front-end to the remaining prefixes, and finds the standard deviation from the mean of the client-to-front-end distances.
Our final filter excludes clients that are more than a standard deviation beyond the mean distance to the front-end, excluding 392,668 out of 10 million prefixes (214,097 out of 2.6 million BitTorrent client prefixes).

Putting it all together. In summary, CCG works as follows. It first lists the set of prefixes directed to a front-end, then filters out all prefixes except those observed to host BitTorrent clients. Then, it uses MaxMind to geolocate those remaining client prefixes, but excludes: prefixes without city-level MaxMind granularity; prefixes that include addresses in multiple locations; prefixes for which the MaxMind location is not in the feasible actual location based on speed-of-light measurements from PlanetLab and M-Lab; and prefixes outside the feasible location for the front-end. (Table 3.1 shows the number of prefixes filtered at each step.) Its preliminary estimate for the front-end location is the geographic mean of the remaining clients that it serves. Calculating the distances from remaining clients to this preliminary location, CCG further excludes any clients more than a standard deviation beyond the mean distance in order to refine the location estimate. Finally, it locates the front-end as being at the geographic mean of the remaining clients that it serves.

Table 3.1: Summary of the number of client prefixes excluded from CCG by filtering. 10M is the 10 million client prefix set and 2.6M is the 2.6 million BitTorrent client prefix set.

Filter                                              10M prefixes    2.6M prefixes
No city-level accuracy                              -1.9M (19.5%)   -216K (8.1%)
Multiple locations and client location
  speed-of-light violations                         -8K (.08%)      -2K (.08%)
Front-end location speed-of-light violations        -4M (40%)       -1.1M (41%)
Outside one standard deviation                      -392K (3.9%)    -214K (8%)
Remaining                                           3.7M (37%)      1M (39%)

3.3.3 Clustering front-ends

As we discuss later, CCG is accurate to within tens of kilometers. In large metro areas, some serving infrastructures may have multiple serving sites, so we utilize an existing technique to distinguish physically distinct serving sites [47,67]. The technique works by mapping each front-end to a point in high-dimensional space, where the coordinates are RTTs from landmarks (250 PlanetLab nodes at different geographical sites). The approach looks at distances between points and clusters together those with small distances as distinct serving sites. The intuition underlying the approach is that two front-ends at the same physical location should have a small distance in the high-dimensional space. The existing validation results show the method exhibits over 97% accuracy on three different test datasets [47,67].

3.3.4 Impact on Client Performance

To evaluate the impact of the expansion on client performance, we use three qualitatively different sets of vantage points: prefixes announced around the world, RIPE Atlas, and PlanetLab. The three sets provide different trade-offs between coverage and level of measurement detail.

3.3.4.1 Announced /24 Prefixes

The announced /24 prefixes, when combined with EDNS client-subnet-prefix, effectively provide as many vantage points as the number of prefixes. As described in Sections 3.3.1 and 3.3.2, using all routable prefixes as vantage points, we can obtain the geographic location of all front-ends and the identity of the default front-end. With this information, we can also estimate latency between a /24 and any front-end using geographic distance as a proxy for latency.
These latency estimates will not always be accurate, but large differences in distance should often reflect differences in latency, and the data set provides broad coverage across the Internet.

We first de-aggregated all announced prefixes in a RouteViews BGP snapshot to yield 10 million announced /24 prefixes. Our approaches rely on reasonably accurate geolocation data from the free MaxMind GeoLiteCity database, so we end up using only the 8 million /24 prefixes for which MaxMind has city-level accuracy. We also make use of a second set of 2.6 million /24 prefixes observed in BitTorrent swarms by Ono [54]. We assume these prefixes host end users who are likely to be "eyeballs" for services like Google, and therefore we assume Google is more likely to care about serving them quickly. Furthermore, geolocation databases are less likely to have inaccurate locations for eyeball /24s than for prefixes that only host infrastructure.

3.3.4.2 RIPE Atlas

RIPE Atlas is an Internet measurement platform made up of small probe devices hosted by volunteers, primarily in home networks. Network operators and researchers collect measurement credits by hosting probes and can spend credits to perform measurements using the community of probes. The Atlas platform supports traditional measurement tools such as ping, traceroute, and DNS resolution but offers no direct access or programmability for users. At the time of this writing, RIPE Atlas had around 10 thousand active probes.

We use the ping, DNS, and HTTP tools available in Atlas to measure front-ends. HTTP measurements (which, at the time of this work, were available with permission) allow simple GET requests to a particular URL, and the results return properties such as request completion time, headers, and number of bytes in the body. The actual contents of the response (e.g., HTML) are not available, and the probe does not fetch subresources. We issue HTTP GET Google search requests to specific front-ends with the URL http://<front-end IP>/search?q=dogs. We chose the query term "dogs" as a neutral query term which is unlikely to cause trouble for individuals hosting probes in countries where Internet traffic content is highly monitored. Previous work has shown that the choice of query term does not play a significant part in query response time and that search results are not cached at front-ends [51], and our measurements support these claims. For the remainder of the chapter we will refer to these searches as just HTTP.

In several of our performance analyses, we are interested in a vantage point's performance to the front-ends that provide it the lowest latency, but we may not know a priori which these are, especially for vantage points with many nearby front-ends. With PlanetLab, there are essentially no costs associated with issuing many measurements, and so we can use exhaustive probing. However, RIPE Atlas uses a credit-based measurement system, so we have to pay the credit cost of each measurement. Because we only have a few probes that accumulate credits, we must be careful to build a sensible measurement budget to avoid burning through all our credits. To find a set of front-ends that provide low latency, we use the following approach for each Atlas probe:
1. Measure RTT to its default front-end, the one Google sends it to.
2. Convert the RTT to a distance, using a delay-to-distance conversion ratio from earlier work [104].
3. Find all of the front-ends that are located within that distance of the probe.
4.
Measure RTT to this set of front-ends to find the set with low latency from the probe.

This process yields a set of candidate front-ends per Atlas vantage point, as well as the RTT from the vantage point to the front-ends. However, some vantage points may end up with many such front-ends. To restrict the credit budget needed to make the recurring measurements we need for our study, we further filter the front-ends by only choosing those that have RTT less than a multiple R times the RTT to the vantage point's default front-end.

[Figure 3.1: Topologically nearby front-end discovery of RIPE Atlas probes. We include only front-ends with RTT less than half of a probe's default front-end.]

Figure 3.1 demonstrates this approach applied to Atlas with R = 0.5, restricting the "closeness" to those front-ends with RTT less than half the RTT to the probe's default front-end. It shows the distribution of the number of nearby ISP-hosted front-ends selected per Atlas probe for different filtering regimes. The "Provider Only" lines only consider nearby front-ends that are within the probe's ISP's provider hierarchy (i.e., the transitive closure of upstream providers), whereas "All" includes all ISP-hosted front-ends, regardless of AS relationship. We use R = 0.5 throughout the rest of the chapter. This two-pass filtering (first by distance to find front-ends to ping, then by RTT to select front-ends for the measurements later in this chapter) is necessary because some physically nearby front-ends may have high RTT due, for instance, to geographically circuitous paths.

3.3.4.3 PlanetLab

PlanetLab [55] has much lower network coverage than RIPE Atlas, but allows us to gather more detailed information on access latency. Daily, we ping all Google-hosted front-end IPs observed the previous day from all PlanetLab nodes. For each PlanetLab node, we select the set of front-ends whose RTT is no more than that to the node's default front-end. We measure two types of HTTP-related search performance: basic HTTP search metrics to Google front-ends using curl, and page load time performance using a modified version of the PhantomJS [20] headless web browser and the snitch.js [23] measurement script.

To measure page load time from a specific front-end with PhantomJS, we make search requests to the IP address of the front-end, bypassing DNS resolution. The HTML query response contains URLs to additional subresources (JavaScript, CSS, images) that the browser must also fetch. The hostnames of these resources are often outside of www.google.com, instead residing in other domains such as gstatic.com, static.google.com, and apis.google.com. By default, the browser would use DNS resolution to determine the IP addresses to send these resource requests to, so these requests would side-step the actual target front-end of our measurements. To address this, we added support for fine-grained DNS resolution control into PhantomJS. In our PhantomJS build, we supply a hostname-to-IP mapping file that PhantomJS will use before falling back to calling the host's DNS, allowing us to specify the front-end to use for each domain (in our measurements, a single front-end for all resources on the page). Front-ends act as virtual hosts; a single host can handle requests for multiple hostnames.
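The same pinning idea can be illustrated outside of PhantomJS for a single request. The sketch below is not the code used in our measurements: the front-end IP address, search path, and helper name are placeholders, and it simply shows an HTTP GET sent to a chosen front-end IP while carrying the intended hostname.

import time
import requests

def fetch_via_front_end(front_end_ip, hostname, path="/search?q=dogs", timeout=10):
    """Time an HTTP GET sent directly to front_end_ip on behalf of hostname."""
    url = "http://{}{}".format(front_end_ip, path)
    start = time.monotonic()
    resp = requests.get(
        url,
        # Front-ends are virtual hosts, so the Host header tells the server
        # which site the request is intended for.
        headers={"Host": hostname},
        allow_redirects=False,
        timeout=timeout,
    )
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return resp.status_code, elapsed_ms

# Example with a placeholder front-end IP:
# status, ms = fetch_via_front_end("203.0.113.10", "www.google.com")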
To ensure that an HTTP request sent to a specific IP address is directed to the correct hostname, we also set the "Host" HTTP header field to the subresource's hostname.

3.4 Validation

In this section, we validate front-end enumeration and geolocation.

3.4.1 Coverage of Front-End Enumeration

Using EDNS-client-subnet can improve coverage over previous methods that have relied on using fewer vantage points. We first quantify the coverage benefits of EDNS-client-subnet. We then explore the sensitivity of our results to the choice of prefix length for EDNS-client-subnet, since this choice can also affect front-end enumeration.

Open Resolver vs EDNS-client-subnet Coverage. An existing technique to enumerate front-ends for a serving infrastructure is to issue DNS queries to the infrastructure from a range of vantage points. Following previous work [94], we do so using open recursive DNS resolvers. We use a list of about 200,000 open resolvers (used with permission from Duane Wessels, Packet Pushers Inc.); each resolver is effectively a distinct vantage point. These resolvers are in 217 countries, 14,538 ASes, and 118,527 unique /24 prefixes. Enumeration of Google with the open resolvers takes about 40 minutes. This dataset forms our comparison point to evaluate the coverage of the EDNS-client-subnet approach we take in this chapter.

                        IPs       /24s     ASes     Countries
Open resolver           23939     1207     753      134
EDNS-client-subnet      28793     1445     869      139
Benefit                 +20%      +20%     +15%     +4%

Table 3.2: Comparison of Google front-ends found by EDNS and open resolver. EDNS provides significant benefit over the existing technique.

Table 3.2 shows the added benefit over open resolvers of enumerating Google front-ends using EDNS-client-subnet. Our approach uncovers at least 15-20% more Google front-end IP addresses, prefixes, and ASes than were visible using open resolvers. By using EDNS-client-subnet to query Google on behalf of every client prefix, we obtain a view from locations that lack open recursive resolvers. In Section 3.5.1, we demonstrate the benefit over time as Google evolves, and in Section 3.7 we describe how we might be able to use our Google results to calibrate how much we would miss using rDNS to enumerate a (possibly much larger or smaller than Google) serving infrastructure that does not support EDNS-client-subnet.

Completeness and EDNS-client-subnet Prefix Length. The choice of prefix length for EDNS-client-subnet queries can affect enumeration completeness. Prefix lengths shorter than /24 in BGP announcements can be too coarse for enumeration. We find cases of neighboring /24s within shorter BGP announcement prefixes that are directed to different serving infrastructure. For instance, we observed an ISP announcing a /18 with one of its /24 subprefixes getting directed to Singapore while its neighboring prefix is directed to Hong Kong. Our evaluations query using one IP address in each /24 block. If serving infrastructures are doing redirections at finer granularity, we might not observe some front-end IP addresses or serving sites.

The reply to the EDNS-client-subnet query returns a scope, the prefix length covering the response. Thus, if a query for an IP address in a /24 block returns a scope of, say, /25, it means that the corresponding redirection holds for all IP addresses in the /25 covering the query address, but not the other half of the /24. For almost 75% of our /24 queries, the returned scope was also for a /24 subnet, likely because it is the longest globally routable prefix.
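To make the query mechanism concrete, the following sketch issues one such query with the dnspython library and reads back both the answer and the returned scope. It is illustrative only: the nameserver and client prefix below are placeholders, and the query must be sent to a resolver or authoritative server that honors the EDNS-client-subnet option.

import dns.edns
import dns.message
import dns.query
import dns.rdatatype

def ecs_query(hostname, client_prefix, prefix_len, nameserver, timeout=5):
    """Resolve hostname as if the client were in client_prefix/prefix_len.

    Returns (ips, scope_len): the A records in the answer, and the prefix
    length the answer covers, taken from the response's ECS option.
    """
    ecs = dns.edns.ECSOption(client_prefix, prefix_len)
    query = dns.message.make_query(hostname, "A", use_edns=0, options=[ecs])
    response = dns.query.udp(query, nameserver, timeout=timeout)

    ips = [item.address
           for rrset in response.answer
           if rrset.rdtype == dns.rdatatype.A
           for item in rrset]
    scope = next((opt.scopelen for opt in response.options
                  if isinstance(opt, dns.edns.ECSOption)), None)
    return ips, scope

# Example with placeholder client prefix and nameserver addresses:
# ips, scope = ecs_query("www.google.com", "198.51.100.0", 24, "216.239.32.10")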
For most of the rest, we saw a /32 prefix length scope in the response, indicating that Google's serving infrastructure might be doing very fine-grained redirection. We refer the reader to related work for a study of the relationship between the announced BGP prefix length and the returned scope [141].

For our purposes, we use the returned scope as a basis to evaluate the completeness of our enumeration. We took half of the IPv4 address space and issued a series of queries such that their returned scopes covered all addresses in that space. For example, if a query for 1.1.1.0/24 returned a scope of /32, we would next query for 1.1.1.1/32. These brute-force measurements did not uncover any new front-end IP addresses not seen by our /24 queries, suggesting that our approach of using /24 prefixes likely provides complete coverage of Google's entire front-end serving infrastructure.

Enumeration Over Time. Front-ends often disappear and reappear from one day to the next across daily enumeration. Some remain active but are not returned in any EDNS-client-subnet requests, others become temporarily inactive, and some may be permanently decommissioned. To account for this variation and obtain an accurate and complete enumeration, we accumulate observations over time, but also test which servers still serve Google search on a daily basis. We check liveness by issuing daily, rate-limited, HTTP HEAD requests to the set of cumulative front-end IP addresses we observe. The Daily row in Table 3.3 shows a snapshot of the number of IPs, /24s, and ASes that are observed on 2013-8-8. The Cumulative row shows the additional infrastructure observed earlier in our measurement period but not on that day, and the Inactive row indicates how many of those were not serving Google search on 2013-8-8.

               IPs              /24s            ASes
Daily          22959            1213            771
Cumulative     +5546            +219            +93
 -Inactive     -538             -24             -8
Active         27967 (+22%)     1408 (+16%)     856 (+11%)

Table 3.3: A snapshot from 2013-8-8 showing the differences in the number of IPs, /24s, and ASes observed cumulatively across time versus what can be observed within a day. Some front-end IP addresses may not be visible in a daily snapshot. However, IP addresses may be temporarily drained, become permanently inactive, or be reassigned. Acquiring an accurate and complete snapshot of active serving infrastructure requires accumulating observations over time and testing which remain active.

This shows that the front-ends made available through DNS on a given day are only a subset of what may be active on that day. For example, for several consecutive days in the first week of August 2013, all our queries returned IP addresses from Google's network, suggesting a service drain of the front-ends in other networks. Our liveness probes confirmed that the majority of front-ends in other networks still actively served Google search when queried, even though no DNS queries directed clients to them. A possible future direction is to examine whether this approach can infer Google maintenance periods and redirections away from outages, as well as assess whether these shifts impact performance.

3.4.2 Accuracy of Client-Centric Geolocation

Client-centric geolocation using EDNS-client-subnet shows substantial improvement over traditional ping-based techniques [85], undns [138], and geolocation databases [120].

Dataset. To validate our approach, we use the subset of Google front-ends with hostnames that contain airport codes hinting at their locations.
Although the airport code does not represent a precise location, we believe that it is reasonable to assume that the actual front-end is within a few 10s of kilometers of the corresponding airport. Airport codes are commonly used by network operators as a way to debug network and routing issues, so having accurate airport codes is an important diagnostic tool. Previous work has shown that only 0.5% of hostnames in a large ISP had misleading names [155], and so we expect that misnamed Google front-ends only minimally distort our results. A limitation of our validation is that we cannot validate against Google-hosted IPs that do not have airport codes, because popular geolocation databases such as MaxMind place these IPs in Mountain View, CA. Using all 550 front-ends with airport codes, we measure the error of our technique as the distance between our estimated location and the airport location, from data collected on April 17, 2013.

Accuracy. Figure 3.2 shows the distribution of error for CCG, as well as for three traditional techniques. We compare to constraint-based geolocation (CBG), which uses latency-based constraints from a range of vantage points [85]; a technique that issues traceroutes to front-ends and locates the front-ends based on geographic hints in names of nearby routers [94]; and the MaxMind GeoLite Free database [120]. We offer substantial improvement over existing approaches. For example, the worst-case error for CCG is 409km, whereas CBG, the traceroute-based technique, and MaxMind have errors of over 500km for 17%, 24%, and 94% of front-ends, respectively. CBG performs well when vantage points are close to the front-end [104], but it incurs large errors for the half of the front-ends in more remote regions. The traceroute-based technique is unable to provide any location for 20% of the front-ends because there were no hops with geographic hints in their hostnames near the front-end. The MaxMind database performs poorly because it places most front-ends belonging to Google in Mountain View, CA.

[Figure 3.2: Comparison of our client-centric geolocation against traditional techniques, using Google front-ends with known locations as ground truth.]

Importance of Filtering. Figure 3.3 demonstrates the need for the filters we apply in CCG. The CCG no filtering line shows our basic technique without any filters, yielding a median error of 556km. Only considering client eyeball prefixes we observed in the BitTorrent dataset reduces the median error to 484km and increases the percentage of front-ends located with error less than 1000km from 61% to 74%. Applying our standard deviation filtering improves the median to 305km and error less than 1000km to 86%. When using speed-of-light constraints measured from PlanetLab and M-Lab to exclude client locations outside the feasible location for a front-end and to exclude clients with infeasible MaxMind locations, we obtain a median error of 26km, and only 10% of front-end geolocations have an error greater than 1000km. However, we obtain our best results by simultaneously applying all three filters.

Case Studies of Poor Geolocation. CCG's accuracy depends upon its ability to draw tight speed-of-light constraints, which in turn depends (in our current implementation) on PlanetLab and M-Lab deployment density.
We found one instance where sparse vantage point deployments affected CCG's accuracy.

[Figure 3.3: Impact of our various techniques to filter client locations when performing client-centric geolocation on Google front-ends with known locations.]

In this instance, we observe a set of front-ends in Stockholm, Sweden, with the arn airport code, serving a large group of client locations throughout Northern Europe. However, our technique locates the front-ends as being 409km southeast of Stockholm, pulled down by the large number of clients in Oslo, Copenhagen, and northern Germany. Our speed-of-light filtering usually effectively eliminates clients far from the actual front-end. In this case, we would expect PlanetLab sites in Sweden to filter out clients in Norway, Denmark, and Germany. However, these sites measure latencies to the Google front-ends in the 24ms range, yielding a feasible radius of 2400km. This loose constraint results in poor geolocation for this set of front-ends.

It is well known that Google has a large datacenter in The Dalles, Oregon, and our map (Fig. 3.5) does not show any sites in Oregon. In fact, we place this site 240km north, just south of Seattle, Washington. A disadvantage of our geolocation technique is that large datacenters are often hosted in remote locations, and our technique will pull them towards the large population centers that they serve. In this way, the estimated location ends up giving a sort of "logical" serving center of the server, which is not always the geographic location.

We also found that there are instances where we are unable to place a front-end. In particular, we observed that occasionally, when new front-ends were first observed during the expansion, there would be very few /24 client networks directed to them. These networks may not have city-level geolocation information available in MaxMind, so we were unable to locate the corresponding front-ends.

3.5 Mapping Google's Expansion

We present a longitudinal study of Google's serving infrastructure. Our initial dataset is from late October to early November of 2012, and our second dataset covers March 2013 through April of 2016 (results from late October to early November of 2012 and March through August of 2013 were presented at IMC 2013 [47]; the rest of the measurements are from after publication). We are able to capture a substantial expansion of Google infrastructure.

3.5.1 Growth Over Time

For each snapshot that we capture, we use EDNS-client-subnet to enumerate all IP addresses returned for www.google.com. Figure 3.4(a) depicts the number of server IP addresses seen in these snapshots over time (it is not necessarily the case that each IP address maps to a distinct front-end). The graph shows slow growth in the cumulative number of Google IP addresses observed between November 2012 and March 2013, then a major increase in mid-March 2013 in which we saw approximately 3,000 new
serving IP addresses come online. This was followed by another large jump of 3,000 in mid-May 2013. Over the month of June 2013, we observed 11,000 new IPs, followed by an increase of 4,000 across July 2013. Growth continued steadily until May of 2015, when we observed an increase of around 24,000 new IP addresses through December 2015. By the end of our study, the number of serving IP addresses had increased seventeenfold.

[Figure 3.4: Growth in the number of IP addresses (a), /24 prefixes (b), ASes (c), and points of presence (d) observed to be serving Google's homepage over time. During our study, Google expanded rapidly at each of these granularities. Data for (a), (b), and (c) is shown from 2012-10-17 through 2016-04-23. Points of presence data (d) is shown from 2012-10-26 through 2013-05-19. Figures from our original publication from IMC 2013 [47] stopped at 2013-08-15, except for points of presence, which stopped at 2013-05-19.]

Figure 3.4(b) shows this same trend in the growth of the number of /24s seen to serve Google's homepage. In Figure 3.4(c), we see 14X growth in the number of ASes originating these prefixes, indicating that this large growth is not just Google adding new capacity to existing serving locations. Figure 3.4(d) shows the growth in the number of distinct serving sites within those ASes.

[Figure 3.5: A worldwide view of the expansion in Google's infrastructure. Note that the locations that appear floating in the ocean are on small islands. These include Guam, Maldives, Seychelles, Cape Verde, and Funchal.]

Figure 3.5 shows the geographic locations of Google's serving infrastructure at the beginning of our measurements, in August 2013, and in April 2016. We observe two types of expansion. First, we see new serving locations in remote regions of countries that already hosted servers, such as Australia and Brazil. Second, we observe Google turning up serving infrastructure in countries that previously did not appear to serve Google's homepage, such as Vietnam and Thailand. Of the new front-end IP addresses that appeared up to our original IMC publication [47], with data up to August 2013, 95% are in ASes other than Google. Of those addresses, 13% are in the United States and 26% are in Europe, places that would appear to be well served directly from Google's network. In addition, 21% are in Asia, 13% are in North America (outside the US), 11% are in South America, 8% are in Africa, and 8% are in Oceania.
[Figure 3.6: Number of countries hosting Google serving infrastructure over time. Dates shown are between 2012-10-17 and 2015-11-30. Our data after this period observed no new countries.]

Figure 3.6 depicts this growth in the number of countries hosting serving infrastructure, from 58 or 60 at the beginning of our study to 147 in April 2016 (we base our locations on our CCG approach, which may distort locations of front-ends that are far from their clients).

3.5.2 Characterizing the Expansion

To better understand the nature of Google's expansion, we examine the types of networks where the expansion is occurring and how many clients they serve. Table 3.4 classifies the number of ASes of various classes in which we observe serving infrastructure, both at the beginning and at the end of our study. It also depicts the number of /24 client prefixes (of 10 million total) served by infrastructure in each class of AS. We use AS classifications from the June 28, 2012 dataset from UCLA's Internet Topology Collection [147] (UCLA's data processing has been broken since 2012, but we do not expect the AS topology to change rapidly), except that we only classify as stubs ASes with 0 customers, and we introduce a Tiny ISP class for ASes with 1-4 customers.

          November 2012        May 2013                      August 2013
          ASes    Clients      ASes          Clients         ASes            Clients
Google    2       9856K        2 (+0%)       9658K (-2%)     2 (+0%)         9067K (-8%)
Tier 1    2       481          2 (+0%)       201 (-58%)      4 (+100%)       35K (+7278%)
Large     30      111K         46 (+53%)     237K (+114%)    123 (+310%)     410K (+270%)
Small     35      37K          64 (+83%)     63K (+71%)      319 (+811%)     359K (+870%)
Tiny      23      31K          41 (+78%)     57K (+84%)      206 (+796%)     101K (+228%)
Stub      13      21K          36 (+177%)    38K (+81%)      201 (+1446%)    79K (+281%)

Table 3.4: Classification of ASes hosting Google serving infrastructure at the beginning, middle, and end of our study. We count both the number of distinct ASes and the number of client /24 prefixes served. Growth numbers for May and August are in comparison to November. Google still directs 90% of the prefixes to servers within its own network, but it is evolving towards serving fewer clients from its own network and more clients from smaller ASes around the world.

As seen in the table, the rapid growth in ASes that host infrastructure has mainly been occurring lower in the AS hierarchy. Although Google still directs the vast majority of client prefixes to servers in its own ASes, it has begun directing an additional 8% of them to servers off its network, representing a 393% increase in the number served from outside the network. By installing servers inside client ISPs, Google allows clients in these ISPs to terminate their TCP connections locally (likely at a satellite server that proxies requests to a datacenter [51,71,129], as it is extremely unlikely that Google has sufficient computation in these locations to provide its services).

We perform reverse DNS lookups on the IP addresses of all front-ends we located outside of Google's network. More than 20% of them have hostnames that include either ggc or google.cache. These results suggest that Google is reusing infrastructure from the Google Global Cache (GGC) [17], Google's content distribution network built primarily to cache YouTube videos near users (GGC documentation mentions that the servers may be used to proxy Google Search and other services). It is
possible that the servers were already in use as video caches; if so, this existing physical deployment could have enabled the rapid growth in front-ends we observed.

[Figure 3.7: CDF of number of sites in different types of ISPs from August 2013.]

Figure 3.7 depicts a different view of the Google expansion. It charts the cumulative distribution of the number of serving sites by ISP type. Overall, nearly 70% of the ISPs host only one serving site. Generally speaking, smaller ISPs host fewer serving sites than larger ISPs. The biggest exceptions are a Tiny ISP in Mexico hosting 23 serving sites consisting of hundreds of front-end IPs, and a Stub national mobile carrier with 21 sites. Befitting their role in the Internet, most Large and Tier 1 ISPs host multiple sites. For example, a Large ISP in Brazil serves from 23 sites.

Whereas Google would be willing to serve any client from a server located within the Google network, an ISP hosting a server would likely only serve its own customers [152] (what Akamai refers to as "Onnet", we call ISP-hosted front-ends). Serving its provider's other customers, for example, would require the ISP to pay its provider for the service! We check this intuition by comparing the location in the AS hierarchy of clients and the servers to which Google directs them. Of clients directed to servers outside of Google's network, 93% are located within the server's AS's customer cone (the AS itself, its customers, their customers, and so on) [116]. Since correctly inferring AS business relationships is known to be a hard problem [65], it is unclear whether the remaining 7% of clients are actually served by ISPs of which they are not customers, or (perhaps more likely) whether they represent limitations of the analysis. In fact, given that 40% of the non-customer cases stem from just 7 serving ASes, a small number of incorrect relationship or IP-to-AS inferences could explain the counter-intuitive observations.

3.5.3 Impact on Geolocation Accuracy

[Figure 3.8: Impact of our various techniques to filter client locations when performing client-centric geolocation on Google front-ends with known locations.]

A side effect of Google directing more clients to front-ends closer to them is that our geolocation technique should become more accurate over time, since we base it on the assumption that front-ends are near their clients. To verify that assumption, we apply our basic geolocation approach, without any of our filters that increase accuracy, to the datasets from three points in time. We chose dates to coincide with the large jumps in Google servers that we observe in Figure 3.4. Using the airport code-based ground truth dataset from Section 3.4.2, Figure 3.8 shows the distribution of error in geolocation using these three datasets and, for comparison, the most recent dataset using all our filters. We can see that there is a steady reduction in error over time, with median error decreasing from 817km in October 2012, to 610km in March 2013, and 475km in April 2013. However, our filters still provide substantial benefit, yielding a median error of only 22km.
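The error computation behind Figures 3.2, 3.3, and 3.8 is simple: for each front-end whose hostname names an airport, take the great-circle distance between our estimate and that airport. The sketch below shows one way to compute it; the data structures are hypothetical, and it uses the geopy library rather than our own code.

from statistics import median
from geopy.distance import great_circle

def geolocation_errors_km(estimates, airport_coords):
    """Compute per-front-end geolocation error in kilometers.

    estimates:      {front_end_ip: ((est_lat, est_lon), airport_code)}
    airport_coords: {airport_code: (lat, lon)} ground-truth airport locations.
    """
    errors = []
    for est_loc, code in estimates.values():
        errors.append(great_circle(est_loc, airport_coords[code]).km)
    return errors

# Applying this to each historical snapshot and taking median(errors) reproduces
# the trend of decreasing error as the expansion moves front-ends nearer to clients.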
3.6 Client Performance Impact

[Figure 3.9: (a) Distances from all BitTorrent client prefixes to estimated front-end locations to which Google directs them. (b) Comparison of the distances between the set of clients served by front-ends outside of Google's network on 2013-8-14 and their estimated front-end locations on 2013-8-14 and 2012-10-29.]

In this section, we evaluate the performance of customers previously served by Google-hosted front-ends that are directed to ISP-hosted front-ends post-expansion.

Distance: Google's expansion of infrastructure implies that, over time, many clients should be directed to servers that are closer to them than where Google directed them at the beginning of the study. Figure 3.9(a) shows the distribution of the distance from a client to our estimate of the location of the server serving it. We restrict the clients to those in our BitTorrent eyeball dataset (2.6 million client prefixes) and geolocate all client locations using MaxMind. Some of the very large distances shown in both curves could be accuracy limitations of the MaxMind GeoLite Free database, especially in regions outside of the United States. Overall, the results show that in mid-August 2013, many clients are substantially closer to the set of servers they are directed to than in October of 2012. For example, the fraction of client prefixes within 500km of their front-ends increases from 39% to 64%, and the fraction within 1000km increases from 54% to 78%.

Figure 3.9(b) shows the distribution of distances only for the set of client prefixes that were directed to front-ends outside of Google's network on 2013-8-14. The top curve shows the distances between the clients and front-ends on 2013-8-14, while the bottom curve shows the distances between this same set of clients and the front-ends that they were served by on 2012-10-29. The figure shows that the set of clients that have moved off of Google's network are now much closer to their front-ends in August of 2013 than in October of 2012. The fraction of client prefixes within 500km of their front-ends has increased from 21% to 89%, and the fraction within 1000km has increased from 36% to 96%. Because many of the newer front-ends seem to be satellites that likely proxy traffic back to datacenters, it is hard to know the impact that decreasing the distance from client to front-end will have on application performance [129].

[Figure 3.10: Distances from all announced /24 prefixes to the closest Google-hosted front-end, closest Google-hosted or provider ISP-hosted front-end, and closest ISP-hosted or Google-hosted front-ends.]

[Figure 3.11: The distribution of latency and HTTP GET search improvement for RIPE Atlas probes that are now directed to ISP-hosted front-ends.]
To show the potential offered by the expansion in Google's serving infrastructure, Figure 3.10 shows the distribution of distances between all announced /24 prefixes for which we have accurate geolocation data and three subsets of Google infrastructure. The right-most line shows the distances to the closest Google-hosted front-end (henceforth, on-net). The middle line shows distances to the closest on-net or the closest ISP-hosted front-end (henceforth, off-net) in the /24's provider hierarchy, whichever is closer. The left-most line shows the closest front-end (either on-net or off-net) for each /24, including off-nets outside of its providers, to which it would normally never be directed because of policy considerations. The median distance from /24 prefixes to the closest on-net is 190 km. The new front-end sites reduce the median distance to 43 km when adding in only front-ends in a prefix's providers, or 30 km when considering all front-ends. The improvement is greater in parts of the tail, those clients that may be particularly underserved today. At the 90th percentile, /24 prefixes are within 1298km of the closest on-net. Provider off-nets shift the population to within 1000 km and the closest front-end to within 667 km. This graph shows that off-nets provide new serving capabilities near many clients, opening up the potential for improved performance. However, the gap between distances to all front-ends and distances to only provider off-nets or on-nets suggests that policy may limit much of the performance gain in practice.

RIPE Atlas: For RIPE Atlas probes, we find that 230 (5%) out of 4,648 active probes are directed to ISP-hosted front-ends. Figure 3.11 shows the performance improvement in RTT and HTTP search latency (Section 3.3.4.2). In general, we see that for about 75% of probes now directed to ISP-hosted front-ends, there is some performance improvement. Around 40% see strong improvement, with latency reductions of 20ms or more. The relationship between the lines also indicates that latency is a reasonable predictor of HTTP performance.

[Figure 3.12: The difference in front-end performance for PlanetLab nodes before and after Google's front-end expansion. Figure (a) on the left shows the difference for three HTTP-related metrics. Figure (b) on the right shows the difference in performance for page load time.]

PlanetLab: Of 226 PlanetLab sites, Google directs 59 (26%) to ISP-hosted front-ends, much higher than the 5% of RIPE Atlas probes. This difference is likely due to RIPE Atlas' concentrated deployment in Europe. To evaluate PlanetLab's performance, we look at four metrics: time-to-connect, time-to-first-byte, and HTTP request completion time using curl, and page load time using the PhantomJS headless web browser. Figure 3.12(a) shows the improvement for all HTTP metrics. It shows that while time-to-connect is the same or better for more than 90% of nodes, time-to-first-byte and total request completion time are better for only 77% and 62%, respectively.
Figure 3.12(b) captures the entire page load time difference of search results on PlanetLab where 44% of nodes see improvement of 200ms or more. The fraction of nodes that see improvement is very close to the HTTP Request Complete line in Figure 3.12(a), indicating that HTTP GET is a strong predictor of improvement in page load time. This makes sense in the case of Google Search where a single front-end IP can handle requests to page resources from multiple Google domains such as www.google.com, ssl.gstatic.com, and apis.google.com. 3.7 Using Our Mapping In addition to our evaluation of Google’s serving infrastructure so far, our mapping is useful to the research community, for what it says about clients, and for what it can predict about other serving infrastructure. The Need for Longitudinal Research Data. Our results show the limitations of one-off measurement studies—a snapshot of Google’s serving infrastructure in October would have missed the rapid growth of their infrastructure and potentially misrepresented their strategy. We believe the research community needs long-term measurements. This chapter demonstrates the necessity of measurements over time to support our thesis (Section 3.8). 49 Sharing the Wealth: From Our Data to Related Data. Our mapping techniques assume the target infrastructure is pervasive and carefully and correctly engineered. We assume that (a) Google directs most clients to nearby front-ends; (b) Google’s redirection is carefully engineered for “eyeball” prefixes that host end-users; and (c) Google will only direct a client to a satellite front-end if the client is a customer of the front-end’s AS. Google has economic incentives to ensure these assumptions. In practice, these assumptions are generally true but not always, and our design and evaluation has carefully dealt with exceptions (such as clients occasionally being directed to distant front-ends). If we accept these assumptions, our maps allow us to exploit Google’s understanding of network topology and user placement to improve other datasets. Prior work has used Akamai to chose detour routes [142]; we believe our mapping can improve geolocation, peer selection, and AS classification. Geolocation is a much studied problem [85,90,104], and availability of ground truth can greatly improve results. With clients accessing Google from mobile devices and computers around the world, Google has access to ample data and measurement opportunity to gather very accurate client locations. An interesting future direction is to infer prefix location from our EDNS-client-subnet observations, and use that coarse data to re-evaluate prefixes that existing datasets (such as MaxMind) place in very different locations. The end result would be either higher accuracy geolocation or, at least, identification of prefixes with uncertain locations. Researchers designed a BitTorrent plugin that would direct a client to peer with other users the plugin deemed to be nearby, because the potential peer received similar CDN redirections as the client’s [54]. However, with the existing plugin, the client can only 50 assess similarity of other users of the plugin who send their CDN front-end mappings. Just as we used EDNS-client-subnet to obtain mappings from arbitrary prefixes around the world, we could design a modified version of the plugin that would allow a client to assess the nearness of an arbitrary potential peer, regardless of whether the peer uses the plugin or not. 
By removing this barrier, the modified plugin would be much more widely applicable, and could enhance the adoption of such plugins.

Finally, in Section 3.5.2, we showed that 90% of prefixes served in ASes other than Google are within the customer cone of their serving AS. The remaining 10% of prefixes likely represent problems with either our IP-to-AS mapping [98] or with the customer cone dataset we used [116]. From talking to the researchers behind that work and sharing our results with them, it may be necessary to move to prefix-level cones to accommodate the complex relationships between ASes in the Internet. The client-to-front-end data we generate could help resolve ambiguities in AS relationships and lead to better inference in the future.

Our data has had impact. Since the original publication in IMC 2013 [47], we know of at least 26 researchers who have used our data to help their own research.

Mapping Other Providers. While our techniques will apply directly for some providers, we will need to adapt them for others, and we describe the challenges and potential approaches here. Our studies of Google combine observations using EDNS-client-subnet and open recursive resolvers. EDNS-client-subnet support is increasing. However, some networks such as Akamai have restricted its use to popular public DNS resolvers, and we are restricted to using open resolvers for them.

[Figure 3.13: The relationship between the number of Google IP addresses discovered and the number of vantage points, using one open resolver per /24 block and one EDNS query per /24 block.]

In Section 3.4.1, we demonstrated that even using hundreds of thousands of open DNS resolvers would miss discovering much of Google's infrastructure. Table 3.2 showed that EDNS-client-subnet found 20% more front-end IPs than open resolvers, but we cannot assume that ratio holds on other infrastructures. We would expect open resolvers to suffice to uncover all of a ten-front-end infrastructure, for example, but it is unclear what the coverage gap for Akamai would be.

We may be able to use our results from Google to project results for other providers that support only open resolvers. We select one open recursive resolver from each /24 in which we know one (there are 110,000 such prefixes). Then, we select one of these /24s at a time and resolve www.google.com from the open resolver in the prefix and via an EDNS query for that prefix. Figure 3.13 depicts the growth in the number of Google front-end IP addresses discovered by the two approaches as we issue additional measurements (1000 trials). Using resolvers in a set of prefixes yields very similar results to issuing EDNS queries for that same set of prefixes, so the benefit of EDNS is primarily that we can issue queries for many more prefixes than we have access to resolvers in. We extrapolate these growth curves to understand the impact of having more resolvers. To test this theory, we fit power-law curves to the open resolver lines (R = 0.97 in all cases). We project access to resolvers up to all 10M routable /24 prefixes, predicting discovery of 6990–8687 IP addresses of Google front-end servers as of May 4th, 2013.
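The extrapolation amounts to fitting y = a * x^b to the open-resolver discovery curve and evaluating it at 10 million prefixes. The sketch below shows such a fit with SciPy; the sample points are made up for illustration and are not our measured values.

import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b):
    return a * np.power(x, b)

# x: number of open-resolver vantage points; y: distinct front-end IPs found.
# Illustrative values only.
x = np.array([1e3, 5e3, 1e4, 5e4, 1e5])
y = np.array([900.0, 1800.0, 2300.0, 3600.0, 4300.0])

params, _ = curve_fit(power_law, x, y, p0=[1.0, 0.5])
predicted = power_law(10e6, *params)
print("Projected front-end IPs with 10M vantage points: %.0f" % predicted)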
Using EDNS-client-subnet queries for these 10M prefixes, we found 8563 IP addresses, within the range, so the extrapolation approach may be reasonable. We can use our Google results to characterize which regions our set of open resolvers has good coverage in, in order to flag portions of other infrastructures as more or less complete. CDNs such as CloudFront support EDNS queries to discover client-to-front-end map- pings, but they lack the density of servers of Akamai and Google and so necessarily direct some clients to distant servers. Since our geolocation approach assumes front-ends are near clients, it may not be sound to assume that the front-end is at the geographic center of the clients. CloudFront publishes its geographic points of presence on its website, so we can use its deployment as ground truth to evaluate approaches to map other providers that do not publish this information. If aggressive pruning of distant clients does not work well for sparse deployments, straightforward alternate approaches may work well. For example, these small deployments tend to be at well-connected colocation facilities, where we likely have a vantage point close enough to accurately use delay-based geolocation [85,104]. 53 3.8 Conclusions As the role of interactive web applications continues to grow in our lives, and the mobile web penetrates remote regions of the world more than wired networks ever had, the Internet needs to deliver fast performance to everyone, everywhere, at all times. To serve clients around the world quickly, service providers deploy globally distributed serving infrastructure, and we must understand these infrastructures to understand how providers deliver content today. Towards that goal, we developed approaches specific to mapping these serving infrastructures. By basing our techniques around how providers architect their infrastructures and guarding our techniques against noisy data, we accurately map the geographically-distributed serving sites. We apply our techniques to mapping Google’s serving infrastructure and track its rapid expansion over the period of our measurement study. During that time, the number of serving sites grew more than sevenfold, and we see Google deploying satellite front- ends around the world, in many cases distant from any known Google datacenters. By continuing to map Google’s and others’ serving infrastructures, we will watch the evolution of these key enablers of today’s Internet, and we expect the accurate maps to enable future work by us and others to understand and improve content delivery on a global scale. Google’s shift in serving Google Search from ISP-hosted front-ends is a CDN design decision. We showed that measurements with open DNS resolvers, which offer limited user vantage points, failed to capture the full extent of Google serving infrastructure and expansion. In contrast, through complete enumeration of Google’s serving infrastructure over 3 1 ⁄2 years, we found that Google’s expansion had significant impact by greatly reducing 54 the distance between clients and front-ends. Therefore, this work supports the thesis that complete client coverage over time is required to understand the impact of Google’s design decision to expand serving infrastructure into non-Google ISPs. 55 Chapter 4 Odin: A Scalable Fault-Tolerant CDN Measurement System 4.1 Introduction Content delivery networks (CDNs) are a key part of the Internet ecosystem. The primary function of a CDN is to deliver highly-available content at high performance. 
To accomplish this, CDNs deploy Points of Presence (PoPs) around the world that interconnect with other Autonomous Systems (ASes) to provide short, high quality paths between content and end users. While a CDN’s goal is to deliver the best performance to all users in a cost-effective manner, the dynamic, heterogeneous, and distributed nature of the Internet makes this difficult. CDNs serve content to users all over the world, across tens of thousands of ASes, using various forms of Internet access and connection quality. User performance is impacted by Internet routing changes, outages, and congestion, all of which can be outside the control of the CDN. Without constant insight into user performance, a CDN can suffer from low availability and poor performance. To gain insight into user performance, CDNs need large-scale measurements for critical CDN operations such as traffic management [114,135,148,154,157], Internet path performance debugging [108,160], and deployment modeling [144]. 56 Microsoft operates a CDN with over 100 PoPs around the world to host applications critical to Microsoft’s business such as Office, Skype, Bing, Xbox, and Windows Update. This work presents our experience designing a system to meet the measurement needs of Microsoft’s global CDN. We first describe the key requirements needed to support Mi- crosoft’s CDN operations. Existing approaches to collecting measurements were unsuitable for at least one of two reasons: • Unrepresentative performance. Existing approaches lack coverage of Microsoft users or use measurement techniques that do not reflect user performance. • Insensitive to Internet events. Existing approaches fail to offer high measurement volume, explicit outage notification, and comparative measurements to satisfy key Microsoft CDN use cases. As a result, existing measurements solutions provide both (1) incorrect answers to questions such as “What is P95 latency to users directed to our PoP in Seattle?” and (2) insufficient answers such as failing to detect an outage impacting 1% of requests or specific networks. Next we present the design of Odin, our scalable, fault-tolerant CDN measurement system. Odin issues active measurements from popular Microsoft applications to provide high coverage of Internet paths from Microsoft users. It measures to configurable endpoints, which are hostnames or IP addresses of remote target destinations and can be in Microsoft or external networks. Measurement allocation is controlled by a distributed web service, enabling many network experiments to run simultaneously, tailoring measurements on a per-use-case basis as necessary. Odin is able to collect measurements even in the presence 57 of Microsoft network failures, by exploiting the high availability and path diversity offered by third party CDNs. Last, we demonstrate that Odin enables important Microsoft CDN use cases, including improving performance. There are two key insights that make our design distinct and effective. Firstly, first-party CDNs have an enormous advantage over third-party CDNs in gathering rich measurement data from their own clients. Secondly, integration with external networks provides a valuable opportunity for rich path coverage to assist with network debugging and for enabling fault-tolerance. In our design of Odin, we emphasize key design requirements that support our thesis. Odin makes client-side active measurements over time to monitor diverse paths to Microsoft and other networks from end-user applications. 
From this, we get complete coverage of Microsoft’s user population to effectively evaluate CDN design decisions. A more thorough discussion follows in Section 4.9. 4.2 Background This section provides background about content delivery networks and Microsoft’s deploy- ment. Microsoft’s CDN is a “hybrid cloud“ CDN, i.e., it is used for both its own first-party content as well as for other large third-party customers such as streaming services and online newspapers. 58 4.2.1 Microsoft’s Network Microsoft provides high performance and availability to its customers using a global network with 100+ PoPs, many datacenters, and a Microsoft-operated backbone network interconnecting them. Microsoft operates two types of datacenters. One set is Microsoft’s Azure public cloud compute platform [7] which currently has 36 regions. A region is “a set of datacenters deployed within a latency-defined perimeter and connected through a dedicated regional low-latency network” [8]. The second consists of legacy datacenters, pre-dating Azure. Third-party cloud tenants only run in the Azure datacenters, whereas first-party services operated by Microsoft run in both types. Figure 2.1 shows Azure regions as “Cloud Back-ends” and private datacenters as “Internal Back-ends” . Redirection of first-party and third-party clients Microsoft currently runs two independent CDNs. A first-party anycast CDN runs Microsoft services such as Bing, Office, and XBox [48, 74]. It has more than 100 front-end locations around the world, collocated with all PoPs and several Microsoft public and private datacenters. The second CDN is an Azure traffic management service offered to Azure customers with applications deployed in multiple regions. Whereas Microsoft’s first party CDN uses anycast to steer clients, its Azure service uses DNS to direct users to the lowest-latency region. After receiving the DNS response, users connect directly to an Azure region. 4.2.2 Comparison to Other CDNs Microsoft’s architecture closely mirrors other CDNs, especially hybrid-cloud CDNs from Google and Amazon. 59 End-user applications. All three have web, mobile, and desktop application deploy- ments with large global user bases. Google’s include the Chrome Browser, Android OS, Search, YouTube, and Gmail. Amazon’s include the Store, Audible, and Prime Video. Microsoft’s include Office, Windows OS, Skype, and XBox. CDN and cloud services. Like Microsoft, Amazon and Google run multiple types of CDNs. Google runs a first-party CDN [71, 108, 160], a third-party CDN [14], and application load balancing across Google Cloud regions [15]. Amazon’s equivalent services are CloudFront [5] and Route 53 [4]. Amazon Web Services [6] and Google Cloud Platform [16] are similar to Microsoft Azure [7]. Amazon [88] and Google [100] also run backbone networks. Because of these similarities, we believe our goals, requirements, and design are applicable to networks beyond Microsoft. 4.3 Goals and Requirements We need measurements to support Microsoft’s CDN operations and experimentation, leading to the following goals and resulting requirements. Goal-1: Representative performance reflecting what users could achieve on current and alternate routes. Requirement: High coverage of paths between Microsoft’s users and Microsoft is critical for traffic engineering, alerts on performance degradation, and “what-if” experimentation on CDN configurations, to avoid limited or biased insight into the performance of our network. 
In particular, our measurements should cover paths to /24 prefixes that combine 60 to account for 90% of the traffic from Microsoft. In addition, they should cover paths to 99% of designated “high revenue” /24 prefixes, which primarily are enterprise customers. Requirement: Coverage of paths between Microsoft users and external networks to help detect whether a problem is localized to Microsoft and to assess the performance impact of expanding Microsoft’s footprint to new sites. External networks may be any CDN, cloud provider, or virtual private server hosting service. Requirement: Measurements reflect user-perceived performance, correlating with appli- cation metrics and reflecting failure scenarios experienced by production traffic, to enable decisions that improve user experience. Goal-2: Sensitive and quick detection of Internet events. Requirement: High measurement volume in order to quickly detect events across a large number of users and cloud endpoints, even if the events impact only a small number. Without high measurements counts, events can be missed entirely, or data quality can be too poor to confidently make measurement-driven traffic engineering choices. A reasonable level of sensitivity is the ability to detect an availability incident that doubles the baseline failure rate, e.g., from 0.1% to 0.2%. Figure 4.10, in Section 4.10, shows if we assume measurements fail independently according to a base failure rate, detecting this change would require at least 700 measurements, and detecting a change from 0.01% to 0.02% would require at least 7000 measurements. For confidentiality reasons, we cannot describe our baseline failure rates, but we consider several thousand measurements within a five minute window from clients served by an ISP within one metropolitan region sufficient for our needs. 61 Requirement: Explicit outage signals, in order to detect events that impact small groups of clients. Historical trends are too noisy to detect the gray failures that make up the majority of cloud provider incidents [95]. Requirement: Fault tolerance in data collection, to collect operation-critical measure- ments in the presence of network failures between the client and collector. Requirement: Comparative measurements in same user session for experimentation, providing accurate “apples-to-apples” comparisons when performing an A/B test and minimizing the chance of changing clients or network conditions coloring the comparison between test and control measurements. Goal-3: Compatible with operational realities of existing systems and appli- cations. Requirement: Measurements of client-LDNS associations, which are needed to operate both anycast and DNS-redirection CDNs effectively (Section 2.3, 4.7.2.1, 5.2.3). Requirement: Minimal updates to user-facing production systems, given that network configuration changes are a common cause of online service outages [84]. Requirement: Application compliance across varying requirements. Each Microsoft application independently determines the level of compliance certifications (FISMA, SOC 1-3, ISO 27018, etc.), physical and logical security, and user privacy protections. Application requirements determine the endpoints that can be measured, set of front-ends that can process the measurements, requirements for data scrubbing and aggregation (e.g., IP blocks), and duration of data retention. These strict security policies stem from Microsoft’s enterprise customers. 
Any cloud provider or CDN that serves enterprises, such as Akamai [3], also need to meet these compliance requirements. 62 Goals Requirements Third- party measure- ment platforms Layer 3 measure- ments from CDN infras- tructure Layer 3, DNS from users Server- side mea- sure- ments of client connec- tions Odin Represen- tative Perfor- mance Coverage of paths between Microsoft users and Microsoft X X Coverage of paths between Microsoft users and external networks X X Measurements reflect user-perceived performance X X X Sensitive to Internet Events High measurement volume X X X X Explicit outage signal X X X X Fault tolerance X X X X Comparative measurements in same user session for experimentation X X X X Compati- ble with Measurements of client-LDNS associations X X X X Opera- tional Realities Minimal updates to user-facing production systems X X X X Application compliance X X X X Table 4.1: Goals of Odin and requirements to meet our operational CDN needs. No existing approach satisfies all the requirements. 4.4 Limitations of Existing Solutions This section describes how existing approaches fail to meet our requirements, summarized in Table 4.1. 1) Third-party measurements platforms provide insufficient measurement cov- erage of Microsoft users. Non-commercial measurement platforms such as Planetlab, MLab, Caida ARK, and RIPE Atlas have insufficient coverage, with only a few hundred to few thousand vantage points. The largest, RIPE Atlas, has vantage points in 3,589 IPv4 ASes [22], less than 10% of the number of ASes seen by Microsoft’s CDN on a standard weekday. 63 Commercial measurement platforms also lack sufficient coverage. Platforms including Dynatrace [13], ThousandEyes [24], and Catchpoint [9] offer measurements and alerting from cloud-based agents in tier 1 and “middle-mile” (tier 2 and tier 3) ISPs. Cedexis uses a different approach, providing customers with layer 7 measurements collected from users of Cedexis partner websites [10]. However, none of the platforms provides measurements from more than 45% of Microsoft client /24 networks. On top of missing over half the networks, the platform with the best coverage provides 10 + measurements a day from less than 12% of the networks and 100 + measurements a day from only 0.5% of them, not enough to meet Microsoft’s operational need for sensitivity to Internet events. 2) Layer 3 measurements from CDN infrastructure cannot provide repre- sentative coverage of the performance of Microsoft users. A CDN can issue measurements such as traceroutes and pings from its front-ends or datacenters to hosts across the Internet. For example, Entact measures the performance along different routes by issuing pings from servers in datacenters to responsive addresses in prefixes across the Internet [157]. One measurement technique used by Akamai is to traceroute from CDN servers to LDNSes to discover routers along the path, then ping those routers as a proxy for CDN to LDNS or end-user latency [50]. Verfploeter issues measurements from anycast servers to responsive addresses in /24s with ICMP probes and can determine the anycast catchment of a server by grouping responses by anycast site [63]. However, these measurements cannot provide a good understanding of user performance. Many destinations do not respond to these probes, so Entact was unable to find enough responsive addresses in the networks responsible for 74% of MSN traffic and Verfploeter [63] receives responses from only 55% of /24s. 
Similarly, previous work has shown that 45% of 64 LDNS do not respond to ICMP ping or to DNS queries from random hosts [92], and 40% of end users do not respond to ICMP probes [93]. Routers are more responsive than LDNS, with 85% responding to ping [86], but measurements to routers may not reflect a client’s application performance because ICMP packets may be deprioritized or rate-limited [139]. All of the above fail to exercise critical layer 7 behaviors including SSL/TLS and HTTP redirection and so techniques such as Verfploeter may not reflect client performance. 3) Layer 3 and DNS measurement from clients may not reflect user-perceived performance and do not provide sufficient coverage. Many systems perform layer 3 measurements from end user devices [54,64,106,133,143]. 1 These measurements are generally dropped by the strict network security policies of enterprise networks. Further, these measurements generally cannot be generated from in-browser JavaScript and instead require installing an application, keeping them from providing measurements from Microsoft’s many web users. 4) Server-side measurements of client connections can satisfy some but not all of our use cases. Google [72, 108, 154, 160], Facebook [135], Microsoft [52], and other content providers and CDNs collect TCP- and application-layer statistics on client connections made to their servers [36]. To measure between users and alternate PoPs or paths, CDNs use DNS or routing to direct a small fraction of traffic or client requests to alternate servers or paths. These measurements are useful for performance comparisons, and DNS redirection could steer some of the measurements to measurement servers hosted in external cloud providers. However, if a user cannot reach a server, the outage will not register in server-side measurements, and so these measurements cannot be used 1 Ono [54] and Netalyzr [106] also measure throughput. 65 to measure fine-grained availability. There are also several practical challenges with only using server-side measurements. While Table 4.1 shows that technically server-side measurements can be collected on external networks, there are a number of engineering and operational trade-offs that make client-side measurements a better solution for large content providers. The first is that measuring to external networks would mean hosting alternate front-ends on an external provider which immediately raises serious compliance and production concerns. The second issue is that doing A/B network testing with production traffic is considered too high risk with an enterprise customer base. 4.5 Design Decisions To meet our goals (Section 4.3) and overcome the limitations of other approaches (Section 4.4), Odin uses user-side, application-layer measurements of client connections, combining the explicit outage signaling and fault tolerance of user-side measurements (as with layer 3 measurements from users in Section 4.4) with the representative performance and coverage achieved by measuring client connections (as with server-side measurements in Section 4.4). Client-side active measurement from Microsoft users. Odin embeds a measure- ment client into some Microsoft thick clients and web applications. It directs measurement clients to fetch web objects. This approach helps achieve a number of our requirements. 
Odin issues measurements from Microsoft users, achieving coverage important to Microsoft’s businesses and (by issuing measurements at a rate similar to the use of Microsoft’s applications) sensitivity 66 to Internet events, even events that impact only a small fraction of users or connections. By embedding our measurement client into thick clients, Odin can issue measurements even from users unable to reach a Microsoft web server. Application layer measurements. Odin clients perform DNS resolutions and fetch web objects, measuring availability and timing of these application-layer actions and reporting the results to Odin. The clients can usehttp andhttps, allowing integration with Microsoft applications that require https. Unlike ping and traceroute, the measurements are compatible with enterprise networks that host many Microsoft services and users. These measurements capture the application-layer user performance that we care about, exercising mechanisms across the network stack that can impact performance and availability, including TLS/SSL, web caching, TCP settings, and browser choice. http and https measurements also provide status code errors that are useful for debugging. They also suffice to uncover user-LDNS associations [118], a key need for both our anycast and DNS redirection CDNs (Section 4.7). External services and smarter clients. We design the clients to conduct measure- ments and report results even when they cannot reach Microsoft services, as outage reports are some of the most valuable measurements and measurement-dependent operations must continue to function. To build this fault tolerance, clients that cannot fetch measurement configuration or report results fall back to using third-party CDNs for these operations. We use the third-party CDNs to proxy requests to Microsoft and to host static measurement configuration. 67 Flexible measurement orchestration and aggregation. We built a measurement orchestration system for Odin that supports parallel experiments with different config- urations, helping meet a variety of requirements. To accommodate the unique business constraints and compliance requirements of each application that Odin measures to or from, the system provides control over which endpoints an application’s users may be given to measure and which servers they upload reports to. When appropriate, experiments can measure to servers in external (non-Microsoft) networks, and clients conduct multiple measurements in a session to allow direct comparisons. By having clients fetch instructions on which active measurements to run, new experiments generally do not require changes to operational services or to clients, reducing operational risk. We also allow for flexibility in aggregation of the measurements (e.g., in 5 minute buckets) for faster upload to our real-time alerting system. 4.6 System Design Figure 4.1: Odin Architecture Overview: CDN clients download measurement config, perform measurements, and upload results. If first-party network sites are unreachable, third-party sites can cache config and relay upload requests. 68 Figure 4.1 outlines the Odin measurement process. A number of Microsoft applications embed the Odin client (Section 4.6.1). Odin clients in thick applications support a range of measurements. This work focuses on measuring latency and availability, our highest priorities, supported by thick and web applications. 
Step 1: The client uses a background process to fetch a measurement configuration from the Orchestration Service (Section 4.6.2). The configuration defines the type of measurements and the measurement endpoints. Measurement endpoints are Internet destinations which are the target of a measurement. We use endpoint in other contexts to refer to Internet destinations that are part of the Odin service. Step 2: The client issues the measurements. To measure latency and availability, endpoints host a small image on a web server for clients to download. Many Microsoft applications require https requests, so measurement endpoints have valid certificates. The endpoints can be in Microsoft front-ends, Microsoft data centers, or third-party cloud/collocation facilities. Step 3: When the client completes its measurements, it uploads the measurement results to a Report Upload Endpoint (Section 4.6.3). The Report Upload Endpoint forwards the measurements to Odin’s two analysis pipelines. Step 4: The real-time pipeline performs alerting and network diagnostics, and the offline pipeline enriches measurements with metadata for big data analysis (Section 4.6.4). 4.6.1 Client Measurements can vary along 3 dimensions: http or https, direct or DNS-based, and warm or cold. Figure 4.2 illustrates direct and DNS-based measurements. The first type 69 Figure 4.2: Odin supports two measurement types for latency. Measurement a measures the test domain directly. Measurement b contacts an Odin authoritative DNS first, which responds with the endpoint to measure. This gives Odin client-LDNS association for a measurement. has the client performing DNS resolution of test.domain2.com in a1 and fetching the image and recording the latency in a2. The measurement to <randid>.contoso.com is an example of the second type, which we refer to as a DNS-based measurement and which we use to measure web fetch latency and client-LDNS association. The domain (contoso.com) is one that we control. We design the clients to recognize the <randid> scheme and substitute in a unique, transient, random identifier $(RandID). 2 The client then issues a DNS request via the user’s LDNS for $(RandID).contoso.com (step b1). The DNS request goes to our authoritative DNS server, which returns a record for the endpoint Odin wants to measure (test.domain1.com) and logs its response associated with the $(RandID). The client then fetches http://test.domain1.com/tiny.gif. In 2 Generating the $(RandID) at the client rather than at the Orchestration Service lets caches serve the measurement configuration. 70 step c, the client reports its measurements, reporting the ID for the second measurement as $(RandID). The measurement back-end uses $(RandID) to join the client’s IP address with the DNS log, learning the user-LDNS association. The Orchestration Service can ask clients to perform “cold” and/or “warm” measure- ments. A cold measurement initiates a new TCP connection to fetch the image. A warm measurement fetches the image twice and reports the second result, which will benefit from DNS caching and from a warm TCP connection. 3 Web client vs. thick client measurements. Web clients default to measuring latency using the request start and end times in JavaScript, which is known to be imprecise [99]. If the browser supports the W3C resource-timing API [99], then the client reports that more precise measurement instead, along with a flag that signals that it used the more precise option. 
If the image fetch fails, the client reports the HTTP error code if one occurred; otherwise it reports a general failure error code. A limitation of in-browser measurements is that low-level networking errors are not exposed to JavaScript. For example, we cannot distinguish between a DNS resolution failure and a TCP connection timeout. Thick clients issue measurements through an Odin application SDK. Unlike web clients, the SDK can report specific low-level networking errors, which are valuable in debugging. Odin thick clients have measurement feature parity with RIPE Atlas [22].

3 The client prevents browser caching by appending a random parameter to the image request (e.g., tiny.gif?abcde12345).

4.6.2 Orchestration Service

The Orchestration Service coordinates and dispatches measurements. It is a RESTful API service that Odin clients invoke to learn which measurements to perform. The service returns a small JSON object specifying the measurements. In the rare case of major incidents with Odin or target Microsoft services, the Orchestration Service has the option to instruct the client to issue no measurements, to avoid aggravating the issues.

NumMeasurements: 3,
MeasurementEndpoints: [
    {type:1, weight:10, endpoint:"m1.contoso.com"},
    {type:1, weight:20, endpoint:"m2.microsoft.com"},
    {type:2, weight:30, endpoint:"m3.azure.com"},
    {type:3, weight:10, endpoint:"m4.azure.com"},
    {type:2, weight:30, endpoint:"m5.azure.com"},
    {type:1, weight:15, endpoint:"m6.microsoft.com"}],
ReportEndpoints: ["r1.azure.com","r2.othercdn.com"]

Listing 1: Example measurement configuration served by the Orchestration Service to the client.

Listing 1 shows an example configuration that specifies three measurements to be run against three out of six potential endpoints. The ability to specify more endpoints than measurements simplifies configurations that need to "spray" measurements to destinations with different probabilities, as is common in CDN A/B testing [48]. The client performs a weighted random selection of three endpoints. The other component of orchestration is the customized authoritative DNS server for DNS-based measurements (Section 4.6.1). When a client requests DNS resolution for a domain such as 12345abcdef.test.contoso.com, the DNS server responds with a record for a target endpoint, with the choice weighted to achieve a desired measurement distribution. Even a unique hostname used for client-LDNS mapping can generate multiple DNS queries. Our measurements reveal that 75% of unique hostnames result in multiple LDNS requests, and 70% result in requests from multiple LDNS IP addresses. If our authoritative DNS returned different responses for a single hostname, we would be unable to determine from the logs which target endpoint the client actually measured. To overcome this issue, we use consistent hashing to always return the same response for the same DNS query. The Orchestration Service allocates measurements to clients based on experiments. An experiment has an Orchestration Service configuration that specifies the endpoints to be measured, which applications' users will participate, and which report endpoints to use based on the compliance requirements of the applications. Experiment owners configure endpoint measurement allocation percentages, and the Orchestration Service converts them into weights in the configuration. The Orchestration Service runs multiple experiments, and experiments may be added or removed at any time.
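The combination of weighted endpoint selection and consistent hashing described above can be illustrated with a short sketch. This is not Odin's implementation: the endpoint list mirrors Listing 1, and the function name select_endpoint is ours. Hashing the unique measurement hostname makes the choice deterministic per hostname, while across many distinct hostnames the answers follow the configured weights (for example, m3.azure.com and m5.azure.com would each receive roughly 30/115, or about 26%, of hostnames).

import hashlib
import bisect

# Hypothetical weighted endpoints, in the spirit of Listing 1.
ENDPOINTS = [
    ("m1.contoso.com", 10),
    ("m2.microsoft.com", 20),
    ("m3.azure.com", 30),
    ("m4.azure.com", 10),
    ("m5.azure.com", 30),
    ("m6.microsoft.com", 15),
]

# Precompute cumulative weights so a single number in [0, total) maps to one endpoint.
_cumulative = []
_total = 0
for _name, _weight in ENDPOINTS:
    _total += _weight
    _cumulative.append(_total)

def select_endpoint(query_name: str) -> str:
    """Return the target endpoint for a DNS query.

    Hashing the (unique, random) hostname makes the weighted choice
    deterministic: repeated queries for the same hostname always get the
    same answer, so the DNS log can be joined unambiguously with the
    client's report.
    """
    digest = hashlib.sha256(query_name.lower().encode()).digest()
    point = int.from_bytes(digest[:8], "big") % _total
    index = bisect.bisect_right(_cumulative, point)
    return ENDPOINTS[index][0]

if __name__ == "__main__":
    # The same hostname always maps to the same endpoint.
    print(select_endpoint("12345abcdef.test.contoso.com"))
    print(select_endpoint("12345abcdef.test.contoso.com"))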
The Orchestration Service allows Odin to tailor configurations to meet different measurement needs and use cases. For example, the service can generate specialized configuration for clients depending on their geography, connection type, AS, or IP prefix. When experimenting with CDN settings, we tailor Odin configurations to exercise the experimental settings from clients in a particular metropolitan area and ASN. When debugging a performance issue, we tailor configurations to target measurements to an endpoint experiencing problems. If the Orchestration Service is unavailable, proxies in third-party networks may be used instead. The proxies may act as reverse proxies for the first-party system. Alternatively, if the first-party system is unavailable, a fallback to a cached default configuration can be returned to clients. 73 4.6.3 Reporting Listing 1 shows that the measurement configuration returned by the Orchestration Service also specifies the primary and backup ReportEndpoints for the client to upload mea- surement results. ReportEndpoints are hosted across the 100+ front-ends of Microsoft’s first-party CDN. When a ReportEndpoint receives client measurement results, it forwards them to two Microsoft data pipelines, as shown in Figure 4.1. If for some reason the Microsoft CDN is unavailable, the client will fall back to using proxies hosted in third-party networks. The proxies forward to a set of front-ends that are not part of the primary set of front-ends. Fault-tolerant measurement reporting is necessary to support our requirement of an explicit outage signal, since we cannot measure the true availability of Microsoft’s first-party CDN if we also report measurements there. Odin’s fault-tolerant approach for fetching measurement configuration and uploading results will succeed if the backup reporting channel(s) use a path that avoids the failure and fail if both the primary and backup paths encounter the failure. As long as the client can reach a backup, and the backup can reach at least one of the Odin servers at Microsoft, Odin receives the result, tolerating all but widespread failures that are detectable with traditional approaches and are often outside of Microsoft’s control to fix. From operational experience, Odin’s handling of faults provides a high level of resilience for our measurement data. We now discuss Odin’s behavior in the face of three fault scenarios. We do not consider this an exhaustive treatment of all possible fault scenarios. 74 Figure 4.3: Three types of Internet faults that may occur when fetching measurement configuration or uploading reports. Figure 4.4: Topology of backup paths when FE 1 is unreachable. FE 1 is a front-end collo- cated with a PoP while FE DC is a front-end in a datacenter. Interconnection faults impact an individual link(s) between an end-user ISP and the CDN, caused by issues such as peering misconfiguration or congestion. Connectivity to other ISPs is not impacted. Figure 4.3(A) shows an interconnection fault between PoPs A and B. Figure 4.4 shows that, when these faults occur, the client will send a backup request using path 2,3 to Proxy 1. The proxy then forwards the request back to the CDN by path 3,4, through D, to a datacenter front-end FE DC instead of FE 1 . Front-end system faults are failures of a front-end due to software or hardware problems, as shown in Figure 4.3(B). 
Because the backup proxies connect to a distinct set of front-ends (hosted in datacenters), we gain resilience to front-end system faults, as seen in Figure 4.4. PoP-level faults impact multiple ISPs and a large volume of traffic exchanged at that facility. These faults may be caused by a border router or middle-box misconfiguration, or by a DDoS attack. In our experience, these faults are rare and short-lived, and so we did not design Odin to be resilient to them. Figure 4.4 shows that Proxy 1's path to FE DC goes through the same PoP as the client's path to FE 1, whereas Proxy 2 avoids it. We present a preliminary evaluation of this scenario in Section 4.8.2.

4.6.4 Analysis Pipeline

Measurement reports are sent to two analysis pipelines.

Offline Pipeline. The measurement reports include a report ID, metadata about the client (version number of the client software, ID of the Microsoft application it was embedded in, whether it used the W3C Resource Timing API), and the measurement results, each of which includes a measurement ID, the measurement latency (or failure information), and whether it was a cold or warm measurement. The offline pipeline enriches measurements with metadata including the client's LDNS and the user's location (metropolitan region), ASN, and network connection type. This enriched data is the basis for most operational applications (Section 4.7).

Real-time Alerting Pipeline. Many of the target endpoints that Odin monitors are business-critical, so we must react quickly to fix high latency or unavailability. To ensure fast data transfer to the back-end real-time data analytics cluster, each reporting endpoint reduces data volume by aggregating measurements within short time windows. It annotates each measurement with the client's metropolitan region and ASN, using in-memory lookup tables. Within each window, it groups all measurements to a particular endpoint from clients in a particular ⟨metropolitan region, ASN⟩ pair, then reports fixed percentiles of latency from that set of measurements, as well as the total number of measurements and the fraction of measurements that failed.

4.7 Supporting CDN Operations with Odin

We use Odin to support key operational concerns of CDNs – performance and availability, plus CDN expansion/evolution and how it impacts the other concerns. The two CDNs we support have over a hundred sites (more than most CDNs) and a few dozen sites (common for CDNs [25]), respectively.

4.7.1 Identifying and Patching Poor Anycast Routing

Microsoft's first-party CDN uses anycast (Section 4.2.1). Anycast inherits from BGP an obliviousness to network performance and so can direct user requests to suboptimal front-ends. We identify incidents of poor anycast routing in Microsoft's anycast CDN by using Odin to measure the performance of anycast and unicast alternatives from the same user. Our previous study used this methodology for a one-off analysis using measurements from a small fraction of Bing users [48]. Odin now continuously measures at a large scale and automatically generates daily results. As with our earlier work, we find that anycast works well for most, but not all, requests. The traditional approach to resolving poor anycast routing is to reconfigure route announcements and/or work with ISPs to improve their routing towards the anycast address.
While Microsoft pursues this traditional approach, announcements can be difficult to tune, and other ISPs may not be responsive, and so we also patch instances of poor anycast performance using a hybrid scheme that we proposed (but did not implement) in our previous work [48]. The intuition is that both DNS redirection and anycast work well most of the time, but each performs poorly for a small fraction of users. DNS redirection 77 cannot achieve good performance if an LDNS serves widely distributed clients [50], and anycast performs poorly in cases of indirect Internet routing [48]. Since the underlying causes do not strongly correlate, most clients that have poor anycast performance can have good DNS redirection performance. We use Odin measurements to identify these clients, and a prominent Microsoft application now returns unicast addresses to the small fraction of LDNS that serve clients with poor anycast routing. 4.7.2 Monitoring and Improving Service Availability The first concern of most user-facing web services is to maintain high availability for users, but it can be challenging to quickly detect outages, especially those that impact only a small volume of requests. Odin’s design allows it to monitor availability with high coverage and fast detection. By issuing measurements from the combined user base of a number of services, it can detect issues sooner than any individual service. By having a single client session issue mea- surements to multiple endpoints, sometimes including an endpoint outside of Microsoft’s network, it can understand the scope of outages and differentiate client-side problems from issues with a client contacting a particular service or server. By providing resilient data collection even in the face of disruptions, Odin gathers these valuable measurements even from clients who cannot reach Microsoft services. Anycast introduces challenges to maintaining high availability. This section discusses how Odin helps address them. 78 4.7.2.1 Preventing Anycast Overload Monitoring a front-end’s ability to control its load. Previous work from our collaborators demonstrated how Microsoft’s anycast CDN prevents overload [74]. The approach works by configuring multiple anycast IP addresses and organizing them into a series of “rings” of front-ends. All front-ends are part of the largest ring, and then each subsequent ring contains only a subset of the front-ends in the previous one, generally those with higher capacity. The innermost ring contains only high capacity data centers. Each front-end also hosts an authoritative nameserver. If a front-end becomes overloaded, its authoritative nameserver “sheds” load by directing a fraction of DNS queries to a CNAME for the next ring. These local shedding decisions work well if anycast routes a client’s LDNS’s queries and the client’s HTTP requests to the same front-end, in which case the authoritative nameserver can shed the client’s requests. The previous work used measurements from Odin to evaluate how well HTTP and DNS queries correlate for each front-end [74], a measure of how controllable its traffic is. Odin now continuously measures the correlations and controllability of each front-end, based on its measurements of client-to-LDNS associations. Designing rings with sufficient controllability. We use Odin data on per front-end controllability to design anycast rings that can properly deal with load. The data feeds a traffic forecasting model that is part of our daily operation. 
The model predicts per front-end peak load, broken down by application, given a set of rings. Two scenarios can compromise a front-end’s ability to relieve its overload. First, the above approach sheds load at DNS resolution time, so it does not move existing 79 connections. This property is an advantage in that it does not sever existing connections, but it means that it cannot shed the load of applications with long-lived TCP connections. Second, if a front-end receives many HTTP queries from users whose DNS queries are not served from the front-end, it can potentially be overwhelmed by new connections that it does not control, even if it is shedding all DNS requests to a different ring. We use Odin measurements in a process we call ring tuning to proactively guard against these two situations. For the first, we use measurements to identify a high-correlation set of front-ends to use as the outermost anycast ring for applications with long-lived connections. The high-correlation allows a front-end that is approaching overload to quickly shed any new connections, both from the long-lived application and other applications it hosts on other anycast addresses. To guard against the second situation, we use measurements to design rings that avoid instances of uncontrollable load, and we configure nameservers at certain front-ends to shed all requests from certain LDNS to inner rings, to protect another front-end that does not control its own fate. 4.7.2.2 Monitoring the Impact of Anycast Route Changes on Availability Long-lived TCP connections present another challenge to anycast: an Internet route change can shift ongoing TCP connections from one anycast front-end to another, severing the connections [33–35,150]. Odin measurements address this concern in two ways. First, by having clients fetch an object from both an anycast address and a unicast address, we can monitor for problems with anycast availability. Second, we use Odin to monitor the availability of different candidate anycast rings in order to identify subsets of front-ends with stable routing. 80 Figure 4.5: In a regional anycast scenario, if a user’s LDNS is served by a front-end in the user’s region, the user’s performance is unaffected. If the user’s LDNS is served by a front-end in a different region, then the user will be served from the distant region, likely degrading performance. 4.7.3 Using Measurements to Plan CDN Evolution 4.7.3.1 Comparing Global vs Regional Anycast In a basic anycast design, all front-ends share the same global IP address. However, this address presents a single point of failure, and a routing configuration mistake at Microsoft or one of our peers has the potential to blackhole a large portion of our customer traffic. An alternate approach is to use multiple regional anycast addresses, each announced by only a subset of front-ends. Such an approach reduces the “blast radius” of a potential mistake, but it can also change the performance of the CDN. A user’s request can only end up at one of the front-ends that announces the anycast address given to its LDNS, which might prevent the request from using the best performing front-end . . . or prevent Internet routing from delivering requests to distant front-ends. Figure 4.5 shows the three scenarios that can occur when partitioning a global anycast into regions. A user’s LDNS may be served by the same front-end as the user or by a different one. If different, the front-ends may be assigned to the same or different regions. 
81 If they are assigned to different regions, then the user will be directed away from its global front-end to a different one, likely degrading performance. In a use case similar to anycast ring tuning, we used Odin to collect data, then used a graph partitioning algorithm to construct regions that minimize the likelihood that a user and their LDNS are served by front-ends in different regions. We construct a graph where vertices represent front-ends and edges between vertices are weighted proportional to the traffic volume where one endpoint serves the DNS query and the other serves the HTTP response. We use an off-the-shelf graph partitioning heuristic package to define 5 regions, each with approximately the same number of front-ends, that minimizes the number of clients directed to distant regions. We compare the performance of regional versus global anycast in Section 4.8.3. 4.8 Evaluation and Production Results Odin has been running as a production service for 2 years. It has been incorporated into a handful of Microsoft applications, measuring around 120 endpoints. 4.8.1 Odin Improves Service Performance 4.8.1.1 Odin Patches Anycast Performance Here we summarize the results from Section 5.3 involving early Odin experiments with anycast. Anycast directed 60% of requests to the optimal front-end, but it also directed 20% of requests to front-ends that were more than 25ms worse than the optimal one. Today we use Odin measurements to derive unicast “patches” for many of those clients. 82 4.8.2 Using Odin to Identify Outages An outage example. Figure 4.6 visualizes Odin measurements showing an availability drop for Finnish users in AS1759 during a 24 hour period in 2017. The availability issue was between users in that ISP and a Microsoft front-end in Helsinki. Because Odin measures from many Microsoft users to many endpoints in Microsoft and external networks, it provides information that assists with fault localization. First, we can examine measurements from multiple client ISPs in the same region towards the same endpoint. For readability, we limit the figure to one other ISP, AS719, which the figure shows did not experience an availability drop to the front-end. So, the front-end is still able to serve some user populations as expected. Second, the figure indicates that AS1759 maintains high availability to a different endpoint in Microsoft’s network, a nearby datacenter. So, there is no global connectivity issue between Microsoft and AS1759. Last, the figure indicates that availability remains high between clients in AS1759 and an external network. The rich data from Odin allows us to localize the issue to being between clients in AS1759 and our Helsinki front-end. Reporting in the presence of failures. Odin successfully reports measurements despite failures between end-users and Microsoft. Figure 4.7 shows the fraction of results reported via backup paths for representative countries in different regions, normalized by the minimum fraction across countries (for confidentiality). During our evaluation period, there were no significant outages so the figure captures transient failures that occur during normal business operations. All countries show a strong diurnal pattern with peaks around midnight and valleys around 8 a.m. local time. 
Interestingly, the peaks of highest failover reporting occur well outside of working hours, when Microsoft's traffic volume is low. This is consistent with previous work which found that search performance degraded outside business hours, because of an increase in traffic from lower-quality home broadband networks relative to traffic from well-provisioned businesses [52]. The percentage of reports uploaded through third parties varies significantly across countries. For example, at peak, Brazil has 3x and 4x the percentage of backup reports compared to Germany and Japan. Another distinguishing characteristic across countries is the large difference in range between peaks and valleys. India ranges from ≈3x to ≈8x the baseline, Australia from ≈2x to ≈4x, and Japan from ≈1x to ≈2x.

Figure 4.6: Debugging a 2017 availability drop between the Helsinki front-end and AS1759 users in Finland.

Figure 4.7: Relative difference per hour in the percentage of reports received through the backup path across four weekdays.

Backup path scenarios. Backup proxies can only help Odin collect a measurement in the face of a failure if the proxy's path to Odin's ReportEndpoint avoids the failure. To demonstrate the potential problem, we allocated a small fraction of Odin measurements to use an alternate configuration in which uploads are reported both to primary ReportEndpoints and to third-party proxies that forward traffic to primary ReportEndpoints. The third-party CDN has roughly the same number of front-end sites as Microsoft. Out of the 2.7 billion measurements collected globally over several days in January 2018, around 50% of reports were directed to the same front-end by both the third-party proxies and the primary reporting pathway, meaning that the reports could be lost in cases of front-end faults. To provide resilience to these sorts of failures, in actuality we configure backup proxies to forward report uploads to datacenter front-ends instead of front-ends colocated with PoPs.

Fault-tolerance for PoP-level failures. Figure 4.3(C) shows an entire PoP failing. It is likely that the nearest front-end and the nearest backup proxy to the end user are also relatively close to each other. When the proxy forwards the request, it will likely ingress at the same failing PoP, even though the destination is different. To route around PoP-level faults, we want the end user to send the backup request to a topologically distant proxy, such as Proxy 2 in Figure 4.4. The proxy will forward the request through nearby CDN PoP F and avoid the failure. To test this, we configured two proxy instances in a popular cloud provider, on the East and West Coasts of the United States. These proxies forward requests to the set of front-ends collocated at Microsoft PoPs. We configured a load balancing service to send all requests to the East Coast instance by default, but with an exception to direct all requests from East Coast clients to the West Coast proxy. After collecting data globally for several days, we observed that only 3% of backup requests enter Microsoft's network at the same PoP as the primary, as opposed to the 50% above.
This prototype is not scalable with so few proxy instances, but demonstrates an approach to mitigate PoP-level faults that we will develop in a future release. 4.8.3 Using Odin to Evaluate CDN Configuration This section uses Odin to evaluate the performance impact of regional rings as compared to a global anycast ring (Section 4.7.3.1). The cross-region issue illustrated in Figure 4.5 still has the potential to introduce poor anycast performance, even though our graph partitioning attempts to minimize it. To measure the impact to end users, we configure an Odin experiment that compares the latency of the regional anycast ring with our standard “all front-ends” ring. Figure 4.8 shows that performance change at the median is relatively small – just about 2%. The 75th percentile consistently shows the highest percentage of degradation over time, fluctuating around 3%. While the median and 75th percentiles are stable over the five months, both 90th and 99th percentiles begin to trend higher in the starting in May, suggesting that static region performance may change over time at higher percentiles. 86 Figure 4.8: Latency degradation of 5-region vs. global anycast. 4.8.4 Evaluating Odin Coverage In this section we examine Odin’s coverage of Microsoft’s users as part of our requirement to cover paths between Microsoft users, Microsoft, and external networks. We will examine four applications which we have integrated with Odin. We have categorized them by their user base: Consumer, Enterprise, and General. The consumer user base is primarily made up of residential access networks using non-subscription services such as Bing. Enterprise is composed of users in business networks using paid Microsoft services and General is a mix of enterprise and consumer users. We first look at how much of Microsoft’s user base is covered by individual and combined applications. Figure 4.9 shows Consumer1, Consumer2, and Enterprise1 have similar percent coverage of Microsoft end users by AS. The benefit of multiple applications is more apparent when looking at /24 coverage. We see that all four applications combined cover 85% of /24s whereas individually all except for General1 cover much less. We also 87 Figure 4.9: Percent of measurement coverage based on 4 Odin-embedded applications with different user populations. examined the overlap between application coverage and found that the four properties only see around 42% pairwise overlap in /24 coverage, meaning that individual applications contribute a substantial amount of user diversity to Odin. General1 is the highest distinct contributor by providing about 18% of distinct /24s observed. Breaking down the coverage by “high revenue” (e.g. Reverse Proxy for Enterprise scenarios), “medium revenue” (e.g. Consumer email, ad-supported content) and “low revenue” (commodity caching workloads), we observe a higher /24 prefix coverage with Odin for “high revenue” (95%) compared to “medium” (91%) and “low” (90%). This suggests that the missing coverage of Odin is in the tail of relatively low-value traffic. 4.9 Conclusion CDNs are critical to the performance of large-scale Internet services. Microsoft operates two CDNs, one with 100+ endpoints that uses anycast and one for Azure-based services that uses DNS-redirection. This chapter describes Odin, our measurement system that 88 supports Microsoft’s CDN operations. These operations span a wide variety of use cases across first- and third-party customers, with clients spread out worldwide. 
Odin has helped improve the performance of major services like Bing search and has guided capacity planning of Microsoft's CDN. We believe that the key design choices we made in building and operating Odin at scale address the deficiencies of many prior Internet measurement platforms. Key points of Odin's design support our thesis. First, Odin is deployed in Microsoft end-user applications to get complete coverage of Microsoft's user population. Second, these Odin clients issue active measurements to enable coverage of the diverse Internet paths needed for applications such as fine-grained outage detection. Last, to ensure we get data that covers Microsoft's users, Odin prefers application-layer measurements because they are rarely blocked by network security policy, whereas traditional Internet measurement tools, such as layer 3 probes, often do not work in enterprise networks. These aspects of Odin's design enable Microsoft to understand the impact of its choices by measuring over diverse Internet paths. In Section 4.8.2, we examined how Odin is able to measure true availability by routing around failures on the Internet and in Microsoft's own network using diverse paths. Using these diverse-path measurements, we were able to quickly identify and mitigate an outage in Finland impacting a link between Microsoft and a single ISP in the Helsinki region. Odin's design features enable support of Microsoft's operations in ways that existing measurement systems could not, validating our thesis.

4.10 Supplemental Measurement Counts

Let the number of measurements be n and the true failure rate be p. Analytically, the observed failure rate p̂ is distributed as Bin(n, p)/n, so the average error is E[|p̂ − p|] = E[|Bin(n, p)/n − p|]. Figure 4.10, however, is generated computationally via Monte Carlo simulation. For example, to find the value described in the caption, we simulated a large number (10^7) of draws from the binomial distribution, p̂ ∼ Bin(n = 200, p = 0.01)/200, then found the average value of |p̂ − p| to be approximately 54% of p.

Figure 4.10: The average error of the observed failure rate, as a function of the number of measurements and the true failure rate. For example, if the true failure rate of a service is 1.0% (red dotted line), then a sample of 200 measurements would yield an average error of about 50%, i.e., 1.0 ± 0.5%.

Chapter 5: Examining CDN Redirection as a Design Decision

5.1 Introduction

Low-latency web services correlate with higher user satisfaction and service revenue. A central proposition of a CDN is that distributed front-ends can serve users over short distances, lowering latency, but deploying the front-ends alone does not suffice to achieve that goal. A CDN must also choose a redirection strategy, the protocol that directs users to a low-latency front-end. The choice of redirection is critical to a CDN's performance and operations. Anycast and DNS are two redirection strategies that are feasible for latency-sensitive content delivery. With anycast, shortest-path Internet routing delivers client requests to the "nearest" CDN front-end, typically providing good performance [48] and operational simplicity. On the other hand, DNS-based redirection offers a higher level of control over front-end selection, but with greater operational complexity.
Despite this, DNS has been the standard choice for CDN redirection since the early days of Akamai and remains the primary choice of large content providers such as Google and Facebook. In this chapter, we will use Odin to examine how DNS and anycast redirection impact CDN performance. We first present an approach for constructing a DNS latency map – the redirection control plane that decides to which front-end to direct a client, using Odin data. Ours is the first work to detail how DNS latency maps are constructed in 92 production for a large cloud provider, Microsoft Azure. At the time when these maps replaced a legacy system based on layer 3 measurements from CDN infrastructure, we observed at least a 10% latency improvement in high traffic volume countries. Second, we use our DNS mapping approach to provide the first performance comparison of DNS and anycast redirection for the same CDN infrastructure in the form of an Odin experiment. This is the first evaluation of DNS and anycast redirection performance using actual users and the infrastructure and peering of a global production CDN. We find that while anycast performs well most of the time, tail performance is poor but the duration of individual poor performing paths are short lived. This chapter supports our thesis by examining the trade-offs of these two redirection strategies, a critical design decision that all CDNs must make. For DNS latency maps, measurements must be continuously collected over time, over diverse paths to different Azure regions to adapt to ever-changing conditions on the Internet. Complete coverage of Microsoft users is needed to monitor all the LDNSes that Microsoft clients use. Then, to make a meaningful comparison of DNS redirection with anycast, we use the same population of Microsoft users. We use Odin’s ability to support diverse path measurements to exercise non-production anycast and DNS redirection control plane mechanisms on a production CDN deployment without fear of impacting production services. 93 5.2 A Client Centric Approach to CDN Latency Map Creation 5.2.1 Introduction One factor contributing to the additional complexity of a DNS-based CDN is that it must compute which front-end DNS record to respond based on some performance criteria and then programming those decisions into the CDN’s authoritative DNS. We refer to this set of decisions as a map [160]. The primary challenge in DNS-based redirection is to direct clients to a high perfor- mance front-end based on the client’s LDNS IP address, not the client’s IP address. While clients are often geographically and/or topologically nearby their LDNS, many are not. This means that the performance or location properties LDNSes are unrepresentative of the performance or location of the clients that they serve so relying on these properties for mapping can result in poor client performance. This has been referred to in previous work as the client-LDNS mismatch problem [91]. In this work, we describe and evaluate a modern approach to construct maps for DNS- based redirection called client-centric mapping (CCM). CCM avoids the shortcomings of existing approaches by removing assumptions about the geographic and topological relationship between LDNSes, their clients, and CDN front-ends. While many CDNs use DNS redirection, there has been very little information describing the process for creating latency maps. We are the first to describe and evaluate such an approach in detail and in use by a large content provider. 
Through the Odin measurement platform (Section 4), we collect continuous, large scale synthetic client-side measurements to front-ends, while at the same time tracking 94 which LDNS was used when a measurement was performed. Using these measurements, we first derive the set of clients served by an LDNS, then the client performance to front-ends, and finally compute the best performing front-end for the clients served by the LDNS. CCM removes the need to rely on inaccurate geolocation for LDNSes or active measurements toward shared Internet infrastructure such as LDNSes and routers as a proxy for end-user performance to front-ends. Our results show that CCM makes better mapping decisions than previous approaches, leading to improved CDN performance. An evaluation against a traditional geolocation- based approach shows that CCM performs much better, especially at high latency per- centiles. For example, 95th percentile CCM latency performance is better by 65ms half the time. We then describe the performance improvement from a production deployment of CCM to replace a legacy DNS-redirection mapping system. Our results show that 10 of the highest traffic volume countries saw latency improvements of 10% or more. 5.2.2 Existing Approaches In this section, we outline the previous approaches to building latency maps for DNS-based redirection. Latency-sensitive As future work, Facebook hints at a methodology similar to CCM in a Facebook post describing a system called Doppler [46]. The post proposes using standard techniques for client-to-LDNS mapping with unique DNS hostnames and associating them with latency measurements between end-users and datacenters (Section 5.2.3). Facebook has not made public any additional information on Doppler or the proposed future work. Google’s LatLong work [160] mentions that when periodically constructing latency maps, 95 the CDN identifies the client IP prefix associated with the LDNS. Maps are recomputed based on passive round-trip statistics under the assumption that “users are relatively close to their local DNS servers”. In Google’s Why High [108] paper, the authors state that “every so often” they direct an end-user to a random front-end and passively collect TCP metrics on the server-side. Measurements are grouped by the client IP addresses into routable prefixes received from BGP updates. To summarize, what is publicly known of Google’s map creation is that (1) they know the mapping between clients and LDNSes (2) occasionally clients are directed to random front-ends and (3) performance metrics are collected passively at the server-side. Location-based Location based DNS-redirection approaches respond record with the geographically closest front-end to an LDNS. In addition to the dubious quality of geolocation for Internet infrastructure [82], this approach assumes that LDNSes and end-users are geographically nearby, which is often not the case. Akamai showed that the distance between 15% of all client-LDNS pairs is around 1600 km [50] or more. 1 Since location-based solutions are static and agnostic to the complexities of Internet routing, the geographically closest front-end may not be the the lowest latency front-end, leading to suboptimal decisions [117,159]. EDNS client-subnet-prefix (ECS) While not itself an approach, ECS [60] exposes a portion of the end-users IP address to an authoritative DNS, allowing CDNs to make a client-specific redirection choice. 
1 Approximation from Chen et al. [50], Figure 11.

Akamai showed that for clients of public DNS resolvers, such as Google Public DNS and OpenDNS, the mean distance between clients and their servers was over 3,200 km before deploying ECS. After deploying ECS, the mean distance dropped to around 400 km [50]. While this result suggests that ECS has solved the problem of directing clients to distant (and hence high-latency) front-ends, we show this is not the case. While Akamai's work showed the mean client distance of "Public-Resolvers", it left open the question of client distances to non-public resolvers. Figure 5.1 shows the CDF of mean client distances between Microsoft users and their LDNS. The geolocation data comes from commercial geolocation databases used at Microsoft. Using the methodology described in Section 3.3 of [50], the mean client distance for an LDNS is computed by first computing the centroid of the LDNS, the average location of all clients that it serves, and then finding the mean of the distances between all clients and the centroid. The "All client-LDNS mean distance" line shows results similar to what Akamai reports for all client-LDNS pairs. The "No-Public" line is the client-LDNS mean distance excluding public DNS resolvers. This line shows that around 15% of client-LDNS pairs are nearly 1,000 km or more apart, meaning that large client-LDNS distances are not only a public-resolver issue. Moreover, even after 6 years of production Internet deployment, adoption of ECS in LDNSes is almost exclusively limited to public resolvers. A one-week snapshot from Microsoft Azure authoritative DNS resolvers in November 2018 reveals that only slightly more than 50 ISPs worldwide have adopted ECS. Google Public DNS contributes nearly all of the total volume, with OpenDNS the second-highest contributor. Interestingly, China has relatively wide adoption of ECS, hosting over 30% of all ECS-enabled ISPs.

Figure 5.1: Distribution of average distance between Microsoft users and their LDNS.

Ping from CDN Infrastructure. CDN operators may issue ICMP or layer 3 pings from front-ends to end users or to Internet infrastructure believed to be geographically co-located with end users. The CDN's DNS responds with the record of the front-end that had the lowest-latency ping to the querying resolver. For example, one measurement technique used by Akamai is to traceroute from CDN servers to LDNSes to discover routers along the path, then ping those routers as a proxy for CDN-to-LDNS or end-user latency [50]. The first issue with this approach is probe responsiveness. As discussed in Section 4.4, previous work has shown that most LDNSes are unresponsive to probes from random Internet hosts [93]. The second issue is that even when LDNSes or other infrastructure are responsive, the problem remains that performance between an LDNS and a CDN is often not representative of the performance between an LDNS's clients and the CDN.

DNS reflection. Huang et al. proposed a method known as DNS reflection, which uses the difference in response times to a series of CNAME redirects to estimate the latency between an LDNS and a front-end [92]. This approach avoids the target-responsiveness issues of ping-based approaches, but it still relies on the assumption that clients and their LDNS are co-located; the authors explicitly state that their work focuses only on that set of clients.
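As a concrete illustration of the client-LDNS distance methodology used for Figure 5.1 above, the following sketch computes the per-LDNS mean client distance following the centroid approach of [50]. It is a simplification: the centroid is a plain average of latitudes and longitudes (adequate only for illustration), the function names are ours, and a production version would draw client locations from a geolocation database and weight them by query volume.

import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def mean_client_distance(client_locations):
    """client_locations: list of (lat, lon) for the clients served by one LDNS.

    Returns the mean distance from each client to the LDNS's client centroid,
    i.e., the statistic plotted in Figure 5.1.
    """
    n = len(client_locations)
    centroid_lat = sum(lat for lat, _ in client_locations) / n
    centroid_lon = sum(lon for _, lon in client_locations) / n
    return sum(
        haversine_km(lat, lon, centroid_lat, centroid_lon)
        for lat, lon in client_locations
    ) / n

if __name__ == "__main__":
    # Example: an LDNS serving clients in Los Angeles and New York has a large
    # mean client distance, so its own location is a poor proxy for its clients.
    print(round(mean_client_distance([(34.05, -118.24), (40.71, -74.01)]), 1))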
The previous approaches to latency map creation offer inadequate redirection performance for today's competitive CDN and content-driven Internet. Major CDN and content providers such as Akamai, Google, and Facebook control a large portion of today's Internet traffic using DNS-based redirection, but none of their approaches have been made publicly available, to the detriment of Internet researchers. In the next section, we attempt to bridge this gap.

Server-Side URL Rewriting. The authoritative DNS servers of a CDN may direct clients to a sub-optimal front-end, but a first-party CDN with applications tightly integrated with its serving infrastructure can partially recover from a poor DNS redirection. Since the front-end sees the actual IP address of the end user when serving content, it can rewrite the URLs of sub-resources (e.g., images in the index.html) to point to the optimal front-end. YouTube is one content provider observed using this technique [29]. Huang et al. [91] proposed to solve the "client-LDNS mismatch" problem using a hostname extension mechanism. A client first contacts a global traffic manager (GTM) web service to receive a gtm-id that is assigned to groups of clients with similar performance properties. The client then prepends the gtm-id to the service name (e.g., gtm-id.www.service.com), and during hostname resolution the appropriate front-end can then be assigned based on the performance characteristics of the clients with that gtm-id. The extension could be implemented today through server-side URL rewriting, but this negates the decoupling of HTTP and DNS services the authors argue for while introducing an additional DNS lookup. To be effective for latency-sensitive services, this extension would require support in Web browsers.

5.2.3 Client-Centric Mapping

Next, we describe CCM, a latency map creation approach for DNS-based redirection CDNs that is currently deployed in the production traffic manager service at Microsoft Azure. To achieve low latency for users, we need to understand which clients use each LDNS and those clients' performance to the various regions. Previous approaches suffer primarily from coverage issues and from assumptions about the similarity between LDNSes and the clients they serve. The insight that serves as the basis for CCM's approach is that the physical and topological proximity between an LDNS and its clients is irrelevant. What matters is knowing the clients that an LDNS serves and the performance between those clients and the CDN's front-ends. For example, if an LDNS is located in New York City and the set of clients it serves have the lowest latency to a Los Angeles front-end, then the correct CDN redirection decision is to respond with the front-end in Los Angeles. To accomplish this, we require (1) client-to-LDNS mappings, to know the set of clients behind each LDNS, and (2) a sufficiently high measurement volume between those clients and CDN front-ends, to determine that Los Angeles is the best performing front-end for those clients. Odin (Section 4) is a client-side measurement platform currently deployed at Microsoft which meets our measurement requirements. We have configured a continuous Odin experiment to measure Azure regions around the world from Microsoft users and to capture the client's LDNS (Figure 4.2) used when performing each measurement.
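Capturing the client's LDNS relies on the DNS-based measurements of Section 4.6.1: the random hostname ties the authoritative DNS log entry, which records the LDNS address, to the client's uploaded report. A minimal sketch of that join, with illustrative field names (rand_id, ldns_ip, client_ip) of our own choosing, looks as follows.

from collections import defaultdict

def join_client_ldns(dns_log, client_reports):
    """dns_log: iterable of (rand_id, ldns_ip) rows from the authoritative DNS.
    client_reports: iterable of (rand_id, client_ip) rows uploaded by clients.

    Returns {client_ip: set of LDNS IPs observed serving that client}.
    """
    # A single random hostname can be queried by more than one LDNS IP
    # (Section 4.6.2), so keep every LDNS seen for each rand_id.
    ldns_by_rand_id = defaultdict(set)
    for rand_id, ldns_ip in dns_log:
        ldns_by_rand_id[rand_id].add(ldns_ip)

    associations = defaultdict(set)
    for rand_id, client_ip in client_reports:
        associations[client_ip] |= ldns_by_rand_id.get(rand_id, set())
    return associations

if __name__ == "__main__":
    # Example: one measurement ties client 203.0.113.7 to LDNS 198.51.100.53.
    print(join_client_ldns(
        dns_log=[("f3a9c1", "198.51.100.53")],
        client_reports=[("f3a9c1", "203.0.113.7")],
    ))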
Map construction from the input Odin data is illustrated in Figure 5.2(a) and (b) and works as follows:

(1) Data Aggregation. We use the association of client to LDNS to group the measurements by each LDNS's /26 prefix and Azure region. We found that /26 balances the precision of IP localization against the statistical power gained from measurement aggregation.

(2) Filtering. Next, we filter out LDNS-region pairs which do not have enough measurements. Our minimum threshold was chosen using statistical power analysis. If we filter the region that was lowest latency for the LDNS in the currently-deployed map, we do not update the map for the LDNS, to prevent us from making the "best" decision from a set of bad choices.

(3) Ranking Results. For each LDNS, we rank the regions by median latency. At query resolution time, the traffic manager's authoritative DNS responds to an LDNS with the lowest latency region that is currently online.

(4) Applying the Overrides. The final step is to apply the per-LDNS changes to the currently deployed map, resulting in the new map. The map generation process takes care of prefix slicing, merging, and aggregation of neighboring prefixes with the same answer to produce a map with a small memory footprint.

5.2.4 Evaluation

In this section, we perform two evaluations of CCM and show the performance benefits over existing approaches.

Comparison with alternative DNS mapping techniques. A simple approach to generating a redirection map for a CDN is to use the locations of LDNSes. To test the performance difference between this and the CCM approach, we generated a map using a proprietary Microsoft geolocation database that aggregates locations from many sources. For every IPv4 address, we find the geographically closest front-end and choose that for the map entry. We aggregate neighboring IP addresses with the same map entry and convert this into a set of DNS records. We then configured Odin to measure both this map and the CCM map for 24 hours on Sept. 21, 2017. We bucketed measurements into 10-minute bins. For each bin, we calculated the latency differences at different percentiles. Figure 5.3 depicts a CDF over all 10-minute bins. Most of the time there is no real difference at the median. The difference is also small at the 75th percentile, although CCM is better around 65% of the time. The real improvement of using CCM comes at the 90th, 95th, and 99th percentiles. At P95, CCM's map is better by 65 ms half the time.

Figure 5.3: Difference in global performance over one day between a CCM map and a map generated from LDNS locations (geomap). Values less than 0 show the fraction of time that the geomap performed better.

Client-Centric Mapping Production Deployment. In May 2017, the Azure traffic manager began directing production traffic using maps generated as described in Section 5.2.3, replacing a proprietary approach that combined geolocation databases with pings from CDN infrastructure. We evaluated the performance of the two maps by running an Odin experiment that had each client issue two measurements, one according to each map. Table 5.1 shows the latency change at the 75th and 95th percentile for the countries with the most measurements.
Finland and Brazil saw negligible latency increases (1.56%, 0.68%) at the 75th percentile, but all other high traffic countries saw reductions at both percentiles, with a number improving by 20% or more.

Country       P75 Imp.  P95 Imp.
Spain         30.68%    10.79%
Italy         29.92%    17.95%
Japan         28.14%    32.02%
Australia     20.05%    16.82%
Canada        19.17%    5.10%
Sweden        14.14%    24.02%
U.S.A.        14.04%    8.81%
South Africa  13.97%    6.33%
India         13.97%    6.08%
Switzerland   10.67%    22.18%
Netherlands   7.22%     24.94%
France        6.60%     18.14%
Norway        5.61%     14.93%
U.K.          4.44%     12.39%
Germany       2.82%     5.49%
Finland       1.56%     12.97%
Brazil        0.68%     6.18%

Table 5.1: The performance improvement in the 75th and 95th percentile from a 2-month roll-out using the CCM mapping technique over May and June 2017.

5.2.5 Dispelling Mistaken Conventional Wisdom

Prior work on CDN performance sometimes exhibited misconceptions about DNS redirection, because operational practices were not transparent to the research community. We distill some takeaways from our work that contradict prior claims and elucidate realities of modern CDNs.

• For many CDNs, measurements of user connections suffice as the key input to map generation, whereas previous work often describes mapping as a complex process requiring many different types of Internet measurements [148], including measurements from infrastructure to the Internet [87,108]. This reality is especially true for CDNs that host popular first-party services, as we have designed Odin to take advantage of the measurement flexibility available when in control of first-party services.

• The geographic or network location of an LDNS does not impact the quality of redirection, even though redirection occurs at the granularity of an LDNS. Previous work claimed that redirection decisions were based on the location of or measurements to the LDNS [87], or that good decisions depend on users being near their LDNS [118,125,126].

• It is both possible to measure which clients use each LDNS and necessary to do so to best map clients to front-ends. In reality, techniques for measuring associations between users and LDNS have been known for years [118], allowing decisions based on the performance of the users of an LDNS to various front-ends, which provides good performance as long as the users of an LDNS experience good performance from the same front-end as each other.

• Most redirection still must occur on a per-LDNS basis, even though EDNS client-subnet (ECS) enables user prefix-granularity decisions [1,50,60,87]. Our measurements reveal that, outside of large public resolvers, almost no LDNS operators have adopted ECS.

To summarize, the client-LDNS mismatch problem [91] is not solved by ECS, but construction of high-performance DNS latency maps can be a simple, robust process if the correct measurements are made.

5.3 Analyzing the Performance of an Anycast CDN

5.3.1 Introduction

Content delivery networks are a critical part of Internet infrastructure. CDNs deploy front-end servers around the world and direct clients to nearby, available front-ends to reduce bandwidth, improve performance, and maintain reliability. We will focus on a CDN architecture which directs the client to a nearby front-end, which terminates the client's TCP connection and relays requests to a backend server in a data center. The key challenge for a CDN is to map each client to the right front-end.
For latency-sensitive services such as search results, CDNs try to reduce the client-perceived latency by mapping the client to a nearby front-end. CDNs can use several mechanisms to direct the client to a front-end. The two most popular mechanisms are DNS and anycast. DNS-based redirection was pioneered by Akamai. It offers fine-grained and near-real-time control over the client-front-end mapping, but requires considerable investment in infrastructure and operations [124]. Some newer CDNs like CloudFlare rely on anycast [11], announcing the same IP address(es) from multiple locations, leaving the client-front-end mapping at the mercy of Internet routing protocols. Anycast offers only minimal control over client-front-end mapping and is performance agnostic by design. However, it is easy and cheap to deploy an anycast-based CDN: it requires no infrastructure investment beyond deploying the front-ends themselves. The anycast approach has been shown to be quite robust in practice [74].

In this section, we aim to answer the questions: Does anycast direct clients to nearby front-ends? What is the performance impact of poor redirection, if any? To study these questions, we use data from Bing's anycast-based CDN [74]. We instrumented the search stack so that a small fraction of search response pages carry a JavaScript beacon. After the search results display, the JavaScript measures latency to four front-ends: one selected by anycast, and three nearby ones that the JavaScript targets. We compare these latencies to understand anycast performance and determine potential gains from deploying a DNS solution.

Our results paint a mixed picture of anycast performance. For most clients, anycast performs well despite the lack of centralized control. However, anycast directs around 20% of clients to a suboptimal front-end. When anycast does not direct a client to the best front-end, we find that the client usually still lands on a nearby alternative front-end. We demonstrate that the anycast inefficiencies are stable enough that we can use a simple prediction scheme to drive DNS redirection for clients underserved by anycast, improving performance for 15%-20% of clients. Like any such study, our specific conclusions are closely tied to the current front-end deployment of the CDN we measure. However, as the first study of this kind that we are aware of, the results reveal important insights about CDN performance, demonstrating that anycast delivers optimal performance for most clients.

5.3.2 Methodology

Our goal is to answer two questions: 1) How effective is anycast in directing clients to nearby front-ends? And 2) How does anycast performance compare against the more traditional DNS-based unicast redirection scheme? We experiment with Bing's anycast-based CDN to answer these questions. The CDN has dozens of front-end locations around the world, all within the same Microsoft-operated autonomous system. We use measurements from real clients to Bing CDN front-ends using anycast and unicast. In Section 5.3.6, we compare the size of this CDN to others and show how close clients are to the front-ends.

5.3.3 Routing Configuration

All test front-end locations have both anycast and unicast IP addresses.

Anycast: Bing is currently an anycast CDN. All production search traffic is currently served using anycast from all front-ends.

Unicast: We also assign each front-end location a unique /24 prefix which does not serve production traffic.
Only the routers at the closest peering point to that front-end announce the prefix, forcing traffic to the prefix to ingress near the front-end rather than entering Microsoft's backbone at a different location and traversing the backbone to reach the front-end. This routing configuration allows the best head-to-head comparison between unicast and anycast redirection, as anycast traffic ingressing at a particular peering point will also go to the closest front-end.

5.3.4 Data Sets

We use both passive and active measurements in our study, as discussed below.

5.3.4.1 Passive Measurements

Bing server logs provide detailed information about client requests for each search query. For our analysis we use the client IP address, location, and which front-end was used during a particular request. This data set was collected during the first week of April 2015 and represents many millions of queries.

5.3.4.2 Active Measurements

To actively measure CDN performance from the client, we inject a JavaScript beacon into a small fraction of Bing Search results. After the results page has completely loaded, the beacon instructs the client to fetch four test URLs. These URLs trigger a set of DNS queries to our authoritative DNS infrastructure. The DNS responses return randomized front-end IPs to provide measurement diversity, which we discuss more in Section 5.3.5. The beacon measures the latency to these front-ends by downloading the resources pointed to by the URLs, and reports the results to a backend infrastructure. Our authoritative DNS servers also push their query logs to the backend storage. Each test URL has a globally unique identifier, allowing us to join HTTP results from the client side with DNS results from the server side [118].

The JavaScript beacon implements two techniques to improve the quality of measurements. First, to remove the impact of DNS lookup from our measurements, we first issue a warm-up request so that the subsequent test will use the cached DNS response. While DNS latency may be responsible for some aspects of poor Web-browsing performance [31], in this work we are focusing on the performance of paths between clients and front-ends. We set TTLs longer than the duration of the beacon. Second, using JavaScript to measure the elapsed time between the start and end of a fetch is known to not be a precise measurement of performance [112], whereas the W3C Resource Timing API [99] provides access to accurate resource download timing information from compliant Web browsers. The beacon first records latency using the primitive timings. Upon completion, if the browser supports the Resource Timing API, the beacon substitutes the more accurate values.

We study measurements collected from many millions of search queries over March and April 2015. We aggregated client IP addresses from measurements into /24 prefixes because they tend to be localized [78]. To reflect that the number of queries per /24 is heavily skewed across prefixes [124], for both the passive and active measurements, we present some of our results weighting the /24s by the number of queries from the prefix in our corresponding measurements.

5.3.5 Choice of Front-ends to Measure

The main goal of our measurements is to compare the performance achieved by anycast with the performance achieved by directing clients to their best performing front-end.
Measuring from each client to every front-end would introduce too much overhead, but we cannot know a priori which front-end is the best choice for a given client at a given point in time. We use three mechanisms to balance measurement overhead with measurement accuracy in terms of uncovering the best performing choices and obtaining sufficient measurements to them.

First, for each LDNS, we consider only the ten closest front-ends to the LDNS (based on geolocation data) as candidates to consider returning to the clients of that LDNS. Recent work has shown that LDNS is a good approximation of client location: excluding 8% of demand from public resolvers, only 11-12% of demand comes from clients who are further than 500 km from their LDNS [50]. In Figure 5.4, we will show that our geolocation data is sufficiently accurate that the best front-ends for the clients are generally within that set. Second, to further reduce overhead, each beacon only makes four measurements to front-ends: (a) a measurement to the front-end selected by anycast routing; (b) a measurement to the front-end judged to be geographically closest to the LDNS; and (c-d) measurements to two front-ends randomly selected from the other nine candidates, with the likelihood of a front-end being selected weighted by distance from the client LDNS (e.g., we return the 3rd closest front-end with higher probability than the 4th closest front-end). Third, for most of our analysis, we aggregate measurements by /24 and consider distributions of performance to a front-end, so our analysis is robust even if not every client measures to the best front-end every time.

To partially validate our approach, Figure 5.4 shows the distribution of minimum observed latency from a client /24 to a front-end. The labeled Nth line includes latency measurements from the nearest N front-ends to the LDNS. The results show decreasing latency as we initially include more front-ends, but we see little decrease after adding five front-ends per prefix, for example. So, we do not expect that minimum latencies would improve for many prefixes if we measured to more than the nearest ten front-ends that we include in our beacon measurements.

Figure 5.4: Diminishing returns of measuring to additional front-ends. The close grouping of lines for the 5th+ closest front-ends suggests that measuring to additional front-ends provides negligible benefit.

5.3.6 CDN Size and Geo-Distribution

The results in this section (5.3) are specific to Bing's anycast CDN deployment. Next, we characterize the size of the deployment, showing that our deployment is of a similar scale (a few dozen front-end server locations) to most other CDNs and in particular most anycast CDNs, although it is one of the largest deployments within that rough scale. We then measure what the distribution of these dozens of front-end locations yields in terms of the distance from clients to the nearest front-ends. Our characterization of the performance of this CDN is an important first step towards understanding anycast performance. An interesting direction for future work is to understand how to extend these performance results to CDNs with different numbers and locations of servers and with different interdomain connectivity [53].
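As an aside on the beacon methodology of Section 5.3.5, a minimal sketch of the distance-weighted selection of candidate front-ends follows. The inverse-rank weighting function is an illustrative assumption (the exact weights used in production are not specified above); the sketch only shows how a beacon's four targets could be chosen.

```python
import random

def weighted_sample_without_replacement(items, weights, k):
    """Draw k distinct items, each draw proportional to its current weight."""
    items, weights = list(items), list(weights)
    chosen = []
    for _ in range(min(k, len(items))):
        idx = random.choices(range(len(items)), weights=weights, k=1)[0]
        chosen.append(items.pop(idx))
        weights.pop(idx)
    return chosen

def choose_beacon_targets(anycast_fe, candidates_by_distance):
    """candidates_by_distance: the ten front-ends closest to the client's LDNS,
    ordered nearest first (from geolocation data).

    Returns the four targets a beacon measures: the anycast selection, the
    geographically closest candidate, and two distance-weighted random picks.
    """
    closest = candidates_by_distance[0]
    others = candidates_by_distance[1:]  # the other nine candidates
    # Nearer candidates get larger weights, so e.g. the 3rd closest front-end
    # is returned with higher probability than the 4th closest.
    weights = [1.0 / (rank + 2) for rank in range(len(others))]
    random_two = weighted_sample_without_replacement(others, weights, 2)
    return [anycast_fe, closest] + random_two
```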
We compare our CDN to others based on the number of server locations, which is one factor impacting CDN and anycast performance. We examine 21 CDNs and content providers for which there is publicly available data [25]. Four CDNs are extreme outliers. ChinaNetCenter and ChinaCache each have over 100 locations in China. Previous research found Google to have over 1000 locations worldwide [47], and Akamai is generally known to have over 1000 as well [50]. While this scale of deployment is often the popular image of a CDN, it is in fact the exception. Ignoring the large Chinese deployments, the next largest CDNs we found public data for are CDNetworks (161 locations) and SkyparkCDN (119 locations). The remaining 17 CDNs we examined (including ChinaNetCenter's and ChinaCache's deployments outside of China) have between 17 locations (CDNify) and 62 locations (Level3). In terms of number of locations and regional coverage, the Bing CDN is most similar to Level3 and MaxCDN. Well-known CDNs with smaller deployments include Amazon CloudFront (37 locations), CacheFly (41 locations), CloudFlare (43 locations) and EdgeCast (31 locations). CloudFlare, CacheFly, and EdgeCast are anycast CDNs.

To give some perspective on the density of front-end distribution, Figure 5.5 shows the distance from clients to nearest front-ends, weighted by client Bing query volumes. The median distance to the nearest front-end is 280 km, to the second nearest is 700 km, and to the fourth nearest is 1300 km.

Figure 5.5: Distances in kilometers (log scale) from volume-weighted clients to nearest front-ends.

5.3.7 Anycast Performance

We use measurements to estimate the performance penalty anycast pays in exchange for simple operation. Figure 5.6 is based on millions of measurements, collected over a period of a few days, and inspired us to take on this project. As explained in Section 5.3.2, each execution of the JavaScript beacon yields four measurements, one to the front-end that anycast selects, and three to nearby unicast front-ends. For each request, we find the latency difference between anycast and the lowest-latency unicast front-end. Figure 5.6 shows the fraction of requests where anycast performance is slower than the best of the three unicast front-ends. Most of the time, in most regions, anycast does well, performing as well as the best of the three nearby unicast front-ends. However, anycast is at least 25 ms slower for 20% of requests, and just below 10% of anycast measurements are 100 ms or more slower than the best unicast for the client.

Note that this is not an upper bound: to derive that, we would have to poll all front-ends in each beacon execution, which is too much overhead. There is also no guarantee that a deployed DNS-based redirection system will be able to achieve the performance improvement seen in Figure 5.6; to do so the DNS-based redirection system would have to be practically clairvoyant. Nonetheless, this result was sufficiently tantalizing for us to study anycast performance in more detail, and seek ways to improve it.

Examples of poor anycast routes: A challenge in understanding anycast performance is figuring out why clients are being directed to distant or poor-performing front-ends.
To troubleshoot, we used the RIPE Atlas [22] testbed, a network of over 8000 probes predominantly hosted in home networks. We issued traceroutes from Atlas probes hosted within the same ISP-metro area pairs where we have observed clients with poor performance. We observe in our analysis that many instances fall into one of two cases: 1) BGP's lack of insight into the underlying topology causes anycast to make suboptimal choices, and 2) intradomain routing policies of ISPs select remote peering points with our network.

In one interesting example, a client was roughly the same distance from two border routers announcing the anycast route. Anycast chose to route towards router A. However, internally in our network, router B is very close to a front-end C, whereas router A has a longer intradomain route to the nearest front-end, front-end D. With anycast, there is no way to communicate [137] this internal topology information in a BGP announcement.

Figure 5.6: The fraction of requests where the best of three different unicast front-ends outperformed anycast.

Several other examples included cases where a client is nearby a front-end but the ISP's internal policy chooses to hand off traffic at a distant peering point. Microsoft intradomain policy then directs the client's request to the front-end nearest to the peering point, not to the client. Examples we observed of this were an ISP carrying traffic from a client in Denver to Phoenix and another carrying traffic from Moscow to Stockholm. In both cases, direct peering was present at each source city.

Intrigued by these sorts of case studies, we sought to understand anycast performance quantitatively. The first question we ask is whether anycast performance is poor simply because it occasionally directs clients to front-ends that are geographically far away, as was the case when clients in Moscow went to Stockholm.

Does anycast direct clients to nearby front-ends? In a large CDN with presence in major metro areas around the world, most ISPs will see BGP announcements for front-ends from a number of different locations. If peering among these points is uniform, then the ISP's least cost path from a client to a front-end will often be the geographically closest. Since anycast is not load or latency aware, geographic proximity is a good indicator of expected performance. Figure 5.7 shows the distribution of the distance from client to anycast front-end for all clients in one day of production Bing traffic. One line weights clients by query volume. Anycast is shown to perform 5-10% better at all percentiles when accounting for more active clients. We see that about 82% of clients are directed to a front-end within 2000 km, while 87% of client volume is within 2000 km. The second pair of lines in Figure 5.7, labeled "Past Closest", shows the distribution of the difference between the distance from a client to its closest front-end and the distance from the client to the front-end anycast directs it to. About 55% of clients and weighted clients have distance 0, meaning they are directed to the nearest front-end. Further, 75% of clients are directed to a front-end within around 400 km and 90% are within 1375 km of their closest.
This supports the idea that, with a dense front-end deployment such as is achievable in North America and Europe, anycast directs most clients to a relatively nearby front-end that should be expected to deliver good performance, even if it is not the closest.

Figure 5.7: The distance in kilometers (log scale) between clients and the anycast front-ends they are directed to.

From a geographic view, we found that around 10-15% of /24s are directed to distant front-ends, a likely explanation for poor performance. (No geolocation database is perfect; a fraction of very long client-to-front-end distances may be attributable to bad client geolocation data.) Next we examine how common these issues are from day-to-day and how long issues with individual networks persist.

Is anycast performance consistently poor? We first consider whether significant fractions of clients see consistently poor performance with anycast. At the end of each day, we analyzed all collected client measurements to find prefixes with room for improvement over anycast performance. For each client /24, we calculate the median latency between the prefix and each measured unicast front-end and anycast.

Figure 5.8: Daily poor-path prevalence during April 2015 showing what fraction of client /24s see different levels of latency improvement over anycast when directed to their best performing unicast front-end.

Figure 5.8 shows the prevalence of poor anycast performance each day during April 2015. Each line specifies a particular minimum latency improvement, and the figure shows the fraction of client /24s each day for which some unicast front-end yields at least that improvement over anycast. On average, we find that 19% of prefixes see some performance benefit from going to a specific unicast front-end instead of using anycast. We see 12% of clients with 10 ms or more improvement, but only 4% see 50 ms or more. Poor performance is not limited to a few days; it is a daily concern. We next examine whether the same client networks experience recurring poor performance.

How long does poor performance persist? Are the problems seen in Figure 5.8 always due to the same problematic clients?

Figure 5.9: Poor path duration across April 2015. We consider poor anycast paths to be those with any latency inflation over a unicast front-end.

Figure 5.9 shows the duration of poor anycast performance during April 2015. For the majority of /24s categorized as having poor-performing paths, those poor-performing paths are short-lived. Around 60% appear for only one day over the month. Around 10% of /24s show poor performance for 5 days or more. These days are not necessarily consecutive. We see that only 5% of /24s see continuous poor performance over 5 days or more. These results show that while there is a persistent amount of poor anycast performance over time, the majority of problems only last for a single day. Next we look at how much of poor performance can be attributed to clients frequently switching between good and poor performing front-ends.
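Before turning to front-end affinity, the per-day poor-path computation described above can be sketched as follows. This is an illustrative simplification under an assumed record layout (client /24 prefix, measured target, latency); per /24, it compares the median anycast latency with the median latency to each measured unicast front-end.

```python
import statistics
from collections import defaultdict

def daily_poor_path_improvement(day_measurements):
    """day_measurements: iterable of dicts with keys
    'client_prefix' (a /24), 'target' ('anycast' or a unicast front-end id),
    and 'latency_ms' (hypothetical schema).

    Returns {client /24: best improvement in ms of any unicast front-end
    over anycast}; values <= 0 mean anycast was at least as good.
    """
    latencies = defaultdict(lambda: defaultdict(list))
    for m in day_measurements:
        latencies[m['client_prefix']][m['target']].append(m['latency_ms'])

    improvement = {}
    for prefix, by_target in latencies.items():
        if 'anycast' not in by_target:
            continue
        anycast_median = statistics.median(by_target['anycast'])
        unicast_medians = [statistics.median(vals)
                           for target, vals in by_target.items()
                           if target != 'anycast']
        if not unicast_medians:
            continue
        improvement[prefix] = anycast_median - min(unicast_medians)
    return improvement

# Each line of Figure 5.8 corresponds to thresholding these values, e.g. the
# fraction of /24s with at least 10 ms of available improvement on a given day:
# sum(1 for d in improvement.values() if d >= 10) / len(improvement)
```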
Front-end Affinity: Recurrent front-end selection changes for a user over time may indicate route stability issues which can lead to anycast performance problems. We refer to how "attached" particular clients are to a front-end as front-end affinity. In this section, we analyze our passive logs. Figure 5.10 shows the cumulative fraction of clients that have switched front-ends at least once by that time of the week. Within the first day, 7% of clients landed on multiple front-ends. An additional 2-4% of clients see a front-end change each day until the weekend, where there is very little churn, less than 0.5%. This could be from network operators not pushing out changes during the weekend unless they have to. From the weekend to the beginning of the week, the amount of churn increases again to 2-4% each day. Across the entire week, 21% of clients landed on multiple front-ends, but the vast majority of clients were stable. We discuss potential solutions to this more at the end of Section 5.3.8.

Figure 5.10: The cumulative fraction of clients that have changed front-ends at least once by different points in a week.

We observe that the number of client front-end switches is slightly higher in a one-day snapshot compared to the 1.1-4.7% reported in previous work on DNS instance-switches in anycast root nameservers [59,115]. A likely contributing factor is that our anycast deployment is around 10 times larger than the number of instances present in the K root name server at the time of that work.

Figure 5.11 shows the change in the client-to-front-end distance when the front-end changes. This shows that when the majority of clients switch front-ends, it is to a nearby front-end. This makes sense given the CDN front-end density in North America and Europe. The median change in distance from front-end switches is 483 km, while 83% are within 2000 km.

Figure 5.11: The distribution of change in client-to-front-end distance (log scale) when the front-end changes, for the 7% of clients that change front-end throughout a day.

We saw in this section that most clients show high front-end affinity, that is, they continue going to the same front-end over time. For the clients that do switch front-ends, there is a long tail in the distance between the pairs of front-ends they switch between.

5.3.8 Addressing Poor Performance

The previous section showed that anycast often achieves good performance, but sometimes suffers significantly compared to unicast beacon measurements. However, the ability for unicast to beat anycast in a single measurement does not guarantee that this performance is predictable enough to be achievable if a system has to return a single unicast front-end to a DNS query. If a particular front-end outperformed anycast in the past for a client, will it still do so if the system returns that front-end next time? Additionally, because of DNS's design, the system does not know which client it is responding to, and so its response applies either to all clients of an LDNS or all clients in a prefix (if using ECS). Can the system reliably determine front-ends that will perform well for the set of clients? We evaluate to what degree schemes using DNS and ECS can improve performance for clients with poor anycast performance.
We evaluate (in emulation based on our real user measurements) a prediction scheme that maps from a client group (clients of an LDNS or clients within an ECS prefix) to its predicted best front-end. It updates its mapping every prediction interval, set to one day in our experiment. (We cannot make predictions at finer timescales, as our sampling rate was limited due to engineering issues.) The scheme chooses to map a client group to the lowest latency front-end across the measurements for that group, picking either the anycast address or one of the unicast front-ends.

We evaluate two prediction metrics to determine the latency of a front-end: the 25th percentile and the median latency from that client group to that front-end. We choose lower percentiles, as analysis of client data showed that higher percentiles of latency distributions are very noisy (we omit detailed results due to lack of space). This noise makes prediction difficult, as it can result in overlapping performance between two front-ends. The 25th percentile and median have lower coefficients of variation, indicating less variation and more stability. Our initial evaluation showed that both the 25th percentile and the median show very similar performance as prediction metrics, so we only present results for the 25th percentile.

We emulate the performance of such a prediction scheme using our existing beacon measurements. We base the predictions on one day's beacon measurements. For a given client group, we select among the front-ends with 20+ measurements from the clients. We evaluate the performance of the prediction scheme by comparing against the performance observed in the next day's beacon measurements. We compare 50th and 75th percentile anycast performance for the group to 50th and 75th percentile performance for the predicted front-end. The Bing team routinely uses 75th percentile latency as an internal benchmark for a variety of comparisons. Next, we evaluate prediction using both ECS and LDNS client grouping.

Prediction using EDNS client-subnet-prefix: The ECS extension [60] enables precise client redirection by including the client's prefix in a DNS request. Our prediction scheme is straightforward: we consider all beacon measurements for a /24 client network and choose the front-end according to the prediction metrics. The "EDNS-0" lines in Figure 5.12 depict, as a distribution across clients weighted by query volume, the difference between performance to the predicted front-end (at the 50th and 75th percentile) and the performance to the anycast-routed front-end (at the same percentiles). Most clients see no difference in performance, in most cases because prediction selected the anycast address. While we predict improvement over anycast for nearly 40% of query-weighted prefixes, only 30% see a performance improvement over anycast, while 10% of weighted prefixes see worse performance than they would with anycast.

LDNS-based prediction: Traditionally, DNS-based redirection can only make decisions based on a client's LDNS. In this section, we estimate to what degree LDNS granularity can achieve optimal performance when anycast routing sends clients to suboptimal servers. We construct a latency mapping from LDNS to each measured front-end by assigning each front-end measurement made by a client to the client's LDNS, which we can identify by joining our DNS and HTTP logs based on the unique hostname for the measurement. We then consider all beacon measurements assigned to an LDNS and select the LDNS's best front-end using the prediction metrics.
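The sketch below emulates this prediction scheme for either client grouping (an ECS /24 prefix or an LDNS). The 20-measurement minimum and the 25th-percentile metric follow the description above; the input layout and the nearest-rank percentile computation are simplifying assumptions for illustration, not the exact emulation code.

```python
from collections import defaultdict

MIN_SAMPLES = 20  # only consider front-ends with 20+ measurements from the group

def percentile(values, p):
    """Nearest-rank percentile (p in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, int(round(p / 100 * (len(ordered) - 1)))))
    return ordered[k]

def predict_front_ends(day_measurements, metric_pct=25):
    """day_measurements: iterable of (group, front_end, latency_ms) tuples,
    where group is an LDNS or an ECS /24 and front_end may be 'anycast'.

    Returns {group: front-end predicted to perform best the next day}.
    """
    samples = defaultdict(lambda: defaultdict(list))
    for group, front_end, latency in day_measurements:
        samples[group][front_end].append(latency)

    prediction = {}
    for group, by_fe in samples.items():
        candidates = {fe: lats for fe, lats in by_fe.items()
                      if len(lats) >= MIN_SAMPLES}
        if candidates:
            # Pick the front-end (anycast or unicast) with the lowest
            # 25th-percentile latency for this client group.
            prediction[group] = min(
                candidates, key=lambda fe: percentile(candidates[fe], metric_pct))
    return prediction
```

The predictions built from one day's measurements are then scored against the following day's measurements at the 50th and 75th percentile, as described above.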
In the page loads in our experiment, public DNS resolvers made up a negligible fraction of total LDNS traffic, so their wide user base has an insignificant impact on results. The "LDNS" lines in Figure 5.12 show the fraction of /24 client networks that can be improved from using prediction of performance based on an LDNS-based mapping. While we see improvement for around 27% of weighted /24s, we also pay a penalty where our prediction did poorly for around 17% of /24s.

Figure 5.12: Improvement over anycast from making LDNS or ECS-based decisions with prediction using the 25th percentile prediction metric. Negative x-axis values show where anycast was better than our prediction. Values at 0 show when we predicted anycast was the best performing. Positive x-axis values show our improvement.

5.3.9 Conclusion

In this section we studied the performance of a large anycast-based CDN, and evaluated whether it could be improved by using a centralized, DNS-based solution. We found that anycast usually performs well despite the lack of precise control, but that it directs ≈20% of clients to a suboptimal front-end. We demonstrated that a simple prediction scheme may allow DNS redirection to improve performance for some of the clients that see poor anycast performance.

5.4 Conclusion

Section 5.2 first supports our thesis by examining a method for DNS latency map creation that uses client-side active measurements from Odin to probe diverse paths from users to Azure regions. This approach greatly improves over existing approaches by providing complete coverage of Microsoft's users, their LDNSes, and the association between the two. The result of our precise user-LDNS association was a 10% or more reduction in latency for Azure customers in high traffic volume countries. Second, Section 5.3 supports our thesis by showing how, with complete coverage of real Microsoft users issuing measurements to diverse paths selected by DNS or anycast, we can effectively evaluate the performance differences in these two redirection strategies that CDNs choose between.

Chapter 6 Literature Review

6.1 Mapping CDN Infrastructure

Many previous studies have explored mapping of CDN serving infrastructure. Huang et al. [94] map two popular content delivery networks, Akamai and Limelight, by enumerating their front-ends using a quarter of a million open LDNS resolvers. They geolocate and cluster front-ends using a geolocation database and also use the location of the penultimate hops of traceroutes to front-ends. This work is closely related to this thesis in its focus on contrasting the difference in deployment design decisions made by Akamai and Limelight, where Akamai deploys many front-ends deep into ISP networks, and Limelight deploys front-ends in select locations in major metro areas. While this work was influential in pointing out the importance of contrasting these design decisions, the execution was found to contain methodological flaws [2]. Ager et al. [32] evaluate web hosting structures as a whole. They asked volunteers from IMC 2010 to run software that probes content from the Alexa Top 2000 list to collect IP addresses serving the sites and all embedded content. Volunteer participation resulted in 144 trace files after data cleaning.
They use MaxMind [120] to geolocate IPs to the country level and use feature-based clustering to separate front-ends belonging to different hosting infrastructures. Their results reveal that companies such as Google and Akamai host at least some content in a vast number of popular websites. They also show that the largest amount of content continues to be served from North America, regardless of the user's continent.

Torres et al. [145] use a small number of vantage points in the US and Europe, and constraint-based geolocation, to approximately geolocate serving sites in the YouTube CDN, with the aim of understanding video server selection strategies in YouTube's (then recently) revamped architecture. The authors find that individual datacenters repeatedly serve the same users and that low RTT between users and YouTube datacenters is the primary determinant for server selection. In around 10% of cases, non-preferred datacenters served videos. A significant limitation of this work is that it only uses five vantage points to study YouTube. Adhikari et al. [28] use open DNS resolvers and PlanetLab to enumerate YouTube servers with the aim of reverse-engineering the caching hierarchy and logical organization of YouTube infrastructure using DNS namespaces. A portion of server hostnames contain city-level hints that provide geolocation. For servers without geohinted hostnames, the authors use a technique similar to GeoPing [127]. Xue, Choffnes, and Wang look at CDN deployments in China and find that Chinese CDNs rely on static mapping between users and front-ends, in contrast to CDNs outside of China which dynamically adjust traffic in response to network conditions [153]. This work relies on only three vantage points, which makes it difficult to know how the results hold for the general Chinese user population.

All of the aforementioned previous work suffers from two issues: (1) a limited number of vantage points yielding incomplete coverage of infrastructure and (2) short study durations which fail to uncover changes in infrastructure over time. Our work has complete coverage of infrastructure and end-users, and we explore an expansion of Google's network over a multi-year study.

Researchers have mapped the Netflix CDN, OpenConnect, by identifying a common URL scheme shared across Netflix content hostnames, made up of segments such as airport code, IP protocol, and network card [44]. The authors find that Netflix has a well-documented and consistent naming scheme for their OpenConnect infrastructure. By resolving the URLs generated by all combinations of segments, the authors were able to uncover the OpenConnect deployment of 500 locations worldwide. This work is similar to our Mapping Google (Chapter 3) work in that the enumeration of CDN infrastructure is complete, but differs from our work because it lacks longitudinal analysis to show changes in CDN infrastructure over time.

Wohlfart et al. describe Akamai's serving infrastructure in detail [152]. They describe four different types of deployment which vary by the type of hosting ISP (eyeball, transit) and the kind of facility (PoP, IXP). These different deployment settings combine to make up Akamai's diverse connectivity fabric. Some of Akamai's connectivity is visible to Internet researchers as explicit peering between their network and others (e.g., public BGP listeners). Akamai also has implicit peering that it inherits from their infrastructure deployed in eyeball and transit networks.
This work is similar to ours in that Akamai has total coverage of their deployment, but, unlike Akamai, we do not examine interconnection in Chapter 3. Another difference is that, whereas Akamai used their proprietary data and full knowledge of their network for their results, our Mapping Google (Chapter 3) work uncovered all of Google's serving infrastructure, with accurate geolocation, using new measurement techniques and publicly available data.

In addition to our work of mapping Google's serving infrastructure, simultaneous work appearing at the same conference also used EDNS-client-subnet to expose CDN infrastructure [141]. While our work focuses on characterizing Google's rapid expansion, including geolocating and clustering front-ends, that work addresses complementary issues including measuring other EDNS-client-subnet-enabled infrastructures. Our results differ from that work, as our work exposes 30% more /24 prefixes and 12% more ASes hosting front-ends that are actively serving Google search during the same period. We believe our additional coverage results from our more frequent mapping and accumulation of servers over time, since a single snapshot may miss some infrastructure (see Table 3.3).

Fan et al. [69] study the dynamics of client to CDN site mapping for Google and Akamai. For Google, they build upon our "Mapping Google" work (Chapter 3) by using EDNS-client-subnet measurements to measure changes between user prefixes and which site Google's DNS redirection sends them to. To measure Akamai, they use open DNS resolvers. While our work focuses on mapping CDN infrastructure with complete coverage, this work uses our measurements to tackle a different problem of understanding how CDNs change the end-user to site mapping. In comparison to the daily measurements we perform over 3.5 years, this work issues measurements every 15 minutes (over one month) because more frequent measurements are required to capture short-timescale redirection decisions.

The EDNS-client-subnet extension we use was designed to permit serving infrastructures to more accurately direct clients to serving sites in these cases. Otto et al. [125] examine the end-to-end impact that different DNS services have on CDN performance. It is the first work to study the potential of EDNS-client-subnet to address the client CDN mapping problem, using the extension as intended. Some in the operations community also independently recognized how to use EDNS-client-subnet to enumerate a CDN's servers, although these previous measurements presented just a small-scale enumeration without further investigation [119]. Fury Route leverages EDNS-client-subnet-enabled CDNs to estimate network distance between arbitrary hosts on the Internet [75]. Kintis et al. investigate the privacy implications of EDNS-client-subnet and find it facilitates mass surveillance and offers a new vector for targeted DNS poisoning attacks [105]. This work all shares a common thread of using EDNS-client-subnet for Internet measurement but does not repurpose EDNS to map infrastructure, as we do.

Several other pieces of work are tangentially related to ours. Su et al. investigate whether Akamai client redirections reveal network conditions on paths between Akamai and clients [142]. Choffnes and Bustamante describe an approach for reducing cross-ISP P2P traffic by finding clients directed to similar CDN front-ends [54].
Both previous works exploit the observation that two clients directed to the same or nearby front-ends are likely to be geographically close. Our work uses this observation to geolocate front-ends.

6.2 Measuring CDN Performance

There has been considerable prior work on improving and evaluating general CDN performance. WISE [144] is a what-if scenario tool that predicts the deployment implications of new infrastructure in CDNs by using machine learning. WISE relies on discovering causal relationships of variables impacting CDN performance from passive datasets, so it can only make predictions based on variables observed in the data. WISE can answer questions such as "What happens if users in India are directed to a front-end in Taiwan?" because the passive dataset captured some Indian users historically directed to Taiwan. Odin does not support users specifying what-if scenarios but allows measurement of arbitrary endpoints on the Internet to enable what-if scenario analysis, even for variables not part of production environments (Section 4.7.3.1).

WhyHigh [108] and LatLong [160] focus on automating troubleshooting at Google. WhyHigh uses passive server-side measurements to identify prefixes with persistently inflated RTTs. The system then issues traceroutes from front-ends and then pings intermediate routers to narrow down the issue. LatLong uses similar server-side measurements to diagnose performance changes which may be caused by changes in inter-domain routing or front-end selection. Troubleshooting is a single application of Odin's data. Like WhyHigh, Odin rich clients can issue traceroutes but do so from clients instead of front-ends. Client-side measurements provide Odin a considerable advantage for troubleshooting (Section 4.8.2) in that it can issue traceroutes to arbitrary destinations on the Internet, not just the paths between users and the CDN. LatLong lacks this advantage, as it limits itself to passive server-side measurements.

Entact [157], EdgeFabric [135], and Espresso [154] measure the quality of egress paths from a CDN to the Internet. Entact describes a technique to measure alternative paths to Internet prefixes by injecting specific routes (/32) at border routers to force egress traffic through a particular location. A collection of "pinger" machines deployed in datacenters utilize these paths to target IPs within a prefix. Entact found that most of its target IPs (74%) were unresponsive, motivating Odin's design to favor client-side application layer measurements. EdgeFabric directs a small percentage of user traffic through alternative egress paths to measure alternate path performance. While this approach may be suitable for some business models, Microsoft primarily serves enterprise customers, where performing network experiments with production enterprise traffic from paying customers is an unnecessary risk. Also, this approach limits measurement only to CDN infrastructure, so it is unable to support the diverse measurements needed for quick troubleshooting and experimentation.

Prior work has also explored cooperation between ISPs and CDNs. ISPs may release distance maps to CDNs to enable accurate client to server mappings [130], and ISPs can host CDN servers on demand [76]. While this line of work improves performance through a collaborative exchange of data between ISPs and CDNs, Odin's approach has advantages.
Odin removes the overhead of collaboration across various institutions by using Microsoft's large customer base to measure network conditions between users and Microsoft's CDN.

There has been much work on client-side measurement platforms. Academic platforms such as Fathom [64], Netalyzr [106], Ono [54], Via [101], and Dasu [133] are thick client applications that run measurements from end-user machines. BISmark [143] measures from home routers. Akamai collects client-side measurements using their Media Analytics Plugin [109] and peer-assisted NetSession [50,158] platform. This work is similar to Odin in its use of client-side measurements, but, unlike Odin, which can cover all of Microsoft's users, these platforms are limited in their adoption and coverage, especially in enterprise networks where only business relevant applications are permitted. Even Akamai is limited because, unlike Microsoft, Facebook, Google, and Netflix, it lacks popular end-user applications coupled with its CDN.

Geoff Huston explored measuring IPv6 and DNS with client-side measurements by embedding Flash/HTML5 web clients in Google Ads [80,81]. These web clients are equivalent to measuring latency using JavaScript beacons, which is a well-established technique [30,48] that we utilize in Odin web clients.

Of commercial measurement platforms, Cedexis [10] is the closest to Odin. Cedexis partners with popular websites with large user bases such as LinkedIn and Tumblr that embed Cedexis' measurement JavaScript beacon into their page. Cedexis customers register their endpoints to be measured by a portion of end-users of Cedexis' partners. In this way, a customer collects measurements to their endpoints from a relatively large user base. While Cedexis offers much better coverage than existing measurement platforms, it has several limitations in supporting CDN operations. The first is that Cedexis' user population is not representative of Microsoft's user population. The difference in population means that performance as measured by Cedexis does not reflect performance actually experienced by Microsoft users. The second issue is that the cost of Cedexis scales with the number of target endpoints to measure. As of January 2019, Odin measures several hundred destinations, which makes the cost infeasible. Last, Cedexis provides only HTTP(S) measurements. While these are the preferred measurement of Odin because any platform can issue them, Odin rich clients can issue measurements critical to network diagnostics such as DNS resolution and traceroute.

Conviva is a commercial platform which uses extensive client-side measurements from video players to optimize video delivery for content publishers [79,102]. Conviva does not control redirection in that they select the CDN, not the front-end, so the measurement is limited. Odin is more general in that it can measure arbitrary Internet endpoints to support arbitrary CDN operations, of which a single application is optimizing redirection.

Richter et al. describe a technique for passive outage detection that detects traffic drops below a threshold set by always-on Internet devices sending regular, predictable traffic volumes [132]. While this scheme is adequate for detecting severe outage events, most outages are partial and manifest as transient failures [95] only impacting a portion of clients or requests. For this reason, we believe Odin's fault-tolerant (Section 4.8.2) client-side measurements to be a more robust signal for outage detection.
6.3 Anycast

Most closely related to our anycast CDN work is research from Alzoubi et al. [34,35]. They describe a load-aware anycast CDN architecture where a centralized route controller manages ingress routes from a CDN to a large ISP. Unlike our work, they do not examine the end-to-end application performance comparison between DNS redirection and anycast. Follow-up work focuses on handling anycast TCP session disruption due to BGP path changes [33]. This work fails to quantify the impact of anycast disruption in practice, and the evaluation of the proposed solution is simulated. Our work instead includes anycast TCP session disruptions implicitly, although we cannot quantify them, as we measure anycast CDN performance from real users to production anycast CDN infrastructure. Wei and Heidemann also examined anycast TCP session disruptions using the stability of paths between RIPE Atlas nodes and the root DNS servers as a proxy for CDNs [150]. They find that around 1% of nodes are anycast unstable. This work focuses on a client-side solution for anycast disruptions. Our work differs by measuring the real impact of TCP anycast performance in contrast to unicast with DNS redirection. Our work is also closely related to FastRoute [74], a system for load balancing within an anycast CDN, but it does not address performance issues around redirection. Cicalese et al. describe a methodology for IP anycast detection and geolocation [57]. Closely related follow-up work presents an Internet-wide scan of IP addresses to quantify all the networks hosting anycast services [56]. This work focuses on measuring anycast adoption on the Internet but does not examine anycast CDN performance.

The majority of previous work on anycast performance has focused on root name server DNS. There has been significant attention to anycast DNS from the network operations community [40–43,58,59,89]. Sarat et al. examined the performance impact of anycast on DNS across different anycast configurations [134]. Fan et al. [68] present new methods to identify and characterize anycast nodes. In "Anycast vs. DDoS", the authors evaluate the effectiveness of anycast in mitigating a DDoS attack against many root DNS servers [123]. Schmidt, Heidemann, and Kuipers studied how many anycast sites are required to provide low latency by examining four anycast letters of the root DNS service [62]. Our work focuses on anycast content delivery over HTTP(S), which has different performance requirements than the root DNS name servers and thus merits a separate analysis. The related work on anycast DNS is not representative of anycast CDNs, because while CDNs are engineered for performance, the root DNS servers are engineered for reliability. Large scale CDN deployments involve aggressive (and often expensive) peering. In contrast, the majority of the root DNS servers are hosted by academic institutions, non-profits, and governments, all with little motivation or resources to improve performance. Improving performance would be unlikely to make a difference anyway: since most TLD records have TTLs of 48 hours, caching limits the impact of poor anycast performance.

Several pieces of tangentially related work describe different aspects of anycast such as the deployment of anycast services [38,39,77,103]. Using passive traces collected from a large European ISP, Giordano et al. characterize how widely anycast is utilized in content delivery [83]. Verfploeter describes a method to improve our ability to measure anycast catchment.
Verfploeter achieves greater coverage than systems such as RIPE Atlas by issuing large scale probes from many anycast sites and then passively listening for probe responses to find which anycast site a target responded to [63]. Our work instead focuses on anycast redirection performance for content delivery over TCP.

6.4 DNS Latency Map Creation

Even though DNS redirection has been a critical part of CDN operations for the past 20 years, there is surprisingly little work in this area. Central to Odin's DNS latency map creation is the ability to map clients to their LDNS. Mao et al. [118] were the first to quantify the proximity of clients to their local DNS resolvers and find that clients in different geographic locations may use the same resolver. Odin borrows their technique. Akamai uses its NetSession download manager software to obtain client-to-LDNS mappings [50].

Prior work has examined the trade-offs of using location-based approaches (Section 5.2.2) for DNS redirection [82,117,159]. Location-based approaches have the advantage of simplicity but are unable to respond to network conditions. Our work shows that Odin's maps outperform location-based approaches, most notably at higher percentiles. Huang et al. propose a method called DNS reflection which uses the difference in response times to a series of CNAME redirects to estimate the latency between an LDNS and a front-end [92]. This approach overcomes the limitation of low response rates from active probes to LDNSes. A measurement technique used by Akamai is to traceroute from CDN servers to LDNSes to discover routers along the path, then ping those routers as a proxy for CDN-to-LDNS or end-user latency [50]. The disadvantage of both approaches is that they are only suitable for LDNSes where all clients served are nearby. Akamai published a study on DNS-based redirection [50] showing that enabling ECS [60] greatly improved performance for users behind public DNS resolvers. Our maps also support ECS by applying a per-client prefix override (Section 5.2.3).

PECAN is a system which optimizes both client to front-end selection and Internet routes between clients and the CDN [148]. The authors find that a joint optimization improves performance for 35% of clients. This work assumes that CDNs can map clients to front-ends using either DNS, HTTP redirection, or HTML rewriting, but does not use any of these methods in its evaluation on PlanetLab. Our work focuses explicitly on how to generate maps using DNS. An evaluation of Akamai's serving infrastructure shows how their DNS mapping system utilizes different parts of their peering fabric in different scenarios [152]. While this work highlights that Akamai's DNS mapping system takes advantage of different peering fabrics, there are no details on how Akamai's mapping system makes these decisions. In contrast, our work describes the mechanics of latency map construction and the performance gains observed over other techniques.

Chapter 7 Conclusions and Future Work

7.1 Contributions

In this dissertation we examined measurement results and an Internet measurement system to show how CDN design decisions impact end-user performance. This work demonstrates how client-side measurements address (1) how to map the content serving infrastructure of a large CDN over time, (2) how an Internet measurement system can support CDN operations, and (3) the client performance trade-offs between redirection strategies.
Chapter 7: Conclusions and Future Work

7.1 Contributions

In this dissertation, we examined measurement results and an Internet measurement system to show how CDN design decisions impact end-user performance. This work demonstrates how client-side measurements address (1) how to map the content-serving infrastructure of a large CDN over time, (2) how an Internet measurement system can support CDN operations, and (3) the client performance trade-offs between redirection strategies.

In Chapter 3, we used the EDNS client-subnet-prefix extension to enumerate all of Google’s serving infrastructure by resolving DNS queries as though from all possible vantage points. We used the resulting client-to-server mapping to accurately geolocate Google’s infrastructure and to show a massive, rapid expansion into ISPs outside of Google’s network. This expansion demonstrated a shift in Google’s CDN design toward serving from networks closer to users in order to further reduce end-user latency to Google’s services.
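The primitive behind that enumeration is an ECS-style DNS query that asks a resolver to answer as though the query originated from a chosen client prefix. The following is a minimal illustrative sketch, not the Chapter 3 toolchain; it assumes dnspython, and the resolver and prefix are example values rather than our measurement configuration.

```python
# Minimal sketch: issue a DNS query with an EDNS client-subnet option so the
# answer reflects a chosen client prefix. Sweeping the prefix over announced
# address space is what enables enumeration from "all possible vantage points".
import dns.edns
import dns.message
import dns.query
import dns.rdatatype

RESOLVER = "8.8.8.8"            # a public resolver that honors ECS
CLIENT_PREFIX = "198.51.100.0"  # pretend the query originates from this /24

ecs = dns.edns.ECSOption(CLIENT_PREFIX, srclen=24)
query = dns.message.make_query("www.google.com", "A", use_edns=0, options=[ecs])
response = dns.query.udp(query, RESOLVER, timeout=3)

# The A records returned are the front-ends selected for clients in CLIENT_PREFIX.
for rrset in response.answer:
    if rrset.rdtype == dns.rdatatype.A:
        for rr in rrset:
            print(CLIENT_PREFIX, "->", rr.address)
```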
In Chapter 4, we described Odin, a client-side measurement system deployed at Microsoft to support CDN operations. Odin overcomes limitations of existing systems by offering high measurement volume, high client coverage, and fault tolerance when reporting measurements. Odin currently drives a number of critical operational CDN applications at Microsoft, such as traffic engineering, real-time availability alerting, and experimentation.

Finally, in Chapter 5 we examined two popular choices for CDN redirection of latency-sensitive services: anycast and DNS. First, in Section 5.2 we presented a method to construct DNS redirection latency maps that overcomes limitations of existing approaches, such as low measurement probe responsiveness and the assumption that clients are co-located with their LDNS. Then, in Section 5.3, we used Odin to evaluate the performance difference between anycast and DNS redirection. We find that anycast performs well most of the time but suffers from poor tail performance, and we demonstrate a hybrid anycast-DNS approach that can mitigate poor anycast performance.

7.2 Future Directions

While this dissertation has improved our understanding of how CDN design decisions impact client performance, it is only a step toward improving content delivery on the Internet. In this section, we describe several high-impact areas that can further improve the state of content delivery.

Automating CDN operations. We demonstrated that, with Odin, CDNs can greatly improve their operations using Internet measurement, but Odin remains a stepping stone. The next level of improvement in CDN operations requires sophisticated applications built on top of large, distributed systems such as Odin. Here are three open problems in the automation of CDN operations:

• Anycast load management. Despite progress with systems such as FastRoute [74], anycast CDN load management remains a challenge. FastRoute’s approach has two major shortcomings. The first is its assumption that DNS and HTTP requests land on the same front-end most of the time. This held for the modest front-end deployment at Microsoft when FastRoute was first evaluated and published [74], but the behavior varies greatly with a larger deployment and across geographic regions. FastRoute relies on this correlated DNS-HTTP behavior to shed traffic from overloaded front-ends to high-capacity front-ends hosted in datacenters. The second shortcoming is the reliance on high-capacity datacenters to serve traffic when a front-end cannot. For high-volume traffic (heavy in egress bytes and typically low priority), this can result in high utilization of valuable backbone links for unimportant traffic. For these reasons, additional work is needed on a better anycast load management solution.

• Automated CDN site selection and evaluation. Manual selection of new CDN sites is a long and tedious process that may not produce the expected performance improvements. Even if a particular country is known to be underserved (e.g., high traffic volume with high latency, or a high-growth market), expansion still involves identifying a facility, determining which ISPs are present and whether those ISPs are amenable to peering, and then pulling CDN WAN fiber to the facility. This process may cost millions of dollars and involves contract negotiations with the facility and all the peers. Even after all this, it may be unclear whether interconnection at this particular facility offers better performance for CDN applications than the alternatives. To solve this problem, we need a better, automated process for selecting and validating CDN sites.

• Routing anomalies with new Internet prefixes. It is common for IP address space to move between ISPs through the marketplace or through acquisitions. When an ISP announces a prefix for the first time, the routing for that prefix may not match the routes for the ISP’s “groomed” prefixes. At least one reason for this is that some ISPs hand-maintain whitelists of established prefixes per ASN. Upon seeing an unfamiliar block announced by the CDN, the peer ISP simply drops the announcement, and traffic must be delivered via transit. For CDNs, this can cause a performance regression. Detecting and mitigating these regressions by hand does not scale. CDNs require a system that can automatically detect performance regressions for new IP prefixes, find a root cause, and contact the responsible ISP to mitigate the problem.

Data quality for Internet measurement-driven systems. Large, Internet measurement-driven systems such as Espresso [154], EdgeFabric [135], and Odin [49] control critical parts of large content providers’ networks. Systems that rely on Internet measurement must be concerned with data quality issues such as network coverage, measurement volume, and invalid data caused by software bugs, telemetry or Internet outages, and bots. Despite how common these issues are, there has been little work to understand what data quality is required to ensure that such systems make good decisions. Early work showed that some properties of the Internet exhibit power-law, Zipfian [45, 70, 149, 156], or other non-normal statistical behavior [61, 66, 121, 151], making conventional statistical techniques inappropriate for seemingly simple scenarios such as picking the lowest-latency path with high confidence. Combined with a limited understanding of how different aggregations of Internet measurement data (temporal, geographic, and cross-network) change its statistical characteristics, this makes it difficult to be confident in our systems’ actions. To overcome this, we need to evaluate the application of statistical techniques to Internet data so that Internet measurement-driven systems make good decisions even in the presence of incorrect, incomplete, or untrustworthy data.
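As one concrete illustration of the kind of care required, the following minimal sketch (synthetic data, not from any production system) compares two paths’ heavy-tailed latency samples by bootstrapping the difference of medians and only declares a winner when the confidence interval excludes zero; a comparison of means would be far more sensitive to the tail. It assumes numpy is available.

```python
# Minimal sketch: pick the lower-latency path with confidence by bootstrapping
# the difference in medians of heavy-tailed RTT samples.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic heavy-tailed RTT samples (ms) for two candidate paths.
path_a = rng.pareto(2.0, 500) * 20 + 30
path_b = rng.pareto(2.0, 500) * 20 + 35

def median_diff_ci(a, b, n_boot=2000, alpha=0.05):
    """Bootstrap confidence interval for median(a) - median(b)."""
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choice(a, size=len(a), replace=True)
        resample_b = rng.choice(b, size=len(b), replace=True)
        diffs.append(np.median(resample_a) - np.median(resample_b))
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])

low, high = median_diff_ci(path_a, path_b)
if high < 0:
    print("path A is faster with high confidence", (low, high))
elif low > 0:
    print("path B is faster with high confidence", (low, high))
else:
    print("no confident winner", (low, high))
```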
We need a better understanding of enterprise Internet connectivity. Until now, the Internet measurement community has focused on studying interconnection between consumer (end-user), transit, and content provider networks [32, 53, 110, 137]. These links have been scrutinized because they carry the highest volumes of traffic on the Internet. However, little work has been done to understand how enterprise networks connect to the Internet. Even though their traffic volumes are relatively modest, the relative value of this traffic is enormous, as it drives some of the world’s largest multinational corporations. From our work with Odin (Chapter 4), we know that, from the Internet, enterprise networks look very different from consumer networks and are more difficult to measure. For example, enterprises often connect to the Internet through corporate proxy services, leading to unpredictable routing, varying performance for different geographically distributed branches, and difficulty in diagnosing outages. Further, different parts of the Internet may be reached through different proxies, such as one for the public Internet, another for business services such as corporate email, and yet another for cloud-hosted infrastructure on a private VLAN. Enterprise networks also enforce higher levels of security, making it challenging to understand their properties with traditional Internet measurement techniques such as traceroute and ping. In order to better serve enterprise networks, we must understand their architectural differences from consumer networks, why traditional measurement techniques are inadequate, and what can be done to improve insight into enterprise connectivity.

7.3 Summary

CDNs form a critical part of the Internet ecosystem, delivering content quickly and reliably, which makes users happy and improves business revenue. As the Internet continues to change, CDNs must continuously update and evaluate their designs to keep up with demand. In this dissertation, we presented analyses of the impact of two CDN design decisions, front-end placement and the choice of redirection strategy. We also presented a client-side active Internet measurement system that allows CDNs to measure the impact of their design decisions and to support general CDN operations. While this thesis demonstrates how CDNs can measure the impact of design decisions, many challenges remain for content delivery. We hope that this work serves as a building block for the next generation of improvements to content delivery and Internet performance.

Bibliography

[1] A Faster Internet. http://www.afasterinternet.com/participants.htm.
[2] Akamai & Limelight Say Testing Methods Not Accurate In Microsoft Research Paper. https://www.streamingmediablog.com/2008/10/akamai-responds.html.
[3] Akamai Compliance Management. https://www.akamai.com/uk/en/multimedia/documents/product-brief/akamai-for-compliance-management-feature-sheet.pdf.
[4] Amazon AWS Route53. https://aws.amazon.com/route53/.
[5] Amazon CloudFront. https://aws.amazon.com/cloudfront/.
[6] Amazon Web Services. https://aws.amazon.com/.
[7] Azure regions. https://azure.microsoft.com/en-us/regions/.
[8] Azure Regions. https://azure.microsoft.com/en-us/global-infrastructure/regions/.
[9] Catchpoint. https://www.catchpoint.com.
[10] Cedexis. https://www.cedexis.com/.
[11] Cloudflare. https://www.cloudflare.com/.
[12] Dropbox Traffic Infrastructure: Edge network. https://blogs.dropbox.com/tech/2018/10/dropbox-traffic-infrastructure-edge-network/.
[13] Dynatrace. https://www.dynatrace.com/capabilities/synthetic-monitoring/.
[14] Google Cloud CDN. https://cloud.google.com/cdn/.
[15] Google Cloud Load Balancer. https://cloud.google.com/load-balancing/.
[16] Google Cloud Platform. https://cloud.google.com/.
[17] Google Edge Network. https://peering.google.com/#/infrastructure.
[18] Google Public DNS.
[19] Netflix Open Connect. https://media.netflix.com/en/company-blog/how-netflix-works-with-isps-around-the-globe-to-deliver-a-great-viewing-experience.
[20] PhantomJS - Scriptable Headless Browser. https://phantomjs.org.
[21] RFC 7231 - Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. https://tools.ietf.org/html/rfc7231.
[22] RIPE Atlas Network Coverage. https://atlas.ripe.net/results/maps/network-coverage/.
[23] Snitch: Page Load Analyzer for PhantomJS. https://github.com/inspace/snitch.
[24] Thousandeyes. https://www.thousandeyes.com/.
[25] USC CDN Coverage. http://usc-nsl.github.io/cdn-coverage.
[26] Sandvine: Global Internet Phenomena 2016. https://www.sandvine.com/hubfs/downloads/archive/2016-global-internet-phenomena-report-latin-america-and-north-america.pdf, 2016.
[27] Vijay Kumar Adhikari, Yang Guo, Fang Hao, Volker Hilt, and Zhi-Li Zhang. Tale of Three CDNs: An Active Measurement Study of Hulu and its CDNs. In Global Internet Symposium, 2012.
[28] Vijay Kumar Adhikari, Sourabh Jain, Yingying Chen, and Zhi-Li Zhang. Vivisecting YouTube: An Active Measurement Study. In INFOCOM, 2012.
[29] Vijay Kumar Adhikari, Sourabh Jain, and Zhi-Li Zhang. YouTube Traffic Dynamics and its Interplay with a Tier-1 ISP: an ISP Perspective. In IMC, 2010.
[30] Adnan Ahmed, Zubair Shafiq, Harkeerat Bedi, and Amir Khakpour. Peering vs. Transit: Performance Comparison of Peering and Transit Interconnections. In ICNP, 2017.
[31] Bernhard Ager, Wolfgang Mühlbauer, Georgios Smaragdakis, and Steve Uhlig. Comparing DNS Resolvers in the Wild. In IMC, 2010.
[32] Bernhard Ager, Wolfgang Mühlbauer, Georgios Smaragdakis, and Steve Uhlig. Web Content Cartography. In IMC, 2011.
[33] Zakaria Al-Qudah, Seungjoon Lee, Michael Rabinovich, Oliver Spatscheck, and Jacobus Van der Merwe. Anycast-aware Transport for Content Delivery Networks. In WWW, 2009.
[34] Hussein A Alzoubi, Seungjoon Lee, Michael Rabinovich, Oliver Spatscheck, and Jacobus Van der Merwe. Anycast CDNs Revisited. In WWW, 2008.
[35] Hussein A Alzoubi, Seungjoon Lee, Michael Rabinovich, Oliver Spatscheck, and Jacobus Van Der Merwe. A Practical Architecture for an Anycast CDN. Transactions on the Web, 2011.
[36] Matthew Andrews, Bruce Shepherd, Aravind Srinivasan, Peter Winkler, and Francis Zane. Clustering and Server Selection Using Passive Monitoring. In INFOCOM, 2002.
[37] Mihael Ankerst, Markus M Breunig, Hans-Peter Kriegel, and Jörg Sander. OPTICS: Ordering Points to Identify the Clustering Structure. In ACM SIGMOD Record, volume 28, 1999.
[38] Hitesh Ballani and Paul Francis. Towards a Global IP Anycast Service. In SIGCOMM, 2005.
[39] Hitesh Ballani, Paul Francis, and Sylvia Ratnasamy. A Measurement-based Deployment Proposal for IP Anycast. In IMC, 2006.
[40] Piet Barber, Matt Larson, and Mark Kosters. Traffic Source Analysis of the J Root Anycast Instances. NANOG 39, February 2007.
[41] Piet Barber, Matt Larson, Mark Kosters, and Pete Toscano. Life and Times of J-ROOT. NANOG 32, October 2004.
[42] Peter Boothe and Randy Bush. Anycast Measurements Used To Highlight Routing Instabilities. NANOG 35, October 2005.
[43] Peter Boothe and Randy Bush. DNS Anycast Stability. APNIC, 2005.
[44] Timm Böttger, Felix Cuadrado, Gareth Tyson, Ignacio Castro, and Steve Uhlig. Open Connect Everywhere: A Glimpse at the Internet Ecosystem Through the Lens of the Netflix CDN. CCR, 48(1), 2018.
[45] Nevil Brownlee and Kimberly C Claffy. Understanding Internet Traffic Streams: Dragonflies and Tortoises. IEEE Communications Magazine, 40(10), 2002.
[46] Carlos Bueno. Doppler: Internet radar. https://www.facebook.com/note.php?note_id=10150212498738920.
[47] Matt Calder, Xun Fan, Zi Hu, Ethan Katz-Bassett, John Heidemann, and Ramesh Govindan. Mapping the Expansion of Google’s Serving Infrastructure. In IMC, 2013.
[48] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. Analyzing the Performance of an Anycast CDN. In IMC, 2015.
[49] Matt Calder, Manuel Schröder, Ryan Stewart, Ryan Gao, Jitendra Padhye, Ratul Mahajan, Ganesh Ananthanarayanan, and Ethan Katz-Bassett. Odin: Microsoft’s Scalable Fault-Tolerant CDN Measurement System. In NSDI, 2018.
[50] Fangfei Chen, Ramesh K Sitaraman, and Marcelo Torres. End-user Mapping: Next Generation Request Routing for Content Delivery. In SIGCOMM, 2015.
[51] Yingying Chen, Sourabh Jain, Vijay Kumar Adhikari, and Zhi-Li Zhang. Characterizing Roles of Front-end Servers in End-to-End Performance of Dynamic Content Distribution. In IMC, 2011.
[52] Yingying Chen, Ratul Mahajan, Baskar Sridharan, and Zhi-Li Zhang. A Provider-side View of Web Search Response Time. In SIGCOMM, 2013.
[53] Y. Chiu, B. Schlinker, Abhishek Balaji Radhakrishnan, E. Katz-Bassett, and R. Govindan. Are We One Hop Away from a Better Internet? In IMC, 2015.
[54] David Choffnes and Fabián E. Bustamante. Taming the Torrent: A Practical Approach to Reducing Cross-ISP Traffic in Peer-to-Peer Systems. In SIGCOMM, 2008.
[55] Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. PlanetLab: An Overlay Testbed for Broad-coverage Services. CCR, 33(3), 2003.
[56] Danilo Cicalese, Jordan Augé, Diana Joumblatt, Timur Friedman, and Dario Rossi. Characterizing IPv4 Anycast Adoption and Deployment. In CoNEXT, 2015.
[57] Danilo Cicalese, Diana Joumblatt, Dario Rossi, Marc-Olivier Buob, Jordan Augé, and Timur Friedman. A Fistful of Pings: Accurate and Lightweight Anycast Enumeration and Geolocation. In INFOCOM, 2015.
[58] Lorenzo Colitti. Effects of Anycast on K-root Performance. NANOG 37, June 2006.
[59] Lorenzo Colitti, Erik Romijn, Henk Uijterwaal, and Andrei Robachevsky. Evaluating the Effects of Anycast on DNS Root Name Servers. RIPE document RIPE-393, 2006.
[60] C. Contavalli, W. van der Gaast, S. Leach, and E. Lewis. RFC7871: Client Subnet in DNS Queries. https://tools.ietf.org/html/rfc7871.
[61] Mark E Crovella and Azer Bestavros. Self-similarity in World Wide Web Traffic: Evidence and Possible Causes. Transactions on Networking, 5(6), 1997.
[62] Ricardo de Oliveira Schmidt, John Heidemann, and Jan Harm Kuipers. Anycast Latency: How Many Sites are Enough? In PAM, 2017.
[63] Wouter B De Vries, Ricardo de O Schmidt, Wes Hardaker, John Heidemann, Pieter-Tjerk de Boer, and Aiko Pras. Broad and Load-aware Anycast Mapping with Verfploeter. In IMC, 2017.
[64] Mohan Dhawan, Justin Samuel, Renata Teixeira, Christian Kreibich, Mark Allman, Nicholas Weaver, and Vern Paxson. Fathom: A Browser-based Network Measurement Platform. In IMC, 2012.
[65] Xenofontas Dimitropoulos, Dmitri Krioukov, Marina Fomenkov, Bradley Huffaker, Young Hyun, k. c. claffy, and George Riley. AS Relationships: Inference and Validation. CCR, 37(1), 2007.
[66] Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. On Power-law Relationships of the Internet Topology. In CCR, volume 29, 1999.
[67] Xun Fan. Enabling Efficient Service Enumeration Through Smart Selection of Measurements. PhD thesis, University of Southern California, 2015.
[68] Xun Fan, John Heidemann, and Ramesh Govindan. Evaluating Anycast in the Domain Name System. In INFOCOM, 2013.
[69] Xun Fan, Ethan Katz-Bassett, and John Heidemann. Assessing Affinity Between Users and CDN Sites. In TMA, 2015.
[70] Wenjia Fang and Larry Peterson. Inter-AS Traffic Patterns and their Implications. In GLOBECOM, 1999.
[71] Tobias Flach, Nandita Dukkipati, Andreas Terzis, Barath Raghavan, Neal Cardwell, Yuchung Cheng, Ankur Jain, Shuai Hao, Ethan Katz-Bassett, and Ramesh Govindan. Reducing Web Latency: The Virtue of Gentle Aggression. In SIGCOMM, 2013.
[72] Tobias Flach, Pavlos Papageorge, Andreas Terzis, Luis Pedrosa, Yuchung Cheng, Tayeb Karim, Ethan Katz-Bassett, and Ramesh Govindan. An Internet-wide Analysis of Traffic Policing. In SIGCOMM, 2016.
[73] Ashley Flavel, Pradeepkumar Mani, and David Maltz. Re-evaluating the Responsiveness of DNS-based Network Control. In LANMAN, 2014.
[74] Ashley Flavel, Pradeepkumar Mani, David Maltz, Nick Holt, Jie Liu, Yingying Chen, and Oleg Surmachev. FastRoute: A Scalable Load-Aware Anycast Routing Architecture for Modern CDNs. In NSDI, 2015.
[75] Marcel Flores, Alexander Wenzel, Kevin Chen, and Aleksandar Kuzmanovic. Fury Route: Leveraging CDNs to Remotely Measure Network Distance. In PAM, 2018.
[76] Benjamin Frank, Ingmar Poese, Yin Lin, Georgios Smaragdakis, Anja Feldmann, Bruce Maggs, Jannis Rake, Steve Uhlig, and Rick Weber. Pushing CDN-ISP Collaboration to the Limit. CCR, 43(3), 2013.
[77] Michael J Freedman, Karthik Lakshminarayanan, and David Mazières. OASIS: Anycast for Any Service. In NSDI, 2006.
[78] Michael J Freedman, Mythili Vutukuru, Nick Feamster, and Hari Balakrishnan. Geographic Locality of IP Prefixes. In IMC, 2005.
[79] Aditya Ganjam, Faisal Siddiqui, Jibin Zhan, Xi Liu, Ion Stoica, Junchen Jiang, Vyas Sekar, and Hui Zhang. C3: Internet-Scale Control Plane for Video Quality Optimization. In NSDI, 2015.
[80] Geoff Huston. Measuring the DNS from the Users’ Perspective. APNIC Labs, May 2014.
[81] Geoff Huston. Measuring IPv6 using Ad-based Measurement. APNIC 44, September 2017.
[82] Manaf Gharaibeh, Anant Shah, Bradley Huffaker, Han Zhang, Roya Ensafi, and Christos Papadopoulos. A Look at Router Geolocation in Public and Commercial Databases. In IMC, 2017.
[83] Danilo Giordano, Danilo Cicalese, Alessandro Finamore, Marco Mellia, Maurizio Munafò, Diana Zeaiter Joumblatt, and Dario Rossi. A First Characterization of Anycast Traffic from Passive Traces. In TMA, 2016.
[84] Ramesh Govindan, Ina Minei, Mahesh Kallahalla, Bikash Koley, and Amin Vahdat. Evolve or Die: High-Availability Design Principles Drawn from Google’s Network Infrastructure. In SIGCOMM, 2016.
[85] Bamba Gueye, Artur Ziviani, Mark Crovella, and Serge Fdida. Constraint-based Geolocation of Internet Hosts. Transactions on Networking, 14(6), 2006.
[86] Mehmet H Gunes and Kamil Sarac. Analyzing Router Responsiveness to Active Measurement Probes. In PAM, 2009.
[87] Gonca Gürsun. Routing-aware Partitioning of the Internet Address Space for Server Ranking in CDNs. Computer Communications, 106, 2017.
[88] James Hamilton. AWS RE:Invent 2016: Amazon Global Network Overview. https://www.youtube.com/watch?v=uj7Ting6Ckk.
[89] James Hiebert, Peter Boothe, Randy Bush, and Lucy Lynch. Determining the Cause and Frequency of Routing Instability with Anycast. In AINTEC, 2006.
[90] Zi Hu and John Heidemann. Towards Geolocation of Millions of IP Addresses. In IMC, 2012.
[91] Cheng Huang, Ivan Batanov, and Jin Li. A Practical Solution to the Client-LDNS Mismatch Problem. CCR, 42(2), 2012.
[92] Cheng Huang, Nic Holt, Angela Wang, Albert G Greenberg, Jin Li, and Keith W Ross. A DNS Reflection Method for Global Traffic Management. In ATC, 2010.
[93] Cheng Huang, David A Maltz, Jin Li, and Albert Greenberg. Public DNS System and Global Traffic Management. In INFOCOM, 2011.
[94] Cheng Huang, Angela Wang, Jin Li, and Keith W Ross. Measuring and Evaluating Large-scale CDNs. In IMC, 2008.
[95] Peng Huang, Chuanxiong Guo, Lidong Zhou, Jacob R Lorch, Yingnong Dang, Murali Chintalapati, and Randolph Yao. Gray Failure: The Achilles’ Heel of Cloud-Scale Systems. In HotOS, 2017.
[96] Bradley Huffaker, Marina Fomenkov, et al. Geocompare: A Comparison of Public and Commercial Geolocation Databases. Technical report, Cooperative Association for Internet Data Analysis (CAIDA), 2011.
[97] Bradley Huffaker, Marina Fomenkov, et al. DRoP: DNS-based Router Positioning. CCR, 44(3), 2014.
[98] iPlane. http://iplane.cs.washington.edu.
[99] Arvind Jain, Jatinder Mann, Zhiheng Wang, and Anderson Quach. W3C Resource Timing Working Draft. https://www.w3.org/TR/resource-timing-1/, July 2017.
[100] Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong, Leon Poutievski, Arjun Singh, Subbaiah Venkata, Jim Wanderer, Junlan Zhou, Min Zhu, et al. B4: Experience with a Globally-deployed Software Defined WAN. In SIGCOMM, 2013.
[101] Junchen Jiang, Rajdeep Das, Ganesh Ananthanarayanan, Philip A Chou, Venkata Padmanabhan, Vyas Sekar, Esbjorn Dominique, Marcin Goliszewski, Dalibor Kukoleca, Renat Vafin, et al. Via: Improving Internet Telephony Call Quality using Predictive Relay Selection. In SIGCOMM, 2016.
[102] Junchen Jiang, Vyas Sekar, Henry Milner, Davis Shepherd, Ion Stoica, and Hui Zhang. CFA: A Practical Prediction System for Video QoE Optimization. In NSDI, 2016.
[103] Dina Katabi and John Wroclawski. A Framework For Scalable Global IP-anycast (GIA). CCR, 2000.
[104] Ethan Katz-Bassett, John P. John, Arvind Krishnamurthy, David Wetherall, Thomas Anderson, and Yatin Chawathe. Towards IP Geolocation Using Delay and Topology Measurements. In IMC, 2006.
[105] Panagiotis Kintis, Yacin Nadji, David Dagon, Michael Farrell, and Manos Antonakakis. Understanding the Privacy Implications of ECS. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, 2016.
[106] Christian Kreibich, Nicholas Weaver, Boris Nechaev, and Vern Paxson. Netalyzr: Illuminating the Edge Network. In IMC, 2010.
[107] Balachander Krishnamurthy, Craig Wills, and Yin Zhang. On the Use and Performance of Content Distribution Networks. In IMC, 2001.
[108] Rupa Krishnan, Harsha V. Madhyastha, Sridhar Srinivasan, Sushant Jain, Arvind Krishnamurthy, Thomas Anderson, and Jie Gao. Moving Beyond End-to-End Path Information to Optimize CDN Performance. In IMC, 2009.
[109] S Shunmuga Krishnan and Ramesh K Sitaraman. Video Stream Quality Impacts Viewer Behavior: Inferring Causality using Quasi-experimental Designs. Transactions on Networking, 21(6), 2013.
[110] Craig Labovitz, Scott Iekel-Johnson, Danny McPherson, Jon Oberheide, and Farnam Jahanian. Internet Inter-domain Traffic. In CCR, volume 40, 2010.
[111] Matt Levine, Barrett Lyon, and Todd Underwood. Operational Experience with TCP and Anycast. In NANOG 37, 2006.
[112] Weichao Li, Ricky KP Mok, Rocky KC Chang, and Waiting WT Fok. Appraising the Delay Accuracy in Browser-based Network Measurement. In IMC, 2013.
[113] Greg Linden. Make Data Useful. http://sites.google.com/site/glinden/Home/StanfordDataMining.2006-11-28.ppt, 2006.
[114] Hongqiang Harry Liu, Raajay Viswanathan, Matt Calder, Aditya Akella, Ratul Mahajan, Jitendra Padhye, and Ming Zhang. Efficiently Delivering Online Services over Integrated Infrastructure. In NSDI, 2016.
[115] Ziqian Liu, Bradley Huffaker, Marina Fomenkov, Nevil Brownlee, et al. Two Days in the Life of the DNS Anycast Root Servers. In PAM, 2007.
[116] M. Luckie, B. Huffaker, A. Dhamdhere, V. Giotsas, and k claffy. AS Relationships, Customer Cones, and Validation. In IMC, 2013.
[117] Cristian Lumezanu, Randy Baden, Neil Spring, and Bobby Bhattacharjee. Triangle Inequality and Routing Policy Violations in the Internet. In PAM, 2009.
[118] Zhuoqing Morley Mao, Charles D Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang. A Precise and Efficient Evaluation of the Proximity Between Web Clients and Their Local DNS Servers. In ATC, 2002.
[119] Mapping CDN domains. http://b4ldr.wordpress.com/2012/02/13/mapping-cdn-domains/.
[120] MaxMind. http://www.maxmind.com/app/ip-location/.
[121] Alberto Medina, Ibrahim Matta, and John Byers. On the Origin of Power Laws in Internet Topologies. CCR, 30(2), 2000.
[122] David Meyer. RouteViews. http://www.routeviews.org.
[123] Giovane Moura, Ricardo de O Schmidt, John Heidemann, Wouter B de Vries, Moritz Muller, Lan Wei, and Cristian Hesselman. Anycast vs. DDoS: Evaluating the November 2015 Root DNS Event. In IMC, 2016.
[124] Erik Nygren, Ramesh K Sitaraman, and Jennifer Sun. The Akamai Network: A Platform for High-performance Internet Applications. In SIGOPS, 2010.
[125] John S. Otto, Mario A. Sánchez, John P. Rula, and Fabián E Bustamante. Content Delivery and the Natural Evolution of DNS. In IMC, 2012.
[126] John S Otto, Mario A Sánchez, John P Rula, Ted Stein, and Fabián E Bustamante. namehelp: Intelligent Client-side DNS Resolution. In SIGCOMM, 2012.
[127] Venkata N Padmanabhan and Lakshminarayanan Subramanian. An Investigation of Geographic Mapping Techniques for Internet Hosts. In CCR, volume 31, 2001.
[128] Jeffrey Pang, Aditya Akella, Anees Shaikh, Balachander Krishnamurthy, and Srinivasan Seshan. On the Responsiveness of DNS-based Network Control. In IMC, 2004.
[129] Abhinav Pathak, Y Angela Wang, Cheng Huang, Albert Greenberg, Y Charlie Hu, Randy Kern, Jin Li, and Keith W Ross. Measuring and Evaluating TCP Splitting for Cloud Services. In PAM, 2010.
[130] Ingmar Poese, Benjamin Frank, Bernhard Ager, Georgios Smaragdakis, Steve Uhlig, and Anja Feldmann. Improving Content Delivery with PaDIS. Internet Computing, 16(3), 2012.
[131] Ingmar Poese, Steve Uhlig, Mohamed Ali Kaafar, Benoit Donnet, and Bamba Gueye. IP Geolocation Databases: Unreliable? CCR, 41(2), 2011.
[132] Philipp Richter, Ramakrishna Padmanabhan, Neil Spring, Arthur Berger, and David Clark. Advancing the Art of Internet Edge Outage Detection. In IMC, 2018.
[133] Mario A Sánchez, John S Otto, Zachary S Bischof, David R Choffnes, Fabián E Bustamante, Balachander Krishnamurthy, and Walter Willinger. Dasu: Pushing Experiments to the Internet’s Edge. In NSDI, 2013.
[134] Sandeep Sarat, Vasileios Pappas, and Andreas Terzis. On the Use of Anycast in DNS. In ICCCN, 2006.
[135] Brandon Schlinker, Hyojeong Kim, Timothy Cui, Ethan Katz-Bassett, Harsha V Madhyastha, Italo Cunha, James Quinn, Saif Hasan, Petr Lapukhov, and Hongyi Zeng. Engineering Egress with Edge Fabric. In SIGCOMM, 2017.
[136] Steve Souders. High-performance Web Sites. Communications of the ACM, 51(12):36–41, December 2008.
[137] Neil Spring, Ratul Mahajan, and Thomas Anderson. The Causes of Path Inflation. In SIGCOMM, 2003.
[138] Neil Spring, Ratul Mahajan, and David Wetherall. Measuring ISP Topologies with Rocketfuel. In SIGCOMM, 2002.
[139] Richard A Steenbergen. A Practical Guide to (Correctly) Troubleshooting with Traceroute. NANOG 37, pages 1–49, 2009.
[140] Stoyan Stefanov. YSlow 2.0. In CSDN SD2C, 2008.
[141] Florian Streibelt, Jan Böttger, Nikolaos Chatzis, Georgios Smaragdakis, and Anja Feldmann. Exploring EDNS-client-subnet Adopters in Your Free Time. In IMC, 2013.
[142] Ao-Jan Su, David R. Choffnes, Aleksandar Kuzmanovic, and Fabián E. Bustamante. Drafting Behind Akamai (Travelocity-based Detouring). In SIGCOMM, 2006.
[143] Srikanth Sundaresan, Sam Burnett, Nick Feamster, and Walter De Donato. BISmark: A Testbed for Deploying Measurements and Applications in Broadband Access Networks. In ATC, 2014.
[144] Mukarram Tariq, Amgad Zeitoun, Vytautas Valancius, Nick Feamster, and Mostafa Ammar. Answering What-If Deployment and Configuration Questions with WISE. In SIGCOMM, 2008.
[145] Ruben Torres, Alessandro Finamore, Jin Ryong Kim, Marco Mellia, Maurizio M Munafò, and Sanjay Rao. Dissecting Video Server Selection Strategies in the YouTube CDN. In ICDCS, 2011.
[146] Sipat Triukose, Zhihua Wen, and Michael Rabinovich. Measuring a Commercial Content Delivery Network. In WWW, 2011.
[147] UCLA Internet Topology Collection. http://irl.cs.ucla.edu/topology/.
[148] Vytautas Valancius, Bharath Ravi, Nick Feamster, and Alex C Snoeren. Quantifying the Benefits of Joint Content and Network Routing. In SIGMETRICS, 2013.
[149] Jörg Wallerich, Holger Dreger, Anja Feldmann, Balachander Krishnamurthy, and Walter Willinger. A Methodology for Studying Persistency Aspects of Internet Flows. CCR, 35(2), 2005.
[150] Lan Wei and John Heidemann. Does Anycast Hang up on You? IEEE Transactions on Network and Service Management, 2018.
[151] Walter Willinger, David Alderson, and John C Doyle. Mathematics and the Internet: A Source of Enormous Confusion and Great Potential. Notices of the American Mathematical Society, 56(5), 2009.
[152] Florian Wohlfart, Nikolaos Chatzis, Caglar Dabanoglu, Georg Carle, and Walter Willinger. Leveraging Interconnections for Performance: The Serving Infrastructure of a Large CDN. In SIGCOMM, 2018.
[153] Jing’an Xue, David Choffnes, and Jilong Wang. CDNs Meet CN: An Empirical Study of CDN Deployments in China. IEEE Access, 5, 2017.
[154] Kok-Kiong Yap, Murtaza Motiwala, Jeremy Rahe, Steve Padgett, Matthew Holliman, Gary Baldus, Marcus Hines, Taeeun Kim, Ashok Narayanan, Ankur Jain, et al. Taking the Edge off with Espresso: Scale, Reliability and Programmability for Global Internet Peering. In SIGCOMM, 2017.
[155] Ming Zhang, Yaoping Ruan, Vivek S Pai, and Jennifer Rexford. How DNS Misnaming Distorts Internet Topology Mapping. In ATC, 2006.
[156] Yin Zhang, Lee Breslau, Vern Paxson, and Scott Shenker. On the Characteristics and Origins of Internet Flow Rates. In CCR, volume 32, 2002.
[157] Zheng Zhang, Ming Zhang, Albert G Greenberg, Y Charlie Hu, Ratul Mahajan, and Blaine Christian. Optimizing Cost and Performance in Online Service Provider Networks. In NSDI, 2010.
[158] Mingchen Zhao, Paarijaat Aditya, Ang Chen, Yin Lin, Andreas Haeberlen, Peter Druschel, Bruce Maggs, Bill Wishon, and Miroslav Ponec. Peer-assisted Content Distribution in Akamai NetSession. In IMC, 2013.
[159] Han Zheng, Eng Keong Lua, Marcelo Pias, and Timothy G Griffin. Internet Routing Policies and Round-trip-times. In PAM, 2005.
[160] Yaping Zhu, Benjamin Helsley, Jennifer Rexford, Aspi Siganporia, and Sridhar Srinivasan. LatLong: Diagnosing Wide-area Latency Changes for CDNs. Transactions on Network and Service Management, 2012.
Abstract
High-performance Internet services are critical to the success of online businesses because poor user experience directly impacts revenue. To provide low-latency and highly available service, companies often use content delivery networks (CDNs) to deliver content quickly and reliably to customers through a globally distributed network of servers. To meet the performance demands of customers, CDNs continuously make crucial network design decisions that impact end-user performance. However, little is known about how CDN design decisions impact end-user performance or how to evaluate them effectively.

In this thesis, we aim to help content providers and researchers understand the performance impact of CDN design changes on end-users. To achieve this, we examine a collection of measurement results and an Internet measurement system deployed in production at a large content provider. With our measurement results, we look at the impact of two important CDN design decisions: expansion of the CDN front-end deployment and the choice among popular redirection strategies. Our design and evaluation of a measurement system at Microsoft demonstrates how CDNs can use end-user Internet measurements to support operations.

First, we look at a massive expansion of Google's serving infrastructure into end-user networks to reduce latency to its services. We first explore measurement techniques using the client-subnet-prefix DNS extension to completely enumerate and geolocate Google's serving infrastructure. Our longitudinal measurements then capture a large expansion of Google sites from primarily Google's own network into end-user networks, greatly reducing the distance between end-users and Google services. We then examine the performance impact of Google's expansion by studying user performance to the servers users are directed to before and after the expansion.

Second, we examine Odin, a production measurement system deployed at Microsoft to support CDN and network operations. Odin was designed to overcome the measurement limitations of existing approaches and to take advantage of Microsoft's control of first-party end-user applications. We demonstrate that Odin delivers measurements even in the presence of Internet outages and supports a number of critical CDN operational scenarios such as traffic management, outage identification, and network experimentation.

Third, we look at anycast and DNS redirection, the two common strategies used for serving latency-sensitive content. We first examine a technique for constructing DNS latency maps that improves performance over existing approaches. We then use that approach to compare the performance of DNS and anycast redirection, finding that anycast usually directs users to a low-latency server but that DNS performance is better in the tail.