PROTECTING ONLINE SERVICES FROM SOPHISTICATED DDOS ATTACKS

by

Rajat Tandon

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

August 2022

Copyright 2022 Rajat Tandon

Acknowledgments

I would like to express my sincere gratitude to my PhD advisor, Prof. Jelena Mirkovic, for her valuable guidance, support, and encouragement throughout my PhD journey. She has been the best advisor a student could ever have. Her expertise in computer science, computer and network security, and research is unmatched. I have been very privileged and fortunate to pursue my PhD under her. She has taught me how to reason about and solve complex scientific problems with critical thinking.

I am also very grateful to Dr. Genevieve Bartlett for her immense support and valuable advice throughout my PhD program. She has been a great mentor and an excellent researcher, and I have been very fortunate to have her as my mentor.

I want to express my gratitude to the other supervisors of the STEEL Lab, Dr. Christophe Hauser and Dr. Luis Garcia, for all their help, suggestions and valuable feedback during my PhD journey. I would like to thank Dr. Barath Raghavan, Dr. Phebe Vayanos, Dr. Ning Wang, Dr. Emilio Ferrara, Dr. Ramesh Govindan and Dr. Chris Kyriakakis for taking time out of their busy schedules to serve on my qualifying exam committee and dissertation committee. Their suggestions and advice have been a great inspiration for me.

I am thankful to my fellow collaborators, friends and colleagues: Dr. Michalis Kallitsis, Dr. Dhiraj Murthy, Dr. Stillian Stoev, Hao Shi, Simon Woo, Sivaram Ramanathan, Nicolaas Weideman, Abhinav Palia, Jaydeep Ramani, Pithayuth Charnsethikul, Sima Arasteh, Wei-Cheng Wu, Ishank Arora and many others. Also, I would like to express my gratitude to Alba Regalado, Joseph Kemp, Jeeanine Yamazaki, Matt Binkley and Lizsl De Leon for their assistance and administrative support at ISI and USC, which has greatly improved our research life.

Table of Contents

Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Demonstrating the Thesis Statement
  1.2 Structure of the Dissertation
Chapter 2: AMON-SENSS: Scalable and Accurate Detection of Volumetric DDoS Attacks at ISPs
  2.1 Introduction
  2.2 Related Work
  2.3 DDoS Attack Detection
  2.4 Evaluation
    2.4.0.1 Ground Truth Labeling
    2.4.0.2 Evaluation Approach and Calibration
    2.4.0.3 Results
    2.4.0.4 Sensitivity Analysis
    2.4.0.5 Operational Cost
  2.5 Conclusions
Chapter 3: Leader: Defense Against Exploit-Based Denial-of-Service Attacks on Web Servers
  3.1 Introduction
  3.2 ExDoS Attacks
  3.3 Leader Design and Implementation
    3.3.1 Overview
      3.3.1.1 Behavior Profiling
      3.3.1.2 Attack Request Identification
      3.3.1.3 Attack Mitigation
    3.3.2 Assumptions and Limitations
    3.3.3 Implementation
      3.3.3.1 Prober Module
      3.3.3.2 Builder Module
      3.3.3.3 Learner Module
      3.3.3.4 Scoring Module
      3.3.3.5 Mitigation Module
    3.3.4 Deployment Considerations
  3.4 Evaluation
    3.4.1 Evaluation Setup
    3.4.2 Maliciously Crafted URL Attack on a Flask Application (MCU)
    3.4.3 Limitations
  3.5 Results
    3.5.1 The Liberal Design Scenario
    3.5.2 The Conservative Design Scenario
  3.6 Comparison With Related Work
  3.7 Sensitivity
  3.8 Operational Cost and Scalability
  3.9 Related Work
  3.10 Conclusions
Chapter 4: Defending Web Servers Against Flash Crowd Attacks
  4.1 Introduction
  4.2 FRADE
    4.2.1 Feature Selection
    4.2.2 Overview
    4.2.3 Attack Detection
    4.2.4 Request Dynamics
    4.2.5 Request Semantics
    4.2.6 Deception
    4.2.7 Using a Proxy To Speed Up Servers
    4.2.8 Improvements over OM
    4.2.9 Deployment Considerations
    4.2.10 Implementation
  4.3 Evaluation
    4.3.1 Emulation Evaluation Setup
    4.3.2 Today's (Naive) Attacks
    4.3.3 Sophisticated Attacks
    4.3.4 Evasion Attacks
    4.3.5 FRADE Outperforms OM
    4.3.6 Sensitivity
    4.3.7 Operational Cost and Scalability
  4.4 Related Work
  4.5 Conclusions
Chapter 5: Quantifying Cloud Misbehavior
  5.1 Introduction
    5.1.1 Contributions
  5.2 Methodology
    5.2.1 Cloud Definition
    5.2.2 Identifying Clouds
  5.3 Datasets
    5.3.1 Network Traces
    5.3.2 Blocklists
    5.3.3 Limitations
    5.3.4 Misbehavior Metrics
  5.4 Results
    5.4.1 Findings from Network Traces
    5.4.2 Findings from Blocklists
  5.5 Related Work
  5.6 Conclusion
Chapter 6: Conclusions
Bibliography

List of Tables

2.1 Summary of key related works
2.2 Datasets used in our evaluation
2.3 Results: A-S: AMON-SENSS, NS: NetScout, FNM: FastNetMon
3.1 Resource usage by the different stages of a sample legitimate and a sample attack connection in Figure 3.1
3.2 Leader tracks the highlighted function calls or uses them to identify a sequence of calls
3.3 Comparing Leader's evaluation setup with closely related prior work
3.4 Classification accuracy for the liberal design scenario for 1-class SVM and elliptic envelope
3.5 Leader's classification accuracy, averaged over 10 trials, for the liberal and the conservative scenario
3.6 Comparison with related works over the same attack scenarios
3.7 Precision and recall values when varying the contamination parameter of elliptic envelope
3.8 Training time and model sensitivity for an MCU attack scenario when varying the number of connections
4.1 Comparison between OM [114] and FRADE
4.2 FRADE's parameters and the values we used
4.3 Group assignment for our three Web sites
4.4 Time to block all bots
4.5 Page serve time in ms
4.6 Related-work comparison, showing the absence or presence of human Web server interaction features, even if present at the very basic level
5.1 Datasets summary

List of Figures

1.1 The big picture: gaps identified
1.2 The big picture: research insights and filling the gaps
3.1 Sample legitimate and attack connection life stages. Table 3.1 shows the time of each stage, how often it is visited in the sequence, memory used, number of CPU cycles elapsed, number of page faults that occurred and the number of open file descriptors for each stage. The red portion shows the sequence of calls in an exDoS attack that are different from the calls observed on a legitimate connection. Different exDoS attacks may follow different sequences and consume different amounts of resources at different calls.
3.2 Leader's operation: learning and classification
3.3 Experiment topology in Emulab: 4 physical attackers (up to 1,000 virtual attackers), 1 legitimate client (up to 100 virtual clients) and 3 servers on a LAN
3.4 Leader identifies all the aggressive attackers within seconds and blocks them, which allows legitimate traffic to recover and obtain good service again
3.5 ROC curves using the percentage thresholds for different sequence lengths
3.6 Effect of adversarial data injection for an MCU attack scenario using a contamination parameter of 0.1
3.7 The classification threshold values for connections that are outliers for a given source IP, for different sequence lengths, that lead to more than 99% true positives and fewer than 0.5% false positives
4.1 Overview of FRADE's processing of a Web request
4.2 Illustration of high-rate attack handling by the server itself
4.3 Illustration of high-rate attack handling by the Trans approach
4.4 Illustration of high-rate attack handling by the TAB approach
4.5 Attackers: A1-A8 (up to 8,000 virtual bots), Legitimate: L (100 clients), the proxy and 3 Web servers
4.6 FRADE's handling of an FCA
4.7 Today's (naive) attacks and performance comparison for sophisticated attacks
4.8 The time to block 8,000 bots in sophisticated attacks
4.9 Memory and CPU cost vs. number of bots
5.1 Identifying clouds
5.2 Number of cloud prefixes identified from different sources and their overlaps
5.3 Network traces datasets
5.4 Percentage of clouds and non-clouds in the different network traces datasets and blocklists datasets
5.5 Rank average of top clouds and non-clouds
5.6 Cumulative scans ratio distribution

Abstract

Distributed denial-of-service (DDoS) attacks are increasing in volume, frequency and sophistication. Even two decades after the first DDoS attack, important challenges remain to be addressed. In this dissertation, we identify the gaps that are present in existing operational and research-based DDoS defenses. For each gap, we describe research insights that we follow, and present novel solutions utilizing those insights to fill the gaps.

A typical kind of DDoS attack is the volumetric attack, which overwhelms the victim with a high rate of useless traffic (usually hundreds of Gbps or even a few Tbps), consuming a critical resource such as bandwidth. On the other hand, there are also low-rate attacks, which target a specific application or OS resource that can be exhausted at a lower traffic rate. Some low-rate DDoS attacks target a vulnerability in an application or OS service, and are known as exploit-based DDoS attacks. Other low-rate attacks target a scarce application or OS resource, such as CPU, thread count or socket count, and are known as flash-crowd DDoS attacks. Each of these DDoS categories has motivated many efforts in the operational and research worlds to design effective defenses.

Volumetric attacks. Operational defenses for volumetric DDoS attacks include in-network defense appliances or a subscription to a cloud-based DDoS defense service. These solutions are expensive, signature-based and opaque to customers.
Research-based defenses focus on smart traffic filtering to handle the attack's large volume, and often require big changes to network hardware, architecture or operations, for example to implement machine learning in line with traffic, to implement collaborative attack mitigation among networks, or to redesign the Internet architecture to minimize unwanted traffic. Hence, most research solutions are not deployable in today's networks.

Gap: The gap we identified is the lack of accurate attack detection and accurate signature generation. Many networks are well-equipped to filter traffic either at their ingress router or in collaboration with their upstream ISP. Filtering technologies already exist in today's routers and switches, at the granularity of network flows. An open problem is the design of solutions that are open-source, deployable, and accurate in detecting attacks and devising network signatures, with minimal false positives. Further, such solutions should minimize the number of signatures they produce, as there is limited capacity at routers and switches for filtering rules.

Insight: Our insight for volumetric attack detection is to focus on detecting the effect of denial of service, when the destination is struggling to deal with a traffic increase. We also want to detect attacks in a scalable manner, to allow for deployment of our solution at the target or at its ISP. Our solution, AMON-SENSS, uses traffic asymmetry and volume changes to capture the denial-of-service effect. Further, we use smart binning of the traffic data to achieve scalability and reduce false positives.

Exploit-based attacks. Operational defenses for exploit-based attacks involve patching the vulnerability, providing piecemeal, custom-tailored defenses, which cannot handle new attacks. Research defenses similarly focus on handling one category of exploits, e.g., PHP exploits, and are not portable to new attack variants.

Gap: The gap we identified is that existing solutions for exploit-based attacks are not attack-agnostic.

Insight: Our insight is to leverage the fact that each exploit-based attack must somehow misuse resources, in a way that is different from how these resources are used by legitimate clients. Our solution, Leader, monitors resource use per application and per network connection, and models typical resource usage during no-attack periods. Leader detects anomalous use of resources during attack, and blocks the sources of the corresponding connections.

Flash-crowd attacks. Commercially deployed defenses for handling flash-crowd attacks use visual puzzles that human users must solve to prove that they are human, such as CAPTCHA and reCAPTCHA. These visual puzzles are annoying to users and can be bypassed using machine learning techniques. Research defenses against flash-crowd attacks all assume naive adversaries, and focus on one aspect of adversary behavior, e.g., random access to Web pages.

Gap: The gap we have identified is that current defenses cannot handle sophisticated flash-crowd attacks in a way that is transparent to human users.

Insight: Our insight is to model how human users interact with a service and use these models to detect bots. Our models must be either impossible or very costly for bots to bypass. Our solution, FRADE, models human behavior across multiple dimensions – request dynamics, semantics and processing of visual content. While a bot may bypass one of these models, bypassing all three is very costly and would force attackers to throw away bots after a few requests.
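To make the request-dynamics dimension concrete, the sketch below shows one simple way such a model can be enforced: per-source request counts are checked against limits over several time windows at once, so a bot cannot evade detection by tuning its rate to a single window. This is our own illustrative sketch; the window sizes and limits here are invented, and FRADE's actual thresholds and features are described in Chapter 4.

```python
# Illustrative sketch only: multi-window request-dynamics check.
# WINDOWS pairs are assumptions, not FRADE's real parameters.
from collections import defaultdict, deque

# (window length in seconds, max requests a human plausibly issues in it)
WINDOWS = [(1, 5), (10, 20), (60, 60)]

history = defaultdict(deque)  # source IP -> recent request timestamps

def on_request(src_ip: str, now: float) -> bool:
    """Record a request; return True if src_ip should be flagged as a bot."""
    q = history[src_ip]
    q.append(now)
    longest = max(w for w, _ in WINDOWS)
    while q and now - q[0] > longest:     # drop timestamps older than any window
        q.popleft()
    for window, limit in WINDOWS:
        recent = sum(1 for t in q if now - t <= window)
        if recent > limit:
            return True                    # dynamics model violated in this window
    return False
```

A bot that stays under every window's limit is forced down to a roughly human request rate, which is exactly the cost the insight above aims to impose.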
In addition to our focus on designing effective DDoS defenses, we also explore in this work whether static, public blocklists can help reduce unwanted traffic. We find increasing use of public clouds to generate unwanted traffic: about 50% of scans are sent from public clouds, and cloud addresses are twice as likely as non-cloud addresses to appear on public blocklists. Because businesses use public clouds for daily operations, blocking cloud addresses for long time periods is not a feasible solution. Instead, we need dynamic and custom blocklists, generated during attacks and deactivated as soon as the attack abates. Our Leader and FRADE solutions produce such blocklists, while AMON-SENSS focuses on coarser network-level signatures, which do not include source address information.

Overall, in this dissertation, we focus on the design of deployable, effective and scalable DDoS defenses that are robust against strong adversaries.

Chapter 1

Introduction

Distributed denial-of-service (DDoS) attacks are on the rise [69, 37]. A recent report from NetScout [111] reveals that more than 5 M attacks occurred between January and June 2021, including multiple attacks exceeding 1 Tbps.

We can classify DDoS attacks as volumetric or low-rate attacks. Volumetric DDoS attacks involve large volumes of useless traffic, on the order of several Gbps or Tbps. Low-rate DDoS attacks can be divided into exploit-based attacks, which exploit some vulnerability at the target application to bring it down with a very low request rate, and flash-crowd attacks [149], which send many legitimate-like requests to the target, to overwhelm a scarce application or system resource.

Many defenses have been proposed against volumetric attacks. Commercial defenses include either a subscription to cloud-based DDoS defense services or the purchase of in-network commercial defense appliances. These solutions are expensive, they often rely on signatures of currently popular DDoS attacks, and they do not offer an explainable interface to the customer, i.e., the customer cannot closely control what is filtered nor measure filtering accuracy and effectiveness. Research-based defenses mostly require big changes to networks to implement in-line machine learning, to facilitate collaboration between networks or to implement a new Internet architecture. Hence, research solutions are often non-deployable in today's networks.

Exploit-based attacks require very few requests to completely exhaust the application's resources and deny service to legitimate clients. In some cases the vulnerability in the application can be patched. For example, the application may use a weak hash table implementation to store user input, and the attack sends carefully crafted inputs that hash into the same slot. This leads to hash table collisions and slows down the application. Replacing the hash table implementation with a more secure one will patch the vulnerability. In other cases, the vulnerability is simply in the assumption of how an application will use a scarce resource. For example, operating systems support a limited number of simultaneously open sockets, on the order of several thousand. When a new client request arrives, the server creates a dedicated socket to process it. In the case of legitimate requests this approach works well, since client/server connections are short-lived and the socket is quickly freed. Since there are many applications, with known and unknown vulnerabilities, we cannot rely on patching alone to address exploit-based attacks.
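The hash-table exploit described above is easy to reproduce in miniature. The sketch below (our own illustration, not taken from the dissertation) builds a toy chained hash table and times insertions of keys that spread across buckets versus keys crafted to land in a single bucket, showing how collisions degrade each insertion from near-constant time to time linear in the table's occupancy.

```python
# Illustrative sketch: algorithmic-complexity DoS via hash collisions.
# Relies on CPython's hash(int) == int for small ints, so every key that is
# a multiple of nbuckets lands in bucket 0.
import time

class ChainedHashTable:
    def __init__(self, nbuckets=1024):
        self.buckets = [[] for _ in range(nbuckets)]
        self.nbuckets = nbuckets

    def insert(self, key, value):
        chain = self.buckets[hash(key) % self.nbuckets]
        for i, (k, _) in enumerate(chain):   # linear scan of the bucket's chain
            if k == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))

def timed_inserts(keys):
    table = ChainedHashTable()
    start = time.perf_counter()
    for k in keys:
        table.insert(k, None)
    return time.perf_counter() - start

n = 10_000
spread_keys = ["user%d" % i for i in range(n)]   # hash across all buckets
colliding_keys = [i * 1024 for i in range(n)]    # hash(i*1024) % 1024 == 0

print("spread keys:    %.3f s" % timed_inserts(spread_keys))
print("colliding keys: %.3f s" % timed_inserts(colliding_keys))
```

Patching (here, swapping in a collision-resistant hash) removes this particular vector, but as the text notes, patching cannot anticipate every such vulnerability.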
For similar reasons, defenses against one variant of exploit-based attacks (e.g., the approaches of Fitri et al. [58] and Choi et al. [32] defend only against Slowloris) cannot fully address the problem. Hence, the common problems with existing research defenses against exploit-based attacks are: a) low portability to other applications; b) inability to handle new attacks.

In flash-crowd attacks, attackers flood a popular service with legitimate-like requests, using many bots. This usually has a severe impact on the server, impairing its ability to serve legitimate users. The attack resembles a "flash crowd", where many legitimate clients access popular content. Distinguishing between a flash crowd and a flash-crowd attack is hard, as the attack uses requests whose content is identical to a legitimate user's content, and each bot may send at a low rate [134, 71, 78]. CAPTCHAs [15, 82] are a popular defense against flash-crowd attacks (FCAs). Users who correctly solve a graphical puzzle have their IPs placed on an "allow" list. While a deterrent, CAPTCHAs have some issues. Multiple online services offer bulk CAPTCHA solving, using automated and semi-automated methods (e.g., [91]). CAPTCHAs also place a burden on human users, while FRADE does not. Google's reCAPTCHAs [64] and similar approaches for human user detection are transparent to humans, but can still be defeated using deep learning approaches [7, 22, 137]. There are several research-based defenses against flash-crowd attacks, but they all model one aspect of a bot's behavior (e.g., random access to Web pages) and thus cannot handle sophisticated adversaries [126, 94, 180, 21].

In this dissertation, we identify the gaps that are present in existing operational and research-based DDoS defenses. For each gap that we find, we provide new research insights and present novel solutions utilizing the insights to fill the gaps. The overall landscape we discuss and the positioning of our work are shown in Figures 1.1 and 1.2.

Figure 1.1: The big picture: gaps identified

Volumetric attacks: The gap is that we need solutions that are open-source and deployable in today's networks. These networks have the ability to filter traffic either at their ingress router or in collaboration with their first-hop ISP. What is lacking is the ability to detect attacks accurately and to devise accurate network-level signatures. Hence we need solutions that are open-source, deployable, and accurate in detecting attacks and devising signatures, with minimal false positives. For volumetric attacks, our insight is that we want to detect the effect of denial of service, when the destination is struggling to deal with a traffic increase. Further, we want scalable detection and signature derivation, so our solution can be deployed either at the potential target or at an ISP serving millions of potential targets.

Figure 1.2: The big picture: research insights and filling the gaps

We propose AMON-SENSS, an open-source system for scalable, accurate DDoS detection and signature generation in large networks. AMON-SENSS employs hash-based binning with multiple bin layers for scalability, observes traffic at multiple granularities and deploys traffic volume and traffic asymmetry change-point detection techniques to detect attacks. It proactively devises network-level attack signatures, which can be used to filter attack traffic. We evaluate AMON-SENSS against two commercial defense systems, using 37 days of real traffic from a mid-size Internet Service Provider (ISP).
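A minimal sketch of the multi-layer hash-binning idea follows (our own illustration; the hashing scheme and the statistics kept per bin are simplified relative to AMON-SENSS). Each layer maps a destination prefix into a fixed-size bin array under a different hash seed, so per-bin state stays constant regardless of how many customers the ISP serves, and a benign spike that pollutes a bin in one layer is unlikely to fall into the same bin in every layer; an attack is reported only when enough layers agree.

```python
# Illustrative sketch of multi-layer hash binning with per-bin volume and
# asymmetry statistics. The SHA-1-based hashing is an assumption made here
# for self-containment, not AMON-SENSS's actual hash function.
import hashlib

L, N = 5, 3337   # bin layers and bins per array (the calibrated values in Sec. 2.4)

def bin_index(dst_prefix: str, layer: int) -> int:
    digest = hashlib.sha1(f"{layer}:{dst_prefix}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % N

class BinStats:
    def __init__(self):
        self.bytes_in = self.pkts_in = self.pkts_out = 0

layers = [[BinStats() for _ in range(N)] for _ in range(L)]

def observe_flow(dst_prefix: str, bytes_in: int, pkts_in: int, pkts_out: int):
    # every flow record updates exactly one bin per layer
    for layer in range(L):
        b = layers[layer][bin_index(dst_prefix, layer)]
        b.bytes_in += bytes_in
        b.pkts_in += pkts_in
        b.pkts_out += pkts_out

def asymmetry(b: BinStats) -> float:
    # ratio of received to sent packets; a sustained rise of this ratio,
    # together with a volume increase, suggests the destination cannot
    # keep up with incoming traffic
    return b.pkts_in / max(b.pkts_out, 1)
```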
We find that our proposed approach exhibits superior performance in terms of accuracy, detection timeliness and network signature quality over these commercial alternatives. AMON-SENSS is deployable today, it is free, and it requires no hardware or routing changes.

Exploit-based attacks: The gap is that current defenses for exploit-based attacks are not attack-agnostic. Our insight is that exploits misuse resources. We propose Leader, a defense that detects anomalous resource use and blocks the sources of such traffic. Leader monitors fine-grained resource usage per application and per each external request to that application. Over time, Leader learns the time-based patterns of legitimate users' usage of resources for each application and models them using elliptic envelope. During attacks, Leader uses these models to identify application clients that use resources in an abnormal manner, and blocks them. We implement and evaluate Leader for Web server protection against exDoS attacks. Our results show that Leader correctly identifies over 99% of attack IPs, and over 99% of legitimate IPs, across the six different exDoS attacks used in our evaluation. On average, Leader can identify and block an attacker after about 6 requests. Leader has a small run-time cost, adding less than 0.5% to page loading time.

Flash-crowd attacks: The gap is that current defenses model just one aspect of bot behavior and cannot handle sophisticated attacks. Our insight is to model multiple dimensions of human interaction with the server and detect bots when their actions mismatch any of our models. We propose FRADE, a DDoS defense that models human behavior across multiple dimensions, which are costly for bots to replicate. We propose robust and reliable models of human interaction with a server, which can identify and block a wide variety of bots. We implement the models in a system called FRADE, and evaluate them on three Web servers with different server applications and content. Our results show that FRADE detects both naive and sophisticated bots within seconds, and successfully filters out attack traffic. FRADE significantly raises the bar for a successful attack: because each bot is blocked by FRADE within a few requests, it forces attackers to deploy botnets at least three orders of magnitude larger than today's.

Public blocklists: Orthogonal to the above DDoS-centric research, we also explored a related aspect of using static, public blocklists to filter unwanted traffic. A common defense against unwanted traffic (such as DDoS, scanning, malware) is to block known offenders by using public blocklists. We focus in this work on quantifying the maliciousness of public clouds, which comprise less than 6% of the routable IPv4 address space. We find that about 50% of scans are launched from public clouds, and that cloud addresses are twice as likely to end up on public blocklists as non-cloud addresses. Because businesses use public clouds, and because our research shows they are heavily misused for malicious purposes, static, public blocklists are not the right approach for unwanted traffic filtering. Instead, we need dynamic and custom blocklists, produced during attacks and customized for the specific attack. We need agile defenses that evaluate the maliciousness of a source in real time. Two of our defenses, Leader and FRADE, produce such dynamic, custom blocklists in real time.
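Returning to Leader's modeling step above: the elliptic-envelope technique it relies on is available off the shelf. The sketch below shows the idea on synthetic data using scikit-learn's EllipticEnvelope; the feature columns are invented for illustration, while Leader's actual features are the per-connection life-stage measurements described in Chapter 3.

```python
# Hedged sketch of elliptic-envelope anomaly detection on synthetic
# "resource usage" vectors; column meanings are assumptions for illustration.
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
# columns: [CPU cycles, bytes read, socket-open duration] (illustrative)
legit = rng.normal(loc=[1.0, 2.0, 0.5], scale=0.1, size=(500, 3))

# Fit a robust Gaussian envelope around legitimate baseline behavior.
model = EllipticEnvelope(contamination=0.01).fit(legit)

attack = np.array([[5.0, 40.0, 30.0]])   # e.g., a Slowloris-like connection
print(model.predict(attack))              # -1 -> outlier: flag and block source
print(model.predict(legit[:3]))           # +1 -> inliers: legitimate clients
```

The design choice matters: because the envelope is learned from baseline traffic only, the same machinery applies to any application and any exDoS variant, which is what keeps the defense attack-agnostic.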
Therefore, in this dissertation, we fill the gaps that exist in the current state-of-the-art solutions by developing deployable and effective DDoS defenses, AMON-SENSS, Leader and FRADE, that are robust against strong adversaries.

1.1 Demonstrating the Thesis Statement

In this thesis, we identify the gaps that exist in the current state-of-the-art DDoS defenses. For each gap that we find, we utilize our insights and develop novel solutions to fill those gaps, providing effective DDoS defenses that are deployable and robust against strong adversaries.

To address accurate detection and signature generation for volumetric attacks, we propose AMON-SENSS. Our solution detects attacks by looking for the denial-of-service effect, which must be present if the attack is successful. This effect is visible in a sudden increase of incoming traffic to a potential target, and a sudden change of traffic asymmetry (the ratio of received and sent packets within a given traffic type), which signals the target's inability to process incoming traffic. AMON-SENSS proactively devises network-level attack signatures, which can be used to filter attack traffic. To achieve scalable deployment at ISPs, AMON-SENSS employs multiple levels of binning to reduce the amount of data stored, and to reduce false positives. We show that AMON-SENSS is both accurate and effective by evaluating it against two commercial defense systems, NetScout [12] and FastNetMon [55], using 37 days of real traffic from a mid-size Internet Service Provider, FrontRange GigaPOP. AMON-SENSS achieves superior performance, with an F1-score of 92%–99%, and low to no false positives and false negatives. NetScout has a significantly lower F1-score of 26–70%, mostly due to many false negatives. NetScout's false negatives occur either on short attacks (under one minute) or on attacks that are under 1 Gbps, potentially due to thresholding issues. FastNetMon fares the worst: it has many false positives, leading to an F1-score of at most 2%. FastNetMon fails to identify multiple reflection attacks, such as CLDAP amplification. With regard to detection delay, AMON-SENSS detects attacks within 30 seconds of their onset, while NetScout takes more than 3 times as long. FastNetMon, too, takes much longer than AMON-SENSS to detect attacks. AMON-SENSS is highly deployable: it works out of the line of traffic, analyzing passively-collected traffic traces, and it produces signatures that can be immediately deployed in current switch and router hardware.

To provide an attack-agnostic defense against exploit-based (exDoS) attacks, we propose Leader. Leader [150] stays attack-agnostic by modeling how legitimate users of an application use system resources. Models are trained per application, during no-attack periods, but each model uses the same building pieces, and thus the entire defense is attack-agnostic. Leader monitors fine-grained resource usage per application and per each external request to that application. Over time, Leader learns the time-based patterns of legitimate users' usage of resources for each application and models them using elliptic envelope. During attacks, Leader uses these models to identify application clients that use resources in an abnormal manner, and blocks them using existing OS firewall technology. We implement and evaluate Leader for Web server protection against exDoS attacks. Leader is effective: it correctly identifies over 99% of attack IPs, and over 99% of legitimate IPs, across the six different exDoS attacks used in our evaluation. On average, Leader can identify and block an attacker after about 6 requests. Leader has a small run-time cost, adding less than 0.5% to page loading time. Leader is also highly deployable. It uses existing kernel-level monitoring [144] and existing firewall technology (iptables), which are standard on Linux systems. Leader can thus protect any Linux-based server.

To provide a sophisticated defense against flash-crowd attacks, we propose FRADE [149, 148]. FRADE models human behavior with Web servers across multiple dimensions, thus producing complex behavior models, which bots cannot easily match. FRADE models the dynamics of human interaction with Web servers across multiple time windows, the probability of various request sequences and the way humans process visual cues. While a bot could defeat one model (e.g., send requests at a low rate), it is difficult and costly to defeat all models. FRADE is very effective, blocking both naive and sophisticated bots within seconds. Because the attacker can only use a bot for a few requests before it is blocked, FRADE significantly raises the bar for a successful attack, forcing attackers to deploy botnets at least three orders of magnitude larger than today's.

Further, we quantify misbehavior from clouds [147], to demonstrate that static blocklists of known offenders cannot effectively filter unwanted traffic, including DDoS. A common defense against unwanted traffic (such as DDoS) is to block known offenders by using public blocklists. These lists are updated irregularly, at intervals ranging from 15 minutes to several days. In our work, we quantify the misuse of public clouds, which comprise less than 6% of the IPv4 address space. We find that half of the scan traffic is launched from public clouds, and that cloud addresses are twice as likely to end up on a public blocklist as non-cloud addresses. Because businesses use public clouds, static blocklists cannot be effective against unwanted traffic. Instead, we need dynamic and custom blocklists, which are built in an agile manner by evaluating the maliciousness of a traffic source in real time, close to the attack target. Our defenses, Leader and FRADE, build such dynamic and custom blocklists.

1.2 Structure of the Dissertation

This dissertation is organized along our four research contributions. In Chapter 2, we provide details about AMON-SENSS, an open-source system for scalable, accurate volumetric DDoS detection and signature generation in large networks. In Chapter 3, we describe Leader, an application-agnostic and attack-agnostic defense against exploit-based DDoS attacks. In Chapter 4, we describe FRADE, which models a human user's interaction with a Web server to protect the server from flash-crowd DDoS attacks launched by bots. In Chapter 5, we analyze 13 datasets, containing various types of unwanted traffic, to quantify cloud misbehavior and identify the clouds that most often and most aggressively generate unwanted traffic.

Chapter 2

AMON-SENSS: Scalable and Accurate Detection of Volumetric DDoS Attacks at ISPs

Distributed denial-of-service attacks continue to be a severe threat to the Internet, and have been evolving both in traffic volume and in sophistication. While many attack detection approaches exist, few of them provide easily interpretable and actionable network-level signatures.
Further, most tools are either not scalable or prohibitively expensive, and thus are not broadly available to the network operator community. We bridge this gap by proposing AMON-SENSS, an open-source system for scalable, accurate DDoS detection and signature generation in large networks. AMON-SENSS employs hash-based binning with multiple bin layers for scalability, observes traffic at multiple granularities and deploys traffic volume and traffic asymmetry change-point detection techniques to identify attacks. It proactively devises network-level attack signatures, which can be used to filter attack traffic. We evaluate AMON-SENSS against two commercial defense systems, using 37 days of real traffic from a mid-size Internet Service Provider (ISP). We find that our proposed approach exhibits superior performance in terms of accuracy, detection timeliness and network signature quality over these commercial alternatives. AMON-SENSS is deployable today, it is free, and it requires no hardware or routing changes.

2.1 Introduction

Distributed denial-of-service (DDoS) attacks are increasing in volume and frequency [69, 37]. A recent report from NetScout [151] reveals that more than 5 M attacks occurred between January and June 2021, including multiple attacks exceeding 1 Tbps. Many networks today handle volumetric DDoS attacks by purchasing in-network commercial defense appliances or by subscribing to cloud-based DDoS defense services. In both cases, the commercial defense profiles the traffic to the potential targets and derives some kind of network signature, specifying the target and the traffic ports and protocols. The defense uses this signature to filter attacks. A commercial defense can also resort to deep packet inspection or other more sophisticated traffic analyses when filtering traffic, to minimize collateral damage. Commercial defenses have two downsides: cost and lack of transparency. While clouds offer basic business plans relatively cheaply (e.g., $200 per month for CloudFlare, $3,000 per year for AWS Shield), the price quickly increases for larger networks or networks carrying larger traffic volumes. Commercial appliance costs range from tens to hundreds of thousands of dollars. Commercial defenses further employ proprietary algorithms to devise filtering rules, and there are no independent evaluations of their accuracy.

In many cases volumetric DDoS attacks could be handled comparatively well by the target network or the target's upstream ISP, foregoing the cost and uncertain performance of commercial defenses. ISPs are especially well positioned for this task: they already handle high traffic volumes, and they have close relationships with potential attack targets and an interest in protecting them. If the target or its ISP could accurately detect attacks and devise accurate filtering rules, these rules could be installed for free in existing firewalls and switches, at the target or at its first-hop ISP. The gap here is that there are no publicly available solutions that detect DDoS attacks at scale (e.g., at ISP-size networks with millions of potential targets) and produce accurate network signatures for filtering.

We propose AMON-SENSS, an open-source, scalable, flow-level DDoS detection system for ISPs. AMON-SENSS employs binning to make detection scalable at the ISP level, and employs layers of bins to reduce false alerts. AMON-SENSS works offline using existing flow capture tools, analyzes traffic at multiple granularities and detects an anomaly when both the volume and the asymmetry of the traffic in a bin increase for a sustained time period. While anomaly-based DDoS detection has been widely explored in the past (e.g., [182, 119, 57]), the novelty of AMON-SENSS lies in the scalability of its detection and the accuracy of its attack signatures. AMON-SENSS produces alerts that contain accurate network signatures of attack traffic, which can be immediately deployed at existing firewalls. This deployability (offline detection using existing traffic captures, ready-to-deploy firewall rules) distinguishes AMON-SENSS from other research approaches, which classify traffic flows instead of producing attack traffic signatures, and are much less deployable. Flow classification must work in the line of traffic, so that flows classified as attack can be immediately filtered. Inline deployment requires new hardware. Many classification approaches also deploy machine learning, and thus deployment must support line-rate machine learning, which network devices do not support today.

We further show that AMON-SENSS offers superior performance compared to two commercial defense solutions, NetScout [12] and FastNetMon [56]. Using 37 days of real traffic from a mid-size ISP, FRGP, AMON-SENSS outperforms the commercial approaches both in accuracy (F1-score of 92–99% versus NetScout's 26–70% and FastNetMon's at most 2%) and in detection speed (AMON-SENSS's detection delay is less than a third of NetScout's). We release AMON-SENSS as open-source at [10].

2.2 Related Work

Many solutions have been proposed in the literature to detect DDoS attacks [101]. In this section we discuss only those research approaches that are closely related to AMON-SENSS. Table 2.1 summarizes the related works in terms of whether the solution is packet-based or flow-based, whether it can identify attack signatures and whether it scales to ISP-level detection.

Packet-based solutions: These solutions learn feature distributions in legitimate traffic, prior to attacks, and use these models to classify each packet as attack or legitimate. PacketScore [85], Carl et al. [182], Fouladi et al. [60], Feinstein et al. [57], Lotfollahi et al. [99], Yuan et al. [181], Bardas et al. [14], and Kitsune [184] are examples of packet-based approaches. These approaches can be very accurate, but they are not practical. First, many networks only collect sampled NetFlow data, and not packet data, due to scale challenges. Second, packet classification approaches must be installed inline so they can immediately filter packets they identify as attack. Machine learning and line-rate packet classification require special hardware, which many networks do not have today.

Flow-based solutions: Similar to packet-based solutions, some flow-based solutions focus only on the flow classification problem, using clustering or machine learning (e.g., [119]). These approaches have the same deployment challenge as packet-based approaches: they must be deployed inline on special hardware. Other flow-based approaches work to identify attack sources, such as Braga et al. [23], Simpson et al. [136], Xu and Liu [179], and Doshi et al. [51]. Source identification is useful, but not practical for mitigation, as attack sources can be spoofed, or there can be more sources than filtering rules that can fit in today's switches [93].
None of the related-work approaches produces network-level signatures, in terms of transport ports and protocols, for attack filtering; they simply identify the attack's onset and, in some cases, the attack's target and sources. This is not useful for operational DDoS defense. Our focus is on identifying the best, concise network filtering rules via offline traffic processing. Such rules can then be efficiently deployed in standard network hardware, such as switches and firewalls.

In [166] Wagner et al. propose collaborative DDoS detection, involving multiple ISPs, where each ISP applies simple, threshold-based detection rules and exchanges detection signals with others. Our work is complementary to Wagner et al. [166] and could replace their threshold-based detection to improve overall detection accuracy and scalability.

Table 2.1: Summary of key related works

related work             works on         identifies attack signature   scales to today's ISP traffic
Kim et al. [85]          packets          ✗                             ✗
Carl et al. [182]        packets          ✗                             ✗
Fouladi et al. [60]      packets          ✗                             ✗
Qin et al. [119]         flows            ✗                             ✓
Feinstein et al. [57]    packets          ✗                             ✗
Doshi et al. [51]        flows, packets   ✗                             ✓
Yuan et al. [181]        packets          ✗                             ✗
Braga et al. [23]        flows            ✗                             ✓
Simpson et al. [136]     flows            ✗                             ✓
Jin et al. [77]          packets          ✗                             ✗
Bardas et al. [14]       packets          ✗                             ✗
Mirsky et al. [184]      packets          ✗                             ✗
Xu and Liu [179]         flows            ✗                             ✓
AMON-SENSS               flows, packets   ✓                             ✓

2.3 DDoS Attack Detection

DDoS attacks come in many flavors [106, 140, 146, 101]. Volumetric attacks overwhelm internal network capacity, or even centralized DDoS mitigation scrubbing facilities, with significantly high volumes of malicious traffic, such as UDP flood attacks [165]. Application-level attacks overload specific application resources with legitimate-looking requests to make the application unavailable or unresponsive to legitimate users; examples include flash-crowd attacks [149] and exploit-based attacks [26]. Because there are many DDoS variants, it is difficult to design a solution that detects them all reliably. In this chapter, we focus on detection of volumetric attacks, i.e., ones that create a visible increase in the volume of traffic to their target. One might consider such attacks trivially detectable, but detection is challenging at the network level. An attack could be too small at the network level to trigger detection, while still being large enough to overwhelm its target. The problem is compounded at large ISPs, which route extremely high traffic volumes in aggregate and serve millions of customers.

Threshold-based approaches for attack detection usually employ manually set thresholds per target IP prefix, and raise alerts when incoming traffic exceeds them. Since different attacks can be harmful at different rates, threshold-based approaches cannot fully address the problem. Anomaly-based approaches track legitimate traffic's features for each potential target IP address, and detect deviations as attacks. Such approaches are promising, but fall short in scaling up to millions of traffic streams, and in separating benign traffic spikes from malicious ones.

Once an attack is detected, the defense should produce an accurate network signature of the attack for filtering, specifying IP and transport header fields, e.g., dst IP 1.2.3.4 and proto udp and src port 53.
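Such a signature uses only fields that existing filtering hardware and host firewalls already match on. As a hypothetical rendering (the chain and target below are our assumptions, not part of AMON-SENSS), the example signature translates into a single iptables rule:

```python
# Hypothetical translation of the example signature above into an iptables
# rule; the FORWARD chain and DROP target are illustrative choices.
signature = {"dst_ip": "1.2.3.4", "proto": "udp", "src_port": 53}
rule = ("iptables -A FORWARD -d {dst_ip} -p {proto} "
        "--sport {src_port} -j DROP").format(**signature)
print(rule)   # iptables -A FORWARD -d 1.2.3.4 -p udp --sport 53 -j DROP
```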
While it may be tempting to produce signatures that specify the sources of attack traffic, such filtering rules can be bypassed if the attacker uses spoofing. Even without spoofing, large botnet-launched attacks can lead to thousands of filtering rules, a scale that cannot be supported by switch TCAM sizes [93]. AMON-SENSS focuses on the generation of network-level signatures, which specify the attack type and target, but not the attack sources.

2.4 Evaluation

We evaluate AMON-SENSS on NetFlow traffic traces collected at a mid-size US ISP, FrontRange GigaPOP (FRGP), which connects educational, research, government and business institutions to the Internet. Traces are collected on all ingress/egress links between FRGP and the Internet, using packets sampled at either a 1:100 or a 1:4096 rate, and cover 37 days in 2020. We provide further details in Table 2.2.

Table 2.2: Datasets used in our evaluation

month            days   app. flows   app. bytes
May 2020         9      2 T          30 PB
August 2020      15     3.8 T        45 PB
September 2020   13     3.3 T        40 PB

In addition to NetFlow traces, we also have alerts generated by a commercial DDoS appliance, NetScout, which is deployed on one large ingress/egress link between FRGP and its upstream ISP. NetScout's alerts denote the start and stop times of attacks (as observed by NetScout), the alleged target and the attack type, which can be easily transformed into a network-level signature. We evaluate AMON-SENSS against NetScout, and against an open-source NetFlow-based DDoS detection engine, FastNetMon [55], which is the basic, free version of the same-named commercial tool [56]. While the details of how NetScout and FastNetMon detect attacks and determine attack type are proprietary, various technical brochures and blog posts indicate some use of threshold-based detection [112].

2.4.0.1 Ground Truth Labeling

Since we want to compare NetScout with AMON-SENSS, we cannot use NetScout's alerts as ground truth. Instead, we apply the approach described in [166, 86] to identify reflection attacks in our datasets, as events where at least 10 different sources send more than 100 Mbps of traffic to a given target within the same second, using the same source port. This approach was shown to be accurate in Kopp et al. [86], and was also used in Wagner et al. [166]. While accurate for ground-truth labeling, this approach requires separately tracking traffic to each of the millions of ISP customers and thousands of ports per customer. It is memory- and CPU-hungry, running longer than real time and consuming up to 16 GB of memory on our datasets, and it cannot be used for scalable DDoS detection.

2.4.0.2 Evaluation Approach and Calibration

In our evaluation we compare AMON-SENSS to NetScout and to the open-source version of FastNetMon [55]. We compare the detection delay of these approaches, their accuracy, and their signature quality. We focus only on detections of reflector attacks that exceed 100 Mbps, because we have ground truth for these attacks. Both AMON-SENSS and FastNetMon use multiple parameters that determine their sensitivity and accuracy. We calibrate these parameters on three days of data in the September dataset, and select the best-performing settings for the full evaluation. Our settings for AMON-SENSS are: S=60, N=3337, V=10, L=5, Ton=20 s, Toff=60 s. Our settings for FastNetMon are: ban_details_records_count=500, ban_time=3600 s, threshold_pps=20,000.

2.4.0.3 Results

Table 2.3 shows our results for AMON-SENSS, NetScout and FastNetMon. AMON-SENSS achieves superior performance, with an F1-score of 92%–99%, and low to no false positives and false negatives. NetScout has a significantly lower F1-score of 26–70%, mostly due to many false negatives.
NetScout’s false negatives 15 data detect. gr. truth TP FP FN F1 delay May A-S 12 12 2 0 0.92 27 s NS 7 1 5 0.70 99 s FNM 11 26.5 K 1 0.0008 43 s Aug A-S 39 38 5 1 0.92 17 s NS 7 8 31 0.26 75 s FNM 24 13.5 K 15 0.004 61 s Sep A-S 126 126 3 0 0.99 20 s NS 38 15 70 0.47 66 s FNM 106 11.6 K 20 0.02 78 s Table 2.3: Results: A-S: AMON-SENSS, NS - NetScout, FNM - FastNetMon occur either on short attacks (under one minute) or on attacks that are under 1 Gbps, potentially due to thresholding issues. FastNetMon fares the worst—it has a lot of false positives, leading to F1-score of up to 2%. FastNetMon fails to identify multiple reflection attacks, such CLDAP amplification. With regard to de- tection delay, AMON-SENSS has detection delay of under 30 seconds from attack’s onset, while NetScout takes more than 3 times as long. FastNetMon too takes much longer than AMON-SENSS to detect attacks. We further evaluate the quality of signatures for AMON-SENSS, NetScout and FastNetMon by com- paring them with ground-truth signatures for all true positives. When a ground truth attack involves several attack vectors a signature may match the ground-truth partially (only for some vectors) or fully (for all vectors). AMON-SENSS achieves 60% full matches and 40% partial matches, compared to NetScout’s 57% full and 43% partial matches, and FastNetMon’s 100% partial matches. AMON-SENSS thus clearly produces very accurate signatures. Another way to measure signature quality is to evaluate how well it drops attack traffic. We measure the amount of attack traffic dropped by AMON-SENSS, NetScout or FastNetMon, and compare it to the ideal case, which uses signatures derived from the ground truth. AMON-SENSS filters 80% of the ground- truth attack traffic, while NetScout filters only 46% due to larger detection delay and some false negatives. FastNetMon filters 56% of the ground-truth attack traffic. AMON-SENSS further achieves 98% precision (98% of filtered traffic is indeed attack, 2% is dropped due to false positives), compared to NetScout’s 99%, and FastNetMon’s 71%. Thus overall, AMON-SENSS’s signatures have the highest quality. 16 2.4.0.4 Sensitivity Analysis In this Section we explore how performance of AMON-SENSS depends on values of its parameters: anomaly score threshold S, number of bins per bin array N, number of layers L, times for attack alert aggregation T on and T o f f , and vote threshold for pruning V . For this evaluation we use select three days from September dataset. For space reasons, we summarize results. Score threshold S. We experimented with different thresholds for anomaly score ranging from 5 to 60. This threshold had only minor impact on the detection accuracy, with lower values slightly increasing false positives. Number of bins per bin array N. We explored values from 1 K–64 K for the parameter N. Larger values reduce errors (false positives and false negatives), but the gain is small. False negatives stabilize once we exceed 3 K bins. False positives are better handled by increasing the number of layers than number of bins. We expect that the N value would need to be calibrated for each deploying network, with more diverse traffic likely needing more bins per array. Number of bin layers L and voting threshold V . We jointly explored these parameters, evaluating 1–10 bin layers and V ={1-20}. Increasing the number of layers and voting threshold improves F1-score, but the improvement decreases above 5 layers, and V =10. Alert aggregation times T on and T o f f . 
We explore values between 5 and 200 seconds for these two parameters. While varying Toff does not change AMON-SENSS's performance, increasing Ton increases false positives while lowering detection delay. Optimal values for Ton are 20–30 s.

2.4.0.5 Operational Cost

On an Intel Xeon 3.2 GHz CPU with 4 cores, AMON-SENSS processes a day of traffic from our datasets in seven hours, with 10 layers and 3,337 bins per bin array, consuming up to 11 GB of memory. Speed and memory cost scale linearly with the N and L parameters. For example, with 1 layer and 3,337 bins, it takes around 0.75 hours to process a day of traffic. Thanks to binning, the memory cost stays constant when the size of the monitored network or the traffic volume changes.

2.5 Conclusions

In this work, we propose a scalable, accurate, open-source DDoS detection system, AMON-SENSS. AMON-SENSS employs binning and layering to collect traffic statistics in a scalable manner, and observes traffic at multiple granularities. AMON-SENSS applies anomaly detection using traffic surplus and asymmetry, and proactively collects and evaluates network-level signatures. We evaluate AMON-SENSS against two commercial defense systems using 37 days of real traffic from a mid-size ISP. We find that AMON-SENSS exhibits superior performance in terms of accuracy, latency and network signature quality over these commercial alternatives, and we release it as open-source.

Chapter 3

Leader: Defense Against Exploit-Based Denial-of-Service Attacks on Web Servers

Exploit-based denial-of-service attacks (exDoS) are challenging to detect and mitigate. Rather than flooding the network with excessive traffic, these attacks generate low rates of application requests that exploit some vulnerability and tie up a scarce key resource. It is impractical to design defenses for each variant of exDoS attack separately. This approach does not scale, since new vulnerabilities can be discovered in existing applications, and new applications can be deployed with yet unknown vulnerabilities. We propose Leader, an application-agnostic and attack-agnostic defense against exDoS attacks. Leader monitors fine-grained resource usage per application and per each external request to that application. Over time, Leader learns the time-based patterns of legitimate users' usage of resources for each application and models them using elliptic envelope. During attacks, Leader uses these models to identify application clients that use resources in an abnormal manner, and blocks them. We implement and evaluate Leader for Web server protection against exDoS attacks. Our results show that Leader correctly identifies over 99% of attack IPs, and over 99% of legitimate IPs, across the six different exDoS attacks used in our evaluation. On average, Leader can identify and block an attacker after about 6 requests. Leader has a small run-time cost, adding less than 0.5% to page loading time.

3.1 Introduction

Distributed denial-of-service (DDoS) attacks create a large disturbance to businesses and critical infrastructure services, resulting in large monetary losses [84]. Traditionally, DDoS attacks generate a flood of traffic to deplete network or CPU resources at the target, and interfere with the target's service to its legitimate users. As cloud-based defenses handle volumetric DDoS attacks, attackers shift their focus to sophisticated, low-rate attacks targeting application resources [145, 116]. Application-layer DDoS attacks are on the rise [69, 72, 83, 90].
Recent statistics from Akamai [3, 4] show that the number of daily Web application attacks grew by more than 200% from December 2017 to October 2019, and millions of Web application attacks occurred on a daily basis in 2020 as well.

There are two broad classes of application-layer DDoS attacks: flash-crowd attacks [149], which send high quantities of legitimate requests to the target, and exploit-based attacks, which exploit some vulnerability at the target application to bring it down with a very low request rate. This chapter focuses on identifying and blocking exploit-based attacks (exDoS for short).

ExDoS attacks require very few requests to completely exhaust the application's resources and deny service to legitimate clients. In some cases the vulnerability in the application can be patched. For example, the application may use a weak hash table implementation to store user input, and the attack sends carefully crafted inputs that hash into the same slot. This leads to hash table collisions and slows down the application. Replacing the hash table implementation with a more secure one will patch the vulnerability. A short sketch of this kind of collision-induced slowdown follows.
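To make the hash-collision example concrete, the following Python sketch shows how an attacker who knows a weak, predictable hash function can generate arbitrarily many colliding keys. The hash function, table and key-generation scheme are hypothetical illustrations, not taken from any specific application.

import time

BUCKETS = 1024

def weak_hash(key):
    # A toy, predictable hash: the sum of character codes. Any permutation
    # of the same characters collides, so an attacker can generate
    # unlimited colliding keys.
    return sum(map(ord, key)) % BUCKETS

table = [[] for _ in range(BUCKETS)]

def insert(key, value):
    bucket = table[weak_hash(key)]
    for entry in bucket:       # linear scan; degenerates under collisions
        if entry[0] == key:
            entry[1] = value
            return
    bucket.append([key, value])

# All of these distinct keys have the same character multiset, hence the
# same weak_hash value; n inserts then cost O(n^2) comparisons in total.
colliding = ["ab" * i + "ba" * (1000 - i) for i in range(1000)]
start = time.time()
for k in colliding:
    insert(k, 1)
print("%.2f s for %d colliding inserts" % (time.time() - start, len(colliding)))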
In other cases, the vulnerability is simply in the assumption of how an application will use a scarce resource. For example, operating systems support a limited number of simultaneously open sockets, on the order of several thousand. When a new client request arrives, the server creates a dedicated socket to process it. For legitimate requests this approach works well, since client/server connections are short-lived and the socket is quickly freed. The Slowloris [173] exDoS attack, however, leads to a prolonged client/server communication and quickly depletes all available sockets. Since there are many applications, with known and unknown vulnerabilities, we cannot rely on patching alone to address exDoS. For similar reasons, defenses against one variant of exDoS (e.g., the approaches of Fitri et al. [58] and Choi et al. [32] defend only against Slowloris) cannot fully address the problem.

We propose Leader, a novel application-agnostic and attack-agnostic defense against exDoS. The novelty of our approach lies in Leader's monitoring of resource use patterns, which we call connection life stages. These connection life stages are built from multiple, complementary observations collected at the (1) network level and (2) OS level as each external client request is handled. Those observations are further linked to application-level information, identifying the application's process and thread that process the external client request.

Leader monitors all external requests for running services, and leverages connection life stages to build a fine-grained pattern of resource consumption by each service as it processes each request. During baseline operation (in absence of attacks), Leader groups these patterns per application, and uses elliptic envelope to build the application profile – a model of legitimate resource usage patterns by the external requests sent by the application's legitimate clients. Afterwards, Leader performs classification on each external request, using the corresponding application profile to detect requests that consume resources in an abnormal manner. Sources of these requests (IP addresses) are blocked to mitigate the attack. Because Leader monitors all services and all resources, its design is generic enough to protect various applications against various exDoS variants.

In this chapter, we focus on one popular application – Web servers. We narrowed our focus to one application so we could design realistic evaluation scenarios, which include multiple application implementations, realistic content served by the application, legitimate users' interaction with the application and a variety of exDoS attacks. Leader's design is application-agnostic. We leave exploration of Leader's use with applications other than Web servers for future work.

We evaluate Leader on the Emulab testbed [169] using two popular Web server applications – apache2 and nginx – and a popular Web framework, Flask [59]. In experiments our servers serve copyright-free content: Imgur and Wikipedia. We add some other Web pages with known vulnerabilities, to make the entire service vulnerable to certain variants of exDoS attacks. We generate realistic legitimate and attack traffic and evaluate against six different exDoS attacks: the Slowloris attack [173], the Hash Collision Attack [54], the Regular Expression Denial of Service Attack (ReDoS) [152], the attack using preg_replace() PHP Function Exploitation (PHPEx) [163], Infinite recursive calls denial of service (IRC) [66] and the Maliciously Crafted URL Attack (MCU).

Leader accurately identifies more than 99.1% of attacker IP addresses, and more than 99.6% of legitimate IP addresses. On average, Leader can successfully identify and block an attacker after about 6 requests. Leader has a minimal run-time overhead, and adds at most 0.5% to Web request processing time. Compared to related work – Rampart [103] and Finelame [47] – Leader does not require any modification of the source code of applications, and it achieves comparable or better classification accuracy on a wider range of attacks and server applications. Thus, Leader offers superior defense against exDoS. Our code and data are available at [89].

3.2 ExDoS Attacks

Exploit-based denial-of-service attacks (exDoS) are challenging for defenses, because they exploit vulnerabilities in the target application's design or implementation. They craft legitimate-looking application requests, and are often effective at very low rates (e.g., 100s to 1000s of requests per second). Each attack variant exploits a different vulnerability through a completely different mechanism. As there are many target applications and potential vulnerabilities, it is hard to handle exDoS in a scalable manner.

call                  sample legitimate connection                  sample exDoS attack connection
                      dur       #calls  mem   CPU cyc.  pf  fd     dur            #calls  mem    CPU cyc.  pf  fd
SyS_getsockname       6.5µs     1       0KB   0.01M     0   0      16µs           1       0KB    0.01M     0   0
sock_recvmsg          789µs     4       0KB   0.1M      0   1      22,939µs       295     0KB    44M       1   1
sock_read_iter        34µs      4       1KB   0.03M     0   2      8,750µs        295     16KB   15M       0   1
sock_sendmsg          415µs     2       1KB   0.1M      0   1      752µs          2       1KB    0.1M      0   0
sock_write_iter       9.8µs     1       1KB   0.01M     0   1      32µs           1       1KB    0.01M     0   1
sock_poll             2,491µs   3       0KB   3M        0   0      11,073,328µs   97      0KB    55M       0   0
sockfd_lookup_light   53µs      3       0KB   0.01M     0   0      120µs          3       0KB    0.01M     0   0
Sys_shutdown          62µs      1       0KB   0.01M     0   0      101µs          1       0KB    0.01M     0   0
Table 3.1: Resource usage by the different stages of a sample legitimate and a sample attack connection in Figure 3.1.

We observe that many exDoS attacks consume the resources of the target application or the underlying operating system in a manner that is different from legitimate users. For example, legitimate Web clients usually send their request to the Web server in a single packet, and then receive a reply. In a Slowloris attack [173], a variant of exDoS, the attacker starts but never finishes sending the Web request, as sketched below. This ties up a network socket for a long time, until all network sockets on the target server are depleted, which leads to denial of service.
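A minimal Python sketch of the Slowloris mechanism is shown below, for illustration of the resource tie-up only; it assumes a test server you control. Real tools such as PySlowLoris (used in our evaluation, Section 3.4.2) open many such connections in parallel.

import socket
import time

def slowloris_connection(host, port=80):
    # Open a connection and begin, but never finish, an HTTP request.
    s = socket.create_connection((host, port))
    s.send(b"GET / HTTP/1.1\r\nHost: " + host.encode() + b"\r\n")
    while True:
        # Trickle one header at a time; the server keeps the socket open,
        # waiting for the blank line that terminates the header block.
        s.send(b"X-a: b\r\n")
        time.sleep(10)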
Similarly, when legitimate clients access a PHP page, they usually provide inputs that take a moderate time to process at the server. Conversely, attackers will craft requests to a vulnerable PHP page that lead to lengthy processing or even an infinite loop [152]. This insight guides our approach: to model how legitimate clients use server resources in a fine-grained manner, and to detect attackers as clients whose resource use departs from this model.

3.3 Leader Design and Implementation

This section gives an overview of Leader's design and implementation. Our design attempts to be generic and application-agnostic, while our implementation focuses specifically on protecting Web servers from exDoS attacks.

3.3.1 Overview

Leader runs on the server that is being protected from exDoS attacks, and performs three distinct functionalities: (1) behavior profiling, (2) attack connection identification, and (3) attack mitigation.

Leader has two running modes: learning and classification. In the learning mode, Leader performs behavior profiling to learn application profiles. Each application profile models how legitimate users consume resources when engaging with the given application. This learning occurs offline, using traces of the application's operation in the absence of attack traffic. In general, the models of per-request resource consumption should not change when the server's content changes, but only if the nature of its service changes. For example, if an application server updates its content daily, it need not retrain Leader's models. However, if an application server used to serve static content and now starts serving dynamic content, or if it used to support short messages between users but now supports video messages too, Leader's models should be retrained to capture the new usage patterns. During learning, Leader should ingest traces over many time periods to ensure that we learn diverse behaviors of legitimate users.

When Leader's models are learned, it switches to classification mode, where it continuously performs attack connection identification. For each external request, Leader compares the resource use patterns of the application that processes the request against its application profile, and classifies the request as legitimate or attack. Sources of attack requests are forwarded to the mitigation module for blocking.

3.3.1.1 Behavior Profiling

In absence of attacks, Leader runs in learning mode, building models of how legitimate users consume each application's resources. Leader observes the process of serving each connection as a sequence of connection life stages. A connection life stage is defined as specific resource usage (i.e., time, memory, CPU cycles, page faults, and open file descriptors), quantified by the amount of resource used, as a result of serving the incoming request. Each connection life stage relates to a specific function call in net/socket.c, issued in the process of serving the request. Leader employs machine learning to build an aggregate baseline model for each application and for each stage of the connection life stage sequence.

3.3.1.2 Attack Request Identification

Once it learns the baseline models, Leader switches to running in the classification mode.
During classification Leader works on live data, applying its models to identify attackers.

We considered running Leader in the learning mode until an attack is detected. Such on-demand engagement of classification would limit any redundant misclassification of legitimate traffic. But continuous classification has the advantage of early attack detection, even in the case of stealthy attacks. Extremely low-rate attacks, such as PHP Infinite recursive calls denial of service [66] (discussed in detail in Section 3.4.1), can create load on a Web server equivalent to 0.53 million requests, using just a single request. Running Leader in continuous classification mode enables us to identify and block the attacker after even a single malicious request.

3.3.1.3 Attack Mitigation

Sources whose requests consume resources in a way that deviates from Leader's models are identified as attackers. In our current prototype, we block the IP addresses of the attackers. However, one could deploy one or more alternative mitigation approaches, such as: (1) derivation of payload signatures from attack connections and their use in a firewall with deep packet inspection, (2) connection termination (e.g., via TCP RST), (3) dynamic resource replication, (4) program patching and algorithm modification. We leave exploration of other mitigation approaches as future work.

3.3.2 Assumptions and Limitations

As discussed in Section 3.2, Leader focuses on handling many variants of exDoS attacks. This includes attacks that exploit vulnerabilities at application, operating system, and protocol levels. Leader does not handle flash-crowd attacks [149] that simply send more requests per second than the server can handle, but do not consume resources in a manner that differs from legitimate users'.

We consider that a powerful remote attacker (i) can send arbitrary requests to a service hosting a vulnerable application, and (ii) is aware of the application's structure and vulnerabilities. Attackers can target CPU, memory, file descriptors, page faults or other limited resources in the host system.

We assume that each incoming connection carries one or more application requests. Our connection life stages model how all requests on the given connection are being processed by the application, and how they consume resources on the server. For simplicity, we use the terms "request" and "connection" interchangeably in the rest of the chapter.

Leader attempts to quantify resource usage per incoming connection to the server by measuring the resource usage of processes and/or threads that serve requests received on this connection. Typically, application servers either spawn off a process or start a new thread to handle each incoming request. This is also the case for the Web server applications we tested – apache2, nginx and a web server written in the Flask framework. It is possible to design applications so that they process multiple incoming requests in a single thread. This would make it harder to tease apart which resource consumption is due to which connection. We leave handling of this shared processing scenario for future work.

We assume an attacker model where remote attackers cannot overwrite system binaries or modify the kernel of the system running Leader, i.e., we assume that the Leader process is always trustworthy and engaged.

3.3.3 Implementation

In this section we describe how we implemented Leader for protection of Web servers against exDoS attacks.
Leader contains five modules, illustrated in Figure 3.2, that are each responsible for a different functionality in exDoS attack mitigation. During the learning phase, the Prober Module collects the data needed for learning and passes it to the Builder Module. The Builder module keeps building connection life stage sequences. Each sequence is a snapshot of the associated request's resource consumption up to the given moment in time. These snapshots are sent to the Learner module to generate the baseline model of legitimate client behavior. Together, the Prober, Builder and Learner modules are engaged in behavior profiling. The baseline model is used by the Scoring module in the classification phase to classify connections as legitimate or attack. Finally, source addresses of connections that are classified as attack are sent to the Mitigation module to mitigate the attack. We explain each of these modules in more detail next.

Figure 3.1: Sample legitimate and attack connection life stages. Table 3.1 shows the time of each stage, how often it is visited in the sequence, the memory used, the number of CPU cycles elapsed, the number of page faults that occurred and the number of open file descriptors for each stage. The red portion shows the sequence of calls in an exDoS attack that are different from the calls observed on a legitimate connection. Different exDoS attacks may follow different sequences and consume different amounts of resources at different calls.

3.3.3.1 Prober Module

This module uses SystemTap [144] to trace and log in the kernel, in real time, all function entries and exits in the socket library (net/socket.c). It also records the resource usage during each of these function calls. SystemTap dynamically inserts our selected probes into kernel code using Kprobes, and it is a reliable, widely used profiling tool. Another option would have been to use the extended Berkeley Packet Filter (eBPF) [162], which is lighter-weight and safer to use. We leave exploration of eBPF for future work, but note that newer versions of SystemTap can internally use eBPF [127]. Thus we expect that our implementation could easily switch to eBPF.

We use SystemTap to probe the application's processing of incoming service requests at the function call level. This does not require any modifications to applications, but only to the server's operating system to install the loadable kernel module. We trap calls to net/socket.c, which is involved in handling of incoming service requests for any application. SystemTap provides the functionality to log the function call, entry and exit timestamps in microsecond precision, and the thread and process identifiers (the task group ID of the current thread) associated with the function call. For each function call, SystemTap also allows us to log the number of CPU cycles, page faults that occurred, file descriptors opened, the amount of memory used, as well as the associated source IP address and source port number. One can use log rotation techniques [171] to limit the size of the real-time SystemTap logs.

3.3.3.2 Builder Module

Using the data collected by the Prober module, the Builder module builds life stages for the given incoming connection. In our prototype, we use the tuple <thread id, process id> to uniquely identify a given external (incoming) connection to the application at a given time.
We later link this tuple to the source IP address and source port of the external client, which we obtain from the arguments of the sock_recvmsg call. We use the process table to map the <process id> to the application name. A connection's life stage corresponds to one function call of net/socket.c and the resource usage (CPU cycles, page faults, file descriptors and memory) measured by SystemTap for handling this call. As time progresses, recent life stages are linked to the preceding ones. Thus, each life stage pattern is actually a snapshot of the function call sequence and resource usages from the start of the given external connection up to the given moment in time. The initial call to sockfd_lookup_light marks the start of the sequence, and it ends eventually with the call to SyS_shutdown or sock_destroy_inode or __sock_release or sock_close.

Figure 3.2: Leader's operation: learning and classification.

If serving a request requires access to the database, cache, and/or any other internal services, the internal connection's life stage sequence is integrated into the main external connection's life stage sequence. We can link all these connections together if they are served by the same process/thread that accepted the external connection, or if the accept process/thread spawned the processes/threads for the internal connections. However, we currently cannot account for internal resource consumption that occurs when the accept process/thread passes jobs via queues to another, already running internal thread. We leave this for future work.

After each second, we snapshot each connection's life stage sequence. If Leader is in the learning mode, we pass this snapshot to the Learner module to create the baseline model. If Leader is in the classification mode, we pass the snapshot to the Scoring module. Figure 3.1 illustrates the connection life stages corresponding to the serving of one sample legitimate (green line) and one sample attack (red line) request. For the given incoming request, the resource usage for each life stage at that time, by the given process and/or thread serving that request, is shown in Table 3.1. The portion highlighted in red color shows the portion of a connection sequence that differs between the attack connection and the legitimate connection. In this example, the attack connection loops many times between the calls sock_read_iter and sock_recvmsg. It occasionally departs from sock_read_iter and cycles through sock_sendmsg, sock_write_iter, and sock_poll, then goes back to looping between sock_read_iter and sock_recvmsg. This pattern produces both abnormal time spent in a given life stage and an abnormal number of visits to a given life stage. A sketch of the Builder's snapshot bookkeeping is shown below.
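The following Python sketch illustrates the kind of bookkeeping the Builder performs: it groups parsed probe records by <thread id, process id> and emits per-connection snapshots of cumulative resource use per function call. The record format and field names are simplified assumptions, not Leader's actual data structures.

from collections import defaultdict

# One parsed probe record, e.g. from a SystemTap log line (fields assumed):
# {"tid": 7, "pid": 3, "call": "sock_recvmsg", "dur_us": 789,
#  "mem_kb": 0, "cpu_cycles": 100000, "page_faults": 0, "fds": 1}

class LifeStageBuilder:
    def __init__(self):
        # (tid, pid) -> per-call cumulative counters for that connection
        self.connections = defaultdict(
            lambda: defaultdict(lambda: {"dur_us": 0, "mem_kb": 0,
                                         "cpu_cycles": 0, "page_faults": 0,
                                         "fds": 0, "visits": 0}))

    def ingest(self, rec):
        stage = self.connections[(rec["tid"], rec["pid"])][rec["call"]]
        stage["visits"] += 1                      # how often this stage is hit
        for field in ("dur_us", "mem_kb", "cpu_cycles", "page_faults", "fds"):
            stage[field] += rec[field]            # cumulative resource usage

    def snapshot(self):
        # Called once per second: one snapshot per active connection, covering
        # everything from the start of the connection up to this moment.
        return {conn: {call: dict(stage) for call, stage in stages.items()}
                for conn, stages in self.connections.items()}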
Figure 3.3: Experiment topology in Emulab: 4 physical attackers (up to 1,000 virtual attackers), 1 legitimate client (up to 100 virtual clients) and 3 servers on a LAN.

3.3.3.3 Learner Module

This module deploys machine learning to learn separate baseline models for the following resources spent per connection life stage: time, memory, CPU cycles, the number of page faults and the number of open file descriptors. In addition to these models, Learner also builds a model of specific life stage patterns. This model includes the number of visits to each life stage and the presence or absence of some specific life stage transitions.

Learner first groups all the snapshots for a given application. We mine the following features from each snapshot to build our baseline model: (a) total time elapsed (in microsecond precision) in each of the following function calls: sock_write_iter, sock_read_iter, sock_recvmsg and sock_poll; (b) total memory consumed in each of the following function calls: sock_write_iter, sock_read_iter, sock_recvmsg and sock_poll; (c) total CPU cycles elapsed in each of the following function calls: sock_write_iter, sock_read_iter, sock_recvmsg and sock_poll; (d) total page faults that occurred in each of the following function calls: sock_write_iter, sock_read_iter, sock_recvmsg and sock_poll; (e) total number of open file descriptors in each of the following function calls: sock_write_iter, sock_read_iter, sock_recvmsg and sock_poll; (f) the number of calls to the functions sock_write_iter, sock_read_iter, sock_recvmsg and sock_poll, and the presence or absence of the call sequences (s1) sock_read_iter → sock_sendmsg and (s2) sock_read_iter → sock_poll. We learn separate models for (a)–(f) and later utilize these models for classification. While it is possible to use (a)–(f) together to learn a single baseline model, that leads to overfitting, because of high dimensionality [170].

We chose to track the highlighted function calls (see Table 3.2) because they are involved in starting the connection, receiving service requests and sending the replies back to the client. When a service request ties up resources at the server, this becomes visible in the time elapsed and the resources consumed in these selected function calls, or in the increased frequency of these function calls. We also chose to track the presence and absence of two key life stage transitions, sock_read_iter → sock_sendmsg and sock_read_iter → sock_poll, because these transitions were heavily present in the processing of attacks we used in evaluation. We plan to explore tracking of other transitions in our future work.

We standardize our features (by removing the mean and scaling to unit variance). We then apply machine learning to learn the baseline model. We focus on single-class learning, because we wanted to model only legitimate traffic. Modeling only legitimate traffic enables Leader to be attack-agnostic and potentially effective against a variety of exDoS attacks. We evaluated 1-class SVM [92] and elliptic envelope [132, 130] for our model building, and elliptic envelope had superior performance (see Appendix, Table 4.1).

Elliptic envelope [132, 130] is an outlier detection approach, which models target features as Gaussian distributions. It gives a robust covariance estimate of the data. It defines the shape of the data, creating a frontier that delimits the contour, which is elliptical in shape. Basically, it fits the tightest Gaussian (smallest-volume ellipsoid) that it can over the data points, while discarding some fixed fraction of outlier points specified by the user [129]. This Gaussian forms a decision boundary. The trained model stores the estimated covariance and the decision boundaries for each feature. We verified that our features follow a half-normal distribution, which is a subclass of the Gaussian distribution.
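As an illustration of this learning step, the sketch below uses scikit-learn's EllipticEnvelope (the elliptic envelope implementation whose contamination, assume_centered and support_fraction parameters are discussed in Section 3.7), with a synthetic stand-in for one feature group. The feature matrix here is random placeholder data; in Leader the rows come from the Builder's snapshots.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.covariance import EllipticEnvelope

# Placeholder for one feature group, e.g. (a): total time spent in
# sock_write_iter, sock_read_iter, sock_recvmsg and sock_poll.
# Half-normal stand-in data; real rows come from legitimate-traffic snapshots.
X_train = np.abs(np.random.randn(10000, 4))

# Standardize features: remove the mean and scale to unit variance.
scaler = StandardScaler().fit(X_train)

# One single-class model per feature group (a)-(f); only one is shown.
model = EllipticEnvelope(contamination=0.002).fit(scaler.transform(X_train))

# Classification: +1 means the snapshot fits the legitimate model, -1 is an
# outlier. The decision uses the Mahalanobis distance to the fitted Gaussian.
snapshot = scaler.transform(np.abs(np.random.randn(1, 4)))
vote_is_attack = (model.predict(snapshot)[0] == -1)
print(vote_is_attack, model.mahalanobis(snapshot))

The individual models' outputs then feed the Random Forest ensemble described next.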
In the current implementation, the duration of our learning stage depends on the size of our training set. In an actual deployment, learning should end when the model has stabilized, and no further changes are detected with new input samples. Section 4.3.7 lists the amount of time it takes to train our model for different sizes of the training set.

candidate function calls: sock_lookup_light, sock_alloc_inode, sock_alloc, sock_poll, sock_alloc_file, move_addr_to_user, SYSC_getsockname, SyS_accept4, SYSC_accept4, SyS_getsockname, sock_write_iter, sock_sendmsg, sock_read_iter, sock_recvmsg, __sock_release, Sys_shutdown, sock_close, sock_destroy_inode
Table 3.2: Leader tracks the highlighted function calls or uses them to identify a sequence of calls.

To further improve the classification accuracy, we perform ensemble learning [184] on the classification outputs of the individual elliptic envelope models. During the learning phase, we learn a Random Forest ensemble model, using a well-balanced mix of known legitimate and attack connections. During the classification phase, the classified outputs of the individual elliptic envelope models are fed as input to the pre-trained Random Forest ensemble model to decide if a connection is legitimate or attack.

3.3.3.4 Scoring Module

After learning, Leader switches to the classification mode. In classification mode, the Scoring module uses elliptic envelope to classify each connection as either a legitimate connection or an attack connection. Elliptic envelope uses the Mahalanobis distance to identify outliers, which has been widely used in the literature for identifying outliers in multivariate datasets [46]. During classification, elliptic envelope computes the Mahalanobis distance for each input, using the covariance learned by the model. If this distance lies within the decision boundaries, the input is considered to be in line with the model. Otherwise it is considered a possible attack connection. The same process is repeated for all the individual models that we learned during the learning phase. Finally, using the pre-trained Random Forest ensemble model, Leader generates the aggregated final classification decision. Each connection is classified every second, so as to terminate attack connections as soon as possible. We considered and evaluated two designs for attack classification:

• Liberal design assumes that each anomalous connection is an attack connection. This approach ensures fast decision time, but if there are any errors in classification, a legitimate source may become blocked by the module.

• Conservative design requires that a connection receives some portion of anomalous classifications before it is regarded as attack. This approach should reduce misclassification of legitimate connections, at the expense of longer decision time.

When the Scoring module identifies a connection as attack, it forwards its source IP address to the Mitigation module.

3.3.3.5 Mitigation Module

Our current implementation of the mitigation module adds each IP it receives from the Scoring module to the IP blocklist. We use the ipset utility in the Linux kernel to implement the blocklist, as sketched below. Blocking rules remain in place for a custom duration (e.g., 10 minutes), which can be configured by the system administrator. Long attacks would thus lead to cyclical blocking of attack IP addresses.
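A minimal sketch of such an ipset-based blocklist is shown below, using standard ipset and iptables commands invoked from Python; the set name and timeout are hypothetical configuration choices, not Leader's actual values.

import subprocess

BLOCKLIST = "leader_blocklist"   # hypothetical set name
BLOCK_SECONDS = 600              # e.g., 10-minute blocking rules

def setup_blocklist():
    # Create a hash:ip set whose entries expire automatically, and hook it
    # into iptables so that packets from listed sources are dropped.
    subprocess.run(["ipset", "create", BLOCKLIST, "hash:ip",
                    "timeout", str(BLOCK_SECONDS)], check=False)
    subprocess.run(["iptables", "-I", "INPUT", "-m", "set",
                    "--match-set", BLOCKLIST, "src", "-j", "DROP"],
                   check=True)

def block(ip):
    # "-exist" refreshes the timeout if the IP is already listed, which
    # yields the cyclical blocking behavior described above.
    subprocess.run(["ipset", "-exist", "add", BLOCKLIST, ip], check=True)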
3.3.4 Deployment Considerations

We now discuss some practical deployment considerations.

User identification. While we use IP addresses for identifying attack sources, using Web cookies could improve classification accuracy, because it would more accurately identify users with dynamically assigned IP addresses, or users who may share an IP address with others (e.g., behind a network address translator). This optimization is specific to Web servers, as other applications may not have an equivalent of a cookie. We leave this optimization for future work.

Adversarial Data. Leader requires training data of legitimate connections and needs to be trained per application server. Leader requires training data that covers periods of both low and high server load. This is because many servers may go through a different sequence of system calls when the load is high, engaging additional optimizations. High load may also slow down an application's processing speed, since it may have to wait for resources to be made available in order to finish its processing. Such disturbances may also lead the legitimate connections' resource consumption patterns to deviate, which is important to capture in the models. The richer the data we have for training, the better we can protect legitimate connections from misclassification (Section 4.3.7). Attackers could somehow compromise the training process and introduce adversarial data to influence the learned models. For example, if low-rate attacks were introduced during training, this could make our models unable to accurately detect attacks during classification. One way to handle this threat is to sample training data at random, over multiple days or weeks. Another way is to exclude outliers, by setting an outlier fraction (contamination parameter) value while training the baseline model. A third approach to handle this threat is to use techniques such as machine unlearning [27]. We evaluate the effect of adversarial data injection for an attack scenario in Section 4.3.7.

Deployment cost and complexity. We envision that Leader would be deployed at a Web server, as a standalone application. Leader could run at the server at all times, since it is non-intrusive and not inline with traffic.

Web servers that use a proxy or a load balancer. Web servers that use a proxy would need to run Leader on all the backend servers, as well as on the proxy. The proxy would have to be trained separately from the backend servers, since they experience different resource use patterns when handling each connection. On identification of a malicious connection at the backend level, the backend server would have to signal to the proxy to mitigate the attack (e.g., block the attack source). Web servers that use a load balancer only need to run the Mitigation module on it, and the Scoring modules from backend servers need to communicate attack sources to this Mitigation module.

Windows-based servers. Currently, our prototype supports only Linux and Unix-like operating systems, because our implementation uses Linux-specific tools (SystemTap, ipset, etc.). However, Leader can be ported to other operating systems. We leave this for future work.

Attackers spoofing IP addresses. We focus on Web server exDoS attacks. The servers use TCP, a successful 3-way handshake is required for payload exchange, and exDoS attacks require payload to exploit a vulnerability. Thus IP spoofing cannot be used in exDoS attacks on Web servers.

Attackers use many sources.
Distribution of attack traffic does not affect detection of exDoS attacks, because our models capture per-connection behavior, and each exDoS connection will use resources in an abnormal manner. Distribution could affect mitigation if the attacker can easily move on to new sources after Leader blocks the old ones. While this is possible, it is not trivial for the attacker. New sources have to be acquired (e.g., rented), set up and synchronized. Depending on the attack's duration, and because our detection delay is low, the attacker would use only a small fraction of their botnet at each time period, while spending effort and money to maintain a large infrastructure. A target protected by Leader would be far less attractive for the attacker than a target protected by conventional DDoS defenses.

Sharing the same IP. If an attacker shares an IP with a legitimate user of the target server, which we expect to be rare, both would be blocked. This is unfortunate, but necessary to protect the server.

Shared environment and network delays. Each application runs as a separate process/thread, and we monitor the time this process/thread spends in a given function call, while it has control of the CPU. Thus we measure an application's system time (not real time) to handle a given processing task. This time does not include network delays, nor is it impacted by other applications. For the same reason, an attacker cannot influence the measurement.

3.4 Evaluation

In this section we describe our evaluation setup and our experiments, and we demonstrate how Leader defends against exDoS attacks.

3.4.1 Evaluation Setup

We mirror dynamic content for two popular Web sites, for which the server setup is publicly available and content is copyright-free: Wikipedia and Imgur. All content is generated dynamically by pulling page information from the server's database, using the original site's scripts. We download each full site and deploy the site's original configuration and scripts on our server within the Emulab testbed [169]. We crawl the full Web sites using a Selenium-based [133] crawler, to learn all possible legitimate requests for these Web sites. For dynamic pages, we analyze what kind of arguments they require (e.g., string vs. integer) and fuzz the inputs during crawling. Additionally, we engage 350 users from the Mechanical Turk service to browse our Web sites, and we collect their requests to further enrich the legitimate request patterns. We have a total of over 300K legitimate requests for Imgur and over 500K legitimate requests for Wikipedia. We utilize 70% of the data for training and 30% of the data for testing.

We generate legitimate requests by replaying these requests in a controlled environment (the Emulab testbed [169]), and launch attacks on our servers on the testbed. Ideally, we would have engaged real humans to perform live interaction with the server during the attacks. However, because we needed many trials, continuous human engagement would have been prohibitively expensive, and it would have been hard to achieve repeatability.

In order to replicate non-copyrighted and diverse content, we selected two sites: Imgur, a picture-rich Web site, and Wikipedia, a Web site with textual content. To make the sites vulnerable to some variants of exDoS attacks, we add five Web pages with vulnerabilities: one vulnerable page for each of the attack variants except the Slowloris attack. Our selection of Web sites to replicate gave us not only content diversity, but Web server diversity as well.
Imgur runs on the apache2 HTTP Server. Wikipedia could handle more requests per second with nginx than with apache2, so we used nginx in our tests. We now explain the details of our evaluation setup, and discuss limitations in Section 3.4.3.

Human User Data. We obtained data on how human users engage with our mirrored servers with the help of Amazon Mechanical Turk workers. This study was reviewed and approved by our IRB. In the study we presented an information sheet to each worker, paired the worker with a server at random, and asked the worker to browse naturally. We intentionally did not create specific tasks for workers, because we wanted them to follow their interests and thus produce realistic data. To keep engagement high, and discourage workers who just click through as fast as possible, we asked each worker to rate each page's loading speed on a 1–5 point scale. These ratings were not used in our study. Each worker was compensated $0.20.

Legitimate Traffic Generator. During each experiment, we replay the 30% of the legitimate requests that we set aside for testing, which we had collected using (a) the Selenium-based [133] crawler and (b) Amazon Mechanical Turk workers. We wrote a custom traffic generator, which extracts URL sequences from our testing dataset, and then chooses when to start each full-length sequence depending on the desired number of active users. For example, if we want to have 10 active users at all times, our tool will extract 10 full-length sequences from our dataset, and replay each using a different source IP address. We adjust the IP addresses assigned to machines in our experiments, and the routing, to ensure that each machine can maintain two-way communication with each server. Traffic is replayed at the application level; thus any packet drops are handled properly by the underlying TCP protocol. When a sequence completes, another sequence is selected and another IP address becomes active. In our experiments, we maintain 100 active, simultaneous legitimate clients throughout each run, which send requests at the rate of 1 request per second. A sketch of this replay logic is shown below.
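The sketch below illustrates the replay logic just described; it is a simplified stand-in for our custom generator. The requests library, the fixed rate, and the way a worker picks its next sequence are assumptions for illustration; binding each worker to a distinct source IP is platform-specific and omitted.

import itertools
import random
import threading
import time

import requests  # third-party HTTP client, used here for illustration

ACTIVE_USERS = 100   # simultaneous legitimate clients
RATE = 1.0           # requests per second per client

def replay(sequence):
    # Replay one user's full-length URL sequence at the configured rate.
    for url in sequence:
        try:
            requests.get(url, timeout=10)
        except requests.RequestException:
            pass  # drops and timeouts are tolerated, as with real clients
        time.sleep(1.0 / RATE)

def run(sequences, ip_pool):
    ips = itertools.cycle(ip_pool)
    def worker():
        while True:
            source_ip = next(ips)  # placeholder: binding the socket to
                                   # source_ip is omitted in this sketch
            replay(random.choice(sequences))
    for _ in range(ACTIVE_USERS):
        threading.Thread(target=worker, daemon=True).start()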
Experiment Topology and Scenarios. Our experimental topology is shown in Figure 4.5. It has 1 physical attack node (that can emulate up to 1,000 virtual attackers), 1 legitimate client node (emulating 100 legitimate users at a time) and 1 node each for the mirrored servers Imgur and Wikipedia. All nodes were of type d820 on Emulab, with 32 cores and the Ubuntu 18.04 OS. We further fine-tuned the nodes to maximize the request rate that each client could generate, and to maximize the request rate that our servers could handle.

Rampart [103]. Applications tested: Apache2 on Drupal, Wordpress. Machines used: 2 (1 for the server; 1 for generating both legitimate and attack traffic). Legitimate requests: crawled all the endpoints of each web application for replay, combined with human-generated requests; legitimate traffic involves up to 128 user instances, with at least a 0.1 s pause between any 2 requests. Attacks: tested attack variants XML-RPC and PHPass; maintains up to 30 attacker sessions, at 0.17 to 1 requests/second.

Finelame [47]. Applications tested: Apache2, Node.js, DeDoS. Machines used: 2 (1 for the server; 1 for generating both legitimate and attack traffic). Legitimate requests: generated using Tsung [113] (250 to 750 requests/second). Attacks: tested attack variants ReDoS, Billion Laughs and SlowLoris; attack rate 1 to 15 malicious requests/second.

Cogo [52]. Applications tested: Apache, OpenSIPS. Machines used: 3 for Apache (1 for the server; 1 for generating legitimate and 1 for generating attack traffic). Legitimate requests: generated using Choi-Limb [31] (100 benign clients, up to 100 requests/second). Attacks: tested attack variants for Apache: Slowloris and Slowread; attack rate 1 to 100 connections/second.

Leader. Applications tested: Apache2 on the mirrored server Imgur, Nginx on the mirrored server Wikipedia, and a Flask application. Machines used: 3 (1 for the server; 1 for generating legitimate and 1 for generating attack traffic); we use multiple IP addresses on the same physical node to emulate multiple attackers and legitimate clients. Legitimate requests: crawled the full Web sites of each web application to collect legitimate requests, combined with human-generated requests fetched using Amazon Mechanical Turk workers; 100 active, simultaneous legitimate clients send requests at the rate of 1 request/second. Attacks: tested attack variants SL, HC, Re-DoS, PHPEx, IRC and MCU; attack rate 100 requests/second.

Table 3.3: Comparing Leader's evaluation setup with closely related prior work.

We evaluate Leader by launching various exDoS attacks during classification. Each virtual attacker (one distinct source IP) sends attack traffic at a low rate of one request per second. We use an aggregate attack request rate similar to the aggregate legitimate request rate, i.e., close to 100 requests per second. This means that 100 virtual attackers are active at the same time in the experiment.

Attack Variants. We investigate the following exDoS attacks:

(1) Slowloris attack (SL) [173]: This attack uses partial HTTP requests to open connections between the attacker and the targeted Web server. The attacker keeps those connections open for as long as possible, thereby overusing socket descriptors at the target, which makes it unable to serve legitimate clients.

(2) Hash Collision Attack (HC) [54]: A hash table is used to store a series of keys and values, and a hash function is used to convert each key to a bucket where the value will be stored. When the function maps two different keys to the same bucket, there is a collision, which is usually handled by employing a less efficient structure, such as double chaining or a linked list, to store multiple key/value pairs. Buckets with many collisions experience much slower lookup, insertion and removal of values. If a Web application uses a predictable (weak) hash function and derives keys from user input, attackers can craft Web requests with colliding keys, thus dramatically slowing down the server.

(3) Regular Expression Denial of Service Attack (ReDoS) [152]: In a ReDoS, attackers force a situation where the regex evaluator, for example preg_match() for PHP, gets stuck evaluating a string and runs for a long time. The inordinately long run time is caused by backtracking as multiple matches are attempted. If a Web application employs regex matching on user input, and the attacker knows the specific regular expression used, they can craft inputs that take an inordinately long time to process, thus causing ReDoS.

(4) Attack using preg_replace() PHP Function Exploitation (PHPEx) [163]: The PHP function preg_replace() can lead to a remote code execution vulnerability if the Web application passes user input to preg_replace() and if that input includes executable PHP code. This vulnerability can be used to launch an exDoS attack, if user input includes many PHP function calls. PHP versions prior to version 5.5.0 have this vulnerability. In our experiments, we used PHP 5.3.29 to reproduce this vulnerability.
(5) Infinite recursive calls denial of service (IRC) [66]: Passing a PHP file as an argument to itself can in some cases lead to infinite recursive calls. If the Web application passes user input to a PHP file, input containing that file's name can trigger the IRC attack. For example, article.php?file=article.php is an illustration of an attack request. The Web application runs until the Web server hits the maximum execution time for a script and terminates.

(6) Maliciously Crafted URL Attack on a Flask Application (MCU): Flask [59] is a popular Python Web framework. It is a third-party Python library used for developing Web applications. We crafted an implementation of a vulnerable Web application in Flask to demonstrate that Leader supports a variety of Web server applications and frameworks. Our application is shown in Section 3.4.2.

3.4.2 Maliciously Crafted URL Attack on a Flask Application (MCU)

We crafted an implementation of a vulnerable Web application in Flask to demonstrate that Leader supports a variety of Web server applications and frameworks. Our application is shown below:

from flask import Flask, request, render_template_string, render_template

app = Flask(__name__)

@app.route('/')
def sample():
    person = {'name': "XYZ", 'details': "==dfdgg...cnJlZg=="}
    if request.args.get('name'):
        person['name'] = request.args.get('name')
    # User input is interpolated directly into the template string, so a
    # crafted 'name' argument is evaluated by the template engine.
    template = '''Details are: %s!''' % person['name']
    return render_template_string(template, person=person)

A legitimate user's request may look as follows:

Webpage.com?name=XYZ{{person.details}}

An attack request may look as follows:

Webpage.com?name=XYZ{{person.details}}{{person.details}}......{{person.details}}{{person.details}}

where {{person.details}} can be appended to the URL hundreds of times, until the limit on the maximum allowed length of a URL is reached. If {{person.details}} resolves to a large text or string, the call to render_template_string() will lead to hundreds of occurrences of {{person.details}} being resolved. The return value of the function will be hundreds of times larger than the return values of legitimate requests. If multiple concurrent malicious requests are made to the server, this can overload its outgoing bandwidth and deny service to legitimate clients.

Figure 3.4: Leader identifies all the aggressive attackers within seconds and blocks them, which allows legitimate traffic to recover and obtain good service again.

scenario   1-class SVM                       elliptic envelope
           TP      TN      FP     FN         TP      TN       FP      FN
SL         93.2%   92.4%   7.6%   6.8%       99.9%   99.4%    0.6%    0.1%
HC         93.6%   91.7%   8.3%   6.4%       99.9%   99.2%    0.8%    0.1%
Re-DoS     93.9%   92.1%   7.9%   6.1%       99.4%   98.1%    1.9%    0.6%
PHPEx      93.1%   92.4%   7.6%   6.9%       99.1%   97.5%    2.5%    0.9%
IRC        93.7%   92.1%   7.9%   6.3%       99.9%   96.9%    3.1%    0.1%
MCU        94.1%   92.5%   7.5%   5.9%       100%    99.95%   0.05%   0%
Table 3.4: Classification accuracy in the liberal design scenario for 1-class SVM and elliptic envelope.

Attack Traffic Generator. For different attack scenarios, we use different attack tools. We modified the source code of the tool PySlowLoris [139] to launch the Slowloris Attack. Similarly, we modified the source code of the tool php-dos-attack [102] to launch the Hash Collision Attack.
In each case we use multiple IP addresses on the same physical node to emulate multiple attackers. For other exDoS attacks, our attack traffic generator is a modified httperf [108] tool. We added the ability to choose source IPs from a pool, and to select requests for each IP from a given sequence, in a specified order. Finally, Leader's evaluation setup is comparable to the evaluation used for closely related prior work, shown in Table 3.3.

Figure 3.5: ROC curves using the percentage thresholds for different sequence lengths (true positive rate vs. false positive rate, for sequence lengths 3–7).

Figure 3.6: Effect of adversarial data injection for an MCU attack scenario, using a contamination parameter of 0.1.

Figure 3.7: The classification threshold values (percentage thresholds vs. sequence length) for the connections that are outliers for a given source IP, for different sequence lengths, that lead to more than 99% true positives and fewer than 0.5% false positives.

measure/scenario        liberal                                      conservative
                        SL     HC     Re-DoS  PHPEx  IRC    MCU      SL     HC     Re-DoS  PHPEx  IRC    MCU
true positive           99.9%  99.9%  99.4%   99.1%  99.9%  100%     99.9%  99.9%  99.4%   99.1%  99.9%  100%
true negative           99.4%  99.2%  98.1%   97.5%  96.9%  99.95%   100%   100%   100%    100%   99.8%  99.95%
false positive          0.6%   0.8%   1.9%    2.5%   3.1%   0.05%    0%     0%     0%      0%     0.2%   0.05%
false negative          0.1%   0.1%   0.6%    0.9%   0.1%   0%       0.1%   0.1%   0.6%    0.9%   0.1%   0%
att. req. before block  1.65   1.92   1.27    1.25   1.44   1.18     5.17   5.32   5.50    5.02   5.84   5.07
Table 3.5: Leader's classification accuracy, averaged over 10 trials, for the liberal and the conservative scenarios.

3.4.3 Limitations

There are several limitations of our evaluation scenarios. First, our topology is small and this limits the scale of our tests (e.g., the number of requests per second and how many separate IPs we can emulate). Since we focus on low-rate exDoS attacks, this limitation does not impact the realism of our evaluation.

Second, we replicate only two Web sites' contents and three Web server applications: Apache2, Nginx and a Flask-based application. Other sites may offer a variety of dynamic content and multimedia content to their users, which would lead to different models of resource usage. While we would have liked to replicate other popular sites, like Facebook or Google, this is impossible, as their source code is private, very complex and supported by large datacenters. Our evaluation setup is comparable to other published work on this topic (e.g., Rampart [103], Cogo [52] and Finelame [47]).

Third, a portion of our legitimate users' traffic comes from our Amazon MTurk study, and it may not be realistic. However, Leader's models do not rely on a user's request sequence, but on the diversity of processing times and system calls for the pages visited by legitimate users. In our study users explored many of the pages on the replicated sites, and thus provided a good blend of request variety. Our full crawl of the Web sites provided a comprehensive dataset to complement the Amazon MTurk dataset.

3.5 Results

We show results for Imgur running on apache2. ExDoS attacks on the Wikipedia server running on nginx show similar findings.
3.5.1 The liberal design scenario

In the liberal scenario, Leader has an aggregate accuracy of 99.1% to 100% in identifying the attacker IP addresses, and an aggregate accuracy of 96.9% to 99.98% in identifying the legitimate IP addresses, across the six exDoS attack scenarios used in our evaluation. We summarize the results in Table 3.5. On average, Leader can identify and block an attacker after 1–2 requests in the liberal design scenario. The attack variants ReDoS, PHPEx and IRC are more challenging for Leader, resulting in false positives (legitimate connections identified as attack) higher than 1%.

Figure 4.6 in the Appendix illustrates Leader's handling of the Slowloris attack. The figure shows: (a) the legitimate traffic (green), (b) the attack traffic going to the server (yellow), (c) the attack traffic after Leader's actions (purple) and (d) blocking of the IP addresses of the attackers over time. The incoming attack maintains about 1,000 connections per second, using 100 attackers (10 connections per attacker IP address). Leader identifies and successfully blocks 89% of the attackers within 6 seconds, restoring the server's ability to handle legitimate traffic. The attack is fully handled within 15 seconds. The average time to identify and block an attack IP address is 4.93 seconds. While most attack IPs are blocked after 1–2 requests, it takes some time for the processing of each request to progress through the connection life stage sequence into a state that is recognized as an outlier by our elliptic envelope model. This time depends on the legitimate behaviors seen in training.

We also determine the weights of the input features of the Random Forest ensemble model using the Gini importance technique [110] on our training dataset. The most important features were time (gini = 53%) and life stage patterns (gini = 45%). The remaining features had, in aggregate, a very low importance of 2%. The intuition for this finding is that any attack that overconsumes resources eventually leads to long time spent or abnormal patterns of life stage visits, as the target application struggles to process attack requests.

3.5.2 The conservative design scenario

For real-world deployment, false positives of up to 3.1% can be concerning. Our conservative classification approach handles this problem by requiring some percentage of anomalous connection classifications by our model before flagging a source IP as an attacker. We define a classification threshold as the percentage of additional anomalous connections (after the first anomalous connection) from the same source, required for the source to be labeled as attack. Further, we must observe some minimum number of connections from a given source before we apply our classification threshold to classify the source. We call this value the minimum sequence length. For example, if our minimum sequence length is 3 and our classification threshold is 0.5, we must observe at least 3 connections from a given source before we attempt to classify it. Assuming that there is one anomalous classification, we need another one (1 out of the additional 2 connections) to label this source as attack. A sketch of this decision rule follows.
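The conservative decision rule can be summarized in a few lines of Python. This is a restatement of the rule just defined, with hypothetical parameter values taken from the ROC analysis below.

MIN_SEQ_LEN = 6    # minimum sequence length (best AUC, see Figure 3.5)
THRESHOLD = 0.20   # classification threshold for sequence length 6

def is_attacker(classifications):
    # classifications: one boolean per observed connection from a source
    # (True = the ensemble flagged that connection as anomalous).
    if len(classifications) < MIN_SEQ_LEN:
        return False                  # too few connections to classify
    anomalous = sum(classifications)
    if anomalous == 0:
        return False
    # After the first anomalous connection, require THRESHOLD of the
    # remaining observed connections to also be anomalous. With a sequence
    # of 3 and a threshold of 0.5, one extra anomalous connection out of
    # the additional 2 suffices, matching the example above.
    return (anomalous - 1) >= THRESHOLD * (len(classifications) - 1)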
We plot ROC curves for the different sequence lengths and different values of the classification threshold in the Appendix, Figure 3.5, for sequence lengths 3, 4, 5, 6 and 7. The area under the ROC curve is largest for sequence lengths 6 and 7. Figure 3.7 in the Appendix shows the optimal classification threshold versus sequence length, based on the cutoff points. While the optimal classification threshold is high for shorter sequences, it dips as the sequence length increases. Since the area under the ROC curves is largest for sequence lengths 6 and 7, we use the corresponding optimal classification thresholds (20% and 17%, respectively) to determine the classification accuracy for the conservative scenario. Table 3.5 summarizes Leader's classification accuracy for the conservative scenario. Leader achieves higher than 99% accuracy for both legitimate and attack source identification, but it takes about 6 requests per source for classification. This is expected, as we trade off detection delay for higher accuracy.

3.6 Comparison With Related Work

We evaluate two of the closely related works using the same experimental setup and attack scenarios that we use for Leader's evaluation.

Rampart [103]: Meng et al. present the defense mechanism Rampart, which protects PHP-based Web applications from sophisticated CPU-exhaustion DoS attacks. Rampart uses statistical methods and function-level program profiling. It builds a statistical model of the CPU time consumed by each function call of the application process, and uses the Chebyshev inequality to detect when a function spends more CPU time than its model indicates. These function calls are specific to the protected application, unlike our function calls, which capture any invocation of functions in net/socket.c, i.e., any socket communication on the server. Rampart probabilistically terminates suspicious connections after they overspend CPU time, and can also devise source-IP blocklists and payload signatures for future attack filtering. Leader differs from Rampart in the way it models legitimate connections (system function calls, wall-clock time and call frequency vs. Rampart's CPU time and application function calls), and in the way it detects anomalies (elliptic envelope vs. Rampart's statistical detection). Because Leader models system function calls, it is more broadly applicable to protect any application on the server. We evaluate Rampart [125] using the same settings as for Leader's evaluation. Table 3.6 shows the accuracy of Rampart over different attack scenarios.
The model parameters are then shared with the resource monitors, which use them to detect ex- DoS attacks and produce alerts containing process and thread IDs. Leader compares favorably to Finelame. Leader is easily portable to different applications, while Finelame requires instrumentation of each specific application (e.g., it would instrument apache2 separately from nginx and Flask in our experiments), and adds 4–11% instrumentation overhead. Further, Finelame does not provide mitigation, while Leader has an effective mitigation. We evaluate Finelame [47] using the same settings that we use for Leader’s evaluation, and using an already instrumented apache2 binary, shared by Demoulin et al. Table 3.6 shows the accuracy of Finelame over different attack scenarios. Finelame can correctly distinguish legitimate from attack con- nections only during SL and HC attacks. In other cases it labels all connections (legitimate and attack) as attack. The reason for this are unsophisticated Finelame models, which cannot distinguish legitimate from attack connections under heavy server load. We verified with Finelame’s authors that their models suffer from this deficiency. 47 Rampart Finelame Leader (conservative) scenario TP TN FP FN TP TN FP FN TP TN FP FN SL 0% 100% 0% 100% 94.4% 99.9% 0.1% 5.6% 99.9% 100% 0% 0.1% HC 100% 100% 0% 0% 100% 100% 0% 0% 99.9% 100% 0% 0.1% Re-DoS 100% 100% 0% 0% 100% 0.02% 99.98% 0% 99.4% 100% 0% 0.6% PHPEx current code does not support vulnerable PHP versions 99.9% 0.04% 99.96% 0.1% 99.1% 100% 0% 0.9% IRC 99.7% 100% 0% 0.3% 99.9% 0.02% 99.98% 0.1% 99.9% 99.8% 0.2% 0.1% MCU Rampart supports PHP applications only 100% 0% 100% 0% 100% 99.95% 0.05% 0% Table 3.6: Comparison with related works over the same attack scenarios. Contamination Precision Recall 0.000 0.999 0.936 0.002 0.990 0.992 0.005 0.982 0.999 0.01 0.969 0.999 0.03 0.941 0.999 0.05 0.911 1 Table 3.7: Precision and recall values when varying the contamination parameter of elliptic envelope. Cogo [52]: Elsabagh et al. propose Cogo, which builds behavioral models of network I/O events. It employs probabilistic Finite Automata (PFA) over these events to recognize future resource exhaustion states. Similar to Leader, Cogo’s tracing of events spans the entire code stack from userland to kernel. For attack mitigation, Cogo kills the process whose network I/O events depart from the PFA model. We evaluated Cogo on the mini_httpd server, based on the instructions provided by the Cogo’s authors. The rest of our evaluation settings were the same as for Leader. Cogo was successful in detecting the start of the attack, and identifying process and thread ID for the mini_httpd process. Unfortunately, killing that process would also deny service to all the legitimate users of the Web server. Unlike Cogo, Leader is able to identify attack sources and block them, allowing legitimate traffic to proceed unharmed. Due to Cogo software’s current implementation limitations, we could not deploy Cogo in our evaluation scenarios to perform a detailed comparative analysis. 48 3.7 Sensitivity During the learning phase, elliptic envelope uses multiple parameters to learn the baseline model. The key parameter is contamination, which denotes the proportion of outliers allowed in the input dataset. Another parameter is assume_centered, which determines whether the covariance estimates are to be directly com- puted with the FastMCD algorithm [131] or not. 
Table 3.7 shows the precision and recall values obtained for Leader in the Slowloris attack experiment for various parameter values, using the liberal classification approach. By setting contamination to 0.002, Leader achieves the best overall accuracy. Additionally, setting assume_centered and support_fraction to either True or False does not affect the accuracy. We set them to False and directly compute the covariance estimates using the FastMCD algorithm [130].

We measure the sensitivity of Leader's model by varying the number and type of legitimate connections used for training, as shown in Table 3.8. The results show that: (a) Leader's classification accuracy grows with the size of the training data; and (b) capturing and including data from legitimate users' connections while the application is under load helps prevent the misclassification of legitimate users. The time required to train the model is lower than 8 seconds for all the scenarios we evaluated. We further test resilience to model poisoning (adversarial data) during training, assuming a contamination parameter value of 0.1 (10%), and varying the portion of the training data that is poisoned. Results are shown in the Appendix (Figure 3.6). There is almost no effect on the classification accuracy as long as the percentage of injected adversarial data does not surpass the contamination parameter value. However, once it exceeds the contamination parameter value, the model's accuracy for legitimate user classification declines sharply. Machine learning with adversarial data is an open research problem. We expect that we will be able to use solutions proposed by ML researchers, such as [27], to improve Leader's training and resilience to adversarial data injection.

Learning                                                  Classification
leg. traffic       leg. traffic under      training       true        true
(100 rps) conns    load (1,000 rps) conns  time           positives   negatives
100                0                       0.05 s         100%        82.7%
1,000              0                       0.05 s         100%        82.1%
10,000             0                       1.5 s          100%        84.3%
50,000             0                       4.9 s          99.9%       87.4%
100                100                     0.09 s         100%        86.3%
1,000              1,000                   0.58 s         99.8%       97.2%
10,000             10,000                  2.1 s          99.4%       98.2%
50,000             50,000                  7.5 s          99.6%       98.7%
Table 3.8: Training time and model's sensitivity for an MCU attack scenario, varying the number of training connections.

3.8 Operational Cost and Scalability

Leader's operational cost is modest and scales sub-linearly with the number of incoming concurrent connections. Leader's core engine is written in C++. We use the tool SystemTap [144] for tracing and logging. Gebai et al. [63] show that the average latency of a SystemTap tracepoint in kernel space is 130 nanoseconds, and these are the tracepoints we use in Leader. We measured the time (in seconds) that the Web server takes to serve each request to the homepage of the mirrored Imgur Web server, both with and without SystemTap probes and Leader, under a heavy request load of 1,000 requests per second. The average time to process a request was 0.3239 seconds with SystemTap probes and Leader on, and 0.3223 seconds without SystemTap probes and without Leader.
Thus, SystemTap and Leader jointly add less than 0.5% overhead to the server's latency. Leader's elliptic envelope model requires a training time of 7 to 10 seconds for 100,000 legitimate users' connections; one could train the baseline model on millions of legitimate users' connections within a few minutes. On average, across 100,000 connections, Leader took around 0.5 ms to classify a connection. Since Leader does not run on the request processing path of an application, the classification delay is invisible to the user.

Leader stores some state per connection and per source of incoming requests. Leader's aggregate RAM usage, including SystemTap and the in-memory elliptic envelope model, was 2.73 GB during a 1,000-connections-per-second attack. Maintaining short-lived state (until the connection finishes or its source is blocked) about each active connection is thus affordable even for a real-time deployment of Leader on popular sites that serve millions of users daily. Leader's state consists of several measures of resource consumption, i.e., a few integers per call, added cumulatively. Once a connection finishes, we clear its state. Our current model requires less than 0.5 MB per connection; thus a low-end server with 16 GB of memory could accommodate tens of thousands of connection life-stage sequences. This is in line with the number of connections a typical Web server can serve simultaneously. Extremely active Web sites, like Amazon, can see about 4 M active clients per hour [141, 142]; thus a few GB would suffice to hold statistics for all active clients.

We evaluate the scalability of Leader's mitigation using iptables and ipset. We artificially insert a diverse set of IP rules and send packets matching these rules at a high rate. This emulates the situation when a server is under attack by numerous bots. We issue Web page requests and measure the time it takes to receive the reply. For space reasons we summarize our results. iptables adds only 5% overhead when there are more than 10 K rules; however, once the number of rules exceeds 1 M, the processing time explodes. On the other hand, ipset adds 5–8% to the processing time as the rules table grows from 100 K to 1 M, and no measurable delay for fewer than 100 K rules. Thus, Leader can block at least one million IPs using ipset.
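The blocking mechanism itself uses standard ipset and iptables commands. A minimal sketch follows; the set name leader_block is our illustrative choice, and in our deployment the equivalent calls are made from Leader's C++ engine.

    import subprocess

    SET_NAME = "leader_block"  # illustrative set name

    def setup():
        # One-time setup: create the IP set and hook it into the INPUT chain
        # with a single iptables rule ("-exist" makes the create idempotent).
        subprocess.run(["ipset", "-exist", "create", SET_NAME, "hash:ip"], check=True)
        subprocess.run(["iptables", "-I", "INPUT",
                        "-m", "set", "--match-set", SET_NAME, "src",
                        "-j", "DROP"], check=True)

    def block(ip):
        # Adding one entry to the hash set is O(1); the iptables rule count stays 1,
        # which is why ipset scales to a million blocked IPs while raw iptables does not.
        subprocess.run(["ipset", "-exist", "add", SET_NAME, ip], check=True)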
3.9 Related Work

Most defenses handle exDoS attacks piecemeal, focusing on just a single variant, e.g., [13, 135, 58, 32]. This piecemeal handling results in numerous application-specific and attack-specific defenses, which are then slow to transition to practice because they only handle one attack variant. Another way to handle exDoS attacks is to treat them as application-level attacks and look for anomalies in the payload of the incoming service requests. This is the approach taken by DDoS defense providers [80] and requires costly deep-packet inspection. This approach is incomplete: it will detect some attacks that require malformed content, but will miss others that rely on timing and send well-formed payloads.

Several defenses take a more general approach to DDoS attack handling. For example, Hoque et al. [68] develop a statistical measure called Feature Feature Score (FFSc) for multivariate data analysis, to distinguish DDoS attack traffic from normal traffic. They use three traffic features: entropy of source IPs, variation of source IPs and packet rate. They build profiles of these features during normal operation and detect attack traffic as traffic that deviates from these profiles. While this approach will detect flooding attacks, it may not be effective against exDoS attacks, because they can be launched at a much lower rate. Such a low rate may not significantly change the traffic features monitored by FFSc. Xiang et al. [177] propose using the generalized entropy metric and the information distance metric to detect low-rate DDoS attacks, by measuring the difference in these measures between legitimate traffic and attack traffic. They then engage IP traceback to trace the attack close to the sources, and filter it there. Like [68], they consider a limited set of features and thus may miss attacks when attack and legitimate traffic are very similar, or may mistakenly drop legitimate traffic when it changes due to normal fluctuations. Their response strategy is also limited, as attacks often come from far away and traceback is not widely supported on the Internet.

There has also been research on finding vulnerabilities [29, 115, 118] that can be exploited to launch sophisticated exDoS attacks. This work is complementary to Leader: patching helps prevent exDoS attacks, while Leader helps detect and neutralize attacks that exploit new or yet unpatched vulnerabilities. In addition to Cogo [52], Finelame [47] and Rampart [103], which we already discussed in Section 3.6, there are several other exDoS defense approaches. For example, Demoulin et al. present DeDoS [48], a platform for mitigating asymmetric DoS attacks. DeDoS offers a framework to deploy code in a modular fashion. If part of the application stack is experiencing a DoS attack, DeDoS can massively replicate only the affected component, potentially across many machines. This allows scaling of the impacted resource separately from the rest of the application stack, so that resources can be precisely added where needed to combat the attack. Leader is complementary to DeDoS, as it looks to filter attack traffic rather than increase available resources.

3.10 Conclusions

Exploit-based DDoS (exDoS) attacks exploit an existing vulnerability at the target application, and can be effective at very low rates. Volumetric DDoS defenses thus fail to handle exDoS attacks, and current exDoS defenses only handle a few variants. In this chapter, we introduced Leader, a novel approach for application-agnostic and attack-agnostic detection and mitigation of exDoS attacks. Leader operates by learning normal patterns of network, application and system-level resource usage when an application serves legitimate external requests. These baseline models are then used to detect external connections that use resources in an anomalous manner. Leader blocks the sources of such anomalous connections. We implement and evaluate Leader for the protection of Web applications. Our results show that Leader has an aggregate accuracy of over 99% for both legitimate and attack connections, across the six exDoS variants used in our evaluation. Leader also has a modest operating cost and adds only 0.5% delay to Web request processing. Leader compares favorably to related work (Rampart and Cogo), and is more portable to different applications and attack variants. Our future work includes evaluating Leader for the protection of other popular applications, such as DNS servers, VPN proxy servers and mail servers.
We would also like to explore whether Leader can be used to protect against OS-based exDoS variants, such as TCP SYN floods or IP fragmentation attacks. It is likely that we would need to enrich Leader's monitoring with other system function calls, in addition to net/socket.c, to make it effective against OS-level exDoS attacks.

Chapter 4
Defending Web Servers Against Flash Crowd Attacks

A flash crowd attack (FCA) floods a service, such as a Web server, with well-formed requests generated by numerous bots. FCA traffic is difficult to filter, since individual attack and legitimate service requests look identical. We introduce robust and reliable models of human interaction with a server, which can identify and block a wide variety of bots. We implement the models in a system called FRADE, and evaluate them on three Web servers with different server applications and content. Our results show that FRADE detects both naive and sophisticated bots within seconds, and successfully filters out attack traffic. FRADE significantly raises the bar for a successful attack, by forcing attackers to deploy botnets at least three orders of magnitude larger than today's.

4.1 Introduction

Application-layer DDoS attacks, or flash-crowd attacks (FCAs), are on the rise [69, 72, 83]. The attacker floods a popular service with legitimate-like requests, using many bots. This usually has a severe impact on the server, impairing its ability to serve legitimate users. The attack resembles a "flash crowd", where many legitimate clients access popular content. Distinguishing between a flash crowd and an FCA is hard, as the attack uses requests whose content is identical to a legitimate user's content, and each bot may send at a low rate [134, 71, 78]. Thus, typical defenses against volumetric attacks, such as looking for malformed requests or rate-limiting clients, do not help against FCAs.

We propose FRADE, a server-based FCA defense, which aims to identify and block malicious clients based on a holistic assessment of their interaction with the server. FRADE views the problem of distinguishing between legitimate and attack clients as one of distinguishing between humans and bots. Thus, FRADE is well-suited to protect applications where legitimate service requests are issued by humans, such as Web servers. FRADE leverages three key differences between humans and bots. First, humans browse in a bursty manner, while bots try to maximize their request rate and send traffic continuously. FRADE learns the dynamics of human interaction with a given server over several time scales, and builds its dynamics models. Second, humans follow popular content across pages, while bots cannot identify popular content. FRADE learns patterns of human browsing over time, and builds its semantics model. Third, humans only click on visible hyperlinks, while bots cannot discriminate between hyperlinks based on their visibility. FRADE's deception module embeds invisible hyperlinks into the server's replies. When the load on the server is high, FRADE labels as bots, and blocks, clients whose behavior mismatches its dynamics or semantics models, or who access deception hyperlinks. FRADE does not make FCAs impossible, but it successfully mitigates a large range of attack strategies. Our evaluation with real traffic, servers and attacks shows that FRADE identifies and blocks naive bots after 3–5 requests, and stealthy bots after 15–19 requests, thus significantly raising the bar for attackers.
To perform a successful, sustained attack, an attacker must employ more sophisticated bots, and deploy them in waves, retiring old ones as they are blocked by FRADE and enlisting new ones. The attacker needs at least three orders of magnitude more bots than are used in today's attacks.

Prior work by Oikonomou and Mirkovic [114] proposed the high-level ideas of differentiating humans from bots using dynamics and semantics models, and decoy hyperlinks. We refer to this work as OM. We build upon the basic ideas in OM, but significantly modify and improve them, to make the system robust against sophisticated adversaries, and practical to implement. Our contributions are (also summarized in Table 4.1):

Sophisticated Attack Handling: OM cannot handle attacks by an attacker familiar with the defense, while FRADE can (Section 4.3.3).

Stealthier Decoy Hyperlinks: FRADE uses stealthier deception hyperlinks than OM (Section 4.2.6), which cannot be detected via automated Web page analysis.

Improved Models: FRADE has simpler and more robust dynamics and semantics models (Sections 4.2.4 and 4.2.5), which require only legitimate clients' data to train. OM also required attack data for training, which is hard to obtain and may impair detection of new attacks. FRADE is much more accurate than OM in differentiating bots from humans (Section 4.3.5).

Implementation and Evaluation: FRADE is implemented as a complete system and evaluated with real traffic and server content, while OM was evaluated in simulation only. FRADE's implementation-based evaluation helped us discover and solve major real-time processing issues, such as enabling the defense to receive and analyze requests during FCAs, and dealing with missed and reordered client requests (Section 4.2.7). FRADE, as a complete system, mitigates FCAs about ten times faster than OM. Section 4.2.8 provides a detailed explanation of the novelties and improvements that FRADE offers over OM. Our code and data are accessible at [143].

4.2 FRADE

We next give an overview of FRADE's goals and operation.

Attacker Model. In our work we consider two attacker models. A naive attacker launches FCAs that are observed today and is not familiar with FRADE. A sophisticated attacker is familiar with FRADE and actively tries to bypass it.

Design Goals. We aim to design an FCA defense that mitigates both naive and sophisticated attacks. Our design rests on two premises. First, FRADE's models are based on features that are difficult, albeit not impossible, for an attacker to learn, because they are only observable at the server. Second, if an attacker does successfully learn and mimic our models, this drastically lowers the usefulness of each bot and forces the attacker to employ many more bots to achieve a sustained attack. In our evaluation, FRADE raises the bar for a successful FCA from just a single bot to 8,000 bots. Extrapolating from the botnet sizes observed in contemporary FCAs, FRADE would raise the bar from 3–6 K to 24–48 M bots (an 8,000-fold increase, matching the single-bot-to-8,000-bots factor we measure), far above the size of botnets available today. Anomaly detection methods regularly learn feature thresholds from training data and apply them in production. Our contribution lies in (a) selecting which features to learn, to be effective against both naive and sophisticated attacks, and (b) implementing and evaluating our approach in three different Web servers, with different content.

4.2.1 Feature Selection

FRADE aims to differentiate human users from bots during FCAs, and to do so transparently to the human users.
Differentiating humans from bots is challenging in an FCA, since legitimate and attack requests can be identical. Our key insight is that while individual requests are identical, the behavior of traffic sources (humans and bots), observed over sequences of requests, differs with regard to the dynamics and semantics of interaction with the server, and with regard to how sources identify content of interest.

Dynamics: Human users browse server content following their interest, and occasionally pause to read content or attend to other, unrelated tasks (e.g., lunch). Their rate is therefore bursty – it may be high in a small time window, but it is not sustained over time. Bots are incentivized to generate requests more aggressively, generating a sustained rate of requests over a long time. To capture these differences we develop models that encode the dynamics of human user interaction with the server over multiple time windows. The main challenge lies in how to properly model various types of requests, to make it hard for bots to avoid detection. Because requests may be generated in different ways and may consume different resources at the server, we develop three dynamics models: (a) main-page requests are generated through human action, such as clicking on a hyperlink or scrolling to the bottom of a page – we model their rate directly over multiple time windows; (b) requests for embedded content, such as images, are automatically generated by a Web browser, and their rate will vary depending on the browser configuration and the number of embedded objects per page – instead of modeling the request rate for embedded objects, we associate each object with its parent page, and allow only those objects to load that belong to a recently loaded parent page; (c) requests for dynamic pages can consume many server resources, even at a low rate – we model the demand for server resources over different time windows.

Semantics: Since humans follow their interests and understand content, they tend to click on popular content more often than not. Bots, on the other hand, must either hard-code a sequence of pages to visit, fabricate requests for non-existing pages, or choose at random from the hyperlinks available on the pages they previously visited. The main challenge lies in building a model that properly leverages popularity measures to detect random, fake or hard-coded sequences of bot requests, while being able to handle user sequences that were not seen in training. FRADE models sequences of human users' requests, and learns the probabilities of these sequences over time. Clients whose request sequences have low probabilities according to the model will be classified as bots. FRADE has a special fall-back mechanism to handle sequences not seen in training.

Deception: We expect human users to visit only those hyperlinks, in the rendered content, that they can see and that are interesting to them. The main challenge in leveraging this difference lies in developing ways to automatically insert decoy hyperlinks into pages, which humans will not visit, and to make it hard for bots to identify them via page-source parsing. FRADE dynamically inserts decoy hyperlinks [138] into Web pages, which are linked to anchors invisible to the human eye (hidden, small or transparent).
FRADE leverages page analysis and CSS files to make these anchors hard to identify by automated analysis. Clients that click on decoy anchors are identified as bots. We discuss novelty in Section 4.2.8 and demonstrate effectiveness in Section 4.3.

Feature           OM [114]            FRADE                    Section
Web req. FCA      yes                 yes                      4.3.2
Embd. obj. FCA    no                  yes, DYN_e mod.          4.3.3
Costly req. FCA   no                  yes, DYN_c mod.          4.3.3
Accuracy          fp ≥ 0, fn ≥ 0      fp = 0, fn = 0           4.3.5
Models            DYN_h & sem. mod.   improved                 5.2
Honeytokens       simple              sophisticated            4.2.6
Training          leg. & attack data  leg. data                5.2
Evaluation        simulation          real traffic / servers   4.3
Table 4.1: Comparison between OM [114] and FRADE.

4.2.2 Overview

FRADE runs in parallel with the server and not inline. It includes an attack detection module and three bot identification modules: dynamics, semantics and deception. It interfaces with a firewall (e.g., iptables) to implement attack filtering. FRADE learns how human users interact with the Web server that it protects. It builds the semantics and dynamics models by monitoring Web server access logs (WAL) in the absence of FCAs. Deception objects, invisible to humans in rendered content, are also automatically inserted into each Web page on the server. When a potential FCA is detected, FRADE enters classification mode. FRADE loads its learned models into memory, and begins tracking each user's behavior. When a user's behavior deviates from one of the learned models, the user is put on the filter list and all their requests are dropped. When the attack stops, the detection module deactivates classification. A filtering rule is removed when the traffic matching it declines. FRADE uses some customizable parameters in its operation. The parameters and values we used in evaluation are shown in Table 4.2 and explained below. We perform sensitivity analysis over these parameters in Section 4.3.6.

Figure 4.1: Overview of FRADE's processing of a Web request.

Parameter    Meaning                                 Value
intDet       monitoring interval                     1 s
attackHigh   incoming-request high threshold         10 * avg
attackLow    incoming-request low threshold          2 * avg
windows      time intervals for dynamics models      1 s, 10 s, 60 s, 300 s, 600 s
ρ            ratio of decoy to visible objects       1
ThreshPerc   high percentile of a modeled quantity   100
Table 4.2: FRADE's parameters and the values we used.

4.2.3 Attack Detection

The attack detection module runs separately from the rest of FRADE, and activates and deactivates other modules by starting and stopping processes. Our detection module is intentionally simple, since our focus was on bot identification. We focus on detecting an increase in incoming requests, regardless of whether it is due to a legitimate flash-crowd event or due to an FCA. We then rely on our very accurate identification of bots to handle the event. A deploying network can replace our detection module with other mechanisms, such as the Bro Network Security Monitor [117].

Learning. FRADE's attack detection module monitors the incoming service request rate, and learns its smoothed historical mean. If the current incoming rate of requests exceeds the historical mean multiplied by the parameter attackHigh, this module raises an alert. Otherwise, the module updates the mean. The update interval, intDet, and the parameter attackHigh are configurable (we use intDet=1 s and attackHigh=10).

Classification. During an FCA, the detection module continues to collect and evaluate the incoming request rate, but does not update its historical mean. When the current rate falls below the pre-attack historical mean multiplied by a configurable parameter attackLow (we use attackLow=2), FRADE signals the end of the FCA and turns off the bot classification modules.
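Although FRADE's engine is written in C++ (Section 4.2.10), the detection logic is simple enough to sketch in Python. The smoothing weight alpha below is an illustrative assumption, since the text above fixes only intDet, attackHigh and attackLow.

    class AttackDetector:
        """Smoothed-mean request-rate monitor (sketch)."""

        def __init__(self, attack_high=10.0, attack_low=2.0, alpha=0.1):
            self.attack_high = attack_high  # attackHigh from Table 4.2
            self.attack_low = attack_low    # attackLow from Table 4.2
            self.alpha = alpha              # assumed smoothing weight
            self.mean = None
            self.under_attack = False

        def update(self, rate):
            """Called once per intDet (1 s) with the incoming request rate."""
            if self.mean is None:
                self.mean = float(rate)
            elif self.under_attack:
                # The historical mean is frozen during an FCA.
                if rate < self.attack_low * self.mean:
                    self.under_attack = False  # signal the end of the FCA
            elif rate > self.attack_high * self.mean:
                self.under_attack = True       # raise the alert; stop updating
            else:
                self.mean = (1 - self.alpha) * self.mean + self.alpha * rate
            return self.under_attack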
Figure 4.1 shows FRADE's processing of a Web request during an attack. In the rest of this section we describe each processing step.

4.2.4 Request Dynamics

The dynamics module models the rate of a user's interaction with a server within a given time interval, and consists of three sub-modules. DYN_h models the rate of main-page requests, such as clicking on a hyperlink or scrolling to the bottom of a page. DYN_e models embedded-object requests, such as loading an image. DYN_c models the rate of a user's demand for server resources, where the demand is represented as the total time it took to serve the given user's requests in a given time period.

Learning. DYN_h and DYN_c learn the expected range of the quantity they model (e.g., request rate, processing time, etc.) over all users, by analyzing WAL. We group requests by their source IP address, and assume that each IP address represents one user or a group of users. FRADE classifies each request as either main-page or embedded; Section 4.2.10 describes how we detect these two types of requests. DYN_h and DYN_c model the main-page requests and use a high percentile of the range (controlled by ThreshPerc, e.g., 99%) as their learned threshold for the quantity they model. In our evaluation we use ThreshPerc=100%. The number and sizes of windows are configurable parameters. As humans browse in a bursty manner, having multiple windows allows monitoring at different time scales, and drastically raises the bar for a successful FCA. It enables us to correctly classify legitimate bursts and distinguish them from sustained attack floods, even when their peak request rates are equal. We use windows of 1, 10, 60, 300 and 600 seconds. DYN_c models the processing time spent to serve a user's requests. This time depends both on the complexity of the user's request and on the current server load. DYN_c models the time to serve a user's request on a lightly loaded server, to capture only the cost to the server that the user can control – the "principal cost". During an attack, we use this principal cost, rather than the actual processing time (inflated cost), to calculate a user's demand on the server's resources. This allows us to avoid false positives, where legitimate users hitting a heavily loaded server experience a large inflated cost through no fault of their own. During learning, each request and its processing time are recorded in a hashmap, called the ProcessMap. DYN_c looks up the principal cost for each request in the ProcessMap, and adds it to the running total for the given user. It then learns the ThreshPerc value over these totals, for each window. DYN_e learns which embedded objects exist on each Web page and records this in a hashmap, called the ObjectMap.
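The multi-window bookkeeping underlying DYN_h can be sketched as follows. The thresholds table is the per-window output of the learning step above (an assumed input here), and the check it feeds is the classification step described next.

    import time
    from collections import defaultdict, deque

    WINDOWS = [1, 10, 60, 300, 600]  # seconds, as in Table 4.2

    class DynH:
        """Per-IP main-page request counts over multiple windows (sketch).
        thresholds maps window size -> maximum allowed requests, learned
        from WAL at the ThreshPerc percentile."""

        def __init__(self, thresholds):
            self.thresholds = thresholds
            self.requests = defaultdict(deque)  # ip -> main-request timestamps

        def on_main_request(self, ip, now=None):
            now = time.time() if now is None else now
            q = self.requests[ip]
            q.append(now)
            # Keep only timestamps within the largest window.
            while q and now - q[0] > max(WINDOWS):
                q.popleft()
            # Exceeding the learned threshold in ANY window marks the IP as a bot,
            # so a burst may pass the 1 s window yet fail the 600 s one.
            for w in WINDOWS:
                if sum(1 for t in q if now - t <= w) > self.thresholds[w]:
                    return "bot"
            return "ok"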
Classification. During classification, DYN_h and DYN_c collect the same measures of user interaction, per user, as they did during learning. These measures are continuously updated as new requests arrive. After each update, the module compares the updated measure against its corresponding threshold. If the measure exceeds a threshold, the client's IP is communicated to the filtering module. Whenever a client issues a main-page request, DYN_e loads all the embedded objects related to this request from the ObjectMap into this user's ApprovedObjectList (AOL). DYN_e then checks whether each embedded-object request made by the same user is present in their AOL. If found, the object is deleted from the AOL. If not found, DYN_e treats this request as a main-page request, and forwards it to the DYN_h and semantics modules. We do this because a user may bookmark an embedded object, e.g., an image, and request it separately at a future time. Our design allows such requests to be served, while preventing FCAs that create floods of embedded requests.

4.2.5 Request Semantics

The semantics module models the probability of a sequence of requests generated by human users.

Learning. We consider only requests classified as main-page requests. In the learning phase, we compute transition probabilities between each pair of pages (e.g., A to B) on the server using Equation (1), where N_{A→B} is the number of transitions from page A to page B, and N_{A→*} is the number of transitions from page A to any page. We learn N_{A→B} and N_{A→*} from WAL. We define the probability of sequence S = {u_1, ..., u_n} as the compound probability of dependent events, which are page transitions, using Equation (2):

    P_t(A → B) = N_{A→B} / N_{A→*}                    (1)

    P(S) = ∏_{i=1}^{n-1} P_t(u_i → u_{i+1})           (2)

During learning, the semantics model calculates sequence probabilities for each user. Since sequence probability declines with length, we learn the probability for a given range of sequence lengths (e.g., 5–10 transitions), grouped into a bin. We also ensure that bins are of balanced size. When learning the threshold for each bin, we sort the probabilities of all sequences in our training set that fall into that bin and take a low percentile (1 - ThreshPerc) to be the threshold. In practice, if a server has very dynamic content, the semantics module may not see all the transitions during learning, leading to false positives in classification. To handle incomplete training data, the semantics module has a fall-back mechanism. It views Web pages as organized into groups of related content. During learning, it learns transitions from pages to groups, groups to pages, and groups to other groups. We define a group as all the pages that cover the same topic. On some Web sites, the page's topic can be inferred from its file path, while others require analyzing each Web page's content to determine the topic (Section 4.2.9). The probability of transition from a page/group to a group is calculated as the average probability of transition to any file within the group:

    P_t(A → group(b)) = ( ∑_{f ∈ group(b)} P_t(A → f) ) / N_{A→group(b)}      (3)

Classification. FRADE processes the request sequence for each client in the active session list (ASL). When a new request arrives, the module updates the client's sequence probability, just as it did during learning. If a transition from page A → B is not found, FRADE falls back to using groups instead of pages. It attempts to find transitions A → group(B), group(A) → B and group(A) → group(B), in that order. When the first transition is found, its probability is used to multiply the current sequence probability, according to Equation (2). If no transitions are found, FRADE multiplies the current sequence probability by a constant called noFileProb ≪ 1. After each update, it compares the current sequence's probability against the corresponding threshold for the sequence's length. Values lower than the threshold lead to blocking of the client.
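A minimal Python sketch of the classification logic of Equations (1)–(3) follows. The dictionaries probs (learned transition probabilities, keyed by page or group pairs) and group (mapping each page to its content group) stand in for the learned model and are assumed inputs.

    NO_FILE_PROB = 1e-6  # stands in for noFileProb << 1 (assumed value)

    def transition_prob(probs, group, a, b):
        """Transition probability with the fall-back order of Section 4.2.5:
        A->B, then A->group(B), group(A)->B, group(A)->group(B)."""
        for key in ((a, b), (a, group[b]), (group[a], b), (group[a], group[b])):
            if key in probs:
                return probs[key]
        return NO_FILE_PROB

    def sequence_prob(probs, group, pages):
        """Compound probability of a request sequence, per Equation (2)."""
        p = 1.0
        for a, b in zip(pages, pages[1:]):
            p *= transition_prob(probs, group, a, b)
        return p

A client is blocked once sequence_prob falls below the learned threshold for its sequence-length bin.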
4.2.6 Deception

The deception module follows the key idea of honeytokens [138]: special objects meant to be accessed only by attackers. The module embeds decoy objects, such as overlapping or small images, into Web pages. In websites with mainly textual content, like Wikipedia, we insert hyperlinks around random pieces of text, but do not highlight them. This makes the hyperlink invisible to humans. In websites with mainly media content, like Imgur, we embed hyperlinks around small images, or small-font text. We insert these decoy objects away from existing hyperlinks, to minimize the chance that they are accidentally visited by humans. We automatically insert decoy objects into a page's source code so that they do not stand out among other embedded objects on that page. The number of decoy objects to be inserted is guided by the parameter ρ – the ratio of decoy to original objects on the same page. We make decoy hyperlinks hard to identify from the page's source code by creating separate styles for them in the site's CSS file. We automatically craft the names of the pages pointed to by decoy hyperlinks to resemble the names of other, non-decoy pages on the server. We introduce some randomness into the deception objects' placement, to make it harder to identify them programmatically.
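The following sketch illustrates decoy-link crafting. The CSS class pool and the page-name list are assumed inputs, and the naming scheme shown is illustrative rather than our exact algorithm; what matters is that the target name blends in with real pages and the anchor's invisibility lives in the site's CSS file, not in the page source.

    import random
    import string

    def make_decoy(css_classes, real_page_names):
        """Craft one decoy hyperlink (sketch). The anchor is hidden via a
        style defined in the site's CSS file (e.g., zero-size or transparent),
        so the page source gives no inline hint that the link is a decoy."""
        # Name the target like real pages, so bots cannot filter by name.
        base = random.choice(real_page_names).rsplit(".", 1)[0]
        suffix = "".join(random.choices(string.ascii_lowercase, k=3))
        target = "%s_%s.html" % (base, suffix)
        return '<a class="%s" href="/%s"></a>' % (random.choice(css_classes), target)

    # Example: make_decoy(["nav-aux"], ["gallery.html", "about.html"])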
4.2.7 Using a Proxy To Speed Up Servers

FRADE mines data about user payload from WAL to classify users as humans or bots, as shown in Figure 4.2. A server may be so overwhelmed under an FCA that it cannot accept new connections, slowing down logging and delaying FRADE's action.

Figure 4.2: Illustration of high-rate attack handling by the server itself.
Figure 4.3: Illustration of high-rate attack handling by the Trans approach.
Figure 4.4: Illustration of high-rate attack handling by the TAB approach.

We explored two approaches to boost the number of requests a server is able to receive and log during FCAs. Our first approach, the transparent proxy (Trans), shown in Figure 4.3, uses a lightweight proxy between clients and the server. It completes the 3-way handshake with the client, and receives and logs Web page requests. It then recreates the connection with the backend server. This can speed up logging, but ultimately the target server may overload before we block all bots, and this will back up the Trans server as well. We use http-proxy-middleware [30] as our transparent proxy. It lets us log requests as soon as they arrive, and forward them to the backend server. Our second approach, the take-a-break proxy (TAB), shown in Figure 4.4, uses a dropping proxy between clients and the backend server. FRADE runs on the dropping proxy, which logs and drops all requests until our blocking manages to reduce the request rate. Logging requests and dropping them immediately allows for faster blocking, as immediate closure of a connection frees the port and socket on the proxy for reuse. Dropping all requests hurts legitimate clients, but it ensures the fastest bot identification, helping us serve users well for the remaining (possibly lengthy) duration of the FCA. We implement this proxy in http-proxy-middleware [30] as well. To improve the speed of bot detection, we further stop building the ApprovedObjectList (AOL) once the TAB proxy is active. Since no replies are returned to users while the TAB proxy is active, a human user will not issue embedded-object requests, while a bot may. This helps us identify bots faster.

4.2.8 Improvements over OM

We now detail the improvements of FRADE over OM – these improvements enable FRADE to be robust against sophisticated attacks, while OM only handles naive attacks.

Stealthier Decoy Hyperlinks: FRADE uses stealthier decoy targets and anchors, and makes the placement of decoy anchors more robust against false positives than OM. FRADE learns the page naming structure from Web server logs, and automatically crafts the names of the target pages for decoy hyperlinks. OM creates target pages with random names, which can be detected by bots. FRADE inserts the decoy anchors away from the existing, visible anchors to reduce the chance that they are accidentally visited by humans. OM does not address such concerns, and is prone to false positives. FRADE makes decoy anchors invisible by adding new styles to the site's CSS file, while OM manipulates the anchors in the Web page source, making them small or changing their color or z-index. OM's anchors can thus be detected more easily by bots.

Improved Dynamics Models: OM models the request dynamics only for main-page requests, while FRADE models it for main-page and embedded requests, and also models each request's principal cost. This helps FRADE handle a variety of sophisticated attacks (see Section 4.3.3) that OM cannot handle. OM uses decision trees to capture request dynamics, grouping requests into sessions and using four features per session. This makes OM's model more complex than FRADE's, which uses just one feature: the threshold rate of requests per time window. OM further requires both legitimate and attack data for training. Attack data is hard to obtain, and overfitting can impair detection of new bot variants. FRADE requires only legitimate data for training.

Improved Semantics Model: Both OM and FRADE build a request graph to encode transition probabilities from one Web page to another. But OM focuses only on pages, while FRADE also models transitions between page groups. This fall-back mechanism enables FRADE to handle transitions in production that were not seen in training. Further, OM computes the sequence probability as the average of probabilities on the request graph, while FRADE computes it as a product (the compound probability of dependent events), which ensures fast decline with sequence length.

Implementation and Evaluation: FRADE is implemented as a complete system and evaluated in a realistic setting, while OM was evaluated in simulation.
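A small numeric example illustrates the improved semantics model's advantage; the per-transition probabilities below are illustrative assumptions, not measured values.

    import math

    bot = [0.05] * 10    # assumed per-transition probability of a random-walk bot
    human = [0.30] * 10  # assumed average per-transition probability of a human

    # An OM-style average separates the two sequences by only a factor of 6...
    print(sum(bot) / len(bot), sum(human) / len(human))    # 0.05 vs 0.3
    # ...while the compound product separates them by over 7 orders of magnitude.
    print(math.prod(bot), math.prod(human))                # ~9.8e-14 vs ~5.9e-6

The product's rapid decline with sequence length is what lets per-length-bin thresholds cleanly cut off improbable sequences.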
4.2.9 Deployment Considerations

Customization. To use FRADE, the Web site administrators must (1) categorize their Web pages into groups for the semantics module, and (2) insert decoy hyperlinks into Web pages. This may in some cases require minor human effort, depending on the server's content. Table 4.3 shows how we classified pages into groups. For Wikipedia, we leveraged its existing categorization of pages into topics. Imgur and Reddit have a folder-based Web site structure, with related files grouped into the same folder. In the absence of both, a Web site could use a topic identification tool, such as [35]. We have automated decoy hyperlink insertion (around 100 lines of code), which can be customized for a new Web site.

User Identification. FRADE currently blocks IP addresses, but this can lead to collateral damage when clients share a NAT. FRADE could use cookies and block users at the application level, but when a server is under an FCA, it is too overloaded to process each request and mine its cookie. We thus view IP-based filtering as necessary to relieve the load at the server.

Training Data. FRADE requires training data of legitimate clients and needs to be trained per server. Each server needs to tune the frequency of its training and decoy hyperlink insertion to match the frequency of its content updates. Attackers may introduce adversarial data before the attack to dilute the learned models. One could address this issue by: (1) sampling training data over multiple days, (2) excluding outliers by adopting lower values for the ThreshPerc parameter, or (3) using techniques such as machine unlearning [28].

Dynamic content and misclassification. If a server does not update its models on new content, FRADE may miss some transitions in the semantics model, or embedded objects in the AOL. Our fall-back mechanisms for the semantics model, and treating embedded requests not found in the AOL as main requests, help minimize this effect. We used data from the Internet Archive [100] to measure the daily updates on some frequently updated Web sites: CNN, NY Times, Imgur and Amazon. On average, a small percentage of a Web site's content (0.17–0.31%) is added daily, around 6 K–54 K objects and pages. FRADE's models can be incrementally updated this often, without full re-training.

Load Balancers. Larger sites deploy load balancers in front of server farms; we would have to periodically gather Web access logs at a central location and run FRADE there to learn models and classify bots. FRADE could then block bot IPs by inserting filtering rules into the load balancer.

4.2.10 Implementation

FRADE's core engine is written in C++, and runs on the Web server/proxy. Filtering is achieved by interfacing with a host-specific mechanism. We use iptables with the ipset extension, which scale well with large filter lists. We classify each request as either main-page or embedded in the following way. We crawl the full Web site using a Selenium-based [133] crawler. This helps us identify both static and dynamically generated HTML content. We extract main requests by finding elements with tag "a" and attribute "href". We label other requests as embedded. These steps are fully automated.
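A single-page sketch of the main-request extraction step might look as follows; the full crawler also follows links across the site, so that script-generated pages are rendered and discovered too.

    from urllib.parse import urljoin
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    def extract_main_urls(page_url):
        """Return URLs reachable via <a href=...> anchors on one rendered page;
        these are labeled main-page requests, everything else embedded."""
        driver = webdriver.Firefox()
        try:
            driver.get(page_url)
            return {urljoin(page_url, a.get_attribute("href"))
                    for a in driver.find_elements(By.TAG_NAME, "a")
                    if a.get_attribute("href")}
        finally:
            driver.quit()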
4.3 Evaluation

Ideally, we would evaluate FRADE with operational servers, real logs, human users and real FCAs. Unfortunately, there are many obstacles to such evaluation: (1) there are no publicly available WAL from modern servers, (2) paying real users to interact with a server during evaluation can get costly and prevents repeatable experiments, and (3) there are no publicly available logs of real FCAs. We test FRADE in emulated experiments on the Emulab testbed [169], using replayed human user traffic and real FCAs. We try to make our experiments as realistic and representative as possible, given the obstacles listed above.

4.3.1 Emulation Evaluation Setup

We mirror dynamic content for three popular Web sites: Imgur, Reddit and Wikipedia. All content is generated dynamically by pulling page information from the server's database, using the original site's scripts. This content is copyright-free, and the server configuration files were publicly available. We download each full site, modify it by automatically inserting decoy hyperlinks, and deploy the site's original configuration and scripts on our server within the Emulab testbed. While we wanted to replicate more servers in our tests, this was impossible because their implementation was either private (e.g., Facebook, YouTube, etc.) or their content was not copyright-free (e.g., major news sites). We engage human users to browse our Web sites and gather data to train and test FRADE's models. We replay human user data in a controlled environment and launch FCAs, with real traffic, targeting our servers from an emulated botnet. We launch repeated FCAs with various botnet sizes and bot behaviors, and measure the time it takes to identify and block bots. Our chosen Web sites had server-software diversity: Imgur runs on Apache, Reddit runs on haproxy, and we deployed Wikipedia on nginx.

Human User Data. We obtained human user data using Amazon Mechanical Turk workers. This study was reviewed and approved by our IRB. In the study we presented an information sheet to each worker, paired the worker with a server at random, and asked the worker to browse naturally. We intentionally did not create specific tasks for workers, as we wanted them to follow their interests and produce realistic data for our semantic models. We also asked each worker to browse at least 20 pages, so that we would have sufficient data for training and testing. To keep engagement high, and to discourage workers who just click through as fast as possible, we asked each worker to rate each page's loading speed on a 1–5 scale. These ratings were not used in our study. Human behavior may become more aggressive during FCAs (e.g., more attempts to refresh content), which may lead to misclassification. However, QoS studies show that users tend to click less, not more, when the server's replies are slow [11]. Our dataset does not capture any adaptation of users to the speed of server replies. Each server had 243 unique users for training and 107 users for testing in our dataset.

Server     Groups
Wikipedia  Topic-based categories
Imgur      Folder-based groups
Reddit     Folder-based groups
Table 4.3: Group assignment for our three Web sites.

                       Time to block all bots
Windows                8 bots    800 bots    8,000 bots
non-unif-5 (current)   3 s       8 s         16 s
uniform-5              4 s       15 s        47 s
uniform-10             3 s       10 s        38 s
uniform-20             3 s       7 s         23 s
Table 4.4: Time to block all bots.

Legitimate Traffic Generator. During each experiment, we replay user traffic from testing logs. We wrote a custom traffic generator, which extracts timing and URL sequences from logs, and then chooses when to start each sequence based on the desired number of active users. The generator uses many different source IPs. Our replay maintains the timing between requests in a sequence, and traffic is replayed at the application level. When a user sequence completes, another sequence is selected and another IP becomes active. If we run out of sequences to replay, we reuse the old ones.
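A minimal sketch of the replay logic follows, omitting the per-user source-IP binding; the example sequence is illustrative.

    import time
    import requests  # assumed HTTP client; replay happens at the application level

    def replay_sequence(base_url, sequence):
        """Replay one user's (delay, url) sequence with its original timing."""
        for delay, url in sequence:
            time.sleep(delay)  # preserve the recorded inter-request timing
            requests.get(base_url + url, timeout=10)

    # Example with an illustrative sequence extracted from a testing log:
    # replay_sequence("http://server.example",
    #                 [(2.0, "/index.html"), (5.5, "/gallery/1.html")])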
Attack Traffic Generator. Our attack traffic generator is a modified httperf tool [108]. We added the ability to choose source IPs from a pool, and to select requests for each IP from a given sequence, in order. Before building our own attack tool, we investigated popular attack tools, such as HULK [1], LOIC [70] and HOIC [16]. These tools do not allow us to use multiple IP addresses when running on the same physical machine. This feature is important, as one can mimic large botnets using few machines. Our tool can generate all attacks generated by the HULK, LOIC and HOIC tools, and more.

Experiment Topology and Scenarios. Our experiment topology is shown in Figure 4.5. It has 8 physical attack nodes (each emulating 1–1,000 virtual attackers), 1 node emulating 100 legitimate clients, and 3 nodes for mirrored servers. All nodes are of type d430 on Emulab, with 32 cores and the Ubuntu 14.04 OS. We fine-tuned the nodes to maximize the request rate that each client could generate, and to maximize the request rate that our servers could handle. While having a larger topology would have helped us perform larger-scale tests, Emulab is a shared resource and we were limited in how many nodes we could request. Our tests suffice to illustrate trends in FRADE's effectiveness as botnet size increases.

Figure 4.5: Attackers: A1–A8 (up to 8,000 virtual bots), Legitimate: L (100 clients), the proxy and 3 Web servers.

To identify an effective attack rate, we measured the request rate required to slow down each server's processing below 1 request per second. For all the Web sites, this rate was around 1,000 rps. We chose to generate 8 times this rate during an attack – 8,000 rps. We test one server per run. Legitimate clients start sending traffic to the server following the timing and sequences from the testing logs. We maintain 100 active, parallel virtual clients throughout the run, each with a separate IP address. After a minute, our virtual attackers (1–1,000 per physical machine) start sending requests to the server at the aggregate rate of 8,000 rps. After 10 minutes we stop the attackers, and a minute later we stop the clients. We illustrate FRADE's handling of an FCA in Figure 4.6, which shows legitimate and FCA traffic (sent to the server, and allowed by FRADE), and the blocked bots. Legitimate traffic declines at first, until FRADE manages to identify most bots. After 20 seconds, FRADE blocks all bots and legitimate traffic returns to its pre-attack levels. At that point, although the attacker keeps sending the attack traffic (the actual attack area in Figure 4.6), the attack requests cannot reach the server, as the bot IP addresses are blocked at the proxy.

Figure 4.6: FRADE's handling of an FCA.

4.3.2 Today's (Naive) Attacks

First, we test FCAs that resemble today's attacks, as noted by [34]. Our attackers repeatedly request: (t1) non-existing URLs, or (t2) the base URL. In case (t1), we tailor the URLs' syntax so that they are identified as main requests. Figures ?? and ?? show the time that FRADE took to block all the bots in these FCAs, for each server, and for 8 and 800 bots. Both attacks show similar trends, with the smaller botnet being blocked sooner (around 4 seconds instead of 8–10 seconds). All bot classification is done by the DYN_h module.

4.3.3 Sophisticated Attacks

An attacker familiar with FRADE could attempt to launch a sophisticated FCA, where bots mimic humans to evade detection. To evade DYN_h, bots would send at a lower rate, necessitating a larger botnet. Bots could also attempt to generate requests mimicking a human's semantics, i.e., trying to guess or learn popular sequences. Finally, bots could leverage knowledge of FRADE's different processing pipelines to engage in embedded or costly request floods. We first explore fully automated FCAs. Here, an attacker has previously engaged a crawler to learn the target server's Web site graph, i.e., which pages point to which other pages, and to match pages to embedded objects. The attacker knows that lower request rates per bot mean longer detection delays, but does not know each page's popularity or which hyperlinks are decoy links. We only show results for Imgur; FCAs on the other servers show a similar trend.

Fully-Automated: Larger Botnet and Smarter Sequences: These FCAs include a larger botnet of 8,000 bots.
The first two FCAs use the same (s1) non-existing and (s2) base URLs as described in Section 4.3.2, with a larger botnet to evade DYN_h detection. The third FCA performs a (s3) random walk on the Web site graph, making only main-page requests (we investigate FCAs that use embedded links later in this section). It cannot differentiate between decoy and non-decoy links. Figure ?? illustrates the time it takes to block all 8,000 bots in the s1–s3 attacks, using the TAB proxy approach. The non-existing URL attack (s1) is fully handled within 16 seconds, with each bot blocked after ≈ 5.8 requests, by the semantics module. The random walk (s3) is handled within 16 seconds, with each bot blocked after 3.8 requests on average, by the deception module. For the base URL attack (s2) it takes 36 seconds to block 8,000 bots, with each bot blocked after ≈ 15 requests, by the DYN_h module.

Fully-Automated: Embedded and Costly Request Floods: Attackers could attempt to flood with embedded or costly requests. The non-existing-object attack (s4) requests made-up URLs, which end up treated as main-page requests by FRADE. Figure ?? shows the time to block all 8,000 bots in this FCA. Within a few seconds the FCA is fully handled. Each bot is blocked within 2–3 requests. The semantics module blocks all bots. The costly attack (s5) sends the most expensive main-page request repeatedly to the server. All bots are blocked by the DYN_c module, within a few seconds.

An attacker could collaborate with some human users to learn the popular pages on a server, and the decoy objects, and then launch semi-automated attacks. The attacker then leverages what they learned to craft sequences of requests, which may evade detection by FRADE's semantics and deception modules. The requests are sent automatically by bots at predetermined timing.

Semi-Automated: Floods that Avoid Deception. The smart-walk attack (s6) performs a random walk on the Web site graph, avoiding decoy links. The smart-walk-object attack (s7) performs a smart walk among all embedded objects on the site, and the smart-walk-site attack (s8) performs a smart walk on the site and requests all non-decoy embedded objects for each main-page request. A replay attack [172], where the attacker records and replays legitimate users' requests, is a special example of the smart-walk-site attack. Figure ?? shows the time it takes to block all 8,000 bots in these FCAs.

Figure 4.7: Today's (naive) attacks and performance comparison for sophisticated attacks.
Figure 4.8: The time to block 8,000 bots in sophisticated attacks.

In a smart-walk attack (s6), FRADE takes 38 seconds to block all 8,000 bots. Each bot is blocked after 19 requests on average, by the DYN_h module. Figure ?? illustrates the benefits of using a proxy. Without a proxy, it would take around 6 minutes to block all the bots. With Trans, it takes under 3 minutes, and with TAB it takes 38 seconds – an almost 10-fold speed-up compared to the server-only approach! In the smart-walk-object attack (s7), all bots are blocked within a few seconds. Each bot is blocked within 2–3 requests, as it requests embedded objects that are not in the AOL during the FCA. All bots are blocked by the DYN_e module. The smart-walk-site attack interleaves main-page requests and their corresponding embedded requests, and it avoids decoy links. It thus manages to slip under the radar of the DYN_h (main-page requests come at a low rate), DYN_c (requests are not costly) and deception (asking for non-decoy links only) modules. All 8,000 bots are blocked within 22 seconds.
Each bot is blocked on average after 6 requests. The complete blocking is done by the semantics module. Since no replies are returned to users while the TAB proxy is active, a human user will not issue embedded-object requests. Hence, FRADE does not keep embedded objects in the AOL while TAB is active. Instead, embedded-object requests are treated as main-page requests, and forwarded to the DYN_h and semantics modules, which model only main-page requests. The semantics module blocks all the bots, because the random walk they create leads to low-probability sequences.

Semi-Automated: Floods that Use Popular Sequences. An attacker may learn which sequences are popular among humans and generate main-page requests for them. They need to distribute the rate among many bots to evade detection by DYN_h. We evaluate this FCA analytically, using the WAL of a large public network testbed that serves thousands of users. The logs covered three months of data and around 5 K users. A few users were obvious outliers, making thousands of requests. If we prune the most aggressive 5% of the users and analyze the remaining user sequences, 95% were shorter than 17 requests. To evade FRADE, the attacker would need to retire each bot after 17 requests. For a 10-minute, 1,000 rps FCA, the attacker would need to recruit 35 K bots to attack this specific server (600 s × 1,000 rps = 600,000 requests, and 600,000 / 17 ≈ 35 K bots). Today, a single server can be brought down by a single, aggressive bot. FRADE thus raises the bar for this specific server's FCA 35,000 times.

4.3.4 Evasion Attacks

It may still be possible to evade FRADE and launch a successful FCA. This would require: (1) recruiting very large botnets, so each bot is used intermittently – as per our evaluation, FRADE raises the bar from 1 bot to more than 8,000 bots, so at least three orders of magnitude; or (2) leveraging humans instead of bots, instructing users to click on visible, popular content, following their interests. Then FRADE would not be able to identify the malicious (human) clients, but the attacker would need thousands of humans for a sustained FCA. The attacker could combine these two approaches, learning popular sequences from human collaborators, then encoding them in stealthy, low-rate bots. This attack would not be detected by FRADE, but it would require at least three orders of magnitude more bots than are in use today (see the discussion above of floods that use popular sequences).

4.3.5 FRADE Outperforms OM

We experimentally compare the accuracy of FRADE versus OM for the DYN_h and semantics models. These models exist in both solutions, and FRADE improves on OM's design. We use the same legitimate traffic as in Section 4.3.2, interleaved with synthetically generated FCA bot traffic, exploring a range of request rates as suggested in [114]. For OM, we train decision trees using Weka on the training data and test on the testing data. When testing DYN_h we run the base-URL FCA, and use 8–8,000 bots. When testing the semantics models we run the smart-walk FCA, and also use 8–8,000 bots. A false positive means that the defense classified a human user as a bot. A false negative means that the defense failed to identify a bot. For space reasons we summarize our findings. While FRADE had no false positives or false negatives in our tests, OM had many false positives (7–76%) for the DYN_h model, for Wikipedia and Reddit, due to the high dimensionality [170] of its models and overfitting.
OM also had some false negatives (5–13%) for the semantics model and the 8,000-bot FCAs, because OM cannot handle transitions not seen in training data, while FRADE can, using its fall-back mechanism. FRADE's models thus outperform those of OM. In addition to this comparison on attacks they both handle, FRADE also outperforms OM by handling a wider range of attacks (embedded and costly request floods).

4.3.6 Sensitivity

FRADE uses multiple parameters in its operation, as shown in Table 4.2. We focus here on analyzing the sensitivity of the parameters that influence classification accuracy. DYN_h and DYN_c currently use 5 window sizes as the time intervals over which they learn thresholds for their models. These window sizes follow a non-uniform, exponential-like pattern, with increasing gaps between windows. We also tested 3 different uniform distributions: uniform-5, uniform-10 and uniform-20, with 5, 10 and 20 windows in the 0–600 second range, respectively. We tested non-existing URL FCAs on Imgur with these alternative windowing approaches, and compared the speed of FRADE's response. Results are shown in Table 4.4. Non-uniform window sizes perform better than uniform sizes, especially for bots that send at a low rate.

Both the dynamics and semantics modules use ThreshPerc to set the percentile of the quantities they model. In our evaluation, we use 100% as ThreshPerc. We chose this value to achieve zero false positives, since we had small training data. In reality, a large server would have logs of millions of clients, some of which could be outliers. We have evaluated values of 99%, 95% and 90% for ThreshPerc with non-existing URL FCAs on Imgur. For the DYN_h model, false positives were 3%, 5% and 9% with ThreshPerc values of 99%, 95% and 90%, respectively. This is mainly because our training data is small and does not have outliers, so removing some percentage of aggressive behaviors from training leads to a similar amount of misclassification on test data. The semantics model did not generate any false positives with the tested ThreshPerc values. Another parameter is the decoy object density ρ – the ratio of decoy objects to visible objects on the same page. In our experiments we use ρ = 1. The higher the ρ, the faster a bot's identification, but also the higher the chance that a human user accidentally accesses a decoy object, and the more visible the distortion to the original page's layout. In our MTurk experiments no humans clicked on our decoy objects. We also observed no visible distortion. Around ρ = 1.5 we observe distortion in Imgur's Web pages, and around ρ = 5 the distortion becomes severe.

4.3.7 Operational Cost and Scalability

We tested FRADE with attacks of up to 0.5 M bots to evaluate its scalability. FRADE's operational cost is modest. The CPU load never exceeded 5%, and the memory grew linearly to around 1.5 GB for 0.5 M bots (Figure 4.9), or around 3 KB per bot or client. Extremely active Web sites like Amazon can see about 4 M active clients per hour [141, 142], and would need 12 GB of memory, which is feasible today. It takes on average 0.05 ms to process a Web log request in FRADE. Thus, FRADE could easily process around 20,000 rps on a single core. Since FRADE does not operate inline, it does not add any user-visible delay to request processing.

Figure 4.9: Memory and CPU cost vs. number of bots.

Number of IPs   0     100   1 K   10 K   100 K   1 M
iptables        4.5   4.5   4.5   4.7    4.9     N/A
ipset           4.8   4.8   4.9   4.9    4.9     5.3
Table 4.5: Page serve time in ms.

We evaluate the scalability of FRADE's filtering using iptables and ipset.
We artificially insert a diverse set of IP rules and send packets matching these rules at a high rate. This emulates the situation when a server is under an FCA from numerous bots. We issue Web page requests and measure the time it takes to receive the reply. Table 4.5 shows the averages over ten runs. iptables's processing time grows modestly until 100 K IPs, but then explodes. We were not able to complete the tests with 1 M IPs. However, ipset imposes only a small delay of 8% as the rules table grows from 100 K to 1 M, and no measurable delay for fewer than 100 K rules. Thus, FRADE can block a million IPs using ipset.

4.4 Related Work

Clouds are a common solution for DDoS defense. They may offer "attack scrubbing" services, but the details of such services are proprietary. Clouds handle volumetric attacks well, but FCAs may fly under their radar. Clouds also use Javascript-based cookies [38, 123] to detect whether a client is running a browser. These challenges are transparent to humans, and good for detecting automated bots. However, attackers can use the Selenium engine to generate requests. Since Selenium interprets Javascript, it would pass the cookie challenge. FRADE can complement cloud defenses, enabling server-based solutions for FCAs.

CAPTCHAs [15, 82] are another popular defense against FCAs. Users who correctly solve a graphical puzzle have their IPs placed on an "allow" list. While a deterrent, CAPTCHAs have some issues. Multiple online services offer bulk CAPTCHA solving, using automated and semi-automated methods (e.g., [91]). CAPTCHAs also place a burden on human users, while FRADE does not. Google's reCAPTCHAs [64] and similar approaches for human user detection are transparent to humans, but can still be defeated using deep learning approaches [7, 22, 137]. These approaches are complementary to FRADE, as they model complementary human user features.

Detection mech.          Dyn   Sem   Dec
Jung et al. [81]          ✓     ✗     ✗
Ranjan et al. [126]       ✓     ✗     ✗
Liao et al. [94]          ✓     ✗     ✗
Wang et al. [167]         ✗     ✓     ✗
Xie and Yu [178]          ✗     ✓     ✗
Beitollahi et al. [18]    ✓     ✓     ✗
FRADE                     ✓     ✓     ✓
Table 4.6: Related work comparison, showing the absence or presence of human Web server interaction features, even if present only at a very basic level.

Jan et al. [76] propose a stream-based bot detection model [168] and augment it with a data synthesis method, using Generative Adversarial Networks [107], to synthesize unseen bot behavior distributions. While we lack the data they have, and cannot compare our systems directly, we can comment on their expected relative performance based on their design. The Jan et al. system focuses on eventually detecting advanced bots, and is well-suited for click bot or chat bot detection. The authors show that it can adapt to new bot behaviors with small re-training, and that it is robust to adversarial attacks. FRADE focuses on quickly detecting bots involved in an FCA. Such bots are likely to exhibit specific, aggressive behaviors, since they seek to maximize the request rate at the server. When FRADE misses a bot, such a bot has a low yield for the attacker, necessitating a large botnet for a sustained attack. Thus FRADE could miss some bots that the Jan et al. approach detects, but these bots would not be very useful for flash-crowd attacks. Comparing reported performance, Jan et al. require long request sequences (30+ requests in a month) to classify a user as benign or bot. This means that new bots will not be detected for at least 30 requests.
FRADE can identify and block most bots within 3–6 requests, and sophisticated bots within fewer than 20 requests. FRADE also achieves higher accuracy: it identifies all bots in our tests and does not misidentify any benign users as bots. Finally, Jan et al. use a small fraction of bot data in training, while FRADE uses only benign user data.

Rampart [103] and COGO [52] build models of resource consumption over time to detect and handle resource exhaustion states. Such defense mechanisms could handle FCAs that employ costly requests, but not other FCA variants.

Like FRADE's dynamics model, several efforts use request timing to detect FCAs [126, 94]. Ranjan et al. [126] use the inter-arrival of sessions and requests, and the cost profile of a session, to assign a suspicion value and prioritize requests. Liao et al. [94] look at the inter-arrival of requests within a window. They use custom classification based on sparse vector decomposition and rely heavily on thresholds derived from their dataset. These works have limited evaluation compared to ours and rely only on modeling human requests, while we also deal with embedded and costly requests, build semantic models of request sequences, and use decoys to bait bots. Yatagai et al. [180] look for repetitive sequences of resources, and for clients which spend shorter than normal periods of time between requests. Bharathi et al. [21] use fixed-size windows to examine which, and how many, resources a client accesses and to detect repetitive patterns. Najafabadi et al. [109] use PCA and fixed windows to examine which resources a client requests. Beitollahi et al. [18] propose ConnectionScore, where connections are scored based on history and statistical analysis done during normal conditions. Models engaged in connection scoring are coarser (e.g., 1 rps vs. our rate per several time intervals) than FRADE's models, and thus we believe that FRADE would outperform this approach. Jung et al. [81] learn existing clients of a Web server, and perform network-aware clustering [87]. When the server is overloaded, they drop aggressive clients that do not fit in the existing clusters. In comparison to these works, we evaluate timing dynamics at a much finer granularity, and evaluate the strict order of requests, allowing us to detect stealthier FCAs.

Multiple works are related to FRADE's semantics model. Wang et al. [167] examine requests over 30-minute windows (sessions) and use a click-ratio (page popularity) model and a Markov process to model clients. Their detection is highly accurate for bot identification, but has a high false-positive rate, while we have zero false positives. Similar to [167], Xie et al. [178] capture the transition probabilities between requests in a session through a hidden semi-Markov model. Our approach to training and modeling is simpler, while still very accurate.

Our deception model uses honeytokens [138], similar to [61, 24, 67]. We build on ideas from these prior works (use of decoy links), but we use a variety of decoy objects, a configurable object density, and automated insertion of decoy objects into each site. To our knowledge, our work is the first to combine dynamics, semantics of user requests, and decoy objects in a single defense, and to evaluate its effectiveness using realistic traffic and real servers. Our results show that different modules are effective against different FCAs. Thus, a combination is needed to fully handle FCAs.
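To make the honeytoken idea concrete, below is a minimal sketch of one classic decoy variant: a link present in the page's HTML but hidden from human visitors. The trap path and insertion strategy are illustrative assumptions, not FRADE's exact mechanism:

DECOY_PATH = "/promo/x9f3"  # hypothetical trap URL, never linked visibly

def insert_decoy(html):
    # A link humans never see or click; bots that parse raw HTML and
    # follow every anchor will request DECOY_PATH and flag themselves.
    decoy = ('<a href="%s" style="display:none" tabindex="-1" '
             'aria-hidden="true">.</a>' % DECOY_PATH)
    return html.replace("</body>", decoy + "</body>", 1)

def is_bot_request(path):
    # Any request for the trap path is strong evidence of automation.
    return path == DECOY_PATH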
Software and datasets for these prior works are not publicly available, and thus we could not directly compare FRADE to them.

Biometrics solutions (e.g., [33] or [176]) can distinguish bots from humans by capturing mouse movements and keystrokes. These approaches are orthogonal to FRADE, and may suffer from privacy issues.

4.5 Conclusions

FCAs are challenging to handle. We have presented a solution, FRADE, which models how human users interact with servers and detects bots as they deviate from this expected behavior. Our tests show that FRADE stops naive bots within 3–5 requests and sophisticated bots within 15–19 requests. A bot could modify its behavior to bypass FRADE's detection, but this forces the attacker to use botnets at least three orders of magnitude larger than today's to achieve a sustained attack. FRADE thus successfully fortifies Web servers against today's FCAs.

Chapter 5
Quantifying Cloud Misbehavior

Clouds have gained popularity over the years as they provide on-demand resources without associated long-term costs. Cloud users often gain superuser access to cloud machines, which is necessary to customize them to user needs. But superuser access to a vast amount of resources, without the support or oversight of experienced system administrators, can create fertile ground for accidental or intentional misuse. Attackers can rent cloud machines or hijack them from cloud users, and leverage them to generate unwanted traffic, such as spam and phishing, denial of service, vulnerability scans, drive-by downloads, etc. In this chapter, we analyze 13 datasets, containing various types of unwanted traffic, to quantify cloud misbehavior and identify clouds that most often and most aggressively generate unwanted traffic. We find that although clouds own only 5.4% of the routable IPv4 address space (with 94.6% going to non-clouds), they often generate similar amounts of scans as non-clouds, and contribute 22–96% of entries on blocklists. Among /24 prefixes that send vulnerability scans, a cloud's /24 prefix is 20–100 times more aggressive than a non-cloud's. Among /24 prefixes whose addresses appear on blocklists, a cloud's /24 prefix is almost twice as likely to have its address listed, compared to a non-cloud's /24 prefix. Misbehavior is heavy-tailed among both clouds and non-clouds. There are 25 clouds that contribute 90% of all the cloud scans, and 10 clouds contribute more than 20% of blocklist entries from clouds.

5.1 Introduction

The cloud computing industry has seen a growth of about 200 billion U.S. dollars over the past decade [161]. In this chapter, we focus on quantifying misuse of cloud resources for malicious Internet activity (e.g., sending spam). Clouds may offer content hosting (e.g., Wordpress) or may rent virtual machines to customers, giving them the liberty to install and run their own software and upload their own data (e.g., Amazon AWS). We focus on the latter category of clouds, since the ability to install custom software requires superuser access to machines, and is often necessary for misbehavior. It may appear that, since clouds charge per usage, the monetary aspect would deter misuse of their resources. Sadly, this is not so. Criminals often avoid paying by taking advantage of free trials, or by hijacking legitimate accounts for their own use through Account Takeover Attacks [17]. Some clouds also willingly host malicious content or condone misuse [25].
Cloud owners are aware that their machines can be misused, and many work hard to prevent, or detect and stop, misuse [6, 159, 158, 156]. However, these protections can be bypassed by attackers [105, 164]. There is anecdotal evidence that clouds generate unwanted traffic in select incidents. For example, in 2014, Amazon AWS EC2 machines were used to launch DDoS attacks on a large regional US bank and a Japanese electronics maker [45, 44]. Further, Zhao et al. [183] observe that cloud service providers are preferred by cyber-criminals to inflict harm on online services at a large scale. Link11 [36] finds that cloud services from leading providers, including Amazon AWS, Microsoft Azure and Alibaba, were used in 25% of all DDoS attacks in Europe from July 2017 to June 2018. Recent statistics from Akamai [5] reveal that cloud service providers account for a significant amount of DDoS traffic. Akamai points to trends in cloud providers towards greater capacity, flexibility, and availability as also favorable for misuse. Glazier et al. [175] mention that cloud hosting providers are being used as proxy farms by attackers, because attackers can spin up multiple virtual instances with different IP addresses and launch widely distributed attacks in a short period of time. There are many other documented incidents, e.g., [49, 120, 121, 122].

5.1.1 Contributions

Our work is the first to systematically analyze and quantify Internet-wide cloud misbehavior. We rely on 13 diverse datasets: three datasets document unwanted traffic (vulnerability scans) and ten contain IP addresses or URLs blocklisted for misbehavior by multiple providers. We classify all Internet prefixes as cloud or non-cloud using a multi-pass approach, relying on reverse DNS data, IPinfo.io prefix classification and manual verification. We then attribute each misbehavior (scan or appearance on a blocklist) from our datasets to a cloud or non-cloud, and analyze differences between clouds and non-clouds.

We find that although clouds own only 5.4% of the routable IPv4 address space (with 94.6% going to non-clouds), they generate 45–55% of all scans in our datasets and contribute 22–96% of blocklist entries. Commensurate with this, we find that an average /24 cloud prefix scans at a rate that is 20–100 times higher than that of non-cloud prefixes. Observing the /24 prefixes whose IP addresses get blocklisted, cloud prefixes appear on blocklists at almost twice the rate of non-cloud prefixes. Clouds also play a pivotal role in spreading malware and hosting phishing URLs. They represent 80–96% of 22 K listed entries on phishing and malware blocklists. There is a heavy tail in both cloud and non-cloud misuse. Only 25 clouds account for 90% of the cloud-sourced scans, and 10 clouds account for 20% of the cloud addresses listed in blocklists. DigitalOcean and OVH are two of the most misused clouds, with average ranks of 5.3 and 5.8, respectively, across all the 13 datasets we use.

Figure 5.1: Identifying Clouds.
Figure 5.2: Number of cloud prefixes identified from different sources and their overlaps.

5.2 Methodology

In this section we define the term "cloud" (Sec. 5.2.1) and describe how we identify cloud prefixes on the Internet (Sec. 5.2.2). We also discuss the datasets we use to quantify misbehavior (Sec. 5.3) and define how we measure maliciousness of a prefix or an organization (Sec. 5.3.4).
5.2.1 Cloud Definition

We define a "cloud" as an organization that offers servers for rent, and allows users to install custom software on these servers. While there are other organizations that offer content hosting (e.g., Wordpress), we focus on the server-based definition of a cloud, because the ability to install custom software is often necessary for misbehavior. For example, it is necessary for spoofed traffic generation, for sending certain types of scans, or for hosting malware.

5.2.2 Identifying Clouds

Figure 5.1 outlines our methodology for identifying all /24 prefixes in the Internet that offer server hosting services, and are thus clouds by our definition. We focus on /24 prefixes and not entire organizations, because some large organizations (e.g., Microsoft, Amazon, Google) dedicate only a portion of their address space to cloud services. We also do not consider granularity smaller than a /24 prefix, because this is the smallest address space assigned to an organization by Regional Internet Registries.

We identify clouds through four methods: (1) reverse DNS lookup, (2) Alexa and Wikipedia, (3) IPinfo.io services and (4) large offenders. Each of these methods provides us with a list of cloud candidates, and we follow it by manual verification of each cloud candidate. We provide more details in the rest of this section.

Reverse DNS. We use reverse DNS to look up all routable IPv4 /24 prefixes from September 2019. To identify candidate cloud prefixes, we look for the keywords "cloud", "hosting", "host" or "vps" in the domain name part of the DNS names for hosts in the given prefix. If any of these terms are present, we proceed to manually verify whether the given domain offers cloud services. At the end of the verification process, we find 385 K /24 cloud prefixes using the reverse DNS approach. We manually verify 8 K organizations in 69 hours (out of which 3.1 K are clouds). Some of the clouds we identified through reverse DNS are upcloud.com, dreamhost.com and greencloudvps.com.
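A minimal sketch of this keyword screen, with the domain extraction simplified to the last two DNS labels (our actual pipeline works over a full reverse DNS dataset rather than live lookups):

import socket

KEYWORDS = ("cloud", "hosting", "host", "vps")

def candidate_domains(prefix_ips):
    # prefix_ips: a sample of addresses from one /24 prefix. Returns the
    # domains whose reverse DNS names suggest hosting services; these go
    # on the candidate list for manual verification, not directly on the
    # clouds list.
    candidates = set()
    for ip in prefix_ips:
        try:
            name = socket.gethostbyaddr(ip)[0].lower()
        except OSError:
            continue
        domain = ".".join(name.split(".")[-2:])  # e.g. greencloudvps.com
        if any(k in domain for k in KEYWORDS):
            candidates.add(domain)
    return candidates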
Alexa and Wikipedia. We identify cloud candidates from the Alexa listing of the Top 500 hosting services [157, 8] and Wikipedia's page of well-known clouds [174]. We again manually verify each cloud to ensure they offer servers and not just content hosting. This step adds another 44 K /24 prefixes to our clouds list. We manually verify 636 organizations in 7 hours (out of which 208 are clouds).

IPinfo.io. The organization IPinfo.io [74] collects information about IP addresses and prefixes. These details include information about the associated Autonomous System Number (ASN), AS type, AS domain, geolocation, prefix, and prefix domain and type. AS type and prefix type can belong to one of the categories "hosting", "business", "isp", "education" or "inactive". AS type and prefix type can differ; we rely on prefix type, as it is more specific and thus likely more accurate. IPinfo.io provides information about all the IP prefixes owned by a given AS. We leverage IPinfo.io in two ways. First, we collect ASes that have type "hosting", and include all their /24 prefixes with prefix type "hosting" in our list of clouds. For example, AS26481 is associated with Rebel Hosting (rebelhosting.net) and has ASN type "hosting", so we add all its /24 prefixes with prefix type "hosting" to our list of clouds. Second, we collect /24 prefixes whose type is "hosting", while the AS type may be different. Such prefixes may correspond to internal hosting, or to hosting services offered to the public. We keep only those prefixes whose domain name includes "cloud", "hosting", "host" or "vps". We then manually verify these domains and add the confirmed prefixes to our cloud list. For example, the prefix 108.161.147.0/24, associated with m5hosting.com (M5 Hosting) and ASN AS395831, has AS type "business" but prefix type "hosting". Its Web page confirms that it provides server hosting. We manually verify 1 K organizations in 9 hours (out of which 424 are clouds). Our use of IPinfo.io adds another 794 K /24 prefixes to our list of clouds. Using IPinfo.io, we find a total of 4 K clouds.

Large offenders. We collect the top 20 organizations that contribute the most to scans and blocklist entries, for each dataset. We then manually verify whether these organizations are clouds or non-clouds. If an organization is a cloud, we add all its /24 prefixes to our list of clouds (e.g., hinet.net, AS3462, provides multiple other services along with server hosting). This adds 47 K /24 prefixes to our list. In some cases the Web site may offer minimal information, such as a vague offer of services and a phone number (e.g., ipvolume.net). We will include such an organization on the cloud list if its tendency to participate in misbehavior is documented (e.g., by badpackets.net or by news articles).

Manual verification. Our manual verification consists of a visit to the Web server of the organization associated with a cloud candidate prefix, and an examination of the offered services. If the services include one of the following options, we conclude that the organization is a cloud: dedicated server hosting, private server hosting, hybrid cloud hosting, virtual private server hosting, custom server rental or virtual machine rental. We use Google Translate to translate foreign-language Web sites into English. If we conclude that the organization is a cloud, we verify all its prefixes on our candidate list.

In the end, our list of cloud prefixes includes 889 K unique /24 IP prefixes from 6 K unique organizations, which cover 5.4% of the publicly routable IPv4 address space [75]. The number of cloud prefixes identified from different sources and their overlaps are shown in Figure 5.2. We attempted to quantify the accuracy of our overall cloud identification process by randomly selecting 100 organizations from our list of clouds, and visiting their Web pages for manual verification. Out of 100, 97 were indeed clouds and 3 were misclassified, containing 13,429 and 4 /24 prefixes, respectively. We thus conclude that our cloud identification precision is 97% for organizations and 99.97% for prefixes.

Limitations. One limitation of our work is that there is no ground truth that we could use to evaluate the accuracy of our cloud identification. Another limitation is that our manual verification may be subjective, since it is performed by a human, and thus it may in some cases be inaccurate. Further, if an organization offers both content and server hosting, and is not added through the IPinfo.io source, we add all its hosting prefixes to our list, as we cannot distinguish between them based on the information we have. Yet another limitation is that we may miss some clouds, because our candidate identification process misses them. Since our manual verification of large offenders adds 5.33% of the total prefixes to our clouds list, we estimate that the accuracy of our candidate identification (without the large-offender dataset) is 94.66%.
Moreover, as per IPinfo.io, 1%–5% of hosting provider allocations may change every month, and IP address allocations across hosting providers can change frequently too. However, 9 out of the 13 datasets that we analyze overlap with the time frame when we extracted the prefixes from IPinfo.io, a source that contributes 89.31% of our clouds list. Our list of cloud prefixes is available as open source [39], and we hope that other researchers can help improve its accuracy.

5.3 Datasets

We work with 13 diverse datasets, which document misbehavior. We group these datasets into two major categories: (1) network traces, and (2) IP addresses or URLs listed on blocklists. Table 5.1 summarizes all our datasets.

5.3.1 Network Traces

CAIDA real-time network telescope data [153] captures all traffic to an unused /8 prefix owned by CAIDA. The traffic to these unassigned addresses (darknet) is unsolicited, and results from a wide range of events, including scanning (unwanted traffic looking for vulnerabilities) and backscatter (replies by the victims of DDoS attacks to randomly spoofed traffic, including darknet space). We analyze TCP SYN traffic sent to the darknet, because such traffic indicates vulnerability scanning and cannot be part of backscatter [160], [19]. We analyze traffic from March 12, 2020 to April 28, 2020. Each compressed hourly file is close to 80 GB, and contains a few billion packets. The traffic is not anonymized. This dataset can only be accessed on CAIDA machines, and it is too large to be processed live in its entirety. Instead, we analyze the first twenty minutes of traffic from each hour, and produce a list of the top 10 destination ports. These ports are computed by analyzing 113 archived files, randomly sampled from December 1, 2019 to February 28, 2020. The port numbers are 22, 23, 80, 81, 443, 3389, 3390, 5222, 8545 and 8080. Our analysis then focuses only on scans sent to these select port numbers, which enables us to process CAIDA traces in real time, using Berkeley packet filters and parallel processes.

Merit network real-time network telescope data [104] also captures full packet traces, using a /13 dark prefix. We analyze their data from March 11, 2020 to March 19, 2020. Each compressed hourly file is close to 2 GB, and usually contains fewer than 0.1 billion packets, so we can process it fully. The traffic is not anonymized and can be analyzed only on Merit's machines.

Regional optical network RONX dataset contains sampled Netflow records from a mid-size US regional network, connecting educational, research, government and business institutions to the Internet. To detect scans, we identify flows with only the TCP SYN flag set. A short flow may thus be misclassified as a scan, because other packets on this flow are not sampled. To address this we also, for each source prefix, keep track of flows that have the TCP PSH or TCP ACK flags set, but not the TCP SYN flag. We assume that these flows indicate an established connection. We then calculate the difference between SYN flows and PSH/ACK flows. If the difference is positive, we use it as the estimate of scan flows from the given source prefix. Our dataset consists of 5-minute long Netflow record collections, from February 24, 2020 to April 30, 2020. To protect user privacy, the IP addresses in the records are anonymized in a prefix-preserving manner, using CryptoPAN [42]. With the consent of RONX operators, we have obtained a list of anonymized and original /24 prefixes in the dataset, so we could use them to classify a prefix as cloud or non-cloud.
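A minimal sketch of the scan estimate described above, over parsed Netflow records (the record layout is our assumption):

from collections import defaultdict

SYN, PSH, ACK = 0x02, 0x08, 0x10

def estimate_scans(flows):
    # flows: iterable of (src_ip, tcp_flags) pairs from sampled records.
    syn_only = defaultdict(int)     # flows with only the SYN flag set
    established = defaultdict(int)  # flows with PSH or ACK set, but no SYN
    for src_ip, flags in flows:
        prefix = ".".join(src_ip.split(".")[:3]) + ".0/24"
        if flags == SYN:
            syn_only[prefix] += 1
        elif (flags & (PSH | ACK)) and not (flags & SYN):
            established[prefix] += 1
    # A positive difference is our per-prefix estimate of scan flows.
    return {p: n - established.get(p, 0)
            for p, n in syn_only.items()
            if n - established.get(p, 0) > 0}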
5.3.2 Blocklists

Blocklists usually list IP addresses, ASNs or DNS names that are sources of some observed misbehavior (e.g., sending spam). We describe our datasets below. When a blocklist includes an ASN, we disaggregate the corresponding autonomous system into its /24 prefixes and include the prefixes in our dataset (a sketch of this disaggregation appears at the end of this subsection). When a blocklist includes DNS names, we use DNS queries to obtain the corresponding IP addresses and include them in our dataset.

Scamalytics: IP fraud risk lookup tool [62] is known for maintaining the largest shared anti-fraud database, mainly dedicated to the online dating industry. It computes a fraud score for all known IPs, and publishes the top 100 IPs each month, together with their fraud scores. Our dataset includes the top 102 IP addresses for each of March 2020 and April 2020.

F5 Labs [128] maintains a list of the top 50 malicious autonomous systems and attacker IPs, which are associated with 14 million different attacks across the globe. The majority of these are Web application attacks. Organizations and attacker IPs are ranked by the number of attacks generated. This dataset is spread over a 90-day period from Aug 2019 to Oct 2019.

The BLAG [124] project produces a publicly available master blocklist [88], obtained by aggregating content from 157 publicly available, popular blocklists. A new master list is published whenever any of the blocklists updates its contents. We use the data for the entire 2018 and 2019, separated into two datasets: 0.5 billion IP addresses in 2018, and 5 billion IP addresses for 2019 and January 2020.

Google Safe Browsing [65] examines billions of URLs per day looking for unsafe websites, and publishes their list. We collected this list from May 8, 2020 to May 16, 2020 from maltiverse.com [65].

COVID-19 phishing URLs list from maltiverse.com [40] contains 239 phishing URLs related to COVID-19 content, from March 13, 2020 to May 16, 2020.

COVID-19 malicious hostnames/URLs list from maltiverse.com [41] contains 9,874 malicious hostnames/URLs from January 2020 to May 2020 that contain the words "COVID-19" or "corona" and are known to generate different types of unwanted traffic.

Openphish [154] maintains a list of autonomous systems associated with phishing. We collected 54 snapshots, from May 15, 2020 to May 22, 2020, each containing the top 10 ASNs associated with phishing. The unique snippets of the Web pages that we downloaded are available at [155].

Cybercrime Tracker [43] maintains a public list of IP addresses that are known to spread malware. We use data from January 1, 2019 to May 12, 2020, which comprised 3,471 IP addresses that spread malware. This dataset overlaps with the BLAG dataset.

udger.com [73] maintains a list of known user agent strings, and also a list of source IPs of known attacks, which is updated every half hour. We collected a total of 101 snapshots of this list, containing 3,030 IP addresses, on April 13, 2020 and from May 4, 2020 to May 20, 2020. Attacks include installations of vulnerable versions of popular web applications, brute-force login attempts, and floods.

BGP Ranking [20] ranks ASNs from the most malicious to the least malicious, using data from compromised systems and other publicly available blocklists. We collected the snapshot of the top 100 most malicious ASNs on August 21, 2020.
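A minimal sketch of the ASN disaggregation mentioned at the start of this subsection, given the prefixes announced by an AS (the input format is an assumption):

import ipaddress

def as_to_slash24s(announced_prefixes):
    # announced_prefixes: e.g. ["192.0.2.0/23", "198.51.100.64/26"]
    slash24s = set()
    for pfx in announced_prefixes:
        net = ipaddress.ip_network(pfx)
        if net.prefixlen <= 24:
            slash24s.update(net.subnets(new_prefix=24))  # split into /24s
        else:
            slash24s.add(net.supernet(new_prefix=24))    # enclosing /24
    return slash24s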
5.3.3 Limitations

The darknet datasets can contain spoofed traffic, which may lead us to misclassify the origin of the scans. We employ techniques outlined in [79] to identify and remove randomly spoofed traffic, but these techniques cannot guarantee that all spoofed traffic is removed. Further, blocklist datasets are produced using proprietary algorithms, and they may contain some false positives. We have no way to independently establish the accuracy of the blocklists. However, we believe that spoofing and false positives are equally likely to affect any prefix, regardless of whether it belongs to a cloud or a non-cloud. Thus we expect that spoofing and false positives do not affect the relative comparison between clouds and non-clouds, which is the focus of our work.

Dataset | Source Format | Description | Start Date | End Date | Size
CAIDA [153] | PCAP files | Real-time network telescope data | 12-Mar-20 | 28-Apr-20 | 80 GB hourly compressed files (a few billion packets)
Merit [104] | PCAP files | Real-time network telescope data | 11-Mar-20 | 19-Mar-20 | 2 GB hourly compressed files (fewer than 0.1 billion packets)
RONX | Anon. Netflow files | Live ISP data (all 5-minute long Netflow records) | 24-Feb-20 | 26-Apr-20 | 1.15 TB
Scamalytics [62] | IP addresses | Top 100 monthly IPs with the maximum fraud in online dating | 1-Mar-20 | 30-Apr-20 | 204 IP addresses
udger.com [73] | IP addresses | Source IP addresses associated with different attacks | 4-May-20 | 5-May-20 | 3,030 IP addresses
Cybercrime Tracker [43] | IP addresses | IPs that spread malware | 1-Jan-19 | 12-May-20 | 3,471 IP addresses
Google Safebrowsing [65] | URLs | Malicious URLs list from maltiverse.com | 8-May-20 | 16-May-20 | 7,886 URLs (that belong to the non-bogon IPv4 address range)
COVID-19 Hostnames [41] | Hostnames / URLs | Malicious hostnames/URLs that contain the word "COVID-19"/"corona" and are associated with generating different variants of unwanted traffic | 1-Jan-20 | 16-May-20 | 9,874 malicious hostnames
COVID-19 Phishing [40] | URLs | Phishing URLs related to COVID-19 content | 13-Mar-20 | 16-May-20 | 239 phishing URLs
Openphish [154] | ASNs and ASN domains | Top 10 ASNs associated with phishing | 15-May-20 | 22-May-20 | 54 snippets of top 10 ASNs
BGP Ranking [20] | ASNs | Maintains top 100 malicious ASNs | 21-Aug-20 | 21-Aug-20 | 101 ASNs
BLAG (2018) [124], [88] | IP addresses | Publicly available blacklisted IPs list collected daily using 157 popular blocklists | 1-Jan-18 | 31-Dec-18 | 0.5 billion IP addresses (14.5 million unique IPs)
BLAG (2019) [124], [88] | IP addresses | Publicly available blacklisted IPs list collected daily using 157 popular blocklists | 1-Jan-19 | 31-Jan-20 | 5 billion IP addresses
F5 Labs: Attack Traffic [128] | IP addresses / ASNs | Top 50 malicious ASNs and IPs ranked by number of attacks, for more than 14 million attacks globally | 1-Aug-19 | 31-Oct-19 | 50 ASNs and 50 IPs
Table 5.1: Datasets summary.

5.3.4 Misbehavior Metrics

We define several measures of misbehavior in this section. To quantify misbehavior of a /24 prefix in a network trace dataset, we define mal_scans^d(t), the number of scans this prefix sends during time t in dataset d. Similarly, we define mal_bl^d, the number of times the prefix appears on a given blocklist. To quantify maliciousness of an organization, we calculate its rank for each dataset, with organizations ordered in decreasing order of their contribution to the dataset (number of scans or number of appearances on the blocklist). We then report the average rank for an organization, rank_avg. As organizations own address spaces of different sizes, it may seem that an organization's likelihood to host misbehaving nodes grows with its address space size. To account for this, we devise another, normalized measure of maliciousness, malorg_score^d:

    malorg_score^d = (prefs_d · r_d) / prefs_tot        (5.1)

where prefs_tot is the number of /24 prefixes owned by the given organization, and prefs_d is the number of these prefixes that appear in dataset d. The measure r_d is the fraction of the contribution of this organization to the total scans or entries in the dataset. We then report the average across datasets, malorg_score.
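As a worked example of Eq. (5.1), a minimal sketch with hypothetical numbers:

def malorg_score(prefs_tot, prefs_d, r_d):
    # prefs_tot: /24 prefixes owned by the organization
    # prefs_d:   of those, how many appear in dataset d
    # r_d:       the organization's fraction of all scans/entries in d
    return (prefs_d * r_d) / prefs_tot

# Hypothetical organization owning 50 /24 prefixes, 10 of which appear in
# a dataset where it contributes 4% of all scans:
score = malorg_score(prefs_tot=50, prefs_d=10, r_d=0.04)  # 0.008

Normalizing by prefs_tot prevents large address holders from appearing malicious merely because they own more prefixes.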
5.4 Results

In this section we report our findings from the datasets, organized into findings from network traces and findings from blocklists. In both categories of datasets, clouds appear 2–100 times more aggressive than non-clouds. We also observe heavy-tailed misbehavior from both clouds and non-clouds; thus, if defenders focused their efforts on a handful of organizations, they could eliminate most malicious behavior.

5.4.1 Findings from Network Traces

Cloud prefixes are more aggressive than non-cloud prefixes across trace datasets. The average measure mal_scans for a cloud prefix is 20–100 times higher than the average mal_scans for a non-cloud prefix. Figures 5.3(a), 5.3(d) and 5.3(g) show the average mal_scans for CAIDA, Merit and RONX, respectively, for cloud and non-cloud prefixes. At the same time, the number of /24 prefixes is 30–60 times higher for non-clouds than for clouds. Figures 5.3(b), 5.3(e) and 5.3(h) show the number of /24 prefixes for CAIDA, Merit and RONX, respectively, for clouds and non-clouds.

Although cloud prefixes make up only 2–3% of prefixes in the network trace datasets, total scans per hour from clouds are similar to scans per hour from non-clouds (the remaining 97–98% of prefixes). The percentage contribution of unwanted traffic by clouds and non-clouds stays roughly equal throughout the full duration of the datasets. Both clouds and non-clouds contribute about 50% of the scans, as shown in Figures 5.3(c), 5.3(f) and 5.3(i).

Figure 5.3: Network Traces Datasets.

We note that in all three datasets, port 8545, associated with Ethereum, cryptocurrency and e-wallet services, is frequently scanned by clouds but far less often by non-clouds. Port 8545 contributes 1.7%, 1.3% and 3.7% of cloud scans in the CAIDA, Merit and RONX datasets, respectively. Conversely, it contributes only 0.3%, 0.4% and 1% of non-cloud scans, respectively. This supports the observation that attackers also misuse clouds for possible monetary gains [53].

Larger clouds seem to be less malicious than smaller clouds in terms of the total mal_scans. Figure 5.5(a) shows the rank_avg of top clouds and non-clouds across the network trace datasets. Organizations are ordered by organization-wise total mal_scans. There seems to be a slight downward trend in the number of scans generated per /24 prefix as we go from smaller to larger organizations. The misbehavior associated with the organization as a whole, i.e., the malorg_score, is higher for small-sized clouds and non-clouds than for large clouds. Figure 5.5(b) shows the rank_avg of top clouds and non-clouds for the network trace datasets, ordered by organization-wise malorg_score. 90% of these top organizations have fewer than 10 /24 prefixes.

Misbehavior is heavy-tailed. We also investigate the contribution of each organization to the total number of scans sent by clouds and non-clouds. In all three datasets there is a heavy-tailed distribution of organizations' contributions, as shown in Figure 5.6: around 25 clouds and 200 non-clouds generate 90% of the malicious traffic. Thus, if defenders focused their efforts on these heavy hitters, they could eliminate a large amount of misbehavior on the Internet.
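A minimal sketch of how such a heavy-hitter set can be computed from per-organization totals (the coverage cutoff and data layout are illustrative):

def heavy_hitters(org_scans, coverage=0.9):
    # org_scans: {organization: total scans}. Returns the smallest set of
    # top organizations that jointly account for `coverage` of all scans.
    total = sum(org_scans.values())
    picked, acc = [], 0
    for org, scans in sorted(org_scans.items(), key=lambda kv: -kv[1]):
        picked.append(org)
        acc += scans
        if acc >= coverage * total:
            break
    return picked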
5.4.2 Findings from Blocklists

Figure 5.4 shows the percentage contribution to blocklist entries by clouds and non-clouds in the different datasets. Clouds contribute anywhere from 22% to 96% of entries, in spite of owning only 5.4% of the routable address space. On average, the total mal_bl per cloud is 1.82 times higher than the total mal_bl per non-cloud. Thus cloud prefixes are almost twice as likely as non-cloud prefixes to engage in misbehavior that lands them on a blocklist.

Figure 5.4: Percentage of clouds and non-clouds in the different Network Traces datasets and Blocklists datasets.

Clouds play a vital role in spreading malware and supporting phishing URLs, as evident from the COVID-19 datasets, Openphish top ASNs, Google Safe Browsing and Cybercrime Tracker. In each of these datasets, clouds account for at least 80% of the entries.

Clouds play a vital role in attacks. Around 61% of the global attacks from the F5 Labs Research dataset originated from clouds, i.e., 8.6 M out of the 14 M attacks.

There is a heavy tail in the contributions of both clouds and non-clouds to the blocklists. Figure 5.6 shows that the top 10 clouds, ranked by the total mal_bl, account for 20% of the total blocklist entries that list cloud addresses. The most frequently blocklisted clouds are OVH and Digital Ocean, shown in Figures 5.5(c) and 5.5(d). OVH, Digital Ocean and Namecheap also have a high malorg_score.

Figure 5.5: rank_avg of top clouds and non-clouds.
Figure 5.6: Cumulative scans ratio distribution.

5.5 Related Work

In addition to the news articles mentioned in Section 5.1, there are research articles examining cloud misbehavior. Zhao et al. [183] observe that cloud service providers are preferred by cyber-criminals to inflict harm on online services at a large scale. Our findings complement their observations. Vlajic et al. [164] show that the virtual private servers of cloud providers are vulnerable to a number of misuse attempts, which involve the use of IP spoofing. Liao et al. [96, 97] discover that cloud web hosting services are being used as a platform for long-tail Search Engine Optimization (SEO) spam attacks, and measure the effectiveness of these attacks. Liao et al. [95] show that clouds are being used to aid malicious online activities, as repositories for malicious content. While these related works focus specifically on IP spoofing, SEO spam attacks and abused cloud repositories, and on a handful of select clouds, we look at a wider range of misbehaviors using 13 diverse datasets, and we seek to attribute them to a much larger population of clouds. Alrwais et al. [9] present a study on the BulletProof Hosting ecosystem [25], reporting 39 K malicious sub-allocations, distributed across 3,200 autonomous systems. They also show how these services operate and evade detection. Their main focus is on BulletProof Hosting, while we cover a wider range of misbehaviors. Miao et al. [105] study attack incidents in a single large cloud provider, and find that outbound attacks dominate inbound ones. Ahmad et al. [2] show that cloud abuse threats have not been addressed yet. They showcase security vulnerabilities associated with clouds, provide anecdotal evidence, and mention some known solutions for different scenarios. Doelitzscher et al. [50] express concern for cloud abuse.
They present an anomaly detection system for IaaS clouds based on customers' usage patterns. Lindemann [98] highlights the problem of cloud abuse and presents a survey of possible abuses from seven clouds. All these related works focus on a limited set of clouds and misbehaviors, while we study a larger set of clouds and leverage diverse data sources to systematically analyze and quantify misbehavior.

5.6 Conclusion

Clouds can be misused, either due to negligence or because they explicitly permit misuse. In this chapter, we quantify misbehavior from clouds and measure it in terms of participation in scanning activities and appearance on blocklists. In all of our 13 datasets, cloud prefixes misbehave much more than non-clouds: they generate 20–100 times more scans and are twice as likely to end up on a blocklist. Clouds also play a vital role in spreading malware and supporting phishing. Some small clouds misbehave more per /24 prefix than larger clouds, which may indicate a lack of security resources, or a higher inclination to allow malicious activities. Both clouds and non-clouds misbehave in a heavy-tailed manner. The top 25 clouds account for 90% of the unwanted scans from clouds, and 10 clouds contribute more than 20% of blocklisted cloud addresses. Thus, if efforts are focused on securing these clouds, Internet attacks can be greatly reduced.

Chapter 6
Conclusions

Distributed denial-of-service (DDoS) attacks are on the rise. Online services are often targets of sophisticated DDoS attacks. These are difficult to detect and mitigate, since they target different vulnerabilities and resources. In this dissertation, we have identified the gaps that are present in existing operational and research-based DDoS defenses. For each gap that we found, we have presented novel research insights and proposed novel solutions utilizing these insights to fill the gaps. For volumetric attacks, our solution, AMON-SENSS, provides a deployable, accurate and scalable DDoS detection and signature generation solution, which is open source and can be deployed in today's networks. For exploit-based attacks, our solution, Leader, provides attack-agnostic defense, which detects anomalous resource usage and blocks sources of such traffic. For flash-crowd attacks, our solution, FRADE, models human behavior across multiple dimensions, thus providing effective defense against both naive and sophisticated bots. Finally, we explore whether static public blocklists can help filter unwanted traffic (including DDoS attacks), and find that a large portion of unwanted traffic emanates from a few public clouds. Since these clouds are heavily used by legitimate businesses, static public blocklists are not a good solution against unwanted traffic. We need dynamic and custom blocklists, which are built during attacks by evaluating the behavior of each traffic source. Such blocklists are built by two of our defenses, Leader and FRADE. In this dissertation, we identify and fill the gaps that are present in the current state-of-the-art solutions by developing deployable and effective DDoS defenses that are robust against strong adversaries.

Bibliography

[1] AAT, ed. HULK DDoS Tool. https://tinyurl.com/y49tze6w, Accessed: Mar 31, 2021. May 2018.

[2] Ishrat Ahmad and Humayun Bakht. "Security Challenges from Abuse of Cloud Service Threat". In: International Journal of Computing and Digital Systems 8.01 (2019), pp. 19–31.

[3] Akamai.
https://www.akamai.com/us/en/multimedia/documents/state-of-the-internet/soti-security-a-year-in-review-report-2019.pdf, Accessed: Mar 31, 2021.

[4] Akamai. https://www.akamai.com/us/en/multimedia/documents/state-of-the-internet/soti-security-a-year-in-review-report-2020.pdf, Accessed: Mar 31, 2021.

[5] Akamai. DDoS Protection. https://www.akamai.com/us/en/resources/ddos-protection.jsp, Accessed: Mar 31, 2021.

[6] Akamai: What are the Security Risks of Cloud Computing? https://tinyurl.com/y6c96lxq, Accessed: Mar 31, 2021.

[7] Ismail Akrout, Amal Feriani, and Mohamed Akrout. Hacking Google reCAPTCHA v3 using Reinforcement Learning. 2019. arXiv: 1903.01003 [cs.LG].

[8] Alexa Top 500 sites. https://tinyurl.com/y5j8lkrc, Accessed: Mar 31, 2021.

[9] Sumayah Alrwais, Xiaojing Liao, Xianghang Mi, Peng Wang, XiaoFeng Wang, Feng Qian, Raheem Beyah, and Damon McCoy. "Under the shadow of sunshine: Understanding and detecting bulletproof hosting on legitimate service provider networks". In: 2017 IEEE Symposium on Security and Privacy (SP). IEEE. 2017, pp. 805–823.

[10] AMON-SENSS code. https://github.com/jelenamirkovic/AMON-SENSS.

[11] Ioannis Arapakis, Xiao Bai, and B. Barla Cambazoglu. "Impact of Response Latency on User Behavior in Web Search". In: Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval. New York, NY, USA: Association for Computing Machinery, 2014, pp. 103–112. ISBN: 9781450322577.

[12] Arbor DDoS. https://www.netscout.com/arbor-ddos, Accessed: Mar 31, 2021.

[13] Zhihao Bai, Ke Wang, Hang Zhu, Yinzhi Cao, and Xin Jin. "Runtime Recovery of Web Applications under Zero-Day ReDoS Attacks". In: 2021 IEEE Symposium on Security and Privacy (SP). IEEE. 2021, pp. 1575–1588.

[14] Alexandru G Bardas, Loai Zomlot, Sathya Chandran Sundaramurthy, Xinming Ou, S Raj Rajagopalan, and Marc R Eisenbarth. "Classification of UDP Traffic for DDoS Detection." In: LEET. 2012.

[15] C. Barna, M. Shtern, M. Smit, V. Tzerpos, and M. Litoiu. "Model-based Adaptive DoS Attack Mitigation". In: Proceedings of the 7th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. SEAMS '12. Zurich, Switzerland: IEEE Press, 2012, pp. 119–128. ISBN: 978-1-4673-1787-0.

[16] Ryan Barnett. HOIC. Ed. by SpiderLabs Blog. https://tinyurl.com/y6en34r3, Accessed: Mar 31, 2021. Jan. 2012.

[17] Ryan Barnett. Web Application Defender's Field Report: Account Takeover Campaigns Spotlight. https://tinyurl.com/ztqq64m. June 2016.

[18] Hakem Beitollahi and Geert Deconinck. "Tackling application-layer DDoS attacks". In: Procedia Computer Science 10 (2012), pp. 432–441.

[19] Karyn Benson, Alberto Dainotti, Kimberly C Claffy, and Emile Aben. "Gaining Insight into AS-level Outages Through Analysis of Internet Background Radiation". In: 2013 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE. 2013, pp. 447–452.

[20] BGP Ranking. https://bgpranking.circl.lu/, Accessed: Mar 31, 2021.

[21] R. Bharathi, R. Sukanesh, Y. Xiang, and J. Hu. "A PCA based framework for detection of application layer DDoS attacks". In: WSEAS Transactions on Information Science and Applications. 2012.

[22] Kevin Bock, Daven Patel, George Hughey, and Dave Levin. "unCAPTCHA: a low-resource defeat of reCAPTCHA's audio challenge". In: 11th USENIX Workshop on Offensive Technologies (WOOT 17). 2017.

[23] Rodrigo Braga, Edjard Mota, and Alexandre Passito. "Lightweight DDoS flooding attack detection using NOX/OpenFlow". In: IEEE Local Computer Network Conference.
2010, pp. 408–415. DOI: 10.1109/LCN.2010.5735752.

[24] D. Brewer, K. Li, L. Ramaswamy, and C. Pu. "A Link Obfuscation Service to Detect Webbots". In: 2010 IEEE International Conference on Services Computing. July 2010, pp. 433–440. DOI: 10.1109/SCC.2010.89.

[25] Bulletproof Hosting. https://en.wikipedia.org/wiki/Bulletproof_hosting, Accessed: Mar 31, 2021.

[26] Enrico Cambiaso, Gianluca Papaleo, Giovanni Chiola, and Maurizio Aiello. "Slow DoS attacks: definition and categorisation". In: International Journal of Trust Management in Computing and Communications 1.3-4 (2013), pp. 300–319.

[27] Yinzhi Cao and Junfeng Yang. "Towards making systems forget with machine unlearning". In: 2015 IEEE Symposium on Security and Privacy. IEEE. 2015, pp. 463–480.

[28] Yinzhi Cao and Junfeng Yang. "Towards making systems forget with machine unlearning". In: 2015 IEEE Symposium on Security and Privacy. IEEE. 2015, pp. 463–480.

[29] Richard Chang, Guofei Jiang, Franjo Ivancic, Sriram Sankaranarayanan, and Vitaly Shmatikov. "Inputs of coma: Static detection of denial-of-service vulnerabilities". In: 2009 22nd IEEE Computer Security Foundations Symposium. IEEE. 2009, pp. 186–199.

[30] Steven Chim. Http Proxy Middleware. https://tinyurl.com/y6td93p4. July 2016.

[31] Hyoung-Kee Choi and John O Limb. "A behavioral model of web traffic". In: Proceedings. Seventh International Conference on Network Protocols. IEEE. 1999, pp. 327–334.

[32] Jongseok Choi, Jong-gyu Park, Shinwook Heo, Namje Park, and Howon Kim. "Slowloris DoS Countermeasure over WebSocket". In: International Workshop on Information Security Applications. Springer. 2016, pp. 42–53.

[33] Zi Chu, Steven Gianvecchio, Aaron Koehl, Haining Wang, and Sushil Jajodia. "Blog or block: Detecting blog bots through behavioral biometrics". In: Computer Networks 57.3 (2013), pp. 634–646.

[34] Daniel Cid. Analyzing Popular Layer 7 Application DDoS Attacks. Sucuri blog, https://tinyurl.com/y3p7mokb, Accessed: December 6th, 2020.

[35] Classification tools. https://tinyurl.com/y6cdav26, Accessed: Mar 31, 2021. May 2019.

[36] Cloud services from leading providers including AWS, Microsoft Azure and Alibaba used in 25% of all DDoS attacks in Europe from July 2017 to June 2018. https://www.link11.com/en/blog/threat-landscape/public-cloud-services-increasingly-exploited-to-supercharge-ddos-attacks-new-link11-research/.

[37] Cloudflare. DDoS Protection with Cloudflare. Ed. by cloudflare.com. https://tinyurl.com/y6deqtcc, Accessed: Mar 31, 2021. 2017.

[38] Cloudflare. How can an HTTP flood be mitigated? https://www.cloudflare.com/learning/ddos/http-flood-ddos-attack/, Accessed: December 6th, 2020. Mar. 2020.

[39] Clouds List. https://steel.isi.edu/Projects/Cloud_Misbehavior/.

[40] COVID-19 Phishing URLs. https://maltiverse.com/collection/TYErBHEB8jmkCY9eFseO, Accessed: Mar 31, 2021.

[41] COVID-19 Malicious Hostnames/URLs. https://maltiverse.com/collection/_IyQknEB8jmkCY9ehYUn, Accessed: Mar 31, 2021.

[42] Crypto-PAn. https://en.wikipedia.org/wiki/Crypto-PAn, Accessed: Mar 31, 2021.

[43] Cybercrime Tracker. http://cybercrime-tracker.net/, Accessed: Mar 31, 2021.

[44] Cybercriminals Abuse Amazon Cloud to Host Linux DDoS Trojans. https://tinyurl.com/yyxwgddt, Accessed: Mar 31, 2021.

[45] DDoS-ers Launch Attacks From Amazon EC2. https://www.infosecurity-magazine.com/news/ddos-ers-launch-attacks-from-amazon-ec2/, Accessed: Mar 31, 2021.

[46] Roy De Maesschalck, Delphine Jouan-Rimbaud, and Désiré L Massart. "The Mahalanobis distance".
In: Chemometrics and Intelligent Laboratory Systems 50.1 (2000), pp. 1–18.

[47] Henri Maxime Demoulin, Isaac Pedisich, Nikos Vasilakis, Vincent Liu, Boon Thau Loo, and Linh Thi Xuan Phan. "Detecting asymmetric application-layer denial-of-service attacks in-flight with FineLame". In: 2019 USENIX Annual Technical Conference (USENIX ATC 19). 2019, pp. 693–708.

[48] Henri Maxime Demoulin, Tavish Vaidya, Isaac Pedisich, Bob DiMaiolo, Jingyu Qian, Chirag Shah, Yuankai Zhang, Ang Chen, Andreas Haeberlen, Boon Thau Loo, et al. "DeDoS: Defusing DoS with dispersion oriented software". In: Proceedings of the 34th Annual Computer Security Applications Conference. 2018, pp. 712–722.

[49] Dhanalakshmi. The latest cloud hosting service to serve malware. https://www.zscaler.com/blogs/research/latest-cloud-hosting-service-serve-malware. Sept. 2018.

[50] F. Doelitzscher, M. Knahl, C. Reich, and N. Clarke. "Anomaly Detection in IaaS Clouds". In: 2013 IEEE 5th International Conference on Cloud Computing Technology and Science. Vol. 1. 2013, pp. 387–394.

[51] Rohan Doshi, Noah Apthorpe, and Nick Feamster. "Machine learning DDoS detection for consumer internet of things devices". In: 2018 IEEE Security and Privacy Workshops (SPW). IEEE. 2018, pp. 29–35.

[52] Mohamed Elsabagh, Dan Fleck, Angelos Stavrou, Michael Kaplan, and Thomas Bowen. "Practical and accurate runtime application protection against DoS attacks". In: International Symposium on Research in Attacks, Intrusions, and Defenses. Springer. 2017.

[53] Experts warn hackers have already stolen over $20 Million from Ethereum clients exposing interface on port 8545. https://tinyurl.com/yxv3e29d, Accessed: Mar 31, 2021.

[54] Exploit Database. Hashtables Denial of Service. https://www.exploit-db.com/exploits/18296.

[55] FastNetMon. https://github.com/pavel-odintsov/fastnetmon.

[56] FastNetMon. https://fastnetmon.com/, Accessed: Mar 31, 2022.

[57] L. Feinstein, D. Schnackenberg, R. Balupari, and D. Kindred. "Statistical approaches to DDoS attack detection and response". In: Proceedings DARPA Information Survivability Conference and Exposition. Vol. 1. 2003, pp. 303–314. DOI: 10.1109/DISCEX.2003.1194894.

[58] NR Fitri, AHS Budi, I Kustiawan, and SE Suwono. "Low interaction honeypot as the defense mechanism against Slowloris attack on the web server". In: IOP Conference Series: Materials Science and Engineering. Vol. 850. 1. IOP Publishing. 2020, p. 012037.

[59] Flask. https://en.wikipedia.org/wiki/Flask_(web_framework).

[60] Fadaei Fouladi, Eren Kayatas, and Emin Anarim. "Statistical measures: Promising features for time series based DDoS attack detection". In: MDPI Proceedings. Vol. 2. 2. 2018, p. 96.

[61] D. Gavrilis, I. Chatzis, and E. Dermatas. "Flash Crowd Detection Using Decoy Hyperlinks". In: 2007 IEEE International Conference on Networking, Sensing and Control. Apr. 2007, pp. 466–470. DOI: 10.1109/ICNSC.2007.372823.

[62] Scamalytics GDI Content Partners. Scamalytics Release High Risk ISPs For March 2020. https://www.globaldatinginsights.com/content-partners/scamalytics/scamalytics-release-high-risk-isps-for-march-2020/.

[63] Mohamad Gebai and Michel R Dagenais. "Survey and analysis of kernel and userspace tracers on Linux: Design, implementation, and overhead". In: ACM Computing Surveys (CSUR) 51.2 (2018), pp. 1–33.

[64] Google. reCAPTCHA v3. https://www.google.com/recaptcha/intro/v3.html, Accessed: Mar 31, 2021.

[65] Google Safebrowsing - Malicious URL. https://maltiverse.com/collection/8nxipXAB8jmkCY9euMUp, Accessed: May 31, 2020.
[66] Hacking with PHP. Denial of service. http://www.hackingwithphp.com/17/1/9/denial-of-service.

[67] X. Han, N. Kheir, and D. Balzarotti. "Evaluation of Deception-Based Web Attacks Detection". In: Proceedings of the 2017 Workshop on Moving Target Defense. MTD '17. Dallas, Texas, USA: ACM, 2017, pp. 65–73. ISBN: 978-1-4503-5176-8. DOI: 10.1145/3140549.3140555.

[68] Nazrul Hoque, Dhruba K Bhattacharyya, and Jugal K Kalita. "A novel measure for low-rate and high-rate DDoS attack detection using multivariate data analysis". In: 2016 8th International Conference on Communication Systems and Networks (COMSNETS). IEEE. 2016, pp. 1–2.

[69] Imperva. 2020 Cyberthreat Defense Report. https://tinyurl.com/y5jmjuzv, Accessed: Mar 31, 2021. 2020.

[70] Imperva. Low Orbit Ion Cannon. Ed. by Imperva. https://tinyurl.com/y3wy32fo, Accessed: Mar 31, 2021.

[71] Imperva Incapsula. Q1 2017 Global DDoS Threat Landscape Report. www.incapsula.com, Accessed: December 6th, 2020. May 2017.

[72] INDUSFACE. https://tinyurl.com/y4c3ywry, Accessed: December 6th, 2020. 2019.

[73] IP Addresses of Attack Sources. https://udger.com/resources/ip-list/known_attack_source, Accessed: Mar 31, 2021.

[74] IPinfo.io. Hosted Domains by ASNs Report. https://ipinfo.io/hosting. 2020.

[75] IPv4 Private Address Space and Filtering. https://www.arin.net/reference/research/statistics/address_filters/, Accessed: Mar 31, 2021.

[76] Steve TK Jan, Qingying Hao, Tianrui Hu, Jiameng Pu, Sonal Oswal, Gang Wang, and Bimal Viswanath. "Throwing darts in the dark? Detecting bots with limited data using neural data augmentation". In: 2020 IEEE Symposium on Security and Privacy (SP). IEEE. 2020, pp. 1190–1206.

[77] Shuyuan Jin and Daniel S Yeung. "A covariance analysis model for DDoS attack detection". In: 2004 IEEE International Conference on Communications (IEEE Cat. No. 04CH37577). Vol. 4. IEEE. 2004, pp. 1882–1886.

[78] M. Jonker, A. King, J. Krupp, C. Rossow, A. Sperotto, and A. Dainotti. "Millions of Targets Under Attack: a Macroscopic Characterization of the DoS Ecosystem". In: Internet Measurement Conference (IMC). Nov. 2017.

[79] Mattijs Jonker, Alistair King, Johannes Krupp, Christian Rossow, Anna Sperotto, and Alberto Dainotti. "Millions of targets under attack: a macroscopic characterization of the DoS ecosystem". In: Proceedings of the 2017 Internet Measurement Conference. 2017, pp. 100–113.

[80] Mattijs Jonker, Anna Sperotto, Roland van Rijswijk-Deij, Ramin Sadre, and Aiko Pras. "Measuring the adoption of DDoS protection services". In: Proceedings of the 2016 Internet Measurement Conference. 2016, pp. 279–285.

[81] Jaeyeon Jung, Balachander Krishnamurthy, and Michael Rabinovich. "Flash Crowds and Denial of Service Attacks: Characterization and Implications for CDNs and Web Sites". In: Proceedings of the 11th International Conference on World Wide Web. WWW '02. Honolulu, Hawaii, USA: ACM, 2002, pp. 293–304. ISBN: 1-58113-449-5. DOI: 10.1145/511446.511485.

[82] S. Kandula, D. Katabi, M. Jacob, and A. Berger. "Botz-4-sale: Surviving Organized DDoS Attacks That Mimic Flash Crowds". In: Proceedings of the 2nd Conference on Symposium on Networked Systems Design & Implementation - Volume 2. NSDI'05. Berkeley, CA, USA: USENIX Association, 2005, pp. 287–300. URL: http://dl.acm.org/citation.cfm?id=1251203.1251224.

[83] Kaspersky. Report Finds 18% Rise in DDoS Attacks in Q2 2019. https://tinyurl.com/y258rnpm, Accessed: Mar 31, 2021. 2019.

[84] Kaspersky lab. https://tinyurl.com/ybnmogg3, Accessed: Mar 31, 2021.
[85] Yoohwan Kim, Wing Cheong Lau, Mooi Choo Chuah, and H.J. Chao. "PacketScore: a statistics-based packet filtering scheme against distributed denial-of-service attacks". In: IEEE Transactions on Dependable and Secure Computing 3.2 (2006), pp. 141–155. DOI: 10.1109/TDSC.2006.25.

[86] Daniel Kopp, Christoph Dietzel, and Oliver Hohlfeld. "DDoS never dies? An IXP perspective on DDoS amplification attacks". In: International Conference on Passive and Active Network Measurement. Springer. 2021, pp. 284–301.

[87] Balachander Krishnamurthy and Jia Wang. "On network-aware clustering of web clients". In: ACM SIGCOMM Computer Communication Review 30.4 (2000), pp. 97–110.

[88] STEEL Lab. BLAG: Blacklist Aggregator. https://steel.isi.edu/members/sivaram/BLAG/. 2020.

[89] Leader. https://www.dropbox.com/sh/v0kb7vvo0ytkcfo/AAAxUY7CdfNRlP-SXUT7KFq1a?dl=0, Accessed: Mar 31, 2021.

[90] The Security Ledger. https://tinyurl.com/yysvu859. 2018.

[91] J. Leyden. Russian serfs paid three dollars a day to break CAPTCHAs. Ed. by The Register. https://tinyurl.com/y2czs7xd, Accessed: December 6th, 2020. Mar. 2008.

[92] Kun-Lun Li, Hou-Kuan Huang, Sheng-Feng Tian, and Wei Xu. "Improving one-class SVM for anomaly detection". In: Proceedings of the 2003 International Conference on Machine Learning and Cybernetics (IEEE Cat. No. 03EX693). Vol. 5. IEEE. 2003, pp. 3077–3081.

[93] Rui Li, Yu Pang, Jin Zhao, and Xin Wang. "A Tale of Two (Flow) Tables: Demystifying Rule Caching in OpenFlow Switches". In: International Conference on Parallel Processing. ICPP 2019. Kyoto, Japan, 2019.

[94] Q. Liao, H. Li, S. Kang, and C. Liu. "Application layer DDoS attack detection using cluster with label based on sparse vector decomposition and rhythm matching". In: Security and Communication Networks 8.17 (Nov. 2015), pp. 3111–3120. DOI: 10.1002/sec.1236.

[95] Xiaojing Liao, Sumayah Alrwais, Kan Yuan, Luyi Xing, XiaoFeng Wang, Shuang Hao, and Raheem Beyah. "Cloud repository as a malicious service: challenge, identification and implication". In: Cybersecurity 1.1 (2018), p. 14.

[96] Xiaojing Liao, Sumayah Alrwais, Kan Yuan, Luyi Xing, XiaoFeng Wang, Shuang Hao, and Raheem Beyah. "Lurking Malice in the Cloud: Understanding and Detecting Cloud Repository as a Malicious Service". In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 2016, pp. 1541–1552.

[97] Xiaojing Liao, Chang Liu, Damon McCoy, Elaine Shi, Shuang Hao, and Raheem Beyah. "Characterizing long-tail SEO spam on cloud web hosting services". In: Proceedings of the 25th International Conference on World Wide Web. 2016, pp. 321–332.

[98] J. Lindemann. "Towards Abuse Detection and Prevention in IaaS Cloud Computing". In: 2015 10th International Conference on Availability, Reliability and Security. 2015, pp. 211–217.

[99] Mohammad Lotfollahi, Jafari Siavoshani, Hossein Zade, and Mohammdsadegh Saberian. "Deep packet: A novel approach for encrypted traffic classification using deep learning". In: Soft Computing 24.3 (2020), pp. 1999–2012.

[100] Wayback Machine. Internet Archive. https://archive.org/web, Accessed: Mar 31, 2021. 1996.

[101] Tasnuva Mahjabin, Yang Xiao, Guang Sun, and Wangdong Jiang. "A survey of distributed denial-of-service attack, prevention, and mitigation techniques". In: International Journal of Distributed Sensor Networks 13.12 (2017), p. 1550147717741463.

[102] Lukas Martinelli. Simulate Hash Collision Attack on a PHP Server. https://github.com/lukasmartinelli/php-dos-attack.
[103] Wei Meng, Chenxiong Qian, Shuang Hao, Kevin Borgolte, Giovanni Vigna, Christopher Kruegel, and Wenke Lee. “Rampart: protecting web applications from CPU-exhaustion denial-of-service attacks”. In: 27th{USENIX} Security Symposium ({USENIX} Security 18). 2018. [104] Merit Network, Inc. ’s Network Telescope. https://www.merit.edu/initiatives/#1598371817757-ee63e0a6-5330, Accessed Oct. 31, 2020. [105] Rui Miao, Rahul Potharaju, Minlan Yu, and Navendu Jain. “The Dark Menace: Characterizing Network-based Attacks in the Cloud”. In: Proceedings of the 2015 Internet Measurement Conference. 2015, pp. 169–182. [106] Jelena Mirkovic and Peter Reiher. “A taxonomy of DDoS attack and DDoS defense mechanisms”. In: ACM SIGCOMM Computer Communication Review 34.2 (2004), pp. 39–53. [107] Mehdi Mirza and Simon Osindero. “Conditional generative adversarial nets”. In: arXiv preprint arXiv:1411.1784 (2014). [108] David Mosberger and Tai Jin. “Httperf: a Tool for Measuring Web Server Performance”. In: SIGMETRICS Perform. Eval. Rev. 26.3 (Dec. 1998), pp. 31–37. ISSN: 0163-5999. DOI: 10.1145/306225.306235. [109] M. Najafabadi, T. Khoshgoftaar, C. Calvert, and C. Kemp. “User Behavior Anomaly Detection for Application Layer DDoS Attacks”. In: 2017 IEEE International Conference on Information Reuse and Integration (IRI). Aug. 2017, pp. 154–161. DOI: 10.1109/IRI.2017.44. [110] Stefano Nembrini, Inke R König, and Marvin N Wright. “The revival of the Gini importance?” In: Bioinformatics 34.21 (2018), pp. 3711–3718. [111] NetScout. “What Can NETSCOUT Do for Me?” In: https://www.netscout.com/arbor-ddos#section--1), Retrieved on October 9, 2020 (). [112] Network Telemetry for DDoS. https://linkmeup.ru/blog/927/, Accessed: Mar 31, 2021. [113] Nicolas Niclausse. Tsung. http://tsung.erlang-projects.org/. 114 [114] Georgios Oikonomou and Jelena Mirkovic. “Modeling human behavior for defense against flash-crowd attacks”. In: 2009 IEEE International Conference on Communications. IEEE. 2009, pp. 1–6. [115] Oswaldo Olivo, Isil Dillig, and Calvin Lin. “Detecting and exploiting second order denial-of-service vulnerabilities in web applications”. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. 2015, pp. 616–628. [116] Outage postmortem - July 20, 2016. Accessed: 2017-06-14. [117] Vern Paxson. “Bro: A System for Detecting Network Intruders in Real-Time”. In: Proceedings of the 7th Conference on USENIX Security Symposium. 1998. [118] Theofilos Petsios, Jason Zhao, Angelos D Keromytis, and Suman Jana. “Slowfuzz: Automated domain-independent detection of algorithmic complexity vulnerabilities”. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2017, pp. 2155–2168. [119] Xi Qin, Tongge Xu, and Chao Wang. “DDoS Attack Detection Using Flow Entropy and Clustering Technique”. In: International Conference on Computational Intelligence and Security (CIS). 2015, pp. 412–415. DOI: 10.1109/CIS.2015.105. [120] Qurium. AZERBAIJAN AND THE FINEPROXY DIY DDOS SERVICE (REGION40 / QUALITYNETWORK). https://www.qurium.org/alerts/azerbaijan/azerbaijan-and-the-region40-ddos-service/. 2018. [121] Qurium. Azerbaijan and the FineProxy DIY DDoS Service (Region40 / QualityNetwork). https://www.qurium.org/alerts/azerbaijan/azerbaijan-and-the-region40-ddos-service/. 2018. [122] Qurium. FineProxy Used to Launch DDoS Attack Against Site Critical of Azerbaijani State Oil Company’s Leader. https://tinyurl.com/y2t5fqdm. 2018. [123] Radware. JS cookie challenges. 
https://tinyurl.com/y2bqmtac, Accessed: December 6th, 2020. Mar. 2020. [124] Sivaramakrishnan Ramanathan, Jelena Mirkovic, and Minlan Yu. “BLAG: Improving the Accuracy of Blacklists”. In: Proceedings of NDSS. 2020. [125] “Rampart’s code”. In: (). https://github.com/cuhk-seclab/rampart. [126] S. Ranjan, R. Swaminathan, M. Uysal, and E. Knightly. “DDoS-Resilient Scheduling to Counter Application Layer Attacks Under Imperfect Detection”. In: Proceedings IEEE INFOCOM 2006. Apr. 2006, pp. 1–13. DOI: 10.1109/INFOCOM.2006.127. [127] Red Hat. Introduction to eBPF in Red Hat Enterprise Linux 7. https://www.redhat.com/en/blog/introduction-ebpf-red-hat-enterprise-linux-7, Accessed: Mar 31, 2021. [128] Regional Threat Perspectives, Fall 2019: United States. https://www.f5.com/labs/articles/threat- intelligence/regional-threat-perspectives--fall-2019--united-states. 115 [129] Marc Roig, Marisa Catalan, and Bernat Gastón. “Ensembled Outlier Detection using Multi-Variable Correlation in WSN through Unsupervised Learning Techniques.” In: IoTBDS. 2019, pp. 38–48. [130] Peter J Rousseeuw and Katrien Van Driessen. “A fast algorithm for the minimum covariance determinant estimator”. In: Technometrics 41.3 (1999), pp. 212–223. [131] Peter J Rousseeuw and Katrien Van Driessen. “A fast algorithm for the minimum covariance determinant estimator”. In: Technometrics 41.3 (1999), pp. 212–223. [132] Scikit learn. “EllipticEnvelope”. In: (). https://scikit-learn.org/stable/modules/generated/sklearn.covariance.EllipticEnvelope.html. [133] Selenium. Selenium Webdriver. https://tinyurl.com/y6a4czhe, Accessed: December 6th, 2020. 2012. [134] Verisign Security Services. Verisign DDoS Trends Report Q2 2016. https://verisign.com/, Accessed: December 6th, 2020. June 2016. [135] Mark Shtern, Roni Sandel, Marin Litoiu, Chris Bachalo, and Vasileios Theodorou. “Towards mitigation of low and slow application ddos attacks”. In: 2014 IEEE International Conference on Cloud Engineering. IEEE. 2014, pp. 604–609. [136] Kyle A Simpson, Simon Rogers, and Dimitrios P Pezaros. “Per-host DDoS mitigation by direct-control reinforcement learning”. In: Transactions on Network and Service Management 17.1 (2019), pp. 103–117. [137] Suphannee Sivakorn, Iasonas Polakis, and Angelos D Keromytis. “I am robot:(deep) learning to break semantic image CAPTCHAs”. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE. 2016, pp. 388–403. [138] L. Spitzner. Honeytokens. Ed. by SecurityFocus.com. https://tinyurl.com/y4gzbjqz. July 2003. [139] JacobMisirian SplittyDev. Python implementation of a slowloris DoS tool. https://github.com/ProjectMayhem/PySlowLoris. [140] A Srivastava, BB Gupta, A Tyagi, Anupama Sharma, and Anupama Mishra. “A recent survey on DDoS attacks and defense mechanisms”. In: International Conference on Parallel Distributed Computing Technologies and Applications. Springer. 2011, pp. 570–580. [141] statista, ed. Combined desktop and mobile visits to Amazon.com from February 2018 to April 2019 (in millions). https://tinyurl.com/y25d8ln8, Accessed: Mar 31, 2021. May 2019. [142] statista, ed. Most popular retail websites in the United States as of December 2019, ranked by visitors (in millions). https://www.statista.com/statistics/271450/monthly-unique-visitors-to-us-retail-websites/, Accessed: Mar 31, 2021. Sept. 2020. [143] STEEL Lab. FRADE. https://steel.isi.edu/Projects/frade/. 2021. 116 [144] SystemTap. SystemTap. https://sourceware.org/systemtap/. [145] Liran Tal. The state of open source security report. 
https://res.cloudinary.com/snyk/image/upload/v1551172581/The-State-Of-Open-Source-Security- Report-2019-Snyk.pdf. Mar. 2019. [146] Rajat Tandon. A Survey of Distributed Denial of Service Attacks and Defenses. https://arxiv.org/pdf/2008.01345.pdf. 2020. arXiv: 2008.01345[cs.CR]. [147] Rajat Tandon, Jelena Mirkovic, and Pithayuth Charnsethikul. “Quantifying Cloud Misbehavior”. In: 2020 IEEE 9th International Conference on Cloud Networking (CloudNet). IEEE. 2020, pp. 1–8. [148] Rajat Tandon, Abhinav Palia, Jaydeep Ramani, Brandon Paulsen, Genevieve Bartlett, and Jelena Mirkovic. “Defending Web Servers Against Flash Crowd Attacks”. In: 2019 IEEE 27th International Conference on Network Protocols (ICNP). IEEE Computer Society. 2019, pp. 1–2. [149] Rajat Tandon, Abhinav Palia, Jaydeep Ramani, Brandon Paulsen, Genevieve Bartlett, and Jelena Mirkovic. “Defending web servers against flash crowd attacks”. In: International Conference on Applied Cryptography and Network Security. Springer. 2021, pp. 338–361. [150] Rajat Tandon, Haoda Wang, Nicolaas Weideman, Christophe Hauser, and Jelena Mirkovic. “Poster: LEADER (Low-Rate Denial-of-Service Attacks Defense)”. In: (). [151] The Long Tail of Attacker Innovation. https://tinyurl.com/yp74j3f8, Accessed: Mar 31, 2022. [152] The Open Web Application Security Project (OWASP). Regular expression Denial of Service - ReDoS. [153] The UCSD Network Telescope. https://www.caida.org/projects/network_telescope/, Accessed: Mar 31, 2021. [154] Top 10 ASNs. https://openphish.com/phishing_activity.html, Accessed: Mar 31, 2021. [155] Top 10 ASNs. https://tinyurl.com/y36n5lqs, Accessed: Mar 31, 2021. [156] Top 5 Risks of Cloud Computing. https://www.calyptix.com/research-2/top-5-risks-of-cloud-computing/, Accessed: Mar 31, 2021. [157] Top 500 Web Hosts. https://www.hostingkingdom.com/top-500, Accessed: Mar 31, 2021. [158] Top Cloud Data Security Risks, Threats, And Concerns. https://blog.panoply.io/top-cloud-security-threats-risks-and-concerns, Accessed: Mar 31, 2021. [159] Top Cloud Security Risks Every Company Faces. https://www.whizlabs.com/blog/cloud-security-risks/, Accessed: Mar 31, 2021. 117 [160] S. Torabi, E. Bou-Harb, C. Assi, E. B. Karbab, A. Boukhtouta, and M. Debbabi. “Inferring and Investigating IoT-Generated Scanning Campaigns Targeting A Large Network Telescope”. In: IEEE Transactions on Dependable and Secure Computing (2020), pp. 1–1. [161] Total size of the public cloud computing market from 2008 to 2020. https://tinyurl.com/y5g4xbaq, Accessed: Mar 31, 2021. [162] Marino Urso. “High performance eBPF probe for Alternate Marking performance monitoring”. PhD thesis. Politecnico di Torino, 2020. [163] Vickie Li. Preg_replace() PHP Function Exploitation. https://www.yeahhub.com/code-execution-preg_replace-php-function-exploitation/. [164] Natalija Vlajic, Mashruf Chowdhury, and Marin Litoiu. “IP Spoofing In and Out of the Public Cloud: From Policy to Practice”. In: Computers 8.4 (2019), p. 81. [165] Volumetric DDoS Attacks. https://tinyurl.com/2p8rj8pw, Accessed: Mar 31, 2021. [166] Daniel Wagner, Daniel Kopp, Matthias Wichtlhuber, Christoph Dietzel, Oliver Hohlfeld, Georgios Smaragdakis, and Anja Feldmann. “United We Stand: Collaborative Detection and Mitigation of Amplification DDoS Attacks at Scale”. In: ACM SIGSAC Conference on Computer and Communications Security. 2021, pp. 970–987. [167] J. Wang, X. Yang, and K. Long. “Web DDoS Detection Schemes Based on Measuring User’s Access Behavior with Large Deviation”. 
In: 2011 IEEE Global Telecommunications Conference - GLOBECOM 2011. Dec. 2011, pp. 1–5. DOI: 10.1109/GLOCOM.2011.6133798. [168] Shuhao Wang, Cancheng Liu, Xiang Gao, Hongtao Qu, and Wei Xu. “Session-based fraud detection in online e-commerce transactions using recurrent neural networks”. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer. 2017, pp. 241–252. [169] Brian White, Jay Lepreau, et al. “An Integrated Experimental Environment for Distributed Systems and Networks”. In: OSDI. 2002. [170] Wikipedia. Curse of dimensionality. https://en.wikipedia.org/wiki/Curse_of_dimensionality/, Accessed: December 6th, 2020. [171] Wikipedia. Log rotation. https://en.wikipedia.org/wiki/Log_rotation/, Accessed: Mar 31, 2021. [172] Wikipedia. Replay attack. https://en.wikipedia.org/wiki/Replay_attack, Accessed: Mar 31, 2021. [173] Wikipedia. “Slowloris”. In: https://en.wikipedia.org/wiki/slowloris_(computer_security), Retrieved on October 9, 2020 (). [174] Wikipedia: Cloud computing providers. https://en.wikipedia.org/wiki/Category:Cloud_computing_providers/, Accessed: Mar 31, 2020. 118 [175] Mayank Dhiman Will Glazier. “Automation Attacks at Scale - Credential Exploitation”. In: Grehack : 2017 (2017). https://grehack.fr/data/2017/slides/GreHack17_Automation_Attacks_at_Scale_paper.pdf. [176] Evan Winslow. Bot Detection via Mouse Mapping. https://tinyurl.com/y3kbgwuw. Sept. 2009. [177] Yang Xiang, Ke Li, and Wanlei Zhou. “Low-rate DDoS attacks detection and traceback by using new information metrics”. In: IEEE transactions on information forensics and security 6.2 (2011), pp. 426–437. [178] Y . Xie and S. Z. Yu. “Monitoring the Application-Layer DDoS Attacks for Popular Websites”. In: IEEE/ACM Transactions on Networking 17.1 (Feb. 2009), pp. 15–25. ISSN: 1063-6692. DOI: 10.1109/TNET.2008.925628. [179] Yang Xu and Yong Liu. “DDoS attack detection under SDN context”. In: IEEE INFOCOM. IEEE. 2016, pp. 1–9. [180] T. Yatagai, T. Isohara, and I. Sasase. “Detection of HTTP-GET flood Attack Based on Analysis of Page Access Behavior”. In: 2007 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing. Aug. 2007, pp. 232–235. DOI: 10.1109/PACRIM.2007.4313218. [181] Xiaoyong Yuan, Chuanhuang Li, and Xiaolin Li. “DeepDefense: identifying DDoS attack via deep learning”. In: SMARTCOMP. IEEE. 2017, pp. 1–8. [182] D. Zhang, H. Wang, and K. G. Shin. “Change-Point Monitoring for the Detection of DoS Attacks”. In: IEEE Transactions on Dependable and Secure Computing 1 (Oct. 2004). [183] Benjamin Zi Hao Zhao, Muhammad Ikram, Hassan Jameel Asghar, Mohamed Ali Kaafar, Abdelberi Chaabane, and Kanchana Thilakarathna. “A Decade of Mal-Activity Reporting: A Retrospective Analysis of Internet Malicious Activity Blacklists”. In: Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security. 2019, pp. 193–205. [184] Zhi-Hua Zhou. “Ensemble learning”. In: Machine Learning. Springer, 2021, pp. 181–210. 119